%0 Conference Proceedings
%@nexthigherunit 8JMKD3MGPCW/3ESGTTP
%2 sid.inpe.br/mtc-m21c/2020/10.25.14.55.28
%4 sid.inpe.br/mtc-m21c/2020/10.25.14.55
%3 pinheiro_graph.pdf
%8 01-04 July
%@issn 03029743
%A Pinheiro, Gabriel Augusto Lins Leal,
%A Silva, Juarez L. F. da,
%A Soares, Marinalva D.,
%A Quiles, Marcos Gonçalves,
%B International Conference on Computational Science and Its Applications (ICCSA), 20
%@secondarytype PRE CI
%C Cagliari, Italy
%D 2020
%E Gervasi, O.,
%E Murgante, B.,
%E Misra, S.,
%E Garau, C.,
%E Blecic, I.,
%E Taniar, D.,
%E Apduhan, B. O.,
%E Rocha, A. M. A. C.,
%E Tarantino, E.,
%E Torre, C. M.,
%E Karaca, Y.,
%@secondarykey INPE--PRE/
%I Springer
%K Clustering · Graph · Quantum-chemistry.
%O Lecture Notes in Computer Science, v.12249
%P 421-433
%S Proceedings
%T A graph-based clustering analysis of the QM9 dataset via SMILES descriptors
%X Machine learning has become a new hot topic in Materials Sciences. For instance, several approaches from unsupervised and supervised learning have been applied as surrogate models to study the properties of several classes of materials. Here, we investigate, from a graph-based clustering perspective, the Quantum QM9 dataset. This dataset is one of the most used datasets in this scenario. Our investigation is twofold: 1) to understand whether the QM9 samples are organized in clusters, and 2) to determine whether the clustering structure might provide us with some insights regarding anomalous molecules, or molecules that jeopardize the accuracy of supervised property prediction methods. Our results show that the QM9 dataset is indeed structured into clusters. These clusters, for instance, might suggest better approaches for splitting the dataset when using cross-correlation approaches in supervised learning. However, regarding our second question, our findings indicate that the clustering structure, obtained via the Simplified Molecular Input Line Entry System (SMILES) representation, cannot be used to filter anomalous samples in property prediction. Thus, further investigation regarding this limitation should be conducted in future research.
%@area COMP
%@electronicmailaddress gabriel.pinheiro@inpe.br
%@electronicmailaddress juarez.dasilva@iqsc.usp.br
%@electronicmailaddress mdiasoraes@gmail.com
%@electronicmailaddress quiles@unifesp.br
%@documentstage not transferred
%@group LABAC-COCTE-INPE-MCTIC-GOV-BR
%@orcid
%@orcid
%@orcid
%@orcid 0000-0001-8147-554X
%@usergroup simone
%@isbn 978-303058798-7
%@affiliation Instituto Nacional de Pesquisas Espaciais (INPE)
%@affiliation Universidade de São Paulo (USP)
%@affiliation Universidade Federal de São Paulo (UNIFESP)
%@affiliation Universidade Federal de São Paulo (UNIFESP)
%@versiontype publisher
%@holdercode {isadg {BR SPINPE} ibi 8JMKD3MGPCW/3DT298S}
%@doi 10.1007/978-3-030-58799-4_74