Data Mining in Material Science



Moganapriya Chinnasamy, Rajasekar Rathanasamy, Samir Kumar Pal, Manoj Kumar Kathiresan, Sathish Kumar Palaniappan

The invention of novel materials is a major driver of contemporary civilization and technological innovation; nevertheless, earlier materials research relied mostly on random, trial-and-error techniques, which are arduous and labor-intensive. With the advent of big data, which has profoundly changed human society and considerably advanced science, artificial intelligence, machine learning, and deep learning methods have recently made remarkable progress in materials science research. However, there are few comprehensive overviews of their applications in materials research. This chapter presents a brief summary of the evolution of materials science research, followed by an emphasis on the key principles and basic processes of AI techniques.

Keywords: Artificial Intelligence, Machine Learning, Deep Learning, Materials

Published online , 25 pages

Citation: Moganapriya Chinnasamy, Rajasekar Rathanasamy, Samir Kumar Pal, Manoj Kumar Kathiresan, Sathish Kumar Palaniappan, Data Mining in Material Science, Materials Research Foundations, Vol. 147, pp. 24-46, 2023


Part of the book on Application of Artificial Intelligence in New Materials Discovery
