Predicting the strength of recycled glass powder-based geopolymers for improving mechanical behavior of clay soils using artificial intelligence

The paper investigates the use of artificial intelligence (AI) methods to predict the strength of recycled glass powder (RGP) and soil mixtures from different input parameters. The study used a database of 57 data sets with five inputs, namely RGP percentage, ordinary Portland cement (OPC) percentage, molar concentration, curing temperature, and curing time, and one output, the unconfined compressive strength (UCS) of the mixture. Two AI models were used: a support vector machine (SVM) and a classification and regression random forest (CRRF). The results demonstrate the potential of RGP-based geopolymers to improve the mechanical behavior of clay soils and show that AI methods can predict the strength of RGP and soil mixtures with high accuracy. For the SVM model, the testing dataset had a mean absolute error (MAE) of 0.072 and an R² of 0.978. The CRRF model also performed accurately, with a MAE of 0.075 and an R² of 0.979. These results suggest that both AI models fit the data well. Analysis of the SVM and CRRF models further shows that curing time is the most important input parameter, while RGP and OPC are the least significant.


Introduction
Increasing environmental concerns have led to interest in reusing waste materials as partial replacements for traditional construction materials [1][2][3][4][5][6]. Recycled glass powder (RGP) has been investigated for its potential use in geopolymer applications [7]. Geopolymers are inorganic polymers that can be produced from a variety of waste materials, including RGP [8]. An RGP-based geopolymer may improve the mechanical properties of clay soils, since it has been shown to enhance the strength and durability of other construction materials [5].
A geopolymer is an inorganic material formed by the reaction of an aluminosilicate precursor with an alkali activator [9]. The resulting three-dimensional network of Si-O-Al bonds gives the material distinctive mechanical and chemical properties [10]. Geopolymers have been used in a variety of applications, including concrete, ceramics, and composites, and have been shown to improve the strength, durability, and corrosion resistance of construction materials [11].
Recycled glass powder is a waste material generated during glass recycling. In the construction industry, it has shown potential as a partial replacement for cement or as an additive to concrete. Besides reducing the amount of waste sent to landfills, the use of RGP has been shown to reduce the carbon footprint of construction materials. Clay soils have poor mechanical properties, such as low strength and high compressibility [12]. Geopolymers have been investigated as a means of improving the mechanical performance of clay soils, and adding geopolymer to clay soils has been shown to improve their compressive strength, stiffness, and durability.
Recent studies have investigated the use of RGP-based geopolymers to improve the mechanical behavior of clay soils. Bilondi et al. [5] investigated the effects of an RGP-based geopolymer on the mechanical properties of expansive soils and found that it increased the unconfined compressive strength and reduced the compressibility of the soil. Ashiq et al. [13] examined the effect of an RGP-based geopolymer on the strength and deformation behavior of soft clay and found that it increased the soil's unconfined compressive strength and reduced its deformation.
Various factors, such as the glass content, curing time, and curing temperature, have been shown to affect how much the added glass improves the soil. However, no comprehensive model has yet been developed for determining the strength of glass and soil mixtures. One reason is the large number of influential factors and the nonlinearity of their effects. Artificial intelligence offers one way to address this problem: AI methods can predict the output with high accuracy without the relationships between the parameters being known in advance [14]. Over the last two decades, AI methods have been applied in geotechnical engineering to slope stability [15][16][17], tunnelling [18][19], road construction [20][21], soil cracking [22][23], soil dynamics [24][25][26], and recycled materials [27][28][29][30][31]. However, no study has yet applied AI methods to determine the strength of RGP and soil mixtures from different input parameters.
In this study, two AI methods, namely support vector machines (SVM) and classification and regression random forest (CRRF), are used for the first time to predict the strength of RGP and soil mixtures from different input parameters. The input parameters are RGP percentage, ordinary Portland cement (OPC) percentage, molar concentration, curing temperature, and curing time. The database contains 57 data sets. After AI modelling, the best model is selected, and a sensitivity analysis is conducted to assess the importance of each input parameter.

Database Collection and Processing
Experiment and data collection
This study used a database of 57 data sets with five inputs, namely RGP percentage, ordinary Portland cement (OPC) percentage, molar concentration, curing temperature, and curing time, and one output, the strength of RGP and soil mixtures. The database was collected from the study conducted by Bilondi et al. [5]. Table 1 displays descriptive statistics for the six variables (five inputs and one output) over the 57 observations. The variables are:
-UCS: This variable is the unconfined compressive strength of the mixture, in megapascals (MPa). The observations range from 0.15 MPa to 2.2 MPa, with a mean of 1.003 MPa and a standard deviation of 0.613 MPa.
-Molar Concentration: This variable is the molar concentration of the alkaline activator solution, in moles per liter. The observations range from 0 to 7 moles per liter, with a mean of 2.719 moles per liter and a standard deviation of 1.800 moles per liter.
-RGP: This variable represents the percentage of recycled glass powder used in the material.The observations range from 0% to 25%, with a mean of 8.211% and a standard deviation of 6.681%.
-OPC: This variable refers to the percentage of ordinary Portland cement in the material.The observations range from 0% to 5%, with a mean of 0.263% and a standard deviation of 1.126%.
-Curing Temperature (°C) and Curing Time (days): These variables represent the curing conditions of the mixture, with temperature measured in degrees Celsius and time in days. The observations for curing temperature range from 25 °C to 70 °C, with a mean of 28.158 °C and a standard deviation of 11.597 °C. The observations for curing time range from 7 days to 91 days, with a mean of 32.053 days and a standard deviation of 33.543 days.
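The minimum and maximum values listed above are what the linear normalization and the random 80/20 split described in the next section need. The Python sketch below shows both steps under stated assumptions: the dictionary keys are hypothetical names, the ranges are copied from the Table 1 statistics, and the test size is rounded up (which reproduces the 12/45 split reported in this study). It is an illustration, not the authors' code.

```python
import math
import random

# Minimum and maximum of each variable, taken from the Table 1 statistics above.
# The dictionary keys are hypothetical names chosen for this sketch.
RANGES = {
    "ucs_mpa": (0.15, 2.2),
    "molar_concentration": (0.0, 7.0),
    "rgp_percent": (0.0, 25.0),
    "opc_percent": (0.0, 5.0),
    "curing_temp_c": (25.0, 70.0),
    "curing_time_days": (7.0, 91.0),
}


def min_max_normalize(value, x_min, x_max):
    """Linear (min-max) normalization: map [x_min, x_max] onto [0, 1]."""
    return (value - x_min) / (x_max - x_min)


def normalize_record(record):
    """Normalize every variable in a record (a dict) using RANGES."""
    return {name: min_max_normalize(value, *RANGES[name]) for name, value in record.items()}


def train_test_split(records, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the records for testing.

    The test size is rounded up, so 57 records give 12 for testing and 45 for training.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical mixture, not a record from the database.
print(normalize_record({"molar_concentration": 3.5, "rgp_percent": 12.5, "curing_temp_c": 70.0}))
# {'molar_concentration': 0.5, 'rgp_percent': 0.5, 'curing_temp_c': 1.0}

train, test = train_test_split(range(57))
print(len(train), len(test))  # 45 12
```

In practice the minimum and maximum would normally be computed from the training data only, so that no information from the testing set leaks into the preprocessing.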

Preparation of the data for AI modelling
In the database, the parameters have different units and ranges, which can adversely affect the accuracy and performance of AI models. Therefore, the database was normalized using the linear normalization in Eq. 1:
$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$
where $X_{\max}$, $X_{\min}$, $X$, and $X_{\text{norm}}$ are the maximum, minimum, actual, and normalized values, respectively. Linear normalization is a commonly used data pre-processing technique that linearly rescales the values of each parameter to a common range (0 to 1 in this study). Because all parameters then share the same dimensionless scale, the AI models can process the data more easily and make more accurate predictions.
Dividing the database into training and testing parts is another important step in data preparation. In this study, 20% of the database (12 data sets) was randomly selected for testing, and the remaining 80% (45 data sets) was used for training. Tables 2 and 3 provide statistical information about these two subsets. As shown in Tables 2 and 3, the statistics of the two subsets are quite similar. This means the testing data are representative of the training data, so the testing metrics give a fair estimate of how well the models predict new, unseen data within the range covered by the database.

Data-driven modeling
Support vector machine (SVM)
Support vector machine (SVM) is a powerful and popular machine learning algorithm used for classification and regression analyses. It was developed by Vladimir Vapnik and colleagues in the 1990s. SVM is particularly useful when dealing with complex, high-dimensional, or nonlinear data [32], and it is widely used in many fields, including image and text classification and bioinformatics. The basic idea of SVM is to find the hyperplane that divides the data into two classes with the greatest margin, where the margin is the distance between the hyperplane and the nearest data points of each class. The hyperplane that maximizes the margin is the most robust to new data points and has the best generalization capability.
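The margin idea can be made concrete with a short Python sketch. It does not train an SVM (that requires solving a quadratic optimization problem); it only scores candidate hyperplanes w·x + b = 0 by their margin on a small, made-up 2-D dataset, so that the quantity SVM maximizes is visible.

```python
import math


def geometric_margin(w, b, points, labels):
    """Margin of the hyperplane w.x + b = 0 with respect to a labeled dataset.

    For each point, y * (w.x + b) / ||w|| is its signed distance to the
    hyperplane (positive when the point lies on its correct side); the margin
    is the smallest of these distances.
    """
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    distances = [
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w
        for x, y in zip(points, labels)
    ]
    return min(distances)


# Made-up, linearly separable 2-D data with labels -1 and +1.
points = [(-1.0, 0.0), (-2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
labels = [-1, -1, 1, 1]

# Two separating hyperplanes: x1 = 0 and x1 = 0.5.
print(geometric_margin((1.0, 0.0), 0.0, points, labels))   # 1.0
print(geometric_margin((1.0, 0.0), -0.5, points, labels))  # 0.5
```

Both hyperplanes separate the two classes, but the first keeps the nearest points twice as far away, so an SVM would prefer it. Dividing by ||w|| makes the margin independent of how w is scaled.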
In SVM, the data are mapped into a high-dimensional feature space, where a separating hyperplane is easier to find. A kernel function measures the similarity between pairs of data points and implicitly defines this mapping; common choices include linear, polynomial, and radial basis function (RBF) kernels. For nonlinearly separable data, SVM relies on the so-called kernel trick: because the algorithm only needs inner products between data points, the kernel lets it find a hyperplane in the high-dimensional feature space without explicitly computing the coordinates of the data in that space. This keeps the computation tractable even when the feature space is very high-dimensional or infinite-dimensional.
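The kernel trick can be checked numerically. In the Python sketch below, the degree-2 polynomial kernel (x·z)² gives the same value as the inner product of an explicit three-dimensional feature map φ, without ever building φ. The RBF kernel is included as another common choice; the paper does not state which kernel its SVM used, so this is illustrative only.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def phi(x):
    """Explicit degree-2 feature map for a 2-D input x = (x1, x2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel k(x, z) = (x . z)^2.

    Equals phi(x) . phi(z) but never builds the 3-D feature vectors.
    """
    return dot(x, z) ** 2


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))    # 121.0
print(dot(phi(x), phi(z)))   # approximately 121.0 (floating-point rounding)
print(rbf_kernel(x, x))      # 1.0: identical points are maximally similar
```

For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel form is the only practical way to compute it.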
Besides binary classification, SVMs can also be used for regression and multi-class classification. For multi-class classification, SVM trains several binary classifiers, for example one for each pair of classes, and combines their outputs (typically by voting) to reach the final decision. SVMs are reported to have several advantages over other machine learning algorithms, such as decision trees and artificial neural networks: they are less prone to overfitting, require relatively little data preprocessing beyond feature scaling, and can be used with both numerical and (suitably encoded) categorical data. However, SVM also has limitations, including the need to select an appropriate kernel function and its parameters, and the difficulty of interpreting the results.
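For regression, as in this study, SVM is usually applied as support vector regression (SVR). Instead of a margin between classes, SVR fits a function inside a tube of half-width ε: errors smaller than ε are not penalized, and a penalty weight C controls how strongly points outside the tube count against the flatness of the function. The Python sketch below only evaluates the linear SVR objective for a given candidate (w, b); it is not a solver, and the values of C and ε used by the authors are not reported here.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """SVR loss: zero inside the epsilon tube, growing linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def linear_svr_objective(w, b, X, y, C, epsilon):
    """Primal objective of a linear SVR for a candidate (w, b).

    0.5 * ||w||^2 keeps the fitted function as flat as possible, and C weights
    the total amount by which training points fall outside the epsilon tube.
    """
    flatness = 0.5 * sum(wi * wi for wi in w)
    violations = sum(
        epsilon_insensitive_loss(yi, sum(wj * xj for wj, xj in zip(w, xi)) + b, epsilon)
        for xi, yi in zip(X, y)
    )
    return flatness + C * violations


# Made-up 1-D example: the first point lies inside the tube, the second outside.
X = [(1.0,), (2.0,)]
y = [1.0, 3.0]
print(linear_svr_objective(w=(1.0,), b=0.0, X=X, y=y, C=1.0, epsilon=0.5))  # 1.0
```

A kernel SVR minimizes the same kind of objective, with the inner products replaced by kernel evaluations.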

Classification and regression random forest (CRRF)
Classification and Regression Random Forest (CRRF) is a powerful ensemble machine learning algorithm that uses many decision trees to perform both classification and regression tasks. A random forest is an ensemble of decision trees trained on different subsets of the data and features, which together form a robust and accurate model. Because it handles both classification and regression problems, CRRF is versatile and has a wide range of applications. Each decision tree recursively splits the data into smaller and smaller subsets according to rules on the input features, and each final subset (leaf) corresponds to a particular prediction. The algorithm combines the outputs of the ensemble of trees to predict the output of the classification or regression problem.
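The recursive splitting performed by a single regression tree can be sketched in Python as follows. This assumes the common CART-style criterion of minimizing the children's total sum of squared errors (SSE); `max_depth` and `min_leaf` play roles similar to the "maximum depth" and minimum node/son size settings listed in Table 5, but the exact rules used by the authors' software are not stated, and the data below are made up.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys, min_leaf=1):
    """Best threshold on one feature, judged by the children's total SSE.

    Returns (threshold, total_sse); threshold is None when no split reduces
    the parent's SSE. Each child must keep at least min_leaf (>= 1) samples.
    """
    pairs = sorted(zip(xs, ys))
    best_threshold, best_total = None, sse(list(ys))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical x values cannot be separated
        total = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if total < best_total:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_total = total
    return best_threshold, best_total


def build_tree(X, y, max_depth=3, min_leaf=1, depth=0):
    """Recursively grow a regression tree; each leaf stores its mean target."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return leaf
    best = None  # (total_sse, feature_index, threshold)
    for j in range(len(X[0])):
        threshold, total = best_split([row[j] for row in X], y, min_leaf)
        if threshold is not None and (best is None or total < best[0]):
            best = (total, j, threshold)
    if best is None:
        return leaf
    _, j, threshold = best
    left = [i for i, row in enumerate(X) if row[j] <= threshold]
    right = [i for i, row in enumerate(X) if row[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth, min_leaf, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth, min_leaf, depth + 1),
    }


def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while "value" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


# Made-up one-feature data with two clearly separated groups.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(X, y)
print(tree["threshold"], predict(tree, [2.5]), predict(tree, [11.5]))  # 6.5 1.0 5.0
```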
For classification, CRRF uses a combination of decision trees to assign data points to different categories. Each decision tree is trained on a random subset of the data and a random subset of the features, which reduces overfitting and improves accuracy. To classify a new data point, each tree in the ensemble predicts a class label, and the class receiving the most votes is taken as the final prediction. For regression, CRRF uses a similar approach to predict continuous values rather than class labels: each decision tree, trained on a random subset of the data and features, predicts an output value for a given set of input features, and the final prediction is the average of all the trees' predictions.
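The randomization and aggregation steps described above can be sketched in Python. The sketch assumes bootstrap sampling (drawing as many rows as the training set, with replacement) and a random choice of `mtry` input columns; Table 5 lists the sampling method, sample size, Mtry, and number of trees actually used, and the values below (45 rows, 5 inputs, `mtry = 2`) are placeholders for illustration.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Sample row indices with replacement: the training set of one tree."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_feature_subset(n_features, mtry, rng):
    """Choose mtry distinct input columns to consider at a split."""
    return sorted(rng.sample(range(n_features), mtry))


def aggregate_regression(tree_predictions):
    """Regression forest output: the average of the trees' predictions."""
    return sum(tree_predictions) / len(tree_predictions)


def aggregate_classification(tree_votes):
    """Classification forest output: the class predicted by the most trees."""
    return Counter(tree_votes).most_common(1)[0][0]


rng = random.Random(42)
rows = bootstrap_indices(45, rng)           # e.g. 45 training samples
columns = random_feature_subset(5, 2, rng)  # e.g. 5 inputs, mtry = 2 (hypothetical)
print(len(rows), columns)

print(aggregate_regression([1.0, 1.5, 0.5]))              # 1.0
print(aggregate_classification(["high", "low", "high"]))  # high
```

Because each tree sees a different bootstrap sample and different candidate features, the trees make partly independent errors, and averaging or voting over them cancels much of that noise.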
Compared with other machine learning algorithms, CRRF has several advantages, including its ability to handle both classification and regression tasks, its accuracy and robustness, and its ability to handle missing or noisy data. In addition, CRRF is less likely to overfit than a single decision tree, because averaging many trees trained on different random subsets reduces the variance of the predictions, which makes it a good choice for complex datasets.

Support vector machine (SVM)
To determine the optimal SVM model, different models with different values of the effective parameters were tested by trial and error. In Fig. 1, the predicted UCS values are compared with the actual UCS values. The results show that the SVM model successfully predicts the UCS values of the RGP and soil mixture.
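A common systematic alternative to trial-and-error tuning is an exhaustive grid search, which evaluates every combination of candidate parameter values and keeps the one with the lowest validation error. The Python sketch below is a generic version; the parameter names `C` and `gamma`, their candidate values, and `toy_validation_mae` are hypothetical stand-ins, since the paper does not list the SVM settings it tried.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Exhaustively try every combination of candidate values.

    param_grid maps a parameter name to its candidate values; evaluate(params)
    returns an error to minimize (e.g. MAE on validation data).
    Returns (best_params, best_error).
    """
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


def toy_validation_mae(params):
    """Hypothetical stand-in for training an SVM and scoring it on held-out data.

    This made-up error surface is smallest at C = 10 and gamma = 0.5.
    """
    return abs(params["C"] - 10) / 100 + abs(params["gamma"] - 0.5)


grid = {"C": [1, 10, 100], "gamma": [0.1, 0.5, 1.0]}
print(grid_search(grid, toy_validation_mae))  # ({'C': 10, 'gamma': 0.5}, 0.0)
```

With only 45 training samples, the validation error would usually be estimated by cross-validation on the training set rather than on the 12 testing samples, so that the testing set stays unseen.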

Fig. 1. Results of SVM
Table 4 reports the performance metrics of the SVM model, namely the mean absolute error (MAE) and the R-squared (R²) value, for both the training and testing datasets. The MAE is 0.098 for the training dataset and 0.072 for the testing dataset, so the model's average error is lower on the testing data. The R² value is 0.969 for the training dataset and 0.978 for the testing dataset. Because the testing metrics are as good as the training metrics, the model fits the data well and shows no sign of overfitting.
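The two metrics in Table 4 can be computed as in the Python sketch below. The paper does not say whether its R² is the coefficient of determination or the squared Pearson correlation between predicted and actual values; the sketch assumes the former (the two can differ when predictions are biased). The numbers are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between measured and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up values for illustration only, not results from this study.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.5]
print(r_squared(actual, predicted))            # 0.75
print(mean_absolute_error(actual, predicted))  # about 0.333
```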

Classification and regression random forest (CRRF)
To find the optimal CRRF model, various CRRF models were constructed by changing the effective parameters, testing different combinations of parameter values to find the combination that gives the best performance. The results of these experiments are presented in Table 5, which shows the specifications of the best CRRF model. The optimized parameters include the minimum node size, minimum son size, maximum depth, Mtry, CP, sampling method, sample size, and number of trees. These parameters were selected based on their impact on model performance, with the goal of maximizing accuracy while minimizing computational resources. Table 6 shows the results for the test database. The R² for predicting UCS in the test database is 0.979, which indicates that the model also fits the test data well. Nevertheless, the MAE for the test database is slightly higher than that for the training database, indicating that the errors in the predicted values are somewhat larger for the test dataset.

The variable importance of input parameters
The sensitivity of AI models to their input parameters should be investigated to evaluate the importance of each parameter. In this study, the error was calculated by varying one input parameter at a time from -100% to +100% while keeping the other parameters constant. The purpose of this analysis is to identify the input parameters that are most important for the accuracy of the AI model. Fig. 3 shows the results of this analysis for the two models. According to the results of the SVM model, curing time is the most important input parameter, while RGP is the least significant; that is, changes in curing time have a greater effect on the SVM model's error than changes in RGP. Similarly, curing time is identified as the most significant input parameter in the CRRF model, while OPC is identified as the least significant.
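The one-at-a-time procedure described above can be sketched in Python. The paper does not give its exact step sizes or error measure, so this sketch assumes that each input column is scaled by (1 + s) for a few relative changes s between -100% and +100%, that the error is the MAE against the target values, and that the largest MAE observed is used as the importance score. `toy_model` is a hypothetical function with made-up coefficients, not the fitted SVM or CRRF, so the ranking it produces follows directly from those coefficients.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def oat_sensitivity(model, X, y, names, steps=(-1.0, -0.5, 0.5, 1.0)):
    """One-at-a-time sensitivity of a fitted model to each input.

    For every input column j and relative change s in steps, column j of all
    rows is multiplied by (1 + s) while the other columns stay fixed, and the
    MAE against the targets y is recomputed. The largest MAE found for each
    input is returned as its importance score.
    """
    scores = {}
    for j, name in enumerate(names):
        worst = 0.0
        for step in steps:
            perturbed = [
                [value * (1.0 + step) if k == j else value for k, value in enumerate(row)]
                for row in X
            ]
            worst = max(worst, mae(y, [model(row) for row in perturbed]))
        scores[name] = worst
    return scores


def toy_model(row):
    """Hypothetical fitted model (NOT the SVM or CRRF of this study)."""
    return 0.8 * row[0] + 0.1 * row[1]


X = [[0.5, 0.5], [1.0, 0.2]]  # made-up normalized inputs
y = [toy_model(row) for row in X]
scores = oat_sensitivity(toy_model, X, y, ["x1", "x2"])
print(sorted(scores, key=scores.get, reverse=True))  # ['x1', 'x2']
```

A -100% change sets an input to zero, so on normalized data this step probes the lower end of each input's range.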

Conclusion
The use of recycled glass powder in geotechnical applications has become increasingly important because of its significant benefits. Produced from waste glass during recycling, glass powder can be reused to reduce landfill waste, conserve natural resources, and reduce greenhouse gas emissions. This study aimed to predict the strength of soil and glass mixtures from different input parameters using two AI methods, namely the support vector machine (SVM) and classification and regression random forest (CRRF). The database used in the study contained 57 data sets with five inputs and one output, the UCS of the mixture. The input parameters were RGP percentage, ordinary Portland cement (OPC) percentage, molar concentration, curing temperature, and curing time.
The SVM model successfully predicted the UCS values of the soil and glass mixture. The MAE was 0.098 for the training dataset and 0.072 for the testing dataset, and the R² value was 0.969 for the training dataset and 0.978 for the testing dataset, indicating that the model fits the data well.
The classification and regression random forest (CRRF) model also predicted UCS accurately, with high R² values and low mean absolute errors (MAE) for both the training and test databases. CRRF fitted the training data better than SVM (R² of 0.986 versus 0.969; MAE of 0.060 versus 0.098), while on the test data the two models performed almost identically (R² of 0.979 versus 0.978; MAE of 0.075 versus 0.072). A sensitivity analysis was also conducted to identify the input parameters most important for the accuracy of the models. It showed that curing time was the most significant input parameter for both the CRRF and SVM models, while RGP (for SVM) and OPC (for CRRF) were the least significant.

Fig. 2
Fig. 2 compares the predicted and actual UCS values for both the training and test databases. The predicted values are close to the actual values, indicating that the CRRF model performs well.

Table 2.
Statistical information of the training database

Table 3.
Statistical information of the testing database

Table 4.
The performance of the SVM model

Table 5.
The specifications of the best CRRF model

Table 6 provides additional information on the accuracy and error values of the CRRF model for both the training and test databases. According to the table, the R² value for the training database is 0.986, indicating a good fit between the model and the data, and the MAE for the training database is 0.060, indicating that the predicted values are generally accurate.

Table 6.
The performance of the CRRF model