A proposal of classification for machine-learning vibration-based damage identification methods

Recent advances in computing power and sensing technology have led to a significant evolution of Structural Health Monitoring (SHM) techniques, transforming SHM into a “Big Data” problem. The use of data-driven approaches for damage identification purposes, specifically Machine Learning (ML) methods, has gained popularity. ML can help at various levels of the SHM process: to pre- and post-process input data, extract damage-sensitive features, perform pattern recognition on measured data, and output valuable information for damage identification. In this paper, the role of ML in SHM applications is discussed together with a new scheme for classifying ML applications in SHM, focusing especially on vibration-based monitoring, given its consolidated theoretical base. Finally, the implications of applying these methods to historic structures are discussed, with a brief account of existing case studies. The proposed classification is exemplified using the most recent studies available in the literature on cultural heritage structures.


Introduction
Structural Health Monitoring (SHM), as the process of implementing strategies for Damage Identification (DI) [6], is an interdisciplinary field which has been successfully investigated over the last decades. Meanwhile, advancements in computational power and data science have opened new avenues for the development of data-driven approaches for SHM and Machine Learning (ML) [7]. The research effort in this direction led to a literature explosion in ML, with the number of papers published on the topic rapidly increasing in the last 20 years (Figure 1).

Figure 1 - Number of publications per year in the last two decades. Search performed on Scopus for words in Title, Keywords and Abstract.
Since the knowledge and methodologies of Data Science (DS) and Artificial Intelligence (AI) are increasingly being transferred to Civil Engineering, the need for a shared, clear glossary and framework is arising, to allow professionals and researchers to explore all capabilities of ML algorithms and connect transversal topics in the growing interdisciplinarity of SHM. Numerous published reviews have successfully formulated guidelines to approach the state-of-the-art research on ML applications to SHM [2], [3], [4], [8], [9], [10], [11]. However, differences in the classification of existing studies can still be found. The objectives of this paper are: (i) to provide basic notions to approach the study of ML applications to SHM, focusing on the use of vibration signatures, (ii) to propose a new classification methodology for reviewing existing studies, and (iii) to briefly discuss peculiarities related to historic structures in the framework of ML applications to data-driven and vibration-based SHM.

Definitions
ML was introduced, as a subset of AI, to overcome the limitations of knowledge-based approaches [2]. ML algorithms "learn" systematically from a sufficient amount of data without the use of explicit programming [13]. The construction of an ML model entails the presence of input data, commonly divided into training, validation, and testing sets. The datasets are kept independent to ensure a correct assessment of the prediction accuracy over the validation set, preventing overfitting of the model to the training set [12]. After repeated training, once the model is optimized, the testing set is fed to the model as a check against new data.
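The division into independent training, validation and testing sets can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `split_dataset`, the 60/20/20 proportions, the fixed seed and the integer sample identifiers are all assumptions made for the example.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle the samples and split them into three disjoint sets.

    The fractions are illustrative assumptions; whatever remains after the
    training and validation sets is used as the testing set.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty testing set")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(train_frac * len(samples))
    n_val = int(val_frac * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]
    return train, val, test


# Hypothetical data: 100 feature vectors, each represented by an integer id.
samples = list(range(100))
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

A random shuffle is only one option. For long monitoring records a chronological split may be preferable, so that the testing set contains only data acquired after the training period.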
The ML training process is called supervised or unsupervised, based on the type of training data, which can be labelled or unlabelled, respectively. If both labelled and unlabelled data are used, the process is called semi-supervised ML. Moreover, in reinforcement learning, the use of unlabelled data is accompanied by agents that positively correct predictions in a trial-and-error process, reducing the requirements of training data. The availability and type of data is a key element in the choice of ML model. Data mining is the process of extracting useful knowledge and information from the bases of data. Pattern recognition is not a methodology or a technique [10]; it is the problem of automatically discovering irregularities in data through the use of computer algorithms [13]. The expression Big Data is now often used to indicate a "field" of CS [14].

Figure 2 - Relationship between techniques related to CS, AI and data mining. (Image adapted from [14]).
Basic ML algorithms require the conversion of data into a fixed number of features. Big Data will often present higher sparsity, requiring more features to describe it [1] and lowering model reliability and statistical significance. The higher the number of features, the larger the data requirements of the ML algorithms, hence the so-called curse of dimensionality [12]. Discarding redundant information is often applied to overcome this issue. Moreover, to avoid the handcrafting of features in complex applications, Deep Learning (DL) methodologies were developed to operate the feature selection process autonomously. DL algorithms explain high-level and abstract features as a hierarchy of simple and low-level learned features, ultimately reducing the dimension of feature vectors [2]. Relationships between AI, ML, DL, etc. are shown graphically in Figure 2.
The ML model is then finally used to solve a specific learning problem, such as classification, regression or prediction, clustering or density estimation [2]. The output of such processes can yield both true and false "predictions".
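For a binary damaged/undamaged problem, true and false predictions are commonly tallied as true positives, false positives, true negatives and false negatives. The sketch below is an illustration under stated assumptions: the label convention (1 = damaged, 0 = undamaged), the function names and the example labels are hypothetical and are not taken from any of the cited studies.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives (1 = damaged, 0 = undamaged)."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred and truth:
            counts["TP"] += 1      # damage correctly flagged
        elif pred and not truth:
            counts["FP"] += 1      # false alarm
        elif not pred and not truth:
            counts["TN"] += 1      # healthy state correctly recognised
        else:
            counts["FN"] += 1      # missed damage
    return counts


def rates(counts):
    """Return (true positive rate, false positive rate)."""
    positives = counts["TP"] + counts["FN"]
    negatives = counts["FP"] + counts["TN"]
    tpr = counts["TP"] / positives if positives else 0.0
    fpr = counts["FP"] / negatives if negatives else 0.0
    return tpr, fpr


# Hypothetical labels for eight monitored states.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
tpr, fpr = rates(counts)
print(counts, tpr, fpr)
```

Sweeping a decision threshold and plotting the resulting pairs of true and false positive rates yields a Receiver Operating Characteristics (ROC) curve.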

ML in the SHM process
In data-driven vibration-based SHM, ML applications can be found at different levels of the process [11]. The most relevant are feature extraction/selection, dimensionality reduction and discrimination of the effect of environmental and operational variabilities (EOVs), statistical pattern recognition (SPR) and, finally, DI.
In an SPR framework, detecting the presence of damage means distinguishing between an initial "healthy" or undamaged state and a damaged state [12]. Detection is the most widely studied level of the DI hierarchy [15], which comprises detection, localization, assessment, quantification and prognosis. If operated unsupervised, damage detection is referred to as novelty/anomaly or outlier detection, a long-established statistical technique that can also be addressed through ML inference. Classification between damaged and undamaged states is the most common task, so much so that ML algorithms are often simply called classifiers [2].
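Novelty detection against a healthy baseline can be illustrated with a distance-based index. The sketch below uses the Mahalanobis distance on two features and flags an observation when it lies farther from the baseline than any baseline point. It rests on assumptions: the two features are imagined to be natural frequencies in Hz, all numerical values are invented for illustration, and the threshold rule (maximum baseline distance) is only one simple choice.

```python
import math


def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance of (x, y) feature pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, mean, cov):
    """Mahalanobis distance of a 2D point, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


# Hypothetical healthy-state features: two natural frequencies in Hz.
baseline = [(1.20, 3.50), (1.22, 3.54), (1.19, 3.49), (1.21, 3.53),
            (1.23, 3.52), (1.18, 3.51), (1.20, 3.55), (1.22, 3.50)]
mean, cov = mean_and_cov(baseline)

# Assumed threshold: the largest distance observed in the healthy baseline.
threshold = max(mahalanobis(p, mean, cov) for p in baseline)


def is_novel(point):
    """Flag an observation lying outside the healthy baseline."""
    return mahalanobis(point, mean, cov) > threshold


print(is_novel((1.21, 3.52)), is_novel((1.10, 3.30)))
```

In practice the threshold is usually set from a statistical criterion on a much larger baseline, and the baseline must cover the normal range of environmental and operational conditions for the index to be meaningful.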
ML applications are steadily gaining recognition as viable techniques for data-driven vibration-based SHM. Several challenges are still to be addressed. The lack of data with damage is a long-standing obstacle to a more widespread use of ML. Over time, viable solutions are being identified, by creating thresholds, synthetically generating data with damage and performing experiments. At the forefront of this research are the studies on Population-Based SHM (PBSHM), which seek to group similar structures in populations and use transfer learning to overcome the lack of damaged data [20].
The selection of damage-sensitive features, namely factors that make explicit the damage pattern to be learned from data [6], is still an ongoing research topic. Finally, going beyond the detection stage with unsupervised techniques is still a challenge, and damage prognosis is still achieved only when the physics of damage progression is included in a hybrid data/model-driven approach. Physics-informed ML applications are working towards this goal [21].

Proposed classification scheme for Machine-Learning Application to SHM
Considering the complexity of the subject and the interdisciplinarity necessary to approach the state-of-the-art research on ML applications, a new classification scheme for the development of a detailed review is proposed herein. The methodology encompasses a bottom-up strategy from a civil engineering perspective while gathering the necessary information to draw relevant conclusions from the analysed literature, in terms of recurrence of the methodologies, performance, success rate, etc.
According to the proposed classification, the level of the damage hierarchy reached is identified first. Then, a distinction is made among three aspects: the type of extracted features (e.g., modal parameters, or statistical parameters in Autoregressive models), the metric used as a novelty/damage index (e.g., a distance metric like the Mahalanobis distance), and the model built to highlight the presence of an underlying pattern (e.g., a simple Artificial Neural Network (ANN)).
To summarise, four elements are at the core of the classification, which should be used to draw a clear distinction between applications, as follows.
• The level of the damage hierarchy.
• The type of features.
• The type of SPR/ML method.
• The damage index/indices.
It is important to note that ML could be used for one or both of feature extraction/selection and SPR, but also at other stages of the SHM process, like data pre-processing or performance optimization. Moreover, the identification of the adopted damage index or indices is not highlighted in existing reviews, while it could show how the same index could apply to different extracted features.
Additionally, other information on the input data should be part of the classification if available. The methodology may have been tested on data from various sources: numerical, experimental or real data. The number and type of sensors may play a role in the quality and significance of the data obtained with respect to the structure analysed. Feature extraction and selection techniques should be added to the classification as well. Together with the number and types of features, the classification should specify whether they are considered as stochastic variables allowing for uncertainty estimation, and which technique was used, for example Gaussian Processes or Bayesian statistical inference. Also, if a dimensionality reduction technique was employed it should be noted, even if it is not an ML one. These techniques, for example Principal Component Analysis (PCA), are used to reduce the size of input parameters without losing the information content of the data, and they can be instrumental in the overall performance of the application.
The algorithm for SPR is also to be specified further, in terms of architecture family, distinguishing for example between instance-based and clustering algorithms. The same stage of the damage hierarchy could be addressed with a different learning problem depending on the structure and the available data; for example, the type of damage can be classified or clustered into several known damage scenarios or mechanisms. Strictly connected to the learning problem chosen, the nature of the output (Boolean, cluster, etc.) should also be identified. Finally, specific information on whether and how the performance of the algorithm was evaluated in the study should be included, as it could give insights into the validity of the methodology. Critical considerations on the pros and cons of the use of ML in the application analysed could be added, if provided by the paper authors. The classification should be complemented with all the necessary information to reference authors and publications. An example of two of the recent ML applications to data-driven vibration-based SHM of historic structures is provided in Table 1.
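For a review built on this scheme, each study could be stored as a structured record so that recurring methodologies can be counted. The sketch below is one possible encoding, not part of the proposal itself: the class name `StudyRecord` and its field names are hypothetical, only a subset of the scheme's fields is shown, and the two entries copy values from Table 1, with "N/A" stored as `None`.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class StudyRecord:
    """One reviewed study, described by a subset of the scheme's fields."""
    year: int
    authors: str
    # Four core elements of the classification
    damage_level: str
    features: str
    spr_model: str
    damage_index: str
    # Additional information on input and methodology
    data_source: str
    structure_type: str
    feature_extraction: str
    algorithm_architecture: str
    learning_problem: str
    performance_evaluation: Optional[str] = None  # None when not reported


records = [
    StudyRecord(
        year=2018,
        authors="Marrongelli et al. [18]",
        damage_level="Level 1 - Damage detection",
        features="Modal parameters - 5 natural frequencies",
        spr_model="Support Vector Machines (SVMs)",
        damage_index="SVM probability",
        data_source="real data",
        structure_type="ancient tower",
        feature_extraction="SSI-Cov",
        algorithm_architecture="Instance-based",
        learning_problem="Classification",
    ),
    StudyRecord(
        year=2017,
        authors="Chalouhi et al. [19]",
        damage_level="Level 1 - Damage detection",
        features="Acceleration signals",
        spr_model="ANNs and Gaussian Processes (GP)",
        damage_index="Normalized prediction error",
        data_source="real and numerically simulated data",
        structure_type="historic bridge",
        feature_extraction="Selection of acceleration signals per train passage",
        algorithm_architecture="Neural Network and Gaussian Process",
        learning_problem="Classification",
        performance_evaluation="Receiver Operating Characteristics (ROC)",
    ),
]

# Recurrence of each damage level and learning problem across the records.
levels = Counter(r.damage_level for r in records)
problems = Counter(r.learning_problem for r in records)
print(levels, problems)
```

Keeping the damage index as its own field makes it straightforward to check whether the same index recurs across studies that use different extracted features.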

ML applications on SHM for Historic Structures
SHM of historic structures presents a series of peculiarities which make the application of a data-driven approach and ML techniques even more challenging than in civil structures. Architectural heritage often encompasses complex structures in terms of geometry and mechanical behaviour. Materials are often heterogeneous and behave nonlinearly, with strict dependence on environmental conditions. Manual mapping of damage on these structures is even more expensive and time-consuming than it is on civil structures, given the ancient designs and construction techniques, and the fact that damage is often hidden from visual inspection.
In historic structures, damage can present itself at a global and a local level, potentially with equal relevance. One challenge is to detect relevant sudden changes in the state of the structure in real time, to trigger consequent inspections and controls. Another is to identify trends of damage accumulation over extended periods of time, filtering out the effect of environmental and operational parameters and learning how they factor into the different damage mechanisms.
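One simple way to filter out an environmental effect is to regress a feature on the environmental variable over a healthy reference period and monitor the residual. The sketch below assumes, purely for illustration, a linear dependence of a natural frequency on temperature; as noted above, real masonry structures often behave nonlinearly, so this is not a recommended model. All numerical values and function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical reference period: temperature (deg C) and frequency (Hz).
temperature = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
frequency = [1.982, 1.959, 1.941, 1.918, 1.901, 1.879]

slope, intercept = fit_line(temperature, frequency)

# Residual scatter of the reference period (n - 2 degrees of freedom).
residuals = [f - (intercept + slope * t)
             for t, f in zip(temperature, frequency)]
sigma = math.sqrt(sum(r ** 2 for r in residuals) / (len(residuals) - 2))


def is_anomalous(temp, freq, k=3.0):
    """Flag a reading whose temperature-corrected residual exceeds k sigma."""
    residual = freq - (intercept + slope * temp)
    return abs(residual) > k * sigma


print(is_anomalous(12.0, 1.952), is_anomalous(20.0, 1.85))
```

The residual, rather than the raw frequency, would then serve as the quantity tracked over time, both for sudden changes and for slow trends.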
Many uncertainties arise when model-based approaches are pursued to monitor historic structures. SHM approaches based on dynamic identification are among the most widely used techniques, given their strong theoretical base and the direct interpretability of the output. Nonetheless, their support for engineering decisions is still limited, given the global nature of the output and the often local nature of the damage [5]. To date, a replicable and generalized approach to SHM of historic structures is yet to be reached. ML methods are slowly starting to gain traction also for SHM of historic structures, on different scales, tackling a variety of problems [17], but a lot of work still needs to be done in this direction.

Table 1 - Classification of two ML applications to data-driven vibration-based SHM according to the proposed scheme.

Conclusions
The proposed classification is intended as a base for a future review work, aimed at critically examining the existing methods, applying them to benchmark case studies and providing meaningful comparisons in terms of computational cost and accuracy. Future work is aimed at identifying key factors in the successful applicability of ML techniques to SHM of historic structures.

Year of publication: 2018
Title: An artificial intelligence strategy to detect damage from response measurements: application to an ancient tower.
Authors: Marrongelli et al. [18]
Published in: MATEC Web of Conferences 211, 21002 (2018) VETOMAC XIV
Damage level: Level 1 - Damage detection
Features: Modal parameters - 5 natural frequencies
Statistical Pattern Recognition Model: Support Vector Machines (SVMs)
Damage index: SVM probability
Input
Data source: real data
Type of structure: ancient tower
Material: brick masonry
Sensors: 3 piezoelectric triaxial accelerometers positioned along the tower height, 1 temperature sensor
Methodology
Feature extraction technique: covariance-driven Stochastic Subspace Identification (SSI-Cov)
Features: 5 natural frequencies, alone and combined with value of temperature
Algorithm architecture: Instance-based
Learning problem: Classification
Objective: seismic damage detection
Performance evaluation: N/A
Comments: Uncoupling frequency and temperature input yields better results.

Year of publication: 2017
Title: Damage detection in railway bridges using Machine Learning: application to a historic structure.
Authors: Chalouhi et al. [19]
Published in: X International Conference on Structural Dynamics, EURODYN17
Damage level: Level 1 - Damage detection
Features: Acceleration signals
Statistical Pattern Recognition Model: Artificial Neural Networks (ANNs) and Gaussian Processes (GP)
Damage index: Normalized prediction error between the real and simulated data
Input
Data source: real and numerically simulated data
Type of structure: historic bridge
Material: cast iron deck and arches and brick masonry piers
Sensors: 21 MEMS accelerometers, 3 for each bridge section, 2 air temperature sensors
Methodology
Feature extraction technique: Selection of acquired acceleration signals at each train passage
Features: Acceleration signals
Algorithm architecture: Neural Network and Gaussian Process
Learning problem: Classification
Objective: anomaly detection against predicted values
Performance evaluation: Receiver Operating Characteristics (ROC)
Comments: Detection reliability decreases with the number of train passages.