Disclaimer: This dissertation has been written by a student and is not an example of our professional work, which you can see examples of here.

Any opinions, findings, conclusions, or recommendations expressed in this dissertation are those of the authors and do not necessarily reflect the views of UKDiss.com.

A Data Mining Framework for Network-Level Pavement Condition Assessment Using Remote Sensing

Info: 10382 words (42 pages) Dissertation
Published: 18th May 2020

Tagged: Computer Science

Pavement condition monitoring is essential for efficient resource allocation in transportation asset management. However, the collection of data involves laborious and costly procedures. The intention of this study is to investigate the use of remote sensing data for network-level pavement condition assessment to provide a more cost-effective alternative. Based on an extensive literature review, a data mining framework has been established utilizing the inherent information of remote sensing images to train models that will be capable of predicting the pavement condition of different road segments. In order to identify pavement sampling areas, an automated procedure using image segmentation is proposed. Unlike previous research, different classification models are trained to approximate the mapping function from spectral information to pavement condition. A preliminary case study was conducted with data provided by the City of Dallas and remote sensing images acquired from the Texas Natural Resources Information System. The mean-shift segmentation algorithm was used to identify noise-introducing areas on the pavement surface. Four different classification models were trained using k-nearest neighbors, naïve Bayes, support vector machines, and a multilayer perceptron. The developed models were used to predict the pavement condition class of a test set that was not used in the training procedure. The multilayer perceptron presented the highest accuracy level of 71 percent, indicating that the framework might have potential for future implementation if further research is conducted on the different constituent steps to increase classification accuracy.

Keywords: classification; data mining; image processing; infrastructure management; pavement condition; remote sensing; roads


Pavement data collection technologies have seen widespread application during the past decade (Pierce, McGovern, and Zimmerman 2013). Manual field inspections and video logging, which have traditionally been used to collect pavement condition data, are increasingly being replaced by mobile automated systems that include high-speed laser, acoustic, and infrared imaging sensors. Based on measurements of roughness, surface distress, skid resistance, and deflection, pavements are usually assigned an index score that reflects their overall condition. For this reason, many different indices have been developed and are used by various transportation infrastructure management agencies, since they act as a basis for resource allocation and maintenance decisions.

Automated surveys are typically conducted with the use of equipment mounted to vans specifically designed for collecting pavement and roadway characteristic data. Assessing the condition of pavements through the different inspection approaches can be expensive, laborious, and time consuming (Witten and Frank 2005). The costs of automated pavement condition data collection and processing vary greatly depending on the specific items addressed and on logistics. Full-featured collection and processing averages more than $30 per lane-km ($50 per lane-mi) and may reach $125 per lane-km ($200 per lane-mi) or more in urban, high-traffic areas (McGhee 2004). Considering the vast network size that a typical transportation agency manages, the total cost of pavement condition surveying and processing can reach several million dollars per year. Furthermore, the traditional pavement data collection process usually disrupts traffic because of its stop-and-go, slow-speed nature, which might also create safety hazards (Ullman, Ragsdale, and Chaudhary 1998). As an alternative, developments in remote sensing and data science have shown potential in using high-resolution images to assess pavement conditions inexpensively (Schnebele et al. 2015). These images can be captured through sensors mounted on different platforms such as unmanned aerial vehicles, airplanes, and satellites.

Data collection requirements for network and project level decision making present major differences (Flintsch and McGhee 2009). At the network level, a large amount of pavement condition data is collected, which is usually transformed into composite condition indices or scores. This level of information is most appropriate for decision makers to prioritize pavement segments and to make multi-year projections with respect to the overall network condition. On the other hand, project level data collection involves more detailed distress identification and severity assessments that can be used for the selection of specific maintenance and rehabilitation treatments and project level cost estimation. In order to assist in data collection and satisfy the corresponding requirements, Paterson and Scullion (1990) proposed dividing data needs into different information quality levels that correlate to the degree of sophistication required for transportation asset management decision making. In this context, very detailed data can be aggregated into progressively higher-level forms such as key performance measures or indicators, which might combine key factors from several pieces of information. Five such information quality levels (IQL) were defined subsequently ranging from project level detailed data (IQL 1) to high level system performance monitoring (IQL 5) (Bennett and Paterson 2000). The methods employed in this paper and the corresponding literature review focus more on the planning and performance evaluation information level (IQL 4) based on this classification.  Consequently, this research is focused on assessing overall pavement condition and does not consider detecting individual distresses.

Literature Review

Having delimited the scope of this paper, this section provides a critical review of the remote sensing applications for pavement performance assessment. The applications covered by this review utilize images of very high resolution (2-m up to 15-cm) to predict overall pavement condition in terms of distress, ride quality, and safety. Images with even higher resolution have been used for specific project-level distress identification. One of the very first attempts to evaluate pavement condition through airborne or satellite remote sensing products was discussed by Noronha et al. (2002). In this research, a spectral library containing the signatures of a limited number of pavement segments of different ages and conditions was developed. It was observed that for recently paved roads, the spectral reflectance was generally low. On the contrary, for old and deteriorated road surfaces, a general increase in reflectance in all parts of the spectrum was shown. This difference was higher in the infrared spectrum. To the extent that this correlates with pavement health, hyperspectral analysis is useful. However, the spectral effects of common indicators of pavement condition, such as cracking and rutting, are essentially not detectable in the 4-meter resolution imagery that was available at the time.

The same field-collected hyperspectral data were further investigated by Herold et al. (2003). In this research, some initial justification was given for the development of specific absorption features reflecting the decreasing asphalt content in the aggregate. These features refer to the oxidation of in-place material and degradation from polished aggregates and raveling. The corresponding spectral signatures derived from the 4-meter resolution imagery were presented and compared to the ground measurements. It was explained that the observed differences were expected, since the amount of spectral detail decreases because of atmospheric and system noise, as well as spectral mixture effects of surfaces other than pavements. Out of the extremely large amount of information given in hyperspectral images, the authors also identified four bands in the visible and near-infrared spectrum as well as two bands in the short-wave infrared spectrum, which could potentially be used to identify raveling and aging effects. Subsequently, an extended road spectra acquisition campaign was initiated. An important feature of this campaign was that the collected spectral data were integrated with in-situ pavement condition surveys using an actual pavement performance indicator. The spectral effects of structural distresses were also studied. The main spectral impact of cracking was decreased brightness in all parts of the spectrum, as cracking exposes deeper layers of the pavement with higher contents of the original asphalt mix. This work was discussed in several publications (Herold et al. 2004; Herold and Roberts 2005; Herold et al. 2008), while a more detailed chemometric description of the spectral characteristics of different pavement surface distresses and features was also given (Herold 2007). The research focused only on the relationship between the reflectance difference of two specific bands, in the blue visible and near-infrared spectrum, and both the pavement condition index and a structural index. A significant correlation was found when ground-collected spectra were compared against detailed condition measurements. For spectral data collected from lower spatial resolution images and network-level pavement condition, the strength of this relationship seemed to attenuate.

Pascucci et al. (2008) focused more on the pavement surface reflectance in the long-wave infrared spectrum, as they tried to exploit the absorption property of limestone at a specific band of that spectrum. The notion behind this research was that since limestone was the dominant aggregate material used in asphalt concrete segments of the study area, then its exposure that signifies surface defects could be captured by a sensor in this thermal infrared spectrum. However, since this part of the spectrum is heavily affected by atmospheric effects, a specific correction had to be applied in the data. Based on limited visual inspections of two study areas of pavements, the authors identified a threshold in the thermal reflectance metric they developed to discriminate between pavements in good condition and pavements that had to be checked for maintenance. The authors reported a high degree of agreement between other test sections exceeding that threshold and the subjective need for maintenance based on their observations. Later, Mei et al. (2014) revisited the issue of aggregate exposure, but this time a quantitative metric named exposed aggregate index was used. This index was developed by Manzo et al. (2014), and it was based on area calculation of exposed aggregate coverage in natural color band images. The area is calculated based on a supervised classification scheme. The result was then compared with a series of proposed spectral indices of the visible and near infrared spectrum, and significant statistical correlations were found. The index with the highest correlation involved findings from a previous work (Mei, Salvatori, and Allegrini 2011) which provided asphalt surface brightness clusters in images from commercial multispectral satellite sensors (Mei and Salvatori 2013; Mei, Salvatori, et al. 2014) that were associated with bitumen removal.

Andreou, Karathanassi, and Kolokoussis (2011) developed a spectral library for asphalt from field measurements, defining five potential categories of asphalt condition and minimizing the dimension of the hyperspectral space. These categories ranged from good condition to highly distressed pavements. Different processing methods were used, and it was found that principal component analysis performed well at distinguishing between asphalt conditions. Mohammadi (2012) used hyperspectral data and asphalt signatures, assuming that the mean reflectance from the visible to the short-wave infrared spectrum would be inversely related to the pavement condition. Three states of condition (good, intermediate, and bad) were defined, and the classification results were compared to limited field visits, which were not adequate for drawing confident conclusions. Resende, Bernucci, and Quintanilha (2014) used hyperspectral and multispectral images of 50-cm and 25-cm resolution, respectively, and employed an object-based classification to identify pavement areas with specific distresses directly. For this reason, attributes describing the geometric and spectral characteristics of the objects were generated. Accuracy metrics were provided only for the class corresponding to patches, where more than half of the training instances were misclassified. Mettas et al. (2015) created another spectral library from field surveys to capture characteristics of specific distresses in different pavement ages. Afterwards, separability indices for cracked and uncracked pavement were developed, and it was recognized that the blue visible and the short-wave infrared spectrum had the highest potential (Mettas et al. 2016). Finally, the separability indices were extended for an in-band analysis of a medium-resolution multispectral satellite sensor, with similar results that showed the potential for rapid age and defect assessment using higher resolution satellite images (Mettas et al. 2016).

There has been a general shift in the literature from approaches that tried to find the highest correlation between specific bands of the spectrum and a performance indicator to approaches that try to exploit as much information given overall by the spectral spatial patterns. Shahi, Shafri, and Hamedianfar (2017) used object-based image analysis to differentiate between good and poor pavement condition using satellite images. Spatial, spectral, textural, and color related attributes were generated, and different feature selection techniques were used and compared. These techniques included support vector machine, random forest, and chi-square algorithms that were evaluated to select the most effective one in identifying the best set of attributes. The classification result based on the chi-square algorithm achieved the highest accuracy. However, a significant drop in accuracy is expected if more than two condition classes are added to the prediction. Pan et al. (2017) used 2-m resolution commercial satellite images and spectral mixture analysis to classify the age of the pavements. Some asphalt pavement pixels were covered by other objects (e.g., vegetation, vehicles, sidewalks), and spectral unmixing was performed to calculate the percentage of pavement from the mixed pixels. Based on field investigation and in situ measurements, aged asphalt pavements were categorized into four stages: preliminarily aged, moderately aged, heavily aged, and distressed.

Carmon and Ben-Dor (2016) used a near-infrared spectral based model to predict the dynamic friction coefficient as a pavement safety performance indicator. Spectral measurements were acquired using a car-mounted field spectrometer, and friction measurements were conducted in a multitude of points along one highway corridor. In this work, the spectral information was modeled to predict the road’s dynamic friction coefficient using an artificial neural network after a principal component analysis was used for attribute generation. The model achieved a high degree of agreement between predicted and actual friction values which relied heavily on the selected sophisticated statistical modeling tools that were designed to reduce bias and error propagation. Recently, a similar approach was followed, but this time using near-infrared and short-wave infrared spectral data from airborne sensors using the same statistical learning tool (Carmon and Ben-Dor 2018). A partial least squares model was employed, but the results did not achieve the same accuracy since the spatial resolution of the data used (1-m) is almost eight times lower than the field measurements (~13-cm), and surface spectral mixing noise is introduced.

Previous research in the field of pavement condition assessment using remote sensing has not fully exploited developments in image analysis and data science. The majority of the studies focus only on limited image processing steps and consider specific prediction model formulations. In this paper, a comprehensive methodological framework is suggested in which different machine learning models are trained and used to assess pavement condition. The framework includes a multitude of techniques to analyze image data. Also, instead of manually digitizing pavement sampling areas on the images, an automated method employing image segmentation is implemented. A description of the methodological framework is provided in the next section.

Methodological Framework

The goal of this study is to develop a pavement condition classification framework for network level assessments based on remote sensing data, that will provide high level information for transportation asset management and will contribute to the cost reduction of pavement data collection. To achieve this goal, different data mining techniques are employed with supervised methods specifically used to classify road pavements’ condition based on the inherent multi-spectral orthoimage information. A synthesis of best practices found in the literature is used in the development of the framework. The methodological framework (Figure 1) is designed to be generic, which means that it depicts several basic steps for analysis using readily available data. It is divided into three main stages, which include data selection, analysis, and data mining techniques, that are embedded in order to extract useful information. A description of the processes included in the framework is provided as follows.

Figure 1. Framework for pavement condition assessment using remote sensing data

Data selection

The first stage of the framework deals initially with collecting and preparing pavement condition data for analysis. The critical factors of this process are the selection of specific pavement types and condition updates based on maintenance projects completed between pavement inspections and imagery sensing. Since pavements are made of different materials (e.g., asphalt or cement concrete), different models might have to be developed. In addition to pavement condition inspection data, transportation agencies usually publish an inventory of geospatial data that corresponds to the centerlines of the road network. This inventory is usually provided in vector format and is made available for public use. In this research it is assumed that readily available centerline information will provide a sufficient pavement surface sample, while advanced road centerline extraction techniques can be used to increase location precision (Wang et al. 2016). If the network linework information is not available, remote sensing techniques can be applied to extract roadway centerlines. There has been extensive research in this area, but it is not part of the scope of this investigation.

Multispectral images capture data within specific wavelength ranges across the electromagnetic spectrum. The wavelengths may be separated by filters or by the use of instruments that are sensitive to particular wavelengths, including light from frequencies beyond the visible light range, i.e., infrared and ultraviolet. Spectral imaging can allow extraction of additional information the human eye cannot sense. Contrary to hyperspectral imaging, where often hundreds of contiguous spectral bands are captured, multispectral images include a small number of spectral bands, typically 3 to 15. Many organizations perform image acquisition programs on a recurrent basis, which has made multispectral images easily accessible compared to hyperspectral data, which are limited. In this framework, multispectral images are utilized in order to extract spectral attributes that can be used for pavement condition classification.

Data analysis

The first step for data analysis is data cleaning, which involves removing irrelevant attributes from the pavement condition inspection tables, as well as from the attribute table of the roadway linework vector data. The removed attributes include street names and numbers, roadway type, and other information. After this process, each road segment is characterized only by the corresponding pavement condition index attribute that is selected. Pavement condition data can be cross-checked with maintenance and rehabilitation history for changes and inaccurate records. The pavement condition data can be integrated with road centerline and project information data using a linear referencing tool or a unique key attribute. This way, the inspection values of pavement condition can be assigned to the corresponding road segments. After integrating the linework with the condition inspection data, the vector features that contain the information have to be re-projected to the coordinate system of the multispectral image. This is necessary for the two datasets to align spatially (Figure 2a) and to support the geoprocessing operations that take place later. Once the two datasets are aligned, buffer zones with variable width are created around the linework so that pavement sections can be sampled in the multispectral images (Figure 2b). In locations where two street segments intersect, a small circular buffer zone is removed to avoid overlaps in pavement sampling. The multispectral images also need to be radiometrically corrected, which is done to reduce or correct errors in the digital numbers of images. The process improves the interpretability and quality of remotely sensed data. Radiometric calibration and correction are particularly important when comparing datasets over multiple time periods.

An important aspect of the classification model development, especially when using multiple images from different sensing periods or different sensors, is the transformation of intensity from digital numbers to apparent reflectance. Reflectance is a spectral unit that provides a common measure between different sensing systems. The spectral signature of a pavement surface is not transferable if the intensity values of the electromagnetic radiation are recorded in digital numbers. Digital numbers are image specific, meaning they are dependent on the viewing geometry of the sensor at the moment the image was taken, the gain and bias of the sensor at each band, the location of the sun, specific weather conditions, and so on. It is generally far more useful to convert the digital number values to reflectance so that the developed classification models can be easily applied to other images for rapidly assessing pavement condition.
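As a minimal sketch of this conversion, a commonly used top-of-atmosphere formulation first turns a digital number into at-sensor radiance with the band's gain and bias, and then normalizes by the solar irradiance and the sun angle. The function name and all numeric values below are hypothetical; real calibration coefficients would come from the image metadata, and atmospheric effects are not modeled here.

```python
import math


def dn_to_toa_reflectance(dn, gain, bias, esun, sun_elevation_deg,
                          earth_sun_dist_au=1.0):
    """Convert a digital number to apparent (top-of-atmosphere) reflectance."""
    # Step 1: at-sensor spectral radiance from the linear sensor calibration.
    radiance = gain * dn + bias
    # Step 2: normalize by the solar irradiance that reaches the scene.
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2) / (
        esun * math.cos(solar_zenith)
    )


# Hypothetical calibration values for a single band of a single image.
rho = dn_to_toa_reflectance(dn=120, gain=0.5, bias=1.0,
                            esun=1500.0, sun_elevation_deg=60.0)
print(round(rho, 4))
```

Because gain, bias, sun elevation, and Earth-Sun distance are image specific, the same pavement surface should yield comparable reflectance values across images even when its digital numbers differ.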


Figure 2. Small subregion of study area (a), buffer zones (b), image segmentation (c), and extracted pavement sections (d)

An attribute generation procedure can be applied to extract not only spectral but also textural and geometric information from the different bands. There are many metrics that can be used, some of which were employed in Shahi et al. (2017). The different vector polygons are used to extract the information. If the distribution of each attribute value cannot be used as an input, basic descriptive statistics can be calculated for each polygon that might include minimum and maximum values, ranges, mean, median, mode, standard deviation, variance, percentiles, etc.
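The descriptive statistics listed above can be sketched per polygon and per band as follows. This is only an illustration: the function name and the sample reflectance values are hypothetical, and it assumes the pixel values inside a polygon have already been extracted as a list of numbers.

```python
import statistics


def polygon_statistics(pixel_values):
    """Summarize the pixel values of one band that fall inside one polygon."""
    values = sorted(pixel_values)
    return {
        "min": values[0],
        "max": values[-1],
        "range": values[-1] - values[0],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),       # population standard deviation
        "variance": statistics.pvariance(values),
    }


# Hypothetical reflectance values sampled inside one polygon.
stats = polygon_statistics([0.10, 0.12, 0.11, 0.15, 0.12])
print(stats)
```

Repeating this for every band and every polygon yields one row of attributes per polygon, which is the shape of attribute table that a classifier expects.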

After having formed an attribute table, a supervised classification scheme is used to identify and extract the clusters that represent pavement while removing all the other sections covered by non-pavement surfaces automatically. There are two ways to deal with this classification procedure. The first is to create a training set by manually assigning a binary class that separates pavement and non-pavement covered polygons. Some classification algorithms that can be used for this classification are described in section 3.4. An alternative method would be using existing libraries of the spectral signatures of different materials. The objects that are extracted are subsequently dissolved within each road segment (Figure 2d). Following this procedure, the main noise-introducing areas, such as trees, cars, and shadows, are excluded from the sampling.

After the noisy areas are removed, the zonal attributes are recalculated for the collection of remaining polygons that are located within the buffer zone of each road segment of the linework. A spectral unmixing technique can be employed for pixels adjacent to noisy areas. Spectral unmixing is the procedure by which the measured spectrum of a mixed pixel is decomposed into a collection of constituent spectra, or endmembers. A set of corresponding fractions indicates the proportion of each material present in the pixel (Pan et al. 2017).
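To make the idea concrete, the sketch below unmixes a pixel under a linear mixing model with only two endmembers (pavement and one other material), for which the least-squares fraction has a closed form. The two-endmember restriction, the function name, and the spectra are simplifying assumptions for illustration; they are not the procedure of Pan et al. (2017), and practical unmixing usually involves more endmembers and a constrained solver.

```python
def unmix_two_endmembers(pixel, pavement, other):
    """Estimate the pavement fraction f of a mixed pixel.

    Assumes pixel ~ f * pavement + (1 - f) * other, with 0 <= f <= 1.
    """
    diff = [p - o for p, o in zip(pavement, other)]
    numerator = sum((x - o) * d for x, o, d in zip(pixel, other, diff))
    denominator = sum(d * d for d in diff)
    if denominator == 0:
        raise ValueError("the two endmember spectra must differ")
    fraction = numerator / denominator      # unconstrained least-squares fit
    return min(1.0, max(0.0, fraction))     # clip to a physically valid range


# Hypothetical three-band endmember spectra and a 70/30 mixed pixel.
pavement_spectrum = [0.20, 0.20, 0.30]
vegetation_spectrum = [0.05, 0.10, 0.50]
mixed_pixel = [0.7 * p + 0.3 * v
               for p, v in zip(pavement_spectrum, vegetation_spectrum)]
print(unmix_two_endmembers(mixed_pixel, pavement_spectrum, vegetation_spectrum))
```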

An attribute selection or a more sophisticated dimensionality reduction process is needed to support the efficiency of the classification algorithms. There are many different approaches to deal with this issue. Attribute selection (Shahi, Shafri, and Hamedianfar 2017) can be based on correlation and information gain. Dimensionality reduction and feature extraction can be performed by principal components analysis (Andreou, Karathanassi, and Kolokoussis 2011; Carmon and Ben-Dor 2016; Carmon and Ben-Dor 2018), linear discriminant analysis, canonical components, non-negative matrix factorization, or other techniques. Since the pavement condition classes might be imbalanced, leading to overrated accuracy results, a resampling method can optionally be applied to mitigate this negative effect. Examples of such methods are undersampling, oversampling, synthetic data generation, and cost-sensitive learning.
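Of the resampling options mentioned, random oversampling is the simplest to illustrate: instances of the minority classes are duplicated at random until every class is as frequent as the largest one. The sketch below is one possible implementation under that assumption; the function name, the seed, and the toy data are hypothetical. Resampling would normally be applied to the training group only, so that duplicated instances do not leak into the evaluation.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))  # draw with replacement
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical single-attribute samples with an under-represented "poor" class.
x = [[0.10], [0.11], [0.12], [0.13], [0.30]]
y = ["good", "good", "good", "good", "poor"]
x_balanced, y_balanced = random_oversample(x, y)
print(Counter(y_balanced))
```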


A clustering module is embedded in order to extract pavement surface areas from remote sensing data, excluding surfaces that might introduce noise in the distributions of the pixel intensities. It can be observed that noisy areas, such as those covered with vehicles, vegetation, and shadows, are included in those resulting buffer zones. In order to deal with this problem, an image segmentation procedure can be utilized, so that areas with different spectral characteristics, such as markings, vegetation, building, and vehicles, could be separated from pure pavement surfaces (Figure 2c). Also, based on the quality of the road centerline geospatial data and the remote sensing images, the roads might not always align properly. As a result, the buffer zones might not capture all the pavement surfaces. There are many clustering algorithms that can be explored for image segmentation starting from a simple distance-based k-means to a more sophisticated mean-shift algorithm (used in Figure 2c). However, there might be areas in which the segmentation algorithm might not discern different objects very accurately. Consequently, errors in the spectral information of some polygons might still remain.


For segmentation, the mean-shift clustering algorithm (Michel, Youssefi, and Grizonnet 2015) can be selected, among others. This method does not require specifying the number of clusters in the data a priori; instead, mean shift builds upon the probability density function of a set of data, and the algorithm works by placing a kernel on each point in the dataset. By considering a set of points in an input space that corresponds to all the pixels of a multi-band image, a window is assumed to be centered on each point, with a radius set by the kernel. If p is an initial estimate and p_ξ an input sample point, the Gaussian kernel or radial basis function K, which is typically used, can be expressed as:

$$K(p_\xi - p) = e^{-\gamma \lVert p_\xi - p \rVert^{2}}$$

where γ is a parameter that sets the “spread” of the kernel.

The above function determines the weight of nearby points for re-estimation of the mean. The weighted mean of the density in the window determined by K is:

$$g(p) = \frac{\sum_{p_\xi \in B(p)} K(p_\xi - p)\, p_\xi}{\sum_{p_\xi \in B(p)} K(p_\xi - p)}$$

where B(p) is the neighborhood of p, a set of points for which K(p_ξ - p) is non-zero.

Mean shift is a hill-climbing algorithm that involves shifting this kernel iteratively to a higher-density region until convergence. The difference g(p) - p is called the mean shift, and at each iteration the algorithm sets p = g(p) until g(p) converges. At convergence, there is no direction in which a shift can accommodate more points inside the kernel.
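The iteration described above can be sketched in a few lines. The code is a minimal illustration rather than the segmentation implementation referenced in the text: it uses a Gaussian kernel with spread gamma, treats every sample point as part of the neighborhood (the Gaussian weight is never exactly zero), and the function names, tolerance, and sample points are assumptions.

```python
import math


def gaussian_kernel(p_xi, p, gamma):
    """Weight of sample point p_xi relative to the current estimate p."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(p_xi, p))
    return math.exp(-gamma * squared_distance)


def mean_shift_mode(p, points, gamma=1.0, tol=1e-6, max_iter=500):
    """Shift the estimate p uphill until it settles on a density mode."""
    p = list(p)
    for _ in range(max_iter):
        weights = [gaussian_kernel(q, p, gamma) for q in points]
        total = sum(weights)
        # Weighted mean g(p) of all sample points.
        g = [sum(w * q[d] for w, q in zip(weights, points)) / total
             for d in range(len(p))]
        shift = math.sqrt(sum((a - b) ** 2 for a, b in zip(g, p)))
        p = g                      # the update p = g(p)
        if shift < tol:            # the mean shift g(p) - p has vanished
            break
    return p


# Hypothetical two-band pixel values forming two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2),
          (10.0, 10.0), (10.1, 9.9), (9.8, 10.2)]
print(mean_shift_mode((1.0, 1.0), points))
print(mean_shift_mode((9.0, 9.0), points))
```

Starting points that converge to the same mode are assigned to the same cluster, which is why the number of clusters does not have to be chosen in advance.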


For both the noisy area detection and the pavement condition classification schemes, the samples under study have to be divided into three groups: training, validation, and test. The training dataset is used to fit the parameters of the model initially; the validation dataset is used to tune the hyperparameters or the architecture of a classifier; and the test data group is used to provide an unbiased evaluation of the final model fit on the training data. Whether a class resampling technique is used or not, the three groups should have the same level of variability to achieve more accurate results (Carmon and Ben-Dor 2016; Carmon and Ben-Dor 2018).
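One way to keep the three groups at a similar level of variability is to split each condition class separately, so that the class proportions are preserved in every group. The sketch below illustrates this stratified split; the 60/20/20 proportions, the function name, and the seed are assumptions and are not taken from the cited studies.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return index lists (train, validation, test) with similar class mixes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, validation, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validation += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]   # remainder goes to the test group
    return train, validation, test


# Hypothetical condition classes for 20 road segments.
labels = ["good"] * 10 + ["fair"] * 5 + ["poor"] * 5
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```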

There are many classifiers that can be used to predict the condition class of a pavement segment. These include artificial neural networks and their variations, Bayesian classifiers such as naïve Bayes and belief networks, instance-based learners such as k-nearest neighbors, support vector machines, decision trees, and ensemble techniques, such as random forests, that combine multiple base classifiers. The performance of these classifiers can be compared based on predefined metrics, such as computational effort and accuracy, where accuracy is the percentage of correctly classified instances. The accuracy can be calculated as:

$$\text{Accuracy} = \frac{\text{number of correctly classified instances}}{\text{total number of instances}} \times 100\%$$

Finally, the classifier with the best performance can be selected for a pavement management system application. In the following sections, three classifiers that are used in the preliminary case study are discussed. The selected classifiers are simple enough to demonstrate the applicability of the framework and are available in the majority of data mining platforms and software. The accuracy of a classifier on the test set is the percentage of test set tuples that were correctly classified.
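The accuracy on a test set, as defined above, amounts to a simple count. The following sketch shows the computation; the function name and the example class labels are hypothetical.

```python
def accuracy(true_classes, predicted_classes):
    """Percentage of tuples whose predicted class matches the true class."""
    if not true_classes or len(true_classes) != len(predicted_classes):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_classes, predicted_classes))
    return 100.0 * correct / len(true_classes)


# Hypothetical test-set labels and predictions for four road segments.
observed = ["good", "fair", "poor", "good"]
predicted = ["good", "poor", "poor", "good"]
print(accuracy(observed, predicted))
```

With imbalanced condition classes, this single number can look favorable even when a minority class is rarely predicted correctly, so it is usually read together with per-class results.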

k-Nearest Neighbors

The k-nearest neighbor algorithms use instance-based learning, where the function is only approximated locally, and are considered among the simplest classification techniques (Aha, Kibler, and Albert 1991). A k-nearest neighbor classifier may use the Euclidean distance D as a metric, among others. It searches the pattern space for the "k" training tuples that are closest to the unknown tuple. For instance, if the first data tuple on a given list is to be classified based on the information of the other given instances, then the following distances have to be calculated:

$$D(t_1, t_n) = \sqrt{\sum_{m=1}^{M} \left( t_{1m} - t_{nm} \right)^{2}}, \quad n = 2, \ldots, N$$

where t_n is the vector (t_n1, …, t_nM) of the attribute values of the n-th data tuple, N is the number of data instances, and M is the number of attributes.

After calculating the distances, all the tuples are sorted in ascending order of their distance from the tuple that is to be classified. The number of nearest neighbors "k" can be specified explicitly. If only one neighbor is used, then the class C of the first tuple to be classified can be given by:

$$C_1 = C_{n^{*}}, \quad n^{*} = \arg\min_{n \in \{2, \ldots, N\}} D(t_1, t_n)$$

Predictions from more than one neighbor can be based on majority vote or weights according to their distance from the test instance.
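A compact sketch of this procedure with Euclidean distance and an unweighted majority vote is given below. The function names and the two-attribute training tuples are hypothetical; ties in the vote are resolved in favor of the class of the closest neighbor, which is a choice made here for simplicity.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(query, train_tuples, train_classes, k=1):
    """Predict the class of `query` by majority vote among its k nearest tuples."""
    ranked = sorted(zip(train_tuples, train_classes),
                    key=lambda pair: euclidean(query, pair[0]))
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training tuples with two spectral attributes each.
train_tuples = [(0.10, 0.10), (0.20, 0.10), (0.80, 0.90), (0.90, 0.80)]
train_classes = ["good", "good", "poor", "poor"]
print(knn_predict((0.15, 0.20), train_tuples, train_classes, k=1))
print(knn_predict((0.85, 0.85), train_tuples, train_classes, k=3))
```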

Naïve Bayes

A naïve Bayes classifier is based on the assumption that the attributes are independent of one another given the class (John and Langley 1995). Given the n-th data instance represented as an M-dimensional vector $t_n$, where M is the number of attributes after the dimensionality reduction procedure, the classifier assigns to the instance the probabilities that its class $C_n$ is equal to each existing class $c_k$. According to Bayes' theorem, the posterior probability is equal to the product of the prior and the likelihood of the instance given a specific class, divided by the probability of the evidence:

$$P(C_n = c_k \mid t_n) = \frac{P(c_k)\, P(t_n \mid c_k)}{P(t_n)}$$


Since the probability of the evidence is the same for every class, the posterior probabilities are proportional to the numerator, and the evidence acts only as a scaling factor. Under the naïve Bayes assumption of independent attributes, the likelihood factorizes into a product of per-attribute likelihoods. Consequently, the class with the maximum value is selected as the class label of the test instance:

$$\hat{C}_n = \arg\max_{c_k} \; P(c_k) \prod_{m=1}^{M} P(t_{nm} \mid c_k)$$
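As a hedged illustration of this decision rule, the sketch below assumes that each continuous attribute follows a Gaussian distribution within each class, which is one common choice and not necessarily the estimator used in the case study. Log probabilities are summed rather than multiplying raw probabilities, which avoids numerical underflow. All names and example values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(tuples, classes):
    """Estimate the prior and per-attribute mean and variance for each class."""
    by_class = defaultdict(list)
    for row, cls in zip(tuples, classes):
        by_class[cls].append(row)
    model = {}
    for cls, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Population variance, floored to avoid division by zero
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, 1e-9)
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(tuples), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class maximizing log prior plus summed log likelihoods."""
    best_cls, best_score = None, -math.inf
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls


# Hypothetical two-attribute training tuples
tuples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 5.1), (3.8, 4.9)]
classes = ["poor", "poor", "poor", "good", "good", "good"]
model = fit_gaussian_nb(tuples, classes)
print(predict_gaussian_nb(model, (1.1, 2.1)))
```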


Multilayer Perceptron

The multilayer perceptron is a specific feed-forward artificial neural network. It comprises at least three layers of nodes (Witten and Frank 2005). In this study, the number of layers is equal to the number of attributes selected after the dimensionality reduction process. In each node i, a sigmoid activation function is used to map the weighted sum of inputs $x_i$ to the output $f(x_i)$. A logistic function can be used as the activation function, given by:

$$f(x_i) = \frac{1}{1 + e^{-x_i}}$$


Training is performed by changing the weights $w_{ij}$ of the links between nodes i and j of two adjacent layers after each data tuple is processed. The change in the weights depends on the error of the output relative to the expected result $y_j$. The most widely used loss function is the squared error, which can be expressed at the n-th data point as:

$$E(n) = \frac{1}{2} \sum_{j} \left( y_j(n) - f\big(x_j(n)\big) \right)^2$$


The training process is carried out through backpropagation of the error. Using gradient descent, the change of each weight is then:

$$\Delta w_{ij}(n) = -r \, \frac{\partial E(n)}{\partial w_{ij}}$$


where r is the learning rate, which is selected so that the weights converge to a set of values efficiently.
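The update rule can be illustrated for the simplest possible case: a single logistic node trained on one data tuple with the squared error. This is only a sketch under that assumption; a full multilayer perceptron repeats the same step for every node by propagating the error terms backwards through the hidden layers. The initial weights, inputs, and learning rate below are hypothetical.

```python
import math


def logistic(x):
    """Logistic activation: maps any real input to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(weights, inputs):
    """Output of one node: logistic function of the weighted sum of inputs."""
    return logistic(sum(w * a for w, a in zip(weights, inputs)))


def squared_error(weights, inputs, target):
    """Squared error of the node output for a single data tuple."""
    return 0.5 * (target - forward(weights, inputs)) ** 2


def gradient_step(weights, inputs, target, rate):
    """One gradient descent update of the weights for a single data tuple."""
    out = forward(weights, inputs)
    # dE/dw_i = -(target - out) * out * (1 - out) * input_i,
    # so moving against the gradient adds rate * delta * input_i
    delta = (target - out) * out * (1.0 - out)
    return [w + rate * delta * a for w, a in zip(weights, inputs)]


weights = [0.1, -0.2]  # hypothetical initial weights
inputs = [1.0, 0.5]
target = 1.0
error_before = squared_error(weights, inputs, target)
for _ in range(100):
    weights = gradient_step(weights, inputs, target, rate=0.5)
error_after = squared_error(weights, inputs, target)
print(error_before, error_after)
```

In this example the error after 100 updates is smaller than the initial error, since each step moves the weighted sum towards the region where the node output approaches the target.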

Support vector machines

A support vector machines classifier uses a nonlinear mapping to transform the original training data into a higher-dimensional feature space. Within this new space, it searches for the optimal linear separating hyperplane, which corresponds to the decision boundary separating the tuples of one class from those of another. With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane. The classifier finds this hyperplane using support vectors, which are the essential training tuples, and margins, which are defined by the support vectors. For a binary classification problem with class labels $c_i \in \{-1, +1\}$ and hyperplane offset b, the classifier seeks to minimize the regularized hinge loss:

$$\frac{1}{N} \sum_{i=1}^{N} \max\big(0,\; 1 - c_i \,(w \cdot z_i - b)\big) + \lambda \lVert w \rVert^2$$


where $z_i$ is the i-th tuple in the higher-dimensional feature space, $c_i$ is the class of that tuple, and w is the normal vector of the hyperplane. The parameter λ determines the trade-off between increasing the margin size and ensuring that each tuple lies on the correct side of the margin. Thus, for sufficiently small values of λ, the second term in the loss function becomes negligible. Multiclass problems, in which the labels are drawn from a finite set of several elements, are handled by reducing the single multiclass problem to multiple binary classification problems.
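The role of λ can be illustrated by evaluating a soft-margin objective for a fixed hyperplane. The sketch below assumes one common formulation: the mean hinge loss over the tuples plus λ times the squared norm of w, with classes encoded as -1 and +1 and a hyperplane offset b. The offset, the encoding, and the example values are assumptions made for illustration, and the feature-space tuples are taken as given rather than computed by a kernel.

```python
def regularized_hinge_loss(w, b, tuples, classes, lam):
    """Mean hinge loss of a hyperplane (w, b) plus lam times the squared norm of w."""
    total = 0.0
    for z, c in zip(tuples, classes):
        # Signed score; a value of at least 1 means the tuple is on the
        # correct side of the margin and contributes no loss
        score = c * (sum(wj * zj for wj, zj in zip(w, z)) - b)
        total += max(0.0, 1.0 - score)
    return total / len(tuples) + lam * sum(wj * wj for wj in w)


# Hypothetical tuples in the feature space with classes encoded as -1 / +1
w = [1.0, 0.0]
b = 0.0
tuples = [(2.0, 1.0), (-2.0, 0.5), (0.5, 0.0)]
classes = [1, -1, 1]
print(regularized_hinge_loss(w, b, tuples, classes, lam=0.1))
print(regularized_hinge_loss(w, b, tuples, classes, lam=0.0))
```

Only the third tuple lies inside the margin, so it alone contributes to the hinge term; setting λ to zero removes the penalty on the norm of w and leaves that term unchanged.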

Preliminary Case Study

In order to demonstrate the applicability of the proposed framework, a case study was conducted using pavement inspection data provided by the Department of Public Works of the City of Dallas. For the case study, a subset of 362 pavement segments was used, covering a wide street network. These street segments had an average length of 30 m, and each of them was characterized by a pavement condition index. This index is a numerical indicator of the surface condition of the pavement according to distress type, extent, and severity (ASTM 2018). An index value of 100 denotes a pavement with no distress that is in the best condition, while a value of 0 indicates a pavement that has failed and cannot be used by vehicles. The pavement condition data were discretized into three bins corresponding to a custom rating scale. These three classes, along with the corresponding pavement condition index values, were labeled as poor (0-55), fair (55-70), and good (70-100).
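The discretization into three classes can be sketched as follows. Because the stated ranges share their boundary values, the sketch assumes that an index of exactly 55 is rated fair and an index of exactly 70 is rated good; this boundary handling is an assumption and is not specified in the text. The function name is hypothetical.

```python
def condition_class(pci):
    """Map a pavement condition index (0 to 100) to poor, fair, or good."""
    if not 0 <= pci <= 100:
        raise ValueError("pavement condition index must lie between 0 and 100")
    if pci < 55:  # assumption: 55 itself is rated fair
        return "poor"
    if pci < 70:  # assumption: 70 itself is rated good
        return "fair"
    return "good"


for value in (12, 55, 69.9, 70, 100):
    print(value, condition_class(value))
```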

Multispectral orthoimages with 30-cm pixel resolution were acquired from the Texas Natural Resources Information System. Each image consists of four bands, located in the red, green, blue, and near-infrared regions of the spectrum. A visualization of some spectral characteristics of three pavement sample areas detected in a subset of one image is provided in Figure 3. The three sampled areas are marked in green, pink, and blue. The area sampled in green depicts a pavement in relatively poor condition, indicated by the lower pavement condition index, as there is an evident amount of cracking and rutting. On the other hand, the pavement area in pink is sampled from a section in fair condition. The blue area depicts pavement covered with parked vehicles. The corresponding distributions of the pixel intensities in these areas are shown in the scatter diagram on the right, where each grey dot represents the intensity of a pixel in the image and the colored dots refer to the sampled areas on the left image. The vertical axis represents the intensity of the near-infrared band, and the horizontal axis the corresponding intensity of the green band of the multispectral image. From the scatter plot, it can be observed that pavement in worse condition shows a higher variation of the intensity values along both the horizontal and the vertical axes. Finally, pavement covered with vehicles introduces a significant amount of noise in the distribution of pixel intensities, as the variance in these areas is significantly higher. Thus, it is vital that areas that induce noise in the pavement segments’ spectral distributions be excluded from sampling.

Figure 3. Sampled pavement areas (a) and scatter plot of pixel intensities for these areas (b). Legend: pavement in poor condition, pavement in fair condition, noise-introducing area.

The framework for data pre-processing, image segmentation, and pavement condition classification was implemented through open source geographic information systems and machine learning software (QGIS Development Team 2009; OTB Development Team 2002; GDAL/OGR Contributors 2018; GRASS Development Team 2018; Conrad et al. 2015; Frank, Hall, and Witten 2016). In this initial stage, only spectral related attributes were generated, and a simple feature selection procedure based on information gain was employed. The mean and the standard deviation from the near-infrared band as well as attributes from other bands provided significant information gain.
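Feature selection based on information gain can be illustrated with a short sketch. It assumes that the attribute has already been discretized into a small number of values, since continuous attributes such as the band mean need to be binned before entropy can be computed; it is not a reproduction of the procedure implemented in the software cited above. The names and example values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Reduction in class entropy obtained by splitting on a discrete attribute."""
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the classes within each attribute value
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical discretized attribute values for four segments
labels = ["good", "good", "poor", "poor"]
informative = ["low", "low", "high", "high"]
uninformative = ["low", "high", "low", "high"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

An attribute that separates the classes perfectly yields a gain equal to the class entropy, while an attribute unrelated to the classes yields a gain of zero, so attributes can be ranked by this value.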

Using training data to derive a classifier and then estimating the accuracy of the resulting model on the same dataset might lead to overoptimistic estimates due to overfitting. Instead, it is better to measure the classifier’s accuracy on a test set comprising class-labeled tuples that were not used to train the model. Thus, the original dataset was divided: 70 percent for training and 30 percent for testing. In other words, out of the 362 pavement segments in the dataset, 253 were used to derive the model, while 109 segments were kept out of the training procedure. This test sample, which is practically “unseen” by the classification algorithms, is later used to estimate the accuracy of each classification model. Thus, the reported accuracy results and the misclassified segments do not correspond to the training error but rather to the generalization error of the models. This is common practice for verifying reliability in data mining applications.
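The split sizes can be checked with a minimal sketch. The text does not state how the segments were assigned to each subset, so the random shuffle and the fixed seed below are assumptions, and the integer identifiers stand in for the actual segments.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle the items and split them into training and test subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


segments = list(range(362))  # placeholder identifiers for the 362 segments
train, test = split_dataset(segments)
print(len(train), len(test))  # 253 109
```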

In order to demonstrate the effect of the clustering module, two runs were conducted for each classifier. In the first run, the training procedure was performed without the clustering module, so the noisy areas were not excluded. In the second run, sampling was conducted only in clusters representing the pavement surface, after the noisy areas had been excluded. As shown in Figure 4, the accuracy of each classifier increased significantly when the clustering procedure was used to rule out surfaces that do not correspond to pavement.

Figure 4. Accuracy level achieved for each model type

The accuracies of the naïve Bayes, multilayer perceptron, support vector machines, and k-nearest neighbors classifiers were 62.15, 70.71, 69.38, and 56.07 percent, respectively. Figure 4 shows that the accuracy achieved without the clustering and noise filtering process was much lower for all the algorithms, at 38.67, 40.33, 39.50, and 37.29 percent, respectively. The multilayer perceptron achieved the highest accuracy. Artificial neural networks are universal approximators, and it seems that they adjust better to the intricate nature of the problem. Research has shown that pavement inspection data can be influenced by measurement variation and errors depending on the data source (Serigos et al. 2015). Taking this into account, the accuracy achieved is satisfactory compared to the approximately 33.3 percent probability of assigning a segment to the correct one of the three classes at random. This is an indicator that there is some underlying connection between the spectral information inherent in remote sensing images and pavement condition.


In this study, we investigated the usage of remote sensing data for network-level pavement condition assessment. A detailed data mining framework was established, utilizing the inherent spectral information of orthoimages to train models capable of predicting the pavement condition of different road segments. The preliminary case study, conducted with data provided by the City of Dallas and remote sensing images acquired from the Texas Natural Resources Information System, examined four simple classification algorithms. Out of those four, a multilayer perceptron achieved 71 percent accuracy in classifying pavements into three different condition states. However, many of the interrelated subprocesses included in the framework were not explicitly followed in this application. Thus, the results indicate that the framework might have potential for future implementation if further research is conducted on the different constituent steps to increase classification accuracy. Furthermore, the usage of larger training datasets will help the classification models better approximate the underlying mapping functions from remote sensing data to pavement condition.


Based on the results generated from the classification models it seems that the methodological framework can be potentially used for high-level information checks. More specifically, the predicted condition classes can be compared with existing pavement condition datasets to identify potential errors in data collection. The framework is based on synthesizing knowledge from the literature, and is directed towards guiding future research in this topic. The application of more sophisticated data processing techniques and classification algorithms is expected to provide results with even higher accuracy.


  • Aha, David W., Dennis Kibler, and Marc K. Albert. 1991. “Instance-Based Learning Algorithms.” Machine Learning. Boston: Kluwer Academic Publishers. doi:10.1007/BF00153759.
  • Andreou, Charoula, Vassilia Karathanassi, and Polychronis Kolokoussis. 2011. “Investigation of Hyperspectral Remote Sensing for Mapping Asphalt Road Conditions.” International Journal of Remote Sensing 32 (21): 6315–6333. https://www.tandfonline.com/doi/abs/10.1080/01431161.2010.508799.
  • ASTM. 2018. “D6433-18: Standard Practice for Roads and Parking Lots Pavement Condition Index Surveys.” West Conshohocken, PA, U.S.A.: ASTM International. doi:10.1520/D6433-18.
  • Bennett, Christopher R., and William D. O. Paterson. 2000. A Guide to Calibration and Adaptation. Vol. 5. Highway Development and Management Series. Washington, D.C.: The World Bank. https://www.scribd.com/document/216302491/2000-Bennett-Paterson-Calibration-Guide.
  • Carmon, Nimrod, and Eyal Ben-Dor. 2016. “Rapid Assessment of Dynamic Friction Coefficient of Asphalt Pavement Using Reflectance Spectroscopy.” IEEE Geoscience and Remote Sensing Letters. Institute of Electrical and Electronics Engineers. doi:10.1109/LGRS.2016.2539301.
  • Carmon, Nimrod, and Eyal Ben-Dor. 2018. “Mapping Asphaltic Roads’ Skid Resistance Using Imaging Spectroscopy.” Remote Sensing 10 (3): 1–13. doi:10.3390/rs10030430.
  • Conrad, Olaf, Benjamin Bechtel, Michael Bock, Helge Dietrich, Elke Fischer, Lars Gerlitz, Jan Wehberg, Volker Wichmann, and Jürgen Böhner. 2015. “System for Automated Geoscientific Analyses (SAGA) v. 2.1.4.” Geoscientific Model Development 8 (7): 1991–2007. doi:10.5194/gmd-8-1991-2015.
  • Flintsch, Gerardo W., and Kevin Kenneth McGhee. 2009. NCHRP Synthesis of Highway Practice 401: Quality Management of Pavement Condition Data Collection. Washington, D.C.: Transportation Research Board of the National Academies. doi:10.17226/14325.
  • Frank, Eibe, Mark A. Hall, and Ian H. Witten. 2016. “The WEKA Workbench. Online Appendix.” In Data Mining: Practical Machine Learning Tools and Techniques, 4th ed. Morgan Kaufmann. https://www.cs.waikato.ac.nz/ml/weka/citing.html.
  • GDAL/OGR Contributors. 2018. “GDAL/OGR Geospatial Data Abstraction Software Library.” Open Source Geospatial Foundation. http://gdal.org.
  • GRASS Development Team. 2018. “Geographic Resources Analysis Support System (GRASS GIS) Software.” Open Source Geospatial Foundation. https://grass.osgeo.org/.
  • Herold, Martin. 2007. “Spectral Characteristics of Asphalt Road Surfaces.” In Remote Sensing of Impervious Surfaces, edited by Qihao Weng, 237–248. Taylor & Francis Series in Remote Sensing Applications. Boca Raton, FL: CRC Press. doi:10.1201/9781420043754.ch12.
  • Herold, Martin, Margaret E. Gardner, Val Noronha, and Dar A. Roberts. 2003. “Spectrometry and Hyperspectral Remote Sensing of Urban Road Infrastructure.” Online Journal of Space Communications, no. 3: 1–29. https://spacejournal.ohio.edu/issue3/abst_herold.html.
  • Herold, Martin, and Dar A. Roberts. 2005. “Mapping Asphalt Road Conditions with Hyperspectral Remote Sensing.” In Proceedings of the URS 2005 Conference, 1–3. Phoenix, AZ. http://www.iwrms.uni-jena.de/~c5hema/pub/herold_roberts_roads_condition.pdf.
  • Herold, Martin, Dar A. Roberts, Val Noronha, and Omar Smadi. 2008. “Imaging Spectrometry and Asphalt Road Surveys.” Transportation Research Part C: Emerging Technologies 16 (2): 153–166. doi:10.1016/j.trc.2007.07.001.
  • Herold, Martin, Dar A. Roberts, Omar Smadi, and Val Noronha. 2004. “Road Condition Mapping Using Hyperspectral Remote Sensing.” Proceedings of the 2004 AVIRIS Workshop.
  • John, George H., and Pat Langley. 1995. “Estimating Continuous Distributions in Bayesian Classifiers.” In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, 338–345. San Mateo, CA: Morgan Kaufmann. https://arxiv.org/abs/1302.4964.
  • Manzo, Ciro, Alessandro Mei, Rosamaria Salvatori, Cristiana Bassani, and Alessia Allegrini. 2014. “Spectral Modelling Used to Identify the Aggregates Index of Asphalted Surfaces and Sensitivity Analysis.” Construction and Building Materials 61. Elsevier Ltd: 147–155. doi:10.1016/j.conbuildmat.2014.02.056.
  • McGhee, Kenneth H. 2004. NCHRP Synthesis of Highway Practice 334: Automated Pavement Distress Collection Techniques. Washington, D.C.: Transportation Research Board of the National Academies. doi:10.17226/23348.
  • Mei, Alessandro, Ciro Manzo, Cristiana Bassani, Rosamaria Salvatori, and Alessia Allegrini. 2014. “Bitumen Removal Determination on Asphalt Pavement Using Digital Imaging Processing and Spectral Analysis.” Open Journal of Applied Sciences 4 (6): 366–374. doi:10.4236/ojapps.2014.46034.
  • Mei, Alessandro, and Rosamaria Salvatori. 2013. “Urban Mapping Using Ikonos Imagery.” International Journal of Remote Sensing & Geoscience 2 (3): 55–58.
  • Mei, Alessandro, Rosamaria Salvatori, and Alessia Allegrini. 2011. “Analysis of Paved Areas with Field Data and MIVIS Hyperspectral Images.” Italian Journal of Remote Sensing. doi:10.5721/ItJRS201143212.
  • Mei, Alessandro, Rosamaria Salvatori, Nicola Fiore, Alessia Allegrini, and Antonio D’Andrea. 2014. “Integration of Field and Laboratory Spectral Data with Multi-Resolution Remote Sensed Imagery for Asphalt Surface Differentiation.” Remote Sensing 6 (4): 2765–2781. doi:10.3390/rs6042765.
  • Mettas, Christodoulos, Athos Agapiou, Kyriacos Themistocleous, Kyriacos Neocleous, Diofantos Hadjimitsis, and Silas Michaelides. 2016. “Risk Provision Using Field Spectroscopy to Identify Spectral Regions for the Detection of Defects in Flexible Pavements.” Natural Hazards 83 (1). Springer Netherlands: 83–96. doi:10.1007/s11069-016-2262-8.
  • Mettas, Christodoulos, Kyriacos Themistocleous, Kyriacos Neocleous, Andreas Christofe, Kypros Pilakoutas, and Diofantos Hadjimitsis. 2015. “Monitoring Asphalt Pavement Damages Using Remote Sensing Techniques.” In Proceedings of SPIE, Third International Conference on Remote Sensing and Geoinformation of the Environment. Vol. 9535. Proceedings of SPIE. Paphos, Cyprus: Society of Photo-Optical Instrumentation Engineers. doi:10.1117/12.2195702.
  • Michel, Julien, David Youssefi, and Manuel Grizonnet. 2015. “Stable Mean-Shift Algorithm and Its Application to the Segmentation of Arbitrarily Large Remote Sensing Images.” IEEE Transactions on Geoscience and Remote Sensing 53 (2). Institute of Electrical and Electronics Engineers: 952–964. doi:10.1109/TGRS.2014.2330857.
  • Mohammadi, M. 2012. “Road Classification and Condition Determination Using Hyperspectral Imagery.” In International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 39:141–146. Melbourne: International Society for Photogrammetry and Remote Sensing. doi:10.5194/isprsarchives-XXXIX-B7-141-2012.
  • Noronha, Val, Martin Herold, Dar A. Roberts, and Meg Gardner. 2002. “Spectrometry and Hyperspectral Remote Sensing for Road Centerline Extraction and Evaluation of Pavement Condition.” In Proceedings of the Pecora Conference, 12. Denver, CO. http://www.geogr.uni-jena.de/~c5hema/spec/Pecora_noronha_herold_final.pdf.
  • OTB Development Team. 2002. “Orfeo ToolBox – Open Source Processing of Remote Sensing Images.” CNES. https://www.orfeo-toolbox.org/.
  • Pan, Yifan, Xianfeng Zhang, Jie Tian, Xu Jin, Lun Luo, and Ke Yang. 2017. “Mapping Asphalt Pavement Aging and Condition Using Multiple Endmember Spectral Mixture Analysis in Beijing, China.” Journal of Applied Remote Sensing 11 (1): 016003. doi:10.1117/1.JRS.11.016003.
  • Pascucci, Simone, Cristiana Bassani, Angelo Palombo, Maurizio Poscolieri, and Rosa Cavalli. 2008. “Road Asphalt Pavements Analyzed by Airborne Thermal Remote Sensing: Preliminary Results of the Venice Highway.” Sensors 8 (2): 1278–1296. doi:10.3390/s8021278.
  • Paterson, William D. O., and Thomas Scullion. 1990. Information Systems for Road Management: Draft Guidelines on System Design and Data Issues. Washington, D.C.: The World Bank. http://documents.worldbank.org/curated/en/196321468762908116/Information-systems-for-road-management-draft-guidelines-on-system-design-and-data-issues.
  • Pierce, Linda M., Ginger McGovern, and Kathryn A. Zimmerman. 2013. “Practical Guide for Quality Management of Pavement Condition Data Collection,” 170. http://www.fhwa.dot.gov/pavement/management/qm/data_qm_guide.pdf.
  • QGIS Development Team. 2009. “QGIS Geographic Information System.” Open Source Geospatial Foundation. https://qgis.org/.
  • Resende, Marcos Ribeiro, Liedi Legi Bariani Bernucci, and José Alberto Quintanilha. 2014. “Monitoring the Condition of Roads Pavement Surfaces: Proposal of Methodology Using Hyperspectral Images.” Journal of Transport Literature 8 (2): 201–220. doi:10.1590/S2238-10312014000200009.
  • Schnebele, E., B. F. Tanyu, G. Cervone, and N. Waters. 2015. “Review of Remote Sensing Methodologies for Pavement Management and Assessment.” European Transport Research Review 7 (2). doi:10.1007/s12544-015-0156-6.
  • Serigos, Pedro A, Chen Kuan-Yu, Andre Smit, Mike R. Murphy, and Jorge A. Prozzi. 2015. “Automated Distress Surveys: Analysis of Network Level Data (Phase III) (FHWA Report 0-6663-3)” 7.
  • Shahi, Kaveh, Helmi Zulhaidi Mohd Shafri, and Alireza Hamedianfar. 2017. “Road Condition Assessment by OBIA and Feature Selection Techniques Using Very High-Resolution WorldView-2 Imagery.” Geocarto International 32 (12): 1389–1406. https://doi.org/10.1080/10106049.2016.1213888.
  • Ullman, Gerald, John Ragsdale, and Nadeem Chaudhary. 1998. Recommendations for Highway Construction, Maintenance, and Service Equipment Warning Lights and Pavement Data Collection System Safety. Austin, TX, U.S.A.: Texas Department of Transportation. https://rosap.ntl.bts.gov/view/dot/4882.
  • Wang, Weixing, Nan Yang, Yi Zhang, Fengping Wang, Ting Cao, and Patrik Eklund. 2016. “A Review of Road Extraction From Remote Sensing Images.” Journal of Traffic and Transportation Engineering 3 (3). Elsevier Ltd: 271–282. doi:10.1016/j.jtte.2016.05.005.
  • Witten, Ian H., and Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques. 2nd ed. Morgan Kaufmann Series in Data Management Systems. San Francisco, CA: Morgan Kaufmann. http://cs.du.edu/~mitchell/mario_books/Data_Mining:_Practical_Machine_Learning_Tools_and_Techniques_-_2e_-_Witten_&_Frank.pdf.
