Predicting Local Road Crashes Using Socio-economic and Land Cover Data
16th Dec 2019
ABSTRACT
Estimating and applying safety performance functions (SPFs), or models for predicting expected crash counts, for roads under local jurisdiction is often challenging due to the lack of vehicle count data to be used for exposure, which is a critical variable in such functions. This paper describes estimation of SPFs for local road intersections and segments in Connecticut using socio-economic and network topological data instead of traffic counts as exposure. SPFs are developed at the traffic analysis zone (TAZ) level, where the TAZs are categorized into six homogeneous clusters based on land cover intensities and population density. SPFs were estimated for each cluster to predict the number of intersection and segment crashes occurring in each TAZ. The number of intersections and the total local roadway length were also used as exposure in the intersection and segment SPFs, respectively. One aggregate SPF using the entire dataset was also estimated to compare with the individual cluster SPFs. Ten percent of the observed data points were reserved for out of sample testing and in all cases, these out of sample predictions were as good as the in sample predictions. Models including total population, retail and non-retail employment and average household income are found to be the best both on the basis of model fit and out of sample prediction.
Key Words: safety performance function, crash count, local road, cluster analysis
- INTRODUCTION
A Safety Performance Function (SPF) is an equation used to predict crash counts at a location as a function of exposure and other roadway characteristics (e.g. number of lanes, lane width, shoulder width) (Highway Safety Manual 2010). One use of SPFs is to estimate the expected number of crashes on traffic facilities in order to identify road locations with higher crash potential for safety improvements, and to select and implement cost-effective countermeasures to reduce future crashes (Jonsson et al. 2007). SPFs are often developed for different traffic facilities such as road segments and intersections. Local roads owned and operated by local entities, including towns, counties and tribal governments, play an important role in the roadway network, as approximately 60 percent of all road miles in the U.S. are maintained by these jurisdictions (Ceifetz et al. 2012). A recent Iowa study (Souleyrette et al. 2010) reported that, from 1974 to 2000, local roads had a crash rate 1.5 times higher than that of primary roads under State jurisdiction. As a result, traffic safety on local roads is important to both traffic safety organizations and engineers. It is therefore important to develop accurate tools to predict the number of crashes occurring on local roads, both to identify sites with promise for safety improvements and to select and implement effective countermeasures that reduce future crash frequency or severity.
The Highway Safety Manual (HSM) (2010) provides SPFs for two-lane rural highways, multilane rural highways, urban and suburban arterials, freeways and freeway ramp junctions. The SPFs in the HSM were estimated using data collected from a limited number of States in the USA, including Washington, California, Minnesota, Texas, Michigan, North Carolina and Illinois. Because crash relationships in these states are not necessarily representative of those in the entire country, the HSM recommends a calibration procedure to adjust the crash counts predicted by an SPF for an individual jurisdiction. The HSM SPFs include traffic counts for intersections or roadway segments as the most critical variables in accurately predicting the number of crashes (Highway Safety Manual 2010, Ivan 2004, Vogt 1999). This presents a problem for roads under local jurisdiction, where traffic counts are generally not available because it is economically impractical to implement traffic counting programs for so many facilities on which the traffic volume is typically below 400 vehicles per day (Souleyrette et al. 2010). In addition, the data sets used to estimate the two-lane road models in the HSM do not include roads with traffic volumes as low as those usually found on city or town jurisdiction streets and roads. To implement highway safety improvement strategies on these low volume local roads, new crash prediction approaches that do not require traffic counts are desirable.
The objective of this study was to estimate SPFs for both intersections and segments on roads under local jurisdiction in the State of Connecticut using demographic data as a surrogate for traffic count data. The SPFs are estimated at the level of the Traffic Analysis Zone (TAZ), instead of the intersection or roadway segment level. The intersection count (i.e. the number of city/town road intersections in a TAZ) and the segment mileage (i.e. total city/town roadway length in a TAZ) are used as exposure in this study in lieu of traffic volume. Demographic records such as population, total retail and non-retail employment, household income and vehicle availability work in tandem with the exposure to predict crash counts. To account for heterogeneity in the data and in crash relationships, the TAZs in the entire state are categorized into six clusters based on the percentage of land in three land cover categories (high, medium and low intensity) and the population density (i.e. persons per km²). A different SPF was estimated for each cluster, and the similarities and differences among these functions are discussed. We also discuss how to apply the functions as a network screening tool.
It is noted that the term “local” can mean different things depending on the context in which it is used. In one context it can refer to one level of the hierarchical functional classification scheme (arterial, collector and local). It can also be used to refer to the level of agency jurisdiction responsible for a road facility (state, county, local). It is possible for the same road to be called two different things, for example a “collector” in the first context, but “local” in the second. To avoid this confusion, we use the word “local” for the first context and “city or town” in the second context. Note also that there are no roads in the State of Connecticut under county jurisdiction.
- LITERATURE REVIEW
SPFs have been estimated for city or town roads by various researchers at two levels: the facility level (e.g. roadway segment and intersection) and the zonal level (e.g. TAZ). Among facility level models, Vogt (1999) provides a good review of the factors associated with crashes on city or town roads according to past research studies. These include channelization (right and left turn lanes), number of driveways, sight distance, intersection angle, median width, surface width, shoulder width, signal characteristics, lighting, roadside condition, truck percentage in the traffic volume, posted speed, and weather. Most research on two-lane roads confirms traffic volume as the major explanatory factor for traffic crashes, which is a problem where traffic volume is not available (Vogt and Bared 1998, Oh et al. 2004). There is little literature investigating alternative exposure measures in addition to or in place of traffic volume for predicting crashes. Bindra et al. (2009) considered the use of geographic information system (GIS) land use inventories to supplement traffic volumes as exposure when estimating SPFs for segment-intersection crashes on rural two-lane and urban two- and four-lane undivided roads. They concluded that the number of trips generated and the land use data (i.e., population, retail and non-retail employment, and driveway data) were good predictors of segment-intersection crashes, that is, crashes on segments at minor roads and driveways without traffic counts.
Zonal SPFs (ZSPFs), of which the most popular is the TAZ level, make use of widely available zonal-level variables (Pirdavani et al. 2012). Among the studies focusing on TAZ-level SPFs, Pulugurtha et al. (2004) used socioeconomic and network variables to develop TAZ level SPFs that estimate crash counts by severity level (injury and property damage only crashes). Ladron de Guevara et al. (2004), Lovegrove and Sayed (2006), Lovegrove (2012) and Hadayeghi et al. (2003) developed TAZ level SPFs to estimate the number of both intersection and segment crashes. Factors such as population density, the number of employees and the intersection density were considered as predictors for the number of crashes. Furthermore, Khondakar et al. (2010) found that TAZ level SPFs can safely be transferred both temporally and spatially. Noland and Quddus (2004) showed that TAZs with high employment density had more traffic crashes, whereas in urbanized areas with more densely populated TAZs fewer crashes were observed. Jin et al. (2011) found that, besides traditional variables such as segment length, the structure of the roadway network should be considered in developing TAZ-level SPFs to improve prediction accuracy. Several studies developed TAZ-level SPFs using the number of trips generated inside each TAZ. Naderan and Shahi (2010) and Abdel-Aty et al. (2011) found that the number of trips generated has a significant impact on TAZ-level crashes.
Recently, an analysis tool (PLANSAFE) was developed in a National Cooperative Highway Research Program (NCHRP) project (Washington et al. 2006) to predict the expected crash counts by TAZ. The predictors include population, employment and some land use intensity variables. The purpose was to use the predicted crash counts as one of the measures of effectiveness to select the most cost-effective transportation improvement plan. Another study of TAZ level SPFs by Pirdavani et al. (2012) sought to establish an association between observed crashes and a set of predictor variables in each TAZ. The study compared models using two different exposures, VHT (total daily vehicle hours traveled) and VKT (total daily vehicle kilometers traveled), along with network and socio-demographic variables. The results show that the model containing both exposures outperformed the models containing only one of the exposure variables. Lee et al. (2015) applied a multivariate Poisson lognormal crash model to simultaneously estimate motor vehicle crashes, bicycle crashes and pedestrian crashes using several socio-demographic variables in each TAZ. The study illustrates that variables such as the number of households, employment and the number of hotels are positively associated with all three types of crash counts. Beyond TAZ-level SPFs, some studies have investigated SPFs at other macroscopic levels, such as block group (Abdel-Aty et al. 2013, Levine et al. 1995), state level (Noland 2003), grid structure level (Kim et al. 2006) and county level (Aguero-Valverde and Jovanis 2006, Huang et al. 2010, Noland and Quddus 2004).
These zonal level SPFs are all able to predict expected crash frequencies without traffic volume; however, most of them estimate the number of crashes using network and socio-demographic variables without accounting for the data and crash heterogeneity among different types of TAZs or zones. To address this issue, our study focuses on estimating TAZ level SPFs, by category of TAZ, that do not require ADT counts for city and town jurisdiction roads. The TAZs were clustered into different categories using a data mining technique (K-means clustering analysis), based on their land-use intensities and population density. Socio-demographic data and roadway network data such as population, employment, income, car ownership, number of city/town jurisdiction road intersections and total city/town road length inside the TAZ are used to predict crash counts. The intention is for some of the variables to serve as surrogates for actual traffic counts, which are generally not available for these roads.
The remainder of the paper is organized as follows. The next section presents the methodology and the process of data collection. The third section describes the estimation of SPFs and the results. The final section discusses how to use the estimated SPFs as a network screening tool.
- METHODOLOGY AND DATA PREPARATION
Our procedure for the estimation of TAZ level SPFs for city and town roads requires four types of data at the TAZ level: roadway network shape features, demographic records, geographic/land cover features and crash records. We chose to use the TAZ structure defined by CT DOT for statewide planning purposes to take advantage of the extensive array of demographic data available by TAZ. Below is a brief description of the required data and data sources.
- Roadway Network Shape Features
The number of intersections and the total length of roadways under city or town jurisdiction were extracted from the 2010 Census TIGER/LINE files for Connecticut (United States Census Bureau 2010). The original TIGER/LINE files required correction of errors, such as typos in roadway names and discrepancies in the network representation of some road links. The number of intersections and the total length of roadways under city or town jurisdiction were then calculated for each TAZ. Details about our procedures for calculating the number of intersections and the total length of roadways are provided in the Appendix to the project final report (Ivan and Burnicki 2015).
- TAZ Level Demographic Records
TAZ level demographic records were collected from the Census Transportation Planning Package Database (CTPP 2010). They include population, retail and non-retail employment, households, vehicles and average household income summarized by TAZ and used as the independent variables in safety performance functions. In the 2010 census, 1806 TAZs were defined for the State of Connecticut. Two of these TAZs were apparently defined to represent special generators and have no population or employment, so they were eliminated from the analysis. The remaining 1804 TAZs were used to estimate the SPFs.
- TAZ Level Geographic/Land Cover Features
Land-cover information was collected from the 2011 National Land Cover Database (NLCD 2011). We calculated the proportion of land area in three developed land-use categories – low, medium and high intensity development – as defined by USGS (NLCD 2011). All developed areas contain a mixture of vegetation and impervious surfaces (e.g., buildings, roadways), where development intensity reflects differences in the relative proportions of these cover types. The classification system employed by the 2011 NLCD defines low intensity areas as having 20%-49% impervious cover, medium intensity areas as having 50%-79% impervious cover, and high intensity areas as having greater than 80% impervious cover (NLCD 2011). These values along with the population density were used to categorize the TAZs into homogeneous groups using K-means clustering analysis (discussed in the next section). Originally we used only the land cover intensities, but we found that adding the population density helped to correct aberrant cluster assignments for unique development sites (e.g., airports).
- Crash Records and Integration of Crash to TAZ
Intersection and segment crash records were collected from the Connecticut Crash Data Repository (CTCDR 2016). We gathered counts of K (fatal injury), A (incapacitating injury) and B (non-incapacitating injury) intersection and segment crashes occurring on roads under city and town jurisdiction in Connecticut from 2010 to 2012. Crashes at intersections with one or more approaches maintained by the State were not included. As requested by the Technical Advisory Committee for the project, we excluded property damage only (PDO, severity O) and minor injury (C) crashes because they lead to less serious consequences and are also subject to underreporting (PDO crashes in particular). Also, cities and towns were not required to report PDO crashes in 2011, so the dataset would have been incomplete if we had included them. In total, 5403 intersection crashes and 5502 segment crashes were extracted.
Intersection and segment crashes were assigned to TAZs based on their locations. If the crash was located on the boundary of more than one TAZ, it was evenly assigned between the two TAZs on both sides of the road where the crash occurred in the case of a segment crash. For an intersection crash on the intersection of several TAZs, it was equally assigned among all TAZs that touch the intersection (an intersection crash would be evenly assigned among four TAZs for a four-way intersection that forms the corner of four TAZs). Details about our procedures for assigning crashes are provided in the Appendix to the project final report (Ivan and Burnicki 2015).
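The even-split rule described above can be illustrated with a short Python sketch. It is not the procedure from the project final report: the input format (one list of touching TAZ identifiers per crash) and the function name `assign_crashes` are assumptions made only for illustration.

```python
from collections import defaultdict


def assign_crashes(crashes):
    """Return fractional crash counts per TAZ.

    `crashes` is a list with one entry per crash; each entry lists the ids of
    the TAZs that touch the crash location (hypothetical input format).
    A crash inside a single TAZ counts as 1 for that TAZ; a crash on a shared
    boundary is split evenly among the touching TAZs.
    """
    counts = defaultdict(float)
    for taz_ids in crashes:
        unique_ids = set(taz_ids)
        share = 1.0 / len(unique_ids)
        for taz in unique_ids:
            counts[taz] += share
    return dict(counts)


# Toy example with made-up TAZ ids.
crashes = [
    ["A"],                 # crash in the interior of TAZ A
    ["A", "B"],            # segment crash on a road separating A and B
    ["A", "B", "C", "D"],  # intersection crash at the corner of four TAZs
]
taz_counts = assign_crashes(crashes)
```

Because every crash contributes a total weight of one, the zonal counts still sum to the number of crashes extracted.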
- Clustering of TAZs
Clustering analysis seeks to maximize the similarity of elements within the same cluster and the dissimilarity of elements between clusters (Depaire et al. 2008). K-means clustering analysis (Depaire et al. 2008, Hair et al. 1998, Mohamed et al. 2013) is a traditional distance-based technique. One limitation is that a distance-based objective function must be chosen in advance. A second is that it has large memory demands, especially for large datasets (Depaire et al. 2008). To address these issues, numerous studies have applied latent class clustering (LCC) analysis, or the finite mixture model (FMM), which does not require selecting a distance measure. However, LCC is a model-based technique that is not appropriate for our data, as there is no dependent variable in our clustering process. Therefore, considering its simplicity and our data structure, K-means clustering analysis with a Euclidean distance objective function (STATA 2011) was selected to categorize the TAZs into homogeneous groups using the three land cover intensities and the population density. Different numbers of clusters were tested, and the Calinski and Harabasz pseudo-F index (Calinski and Harabasz 1974) was used to select the final number of clusters. The larger the Calinski and Harabasz pseudo-F index, the more distinct the clusters.
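As an illustration of this selection procedure, the following Python sketch runs K-means for several candidate numbers of clusters and keeps the one with the largest pseudo-F index. It is a minimal sketch, not the Stata routine used in the study: the toy data, the farthest-first initialization and all function names are assumptions, and whether the clustering variables were rescaled before clustering is not stated in the text.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def farthest_first(points, k):
    """Deterministic initialization (an assumption, chosen for repeatability)."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations: assign to nearest center, then recompute means."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster empties
                centers[c] = mean_point(members)
    return labels, centers


def pseudo_f(points, labels, centers):
    """Calinski-Harabasz index: (between SS / (k-1)) / (within SS / (n-k))."""
    n, k = len(points), len(centers)
    grand = mean_point(points)
    within = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    between = sum(labels.count(c) * sq_dist(centers[c], grand) for c in range(k))
    return (between / (k - 1)) / (within / (n - k))


# Toy TAZ features: (low, medium, high intensity share, scaled population density).
group_centers = [(0.10, 0.05, 0.00, 0.10),
                 (0.30, 0.30, 0.10, 0.50),
                 (0.10, 0.30, 0.50, 0.90)]
offsets = [(0.02, 0, 0, 0), (-0.02, 0, 0, 0), (0, 0.02, 0, 0),
           (0, -0.02, 0, 0), (0, 0, 0, 0.02), (0, 0, 0, -0.02)]
points = [[c + o for c, o in zip(center, off)]
          for center in group_centers for off in offsets]

scores = {}
for k in range(2, 6):
    labels, centers = kmeans(points, k)
    scores[k] = pseudo_f(points, labels, centers)
best_k = max(scores, key=scores.get)  # number of clusters with the largest pseudo-F
```

With real TAZ data the same loop would be run over the candidate cluster counts of interest, and the variables would likely need to be put on comparable scales first so that population density does not dominate the Euclidean distance.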
The optimum number of clusters was found to be six. Figure 1(a) shows the distributions of the three land-use intensities and the population density among the six clusters. The overall land-use intensity and the population density decrease from cluster 1 to cluster 6. The number of TAZs assigned into cluster 1 through cluster 6 is 80, 161, 270, 284, 382 and 627, respectively. Figure 1(b) shows the distribution of the six clusters across the state. Note that two TAZs with legend 0 in the western and southeastern areas were eliminated in estimating the safety performance functions, as these two TAZs have no population. Cluster 1 has the lowest number of TAZs, and is the most urbanized in nature, and cluster 6 is the most common cluster type and is the most rural in nature. Clusters 2 through 5 represent areas with decreasing levels of urbanization. The areas with higher land-use intensities (those with the darkest shading and colors on the map) are mainly located in the central and southern parts of the state.
FIGURE 1 HERE (PAGE 28)
FIGURE 2 HERE (PAGE 29)
Figure 2 illustrates the distribution of KAB crashes by cluster. Comparing the two types of crashes, there are substantially more intersection crashes than segment crashes in clusters 1, 2 and 3, but fewer intersection crashes than segment crashes in clusters 5 and 6. The two types of crashes have nearly the same distributions in cluster 4. Figure 3 and Figure 4 display the distributions of the number of intersections, city or town roadway mileage and demographic variables by cluster. The number of intersections increases from cluster 1 to cluster 5, and then decreases to cluster 6. The roadway mileage increases consistently from cluster 1 to cluster 6. The average household income slightly increases from cluster 1 to cluster 6. Cluster 1 has the highest average numbers for both retail and non-retail employment, and cluster 6 has the lowest numbers. One important finding is that the distribution patterns are similar among population (Figure 3(c)), households (Figure 3(d)) and vehicles (Figure 4(a)). This is caused by the high correlation among these three factors, which was also verified by a correlation test. The selection and application of these three correlated variables is discussed under SPF development.
FIGURE 3 HERE (PAGE 30)
FIGURE 4 HERE (PAGE 31)
- Statistical Methodology
Safety performance functions were estimated to predict the number of city and town road intersection and segment crashes in each TAZ. The number of crashes is estimated by count regression models, such as the Poisson regression model, formulated as (Washington et al. 2011):
$$\mathrm{Prob}(y_i \mid \mu_i) = \frac{\exp(-\mu_i)\,\mu_i^{y_i}}{y_i!} \qquad (1)$$
where $\mathrm{Prob}(y_i \mid \mu_i)$ is the probability of $y_i$ crashes occurring at TAZ $i$ and $\mu_i$ is the expected number of crashes at TAZ $i$. Given a vector of covariates $X_i$, which describes the demographic and roadway characteristics of TAZ $i$, and a vector of estimable coefficients $\beta$, $\mu_i$ can be estimated by the equation:
$$\ln(\mu_i) = \beta X_i \qquad (2)$$
The limitation of the Poisson model is that the variance of the data is constrained to be equal to the mean, i.e.:
$$\mathrm{Var}(y_i) = E(y_i) = \mu_i \qquad (3)$$
This constraint is questionable because the variance of crash data is usually greater than the mean, which is known as over-dispersion (Washington et al. 2011). The negative binomial regression model addresses this issue. It is derived by rewriting Equation 2 such that:
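$$\mu_i = \exp(\beta X_i + \varepsilon_i) \qquad (4)$$
where $\exp(\varepsilon_i)$ is an error term assumed to follow a gamma distribution with mean 1 and variance $\sigma$. The distribution of the negative binomial model has the form (Washington et al. 2011):
$$\mathrm{Prob}(y_i \mid \mu_i) = \frac{\Gamma\left(\frac{1}{\sigma} + y_i\right)}{\Gamma\left(\frac{1}{\sigma}\right)\, y_i!} \left(\frac{\frac{1}{\sigma}}{\frac{1}{\sigma} + \mu_i}\right)^{\frac{1}{\sigma}} \left(\frac{\mu_i}{\frac{1}{\sigma} + \mu_i}\right)^{y_i} \qquad (5)$$
where $\Gamma$ is the gamma function; the variance of the negative binomial model can be written as follows:
$$\mathrm{Var}(y_i) = \mu_i(1 + \sigma\mu_i) = \mu_i + \sigma\mu_i^2 \qquad (6)$$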
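The relationship between Equations 1, 3, 5 and 6 can be checked numerically. The Python sketch below evaluates both probability functions on the log scale and sums over a long range of counts; the values of the mean and of the dispersion parameter are arbitrary illustrations, not estimates from this study.

```python
import math


def poisson_pmf(y, mu):
    """Equation 1, evaluated on the log scale for numerical stability."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def negbin_pmf(y, mu, sigma):
    """Equation 5 with dispersion parameter sigma (r = 1 / sigma)."""
    r = 1.0 / sigma
    log_p = (math.lgamma(r + y) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def mean_and_variance(pmf, support):
    """Mean and variance of a count distribution over a truncated support."""
    probs = [pmf(y) for y in support]
    mean = sum(y * p for y, p in zip(support, probs))
    var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
    return mean, var


mu, sigma = 3.0, 0.5      # illustrative values only
support = range(400)      # long enough that the truncated tail is negligible

pois_mean, pois_var = mean_and_variance(lambda y: poisson_pmf(y, mu), support)
nb_mean, nb_var = mean_and_variance(lambda y: negbin_pmf(y, mu, sigma), support)
# Poisson: variance equals the mean (Equation 3).
# Negative binomial: variance is mu + sigma * mu**2 (Equation 6).
```

Both distributions share the same mean, while the negative binomial variance exceeds it by the term involving the dispersion parameter, which is what allows the model to accommodate over-dispersed crash counts.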
We define the function for the predicted intersection crashes at TAZ i as follows:
$$\mu_{int,i} = Y\, I_i^{\beta_I} \exp(\beta_0 + \beta_P P_i + \beta_R R_i + \beta_N N_i + \beta_V V_i + \beta_C C_i + \beta_H H_i) \qquad (7)$$
Where
$\mu_{int,i}$ | = | predicted intersection crashes in TAZ i |
$Y$ | = | the number of years in the time period |
$I_i$ | = | the number of intersections in TAZ i |
$P_i$ | = | the population of TAZ i |
$R_i$ | = | the total retail employment of TAZ i |
$N_i$ | = | the total non-retail employment of TAZ i |
$V_i$ | = | the number of vehicles in TAZ i |
$C_i$ | = | the average income in TAZ i |
$H_i$ | = | the number of households in TAZ i |
$\beta$s | = | the estimated parameters |
We define the function for the predicted segment crashes at TAZ i as follows:
$$\mu_{seg,i} = Y\, L_i^{\beta_L} \exp(\beta_0 + \beta_P P_i + \beta_R R_i + \beta_N N_i + \beta_V V_i + \beta_C C_i + \beta_H H_i) \qquad (8)$$
Where
$\mu_{seg,i}$ | = | predicted segment crashes in TAZ i |
$L_i$ | = | the mileage of roadways under local jurisdiction in TAZ i |
and the remaining variables are as defined above.
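To make the functional form of Equations 7 and 8 concrete, the Python sketch below evaluates it for a single TAZ. All coefficient and covariate values are hypothetical placeholders chosen for illustration; they are not the estimates reported in this paper, and the function and dictionary key names are likewise assumptions.

```python
import math


def predict_crashes(years, exposure, beta_exposure, beta0, betas, covariates):
    """Evaluate mu = Y * E**beta_E * exp(beta0 + sum_k beta_k * x_k).

    `exposure` is the number of intersections (Equation 7) or the roadway
    mileage (Equation 8). `betas` and `covariates` are dicts keyed by
    variable name, so any subset of the covariates can be used.
    """
    linear = beta0 + sum(betas[name] * covariates[name] for name in betas)
    return years * exposure ** beta_exposure * math.exp(linear)


# Hypothetical coefficients and TAZ data, for illustration only.
betas = {
    "population": 0.0002,
    "retail_emp": 0.001,
    "nonretail_emp": 0.0001,
    "income": -0.000005,
}
taz = {
    "population": 2500,
    "retail_emp": 200,
    "nonretail_emp": 1000,
    "income": 80000,
}
mu_int = predict_crashes(years=3, exposure=40, beta_exposure=0.5,
                         beta0=-2.0, betas=betas, covariates=taz)
```

Because the number of years enters multiplicatively, doubling the analysis period doubles the predicted count, while the exposure term scales it by a power determined by the exposure coefficient.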
- VARIABLE SELECTION AND SPF RESULTS
The SPFs were estimated at the TAZ level for each cluster type. One statewide SPF using the aggregate data (i.e., for all TAZ’s without splitting by cluster) was also estimated for comparison purposes. When estimating each function, the observations by TAZ were randomly divided into two parts: one part including ninety percent of the observations was used to estimate the function; and the other part including the remaining ten percent of the observations was used to evaluate the prediction performance of the function. Three functions, each using one of the correlated independent variables at a time (population, number of households and number of vehicles), were estimated for both intersection and segment crashes. We checked for correlation among the variables included in each model; no significant correlation was found. These three functions were compared according to the model goodness-of-fit (Akaike Information Criterion-AIC and Bayesian Information Criterion-BIC) to determine which one performed best for each cluster and for the statewide database. The number of crashes was predicted using both estimation and prediction datasets for the entire state using the cluster-based functions and the statewide function to test the efficacy of each approach. Function performance for each cluster and the statewide database was compared using two measures of effectiveness (MOEs), Mean Absolute Deviation (MAD) and Mean Squared Predictor Error (MSPE), proposed by Oh et al. (2003). These criteria are calculated as:
$$AIC = 2K - 2\ln(LL) \qquad (9)$$
$$BIC = K\ln(N) - 2\ln(LL) \qquad (10)$$
$$\text{Mean Absolute Deviation: } MAD = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| \qquad (11)$$
$$\text{Mean Squared Predictor Error: } MSPE = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2 \qquad (12)$$
Where
$K$ | = | the number of estimated parameters |
$LL$ | = | the maximized value of model likelihood function |
$N$ | = | the number of observations |
$\hat{y}_i$ | = | the predicted number of crashes at TAZ i |
$y_i$ | = | the observed number of crashes at TAZ i |
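The measures in Equations 9 to 12 and the 90/10 division of the observations can be sketched in Python as follows. The function names, the random seed and the toy numbers are assumptions made for illustration; the log-likelihood argument corresponds to $\ln(LL)$ in the notation above.

```python
import math
import random


def aic(log_likelihood, k):
    """Equation 9: AIC = 2K - 2 ln(LL)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Equation 10: BIC = K ln(N) - 2 ln(LL)."""
    return k * math.log(n) - 2 * log_likelihood


def mad(predicted, observed):
    """Equation 11: mean absolute deviation."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mspe(predicted, observed):
    """Equation 12: mean squared prediction error."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def split_90_10(items, seed=0):
    """Randomly reserve one tenth of the observations for out-of-sample testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) // 10
    return shuffled[n_test:], shuffled[:n_test]


# Toy example with made-up counts and a made-up log-likelihood.
observed = [2, 0, 5, 1]
predicted = [1.5, 0.5, 4.0, 1.0]
example_mad = mad(predicted, observed)
example_mspe = mspe(predicted, observed)
example_aic = aic(log_likelihood=-120.0, k=5)
example_bic = bic(log_likelihood=-120.0, k=5, n=100)

# Splitting 1804 TAZ ids into estimation and prediction sets.
estimation_ids, prediction_ids = split_90_10(range(1804))
```

In practice the split would be made before estimation, with MAD and MSPE computed separately on the estimation and prediction sets so that the two can be compared.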
The smaller the AIC, BIC, MAD or MSPE value, the better the function performance. Table 1 shows the goodness-of-fit of the cluster-based SPFs and statewide SPFs including one of the correlated variables at a time. Because the functions using the number of vehicles performed worse, only the functions including population or the number of households are presented here. For the statewide SPF, both intersection and segment SPFs have lower AIC and BIC values using population than using households. For the intersection SPF, the functions for clusters 2, 3 and 4 have lower AIC or BIC values using population as an independent variable than using the number of households, while the reverse is observed for clusters 1, 5 and 6. The segment SPFs for all clusters have lower AIC and BIC values using population than using households.
TABLE 1 HERE (PAGE 32)
Table 2 displays the SPF prediction performance for the statewide and cluster-based functions using both estimation data and prediction data. The cluster-based SPFs using either population or households outperform the statewide SPF in crash prediction, as they have a lower MAD or MSPE value for both estimation data and prediction data. This is to be expected, as estimating a separate function for each cluster can account for heterogeneity related to land cover intensity. Furthermore, the cluster-based SPFs with population slightly outperform the SPFs with the number of households. Additionally, the SPF performance using the prediction data appears to be even better than that using the estimation data. This may be due to the smaller size of the prediction data set, but it also suggests that there is no over-fitting to the estimation data, and that the functions are transferable within Connecticut. Therefore, considering all of these MOEs (model fit and prediction), the cluster-based SPFs with population were selected.
TABLE 2 HERE (PAGE 33)
Table 3 shows the coefficient estimates for the intersection SPFs using population as a predictor. Coefficients for all other models are omitted here for brevity; they may be found in the Appendix to the Final Report (Ivan and Burnicki 2015). The first row in each table cell is the coefficient, the second row is the p-value, and coefficients shown in bold are statistically significant at the 95% confidence level. With respect to the six cluster-based functions, the number of intersections (the exposure surrogate for intersection SPFs) was not statistically significant in the cluster 2, 3 and 4 functions. The effect of total population on the number of intersection crashes is positive in all functions (as expected), except for clusters 5 and 6, in which it was not statistically significant. The amount of retail employment is positively associated with the number of intersection crashes in the functions for clusters 4, 5 and 6. The amount of non-retail employment is positively associated with the number of intersection crashes for clusters 1, 2 and 6. The number of intersection crashes decreases as average household income increases in the first five cluster functions, but increases in the cluster 6 function.
TABLE 3 HERE (PAGE 34)
Table 4 shows the coefficient estimates for the segment SPFs. As with the intersection SPFs, the exposure surrogate is not significant in every cluster: the association between city or town roadway length and the number of segment crashes is positive in all six functions, but is only statistically significant in clusters 1, 5 and 6. The coefficient for population is positive and significant in all six cluster-based functions. Retail employment is statistically significant in clusters 3, 4 and 5, and non-retail employment is statistically significant in clusters 1, 2 and 3. The number of segment crashes decreases as average household income increases in the first five cluster functions, but increases in the cluster 6 function, which is consistent with the intersection SPFs.
TABLE 4 HERE (PAGE 35)
- APPLICATIONS FOR NETWORK SCREENING
To apply these models, we predicted the number of crashes for all TAZs in the State using the cluster-based SPFs, and then used the Empirical Bayes (EB) method prescribed in the HSM (2010) to estimate the expected number of future crashes if no countermeasures are implemented. The EB method increases the precision of predictions for the future when only limited historical crash data are available, and it corrects for regression-to-the-mean bias (Hauer et al. 2002). Details about our procedures for applying the EB method and developing the network screening application tool are provided in the Appendix to the project final report (Ivan and Burnicki 2015). The resulting EB expected crash counts are added to a GIS layer along with the other data for each TAZ. This GIS layer can be used for reporting and manipulation within a GIS environment by road safety analysts at CTDOT (Connecticut Department of Transportation) and in regional or local government to identify locations that have promise for implementing road safety interventions according to HSM procedures (Highway Safety Manual 2010).
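The EB adjustment can be sketched in Python using the general form given in the HSM, in which the expected count is a weighted average of the SPF prediction and the observed count, with the weight determined by the over-dispersion parameter of the negative binomial model. The exact procedure applied in this study is documented in the project final report, so the sketch below is only an assumption-laden illustration with made-up numbers.

```python
def eb_expected(predicted, observed, dispersion):
    """Empirical Bayes estimate of expected crashes for one TAZ.

    predicted  : SPF prediction for the study period
    observed   : crashes observed in the same period
    dispersion : over-dispersion parameter of the negative binomial SPF
    The weight follows the HSM form w = 1 / (1 + dispersion * predicted).
    """
    weight = 1.0 / (1.0 + dispersion * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Made-up values: the SPF predicts 4 crashes, 9 were observed, dispersion 0.5.
expected = eb_expected(predicted=4.0, observed=9, dispersion=0.5)
```

The estimate always lies between the SPF prediction and the observed count. A larger dispersion parameter, or a larger predicted count, shifts weight away from the SPF prediction and toward the TAZ's own crash history.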
- CONCLUSIONS AND FUTURE RESEARCH
This study demonstrates an alternative for predicting the number of crashes on city or town roads where the traffic volumes are not available. Both intersection SPFs and segment SPFs were estimated at the TAZ level. The TAZs were categorized into six clusters based on land cover intensities and population density using the K-means clustering approach. Cluster-based SPFs were estimated for predicting city and town road intersection and segment crash counts using, respectively, the number of city and town road intersections and the total city and town roadway length. Demographic variables such as population, retail and non-retail employment, total households, and average household income were used as covariates to predict the crash counts.
Due to the high correlation between population and the number of households, two cluster-based SPFs including either population or the number of households were estimated for both intersection and segment crashes. Additionally, an aggregate function using the entire dataset was also developed for comparison. Based on the goodness-of-fit (AIC and BIC values) and prediction performances (MAD and MSPE values), the cluster-based SPFs outperform the aggregate SPFs. The cluster-based SPFs with population perform better than those with the number of households for both intersection and segment crashes.
Finally, the cluster-based SPFs were applied and adjusted using the EB method to produce expected annual crash counts for all TAZs in the State. It is anticipated that the example applications can help regional and municipal agencies identify areas of cities and towns with higher potential for safety improvements, and develop cost-effective countermeasures to improve safety for city and town roads.
This study is an initial exploration into developing TAZ-level SPFs using demographic variables for city and town roads when traffic volumes are not available, clustering TAZs into different types to account for data heterogeneity. These cluster-based TAZ-level SPFs can be used to predict the average annual intersection and segment crashes in a TAZ in the context of HSM analyses. They might also help agencies evaluate alternative options for future roadway network and economic development, by identifying the effects of roadway geometric and socio-economic factors on crash counts. However, these models are likely to be more difficult to transfer to other jurisdictions than facility-level SPFs (e.g. roadway segment and intersection). The TAZ-level SPFs depend heavily not only on the clustering of the TAZs, but also on the definitions of the TAZs themselves and on the character of land development. The relationship between these factors and crash occurrence is likely to vary much more from one place to another than would the relationship between road characteristics and traffic volume. As a consequence, attempts to calibrate these models to another State are not likely to be successful. To use cluster-based TAZ-level SPFs, we recommend that users collect their own data and estimate their own SPFs.
One significant challenge in conducting this study was geo-locating crashes on city and town roads, because at the time of data collection the Connecticut crash data set included only route and milepost. Having geocoded crash records would substantially simplify the process. Other relevant variables that were not available for this study (e.g. trip distance and trip duration for a TAZ) may also affect roadway safety, as crash counts are expected to increase as trip distance and duration in a TAZ increase. Future research could focus on collecting these variables at the TAZ level and then estimating new SPFs to improve prediction accuracy.
It is also noted that the observed, predicted and expected annual crash counts for many TAZs were quite small (less than 3). Because each TAZ contains dozens of road segments and intersections, this indicates that the annual crash counts at each individual segment or intersection would be so small as to preclude successful estimation of crash prediction models by segment or intersection. This data condition is further justification for using an area-based approach for predicting crashes on city and town roads.
ACKNOWLEDGMENTS
This research was sponsored by the Joint Highway Research Advisory Council of the University of Connecticut and the Connecticut Department of Transportation through Project 14-1 of the Connecticut Cooperative Transportation Research Program. The contents reflect the views of the authors who are responsible for the accuracy of the information presented herein. The contents do not necessarily reflect the official views or policies of the University of Connecticut or the Connecticut Department of Transportation. The authors would like to thank Mrs. Judy B. Raymond of the Connecticut Department of Transportation for kindly providing demographic data to support this effort. This paper was peer-reviewed by the Transportation Research Board and presented at the 95th annual meeting of the Transportation Research Board, January 2016, Washington, D.C. The authors would also like to thank all reviewers for providing constructive comments to help us improve the paper.
REFERENCES
Abdel-Aty, M., Lee, J., Siddiqui, C. and Choi, K. (2013). Geographical Unit Based Analysis in the Context of Transportation Safety Planning. Transportation Research Part A. Vol. 49. pp. 62-75.
Abdel-Aty, M., Siddiqui, C., Huang, H. and Wang, X. (2011). Integrating Trip and Roadway Characteristics to Manage Safety in Traffic Analysis Zones. Transportation Research Record. No. 2213. pp. 20-28.
Aguero-Valverde, J. and Jovanis, P. (2006). Spatial Analysis of Fatal and Injury Crashes in Pennsylvania. Accident Analysis & Prevention. Vol. 38. No. 3. pp. 618-625.
Bindra, S., Ivan, J. and Jonsson, T. (2009). Predicting Segment-Intersection Crashes with Land Development Data. In Transportation Research Record: Journal of the Transportation Research Board, No. 2102, Transportation Research Board of the National Academies, Washington, D.C., pp. 9–17.
Calinski, T. and Harabasz, J. (1974). A Dendrite Method for Cluster Analysis. Communications in Statistics. Vol. 3, pp. 1-27.
Ceifetz, A., Bagdade, J., Nabors, D., Sawyer, M., and Eccles, K. (2012). Developing safety plans: A manual for local rural road owners. Project Report, Project 12-017, Federal Highway Administration (FHWA).
Connecticut Crash Data Repository (2016). http://www.ctcrash.uconn.edu/
CTPP 2010, Census Transportation Planning Package Database. http://ctpp.transportation.org/Pages/5-Year-Data.aspx
Depaire, B., Wets, G., and Vanhoof, K. (2008). Traffic Accident Segmentation by Means of Latent Class Clustering. Accident Analysis and Prevention. Vol. 40. pp. 1257-1266.
Hadayeghi, A., Shalaby, A., and Persaud, B. (2003). Macro-level Accident Prediction Models for Evaluating the Safety of Urban Transportation System. Transportation Research Board. National Research Council. Washington, D.C.
Hair L., Anderson, R., Tatham, R. and Black, W. (1998). Multivariate Data Analysis. Prentice Hall.
Hauer, E., Harwood, D. W., Council, F. M. and Griffith, M. S. (2002). The Empirical Bayes Method for Estimating Safety: A Tutorial. In Transportation Research Record. No. 1784. pp. 126-131.
Highway Safety Manual (2010), 1st Edition, American Association of State Highway and Transportation Officials, Washington, D.C.
Huang, H., Abdel-Aty, M. and Darwiche, A. (2010). County-level Crash Risk Analysis in Florida. Transportation Research Record 2149. pp. 27-37.
Ivan, J. and Burnicki, A. (2015). Improvements to Road Safety Improvement Selection Procedures For Connecticut. Connecticut Cooperative Transportation Research Program, Final Report, Project 14-1.
Ivan, J. (2004). New approach for including traffic volumes in crash rate analysis and forecasting. In Transportation Research Record: Journal of the Transportation Research Board, No. 1897, Transportation Research Board of the National Academies, Washington, D.C., pp. 134-141.
Jonsson, T., Ivan, J. and Zhang, C. (2007). Crash prediction models for intersections on rural multilane highways: differences by collision type. In Transportation Research Record: Journal of the Transportation Research Board, No. 2019, Transportation Research Board of the National Academies, Washington, D.C., pp. 91-98.
Jin, Y., Wang, X. and Chen, X. (2011). Incorporating Road Network Structure into Macro Level Traffic Safety Analysis. American Society of Civil Engineers. pp. 2224-2232.
Khondakar, B., Sayed, T. and Lovegrove, G. (2010). Transferability of Community-based Collision Prediction Models for Use in Road Safety Planning Applications. Journal of Transportation Engineering. Vol. 136. No. 10. pp. 871-880.
Kim, K., Brunner, I. M., and Yamashita, E. Y. (2006). Influence of Land Use, Population, Employment and Economic Activity on Accidents. Transportation Research Record 1953. pp. 56-64.
Ladron de Guevara, F., Washington, S. P., and Oh, J. (2004). Forecasting Crashes at the Planning Level: Simultaneous Negative Binomial Crash Model Applied in Tucson, Arizona. In Transportation Research Record: Journal of the Transportation Research Board, No. 1897, Transportation Research Board of the National Academies, Washington, D.C., pp. 191–199.
Lee, J., Abdel-Aty, M. and Jiang, X. (2015). Multivariate Crash Modeling for Motor Vehicle and Non-motorized Modes at the Macroscopic Level. Accident Analysis and Prevention. No. 78. pp. 146-154.
Levine, N., Kim, K. and Nitz, L. (1995). Spatial Analysis of Honolulu Motor Vehicle Crashes. II. Zonal Generators. Accident Analysis and Prevention. Vol. 27, No. 5. pp. 675-685.
Lovegrove, G. R., and Sayed, T. (2006). Macro-level Collision Prediction Models for Evaluating Neighborhood Traffic Safety. Canadian Journal of Civil Engineering. Vol. 33. No. 5. pp. 609-621.
Lovegrove, G. (2012). Road Safety Planning, New Tools for Sustainable Road Safety and Community Development. AV Akademikerverlag. Berlin.
Mohamed, M., Saunier, N., Miranda-Moreno, L. and Ukkusuri, S. (2013). A Clustering Regression Approach: A Comprehensive Injury Severity Analysis of Pedestrian-Vehicle Crashes in New York, US and Montreal, Canada. Safety Science. Vol. 54, pp. 27-37.
Naderan, A. and Shahi, J. (2010). Aggregate Crash Prediction Models: Introducing Crash Generation Concept. Accident Analysis & Prevention. Vol. 42, No. 1. pp. 339-346.
National Land Cover Database 2011 (2011). http://www.mrlc.gov/nlcd11_data.php
Norland, R. B. (2003). Traffic Fatalities and Injuries: The Effect of Changes in Infrastructure and Other Trends. Accident Analysis and Prevention. Vol. 35. No. 4. pp. 599-611.
Norland, R. B. and Quddus, M. A. (2004). Analysis of Pedestrian and Bicycle Casualties with Regional Panel Data. Transportation Research Record 1897. pp. 28-33.
Norland, R. B. and Quddus, M. A. (2004). A Spatially Disaggregate Analysis of Road Casualties in England. Accident Analysis and Prevention, Vol. 36, No. 6, pp. 973–984.
Oh, J., Washington, S. P., and Choi, K. (2004). Development of Accident Prediction Models for Rural Highway Intersections. In Transportation Research Record: Journal of the Transportation Research Board, No. 1897, Transportation Research Board of the National Academies, Washington, D.C., pp. 18–27.
Oh, J., Lyon, C., Washington, S. P., Persaud, B. and Bared, J. (2003). Validation of FHWA Crash Models for Rural Intersections: Lessons Learned. In Transportation Research Record: Journal of the Transportation Research Board, No. 1840, Transportation Research Board of the National Academies, Washington, D.C., pp. 41-49.
Pirdavani, A., Brijs, T., Bellemans, T., Kochan, B. and Wets, G. (2012). Application of Different Exposure Measures in Development of Planning-level Zonal Crash Prediction Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 2280, Transportation Research Board of the National Academies, Washington, D.C., pp. 145-153.
Pulugurtha, S., Duddu, V. R., and Kotagiri, Y. (2004). Traffic Analysis Zone Level Crash Estimation Models Based on Land Use Characteristics. Accident Analysis and Prevention, Vol. 36, No. 6, pp. 973–984.
Souleyrette, R., Caputcu, M., Cook, D., McDonald, T., Sperry, R. and Hans, Z. (2010). Safety Analysis of Low-Volume Rural Roads in Iowa, Final Report, Project 07-309, Institute for Transportation, Iowa State University.
STATA (2011). Clustering Kmeans and Kmedians. Release 12. A Stata Press, StataCorp LP. College Station, Texas http://www.stata.com/manuals13/mvclusterkmeansandkmedians.pdf.
United States Census Bureau (2010). https://www.census.gov/geo/maps-data/data/tiger-line.html
Vogt, A. (1999). Crash Models For Rural Intersections: Four-Lane by Two-Lane Stop-Controlled and Two-Lane by Two-Lane Signalized. US Department of Transportation, Federal Highway Administration Report, FHWA-RD-99-128.
Vogt, A. and Bared, J. (1998). Accident Models for Two-Lane Rural Segments and Intersections. In Transportation Research Record: Journal of the Transportation Research Board, No. 1635, Transportation Research Board of the National Academies, Washington, D.C., pp. 18-29.
Washington, S. P., Karlaftis, M., Mannering, F. L. (2011). Statistical and Econometric Methods for Transportation Data Analysis, 2nd ed. Chapman and Hall/CRC, Boca Raton, FL.
Washington, S. P., Schalkwyk, V., Mitra, S., Meyer, M., Dumbaugh, E., Zoll, M. (2006). Incorporating Safety into Long-Range Transportation Planning. NCHRP Report 546, National Cooperative Highway Research Program, Transportation Research Board, Washington, D.C.
FIGURE 1 Clustering Results and Cluster Distribution: (a) land-use intensities and population density distributions by cluster (vertical axis: percentage); (b) cluster distribution over Connecticut
FIGURE 2 Distributions of KAB Crashes by Cluster (under each cluster, the boxplots from left to right relate to intersection crashes and segment crashes)
FIGURE 3 Distributions of Independent Variables by Cluster: (a) number of intersections; (b) city or town roadway mileage; (c) total population; (d) total households
FIGURE 4 Distributions of Independent Variables by Cluster (Continued): (a) total vehicles; (b) average household income; (c) total retail employment; (d) total non-retail employment
TABLE 1 Goodness-of-fit of the Cluster Based and Statewide SPFs
Cluster SPF | Intersection SPF, Population: AIC | Intersection SPF, Population: BIC | Intersection SPF, Households: AIC | Intersection SPF, Households: BIC | Segment SPF, Population: AIC | Segment SPF, Population: BIC | Segment SPF, Households: AIC | Segment SPF, Households: BIC |
1 | 432 | 448 | 428 | 444 | 330 | 346 | 334 | 350 |
2 | 887 | 908 | 896 | 917 | 692 | 713 | 718 | 739 |
3 | 1,231 | 1,256 | 1,246 | 1,271 | 1,081 | 1,105 | 1,109 | 1,134 |
4 | 1,110 | 1,135 | 1,120 | 1,145 | 1,051 | 1,075 | 1,063 | 1,088 |
5 | 1,220 | 1,247 | 1,219 | 1,246 | 1,475 | 1,502 | 1,489 | 1,516 |
6 | 1,247 | 1,278 | 1,246 | 1,277 | 2,120 | 2,151 | 2,125 | 2,155 |
Statewide SPF | 6,935 | 6,972 | 6,977 | 7,015 | 6,826 | 6,863 | 6,970 | 7,008 |
TABLE 2 SPF Prediction Performance
MOEs | Statewide SPF (Population) | Statewide SPF (Households) | Cluster-based SPF (Population) | Cluster-based SPF (Households) |
Intersection SPF | | | | |
MAD Estimation | 2.65 | 2.72 | 1.95 | 1.95 |
MAD Prediction | 2.65 | 2.74 | 1.62 | 1.75 |
MSPE Estimation | 18.25 | 20.72 | 11.14 | 11.29 |
MSPE Prediction | 13.29 | 14.95 | 6.41 | 7.50 |
Segment SPF | | | | |
MAD Estimation | 2.00 | 2.01 | 1.77 | 1.87 |
MAD Prediction | 1.52 | 1.58 | 1.30 | 1.47 |
MSPE Estimation | 8.28 | 9.13 | 7.55 | 7.62 |
MSPE Prediction | 4.00 | 4.48 | 3.51 | 3.74 |
TABLE 3 Coefficient Estimates for KAB Intersection Crashes
Variables | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
Intercept | -1.275 | 0.270 | -0.150 | -0.984 | -2.688 | -4.908 |
 | (0.001) | (0.487) | (0.717) | (0.044) | (0.000) | (0.000) |
Log (number of intersections) | 0.682 | 0.170 | 0.078 | 0.040 | 0.606 | 0.844 |
 | (0.000) | (0.225) | (0.587) | (0.810) | (0.000) | (0.000) |
Population (*1000) | 0.161 | 0.282 | 0.360 | 0.372 | 0.054 | 0.129 |
 | (0.014) | (0.000) | (0.000) | (0.000) | (0.368) | (0.145) |
Retail employment (*1000) | 0.196 | -0.295 | -0.221 | 0.462 | 0.845 | 0.992 |
 | (0.530) | (0.451) | (0.261) | (0.045) | (0.000) | (0.000) |
Non-retail employment (*1000) | 0.090 | 0.182 | 0.121 | -0.003 | -0.064 | 0.174 |
 | (0.003) | (0.000) | (0.072) | (0.966) | (0.195) | (0.008) |
Average household income (*1000) | -0.005 | -0.013 | -0.010 | -0.002 | -0.003 | 0.002 |
 | (0.067) | (0.000) | (0.000) | (0.240) | (0.009) | (0.001) |
Overdispersion | 0.258 | 0.280 | 0.422 | 0.616 | 0.357 | 0.227 |
 | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
Deviance/DF | 1.090 | 1.001 | 0.899 | 0.832 | 0.874 | 0.802 |
Notes: for each variable, the first row is the coefficient estimate and the second row (in parentheses) is the p-value; bold coefficients are statistically significant at the 5% level of significance.
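As an illustration of how the Table 3 coefficients might be applied, the sketch below evaluates the Cluster 1 intersection SPF for a single TAZ. It rests on several assumptions that should be checked against the model specification: a negative binomial model with a log link, a natural logarithm for the number of intersections, and "(*1000)" meaning that population, employment and income enter the model in thousands. The function and dictionary names are hypothetical, and the input values are invented.

```python
import math

# Cluster 1 coefficients copied from Table 3 (KAB intersection crashes).
CLUSTER1_INTERSECTION = {
    "intercept": -1.275,
    "log_intersections": 0.682,
    "population_k": 0.161,       # per 1,000 persons (assumed unit)
    "retail_emp_k": 0.196,       # per 1,000 retail jobs (assumed unit)
    "nonretail_emp_k": 0.090,    # per 1,000 non-retail jobs (assumed unit)
    "income_k": -0.005,          # per $1,000 of average income (assumed unit)
}


def predict_intersection_crashes(coef, intersections, population,
                                 retail_emp, nonretail_emp, avg_income):
    """Predicted KAB intersection crashes for one TAZ, assuming a log link."""
    if intersections <= 0:
        raise ValueError("number of intersections must be positive")
    linear = (
        coef["intercept"]
        + coef["log_intersections"] * math.log(intersections)
        + coef["population_k"] * population / 1000.0
        + coef["retail_emp_k"] * retail_emp / 1000.0
        + coef["nonretail_emp_k"] * nonretail_emp / 1000.0
        + coef["income_k"] * avg_income / 1000.0
    )
    # Exponentiate the linear predictor to return to the crash-count scale.
    return math.exp(linear)


# Hypothetical TAZ: 40 intersections, 2,500 residents, 100 retail jobs,
# 600 non-retail jobs, and an average household income of $80,000.
crashes = predict_intersection_crashes(
    CLUSTER1_INTERSECTION, 40, 2500, 100, 600, 80000
)
```

The same function could be reused for the other clusters by swapping in the corresponding column of coefficients, provided the TAZ has first been assigned to the correct cluster.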
TABLE 4 Coefficient Estimates for KAB Segment Crashes
Variables | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 |
Intercept | -3.648 | -1.769 | -1.300 | -1.621 | -5.429 | -5.946 |
 | (0.008) | (0.213) | (0.305) | (0.265) | (0.000) | (0.000) |
Log (roadway length in miles) | 0.403 | 0.248 | 0.160 | 0.100 | 0.539 | 0.504 |
 | (0.020) | (0.161) | (0.297) | (0.552) | (0.000) | (0.000) |
Population (*1000) | 0.166 | 0.188 | 0.239 | 0.311 | 0.165 | 0.301 |
 | (0.030) | (0.001) | (0.000) | (0.000) | (0.005) | (0.000) |
Retail employment (*1000) | 0.446 | -0.442 | 0.256 | 0.587 | 0.477 | 0.376 |
 | (0.185) | (0.268) | (0.039) | (0.003) | (0.003) | (0.090) |
Non-retail employment (*1000) | 0.066 | 0.100 | 0.126 | 0.001 | -0.037 | 0.029 |
 | (0.030) | (0.044) | (0.050) | (0.533) | (0.392) | (0.697) |
Average household income (*1000) | -0.003 | -0.012 | -0.012 | -0.003 | -0.002 | 0.001 |
 | (0.327) | (0.001) | (0.000) | (0.027) | (0.009) | (0.015) |
Overdispersion | 0.263 | 0.178 | 0.264 | 0.338 | 0.381 | 0.175 |
 | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) | (0.000) |
Deviance/DF | 0.719 | 0.784 | 0.783 | 0.749 | 0.912 | 0.701 |
Notes: for each variable, the first row is the coefficient estimate and the second row (in parentheses) is the p-value; bold coefficients are statistically significant at the 5% level of significance.