Dissertation on Identifying Stages of Chronic Kidney Disease

Info: 8396 words (34 pages) Dissertation
Published: 12th Nov 2021

Tagged: Health and Social Care, Medicine


Chronic Kidney Disease Stratification using Office Visit Records: Handling Data Imbalance via Hierarchical Meta-Classification

Abstract

Background

Chronic Kidney Disease (CKD) is one of several conditions affecting a growing percentage of the US population; it is accompanied by multiple co-morbidities and is hard to diagnose in and of itself. In its advanced forms the disease has severe outcomes and can lead to death. It is therefore important to detect the disease as early as possible, which can help in devising effective intervention and treatment plans. Here we investigate ways to utilize information available in electronic health records (EHRs) from regular office visits of >13,000 patients, in order to distinguish among several stages of the disease. While EHRs provide valuable patient health information for patient risk-stratification, one of the major challenges in using them arises from data imbalance. That is, records associated with a more severe condition are typically under-represented compared to those associated with only a milder manifestation of the disease. To address imbalance, we propose a hierarchical meta-classification method, aiming to stratify CKD by severity levels, employing simple quantitative non-text features gathered from office visit records.

Methods

The proposed hierarchical meta-classification method frames the multiclass classification task as a sequence of two subtasks. The first is binary classification, separating records associated with majority class from those associated with a combined minority class, using meta-classification. The second subtask separates the records assigned to the combined minority class into the individual classes.

Results

The proposed method effectively stratifies CKD severity levels, obtaining high average sensitivity, specificity and F-measure (≥93%). Our results show that the good performance of our system is retained even when using much reduced training sets, indicating that the method is stable and generalizable.

Conclusion

We have presented a new approach to classification that addresses data imbalance, which is inherent in the biomedical domain. Our model effectively identifies CKD severity stages of patients using information readily available in office visit records, within the realistic context of high data imbalance.

Keywords: Imbalanced Data; Meta-classification; Hierarchical Classification; Electronic Health Records; Kidney Disease

Background

Chronic kidney disease (CKD) is defined as kidney damage persisting for more than three months. It currently affects about 15% of the adult population in the US, is accompanied by co-morbidities, and is associated with increased mortality rates [1]. The disease is typically classified into five stages, 1–5, in increasing order of severity [2]. These severity stages are clinically quantified through the estimated glomerular filtration rate (eGFR), an indicator of the level of kidney function [1]. The glomerular filtration rate is estimated from serum creatinine lab tests, race, sex and age. As chronic kidney disease – even in its advanced stages – is often asymptomatic, the relevant lab tests are not typically ordered and many CKD patients go undiagnosed. Patients who remain under-treated, especially in stages 4 and 5, are at high risk for end-stage renal disease and death. A study reported by the Kidney Early Evaluation Program (KEEP) [1] indicates that fewer than 30% of the 122,502 patients enrolled in the program at stages 4 and 5 had ever been seen by a nephrologist. Notably, 95% of the enrolled patients did visit their general practitioner during the year preceding the study, for other conditions.

As such, a risk stratification model that is based on information gathered during these office visits, and that separates CKD patients into severity stages, could alert general practitioners to a patient’s risk for advanced stage (4 or 5) CKD, prompting the physician to order the lab tests needed to confirm the diagnosis.

Several recent studies have proposed approaches for patient risk-stratification and disease prediction using machine learning [5-9]. Unlike our study, none of these studies are based solely on simple quantitative attributes available in office visit records; they rely on lab test results, insurance information and narrative text, in addition to office visit records. Moreover, most of the studies were based on only very small datasets (

Collaborating with physicians from Christiana Care Health System, the largest health-system in Delaware, we analyze a dataset gathered from 13,111 patients who had been seen in primary care or specialty practices over a nine-year period. Data from the Nephrology Practices EHR were not included in this dataset. The dataset comprises information collected during patients’ visit to multiple primary care and specialty practices across Delaware. Each record in the dataset comprises 495 simple quantitative non-text attributes summarizing a patient’s demographics, vital signs, diagnosed conditions and medications. We represent each patient’s visit record using the values of these attributes as features. Notably, unlike physicians’ notes, these 495 non-text attributes are available for the vast majority of patients, and their semantics is unambiguous and readily interpretable. The dataset is further described in the Data and Methods section.

While clinical data stored in EHRs provide valuable information for patient risk stratification, one of the major challenges in using them arises from data imbalance. That is, only a small proportion of patients suffer from the more severe conditions, while most patients either do not manifest the condition or manifest it only mildly. Specifically, in our dataset, the number of records associated with stage 3 is 10 times larger than the number of records associated with stage 4, and 23 times larger than that of stage 5. Imbalanced datasets, whose class distribution is skewed, pose a common challenge in data mining applications, such as fraud detection and disease detection, where the class of interest is heavily underrepresented compared to the other classes. Classifiers learned from such an imbalanced dataset using off-the-shelf packages typically show poor performance in identifying minority class records (the class usually associated with the more severe condition), as demonstrated in our Experiments and Results section. Thus, addressing imbalance is critical for correctly identifying the important records associated with the minority class.

We thus propose and develop a meta-learning based hierarchical classification approach that addresses data imbalance, while performing multiclass classification. We utilize the proposed method to stratify CKD patients already identified as advanced, into severity stages (stages 3–5), using information gathered from standard office visit records. Our method effectively identifies a significant proportion of patients suffering from the more severe conditions (stages 4 and 5), attaining high sensitivity, while also correctly identifying most of the less severe cases (Stage 3), maintaining a high level of specificity.

Data imbalance is often handled in machine learning via sampling strategies [10-16]. Typically, these involve either under-sampling – reducing the size of the majority class by removing instances from the training set, or over-sampling – increasing the impact of the minority class, by sampling from it with repetition. Several previous studies have proposed variants of under/over-sampling approaches to address class imbalance. Examples include one-sided selection [10] and synthetic minority oversampling technique (SMOTE) [11]. Notably, most of these are concerned with binary classification, while ours is a multiclass task, as we are looking to assign one of three possible stages to each record (possibly more in future studies).
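As a minimal sketch of the two sampling strategies described above, the following Python snippet applies random under-sampling and random over-sampling with repetition to toy records. The function names and the toy data are illustrative assumptions, and the sketch assumes the majority class is at least as large as the minority class. SMOTE, which synthesizes new minority examples rather than repeating existing ones, is not shown.

```python
import random

def under_sample(majority, minority, seed=0):
    """Randomly drop majority records until both classes have equal size."""
    rng = random.Random(seed)
    return rng.sample(majority, len(minority)), list(minority)

def over_sample(majority, minority, seed=0):
    """Resample minority records with repetition up to the majority size."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return list(majority), list(minority) + extra

# Toy data (hypothetical): 12 majority records and 3 minority records.
majority = [("maj", i) for i in range(12)]
minority = [("min", i) for i in range(3)]

maj_u, min_u = under_sample(majority, minority)
maj_o, min_o = over_sample(majority, minority)
print(len(maj_u), len(min_u))  # 3 3
print(len(maj_o), len(min_o))  # 12 12
```

Under-sampling discards nine of the twelve majority records here, while over-sampling repeats minority records; these are the two costs that motivate looking for an alternative.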

Two approaches commonly used to transform multiclass classification into multiple binary-classification tasks are one-against-all (OAA) and one-against-one (OAO) [12]. Tan et al. [13] use both these schemes in the context of protein fold classification, and subsequently build rule-based learners to improve coverage of the minority class. Zhao et al. [14] use OAA, while employing under-sampling and SMOTE techniques on imbalanced data, for protein classification. Another approach to address class imbalance employs cost sensitive ensemble methods [15].
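To make the two decomposition schemes concrete, the short sketch below enumerates the binary tasks that OAA and OAO would generate for the three stages considered in this study. The variable names are illustrative assumptions.

```python
from itertools import combinations

stages = [3, 4, 5]

# One-against-all: one binary task per class (that class vs. the rest).
oaa_tasks = [(s, [t for t in stages if t != s]) for s in stages]

# One-against-one: one binary task per unordered pair of classes.
oao_tasks = list(combinations(stages, 2))

print(oaa_tasks)  # [(3, [4, 5]), (4, [3, 5]), (5, [3, 4])]
print(oao_tasks)  # [(3, 4), (3, 5), (4, 5)]
```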

Before developing our own approach, we applied versions of the above methods, specifically, random under-sampling and over-sampling using SMOTE, within the OAA scheme (see Results section). None improved on the results obtained by simple classifiers that do not account for class imbalance (to which we refer as baseline classifiers), such as a simple random forest. Thus, as mentioned earlier, we develop and present a multiclass classification method, hierarchical meta-classification, aiming to stratify CKD patients into severity levels (stages 3-5), while addressing data imbalance. Unlike approaches that utilize under-sampling and ignore much of the majority data, or approaches that use over-sampling, which create a large amount of mock-up data that can lead to over-fitting, our approach neither ignores data nor creates synthetic samples, yielding a higher level of performance while avoiding over-fitting.

We frame the multiclass classification task as a hierarchy of two subtasks. The first is binary classification, separating records associated with stage 3 (majority class) from those associated with a combined class consisting of stages 4 and 5 (minority classes), using meta-classification. Meta-classification assembles results obtained from multiple simple classifiers (base-classifiers) into a single classification decision [17]. The second subtask separates the records assigned to the combined stages 4 and 5 into the respective stage-based classes. Ours is the first study that utilizes meta-classification in combination with a hierarchical approach to address data imbalance.

To train the hierarchical meta-classifier, we take advantage of the earlier office visit records, gathered between the years 2007-2014, while testing is done on later records, gathered during 2015. We evaluate the performance of our methods using standard metrics, namely, overall accuracy, as well as specificity, sensitivity, precision and F-measure [18]. Our results show that the proposed method trained on a dataset represented via the complete set of features, improves upon multiple baselines, showing a performance level greater than 93% according to all evaluation measures. The good performance of our method indicates that simple quantitative attributes from office visit records form a sound basis for CKD severity detection and stratification.

To further demonstrate the predictive ability of our proposed sampling based ensemble approach, we gradually reduced the size of the training set by pruning the early years of patient history, one year at a time, yielding eight training sets. We used the records collected in 2015 as the test set for assessing the performance of our model trained on each of the eight training sets, ensuring that the training set always comprises temporally earlier records than the test set. Our results show that the classification remains as effective when the number of years (and of visit records) included in the patient history and used by the classifier is reduced, illustrating the stability and generalizability of our classifier.

Methods

Dataset

The dataset used in this study comprises 120,739 records obtained during patients’ visits to multiple primary care and specialty practices across Delaware; these records form part of the information stored in EHRs. Patient records were selected for inclusion in the dataset if at any time during follow-up there was an indication of a decline in kidney function, determined by a lower than normal eGFR value (below 60 mL/min/1.73 m²), indicative of CKD at stage 3 or higher. The resulting dataset includes all records of patients associated with stages 3, 4 and 5. This dataset and the project were approved by the Christiana Care Institutional Review Board with a waiver of consent according to 45CFR46.116d.

We removed from the dataset 27,521 records that were missing essential values, leaving a set of 93,218 complete records. Table 1 summarizes key characteristics of the dataset. Table 2 shows the three categories of features comprising the dataset, along with the number of actual features per category. Information pertaining to the attributes listed in Table 2 is routinely collected and stored in the EHR during each visit to the general practitioner, making our approach broadly applicable to office visit records beyond the specific disease and dataset analyzed here.

We note that in contrast to the earlier version of this work published in the proceedings of the IEEE International Conference on Bioinformatics and Biomedicine [19], here we do not include patients’ medications as part of the feature set, reducing the number of features used to represent a patient record from 495 to 462. Several medications prescribed to Stage 3 patients can be harmful to patients at advanced stages of the condition (Stages 4-5). As such, medications can be indicative of a diagnosed disease stage, rather than predictive of it. To ensure our model is truly predictive, we have removed medications from the feature set. This removal actually led to improved average specificity, and only slightly reduced sensitivity with respect to the advanced CKD stages.

Given our aim of stratifying CKD by severity, we have removed from our feature set seven additional features that directly reflect CKD. These features denote seven diagnosed conditions, namely: CKD stage 2, CKD stage 3, CKD stage 4, End Stage Renal Disease, Chronic Renal Failure, History of renal transplant (situation), and Renal Failure Syndrome. We thus use a total of 455 features for representing the patients’ records in our dataset. Specifically, each record $r_k$ ($1 \le k \le 93{,}218$) is represented as a 455-dimensional vector $V_k = \langle v_1^k, \ldots, v_{455}^k \rangle$, where each dimension corresponds to one of the 455 features.

As noted in the Background section, our dataset is highly imbalanced, as is often the case in a biomedical setting, where the outcome of interest (in our case, stages 4 and 5) is rare, and thus underrepresented. In our study, the ratio among the number of records associated with each of the stages 3, 4 and 5 is 23:2:1, respectively.

Recall that we progressively reduced the training set size by pruning the early years of patient history, one year at a time, yielding 8 training sets. Table 3 shows the number of records per class, for each of the 8 training sets. The table also shows the number of records per stage collected in 2015 and used as the test set.

Methods

In this section we describe the hierarchical meta-classifier we have developed, as well as the simple baseline classifiers used for comparison. We also briefly describe the performance measures employed throughout for evaluation.

Baseline Classifiers: As a baseline for comparison, we use four standard methods, namely, logistic regression, naïve Bayes, decision tree and random forests, employing the one-against-all strategy to assign a CKD stage to each office visit record. We use the Python scikit-learn implementation for training the four classifiers [20].

Hierarchical Meta-Classifiers (Our Proposed Method): To separate CKD stages 3–5 while addressing data imbalance, we propose a sampling-based ensemble approach, hierarchical meta-classification. As CKD stage 3 (eGFR between 30 and 60) is characterized by moderately reduced kidney function, while stages 4 and 5 (eGFR below 30) are characterized by severely reduced kidney function [1], records associated with patients at stages 4 and 5 are likely to be more similar to one another than to records of patients at stage 3. Thus, we utilize a hierarchical approach to first separate records associated with a combined class consisting of stages 4 and 5 cases from those associated with stage 3, and then further separate the combined class into the two individual subclasses.
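The two-level label structure can be sketched as follows. The 30 and 60 cutoffs follow the text; the cutoff of 15 between stages 4 and 5 is not stated in this section and is an assumption taken from the standard KDIGO staging. The function names are hypothetical.

```python
def ckd_stage(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to CKD stage 3, 4 or 5.

    The 15 cutoff between stages 4 and 5 is an assumption (KDIGO staging);
    the text only distinguishes stage 3 from the combined stages 4 and 5.
    """
    if egfr >= 60:
        raise ValueError("eGFR >= 60 is outside stages 3-5")
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5

def coarse_label(stage):
    """Collapse the fine stages into the two coarse classes."""
    return "3" if stage == 3 else "4-5"

for egfr in (52, 22, 9):
    stage = ckd_stage(egfr)
    print(egfr, stage, coarse_label(stage))
```

The coarse label is what the first level of the hierarchy predicts; the fine stage is recovered only for records that fall into the combined class.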

To separate the combined stages 4 and 5 records from stage 3 cases, while addressing data imbalance, we utilize a meta-classification scheme, which aims to assemble results obtained from multiple classifiers into a single classification outcome. We show in the Experiments and Results section that our proposed scheme significantly improves upon the baseline classifiers, simple non-hierarchical meta-classifier and previously reported methods to address imbalance, as all of the latter fail to identify a significant proportion of stage 4 and stage 5 cases.

Figure 1 summarizes our approach, where the top dashed-square corresponds to the coarse high-level classification, separating stage 3 from the combined stages 4 and 5 records, while the bottom dashed-square depicts the refinement step, separating the combined class into the refined stage 4 and stage 5 classes. The individual steps are further described below.

Coarse Classification: This step comprises two sub-tasks. First, a set of $M$ simple classifiers, $\{C_1, \ldots, C_M\}$, are trained and applied to each visit record, $r_k$, where the latter is represented as a 455-dimensional vector, as described above. We refer to the simple classifiers as base-classifiers. Each of the latter assigns a label $C_j^k$ (where $C_j^k$ is either 3 or 4-5) to the vector $V_k$. Second, the class labels assigned by these $M$ simple classifiers are used to re-represent the visit record $r_k$ as an $M$-dimensional vector $\langle C_1^k, \ldots, C_M^k \rangle$. This representation is then used to train a meta-classifier that assigns a class label Stage 3 or Stage 4-5 to each record [17]. The meta-classifier thus treats the judgment from each base-classifier for each class as a feature value and uses these features to make the final decision.

To train the base-classifiers for the coarse-classification step (distinguishing stage 3 from the combined two other stages), we produce balanced training sets by first partitioning the data stemming from the over-represented stage 3 class into smaller subsets. Specifically, as there are 7 times more records associated with stage 3 than with the combined set of stages 4 and 5, we partition the stage 3 set into 7 equal subsets. Each subset contains the same number of records as the combined stages 4 and 5 set. Next, we combine each of the stage 3 subsets with the stage 4-5 set, thus forming a total of 7 datasets, each having a uniform distribution across CKD stage 3 and the combined stages 4-5. Figure 2 illustrates the data partitioning scheme.
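The partitioning scheme above can be sketched in a few lines of Python. This is an illustration on toy identifiers rather than the code used in the study; the function name, the seed handling, and the toy data are assumptions.

```python
import random

def balanced_training_sets(majority, minority, n_subsets, seed=0):
    """Split the majority class into n_subsets disjoint parts and pair each
    part with the full minority set, giving n_subsets roughly balanced sets."""
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    # Stride slicing gives disjoint subsets whose sizes differ by at most one.
    parts = [shuffled[i::n_subsets] for i in range(n_subsets)]
    return [(part, list(minority)) for part in parts]

# Toy data (hypothetical): a 7:1 ratio of stage 3 to combined stage 4-5 records.
stage3 = [f"s3_{i}" for i in range(70)]
stage45 = [f"s45_{i}" for i in range(10)]

sets = balanced_training_sets(stage3, stage45, n_subsets=7)
print([(len(maj), len(mino)) for maj, mino in sets])
```

Every stage 3 record appears in exactly one subset and no synthetic record is created, so nothing is discarded and nothing is invented. Changing `seed` yields a different random split of the majority class.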

We train seven base-classifiers, one on each of the resulting seven balanced training sets. To choose the base classifier from among four commonly used simple classifiers, namely logistic regression, naïve Bayes, decision tree and random forest, we conducted four sets of experiments, in each of which we employed one of these methods as the base classifier. We trained each of the four classifier types on the 7 balanced sets, thus generating 7 base-classifiers per type. Using each set of 7 base-classifiers, we trained a meta-classifier in which the training set was re-represented as 7-dimensional vectors (M=7), where the value along the ith dimension consists of the label obtained from the ith base-classifier when applied to the original record representation. In all four sets of experiments the meta-classifier used is naïve Bayes, as it performed best in our experiments and has been shown to be effective by others as well [21]. The resulting classifier aims to separate stage 3 records from records that are stage 4 or 5 (see Figure 1A).

Refinement Classification Step: In this step, we separate the combined minority class records obtained from the coarse classification step (combined stages 4 and 5, in our study) into the individual classes (see Figure 1B). To do so, we experimented with multiple simple classifiers, including random forest and naïve Bayes. The random forest classifier is most effective in separating stage 4 from stage 5 patients, and as such this is the one we employ. To train the random forest classifier, we use the set of records from the training data that are associated with stages 4 and 5.

Testing: To test our classifier, first, each record in the test set is classified by each of the base-classifiers, and the labels obtained from each of these base-classifiers are used to form a feature vector, which becomes the input to the meta-classifier. Next, the meta-classifier is applied to each newly represented vector, thus separating stage 3 records from records of stages 4 or 5 (in the coarse classification step). Records classified into the combined stage-4 and –5 class are further categorized by the simple random forest classifier (in the refinement step), and assigned to either of the two individual classes, stage 4 or stage 5.
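The flow from base-classifier labels to the meta-classifier and then to the refinement step can be sketched as below. This is a simplified illustration, not the study's implementation: each record is reduced to a single hypothetical severity score, the base-classifiers and the refinement classifier are threshold rules standing in for trained random forests, and `LabelNaiveBayes` is a small hand-written stand-in for a library naïve Bayes.

```python
from collections import Counter, defaultdict

COARSE = ("3", "4-5")  # the two labels a base-classifier can emit

class LabelNaiveBayes:
    """Small naive Bayes over vectors of base-classifier labels."""

    def fit(self, vectors, labels):
        self.prior = Counter(labels)
        self.total = len(labels)
        self.counts = defaultdict(int)  # (class, position, label) -> count
        for vec, y in zip(vectors, labels):
            for j, v in enumerate(vec):
                self.counts[(y, j, v)] += 1
        return self

    def predict(self, vec):
        def score(c):
            p = self.prior[c] / self.total
            for j, v in enumerate(vec):
                # Laplace smoothing over the two possible label values.
                p *= (self.counts[(c, j, v)] + 1) / (self.prior[c] + len(COARSE))
            return p
        return max(COARSE, key=score)

def classify(record, base_classifiers, meta, refine):
    """Coarse step (base labels -> meta decision), then refinement."""
    meta_vector = tuple(clf(record) for clf in base_classifiers)
    if meta.predict(meta_vector) == "3":
        return 3
    return refine(record)  # only records in the combined class reach here

# Hypothetical stand-ins: threshold rules instead of trained classifiers.
base_classifiers = [lambda r, t=t: "4-5" if r > t else "3" for t in (4, 5, 6)]
refine = lambda r: 5 if r > 8 else 4

train_records = list(range(1, 11))
train_labels = ["4-5" if r > 5 else "3" for r in train_records]
train_vectors = [tuple(clf(r) for clf in base_classifiers) for r in train_records]
meta = LabelNaiveBayes().fit(train_vectors, train_labels)

print([classify(r, base_classifiers, meta, refine) for r in (2, 5, 7, 9.5)])
# [3, 3, 4, 5]
```

Note that the record scored 5 receives a "4-5" vote from one base-classifier but is still assigned stage 3 by the meta-classifier, which weighs all three votes together.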

Evaluation: To evaluate the performance of our methods, we use several standard performance measures, namely, specificity, sensitivity (recall), precision and F-measure. For each stage i, these measures are defined as follows:

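$$\text{Specificity} = \frac{TN_i}{TN_i + FP_i},$$

$$\text{Sensitivity} = \frac{TP_i}{TP_i + FN_i},$$

$$\text{Precision} = \frac{TP_i}{TP_i + FP_i},$$

$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}},$$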

where $TP_i$ (true positives) denotes records of stage i that are correctly assigned to stage i by the classifier; $TN_i$ (true negatives) denotes records that are not associated with stage i and are not assigned to stage i by the classifier; $FP_i$ (false positives) denotes records not associated with stage i that are misclassified as stage i; while $FN_i$ (false negatives) denotes stage i records that were incorrectly assigned to other stages by the classifier.
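As a worked illustration of these per-stage definitions, the sketch below computes the four measures for one stage from lists of true and predicted labels. The labels are toy values, and the function names and the choice to return 0.0 for an empty denominator are assumptions.

```python
def ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def stage_metrics(y_true, y_pred, stage):
    """Specificity, sensitivity, precision and F-measure for one stage."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == stage and p == stage for t, p in pairs)
    tn = sum(t != stage and p != stage for t, p in pairs)
    fp = sum(t != stage and p == stage for t, p in pairs)
    fn = sum(t == stage and p != stage for t, p in pairs)
    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "specificity": ratio(tn, tn + fp),
        "sensitivity": sensitivity,
        "precision": precision,
        "f_measure": ratio(2 * precision * sensitivity, precision + sensitivity),
    }

# Toy labels (hypothetical): eight records across stages 3, 4 and 5.
y_true = [3, 3, 3, 3, 4, 4, 5, 5]
y_pred = [3, 3, 3, 4, 4, 4, 5, 4]
m = stage_metrics(y_true, y_pred, stage=4)
print(m)
```

For stage 4 in this toy example there are 2 true positives, 2 false positives, 0 false negatives and 4 true negatives, giving a sensitivity of 1.0, a precision of 0.5, a specificity of 4/6 and an F-measure of 2/3.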

Results

As a baseline, we first trained and tested simple naïve Bayes, logistic regression, decision tree, and random forest classifiers using the set of 455 features to represent each record.

Next, to handle data imbalance, we first experimented with previously reported methods, namely, random under-sampling and over-sampling using SMOTE. We also applied simple meta-classification to the records to address imbalance (a detailed description of the simple meta-classifier is given in the earlier, conference version of this work [19]). Since these methods failed to identify a large number of stages 4 and 5 cases (minority class cases), we applied our proposed hierarchical meta-classification approach to separate the records associated with different CKD severity levels.

For each of the methods mentioned above, we used records from the first 8 years (2007–2014) for training, and the ninth year (2015) for testing. We did not use cross-validation for training and testing, since we limited the training set to temporally early records (from the first 8 years, 2007–2014) while restricting the test set to later records (from the ninth year, 2015). Notably, to ensure stability of the results, we employed multiple random splits to partition the training set stemming from the over-represented class into smaller subsets for training the base-classifiers in the coarse classification stage (see Figure 2B) of the hierarchical meta-classifier, while keeping the test set fixed to all records from 2015.

After each classification step, we evaluated the performance using standard measures, namely, sensitivity, specificity, precision and F-measure (see Evaluation sub-section in Data and Methods). We compared the performance attained from the four classifiers (naïve Bayes, logistic regression, decision tree, and random forest) as baseline and as a component of the simple and the hierarchical meta-classifiers, to assess their respective efficacy in separating CKD stages.

Using random forest, either alone as a baseline classifier or as a base-classifier within a meta-classifier, outperforms using any of the other classifiers (logistic regression, naïve Bayes or decision tree). We thus report here only results obtained using random forest, as a standalone baseline classifier and as a component within meta-classification. Similarly, to compare our methods to earlier approaches for addressing imbalance, we use the random forest classifier in combination with two such earlier approaches, namely, random under-sampling and over-sampling using SMOTE.

Table 4 shows the average specificity, sensitivity and F-measure attained by our hierarchical meta-classification scheme, compared to those attained by the baseline classifier, the simple, non-hierarchical meta-classifier, the random under-sampling scheme and the over-sampling using SMOTE. Figure 3 shows the sensitivity and F-measure per-class, attained by the baseline classifier, by the over-sampling with SMOTE scheme and by our method. Notably, the figure shows the improved performance of our method for identifying CKD stages 4 and 5.

Additionally, we examined the performance of our method when the number of years (and of visit records) included in the patient history and used to train the classifier, is reduced. To do so, we repeated the experiments using our proposed method while progressively truncating the early years of patient history included in the training data, one year at a time, yielding 8 sets of training data. As in previous experiments, we kept the test set fixed to records collected in 2015. The first set included records gathered between the years of 2007 and 2014, yielding a training set containing 83,642 records, while the eighth set included data collected in 2014 alone, yielding a training set of 19,664 records. We trained the hierarchical meta-classifier over each of the training sets represented by the complete set of 455 features. This set of experiments assessed the generalizability of our approach, i.e. its ability to assign the correct severity level based only on the most recent history of the patient.
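The progressive truncation described above can be sketched as follows. The record layout (a dictionary with a `year` field) and the function names are assumptions made for illustration; only the year ranges come from the text.

```python
def training_windows(first_year, last_year):
    """Year ranges obtained by dropping the earliest year, one at a time."""
    return [list(range(start, last_year + 1))
            for start in range(first_year, last_year + 1)]

def split_by_year(records, train_years, test_year):
    """Temporal split: train on the given years, test on a later year."""
    train = [r for r in records if r["year"] in train_years]
    test = [r for r in records if r["year"] == test_year]
    return train, test

windows = training_windows(2007, 2014)
print(len(windows))                # 8
print(windows[0][0], windows[-1])  # 2007 [2014]

# Toy records (hypothetical): one visit record per year, 2007 through 2015.
records = [{"id": i, "year": y} for i, y in enumerate(range(2007, 2016))]
train, test = split_by_year(records, windows[-1], test_year=2015)
print(len(train), len(test))       # 1 1
```

Because the test year is always later than every training year, each of the eight windows preserves the temporal ordering between training and test records.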

Figure 4 shows the True Positive Rates (TPR) and False Negative Rates (FNR) per-class for the 8 training sets (see Table 3) that were obtained by progressively pruning the early years of patient history in the training data, one year at a time. We used each of the eight sets to train the random forest hierarchical meta-classifier. The average accuracy, specificity, sensitivity, precision and F-measure are all about 0.93 (std ) for all eight sets.

Recall that to ensure the stability of our model, we repeated each experiment 20 times, using different splits to partition the set of records associated with the over-represented class. We obtained similar results in all runs (std ).

Discussion

The hierarchical meta-classifier we have introduced attains significantly higher sensitivity with respect to stages 4 and 5 compared to that obtained by other methods (Figure 3), which fail to identify a large proportion of the CKD stage 4 records and many stage 5 records.

The proposed method thus demonstrates improved identification of records associated with severe stages, within the realistic context of highly imbalanced data. Our method (denoted RF-Hier-MC) also outperforms all other classifiers according to F-measure and specificity (Table 4). We note that the performance of the baseline random forest classifier (denoted RF-Baseline) and of over-sampling with SMOTE using a random forest classifier (denoted RF-SMOTE) is similar to that of our method. However, as shown in Figure 3, the performance of the three models varies significantly across CKD stages. The hierarchical meta-classifier shows a higher sensitivity and F-measure for both stages 4 and 5 than the baseline and the SMOTE-based classifiers, indicating that our method identifies advanced CKD stage records (stages 4 and 5) more effectively than other methods. Notably, unlike under-sampling based approaches, our method does not ignore any record associated with the majority class, nor does it create any synthetic sample, as is commonly done in approaches that use over-sampling.

We note that in the context of risk-stratification, and particularly when assigning an advanced stage label to a record, false negatives (i.e. missing a severe case) have much more severe implications than false positives (assigning a stage 4 or 5 label to a stage 3 case). That said, having a very large portion of false positives is clearly undesirable as it generates false alarms. We further examine these points by calculating the precision (also referred to as positive predictive value, PPV) and sensitivity for the set of records associated with the combined stage 4 and 5 class. Precision penalizes false positives, while sensitivity penalizes false negatives. The precision with respect to the combined set is 0.76. That is, of the 1,085 test records classified as stages 4 or 5 by our classifier, 829 are correctly identified. It is important to note that of the remaining 256 false-positive records, 181 (~70%) are borderline cases, as indicated by eGFR values in the range of 30-44, which is associated with patients suffering from advanced stage 3 CKD (stage 3b) [22]. Recent studies demonstrate that stage 3b is the inflection point for adverse outcomes, including progression to end stage renal disease (stage 5) [23]. Thus, our classifier effectively identifies not only the advanced stage records already marked, but also the likely-to-be severe cases that are not yet labeled as such. As for sensitivity, Figure 3 clearly shows that the hierarchical meta-classifier has a higher sensitivity for stage 4 and 5 records than the other methods, while retaining about the same sensitivity with respect to stage 3.
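The arithmetic behind the precision and borderline figures quoted above can be checked directly from the reported counts. The variable names are illustrative; the counts are those given in the text.

```python
predicted_advanced = 1085  # test records classified as stage 4 or 5
true_positives = 829       # of those, records that are truly stage 4 or 5
borderline_fp = 181        # false positives with eGFR in the 30-44 range

false_positives = predicted_advanced - true_positives
precision = true_positives / predicted_advanced
borderline_share = borderline_fp / false_positives

print(false_positives)             # 256
print(round(precision, 3))         # 0.764
print(round(borderline_share, 3))  # 0.707
```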

The high performance levels obtained from the classifier we developed here, when trained on the datasets obtained by progressively truncating the early years of patient history, one year at a time, indicate that the model effectively identifies CKD stages even when it is trained on limited, recent patient history. As shown in Figure 4, the true-positive rate remains almost constant, regardless of the number of years included in the visit record, except for a slight decline in predicting stages 4 and 5 when using data from 2013/14 or 2014 alone. The false-negative rate is also not impacted by the reduction in a patient’s history. Our results thus indicate that the performance of our classifier is not significantly affected by training over only a limited patient history.

The consistent good performance of our method for datasets of different sizes containing patients’ visit records gathered over different year ranges, indicates stability and generalizability of our model, and highlights its applicability in clinical settings, where old records are not always available to train the model.

Limitations: While our proposed sampling-based ensemble method has shown good performance even in the face of data imbalance, the dataset used is limited in size and comprises only records associated with kidney disease patients. Future work includes testing and extending the generalizability of our model using additional datasets and in the context of other diseases.

Conclusion

In this study, we have shown that CKD severity levels can be effectively stratified using a supervised machine learning method that is based on simple quantitative non-text attributes collected during standard office visits, in the realistic context of highly imbalanced case population. We proposed and developed a sampling based ensemble classification approach, hierarchical meta-classification, to identify CKD stages from a highly imbalanced dataset, achieving high sensitivity, specificity, precision and F-measure, all at or above 0.93. Our method significantly outperforms baseline classifiers, simple meta-classifier and previously reported approaches for addressing imbalance, in identifying advanced CKD stages (stage 4 and stage 5). Moreover, the method maintains its high level of performance when the number of records is significantly truncated, demonstrating its stability and generalizability. As a future direction, we plan to evaluate the efficacy of our method in attaining severity stratification in other health conditions using standard office visit records. We also plan to conduct prospective testing of the model in CKD patients in future studies.

Abbreviations

CKD: Chronic Kidney Disease; EHR: Electronic Health Record; eGFR: estimated glomerular filtration rate; KEEP: Kidney Early Evaluation Program; USRDS: United States Renal Data System; OAA: one-against-all; OAO: one-against-one; std: Standard Deviation; RF: Random Forest; RF-Hier-MC: Random Forest Hierarchical Meta-Classification; SMOTE: Synthetic Minority Over-Sampling Technique; TPR: True Positive Rate; FNR: False Negative Rate; PPV: positive predictive value

References

National Kidney Foundation (2017). https://www.kidney.org/news/one-seven-american-adults-estimated-to-have-chronic-kidney-disease. Last accessed: 09/14/2018

Agrawal V, Jaar BG, Frisby XY, Chen SC, et al. Access to health care among adults evaluated for CKD: findings from the Kidney Early Evaluation Program (KEEP). Am J Kidney Dis. 2012; 59(3): S5-S15.

Bhattacharya M, Jurkovitz C, Shatkay H. Identifying patterns of associated-conditions through topic models of Electronic Medical Records. In Proc. of the IEEE Int. Conf. on BIBM. 2016; pp. 466-469.

Levin A, Stevens PE, Bilous RW, Coresh J, et al. Kidney Disease: Improving Global Outcomes (KDIGO) CKD Work Group. KDIGO 2012 Clinical Practice Guideline for the Evaluation and Management of Chronic Kidney Disease. Kidney International Supplements. 2013; 3: 1-150.

Saran R, Robinson B, Abbott KC, Agodoa LYC, et al. US Renal Data System 2016 annual data report: Epidemiology of kidney disease in the United States. Am J Kidney Dis. 2017; 69(3): Svii-Sviii.

Mani S, Chen Y, Elasy T, Clayton W, Denny J. Type 2 diabetes risk forecasting from EMR data using machine learning. In Proc. of the AMIA Annual Symposium. 2012; 606-615.

Ogunyemi O, Kermah D. Machine Learning Approaches for Detecting Diabetic Retinopathy from Clinical and Public Health Records. In Proc. of the AMIA Annual Symposium. 2015; 983-990.

Teixeira PL, Wei WQ, Cronin RM, Mo H, et al. Evaluating electronic health record data sources and algorithmic approaches to identify hypertensive individuals. J Am Med Inform Assoc. 2017; 24(1): 162-171.

Huang SH, LePendu P, Iyer SV, Tai-Seale M, Carrell D, Shah NH. Toward personalizing treatment for depression: predicting diagnosis and severity. J Am Med Inform Assoc. 2014; 21(6): 1069-1075.

Klimov D, Shknevsky A, Shahar Y. Exploration of patterns predicting renal damage in diabetes type II patients using a visual temporal analysis laboratory. J Am Med Inform Assoc. 2015; 22(2): 275-289.

Kubat M, Holte R, Matwin S. Machine Learning for the Detection of Oil Spills in Satellite Radar Images. Machine Learning. 1998; 30: 195–215.

Chawla NV. Data mining for imbalanced datasets: An overview. In Data mining and knowledge discovery handbook, Springer US. 2005; 853-867.

Rifkin R, Klautau A. In defense of one-vs-all classification. Journal of machine learning research. 2004; 5(Jan):101-141.

Tan AC, Gilbert D, Deville Y. Multi-class protein fold classification using a new ensemble machine learning approach. Genome Inform. 2003; 14: 206-217.

Zhao XM, Li X, Chen L, Aihara K. Protein classification with imbalanced data. Proteins: Structure, function, and bioinformatics. 2008; 70(4): 1125-1132.

Sun Y, Kamel MS, Wang Y. Boosting for learning multiple classes with imbalanced class distribution. In Proc. of the IEEE Int. Conf. on Data Mining (ICDM). 2006; 592-602.

Zmiri D, Shahar Y, Taieb-Maimon M. Classification of patients by severity grades during triage in the emergency department using data-mining methods. The Journal of Evaluation in Clinical Practice. 2012; 18(2): 378-388.

Lin WH, Hauptmann A. Meta-classification: Combining multimodal classifiers. In Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining. 2002; 217-231.

Murphy KP. Machine learning: a probabilistic perspective. MIT press; 2012.

Bhattacharya M, Jurkovitz C and Shatkay H. Assessing Chronic Kidney Disease from Office Visit Records Using Hierarchical Meta-Classification of an Imbalanced Dataset. In Proc. of the IEEE Int. Conference on Bioinformatics and Biomedicine (BIBM), 2017; 663-670.

Pedregosa F, Varoquaux G, Gramfort A, Michel V, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research. 2011; 12: 2825-2830.

Briesemeister S, Rahnenführer J, Kohlbacher O. Going from where to why—interpretable prediction of protein subcellular localization. Bioinformatics. 2010; 26(9): 1232-1238.

Sud M, Tangri N, Levin A, Pintilie M, Levey AS, Naimark DM. CKD stage at nephrology referral and factors influencing the risks of ESRD and death. American Journal of Kidney Diseases. 2014; 63(6): 928-936.

Go AS, Chertow GM, Fan D, McCulloch CE, Hsu CY. Chronic kidney disease and the risks of death, cardiovascular events, and hospitalization. New England Journal of Medicine. 2004; 351(13): 1296-1305.

Figure Legends

Figure 1. Hierarchical meta-classification for multiclass classification. A. Coarse classification: The dashed rectangle represents the meta-classification scheme used for separating stage 3 records from those associated with either stage 4 or stage 5. The white oval represents the set of records assigned to the combined class by the meta-classifier, while the shaded ovals represent the final classes corresponding to the individual three stages. B. Refinement step: The combined set of stage 4 and 5 records is separated into individual respective constituent subsets. The rhombus represents a simple random forest classifier.
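The two-step routing in Figure 1 can be illustrated with a short sketch. The two predictors below are placeholder threshold rules standing in for the meta-classifier (step A) and the random forest (step B); they, the label `COMBINED`, and the function name `hierarchical_predict` are assumptions made for illustration only and do not reproduce the trained models.

```python
import numpy as np

COMBINED = -1  # hypothetical label for the merged stage 4/5 class


def hierarchical_predict(X, coarse_predict, refine_predict):
    """Step A: stage 3 vs. combined class; step B: split combined into 4 and 5."""
    X = np.asarray(X)
    labels = np.asarray(coarse_predict(X)).copy()
    combined = labels == COMBINED
    if combined.any():
        # Only records routed to the combined class reach the refinement step.
        labels[combined] = refine_predict(X[combined])
    return labels


# Placeholder predictors (NOT the study's classifiers): simple threshold rules.
def toy_coarse(X):
    return np.where(X[:, 0] > 0.5, COMBINED, 3)


def toy_refine(X):
    return np.where(X[:, 1] > 0.5, 5, 4)


X_demo = np.array([[0.1, 0.9], [0.9, 0.2], [0.8, 0.7]])
stages = hierarchical_predict(X_demo, toy_coarse, toy_refine)
print(stages)
```

Any pair of fitted models exposing a prediction function could be passed in place of the toy rules.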

Figure 2. Data partitioning scheme for hierarchical meta-classification. The set of stage 3 records is partitioned into 7 subsets (white squares in the figure), where each subset contains the same number of records as that included in the combined stages 4 and 5 set (grey squares). We merge each of the stage 3 subsets with the combined stages 4 and 5 set, thus forming a total of 7 balanced datasets, each having a uniform distribution across stage 3 and stages 4-5 records.
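The partitioning in Figure 2 can be sketched as below, using the 2007-2014 record counts from Table 3. This is a minimal sketch under stated assumptions: the stage 3 records are shuffled at random before being split, and any stage 3 records left over after forming the equal-sized subsets are set aside. Neither detail is specified in the legend, and the function name `balanced_partitions` is hypothetical.

```python
import numpy as np


def balanced_partitions(majority_idx, minority_idx, seed=0):
    """Split the majority class into subsets the size of the minority set and
    merge each subset with the full minority set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(majority_idx)
    n_min = len(minority_idx)
    k = len(shuffled) // n_min  # number of full-size majority subsets
    datasets = []
    for i in range(k):
        chunk = shuffled[i * n_min:(i + 1) * n_min]
        datasets.append(np.concatenate([chunk, minority_idx]))
    return datasets


# Record indices only; counts taken from the 2007-2014 column of Table 3.
stage3 = np.arange(73_425)
stage45 = np.arange(73_425, 73_425 + 6_976 + 3_241)
parts = balanced_partitions(stage3, stage45)
print(len(parts), len(parts[0]))  # 7 20434
```

With these counts the sketch yields 7 datasets, matching the number of subsets in the legend, each containing equal numbers of stage 3 and combined stage 4-5 records.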

Figure 3. Performance per-class in terms of Sensitivity (left plot) and F-measure (right plot), of the random forest baseline classifier (RF-Baseline), as well as of over-sampling using SMOTE (RF-SMOTE) and of our hierarchical meta-classifier (RF-Hier-MC). The X-axes in both plots show CKD stages; the Y-axes show sensitivity (left) and F-measure (right) per CKD stage.

Figure 4. True Positive Rates, TPR (solid plots) and False Negative Rates, FNR (dashed plots) with respect to stages 3, 4 and 5, associated with eight hierarchical meta-classifiers, each trained on datasets obtained by gradually truncating the early years of patient history included in the training set, one year at a time. The X-axis shows the years covered by each training set. The Y-axis shows the true positive rate (top) or false negative rate (bottom).

Tables

Table 1: Key characteristics of the office visit datasets. The leftmost column lists the characteristics; the rightmost column shows the corresponding values in the office visit set.

	Office visit Set
Number of Patients	13,111
Age Range (25th-75th Percentile)	60-80
Mean Age (std)	70 (12)
% Female	60%
% Male	40%
Avg. Number of Visits per Patient	17

Table 2. Three categories of features comprising our dataset. The leftmost column shows the categories, while the middle column shows the number of features per category. The rightmost column shows examples of features associated with each category.

Category	Number of Features	Examples
Demographics		Gender; Age; Ethnicity; Race
Vital Signs		Heart Rate; Systolic and Diastolic Blood Pressure; Body Mass Index
Diagnosed Conditions	447	Benign essential hypertension; Type 2 diabetes mellitus; Obesity

Table 3. Number of records in each of the classes within the 8 training sets obtained by progressively truncating the early years of patient history included in the training data. The respective year range is shown in the second row. The leftmost column shows CKD stages. The rightmost column shows the number of records per stage, collected in 2015 and used as the test set. Each of the other columns shows the number of records per stage for the corresponding training set, collected during the period indicated in the second row.

	TRAINING SET DISTRIBUTION								TEST SET
CKD Stages	2007-2014	2008-2014	2009-2014	2010-2014	2011-2014	2012-2014	2013-2014	2014	2015
Stage 3	73,425	72,808	70,127	65,326	57,863	46,881	33,072	17,273	8,419
Stage 4	6,976	6,903	6,579	6,060	5,385	4,439	3,101	1,624	782
Stage 5	3,241	3,184	3,052	2,821	2,515	2,068	1,471	767	375
*Total*	83,642	82,895	79,758	74,207	65,763	53,388	37,644	19,664	9,576

Table 4. Average specificity, sensitivity and F-measure of the random forest (RF) hierarchical meta-classifier (Hier-MC), meta-classifier (MC), random under-sampling (Under-Sampling), over-sampling using SMOTE (SMOTE) and baseline classifier (Baseline) for assigning CKD severity stages to patients based on office visit records. Classifiers were trained on office visit records from 2007-14 using the complete set of 455 features to represent patients, while records from 2015 were used as the test set. The highest value for each measure is shown in boldface. Std. deviation is shown in parentheses.

*Methods*	*Sensitivity*	*Specificity*	*F-measure*
RF-Hier-MC (Our Method)	0.93 (0.02)	0.97 (0.02)	0.93 (0.02)
RF-MC	0.90 (0.04)	0.85 (0.04)	0.78 (0.04)
RF-Under-Sampling	0.83 (0.08)	0.91 (0.07)	0.83 (0.08)
RF-SMOTE	0.92 (0.06)	0.95 (0.06)	0.92 (0.06)
RF-Baseline	0.92 (0.02)	0.94 (0.02)	0.92 (0.02)
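For reference, the three measures reported in Table 4 can be computed per class from a multiclass confusion matrix, treating each stage in turn as the positive class. The sketch below assumes rows are true stages and columns are predicted stages, and that the reported averages are simple means over the three stages, which is an assumption on our part. The confusion matrix is a made-up toy example, not data from the study.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class sensitivity, specificity and F-measure (one class vs. the rest).

    Assumes every class occurs and is predicted at least once, so that no
    denominator is zero.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # true class i, predicted as another class
    fp = cm.sum(axis=0) - tp  # predicted class i, truly another class
    tn = cm.sum() - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f_measure


# Toy confusion matrix (hypothetical counts): rows = true stage 3, 4, 5.
toy_cm = [[90, 8, 2],
          [5, 40, 5],
          [1, 4, 45]]
sens, spec, f1 = per_class_metrics(toy_cm)
print(sens.mean(), spec.mean(), f1.mean())
```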

[1] Stage 1 is defined by kidney damage (protein or blood in the urine) while eGFR is normal (eGFR ≥90 ml/min/1.73 m²); stage 2 by kidney damage and mildly decreased eGFR (eGFR 60-90); stage 3 as eGFR 30-60; stage 4 as eGFR 15-30 and stage 5 as eGFR <15.
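The staging thresholds in this footnote can be written as a small lookup. The sketch assumes that stage 5 corresponds to eGFR below 15 and that each range includes its lower bound and excludes its upper bound (for example, an eGFR of exactly 30 is assigned to stage 3); the footnote does not state how boundary values are handled. The function name `ckd_stage` is hypothetical.

```python
def ckd_stage(egfr, kidney_damage=False):
    """Map an eGFR value (ml/min/1.73 m^2) to a CKD stage, or None if no CKD.

    Stages 1 and 2 additionally require evidence of kidney damage.
    Boundary handling (half-open ranges) is an assumption of this sketch.
    """
    if egfr < 15:
        return 5
    if egfr < 30:
        return 4
    if egfr < 60:
        return 3
    if not kidney_damage:
        return None  # eGFR of 60 or above without kidney damage: not staged
    return 2 if egfr < 90 else 1


print([ckd_stage(v) for v in (45, 20, 10)])  # [3, 4, 5]
```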