New Development of Educational Data Mining: A Literature of Review in Recent Works

Info: 9612 words (38 pages) Dissertation
Published: 11th Dec 2019



Abstract Educational data mining (EDM) is the extraction of meaningful information from the heterogeneous data sources of educational systems to assist learners and educators during the learning-teaching process. In this paper, a sample of 149 EDM works published from 2014 to the first quarter of 2016 is analyzed. The statistical analysis identifies seven applications across 15 types of educational systems, together with an integration of techniques from multiple disciplines organized into two models (predictive and descriptive), nine tasks, 23 methods, 16 techniques, and 20 algorithms. The inter-relationships between the values of the education and DM attributes under those two models are then identified. The paper concludes by identifying the most frequently cited authors in recent EDM works, analyzing the strengths, weaknesses, opportunities, and threats of EDM, and pointing out some future lines of research.


Keywords Data Mining, Educational Data Mining, Educational Data Mining Applications, Educational Data Mining Profile, Educational Systems

1         Introduction

Data mining (DM) is a process that draws on multiple disciplines, such as statistics, mathematics, artificial intelligence, and machine learning, to extract useful information from large databases. It can be used in formative evaluation to help educators establish a pedagogical basis for decisions when designing or modifying an environment or teaching approach (Han & Kamber, 2006; C. Romero & Ventura, 2007).

Educational data mining (EDM) is the process of converting raw data from educational systems to interesting information for developers, students, teachers, parents, and other educational researchers (Cristóbal Romero & Ventura, 2010). It is an emerging discipline, concerned with developing methods for exploring the unique types of data that come from educational settings, and using these methods to better understand students and the settings in which they learn (R. S. J. D. Baker & Yacef, 2009; Kulkarni, Rampure, & Yadav, 2013). The enthusiastic adoption of DM tools by higher education has the potential to improve some aspects of the quality of education, while it lays the foundation for a more effective understanding of the learning process (R. S. J. D. Baker & Yacef, 2009).

The application of DM in educational systems is an iterative cycle of hypothesis formation, testing, and refinement (C. Romero & Ventura, 2007). Fig. 1 summarizes the various actors in the EDM domain and how each of them can benefit. For example, learners can receive advice and customized recommendations about resources and tasks to improve their current knowledge and meet their learning objectives, while educators can assess the effectiveness of their learning materials, monitor student performance, and so on (Calders & Pechenizkiy, 2012).

Fig. 1 The cycle of applying DM in educational systems in a nutshell (Calders & Pechenizkiy, 2012)

The paper focuses on new developments in EDM works published in recent years and analyzes the role of DM in educational systems. It provides an updated overview of the current state of the art in EDM with the objective of introducing it to the various users of educational systems. A survey of 149 EDM works published from 2014 to the first quarter of 2016 is reviewed. The approach followed to produce the survey is outlined in the methodology Section. The results Section summarizes 39 of the most frequently cited EDM works, organized according to seven applications. In the discussion Section, the sampled works are analyzed to depict the recent state and evolution of EDM: the top-10 cited authors in recent EDM works are identified, and the education and DM attributes of the sampled works are compared and contrasted under two DM-models, predictive and descriptive. Finally, the conclusions Section presents a snapshot and brief analysis of EDM that can serve as a reference for future work.

2         Methodology

The framework of this work is devoted to gathering and analyzing EDM works, as Fig. 2 shows. It is organized into six main stages: “selection of EDM works”; “analysis of EDM profiles”; “analysis of citations”, to find the most frequently cited EDM works; “statistical processes”; “EDM applications”, which are organized into seven applications; and “generate the inter-relationships between values of the education and DM attributes”, which is illustrated in the discussion Section.

Fig. 2 The framework of this work that is devoted to gathering and analyzing EDM related works

Many issues in educational settings can be addressed using DM. For this survey, a sample of 149 EDM works published from 2014 to the first quarter of 2016 was selected, comprising 66 journal papers, 72 international conference papers, and 11 chapters of a book[1]. The sample covers seven applications of EDM and was published in 41 journals, 32 international conferences, and one book. The papers reviewed include both qualitative and quantitative studies from researchers in the EDM domain.

Fig. 3 shows the approximate number of full papers published in journals and international conferences from 2014 to the first quarter of 2016. These figures indicate tangible growth in the EDM research area in recent years.

Fig. 3 Number of full papers published from 2014 to the first quarter of 2016, grouped by year. Note that only the 138 EDM works from journals and conferences are counted; this is not the total number of papers actually published on EDM in recent years

The methodology of this research is based on two main questions: how to apply data mining attributes in educational systems, and how to classify educational systems based on their types of data and objectives. To address these questions, the following subsections are provided:

2.1    Applying data mining in educational systems in a nutshell

According to Han & Kamber (2006), data mining is the extraction of interesting patterns and knowledge from huge amounts of data. In this paper, data mining is treated as an integration of techniques from multiple disciplines, organized into two models (predictive and descriptive) that are used to sort DM works and to present the tasks, methods, techniques, algorithms, and frames deployed in the 149 EDM works.

2.1.1           Data mining tools

General-purpose data mining applications are the most used tools for data preparation. There is a large variety of software solutions in this field, but in this survey three tools are counted and classified as follows: (1) Weka[2]: 27 papers, (2) RapidMiner[3]: 9 papers, and (3) Matlab: 6 papers. The first two are freely available, whereas Matlab is commercial software.

2.1.2           Data mining models and tasks

Model building falls into two very broad categories: descriptive and predictive. In a traditional machine learning context, these equate to unsupervised and supervised learning, respectively (Williams, 2011). Fig. 4 shows the two-year trend of DM-models from 2014 to 2015: predictive and descriptive models account for 65% and 35% of EDM works, respectively. Table 1 shows that the most frequent tasks are classification and clustering, which together account for 64% of the DM-tasks used by EDM works. The two DM-models and their respective tasks are described next:

  • Predictive Models: Often the task in DM is to build a model that can be used to predict the occurrence and likelihood of an event (Williams, 2011). These models comprise several tasks, the most frequent of which are classification and regression; together they support nearly 60% of the works.
  • Descriptive Models: Descriptive models can be used to develop further models that simulate large numbers of individualized agents and make predictions. The most used tasks under these models are clustering and association rule mining; together they support nearly 30% of the studies.
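The shares quoted for the two model families can be rechecked from the task counts in Table 1. The following is a minimal sketch of that arithmetic; the counts are copied from Table 1, while the function and variable names are illustrative and not part of the survey.

```python
# Counts of DM-tasks copied from Table 1 (176 task records in total).
TASK_COUNTS = {
    "classification": 73,
    "clustering": 39,
    "regression": 28,
    "association rules": 13,
    "prediction": 7,
    "correlation": 6,
    "others": 10,
}


def share(tasks, counts=TASK_COUNTS):
    """Return the percentage of all task records covered by the given tasks."""
    total = sum(counts.values())
    return 100.0 * sum(counts[task] for task in tasks) / total


# Most frequent predictive tasks, descriptive tasks, and the two top tasks.
predictive_share = share(["classification", "regression"])
descriptive_share = share(["clustering", "association rules"])
top_two_share = share(["classification", "clustering"])

print(f"classification + regression:   {predictive_share:.1f}%")
print(f"clustering + association rules: {descriptive_share:.1f}%")
print(f"classification + clustering:   {top_two_share:.1f}%")
```

Computed from the raw counts, the three shares come out at roughly 57.4%, 29.5%, and 63.6%. The 58%, 29%, and 64% obtained by adding the rounded percentages of Table 1 differ only by rounding.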

2.1.3           Data mining methods and techniques

Once the DM-models and DM-tasks are defined, the methods and techniques are chosen to complete the EDM profile. The most popular methods in the survey are decision trees (DT) (Al-Radaideh, Al-Shawakfa, & Al-Najjar, 2006), Bayes theorem (Ryan S J Baker, Corbett, & Aleven, 2008), and instance-based learning (IBL), which together support nearly 63% of the works, as Table 2 shows. The most used techniques are logistic and multiple linear regression (Abdous, He, & Yen, 2012) and correlation analysis, which together support 54% of the papers, as Table 3 illustrates.
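To make the decision-tree method more concrete, the sketch below computes the entropy-based information gain that ID3-style trees use to choose a split attribute (C4.5 normalizes this quantity into a gain ratio). The student records, attribute names, and labels are hypothetical toy data invented for illustration; they are not drawn from any of the surveyed works.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` obtained by splitting `rows` on `attribute`."""
    base = entropy([row[target] for row in rows])
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


# Hypothetical student records (assumed for illustration only).
STUDENTS = [
    {"attendance": "high", "forum": "active", "result": "pass"},
    {"attendance": "high", "forum": "passive", "result": "pass"},
    {"attendance": "high", "forum": "active", "result": "pass"},
    {"attendance": "low", "forum": "active", "result": "fail"},
    {"attendance": "low", "forum": "passive", "result": "fail"},
    {"attendance": "low", "forum": "passive", "result": "pass"},
]

gains = {a: information_gain(STUDENTS, a, "result") for a in ("attendance", "forum")}
best_attribute = max(gains, key=gains.get)
print(gains, best_attribute)
```

On this toy data the tree would split on attendance first, because that attribute separates passing from failing students better than forum activity does.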

2.1.4           Data mining algorithms, equations, and frames

After methods and techniques have been selected to solve a specific DM-task, the DM algorithms, equations, and/or frames determine how the cases for a DM-model are analyzed in each discipline (Srivastava, Cooley, Deshpande, & Tan, 2000). According to Table 4, K-means (Oyelade, Oladipupo, & Obagbuwa, 2010), J48/C4.5 (Patil & Sherekar, 2013), and Naïve Bayes (NB) (Sabitha, Mehrotra, Bansal, & Sharma, 2015) are the three most deployed algorithms, together supporting nearly 46% of the works, while several versions of Bayesian networks and the binary classifier are the most popular frame and equation, respectively.
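Since K-means is the most deployed algorithm in the survey, a minimal sketch of it may help readers unfamiliar with clustering. The code below implements the standard assign-and-update iteration (Lloyd's algorithm) on 2-D points. The data, the feature meanings (weekly logins, average quiz score), and the fixed starting centroids are assumptions chosen for illustration, not taken from any surveyed work.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: (point[0] - centroids[i][0]) ** 2
        + (point[1] - centroids[i][1]) ** 2,
    )


def kmeans(points, centroids, max_iterations=100):
    """Run Lloyd's algorithm from the given starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                updated.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, [nearest(p, centroids) for p in points]


# Hypothetical students: (logins per week, average quiz score).
POINTS = [(2, 40), (3, 45), (4, 50), (12, 80), (14, 85), (13, 90)]
centroids, labels = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(centroids, labels)
```

The sketch leaves the features unscaled for brevity. In practice, features measured on different scales are usually standardized first, and several random initializations are tried, because the result depends on the starting centroids.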

Fig. 4 Number of DM-models in 136 EDM works published from 2014 to 2015; the total number of model records is 156

Table 1 Number of DM-tasks of 149 EDM works, which are organized based on nine tasks

Tasks Items Number Percentage (%)
Classification 1 73 42
Clustering 1 39 22
Regression 1 28 16
Association rules 1 13 7
Prediction 1 7 4
Correlation 1 6 3
Others 3 10 6
Total 9 176 100

Table 2 Number of DM-methods of 149 EDM works, which are organized based on 23 methods

Methods Items Number Percentage (%)
DT 1 36 25.4
IBL 1 27 19
Bayesian theorem 1 26 18.3
SVM 1 9 6.4
SNA 1 8 5.6
Rule-based 1 6 4.2
NN 1 6 4.2
Latent analysis 4 5 3.5
HMM 1 5 3.5
Others 11 14 9.9
Total 23 142 100

Table 3 Number of DM-techniques of 149 EDM works, which are organized based on 16 techniques

Techniques Items Number Percentage (%)
Logistic regression 1 12 21
Correlation Analysis 1 11 19
Multiple regression 1 8 14
Linear regression 1 5 8.5
SOM 1 5 8.5
IF-THEN rules 1 4 7
Others 10 13 22
Total 16 58 100

Table 4 Number of DM-algorithms of 149 EDM works, which are organized based on 20 algorithms

Algorithms Items Number Percentage (%)
K-means 1 24 19
J48/C4.5 1 21 16.8
NB 1 12 9.6
BKT 1 7 5.6
JRip 1 7 5.6
Apriori 1 7 5.6
KNN 1 6 4.8
Random Forest (RF) 1 6 4.8
ID3 1 6 4.8
CART 1 5 4
Expectation Maximization (EM) 1 4 3.2
Genetic Algorithm 1 4 3.2
Others 8 16 13
Total 20 125 100

2.2    Classification of educational systems based on types of data and objective

According to Table 5, there is a great variety of educational systems; in this work, a set of 15 kinds is counted and classified as follows: web-based education and learning management systems (LMS), 28%; traditional classroom, 23%; adaptive educational hypermedia systems (AEHS) and intelligent tutoring systems (ITS), 12%; massive open online courses (MOOC), 10%; tests/questionnaires, 7%; texts/contents, 5%; and others (e.g., social networks, forums, educational game environments, and virtual environments), 15%. Web-based education/e-learning (LMS) and traditional education are the most prominent, together accounting for nearly 50% of the works. The data provided by each of these educational environments are different, so each environment lends itself to different problems and tasks that can be resolved using DM techniques (Cristóbal Romero & Ventura, 2010; Chaturvedi & Ezeife, 2013). In addition, Table 6 shows specific instances of the web-based educational systems, such as ASSISTment and Moodle: the first is an instance of ITS, and the second is an instance of LMS.

Table 5 Number of educational systems of 149 EDM works

Educational systems Items Number Percentage (%)
Web-based Education/E-learning (LMS) 2 42 28
Traditional Education 1 34 23
ITS/AES 2 18 12
MOOC 1 14 10
Tests/Questionnaires 1 11 7
Texts/Contents 1 8 5
Others 7 22 15
Total 15 149 100

Table 6 Number of specific instances of educational systems of 149 EDM works

Specific instances of educational systems Items Number Percentage (%)
Moodle 1 19 46
ASSISTment 1 4 10
Others 10 18 44
Total 12 41 100

3         Results

Many problems in educational environments can be resolved via DM. This overview organizes the 149 EDM works into seven applications: (1) predicting students’ performance, (2) student/student behavior modeling, (3) predicting students’ dropout and retention, (4) grouping/profiling of users, (5) providing feedback and assessment services, (6) social network analysis (SNA), and (7) recommendation of resources. Table 7 illustrates that “predicting students’ performance”, “student/student behavior modeling”, and “predicting students’ dropout”, with nearly 70% of all EDM works, are the oldest and most popular applications of DM in education. However, EDM has also addressed many new and different issues in recent years.

Table 7 Number of applications of 149 EDM works that are organized according to seven applications

Applications of EDM Label Number Percentage (%)
1. Predicting students’ performance PSP 46 31
2. Student/student behavior modeling SM-SBM 35 23
3. Predicting students’ dropout and retention PSD-R 19 13
4. Grouping/profiling of users GU-PU 17 11
5. Providing feedback and assessment services FAS 13 9
6. Social network analysis SNA 10 7
7. Recommendation of resources RS 9 6
Total 149 100

In addition, Fig. 5 illustrates the estimated number of EDM works per application during the two most recent years. It reveals similar tendencies for “predicting students’ dropout and retention”, “recommendation of resources”, “student/student behavior modeling”, and SNA, and increasing tendencies for “predicting students’ performance”, “grouping/profiling of users”, and “providing feedback and assessment services”.

Fig. 5 Number of applications of 149 EDM works, classified into seven applications over the two most recent years; the numbers and labels are given in Table 7

Due to space restrictions, this section gives only a brief profile of 39 EDM works, each with more than four citations[4] and with novel findings, in seven subsections that correspond to the seven educational applications.

3.1    Predicting students’ performance

Predicting students’ performance is one of the favorite goals of EDM works. The most cited works on this application in the survey are as follows: Natek & Zwilling (2014) focused on small student course datasets to answer various questions of higher education institutions (HEIs). Hu et al. (2014) developed an early warning system to identify at-risk students, or predict student performance, by analyzing learning portfolios recorded in an LMS. Brusilovsky (2014) modeled multiple subskill tracing, a temporal item response model, and expert knowledge using data collected from an ITS. Khajah et al. (2014) predicted student performance with a theory of individual differences among students’ problems using three corpora from the PSLC DataShop. Wolff et al. (2014) explored the use of predictive methods for identifying students who will benefit most from tutor interventions in Open University modules. Xing et al. (2015) developed a prediction model that seeks the best tradeoff between model understandability and prediction accuracy between groups in a CSCL environment. Gómez-Aguilar et al. (2015) used the SST representation of a visual analytics system to predict the performance of interactions in online environments.

3.2    Student/student behavior modeling

The most cited papers about student modeling in the study are reported next: Guruler & Istanbullu (2014) presented the outcome of a test on university data to explore the factors that have an impact on students’ learning success. Veeramuthu & Periasamy (2014) used DM to optimize resources by finding the gap between the number of candidates who applied for a post and the number of applicants who responded. Pelánek (2014) studied skill estimation in two contexts, modeling the correctness of student answers and modeling problem-solving times, using time decay functions and the Elo rating system. Grafsgaard et al. (2014) presented a multimodal analysis of automatically recognized nonverbal behaviors and task actions to predict learning beyond students’ pretest scores. Wixon et al. (2014) proposed several models of affect based on students’ interaction with a tutoring system. They generated a rich set of features that combined patterns of student behaviors in the last problems. Yu & Jo (2014) revealed more meaningful components for learning analytics in order to improve learners’ achievement in an LMS.

In addition, Muñoz-Merino et al. (2015) applied a precise effectiveness strategy, combining metrics with different visualizations across courses, to calculate the effectiveness of students when watching video lectures and solving parametric exercises in a MOOC. Nižnan et al. (2015) described several student models for prior knowledge estimation based on the Elo rating system.
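Two of the works above (Pelánek, 2014; Nižnan et al., 2015) rely on the Elo rating system. As a rough illustration only, the sketch below shows a generic Elo-style update in which a student's skill and an item's difficulty are adjusted after each answer, with a logistic function giving the expected probability of a correct answer. The constant step size `k` and the function names are assumptions; the cited works may use different variants, for example an uncertainty-dependent step size.

```python
from math import exp


def expected_correct(skill, difficulty):
    """Logistic estimate of the probability that the student answers correctly."""
    return 1.0 / (1.0 + exp(-(skill - difficulty)))


def elo_update(skill, difficulty, correct, k=0.4):
    """Return the updated (skill, difficulty) after one observed answer.

    `k` is a hypothetical constant step size chosen for illustration.
    """
    p = expected_correct(skill, difficulty)
    outcome = 1.0 if correct else 0.0
    # A surprising result (outcome far from p) moves both estimates more.
    new_skill = skill + k * (outcome - p)
    new_difficulty = difficulty + k * (p - outcome)
    return new_skill, new_difficulty


# A student and an item that both start at the neutral estimate 0.0.
skill_after_hit, difficulty_after_hit = elo_update(0.0, 0.0, correct=True)
skill_after_miss, difficulty_after_miss = elo_update(0.0, 0.0, correct=False)
print(skill_after_hit, difficulty_after_hit)
print(skill_after_miss, difficulty_after_miss)
```

A correct answer raises the skill estimate and lowers the item's difficulty estimate, and an incorrect answer does the opposite, so both estimates are refined from the same stream of answers.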

3.3    Predicting students’ dropout and retention

The works in the survey oriented to predicting students’ dropout are reported as follows: Yukselturk et al. (2014) found that online technologies self-efficacy, online learning readiness, and previous online experience were the most important factors in predicting dropouts. Wen et al. (2014) showed a correlation between the sentiment ratio measured from daily forum posts in a MOOC and the number of students who drop out each day. Amir et al. (2014) presented a recognition algorithm for intelligently recognizing students’ activities and predicting students’ failed attempts to solve a problem, along with novel visualization methods for presenting these activities to teachers. Zacharis (2015) developed a practical model for predicting students at risk of performing poorly in blended learning courses using a Moodle dataset. Gašević et al. (2016) illustrated the differences in predictive power and significant predictors between course-specific models and generalized predictive models for identifying students at risk of academic failure.

3.4    Grouping/profiling of users

Profiling means grouping related users into categories using prominent characteristics (Prabha, 2014). The most cited works of the survey in this field are described next: Peiró-Velert et al. (2014) identified some interrelations among groups of young people with different behavioral profiles clustered by low and high academic performance. Stes & Van Petegem (2014) aimed to map out profiles of teaching approaches and their scores in higher education. Eagle & Barnes (2014) presented approach maps, a novel representation of student-tutor interaction data that allows problem-solving approaches on open-ended logic problems to be compared, and that compares behaviors between groups.

In 2015, Bapu et al. (2015) provided reliable statistics, behavior groups, and predicted results to monitor Moodle log data and to identify students who show different learning behaviors in e-learning courses. Zapata et al. (2015) proposed a collaborative methodology for searching, selecting, rating, and recommending learning objects. Additionally, voting aggregation strategies and meta-learning techniques were used to obtain the final ratings automatically, without having to reach a consensus among all the instructors.

3.5    Providing feedback and assessment services

The main objectives of the most cited works in the survey on providing feedback and assessment services are described next: Ocumpaugh et al. (2014) discussed the development of automated detectors that infer student affect from log files of student interaction, based on data from urban, suburban, and rural students, and examined whether detectors of affect remain valid when applied to new populations. Dascalu et al. (2014) introduced ReaderBench, a multi-lingual and flexible environment that integrates text mining technologies for assessing a wide range of learners’ productions and for supporting teachers in several ways. Raca et al. (2015) investigated unobtrusive measures of body language in order to predict students’ attention during class and to provide teachers with a support system that helps them “scale up” to a large class. Gandhi et al. (2015) proposed a method for assigning a saliency score to each word extracted from an educational video. The optimal feature combination strategy is learnt with a Rank-SVM to obtain an overall visual saliency score for all the words.

3.6    Social network analysis

The following works are the most cited for using SNA in education: Cairns et al. (2014) focused on SNA between training courses and training providers. Then, in order to identify the best training paths, they proposed a two-step clustering. X. Chen et al. (2014) investigated engineering students’ Twitter posts to classify tweets reflecting students’ problems in their educational experiences. Rabbany et al. (2014) presented the use of SNA to analyze the structure of interactions between students in forums, using clustering and temporal analysis to determine roles in students’ communications and the forming of groups in EDM. García-Saiz et al. (2014) described an enhanced version of E-learning Web Miner and the new services that it provides, supported by the use of SNA and classification techniques.

Z. Jiang et al. (2015) explored social relationships in MOOC forums by modeling the forum as a heterogeneous network using SNA theories. De-Marcos et al. (2016) analyzed the structure of the social network resulting from a gamified social undergraduate course, as well as the influence that a student’s position has on learning achievement.

3.7    Recommendation of resources

The most cited works in the survey on recommendation of resources are reported as follows: Luna et al. (2014) proposed genetic programming for mining rare association rules, in order to discover a smaller number of the best rare rules with high accuracy, using a Moodle dataset that is beneficial for the recommendation of materials. Segal et al. (2014) proposed a new algorithm for personalizing educational content for students that combines collaborative filtering with social choice theory. Santos & Boticario (2015) proposed user-centred design methods to produce socially oriented recommendations in e-learning based on tutor-oriented recommendations, and identified a pool of 32 generic recommendations. Goga et al. (2015) designed a framework for an intelligent recommender system that can predict students’ first-year academic performance and recommend the actions learners need to take to improve.

4  Discussion

In this section, an analysis of the top-10 most cited EDM works in recent years is introduced first. Then, based on the EDM works introduced in the two previous sections, the inter-relationships between the education and DM attributes are discussed.

4.1    Analysis of citations in recent EDM works

The top-10 most cited papers published from 2014 to the first quarter of 2016 are listed in Table 8, which provides the “id”; the “author (year)”; the “key objectives”, which cover several aspects (a: review paper; b: predictive; c: descriptive; d: qualitative analysis; e: quantitative analysis; f: big dataset; g: small dataset; h: analysis of informal subjects of the posts and discussions in social media; and i: analysis of technical and pedagogical subjects of the posts and activities in social media) and also show the type of EDM application previously introduced in Table 7; the “source of publication”, labeled J, C, or B for journal, conference, or book chapter, respectively; and the “number of citations”, retrieved from Google Scholar on 9 May 2016. As can be seen, most of the highly cited EDM works pursue predictive models for big datasets with quantitative analysis. These papers have been highly influential, both on EDM applications and on previous review papers in the EDM domain. Therefore, they exemplify many of the key trends in this research community.

Table 8 The top-10 most cited papers in recent EDM works

ID Author (year) Key objectives Source Citations
1 Ryan Shaun Baker & Inventado (2014) a, d B 87
2 Peña-Ayala (2014) a, d, e J 78
3 Wen et al. (2014) c, d, e, f, PSD-R C 56
4 Ocumpaugh et al. (2014) b, e, f, FAS J 28
5 Brusilovsky (2014) b, e, f, PSP C 24
6 Dascalu et al. (2014) b, e, g, FAS B 23
7 X. Chen et al. (2014) b, d, e, f, h, SNA J 22
8 Gómez-Aguilar et al. (2015) b, f, i, PSP J 21
9 Papamitsiou & Economides (2014) a, f, i J 21
10 Natek & Zwilling (2014) b, f, g, PSP J 17

4.2    Inter-relationships between the education and data mining attributes

A sample of the outcome is shown in Table 9, in which the seven kinds of EDM applications, the educational systems and system names, the DM-tasks, DM-methods, DM-techniques, DM-algorithms, and DM-frames are compared under the two DM-models, predictive and descriptive, which hold 103 and 53 records of the EDM profiles, respectively. The findings are presented together with the statistical information previously provided in the methodology and results Sections.

Table 9 Outcome of the analysis of raw data extracted from the literature review to discover the inter-relationships between the education and DM attributes

Educational and DM attributes Predictive (103) Descriptive (53)
Number Number
EDM applications
1.   Predicting students’ performance 32 17
2.   Student/student behavior modeling 23 14
3.   Predicting students’ dropout and retention 16 4
4.   Grouping/profiling of users 8 9
5.   Providing feedback and assessment services 11 2
6.   Social network analysis 7 3
7.   Recommendation of resources 6 4
Educational System
LMS/ Web-based 25 18
Traditional 24 13
ITS 16 3
MOOC 9 5
Educational systems-name
Moodle 9 10
ASSISTment 3 1
DM-tasks
Classification 73 0
Regression 28 0
Clustering 0 39
Association rules 0 13
DM-methods
DT 36 0
IBL 6 21
Bayesian theorem 22 4
DM-techniques
Logistic regression 12 0
Multiple regression 8 0
Correlation Analysis 6 5
DM-algorithms
K-means 0 24
J48/C4.5 21 0
NB 12 0
DM-frames
Bayesian networks 5 0

The most important goal of the research, however, is to answer the following question: what are the inter-relationships between the values of the education and DM attributes? The answer is described below.

In Table 9, the sample of 149 EDM works is split into two models to discover the inter-relationships between the values of the education and DM attributes; the two right-hand columns give the values for each model. To answer the research question, the most used items of the sampled EDM works under the two models are compared and contrasted in Table 10.

Table 10 The most used items in the survey based on two models as predictive and descriptive

EDM applications
  • Predictive: predicting students’ performance, student/student behavior modeling, predicting students’ dropout, and providing feedback and assessment.
  • Descriptive: predicting students’ performance, student/student behavior modeling, and grouping/profiling of users.
Educational system
  • Predictive: LMS with Moodle data; traditional systems; ITS with ASSISTments data.
  • Descriptive: web-based and LMS with Moodle data.
DM-tasks
  • Predictive: classification and regression.
  • Descriptive: clustering and association rules.
DM-methods
  • Predictive: DT and Bayesian theorem.
  • Descriptive: IBL.
DM-techniques
  • Predictive: logistic and multiple regression.
DM-algorithms
  • Predictive: J48/C4.5 and NB.
  • Descriptive: K-means.
DM-frames
  • Predictive: Bayesian networks.

5  Conclusions

The current paper presents a review of 149 EDM works. The sample was gathered from 66 journal papers, 72 conference papers, and 11 chapters of one book, and the overview highlights the most cited works in the EDM domain from 2014 to the first quarter of 2016. In this section, the main attributes found in the sampled EDM works are introduced first. Then, an analysis of the strengths, weaknesses, opportunities, and threats (SWOT) of EDM research is presented to provide a useful reference for future work.

According to the analysis of the 149 EDM works in the results and discussion Sections, the most common EDM applications are “predicting students’ performance”, “predicting dropout and retention”, “student modeling and student behavior modeling”, and “grouping and profiling of users”. Such works mainly operate on LMS, traditional, and ITS systems; in particular, they use Moodle to mine their data. MOOCs, however, are a recent and widely researched development in the EDM domain. The majority of EDM works are based on predictive models. Classification and clustering are the main tasks. In addition, DT, Bayesian theorem, and IBL are the most used methods, complemented by the logistic regression, multiple regression, and correlation analysis techniques. Finally, EDM works often implement the J48/C4.5, k-means, and NB algorithms. These results are almost the same as those obtained by Peña-Ayala (2014) for 222 EDM works published from 2010 to the first quarter of 2013, where the most frequently used items show tendencies similar to this overview in many aspects: “student performance modeling” and “student behavior modeling” for EDM applications; LMS using Moodle data for educational systems; classification and clustering for DM-tasks; Bayes theorem, IBL, and DT for DM-methods; logistic regression for DM-techniques; k-means, J48, and NB for DM-algorithms; and Bayesian networks for DM-frames.

The EDM community consists of researchers and institutions. According to the references gathered for this paper, the authors with the most EDM works, in descending order, are: (1) Cristóbal Romero and Sebastián Ventura: 8; (2) Ryan Baker: 7; (3) Carolyn P. Rosé: 4; (4) Kenneth Koedinger: 4.

Finally, the strengths, weaknesses, opportunities, and threats (SWOT) of EDM works are described in Table 11.

Table 11 Strengths, weaknesses, opportunities, and threats (SWOT) of EDM works

Strengths
  • Transform the raw data to the meaningful information of large repositories of data.
  • Extract critical moments and patterns of users using DM (see methodology Section).
  • The data is massive, fine-grained, and precise for guiding adaptation and personalization of systems.
  • Share data logs, software, and findings among users.
Weaknesses
  • Heterogeneous data sources: lack of awareness about the population of learners during learning-teaching process.
  • Data representation issues: most of the results are quantitative.
  • Information overloads with complex systems.
  • Generalization: the most of the research represent the specific implementation of DM to the particular educational settings instead of contributing to extend the DM field (see discussion Section).
Opportunities
  • Increase self-reflection, self-awareness, and self-learning among students in intelligent, autonomous and massive systems.
  • Broadly used to expand and reinforce the traditional achievements of educational systems and try to justify the use of EDM as part of the web-based domain (see classification of educational systems Subsection).
  • Simplify the interaction among users in educational systems in anywhere and anytime.
Threats
  • Lack of a standard terminology.
  • Lack of data privacy: it should consider protecting individual privacy while still advancing the research in this field.
  • Possibility of pattern misclassification.
  • Lack of trust: contradictory findings during implementations.
  • Lack of plagiarism detection: it is an ongoing challenge for educators and faculty either in the traditional or online classroom.
  • Adoption issues of EDM in the education.


This work was supported by the National Natural Science Foundation of China [grant number 61403351] and the key project of the Natural Science Foundation of Hubei province, China [grant number 2013CFA004]. We are also thankful to Prof. Minxia Liu of the School of Foreign Languages, CUG, who helped us improve the language of the manuscript.


Abdous, M., He, W., & Yen, C. J. (2012). Using data mining for predicting relationships between online question theme and final grade. Educational Technology and Society, 15(3), 77–88.

Al-Radaideh, Q. A., Al-Shawakfa, E. M., & Al-Najjar, M. I. (2006). Mining student data using decision trees. In The 2006 International Arab Conference on Information Technology (ACIT’2006).

Amir, O., Gal, K., Yaron, D., Karabinos, M., & Belford, R. (2014). Plan Recognition and Visualization in Exploratory Learning Environments. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 289–327). Springer, Switzerland.

Baker, R. S., & Inventado, P. S. (2014). Educational Data Mining and Learning Analytics. In J. A. Larusson & B. White (Eds.), Learning Analytics (pp. 61–75). New York, NY: Springer.

Baker, R. S. J., Corbett, A. T., & Aleven, V. (2008). Improving Contextual Models of Guessing and Slipping with a Truncated Training Set. In R. S. J. de Baker, T. Barnes, & J. E. Beck (Eds.), Proc. of the 1st International Conference on Educational Data Mining (pp. 67–76). Montreal, Canada.

Baker, R. S. J. D., & Yacef, K. (2009). The State of Educational Data Mining in 2009: A Review and Future Visions. Journal of Educational Data Mining (JEDM), 1(1), 3–16.

Bapu, G. K., Ashok, M. B., Shamrao, S. P., & Tanaji, S. G. (2015). Clustering Moodle Data As a Tool For Profiling Students. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 4(3), 121–126.

Brusilovsky, P. (2014). General Features in Knowledge Tracing: Applications to Multiple Subskills, Temporal Item Response Theory, and Expert Knowledge. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 84–91). London, UK.

Cairns, A. H., Gueni, B., Fhima, M., Cairns, A., David, S., & Khelifa, N. (2014). Towards Custom-Designed Professional Training Contents and Curriculums through Educational Process Mining. In The Fourth International Conference on Advances in Information Mining and Management (IMMM) (pp. 53–58).

Calders, T., & Pechenizkiy, M. (2012). Introduction to the special section on educational data mining. ACM SIGKDD Explorations Newsletter, 13(2), 3.

Chaturvedi, R., & Ezeife, C. I. (2013). Mining the Impact of Course Assignments on Student Performance. Proceedings of the 6th International Conference on Educational Data Mining.

Chen, X., Vorvoreanu, M., & Madhavan, K. (2014). Mining Social Media Data for Understanding Students’ Learning Experiences. IEEE Transactions on Learning Technologies, 7(3), 246–259.

Dascalu, M., Dessus, P., Bianco, M., Trausan-Matu, S., & Nardy, A. (2014). Mining Texts, Learner Productions and Strategies with ReaderBench. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 345–377). Springer, Switzerland.

De-Marcos, L., García-López, E., García-Cabot, A., Medina-Merodio, J.-A., Domínguez, A., Martínez-Herráiz, J.-J., & Diez-Folledo, T. (2016). Social network analysis of a gamified e-learning course: Small-world phenomenon and network metrics as predictors of academic performance. Computers in Human Behavior, 60, 312–321.

Eagle, M., & Barnes, T. (2014). Exploring Differences in Problem Solving with Data-Driven Approach Maps. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 76–83). London, UK.

Gandhi, A., Biswas, A., & Deshmukh, O. (2015). Topic Transition in Educational Videos Using Visually Salient Words. In Proceedings of the 8th International Conference on Educational Data Mining (pp. 289–296). Madrid, Spain.

García-Saiz, D., Palazuelos, C., & Zorrilla, M. (2014). Data Mining and Social Network Analysis in the Educational Field: An Application for Non-Expert Users. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 411–439). Springer, Switzerland.

Gašević, D., Dawson, S., Rogers, T., & Gasevic, D. (2016). Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. Internet and Higher Education, 28, 68–84.

Goga, M., Kuyoro, S., & Goga, N. (2015). A Recommender for Improving the Student Academic Performance. Procedia – Social and Behavioral Sciences: The 6th International Conference Edu World 2014 “Education Facing Contemporary World Issues,” 180, 1481–1488.

Gómez-Aguilar, D. A., Hernández-García, Á., García-Peñalvo, F. J., & Therón, R. (2015). Tap into visual analysis of customization of grouping of activities in eLearning. Computers in Human Behavior, 47, 60–67.

Grafsgaard, J. F., Wiggins, J. B., Boyer, K. E., Wiebe, E. N., & Lester, J. C. (2014). Predicting Learning and Affect from Multimodal Data Streams in Task-Oriented Tutorial Dialogue. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 122–129). London, UK.

Guruler, H., & Istanbullu, A. (2014). Modeling Student Performance in Higher Education Using Data Mining. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 105–124). Cham: Springer, Switzerland.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques. ISBN-13: 978-1-55860-901-3; ISBN-10: 1-55860-901-6.

Hu, Y.-H., Lo, C.-L., & Shih, S.-P. (2014). Developing early warning systems to predict students’ online learning performance. Computers in Human Behavior, 36, 469–478.

Jiang, Z., Zhang, Y., Liu, C., & Li, X. (2015). Influence Analysis by Heterogeneous Network in MOOC Forums: What can We Discover? In Proceedings of the 8th International Conference on Educational Data Mining (pp. 242–249). Madrid, Spain.

Khajah, M. M., Wing, R. M., Lindsey, R. V, & Mozer, M. C. (2014). Integrating Latent-Factor and Knowledge-Tracing Models to Predict Individual Differences in Learning. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 99–106). London, UK.

Kulkarni, M., Rampure, M., & Yadav, M. (2013). Understanding Educational Data Mining (EDM). International Journal of Electronics and Computer Science Engineering, 2(2), 773–777.

Luna, J. M., Romero, C., Romero, J. R., & Ventura, S. (2014). An evolutionary algorithm for the discovery of rare class association rules in learning management systems. Applied Intelligence, 42(3), 501–513.

Muñoz-Merino, P. J., Ruipérez-Valiente, J. A., Alario-Hoyos, C., Pérez-Sanagustín, M., & Delgado Kloos, C. (2015). Precise Effectiveness Strategy for analyzing the effectiveness of students with educational resources and activities in MOOCs. Computers in Human Behavior, 47, 108–118.

Natek, S., & Zwilling, M. (2014). Student data mining solution-knowledge management system related to higher education institutions. Expert Systems with Applications, 41(14), 6400–6407.

Nižnan, J., Pelánek, R., & Rihák, J. (2015). Student Models for Prior Knowledge Estimation. In Proceedings of the 8th International Conference on Educational Data Mining (pp. 109–116). Madrid, Spain.

Ocumpaugh, J., Baker, R., Gowda, S., Heffernan, N., & Heffernan, C. (2014). Population validity for Educational Data Mining models: A case study in affect detection. British Journal of Educational Technology, 45(3), 487–501.

Oyelade, O. J., Oladipupo, O. O., & Obagbuwa, I. C. (2010). Application of k-Means Clustering algorithm for prediction of Students Academic Performance. International Journal of Computer Science and Information Security, 7(1), 292–295.

Papamitsiou, Z., & Economides, A. A. (2014). Learning Analytics and Educational Data Mining in Practice: A Systematic Literature Review of Empirical Evidence. Educational Technology & Society, 17(4), 49–64.

Patil, T. R., & Sherekar, S. S. (2013). Performance Analysis of Naive Bayes and J48 Classification Algorithm for Data Classification. International Journal Of Computer Science And Applications, 6(2).

Peiró-Velert, C., Valencia-Peris, A., González, L. M., García-Massó, X., Serra-Añó, P., & Devís-Devís, J. (2014). Screen media usage, sleep time and academic performance in adolescents: Clustering a self-organizing maps analysis. PLoS ONE, 9(6), 1–9.

Pelánek, R. (2014). Application of Time Decay Functions and the Elo System in Student Modeling. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 21–27). London, UK.

Peña-Ayala, A. (2014). Educational data mining: A survey and a data mining-based analysis of recent works. Expert Systems with Applications, 41, 1432–1462.

Prabha, S. L. (2014). Educational Data Mining Applications. Operations Research and Applications: An International Journal, 1(1), 23–29.

Rabbany, R., Elatia, S., Takaffoli, M., & Zaïane, O. R. (2014). Collaborative Learning of Students in Online Discussion Forums: A Social Network Analysis Perspective. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 441–466). Cham: Springer, Switzerland.

Raca, M., Kidziński, Ł., & Dillenbourg, P. (2015). Translating Head Motion into Attention - Towards Processing of Student’s Body-Language. In Proceedings of the 8th International Conference on Educational Data Mining (pp. 320–326). Madrid, Spain.

Romero, C., & Ventura, S. (2007). Educational data mining: A survey from 1995 to 2005. Expert Systems with Applications, 33(1), 135–146.

Romero, C., & Ventura, S. (2010). Educational Data Mining: A Review of the State-of-the-Art. IEEE Transactions on Systems, Man, and Cybernetics–Part C: Applications and Reviews, 40, 601–618.

Sabitha, A. S., Mehrotra, D., Bansal, A., & Sharma, B. K. (2015). A naive bayes approach for converging learning objects with open educational resources. Education and Information Technologies.

Santos, O. C., & Boticario, J. G. (2015). User-centred design and educational data mining support during the recommendations elicitation process in social online learning environments. Expert Systems, 32(2), 293–311.

Segal, A., Katzir, Z., & Gal, K. (2014). EduRank: A Collaborative Filtering Approach to Personalization in E-learning. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 68–75). London, UK.

Srivastava, J., Cooley, R., Deshpande, M., & Tan, P.-N. (2000). Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data. ACM SIGKDD Explorations Newsletter, 1(2), 12–23.

Stes, A., & Van Petegem, P. (2014). Profiling approaches to teaching in higher education: a cluster-analytic study. Studies in Higher Education, 39(4), 644–658.

Veeramuthu, P., & Periasamy, R. (2014). Application of Higher Education System for Predicting Student Using Data mining Techniques. International Journal of Innovative Research in Advanced Engineering (IJIRAE), 1(5), 2349–2163.

Wen, M., Yang, D., & Rosé, C. P. (2014). Sentiment Analysis in MOOC Discussion Forums: What does it tell us? In Proceedings of the 7th International Conference on Educational Data Mining (pp. 130–137). London, UK.

Williams, G. (2011). Descriptive and Predictive Analytics. In Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery, Use R (pp. 193–203).

Wixon, M., Burleson, W., Lozano, C., & Woolf, B. (2014). The Opportunities and Limitations of Scaling Up Sensor-Free Affect Detection. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 145–152). London, UK.

Wolff, A., Zdrahal, Z., Herrmannova, D., & Knoth, P. (2014). Predicting Student Performance from Combined Data Sources. In A. Peña-Ayala (Ed.), Studies in Computational Intelligence (Vol. 524, pp. 175–202). Springer, Switzerland.

Xing, W., Guo, R., Petakovic, E., & Goggins, S. (2015). Participation-based student final performance prediction model through interpretable Genetic Programming: Integrating learning analytics, educational data mining and theory. Computers in Human Behavior, 47, 168–181.

Yu, T., & Jo, I.-H. (2014). Educational Technology Approach toward Learning Analytics: Relationship between Student Online Behavior and Learning Performance in Higher Education. In Proceedings of the Fourth International Conference on Learning Analytics And Knowledge – LAK ’14 (pp. 269–270).

Yukselturk, E., Ozekes, S., & Türel, Y. K. (2014). Predicting Dropout Student: an Application of Data Mining Methods in an Online Education Program. European Journal of Open, Distance and eLearning, 17(1), 118–133.

Zacharis, N. Z. (2015). A multivariate approach to predicting student outcomes in web-enabled blended learning courses. Internet and Higher Education, 27, 44–53.

Zapata, A., Menéndez, V. H., Prieto, M. E., & Romero, C. (2015). Evaluation and selection of group recommendation strategies for collaborative searching of learning objects. International Journal of Human Computer Studies, 76, 22–39.

[1] Chapters of a book are collected only in 2014.

[4] Retrieved from Google Scholar, on 9 May, 2016
