
Applications of Deep Learning to MRI Images

Info: 9565 words (38 pages) Dissertation
Published: 11th Dec 2019


Tags: Medical, Medical Technology

Applications of Deep Learning to MRI Images: A Survey

Abstract—Deep learning is providing exciting solutions for problems in image recognition, speech recognition and natural language processing, and is seen as a key method for a wide range of future applications. Owing to its non-invasive imaging and good soft tissue contrast, magnetic resonance imaging (MRI) has attracted increasing attention in recent years. With the development of deep learning, many innovative deep learning methods have been proposed to improve the performance of MRI image processing and analysis. The purpose of this article is to provide a comprehensive overview of deep learning methods applied to MRI images. Firstly, a brief introduction to deep learning and the imaging modalities of MRI is given. Then, the common deep learning architectures are introduced. Next, deep learning applications to MRI images are discussed, such as image detection, image registration, image segmentation and image classification. Afterwards, the advantages and weaknesses of several common tools are discussed, and several deep learning tools used in MRI applications are shown. Finally, an objective assessment of deep learning in MRI applications is presented, and future developments and trends of deep learning on MRI images are addressed.

Index Terms—Magnetic Resonance Imaging, Deep Learning, Image Detection, Image Registration, Image Segmentation, Image Classification.


Artificial intelligence [1], [2], [3] is not only a field of computer science created in the 1950s, but also a thriving field with many practical applications and research hotspots. It attempts to understand the essence of intelligence and to produce new intelligent machines that can react in ways similar to human intelligence. It is a simulation of the information processing of human consciousness and thinking, and its ultimate goal is to develop brain-like robots. Research directions in this field include robotics, speech recognition, image recognition, natural language processing, expert systems, etc.

Machine learning [4], [5], [6] is not only the core of artificial intelligence, but also the fundamental way to make computers intelligent. It draws on a number of disciplines such as probability theory, statistics, approximation theory, convex analysis, and algorithmic complexity theory. Machine learning studies how computers simulate or realize human learning behavior in order to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continually improve their performance; it relies mainly on induction and synthesis rather than deduction. Its applications span the field of artificial intelligence. For example, machine learning has been widely applied in many fields, such as computer vision [7], [8], [9], computer-aided disease diagnosis [10], [11], [12], bioinformatics [13], [14], [15] and so on.

Deep learning [16], [17], [18], [19], [20], [21], [22], [23] is a new field in machine learning research. Its purpose is to

  •   J. Liu, M. Li, C. Lu and J. Wang are with the School of Information Science and Engineering, Central South University, Changsha 410083, China. E-mail: {liujin06, limin, chengqlu, jxwang}@mail.csu.edu.cn
  •   Y. Pan is with the Department of Computer Science, Georgia State University, Atlanta, GA 30302, USA. E-mail: yipan@gsu.edu
  •   ∗ Corresponding author

make machine learning closer to its original goal: artificial intelligence. The concept of deep learning is derived from the study of artificial neural networks [24]. The idea of the artificial neural network arises from our understanding of the human brain, namely the interconnection between neurons. The difference between the two networks is that a neuron in the human brain is connected to other neurons along certain physical paths, while a neural network contains discrete layers, connections and directions of data propagation. Deep learning is an improvement on artificial neural networks [22]; for example, deep learning models contain more hidden layers than traditional artificial neural networks. In deep learning, a more abstract high-level feature representation for different classes is formed by using multiple hidden layers to combine low-level features. Deep learning attempts to build and simulate the learning of the human brain's neural network, mimicking the mechanisms by which the brain interprets data such as images, sounds and text. Deep learning has shown strong momentum in data processing and analysis, and was listed among the top 10 breakthrough technologies of 2013 by MIT Technology Review1. So far, deep learning has been widely used in the scientific community [25], [26] and the business community [27]. It is worth mentioning that Google launched its first-generation deep learning system (i.e., DistBelief [27]) in 2011. Using this system, which scales across thousands of cores in Google's data centers, Google is able to build much larger neural networks. The deep learning system has been widely deployed in Google's commercial products, such as Google Photos, Google Search, Google Street View, etc.

As a technology, deep learning has two obvious advantages in feature representation, as follows:

1. https://www.technologyreview.com/lists/technologies/2013/

  • Deep learning can be used to automatically find features from a given dataset for each specific application. In general, traditional feature extraction methods rely on prior knowledge to extract features for a particular application, and are thus semi-automatic learning methods.
  • Deep learning can find new features that have never previously been found by researchers but are suitable for specific applications. Traditional feature extraction methods are often limited by prior knowledge, and can only extract features already known to be associated with a particular application.
Feature representation plays an important role in the field of medical image processing and analysis. In medical image processing and analysis, the two elements that affect the results are image acquisition and image interpretation:
  • Image acquisition: As is well known, the better the image quality, the better the results obtained in image processing and analysis, and image quality in turn depends on image acquisition. Magnetic resonance imaging (MRI) is a non-invasive imaging modality with good soft tissue contrast, which provides invaluable information about the shape, size, and localization of tissue structures without exposing the subjects to high ionizing radiation [28]. MRI is attracting more and more attention in clinical routine and computer-aided diagnosis [29], [30], [31]. MRI can be divided into structural and functional imaging. Structural imaging includes T1-weighted MRI (T1w), T2-weighted MRI (T2w), diffusion tensor imaging (DTI), etc., and functional imaging includes resting-state functional MRI (rs-fMRI), task-state functional MRI (ts-fMRI), etc. Therefore, in this article, we focus on MRI images.
  • Image interpretation: In clinical practice, most medical image interpretation is performed by clinicians to determine whether the subjects are abnormal. However, due to limitations of clinicians' personal skills, subjectivity, energy and other factors, image interpretations by different clinicians often differ considerably. In order to obtain accurate interpretation results, it is imperative to develop automatic image interpretation systems, which include many functions such as image detection, image registration, image segmentation, image classification, etc. To realize such systems, machine learning methods have been widely used. Moreover, because deep learning can obtain high-level latent features, many researchers have applied deep learning architectures to develop automatic image interpretation systems. Therefore, in this article, we focus on deep learning.
Based on the above analysis, in this article we provide a comprehensive review of deep learning architectures and their applications to MRI images. Firstly, we introduce some common deep learning architectures. Then, we present several applications of deep learning to

MRI images, such as image detection, image registration, image segmentation and image classification. Afterwards, the advantages and weaknesses of several common tools are discussed, and several deep learning tools used in MRI applications are shown. Finally, an objective assessment of deep learning in MRI applications is presented, and future developments and trends of deep learning on MRI images are addressed.




2.1 Artificial Neural Networks

The artificial neural network [24] has been a research hotspot in the field of artificial intelligence since the 1980s. It abstracts the neural network of the human brain from the perspective of information processing to establish a simple model, forming different networks according to different connection schemes. An artificial neural network is a computational model composed of a large number of nodes (or neurons). Each node represents a specific output function, called an activation function. The connection between each pair of nodes carries a weighting value for the signal passing through the connection, called a weight, which acts as the memory of the artificial neural network. Therefore, the output of the network depends on the activation functions and the weights. Such a network is often used to approximate an algorithm or function, or to express a logical strategy. In general, according to the topology of the connections, artificial neural networks can be divided into feedforward neural networks and feedback neural networks as follows:

  • Feedforward neural network: A feedforward neural network can be represented by a directed acyclic graph, with each node receiving input only from the previous layer and passing its output to the next layer, without feedback in the network. The network realizes a transformation of signals from the input space to the output space, and its information processing ability comes from the composition of many simple nonlinear functions. The topology of the network is relatively simple and easy to implement.
  • Feedback neural network: A feedback neural network can be represented by an undirected complete graph, with feedback in the network. The information processing of the network is a state transformation, which can be analyzed using dynamical systems theory. The stability of the network is closely related to its associative memory function.
In the past ten years, with deepening research on artificial neural networks, researchers have made great progress in multiple fields, such as pattern recognition, intelligent robotics, automatic control, forecasting, biology, medicine and economics, successfully solving many practical problems and showing good intelligence characteristics. The characteristics and advantages of artificial neural networks are mainly manifested in three abilities: 1) the ability of self-learning; 2) the ability of associative memory; 3) the ability of finding optimal solutions at high speed.
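The node computation described above (a weighted sum of inputs passed through an activation function) can be sketched as follows; the sigmoid activation and all numeric values are illustrative choices, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    # A common choice of activation function
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # A node's output depends on its weights and its activation
    # function: a weighted sum of inputs, then a nonlinearity.
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # inputs from connected nodes
w = np.array([0.4, 0.3, -0.2])   # connection weights (the network's "memory")
b = 0.1                          # bias term
print(neuron(x, w, b))           # a value in (0, 1)
```

Changing the weights changes the output for the same input, which is exactly the sense in which the weights act as the network's memory.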


2.2 Deep Feedforward Networks

Deep feedforward networks, also known as feedforward neural networks or multilayer perceptrons, are the classical deep learning models [32], [33]. The purpose of training a deep feedforward network is to approximate an objective function. A deep feedforward network can be defined as a mapping y = f(x; θ), which learns the parameters θ that give the best function approximation. In general, a deep feedforward network consists of an input layer, multiple hidden layers and an output layer. Furthermore, information in a deep feedforward network flows in only one direction and never goes backwards. Fig. 1 shows an example of a deep feedforward network with an input layer, three hidden layers and an output layer.

The autoencoder has three common variants: sparse autoencoder [37], [38], denoising autoencoder [39], [40], [41] and contractive autoencoder [42], [43] as follows:

  • Sparse autoencoder: Unlike plain autoencoders, sparse autoencoders add a sparsity constraint Ω(h) on the hidden layer h. Thus, their reconstruction error can be evaluated by:
L(x, g(f (x))) + Ω(h) (2)
  • Denoising autoencoder: Unlike sparse autoencoders, which add a sparsity constraint on the hidden layer, denoising autoencoders minimize the loss function:
L(x, g(f(x̃))) (3)
where x̃ is a copy of x corrupted with some noise.
  • Contractive autoencoder: Similar to sparse autoencoders, contractive autoencoders add an explicit regularizer Ω(h) on the hidden layer h, and minimize


the explicit regularizer. The explicit regularizer is

Ω(h) = λ ‖∂f(x)/∂x‖²_F (4)

where ‖·‖²_F denotes the squared Frobenius norm [44], here applied to the Jacobian matrix of partial derivatives of the encoder function f(x), and λ is a free parameter.
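For a linear encoder the penalty has a closed form, which makes it easy to illustrate: if f(x) = Wx, the Jacobian ∂f(x)/∂x is simply W, so the regularizer reduces to λ‖W‖²_F. The weights and λ below are toy values of our own choosing.

```python
import numpy as np

# Toy linear encoder f(x) = W x; its Jacobian ∂f(x)/∂x is W itself,
# so the contractive penalty Ω(h) = λ‖∂f(x)/∂x‖²_F becomes λ‖W‖²_F.
W = np.array([[1.0, 2.0],
              [0.5, -1.0]])
lam = 0.1

def contractive_penalty(W, lam):
    # λ times the squared Frobenius norm (sum of squared entries)
    return lam * np.sum(W ** 2)

print(contractive_penalty(W, lam))
```

For a nonlinear encoder the Jacobian varies with x and is typically evaluated per training sample, but the penalty term keeps the same form.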

A stacked autoencoder [32], [45] is a neural network with multiple autoencoder layers, as shown in Fig. 2, in which the output of the previous layer is used as the input of the next layer. An autoencoder usually has only three layers and does not by itself constitute a deep learning architecture. However, a stacked autoencoder obtains a deep learning architecture by stacking a number of autoencoders. It is worth mentioning that the training of a stacked autoencoder cannot be accomplished in one step, but proceeds layer by layer. For example, if we want to train a network with an n → m → k architecture using a stacked autoencoder, we must first train the network n → m → n to get the transformation n → m, then train the network m → k → m to get the transformation m → k, and finally stack the two transformations to form the stacked autoencoder (i.e., n → m → k). This process is also called layer-wise unsupervised pre-training [16].

Fig. 2. An example of a stacked autoencoder with n autoencoders (i.e., AE1, AE2, …, AEn).
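The layer-wise pre-training procedure can be sketched with simple linear autoencoders trained by plain gradient descent; the dimensions (8 → 4 → 2), learning rate and epoch count are illustrative, and real stacked autoencoders use nonlinear layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, hidden, epochs=200, lr=0.01):
    # Train one linear autoencoder X -> hidden -> X by gradient
    # descent on the reconstruction error; return the encoder weights.
    n = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, (n, hidden))
    W_dec = rng.normal(0.0, 0.1, (hidden, n))
    for _ in range(epochs):
        H = X @ W_enc                          # encode
        R = H @ W_dec                          # decode (reconstruct)
        err = R - X
        W_dec -= lr * H.T @ err / len(X)
        W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
    return W_enc

# Layer-wise unsupervised pre-training of an n -> m -> k stack:
X = rng.normal(size=(100, 8))          # inputs, n = 8
W1 = train_autoencoder(X, hidden=4)    # train n -> m -> n, keep n -> m
H1 = X @ W1                            # codes from the first autoencoder
W2 = train_autoencoder(H1, hidden=2)   # train m -> k -> m on those codes
codes = H1 @ W2                        # the stacked n -> m -> k representation
print(codes.shape)
```

Note that the second autoencoder never sees the raw inputs: it is trained only on the first layer's codes, which is what makes the procedure greedy and layer-wise.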

2.4 Deep Belief Networks

The Boltzmann machine [46], [47], [48], [49] is derived from statistical physics and is a modeling method based on energy functions that can describe high-order interactions between variables. Although the Boltzmann machine is relatively complex, it has a fairly complete physical interpretation and a rigorous mathematical-statistical theory as its basis. The Boltzmann machine is a symmetrically coupled stochastic feedback binary-unit neural network, which consists of

Fig. 1. An example of a deep feedforward network with an input layer, three hidden layers and an output layer.

As can be seen from Fig. 1, given an input x and three hidden functions f1, f2, and f3, an output f(x) can be obtained by training the deep feedforward network: f(x) = f3(f2(f1(x))). If y is the corresponding label of x, f(x) should be very close to y, i.e., y ≈ f(x). Chain structures like this are the most common in deep feedforward networks. In general, the length of the chain is called the depth of the deep feedforward network, which is the origin of the term deep learning. The deep feedforward network is therefore one of the most primitive deep learning architectures.
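The chain f(x) = f3(f2(f1(x))) can be written out directly; the layer sizes, the ReLU nonlinearity and the random (untrained) weights below are illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def layer(W, b):
    # Each hidden function f_i is an affine map followed by a nonlinearity
    return lambda x: relu(W @ x + b)

rng = np.random.default_rng(1)
f1 = layer(rng.normal(size=(5, 4)), rng.normal(size=5))  # 4 -> 5
f2 = layer(rng.normal(size=(3, 5)), rng.normal(size=3))  # 5 -> 3
f3 = layer(rng.normal(size=(2, 3)), rng.normal(size=2))  # 3 -> 2

x = rng.normal(size=4)
fx = f3(f2(f1(x)))   # the chain f(x) = f3(f2(f1(x))); depth = 3
print(fx.shape)
```

Training would adjust the weights W and biases b of every layer so that f(x) approaches the label y; only the forward chain is shown here.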

2.3 Stacked Autoencoders

An autoencoder [34], [35], [36] is a simple deep feedforward network, which consists of an input layer, a hidden layer and an output layer. An autoencoder can be divided into two parts: an encoder and a decoder. The encoder (denoted as f(x)) is used to generate a reduced feature representation of an initial input x through the hidden layer h, and the decoder (denoted as g(f(x))) is used to reconstruct the initial input from the output of the encoder by minimizing a loss function:

L(x, g(f (x))) (1)

Through these two processes, an autoencoder can transform high-dimensional data into low-dimensional data. Therefore, the autoencoder is very useful in classification and similar problems.
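The encoder/decoder pair and the reconstruction loss of Eq. (1) can be sketched as follows; the 8 → 3 dimensions, the tanh nonlinearity, the squared-error loss and the untrained random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
W_enc = rng.normal(0.0, 0.1, (3, 8))   # encoder: 8-d input -> 3-d code
W_dec = rng.normal(0.0, 0.1, (8, 3))   # decoder: 3-d code -> 8-d output

def f(x):
    # Encoder: reduced feature representation h of the input x
    return np.tanh(W_enc @ x)

def g(h):
    # Decoder: reconstruct the input from the code
    return W_dec @ h

def L(x, x_hat):
    # Reconstruction loss L(x, g(f(x))) of Eq. (1), here squared error
    return np.sum((x - x_hat) ** 2)

x = rng.normal(size=8)
h = f(x)                    # the low-dimensional representation
print(h.shape, L(x, g(h)))  # training would minimize this loss
```

Minimizing L over a dataset drives g(f(x)) toward x, forcing the 3-d code h to retain as much of the 8-d input as possible.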



a visible layer and multiple hidden layers. The network nodes are divided into visible units and hidden units. The visible and hidden units express the random neural network learning model, and the weights represent the correlation between pairs of units in the model.

A restricted Boltzmann machine [50], [51] is a special form of the Boltzmann machine, and consists of one visible layer and one hidden layer. Unlike in a feedforward neural network, the connections between nodes of the visible layer and nodes of the hidden layer in a restricted Boltzmann machine are bidirectional. Compared to general Boltzmann machines, restricted Boltzmann machines have only one hidden layer, so they offer faster computation and better flexibility. In general, restricted Boltzmann machines serve two main functions: 1) similar to autoencoders, they are used to reduce the dimensionality of data; 2) they are used to obtain a weight matrix, which serves as the initial input of other neural networks.
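The bidirectional use of one weight matrix can be illustrated with a single contrastive-divergence (CD-1) update for a binary restricted Boltzmann machine; this is a common training rule rather than one named in the text, and biases, batching and the layer sizes are simplified or invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, lr=0.1):
    # One CD-1 update for a binary RBM. The connections are
    # bidirectional: the same W maps visible -> hidden and back.
    p_h0 = sigmoid(v0 @ W)                      # visible -> hidden probs
    h0 = (rng.random(p_h0.shape) < p_h0) * 1.0  # sample hidden states
    p_v1 = sigmoid(h0 @ W.T)                    # hidden -> visible (reconstruct)
    p_h1 = sigmoid(p_v1 @ W)                    # hidden probs of reconstruction
    # Data correlations minus model correlations approximate the gradient
    return W + lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))

v = rng.integers(0, 2, size=6).astype(float)    # 6 binary visible units
W = rng.normal(0.0, 0.1, (6, 4))                # 4 hidden units
W = cd1_step(v, W)
print(W.shape)
```

After training, the hidden probabilities sigmoid(v @ W) give the reduced representation, and W itself can initialize another network, matching the two functions listed above.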

Similar to stacked autoencoders, deep belief networks [52], [53], [54], [55] are also neural networks with multiple restricted Boltzmann machine layers, in which the output of the previous layer is used as the input of the next layer. Deep belief networks adopt a hierarchical unsupervised greedy pre-training method [52] to pre-train each restricted Boltzmann machine layer by layer. The obtained results are used as the initial input of a supervised learning probability model, which greatly improves learning performance.

2.5 Convolutional Neural Networks

Convolutional neural networks [56], [57], [58], [59] are also deep feedforward networks, and have been widely used in recognition tasks, such as document recognition [60], handwriting recognition [61] and image classification [62], [63], [64], [65]. The only difference between convolutional neural networks and fully connected feedforward neural networks lies in how two adjacent layers are connected: in the former, only some nodes are connected between two adjacent layers, while in the latter, all nodes are connected between two adjacent layers. The biggest problem with a fully connected feedforward neural network is that it has too many parameters. In general, an increase in parameters not only leads to slower computation, but also leads to over-fitting problems. Therefore, a more reasonable neural network architecture is needed to effectively reduce the number of parameters in the neural network. Convolutional neural networks are built to achieve this goal.
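The parameter savings can be made concrete with a back-of-the-envelope count; the 224×224 single-channel layer size and the 3×3 kernel are illustrative choices of our own.

```python
# Fully connecting a 224x224 single-channel layer to an equally sized
# next layer needs one weight per node pair:
full = (224 * 224) * (224 * 224)

# A convolutional layer sharing one 3x3 kernel across all positions
# needs only the kernel weights plus one bias:
conv = 3 * 3 + 1

print(full, conv)  # billions of weights vs. ten
```

This gap (several orders of magnitude) is exactly what drives both the slower computation and the over-fitting risk of fully connected layers on images.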

A common convolutional neural network has two core layers: convolutional layer and pooling layer as follows:

  • Convolutional layer: The input of each node in the convolutional layer is just a small patch of the layer above, and the size of the small patch is often 3 × 3 or 5 × 5. The convolutional layer analyzes each small patch of the neural network in depth, resulting in a more abstract feature representation.
  • Pooling layer: A convolutional layer is often followed by a pooling layer. The pooling layer can effectively reduce the size of the matrix produced by the preceding convolutional layer, thereby reducing the number of parameters in the neural network. Therefore, the use of pooling layers can not only speed up computation but also prevent over-fitting.
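The two core layers can be sketched directly; the 9×9 "image", the 3×3 kernel and the 2×2 max-pooling window are illustrative, and real implementations use vectorized library routines rather than these explicit loops.

```python
import numpy as np

def conv2d(img, kernel):
    # Each output node sees only a small patch of the layer above
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, size=2):
    # Shrink the feature map, reducing later layers' parameter counts
    H, W = fmap.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i*size:(i+1)*size, j*size:(j+1)*size].max()
    return out

rng = np.random.default_rng(4)
img = rng.normal(size=(9, 9))      # a small single-channel "image"
kernel = rng.normal(size=(3, 3))   # a typical 3x3 patch of weights
fmap = conv2d(img, kernel)         # -> 7x7 feature map
pooled = max_pool(fmap, 2)         # -> 3x3 after 2x2 max-pooling
print(fmap.shape, pooled.shape)
```

Note that the same 3×3 kernel is reused at every position, which is the weight sharing that keeps the parameter count small.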

In general, there are two types of convolutional neural network architectures according to how different convolutional layers are connected: one connects different convolutional layers in series, such as LeNet-5 [60], AlexNet [62] and ZFNet [66], and the other connects different convolutions in parallel, such as Inception and its follow-up versions [67], [68], [69].


3 Deep Learning Applications to MRI Images

In recent years, many deep learning methods have been proposed for MRI image processing and analysis, covering image detection, image registration, image segmentation, image classification, etc. All of these tasks can be formulated as feature representation problems, and thus can be solved effectively by using deep learning methods to find an effective set of features. In this section, we review recent progress in applying deep learning methods to image detection, image registration, image segmentation and image classification of MRI images.

3.1 Image Detection

In computer-aided detection routines, image detection plays an important role. The purpose of image detection is to find the tissues of interest, and then to measure and analyze whether they contain lesions. Several deep learning methods have been proposed to perform MRI image detection, as follows.

To perform organ detection on datasets with abnormalities, Shin et al. [70] proposed a stacked sparse autoencoder model, a deep learning architecture built from sparse autoencoders that can be effectively applied to unsupervised feature learning on complex datasets where sample labels are difficult to identify. In this study, the stacked sparse autoencoder model was generated by stacking several unsupervised feature learning layers, which were trained greedily. Afterwards, a pooling operation was applied to compress features of gradually increasing input regions, and further to generate a part-based model to perform multiple-organ detection in MRI images.

To achieve automatic detection of lacunes of presumed vascular origin, Ghafoorian et al. [71] proposed an automated two-step deep convolutional network method. Firstly, a fully convolutional network was applied to detect initial candidates. Then, a 3D convolutional network was applied to reduce false positives. Since location information plays an important role in candidate structure analysis, multi-scale analysis and a combination of explicit location features were used to generate contextual information, which was added into the network.

To detect cerebral microbleeds (CMBs) in MRI images, Dou et al. [72] proposed an automatic 3D convolutional network method. The 3D convolutional network can effectively encode the spatial contextual information in the


volumes of MRI images to extract high-level features for CMBs in a data-driven way. Since the 3D convolutional network adopted a traditional sliding-window strategy, the computational cost of using the method to detect CMBs is relatively high. To further improve the performance of CMB detection, a two-step cascaded 3D fully convolutional network framework was proposed. The 3D fully convolutional network was first used to rapidly retrieve potential candidates, which were then used to accurately distinguish CMBs from challenging mimics.

3.2 Image Registration

Image registration is the process of matching and superimposing two or more images acquired at different times, by different modalities (imaging devices) or under different conditions [73]. The general process of image registration is as follows:

  • Features are obtained by feature extraction from the two images;
  • Feature pairs are found by applying a similarity measure;
  • The spatial coordinate transformation parameters are estimated from the matched feature pairs;
  • The transformation parameters are used to perform the image registration.
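The four steps above can be sketched for the simplest case, a pure translation between point "features"; the landmark coordinates and shift are fabricated for the example, feature matching is trivial here because the correspondences are known, and real registration uses image-derived descriptors and richer transformation models.

```python
import numpy as np

rng = np.random.default_rng(5)

# 1) Feature extraction: landmark coordinates from the two images
feats_fixed = rng.uniform(0, 100, size=(10, 2))
true_shift = np.array([3.0, -2.0])
feats_moving = feats_fixed + true_shift      # same anatomy, shifted image

# 2) Similarity measure: pair up corresponding features
#    (pairing is known in this toy example; real pipelines match
#    descriptors by nearest-neighbour similarity)
pairs = list(zip(feats_moving, feats_fixed))

# 3) Estimate the transformation parameters from the matched pairs
#    (for a translation, least squares gives the mean displacement)
shift = np.mean([m - f for m, f in pairs], axis=0)

# 4) Apply the transformation to register the moving image's features
registered = feats_moving - shift
print(np.allclose(registered, feats_fixed))
```

The quality of step 3 depends entirely on the correspondences found in step 2, which is why the text emphasizes building accurate and effective correspondences.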
The purpose of medical image registration is to combine various images so that their information is displayed in the same image, providing multiple sources of information for clinical diagnosis. Medical image registration reveals correspondences by maximizing the similarity of feature pairs between two images. Therefore, building accurate and effective correspondences plays a key role in medical image registration.
Recently, with the development of neuroimaging techniques, various new modalities have been emerging to make the diagnosis and treatment of diseases more accurate. Thus, image registration operations that combine data from different modalities are necessary. A number of learning-based image registration methods have been proposed to help select the best features for guiding correspondence detection across samples with large variations. However, these learning-based image registration methods have a major limitation: they require many known correspondences in the training process. To address this limitation, Wu et al. [74] proposed an unsupervised deep learning framework to extract the optimal image features for image registration. Firstly, a stacked convolutional independent subspace analysis network was built to learn hierarchical representations from many 3D image patches of MRI images. Then, the hierarchical representations were used to perform correspondence detection in the image registration process. In this study, the stacked convolutional independent subspace analysis network contained two layers: the primary purpose of the first layer was to obtain low-level features from a number of 3D image patches in the MRI images, and the hierarchical representations (i.e., the high-level features) were obtained in the second layer.
Later, the same team (Wu et al. [75]) also proposed an unsupervised deep learning framework to learn hierarchical representations from MRI images for image registration. The framework contained

a stacked auto-encoder with a convolutional network. 3D image patches were used as inputs to train the stacked auto-encoder with the convolutional network. In this study, the stacked auto-encoder consisted mainly of two networks: an encoder network and a decoder network; the former was used to learn low-dimensional features from the 3D image patches, and the latter was used to recover the 3D image patches from the learnt low-dimensional features. However, if the inputs of the stacked auto-encoder are very large, the computational cost of directly using it to learn useful features is very high. To address this problem, a convolutional network was applied to learn translation-invariant features, which can be shared by all points in a particular region.

3.3 Image Segmentation

Automatic tissue segmentation in MRI images is of great importance in modern medical research and clinical routines. Many medical image segmentation challenges have been held to encourage the development of automatic segmentation techniques, such as Ischemic Stroke Lesion Segmentation (ISLES2), Multimodal Brain Tumor Image Segmentation (BRATS3), MR Brain Image Segmentation (MRBrainS4), cardiac MR left ventricle (LV) segmentation (LV5) and so on. Many deep learning methods have also been proposed to perform the segmentation of various tissues in MRI images [76], [77], [78], [79].

In MRI brain images, one of the most common segmentation tasks is the segmentation of gray matter (GM), white matter (WM) and cerebrospinal fluid (CSF). To segment infant brain tissue images into GM, WM and CSF, Zhang et al. [77] proposed using convolutional networks that combine multi-modal MRI images, namely T1, T2 and fractional anisotropy (FA) images. In this study, four convolutional network architectures were designed according to different input patch sizes. These architectures contained different numbers of convolutional layers and resulting feature maps. To capture the nonlinear mappings between inputs and outputs of each architecture, a local response normalization scheme, fully-connected layers and softmax layers were also applied to these convolutional network architectures. Meanwhile, to segment neonatal brain tissue images into cortical GM (cGM), myelinated WM (mWM), unmyelinated WM (uWM), ventricular CSF (vCSF), extracerebral CSF (eCSF), basal ganglia and thalami (BGT), cerebellum (CB) and brain stem (BS), Moeskops et al. [78] also proposed a convolutional network to automatically segment these tissues. Similar to the convolutional networks previously proposed by Zhang et al. [77], this network also contained multiple convolutional layers and resulting feature maps. In addition, fully-connected layers were used to represent each input patch size, and a single softmax output layer connected these convolutional and fully-connected layers to perform the final segmentation.
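Patch-based segmentation networks like these classify each voxel from a patch centred on it; a minimal sketch of the patch extraction step is shown below, with a random 10×10 array standing in for one image slice and a 5×5 patch size chosen for illustration (the cited studies use their own sizes and 3D/multi-modal inputs).

```python
import numpy as np

def extract_patches(img_slice, size):
    # Collect size x size patches centred on every interior pixel of a
    # 2-D slice; each patch is the classifier input for its centre voxel.
    r = size // 2
    H, W = img_slice.shape
    patches, centers = [], []
    for i in range(r, H - r):
        for j in range(r, W - r):
            patches.append(img_slice[i-r:i+r+1, j-r:j+r+1])
            centers.append((i, j))
    return np.array(patches), centers

rng = np.random.default_rng(6)
img_slice = rng.normal(size=(10, 10))   # stand-in for one MRI slice
patches, centers = extract_patches(img_slice, size=5)
print(patches.shape)                     # one patch per interior pixel
```

A per-voxel classifier (convolutional layers plus a softmax output, as in the studies above) would then map each patch to a tissue label at its centre coordinate.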

2. http://www.isles-challenge.org/
3. http://braintumorsegmentation.org/
4. http://mrbrains13.isi.uu.nl/
5. http://smial.sri.utoronto.ca/LV Challenge/Home.html


Since most brain tumors can affect patients' health and even shorten their life expectancy, automatic and reliable techniques for segmenting brain tumors are required. However, most brain tumors have large spatial and structural variability, which makes them difficult to segment, so automatic and reliable segmentation remains a challenging problem. To address it, many deep learning-based brain tumor segmentation methods have been proposed [80], [81], [82], [83], [84], [85]. For example, Pereira et al. [84] used a convolutional network with small convolutional kernels to segment gliomas, the most common and aggressive brain tumors, in MRI images. They argued that using smaller kernels allows stacking more convolutional layers while obtaining the same results as larger kernels, the purpose being to reduce the number of weights in the convolutional network. In addition, to further improve segmentation performance, both intensity normalization and volumetric constraints were applied to the convolutional network. Later, Havaei et al. [85] also presented a fully automatic brain tumor segmentation method with a convolutional network. Unlike most traditional convolutional networks, the network in this study included three new components: a two-pathway architecture, a cascaded architecture and two-phase training. The two-pathway architecture was used to obtain both local features and global contextual features. The cascaded architecture comprised input concatenation, local pathway concatenation and pre-output concatenation, and was used to exploit the efficiency of the output of a convolutional network. The two-phase training procedure was used to deal with the imbalanced labels of brain tumors in MRI images.

The measurement of the cardiac ventricles (including the left ventricle (LV) and right ventricle (RV)) plays an important role in the clinical assessment of cardiac structure and function, such as ventricular volume, wall thickness and ejection fraction. Therefore, accurate and automatic segmentation of the cardiac ventricles is also necessary. Recently, many deep learning methods have been proposed to segment the cardiac ventricles [86], [87], [88]. For example, to segment the LV from MRI images, Avendi et al. [87] proposed a methodology that combined deep learning architectures and deformable models. The method mainly included three steps: 1) using convolutional networks to estimate the location of the region of interest (ROI) of the LV in MRI images; 2) using stacked auto-encoders to infer the shape of the LV; 3) incorporating the inferred LV shape into deformable models to perform the final segmentation of the LV. Later, Ngo et al. [88] proposed to employ deep learning architectures combined with level sets for the automated segmentation of the LV from MRI images. This method also includes three steps: 1) using a deep belief network to estimate the location of the ROI of the LV in MRI images; 2) using another deep belief network to delineate the endocardial and epicardial borders; 3) incorporating the estimated ROI of the LV and the delineated endocardial and epicardial borders into the distance-regularised level set method to perform the final segmentation of the LV.

In addition to the above tissue segmentation tasks, deep learning segmentation methods for MRI images have also been applied to other targets, such as multiple sclerosis lesions [89], [90], the prostate [91], the striatum [92], tibial cartilage [93], abdominal adipose tissues [94], the anterior visual pathway [95], etc.

3.4 Image Classification

Image classification plays an important role in automatic disease diagnosis and cognitive recognition, such as the classification of patients by disease severity and the recognition of different brain activities. Many deep learning methods have also been proposed to perform image classification tasks on MRI images [96], [97], [98].

3.4.1 Alzheimer's Disease Classification

The automatic diagnosis of Alzheimer's disease (AD) plays an important role in human health, especially for the elderly. Since AD is a neurodegenerative disease with a long incubation period, it is necessary to analyze its symptoms at different stages. Therefore, many researchers have used image classification to perform this analysis, and many deep learning methods have been proposed to classify AD patients of different severity using MRI images [96], [99], [100], [101], [102], [103], [104], [105], [106], [107].

To diagnose AD and its prodromal stage, i.e., mild cognitive impairment (MCI), Suk et al. [96] proposed a deep learning method to find high-level latent and shared features from MRI and positron emission tomography (PET) images. In this study, a statistical significance test was first applied to obtain discriminative patches between classes. A multi-modal deep Boltzmann machine (MM-DBM) was then built to find high-level latent and shared features from the paired patches. In the MM-DBM, a Gaussian restricted Boltzmann machine was trained to transform the paired patches into binary vectors, which were used as the inputs of the MM-DBM. After finding high-level latent and shared features using the paired patches and the trained MM-DBM, an image-level classifier was constructed to perform the final classification. The classifier construction mainly included three steps: 1) patch-level classifier learning; 2) mega-patch construction; 3) ensemble learning. Later, to improve on this classification performance, the same team (Suk et al. [105]) proposed another deep learning method for AD classification. In this study, a stacked auto-encoder was first built to find high-level latent features from the low-level features in MRI, PET and cerebrospinal fluid (CSF). Then, a sparse representation learning method was applied to select the most discriminative features from the high-level latent features in MRI, PET and CSF together with two clinical scores. Finally, a multi-kernel support vector machine was applied to combine the multi-modal features and perform the final classification.
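The patch-binarization step described above can be illustrated with a minimal Gaussian-Bernoulli RBM forward pass. This is a sketch with random, untrained weights for illustration only, not the trained model from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gaussian_rbm_binarize(v, W, b):
    """Map a real-valued patch `v` to a binary hidden vector, as a
    Gaussian-Bernoulli RBM does: p(h=1|v) = sigmoid(W v + b),
    then sample each hidden unit from that Bernoulli probability."""
    p = sigmoid(W @ v + b)
    return (rng.random(p.shape) < p).astype(np.uint8)

patch = rng.standard_normal(16)          # a flattened image patch
W = rng.standard_normal((8, 16)) * 0.1   # untrained weights, for illustration
b = np.zeros(8)
h = gaussian_rbm_binarize(patch, W, b)   # binary vector fed to the MM-DBM
```

In the paper, these binary vectors (not the raw intensities) are what the multi-modal deep Boltzmann machine consumes.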

Other teams have also presented deep learning methods for AD classification. For example, Liu et al. [100] designed a method that combined stacked sparse auto-encoders and a softmax regression layer to diagnose AD and MCI. The stacked sparse auto-encoders were trained to obtain high-level latent features from MRI and PET images, and the softmax regression layer gave the probability of each subject belonging to a class in order to classify all experimental subjects. Subsequently, to improve on this classification performance, Liu et al. [101] also proposed a


multi-phase feature representation learning framework to perform AD classification. Similar to [100], the purpose of the first phase was to obtain high-level latent features from MRI and PET images using stacked auto-encoders. The second phase used low-dimensional biomarkers to optimize the obtained high-level latent features via linear regression. The third phase classified all experimental subjects using a softmax regression layer, as in [100].
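The stacked-encoder-plus-softmax pattern used in both of Liu et al.'s studies can be sketched as follows. The weights here are random and untrained, so this only illustrates the data flow (features in, class probabilities out), not the reported models or results:

```python
import numpy as np

def softmax(z):
    """Softmax regression output: a probability over classes."""
    e = np.exp(z - z.max())
    return e / e.sum()

def encode(x, layers):
    """Pass low-level features through stacked (here untrained) encoders."""
    for W, b in layers:
        x = np.tanh(W @ x + b)
    return x

rng = np.random.default_rng(1)
x = rng.standard_normal(20)                      # low-level features
layers = [(rng.standard_normal((10, 20)) * 0.1, np.zeros(10)),
          (rng.standard_normal((5, 10)) * 0.1, np.zeros(5))]
W_out = rng.standard_normal((3, 5)) * 0.1        # e.g. 3 diagnostic classes
probs = softmax(W_out @ encode(x, layers))       # sums to 1 over classes
```

The softmax layer turns the final latent representation into a per-subject class probability, which is exactly the role it plays in [100] and [101].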

3.4.2 Schizophrenia Classification

Schizophrenia (SCZ) is a complex psychiatric disorder characterized by cognitive deterioration, aberrant sensory perception and disturbed thinking [108], [109]. Patients with SCZ may seem to lose touch with reality, and families and society are affected as well. Many patients with SCZ have difficulty holding a job or caring for themselves, so they rely on others for help. Approximately 8 out of 1,000 individuals develop SCZ in their lifetime. Therefore, automatic diagnosis of SCZ is also necessary, and many deep learning methods have recently been proposed to perform image classification of SCZ [98], [110], [111], [112].

To distinguish patients with SCZ from healthy controls, Pinaya et al. [98] trained a deep neural network that combined a deep belief network and a softmax layer to extract high-level latent features from MRI images. The network was trained in two steps: pre-training with the deep belief network and supervised fine-tuning with the softmax layer. The pre-trained network was used to find high-level latent features from brain morphometry data, and the softmax layer was used to refine the pre-trained network through supervised fine-tuning and to perform the final classification.

Later, addressing the same SCZ classification problem, Kim et al. [111] presented a deep neural network with multiple hidden layers and a softmax layer to obtain high-level latent features from low-level features (i.e., functional connectivity) derived from MRI images. To further improve the accuracy of SCZ classification, both L1-norm regularization and a stacked auto-encoder were incorporated into the network. The L1-norm regularization was used to control the weight sparsity in each hidden layer, and the stacked auto-encoder was used to pre-train the weights of the network for initialization.
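The L1-norm weight-sparsity control mentioned above can be illustrated with a soft-thresholding (proximal) step, which is one standard way an L1 penalty is applied during training; whether Kim et al. used exactly this update is an assumption, so treat the sketch as generic:

```python
import numpy as np

def l1_prox(W, lam):
    """Proximal (soft-thresholding) step for an L1 penalty: shrink every
    weight toward zero and set weights smaller than `lam` exactly to zero."""
    return np.sign(W) * np.maximum(np.abs(W) - lam, 0.0)

rng = np.random.default_rng(2)
W = rng.standard_normal((4, 4))          # a dense layer's weights
W_sparse = l1_prox(W, lam=1.0)           # many entries become exactly zero
sparsity_before = float(np.mean(W == 0.0))
sparsity_after = float(np.mean(W_sparse == 0.0))
```

The exact zeros are the point: an L1 penalty does not merely shrink weights, it prunes them, which is how the sparsity level of each hidden layer can be controlled.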

3.4.3 Brain Activity Classification

In general, different external stimuli correspond to different brain activities, and different brain activities exhibit different functional brain images [113], [114]. Therefore, image classification plays an important role in identifying different brain activities, and many deep learning methods have recently been proposed for this task [97], [115], [116], [117].

To identify different brain activities, including emotion, gambling, language, motor, relational, social and working-memory tasks, Koyamada et al. [97] trained a feed-forward deep neural network on fMRI images. The network included multiple hidden layers and a softmax layer: the hidden layers were used to obtain high-level latent features, and the softmax layer was used to calculate the probability of each subject belonging to a class. In addition, minibatch stochastic gradient descent, dropout [118] and principal sensitivity analysis [119] were incorporated into the network to improve the final classification performance.
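The two training aids just mentioned, minibatch SGD and dropout, fit in a few lines of NumPy. This is a generic sketch (the inverted-dropout rescaling shown here is a common convention, assumed rather than taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(activations, rate=0.5):
    """Inverted dropout: zero each unit with probability `rate`, then
    rescale the survivors so the expected activation is unchanged."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

def sgd_step(w, grad, lr=0.01):
    """One minibatch stochastic-gradient-descent update."""
    return w - lr * grad

h = np.ones(10)                      # a layer's activations
h_drop = dropout(h, rate=0.5)        # each entry is now 0.0 or 2.0
w = np.array([1.0, 2.0])
w = sgd_step(w, grad=np.array([0.5, -0.5]))   # -> [0.995, 2.005]
```

Dropout randomly disables units at training time to discourage co-adaptation, while minibatch SGD updates the weights from small random batches rather than the full dataset.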

Recently, to classify different sensorimotor tasks, including left-hand clenching, right-hand clenching, auditory attention and visual stimulus, Jang et al. [117] employed fully connected feed-forward deep neural networks with multiple hidden layers. In this study, a deep belief network with restricted Boltzmann machines was pre-trained and used to initialize the weights of the fully connected feed-forward networks. Then, a back-propagation algorithm was used to fine-tune the network while controlling weight-sparsity levels across the hidden layers.

In addition to the above three classification tasks, deep learning classification methods for MRI images have also been applied to other fields, such as ADHD classification [120], [121], age prediction [122], [123], [124], stroke diagnosis [125], emotional response prediction [126], discrimination of cerebellar ataxia types [127], etc.


Deep learning is a complex technology, and implementing the deep learning architectures described above requires considerable time and effort. Fortunately, in recent years many deep learning tools have been developed, as shown in Table 1. These tools are convenient for researchers and thus promote the application of deep learning architectures. Several common and widely used tools are briefly introduced below.

Caffe is not only the first major industry-grade deep learning tool, but also the most popular tool in the field of computer vision. Caffe is an open-source deep learning framework created by Yangqing Jia and developed jointly by Berkeley AI Research (BAIR), the Berkeley Vision and Learning Center (BVLC) and community contributors. The advantages and weaknesses of Caffe are as follows:


• Advantages:
  ◃ Fast running. As shown on its official website, Caffe can process over 60M images per day on a single NVIDIA K40 GPU.
  ◃ Specializes in image processing.
  ◃ Existing networks can be fine-tuned directly.
  ◃ Models can be trained directly without writing any code.
  ◃ Supports Python as an application program interface (API).
• Weaknesses:
  ◃ Layer-based network structure; scalability is poor, and new layers require writing code.
  ◃ Too many extensions and dependencies, making the framework increasingly bloated.
Torch is a scientific computing framework that supports many machine learning algorithms. Its main development languages are C and Lua. Several large technology companies, such as Facebook and Twitter, are pushing this framework. The advantages and weaknesses of Torch are as follows:


• Advantages:
  ◃ Fast running and good flexibility.
  ◃ Optimized basic computing units; easy to write new layers and run on GPUs.
  ◃ Includes many common computational models based on Lua.
  ◃ Includes many pre-trained models.
• Weaknesses:
  ◃ Lua is hard to learn and use.
  ◃ Layer-based network structure; scalability is poor, and new layers require writing code.
  ◃ Does not support Python as an API.
Theano was built and developed at the Université de Montréal starting in 2008. Its main development language is Python, and several Python deep learning packages, such as Pylearn2 and Keras, are derived from it. Theano was the first framework to describe models using symbolic tensor graphs. The advantages and weaknesses of Theano are as follows:

• Advantages:
  ◃ Good flexibility; well suited to academic research.
  ◃ Good support for recurrent networks and language modeling.
  ◃ Many high-level deep learning packages, such as Keras and Pylearn2.
  ◃ Good portability.
• Weaknesses:
  ◃ Slow compilation.
  ◃ Code is difficult for developers to modify.
  ◃ Few pre-trained models.
TensorFlow was created by Google to replace Theano for conducting machine learning and deep neural network research; in fact, the two frameworks are quite similar. TensorFlow is also an open-source software library that uses data-flow graphs to implement numerical computation. Each node in the data-flow graph represents a mathematical operation, and each edge represents a multidimensional data array (i.e., a tensor) passed between operations. TensorFlow can run on multiple platforms, such as one or more CPUs or GPUs, mobile devices and servers. The advantages and weaknesses of TensorFlow are as follows:

• Advantages:
  ◃ High-quality meta-frameworks.
  ◃ Supports multiple GPUs.
  ◃ Faster compilation than Theano.
  ◃ Rapid development of new networks.
  ◃ Supports distributed training.
  ◃ Good portability.
• Weaknesses:
  ◃ Slower running, and requires more memory.
  ◃ Few pre-trained models.
  ◃ No support for dynamic convolution inputs or for convolution over time series.
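The node/edge description of a data-flow graph can be mimicked with a toy evaluator. This is an illustrative re-implementation of the idea in plain Python, not TensorFlow's actual API: operations are nodes, and each node pulls its input tensors from its predecessor nodes when evaluated.

```python
import numpy as np

class Node:
    """A node in a toy data-flow graph: an operation plus its input edges."""
    def __init__(self, op, *inputs):
        self.op, self.inputs = op, inputs

    def eval(self):
        # Recursively evaluate predecessors, then apply this node's operation.
        return self.op(*(n.eval() for n in self.inputs))

def constant(value):
    """A leaf node that simply emits a tensor."""
    return Node(lambda: np.asarray(value))

# Graph for (a + b) * c, evaluated lazily like a data-flow graph.
a, b, c = constant([1.0, 2.0]), constant([3.0, 4.0]), constant(2.0)
mul = Node(np.multiply, Node(np.add, a, b), c)
result = mul.eval()
```

Separating graph construction from evaluation is what lets frameworks like TensorFlow optimize the graph and distribute it across CPUs, GPUs and servers before any number is computed.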

Based on the deep learning tools mentioned above, several deep learning tools tailored to MRI images have also been developed in recent years, such as DeepMedic, BrainNet, LiviaNET and resnet_cnn_mri_adni.

• BrainNet: This tool is developed based on TensorFlow, and aims to train deep neural networks to segment gray matter and white matter of the brain in MRI images.

• DeepMedic: This tool is developed based on Theano, and aims to train multi-scale 3D convolutional neural networks for brain lesion segmentation, such as brain tumors, traumatic brain injuries and ischemic stroke lesions. Moreover, this tool has shown excellent performance on brain lesion segmentation tasks, winning the ISLES 2015 competition.

• LiviaNET: This tool is also developed based on Theano, and aims to train 3D fully convolutional neural networks to segment subcortical brain structures in MRI images.

• resnet_cnn_mri_adni: This tool is developed to train residual and plain convolutional neural networks to perform AD classification in MRI images.

For more details about these deep learning tools, see the links and references in Table 1.


In summary, the goal of this article is to provide researchers with valuable insights into how to apply deep learning architectures in MRI-based research. To the best of our knowledge, this is the first review of the applications of deep learning architectures to MRI images. As Section 3 shows, deep learning architectures have been widely used in MRI image processing and analysis, including image detection, image registration, image segmentation and image classification, and can achieve better results than traditional methods.

Although many researchers have turned their attention to deep learning applied to MRI images and have obtained relatively good results, many problems and challenges remain to be solved. In particular, the two main problems and challenges are as follows:

• Limited dataset size and class imbalance: In general, the larger the dataset, the better deep learning performs. However, since MRI acquisition processes are usually complex and expensive, the size of MRI datasets is limited in many applications. Furthermore, for privacy reasons, many MRI images (especially disease-related ones) are hard to obtain. Therefore, MRI datasets are often small. In addition, if the training MRI images are class-imbalanced, it is often very difficult to obtain suitable deep neural


TABLE 1
A list of deep learning tools

Name                 Link
DeepLearnToolbox     https://github.com/rasmusbergpalm/DeepLearnToolbox
Caffe                http://caffe.berkeleyvision.org/
Theano               http://deeplearning.net/software/theano
Pylearn2             http://deeplearning.net/software/pylearn2/
Keras                https://github.com/EderSantana/keras
CNTK                 https://www.microsoft.com/en-us/research/product/cognitive-toolkit/
MXNet                https://github.com/dmlc/mxnet
Chainer              –
Deeplearning4j       https://deeplearning4j.org/
SINGA                http://www.comp.nus.edu.sg/~dbsystem/singa/
MatConvNet           http://www.vlfeat.org/matconvnet/
Brainstorm           https://github.com/IDSIA/brainstorm
maxDNN               https://github.com/eBay/maxDNN
BrainNet             https://github.com/kaspermarstal/BrainNet
DeepMedic            https://github.com/Kamnitsask/deepmedic
LiviaNET             https://github.com/josedolz/LiviaNET
resnet_cnn_mri_adni  https://github.com/neuro-ml/resnet_cnn_mri_adni

References: [128]–[142], [68], [143]–[145]

networks. At present, the two main strategies for alleviating these problems are as follows:

  ◃ Sampling: Sampling, which includes oversampling and undersampling, is widely used to enlarge and rebalance MRI datasets. Oversampling generates new MRI images from the existing ones, and undersampling selects a subset of the existing MRI images.
  ◃ Pre-training: When the size of an MRI dataset is limited, an unsupervised pre-training operation can help to prevent over-fitting and produce more regularized results. Therefore, pre-training is also widely applied to deal with limited dataset size and class imbalance. In general, the pre-training operation is followed by a fine-tuning operation.
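Random oversampling, the simpler of the two sampling strategies, can be sketched as follows (the `oversample` helper and the toy data are ours, for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)

def oversample(images, labels):
    """Random oversampling: draw minority-class samples with replacement
    until every class matches the size of the largest class."""
    labels = np.asarray(labels)
    counts = {c: int(np.sum(labels == c)) for c in np.unique(labels)}
    target = max(counts.values())
    idx = []
    for c in counts:
        cls_idx = np.flatnonzero(labels == c)
        idx.extend(rng.choice(cls_idx, size=target, replace=True))
    idx = np.array(idx)
    return images[idx], labels[idx]

# 5 "diseased" vs 2 "healthy" toy scans (7 flattened 4-pixel "images").
X = np.arange(7)[:, None] * np.ones((1, 4))
y = np.array([1, 1, 1, 1, 1, 0, 0])
Xb, yb = oversample(X, y)   # both classes now have 5 samples
```

In practice, oversampling for images usually also applies augmentation (rotations, flips, intensity shifts) so the duplicated minority samples are not identical copies.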

• Choosing a suitable deep learning architecture and its hyperparameters for a particular application: Since the advantages and weaknesses of each deep learning architecture are so far only roughly understood by most researchers, choosing a suitable architecture to obtain good results for a particular application is a problem. Even when an architecture has been chosen, how to set its hyperparameters (such as the number of layers, weight initialization values, number of hidden units, etc.) is also a problem. At present, most researchers address both questions based on experimental experience. Therefore, choosing the most suitable deep learning architecture and its hyperparameters for a particular application is not only an urgent problem to be solved, but also a great challenge for the future.
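In practice, "experimental experience" often takes the form of a grid search over candidate hyperparameters. A minimal sketch follows, with a toy scoring function standing in for "train a network and report validation accuracy" (all names here are hypothetical):

```python
import itertools

def grid_search(train_and_score, grid):
    """Score every hyperparameter combination and keep the best one."""
    best_score, best_params = float("-inf"), None
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = train_and_score(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Toy scoring function: peaks at 3 layers and 64 hidden units.
def toy_score(p):
    return -abs(p["num_layers"] - 3) - abs(p["hidden_units"] - 64) / 64

grid = {"num_layers": [2, 3, 4], "hidden_units": [32, 64, 128]}
best, score = grid_search(toy_score, grid)
```

Grid search is expensive (the number of combinations grows multiplicatively), which is why random search and Bayesian optimization are common alternatives when each training run is costly.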

With the continuous advance of medical big data, the size of MRI datasets will no longer be a problem. Moreover, as the understanding of deep learning architectures deepens, choosing a suitable architecture and its hyperparameters for a particular application will also no longer be a problem. It is reasonable to expect that deep learning applications in MRI images will bring more remarkable achievements in the near future.


The authors would like to express their gratitude for the support from the National Natural Science Foundation of China under Grants No. 61232001, No. 61420106009 and No. 61622213.
