Increasing Performance of Signature-Based Intrusion Detection

Info: 8202 words (33 pages) Dissertation
Published: 3rd Feb 2022


Abstract

Cyber-Physical Systems (CPS) are advanced intelligent systems that consist of networked or distributed computational elements, sensors and actuators that control physical entities and mechanisms. Nowadays, CPS have attracted much attention due to their vast applications. The precursor generation of CPS can be found in a variety of areas including aerospace, industrial infrastructures, health care, transportation, energy, Supervisory Control And Data Acquisition (SCADA) and autonomous automobile systems. The development of safe CPS needs a deep understanding of the potential impacts of successful malicious cyber-attacks. CPS security-related concerns include attacker’s efforts to intercept information captured by sensors and manipulate rules sent to actuators to disrupt, defeat and eventually cause the system to fail. Since there are physical actuators included in CPS, the damages could be vital as in autonomous automobile systems.

With the increasing utilization of CPS in infrastructural and vital parts of countries and controlling of significant industrial processes and applying rules based on the perception of sensors from the environment, securing the CPS has become essential. One of the techniques that are used in these systems is signature-based intrusion detection. One of the main problems in signature-based intrusion detection techniques is difficulty in maintaining the dictionary of attacks due to the requirement of memory size to keep all the signatures. The second issue is the low speed in detecting the attacks since packets need to be checked by each signature in the dictionary of attacks one by one.

This thesis proposes and evaluates a method to increase the speed and performance of signature-based intrusion detection and, as a result, increase CPU availability. As the first step, the least valuable information is found and removed from the attack dictionary. The dictionary is then divided into two sub-dictionaries based on the most numerous attacks, and finally it is classified in a decision tree. Removing the least valuable information and searching for rules inside the decision tree reduces processing time. In addition, because the probability of each rule’s occurrence is increased whenever an incoming packet matches a signature, accuracy improves compared to the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is simulated in Python on a data set gathered from real-world networks (KDD-99). The results demonstrate improved performance and resource availability.

Keywords: Cyber-Physical Systems, Intrusion Detection, Signature, Framework, Frequent Attack Dictionary, Entropy, Support Vector.

Chapter 1: Introduction

The term Cyber-Physical Systems is introduced as a research model and mechanism that connects computing, communications and control. Although the exact definition is difficult due to the wide domain of these systems, CPS can be generally depicted as physical engineered systems that monitor, control and integrate their operations into a computational and communicational core. In other words, CPS are important intelligent systems consisting of computing elements, network elements and distributed sensors and actuators, which control the physical entities.

Security issues in CPS including attackers’ efforts to intercept and manipulate the data captured from the sensors and the instructions sent to the operators can disrupt and ultimately defeat the system. The importance of securing CPS is becoming more sensitive considering the rising use of CPS in vital and critical sectors, infrastructures and sensitive industrial processes.

CPS applications likely will affect the Information Technology revolution of the twentieth century. Examples of CPS include wide range of large-scaled engineered systems such as automated control systems in aviation, energy conservation, medical systems with high reliability in the field of health care, distributed robots (like telemedicine and defense systems), help with daily living (Assisted living), traffic and safety control, SCADA, advanced automotive systems, transportation, automation and smart grid systems. In all these systems, the perfect solution for the complex interaction between different physical and computational elements is very important.

In recent decades, with the wide growth of computers and Internet, too many security issues have been raised in CPS. The number of networks and their applications and security threats is increasing every day. At the same time, the creation of one hundred percent secure personal computers without weaknesses and failures from the technical point is impossible. Therefore, research on intrusion detection systems (IDS) for CPS is being pursued with great interest. One of the intrusion detection techniques that has gathered so much interest is signature-based intrusion detection. This technique is based on a set of rules and signatures in a data set or dictionary which define malicious packet and files. Each rule corresponds with one known threat.

One of the major benefits of signature-based intrusion detection is its accuracy and low false positive rate. Its disadvantage is that performance depends heavily on the rules, and choosing the wrong set of rules might lead to instability in process control. In that case, organizations might experience denial of service, information theft and disruption of decision-making, all of which affect CPS and may cause financial damage and even serious human injury. Another disadvantage of signature-based IDSs is that each received packet must be checked against all individual rules one by one. This increases the time required for detection and, given limited resources, complicates maintenance.

This thesis proposes a method to increase the speed and performance of signature-based intrusion detection. First, the least valuable information is found and removed from the attack dictionary. The enhanced value dictionary is then divided into two sub-dictionaries based on the most numerous attacks, and finally it is classified in a decision tree. Removing the least valuable information and searching for rules inside the decision tree reduces processing time. When an incoming packet matches a signature, the probability of that rule’s occurrence is increased. These improvements increase accuracy compared to the traditional ID3 (Iterative Dichotomiser 3) method. The proposed method is evaluated using Quality of Service parameters including accuracy, false positives and false negatives.

1.1. Purpose and organization of the thesis

In this thesis, a method for signature-based intrusion detection is proposed. The proposed method is simulated using Python 3.5 and KDD99 data set and the simulation results are analyzed to evaluate the performance of the method. The development of the proposed method is based on exploration and evaluation of existing intrusion detection techniques and frameworks and analysis of their shortcomings.

The thesis is organized into seven chapters. Chapter 2 introduces the main concepts that this thesis relies on: intrusion detection techniques and their comparison. Chapter 3 provides an overview of CPS and their security frameworks. Chapter 4 reviews the background of the chosen intrusion detection technique. Chapter 5 presents the main contributions of this work, namely the frequent attack dictionary decision tree for advanced signature-based intrusion detection and the design of the simulation experiment. This chapter is followed by analysis of the simulation results and discussion in chapter 6. Finally, chapter 7 presents the conclusions and outlines possible future research.

Chapter 2: Signature-Based Intrusion Detection

Signature-based IDSs are very effective for known attacks. Since these systems are fast and easy to install, they can start working immediately. A signature-based IDS analyzes each packet and compares its content with the dictionary of known attacks. Sometimes normal packets are mistaken for attacks (false positives), but this does not occur often. These systems generate easy-to-understand reports and label each packet as normal or as one class of attack.

Although signature-based IDSs are efficient for known attacks, their problem is that they are not able to find zero-day attacks. Hackers use zero-day attacks to attack many systems before administrators adapt their organizations’ IDSs [22]. For this reason, a signature-based IDS should be updated continuously. Attack reports should be collected from all over the world as soon as a new attack is detected. Security engineers should then analyze the attack and develop a solution for defending against it. The solution should be distributed to all the subsystems, and IDS/IPS systems should be updated accordingly. However, the first subsystem that was attacked is already compromised and may have been damaged.

Figure 4-1 illustrates the detection process in signature-based techniques. This type of intrusion detection analyzes packets’ features and tries to find a match in stored dictionaries that record all the known attacks. Some articles call this technique misuse detection or pattern-based detection. A low false positive rate is the most important advantage of signature-based IDS.
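The one-by-one comparison described above can be sketched as follows. This is only an illustration under simple assumptions: a signature is modelled as a tuple of feature values plus an attack label, and a match means exact equality. The function name and the sample values are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical signature layout: (tuple of feature values, attack label).
Signature = Tuple[Tuple[str, ...], str]


def match_linear(packet: Tuple[str, ...],
                 dictionary: List[Signature]) -> Tuple[Optional[str], int]:
    """Compare the packet with every signature in turn.

    Returns the label of the first matching signature (or None) together
    with the number of comparisons that were needed.
    """
    comparisons = 0
    for features, label in dictionary:
        comparisons += 1
        if features == packet:
            return label, comparisons
    return None, comparisons


# Illustrative values only (protocol, service, flag, source bytes).
dictionary = [
    (("icmp", "ecr_i", "SF", "20"), "satan"),
    (("icmp", "ecr_i", "SF", "18"), "ipsweep"),
    (("icmp", "ecr_i", "SF", "1480"), "pod"),
]

label, cost = match_linear(("icmp", "ecr_i", "SF", "1480"), dictionary)
print(label, cost)  # pod 3
```

A packet that matches nothing costs one comparison per signature, which is the worst case that motivates the tree-based search discussed later.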

This technique reacts to known malicious behavior. In other words, it defines a node as a good node if it does not exhibit any attack signatures. The most significant issue here is creating an efficient attack database. If a signature is too large it consumes too much memory, and if it is not detailed enough accuracy is reduced. Signature-based IDSs are more efficient and accurate at detecting outsider attacks than other IDS techniques.

Figure ‎4‑1. Signature-based intrusion detection process

2.1. Decision tree based intrusion detection

One of the disadvantages in signature-based IDS is that it drops many packets as it does not have enough time to check each packet with all the rules in the attack dictionary. To solve this problem many researchers have used decision tree search methods. SQL injection attacks are examined in [31] and the decision tree is used for their detection. In [31] incoming HTTP requests are filtered using the tree.

A decision tree is a classification algorithm in data mining and its fundamental algorithm is called ID3 (Iterative Dichotomiser 3) [39]. This algorithm builds a tree based on the given classified data and each data is recognized and defined with its features’ values. Classification in decision trees is done in a reverse order and the main challenge is to define the key features for the nodes. Each node in the tree shows a feature from signatures and the process is done when all the signatures are registered in the tree. The leaves in the tree show ending of each connection and are labeled with a type of attack. These trees are capable of working with large amount of data which is an advantage for CPS because there is plenty of traffic flow in CPS. Besides, the high performance in decision trees makes them a good option for the real-time systems. Figure 4-2 shows a small part of a decision tree. In this Figure, the destination node is decided based on the value of source port.
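To make the choice of key features for the nodes more concrete, the sketch below computes the information gain that ID3 uses to pick the feature to split on. The rows, labels and function names are illustrative assumptions, not taken from the thesis simulation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, col):
    """Entropy reduction obtained by splitting the rows on column `col`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[col], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


# Toy data: column 0 is the protocol, column 1 is the source byte count.
rows = [("icmp", "20"), ("icmp", "1480"), ("tcp", "20"), ("tcp", "1480")]
labels = ["satan", "pod", "satan", "pod"]

# ID3 places the feature with the highest gain at the current node.
best = max(range(2), key=lambda c: information_gain(rows, labels, c))
print(best)  # 1
```

In this toy data the protocol column says nothing about the label, while the byte count separates the two classes completely, so it is chosen as the node feature.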

Figure ‎4‑2. Sample of a small branch in decision tree

Decision tree accuracy and their ease of construction is another benefit which makes them a suitable choice for IDS in CPS. Researchers in [30] proposed a classification method using machine learning and simulated the method based on KDD data set. This algorithm has a similar function but it tries to find the attack with the minimum comparison. Although their results demonstrate a good performance improvement, the problem of having a huge amount of data in the dictionary remains unsolved.

2.2. Intrusion detection with filtering mechanism

Nowadays, wireless sensor networks have many security concerns. Sensor networks which are the main part of the critical infrastructures such as CPS require strong security mechanisms. These systems are typically developed in a critical environment which is very vulnerable to attacks. Traditional security methods including encryption, VPN, authentication and firewall are not adequate since they just examine external threats. Therefore, many organizations employ different IDSs to overcome this issue. It is an important step to decide which type of IDS is the best based on the organization’s architecture, size and finance. It should be considered that not every company has enough resources to afford too expensive IDSs.

In signature-based IDS the quality of security depends on the quality of the signatures in the dictionary. However, having a lot of information in the dictionary consumes plenty of resources. Furthermore, each event will be logged and each comparison will record some warning in the system. Recording too many warnings and log files is another problem in signature-based IDS which makes it hard to analyze all the information later.

Researchers in [29] have proposed a new type of signatures which combines traditional signatures with contextual information from the network. They also have defined a Hash function which drops unimportant and uncritical warnings. In their proposed method, each signature is an ordered pair as (CI, Sig) which CI contains the contextual information of the network and Sig shows the related signature. They could drop the warnings by 66.1% filtering rate.

2.3. Learning based intrusion detection

As noted earlier, signature-based technique can detect an attack if and only if there is a matching signature built, tested and developed for it. In many signature-based IDSs, most of the signatures should be extracted manually which is very time-consuming and it gets more probable to have errors. Therefore, the quality of the system becomes dependent on experts that have registered the signatures manually.

To solve this problem, researchers in [28] explored ideas from ecology and proposed a modified supervised learning classifier system algorithm (UCSm). This algorithm builds dynamically adjustable signatures to detect intrusions using a supervised classification system. The classification system is an online parallel rule-based system. The method is simulated using the KDD data set and reaches an intrusion detection accuracy of 92.03%, an improvement of 9% over its base method (UCS).

Some other articles have used data mining to overcome the mentioned issue. A data mining network-based approach is proposed and evaluated in [26] to build signatures. This approach which is called Signature Apriori employs both network information and protocols and learns the signatures based on an environment attacked using many different methods of cyber-attacks. This approach compares normal and malicious packets and registers the characteristic of malicious ones.

Figure 4-3. Signature Apriori system: a data mining network-based approach for intrusion detection.

Figure 4-3 demonstrates the Signature Apriori system. This system learns signatures using four main entities, namely the packet sensor, the signature miner, the rule set and the associated signature miner. The packet sensor captures packets from the network and sends them to the signature miner. The signature miner recognizes and builds probable signatures based on Signature Apriori. The associated signature miner filters the signatures, and the final filtered signatures are recorded in the rule set [26].

2.4. Time-based intrusion detection

Analyzing the timing of events can be a great help for intrusion detection. Time analysis can be dynamic, static or hybrid. Dynamic analysis examines incoming packets immediately, at processing time. In contrast, static analysis examines data over specific periods of time. A sudden change within a period can be a sign of a potential attack.
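As a rough illustration of the static, period-based view, the sketch below counts events per fixed time window and flags a window whose count jumps sharply compared to the previous one. The window width, the threshold factor and the timestamps are all assumptions made for the example.

```python
from collections import Counter


def window_counts(timestamps, width):
    """Number of events in each consecutive window of `width` seconds."""
    counts = Counter(int(t // width) for t in timestamps)
    last = max(counts) if counts else -1
    return [counts.get(w, 0) for w in range(last + 1)]


def sudden_changes(counts, factor=3.0):
    """Indices of windows with more than `factor` times the previous count.

    `factor` is a hypothetical threshold; a real system would tune it.
    """
    return [i for i in range(1, len(counts))
            if counts[i] > factor * max(counts[i - 1], 1)]


# Illustrative arrival times in seconds: a quiet period, then a burst.
timestamps = [0.5, 3.2, 7.9, 10.1, 10.2, 10.4, 10.9,
              11.3, 11.8, 12.5, 13.0, 14.7, 15.2]

counts = window_counts(timestamps, width=10)
print(counts)                  # [3, 10]
print(sudden_changes(counts))  # [1]
```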

2.5. Parallel intrusion detection

This section explains the difference between distributed and parallel IDS. A distributed IDS employs a distributed monitoring system at different points of the network. The main purpose of these systems is to improve detection quality. In contrast, a parallel IDS analyzes incoming traffic concurrently (incoming traffic is split between multiple systems). These systems focus on speed and parallelism of processes.

There are two general types of parallelism in IDS: data parallelism and function (performance) parallelism. In the former, data is split between different systems so that each system has a different part of the data. In the latter, the same data is sent to different functions.
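The difference between the two types can be sketched with a thread pool. The two detector functions and the packet fields below are hypothetical stand-ins; a real IDS would run full rule sets instead.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_large(packets):
    """Hypothetical detector: flags oversized payloads."""
    return [p["id"] for p in packets if p["size"] > 1000]


def detect_icmp(packets):
    """Hypothetical detector: flags ICMP traffic."""
    return [p["id"] for p in packets if p["proto"] == "icmp"]


packets = [
    {"id": 1, "proto": "icmp", "size": 1480},
    {"id": 2, "proto": "tcp", "size": 60},
    {"id": 3, "proto": "icmp", "size": 20},
    {"id": 4, "proto": "tcp", "size": 1500},
]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Data parallelism: each worker receives a different half of the
    # traffic and runs the same function on it.
    halves = [packets[:2], packets[2:]]
    data_parallel = list(pool.map(detect_large, halves))

    # Function parallelism: every function receives the same traffic.
    futures = [pool.submit(f, packets) for f in (detect_large, detect_icmp)]
    function_parallel = [f.result() for f in futures]

print(data_parallel)      # [[1], [4]]
print(function_parallel)  # [[1, 4], [1, 3]]
```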

Some articles, such as [25], use parallelism to improve signature-based detection methods’ productivity. In [25] researchers utilize two Snort systems and divide the traffic between them to improve systems’ performance. As a result, the accuracy and efficiency are improved and fewer packets are dropped. Since this method demands a considerable number of strong and high-priced processors, it is not efficient in large-scale infrastructure.

Chapter 3: Proposed Method

While the number of critical CPS in different areas is increasing, improving their security becomes more serious and important. On one hand, according to the earlier chapters, it was concluded that the signature-based intrusion detection method is one of the best methods in precision which makes it more suitable for critical infrastructures such as CPS. Signature-based methods are recognized as an efficient technique and they are being used in so many infrastructures and real-time systems such as CPS because of their low percentage of error rate. On the other hand, they suffer from two important facts or restrictions:

The fact that each incoming packet should be compared and checked by each of the rules in the attack dictionary till one of the rules matches. This will potentially decrease system performance and speed and it consumes too many resources.

The fact that this method will not detect new (zero-day) attacks.

Regarding the second restriction, many studies have proposed combining signature-based and anomaly-based detection techniques. This thesis focuses on the first restriction. The most obvious problem arising from it is the difficulty of maintaining the dictionary of attacks, because a large amount of memory is needed to keep all the signatures. The second issue is the low speed of detecting attacks, since packets need to be checked against each signature in the dictionary one by one. To address this, unnecessary data in the dictionary, which makes little difference when comparing two incoming packets, should be removed. This turns the normal dictionary into an enhanced value dictionary and reduces resource consumption.

To increase speed and reduce the need for large amounts of memory and other resources, the enhanced dictionary can also be divided into two sub-dictionaries based on the most numerous attacks and classified in a decision tree. When a packet finds no match in the frequent (most numerous) dictionary, the connection is allowed, and if the second dictionary then finds a match, the connection is terminated as soon as possible. To remove unnecessary data from the original dictionary, two different methods are used:

Support Vector
Entropy

Using these two methods the least valuable columns of data were found and discarded from the database. Figure 5-1 shows the conceptual diagram of the proposed method explaining the steps which have been taken. A comprehensive explanation of the proposed method is given in this chapter. Finding unnecessary data, building the frequent dictionary, making the decision tree and employed data set and technology for simulation are discussed in this chapter.

Figure ‎5‑1. Conceptual diagram of proposed method.

3.1. Data used for the simulation

In this simulation, an offline mode data set called KDD99 is used which is based on the DARPA data set. DARPA provided KDD99 from audit data that have been collected from intrusions against a real-world network environment. This data set is one of the most common data sets from 1999 and it is used in many intrusion detection simulations and evaluations. The data set is recorded based on network traffic for seven weeks and has about 4 GB data that may illustrate five million connections. Each connection is labeled as one specific attack group or as normal. Each vector has 41 features [32] which all are listed in Table 5-1.
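A record with 41 features and one label can be parsed as sketched below. The sketch assumes the commonly distributed comma-separated KDD99 layout, in which the label is the last field and ends with a period; the sample record is synthetic and the function name is hypothetical.

```python
NUM_FEATURES = 41


def parse_record(line):
    """Split one comma-separated record into (features, label).

    Assumes 41 feature fields followed by a label such as "pod.".
    """
    fields = line.strip().split(",")
    if len(fields) != NUM_FEATURES + 1:
        raise ValueError("expected %d fields, got %d"
                         % (NUM_FEATURES + 1, len(fields)))
    features = fields[:NUM_FEATURES]
    label = fields[NUM_FEATURES].rstrip(".")
    return features, label


# Synthetic record: duration, protocol, service, flag, then placeholder
# zeros for the remaining 37 features, then the label.
sample = ",".join(["0", "icmp", "ecr_i", "SF"] + ["0"] * 37 + ["pod."])

features, label = parse_record(sample)
print(len(features), label)  # 41 pod
```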

Table ‎5‑1. Features of the DARPA data set

3.2. Enhanced valued dictionary

Figure 5-2 shows some samples of signature-based intrusion detection rules in the data set. Each sample in the data set stands for a network connection and each column shows one of the features of that connection. Distinguishing the most valuable data is critical in the intrusion detection process. Removing data of little value makes the detection process faster and easier.

S1 satan. 0 icmp ecr_i SF 20 0 0 1 1 0.00 0.00 0.00 0.00 1.00 0.00 0.00 255 1 0.00 0.02 0.02 0.00 0.00 0.00 0.00 0.00

S2 ipsweep. 0 icmp ecr_i SF 18 0 0 1 1 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1 1 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00

S3 pod. 0 icmp ecr_i SF 1480 0 0 2 2 0.00 0.00 0.00 0.00 1.00 0.00 0.00 2 2 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00

S4 satan. 0 icmp ecr_i SF 20 0 0 1 1 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1 1 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00

S5 nmap. 0 icmp ecr_i SF 8 0 0 0 1 1 0.00 0.00 0.00 0.00 1.00 0.00 0.00 1 1 1.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00

S6 pod. 0 icmp ecr_i SF 1480 0 0 1 1 3 0.00 0.00 0.00 0.00 1.00 0.00 0.67 1 3 1.00 0.00 1.00 0.67 0.00 0.00 0.00 0.00

S7 pod. 0 icmp ecr_i SF 1480 0 0 1 1 3 0.00 0.00 0.00 0.00 1.00 0.00 0.67 1 5 1.00 0.00 1.00 0.60 0.00 0.00 0.00 0.00

Figure ‎5‑2. A sample of signature-based intrusion detection rules in database

3.2.1. Selecting valuable features

Designing and implementing an ideal and practical defense system is difficult. Before a defense method can be provided, the parameters that are effective in detecting an attack need to be known. In building a frequent dictionary for signature-based intrusion detection, it is very important to find the most valuable data, that is, the data that gives the most information about the connection. Support Vector and Entropy are used as selection methods and are explained in this section.

3.2.1.1. Support Vector

The complexity of the data can be reduced if the key features are distinguished. This helps to have better efficiency and productivity in intrusions detection systems [33]. One of the metrics that can help finding the key features is Support Vector function. When using linear kernels, the formula of the function of decision-making would be as follows [33]:

$F(X) = \langle V, X \rangle + b$ (5-1)

The point X is predicted to be in class A or “positive class” if the F(X) is positive and class B or “negative class” if F(X) is negative. The formula above can be modified to expand the dot product of V and X.

$F(X) = \sum_i V_i X_i + b$ (5-2)

The formula shows that the value of F(X) depends on each factor’s contribution, ViXi. The sign of Vi shows whether a factor contributes toward attacks (negative) or normal connections (positive), since Xi can take only values Xi ≥ 0.

Size of Vi represents the contribution strength. This means that if Vi is a large positive number the related feature is a key feature in that class. Calculating all Vi for all features and sorting them shows the least important ones.

Key Feature = Max (F_col(X)) col = {col1, col2, …, col41} (5-3)
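The decision function and the ranking step can be sketched in a few lines. The weights, bias and feature names below are invented for illustration, and ranking by absolute weight is an assumption about how contribution strength is compared; in practice V and b would come from a trained linear support vector model.

```python
def decision(V, X, b):
    """Linear decision function F(X) = sum_i V_i * X_i + b."""
    return sum(v * x for v, x in zip(V, X)) + b


def rank_by_weight(V, names):
    """Feature names with weights, sorted from weakest to strongest."""
    return sorted(zip(names, V), key=lambda item: abs(item[1]))


# Hypothetical weights for four of the 41 features.
names = ["duration", "src_bytes", "land", "count"]
V = [0.5, -2.0, 0.01, 1.2]
b = -0.5

X = [1.0, 0.2, 0.0, 0.5]       # one connection with non-negative values
score = decision(V, X, b)      # 0.5 - 0.4 + 0.0 + 0.6 - 0.5, about 0.2
predicted = "positive class" if score > 0 else "negative class"

ranked = rank_by_weight(V, names)
least_important = ranked[0][0]
print(predicted, least_important)  # positive class land
```

Features at the start of the ranked list are the candidates for removal from the dictionary.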

3.2.1.2. Entropy

Entropy is a metric to evaluate system disorder. System disorder shows the number of states that system has. Entropy shows system uncertainty and can illustrate distributions in each feature. There are different forms of Entropy but one of them has been used in IDS which is called Shannon Entropy. Its formula for two values is as follow:

(5-4)	$\mathrm{Entropy}(S) = -p_1 \log_2(p_1) - p_2 \log_2(p_2)$
(5-5)	$H = -\sum_{i=1}^{N} p_i \log_2(p_i)$

H shows the Entropy or in other words, the distribution of N probable outcomes and p_i is the i_th outcome frequency. Entropy is minimum (equal to zero) when there is just one probable outcome. On the other hand, Entropy will be maximum if all probable outcomes happen the same number of times. If for each feature more values are possible to happen, then the Entropy increases.

To summarize, Entropy has a direct relation with the amount of information that can be achieved from each feature. If the feature is giving more information, its H would increase and show that it is a key feature and it can help better in attack detection.
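A minimal sketch of this per-feature measurement is given below. The column values are illustrative (the source byte values mirror the first four sample rules in Figure 5-2), and the helper name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """H = -sum(p_i * log2(p_i)) over the observed values of one column."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


# Illustrative columns from four connections.
columns = {
    "land": ["0", "0", "0", "0"],                  # a single outcome
    "src_bytes": ["20", "18", "1480", "20"],       # three outcomes
    "label": ["satan", "ipsweep", "pod", "nmap"],  # all outcomes differ
}

H = {name: shannon_entropy(values) for name, values in columns.items()}

# Columns sorted from least to most informative.
ranking = sorted(H, key=H.get)
print(ranking)  # ['land', 'src_bytes', 'label']
```

The constant column has zero entropy and the column whose four outcomes are equally frequent reaches the maximum of 2 bits, matching the minimum and maximum cases described above.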

3.2.2. Key features

Using Support Vector and Entropy, the most valuable features and the features that gave no valuable information were identified. The least important ones were then removed from the dictionary, since their presence would only increase memory and CPU usage and slow down the detection process. The removed features are shown in Table 5-2.

Table ‎5‑2. Unvalued columns in the attack dictionary

No.	Feature
7	land
12	logged_in
14	root_shell
15	su_attempted
20	num_outbound_cmds
21	is_host_login
22	is_guest_login

3.2.3. Sub-dictionaries

So far, the most valuable data have been selected and the seven columns of the dictionary that contain data of little value have been discarded. Removing 7 of the 41 columns decreases memory consumption by 17.07 percent. Now the enhanced value dictionary should be divided into two sub-dictionaries. The most numerous rules are kept in the frequent dictionary and the rest are moved to the second sub-dictionary. This process, which provides parallel signature-based intrusion detection, is shown in more detail in Figure 5-3.
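The arithmetic and the split can be sketched as follows. The percentage assumes that every column occupies the same amount of memory. The parameter `top_k`, which decides how many of the most numerous attack labels go into the frequent dictionary, is a hypothetical tuning knob, and the rules are shortened to two illustrative features.

```python
from collections import Counter

# Share of columns removed: 7 of the 41 features.
reduction = round(100 * 7 / 41, 2)
print(reduction)  # 17.07


def split_dictionary(rules, top_k=1):
    """Split rules into (frequent, rest) by how often each label occurs."""
    counts = Counter(label for _, label in rules)
    frequent_labels = {label for label, _ in counts.most_common(top_k)}
    frequent = [rule for rule in rules if rule[1] in frequent_labels]
    rest = [rule for rule in rules if rule[1] not in frequent_labels]
    return frequent, rest


def lookup(packet, frequent, rest):
    """Search the frequent dictionary first, then the remaining rules."""
    for name, rules in (("frequent", frequent), ("rest", rest)):
        for features, label in rules:
            if features == packet:
                return label, name
    return None, None


# Illustrative rules: (protocol, source bytes) and attack label.
rules = [
    (("icmp", "20"), "satan"),
    (("icmp", "18"), "ipsweep"),
    (("icmp", "1480"), "pod"),
    (("tcp", "20"), "satan"),
    (("icmp", "8"), "nmap"),
    (("udp", "1480"), "pod"),
    (("tcp", "1480"), "pod"),
]

frequent, rest = split_dictionary(rules)
print(len(frequent), len(rest))                 # 3 4
print(lookup(("udp", "1480"), frequent, rest))  # ('pod', 'frequent')
```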

Figure ‎5‑3. Part of proposed method

3.3. Decision tree based on the frequent dictionary

Decision trees are mostly used as data structures. A tree is a set of nodes that are connected to each other without loops; in other words, it is a connected acyclic graph. The most important property of decision trees is that searching them is fast compared to searching a linked list. In this thesis, the decision trees are not restricted to binary trees, and each node’s degree can be two or more. A sample of these decision trees is shown in Figure 5-4. The search process starts from the root and moves toward the leaves. Each non-leaf node is a characteristic feature (a column in the database). The probability of each rule’s occurrence is increased when an incoming packet matches a signature, which increases accuracy compared to the traditional ID3 (Iterative Dichotomiser 3) algorithm.
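A multi-way tree of this kind can be sketched with nested dictionaries, one level per feature column. This is a simplified illustration rather than the thesis implementation: the columns are used in a fixed order, the reserved key `__leaf__` is an assumption, and a plain hit counter stands in for the increased occurrence probability of a matched rule.

```python
def build_tree(rules):
    """Insert every (features, label) rule as a root-to-leaf path."""
    root = {}
    for features, label in rules:
        node = root
        for value in features:
            # Each distinct value opens its own branch (degree >= 2 allowed).
            node = node.setdefault(value, {})
        node.setdefault("__leaf__", {"label": label, "hits": 0})
    return root


def classify(tree, packet):
    """Walk from the root toward a leaf; return the label or None."""
    node = tree
    for value in packet:
        node = node.get(value)
        if node is None:
            return None
    leaf = node.get("__leaf__")
    if leaf is None:
        return None
    leaf["hits"] += 1  # this rule has been matched once more
    return leaf["label"]


# Illustrative rules: (protocol, service, source bytes) and attack label.
rules = [
    (("icmp", "ecr_i", "20"), "satan"),
    (("icmp", "ecr_i", "18"), "ipsweep"),
    (("icmp", "ecr_i", "1480"), "pod"),
]

tree = build_tree(rules)
print(classify(tree, ("icmp", "ecr_i", "1480")))  # pod
print(classify(tree, ("tcp", "http", "0")))       # None
```

A lookup costs one dictionary access per feature, independent of how many rules share the same prefix.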

Figure ‎5‑4. Decision tree structure

The decision tree learning algorithm works as a greedy top-down search. The basic algorithm is called Concept Learning System (CLS), which was presented in 1950. In 1986 Ross Quinlan presented an updated version of the algorithm, ID3, in his paper on the induction of decision trees. In this work, an extended version of the Entropy formula is developed to calculate the probability of a signature’s occurrence.

$H_i = -\frac{n}{N} \sum p_i \log_2\left(\frac{n}{N}\right)$ (5-6)

H_i shows i_th leaf’s Entropy, p_i shows the branch’s Entropy and n shows the number of times that signatures have been matched with an incoming packet. N is the number of all the signatures in the dictionary. With calculating this formula, the value of nodes that have been matched more can be increased. If an incoming packet is not completely matched with leaves, it will be compared with the nearby leaves with the highest Entropy.

∆H = H_{i + 1} – H_i (5-7)

If ∆H is greater than or equal to zero, the packet is assigned to the class of node i+1; if it is less than zero, the packet is detected as the type of node i.
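The comparison rule of Equation (5-7) amounts to the small decision below. The entropy values are hypothetical placeholders; how they are obtained from Equation (5-6) is not reproduced here.

```python
def pick_node(entropies, i):
    """Return i + 1 when H_{i+1} - H_i >= 0, otherwise i."""
    delta_h = entropies[i + 1] - entropies[i]
    return i + 1 if delta_h >= 0 else i


# Hypothetical entropies of three neighbouring leaves.
leaf_entropies = [0.40, 0.55, 0.30]

print(pick_node(leaf_entropies, 0))  # 1, because 0.55 - 0.40 >= 0
print(pick_node(leaf_entropies, 1))  # 1, because 0.30 - 0.55 < 0
```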

Chapter 4: Analysis of Simulation Results

In this chapter, the Enhanced Dictionary IDS method proposed in the previous chapter is evaluated. The ID3 algorithm is simulated on the same data set (KDD) with all of its 41 features, and its results are compared with those of the proposed method.

4.1. Simulation

For simulating the method, Python 3.5 is used. Simulation environment’s characteristics are shown in Table 6-1.

Table ‎6‑1. Simulation environment’s characteristics

Operating System	Windows 8 Pro
Programming language	Python 3.5
Processor	Core™ i5-3210M 2.50 GHz

4.2. Building the decision tree

In this section, the processing times for building the tree and for analyzing a packet are compared, and the results are presented in graphs. Figure 6-1 shows the processing time for building the decision tree and Figure 6-2 shows the processing time for each packet in ID3 compared to the proposed method. The results illustrate that, for any number of rules, the elapsed time in the proposed method is less than in the ID3 algorithm.

Figure ‎6‑1. Time process comparison for building the decision tree

Figure ‎6‑2. Processing time comparison for each packet in ID3 and proposed method

Figure 6-3 shows CPU usage while building the decision tree and Figure 6-4 illustrates CPU usage while analyzing each packet in ID3 and in the proposed method. The results show that the proposed method uses less CPU both for building the tree and for searching through the tree to detect attacks.

Figure ‎6‑3. CPU comparison while building the decision tree in ID3 and proposed method

Figure ‎6‑4. CPU usage comparison during packet analysis in ID3 and proposed method

4.3. Accuracy comparison

In this section, the accuracy of ID3 [39], UCSm [28] and the proposed method is compared on the KDD99 data set, using the evaluation metrics given in previous chapters. As mentioned earlier, the accuracy of an intrusion detection algorithm is defined as the ratio of its correct detections, of both attacks and normal packets, to all incoming connections. The results show that removing too much information from the data set has a negative effect on accuracy. Therefore, each system needs to determine the threshold on the number of rules stored in the data set that gives optimal results. Figure 6-5 shows the accuracy comparison based on the number of rules in the data set.
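The accuracy definition, together with the false positive and false negative rates, can be written out as below. The formulas are the standard confusion-matrix definitions, and the counts are hypothetical, chosen only to make the arithmetic easy to follow; they are not results from the simulation.

```python
def detection_rates(tp, tn, fp, fn):
    """Return (accuracy, false positive rate, false negative rate) in percent.

    tp: attacks detected as attacks     tn: normal detected as normal
    fp: normal flagged as attacks       fn: attacks missed
    """
    total = tp + tn + fp + fn
    accuracy = 100 * (tp + tn) / total
    false_positive_rate = 100 * fp / (fp + tn)
    false_negative_rate = 100 * fn / (fn + tp)
    return accuracy, false_positive_rate, false_negative_rate


# Hypothetical counts for 1,000 connections.
accuracy, fpr, fnr = detection_rates(tp=450, tn=500, fp=20, fn=30)
print(accuracy)        # 95.0
print(round(fpr, 2))   # 3.85
print(fnr)             # 6.25
```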

Figure 6-5 illustrates the accuracy of ID3 and the proposed method for various numbers of rules in the dictionary. The simulations show that with fewer than 50,000 rules the proposed method is less accurate than the ID3 algorithm. With an appropriate number of rules (about 70,000), the proposed method is 1.64% more accurate than the ID3 algorithm. Table 6-2 compares the average accuracy of ID3, UCSm and the proposed method.

Figure ‎6‑5. Accuracy comparison based on number of the rules in the data set

Table ‎6‑2. Comparison of average accuracy between ID3, UCSm and the proposed method

Algorithm	Accuracy (%)	Error (%)
Proposed Method	95.069	4.931
ID3 [39]	93.903	6.097
UCSm [28]	92.03	7.97

Chapter 5: Conclusion

CPS are important intelligent systems consisting of computing elements, network elements and distributed sensors and actuators, which monitor and control the physical entities. Nowadays CPS are being used in many various domains including health care systems, assisted living, advanced automotive systems, traffic control and safety, energy conservation, environmental control, critical infrastructure (e.g. power, water), robotics and manufacturing. Failure in the security of CPS can leave irreparable damages. Their success not only depends on attack detection accuracy but also on the time of detection and the resource requirements. If the detection time is longer than expected, the attackers can successfully achieve their goal.

Signature-based IDSs are very effective at detecting known attacks. They are quick and easy to install and can therefore start working immediately. A signature-based system analyzes each packet and compares its content with the dictionary of known attacks. These systems generate easy-to-understand reports and label each packet either as normal or as a specific type of attack.

Although signature-based IDSs are efficient against known attacks, their main weakness is that they cannot detect zero-day attacks. Attackers can use a zero-day attack against many systems before administrators update their IDS. For this reason, a signature-based IDS must be updated continuously. This type of intrusion detection analyzes packet features and looks for a match in the stored dictionaries, which record all known attacks.

The most significant issue in this area is building an efficient attack database. If a signature is too large, it consumes too much memory; if it is not detailed enough, accuracy suffers.

The main benefit of this class of IDS is a low false-positive rate. Its key drawback is that a specific pattern must be found in a huge dictionary. This significantly reduces detection speed, since each packet must be checked against every signature in the dictionary of attacks individually.
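
To make the cost of one-by-one matching concrete, the following minimal sketch scans a small signature list linearly and counts the comparisons performed. It assumes a signature is a set of required feature values plus a label; this representation, the function name and the three sample signatures are illustrative assumptions (they borrow KDD-99 feature and label names) and are not the dictionary format used in the thesis.

```python
def match_linear(packet, signatures):
    """Compare the packet with each signature in turn.

    Returns (label, comparisons). A packet that matches no signature is
    labelled "normal" only after the whole dictionary has been scanned.
    """
    comparisons = 0
    for required, label in signatures:
        comparisons += 1
        if all(packet.get(name) == value for name, value in required.items()):
            return label, comparisons
    return "normal", comparisons


# Hypothetical three-entry dictionary (illustrative values only).
signatures = [
    ({"protocol_type": "icmp", "service": "ecr_i"}, "smurf"),
    ({"protocol_type": "tcp", "flag": "S0"}, "neptune"),
    ({"protocol_type": "udp", "wrong_fragment": 3}, "teardrop"),
]

attack_packet = {"protocol_type": "tcp", "service": "http", "flag": "S0"}
benign_packet = {"protocol_type": "tcp", "service": "http", "flag": "SF"}

attack_result = match_linear(attack_packet, signatures)
benign_result = match_linear(benign_packet, signatures)
print(attack_result, benign_result)
```

In this sketch a benign packet always costs as many comparisons as there are signatures, so the work per packet grows linearly with the dictionary size.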

This thesis proposes a method to reduce resource consumption, increase detection speed and improve the accuracy of intrusion detection in signature-based IDS. To increase speed and reduce memory and resource requirements, the dictionary size is reduced by keeping only the most valuable features. The dictionary is then divided into two sub-dictionaries based on the most numerous attacks and organized as a decision tree. Processing time is reduced because unimportant rules are eliminated. In addition, the probability of matching signatures with incoming packets is increased. These improvements raise the achieved accuracy compared to the traditional ID3 (Iterative Dichotomiser 3) algorithm. The proposed method is simulated in Python on the KDD-99 data set, which was gathered from real-world networks. Simulation results demonstrate that the proposed method outperforms the ID3 algorithm in speed, resource requirements and accuracy. The major features of this work are as follows:

For intrusion detection, 34 features were used in the attack dictionary.
Low-value data identified by the Support Vector method and entropy is removed from the dictionary, which reduces the dictionary size by 17.07 percent.
The dictionary is divided into two dictionaries that categorize attacks as frequent or infrequent based on their number of occurrences.
To avoid comparing each packet with every rule in the dictionary and to increase performance, a decision tree search algorithm is used.
Simulation results for the proposed approach on the KDD-99 data set demonstrate improvements in speed and CPU usage. Compared to ID3, better accuracy is achieved with 50,000 or more rules in the frequent dictionary.
Constructing the tree used 33.78 percent of the CPU, and analyzing each connection used 10.075 percent.
The average accuracy of the proposed method exceeds that of ID3 by 1.16 percentage points (95.069% versus 93.903%).
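
Two of the steps listed above can be sketched in a few lines: ranking a feature by the entropy-based information gain that ID3 uses, and splitting the dictionary into frequent and infrequent attacks. This is a minimal sketch under stated assumptions, not the thesis implementation: it omits the Support Vector step, the `top_k` cut-off is a hypothetical stand-in for whatever frequency criterion the thesis applies, and the sample records are invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, feature):
    """ID3 information gain of one feature over (features, label) records."""
    labels = [label for _, label in records]
    groups = {}
    for features, label in records:
        groups.setdefault(features[feature], []).append(label)
    remainder = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def split_by_frequency(records, top_k):
    """Split records into those of the top_k most common labels and the rest."""
    counts = Counter(label for _, label in records)
    frequent_labels = {label for label, _ in counts.most_common(top_k)}
    frequent = [r for r in records if r[1] in frequent_labels]
    infrequent = [r for r in records if r[1] not in frequent_labels]
    return frequent, infrequent


# Hypothetical miniature dictionary (illustrative values only).
records = [
    ({"protocol_type": "icmp", "land": 0}, "smurf"),
    ({"protocol_type": "icmp", "land": 0}, "smurf"),
    ({"protocol_type": "icmp", "land": 0}, "smurf"),
    ({"protocol_type": "tcp", "land": 0}, "neptune"),
    ({"protocol_type": "tcp", "land": 0}, "neptune"),
    ({"protocol_type": "udp", "land": 0}, "teardrop"),
]

gain_protocol = information_gain(records, "protocol_type")
gain_land = information_gain(records, "land")  # constant feature, carries no information
frequent, infrequent = split_by_frequency(records, top_k=2)
print(gain_protocol, gain_land, len(frequent), len(infrequent))
```

In this toy data the constant `land` column has zero gain and would be a candidate for removal, while `protocol_type` separates the labels completely. A decision tree built on the frequent sub-dictionary would then test the highest-gain features first.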

5.1. Directions for the future work

The main unresolved concern regarding the proposed Enhanced Dictionary IDS method is its inability to detect unknown attacks. Although attack detection and resource requirements have been improved, the dictionaries must be updated as soon as a new attack is recognized. Another problem with the proposed approach is finding the optimal dictionary size without compromising accuracy. Eliminating too many rules from the frequent dictionary increases speed but reduces accuracy. To address these problems, a combination of the proposed Enhanced Dictionary IDS and anomaly detection techniques is recommended, so that a new hybrid method can benefit from the advantages of both.

References

[1] Park, K. J., Zheng, R., & Liu, X., Cyber-physical systems: Milestones and research challenges, in Computer Communications, vol. 36, no. 1, p. 1-7, 2012.

[2] Ten, C. W., Liu, C. C., & Manimaran, G., Vulnerability assessment of cybersecurity for SCADA systems, IEEE Transactions on Power Systems, vol. 23, no. 4, p. 1836-1846, 2008.

[3] Lee, I., & Sokolsky, O., Medical cyber-physical systems. in Design Automation Conference (DAC), 47th ACM/IEEE, p. 743-748, Las Vegas, Nevada, USA, 2010.

[4] Lee, E. A., & Seshia, S. A., Introduction to Embedded Systems, A Cyber-Physical Systems Approach, Second Edition, MIT Press, ISBN 978-0-262-53381-2, Berkeley, California, USA, 2017.

[5] Wang, E. K., Ye, Y., Xu, X., Yiu, S. M., Hui, L. C. K., & Chow, K. P., Security issues and challenges for cyber-physical system. in Proceedings of the 2010 IEEE/ACM Int’l Conference on Green Computing and Communications & Int’l Conference on Cyber, Physical and Social Computing, p. 733-738, Hangzhou, China, 2010.

[6] Shi, J., Wan, J., Yan, H., & Suo, H., A survey of cyber-physical systems. in Wireless Communications and Signal Processing (WCSP), 2011 International Conference on IEEE, p. 1-6, Nanjing, China, 2011.

[7] Mikusz, M., Towards an understanding of cyber-physical systems as industrial software-product-service systems. Procedia CIRP (Cooperative Institutional Research Program), vol. 16, p. 385–389, 2014.

[8] Hu, L., Xie, N., Kuang, Z., & Zhao, K., Review of cyber-physical system architecture. in Object/Component/Service-Oriented Real-Time Distributed Computing Workshops (ISORCW), 2012 15th IEEE International Symposium on IEEE, p. 25-30, Shenzhen, Guangdong, China, 2012.

[9] Huang, H. M., Tidwell, T., Gill, C., Lu, C., Gao, X., & Dyke, S., Cyber-physical systems for real-time hybrid structural testing: a case study. in Proceedings of the 1st ACM/IEEE international conference on cyber-physical systems, p. 69-78, Stockholm, Sweden, 2010.

[10] Tan, Y., Goddard, S., & Perez, L. C., A prototype architecture for cyber-physical systems. ACM Sigbed Review in International Conference of ACM Special Interest Group on Embedded Systems, vol. 5, no. 1, p. 26-28, 2008.

[11] Hoang, D. D., Paik, H. Y., & Kim, C. K., Service-oriented middleware architectures for cyber-physical systems. International Journal of Computer Science and Network Security, vol. 12, no. 1, p. 79-87, 2012.

[12] Innella, P., The evolution of intrusion detection systems, SecurityFocus, 2001.

[13] Liao, H. J., Lin, C. H. R., Lin, Y. C., & Tung, K. Y., Intrusion detection system: A comprehensive review. Journal of Network and Computer Applications, vol. 36, no. 1, p. 16-24, 2013.

[14] Abraham, A., & Thomas, J., Distributed intrusion detection systems: a computational intelligence approach. in Applications of Information Systems to Homeland Security and Defense, IGI Global, p. 107-137, 2006.

[15] Modi, C., Patel, D., Borisaniya, B., Patel, H., Patel, A., & Rajarajan, M., A survey of intrusion detection techniques in cloud. Journal of Network and Computer Applications, vol. 36, no. 1, p. 42-57, 2013.

[16] Stiawan, D., Shakhatreh, A. I., Idris, M. Y., Bakar, K. A., & Abdullah, A. H., Intrusion Prevention System: A Survey. Journal of Theoretical and Applied Information Technology, vol. 40, no. 1, p. 44-54, 2012.

[17] Available from: CERT, http://www.cert.org/statsS [Online], Accessed June 2016.

[18] Mukkamala, S., Sung, A., & Abraham, A. S. A., Cyber security challenges: Designing efficient intrusion detection systems and antivirus tools., Vemuri, V. Rao., Enhancing Computer Security with Smart Technology, Auerbach, New Mexico Tech, USA, p. 125-163, 2005.

[19] Farag, M.M., Architectural enhancements to increase trust in cyber-physical systems containing untrusted software and hardware. PhD thesis, Virginia Polytechnic Institute and State University, Blacksburg, Virginia, 2012.

[20] Goel, R., Sardana, A., & Joshi, R. C., Parallel Misuse and Anomaly Detection Model. in International Journal of Network Security, vol. 14, no. 4, p. 211-222, 2012.

[21] Mitchell, R., & Chen, R., A survey of intrusion detection in wireless network applications. In Computer Communications, vol. 42, p. 1–23, 2014.

[22] Mitchell, R., & Chen, R., A survey of intrusion detection techniques for cyber-physical systems. In ACM Computing Surveys (CSUR), vol. 46, no. 4, p. 55, 2014.

[23] Mitchell, R., & Chen, R., Effect of intrusion detection and response on reliability of cyber-physical systems. In IEEE Transactions on Reliability, vol. 62, no. 1, p. 199-210, 2013.

[24] Foo, B., Wu, Y. S., Mao, Y. C., Bagchi, S., & Spafford, E., ADEPTS: Adaptive intrusion response using attack graphs in an e-commerce environment. In Dependable Systems and Networks, 2005. DSN. Proceedings. International Conference on IEEE, p. 508–517, Yokohama, Japan, June, 2005.

[25] Shiri, F. I., Shanmugam, B., & Idris, N. B., A parallel technique for improving the performance of signature-based network intrusion detection system. In Communication Software and Networks (ICCSN), 2011 IEEE 3rd International Conference on IEEE, p. 692-696, Xi’an, China, 2011.

[26] Han, H., Lu, X. L., & Ren, L. Y., Using data mining to discover signatures in network-based intrusion detection. In Machine Learning and Cybernetics. Proceedings. 2002 International Conference on IEEE, vol. 1, p. 13-17, 2002.

[27] Uppuluri, P., & Sekar, R., Experiences with specification-based intrusion detection. In RAID ’00 Proceedings of the 4th International Symposium on Recent Advances in Intrusion Detection, vol. 2212, p. 172-189, Berlin Heidelberg, Germany, 2001.

[28] Shafi, K., & Abbass, H. A., An adaptive genetic-based signature learning system for intrusion detection. In International Journal of Expert Systems with Applications, vol. 36, no. 10, p. 12036-12043, 2009.

[29] Meng, Y., & Kwok, L. F., Adaptive non-critical alarm reduction using hash-based contextual signatures in intrusion detection. In Computer Communications, vol. 38, p. 50-59, 2014.

[30] Kruegel, C., & Toth, T., Using decision trees to improve signature-based intrusion detection. In International Workshop on Recent Advances in Intrusion Detection, RAID 2003. Lecture Notes in Computer Science, Springer, vol. 2820, p. 173-191, Berlin Heidelberg, Germany, 2003.

[31] Hanmanthu, B., Ram, B. R., & Niranjan, P., SQL Injection Attack prevention based on decision tree classification. In Intelligent Systems and Control (ISCO), 2015 IEEE 9th International Conference on IEEE, p. 1-5, Coimbatore, India, January 2015.

[32] Tao, Z., & Ruighaver, A. B., Wireless Intrusion Detection: Not as easy as traditional network intrusion detection. In TENCON 2005 IEEE Region 10, p. 1-5, Melbourne, Queensland, Australia, November 2005.

[33] Vapnik, V., The nature of statistical learning theory. Springer science & business media, second edition, Springer-Verlag, ISBN: 978-1-4419-3160-3, New York, 1995.

[34] Zhou, J., Chen, Z., & Jiang, W., Probability based IDS towards secure WMN. In Intelligent Systems and Applications (ISA), 2nd International Workshop on IEEE, p. 1-4, Wuhan, China, May 2010.

[35] Legrand, I., Newman, H., Voicu, R., Cirstoiu, C., Grigoras, C., Dobre, C., & Stratan, C., MonALISA: An agent based, dynamic service system to monitor, control and optimize distributed systems. In Computer Physics Communications, vol. 180, no. 12, p. 2472-2498, 2009.

[36] El Kalam, A. A., Deswarte, Y., Baïna, A., & Kaaniche, M., PolyOrBAC: a security framework for critical infrastructures. In International Journal of Critical Infrastructure Protection, vol. 2, no. 4, p. 154-169, 2009.

[37] Ten, C. W., Manimaran, G., & Liu, C. C., Cybersecurity for critical infrastructures: Attack and defense modeling. In IEEE Transactions on Systems, Man and Cybernetics-Part A: Systems and Humans, vol. 40, no. 4, p. 853-865, 2010.

[38] Kalam, A. A. E., Baida, R. E., Balbiani, P., Benferhat, S., Cuppens, F., Deswarte, Y., & Trouessin, G., Organization based access control. In Policies for Distributed Systems and Networks. Proceedings. POLICY 2003. IEEE 4th International Workshop on IEEE, p. 120-131, Lake Como, Italy, 2003.

[39] Kumar, M., Hanumanthappa, M., & Kumar, T. S., Intrusion detection system using decision tree algorithm. in Communication Technology (ICCT), 2012 IEEE 14th International Conference on IEEE, p. 629-634, Chengdu, China, 2012.

[40] Han, S., Xie, M., Chen, H. H., & Ling, Y., Intrusion detection in cyber-physical systems: Techniques and challenges. In IEEE Systems Journal, vol. 8, no. 4, p. 1052-1062, 2014.

[41] Al-Nashif, Y., Kumar, A. A., Hariri, S., Luo, Y., Szidarovsky, F., & Qu, G., Multi-level intrusion detection system (ml-ids). In Autonomic Computing, 2008. ICAC’08. International Conference on IEEE, p. 131-140, Chicago, Illinois, USA, 2008.