Data Mining Algorithm Techniques

Published: 22nd Oct 2021

Tagged: Computer Science, Cyber Security

Introduction

Phishing is a serious and growing problem, with the number of victims increasing every year. It is an illegal act in which social engineering and technical tools are used to steal users' personal information, such as usernames and passwords (Manning & Aron 2015).

Phishing emails are classified as spam messages. Users receive emails that claim to be from a reputable corporation or bank and instruct them to click on an embedded link. The user is then redirected to a bogus website that asks for personal details, including usernames, passwords, and credit card numbers (Al-Momani and Gupta 2013).

The phishing life cycle is shown in Figure 1.1. The attack starts by sending emails to the inboxes of the targeted persons in an effort to get them to click on a link contained in the text. In that way, online phishing is similar to conventional fishing: instead of using bait and a line to catch a fish, the phisher sends out as many emails as possible in the hope that as many recipients as possible will take the bait and click on the embedded link (Al-Momani and Gupta 2013).

Phishers use one of two methods to accomplish their objectives: deceptive phishing or malware-based phishing. The first method relies on social-engineering techniques. Emails carry false links that appear to come from a legitimate company or bank and guide the recipient to a bogus website that asks for private information such as usernames, passwords, and credit card numbers.

Malware-based phishing, by contrast, does not explicitly ask for information. It relies on malicious code, malware, and technical schemes that run when users click on the embedded link, or it exploits security vulnerabilities in the recipients' devices to obtain their online account information directly. The phisher may also attempt to redirect the user to a fake website, or to a legitimate website that is monitored through proxies (Al-Momani, 2013).

In 2012, an online study stated that phishing attacks had resulted in losses of $1.5 billion. These losses and the associated risk are on the rise, which calls for more effective techniques for identifying phishing emails in order to limit the damage and minimise the danger (Akinyelu, 2014).

To label an email as phishing or legitimate, phishing identification techniques extract values for a predefined set of features from each analysed email. The resulting feature vector is then classified by a trained model (Figure 1.3).
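The extract-then-classify flow described above can be sketched roughly as follows. This is a minimal illustration and not the model developed in this study: the three feature names, the weights, and the bias are hypothetical placeholders standing in for a trained model.

```python
import math

# Hypothetical feature order; the study's real feature set is not shown here.
FEATURE_NAMES = ["num_links", "has_dear_word", "subject_has_verify"]

# Placeholder parameters standing in for a trained model (assumed values).
WEIGHTS = [0.8, 1.2, 1.5]
BIAS = -2.0


def to_vector(features: dict) -> list:
    """Turn extracted feature values into a fixed-order numeric vector."""
    return [float(features.get(name, 0)) for name in FEATURE_NAMES]


def phishing_score(vector: list) -> float:
    """Logistic score in (0, 1); higher means more phishing-like."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, vector))
    return 1.0 / (1.0 + math.exp(-z))


def classify(features: dict, threshold: float = 0.5) -> str:
    """Label an email from its extracted feature values."""
    score = phishing_score(to_vector(features))
    return "phishing" if score >= threshold else "legitimate"
```

In a real system the weights would come from training on labelled emails rather than being written by hand.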

1.2 Problem Statement

Phishing is a method of obtaining personal information for the purpose of identity fraud by sending fraudulent emails that claim to be from reputable companies. The emails appear to originate from a trustworthy source so that the attacker can obtain access to a person's personal and sensitive details.

Phishing emails are the most popular cyber crime tool for stealing personal financial information and committing identity theft. Users who respond to phishing emails by entering the requested financial or personal information into emails, websites, or pop-up windows put themselves and their organisations at risk.

According to the Microsoft Consumer Safety Index, the annual global impact of phishing emails is $5 billion. Repairing their effect, on the other hand, would cost $6 billion (MCSI reveals the impact of poor online safety behaviours in Singapore, 2014).

Despite extensive research into phishing email detection, no single group of features has been identified as the most effective in detecting phishing. Likewise, no single classification algorithm has been established as the best choice. Finally, there is a continuing need to improve the precision of detection techniques. The following are the major issues addressed in this study:

How to choose the most appropriate set of features for phishing detection.

How to use the most appropriate classification algorithm for phishing detection.

How to improve the performance of the best selected features and classifiers.

How to combine various classification algorithms for phishing detection and assess their effectiveness.

1.3 The Study's Objectives

The aim of this study is to compare various data mining classification algorithms, as well as different feature selection scenarios (manually selected and automatically selected feature groups). The aim also includes the development of a multi-classifier integration model that combines clustering and multiple classification techniques to improve phishing email identification and protection.

The following are the goals of this study:

Determine and compare the best sets of features for phishing email identification, using manual feature selection based on the structure of the email and automatic selection techniques.

Combine unsupervised machine-learning techniques with the best supervised machine-learning algorithms to improve phishing detection.

Design a method that integrates several classification algorithms for phishing email detection, and evaluate the integration to determine the best classification approach for phishing detection.
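One simple way to integrate several classifiers is majority voting, sketched below. The source does not say which integration strategy the thesis adopts, so voting is an assumption made for illustration. The three rule functions are hypothetical stand-ins for trained classifiers, and breaking ties in favour of "phishing" is likewise an assumed design choice.

```python
from collections import Counter
from typing import Callable, Dict, List

Classifier = Callable[[Dict[str, int]], str]


def rule_long_url(features: Dict[str, int]) -> str:
    """Hypothetical stand-in classifier: flags emails with a long URL."""
    return "phishing" if features.get("long_url") else "legitimate"


def rule_sub_verify(features: Dict[str, int]) -> str:
    """Hypothetical stand-in classifier: flags 'verify' in the subject."""
    return "phishing" if features.get("sub_verify") else "legitimate"


def rule_dear_word(features: Dict[str, int]) -> str:
    """Hypothetical stand-in classifier: flags a generic 'dear' greeting."""
    return "phishing" if features.get("body_dear_word") else "legitimate"


RULES: List[Classifier] = [rule_long_url, rule_sub_verify, rule_dear_word]


def majority_vote(classifiers: List[Classifier], features: Dict[str, int]) -> str:
    """Return the label chosen by most classifiers."""
    if not classifiers:
        raise ValueError("at least one classifier is required")
    votes = Counter(clf(features) for clf in classifiers).most_common()
    # Assumed policy: on a tie, prefer the cautious label.
    if len(votes) > 1 and votes[0][1] == votes[1][1]:
        return "phishing"
    return votes[0][0]
```

Evaluating such an ensemble against each individual classifier on the same test set would show whether the integration actually helps.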

1.4 Motivation

The negative consequences of phishing include attackers gaining access to users' personal information, which can cause financial losses for users and lock them out of their own accounts. This research therefore evaluates and ranks phishing email features in order to detect phishing emails and reduce their likelihood of success.

In addition, this research will compare classifiers, data mining algorithms, manually selected feature groups, and automatically selected feature groups. Special attention is paid to header-based features such as sub-reply and sub-verify, as well as content-based (body) features such as the body suspension word, the dear word, and long URL addresses, in order to pick those that perform best for our research. In addition, the integration of classification and clustering will be introduced to improve identification accuracy.
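As a rough illustration of the header-based and content-based features named above, the sketch below extracts five binary features with Python's standard `email` package. The exact definitions are assumptions, since the text gives only the feature names: `sub_reply` is taken to mean that the subject starts with "Re:", `sub_verify` that the subject contains "verify", and the 75-character threshold for a long URL is a hypothetical value.

```python
import re
from email import message_from_string

URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def _plain_body(msg) -> str:
    """Concatenate the text/plain parts of a message."""
    parts = msg.walk() if msg.is_multipart() else [msg]
    texts = []
    for part in parts:
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            if payload is not None:
                charset = part.get_content_charset() or "utf-8"
                texts.append(payload.decode(charset, errors="replace"))
    return "\n".join(texts)


def extract_features(raw_email: str, long_url_threshold: int = 75) -> dict:
    """Return assumed binary header and body features for one email."""
    msg = message_from_string(raw_email)
    subject = (msg.get("Subject") or "").strip().lower()
    body = _plain_body(msg).lower()
    urls = URL_PATTERN.findall(body)
    return {
        # Header-based features (assumed definitions)
        "sub_reply": int(subject.startswith("re:")),
        "sub_verify": int("verify" in subject),
        # Content-based features (assumed definitions)
        "body_dear_word": int(re.search(r"\bdear\b", body) is not None),
        "body_suspension_word": int("suspension" in body),
        "long_url": int(any(len(u) > long_url_threshold for u in urls)),
    }
```

Each email then becomes a small dictionary of 0/1 values that a classifier can consume.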

1.5 Scope and Limitations

The aim of this study is to identify phishing emails, so 47 features were chosen and grouped into five categories that cover all email components. Furthermore, five data mining classification algorithms were used to detect phishing emails: LR, DT, OneR, SMO, and Naive Bayes.

As a limitation, this research does not cover phishing websites, and the experiments do not cover all available classification algorithms. Instead, the study experimentally evaluates the most well-known algorithms.
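Of the algorithms listed above, OneR is simple enough to sketch in a few lines: for each feature it builds a rule that maps every observed value to its majority class, then keeps the single feature whose rule makes the fewest training errors. The version below is a minimal sketch for discrete features, not the implementation used in the experiments. The function names and the fallback to the overall majority class for unseen values are assumptions.

```python
from collections import Counter, defaultdict


def train_one_r(rows, labels):
    """Pick the single feature whose value-to-class rule has the fewest errors.

    rows is a list of dicts mapping feature name to a discrete value.
    Returns (feature, rule, default_label).
    """
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for feature in sorted(rows[0]):
        # Count class labels for every value of this feature.
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[feature]][label] += 1
        # The rule predicts the majority class for each value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[feature]] != label
                     for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    return feature, rule, default


def predict_one_r(model, row):
    """Apply the one-feature rule; unseen values fall back to the default."""
    feature, rule, default = model
    return rule.get(row[feature], default)
```

Because the model is a single rule, it serves mainly as a baseline against which the other classifiers can be compared.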

1.6 Contribution

The aim of the thesis is to develop a data mining-based phishing detection model. The following are the contributions of this thesis:

Manually and automatically choose the right sets of features for the phishing detection problem.

Test and compare the performance of manually and automatically selected feature sets in an experimental environment. Test the accuracy of the classification algorithms for phishing detection in an experimental environment.

Propose a phishing prevention scheme that integrates multiple classifiers.
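For the automatic feature selection mentioned in the first contribution, one common approach is to rank features by information gain. The sketch below is an illustration only: the source does not state which automatic selection technique the thesis uses, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in label entropy after splitting on one discrete feature."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[feature], []).append(label)
    total = len(labels)
    remainder = sum(len(subset) / total * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Feature names sorted from most to least informative."""
    return sorted(rows[0],
                  key=lambda f: information_gain(rows, labels, f),
                  reverse=True)
```

Keeping only the top-ranked features gives an automatically selected set that can be compared with a manually chosen one on the same data.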

1.7 Thesis Structure

The thesis is organised into five chapters:

Chapter 1: Introduction: An outline of phishing detection techniques, the problem statement, research objectives, motivation, scope and limitations, contributions, and thesis structure.

Chapter 2: Literature Review: This chapter gives an outline of relevant studies in phishing email identification as well as a list of publications written by other researchers.

Chapter 3: Methodology: An overview of the research techniques used in this study, together with the tools and dataset used to evaluate the proposed approach.

Chapter 4: The implementation details of the experiments and the results obtained under all of the proposed scenarios, as well as a comparison of the results.

Chapter 5: Conclusions and directions for future work.
