Spam Filtering Software Using JAVA
Info: 7983 words (32 pages) Dissertation
Published: 11th Dec 2019
Tagged: Information Systems
ABSTRACT
This software gives a basic idea of how e-mail spam filtering works in everyday life. When we talk about spam, we generally mean e-mail: a spam mail is one sent to you as a promotion or as bulk mail, and in most cases you are not interested in receiving it. In earlier days we had to go through each mail and decide whether it was spam. A mail that is not spam (called ham) was kept in the inbox, and spam had to be moved manually to a junk folder. That is a lot of work, given that a large share of mail today is spam, which wastes both time and storage space. This software assists us in filtering our e-mails for spam. It expects a collection of already-downloaded e-mails as input. It then analyses all the e-mails and identifies which are spam and which are important mails (ham). Once the analysis has run, the identified spam is moved to a specific folder, which we call the junk folder, and the important mails are transferred to the inbox.
Both folders are visible through the software. If an important mail has been moved to the junk folder, or a spam mail to the inbox, the user can correct it; the stored spam data is then updated and used in future calculations. The required outcome of the project is therefore a filtered inbox folder containing all the important mails, with the spam transferred to the junk folder, from which it is later deleted.
INTRODUCTION
Email filtering is the processing of email to organize it according to specified criteria. Most often this refers to the automatic processing of incoming messages, but the term also applies to the intervention of human intelligence in addition to anti-spam techniques, and to outgoing emails as well as those being received. Common uses for mail filters include organizing incoming email and removal of spam and computer viruses. A less common use is to inspect outgoing email at some companies to ensure that employees comply with appropriate laws. Users might also employ a mail filter to prioritize messages, and to sort them into folders based on subject matter or other criteria. Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using Bayes’ theorem to calculate a probability that an email is or is not spam. Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s. The spam that a user receives is often related to the online user’s activities. For example, a user may have been subscribed to an online newsletter that the user considers to be spam. This online newsletter is likely to contain words that are common to all newsletters, such as the name of the newsletter and its originating email address. A Bayesian spam filter will eventually assign a higher probability based on the user’s specific patterns. The legitimate e-mails a user receives will tend to be different. For example, in a corporate environment, the company name and the names of clients or customers will be mentioned often. The filter will assign a lower spam probability to emails containing those names. 
The word probabilities are unique to each user and can evolve over time with corrective training whenever the filter incorrectly classifies an email. As a result, Bayesian spam filtering accuracy after training is often superior to pre-defined rules. It can perform particularly well in avoiding false positives, where legitimate email is incorrectly classified as spam. For example, if the email contains the word “Nigeria”, which is frequently used in Advance fee fraud spam, a pre-defined rules filter might reject it outright. A Bayesian filter would mark the word “Nigeria” as a probable spam word, but would take into account other important words that usually indicate legitimate e-mail. For example, the name of a spouse may strongly indicate the e-mail is not spam, which could overcome the use of the word “Nigeria.”
RELATED WORK AND MOTIVATION
While new computer security threats may come and go, spam remains a constant annoyance for non-profit organizations. At the very least, spam can interrupt your busy days, forcing you to spend time opening and deleting emails selling herbal remedies or ideal investment opportunities. In a more serious scenario, spam could unleash a nasty virus on your organization’s network, crippling your servers and desktop machines. Experts and anti-spam services tend to peg the rate of spam at anywhere from 50 to 90 percent of all email on the Internet. Although preventing persistent spammers from sending junk mail may never be possible, installing an anti-spam application on your organization’s mail server or individual computers can greatly reduce the amount of spam your staff members have to deal with. Anti-spam applications typically use one or more filtering methods to identify spam and stop it from reaching a user’s inbox. However, just because anti-spam programs are designed to do the same job does not mean they all go about it in the same way.
Listed below are some of the available spam filters, grouped by the approach they use:
List-Based Filters
- Blacklist
- Real-Time Blackhole List
- Whitelist
- Greylist
Content-Based Filters
- Word-Based Filters
- Heuristic Filters
- Bayesian Filters
Other Filtering Methods
- Challenge/Response System
- Collaborative Filters
- DNS Lookup Systems
The filter that we will be implementing is a content-based filter, specifically a Bayesian filter. The differences between word-based, heuristic and Bayesian filters are described below.
Word-Based Filters
A word-based spam filter is the simplest type of content-based filter. Generally speaking, word-based filters simply block any email that contains certain terms. Since many spam messages contain terms not often found in personal or business communications, word filters can be a simple yet capable technique for fighting junk email. However, if configured to block messages containing more common words, these types of filters may generate false positives. For instance, if the filter has been set to stop all messages containing “rebate,” emails from legitimate senders offering your non-profit hardware or software at a reduced price may not reach their destination. Also note that since spammers often purposely misspell keywords in order to evade word-based filters, your IT staff will need to make time to routinely update the filter’s list of blocked words.
Heuristic Filters
Heuristic (or rule-based) filters take things a step beyond simple word-based filters. Rather than blocking messages that contain a suspicious word, heuristic filters take multiple terms found in an email into consideration. Heuristic filters scan the contents of incoming emails and assign points to words or phrases. Suspicious words that are commonly found in spam messages, such as “Rolex” or “Viagra,” receive higher points, while terms frequently found in normal emails receive lower scores. The filter then adds up all the points and calculates a total score. If the message receives a certain score or higher (determined by the anti-spam application’s administrator), the filter identifies it as spam and blocks it. Messages that score lower than the target number are delivered to the user. Heuristic filters work fast, minimizing email delay, and are quite effective as soon as they have been installed and configured. However, heuristic filters configured to be aggressive may generate false positives if a legitimate contact happens to send an email containing a certain combination of words. Similarly, some savvy spammers might learn which words to avoid including, thereby fooling the heuristic filter into believing they are benign senders.
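The point-based scoring described above can be sketched in a few lines of Java. This is only an illustration: the class name HeuristicScorer, the word weights and the threshold of 5 are all assumptions made for the example and are not taken from any real anti-spam product.

```java
import java.util.Locale;
import java.util.Map;

/** Illustrative point-based (heuristic) scorer; weights and threshold are invented. */
final class HeuristicScorer {
    // Hypothetical weights: positive for suspicious terms,
    // negative for terms that are typical of ordinary mail.
    private static final Map<String, Integer> WEIGHTS = Map.of(
            "rolex", 3,
            "viagra", 4,
            "meeting", -1,
            "invoice", -1);

    // Hypothetical cut-off that an administrator would configure.
    static final int THRESHOLD = 5;

    /** Adds up the weights of every known term found in the message. */
    static int score(String message) {
        int total = 0;
        for (String token : message.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            total += WEIGHTS.getOrDefault(token, 0); // unknown terms score 0
        }
        return total;
    }

    /** A message is flagged once its total reaches the threshold. */
    static boolean isSpam(String message) {
        return score(message) >= THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(isSpam("Cheap Rolex and Viagra today"));
        System.out.println(isSpam("Invoice for the Rolex meeting"));
    }
}
```

The second message in main contains a suspicious word but stays below the threshold because the ordinary terms pull its score down, which is the behaviour that separates a heuristic filter from a plain word-based one.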
Bayesian Filters
Bayesian filters, considered the most advanced form of content-based filtering, employ the laws of mathematical probability to determine which messages are legitimate and which are spam. In order for a Bayesian filter to effectively block spam, the end user must initially “train” it by manually flagging each message as either junk or legitimate. Over time, the filter takes words and phrases found in legitimate emails and adds them to a list; it does the same with terms found in spam. To determine which incoming messages are classified as spam, the Bayesian filter scans the contents of the email and then compares the text against its two word lists to calculate the probability that the message is spam. For instance, if “valium” has appeared 62 times in the spam list but only three times in legitimate emails, there is a 95 percent chance (62 out of 65 occurrences) that an incoming email containing “valium” is junk. Because a Bayesian filter is constantly building its word list based on the messages that an individual user receives, it theoretically becomes more effective the longer it is used. However, since this method requires a training period before it starts working well, you will need to exercise patience and will probably have to manually delete a few junk messages, at least at first.
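The “valium” figure can be reproduced with a one-line calculation. The sketch below assumes the simplest possible estimate, the share of a word’s occurrences that were seen in spam; the class and method names are hypothetical, and the neutral value of 0.5 for unseen words is a common convention rather than something stated in the text.

```java
/** Per-word spam estimate from two occurrence counts (simplified illustration). */
final class WordProbability {
    /**
     * Returns the fraction of occurrences that were seen in spam.
     * A word that has never been seen gets the neutral value 0.5 (an assumption).
     */
    static double spamProbability(int spamCount, int hamCount) {
        int total = spamCount + hamCount;
        if (total == 0) {
            return 0.5;
        }
        return (double) spamCount / total;
    }

    public static void main(String[] args) {
        // 62 spam occurrences and 3 legitimate ones, as in the "valium" example.
        System.out.println(spamProbability(62, 3));
    }
}
```

With the counts 62 and 3 the method returns 62/65, which rounds to the 95 percent quoted above.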
Failure to get on whitelists poses a big problem –
A whitelist is a list of e-mail addresses or domain names that are granted permission to pass a blocking program’s filters and deliver a message to the intended recipient. Most ISPs, such as AOL and MSN, have whitelists. Similarly, almost everyone who uses email has a whitelist, commonly referred to as their address book. If you are not on an ISP’s whitelist and you try to send a series of messages to a number of its subscribers, you can expect your messages to be rejected from the get-go. Many companies are unaware of whitelisting or do not take the time to get whitelisted. It can be very time-intensive, as it requires forming a relationship with the ISP. Getting listed on a whitelist means having to prove you are legitimate and competent (e.g. good list hygiene).
Using too many bad words will trigger spam filters –
There are many terms that people use every day and in their marketing materials that can trigger spam filters. The last thing you want to do is use a series of words or phrases that result in your email being blocked or dumped in a junk mail folder. Simply using one word or phrase is not likely to get your emails blocked, unless it is considered extremely bad. Phrases such as “reverse aging” and “compare rates” are two such phrases that rank high on the point scale. “Removes wrinkles” racks up more than 4 points according to the SpamAssassin mail filter. Considering that all it takes on average is 5 points to get filtered, a single phrase worth more than 4 points is bad. Trigger words can appear in the subject line of a message, in the body of the message, and in the “To:” email address field.
Here is a small sample of common spam trigger words:
Filter Words in the Subject Line
Contains $$$
“100% free”
Contains word “ad”
apply now
Earn $
earn extra cash
eliminate debt
extra income
fast cash
financial freedom
free gift
free info
free offer
home based
online marketing
Filter Words in the Message Body
1-800
1-888
100% free
100% guarantee
call toll free
debt free
earn extra income
email marketing
information you requested
joke of the day
life insurance quote
limited time offer
lose inches
lose weight
work from home
you have won
“To” Email Address Begins With
everyone
fellow marketer
free
friend
group
list
netmarketer
nobody
promotion
winner
Not considering the design and distribution of your email can cause trouble –
Quite often it is the very design of the email that results in it being blocked. It goes without saying that if you are sending an HTML newsletter and the coding is incredibly sloppy, the spam filters will be throwing up red flags all over the place.
Take the time to make sure your code is clean.
Here are some common design and distribution issues that trigger spam filters:
WRITING IN ALL CAPS IS EQUAL TO SHOUTING
Excessive use of punctuation!!!!!!!!!!!???!?!!
Sending attachments with your emails
The color blue in the HTML or links
Including a large number of hotlinks
Links not preceded by http://
Links that reference numeric IP addresses instead of a domain name
HTML-only emails (e.g. an email that is made up of all pictures and no text)
Emails that are too long
Sending your email to too many people at once who reside on the same ISP
Use of CC: or BCC: to a group of people
Sending email from your email client (e.g. Outlook, Eudora) to a list of recipients
Stating in your mail piece that you conform to spam laws (As a legitimate email marketer, you shouldn’t have to say it.)
We are going to build a spam filter that detects unsolicited and unwanted email and prevents those messages from reaching a user’s inbox. Like other types of filtering programs, a spam filter looks for certain criteria on which it bases its judgments. For example, the simplest and earliest versions (such as the one available with Microsoft’s Hotmail) can be set to watch for particular words in the subject line of messages and to exclude these from the user’s inbox. We will be implementing a Bayesian spam filter. Naive Bayes classifiers are a popular statistical technique of email filtering. They typically use bag-of-words features to identify spam email, an approach commonly used in text classification. Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things) with spam and non-spam emails and then using Bayes’ theorem to calculate a probability that an email is or is not spam. Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.
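A minimal sketch of how a naive Bayes filter might combine per-token evidence is given below. It is not the project’s implementation: the class name NaiveBayesSketch, the in-memory map, the add-one (Laplace) smoothing and the assumption of equal prior probabilities for spam and ham are all choices made for this example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Bag-of-words naive Bayes sketch with equal priors (illustrative only). */
final class NaiveBayesSketch {
    // token -> {spamCount, hamCount}; an in-memory stand-in for trained data
    private final Map<String, int[]> counts = new HashMap<>();

    /** Records every token of a message that the user has flagged. */
    void train(String text, boolean spam) {
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.computeIfAbsent(token, k -> new int[2])[spam ? 0 : 1]++;
        }
    }

    /** Lower-cases the text and splits it on anything that is not a letter or digit. */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    /** Combines the per-token probabilities into one spam probability. */
    double spamProbability(String text) {
        double logSpam = 0.0;
        double logHam = 0.0;
        for (String token : tokenize(text)) {
            int[] c = counts.get(token);
            if (c == null) continue; // tokens never seen in training are ignored
            // Add-one smoothing keeps p strictly between 0 and 1.
            double p = (c[0] + 1.0) / (c[0] + c[1] + 2.0);
            logSpam += Math.log(p);
            logHam += Math.log(1.0 - p);
        }
        // Same as prod(p) / (prod(p) + prod(1 - p)), computed in log space.
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    public static void main(String[] args) {
        NaiveBayesSketch filter = new NaiveBayesSketch();
        filter.train("free cash offer", true);
        filter.train("win free cash", true);
        filter.train("meeting agenda attached", false);
        filter.train("lunch meeting tomorrow", false);
        System.out.println(filter.spamProbability("free cash now"));
        System.out.println(filter.spamProbability("meeting agenda"));
    }
}
```

Working in log space avoids numerical underflow when a long message multiplies many small probabilities together. A message made up entirely of unseen tokens comes out at exactly 0.5, i.e. undecided.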
MOTIVATION OF THE PROPOSED WORK
Common uses for mail filters include organizing incoming email and removal of spam and computer viruses. A less common use is to inspect outgoing email at some companies to ensure that employees comply with appropriate laws. Users might also employ a mail filter to prioritize messages, and to sort them into folders based on subject matter or other criteria. Naive Bayes classifiers work by correlating the use of tokens (typically words, or sometimes other things), with spam and non-spam e-mails and then using Bayes’ theorem to calculate a probability that an email is or is not spam. Naive Bayes spam filtering is a baseline technique for dealing with spam that can tailor itself to the email needs of individual users and give low false positive spam detection rates that are generally acceptable to users. It is one of the oldest ways of doing spam filtering, with roots in the 1990s.
COMPARATIVE ANALYSIS
OUR PROJECT | EXISTING MODEL |
Uses a database to store the words that identify mails as spam or ham. | Existing spam filters use hash tables and other data structures. |
Database tables are dynamic: they grow or shrink according to the data stored in them. | The size of a hash table has to be defined in advance, which may not be optimal in many cases. |
Uses the naive Bayesian algorithm; the filter is a content-based Bayesian filter. | Existing filters use many other algorithms, such as the Fisher-Robinson inverse chi-square algorithm, the KNN classifier and the AdaBoost classifier. Other filter types are also available, such as list-based filters (whitelist, greylist, blacklist) and collaborative filters. |
Developed mainly for use in the corporate world. | Other filters exist that are more suitable for individuals, schools and colleges. |
EXPERIMENTAL LAYOUT
SYSTEM ARCHITECTURE
Architectural Design
When the software is opened, the first window is the login page, which gives each user privacy and an account of their own.
StartGUI.java defines the JFrame that serves as this starting window. The page offers a sign-up option, which creates an account, and a login option, and is divided by a vertical separator. The left side of the window holds the Sign-In area with the new-username, new-password and confirm-password fields and the create-button. Each text field reads the text entered by the user. Clicking the create-button checks the validity of the username, password and confirm password, then creates a new account in the database. The right side of the window holds the Login area with the login label, the username and password fields and the login-button. Clicking the login-button checks the username and password, then opens the account.
Home.java is the window that opens after login. It has a menu at the top of the window containing the JMenus Home, Inbox, Junk, Profile, Change Password and Sign-out. Below it is a JList showing all the mails sent to the account. The user can also move a mail to the inbox or junk folder manually, since we keep that option available. Next comes the filter [JButton]. Running the filter sends the important mails to the inbox folder and the rest to the junk folder; this is done by the event listener filterActionPerformed(). In addition, clicking any JMenu in the menu opens the corresponding window through the event listener defined for that menu, jMenuMouseClicked().
Inbox.java is the next window [JFrame]. It has the same menu, with the same functionality, as the Home window. Below the menu is a JList showing all the important mails (hams) sent to the account. Deleting a mail in the inbox transfers it to the junk folder.
Junk.java is the next window [JFrame]. It has the same menu, with the same functionality, as the Home window. Below the menu is a JList showing all the junk mails (spams) sent to the account. Deleting a mail in the junk window removes it from the account permanently. Hams that ended up here can also be restored to the inbox.
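The delete and restore rules of the Inbox and Junk windows amount to a small state transition, which might be sketched as follows. The class MailFolders, the Status enum and both method names are hypothetical; DELETED simply stands for permanent removal from the account.

```java
/** Folder transitions for a single mail (illustrative sketch). */
final class MailFolders {
    enum Status { HAM, SPAM, DELETED }

    /** Deleting in the inbox moves the mail to junk; deleting in junk removes it for good. */
    static Status delete(Status current) {
        return current == Status.HAM ? Status.SPAM : Status.DELETED;
    }

    /** Restoring only applies to junk mail, which goes back to the inbox. */
    static Status restore(Status current) {
        return current == Status.SPAM ? Status.HAM : current;
    }

    public static void main(String[] args) {
        Status s = Status.HAM;
        s = delete(s);   // inbox -> junk
        s = restore(s);  // junk -> inbox
        System.out.println(s);
    }
}
```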
Profile.java is the next window [JFrame]. The Profile window is an additional window dedicated to the user’s personal details. This information can be modified by the user of that particular account, and the database is updated accordingly.
ChangePassword.java is the next window [JFrame]. The password can be changed using this window. It has three essential components: previousPassword [JTextField], newPassword [JTextField] and confirmPassword [JTextField]. It asks for the previous password, then the new password, and then for the new password again as confirmation. The changePassword [JButton] has an event listener which checks the validity of the password and updates the database.
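The validity check behind the changePassword button could look roughly like the sketch below. The class PasswordChange, the Result enum and the order of the checks are assumptions for illustration; the text does not say which rules the real listener applies.

```java
/** Validation of the three change-password fields (illustrative sketch). */
final class PasswordChange {
    enum Result { OK, WRONG_PREVIOUS, EMPTY_NEW, MISMATCH }

    /**
     * stored      - the password currently held for the account
     * previous    - text typed into the previousPassword field
     * newPassword - text typed into the newPassword field
     * confirm     - text typed into the confirmPassword field
     */
    static Result validate(String stored, String previous, String newPassword, String confirm) {
        if (!stored.equals(previous)) {
            return Result.WRONG_PREVIOUS;
        }
        if (newPassword.isEmpty()) {
            return Result.EMPTY_NEW;
        }
        if (!newPassword.equals(confirm)) {
            return Result.MISMATCH;
        }
        return Result.OK; // only now would the database row be updated
    }

    public static void main(String[] args) {
        System.out.println(validate("rohan123", "rohan123", "s3cret", "s3cret"));
    }
}
```

Returning a distinct result for each failure lets the window show a specific message to the user. A production system would normally compare password hashes rather than plain text.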
The Sign-Out [JMenu], when clicked, runs the event listener jMenu6MouseClicked(), which signs the user out of the account and returns to the Login window.
Decomposition Description
The architectural design above consists of several subsystems. The first window that opens is the login page, which gives each user privacy and an account of their own. This page offers a sign-up option, which creates an account, and a login option. The login box takes a user-id and password and logs the user into his specific account. After login, a second window opens: the home page, which shows the user all the mails sent to the account as a list. A button lets the user run the spam filter. Running the filter sends the important mails to the inbox folder and the rest to the junk folder. The user can also move mails to the inbox or junk folder manually, since we keep that option available. The inbox and junk folders are reached through a menu bar located at the top of the window. The menu bar contains options in the form of buttons: home, inbox, junk, profile, password change and sign-out. The inbox window contains the same menu bar, along with all the important mails sent to the user. Deleting a mail in the inbox transfers it to the junk folder. The junk window contains the same menu bar, along with all the spam mails sent to the user. The user can restore any mail in the junk folder to the inbox, and can permanently delete any mail in this window at any time using a delete button. A popup window asks for confirmation when the user attempts to permanently delete a mail from the junk folder. The profile window is an additional window dedicated to the user’s personal details, which can be modified by the user of that particular account. The password can be changed in the change-password window, which has three essential components: it asks for the previous password, then the new password, and then for the new password again as confirmation. The sign-out button logs the user out of his account and returns him to the login page.
Data Flow Diagram
Structural Decomposition Diagram
DATA DESIGN
DATA DESCRIPTION
We are using an SQL Database System to store data. First of all a database is created which will contain all the tables. These tables are:
- profile
- mails
- word
- commonwords
- deletedmails
Profile
Username | Password | Fullname | DOB | Gender | Mobile | Country |
ghanshyam23 | sehgal56 | Ghanshyam Sehgal | ######## | Male | 8.97E+09 | India |
rohan | rohan123 | Rohan Roy | ######## | Male | 8.96E+09 | India |
Mails
MailId | Username | SenderId | Subject | Message | status |
1 | rohan | CodeProject | Daily news | Perform analytics on real-time data with cognitive capabilities & automated workflows through the Watson IoT platform on IBM Cloud platform – Bluemix. Build meaningful experiences for your clients. | spam |
2 | rohan | TheChef | Invitation to March Lunchtime | The school is over. And it’s time to head to the canvas and draw some exciting summer vacation plans. We too have some plans lined up for you. With few more contests added to our rated contest list, this is your chance to better your ratings and accumulate some more laddus. Yes! We will have new rated contests here onward. If that sounds exciting, let’s give you the details for all the impending contests. | spam |
3 | rohan | QuoraMessages | Quora sent you a message on Quora | Hello! We will be moving to the new anonymity on Quora experience very soon. If you would like to edit or delete your existing anonymous content in the future, please provide your email here before March 20, 2017. You are receiving this message because we have not yet received an email from you. Please note that if you do not provide your email by March 20, 2017, you will need to contact us using our Contact Form and selecting “I need help with my account.” | spam |
4 | rohan | Mendeley | Mendeley Sign-up Verification | I’m part of the Mendeley team and just noticed that you haven’t verified your e-mail address yet. | ham |
5 | rohan | HoxxVPN | Android App is available | We are proud to announce about our android app launch. From now on you can use our android app on your phone or tablet to surf through one of our vpn servers. | ham |
6 | ghanshyam23 | HackerRank | Join Week of Code 31 | Beginning Monday, April 10th, we’ll unlock one new challenge each day for you to solve. You’ll have 7 days to solve 7 challenges. The top 10 coders win HackerRank T-shirts. Learn more here. | spam |
7 | ghanshyam23 | rohan | Invitation to Party | On Wednesday 3rd May there is a birthday party. And you have to come. I will email you the address and time later. | ham |
8 | ghanshyam23 | | Log in Verification | It looks like someone tried to log into your account on April 13 at 12:16am using Chrome for Windows 10. Your account is safe; we just wanted to make sure it was you who tried to log in from somewhere new. | ham |
9 | rohan | CodeProject | The Daily Build | I’m looking for about a dozen volunteers to test a new system we’re working on to filter articles and questions. If you’re a regular reader and keep wishing you could customise your reading list a little more conveniently please fire me an email. Places are limited. | spam |
Word
Word | Username | SpamCount | HamCount |
account | rohan | 1 | 3 |
accounts | rohan | 1 | 1 |
accumulate | rohan | 2 | 1 |
added | rohan | 2 | 1 |
address | rohan | 0 | 4 |
allies | rohan | 1 | 1 |
also | rohan | 0 | 2 |
analytics | rohan | 7 | 1 |
android | rohan | 0 | 6 |
announce | rohan | 0 | 2 |
anonymity | rohan | 1 | 1 |
anonymous | rohan | 1 | 1 |
articles | rohan | 1 | 1 |
automated | rohan | 7 | 1 |
available | rohan | 0 | 2 |
avatar | rohan | 1 | 1 |
back | rohan | 1 | 1 |
basically | rohan | 0 | 2 |
because | rohan | 1 | 1 |
Commonwords
Common |
& |
able |
about |
above |
across |
after |
against |
all |
along |
amid |
among |
an |
and |
around |
as |
at |
been |
before |
behind |
being |
below |
beneath |
beside |
besides |
between |
beyond |
but |
by |
concerning |
considering |
could |
despite |
do |
does |
done |
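Assuming the Commonwords table serves as a stop-word list, so that very frequent words are dropped before any counting takes place, the filtering step might be sketched as below. The class name and the tokenizing rule are hypothetical, and the set holds only a handful of the entries listed in the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Drops common words from a message before counting (illustrative sketch). */
final class CommonWordFilter {
    // A few entries copied from the Commonwords table; the real table is longer.
    private static final Set<String> COMMON = Set.of(
            "about", "after", "all", "an", "and", "as", "at", "but", "by", "do");

    /** Returns the lower-cased words of the message that are not common words. */
    static List<String> significantWords(String message) {
        List<String> words = new ArrayList<>();
        for (String token : message.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !COMMON.contains(token)) {
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(significantWords("All about analytics and automated workflows"));
    }
}
```

In the running application the set would be loaded from the database instead of being hard-coded.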
On the login page, when we Sign-In (i.e. create a new account), a new row is added to the Profile table. When we log in to an account, the Username and Password are cross-checked against the Profile table.
On the home page, all the content of the Mails table is displayed in the JList. When the filter button is pressed, the event listener applies the Bayesian model and sets the status of each mail in the Mails table to spam or ham.
The Inbox and Junk windows are displayed using the Mails table. If the status of a mail is wrong, it is corrected by transferring the mail from Inbox to Junk or vice versa. Both the Map and Word-list tables are then modified, so the status and probability change accordingly.
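One way the stored counts might be corrected when the user moves a misclassified mail is sketched below. An in-memory map stands in for the SpamCount and HamCount columns of the Word table; the class and method names are hypothetical, and the rule of shifting one count per word from the wrong class to the right one is an assumption, not the project’s documented behaviour.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Corrective training on per-word counts (illustrative sketch). */
final class CorrectiveTraining {
    // word -> {spamCount, hamCount}; stands in for the Word table
    final Map<String, int[]> wordCounts = new HashMap<>();

    /** Counts the words of a mail under its current classification. */
    void learn(List<String> words, boolean spam) {
        for (String w : words) {
            wordCounts.computeIfAbsent(w, k -> new int[2])[spam ? 0 : 1]++;
        }
    }

    /** Moves the words of a mail from the wrong class to the corrected one. */
    void reclassify(List<String> words, boolean nowSpam) {
        int to = nowSpam ? 0 : 1;
        int from = 1 - to;
        for (String w : words) {
            int[] c = wordCounts.computeIfAbsent(w, k -> new int[2]);
            if (c[from] > 0) {
                c[from]--; // undo the earlier, wrong count
            }
            c[to]++;
        }
    }

    public static void main(String[] args) {
        CorrectiveTraining t = new CorrectiveTraining();
        List<String> words = List.of("android", "app");
        t.learn(words, true);        // filter wrongly marked the mail as spam
        t.reclassify(words, false);  // user restores it to the inbox
        int[] c = t.wordCounts.get("android");
        System.out.println(c[0] + " spam, " + c[1] + " ham");
    }
}
```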
The user’s profile is displayed using the Profile table. It can be modified through the Profile window, and the modifications are written to the corresponding row. The password can also be changed, and that change is likewise stored in the Profile table.
HUMAN INTERFACE DESIGN
OVERVIEW OF USER INTERFACE
The first window that opens is the login page, which gives each user privacy and an account of their own. This page offers a sign-up option, which creates an account, and a login option. The login box takes a user-id and password and logs the user into his specific account.
After login, a second window opens: the home page, which shows the user all the mails sent to the account as a list. A button lets the user run the spam filter. Running the filter sends the important mails to the inbox folder and the rest to the junk folder. The user can also move mails to the inbox or junk folder manually, since we keep that option available.
The inbox and junk folders are reached through a menu bar located at the top of the window. The menu bar contains options in the form of buttons: home, inbox, junk, profile, password change and sign-out.
The inbox window contains the same menu bar, along with all the important mails sent to the user. Deleting a mail in the inbox transfers it to the junk folder.
The junk window contains the same menu bar, along with all the spam mails sent to the user. The user can restore any mail in the junk folder to the inbox, and can permanently delete any mail in this window at any time using a delete button. A popup window asks for confirmation when the user attempts to permanently delete a mail from the junk folder.
The profile window is an additional window dedicated to the user’s personal details. This information can be modified by the user of that particular account.
The password can be changed in the change-password window. It has three essential components: it asks for the previous password, then the new password, and then for the new password again as confirmation.
The sign-out button logs the user out of his account and returns him to the login page.
SCREEN IMAGES
Login Page
Home Page
Inbox
Junk
Profile
Change Password
SCREEN OBJECTS AND ACTIONS
Login Page – This page contains a sign up option which creates an account and login option. The page is divided by a vertical separator. Left side of the window contains Sign-In area contains new-username, new-password, confirm-password, create-button. Text-Field will get the entered text from the user. On clicking create-button it cross-checks the validity of the username, password and confirm password, then creates a new account in the database. Right side of the window contains Login area contains login, username, password, login-button. On clicking login-button it cross-checks the validity of the username and password, then accesses the account.
Home Page – This page contains menu at the top of the window containing Home, Inbox, Junk, Profile, Change Password and Sign-out. Next it contains a list which has all the mails sent to the account. User can manually send the mail to inbox folder and junk folder, since we are keeping that option available to the user. After that there is filter button. On running the filter the important mails will go the inbox folder and rest of them goes the junk folder which is done by the event listener. Apart from this on clicking any options in the menu will result in going to that window using defined event listeners for that particular menu.
Inbox – This window contains same menu with same functionality as the Home window. It contains a list of all the important mails or hams sent to the account. If we delete any mail in the inbox then it is transferred to the junk folder.
Junk – This window contains same menu with same functionality as the Home window. It contains a list of all the junk mails or spams sent to the account. If we delete any mail in the junk window then it is permanently deleted from the account. We can also restore all hams to inbox.
Profile – The Profile window is an additional window dedicated to the user's personal details. The user of the account can modify this information, and the database is updated accordingly.
Change Password – The password can be changed in this window. It has three essential components: previous-password, new-password and confirm-password. The user enters the previous password, then the new password, and then the new password again to confirm it. The change-password button has an event listener that checks the validity of the passwords and updates the database.
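A minimal sketch of the check behind the change-password button is given below. The class `PasswordChanger` is hypothetical, and the map stands in for the account table; it holds plain strings only to keep the example short, whereas a real system would store salted password hashes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler for the change-password button.
class PasswordChanger {
    // Stand-in for the account table (plain text here for brevity only).
    private final Map<String, String> passwords = new HashMap<>();

    PasswordChanger(String username, String initialPassword) {
        passwords.put(username, initialPassword);
    }

    // Returns true and updates the stored password only if every check passes.
    boolean changePassword(String username, String previous, String next, String confirm) {
        String stored = passwords.get(username);
        if (stored == null || !stored.equals(previous)) {
            return false; // unknown account or wrong previous password
        }
        if (next == null || next.isEmpty() || !next.equals(confirm)) {
            return false; // new password missing or not confirmed
        }
        passwords.put(username, next);
        return true;
    }

    boolean matches(String username, String candidate) {
        return candidate != null && candidate.equals(passwords.get(username));
    }
}
```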
Sign-Out – Clicking the Sign-Out menu option runs an event listener that signs the user out of the account and returns to the Login window.
DISCUSSION
Spam filter software has been designed and implemented by many researchers and experienced software developers over the last few decades. Earlier work has eliminated unwanted e-mails using different techniques depending on the purpose of the filter, such as content-based and list-based spam filtering. The technique that we have tried to implement and improve is a content-based filter, the Bayesian filter, which is based on the naive Bayesian spam filter formulation. Applying the spam filter and counting the unwanted e-mails it catches gives software testers assurance of the filter's quality. To evaluate the implemented filter, we extracted some e-mails from our personal e-mail accounts and ran the filter on them, measuring the percentage of e-mails that were marked as spam and the number of e-mails that were spam but were not caught by the proposed filter. The results presented in the results and analysis section are promising and show that the implemented filter is quite accurate in classifying e-mails as spam or ham. Replacing the "hash table" data structure normally used to implement a spam filter with a "database", together with some changes to the technique, also gave promising results: filtering was more accurate and faster. An added advantage, and one of the most important, is a shorter training period, because the words that people working in a particular field normally filter can be stored in the database in advance and supplied to them.
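For readers unfamiliar with the naive Bayesian formulation, the sketch below shows one common textbook form of it. It is not the modified technique evaluated in this paper: the class `NaiveBayesSpamFilter`, the tokenizer, the add-one smoothing and the 0.5 decision threshold are all assumptions chosen for brevity, and in-memory maps are used here in place of the database.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal naive Bayes sketch; class and method names are hypothetical.
class NaiveBayesSpamFilter {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int spamMails, hamMails;   // number of training mails per class
    private int spamTokens, hamTokens; // number of training words per class

    // Assumed tokenizer: lower-case, split on anything that is not a letter or digit.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    void train(String text, boolean isSpam) {
        if (isSpam) spamMails++; else hamMails++;
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            vocabulary.add(word);
            if (isSpam) {
                spamCounts.merge(word, 1, Integer::sum);
                spamTokens++;
            } else {
                hamCounts.merge(word, 1, Integer::sum);
                hamTokens++;
            }
        }
    }

    // Estimate of P(spam | text), treating words as independent given the class.
    double spamProbability(String text) {
        int total = spamMails + hamMails;
        // Smoothed class priors, kept in log space to avoid underflow on long mails.
        double logSpam = Math.log((spamMails + 1.0) / (total + 2.0));
        double logHam = Math.log((hamMails + 1.0) / (total + 2.0));
        double v = vocabulary.size() + 1.0; // +1 reserves a slot for unseen words
        for (String word : tokenize(text)) {
            if (word.isEmpty()) continue;
            logSpam += Math.log((spamCounts.getOrDefault(word, 0) + 1.0) / (spamTokens + v));
            logHam += Math.log((hamCounts.getOrDefault(word, 0) + 1.0) / (hamTokens + v));
        }
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }

    boolean isSpam(String text) {
        return spamProbability(text) > 0.5; // assumed decision threshold
    }
}
```

Training consists of calling `train` once per labelled mail; correcting a misfiled mail, as the Inbox and Junk windows allow, would amount to calling `train` again with the corrected label.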
The main advantages of the proposed technique are as follows:
- Increased filtering speed compared with existing spam filters.
- A shorter training period, because the filter uses a database instead of a "hash table", and the database can already hold some data for the filtering process.
- Shared use of the proposed spam filtering technique in the corporate sector. The database that this paper uses instead of a hash table could in turn be replaced with a cloud database. A group of people working in the same department, for example, would then not need to install the spam filter on their personal computers and train it individually; it could be trained once by one individual and used by everyone with access to the cloud.
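The second and third advantages both rest on the filter not caring where its word counts live. One way to express that, offered here as a sketch under assumptions rather than as the project's design, is a small interface that a hash table, a local database or a cloud database could each implement. The names `WordStore` and `SeededWordStore` are hypothetical, and the in-memory class below only stands in for a database-backed one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction: the filter talks to a WordStore, not to a concrete table.
interface WordStore {
    int spamCount(String word);
    int hamCount(String word);
    void record(String word, boolean isSpam);
}

// In-memory stand-in; a database- or cloud-backed class would implement the same interface.
class SeededWordStore implements WordStore {
    // word -> {spam count, ham count}
    private final Map<String, int[]> counts = new HashMap<>();

    // Pre-loads counts so that a new user starts with an already trained filter.
    // Each seed array is assumed to have exactly two entries: {spam, ham}.
    SeededWordStore(Map<String, int[]> seed) {
        for (Map.Entry<String, int[]> entry : seed.entrySet()) {
            counts.put(entry.getKey(), entry.getValue().clone());
        }
    }

    public int spamCount(String word) {
        int[] c = counts.get(word);
        return c == null ? 0 : c[0];
    }

    public int hamCount(String word) {
        int[] c = counts.get(word);
        return c == null ? 0 : c[1];
    }

    public void record(String word, boolean isSpam) {
        int[] c = counts.computeIfAbsent(word, k -> new int[2]);
        c[isSpam ? 0 : 1]++;
    }
}
```

If several colleagues hold a reference to the same store, a word recorded by one of them is immediately visible to the others, which is the shared-training effect described in the third point.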
However, the one challenge that we faced in implementing the software was its size. Because the spam filter is linked to a database, it requires a large amount of space, especially when used for long periods: as the number of words stored in the database grows, so does the space required by the software, which degrades its performance over time. It is therefore recommended that the software be cleaned and retrained periodically; otherwise the filtering process could become slow, which is frustrating given the value of time in today's world.
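The cleaning step is not specified further in this paper. One possible form, shown purely as an illustration, is to drop words that have been seen only a few times, since they contribute little to classification but still occupy space. The class `WordTableCleaner` and the `minCount` threshold are assumptions, and an in-memory map again stands in for the database table.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical cleaning step: discard rarely seen words to bound the table size.
class WordTableCleaner {
    // Removes every word seen fewer than minCount times; returns how many were removed.
    static int prune(Map<String, Integer> wordCounts, int minCount) {
        int removed = 0;
        Iterator<Map.Entry<String, Integer>> it = wordCounts.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() < minCount) {
                it.remove(); // safe removal while iterating
                removed++;
            }
        }
        return removed;
    }
}
```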
THREATS
• Encrypt email addresses: When you first create an email address, choose a cryptic combination of letters and numbers, something that cannot be found in a dictionary. For example, instead of sally, sally1 or sallysmith, choose s18all56y. Such a combination is inconvenient for humans to remember, but it makes it harder for a spammer's programs to send Spam to your email address at random.
• Use fake e-mail addresses: Many websites require you to enter an email address into a standard form before you can proceed. If you do not feel comfortable giving your email address to a particular website, leave a fake one.
• Guard your e-mail addresses: Treat your email address the same way you treat most of your personal information. Don't give it out to anyone you don't trust. If you are not sure you can trust a particular website, read its privacy policy to see what it will do with your email address.
• Don't open spam: If the Spam is HTML (one of those attractive graphic emails) and you open it, the graphic is pulled from the spammer's server, and that request tells the spammer that your email address is in use.
• Don't reply: Remember those pesky telemarketers or the unrelenting door-to-door salesman? Once you answer the telephone or the door, they know you are home and are hard to get rid of. The same is true of spammers. Once you reply to a Spam email, you have confirmed for the spammer that your email address is legitimate.
• Don't post your e-mail addresses: Once your email address has been placed on a website (personal or corporate) or entered into an online guest book, newsgroup, contact list, ezine, chat room, or a host of other online activities, you have invited a spammer to take it. Spammers "harvest" email addresses through programs called spiders, crawlers, and bots. These programs scour the web for email addresses to be used in the spammers' future email campaigns.
• Opt-out: When you are purchasing something online or signing up for a service or promotion, be sure to opt out of any additional services or promotions you don't want cluttering your inbox.
• Don't unsubscribe: Honourable marketers will unsubscribe your email address if you request it, but distinguishing legitimate companies from the rest is a challenge. Check their privacy policy and complaint procedures. Submitting an unsubscribe request can be used against you: your email address may be confirmed by, or sold to, spammers. When this happens, your Spam will increase even though you thought you had unsubscribed.
• Advanced malware protection: Advanced malware protection enables IT admins to secure their email infrastructure against the new breed of malware that creates different variants of itself to avoid detection.
• Combining an arsenal of anti-spam filters: Blocking spoofed emails and blocking emails sent in other languages.
• Protect your users against phishing and spyware: Detect and block threats posed by phishing emails by comparing the content of incoming mail with a constantly updated database, so that the latest phishing emails can be caught. As extra protection, also check every email sent to your organization for typical phishing keywords.
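The keyword check mentioned in the last point can be sketched in a few lines. The class `PhishingKeywordCheck` is hypothetical and the three phrases are illustrative placeholders only; in practice the list would be loaded from the constantly updated database that the point describes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical keyword check; the phrase list is illustrative, not a real blocklist.
class PhishingKeywordCheck {
    static final List<String> KEYWORDS = Arrays.asList(
            "verify your account",
            "confirm your password",
            "urgent action required");

    // Returns true if the mail body contains any listed phrase, ignoring case.
    static boolean looksLikePhishing(String body) {
        String lower = body.toLowerCase(Locale.ROOT);
        for (String keyword : KEYWORDS) {
            if (lower.contains(keyword)) {
                return true;
            }
        }
        return false;
    }
}
```

A plain substring match like this is easy to evade and prone to false alarms, which is why the text treats it as extra protection on top of the database comparison rather than as a filter in its own right.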
CONCLUSIONS AND FUTURE WORK
In this paper we have presented spam filtering software that helps people ignore unwanted e-mails. Combining the naive Bayesian formulation with a database in place of a hash table gave good results, increasing both the accuracy and the speed of filtering. We tested the implemented software in several experiments on e-mails from our own personal e-mail accounts, and the results showed that the use of a database and the changes to the naive Bayesian technique improved the output. The implemented spam filter could be of great help to people whose work is done mostly via e-mail, such as those in the corporate sector. In today's world people have too much work and too little time, and receiving unwanted e-mails again and again can become frustrating. In future, the performance of the presented software on projects involving very large numbers of e-mails may have to be tested and analysed. The presented work could also be improved by using a cloud database instead of a local database, as mentioned in the discussion section. A cloud would eliminate the training period for multiple people who do the same kind of work in a company and receive the same kinds of e-mails: the software would not need to be trained by every individual, because the words could be stored in the cloud and accessed by everyone concerned. Various other techniques could also be implemented alongside the Bayesian technique to improve the filtering procedure, obtain better results, and make the spam filter useful in a wider range of fields.