Disclaimer: This research project has been written by a student and is not an example of our professional work, which you can see examples of here.

Any opinions, findings, conclusions, or recommendations expressed in this research project are those of the authors and do not necessarily reflect the views of UKDiss.com.

Detection of Online Opinion Spam

Info: 8896 words (36 pages) Example Research Project
Published: 20th Oct 2021

Tagged: Computer Science

Abstract

Online opinion spamming has become a widespread threat in this digital era, as most decisions, from purchasing a simple product to consulting a particular doctor, are made based on online user opinions. Taking advantage of this, businesses in various fields are either committing online spamming or being affected by it, for reasons such as market competition and profit gains.

Despite the significant research carried out on identifying spam reviews, a large gap remains in detecting spamming activity on a business as a whole (we measure this as the honesty of the business). Identifying a single review as spam or benign cannot by itself justify calling a business dishonest or trustworthy. With the advances in camouflage strategies used by malicious users (spammers) to write fake reviews, it has become difficult to categorize a review as spam or not spam. One important strategy is the singleton review technique, in which reviewers create multiple accounts and write only one review under each account. A large number of such Singleton Reviews (SRs) results in a biased overall picture of the business. Recent research reveals that singleton reviews are a significant source of spam reviews and largely affect the ratings of online businesses. For example, approximately 68% of the Amazon review data are singleton reviews.

In this research project we focus on detecting businesses that are affected by opinion spamming over time. We use Yelp review data containing reviews of 5,044 businesses by 260,277 reviewers. We leverage recent deep learning techniques such as transfer learning, semantic embeddings, autoencoding and LSTMs to classify a business as honest or dishonest based on semantic analysis of its reviews over time.

Extensive experiments showed that the proposed models outperformed the baseline models in terms of precision, recall, and F1-score metrics in identifying both honest and dishonest businesses.

TABLE OF CONTENTS

1. INTRODUCTION

2. BACKGROUND OF THE STUDY

3. RELATED WORK

4. DATA ANALYSIS

5. PROPOSED SOLUTION AND ITS ARCHITECTURE

6. EXPERIMENTATION AND INTERPRETATION

7. BASE MODELS

8. COMPARATIVE RESULTS

9. CONCLUSION

References

LIST OF TABLES

1. Dataset Files and Details

2. Different Variations of Proposed Model

3. Results of LSTM_WORC Model

4. Results of BiLSTM_WORC Model

5. Results of LSTM_WRC Model

6. Results of BiLSTM_WRC Model

7. Results of Support Vector Machines

8. Results of Simple Neural Network

LIST OF FIGURES

1. Artificial Deep Learning Neural Network

2. Repetitive Long Short-Term Memory Cells

3. Frequency Distribution of Reviews Over Time

4. Cumulative Distribution of Fake Percentage Across Stores

5. Distribution of Honest and Dishonest Stores

6. Document Vector Generator Model Architecture

7. Store Embedding Extractor Model Architecture

8. Store Honesty Classifier Model Architecture

9. Complete Architecture

10. Comparison of Precision Values of Different Models on Dishonest Stores

11. Comparison of Recall Values of Different Models on Dishonest Stores

12. Comparison of F1-Scores of Different Models on Dishonest Stores

13. Comparison of Precision Values of Different Models on Honest Stores

14. Comparison of Recall Values of Different Models on Honest Stores

15. Comparison of F1-Scores of Different Models on Honest Stores

16. Comparison of Average Precision Values of Different Models on All Stores

17. Comparison of Average Recall Values of Different Models on All Stores

18. Comparison of Average F1-Scores of Different Models on All Stores

INTRODUCTION

The growth of digitalization has turned the business model of this smart era into a feedback-driven model. In every field of business, from a simple household store to the multibillion-revenue e-commerce space, from a drug store to pharmaceutical companies, from a single hotel to worldwide tourism, and from the food we eat to the glamor product industries, the decisions we make are based on user opinions and online recommendations. Recent surveys reveal that 88% of customers trust online reviews as much as personal recommendations. On average, a one-star increase on Yelp leads to a 5-9 percent increase in a business's revenue, and one negative review can cost up to 30 customers. This prominent role of online reviews in day-to-day life has led some businesses to purposefully generate spam, wherein spammers manipulate the perception of a business (e.g. a restaurant) by writing fake reviews. This has raised a serious issue of trust and created a need to automatically detect spamming activity and identify the businesses that are affected by it.

Extensive work has been devoted to identifying fake reviews by modeling various factors separately. However, the problem remains challenging because increasingly advanced camouflage strategies are used by malicious users (spammers) hired by business owners or by their competitors. One important strategy is the singleton review technique, in which reviewers create multiple accounts and write only one review under each account. Recent findings show that singleton reviews are a significant source of spam reviews and largely affect the ratings of online stores. At the same time, identifying a single review as spam or benign cannot by itself justify calling a business dishonest or trustworthy. Our project is focused on detecting the businesses affected by fake reviews over time.

Unlike previous works, which mainly focus on non-semantic review statistics such as review rating, review length, number of reviews and ratio of singleton reviews [3][4][7], in this project we leverage the semantics of the reviews over time to classify businesses as honest or dishonest. The motivation to study this as a temporal problem is drawn from [1], which explains that most spamming attacks occur within a specific period of time (for example, 6 months or 3 months), as a business needs to promote itself or demean its competitors within a short period.

1.1 Our Contributions

We propose a novel 3-step deep learning architecture that is capable of detecting the occurrence of spamming activity in a business and thus classifying the business as honest or dishonest. Our architecture mainly focuses on the review semantics of a business over time, along with a few non-semantic features of the reviews. Considering that spamming occurs in a temporal fashion, we conducted extensive experiments on real Yelp restaurant business data with three different time frame 'bins' (i.e. 6 months, 3 months and 1 month) of reviews for each business.

We design a document-to-vector generator model that takes advantage of the learnings of the GloVe (word2vec) model pre-trained on Twitter data [11]. This model is further retrained on our Yelp data, and its word vectors are combined with the respective TF-IDF weights in a weighted sum that represents the document vector for each review of each restaurant.

We tailor a business embeddings generator model, an LSTM-based recurrent neural network that learns the linguistic features and patterns from the document vectors of a business for a given time frame bin (as produced by the document-to-vector generator model) and emits an embedding vector representing the business for that particular time bin. During experimentation we created a variant of the model that considers non-semantic attributes of the reviews along with their semantics when generating the embeddings of the business.

We implement a classification model with two variants, 1) LSTM-based and 2) Bidirectional LSTM-based, that classifies the business/store as honest or dishonest based on the embeddings generated by the embeddings generator model. During training, we feed the embeddings of all the bins for a given business along with its honesty label.

We ran extensive experiments with the proposed 3-step architecture, which combines 1) the document-to-vector generator, 2) the business embeddings generator and 3) the business classification model. The experiments used the Yelp review data, with 5,044 restaurant businesses and 608,598 reviews collected from the NJ, VT, CT and PA regions of the USA. The proposed architecture outperformed the baseline models in all 3 time frames, with the 3-month time frame bins producing the best result, an F1 score of 77%, which is an 8% increase compared to the baseline models.

The rest of the report is organized as follows. We go over the background of the study and related work in chapters 2 and 3 respectively. We walk through our data and its analysis in chapter 4. In chapter 5, we introduce our proposed solution and present its architecture in detail. We explain our experiments and interpret the results in chapter 6, followed by our base models and their results in chapter 7. Chapter 8 discusses the comparative results of all the variants of the proposed model along with the base models, and in chapter 9 we conclude the report with directions for future work.

BACKGROUND OF THE STUDY

2.1 Natural Language Processing

Natural Language Processing (NLP) is an interdisciplinary field combining computer science and linguistics. It is the study of how to make computers understand natural language, process it, draw inferences from it and respond in natural language. Although it is a long-established field, recent advances in computer science and artificial intelligence have produced many useful services such as text analytics, chatbots, virtual assistants, language translation, POS tagging, story generators and many more. Our project falls into this domain of NLP.

2.2 Online Opinion Spamming

Online opinion spamming is an attack on an online user review system by a person, a team or an institution, called spammers, who write biased reviews of a product or a business with the goal of either improving the fame or profits of their own business or defaming competitors. Because online reviews are one of the sources of information people rely on when making decisions, spammers pose a potential threat to them. A good amount of research has been done on identifying spam in areas such as messages, emails and reviews. However, there is still considerable scope for further study, as malicious users (spammers) use advanced camouflage strategies when spamming reviews.

2.3 Machine Learning and Deep Learning

Machine learning is a subset of artificial intelligence that enables machines to learn, understand and make predictions from past experience. Advances in machine learning have changed the industry dramatically. A subset of machine learning called deep learning has become a core technology that has led to many breakthroughs in fields such as computer vision and NLP. Deep learning works on the same principle of pattern recognition as our brain. Just as the neurons connected in our brain pass on information based on certain conditions, deep learning uses an input layer, hidden layers and an output layer that form a network of neurons, called a neural network, which recognizes deep patterns in the input data and learns from them.

Figure 1: Artificial Deep Learning Neural Network. [17]

Figure 1 shows the simple architecture of a neural network with 1 input layer, 3 hidden (middle) layers and 1 output layer. A typical deep neural network has two or more hidden layers, and deep neural networks have been shown to outperform many other machine learning techniques in fields such as computer vision and NLP.

2.4 Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a type of neural network that can remember what they have learnt and use it in later learning or when producing a new output. In short, they take a sequential input, can identify patterns in data that change over time (such as speech, stock data or a sentence of text), and produce either a single output or a sequence of outputs. RNNs store intermediate results in memory and use them to build a bigger picture. Unlike regular feed-forward neural networks, an RNN can handle inputs and outputs of varying length, because the number of time steps it is unrolled over depends on the input data. RNNs such as LSTM and GRU have opened up a wide variety of applications like language translation, stock prediction, weather forecasting, virtual assistants, robotics and self-driving cars.

2.5 Long Short-Term Memory

Long Short-Term Memory networks (LSTMs) are a special type of RNN that can handle long sequences of data and detect patterns in them, which vanilla RNNs struggle to do because of the vanishing gradient problem. An LSTM has a repeating structure in which each LSTM cell contains four neural network layers interacting with each other, unlike a vanilla RNN, whose repeating structure contains a single neural network layer. Figure 2 shows the details of the LSTM cells connected together in an LSTM layer.
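
To make the four interacting networks concrete, the following minimal sketch runs a single LSTM cell step using the standard forget gate, input gate, output gate and candidate-value formulation. The scalar input and state, and the parameter names (`w_f`, `u_f`, `b_f` and so on), are illustrative assumptions and not the configuration used in this project.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float, p: dict) -> tuple:
    """One LSTM step with scalar input, hidden state and cell state.

    p holds hypothetical scalar weights: w_* for the input, u_* for the
    previous hidden state and b_* for the bias of each of the four networks.
    """
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c


def run_sequence(xs: list, p: dict) -> float:
    """Feed a sequence through the same cell and return the last hidden state."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h
```

The cell state `c` is what lets information survive across many steps: when the forget gate stays close to 1, the old memory is carried forward almost unchanged.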

Figure 2: Repetitive Long Short-Term Memory Cells. [18]

2.6 GloVe and Gensim

GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for obtaining vector representations of words. It derives the vector for each word from global word-word co-occurrence statistics of a corpus. Stanford provides a set of models pre-trained on large corpora from sources such as Wikipedia and Twitter [11], which can be reused for other applications.

Gensim is a Python-based open source library for topic modeling [20], text analysis and natural language processing using statistical machine learning models. Gensim provides good data streaming capabilities for working with large data and also supports different transfer learning techniques.

2.7 Embedding Techniques

Embedding is a technique for projecting data into a vector space so that a machine can learn from and process it. There are different techniques for encoding data, such as one-hot encoding, TF-IDF and autoencoding. Word embedding is one of the most widely used techniques in deep learning models for natural language applications. Two well-known word embedding algorithms are word2vec and GloVe.

2.8 TF-IDF Vectorization

Term Frequency – Inverse Document Frequency (TF-IDF) is a popular statistical technique for calculating the importance (weight) of a word in a document that belongs to a large corpus. It is calculated from the frequency of the term within the document and the number of documents in the corpus that contain the term. TF-IDF is widely used in text mining as a weighting factor for terms.
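
As an illustration of the calculation, the sketch below computes TF-IDF weights for a toy corpus of tokenized reviews. It assumes the plain textbook definition (relative term frequency multiplied by the logarithm of the inverse document frequency); libraries typically use smoothed variants, and the toy reviews are invented for the example.

```python
import math
from collections import Counter


def tf_idf(corpus: list) -> list:
    """Return one {term: weight} dict per tokenized document in the corpus."""
    n_docs = len(corpus)
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    weights = []
    for doc in corpus:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy reviews, invented for illustration only.
reviews = [
    ["great", "food", "great", "service"],
    ["food", "was", "cold"],
    ["great", "place"],
]
weights = tf_idf(reviews)
```

In the first review, "service" receives a higher weight than "food" because it appears in only one document of the corpus, whereas a term that appears in every document receives a weight of zero.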

RELATED WORK

Recent research includes algorithms designed for singleton review spam detection. However, they are not applicable at the business level. In [2], the authors constructed the spam used for evaluation, but those constructed reviews do not exhibit the same temporal features as real-world singleton review spam. The authors of [12][14] considered reviewers' behavior by constructing a graph connecting the reviewers, their reviews and businesses, and showed the relation between review honesty, store reliability and reviewer trustworthiness. They used these relations to identify deceptive spammers. However, because of the insufficient information available from singleton reviews, the graphs constructed by these methods would not be of significant help in identifying a store's honesty. [4] proposed spam detection techniques that combine the reviewer network with the metadata of the review. [5] tried to relate the psychology of imaginative writing to the behavior of fake reviewers. In [7][10][13], the reviewer's behavior is used as the primary sign of spamming. These works construct features based on similarities between different reviews, with a view to detecting a specific pattern in the reviewer's behavior. However, this does not work for singleton reviews, as a spammer writes only one review per account. [6] studied the techniques spammers use to keep review threads alive in order to attract more attention to products. That study focused on single products and not on the business level. [15] studied group spammers and proposed a three-step strategy to identify spammer groups. They started by finding group reviewers using pattern matching algorithms and their social networks, and constructed features from these to detect the group reviewers. However, this addresses neither the singleton review spam problem nor business-level spamming activity, as a reviewer group and its reviews are considered suspicious only if the group's reviewers wrote reviews together at least 3 times. In [9], although the authors tried to classify reviews based on semantics, they treated reviews as single static documents and thus neglected to analyze how review semantic patterns change over time.

[2] treated the problem of singleton review attacks at the business level as a temporal problem. There are works that focus on abnormal pattern detection in multidimensional time series, but they cannot be directly applied to the singleton review problem and business-level spamming activity detection. In [1], the authors performed a temporal analysis of fake reviews posted to stores on Yelp and grouped the posting trends into early, mid and late reviews based on the rate at which the fake reviews were posted. In [8], the authors used review statistics to construct the time frames and detected temporal pattern anomalies, which is not very effective because it neglects the semantics of the reviews. In summary, none of the above works is dedicated to classifying a whole business as honest or dishonest.

DATA ANALYSIS

The dataset we chose for our research project is Yelp's review dataset collected from Yelp.com, which was first used in [16]. The data were collected over a period of 10 years and comprise 608,598 reviews of 5,044 restaurants written by 260,277 reviewers from the NJ, VT, CT and PA regions of the USA. Each review contains a rating and a review message along with user information and a time stamp. Yelp has a filtering algorithm that identifies suspicious reviews and separates them into a filtered list [19]. The dataset labels each review as fake or genuine, which we used as the ground truth for each review. The dataset has the following structure:

| File Name | Columns |
|---|---|
| Metadata | userId, productId, rating, label (1 = un-filtered, -1 = filtered), date (of review). Note: filtered means that the Yelp website filtered these reviews as suspicious |
| ProductIdMapping | productName, productid |
| ReviewContent | userId, productId, date, reviewContent |
| ReviewGraph | userId, productId, rating |
| UserId | userName, userId |

Table 1: Dataset Files and Details

Table 1 describes the files in the dataset and the metadata of each file showing different features captured in this dataset for each business.

A preliminary analysis of the cumulative distribution of stores against the number of reviews per store revealed that some stores have fewer than 10 reviews, which is too few for our study and would introduce bias when preprocessed and fed into a deep neural network. So, as a first step in normalizing the data, we excluded the stores with fewer than 10 reviews. This reduced our dataset to 3,531 restaurants.

Figure 3: Frequency Distribution of Reviews Over Time

In Figure 3, we see that the majority of the reviews in our dataset are concentrated in the period between 2010 and 2014. We ignored all other data in order to avoid any bias caused by this imbalance.

As each review in the data is labeled as fake or genuine, we generated the ground truth for each business based on the percentage of fake reviews (fakeness) of that restaurant. By plotting the distribution of the percentage of fakeness against stores, we found that around 30% to 40% fakeness in a restaurant's reviews is a suitable range for the threshold used to label a restaurant as dishonest. We ran experiments with different values between 30 and 40 and settled on 30% fakeness as the threshold for labeling a store as dishonest. Following are some important observations.

Figure 4: Cumulative Distribution of Fake Percentage Across Stores

Figure 4 showcases the cumulative distribution of fake percentage plotted against percentage of stores. This plot clearly shows that 30% would give a good ratio of honest and dishonest stores resembling the real-world scenarios.
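
The labeling rule described above can be sketched as follows. The review labels follow the dataset convention (-1 = filtered, 1 = un-filtered) and the store classes follow the convention used in the result tables (1 = honest, 0 = dishonest). The function name, the input layout and the choice to treat a store at exactly 30% fakeness as dishonest are assumptions made for illustration.

```python
from collections import defaultdict

FAKE = -1  # review label for a review filtered by Yelp


def label_stores(reviews, threshold=0.30, min_reviews=10):
    """Label each store from its share of fake reviews.

    reviews: iterable of (product_id, review_label) pairs.
    Returns {product_id: 1 for honest, 0 for dishonest}.
    """
    total = defaultdict(int)
    fake = defaultdict(int)
    for product_id, label in reviews:
        total[product_id] += 1
        if label == FAKE:
            fake[product_id] += 1

    labels = {}
    for product_id, n in total.items():
        if n < min_reviews:
            continue  # stores with too few reviews are excluded from the study
        fakeness = fake[product_id] / n
        # Assumption: a store exactly at the threshold counts as dishonest.
        labels[product_id] = 0 if fakeness >= threshold else 1
    return labels
```

For example, a store with 3 filtered reviews out of 10 would be labeled dishonest, a store with 2 out of 10 would be labeled honest, and a store with only 5 reviews would be dropped.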

After the complete analysis of the data we ended up with 3,000 restaurants, of which 2,447 are honest and 553 are dishonest based on the threshold of 30% fakeness over the period 2010 to 2014. The following is the summary of the final data that will be used in our models.

Figure 5: Distribution of Honest and Dishonest Stores

The data at all stages of the experiments were split into training and test sets in a 70:30 ratio. Both sets were constructed so that the ratio of fakeness in each is 50% (that is, honest and dishonest stores are equally represented), to avoid any bias during training. The final data were preprocessed with the standard text preprocessing techniques of alpha conversions, stop word removal and WordNet lemmatization.
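
One way to obtain such a split is sketched below. It assumes that balance is achieved by downsampling the larger class before cutting each class 70:30; the report does not state how the balance was obtained, so the procedure, the function name and the fixed seed are all illustrative.

```python
import random


def balanced_split(labels, train_frac=0.7, seed=0):
    """Split store ids into train and test lists with equal class shares.

    labels: {store_id: 1 for honest, 0 for dishonest}.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    honest = sorted(s for s, y in labels.items() if y == 1)
    dishonest = sorted(s for s, y in labels.items() if y == 0)

    # Assumption: downsample the larger class to the size of the smaller one.
    n = min(len(honest), len(dishonest))
    honest = rng.sample(honest, n)
    dishonest = rng.sample(dishonest, n)

    cut = round(n * train_frac)
    train = honest[:cut] + dishonest[:cut]
    test = honest[cut:] + dishonest[cut:]
    return train, test
```

Cutting each class separately guarantees that both the training set and the test set contain the same number of honest and dishonest stores.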

PROPOSED SOLUTION AND ITS ARCHITECTURE

Singleton reviews are major contributors to spamming activity, and such activity is carried out over a certain period of time. With the knowledge and motivation gained from the related work, we designed a novel approach to the problem of classifying businesses as honest or dishonest. Previous works ignored the fact that "context is the intelligence" in any natural language. Our approach mainly focuses on the semantics of the reviews, which play a vital role in identifying a review as spam or not spam. Treating the problem as a temporal one, we conducted extensive experiments with data points based on time frames of 6 months, 3 months and 1 month. Our proposed solution leverages transfer learning techniques and deep learning models to work with the semantics of all the reviews of a business in order to classify it as honest or dishonest. The approach was carried out in 3 stages, as explained in detail below.

5.1 Stage 1 – Document Vector Generation

The first task was to convert each review text into vector form. To achieve this, we used the pre-trained GloVe (word2vec) model to take advantage of the word representations it learnt from training on Twitter data, and applied it to our review data using Gensim's text modeling methods. This produced a vector for each word in the reviews. Each word vector was then weighted by the TF-IDF value of the respective word, and the weighted vectors of all the words in a review were added together to give a single 200-dimensional vector representing the review as a whole. At the end of this stage we had vector representations for all the reviews of each business which, along with the labels provided in our dataset, were ready to be fed into the next stage of the model.
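
The weighted sum can be sketched as follows. Toy 3-dimensional vectors stand in for the 200-dimensional GloVe vectors, and the word vectors and TF-IDF weights are passed in as plain dictionaries rather than loaded through Gensim. Counting each distinct word once (since TF-IDF already accounts for term frequency) and skipping out-of-vocabulary words are assumptions of this sketch.

```python
def document_vector(tokens, word_vectors, tfidf_weights, dim):
    """Sum the vectors of a review's words, each scaled by its TF-IDF weight."""
    doc_vec = [0.0] * dim
    # Assumption: each distinct word contributes once.
    for token in sorted(set(tokens)):
        vec = word_vectors.get(token)
        if vec is None:
            continue  # assumption: words without an embedding are skipped
        weight = tfidf_weights.get(token, 0.0)
        for k in range(dim):
            doc_vec[k] += weight * vec[k]
    return doc_vec


# Toy values, invented for illustration only.
word_vectors = {"great": [1.0, 0.0, 2.0], "food": [0.0, 1.0, 1.0]}
tfidf_weights = {"great": 0.5, "food": 0.25}
vec = document_vector(["great", "food", "unknownword"],
                      word_vectors, tfidf_weights, dim=3)
```

Here `vec` is `[0.5, 0.25, 1.25]`: the vector for "great" scaled by 0.5 plus the vector for "food" scaled by 0.25, with the unknown word ignored.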

Figure 6: Document Vector Generator Model Architecture.

5.2 Stage 2 – Store/Business Embeddings Extractor

The main goal of stage 2 was to generate embeddings that represent the entire business as a single vector, so that a classifier model can be used to achieve our final goal of identifying the business as honest or dishonest. To achieve this, we designed a deep neural network model with LSTMs, which are well suited to sequential data. This deep learning model was fed the document vectors corresponding to a time frame (6 months, 3 months or 1 month) as input, passed them through the LSTM layers to recognize the patterns in the data, and captured the embeddings at the second-to-last layer. These captured 128-dimensional embeddings represent the restaurant's data for that particular time frame and are passed on to the next stage.
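
Grouping a store's reviews into time frame bins can be sketched as below. The sketch assumes calendar-month bins counted from a fixed start date (for instance January 2010, the start of the period retained in the data analysis); the report does not specify how the bins are aligned, and the function names are illustrative.

```python
from collections import defaultdict
from datetime import date


def bin_index(review_date, start, months_per_bin):
    """Index of the time bin a review falls into, counted from `start`."""
    months_since_start = ((review_date.year - start.year) * 12
                          + (review_date.month - start.month))
    return months_since_start // months_per_bin


def bin_reviews(reviews, start, months_per_bin):
    """Group (review_date, document_vector) pairs of one store by time bin.

    Returns {bin_index: [document_vector, ...]}, oldest review first.
    """
    bins = defaultdict(list)
    for review_date, vec in sorted(reviews, key=lambda r: r[0]):
        if review_date < start:
            continue  # reviews before the study period are ignored
        bins[bin_index(review_date, start, months_per_bin)].append(vec)
    return dict(bins)
```

With a start date of 1 January 2010, `months_per_bin=6` gives bins 0 to 9 for reviews up to the end of 2014, while `months_per_bin=1` gives bins 0 to 59. Each bin's list of document vectors is then the sequence fed to the embedding extractor.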

Figure 7: Store Embedding Extractor Model Architecture.

5.3 Stage 3 – Store/Business Honesty Classifier

This was the final stage of the architecture, in which we fed all the (time-framed) embeddings of a given store to the LSTM-based deep learning classifier, which classifies stores/businesses as honest or dishonest based on the semantic embeddings.

Figure 8: Store Honesty Classifier Model Architecture.

The complete architecture design is as shown below.

Figure 9: Complete Architecture

5.4 Result Measurements

The results of the above architecture were measured in terms of the F1 score, which is a measure of a test's accuracy in binary classification problems. The F1 score is the harmonic mean of precision and recall and ranges between 0 and 1, with 1 being the best and 0 the worst. Precision and recall are calculated from the confusion matrix of a binary classification model, with precision indicating what fraction of the selected items are relevant and recall indicating what fraction of the relevant items are selected.
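
These definitions translate directly into code. The sketch below computes the three metrics from the counts of true positives (`tp`), false positives (`fp`) and false negatives (`fn`) for one class; returning 0.0 when a denominator is zero is a convention chosen for this sketch.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # selected items that are relevant
    recall = tp / (tp + fn) if tp + fn else 0.0     # relevant items that are selected
    return precision, recall, f1_score(precision, recall)
```

For example, a precision of 0.73 and a recall of 0.82 give an F1 score of about 0.77.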

EXPERIMENTATION AND INTERPRETATION

We conducted extensive experiments with the different variations of the proposed architecture shown in the table below.

| Model Name | Model Detail |
|---|---|
| LSTM_WRC | LSTM With Reviews Count included in Embeddings extractor along with semantics |
| LSTM_WORC | LSTM With Out Reviews Count in Embeddings extractor |
| BiLSTM_WRC | LSTM With Reviews Count included in Embeddings extractor along with semantics, fed into a Bi-directional model |
| BiLSTM_WORC | LSTM With Out Reviews Count in Embeddings extractor, fed into a Bi-directional model |

Table 2: Different Variations of Proposed Model

6.1 Experiment 1

In experiment 1, we used the LSTM_WORC model. The review text converted into vectors in stage 1 was passed through the LSTM variation of the store embedding extractor model to generate embeddings, and these embeddings were fed into the LSTM variation of the store honesty classifier model, for all 3 time frames. The results of this experiment are as follows for both honest and dishonest stores:

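| Class (1 = honest, 0 = dishonest) | Time frame (months) | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0 | 6 | 0.76 | 0.72 | 0.74 |
| 0 | 3 | 0.75 | 0.76 | 0.75 |
| 0 | 1 | 0.71 | 0.79 | 0.74 |
| 1 | 6 | 0.75 | 0.79 | 0.77 |
| 1 | 3 | 0.77 | 0.75 | 0.76 |
| 1 | 1 | 0.77 | 0.69 | 0.73 |

Table 3: Results of LSTM_WORC Model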

6.2 Experiment 2

In experiment 2, we used the BiLSTM_WORC model, which followed the same steps as experiment 1 except that in the last stage we used a Bi-directional LSTM deep learning model instead of the LSTM variation, for all 3 time frames. In Bi-directional models, hidden layers running in opposite directions are connected to the same output, so that the output layer has information from both past and future states. The results of this experiment are as follows for both honest and dishonest stores:

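| Class (1 = honest, 0 = dishonest) | Time frame (months) | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0 | 6 | 0.72 | 0.77 | 0.74 |
| 0 | 3 | 0.75 | 0.76 | 0.75 |
| 0 | 1 | 0.74 | 0.75 | 0.75 |
| 1 | 6 | 0.76 | 0.71 | 0.74 |
| 1 | 3 | 0.77 | 0.75 | 0.76 |
| 1 | 1 | 0.76 | 0.75 | 0.76 |

Table 4: Results of BiLSTM_WORC Model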

6.3 Experiment 3

In experiment 3, we used the LSTM_WRC model, which followed the same pattern as experiment 1 except that at stage 2 the store embedding extractor model received the review count as an additional input along with the semantic vectors representing the review text of the given time frame. This architecture took into consideration both the non-semantic and the semantic features of the reviews. The results of this model are as follows for both honest and dishonest stores:

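| Class (1 = honest, 0 = dishonest) | Time frame (months) | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0 | 6 | 0.67 | 0.81 | 0.73 |
| 0 | 3 | 0.73 | 0.82 | 0.77 |
| 0 | 1 | 0.69 | 0.82 | 0.75 |
| 1 | 6 | 0.77 | 0.62 | 0.69 |
| 1 | 3 | 0.81 | 0.71 | 0.76 |
| 1 | 1 | 0.79 | 0.65 | 0.71 |

Table 5: Results of LSTM_WRC Model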

6.4 Experiment 4

Experiment 4 mirrored experiment 2, with the Bi-directional model in the final stage. We called this model BiLSTM_WRC. In this variation we added the review count along with the review semantics, as in experiment 3. The results of this model are as follows for all 3 time frames, for both honest and dishonest stores.

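| Class (1 = honest, 0 = dishonest) | Time frame (months) | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0 | 6 | 0.68 | 0.78 | 0.73 |
| 0 | 3 | 0.73 | 0.78 | 0.76 |
| 0 | 1 | 0.70 | 0.80 | 0.74 |
| 1 | 6 | 0.76 | 0.65 | 0.70 |
| 1 | 3 | 0.78 | 0.73 | 0.75 |
| 1 | 1 | 0.78 | 0.67 | 0.72 |

Table 6: Results of BiLSTM_WRC Model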

BASE MODELS

7.1 Base Model 1 – Support Vector Machine – Classification

Support Vector Machines (SVMs) are supervised learning models used in classification and regression analysis. An SVM considers the extreme data points in the dataset (the support vectors) and creates a hyperplane that separates the data into different classes.

In this experiment, the review documents of the preprocessed data were combined into a single corpus and fed into a TF-IDF vectorizer to generate the feature names of the corpus. The entire data was split into training and test datasets as per our previously mentioned standards. For each restaurant in this dataset, a single vector representation was generated by combining the TF-IDF vectors of all the review documents related to that restaurant. Those vectors were fed into the support vector machine along with the labels calculated for the respective restaurants based on the threshold value.

During the training of the model we used equal numbers of honest and dishonest stores so as to avoid any bias in the model. Once training was completed the model was tested, and the results are shown below for all 3 time frames (6 months, 3 months and 1 month) for both honest and dishonest stores.

| Class (1 = honest, 0 = dishonest) | Time frame (months) | Precision | Recall | F1-score |
|---|---|---|---|---|
| 0 | 6 | 0.71 | 0.64 | 0.67 |
| 0 | 3 | 0.71 | 0.64 | 0.67 |
| 0 | 1 | 0.71 | 0.64 | 0.67 |
| 1 | 6 | 0.69 | 0.75 | 0.72 |
| 1 | 3 | 0.69 | 0.75 | 0.72 |
| 1 | 1 | 0.69 | 0.75 | 0.72 |

Table 7: Results of Support Vector Machines

7.2 Base Model 2 – Simple Deep Neural Network – Classification

Most of the neural network model followed the same steps as base model 1 (SVM), except that the TF-IDF vectorized review text was fed into a neural network, which learnt the deep patterns in the input data and categorized the test data into honest and dishonest stores. The results of this model are as follows.

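| Honest (1) / Dishonest (0) | Time Frame in Months | Precision | Recall | F1-Score |
|---|---|---|---|---|
| 0 | 6 | 0.71 | 0.64 | 0.67 |
| 0 | 3 | 0.71 | 0.64 | 0.67 |
| 0 | 1 | 0.71 | 0.64 | 0.67 |
| 1 | 6 | 0.69 | 0.75 | 0.72 |
| 1 | 3 | 0.69 | 0.75 | 0.72 |
| 1 | 1 | 0.69 | 0.75 | 0.72 |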

Table 8: Results of Simple Neural Network

COMPARATIVE RESULTS

In this section we compare the results of all the architectures and experiments we performed and interpret the results of each model for honest stores, dishonest stores and both combined. The comparative results below show how the models behaved on the Yelp data for all 3 time frames of 6 months, 3 months and 1 month.

8.1 Dishonest Store Results

Below are the comparative results of all the models we experimented with for the dishonest stores. The results show that the LSTM_WRC model, in which review counts are fed in along with the semantics of the reviews at stage 2, generated better results on the 3-month time frame than any other combination of model and time frame.

Figure 10: Comparison of Precision Values of Different Models on Dishonest Stores

Figure 10 shows the precision values plotted for the 6-month, 3-month and 1-month time frames for all the models on dishonest stores. The 6-month and 3-month bins obtained better results than the 1-month bins. BiLSTM_WORC and LSTM_WORC obtained the better results for the 3-month bins, whereas for the 6-month bins the better results were achieved with BiLSTM_WORC.

Figure 11: Comparison of Recall Values of Different Models on Dishonest Stores

From Figure 11, we observed that recall was better for the 1-month data bins than for the 3-month and 6-month bins. The results showed that BiLSTM_WRC outperformed all the other models.

Figure 12: Comparison of F1 Scores of Different Models on Dishonest Stores

Figure 12 depicts the F1 scores of all the experimented models for dishonest stores. We observed that better F1 scores were achieved for the 3-month data bins, with the LSTM_WRC model outperforming the other models.

8.2 Honest Store Results

The charts below show the comparative results of all the models on the honest stores. The results show that the LSTM_WORC model, fed with the semantics of the reviews at stage 2, generated better results than the other models on the 6-month time frame, and overlapped with the bidirectional model BiLSTM_WORC, also fed with review semantics, on the 3-month and 1-month time frames.

Figure 13: Comparison of Precision Values of Different Models on Honest Stores

Figure 13 plots the precision values on honest stores for all the models run on all 3 time frame bins. In terms of precision, LSTM_WRC outperformed all the other models across all the time bins.

Figure 14: Comparison of Recall Values of Different Models on Honest Stores

Figure 14 shows that the recall values are better with the 6-month data bins than with the other data bins. Of all the models, LSTM_WORC performed best.

Figure 15: Comparison of F1 Scores of Different Models on Honest Stores

Figure 15 plots the F1 scores for honest stores. As shown in Figure 15, the F1 scores of most of the models overlap at the 3-month point. Compared with the other models, LSTM_WORC performed best on the 6-month data bins and BiLSTM_WORC performed best on the 1-month data.

8.3 Average Results for All Stores Combined

The average results describe the performance of the models on both the honest and dishonest businesses, i.e. on the entire Yelp data. Below are the observations from the average results:

The LSTM_WRC model fed with the 3-month time frame clearly outperformed all other models, with an F1 score of 0.77.

All the proposed models exceeded the results of the baseline models.

Almost all the models performed better when fed with the 3-month time frame data than with the other time frames.
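The source does not state how the per-class scores were averaged. One plausible reading, assumed here purely for illustration, is an unweighted (macro) average over the honest and dishonest classes. The sketch below applies that assumption to the SVM baseline F1-scores from Table 7 and compares the result with the 0.77 reported above for LSTM_WRC; the helper name `macro_average` is hypothetical.

```python
def macro_average(per_class_scores):
    """Unweighted mean of per-class scores (assumed averaging scheme)."""
    return sum(per_class_scores.values()) / len(per_class_scores)


# F1-scores of the SVM baseline per class, taken from Table 7.
svm_f1 = {"dishonest": 0.67, "honest": 0.72}
svm_avg_f1 = macro_average(svm_f1)

# Best average F1 reported in the text for LSTM_WRC on the 3-month time frame.
lstm_wrc_avg_f1 = 0.77
margin = lstm_wrc_avg_f1 - svm_avg_f1
```

Under this assumption the SVM baseline averages roughly 0.70, so the reported 0.77 corresponds to a margin of about 0.07 in F1. A support-weighted average would give a different figure if the two classes are unequal in size in the test set.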

Figure 16: Comparison of Average Precision Values of Different Models on all Stores

Figure 16 shows the average precision results plotted for all the stores, both honest and dishonest. LSTM_WORC and BiLSTM_WORC performed best across all 3 time frame bins.

Figure 17: Comparison of Average Recall Values of Different Models on all Stores

Figure 17 shows the average recall values plotted against the time frame in months for all the models. We observed that the LSTM_WRC model performed best with the 3-month data bins, followed by the BiLSTM_WORC and LSTM_WORC models.

Figure 18: Comparison of Average F1-Score of Different Models on all Stores

Figure 18 compares the average F1 scores over all the stores, which we consider the final results of our proposed architecture. We observed that all the proposed models outperformed our baseline models in all 3 data bins. The LSTM_WRC model on the 3-month bin was the best performer among all the models.

CONCLUSION

In this research project we focused on detecting the businesses that are affected by opinion spamming over time. We used Yelp review data containing reviews of 5,044 businesses by 260,277 reviewers. We leveraged recent techniques in deep learning, such as transfer learning, semantic embeddings, auto encoding and LSTMs, and designed a novel architecture that classifies a business as honest or dishonest based on semantic analysis of its reviews over time. Extensive experiments showed that the proposed models outperformed the baseline models in terms of precision, recall and F1-score in identifying both honest and dishonest businesses.

REFERENCES

[1] Sihong Xie , Guan Wang , Shuyang Lin , Philip S. Yu, Review spam detection via temporal pattern discovery, Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, August 12-16, 2012, Beijing, China.

[2] Santosh KC , Arjun Mukherjee, On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp, Proceedings of the 25th International Conference on World Wide Web, April 11-15, 2016, Montréal, Québec, Canada.

[3] Online social networking services and spam detection approaches in opinion mining-a review, International Journal of Web Based Communities, v.14 n.4, p.353-378, January 2018. https://dl.acm.org/citation.cfm?id=3302826.

[4] Shebuti Rayana , Leman Akoglu, Collective Opinion Spam Detection: Bridging Review Networks and Metadata, Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 10-13, 2015, Sydney, NSW, Australia.

[5] Myle Ott , Yejin Choi , Claire Cardie , Jeffrey T. Hancock, Finding deceptive opinion spam by any stretch of the imagination, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, June 19-24, 2011, Portland, Oregon.

[6] Yu-Ren Chen , Hsin-Hsi Chen, Opinion Spam Detection in Web Forum: A Real Case Study, Proceedings of the 24th International Conference on World Wide Web, May 18-22, 2015, Florence, Italy.

[7] Arjun Mukherjee , Abhinav Kumar , Bing Liu , Junhui Wang , Meichun Hsu , Malu Castellanos , Riddhiman Ghosh, Spotting opinion spammers using behavioral footprints, Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, August 11-14, 2013, Chicago, Illinois, USA.

[8] Muhammad, Iqra & Qamar, Usman & Khan, Farhan. (2019). Temporal Spam Identification: A Multifaceted Approach to Identifying Review Spam: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys) Volume 2. 10.1007/978-3-030-010577_58.

[9] Vlad Sandulescu, Martin Ester, Detecting Singleton Review Spammers Using Semantic Similarity, Proceedings of the 24th International Conference on World Wide Web, May 18-22, 2015, Florence, Italy.

[10] Chang Xu , Jie Zhang, Collusive Opinion Fraud Detection in Online Reviews: A Probabilistic Modeling Approach, ACM Transactions on the Web (TWEB), v.11 n.4, p.128, July 2017 https://dl.acm.org/citation.cfm?id=3098859.

[11] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. https://nlp.stanford.edu/pubs/glove.pdf.

[12] Guan Wang , Sihong Xie , Bing Liu , Philip S. Yu, Review Graph Based Online Store Review Spammer Detection, Proceedings of the 2011 IEEE 11th International Conference on Data Mining, p.1242-1247, December 11-14, 2011.

[13] Man-Chun Ko , Hen-Hsen Huang , Hsin-Hsi Chen, Paid review and paid writer detection, Proceedings of the International Conference on Web Intelligence, August 23-26, 2017, Leipzig, Germany.

[14] Chungsik Song , Kunal Goswami , Younghee Park , Sang-Yoon Chang , Euijin Choo, Graphic model analysis of frauds in online consumer reviews, Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing, p.1-7, March 22-23, 2017, Cambridge, United Kingdom.

[15] Chih-Chien Wang, Min-Yuh Day, and Yu-Ruei Lin. 2016. Toward understanding the cliques of opinion spammers with social network analysis. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM'16). 1163--1169.

[16] R Rayana, Shebuti & Akoglu, Leman. (2016). Collective Opinion Spam Detection using Active Inference. 630-638. 10.1137/1.9781611974348.71.

[17] https://developer.oracle.com/databases/neural-network-machine-learning.html.

[18] https://www.knime.com/blog/text-generation-with-lstm.

[19] https://www.yelp.com/dataset.

[20] https://radimrehurek.com/gensim/.
