Amazon Reviews Sentiment Analysis - Data Warehouse and Data Mining (UCS625) Project Report
Akshit Arora (akshit.arora1995@gmail.com) and Arush Nagpal (arushngpl16@gmail.com).
Amazon Reviews Sentiment Analysis
Arush Nagpal¹, Akshit Arora¹
¹ Thapar Institute of Engineering and Technology University, Patiala - 147004, Punjab, India
Sentiment analysis is an important step towards comprehension in natural language processing. Analyzing user sentiments towards
products through their review comments and ratings can be economically profitable to product designers. We propose a platform that
classifies the reviews given by users on an Amazon product page into positive and negative sentiments using a simple rule-based system.
Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical
features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts.
We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions
for expressing and emphasizing sentiment intensity.
Index Terms: Analysis, Amazon, Products, Reviews, Sentiment.
I. INTRODUCTION
1.1 Need of the system
More than ever before, people's judgments of what to do, or
what to eat, are governed by the opinions of other people.
The internet has become the ultimate trove of the opinions of
many, many people. Today, sites like Amazon have become a
vast database for products that include reviews and opinions
written by everyday people.
Around the globe, there are more than 6,500 daily
newspapers selling close to 400 million copies every day.
Additionally, there are blogs, micro blogs, periodicals,
magazines, fanzines, etc. How can we make sense of all this
information? How can we classify it and aggregate it so that we
can perform quantitative analysis?
As a seller, it is essential to "stay on top of your game", i.e.,
keep your product updated with the most requested features.
However, most e-commerce websites provide only an average
rating (out of 5) for each product. Consequently, it is difficult
to identify why people like or dislike a particular product. We
aim to solve this problem.
Apart from quantitative ratings (which are mostly skewed),
Amazon also records qualitative reviews. The objective is to
assess those text reviews and determine whether they are
negative or positive. Beyond classifying the sentiment, we
also focus on determining how strong the sentiment is.
Sentiment analysis research focuses on understanding the
positive or negative tone of a sentence based on sentence
syntax, structure, and content.
This paper describes the working of a simple rule-based
system called VADER (Hutto & Gilbert, 2014), for Valence
Aware Dictionary for Sentiment Reasoning.
1.2 Applications
Sentiment analysis is useful to a wide range of problems
that are of interest to human-computer interaction
practitioners and researchers, as well as those from fields such
as sociology, marketing and advertising, psychology,
economics, and political science. Here, it is used to address the
problem described in Section 1.1.
Nowadays, social media has become a platform for people to
convey their voice to the public. Among various opinions that
people share and exchange, there are a lot of comments about
consumer products. Recently, it has been shown that the chatter
of consumers on social media such as Facebook, Twitter,
Myspace, and Google+ correlates strongly with a
product's actual financial performance in the market. This
forms a beneficial database for companies to analyze
consumers' demands in order to make quantitative predictions
about their potential customers.
1.3 Challenges in development
The goal of our project is to apply rule-based techniques for
sentiment analysis, or opinion mining, to user-generated text on
the web, such as movie or product reviews, or comments on social
networks and forums. Given the content of this user-generated
text, we aim to classify the reviews or comments as being
positive or negative. An opinion is defined as a positive or
negative sentiment, view, attitude, emotion, or appraisal about
an entity or an aspect of the entity from an opinion holder. This
is a relevant problem in today's world, as the amount of
user-generated text on the web is increasing, and sentiment
analysis can be used to detect the mood of users on a forum or to
detect spam if the text is too negative. By building features to
categorize the content of a given text, we use rule-based
techniques to detect positive versus negative sentiment in the text.
Some of the challenges stem from the sheer rate and
volume of user-generated social content, combined with the
contextual sparseness resulting from the shortness of the text and
a tendency to use abbreviated language conventions to express
sentiments.
A comprehensive, high quality lexicon is often essential for
fast, accurate sentiment analysis on such large scales.
We use a combination of qualitative and quantitative
methods to produce, and then empirically validate, a gold-
standard sentiment lexicon that is especially attuned to
microblog-like contexts. We next combine these lexical
features with consideration for five generalizable rules that
embody grammatical and syntactical conventions that humans
use when expressing or emphasizing sentiment intensity.
II. EXISTING WORK
Sentiment analysis, or opinion mining, is an active area of
study in the field of natural language processing that analyzes
people's opinions, sentiments, evaluations, attitudes, and
emotions via the computational treatment of subjectivity in text.
It is not our intention to review the entire body of literature
concerning sentiment analysis. Indeed, such an endeavor would
not be possible within the limited space available (such
treatments are available in Liu (2012) and Pang & Lee (2008)).
We do provide a brief overview of canonical works and
techniques relevant to our study.
A. Sentiment Lexicons
A substantial number of sentiment analysis approaches rely
greatly on an underlying sentiment (or opinion) lexicon. A
sentiment lexicon is a list of lexical features (e.g., words) which
are generally labeled according to their semantic orientation as
either positive or negative (Liu, 2010). Manually creating and
validating such lists of opinion-bearing features, while being
among the most robust methods for generating reliable
sentiment lexicons, is also one of the most time-consuming.
For this reason, much of the applied research leveraging
sentiment analysis relies heavily on preexisting manually
constructed lexicons.
B. Sentiment Intensity (Valence-Based) Lexicons
Many applications would benefit from being able to
determine not just the binary polarity (positive versus negative),
but also the strength of the sentiment expressed in text. Just
how favorably or unfavorably do people feel about a new
product, movie, or legislation bill? Analysts and researchers
want (and need) to be able to recognize changes in sentiment
intensity over time in order to detect when rhetoric is heating
up or cooling down. It stands to reason that having a general
lexicon with strength valences would be beneficial.
C. Lexicons and Context-Awareness
Whether one is using binary polarity-based lexicons or more
nuanced valence-based lexicons, it is possible to improve
sentiment analysis performance by understanding deeper
lexical properties (e.g., parts-of-speech) for more context
awareness. Despite their ubiquity for evaluating sentiment in
social media contexts, there are generally three shortcomings of
lexicon-based sentiment analysis approaches: 1) they have
trouble with coverage, often ignoring important lexical
features which are especially relevant to social text in
microblogs, 2) some lexicons ignore general sentiment intensity
differentials for features within the lexicon, and 3) acquiring
a new set of (human-validated, gold-standard) lexical features,
along with their associated sentiment valence scores, can
be a very time-consuming and labor-intensive process. We view
the current study as an opportunity not only to address this gap
by constructing just such a lexicon and providing it to the
broader research community, but also a chance to compare its
efficacy against other well-established lexicons with regards to
sentiment analysis of social media text and other domains.
III. WORKING OF THE PROPOSED SYSTEM
We use VADER, a lexicon- and rule-based
sentiment analysis tool. A sentiment lexicon is a list of lexical
features (e.g., words) which are generally labeled according to
their semantic orientation as either positive or negative (Liu,
2010). Manually creating and validating such lists of
opinion-bearing features, while being among the most robust
methods for generating reliable sentiment lexicons, is also one of
the most time-consuming. For this reason, much of the applied
research involving sentiment analysis relies heavily on
pre-existing manually constructed lexicons. We use the
VADER sentiment lexicon, which combines features from
established lexicons such as LIWC (Linguistic Inquiry
and Word Count), ANEW (Affective Norms for English
Words), and GI (the General Inquirer). Words with
a mean sentiment of 0.0 were removed, resulting in a total of
just over 7,500 lexical features with validated valence scores that
indicate both the sentiment polarity (positive/negative) and
the sentiment intensity on a scale from -4 to +4. For example,
the word "okay" has a positive valence of 0.9, "good" is 1.9,
and "great" is 3.1, whereas "horrible" is -2.5, the frowning
emoticon ":(" is -2.2, and "sucks" and "sux" are both -1.5.
The proposed system is fed a JSON file of the reviews
of any product from any website. Next, we provide it with
the attribute to be analyzed. After cleaning and
processing the data, an output file is returned with
four values for each review:
1) Negative Sentiment
2) Neutral Sentiment
3) Positive Sentiment
4) Compound Sentiment
The negative, neutral, and positive values are proportions that
sum to 1, while the compound value is a separate normalized score
between -1 and +1. This output is fed to another Python script to
produce a CSV file. We then process it to calculate the average
negative, positive, neutral, and compound scores. We can then
analyze the products on the basis of that result:
1) If the positive sentiment is greatest, most of the
people gave positive reviews of the product and it is
actually good.
2) If the negative sentiment is greatest, most of the
people gave negative reviews of the product and it is not
actually a good product, however well it might have been
advertised or predicted.
3) If the neutral sentiment is greatest, most of the
people gave neutral reviews of the product and did not
express much contentment or satisfaction with the product.
Instead, the reviews were more of a description of the
product.
4) The compound sentiment summarizes the overall tone in a
single normalized score: a value close to +1 is strongly
positive, a value close to -1 is strongly negative, and a
value near 0 suggests that the reviews are neutral or list
both pros and cons (for example, through the word "but")
without a clear majority of like or dislike.
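As a rough illustration of how the four values relate to the lexicon, the sketch below scores a text against a tiny lexicon that holds only the example valences quoted earlier. It is a simplified sketch under stated assumptions, not the VADER implementation: it ignores all grammatical rules, and the name `score_text`, the whitespace tokenization, the extra weight of 1 per sentiment-bearing word, and the normalization constant `ALPHA = 15` are our assumptions about how such a scorer can be built.

```python
import math

# Tiny illustrative lexicon: only the example valences quoted in the text.
MINI_LEXICON = {
    "okay": 0.9, "good": 1.9, "great": 3.1,
    "horrible": -2.5, ":(": -2.2, "sucks": -1.5, "sux": -1.5,
}

ALPHA = 15  # assumed normalization constant for the compound score


def score_text(text, lexicon=MINI_LEXICON):
    """Return negative/neutral/positive proportions and a compound score."""
    valences = [lexicon.get(tok.lower(), 0.0) for tok in text.split()]

    # Each sentiment-bearing word gets an extra weight of 1 so that it is
    # comparable with neutral words, which count as 1 each (assumption).
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_count = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_count
    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}

    # The compound score squashes the raw valence sum into (-1, +1).
    raw = sum(valences)
    compound = raw / math.sqrt(raw * raw + ALPHA)

    return {
        "neg": round(neg_sum / total, 3),
        "neu": round(neu_count / total, 3),
        "pos": round(pos_sum / total, 3),
        "compound": round(compound, 4),
    }


print(score_text("The product is good"))
# {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
```

Under these assumptions the three proportions always sum to 1, while the compound score is computed separately from the raw valence sum, which is why it can be negative.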
Here is a sample input given to the system:
{"reviewerID": "A3155NWLKXEY1I", "asin":
"B00009RAX7", "reviewerName": "AKO California",
"helpful": [2, 2], "reviewText": "I had a cracked air intake hose
that caused the "check engine" light to go on, but a couple days
after I fixed it. I wasn't sure if that was the reason, and if it was,
I didn't want to take it to a dealer just to clear that code. This
did the trick. Very easy to use, gave me the code so I could
check what it was. Once I found out, I knew it was the cracked
hose. Cleared the message using this unit, and it never came
back. Had I taken it to a mechanic or dealer, they'd probably
have told me it was some other problem and I wouldn't have
known if they were telling me the truth or not.", "overall": 5.0,
"summary": "Pretty easy to use", "unixReviewTime":
1206748800, "reviewTime": "03 29, 2008"}^
{"reviewerID": "A31Y28UEDXQ0HB", "asin":
"B00009RAX7", "reviewerName": "Jack W. Wolfe",
"helpful": [0, 0], "reviewText": "this scanner works just like
they say you have everything you need to scan your auto.
WORTH EVERY PENNY!!", "overall": 5.0, "summary":
"great buy", "unixReviewTime": 1217116800, "reviewTime":
"07 27, 2008"}^
{"reviewerID": "A9GPCR9WJQCCJ", "asin":
"B00009RAX7", "reviewerName": "Jim "gearhead4"",
"helpful": [0, 0], "reviewText": "I purchased this Acton scanner
2 years ago when our older vehicle was repeatedly showing its
MIL (Malfunction Indicator Lamp). I was able to clear the code
immediately. When the MIL illuminated again later, I was able
to track down the problem (O2 sensor heater trouble) and
correct it by reseating the connector. Later, I used the scanner
to diagnose which cylinder was misfiring. By tapping on that
cylinder's fuel injector, I was able to correct the misfiring.
Again, the Acton cleared the MIL display. My mechanic uses a
more sophisticated (and much more expensive) OBD scanner
in his day to day work, but for someone who does occasional
maintenance on his own car, this tool is worth the $89
investment. If you paid more, you paid too much.", "overall":
4.0, "summary": "A good tool for $89", "unixReviewTime":
1214784000, "reviewTime": "06 30, 2008"}^
{"reviewerID": "A2TT3U4U8NMWEL", "asin":
"B00009RAX7", "reviewerName": "JX", "helpful": [3, 3],
"reviewText": "Actron seems to be a company that makes high
quality diagnostic equipment. This auto-scanner tool is very
affordable and excellent value for money. Works very well and
is pretty rugged as I had hoped. The only problem is that fixing
the vehicle based on the diagnosed code is NOT always helpful,
sorry. Repairing your car based on these codes requires a good
understanding of auto repair. For example: code PM001 -
Cylinder Misfire. What can a NOVICE do with ALL that
"helpful" information (LOL). Relatively easy to use. I will
give full marks for the quality of the physical device
otherwise.", "overall": 5.0, "summary": "Actron Auto Scanner",
"unixReviewTime": 1190764800, "reviewTime": "09 26,
2007"}^
{"reviewerID": "A3REK3OFONWB1Q", "asin":
"B00009RAX7", "reviewerName": "Paul M. Provencher
"ppro"", "helpful": [10, 19], "reviewText": "Have you ever felt
like your auto mechanic was "Mr. Wizard"? All the way down
to the bad attitude? Only to find he has a doo-dad sitting
somewhere out of eyesight that tells all, like a crystal ball...Well
for a reasonable price you can get your own crystal ball. It
might be able to predict the future and track flying monkeys but
it can tell you very quickly why the "Check Engine" light has
come on. Big or small you know what might be waiting for you
when you go see Mr. Wizard. Sometimes you might even be
able to track down the problem yourself and cut the cost to fix
it. I would not go to Oz without mine!", "overall": 5.0,
"summary": "Pay no attention to that man behind the curtain...",
"unixReviewTime": 1174780800, "reviewTime": "03 25,
2007"}^
After running this input through the system, we receive an
output which is of the form:
Negative Neutral Positive Compound
0 0.526 0.474 0.2023
0 1 0 0
0 0.519 0.481 0.5719
0 0.779 0.221 0.1779
0 0.58 0.42 0.4404
0 0.548 0.452 0.5106
0.412 0.336 0.252 -0.2263
0 0.256 0.744 0.4404
0.219 0.781 0 -0.1027
0.239 0.761 0 -0.296
0.147 0.6 0.253 0.3818
0 0 1 0.6588
0 0.404 0.596 0.7096
0.524 0.476 0 -0.296
0 0.448 0.552 0.5719
0 0.439 0.561 0.7506
0 0.182 0.818 0.6696
0 0.455 0.545 0.5859
0 1 0 0
0 0.507 0.493 0.7783
0 1 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0.196 0.804 0.6249
0 1 0 0
0 0.196 0.804 0.6249
0 1 0 0
0 0.256 0.744 0.4404
After this, we take the average of each of the four scores
across all reviews to calculate the final result.
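The averaging step can be sketched as follows, assuming the per-review scores have been written to a CSV file with the four column headers shown above. The data here are just the first three rows of the sample output, and `column_means` is a hypothetical helper name rather than part of our scripts.

```python
import csv
import io

# First three rows of the sample output, written as CSV text (illustrative).
SAMPLE_CSV = """Negative,Neutral,Positive,Compound
0,0.526,0.474,0.2023
0,1,0,0
0,0.519,0.481,0.5719
"""


def column_means(csv_file):
    """Average every column of a CSV file whose cells are all numeric."""
    reader = csv.DictReader(csv_file)
    totals = {name: 0.0 for name in reader.fieldnames}
    count = 0
    for row in reader:
        for name in totals:
            totals[name] += float(row[name])
        count += 1
    if count == 0:
        raise ValueError("no data rows to average")
    return {name: total / count for name, total in totals.items()}


means = column_means(io.StringIO(SAMPLE_CSV))
for name, value in means.items():
    print(f"{name}: {value:.4f}")
```

For a real run, the `io.StringIO` object would be replaced by an open file handle on the generated CSV.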
IV. DATA COLLECTION AND PREPARATION
We used the Amazon reviews dataset provided by
https://snap.stanford.edu/data/web-Amazon.html
We applied several data cleaning steps to prepare the dataset
for our use:
1) Separating multiple JSON objects.
2) Parsing the summary field of the JSON data.
3) Parsing the lexicon data into a dictionary.
4) Separating words and emoticons from the data.
5) Filtering negation words.
6) Filtering words written in all upper case for emphasis.
7) Filtering booster words like "very", "greatly", etc.
8) Filtering idioms and spam words.
9) Converting the result to CSV format and calculating
averages of all sentiments.
Everything was done using Python scripts.
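The first two cleaning steps can be sketched as below. This is a minimal sketch, assuming the input is a string of concatenated JSON objects separated by whitespace and a "^" character (as in the sample input), and that each object is valid JSON with inner quotation marks escaped. The name `iter_json_objects` is hypothetical.

```python
import json


def iter_json_objects(text, separators="^ \t\r\n"):
    """Yield each JSON object found in a string of concatenated objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip whitespace and the assumed "^" delimiter between objects.
        while pos < len(text) and text[pos] in separators:
            pos += 1
        if pos >= len(text):
            return
        # raw_decode parses one value and reports where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


raw = (
    '{"asin": "B00009RAX7", "overall": 5.0, "summary": "Pretty easy to use"}^\n'
    '{"asin": "B00009RAX7", "overall": 5.0, "summary": "great buy"}^\n'
)

# Step 2: keep only the attribute to be analyzed (here, the summary field).
summaries = [record["summary"] for record in iter_json_objects(raw)]
print(summaries)  # ['Pretty easy to use', 'great buy']
```

Using `raw_decode` avoids splitting the text on a delimiter that might also occur inside a review.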
V. TRAINING OF THE MODEL
As discussed above, we use the VADER lexicon file to set up
the system rather than a machine learning or data mining
algorithm to train a model. It is a rule-based system in which
we use rules already created and tested by
researchers. The training process involves reading the lexicon
file, which has 7,517 words and emoticons, and putting
them into a Python dictionary so that we can quickly look up
the sentiment of the words extracted from the reviews. The
structure of the lexicon file is as follows:
[Word] [Mean] [Standard Deviation] [A list of ratings based
on emotions varying from -4 to +4]
We create a hash map that uses the first two
fields, [Word]: [Mean Sentiment], to train the model and then
use it later.
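Loading the lexicon into a dictionary can be sketched as below, assuming the four fields on each line are separated by tab characters. The function name `load_lexicon` is hypothetical, and in the sample lines only the words and mean valences come from this report: the standard deviations and rating lists are empty placeholders, not values from the real file.

```python
def load_lexicon(lines):
    """Map each lexical feature (word or emoticon) to its mean valence."""
    lexicon = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        # Only the first two tab-separated fields are needed.
        word, mean = line.split("\t")[:2]
        lexicon[word] = float(mean)
    return lexicon


# Placeholder lines in the assumed layout: word, mean, std dev, ratings.
sample_lines = [
    "good\t1.9\t0.0\t[]\n",
    "horrible\t-2.5\t0.0\t[]\n",
    ":(\t-2.2\t0.0\t[]\n",
]

lexicon = load_lexicon(sample_lines)
print(lexicon["good"], lexicon[":("])  # 1.9 -2.2
```

The same function accepts an open file handle, since iterating over a text file yields one line at a time.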
VI. TESTING OF THE MODEL
Since the lexicon has been validated by its authors and is widely
used by researchers, there is little scope for errors in the
lexicon itself. In addition, we tested some sentences for their
sentiments, and the results were found to be quite satisfactory.
The scores are listed below:
Sentence                                          Positive  Neutral  Negative
The product was very Bad                          0.0       0.513    0.487
I hated the product                               0.0       0.323    0.677
I hated the product!                              0.0       0.308    0.692
I hated the product!!                             0.0       0.295    0.705
I hated the product!!!                            0.0       0.283    0.717
I really hate the product!!                       0.0       0.396    0.604
I hate the service                                0.0       0.351    0.649
I hated the product                               0.0       0.323    0.677
I like the product                                0.556     0.444    0.0
I love the product                                0.677     0.323    0.0
The product is good                               0.492     0.508    0.0
The product is great                              0.577     0.423    0.0
The product is awesome                            0.577     0.423    0.0
I am happy with the product :)                    0.626     0.374    0.0
I am happy with the product.                      0.481     0.519    0.0
Really?? You don't deserve to be in the market!   0.0       1.0      0.0
The product is awesome                            0.577     0.423    0.0
The product works                                 0.0       1.0      0.0
The product is very beneficial                    0.444     0.556    0.0
I would never recommend it to anyone              0.0       0.703    0.297
Thus we can safely rely on the proposed system to analyse the
reviews.
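To show how emphasis changes a score, the sketch below applies two rules, booster words and exclamation marks, to negative words only. All constants and the valences for "bad" and "hated" are assumptions: they are the values we believe the reference VADER code uses, and they were chosen because they reproduce the negative scores reported above, not because they appear in this report. Dropping one-character tokens such as "I" is likewise an assumption, and `negative_share` is a hypothetical name.

```python
# Assumed constants and valences (see the note above the code).
BOOSTER_INCREMENT = 0.293      # added to a word's magnitude after a booster
EXCLAMATION_INCREMENT = 0.292  # added per "!" (at most four are counted)
BOOSTERS = {"very", "really", "greatly"}
NEGATIVE_LEXICON = {"bad": -2.5, "hated": -3.2}


def negative_share(text):
    """Return the negative proportion of a text under two emphasis rules."""
    exclamations = min(text.count("!"), 4)
    # Strip surrounding punctuation and drop one-character tokens such as "I".
    tokens = [tok.strip("!?.,").lower() for tok in text.split()]
    tokens = [tok for tok in tokens if len(tok) > 1]

    neg_sum = 0.0
    neutral = 0
    for i, tok in enumerate(tokens):
        valence = NEGATIVE_LEXICON.get(tok, 0.0)
        if valence == 0.0:
            neutral += 1
            continue
        magnitude = abs(valence)
        if i > 0 and tokens[i - 1] in BOOSTERS:
            magnitude += BOOSTER_INCREMENT
        neg_sum += magnitude + 1  # extra weight of 1 per sentiment word
    if neg_sum > 0:
        neg_sum += exclamations * EXCLAMATION_INCREMENT

    total = neg_sum + neutral
    return round(neg_sum / total, 3) if total else 0.0


for sentence in ("I hated the product", "I hated the product!",
                 "I hated the product!!", "The product was very Bad"):
    print(sentence, "->", negative_share(sentence))
# I hated the product -> 0.677
# I hated the product! -> 0.692
# I hated the product!! -> 0.705
# The product was very Bad -> 0.487
```

Each additional exclamation mark raises the negative share slightly, which matches the trend in the reported scores.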
VII. RESULTS AND DISCUSSIONS
We tested the system on reviews of the Amazon product "Jumper cables
Automobile parts". The dataset chosen had 1,259
records. The results obtained were as follows:
Negative 0.035002
Neutral 0.595128
Positive 0.369875
Compound 0.296674
On average, almost 60% of the review text was scored as
neutral, while 37% was positive and appreciated the
product. A very low portion, about 3.5%, was
negative. The mean compound score of about 0.30 is a single
normalized summary on a scale from -1 to +1, and it indicates
that the reviews were positive on balance.
VIII. CONCLUSIONS AND FUTURE SCOPE
We can conclude that buying automobile parts from
Amazon is a great deal, since most of the users were either
positive or neutral in their reviews of their purchases.
Scope: VADER served as an efficient tool to predict
sentiment, and it can be extended to almost any type of product
review on any website whose primary
language is English. It can also be used for sentiment analysis
of Facebook posts and Twitter tweets related to a
particular search term.
IX. REFERENCES
Hutto, C. J. & Gilbert, E. (2014). VADER: A Parsimonious
Rule-based Model for Sentiment Analysis of Social Media
Text. AAAI 2014.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. San
Rafael, CA: Morgan & Claypool.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment
analysis. Foundations & Trends in Information Retrieval, 2(1),
1-135.
Liu, B. (2010). Sentiment Analysis and Subjectivity. In N.
Indurkhya & F. Damerau (Eds.), Handbook of Natural
Language Processing (2nd Ed). Boca Raton, FL: Chapman &
Hall.

Intro To Electric Vehicles PDF Notes.pdf
ย 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
ย 
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
VIP Model Call Girls Kothrud ( Pune ) Call ON 8005736733 Starting From 5K to ...
ย 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
ย 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
ย 
Top Rated Call Girls In chittoor ๐Ÿ“ฑ {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor ๐Ÿ“ฑ {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor ๐Ÿ“ฑ {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor ๐Ÿ“ฑ {7001035870} VIP Escorts chittoor
ย 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
ย 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
ย 
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar โ‰ผ๐Ÿ” Delhi door step de...
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar  โ‰ผ๐Ÿ” Delhi door step de...Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar  โ‰ผ๐Ÿ” Delhi door step de...
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar โ‰ผ๐Ÿ” Delhi door step de...
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
ย 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
ย 

Amazon Reviews Sentiment Analysis

  • 1. Amazon Reviews Sentiment Analysis - Data Warehouse and Data Mining (UCS625) Project Report
Akshit Arora (akshit.arora1995@gmail.com) and Arush Nagpal (arushngpl16@gmail.com).

Amazon Reviews Sentiment Analysis
Arush Nagpal¹, Akshit Arora¹
¹ Thapar Institute of Engineering and Technology University, Patiala - 147004, Punjab, India

Sentiment analysis is an important step towards comprehension in natural language processing. Analyzing user sentiments towards products through their review comments and ratings can be economically profitable to product designers. We propose a platform that classifies the reviews given by users on an Amazon product page into positive and negative sentiments using a simple rule-based system. Using a combination of qualitative and quantitative methods, we first construct and empirically validate a gold-standard list of lexical features (along with their associated sentiment intensity measures) which are specifically attuned to sentiment in microblog-like contexts. We then combine these lexical features with consideration for five general rules that embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity.

Index Terms: Analysis, Amazon, Products, Reviews, Sentiment.

I. INTRODUCTION
1.1 Need of the system
More than ever before, people’s judgments of what to do, or what to eat, are governed by the opinions of other people. The internet has become the ultimate trove of the opinions of many, many people. Today, sites like Amazon have become a vast database for products that include reviews and opinions written by everyday people.
Around the globe, there are more than 6,500 daily newspapers selling close to 400 million copies every day. Additionally, there are blogs, microblogs, periodicals, magazines, fanzines, etc. How can we make sense of all this information? How can we classify it and aggregate it so that we can perform quantitative analysis?
As a seller, it is essential to “stay on top of your game”, i.e., to keep your product updated with the most requested features. However, most e-commerce websites provide only an average rating (out of 5) for each product. Consequently, it is difficult to identify why people like or dislike a particular product. We aim to solve this problem. Apart from quantitative ratings (which are mostly skewed), Amazon also records qualitative text reviews. The objective is to assess those text reviews and determine whether they are negative or positive. We focus not only on classifying the sentiment but also on determining how strong it is. Sentiment analysis research focuses on understanding the positive or negative tone of a sentence based on sentence syntax, structure, and content. This paper describes the working of a simple rule-based system called VADER (Hutto & Gilbert, 2014), for Valence Aware Dictionary for Sentiment Reasoning.

1.2 Applications
Sentiment analysis is useful for a wide range of problems that are of interest to human-computer interaction practitioners and researchers, as well as those from fields such as sociology, marketing and advertising, psychology, economics, and political science. It can therefore be applied to the product-review problem described above. Nowadays, social media has become a platform for people to convey their voice to the public. Among the various opinions that people share and exchange, there are a lot of comments about consumer products. Recently, it has been shown that consumer chatter on social media such as Facebook, Twitter, Myspace, Google+, and others correlates strongly with a product’s actual financial performance in the market. This forms a valuable database that companies can analyze to understand consumers’ demands and make quantitative predictions about their potential customers.

1.3 Challenges in development
The goal of our project is to apply a rule-based approach to sentiment analysis, or opinion mining, on user-generated text on the web, such as movie or product reviews, or comments on social networks and forums. Given the content of this user-generated text, we are looking to classify the reviews/comments as being positive or negative. An opinion is defined as a positive or negative sentiment, view, attitude, emotion, or appraisal about an entity or an aspect of the entity from an opinion holder. This is a relevant problem in today’s world, as the amount of user-generated text on the web is increasing and sentiment analysis can be used to detect the mood of users on a forum or to detect spam if the text is too negative. By building features to categorize the content of a given text, we use rule-based techniques to detect positive vs. negative sentiment in the text. Some of the challenges stem from the sheer rate and volume of user-generated social content, combined with the contextual sparseness resulting from the shortness of the text and a tendency to use abbreviated language conventions to express sentiments. A comprehensive, high-quality lexicon is often essential for fast, accurate sentiment analysis on such large scales. We use a combination of qualitative and quantitative methods to produce, and then empirically validate, a gold-standard sentiment lexicon that is especially attuned to microblog-like contexts. We next combine these lexical features with consideration for five generalizable rules that embody grammatical and syntactical conventions that humans use when expressing or emphasizing sentiment intensity.
  • 2. II. EXISTING WORK
Sentiment analysis, or opinion mining, is an active area of study in the field of natural language processing that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions via the computational treatment of subjectivity in text. It is not our intention to review the entire body of literature concerning sentiment analysis. Indeed, such an endeavor would not be possible within the limited space available (such treatments are available in Liu (2012) and Pang & Lee (2008)). We do provide a brief overview of canonical works and techniques relevant to our study.

A. Sentiment Lexicons
A substantial number of sentiment analysis approaches rely greatly on an underlying sentiment (or opinion) lexicon. A sentiment lexicon is a list of lexical features (e.g., words) which are generally labeled according to their semantic orientation as either positive or negative (Liu, 2010). Manually creating and validating such lists of opinion-bearing features, while being among the most robust methods for generating reliable sentiment lexicons, is also one of the most time-consuming. For this reason, much of the applied research leveraging sentiment analysis relies heavily on preexisting manually constructed lexicons.

B. Sentiment Intensity (Valence-Based) Lexicons
Many applications would benefit from being able to determine not just the binary polarity (positive versus negative), but also the strength of the sentiment expressed in text. Just how favorably or unfavorably do people feel about a new product, movie, or legislation bill? Analysts and researchers want (and need) to be able to recognize changes in sentiment intensity over time in order to detect when rhetoric is heating up or cooling down. It stands to reason that having a general lexicon with strength valences would be beneficial.

C. Lexicons and Context-Awareness
Whether one is using binary polarity-based lexicons or more nuanced valence-based lexicons, it is possible to improve sentiment analysis performance by understanding deeper lexical properties (e.g., parts of speech) for more context awareness. Despite their ubiquity for evaluating sentiment in social media contexts, there are generally three shortcomings of lexicon-based sentiment analysis approaches: 1) they have trouble with coverage, often ignoring important lexical features which are especially relevant to social text in microblogs, 2) some lexicons ignore general sentiment intensity differentials for features within the lexicon, and 3) acquiring a new set of (human-validated gold-standard) lexical features, along with their associated sentiment valence scores, can be a very time-consuming and labor-intensive process. We view the current study as an opportunity not only to address this gap by constructing just such a lexicon and providing it to the broader research community, but also a chance to compare its efficacy against other well-established lexicons with regard to sentiment analysis of social media text and other domains.

III. WORKING OF THE PROPOSED SYSTEM
We will use VADER, which is a lexicon- and rule-based sentiment analyzer tool. A sentiment lexicon is a list of lexical features (e.g., words) which are generally labeled according to their semantic orientation as either positive or negative (Liu, 2010). Manually creating and validating such lists of opinion-bearing features, while being among the most robust methods for generating reliable sentiment lexicons, is also one of the most time-consuming. For this reason, much of the applied research involving sentiment analysis relies heavily on preexisting manually constructed lexicons.
We will use the VADER sentiment lexicon, which combines features from established lexicons such as LIWC (Linguistic Inquiry and Word Count), ANEW (the Affective Norms for English Words), and GI (the General Inquirer). Words with a mean sentiment of 0.0 were removed, leaving about 7,500 lexical features with validated valence scores that indicate both the sentiment polarity (positive/negative) and the sentiment intensity on a scale from -4 to +4. For example, the word “okay” has a positive valence of 0.9, “good” is 1.9, and “great” is 3.1, whereas “horrible” is -2.5, the frowning emoticon “:(” is -2.2, and “sucks” and “sux” are both -1.5.

The proposed system is fed a JSON file of the reviews of any product from any website. Next, we provide it with the attribute that is to be analyzed. After cleaning and processing the data, it returns an output file with four values per review:
1) Negative Sentiment
2) Neutral Sentiment
3) Positive Sentiment
4) Compound Sentiment
The negative, neutral, and positive values are proportions that sum to 1. The compound value is a separate, normalized overall score between -1 and +1. This output is fed to another Python script to produce a CSV file, which we then process to calculate the average negative, positive, neutral, and compound scores. We can then analyze the product on the basis of that result:
1) If the positive sentiment is greater, most people gave positive reviews of the product and it is actually good.
2) If the negative sentiment is greater, most people gave negative reviews of the product and it is not actually a good product, however well it may have been advertised or predicted to perform.
3) If the neutral sentiment is greater, most people gave neutral reviews of the product and did not express much satisfaction or dissatisfaction with it. Instead, the reviews were more of a description of the product.
4) The compound sentiment summarizes the overall polarity: a value near +1 indicates strongly positive reviews, a value near -1 indicates strongly negative reviews, and a value near 0 indicates neutral or mixed reviews, for example ones that list both pros and cons joined by the word “but”.
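
The relationship between the four output values can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the system's actual code: it uses a toy lexicon containing only valences quoted above, skips the five grammatical rules entirely, and assumes the normalization constant of 15 and the removal of one-character tokens found in the reference VADER code. The function name `score` is hypothetical.

```python
import math

# Toy lexicon: only valences quoted in the text above.
LEXICON = {"okay": 0.9, "good": 1.9, "great": 3.1, "horrible": -2.5}

# Normalization constant; assumed to match the reference VADER code.
ALPHA = 15


def score(text):
    """Return neg/neu/pos proportions and a compound score for `text`."""
    # Lowercase, strip surrounding punctuation, drop one-character tokens.
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    valences = [LEXICON.get(t, 0.0) for t in tokens if len(t) > 1]

    # A sentiment-bearing word weighs |valence| + 1; a neutral word weighs 1.
    pos = sum(v + 1 for v in valences if v > 0)
    neg = sum(1 - v for v in valences if v < 0)
    neu = sum(1 for v in valences if v == 0)
    total = pos + neg + neu

    # Compound: the summed valence squashed into the interval (-1, +1).
    raw = sum(valences)
    compound = raw / math.sqrt(raw * raw + ALPHA)

    if total == 0:
        return {"neg": 0.0, "neu": 0.0, "pos": 0.0, "compound": 0.0}
    return {
        "neg": round(neg / total, 3),
        "neu": round(neu / total, 3),
        "pos": round(pos / total, 3),
        "compound": round(compound, 4),
    }


print(score("The product is good"))
# {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}
```

In this sketch the three proportions share one denominator and therefore sum to 1, while the compound score is computed independently from the summed valence, which is why it can be negative.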
  • 3. Here is a sample input given to the system:
{"reviewerID": "A3155NWLKXEY1I", "asin": "B00009RAX7", "reviewerName": "AKO California", "helpful": [2, 2], "reviewText": "I had a cracked air intake hose that caused the \"check engine\" light to go on, but a couple days after I fixed it. I wasn't sure if that was the reason, and if it was, I didn't want to take it to a dealer just to clear that code. This did the trick. Very easy to use, gave me the code so I could check what it was. Once I found out, I knew it was the cracked hose. Cleared the message using this unit, and it never came back. Had I taken it to a mechanic or dealer, they'd probably have told me it was some other problem and I wouldn't have known if they were telling me the truth or not.", "overall": 5.0, "summary": "Pretty easy to use", "unixReviewTime": 1206748800, "reviewTime": "03 29, 2008"}^
{"reviewerID": "A31Y28UEDXQ0HB", "asin": "B00009RAX7", "reviewerName": "Jack W. Wolfe", "helpful": [0, 0], "reviewText": "this scanner works just like they say you have everything you need to scan your auto. WORTH EVERY PENNY!!", "overall": 5.0, "summary": "great buy", "unixReviewTime": 1217116800, "reviewTime": "07 27, 2008"}^
{"reviewerID": "A9GPCR9WJQCCJ", "asin": "B00009RAX7", "reviewerName": "Jim \"gearhead4\"", "helpful": [0, 0], "reviewText": "I purchased this Acton scanner 2 years ago when our older vehicle was repeatedly showing its MIL (Malfunction Indicator Lamp). I was able to clear the code immediately. When the MIL illuminated again later, I was able to track down the problem (O2 sensor heater trouble) and correct it by reseating the connector. Later, I used the scanner to diagnose which cylinder was misfiring. By tapping on that cylinder's fuel injector, I was able to correct the misfiring. Again, the Acton cleared the MIL display. My mechanic uses a more sophisticated (and much more expensive) OBD scanner in his day to day work, but for someone who does occasional maintenance on his own car, this tool is worth the $89 investment. If you paid more, you paid too much.", "overall": 4.0, "summary": "A good tool for $89", "unixReviewTime": 1214784000, "reviewTime": "06 30, 2008"}^
{"reviewerID": "A2TT3U4U8NMWEL", "asin": "B00009RAX7", "reviewerName": "JX", "helpful": [3, 3], "reviewText": "Actron seems to be a company that makes high quality diagnostic equipment. This auto-scanner tool is very affordable and excellent value for money. Works very well and is pretty rugged as I had hoped. The only problem is that fixing the vehicle based on the diagnosed code is NOT always helpful, sorry. Repairing your car based on these codes requires a good understanding of auto repair. For example: code PM001 - Cylinder Misfire. What can a NOVICE do with ALL that \"helpful\" information (LOL). Relatively easy to use. I will give full marks for the quality of the physical device otherwise.", "overall": 5.0, "summary": "Actron Auto Scanner", "unixReviewTime": 1190764800, "reviewTime": "09 26, 2007"}^
{"reviewerID": "A3REK3OFONWB1Q", "asin": "B00009RAX7", "reviewerName": "Paul M. Provencher \"ppro\"", "helpful": [10, 19], "reviewText": "Have you ever felt like your auto mechanic was \"Mr. Wizard\"? All the way down to the bad attitude? Only to find he has a doo-dad sitting somewhere out of eyesight that tells all, like a crystal ball...Well for a reasonable price you can get your own crystal ball. It might be able to predict the future and track flying monkeys but it can tell you very quickly why the \"Check Engine\" light has come on. Big or small you know what might be waiting for you when you go see Mr. Wizard. Sometimes you might even be able to track down the problem yourself and cut the cost to fix it. I would not go to Oz without mine!", "overall": 5.0, "summary": "Pay no attention to that man behind the curtain...", "unixReviewTime": 1174780800, "reviewTime": "03 25, 2007"}^

After running this input through the system, we receive an output of the form:

Negative  Neutral  Positive  Compound
0         0.526    0.474     0.2023
0         1        0         0
0         0.519    0.481     0.5719
0         0.779    0.221     0.1779
0         0.58     0.42      0.4404
0         0.548    0.452     0.5106
0.412     0.336    0.252     -0.2263
0         0.256    0.744     0.4404
0.219     0.781    0         -0.1027
0.239     0.761    0         -0.296
0.147     0.6      0.253     0.3818
0         0        1         0.6588
0         0.404    0.596     0.7096
0.524     0.476    0         -0.296
0         0.448    0.552     0.5719
0         0.439    0.561     0.7506
0         0.182    0.818     0.6696
0         0.455    0.545     0.5859
0         1        0         0
0         0.507    0.493     0.7783
0         1        0         0
0         1        0         0
0         1        0         0
0         1        0         0
0         0.196    0.804     0.6249
0         1        0         0
0         0.196    0.804     0.6249
0         1        0         0
0         0.256    0.744     0.4404

After this, we will take the average of the four sentiments to calculate the final result.

  • 4. IV. DATA COLLECTION AND PREPARATION
We used the Amazon reviews dataset provided by https://snap.stanford.edu/data/web-Amazon.html
We had to use various data cleaning techniques to prepare the dataset for our use:
1) Separating multiple JSON objects.
2) Parsing the summary field of the JSON data.
3) Parsing the lexicon data into a dictionary.
4) Separating words and emoticons from the data.
5) Filtering negation words.
6) Filtering words written in all upper case for emphasis.
7) Filtering booster words like “very”, “greatly”, etc.
8) Filtering idioms and spam words.
9) Converting the result to CSV format and calculating the averages of all sentiments.
Everything was done using Python scripts.

V. TRAINING OF THE MODEL
As discussed above, we use the VADER lexicon file to set up the system rather than training a model with a machine learning or data mining algorithm. It is a rule-based system in which we use the rules already created and tested by researchers. The “training” process involves reading the lexicon file, which has 7,517 words and emoticons to be precise, and putting them into a Python dictionary so that we can quickly look up the sentiment of the words extracted from the reviews. The structure of the lexicon file is as follows:
[Word] [Mean] [Standard Deviation] [A list of ratings based on emotions varying from -4 to +4]
We create a hash map which uses the first two fields, [Word]: [Mean Sentiment], and use it later for scoring.

VI. TESTING OF THE MODEL
Since the lexicon file is accepted by researchers worldwide, there is little scope for error in the analysis. In addition, we tested some sentences for their sentiments, and the results were found to be quite satisfactory.
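
The lexicon-loading step from Section V can be sketched as follows. This is an illustration under assumptions rather than the project's script: the fields are assumed to be tab-separated, the function name `load_lexicon` is hypothetical, and in the sample lines only the mean valences come from the text (Section III); the standard deviation and ratings columns are placeholders.

```python
def load_lexicon(lines):
    """Build a {token: mean valence} map from lexicon lines."""
    lexicon = {}
    for line in lines:
        # Assumed layout: token <TAB> mean <TAB> std dev <TAB> ratings list.
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        token, mean = fields[0], fields[1]
        lexicon[token] = float(mean)
    return lexicon


# Means are the valences quoted in Section III; the other two columns
# are placeholders, not values from the real lexicon file.
sample_lines = [
    "good\t1.9\t0.0\t[]\n",
    "horrible\t-2.5\t0.0\t[]\n",
    ":(\t-2.2\t0.0\t[]\n",
    "\n",
]
lexicon = load_lexicon(sample_lines)
print(lexicon["good"], lexicon[":("])
```

Only the first two fields are kept, so a lookup during scoring is a single dictionary access per token.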
Here is the table with the scores:

Sentence                                          Positive  Neutral  Negative
The product was very Bad                          0.0       0.513    0.487
I hated the product                               0.0       0.323    0.677
I hated the product!                              0.0       0.308    0.692
I hated the product!!                             0.0       0.295    0.705
I hated the product!!!                            0.0       0.283    0.717
I really hate the product!!                       0.0       0.396    0.604
I hate the service                                0.0       0.351    0.649
I hated the product                               0.0       0.323    0.677
I like the product                                0.556     0.444    0.0
I love the product                                0.677     0.323    0.0
The product is good                               0.492     0.508    0.0
The product is great                              0.577     0.423    0.0
The product is awesome                            0.577     0.423    0.0
I am happy with the product :)                    0.626     0.374    0.0
I am happy with the product.                      0.481     0.519    0.0
Really?? You don't deserve to be in the market!   0.0       1.0      0.0
The product is awesome                            0.577     0.423    0.0
The product works                                 0.0       1.0      0.0
The product is very beneficial                    0.444     0.556    0.0
I would never recommend it to anyone              0.0       0.703    0.297

Thus we can safely rely on the proposed system to analyse the reviews.

VII. RESULTS AND DISCUSSIONS
We tested the system on the reviews of an Amazon product, “Jumper cables Automobile parts”. The dataset chosen had 1259 records.
The results obtained were as follows:

Negative  0.035002
Neutral   0.595128
Positive  0.369875
Compound  0.296674

On average, almost 60% of the sentiment in a review was neutral and 37% was positive, reflecting appreciation of the product. Only a very small portion, about 3.5%, was negative. The average compound score of about 0.30 is positive, which indicates that the reviews lean favorable overall while still containing a mix of pros and cons.
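
The averaging step that produces figures of this kind can be sketched as below. It is a minimal illustration, not the project's script: the column names and the function name `average_scores` are assumptions, and the three sample rows are simply the first three rows of the per-review output shown in Section III.

```python
import csv
import io
from statistics import mean

# Column names are assumed; they mirror the output header in Section III.
COLUMNS = ["Negative", "Neutral", "Positive", "Compound"]


def average_scores(csv_file):
    """Return the mean of each sentiment column in a per-review CSV."""
    rows = list(csv.DictReader(csv_file))
    return {col: mean(float(row[col]) for row in rows) for col in COLUMNS}


# First three rows of the sample output, written as CSV.
sample_csv = io.StringIO(
    "Negative,Neutral,Positive,Compound\n"
    "0,0.526,0.474,0.2023\n"
    "0,1,0,0\n"
    "0,0.519,0.481,0.5719\n"
)
averages = average_scores(sample_csv)
for name, value in averages.items():
    print(f"{name} {value:.6f}")
```

Because each review's negative, neutral, and positive values sum to 1, their averages also sum to 1, as they do in the results above (up to rounding); the average compound score is independent of that constraint.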
  • 5. VIII. CONCLUSIONS AND FUTURE SCOPE
We can conclude that buying automobile parts from Amazon is a great deal, since most of the users were either positive or neutral in their reviews of their purchases.
Scope: VADER served as an efficient tool to predict the sentiments, and it can be extended to almost any type of product review on any website whose primary language is English. It can also be used for sentiment analysis of Facebook posts as well as Twitter tweets related to a particular search term.

IX. REFERENCES
Hutto, C. J. & Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. AAAI 2014.
Liu, B. (2012). Sentiment Analysis and Opinion Mining. San Rafael, CA: Morgan & Claypool.
Pang, B., & Lee, L. (2008). Opinion mining and sentiment analysis. Foundations & Trends in Information Retrieval, 2(1), 1–135.
Liu, B. (2010). Sentiment Analysis and Subjectivity. In N. Indurkhya & F. Damerau (Eds.), Handbook of Natural Language Processing (2nd Ed). Boca Raton, FL: Chapman & Hall.