Search Engine Technology.
Project – Feature-based Opinion Extraction from Amazon reviews.
Ravi Kiran Holur Vijay – rh2424

Contents
• Abstract
• Motivation
• System Description
• Evaluation
• Tools & Data
• Important Files
• Walkthrough – Using the System
• Walkthrough – Evaluating the System

Abstract.
The goal of this project is to develop a software tool that generates ratings for individual features of a product from its opinionated reviews, i.e., given a set of reviews about a product, we obtain a set of features and their ratings.

Motivation.
The large number of online review sites puts a lot of useful and relevant information within a consumer’s reach. These reviews can be used to compare offerings by different competitors and consequently to make an informed decision about buying a particular offering. For a typical consumer, however, making this decision can be difficult for the following reasons:
• The consumer might not be familiar with the various metrics used to compare the offerings in that particular domain.
• The consumer might have to read a lot of reviews to get an overview of the product and its features, as reading just a few reviews might not help if they are all biased similarly.
Therefore, it would be helpful if we could:
• Pick out the right metrics that are useful indicators of the product’s performance, specific to its domain.
• Summarize the opinions about these important metrics, gathered from the large number of reviews, into a couple of positive and negative points.
These observations led to my decision to develop a software tool that does precisely what is stated above.

System Description.
At the highest level, the system accomplishes the following tasks:
• Gather reviews about the product from Amazon.com.
• Select a set of product features to rate.
• Determine the rating for each selected feature based on the sentiment of the sentences in which it appears.
• Summarize the ratings for each feature as the total number of positive and negative points across the reviews.
The techniques implemented were adapted from the paper “Minqing Hu and Bing Liu. "Mining and summarizing customer reviews". Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25, 2004”. Here’s a snapshot of the general system architecture proposed by Minqing Hu and Bing Liu.
Figure 1 - Architecture of the system.

Here’s a breakdown of each step and the implementation details for that step:
• Review collection: There are many sources on the internet that provide reviews about products. I chose to pull reviews from Amazon because of the large domain it covers and the large number of choices it offers the consumer. It also has a considerable number of different reviews for each item. The reviews are obtained using Amazon’s Web Services API, which returns an XML response. This XML file is then parsed to obtain the reviews. The system currently fetches up to 20 pages of reviews, with 5 reviews per page. This setting can be changed to any integer.
• Sentence segmentation and POS tagging: I have used the NLProcessor program to accomplish this step. This program is available for both Windows and Unix. Once we have the product reviews, we run them through the NLProcessor software to obtain output in the format defined by NLProcessor.
• Frequent feature identification: All the nouns and noun phrases occurring in each sentence are chosen as candidate features and are aggregated into a transaction file. A variant of the Apriori algorithm is then run on this file to identify the features that are frequently commented upon, in the hope that these are the features that really matter for the product. For the Apriori part, a package from CPAN named “Data::Mining::AssociationRules” is used. From this, we get a set of frequent patterns that are candidate features for the product.
• Feature Pruning: Once we have a set of candidate features, we can use a couple of heuristics to remove items that might not be relevant features. I have implemented the Compactness and Redundancy pruning heuristics, as described in the paper by “Minqing Hu and Bing Liu”.
• Opinion Words Extraction: We now have a set of product features and need to identify the opinion words that describe them. For this, we extract the adjectives that are within some fixed distance of each feature word. Thus, we get a list of adjectives describing each of the features.
• Opinion Orientation Identification: Once we have a set of opinion words, we need to calculate their orientation, i.e., whether each opinion word expresses a positive or a negative opinion. For this, I have used the data from SentiWordNet, as described by “Andrea Esuli and Fabrizio Sebastiani. SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of LREC-06, 5th Conference on Language Resources and Evaluation, Genova, IT, 2006, pp. 417-422”. I have written two modules that collect data either from the locally available database or from the web by parsing the HTML output generated by SentiWordNet. By default, the system uses the locally available copy of SentiWordNet. Given a word, it gives us a score for positivity, negativity and neutrality.
• Opinion sentence orientation identification: Now that we have the orientations of individual opinion words, we can try to estimate the orientation of the sentence containing them. For this, I have implemented the algorithm described in the paper by “Minqing Hu and Bing Liu”. Only the sentences that contain at least one feature word are considered.
• Opinion Summarization: We can calculate the total number of positive and negative sentences that describe each of the features. The features are ranked first by the number of terms they contain and then by the number of times they appear in the reviews (frequency). The result for each feature is a tuple of <Feature, Positive scores, Negative scores>.

Evaluation.
I carried out a basic evaluation of the system as follows:
• Obtained the hand-annotated dataset by “Minqing Hu and Bing Liu” from http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html.
• Extracted the manually identified features.
• Extracted the features from the reviews automatically using the software tool.
• Extracted the manually identified opinion sentences.
• Extracted the opinion sentences automatically using the software tool.
• For more details, please take a look at the paper by “Minqing Hu and Bing Liu”.
• Now that we have the set of actual features and sentences, along with the automatically retrieved features and sentences, we can calculate the Precision and Recall measures.

Here are the results of running the evaluation program as described above:

Product      Features                                  Sentences
             Annotated  Extracted  Precision  Recall   Annotated  Extracted  Precision  Recall  Accuracy
Camera1      106        78         0.295      0.217    239        400        0.42       0.703   0.60
Camera2      75         93         0.162      0.2      160        266        0.451      0.75    0.67
DVD Player   116        61         0.345      0.181    344        463        0.523      0.70    0.60
Cell Phone   111        83         0.35       0.26     265        352        0.59       0.78    0.70
Mp3 Player   190        78         0.372      0.153    720        1100       0.46       0.70    0.57
Figure 2 - System Evaluation

• A very important comment I would like to make is that these results appear to be lower than those obtained by “Minqing Hu and Bing Liu”. The reason is that they considered only a subset of the manually annotated features for each of the products, as can be seen from their feature counts. The evaluation that I have documented, by contrast, includes all of the annotated features, including the implicit features (like “size” in “the phone fits in my pocket”) and those requiring pronoun resolution (like size and mobile in “it fits in my pocket”). Also, they have not documented which subset of features they considered during their evaluation in order to reduce the feature set to the numbers they have tabulated.
• Another point worth noting is the difference in the techniques used to calculate the orientation of each feature. In the paper by “Minqing Hu and Bing Liu”, they use an algorithm based on WordNet and an initial set of seed adjectives, whereas I am using the SentiWordNet database for the same task.

Tools and Data.
I have used the following third-party tools and libraries:
• Data::Mining::AssociationRules for mining association rules (http://search.cpan.org/~dfrankow/Data-Mining-AssociationRules-0.10/lib/Data/Mining/AssociationRules.pm).
• NLProcessor for POS tagging and sentence segmenting (http://www.infogistics.com/textanalysis.html).
• SentiWordNet for calculating the orientation of individual words. Andrea Esuli and Fabrizio Sebastiani. SentiWordNet: A Publicly Available Lexical Resource for Opinion Mining. In Proceedings of LREC-06, 5th Conference on Language Resources and Evaluation, Genova, IT, 2006, pp. 417-422. (http://sentiwordnet.isti.cnr.it)
• Amazon Web Services API for extracting reviews from Amazon (http://docs.amazonwebservices.com/AWSECommerceService/latest/DG/).
• “Minqing Hu and Bing Liu. "Mining and summarizing customer reviews". Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25, 2004”.

Important Files.
• FeatureExtraction.pm: The module that contains all the methods required to process reviews and produce a summary for each of the features.
• SystemEval.pm: The module that contains the methods for evaluating the system using Precision and Recall measures.
• ExtractReviews.pl: A command line client that uses the APIs provided by the FeatureExtraction module to generate features and their ratings.
• Evaluate.pl: A command line client that uses the APIs provided by the SystemEval module to evaluate the system.
• SentiWordNet_1.0.1.txt: The SentiWordNet database containing positive and negative scores for words.
• eval_reviews and eval_results: Contain some reviews in the annotated format and the results of running the evaluation program on those files.
• FeatureExtraction.html: POD2HTML format documentation for the FeatureExtraction module.
• SystemEval.html: POD2HTML format documentation for the SystemEval module.

A demo walkthrough using the system.
• Verify the prerequisites: The following libraries should be available either in the program’s directory or in Perl’s lib directory.
  o DataMiningAssociationRules.pm.
  o SentementFeatureExtraction.pm, SentementSystemEval.pm, SentementData directory.
  o SentiWordNet_1.0.1.txt in the program’s directory.
  o LWP::Simple Perl library.
  o POSIX Perl library.
• The following external programs must be installed.
  o NLProcessor from http://www.infogistics.com/demos/
  o NLProcessor must be working; otherwise our program will fail with unexpected errors.
  o I have included the archive as well as the installation instructions.
• Obtain the ASIN: We need a product to mine opinions for. Visit Amazon.com using any internet browser and browse to the product you are interested in. For the purpose of this demo, I am interested in the product “Canon Digital Rebel XSi 12.2 MP Digital SLR Camera with EF-S 18-55mm f/3.5-5.6 IS Lens (Black)”. Once we are on the item’s page, we need the item’s ASIN. Just search for the string “asin:” on the product’s page and you should have it. For the product mentioned above, the ASIN is “B0012YA85A”.
• Run the extraction and rating script: ExtractReviews.pl <ASIN> <Output file> <NLProcessor>
  o ASIN of the product from Amazon.
  o Output file to write the results to.
  o Full path to the NLProcessor executable program.
  o In our case, I used the following command:
    perl ExtractReviews.pl "B0012YA85A" "features_canonrebel.txt" "c:nlpbinnlp.cmd"
  o Now, we have the output in the file “features_canonrebel.txt” in the format: feature, number of positive ratings, number of negative ratings.
• Since the format is CSV, we can easily import the data into Matlab and plot it. Here’s what we can do:
  o Copy the features output file (features_canonrebel.txt) and the Matlab visualization script (createfigure.m) into Matlab’s work directory or any other directory of your choice.
  o Start Matlab and run the visualization script on the output features file:
    createfigure(<features file>, <top 'n' features to include>)
    e.g.: createfigure('features_canonrebel.txt', 10)
If everything goes fine, we can see a graphical display of the feature ratings. As indicated by the legend, the green bars indicate the number of positive reviews and the red bars indicate the number of negative reviews. The numbers 1 … 10 correspond to the features in the features file (specifically, the line number in the features file).

Figure 3 - Top 50 features
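For readers without Matlab, the same features file can be inspected with a few lines of Python. This is only a minimal sketch, assuming the file contains exactly the three comma-separated fields described above and is already ranked, so that "top n" means the first n lines. The function name `read_feature_ratings` and the sample rows are hypothetical and are not part of the project or its output.

```python
import csv
import io


def read_feature_ratings(handle, top_n):
    """Return (line_number, feature, positive, negative) for the first top_n lines."""
    rows = []
    for line_number, record in enumerate(csv.reader(handle), start=1):
        if line_number > top_n:
            break
        if not record:
            continue  # tolerate blank lines
        if len(record) != 3:
            raise ValueError(f"line {line_number}: expected 3 fields, got {len(record)}")
        feature, positive, negative = (field.strip() for field in record)
        rows.append((line_number, feature, int(positive), int(negative)))
    return rows


# Hypothetical sample in the documented format; NOT real output of the system.
SAMPLE = "lens, 12, 3\npicture quality, 9, 1\nbattery, 4, 5\n"

ratings = read_feature_ratings(io.StringIO(SAMPLE), top_n=2)
for line_number, feature, positive, negative in ratings:
    # One text row per feature: line number, name, positive and negative counts.
    print(f"{line_number:>2} {feature:<16} +{positive} -{negative}")
```

To read a real output file, pass an open file handle (for example `open("features_canonrebel.txt", newline="")`) instead of the in-memory sample.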
Figure 4 - Top 10 Features

A demo walkthrough for evaluating the system.
• Identify the annotated review file: I have included some sample reviews in the “eval_reviews” folder. The reviews should be in the format described in “Minqing Hu and Bing Liu. "Mining and summarizing customer reviews". Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD-2004, full paper), Seattle, Washington, USA, Aug 22-25, 2004”. I obtained these files from http://www.cs.uic.edu/~liub/FBS/CustomerReviewData.zip. For this demo, let’s select “camera1.txt”.
• Run the evaluation script: Evaluate.pl <annotated reviews> <NLProcessor command>
  o perl Evaluate.pl "eval_reviewsmp3player.txt" "c:nlpbinnlp.cmd" > mp3player.txt
• The system is evaluated automatically and we get the values for precision and recall at both the feature and the sentence levels.
• Here’s a sample output from the command.
  o For features ...
  o Precision = 0.371794871794872 ... Recall = 0.152631578947368
  o For Sentences ...
  o Precision = 0.457194899817851 ... Recall = 0.698191933240612 ... Accuracy = 0.567729083665339

We now know how to process reviews as well as how to evaluate the system through practical walkthroughs.
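The feature-level figures in the sample output can be reproduced with a short Python sketch of the usual set-based definitions of precision and recall. Two things here are assumptions, not statements from this report. First, a feature is counted as correct only on an exact string match. Second, the count of 29 correct features is inferred from the arithmetic: 29/78 and 29/190 reproduce the printed values for the Mp3 Player row, which lists 78 extracted and 190 annotated features. The feature names are placeholders. The accuracy measure is left out because its definition is not given here.

```python
def precision_recall(extracted, annotated):
    """Set-based precision and recall, counting exact matches as correct."""
    extracted, annotated = set(extracted), set(annotated)
    correct = len(extracted & annotated)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    return precision, recall


# Placeholder names sized to the Mp3 Player row: 190 annotated, 78 extracted.
# The overlap of 29 is inferred from the sample output, not stated in the report.
annotated = {f"annotated_{i}" for i in range(190)}
extracted = {f"annotated_{i}" for i in range(29)} | {f"spurious_{i}" for i in range(49)}

precision, recall = precision_recall(extracted, annotated)
print(f"Precision = {precision:.6f} ... Recall = {recall:.6f}")
```

Under these assumptions the sketch gives 29/78 for precision and 29/190 for recall, which agree with the feature-level values in the sample output above.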