Flipkart Product Review Using
Natural Language Processing (NLP)
Introduction to Natural Language Processing (NLP)
• According to industry estimates, only 21% of the
available data exists in structured form. Data
is generated constantly as we speak, tweet, send
messages on WhatsApp, and carry out various other
activities.
• Although this text data is high-dimensional, the
information it contains is not directly accessible
unless it is processed (read and understood)
manually or analyzed by an automated system.
• To produce significant and actionable insights
from text data, it is important to get acquainted
with the techniques and principles of Natural
Language Processing (NLP).
What is Sentiment Analysis?
• Sentiment Analysis, as the name suggests, means to identify the view or emotion behind a
situation.
• We humans communicate with each other in a variety of languages, and any language is simply a
medium through which we try to express ourselves. Whatever we say carries a sentiment, which
might be positive, negative, or neutral.
• Sentiment Analysis is a sub-field of NLP that uses machine learning techniques to identify and
extract the sentiment expressed in text.
• Let’s look at an example below to get a clear view of Sentiment Analysis:
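To make the idea concrete in code, here is a minimal sketch of lexicon-based sentiment scoring in plain Python. The tiny `LEXICON` dictionary and the `toy_sentiment` function are hypothetical illustrations (the words are taken from the review vocabulary discussed later in this deck); real analyzers such as VADER use a much larger lexicon plus rules for negation, intensifiers, and punctuation.

```python
import re

# Hypothetical mini-lexicon for illustration only. Real lexicons (e.g. VADER's)
# contain thousands of scored entries plus rules for negation and intensifiers.
LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0, "amazing": 2.5,
    "bad": -1.5, "terrible": -2.5, "waste": -2.0, "disappointed": -2.0,
}


def toy_sentiment(text: str) -> str:
    """Label text positive/negative/neutral by summing the scores of known words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(word, 0.0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for review in ["Great phone, love the camera!",
               "Terrible battery. Waste of money.",
               "Delivered on Tuesday."]:
    print(f"{review!r} -> {toy_sentiment(review)}")
```

Words missing from the lexicon contribute nothing, so a review with no known words falls back to neutral.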
Challenges faced by NLP in the real world
1) Ambiguity and Context: NLP struggles with understanding the multiple meanings of words
and phrases in different contexts.
2) Data Quality and Quantity: NLP models need large amounts of high-quality data, but
obtaining and labeling it can be challenging.
3) Domain Adaptation: Models trained in one domain often fail to generalize well to others,
requiring adaptation for real-world use.
4) Ethical and Bias Concerns: Biases in data can lead to unfair outcomes, necessitating
measures to address ethical concerns and mitigate biases.
5) Interpretability and Trust: Complex NLP models are difficult to interpret, making it hard to
trust their decisions without explanation.
Real-life applications of NLP
1) Virtual Assistants: Siri, Alexa, and Google Assistant help with tasks such as setting reminders,
answering questions, and controlling smart devices.
2) Email Filtering and Categorization: Sorting emails into folders or labeling them as spam based
on their content.
3) Language Translation Apps: Apps such as Google Translate help users understand and
communicate in different languages.
4) Customer Support Chatbots: Providing instant responses to customer queries on websites or
messaging platforms.
5) Social Media Monitoring: Analyzing trends, sentiments, and customer feedback on platforms
like Twitter and Facebook for brand reputation management.
Basic Libraries of Python
1) NumPy: For numerical computing with large
arrays and mathematical operations.
2) Pandas: For data manipulation and
analysis, especially with structured data.
3) Matplotlib: For creating various types of
plots and visualizations.
4) scikit-learn: For machine learning tasks like
classification, regression, and clustering.
Important Libraries for NLP
1) NLTK: Offers sentiment analysis via its VADER
SentimentIntensityAnalyzer.
2) TextBlob: Provides simple functions for
sentiment polarity.
3) scikit-learn: Offers machine learning
algorithms for sentiment classification.
4) spaCy: Provides fast tokenization and
lemmatization, plus a trainable text
classifier that can be used for sentiment
analysis.
5) VADER: Specifically tuned for sentiment
analysis in social media text.
6) Gensim: Python library for topic modeling
and document similarity analysis,
including LSA and LDA.
Dataset
• This CSV dataset contains information about Product name, Product price, Rate, Reviews, Summary,
and Sentiment. It covers 104 different types of products on flipkart.com, such as electronics,
clothing for men, women, and kids, home decor items, automated systems, and so on. It has
205053 rows and 6 columns.
• The Sentiment column is a multiclass label: positive, neutral, or negative. The labels were first
assigned from the Summary column using NLP and the VADER model, and then checked manually:
if a summary contained text such as "okay" or "just ok", or one positive and one negative remark,
it was relabeled as neutral, so that the labels better reflect how people actually use language.
• The data was scraped from flipkart.com using the Beautiful Soup library.
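The labeling rule described above can be sketched as a small function. This is a minimal sketch under stated assumptions: it takes a VADER-style compound score (range -1 to 1) as input rather than computing it, uses the ±0.05 cut-offs conventionally recommended for VADER, and `NEUTRAL_PATTERN` is a hypothetical keyword list, since the project's full list is not given. The "one positive and one negative remark" case is not modeled here.

```python
import re

# Assumed examples of lukewarm wording; the project's full list is not published.
NEUTRAL_PATTERN = re.compile(r"\b(?:okay|ok)\b", re.IGNORECASE)


def label_summary(summary: str, compound: float) -> str:
    """Turn a VADER-style compound score (range -1..1) into a sentiment label.

    Summaries containing lukewarm phrases are forced to "neutral" first,
    mimicking the manual relabeling step described above.
    """
    if NEUTRAL_PATTERN.search(summary):
        return "neutral"
    if compound >= 0.05:       # conventional VADER threshold for positive
        return "positive"
    if compound <= -0.05:      # conventional VADER threshold for negative
        return "negative"
    return "neutral"


# The compound scores below are made-up inputs, not real VADER output.
print(label_summary("Just OK", 0.30))          # neutral (keyword override)
print(label_summary("Wonderful", 0.57))        # positive
print(label_summary("Worst purchase", -0.66))  # negative
```

The word boundaries in the pattern keep "ok" from matching inside longer words such as "book".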
First 5 rows of data
Shape of data
There are 205052 rows and 6 features. From the above table, we can see that the Sentiment
column is our target variable, since we have to classify whether each review is positive,
negative, or neutral.
Checking the type of columns
All the columns in the data are of object type.
Checking the null values in the data
The Review and Summary columns contain null values.
After dropping the null values, there are 841 unique products in the Flipkart
data.
Top 10 products in the data
The product name column contained many punctuation marks and some Cyrillic text, which added
noise to the data. After removing the punctuation and converting the Cyrillic text into a
human-readable form, these are the 10 products that appear most frequently in the data.
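The punctuation clean-up can be sketched with the standard library. This is a sketch, not the project's code: `clean_product_name` is a hypothetical helper, the Unicode NFKC normalization step is an assumption, and the Cyrillic conversion is not reproduced because the write-up does not say how it was done. The cleaned names could then be counted, for example with pandas' `value_counts()`, to get the top 10.

```python
import re
import string
import unicodedata

# Map every ASCII punctuation character to a space.
PUNCT_TO_SPACE = str.maketrans({ch: " " for ch in string.punctuation})


def clean_product_name(name: str) -> str:
    """Normalize Unicode, strip punctuation, and collapse repeated whitespace."""
    name = unicodedata.normalize("NFKC", name)
    name = name.translate(PUNCT_TO_SPACE)
    return re.sub(r"\s+", " ", name).strip()


print(clean_product_name("Apple iPhone 12 (Black, 128 GB)!!"))  # Apple iPhone 12 Black 128 GB
```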
Distribution of Price
From the KDE plot, we can see that most products are priced between 0 and 1,000. The minimum
product price is 59 and the maximum is 86,990.
Distribution of Ratings
58.6% of the reviews give a 5-star rating to the product purchased online.
Top 10 Frequently Used Words in Review
These are the 10 most frequently used words in product reviews. All of these words reflect
positive sentiment about the products, which is consistent with most people having given a
5-star rating.
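Both summaries (the share of each star rating and the most common review words) reduce to counting. A minimal sketch with `collections.Counter`, where `rating_distribution` and `top_words` are hypothetical helpers and the inputs are toy data, not the Flipkart dataset:

```python
import re
from collections import Counter


def rating_distribution(ratings):
    """Percentage of reviews at each star rating, rounded to one decimal."""
    counts = Counter(ratings)
    total = len(ratings)
    return {star: round(100 * n / total, 1) for star, n in sorted(counts.items())}


def top_words(reviews, n=10, stopwords=frozenset()):
    """Return the n most common lowercase words across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Hypothetical toy inputs.
print(rating_distribution([5, 5, 4, 1, 5]))  # {1: 20.0, 4: 20.0, 5: 60.0}
print(top_words(["Good product", "Very good quality", "Nice product, good value"], n=2))
# [('good', 3), ('product', 2)]
```

In practice the stop words would be removed first; otherwise words like "the" dominate the top 10.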
Sentiment Analysis
From this graph, we can say that many people have given positive feedback to the products.
Relationship between Sentiment and Rate
This count plot shows Sentiment against Rate. Positive reviews are concentrated at ratings of 5
and 4, negative reviews at a rating of 1, and neutral reviews are spread fairly evenly across all
ratings. The line plot shows the same pattern.
Relationship between Product price and Rate
The correlation between product price and rate is 0.062, i.e. almost no linear relationship. The
plot also shows that low-priced products receive more ratings than higher-priced ones.
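The 0.062 figure is a correlation coefficient. Assuming it is Pearson's r (the default of pandas' `Series.corr`), it can be computed as below. Because the price and rate columns are stored as object type, they would first need converting, e.g. with `pd.to_numeric`. The `pearson` helper and the numbers are hypothetical, not taken from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical prices and ratings, not taken from the dataset.
prices = [199, 499, 999, 15999, 45999]
ratings = [5, 4, 5, 3, 5]
print(round(pearson(prices, ratings), 3))
```

Values near 0, like 0.062, mean price explains almost none of the variation in ratings.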
Plotting the Word Cloud for Sentiment columns
1) Positive Sentiment 2) Negative Sentiment
Data Preprocessing
Now, we will pre-process the data before converting it into vectors and passing it to the machine
learning model.
We will create a function for the pre-processing of data.
1) First, we will iterate through each record and split the text into individual words, or tokens.
2) Then, we will convert each token to lowercase, because otherwise “Good” and “good” would be
treated as different words.
3) Next, we will remove stopwords: very common words such as “the”, “an”, and “to” that add little
meaning to a sentence.
4) Then, we will lemmatize each word, i.e. map the different forms of a word to a single base form
called its lemma. For example, “run”, “running”, and “runs” are all forms of the same lexeme,
whose lemma is “run”. Lemmatization therefore converts every occurrence of a lexeme to the
same lemma.
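The steps above can be sketched as a single function. This is a minimal sketch under loud assumptions: `STOPWORDS` is a tiny hypothetical list (NLTK's English list is much longer), and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer such as NLTK's `WordNetLemmatizer`.

```python
import re

# Tiny hypothetical stop-word list for illustration.
STOPWORDS = {"the", "a", "an", "to", "is", "it", "and", "of", "this"}

# Hypothetical lemma lookup standing in for a dictionary-based lemmatizer.
LEMMAS = {"running": "run", "runs": "run", "ran": "run",
          "batteries": "battery", "products": "product",
          "was": "be", "were": "be"}


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, and lemmatize one review."""
    tokens = re.findall(r"[a-z']+", text.lower())        # lowercase, then tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]    # remove stop words
    return [LEMMAS.get(t, t) for t in tokens]             # map each word to its lemma


print(preprocess("The batteries were running hot and the product was good"))
# ['battery', 'be', 'run', 'hot', 'product', 'be', 'good']
```

Lowercasing before the stop-word check matters: otherwise a sentence-initial “The” would slip through.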
Text preprocessing
Stemming and Lemmatization
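The difference between the two techniques can be shown with toy versions of each. Both are hypothetical sketches: `crude_stem` is a simple suffix stripper (not the Porter algorithm), and `lemmatize` uses a tiny hand-written dictionary in place of a full lexicon such as WordNet. A stemmer chops suffixes and may produce non-words; a lemmatizer returns a dictionary form.

```python
# Crude suffix-stripping stemmer (illustrative; not the Porter algorithm).
SUFFIXES = ("ing", "ies", "es", "ed", "s")  # longer suffixes are tried first


def crude_stem(word: str) -> str:
    """Chop the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lemma dictionary; a real lemmatizer consults a full lexicon.
LEMMAS = {"running": "run", "runs": "run", "studies": "study", "better": "good"}


def lemmatize(word: str) -> str:
    """Look the word up in the dictionary, falling back to the word itself."""
    return LEMMAS.get(word, word)


for w in ["running", "studies", "runs", "better"]:
    print(f"{w:8} stem={crude_stem(w):7} lemma={lemmatize(w)}")
```

Here “running” stems to the non-word “runn” but lemmatizes to “run”, and “better” is untouched by the stemmer but maps to “good” through the dictionary.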
Topic Modelling using Latent Dirichlet Allocation (LDA)
• Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for extracting topics from a
corpus. “Latent” means hidden: the topics are not observed directly but inferred from the words.
• LDA treats each document as a mixture of topics and each topic as a probability distribution over
words; fitting the model estimates both sets of probabilities, so each document can be described by
the topics it contains.
• Any corpus, which is a collection of documents, can be represented as a document-word matrix,
also known as a document-term matrix (DTM).
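To show how LDA estimates the document-topic and topic-word probabilities, here is a compact sketch using collapsed Gibbs sampling, one standard way to fit LDA. It is not necessarily the method used in this project (Gensim's `LdaModel`, for example, uses online variational Bayes instead); `lda_gibbs`, the priors `alpha=0.1` and `beta=0.01`, and the toy documents are all assumptions for illustration.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (vocab, theta, phi) where theta[d][k] estimates P(topic k | document d)
    and phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1

    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], index[w]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # P(topic t) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic size + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in topics]
    return vocab, theta, phi


# Hypothetical, already-preprocessed review snippets.
docs = [["battery", "charge", "battery", "power"],
        ["camera", "photo", "camera", "lens"],
        ["battery", "power", "charge"],
        ["photo", "lens", "camera"]]
vocab, theta, phi = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: row[v], reverse=True)[:3]
    print(f"topic {k}:", [vocab[v] for v in top])
```

Each row of `theta` and `phi` is a probability distribution, so every row sums to 1.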
Vectorization
To convert text data into numerical data we use techniques known as vectorization. The two used
here, count vectorization and TF-IDF, produce sparse bag-of-words vectors; word embeddings are a
related but distinct family of dense, learned representations.
Count Vectorizer
• It creates a document-term matrix with one row per document and one column per word in the
corpus vocabulary.
• The count vectorizer first learns the vocabulary, then fills each cell with the number of times that
word occurs in that document, also known as its term frequency. If only presence or absence
matters, the counts can be replaced by 0/1 indicator (dummy) variables.
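The document-term matrix described above can be built in a few lines of plain Python. This is a sketch of the technique, with `count_vectorize` as a hypothetical helper; scikit-learn's `CountVectorizer` does the same job with more options (its `binary=True` parameter gives the 0/1 variant, and its default tokenizer ignores one-letter tokens, so results can differ slightly).

```python
import re


def count_vectorize(docs, binary=False):
    """Build a document-term matrix: one row per document, one column per vocabulary word."""
    tokenized = [re.findall(r"[a-z']+", doc.lower()) for doc in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})  # learn the vocabulary
    index = {word: i for i, word in enumerate(vocab)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocab)
        for t in tokens:
            row[index[t]] += 1                 # term frequency of each word
        if binary:
            row = [1 if c else 0 for c in row]  # presence/absence only
        matrix.append(row)
    return vocab, matrix


vocab, dtm = count_vectorize(["good product good price", "bad product"])
print(vocab)  # ['bad', 'good', 'price', 'product']
print(dtm)    # [[0, 2, 1, 1], [1, 0, 0, 1]]
```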
TF-IDF Vectorization
Term frequency-inverse document frequency (TF-IDF) weights a word by how often it occurs in a
document, discounted by how common it is across the whole corpus. The TF-IDF score is the
product tf × idf.
Term Frequency
Term frequency (TF) is the number of times a word occurs in a document, often divided by the
total number of words in that document.
Inverse Document Frequency
Inverse document frequency (IDF) measures how informative a word is across the corpus. It is
computed as idf(t) = log(N / df(t)), where N is the number of documents and df(t) is the number
of documents containing the word t, so words that appear in almost every document get a weight
close to zero.
For example, in any corpus a few words such as ‘is’ or ‘and’ are very common and will most
likely be present in almost every document.
Let’s say the word ‘is’ is present in all the documents of a corpus of 1000 documents. The idf for
that word would be:
idf(‘is’) = log(1000/1000) = log 1 = 0
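The formulas above translate directly into code. A minimal sketch, where `tfidf` is a hypothetical helper using length-normalized TF and the textbook idf(t) = log(N / df(t)) with the natural log (the base does not affect the idf('is') = 0 result). Note that scikit-learn's `TfidfVectorizer` defaults differ: it uses a smoothed idf, ln((1 + N) / (1 + df)) + 1, and L2-normalizes each row, so its numbers will not match this sketch.

```python
import math
from collections import Counter


def tfidf(tokenized_docs):
    """Return (idf, weights): idf per word and a {word: tf*idf} dict per document."""
    n = len(tokenized_docs)
    df = Counter(t for doc in tokenized_docs for t in set(doc))  # documents containing t
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    weights = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * idf[t] for t, c in tf.items()})
    return idf, weights


docs = [["this", "is", "good"], ["this", "is", "bad"], ["is", "it", "good"]]
idf, weights = tfidf(docs)
print(idf["is"])   # 0.0, because 'is' occurs in every document
print(weights[1])  # 'bad' gets the largest weight in the second document
```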
Count Vectorizer and TF-IDF Vectorizer in Python
Machine Learning Model
• This is a supervised multiclass classification problem: the goal is to predict the sentiment of
each review from its text. To do this, I fitted a Multinomial Naive Bayes classifier, a Random
Forest classifier, and an XGBoost classifier.
• Because this is a classification task, we evaluate the models with accuracy, precision, recall,
and F1-score, inspect the confusion matrix, and plot ROC curves to visualize how each model
performed.
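The metrics listed above can be computed from scratch for the three sentiment classes. This sketch, with `evaluate` as a hypothetical helper and toy labels, computes accuracy, per-class precision, recall, and F1, and the confusion matrix; in practice `sklearn.metrics.classification_report` and `confusion_matrix` provide the same. Per-class scores matter here because most reviews are positive, so accuracy alone can look good even when the minority classes are handled poorly.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, {class: (precision, recall, f1)}, and a nested confusion matrix."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = Counter(zip(y_true, y_pred))          # (true, predicted) -> count
    confusion = {t: {p: pairs[(t, p)] for p in labels} for t in labels}
    report = {}
    for c in labels:
        tp = pairs[(c, c)]
        fp = sum(pairs[(t, c)] for t in labels if t != c)   # predicted c, actually not c
        fn = sum(pairs[(c, p)] for p in labels if p != c)   # actually c, predicted not c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = (precision, recall, f1)
    accuracy = sum(pairs[(c, c)] for c in labels) / len(y_true)
    return accuracy, report, confusion


# Hypothetical labels for five reviews.
y_true = ["pos", "pos", "neg", "neu", "pos"]
y_pred = ["pos", "neg", "neg", "pos", "pos"]
accuracy, report, confusion = evaluate(y_true, y_pred)
print(accuracy)  # 0.6
print(report)
print(confusion)
```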
1) Multinomial Naïve Bayes
Count Vectorizer
TF-IDF Vectorizer
The Multinomial Naïve Bayes model achieved an accuracy of 90%, with precision, recall, and
F1-score all above 70%.
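To show what the Multinomial Naïve Bayes model computes, here is a from-scratch sketch with add-alpha (Laplace) smoothing. `TinyMultinomialNB` and the toy training data are hypothetical; scikit-learn's `sklearn.naive_bayes.MultinomialNB` (smoothing parameter `alpha`, default 1.0) is the usual library choice and works on count or TF-IDF matrices rather than token lists.

```python
import math
from collections import Counter


class TinyMultinomialNB:
    """Multinomial Naive Bayes over bags of words, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        class_sizes = Counter(labels)
        # log P(class)
        self.log_prior = {c: math.log(class_sizes[c] / len(labels)) for c in self.classes}
        word_counts = {c: Counter() for c in self.classes}
        for doc, c in zip(docs, labels):
            word_counts[c].update(doc)
        V = len(self.vocab)
        # log P(word | class) with smoothing so unseen (word, class) pairs are not zero
        self.log_lik = {}
        for c in self.classes:
            denom = sum(word_counts[c].values()) + self.alpha * V
            self.log_lik[c] = {t: math.log((word_counts[c][t] + self.alpha) / denom)
                               for t in self.vocab}
        return self

    def predict(self, doc):
        """Return the class with the highest log-posterior; words unseen in training are skipped."""
        def score(c):
            return self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
        return max(self.classes, key=score)


# Hypothetical, already-preprocessed training reviews.
train_docs = [["good", "great", "product"], ["love", "it", "good"],
              ["bad", "waste", "money"], ["terrible", "bad", "quality"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
print(model.predict(["good", "product"]))  # positive
print(model.predict(["bad", "money"]))     # negative
```

Working in log space avoids numeric underflow when multiplying many small word probabilities.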
2) Random Forest Classifier
Count Vectorizer
TF-IDF Vectorizer
For the Random Forest Classifier, we got an accuracy of 91%.
3) XGBoost Classifier
Count Vectorizer
TF-IDF Vectorizer
For the XGBoost Classifier, we got an accuracy of 91%.
Sample Prediction
We can see that the model can classify the sentiments properly based on reviews.
Recommendation of Products
Conclusion
1. The majority of the reviews (59%) were rated 5 out of 5, indicating a high level of customer
satisfaction.
2. Positive sentiment was the most common sentiment in the reviews, followed by neutral and
negative sentiment.
3. The correlation between product price and rate was only weakly positive (0.062), too small to
suggest that customers give higher ratings to more expensive products.
4. The most frequently used words in positive reviews included "good", "great", "love", and
"amazing", while the most frequently used words in negative reviews included "bad",
"terrible", "waste", and "disappointed".
5. The topic modeling analysis identifies several key topics in the reviews, including product
quality, customer service, value for money, and shipping.
6. The Multinomial Naive Bayes classifier achieved an accuracy of about 90%, with precision,
recall, and F1-score above 70%, suggesting that it is a suitable baseline for sentiment analysis
on this dataset.
7. The Random Forest classifier achieved an accuracy of about 91%, slightly outperforming the
Multinomial Naive Bayes classifier.
8. The XGBoost classifier also achieved an accuracy of about 91%, on par with the Random Forest
classifier.
9. Hyperparameter tuning was also applied to the XGBoost classifier on the TF-IDF features.
10. The analysis suggests that customers tend to be more satisfied with products that are of good
quality, offer good value for money, and come with a good customer service experience.
11. The insights gained from this project can be used by Flipkart to make data-driven decisions that
improve its business and provide a better customer experience.
Thank you!!
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx

  • 5. Real-life applications of NLP 1) Virtual Assistants: Siri, Alexa, and Google Assistant help with tasks such as setting reminders, answering questions, and controlling smart devices. 2) Email Filtering and Categorization: Sorting emails into folders or flagging them as spam based on their content. 3) Language Translation Apps: Apps such as Google Translate help users understand and communicate in different languages. 4) Customer Support Chatbots: Providing instant responses to customer queries on websites or messaging platforms. 5) Social Media Monitoring: Analyzing trends, sentiment, and customer feedback on platforms like Twitter and Facebook for brand reputation management.
  • 6. Basic Python Libraries 1) NumPy: Numerical computing with large arrays and mathematical operations. 2) Pandas: Data manipulation and analysis, especially of structured (tabular) data. 3) Matplotlib: Creating many types of plots and visualizations. 4) scikit-learn: Machine learning tasks such as classification, regression, and clustering.
  • 7. Important Libraries for NLP 1) NLTK: Offers sentiment analysis through its VADER sentiment analyzer (SentimentIntensityAnalyzer). 2) TextBlob: Provides simple functions for computing sentiment polarity. 3) scikit-learn: Offers machine learning algorithms for sentiment classification. 4) spaCy: Has no built-in sentiment model, but supports sentiment classification through its trainable text categorizer or through extensions. 5) VADER: A lexicon- and rule-based sentiment analyzer specifically tuned for social media text. 6) Gensim: A Python library for topic modeling and document similarity analysis, including LSA and LDA.
  • 8. Dataset • This dataset contains the Product name, Product price, Rate, Reviews, Summary, and Sentiment for each review, in CSV format. It covers 104 different types of products on flipkart.com, such as electronics, clothing for men, women, and kids, home decor items, and automated systems. It has 205053 rows and 6 columns. • The Sentiment column is a multiclass label: positive, neutral, or negative. The labels were first generated from the Summary column using the VADER model. We then checked them manually: summaries with text such as "okay" or "just ok", or with one positive and one negative remark, were relabeled as neutral, so that the labels better reflect how people actually express themselves. • The data was collected from flipkart.com by web scraping with the Beautiful Soup library.
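The labeling rule described above can be sketched as a small function. This is a hypothetical reconstruction, not the authors' code: it assumes a VADER-style score dict with `neg`, `neu`, `pos`, and `compound` keys, uses the ±0.05 compound-score cut-offs suggested in VADER's documentation, and approximates the manual "mixed review" override by treating any summary with both positive and negative content as neutral.

```python
# Hypothetical sketch of the Sentiment labeling rule; thresholds and overrides are assumptions.
NEUTRAL_PHRASES = {"ok", "okay", "just ok"}  # assumed examples, taken from the slide text


def label_summary(summary: str, scores: dict) -> str:
    """Map a VADER-style score dict ({'neg', 'neu', 'pos', 'compound'}) to a label."""
    text = summary.strip().lower()
    # Manual override 1: lukewarm summaries such as "okay" or "just ok".
    if text in NEUTRAL_PHRASES:
        return "neutral"
    # Manual override 2 (assumed rule): summaries mixing positive and negative content.
    if scores["pos"] > 0 and scores["neg"] > 0:
        return "neutral"
    # Standard VADER compound-score thresholds.
    if scores["compound"] >= 0.05:
        return "positive"
    if scores["compound"] <= -0.05:
        return "negative"
    return "neutral"
```

In practice the `scores` dict would come from `SentimentIntensityAnalyzer().polarity_scores(summary)` in `nltk.sentiment.vader`, after downloading NLTK's `vader_lexicon` resource.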
  • 9. First 5 rows of data. Shape of data: there are 205052 rows and 6 features. As the table above shows, the Sentiment column is our target variable, since we have to classify each review as positive, negative, or neutral.
  • 10. Checking the type of columns: all the columns in the data are of object type.
  • 11. Checking the null values in the data: the Review and Summary columns contain null values. After dropping the rows with null values, 841 unique products remain in the Flipkart data.
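The column-type and null-value checks above map directly onto a few pandas calls. The sketch below uses a tiny made-up DataFrame; the column names are assumptions, not taken from the real CSV. It also converts the price and rating columns from text to numbers, which the later price and rating plots need.

```python
import pandas as pd

# Toy stand-in for the scraped CSV; column names and values are hypothetical.
df = pd.DataFrame({
    "Product_name": ["Phone A", "Phone A", "Fan B", "Kettle C"],
    "Product_price": ["999", "999", "1499", "59"],
    "Rate": ["5", "4", "1", "3"],
    "Review": ["super!", None, "worst", "fair"],
    "Summary": ["great phone", "nice", None, "okay"],
    "Sentiment": ["positive", "positive", "negative", "neutral"],
})

print(df.dtypes)          # text columns show up as object (string) dtype
print(df.isnull().sum())  # number of missing values per column

# Drop rows with a missing Review or Summary, then count distinct products.
df = df.dropna(subset=["Review", "Summary"]).copy()
n_products = df["Product_name"].nunique()

# Price and rating are stored as text; convert them so they can be plotted or correlated.
df["Product_price"] = pd.to_numeric(df["Product_price"], errors="coerce")
df["Rate"] = pd.to_numeric(df["Rate"], errors="coerce")
```

`errors="coerce"` turns any non-numeric entry into `NaN` instead of raising, which is a convenient default for scraped data.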
  • 12. Top 10 products in the data: The Product name column contained many punctuation marks and some Cyrillic text, which added noise to the data. After removing the punctuation marks and converting the Cyrillic text into a readable format, these are the 10 products that occur most frequently in the data, i.e., the most-reviewed products.
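One way to do the punctuation clean-up and the top-10 count is a regular-expression substitution followed by `value_counts()`. This is a sketch with made-up product names; it does not attempt the Cyrillic conversion, whose exact method the slide does not describe.

```python
import re

import pandas as pd


def clean_name(name: str) -> str:
    # Replace every character that is neither a word character nor whitespace with a space,
    # then collapse runs of whitespace. Python's \w is Unicode-aware, so Cyrillic letters survive.
    no_punct = re.sub(r"[^\w\s]", " ", name)
    return re.sub(r"\s+", " ", no_punct).strip()


# Hypothetical product names; punctuation variants collapse to one name after cleaning.
names = pd.Series(["Mi Band 3!!", "Mi Band 3", "boAt Rockerz (Black)", "boAt Rockerz Black"])
top10 = names.map(clean_name).value_counts().head(10)
print(top10)
```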
  • 13. Distribution of Price: The KDE plot shows that most products fall in the 0 to 1000 price range. The minimum product price is 59 and the maximum is 86990.
  • 14. Distribution of Ratings: 58.6% of the reviews give the purchased product a 5-star rating.
  • 15. Top 10 Frequently Used Words in Reviews: These are the 10 words used most frequently in product reviews. All of them reflect positive sentiment about the products, which is consistent with most people having given a 5-star rating.
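Counting the most frequent words needs only the standard library. The sketch below uses whitespace splitting on a few invented reviews; in the actual pipeline one would count the preprocessed tokens (after stopword removal), otherwise words such as "the" tend to dominate the list.

```python
from collections import Counter

# Hypothetical reviews standing in for the Review column.
reviews = ["good product", "very good quality", "good value for money", "bad quality"]

# Lowercase, split on whitespace, and count every token across all reviews.
counts = Counter(word for review in reviews for word in review.lower().split())
top = counts.most_common(10)  # list of (word, count) pairs, most frequent first
print(top)
```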
  • 16. Sentiment Analysis: The graph shows that most of the reviews express positive sentiment about the products.
  • 17. Relationship between Sentiment and Rate: In this count plot of Sentiment against Rate, the most common ratings for positive sentiment are 5 and 4, the most common rating for negative sentiment is 1, and for neutral sentiment the ratings are spread evenly. The line plot shows the same pattern.
  • 18. Relationship between Product price and Rate: The correlation between product price and rate is 0.062, which is very weak. Low-priced products have noticeably more ratings than products in higher price ranges.
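The price/rating correlation can be computed with pandas once both columns are numeric. The slide does not say which coefficient was used; `Series.corr` defaults to Pearson, and the rank-based Spearman coefficient is shown alongside as an option that is less sensitive to the heavily skewed price distribution. The data here is made up.

```python
import pandas as pd

# Hypothetical numeric columns (for example after pd.to_numeric conversion).
df = pd.DataFrame({
    "Product_price": [199, 499, 999, 15999, 45999],
    "Rate": [5, 4, 5, 3, 4],
})

pearson = df["Product_price"].corr(df["Rate"])                       # linear association (default)
spearman = df["Product_price"].corr(df["Rate"], method="spearman")  # rank-based association
print(f"Pearson: {pearson:.3f}, Spearman: {spearman:.3f}")
```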
  • 19. Word Clouds for Each Sentiment 1) Positive Sentiment 2) Negative Sentiment
  • 20. Data Preprocessing Now we pre-process the data before converting it into vectors and passing it to the machine learning model, using a function that performs the following steps. 1) Iterate through each record and split the text into individual words, or tokens. 2) Convert each token to lowercase, since otherwise “Good” and “good” would be treated as different words. 3) Remove stopwords: commonly used words such as “the”, “an”, and “to” that do not add much value. 4) Lemmatize each word, i.e., reduce the different forms of a word to a single item called a lemma. 5) A lemma is the base form of a word. For example, “run”, “running”, and “runs” are all forms of the same lexeme, and “run” is its lemma. Lemmatization therefore replaces every occurrence of a lexeme with its lemma.
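The preprocessing steps above can be sketched as one function. To keep the example self-contained it uses scikit-learn's built-in `ENGLISH_STOP_WORDS` and a tiny hypothetical lemma lookup table in place of a real lemmatizer; a full pipeline would use something like NLTK's `WordNetLemmatizer` or spaCy's `token.lemma_` instead, and the deck does not say which stopword list or lemmatizer the authors used.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

# Tiny illustrative lemma table (hypothetical); unknown words are kept unchanged.
LEMMAS = {"running": "run", "runs": "run", "ran": "run", "products": "product"}


def preprocess(text: str) -> list[str]:
    # 1) Split into word tokens (ASCII letters only, a simplification) and 2) lowercase them.
    tokens = [tok.lower() for tok in re.findall(r"[A-Za-z]+", text)]
    # 3) Drop stopwords.
    tokens = [tok for tok in tokens if tok not in ENGLISH_STOP_WORDS]
    # 4-5) Replace each remaining token with its lemma.
    return [LEMMAS.get(tok, tok) for tok in tokens]


print(preprocess("The phone is running GREAT and runs smoothly"))
```

One design point worth checking for sentiment work: generic stopword lists, NLTK's English list included, contain negations such as "not", and removing them can flip the meaning of a review like "not good".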
  • 23. Topic Modelling using Latent Dirichlet Allocation (LDA) • Latent Dirichlet Allocation (LDA) is a popular topic modeling technique for extracting topics from a corpus. "Latent" means hidden: the topics are not observed directly but inferred from the words. • LDA represents each topic as a probability distribution over words and each document as a probability distribution over topics, so every document is described as a mixture of topics. • A corpus, i.e., a collection of documents, can be represented as a document-word matrix, also known as a document-term matrix (DTM), which is the input to LDA.
  • 24. Vectorization Machine learning models need numerical input, so the text data must be converted into numbers; this step is known as vectorization. Count Vectorizer • It learns the vocabulary of the corpus and builds a document-term matrix with one row per document and one column per word in the vocabulary. • Each cell holds the number of times that word occurs in that document, i.e., its term frequency. With binary counts, the cells become dummy variables that only indicate whether the word appears in the document.
  • 25. TF-IDF Vectorization Term frequency-inverse document frequency (TF-IDF) weights each word by how often it occurs in a document, discounted by how common it is across the corpus. Term Frequency: term frequency denotes the frequency of a word in a document.
  • 26. Inverse Document Frequency: IDF measures how informative a word is across the corpus; the more documents contain the word, the lower its idf. For example, in any corpus a few words such as ‘is’ or ‘and’ are very common and will most likely appear in almost every document. Suppose the word ‘is’ appears in every document of a 1000-document corpus. Its idf is then idf(‘is’) = log(1000/1000) = log 1 = 0.
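The two slides above can be turned into a from-scratch calculation. This sketch assumes one common definition of term frequency, count(t, d) / len(d), since the slide only says "frequency"; idf follows the slide's log(N / df) form, so a word present in every document gets idf = 0, exactly as in the ‘is’ example. The three tokenized documents are made up.

```python
import math
from collections import Counter


def tf(term: str, doc: list[str]) -> float:
    # Term frequency: occurrences of the term divided by the document length (assumed normalization).
    return Counter(doc)[term] / len(doc)


def idf(term: str, corpus: list[list[str]]) -> float:
    # Inverse document frequency: log(N / df), df = number of documents containing the term.
    # Raises ZeroDivisionError for a term that appears in no document.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df)


def tf_idf(term: str, doc: list[str], corpus: list[list[str]]) -> float:
    return tf(term, doc) * idf(term, corpus)


corpus = [
    "this phone is good".split(),
    "battery is poor".split(),
    "camera is good".split(),
]
print(idf("is", corpus))                  # 0.0: 'is' occurs in all 3 documents
print(tf_idf("good", corpus[0], corpus))  # (1/4) * log(3/2)
```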
  • 27. Count Vectorizer and TF-IDF Vectorizer in Python
  • 28. Machine Learning Model • This is a supervised classification problem: the goal is to predict the sentiment of a review from its text. To do this, we fitted Multinomial Naive Bayes, Random Forest, and XGBoost classifiers. • Because this is a classification task, we can use performance metrics such as accuracy, precision, recall, and F1-score. • We evaluate each model with the accuracy, precision, and recall scores and the confusion matrix, and plot a ROC curve to visualize how the model performed.
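A compact way to train and evaluate one of these models is a scikit-learn pipeline that chains the vectorizer and the classifier. This sketch uses Multinomial Naive Bayes on a tiny invented training set (the real project used the Flipkart reviews); `TfidfVectorizer` can be dropped in place of `CountVectorizer`, and the Random Forest or XGBoost classifiers can be swapped in for `MultinomialNB` in the same way.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical preprocessed review text and sentiment labels.
train_texts = ["good product", "great product", "good quality",
               "bad product", "terrible product", "bad quality",
               "okay product", "ok quality"]
train_labels = ["positive"] * 3 + ["negative"] * 3 + ["neutral"] * 2

# Vectorizer + classifier as one estimator, so new text goes through identical preprocessing.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

test_texts = ["good", "terrible"]
test_labels = ["positive", "negative"]
pred = model.predict(test_texts)

print(accuracy_score(test_labels, pred))
print(classification_report(test_labels, pred, zero_division=0))
```

Bundling the steps in a pipeline also keeps the vectorizer from being fitted on test data, which would otherwise leak vocabulary and idf statistics into the evaluation.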
  • 29. 1) Multinomial Naïve Bayes Count Vectorizer
  • 30. TF-IDF Vectorizer For the Multinomial Naïve Bayes model, we got an accuracy of 90%, and the precision, recall, and F1-score are all above 70%.
  • 31. 2) Random Forest Classifier Count Vectorizer
  • 32. TF-IDF Vectorizer For the Random Forest Classifier, we got an accuracy of 91%.
  • 34. TF-IDF Vectorizer For the XGBoost Classifier, we got an accuracy of 91%.
  • 35. Sample Prediction The model classifies the sentiment of these sample reviews correctly.
  • 37. Conclusion 1. The majority of the reviews (59%) were rated 5 out of 5, indicating a high level of customer satisfaction. 2. Positive sentiment was the most common sentiment in the reviews, followed by neutral and negative sentiment. 3. Product price and rating were only very weakly correlated (0.062), so price on its own says little about the rating a customer will give. 4. The most frequently used words in positive reviews included "good", "great", "love", and "amazing", while the most frequently used words in negative reviews included "bad", "terrible", "waste", and "disappointed". 5. The topic modeling analysis identifies several key topics in the reviews, including product quality, customer service, value for money, and shipping. 6. The Multinomial Naive Bayes classifier achieves an accuracy of 90% with the TF-IDF vectorizer, suggesting that it is a suitable model for sentiment analysis on this dataset. 7. The Random Forest classifier achieves an accuracy of 91% with the TF-IDF vectorizer, slightly outperforming the Multinomial Naive Bayes classifier.
  • 38. 8. The XGBoost classifier achieves an accuracy of 91% with the TF-IDF vectorizer, on par with the Random Forest classifier and slightly ahead of Multinomial Naive Bayes. 9. Hyperparameter tuning may further improve the performance of the XGBoost classifier. 10. The analysis suggests that customers tend to be more satisfied with products that are of good quality, offer good value for money, and come with a good customer service experience. 11. Flipkart can use the insights gained from this project to make data-driven decisions that improve its business and provide a better customer experience.