Text Mining - Bayesian Topic Modeling for Interactive Retrieval at SAP and Cisco. Ram Akella, University of California and Stanford. With Karla Caballero, Maria Daltayanni, Chunye Wang (UCSC) and Paul Hofmann (SAP Labs). October 6, 2011, SAP.
Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo.
Motivation. The user expects to find more relevant results each time she interacts with the system. Example session: q1: elderly depression; q2: depression symptoms; q3: symptoms and treatment. The relevance of the presented documents depends on user context: for a doctor, a relevant result is "Depression treatment of patients…"; for a social scientist, "Depression influence on family relationships…".
Interactive Retrieval Model (diagram). Elements: information need, query, interactive retrieval system, user feedback, document collection, metadata generation system, update, feedback and propagation to similar documents.
Interactive Retrieval Model (diagram, continued). The Metadata Generation System adds to each document metadata that facilitates the retrieval process. This metadata consists of a statistical topic mixture and knowledge extraction based on the business process (problem, cause, solution).
Outline: Motivation; Statistical Topic Modeling - SAP & Saffron (Motivation, Related Work, Proposed Approach, Topic Modeling and Entity Association); Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo.
Topic Modeling: Motivation. Given a set of documents, we want to identify the main areas or topics discussed in an unsupervised manner. We take advantage of the semantic associations between words across the documents: if two words appear in the same document, they should be related. Each topic has its own distribution over words, and each document might contain material about a variety of topics. Figure: example topics such as Music (notes, instrument) and Sports (ball, net, racquet, game), words common to several topics, and per-document topic proportions.
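The mixture idea on this slide can be sketched numerically. This is a minimal illustration, not the model from the talk: the topic names, the word probabilities, and the 80/20 document mixture are made-up values chosen only to show how a document's word distribution arises from its topics.

```python
# Hypothetical topics: each one is a probability distribution over words.
topics = {
    "sports": {"ball": 0.4, "net": 0.3, "racquet": 0.2, "play": 0.1},
    "music": {"notes": 0.5, "instrument": 0.3, "play": 0.2},
}

# Hypothetical document: 80% sports, 20% music.
doc_mixture = {"sports": 0.8, "music": 0.2}


def word_distribution(mixture, topics):
    """Return p(word | doc) = sum over topics of p(topic | doc) * p(word | topic)."""
    dist = {}
    for topic, weight in mixture.items():
        for word, prob in topics[topic].items():
            dist[word] = dist.get(word, 0.0) + weight * prob
    return dist


p_word = word_distribution(doc_mixture, topics)
for word, prob in sorted(p_word.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {prob:.2f}")
```

The word "play" appears in both topics, so it receives probability from each; this is the sense in which common words are shared across topics.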
Related Work
Our Approach. Most of the probability mass is placed in the upper part of the tree, which facilitates truncation and a reduction in the number of topics. We can define a method to determine the number of topics suitable for a particular dataset without training the model several times (once for each candidate number of topics). Figure: tree of topics annotated with the values 0.0851, 0.0660, 0.0310, 0.0096, 0.0146.
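One way to picture truncation is to keep only the heaviest topics until a chosen share of the probability mass is covered. The sketch below is a generic illustration under stated assumptions, not the method from the talk: it ignores the tree structure, the topic weights are hypothetical, and the 0.85 coverage threshold is arbitrary.

```python
def truncate_topics(weights, mass=0.85):
    """Keep the fewest highest-weight topics whose total share reaches `mass`."""
    total = sum(weights)
    ranked = sorted(range(len(weights)), key=lambda i: -weights[i])
    kept, covered = [], 0.0
    for i in ranked:
        kept.append(i)
        covered += weights[i] / total
        if covered >= mass:
            break
    return kept


# Hypothetical topic weights, with most of the mass on the first few topics.
weights = [0.40, 0.25, 0.15, 0.10, 0.05, 0.03, 0.02]
kept = truncate_topics(weights, mass=0.85)
print(len(kept), "of", len(weights), "topics kept:", kept)
```

When the mass is concentrated in a few topics, the cut-off falls early, so the number of topics can be read off a single trained model instead of retraining for each candidate count.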
Experimental Setup. The datasets are of two types: scientific articles (NIPS), with longer documents; and news data (NYT, APW, XIE), with shorter documents and a more diverse vocabulary. We compare the performance of the algorithm against three approaches in the literature: LDA, CTM, and Pachinko. We test our model using empirical likelihood. This method estimates how likely it is that a test document will be generated from the estimated model. We want this value to be high (better generalization and applicability to unseen documents).
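The evaluation idea can be sketched as follows. This is a simplified variant for illustration, not the evaluation code behind the reported results: it samples topic mixtures from a Dirichlet prior and averages the test document's likelihood over them. The two topics, the prior `alpha`, and the `1e-6` floor for unseen words are all assumptions.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def empirical_log_likelihood(doc, topics, alpha, n_samples=500, seed=0):
    """Estimate log p(doc) by averaging over topic mixtures sampled from the model."""
    rng = random.Random(seed)
    log_liks = []
    for _ in range(n_samples):
        theta = sample_dirichlet(alpha, rng)
        log_lik = 0.0
        for word in doc:
            # Words missing from a topic get a small floor probability (assumption).
            p = sum(t * topic.get(word, 1e-6) for t, topic in zip(theta, topics))
            log_lik += math.log(p)
        log_liks.append(log_lik)
    # Log of the mean likelihood, computed stably with the log-sum-exp trick.
    peak = max(log_liks)
    return peak + math.log(sum(math.exp(x - peak) for x in log_liks) / n_samples)


# Hypothetical two-topic model and symmetric prior.
topics = [
    {"ball": 0.5, "net": 0.3, "game": 0.2},
    {"notes": 0.6, "instrument": 0.4},
]
alpha = [0.5, 0.5]

in_domain = empirical_log_likelihood(["ball", "game", "net", "ball"], topics, alpha)
out_domain = empirical_log_likelihood(["stock", "market", "bank", "stock"], topics, alpha)
print(in_domain, out_domain)
```

A document built from words the model knows scores a higher (less negative) value than one built from unseen words, which is the behaviour the slide asks for.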
Results: NYT Dataset. We obtain the topic mixture for the NYT dataset using K=20 topics.
Results: Empirical Likelihood. Figure: four panels (APW, NIPS, XIE, and NYT datasets); the legend includes "Our Model".
Results: Running Time. Figure: running time in minutes, four panels (APW, NIPS, XIE, and NYT datasets); the legend includes "Our Model".
Illustrative Example: NYT Dataset. NORTHRIDGE TAUGHT A LESSON. LOS ANGELES _ School has been out at Cal State Northridge since the week before Christmas, but since you can learn something everyday, Mississippi State's women's basketball team gave a lesson. Northridge has talked about taking its game to the next level. The 21st-ranked Bulldogs _ the first nationally ranked team to play here in Northridge's Division I era _ gave a glimpse of that level in a 98-64 nonconference victory before a crowd of 165 Friday night.
Topic Modeling & Entity Association. Goal: we would like to know which actors were involved in a particular action that led to the failure of Lehman Brothers. Source text: the Valukas Report about why Lehman Brothers failed (6 volumes). Diagram elements: base knowledge source, query text, data to be monitored, SAP Business Objects Entity Extractor (entities), UCSC Topic Mining System (topics), Saffron Associative Memory Base. Saffron Associative Memory creates associations among entities and topics. This work was presented at SAPPHIRE NOW 2010.
Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco (Knowledge Extraction System, System Architecture, Domain Knowledge, Improving Productivity, Performance of Service Request Recommender); Interactive Retrieval; Interactive Retrieval Demo.
Knowledge Extraction System at Cisco. Diagram elements: Service Request Database (unstructured text), Service Request Text Mining System, Knowledge Database (knowledge), applications such as retrieval. Each service request is divided into Problem (what was the problem?), Cause (why did it occur?), Solution (how was it solved?), and irrelevant content. Example application: finding different solutions to the same problem. Comparing Document 1 and Document 2 segment by segment: Problem vs. Problem, similarity high; Cause vs. Cause, similarity high; Solution vs. Solution, similarity low.
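The segment-by-segment comparison can be sketched with a plain bag-of-words cosine similarity. This is an illustrative stand-in, not the Cisco system: the similarity measure, the `high` and `low` thresholds, and the two example service requests are assumptions, and the requests are taken to be already split into problem, cause, and solution.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine similarity between two texts using bag-of-words term counts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def different_solution_same_problem(doc1, doc2, high=0.6, low=0.3):
    """Flag a pair whose problem and cause match but whose solutions differ."""
    return (
        cosine(doc1["problem"], doc2["problem"]) >= high
        and cosine(doc1["cause"], doc2["cause"]) >= high
        and cosine(doc1["solution"], doc2["solution"]) <= low
    )


# Hypothetical, already-segmented service requests.
doc1 = {
    "problem": "router drops packets under load",
    "cause": "faulty line card",
    "solution": "replace the line card",
}
doc2 = {
    "problem": "router drops packets under heavy load",
    "cause": "faulty line card",
    "solution": "upgrade firmware and reboot",
}
print(different_solution_same_problem(doc1, doc2))
```

Comparing segments of the same type keeps irrelevant content from inflating or masking the similarity between two requests.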
System Architecture (diagram). Components: Service Request, Preprocessor, Bag-of-words Feature Generator, Features from Expertise, Expertise, Domain Knowledge, Hierarchical Classifier, Labeled Paragraphs, Service Request Recommender, User. Legend: data flow of Analyzer; data flow of Recommender; data output for User.
Domain Knowledge. Benefits: (1) the expansion of acronyms and terminology; (2) the enhancement of concept dependencies. Figure: measuring similarity between a snippet from Doc1 and a snippet from Doc2. Legend: […]: explanation from ITAD. Blue: overlapping words between unexpanded excerpts. Red: overlapping words introduced by ITAD.
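The first benefit, acronym expansion, can be illustrated with a toy glossary. This is a sketch under assumptions: the `GLOSSARY` entries and both snippets are hypothetical and are not taken from ITAD or from real service requests.

```python
# Hypothetical glossary entries; a real system would load them from a domain dictionary.
GLOSSARY = {
    "bgp": "border gateway protocol",
    "acl": "access control list",
}


def expand(text, glossary=GLOSSARY):
    """Append the glossary explanation after each known acronym."""
    out = []
    for token in text.lower().split():
        out.append(token)
        if token in glossary:
            out.extend(glossary[token].split())
    return out


def overlap(tokens_a, tokens_b):
    """Number of distinct words shared by two token lists."""
    return len(set(tokens_a) & set(tokens_b))


snippet1 = "BGP session flaps after ACL change"
snippet2 = "border gateway protocol neighbor resets when access list is edited"

plain = overlap(snippet1.lower().split(), snippet2.lower().split())
expanded = overlap(expand(snippet1), expand(snippet2))
print(plain, expanded)
```

Without expansion the two snippets share no words; after expansion they overlap on the words introduced by the glossary, which corresponds to the red words in the slide's legend.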
Improving Productivity. Compare the time spent by engineers in reading service requests before and after using our system. Workflow: browse a service request (time to assess relevance); if it is not relevant, browse another; if it is relevant, read and understand it thoroughly (time to extract knowledge); if not enough has been read, continue browsing; otherwise, create a knowledge article.
Performance of Service Request Recommender. Labels: Our Method; Retrieval Schemes.
Result 2: Using domain knowledge further improves retrieval results.
Result 3: The probabilistic recommender outperformed the deterministic recommender.
Interactive Retrieval. Model the user intent to retrieve relevant documents. Identify the trade-off between retrieval accuracy (how accurate does the user require the results to be?) and interaction time (how much time is the user willing to spend on interaction?). Applied to: medical document retrieval, e.g., search for past patient cases with similar symptoms; resume retrieval in a labor marketplace, e.g., search for Python developers who work in machine learning. Figure labels: more important, less important.
Problem: What is the best path to choose? Diagram: at each step t1, t2, t3, …, tn there is a user intent and a set of relevant documents. Labels: Dynamic Programming, Reinforcement Learning; Myopic, Dynamic; Static, Dynamic.
Reinforcement Learning formulation of IIR
Agent: IIR system
Environment: User
Action: Ranking R_k
Objective: maximize the sum of rewards
Reward: improvement v(R_k) - v(R_{k-1}) (as observed from user feedback)
Intent: best guess for user intent or need (expressed in query terms)
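The reward definition can be made concrete with a short sketch. The session values below are hypothetical, and treating v as something like precision at 10 is an assumption for illustration.

```python
def rewards(values):
    """Per-step rewards r_k = v(R_k) - v(R_{k-1}) for a session's ranking values."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


# Hypothetical value of the ranking after each interaction step (e.g. precision at 10).
session_values = [0.2, 0.4, 0.5, 0.7]
step_rewards = rewards(session_values)
total = sum(step_rewards)
print(step_rewards, total)
```

With an undiscounted sum the rewards telescope, so the total equals the value of the final ranking minus the value of the initial one; the order and size of the individual steps matter only once a discount or an interaction cost is introduced.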
Experiments Set-Up. Dataset: TREC-9 OHSUMED, 348,566 medical documents with a list of relevance judgments. 65 user queries (query title: 2-5 words; query description: 5-10 words). Interactive sessions of 3-5 steps. The relevance function is binary. Value of results (with appropriate weights wi). Precision@10: percentage of relevant documents in the top-10 results. We compare our results with pseudo-relevance feedback.
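Precision@10 with binary relevance is simple to compute. A minimal sketch follows; the document identifiers and relevance judgments are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k ranked documents that are judged relevant (binary relevance)."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k


# Hypothetical ranking and relevance judgments.
ranking = ["d3", "d7", "d1", "d9", "d4", "d2", "d8", "d5", "d6", "d0"]
relevant = {"d3", "d1", "d4", "d5"}
print(precision_at_k(ranking, relevant))
```

Dividing by k rather than by the number of results returned means a ranking with fewer than k results is penalized for the missing positions.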
How many interaction steps are needed? (Figure; slide dated 9/19/2011.)
How much feedback is needed? Experiments were run on the OHSUMED medical dataset (348,566 documents), TREC 2002.
Interactive Retrieval with Topic Modeling. Topics help us to reduce the search: they add context to the query, since some important terms describing the user's intent may not be included in the query. Topics are calculated a priori and added to each document as metadata. The meta-query (a combination of user inputs) combines terms and topic relevance scores, using the topic mixture of relevant documents and the topic mixture of non-relevant documents. It is updated each time the user provides feedback (clicks) or additional information to the system (query redefinition).
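One possible way to combine term and topic evidence is sketched below. The slide does not give the scoring formula, so everything here is an assumption: the linear blend, the `weight` of 0.5, the use of cosine similarity, and the 3-topic mixtures are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def combined_score(term_score, doc_topics, rel_mix, nonrel_mix, weight=0.5):
    """Blend a term-match score with topic similarity to relevant vs. non-relevant docs."""
    topic_score = cosine(doc_topics, rel_mix) - cosine(doc_topics, nonrel_mix)
    return weight * term_score + (1 - weight) * topic_score


# Hypothetical 3-topic mixtures for documents the user clicked and skipped.
rel_mix = mean_vector([[0.8, 0.1, 0.1], [0.7, 0.2, 0.1]])
nonrel_mix = mean_vector([[0.1, 0.1, 0.8]])

# Two candidates with the same term score but different topic mixtures.
on_topic = combined_score(0.5, [0.75, 0.15, 0.10], rel_mix, nonrel_mix)
off_topic = combined_score(0.5, [0.10, 0.20, 0.70], rel_mix, nonrel_mix)
print(on_topic > off_topic)
```

Because the topic mixtures are precomputed document metadata, only the two averaged mixtures need to be recomputed after each round of feedback.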
Proposed Dataset. We test our approach using the HARD TREC queries and corpus, which consists of 851,018 news documents from the NYT, APW, and XIE agencies. Documents have an average length of 305 terms. There are 496,779 unique terms. We infer the topic information of the corpus using 75 topics. For testing purposes we use m=3 interactions. We test with 30 queries. We compare our algorithm with mixture relevance feedback.
Preliminary Results. Figure: precision versus number of interactions.
Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo.
Example. User intent: young female with fevers and increased CPK (creatine phosphokinase). CPK: an enzyme; elevated levels may indicate a heart attack or severe muscle breakdown. Neuroleptic malignant syndrome (a life-threatening neurological disorder): associated with CPK; symptoms: muscular cramps, fever, unstable blood pressure, and changes in cognition, including agitation, delirium, and coma. Differential diagnosis: list symptoms; list causes of the symptoms; prioritize by the most dangerous; treat. Treatment.

CIRPA 2016: Individual Level Predictive Analytics for Improving Student Enrol...
 
11 - qualitative research data analysis ( Dr. Abdullah Al-Beraidi - Dr. Ibrah...
11 - qualitative research data analysis ( Dr. Abdullah Al-Beraidi - Dr. Ibrah...11 - qualitative research data analysis ( Dr. Abdullah Al-Beraidi - Dr. Ibrah...
11 - qualitative research data analysis ( Dr. Abdullah Al-Beraidi - Dr. Ibrah...
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search Engine
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 

Dynamic Search Using Semantics & Statistics

  • 1. Text Mining - Bayesian Topic Modeling for Interactive Retrieval at SAP and Cisco. Ram Akella, University of California and Stanford. With Karla Caballero, Maria Daltayanni, Chunye Wang (UCSC) and Paul Hofmann (SAP Labs). October 6, 2011, SAP
  • 2. Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo
  • 3. Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo
  • 4. Motivation: The user expects to find more relevant results each time she interacts with the system. Example search session: q1: elderly depression; q2: depression symptoms; q3: symptoms and treatment. A doctor expects "Depression treatment of patients…", while a social scientist expects "Depression influence on family relationships…". The relevance of the presented documents depends on the user context.
  • 5. Interactive Retrieval Model (diagram labels): Query; Interactive Retrieval System; User Feedback; Document Collection; Metadata Generation System; Information need; Update; Feedback and propagation to similar documents
  • 6. Interactive Retrieval Model (diagram labels): Query; Interactive Retrieval System; User Feedback; Document Collection; Metadata Generation System; Information need; Update; Feedback and propagation to similar documents. The Metadata Generation System adds metadata to each document that facilitates the retrieval process. This metadata consists of: a statistical topic mixture; knowledge extraction based on the business process (problem, cause, solution).
  • 7. Outline: Motivation; Statistical Topic Modeling - SAP & Saffron (Motivation; Related Work; Proposed Approach; Topic Modeling and Entity Association); Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo
  • 8. Topic Modeling: Motivation. Given a set of documents, we want to identify the main areas or topics discussed in an unsupervised manner. We take advantage of the semantic associations between words across the documents: if two words appear in the same document, they should be related. Each topic has its own distribution over words, and each document might contain material about a variety of topics. [Figure: example topics (Music, Sports) with words such as notes, instrument, net, ball, racquet, play, game; a document's topic mixture labeled Topic 1 (80%), Topic 2 (5%), Topic 3 (20%); common words such as ball.]
  • 10. Our Approach. Most of the probability mass is concentrated in the upper part of the tree, which facilitates truncation and a reduction in the number of topics. We can define a method to determine the number of topics suitable for a particular dataset without training the model several times (once for each specified number of topics). [Figure: topic tree with node probabilities 0.0851, 0.0660, 0.0310, 0.0096, 0.0146.]
  • 11. Experimental Setup. The datasets are of two types: scientific articles (NIPS), with longer documents; and news data (NYT, APW, XIE), with shorter documents and a more diverse vocabulary. We compare the performance of the algorithm against three approaches in the literature: LDA, CTM and Pachinko. We test our model using Empirical Likelihood. This method estimates how likely it is that a test document will be generated from the estimated model. We want this value to be high (better generalization and applicability to unseen documents).
  • 12. Results: NYT Dataset. We obtain the topic mixture for the NYT dataset using K=20 topics.
  • 13. Results: Empirical Likelihood. [Figure: empirical likelihood plots for the APW, NIPS, XIE and NYT datasets, with "Our Model" marked.]
  • 14. Results: Running Time. [Figure: running time in minutes for the APW, NIPS, XIE and NYT datasets, with "Our Model" marked.]
  • 15. Illustrative Example: NYT Dataset. NORTHRIDGE TAUGHT A LESSON. LOS ANGELES _ School has been out at Cal State Northridge since the week before Christmas, but since you can learn something everyday, Mississippi State's women's basketball team gave a lesson. Northridge has talked about taking its game to the next level. The 21st-ranked Bulldogs _ the first nationally ranked team to play here in Northridge's Division I era _ gave a glimpse of that level in a 98-64 nonconference victory before a crowd of 165 Friday night.
  • 19. Topic Modeling & Entity Association. Diagram labels: Entities; SAP Business Objects Entity Extractor; Saffron Associative Memory Base; Base knowledge source; Query; Text data to be monitored; UCSC Topic Mining System; Topics. Example query: we would like to know which actors were involved in a particular action that led to the failure of Lehman Brothers. Source text: the Valukas Report on why Lehman Brothers failed (6 volumes). Saffron Associative Memory creates associations among entities and topics. This work was presented at SAPPHIRE NOW 2010.
  • 20. Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco (Knowledge Extraction System; System Architecture; Domain Knowledge; Improving Productivity; Performance of Service Request Recommender); Interactive Retrieval; Interactive Retrieval Demo
  • 21. Knowledge Extraction System at Cisco. Diagram labels: Service Request Database; Service Request Text Mining System (unstructured text to knowledge); Knowledge Database; applications such as retrieval. Each service request is split into Problem (What was the problem?), Cause (Why did it occur?), Solution (How was it solved?) and Irrelevant Content. Application: finding different solutions to the same problem. Comparing Document 1 with Document 2 section by section: Problem vs. Problem, similarity high; Cause vs. Cause, similarity high; Solution vs. Solution, similarity low.
  • 22. System Architecture (diagram labels): Features from Expertise; Service Request Preprocessor; Bag-of-words Feature Generator; Hierarchical Classifier; Expertise; Domain Knowledge; Labeled Paragraphs; Service Request Recommender; User. Legend: data flow of Analyzer; data flow of Recommender; data output for User.
  • 25. Improving Productivity. Compare the time engineers spend reading service requests before and after using our system. Flowchart: Browse a service request (time to assess relevance) → Relevant? (Y/N) → Read and understand thoroughly (time to extract knowledge) → Read enough? (Y/N) → Create knowledge article.
  • 27. Result 2: Using domain knowledge further improves retrieval results.
  • 29. Interactive Retrieval. Model the user intent to retrieve relevant documents. Identify the trade-off between retrieval accuracy (how accurate does the user require the results to be?) and interaction time (how much time is the user willing to spend on interaction?). Applied to: medical document retrieval, e.g., search for past patient cases with similar symptoms; resume retrieval in a labor marketplace, e.g., search for Python developers who work in machine learning. [Diagram labels: MORE IMPORTANT, LESS IMPORTANT.]
  • 30. Problem: What is the best path to choose? [Diagram: time steps t1, t2, t3, …, tn, with User Intent and Set of Relevant Documents shown at successive steps; labels: Dynamic Programming, Reinforcement Learning, Myopic, Dynamic, Static, Dynamic.]
  • 31. Reinforcement Learning formulation of IIR. Agent: IIR system. Environment: user. Action: ranking R_k. Objective: maximize the sum of rewards. Reward: improvement v(R_k) - v(R_{k-1}) (as observed from user feedback). Intent: best guess for the user intent or need (expressed in query terms).
  • 32. Experimental Setup. Dataset: TREC-9 OHSUMED, 348,566 medical documents with a list of relevance judgments. 65 user queries (query title: 2-5 words; query description: 5-10 words). Interactive sessions of 3-5 steps. The relevance function is binary. Value of results (with appropriate weights w_i). Precision@10: percentage of relevant documents in the top-10 results. We compare our results with pseudo-relevance feedback.
  • 33. How many interaction steps needed? 9/19/2011
  • 34. How much feedback is needed? Experiments were run on the OHSU-MED medical dataset (348,566 documents), TREC 2002.
  • 35. Interactive Retrieval with Topic Modeling. Topics help us to reduce the search. They add context to the query: some important terms describing the user's intent may not be included in the query. Topics are calculated a priori and added to each document as metadata. Meta-query (combination of user inputs): updated each time the user provides feedback (clicks) or additional information to the system (query redefinition). Diagram labels: Topic Mixture of Relevant Docs; Topic Mixture of Non Relevant Docs; Combination of terms and topic relevance scores.
  • 36. Proposed Dataset. We test our approach using the HARD TREC queries over a collection of 851,018 news documents from the NYT, APW and XIE agencies. Documents have an average length of 305 terms. There are 496,779 unique terms. We infer the topic information of the corpus using 75 topics. For testing purposes we use m=3 interactions and 30 test queries. We compare our algorithm with mixture relevance feedback.
  • 37. Preliminary Results. [Figure: precision versus number of interactions.]
  • 38. Outline: Motivation; Statistical Topic Modeling - SAP & Saffron; Knowledge Extraction and Reuse at Cisco; Interactive Retrieval; Interactive Retrieval Demo
  • 39. Example. User intent: young female with fevers and increased CPK (creatine phosphokinase). CPK: enzyme, may cause heart attack or severe muscle breakdown if increased. Neuroleptic malignant syndrome (life-threatening neurological disorder): associated with CPK; symptoms: muscular cramps, fever, unstable blood pressure, changes in cognition, including agitation, delirium and coma. Differential diagnosis: list symptoms; list causes of the symptoms; prioritize by the most dangerous; treat. Treatment.
  • 40. Relevant Documents. Non-relevant documents: Doc 1: Significance of elevated levels of CPK in febrile diseases: a prospective study. The incidence and significance of elevated serum levels of CPK in febrile diseases were studied prospectively in all patients admitted with fever to a department of medicine during 1 year. Doc 2: Metoclopramide-induced neuroleptic malignant syndrome… Symptoms of NMS include rigidity, hyperpyrexia, altered consciousness, and autonomic instability. This syndrome is generally associated with neuroleptic medications used to treat psychotic and major depressive illnesses… Relevant document: Doc 3: Neuroleptic malignant syndrome: guidelines for treatment and reinstitution of neuroleptics… Cardinal symptoms include fever, muscular rigidity, an elevated serum level of creatine phosphokinase, changes in mental status, and autonomic dysfunction…
  • 41. Interactive Demo (InteractiveDemo_MedicalData). Sub-queries: young female with fevers and increased CPK; neuroleptic malignant syndrome; differential diagnosis; treatment.
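
Slides 31 and 32 define the reward of an interactive session as the improvement v(R_k) - v(R_{k-1}) and evaluate rankings with Precision@10 under binary relevance. The following is a minimal sketch of how those two quantities could be computed. It assumes that the value function v is plain Precision@k, which is a simplification: the slides mention weights w_i that are not specified in the transcript. The function names `precision_at_k` and `session_rewards` and the document ids are hypothetical and not taken from the original system.

```python
from typing import List, Sequence, Set


def precision_at_k(ranking: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked documents that are relevant (binary relevance)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def session_rewards(
    rankings: Sequence[Sequence[str]], relevant: Set[str], k: int = 10
) -> List[float]:
    """Per-step rewards v(R_k) - v(R_{k-1}) over one interactive session.

    Assumption: v is Precision@k; the weighted value function from the
    slides is not reproduced here.
    """
    values = [precision_at_k(ranking, relevant, k) for ranking in rankings]
    return [curr - prev for prev, curr in zip(values, values[1:])]


if __name__ == "__main__":
    # Hypothetical three-step session over toy document ids.
    relevant_docs = {"d1", "d3"}
    session = [["d2", "d4"], ["d1", "d4"], ["d1", "d3"]]
    print(session_rewards(session, relevant_docs, k=2))
```

Because the rewards are successive differences, their sum telescopes to the value of the final ranking minus the value of the initial one, so maximizing the sum of rewards under this simplified v amounts to maximizing the quality of the last ranking shown to the user.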

Editor's Notes

  1. Query keywords may have different meanings for different users