Latent Semantic
Analysis
Auro Tripathy
ipserv@yahoo.com
Outline

   Introduction
   Singular Value Decomposition
   Dimensionality Reduction
   LSA in Information Retrieval
Latent Semantic Analysis
Introduction
A mathematical treatment capable of inferring meaning
   Measures of word-word, word-passage, & passage-passage relations that correlate well with human judgments of semantic similarity
   Similarity estimates are NOT based on contiguity frequencies, co-occurrence counts, or usage correlations
   A mathematical technique capable of inferring deeper, hidden relationships; hence “latent”
Akin to a well-read nun dispensing sex advice
   Analysis of text alone
   Its knowledge does NOT come from perceived information about the physical world, NOT from instinct, NOT from feelings, NOT from emotions
   Does NOT take into account word order, phrases, syntactic relationships, or logic
   It takes in large amounts of text and looks for mutual interdependencies in the text
Words and Passages
   LSA represents the meaning of a word as the average of the meaning of all the passages in which it appears…
   …and the meaning of a passage as the average of the meaning of the words it contains
[Figure: words (word1, word2, word3) linked to the passages that contain them]
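The averaging intuition can be sketched in a few lines of Python; the 2-D “semantic space” and the document vectors below are hypothetical stand-ins, chosen only to show the mechanics, not values from the deck’s example:

```python
# Toy illustration of the averaging idea with hypothetical document vectors
doc_vecs = {"d1": (1.0, 0.0), "d2": (0.8, 0.2), "d3": (0.0, 1.0)}

def word_vector(docs_containing_word):
    """A word's vector: the average of the vectors of documents containing it."""
    vecs = [doc_vecs[d] for d in docs_containing_word]
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

# A word appearing in d1 and d2 lands between them, far from d3
print(word_vector(["d1", "d2"]))  # → (0.9, 0.1)
```

The same averaging runs in the other direction: a passage’s vector is the mean of its words’ vectors.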
What is LSA?
   LSA is a mathematical technique for
    extracting and inferring relations of
    expected contextual usage of words in
    documents
What LSA is not
   Not a natural language processing program
   Not an artificial intelligence program
   Does NOT use dictionaries or databases
   Does NOT use syntactic parsers
   Does NOT use morphological analysis
Takes as input only words and text paragraphs
Example
   Titles of N=9 technical memoranda
       Five on human-computer interaction
       Four on mathematical graph theory
       Disjoint topics




                Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
Sample Word-by-Document Matrix
    Word selection criterion – the word occurs in at least two of the titles
    Cell counts indicate how much was said about a topic
[Figure: word-by-document count matrix for the nine titles]
                     Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
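A minimal sketch of building such a matrix in Python, using hypothetical stand-in titles (two per topic rather than the deck’s nine) and the slide’s occurs-in-at-least-two-titles criterion; a real run would also drop stop words:

```python
from collections import Counter

# Hypothetical stand-in titles, two per topic (the deck uses nine)
titles = [
    "human machine interface for computer applications",
    "a survey of user opinion of computer system response time",
    "the generation of random binary trees",
    "the intersection graph of paths in trees",
]
docs = [t.split() for t in titles]

# Document frequency: in how many titles does each word occur?
df = Counter(w for d in docs for w in set(d))

# Keep only words occurring in at least two titles (the slide's criterion;
# note stop words such as "of" and "the" slip through without filtering)
vocab = sorted(w for w, n in df.items() if n >= 2)
matrix = [[d.count(w) for d in docs] for w in vocab]  # term-by-document counts

for w, row in zip(vocab, matrix):
    print(f"{w:10s} {row}")
```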
Semantic Similarity
using the Spearman rank correlation coefficient
    The correlation between human and user is negative: Spearman ρ (human.user) = -0.38
    The correlation between human and minor is also negative: Spearman ρ (human.minor) = -0.29
    Expected; the words never appear in the same passage, so there are no co-occurrences
http://en.wikipedia.org/wiki/Spearman's_rank_correlation_coefficient
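The Spearman computation itself needs no libraries: rank each count row (averaging tied ranks), then take the Pearson correlation of the ranks. The two rows below are hypothetical stand-ins for terms that never share a passage, not the deck’s actual data:

```python
def ranks(xs):
    """1-based ranks; ties get the average of their rank range."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical count rows for two terms that never share a passage
human = [1, 0, 0, 1, 0, 0, 0, 0, 0]
user  = [0, 1, 1, 0, 1, 0, 0, 0, 0]
print(round(spearman(human, user), 2))  # → -0.38: negative, no co-occurrence
```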
Singular Value Decomposition
The Term Space
[Figure: each term of the term-by-document matrix as a vector in the space spanned by the documents]
                    Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
The Document Space
[Figure: each document of the term-by-document matrix as a vector in the space spanned by the terms]
                    Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
The Semantic Space
one space for terms and documents

    Represent terms AND documents in one
     space
    Makes it possible to calculate similarities
        Between documents
        Between terms
        Between terms and documents
The Decomposition

    M = T S DT

        M:  term-by-document matrix (t x d)
        T:  term matrix (t x r)
        S:  diagonal matrix of singular values (r x r)
        DT: document matrix (r x d)

     Splits the term-document matrix into three matrices
     New space, the SVD space
           because new axes were found by SVD along which the terms and documents can be grouped
New Term Vector, New Document
Vector, & Singular Values

      T contains in its rows the term vectors
       scaled to a new basis
      DT contains the new vectors of the
       documents
      S contains the singular values
          σ1, σ2, …, σn
          where σ1 ≥ σ2 ≥ … ≥ σn ≥ 0
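With NumPy the decomposition, the descending order of the singular values, and the exact reconstruction can all be checked directly. The tiny term-by-document matrix below is hypothetical, not the deck’s example:

```python
import numpy as np

# A tiny hypothetical term-by-document matrix M (t=4 terms, d=3 documents)
M = np.array([[1., 1., 0.],
              [1., 0., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])

# M = T S DT; full_matrices=False gives T: t x r, s: length r, Dt: r x d (r=3)
T, s, Dt = np.linalg.svd(M, full_matrices=False)

print(s)                                     # singular values, descending
assert np.all(s[:-1] >= s[1:]) and s[-1] >= 0
assert np.allclose(M, T @ np.diag(s) @ Dt)   # the product recovers M
```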
Dimensionality Reduction
To reveal the latent semantic structure
Reduce to k Dimensions

    M ≈ Tk Sk DkT

        M:   term-by-document matrix (t x d)
        Tk:  reduced term matrix (t x k)
        Sk:  top k singular values (k x k)
        DkT: reduced document matrix (k x d)
Example
Term Vectors Reduced to Two Dimensions
[Figure: the T, S, and D matrices of the example, truncated to two dimensions]
              Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
Reconstruction of the original matrix
based on the reduced dimensions
[Figure: the NEW (rank-2 reconstructed) matrix shown above the Original matrix]
  Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
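A sketch of the rank-k reconstruction with NumPy, again on a tiny hypothetical matrix. By the Eckart–Young theorem this truncation is the best rank-k approximation of M in the Frobenius norm:

```python
import numpy as np

# Tiny hypothetical term-by-document matrix; reconstruct it from k=2 dimensions
M = np.array([[1., 1., 0.],
              [1., 0., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
T, s, Dt = np.linalg.svd(M, full_matrices=False)

k = 2
M_k = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]   # best rank-k approximation of M

# Cells move even where M had zeros; that smoothing is where the
# "latent" term-term similarity shows up
print(np.round(M_k, 2))
```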
Recomputed Semantic Similarity
using the Spearman rank correlation coefficient

     NEW:       Spearman ρ (human.user)  = +0.94
                Spearman ρ (human.minor) = -0.83

     Original:  Spearman ρ (human.user)  = -0.38
                Spearman ρ (human.minor) = -0.29

The human-user correlation went up and the human-minor correlation went down
Correlation between a title and all other titles – Raw Data

• Correlation between the human-computer interaction titles was low
• Average correlation 0.2; half the Spearman correlations were 0

• Correlation among the four graph-theory titles (mx/my) was mixed
• Average Spearman correlation was 0.44

• Correlation between the human-computer interaction titles and the graph-theory titles was -0.3, despite no semantic overlap
                       Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
Correlation in the reduced-dimension (k=2) space

• Average correlation among the human-computer interaction titles jumped from 0.2 to 0.92

• Correlation among the graph-theory titles (mx/my) was HIGH: 1.0

• Correlation between the human-computer interaction titles and the graph-theory titles was strongly negative
                       Source: An Introduction to Latent Semantic Analysis, Landauer, Foltz, Laham
LSA in Information Retrieval
How to treat a query
   Build the term-by-document matrix
   Perform SVD; reduce dimensions to 50–400
   A query is a “pseudo-document”
       the weighted average of the vectors of the words it contains
   Use a similarity metric (such as cosine) between the query vector and the document vectors
   Rank the results
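The steps above can be sketched with NumPy on a tiny hypothetical matrix. Folding the query in as a pseudo-document uses the standard formula q̂ = qᵀ Tk Sk⁻¹, then documents are ranked by cosine similarity to q̂:

```python
import numpy as np

# Hypothetical term-by-document matrix (t=4 terms, d=3 documents)
M = np.array([[1., 1., 0.],
              [1., 0., 0.],
              [0., 1., 1.],
              [0., 0., 1.]])
T, s, Dt = np.linalg.svd(M, full_matrices=False)
k = 2
Tk, Sk, Dk = T[:, :k], np.diag(s[:k]), Dt[:k, :].T  # rows of Dk = doc vectors

# Fold the query in as a pseudo-document: q_hat = q^T Tk Sk^-1
q = np.array([1., 1., 0., 0.])        # the query mentions terms 0 and 1
q_hat = q @ Tk @ np.linalg.inv(Sk)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = [cosine(q_hat, d) for d in Dk]
ranking = np.argsort(scores)[::-1]
print(ranking)   # document 0, which shares both query terms, ranks first
```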
The Query Vector
[Figure: the query folded into the reduced term-document space]
   Does better than literal matching between the terms of the query and the documents
   Superior when the query and the document use different words
                Source: Latent Semantic Indexing and Information Retrieval, Johanna Geiß
References
• Latent Semantic Indexing and
  Information Retrieval, Johanna Geiß
• An Introduction to Latent Semantic
  Analysis, Landauer, Foltz, Laham
