SlideShare a Scribd company logo
1 of 57
Download to read offline
Hendrik Heuer

Stockholm NLP Meetup
word2vec 

From theory to practice
About me
Hendrik Heuer
hendrikheuer@gmail.com
http://hen-drik.de
@hen_drik
–J. R. Firth 1957
“You shall know a word 

by the company it keeps”
–J. R. Firth 1957
“You shall know a word 

by the company it keeps”
Quoted after Socher
Quoted after Socher
Quoted after Socher
Vectors are directions in space
Vectors can encode relationships
man is to woman as king is to ?
word2vec
• by Mikolov, Sutskever, Chen, 

Corrado and Dean at Google
• NAACL 2013
• takes a text corpus as input 

and produces the 

word vectors as output
Sweden
Similar words
Harvard
Similar words
word2vec
• word meaning and relationships
between words are encoded spatially
• two main learning algorithms in
word2vec: 

continuous bag-of-words and 

continuous skip-gram
Goal
continuous
bag-of-words
• predicting the current
word based on the
context
• order of words in the
history does not
influence the
projection
• faster & more
appropriate for
larger corpora
Mikolov et al.
continuous
skip-gram
• maximize
classification of a
word based on
another word in the
same sentence
• better word vectors
for frequent words,
but slower to train
Mikolov et al.
Why it is awesome
• there is a fast open-source implementation
• can be used as features 

for natural language processing tasks and
machine learning algorithms
Machine Translation
Quoted after Mikolov
Sentiment Analysis
Quoted after SocherRecursive Neural Tensor Network
Image Descriptions
Quoted after Vinyals
Using word2vec
• Original: http://word2vec.googlecode.com/svn/trunk/
• C++11 version: https://github.com/jdeng/word2vec
• Python: http://radimrehurek.com/gensim/models/
word2vec.html
• Java: https://github.com/ansjsun/word2vec_java
• Parallel java: https://github.com/siegfang/word2vec
• CUDAversion: https://github.com/whatupbiatch/cuda-word2vec
Quoted after Wang
Using it in Python
Usage
Training a model
Quoted after Řehůřek
Training a model with iterator
Quoted after Řehůřek
Doing it in C
• Download the code: 

git clone https://github.com/h10r/word2vec-macosx-maverics.git
!
• Run 'make' to compile word2vec tool
!
• Run the demo scripts: 



./demo-word.sh 



and 



./demo-phrases.sh
Doing it in C
• Download the code: 

git clone https://github.com/h10r/word2vec-macosx-maverics.git
!
• Run 'make' to compile word2vec tool
!
• Run the demo scripts: 



./demo-word.sh 



and 



./demo-phrases.sh
Testing a model
Quoted after Řehůřek
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
Should be in Macports
py27-scikit-learn @0.15.2 (python, science)
word2vec t-SNE JSON
1. Find Word
Embeddings
2. Dimensionality
Reduction
3. Output
word2vec 

From theory to practice
Hendrik Heuer

Stockholm NLP Meetup
!
Discussion:
Can anybody here think of ways
this might help her or him?
Further Reading
• Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey
Dean. Efficient Estimation of Word Representations in
Vector Space. In Proceedings of Workshop at ICLR, 2013.
• Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado,
and Jeffrey Dean. Distributed Representations of Words
and Phrases and their Compositionality. In Proceedings of
NIPS, 2013.
• Tomas Mikolov, Wen-tau Yih,and Geoffrey Zweig.
Linguistic Regularities in Continuous Space Word
Representations. In Proceedings of NAACL HLT, 2013.
Further Reading
• Richard Socher - Deep Learning for NLP (without Magic)

http://lxmls.it.pt/2014/?page_id=5
• Wang - Introduction to Word2vec and its application to find
predominant word senses

http://compling.hss.ntu.edu.sg/courses/hg7017/pdf/
word2vec%20and%20its%20application%20to%20wsd.pdf
• Exploiting Similarities among Languages for Machine
Translation, Tomas Mikolov, Quoc V. Le, Ilya Sutskever,
http://arxiv.org/abs/1309.4168
• Title Image by Hans Arp
Further Coding
• word2vec

https://code.google.com/p/word2vec/
• word2vec for MacOSX Maverics

https://github.com/h10r/word2vec-macosx-maverics
• Gensim Python Library

https://radimrehurek.com/gensim/index.html
• Gensim Tutorials

https://radimrehurek.com/gensim/tutorial.html
• Scikit-Learn TSNE

http://scikit-learn.org/stable/modules/generated/
sklearn.manifold.TSNE.html
Hendrik Heuer

Stockholm NLP Meetup
word2vec 

From theory to practice

More Related Content

What's hot

What's hot (20)

A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
Word embedding
Word embedding Word embedding
Word embedding
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 
Word Embeddings - Introduction
Word Embeddings - IntroductionWord Embeddings - Introduction
Word Embeddings - Introduction
 
Word2vec slide(lab seminar)
Word2vec slide(lab seminar)Word2vec slide(lab seminar)
Word2vec slide(lab seminar)
 
Deep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word EmbeddingsDeep Learning for Natural Language Processing: Word Embeddings
Deep Learning for Natural Language Processing: Word Embeddings
 
Word embeddings
Word embeddingsWord embeddings
Word embeddings
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Text classification using Text kernels
Text classification using Text kernelsText classification using Text kernels
Text classification using Text kernels
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Word Embeddings, why the hype ?
Word Embeddings, why the hype ? Word Embeddings, why the hype ?
Word Embeddings, why the hype ?
 
Wordnet
WordnetWordnet
Wordnet
 
Wordnet Introduction
Wordnet IntroductionWordnet Introduction
Wordnet Introduction
 
Nlp ambiguity presentation
Nlp ambiguity presentationNlp ambiguity presentation
Nlp ambiguity presentation
 
A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)A Review of Deep Contextualized Word Representations (Peters+, 2018)
A Review of Deep Contextualized Word Representations (Peters+, 2018)
 
BERT
BERTBERT
BERT
 
GPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask LearnersGPT-2: Language Models are Unsupervised Multitask Learners
GPT-2: Language Models are Unsupervised Multitask Learners
 
Sequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learningSequence to sequence (encoder-decoder) learning
Sequence to sequence (encoder-decoder) learning
 
Lecture: Word Sense Disambiguation
Lecture: Word Sense DisambiguationLecture: Word Sense Disambiguation
Lecture: Word Sense Disambiguation
 

Viewers also liked

Document Summarization
Document SummarizationDocument Summarization
Document Summarization
Pratik Kumar
 
Text summarization
Text summarizationText summarization
Text summarization
kareemhashem
 
Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~
Yuya Unno
 
Automatic Text Summarization
Automatic Text SummarizationAutomatic Text Summarization
Automatic Text Summarization
HimanshuPu
 

Viewers also liked (20)

Drawing word2vec
Drawing word2vecDrawing word2vec
Drawing word2vec
 
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
Word2Vec: Learning of word representations in a vector space - Di Mitri & Her...
 
Representation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and PhrasesRepresentation Learning of Vectors of Words and Phrases
Representation Learning of Vectors of Words and Phrases
 
Textrank algorithm
Textrank algorithmTextrank algorithm
Textrank algorithm
 
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vecword2vec, LDA, and introducing a new hybrid algorithm: lda2vec
word2vec, LDA, and introducing a new hybrid algorithm: lda2vec
 
Document Summarization
Document SummarizationDocument Summarization
Document Summarization
 
Word2vec 4 all
Word2vec 4 allWord2vec 4 all
Word2vec 4 all
 
Word representations in vector space
Word representations in vector spaceWord representations in vector space
Word representations in vector space
 
Text summarization
Text summarizationText summarization
Text summarization
 
Word2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensimWord2vec: From intuition to practice using gensim
Word2vec: From intuition to practice using gensim
 
Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~Statistical Semantic入門 ~分布仮説からword2vecまで~
Statistical Semantic入門 ~分布仮説からword2vecまで~
 
Deep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word EmbeddingsDeep Learning for NLP: An Introduction to Neural Word Embeddings
Deep Learning for NLP: An Introduction to Neural Word Embeddings
 
Hacking Human Language (PyData London)
Hacking Human Language (PyData London)Hacking Human Language (PyData London)
Hacking Human Language (PyData London)
 
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
Sentiment Polarity Analysis for Generating Search Result Snippets based on Pa...
 
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
Word embeddings et leurs applications (Meetup TDS, 2016-06-30)
 
IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents IRE Semantic Annotation of Documents
IRE Semantic Annotation of Documents
 
Text mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipediaText mining, word embeddings, & wikipedia
Text mining, word embeddings, & wikipedia
 
Deep learning - Chatbot
Deep learning - ChatbotDeep learning - Chatbot
Deep learning - Chatbot
 
Automatic Text Summarization
Automatic Text SummarizationAutomatic Text Summarization
Automatic Text Summarization
 
Word embeddings as a service - PyData NYC 2015
Word embeddings as a service -  PyData NYC 2015Word embeddings as a service -  PyData NYC 2015
Word embeddings as a service - PyData NYC 2015
 

Similar to word2vec - From theory to practice

Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Lucidworks
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Lucidworks
 

Similar to word2vec - From theory to practice (20)

Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdfWord2Vec model to generate synonyms on the fly in Apache Lucene.pdf
Word2Vec model to generate synonyms on the fly in Apache Lucene.pdf
 
Subword tokenizers
Subword tokenizersSubword tokenizers
Subword tokenizers
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
A Panorama of Natural Language Processing
A Panorama of Natural Language ProcessingA Panorama of Natural Language Processing
A Panorama of Natural Language Processing
 
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...From Semantics to Self-supervised Learning  for Speech and Beyond (Opening Ke...
From Semantics to Self-supervised Learning for Speech and Beyond (Opening Ke...
 
Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis Applying Data Science to Move Beyond Keywords for Social Analysis
Applying Data Science to Move Beyond Keywords for Social Analysis
 
Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...Code as Data workshop: Using source{d} Engine to extract insights from git re...
Code as Data workshop: Using source{d} Engine to extract insights from git re...
 
Devops: A History
Devops: A HistoryDevops: A History
Devops: A History
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
StackEngine Problem Space Demo
StackEngine Problem Space DemoStackEngine Problem Space Demo
StackEngine Problem Space Demo
 
Multi modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed modelsMulti modal retrieval and generation with deep distributed models
Multi modal retrieval and generation with deep distributed models
 
Deep Learning for Information Retrieval
Deep Learning for Information RetrievalDeep Learning for Information Retrieval
Deep Learning for Information Retrieval
 
Jwt == insecurity?
Jwt == insecurity?Jwt == insecurity?
Jwt == insecurity?
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...Andrey Kutuzov and  Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
Andrey Kutuzov and Elizaveta Kuzmenko - WebVectors: Toolkit for Building Web...
 
NLP with H2O
NLP with H2ONLP with H2O
NLP with H2O
 
Open Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talkOpen Source Design at Ignite lightning talk
Open Source Design at Ignite lightning talk
 

More from hen_drik (9)

ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and moreifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
ifib Lunchbag: CHI2018 Highlights - Algorithms in (Social) Practice and more
 
Game Design - Black and White
Game Design - Black and WhiteGame Design - Black and White
Game Design - Black and White
 
Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)Hacking Human Language (PyCon Sweden 2015)
Hacking Human Language (PyCon Sweden 2015)
 
Actuated workbench
Actuated workbenchActuated workbench
Actuated workbench
 
Sport
SportSport
Sport
 
Ruby on Rails - Kurzvortrag
Ruby on Rails - KurzvortragRuby on Rails - Kurzvortrag
Ruby on Rails - Kurzvortrag
 
Jugendmedienschutz
JugendmedienschutzJugendmedienschutz
Jugendmedienschutz
 
Farbformprozess
FarbformprozessFarbformprozess
Farbformprozess
 
Defining video games
Defining video gamesDefining video games
Defining video games
 

Recently uploaded

No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
Sheetaleventcompany
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
raffaeleoman
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 

Recently uploaded (20)

Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)Introduction to Prompt Engineering (Focusing on ChatGPT)
Introduction to Prompt Engineering (Focusing on ChatGPT)
 
George Lever - eCommerce Day Chile 2024
George Lever -  eCommerce Day Chile 2024George Lever -  eCommerce Day Chile 2024
George Lever - eCommerce Day Chile 2024
 
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, YardstickSaaStr Workshop Wednesday w/ Lucas Price, Yardstick
SaaStr Workshop Wednesday w/ Lucas Price, Yardstick
 
Mathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptxMathematics of Finance Presentation.pptx
Mathematics of Finance Presentation.pptx
 
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
Re-membering the Bard: Revisiting The Compleat Wrks of Wllm Shkspr (Abridged)...
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
No Advance 8868886958 Chandigarh Call Girls , Indian Call Girls For Full Nigh...
 
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 93 Noida Escorts >༒8448380779 Escort Service
 
Air breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animalsAir breathing and respiratory adaptations in diver animals
Air breathing and respiratory adaptations in diver animals
 
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 97 Noida Escorts >༒8448380779 Escort Service
 
Presentation on Engagement in Book Clubs
Presentation on Engagement in Book ClubsPresentation on Engagement in Book Clubs
Presentation on Engagement in Book Clubs
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night EnjoyCall Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
Call Girl Number in Khar Mumbai📲 9892124323 💞 Full Night Enjoy
 
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
Andrés Ramírez Gossler, Facundo Schinnea - eCommerce Day Chile 2024
 
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
Night 7k Call Girls Noida Sector 128 Call Me: 8448380779
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Microsoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AIMicrosoft Copilot AI for Everyone - created by AI
Microsoft Copilot AI for Everyone - created by AI
 
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara ServicesVVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
VVIP Call Girls Nalasopara : 9892124323, Call Girls in Nalasopara Services
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 

word2vec - From theory to practice

  • 1. Hendrik Heuer Stockholm NLP Meetup word2vec 
 From theory to practice
  • 3. –J. R. Firth 1957 “You shall know a word 
 by the company it keeps”
  • 4. –J. R. Firth 1957 “You shall know a word 
 by the company it keeps” Quoted after Socher
  • 7.
  • 8. Vectors are directions in space Vectors can encode relationships
  • 9. man is to woman as king is to ?
  • 10.
  • 11.
  • 12.
  • 13.
  • 14. word2vec • by Mikolov, Sutskever, Chen, 
 Corrado and Dean at Google • NAACL 2013 • takes a text corpus as input 
 and produces the 
 word vectors as output
  • 17. word2vec • word meaning and relationships between words are encoded spatially • two main learning algorithms in word2vec: 
 continuous bag-of-words and 
 continuous skip-gram
  • 18. Goal
  • 19. continuous bag-of-words • predicting the current word based on the context • order of words in the history does not influence the projection • faster & more appropriate for larger corpora Mikolov et al.
  • 20. continuous skip-gram • maximize classification of a word based on another word in the same sentence • better word vectors for frequent words, but slower to train Mikolov et al.
  • 21. Why it is awesome • there is a fast open-source implementation • can be used as features 
 for natural language processing tasks and machine learning algorithms
  • 23. Sentiment Analysis Quoted after SocherRecursive Neural Tensor Network
  • 25. Using word2vec • Original: http://word2vec.googlecode.com/svn/trunk/ • C++11 version: https://github.com/jdeng/word2vec • Python: http://radimrehurek.com/gensim/models/ word2vec.html • Java: https://github.com/ansjsun/word2vec_java • Parallel java: https://github.com/siegfang/word2vec • CUDAversion: https://github.com/whatupbiatch/cuda-word2vec Quoted after Wang
  • 26. Using it in Python
  • 27.
  • 28. Usage
  • 29. Training a model Quoted after Řehůřek
  • 30. Training a model with iterator Quoted after Řehůřek
  • 31. Doing it in C • Download the code: 
 git clone https://github.com/h10r/word2vec-macosx-maverics.git ! • Run 'make' to compile word2vec tool ! • Run the demo scripts: 
 
 ./demo-word.sh 
 
 and 
 
 ./demo-phrases.sh
  • 32. Doing it in C • Download the code: 
 git clone https://github.com/h10r/word2vec-macosx-maverics.git ! • Run 'make' to compile word2vec tool ! • Run the demo scripts: 
 
 ./demo-word.sh 
 
 and 
 
 ./demo-phrases.sh
  • 33.
  • 34. Testing a model Quoted after Řehůřek
  • 35.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 41. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 42.
  • 43.
  • 44. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 45.
  • 46. Should be in Macports py27-scikit-learn @0.15.2 (python, science)
  • 47.
  • 48. word2vec t-SNE JSON 1. Find Word Embeddings 2. Dimensionality Reduction 3. Output
  • 49.
  • 50.
  • 51.
  • 52.
  • 53. word2vec 
 From theory to practice Hendrik Heuer Stockholm NLP Meetup ! Discussion: Can anybody here think of ways this might help her or him?
  • 54. Further Reading • Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. In Proceedings of Workshop at ICLR, 2013. • Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013. • Tomas Mikolov, Wen-tau Yih,and Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. In Proceedings of NAACL HLT, 2013.
  • 55. Further Reading • Richard Socher - Deep Learning for NLP (without Magic)
 http://lxmls.it.pt/2014/?page_id=5 • Wang - Introduction to Word2vec and its application to find predominant word senses
 http://compling.hss.ntu.edu.sg/courses/hg7017/pdf/ word2vec%20and%20its%20application%20to%20wsd.pdf • Exploiting Similarities among Languages for Machine Translation, Tomas Mikolov, Quoc V. Le, Ilya Sutskever, http://arxiv.org/abs/1309.4168 • Title Image by Hans Arp
  • 56. Further Coding • word2vec
 https://code.google.com/p/word2vec/ • word2vec for MacOSX Maverics
 https://github.com/h10r/word2vec-macosx-maverics • Gensim Python Library
 https://radimrehurek.com/gensim/index.html • Gensim Tutorials
 https://radimrehurek.com/gensim/tutorial.html • Scikit-Learn TSNE
 http://scikit-learn.org/stable/modules/generated/ sklearn.manifold.TSNE.html
  • 57. Hendrik Heuer Stockholm NLP Meetup word2vec 
 From theory to practice