TF*IDF and the Evolution of Topic Models

TF*IDF
Introduction to Topic Modeling for SEO
Prepared for: 24 Hours of SEO
January 25, 2018
Presented by: Nick Eubanks
@nick_eubanks

Hi, I’m Nick Eubanks
❏ First business (painting) at 19 years old. $280k in revenue during summer of 2003.
❏ Launched first digital company in 2008 (atomni), built CMS for rapid deployment of microsites. Sold.
❏ Built custom CMS for reviews in Japan. Angel funded. Sold.
❏ Built and sold lead-gen websites in medical, legal, and SEO.
❏ Joined TrafficSafetyStore.com in May 2012. Grew from ~$3M to well north of $20M.
❏ Currently split time between NK Tech and IFTF.
Nick Eubanks
Founder and CEO, I’m From The Future
Founder, TrafficThinkTank.com
Executive Director, NK Tech
Owner, ADDHero.com

We All Want More Traffic to Our Website

Ideally traffic that is...
❏ Sustainable
❏ Scalable
❏ Free

Understanding Search Engines
❏ Process of information retrieval; NLP
❏ Training machines to “understand” text
❏ Identifying patterns; topics
❏ Identifying importance; term frequency

In Comes TF*IDF
❏ Term frequency × inverse document frequency
❏ Process for machine-based information retrieval
❏ Weights each term by how often it appears in a document, discounted by how common the term is across the entire document set
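The weighting described on this slide can be sketched in a few lines of Python. This is one common textbook variant (tf as a share of the document's terms, idf as the log of total documents over documents containing the term), not necessarily the formula any particular tool or search engine uses. The function name `tf_idf` and the three sample documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = occurrences of the term in the doc / total terms in the doc
    idf = log(number of docs / number of docs containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Illustrative documents (assumed, not from the talk).
docs = [
    "organic farming uses organic standards",
    "farming yields vary by season",
    "traffic cones improve road safety",
]
weights = tf_idf(docs)
top_term = max(weights[0], key=weights[0].get)
print(top_term)
```

In the first document "organic" appears twice and nowhere else, so it gets the highest weight, while "farming" is discounted because a second document also uses it.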

What does that look like...
❏ Defining term “weights” based on the frequency of use throughout the document
❏ Removal of common “stop” words like the, and, is, because, and so on
❏ Excluding common HTML elements such as words used in header, footer, sidebars, and navigational elements
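The two clean-up steps above (dropping stop words and ignoring boilerplate page elements) might look like the following minimal sketch, built on Python's standard `html.parser`. The stop-word list is deliberately tiny, and treating `header`, `footer`, `nav`, and `aside` tags as the boilerplate containers is an assumption; many real pages mark these regions with `div` classes instead.

```python
from html.parser import HTMLParser

# Tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "and", "is", "because", "a", "of", "to", "in"}
# Assumed boilerplate containers; adjust for the markup of the actual site.
SKIP_TAGS = {"header", "footer", "nav", "aside", "script", "style"}

class MainTextExtractor(HTMLParser):
    """Collect words that sit outside the skipped elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth:
            return
        for word in data.lower().split():
            word = word.strip(".,;:!?\"'()")
            if word and word not in STOP_WORDS:
                self.words.append(word)

def extract_terms(html):
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.words

# Made-up page for illustration.
page = """
<html><body>
<header>The Site Name</header>
<nav>Home About Contact</nav>
<p>Organic farming is popular because the soil stays healthy.</p>
<footer>Copyright and terms</footer>
</body></html>
"""
terms = extract_terms(page)
print(terms)
```

Only the words from the paragraph survive, minus "is", "because", and "the"; those remaining terms are what would be fed into the weighting step.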

Topic Identification
❏ Identify themes across terms to group into “topics”
❏ Ahrefs has some of this
❏ Much better with trained corpora using tf*idf
❏ Can do manually if you have the patience
❏ Use topics to design your content calendars

Building Relevance
❏ Process: Keyword research > Topic Identification > Theme
❏ Pillar content = Top-level theme
❏ Build topic content in clusters
❏ Example: https://www.hubspot.com/marketing-statistics

Imagine this Scenario
❏ I gave you a stack of 1,000 documents
❏ Could you answer these questions:
  ❏ How would you organize them?
  ❏ What if I asked what they were about?
  ❏ What if I asked for all the documents about watermelons?
  ❏ How would you provide this information?

You would need to create a topic model
❏ To do this, you could use a set of different colored highlighters
❏ Then scan the documents looking for words that are related to each other, and highlight them
❏ Then you would create an index
  ❏ You would come up with a topic for each set of related terms
  ❏ You would annotate which pages the terms relevant to each topic appeared on
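The highlighter-and-index exercise maps fairly directly onto a small inverted index. In the sketch below each "highlighter color" is a hand-written set of related terms (borrowed from the organic farming example in this deck), and the index records which documents mention any term from each topic. Real topic models discover the term groups automatically; here they are supplied by hand, and the three sample documents are invented.

```python
from collections import defaultdict

# One "highlighter color" per topic: a hand-picked set of related terms.
TOPICS = {
    "fruit": {"watermelons", "lemons", "apples"},
    "vegetables": {"squash", "lettuce", "broccoli"},
}

def build_topic_index(documents, topics):
    """Map each topic to the sorted ids of documents using any of its terms."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        for topic, terms in topics.items():
            if words & terms:  # at least one highlighted term appears
                index[topic].add(doc_id)
    return {topic: sorted(ids) for topic, ids in index.items()}

# Made-up documents keyed by page number.
documents = {
    1: "watermelons and lemons ripen in summer",
    2: "lettuce and squash need regular watering",
    3: "apples pair well with broccoli",
}
index = build_topic_index(documents, TOPICS)
print(index)
```

Answering "which documents are about fruit?" is then a dictionary lookup rather than a re-read of the whole stack.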

This is How Search Engines Work

Example Topic: Organic Farming
❏ Fruit
❏ Watermelons
❏ Lemons
❏ Apples
❏ Vegetables
❏ Squash
❏ Lettuce
❏ Broccoli
❏ Organic foods
❏ USDA
❏ Lower yields
❏ Organic standards
❏ Farmers
❏ Organic farmers
❏ Certified organic
❏ Synthetic pesticides

TF*IDF for “Organic Farming”

Compared to #1 Ranking URL

Probabilistic Complexities
❏ Automating the identification of topics
❏ Validating topics and training your model
❏ Creation of word vectors

What are Word Vectors
❏ Simply, a vector of weights
❏ Word2Vec
❏ Pre-trained
❏ King - man + woman = queen
❏ Spatial relations where proximity represents relevance
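The "king - man + woman = queen" arithmetic can be illustrated with hand-made toy vectors. The two dimensions below (roughly "royalty" and "gender") and every number in `VECTORS` are invented for the example; real Word2Vec embeddings are learned from text and have hundreds of dimensions.

```python
import math

# Toy 2-D vectors: (royalty, gender). Values are invented for illustration.
VECTORS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "princess": (0.8, -0.9),
    "apple": (-1.0, 0.1),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy("king", "man", "woman", VECTORS)
print(result)
```

Proximity here is measured with cosine similarity, which is the usual choice for comparing word vectors; "princess" lands close to the target as well, which is the sense in which nearby vectors represent related terms.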

Term Clustering into Topics using LDA
❏ TF*IDF has evolved: a look at latent Dirichlet allocation (LDA)
❏ LDA is a generative statistical model
❏ Ability to generate probabilities; this is where it gets interesting...
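"Generative" means LDA describes how a document could be produced: pick a mix of topics, then pick each word from one of those topics. The sketch below walks through that story using only the standard library. The two topics and their word probabilities in `TOPIC_WORDS` are invented, and in full LDA those word distributions are themselves drawn from a Dirichlet prior and then learned from data; fitting a model amounts to running this process in reverse.

```python
import random

# Invented topic-word distributions; a fitted LDA model would learn these.
TOPIC_WORDS = {
    "fruit": {"watermelons": 0.5, "lemons": 0.3, "apples": 0.2},
    "farmers": {"organic": 0.4, "certified": 0.3, "pesticides": 0.3},
}

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_words, n_words, alpha=0.5, seed=0):
    """Generate one document following the LDA generative story."""
    rng = random.Random(seed)
    topics = list(topic_words)
    # 1. Draw this document's topic mixture from a Dirichlet prior.
    mixture = sample_dirichlet([alpha] * len(topics), rng)
    words = []
    for _ in range(n_words):
        # 2. Choose a topic for this word position according to the mixture.
        topic = rng.choices(topics, weights=mixture)[0]
        # 3. Choose a word from that topic's word distribution.
        vocab = topic_words[topic]
        words.append(rng.choices(list(vocab), weights=list(vocab.values()))[0])
    return mixture, words

mixture, words = generate_document(TOPIC_WORDS, n_words=10)
print(mixture, words)
```

The `mixture` values are the per-document topic probabilities that the slide refers to: each document gets a share of every topic rather than a single label.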

Topic Clustering
❏ Grouping topics into themes
❏ Where keyword research becomes important
❏ Building corpora of documents representative of relevant topics
❏ Scraping Google really helps with this

How Does This Get You More SEO Traffic?

What if you knew...
❏ The terms Google was expecting to find
❏ The topics Google was expecting to find
❏ The concepts that should be linked to
❏ The concepts and content that your page should be linked from?

Not “Keyword Density”
Don’t
❏ Dial in the term weights from tf*idf for a specific document
❏ Stuff in a bunch of keywords because you see them weighted heavily in the top ranking pages
❏ Google is much, much smarter than this
Do
❏ Look for terms that represent topics
❏ Use those topics to perform additional keyword research
❏ Explore the pages currently ranking, and analyze their
  ❏ Internal links (both directions)
  ❏ External links (both directions)

Implementing Data from a Topic Model
Terms
❏ 28” orange traffic cones
❏ 18” lime traffic cones
❏ Portable traffic cones
❏ Yellow traffic cones
❏ Extendable traffic cone barricades
Topics
❏ Orange traffic cones
❏ Lime traffic cones
❏ Colored traffic cones
❏ Collapsible traffic cones
❏ Cone bars
❏ Work zone safety
Themes
❏ Traffic cones
❏ Safety cones
❏ Road safety cones

Looking at the URL Architecture
Product
❏ Traffic Cones
❏ Orange and Lime Traffic Cones
❏ 28” Orange Traffic Cones
❏ 28” Lime Traffic Cones
❏ 18” Orange Traffic Cone
❏ Colored Traffic Cones
❏ Grabber Cones
❏ Cone Bars
❏ Collapsible Traffic Cones
❏ Traffic Cone Accessories
Content
❏ Traffic Cones
❏ History of Traffic Cones
❏ Custom Traffic Cones
❏ Traffic Cone Selection Guide
❏ Traffic Cones for Construction
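One way to turn a theme > topic > term hierarchy like this into URL paths is to slugify each page title and nest it under its parent. The nesting in `SITE` below is a guess at how a few of the product pages might be grouped, and the resulting paths are hypothetical; they are not the site's actual URLs.

```python
import re

def slugify(title):
    """Lowercase a page title and join its alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", title.lower()))

def build_paths(tree, prefix=""):
    """Walk a {title: children} tree and return one URL path per page."""
    paths = []
    for title, children in tree.items():
        path = f"{prefix}/{slugify(title)}"
        paths.append(path)
        paths.extend(build_paths(children, path))
    return paths

# Assumed grouping of a few page names from the slide (hypothetical nesting).
SITE = {
    "Traffic Cones": {
        "Orange and Lime Traffic Cones": {
            "28” Orange Traffic Cones": {},
            "28” Lime Traffic Cones": {},
        },
        "Colored Traffic Cones": {},
    }
}
paths = build_paths(SITE)
for path in paths:
    print(path)
```

Keeping the theme as the top-level folder and the topics and terms beneath it mirrors the pillar-and-cluster structure described earlier in the deck.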

Implementing Data from a Topic Model
❏ Defining content type
❏ Designing content map
Reference: imfromthefuture.com/content-map/
❏ Laying out content calendar
❏ Designing an SEO-First URL Architecture
❏ Publishing
❏ Expanding keyword footprint
Reference: imfromthefuture.com/bigfoot-strategy/

Take TF*IDF For a Spin
https://imfromthefuture.com/tfidf-embed/

Take LDA Visualization For a Spin
https://imfromthefuture.com/lda-tool/

TrafficThinkTank.com
