Originally presented at SXSW, March 13, 2011, on a panel with Fred Beecher and Austin Govella. Modified and updated for talks at Web 2.0 Expo (October 12, 2011), UX Web Summit (September 26, 2012), and Webdagene (September 10, 2013).
4. No, let’s look at the real data
Critical elements in bold: IP address, time/date stamp, query, and # of
results:
XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800]
"GET /search?access=p&entqr=0
&output=xml_no_dtd&sort=date%3AD%3AL
%3Ad1&ud=1&site=AllSites&ie=UTF-8
&client=www&oe=UTF-8&proxystylesheet=www&
q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1"
200 971 0 0.02
XXX.XXX.X.104 - - [10/Jul/2006:10:25:48 -0800]
"GET /search?access=p&entqr=0
&output=xml_no_dtd&sort=date%3AD%3AL
%3Ad1&ie=UTF-8&client=www&
q=license+plate&ud=1&site=AllSites
&spell=1&oe=UTF-8&proxystylesheet=www&
ip=XXX.XXX.X.104 HTTP/1.1" 200 8283 146 0.16
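A log line in the format above can be pulled apart programmatically. The sketch below is a minimal parser; the meaning of the trailing numbers (status, bytes, result count, seconds) is inferred from the sample, not from any documented spec, so treat the field names as assumptions:

```python
import re
from urllib.parse import parse_qs, urlparse

# Pattern for the GSA-style log lines shown above. The last four fields are
# assumed to be: HTTP status, response bytes, result count, response seconds.
LOG_RE = re.compile(
    r'(?P<ip>\S+) - - \[(?P<ts>[^\]]+)\] '
    r'"GET (?P<path>\S+) HTTP/1\.1" '
    r'(?P<status>\d+) (?P<bytes>\d+) (?P<results>\d+) (?P<secs>[\d.]+)'
)

def parse_line(line):
    m = LOG_RE.match(line)
    if not m:
        return None
    # parse_qs decodes '+' to spaces, so the query comes out human-readable
    qs = parse_qs(urlparse(m.group("path")).query)
    return {
        "ip": m.group("ip"),
        "timestamp": m.group("ts"),
        "query": qs.get("q", [""])[0],
        "results": int(m.group("results")),
    }

line = ('XXX.XXX.X.104 - - [10/Jul/2006:10:25:46 -0800] '
        '"GET /search?q=lincense+plate&ip=XXX.XXX.X.104 HTTP/1.1" '
        '200 971 0 0.02')
print(parse_line(line))
```

Note that the misspelled query ("lincense plate") returned 0 results; two seconds later the same IP retried with the correct spelling and got 146 results. That retry pattern is exactly what session analysis surfaces.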
What are users searching?
How often are users failing?
10. A little goes a long way
A handful of queries/tasks/ways to navigate/features/documents
meet the needs of your most important audiences
Not all queries are distributed equally
Nor do they diminish gradually
The 80/20 rule isn't quite accurate
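The unequal, long-tailed distribution of queries can be seen with a few lines of code. This toy log is invented for illustration; real SSA data will show the same Zipf-like shape, just messier:

```python
from collections import Counter

# Toy query log: a handful of distinct queries account for most activity,
# followed by a long tail of one-off queries.
queries = (["campus map"] * 40 + ["parking"] * 25 + ["library hours"] * 15
           + ["registrar"] * 8 + ["q%d" % i for i in range(12)])  # 12 one-offs

counts = Counter(queries)
total = sum(counts.values())                       # 100 searches
top3 = sum(c for _, c in counts.most_common(3))    # share of top 3 queries
print(f"{len(counts)} distinct queries; top 3 cover {top3 / total:.0%} of activity")
```

Here 3 of 16 distinct queries cover 80% of searches, but as the slides note, the drop-off in real data is steeper and lumpier than a tidy 80/20 split.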
22. Some things you can do with SSA
1. Make it harder to get lost in deep content
2. Make search smarter
3. Reduce jargon
4. Learn how your audiences differ
5. Know when to publish what
6. Own and enjoy your failures
7. Avoid disaster
8. Predict the future
24. Start with basic SSA data: queries and query frequency
Percent: volume of search activity for a unique query during a particular time period
Cumulative Percent: running sum of percentages
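Both measures fall out of a simple frequency count. A minimal sketch, using invented query counts:

```python
from collections import Counter

# Basic SSA report: each unique query's share of all search activity
# ("Percent") and the running sum down the ranked list ("Cumulative Percent").
counts = Counter({"license plate": 120, "campus map": 80,
                  "parking": 50, "library": 30})
total = sum(counts.values())  # 280 searches overall

rows = []
cum = 0.0
for query, n in counts.most_common():   # highest-frequency queries first
    pct = 100 * n / total
    cum += pct
    rows.append((query, n, round(pct, 2), round(cum, 2)))
    print(f"{query:15s} {n:5d} {pct:6.2f}% {cum:7.2f}%")
```

Reading down the cumulative column tells you how few queries you need to examine to cover, say, 20% of all search activity.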
27. Tease out common content types
Took an hour to...
• Analyze top 50 queries (20% of all search activity)
• Ask and iterate: “what kind of content would users be
looking for when they searched these terms?”
• Add cumulative percentages
Result: prioritized list of potential content types
#1) application: 11.77%
#2) reference: 10.5%
#3) instructions: 8.6%
#4) main/navigation pages: 5.91%
#5) contact info: 5.79%
#6) news/announcements: 4.27%
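The grouping exercise above can be mirrored in code once you've hand-assigned a content type to each top query. The queries, shares, and type labels below are invented placeholders, not the actual data behind the slide:

```python
# Group top queries into candidate content types and sum each type's share
# of total search activity. All numbers here are illustrative.
query_share = {
    "financial aid application": 4.2, "admission application": 3.5,
    "course catalog": 5.1, "transfer guide": 2.8,
    "phone directory": 3.0, "advisor contact": 2.79,
}
query_type = {
    "financial aid application": "application", "admission application": "application",
    "course catalog": "reference", "transfer guide": "reference",
    "phone directory": "contact info", "advisor contact": "contact info",
}

by_type = {}
for q, share in query_share.items():
    by_type[query_type[q]] = by_type.get(query_type[q], 0.0) + share

# Prioritized list of content types, biggest share first
for ctype, share in sorted(by_type.items(), key=lambda kv: -kv[1]):
    print(f"{ctype}: {share:.2f}%")
```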
28. Clear content types lead to
better contextual navigation
artist descriptions
album reviews
album pages
artist bios
discography
TV listings
37. Session data suggest progression and context
search session patterns
1. solar energy
2. how solar energy works
1. solar energy
2. energy
1. solar energy
2. solar energy charts
1. solar energy
2. explain solar energy
1. solar energy
2. solar energy news
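Sessions like these are reconstructed by grouping log entries from the same IP that fall within a time window. A minimal sketch; the 30-minute gap threshold is an assumption, and real sessionizing may also use cookies or login IDs:

```python
from datetime import datetime, timedelta

# Queries from the same IP separated by no more than GAP belong to one session.
GAP = timedelta(minutes=30)

def sessions(entries):
    """entries: list of (ip, datetime, query), assumed sorted by time."""
    open_sessions = {}   # ip -> (time of last query, current session's queries)
    done = []
    for ip, ts, query in entries:
        last, sess = open_sessions.get(ip, (None, None))
        if last is None or ts - last > GAP:
            if sess:                 # close out the previous session
                done.append(sess)
            sess = []
        sess.append(query)
        open_sessions[ip] = (ts, sess)
    done.extend(s for _, s in open_sessions.values())
    return done

t0 = datetime(2006, 7, 10, 10, 25)
log = [
    ("1.2.3.4", t0, "solar energy"),
    ("1.2.3.4", t0 + timedelta(seconds=30), "how solar energy works"),
    ("1.2.3.4", t0 + timedelta(hours=2), "energy"),  # new session: gap > 30 min
]
print(sessions(log))
```

The first session shows a user narrowing ("solar energy" → "how solar energy works"); the later lone query starts a fresh session.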
43. Saving the brand by killing jargon at a community college
Jargon related to online education: FlexEd, COD, College on Demand
Marketing's solution: expensive campaign to educate the public (via posters, brochures)
The numbers (from SSA):
query rank   query
#22          online*
#101         COD
#259         College on Demand
#389         FlexTrack
*"online" was part of 213 queries
Result: content relabeled, money saved
49. Why analyze queries by audience?
Fortify your personas with data
Learn about differences between audiences
• Open University “Enquirers”: 16 of 25 queries
are for subjects not taught at OU
• Open University Students: search for course codes and topics related to completing their program
Determine what’s commonly important to all
audiences (these queries better work well)
65. Failed business goals?
Developing custom metrics
Netflix asks
1. Which movies are most frequently searched? (query count)
2. Which of them are most frequently clicked through? (MDP views)
3. Which of them are least frequently added to the queue? (queue adds)
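Combining those three counts yields a custom metric: titles that are searched and viewed often but rarely added to the queue deserve a closer look. A sketch under invented numbers (this is not Netflix's actual implementation):

```python
# Per-title event counts: searches, MDP (movie display page) views, queue adds.
# All figures are made up for illustration.
events = {
    "007":     (9500, 7200, 6800),
    "yoga":    (8800, 6900, 1200),
    "solaris": (3100, 2500, 2100),
}

def queue_add_rate(title):
    """Fraction of MDP views that led to a queue add."""
    _, views, adds = events[title]
    return adds / views

# Titles users view but rarely add, lowest conversion first
suspects = sorted(events, key=queue_add_rate)
print(suspects[0])
```

A low queue-add rate flags a problem to diagnose, not a conclusion: the catalog may be missing the title, or search may be matching the wrong records (see the "yoga" example in the notes).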
69. The new and improved search engine that wasn't
Vanguard used SSA to help benchmark the existing search engine's performance and help select a new engine
New search engine "performed" poorly
But IT needed convincing to delay launch
Information Architect & Dev Team Meeting:
"Search seems to have a few problems…"
"Nah. Where's the proof? You can't tell for sure."
70. What to do?
Test performance of common queries
"Before and after" testing using two sets of metrics
1. Relevance: how reliably the search engine returns the best matches first
2. Precision: proportion of relevant results clustered at the top of the list
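Both metrics are easy to score by hand or in code once you've judged which documents are relevant for each test query. A minimal sketch, under the interpretation suggested by the next slide (relevance as the rank of the known-best result, so lower is better; precision as the relevant share of the top N, so higher is better):

```python
def relevance(results, best):
    """Rank (1-based) at which the known-best document appears. Lower is better."""
    return results.index(best) + 1

def precision_at(results, relevant, n=5):
    """Fraction of the top n results judged relevant. Higher is better."""
    top = results[:n]
    return sum(1 for doc in top if doc in relevant) / len(top)

# One test query's result list, with hand-judged relevant documents
results = ["doc3", "doc1", "doc7", "doc2", "doc9"]
print(relevance(results, "doc1"))                       # best match at rank 2
print(precision_at(results, {"doc1", "doc3", "doc2"}))  # 3 of top 5 relevant
```

Run the same judged query set against the old and new engines and compare averages; that's the "before and after" benchmark.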
71. Old engine (target) and new compared
Note: low relevance and high precision scores are optimal
More on Vanguard case study: http://bit.ly/D3B8c
75. Shaping the Financial Times' editorial agenda
FT compares these
• Spiking queries for proper nouns (i.e., people and companies)
• Recent editorial coverage of people and companies
Discrepancy?
• Breaking story?!
• Let the editors know!
Seed your
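The spike detection behind this comparison can be sketched simply: compare each query's current count against its recent baseline. The 3x threshold, weekly windows, and minimum count are all assumptions, not the FT's actual method:

```python
# Flag queries whose count this week far exceeds their average over prior
# weeks -- a candidate signal for a breaking story the site hasn't covered.
def spikes(this_week, prior_weeks, ratio=3.0, min_count=10):
    flagged = []
    for query, n in this_week.items():
        baseline = sum(w.get(query, 0) for w in prior_weeks) / max(len(prior_weeks), 1)
        # max(baseline, 1) keeps brand-new queries from dividing by zero
        if n >= min_count and n > ratio * max(baseline, 1):
            flagged.append(query)
    return flagged

this_week = {"acme corp": 90, "interest rates": 55, "jane doe": 40}
prior = [{"acme corp": 8, "interest rates": 50},
         {"acme corp": 6, "interest rates": 60}]
print(spikes(this_week, prior))
```

"acme corp" and the never-before-seen "jane doe" get flagged; the steady "interest rates" query does not.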
77. Lou's TABLE OF OVERGENERALIZED DICHOTOMIES

                          Web Analytics                      User Experience
What they analyze         Users' behaviors (what's           Users' intentions and motives
                          happening)                         (why those things happen)
What methods they employ  Quantitative methods to            Qualitative methods for
                          determine what's happening         explaining why things happen
What they're trying to    Helps the organization meet        Helps users achieve goals
achieve                   goals (expressed as KPIs)          (expressed as tasks or topics
                                                             of interest)
How they use data         Measure performance                Uncover patterns and surprises
                          (goal-driven analysis)             (emergent analysis)
What kind of data         Statistical data ("real" data in   Descriptive data (in small
they use                  large volumes, full of errors)     volumes, generated in a lab
                                                             environment, full of errors)
83. Use SSA to start work on a site report card
SSA helps determine common information needs
84. Read this
Search Analytics for Your Site: Conversations with Your Customers
by Louis Rosenfeld (Rosenfeld Media, 2011)
www.rosenfeldmedia.com
Use code WEBDAGENE2013 for 20% off all Rosenfeld Media books
We get two major things out of this data: SESSIONS and FREQUENT QUERIES
Your brain on data: what will it do?
Amazing drawing by Eva-Lotta Lamm: www.evalotta.net
Personas: http://www.uie.com/images/blog/YahooExamplePersona.gif
Table: From Jarrett, Quesenbery, Stirling, and Allen's report "Search Behaviour at OU," April 6, 2007.
Examples:
• "OO7" versus "007"
• Porn-related (not carried by Netflix)
• "yoga": not stocking enough? Or not indexing enough record content? Some other problem?