  1. System and User Aspects of Web Search Latency
     Ioannis Arapakis, Xiao Bai, B. Barla Cambazoglu
     Yahoo Labs, Barcelona
  2. Web Search Engines
  3. Search Engine Result Page (SERP)
  4. Actors in Web Search
     §  User's perspective: accessing information
     •  relevance
     •  speed
     §  Search engine's perspective: monetization
     •  attract more users
     •  increase the ad revenue
     •  reduce the operational costs
     §  Advertiser's perspective: publicity
     •  attract more customers
     •  pay little
  5. Components of a Web Search Engine
     §  Major components: Web crawling, indexing, and query processing
     [Diagram with components: Web, crawler, document collection, indexer, index, query processor, query, results, user]
  6. Expectations from a Web Search Engine
     §  Crawl and index a large fraction of the Web
     §  Maintain the most recent copies of the content on the Web
     §  Scale to serve hundreds of millions of queries every day
     §  Evaluate a typical query in several hundred milliseconds
     §  Serve the most relevant results for a user query
  7. Multi-site Web Search Engines (why?)
     §  Better scalability and fault tolerance
     §  Faster web crawling due to reduced distance to web servers
     §  Lower response latency due to reduced distance to users
     [Diagram: DC1, DC2, DC3, DC4]
  8. Search Data Centers
     §  Quality and speed requirements imply large amounts of compute resources, i.e., very large data centers
     §  High variation in data center sizes
     •  hundreds of thousands of computers
     •  a few computers
  9. Web Search Engine Architectures
     §  Partitioned
     §  Replicated
     §  Centralized
  10. Energy-Price-Driven Query Processing Problem
  11. Idea
     §  Shifting the query workload among the search sites to decrease the total financial cost involved in query processing, by exploiting the spatio-temporal variation in
     •  Query workload
     •  Energy price
  12. Problem
     [Diagram: a query arrives at the local site and may be forwarded to one of the remote sites]
     §  Forward a query to another data center based on
     •  Query processing capacities
     •  Estimated workloads
     •  Estimated response latency
     •  Current energy prices
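The forwarding decision above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the site records, prices, latencies, and the greedy "cheapest feasible site" rule are all invented for the example.

```python
# Sketch of a cost-based query-forwarding decision (illustrative only;
# the real system also weighs estimated workloads in a principled way).
def pick_site(sites, local, max_latency_ms):
    """Choose the cheapest site whose extra network latency keeps the
    query within the response-time constraint and that has spare capacity."""
    feasible = [
        s for s in sites
        if s["latency_ms"] <= max_latency_ms and s["load"] < s["capacity"]
    ]
    if not feasible:
        return local  # fall back to the local data center
    return min(feasible, key=lambda s: s["price_per_query"])

sites = [
    {"name": "DC-A", "latency_ms": 0,   "load": 80, "capacity": 100, "price_per_query": 3.0},
    {"name": "DC-B", "latency_ms": 120, "load": 10, "capacity": 100, "price_per_query": 1.0},
    {"name": "DC-C", "latency_ms": 400, "load": 5,  "capacity": 100, "price_per_query": 0.5},
]
local = sites[0]
print(pick_site(sites, local, max_latency_ms=200)["name"])  # DC-B: cheapest feasible
```

With a 200 ms budget, DC-C is excluded by the latency constraint, so the query goes to DC-B, the cheaper of the two feasible sites.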
  13. Example
     [Figure: two forwarding configurations for the same query workload across data centers A, B, and C with per-query prices of $1, $2, and $3; forwarding queries to cheaper sites reduces the total cost from $26 to $22]
  14. Spatio-Temporal Variation
     §  Search queries
     [Plot: query volume (normalized by the total volume) by hour of day (GMT+0) for DC-A (GMT+10), DC-B (GMT-3), DC-C (GMT-4), DC-D (GMT+1), DC-E (GMT-4)]
     §  Electricity price
     [Plot: electricity price ($/MWh) by country: Denmark, Italy, Germany, Sweden, France, Australia, Turkey, Singapore, Portugal, Peru, USA, Iceland, Malaysia, Finland, Canada]
     [Plot: electricity price ($/MWh) by hour of day (GMT-4): most expensive day (Tuesday), week average, cheapest day (Saturday)]
  15. Formal Problem Definition
     §  Minimize financial cost
     §  Response time constraint
     §  Capacity constraint
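The three bullets above can be sketched as an optimization problem. The notation below is my own reconstruction, not the paper's: q_{ij}(t) is the query volume forwarded from site i to site j at time t, c_j(t) the per-query energy cost at j, l_{ij} the network latency between i and j, r the response-time limit, C_j the capacity of j, and v_i(t) the query volume arriving at i.

```latex
% Illustrative reconstruction; symbols are assumptions, not the paper's notation.
\begin{aligned}
\min_{q}\ \ & \sum_{t}\sum_{i,j} q_{ij}(t)\, c_j(t)
  && \text{(minimize financial cost)} \\
\text{s.t.}\ \ & q_{ij}(t) = 0 \quad \text{whenever } \ell_{ij} > r
  && \text{(response-time constraint)} \\
& \sum_{i} q_{ij}(t) \le C_j \quad \forall j, t
  && \text{(capacity constraint)} \\
& \sum_{j} q_{ij}(t) = v_i(t) \quad \forall i, t
  && \text{(all queries are served)}
\end{aligned}
```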
  16. Solution
  17. Generating Probabilities
     [Diagram: a query at local site A is kept at A with probability pA(A) or forwarded to a remote site, e.g., to C with probability pA(C), among remote sites B, C, D]
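The routing scheme on this slide can be illustrated with a short sketch: the solver produces, for each local site, a probability distribution over destination sites, and each incoming query is routed by sampling from it. The distribution p_A below is invented for the example, not taken from the paper.

```python
import random

# Probabilistic forwarding sketch: route each query by sampling a
# destination from the local site's distribution (values are invented).
p_A = {"A": 0.6, "C": 0.3, "D": 0.1}  # e.g., keep at A w.p. pA(A) = 0.6

def route(dist, rng=random):
    sites = list(dist)
    weights = [dist[s] for s in sites]
    return rng.choices(sites, weights=weights, k=1)[0]

random.seed(0)
counts = {s: 0 for s in p_A}
for _ in range(10_000):
    counts[route(p_A)] += 1
freqs = {s: c / 10_000 for s, c in counts.items()}
print(freqs)  # empirical frequencies approximate p_A
```

Sampling per query (rather than, say, round-robin) keeps the realized workload split close to the optimizer's target in expectation without any per-site bookkeeping.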
  18. Generating Probabilities
     §  Cheap
     §  Expensive
  19. Price Variation Configurations
     §  Universal (PC-U)
     §  Temporal (PC-T)
     §  Spatial (PC-S)
     §  Spatio-temporal (PC-ST)
     [Plots: price ($/MWh) by hour of day (GMT+0) for DC-A through DC-E under each of the four configurations]
  20. Impact of Temporal and Spatial Price Variation
     [Plot: fraction of saving and of forwarded queries (r=400, r=800) vs. network latency between data center pairs (44-176 ms)]
     [Plot: fraction of saving, forwarded queries, and electricity price by hour of day]
  21. Query Response Time and Cost Saving
     [Plot: fraction of query traffic volume vs. query response time (>100, >200, >400, >800 ms) for PC-U, PC-T, PC-S, PC-ST]
     [Plot: saving in electricity cost vs. query response time limit (200, 400, 800 ms, inf.) for PC-T, PC-S, PC-ST]
  22. Human Perception of Web Search Latency
  23. Background Information
     §  Core research in IR has focused on improving the quality of search results, with the eventual goal of satisfying the information needs of users
     §  This often requires sophisticated and costly solutions
     •  more information stored in the inverted index
     •  machine-learned ranking strategies
     •  fusing results from multiple resources
  24. Trade-off between the speed of a search system and the quality of its results
     §  Being too slow or too fast may have financial consequences for the search engine
  25. Web Search Economics
     §  Web users
     •  are impatient
     •  have limited time
     •  expect sub-second response times
     §  High response latency
     •  can distract users
     •  results in fewer query submissions
     •  decreases user engagement over time
  26. Components of User-Perceived Response Latency
     §  network latency: tuf + tfu
     §  search engine latency: tpre + tfb + tproc + tbf + tpost
     §  browser latency: trender
     [Diagram: timeline between user, search frontend, and search backend, annotated with tuf, tpre, tfb, tproc, tbf, tpost, tfu, trender]
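The decomposition above sums to the user-perceived latency. A minimal sketch, with the component names taken from the slide and the millisecond values invented for illustration:

```python
# User-perceived latency = network + search engine + browser components.
# Component names follow the slide; the sample values are invented.
def perceived_latency(t):
    network = t["tuf"] + t["tfu"]                               # user <-> frontend
    engine = t["tpre"] + t["tfb"] + t["tproc"] + t["tbf"] + t["tpost"]
    browser = t["trender"]                                      # page rendering
    return network + engine + browser

sample = {"tuf": 40, "tfu": 40, "tpre": 5, "tfb": 10, "tproc": 180,
          "tbf": 10, "tpost": 25, "trender": 120}
print(perceived_latency(sample))  # 430
```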
  27. Distribution of Latency Values
     [Plot: fraction and cumulative fraction of queries vs. latency (normalized by the mean)]
  28. Contribution of Latency Components
     [Plot: contribution per component (%) vs. latency (normalized by the mean): search engine latency, network latency, browser latency]
  29. Experimental Design
     User Study 1: User Sensitivity to Latency
     User Study 2: Impact of Latency on Search Experience
  30. Study 1: User Sensitivity to Latency
  31. Experimental Method (Tasks 1 & 2)
     §  Independent variables:
     •  Search latency (0 - 2750 ms)
     •  Search site speed (slow, fast)
     §  12 participants (female=6, male=6)
     •  aged from 24 to 41
     •  full-time students (33.3%), studying while working (54.3%), full-time employees (16.6%)
  32. Task 1: Procedure
     §  Participants submitted 40 navigational queries
     §  After submitting each query, they were asked to report whether the response of the search site was "slow" or "normal"
     §  For each query, we increased latency by a fixed amount (0 - 1750 ms), using a step of 250 ms
     §  Each latency value (e.g., 0, 250, 500) was introduced 5 times, in a random order
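The stimulus schedule described above can be sketched directly: eight delay values (0 to 1750 ms in 250 ms steps), each repeated 5 times and shuffled, yield exactly the 40 queries mentioned on the slide.

```python
import random

# Sketch of the Task 1 stimulus schedule: 8 delay values x 5 repetitions
# = 40 queries, presented in a random order.
def task1_schedule(rng):
    delays = [d for d in range(0, 1751, 250) for _ in range(5)]
    rng.shuffle(delays)
    return delays

schedule = task1_schedule(random.Random(42))
print(len(schedule), schedule.count(500))  # 40 5
```

The Task 2 schedule follows the same pattern with values from 500 to 2750 ms, giving 10 values x 5 repetitions = 50 queries.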
  33. Task 1: Results
     [Plots: likelihood of feeling added latency vs. added latency (250 - 1750 ms), for slow and fast SE, absolute and relative to the base likelihood]
     §  Delays <500 ms are not easily noticeable
     §  Delays >1000 ms are noticed with high likelihood
  34. Task 2: Procedure
     §  Participants submitted 50 navigational queries
     §  After each query submission, they provided an estimate of the search latency in milliseconds
     §  Search latency was set to a fixed amount (500 - 2750 ms), using a step of 250 ms
     §  Each latency value was introduced 5 times, in a random order
     §  A number of training queries were submitted without any added delay
  35. Task 2: Results
     Fig. 1: Slow search engine [Plot: predicted vs. actual latency (1750 - 2750 ms), for males, females, and the average]
     Fig. 2: Fast search engine [Plot: predicted vs. actual latency (750 - 2750 ms), for males, females, and the average]
  36. Study 2: Impact of Latency on Search Experience
  37. Experimental Design
     §  Two independent variables
     •  Search latency (0, 750, 1250, 1750 ms)
     •  Search site speed (slow, fast)
     §  Search latency was adjusted by the desired amount using custom-made JavaScript deployed through the Greasemonkey extension
     §  20 participants (female=10, male=10)
  38. Procedure
     §  Participants performed four search tasks
     •  Evaluate the performance of four different backend search systems
     •  Submit as many navigational queries as possible from a list of 200 randomly sampled web domains
     •  For each query, they were asked to locate the target URL among the first ten results of the SERP
     §  Training queries were used to allow participants to familiarize themselves with the "default" search site speed
  39. Questionnaires
     §  User Engagement Scale (UES)
     •  Positive affect (PAS)
     •  Negative affect (NAS)
     •  Perceived usability
     •  Felt involvement and focused attention
     §  IBM's Computer System Usability Questionnaire (CSUQ)
     •  System usefulness (SYSUSE)
     §  Custom statements
  40. Descriptive Statistics (M) for UE and SYSUSE
     §  Positive bias towards SEfast
     §  SEfast participants were more deeply engaged
     §  SEfast participants' usability perception was more tolerant to delays

                                    SEslow latency            SEfast latency
                                0     750   1250  1750    0     750   1250  1750
     Post-Task Positive Affect  16.20 14.50 15.50 15.20   20.50 19.00 20.80 19.30
     Post-Task Negative Affect   7.00  6.80  7.60  6.90    6.80  7.40  7.40  7.20
     Frustration                 3.20  3.10  2.90  3.30    2.80  3.00  3.50  2.60
     Focused Attention          22.80 22.90 19.90 22.20   27.90 26.60 23.90 29.50
     SYSUSE                     32.80 28.90 29.80 27.90   35.20 31.30 29.80 33.20
  41. Correlation Analysis of Beliefs and Reported Scales
     Correlation Matrix
     Beliefs                                   postPAS  postNAS  FA       CSUQ-SYSUS  custom-1  custom-2  custom-3
     SEslow will respond fast to my queries    .455**   .041     0.702**  .267        .177      .177      .082
     SEslow will provide relevant results      .262     -.083    .720**   .411**      .278      .263      .232
     SEfast will respond fast to my queries    -.051**  .245     .341*    .591**      .330*     .443**    .624**
     SEfast will provide relevant results      -.272    .133     -.133    .378*       .212      .259      .390*
     *. Correlation is significant at the .05 level (2-tailed).
     **. Correlation is significant at the .01 level (2-tailed).
  42. Query Log Analysis
  43. Query Log and Engagement Metric
     §  Random sample of 30M web search queries obtained from Yahoo
     §  We use the end-to-end (user-perceived) latency values
     §  To control for differences due to geolocation or device, we select queries issued:
     •  Within the US
     •  To a particular search data center
     •  From desktop computers
     §  We quantify engagement using the clicked page ratio metric
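The engagement metric can be sketched as follows, assuming "clicked page ratio" means the fraction of result pages that received at least one click (my reading of the metric's name; the log records and field names are invented):

```python
# Sketch of the clicked page ratio, assuming it is the fraction of result
# pages with at least one click; the log records below are invented.
def clicked_page_ratio(log):
    clicked = sum(1 for rec in log if rec["clicks"] > 0)
    return clicked / len(log)

log = [
    {"query": "facebook",  "latency_ms": 420,  "clicks": 1},
    {"query": "youtube",   "latency_ms": 980,  "clicks": 0},
    {"query": "wikipedia", "latency_ms": 510,  "clicks": 2},
    {"query": "twitter",   "latency_ms": 1500, "clicks": 0},
]
print(clicked_page_ratio(log))  # 0.5
```

Binning queries by latency and computing this ratio per bin yields a curve like the one on the next slide.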
  44. Variation of Clicked Page Ratio Metric
     [Plot: clicked page ratio (normalized by the max) vs. latency (normalized by the mean)]
  45. Eliminating the Effect of Content
     [Plot: fraction of query pairs and click-on-fast/click-on-slow ratio vs. latency difference (0 - 2000 ms)]
     §  q1 = q2 & SERP1 = SERP2
     §  500 ms of latency difference is the critical point beyond which users are more likely to click on a result retrieved with lower latency
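The pairing analysis can be sketched like this: for pairs with identical query and identical SERP but different latency, count whether the click fell on the faster or the slower impression, binned by latency difference. The record layout, field names, and bin width are my own illustration, not the paper's implementation.

```python
from collections import defaultdict

# Sketch of the click-on-fast vs. click-on-slow comparison over
# content-identical query pairs (records are invented).
def pair_stats(pairs, bin_ms=250):
    bins = defaultdict(lambda: {"fast": 0, "slow": 0})
    for p in pairs:
        fast, slow = sorted((p["a"], p["b"]), key=lambda x: x["latency_ms"])
        diff_bin = (slow["latency_ms"] - fast["latency_ms"]) // bin_ms * bin_ms
        if fast["clicked"] and not slow["clicked"]:
            bins[diff_bin]["fast"] += 1
        elif slow["clicked"] and not fast["clicked"]:
            bins[diff_bin]["slow"] += 1
    return dict(bins)

pairs = [
    {"a": {"latency_ms": 400, "clicked": True},  "b": {"latency_ms": 1000, "clicked": False}},
    {"a": {"latency_ms": 300, "clicked": False}, "b": {"latency_ms": 900,  "clicked": True}},
    {"a": {"latency_ms": 350, "clicked": True},  "b": {"latency_ms": 950,  "clicked": False}},
]
print(pair_stats(pairs))  # {500: {'fast': 2, 'slow': 1}}
```

The ratio fast/slow per bin corresponds to the "Ratio" curve on the plot; a ratio above 1 past some latency difference is the signal the slide reports.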
  46. Eliminating the Effect of Content
     [Plot: fraction of query pairs and click-more-on-fast/click-more-on-slow ratio vs. latency difference (0 - 2000 ms)]
     §  q1 = q2 & SERP1 = SERP2
     §  Clicking on more results becomes preferable to submitting new queries when the latency difference exceeds a certain threshold (1250 ms)
  47. Conclusions
  48. Conclusions
     §  Up to a point (500 ms), added response time delays are not noticeable to users
     §  Beyond a certain threshold (1000 ms), users can feel the added delay with very high likelihood
     §  Perception of search latency varies considerably across the population!
  49. Conclusions
     §  The tendency to overestimate or underestimate system performance biases users' interpretations of search interactions and system usability
     §  SEfast participants:
     •  were generally more deeply engaged
     •  had a usability perception that was more tolerant to delays
  50. Conclusions
     §  Given two content-wise identical result pages, users are more likely to click on the result page that is served with lower latency
     §  500 ms of latency difference is the critical point beyond which users are more likely to click on a result retrieved with lower latency
     §  Clicking on more results becomes preferable to submitting new queries when the latency difference exceeds a certain threshold (1250 ms)
  51. Thank you for your attention!
     arapakis@yahoo-inc.com
     iarapakis
