Jamie Alberico — How to Leverage Insights from Your Site’s Server Logs | 5 Hours of Technical SEO

These slides were presented at the SEMrush webinar "How to Leverage Insights from Your Site’s Server Logs | 5 Hours of Technical SEO". Video replay and transcript are available at https://www.semrush.com/webinars/how-to-leverage-insights-from-your-site-s-server-logs-or-5-hours-of-technical-seo/

Published in: Marketing

  1. 1. Leveraging Insights from Server Logs #5hoursoftechnicalSEO
  2. 2. All SEOs share a common goal: to be crawled, indexed, and ranked.
  3. 3. How can we answer all these questions? ● Which pages is Googlebot crawling? ● What user-agent is it using? ● Is Googlebot’s crawl mirroring our understanding of site structure and assets? ● How’s the site’s tech health?
  4. 4. Logs are a record of every request a server receives.
  5. 5. Actions > Words.
  6. 6. Process steps: Aggregate; Validate Googlebot; Translate; Parse logs for meaningful search and analysis. (Diagram labels: Log Source 1, Translate)
  7. 7. Logs can come from multiple places in your stack. Web Server 1 Web Server 2 Web Server 3 CDN DDOS Mitigation/Bot Manager Load Balancer
  8. 8. You want enough log data to get an accurate picture.
  9. 9. Check your CDN for data on edge node (cached) vs. server (uncached) hits.
  10. 10. Internal Log Requests ● Ask: Is there already a log management platform in place? ● Be clear: We do not want Personally Identifiable Information (PII); request that it be removed. ● Be specific: Exported as .csv, please!
  11. 11. DIY Log Access ● Apache (Linux Server) ● NGINX (Linux Server) ● IIS log files (Windows Server) ● AWS Load Balancer (Load Balancer) ● Google Cloud Load Balancer (Load Balancer) ● AWS CloudFront (CDN) ● Cloudflare log files (CDN, Enterprise account required) ● Incapsula (CDN/DDoS Mitigation) ● Akamai logs (CDN/DDoS Mitigation)
  12. 12. Standard WordPress site? Log into your hosting provider and look for Raw Access.
  13. 13. Process steps: Aggregate; Validate Googlebot; Translate; Parse logs for meaningful search and analysis. (Diagram labels: Log Source 1, Translate)
  14. 14. Many tools, many languages ● Paid: DeepCrawl, Botify, Logz.io, Sumo Logic, Splunk ● Free(mium): SEMrush, Screaming Frog Log Analyzer, BigQuery ● Code savvy: Python, JP ● Masochistic: Excel, Command Line
  15. 15. Leverage the tools and functionalities already in place.
  16. 16. Process steps: Aggregate; Validate Googlebot; Translate; Parse logs for meaningful search and analysis. (Diagram labels: Log Source 1, Translate)
  17. 17. Manually validate Googlebot IPs Run a reverse DNS lookup on the accessing IP address from your logs, using the host command. jammer@Hypatia ~ % host 66.249.66.1 1.66.249.66.in-addr.arpa domain name pointer crawl-66-249-66-1.googlebot.com
  18. 18. Bulk validate Googlebot IPs with scripts. Source: Shell Script to Detect if the IP Address Is Googlebot, DZone
  19. 19. Validate Googlebot IPs with a tool
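The manual `host` check from the previous slide can be scripted for many IPs at once. The following is a minimal Python sketch, not the DZone shell script the slide cites. It assumes the widely documented two-step check: a reverse DNS lookup whose hostname ends in `googlebot.com` or `google.com`, followed by a forward lookup that must return the original IP. The forward step goes beyond what the slide shows. The function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket
from typing import Callable, Dict, Iterable, List

# Assumption: hostnames of genuine Googlebot crawlers end in one of these.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the PTR hostname for an IP (what `host <ip>` reports)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to (IPv4 only)."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Reverse lookup, suffix check, then forward-confirm the same IP."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # no PTR record or lookup failure
        return False
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False


def verify_many(
    ips: Iterable[str],
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> Dict[str, bool]:
    """Check each distinct IP once and map it to True/False."""
    return {ip: is_verified_googlebot(ip, reverse, forward) for ip in set(ips)}
```

Deduplicating the IPs before lookup keeps DNS traffic low, since a log file usually repeats the same few crawler addresses many times.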
  20. 20. Process steps: Aggregate; Validate Googlebot; Translate; Parse logs for meaningful search and analysis. (Diagram labels: Log Source 1, Translate)
  21. 21. 216.150.168.131 [07/Mar/2018:16:11:58 -0800] 66.249.66.1 GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1 www.arrow.com 200 7352 616 - Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/41.0.2272.96+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) https://www.arrow.com/en/indiegogo The values captured in logs are unique to each site. Make a new engineering friend to learn exactly what they mean.
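To make a line like the one above searchable, it first has to be split into named fields. Here is a rough Python sketch that parses that sample entry (with the slide's line-wrap spaces removed). Every field name is an assumption: the slide itself says the captured values are unique to each site, so `first_ip`, `second_ip`, `value_a`, `value_b`, and `value_c` are deliberately neutral placeholders until an engineer confirms what they mean.

```python
import re
from datetime import datetime

# The sample entry from the slide, joined back into a single line.
SAMPLE_LINE = (
    "216.150.168.131 [07/Mar/2018:16:11:58 -0800] 66.249.66.1 "
    "GET /twiki/bin/view/TWiki/WikiSyntax HTTP/1.1 www.arrow.com 200 7352 616 - "
    "Mozilla/5.0+(Linux;+Android+6.0.1;+Nexus+5X+Build/MMB29P)"
    "+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/41.0.2272.96"
    "+Mobile+Safari/537.36+(compatible;+Googlebot/2.1;"
    "++http://www.google.com/bot.html) "
    "https://www.arrow.com/en/indiegogo"
)

# Hypothetical field layout inferred only from the shape of the sample.
LOG_PATTERN = re.compile(
    r"(?P<first_ip>\S+) \[(?P<timestamp>[^\]]+)\] (?P<second_ip>\S+) "
    r"(?P<method>\S+) (?P<path>\S+) (?P<protocol>\S+) (?P<host>\S+) "
    r"(?P<status>\d{3}) (?P<value_a>\S+) (?P<value_b>\S+) (?P<value_c>\S+) "
    r"(?P<user_agent>\S+) (?P<referer>\S+)"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Timestamp format as seen in the sample: day/month/year:time offset.
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    return record


record = parse_line(SAMPLE_LINE)
print(record["host"], record["status"], record["path"])
```

Returning `None` for lines that do not match makes it easy to count unparsed entries, which is a useful signal that the assumed layout is wrong for a given log source.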
  22. 22. Unlock logs in ≤ 6 lines 1. Data Source 2. Condition 3. Segments 4. Grouping 5. Sort 6. Limit*
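The six steps above map closely onto a few lines of code in most tools. As one illustration, here is a small Python sketch in which the function name `run_query` and the record keys (`is_googlebot`, `user_agent`) are hypothetical; a log platform would express the same steps in its own query language.

```python
from collections import Counter


def run_query(records, condition, segment, limit=10):
    """Apply the six steps to an iterable of parsed log records (dicts)."""
    counts = Counter(
        segment(record)          # 3. Segment: the value to bucket by
        for record in records    # 1. Data source
        if condition(record)     # 2. Condition
    )                            # 4. Grouping: Counter tallies hits per bucket
    return counts.most_common(limit)  # 5. Sort (desc) and 6. Limit


# Tiny made-up data set, for illustration only.
records = [
    {"is_googlebot": True, "user_agent": "Googlebot smartphone"},
    {"is_googlebot": True, "user_agent": "Googlebot smartphone"},
    {"is_googlebot": True, "user_agent": "Googlebot desktop"},
    {"is_googlebot": False, "user_agent": "Some browser"},
]

top = run_query(
    records,
    condition=lambda r: r["is_googlebot"],
    segment=lambda r: r["user_agent"],
)
print(top)
```

Swapping the `segment` function (hostname, HTTP status, subfolder) is all it takes to move between the use cases that follow.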
  23. 23. Use Cases + Queries
  24. 24. Use Case (Basic Query) Legacy code being brought kicking and screaming into mobile-only index
  25. 25. Query: Are we migrating to mobile-only index? 1. Data Source: Your aggregated logs 2. Condition: where the requester is (verified) Googlebot 3. Group by: User-agent 4. Count: Number of hits (desc) 5. Limit: Start with ~10 results.
  26. 26. (Query with grouping) Use case: Google chose a different canonical
  27. 27. Query: Are non-canonical hostnames being crawled? 1. Data Source: Aggregated logs 2. Condition: where Googlebot 3. Group by: Hostname 4. Count: Number of hits (desc) 5. Limit: 10
  28. 28. (Query with creative segments) Use case: Launching content in a new language.
  29. 29. Segmentation = pattern matching/creative thinking ● Happy path: Consistent URL structure ● Plan B: HTTP entity header Content-Language
  30. 30. Query: Which languages are being crawled? 1. Data Source: Your aggregated logs 2. Condition: where Googlebot 3. Group by: Language 4. Count: Number of hits (desc) 5. Limit: Start with ~10 results.
  31. 31. (Query with parsed segments) Use case: Low index coverage
  32. 32. Build on-the-fly segments by parsing URL structure: /en/products/blam-o/log-12345 ● en = Language ● products = App ● blam-o = Manufacturer ● log-12345 = SKU
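Parsing a path into segments like these takes only a few lines. The sketch below assumes the four-level structure from the slide's example URL; the segment names and the `parse_segments` helper are illustrative, and a real site would substitute its own URL pattern.

```python
from urllib.parse import urlsplit

# Assumed order of path levels, taken from the slide's example URL.
SEGMENT_NAMES = ("language", "app", "manufacturer", "sku")


def parse_segments(url_or_path):
    """Map each path level to a segment name; shorter paths yield fewer keys."""
    path = urlsplit(url_or_path).path  # drops scheme, host, and query string
    parts = [part for part in path.split("/") if part]
    return dict(zip(SEGMENT_NAMES, parts))


print(parse_segments("/en/products/blam-o/log-12345"))
```

Because `zip` stops at the shorter input, a category page such as `/en/products/` simply returns the `language` and `app` keys, which makes it safe to group by any one segment.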
  33. 33. Query: Which subfolders are being crawled? 1. Data Source: Your aggregated logs 2. Condition: where Googlebot 3. Parse: subfolder 4. Aggregate: by Subfolder 5. Count: Number of hits (desc) 6. Limit: Start with ~10 results.
  34. 34. (Parsed Segments AND Conditions) Use case: Sudden crawl flux
  35. 35. Even search engines need to CYA. "Googlebot is designed to be a good citizen of the web... For Googlebot a speedy site is a sign of healthy servers... If the site slows down or responds with server errors, the [crawl rate] limit goes down and Googlebot crawls less." Source: Official Google Webmaster Central Blog, What Crawl Budget Means for Googlebot
  36. 36. Starting query: What HTTP status codes are we returning? 1. Data Source: Your aggregated logs 2. Condition: where Googlebot 3. Aggregate: by HTTP Status 4. Count: Number of hits (desc) 5. Limit: Start with ~10 results.
  37. 37. Iterative query: What resources are returning 5XX? 1. Data Source: Your aggregated logs 2. Condition: where Googlebot AND 3. Condition: where 5XX 4. Parse: subfolder 5. Count: Number of hits (desc) 6. Limit: Start with ~10 results.
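The iterative query above combines two conditions with a parsed segment. A minimal Python sketch of that combination follows; the record keys (`path`, `status`, `is_googlebot`) and both function names are assumptions for illustration, and "subfolder" is taken here to mean the first directory level of the path.

```python
from collections import Counter


def first_subfolder(path):
    """Return the first directory level, e.g. '/en/' for '/en/products/x'."""
    clean = path.split("?", 1)[0]  # ignore any query string
    parts = [part for part in clean.split("/") if part]
    if len(parts) > 1 or (parts and clean.endswith("/")):
        return f"/{parts[0]}/"
    return "/"  # root-level files and the homepage


def server_errors_by_subfolder(records, limit=10):
    """Count Googlebot hits that returned a 5XX status, per subfolder."""
    counts = Counter(
        first_subfolder(record["path"])
        for record in records
        if record["is_googlebot"] and 500 <= record["status"] <= 599
    )
    return counts.most_common(limit)


# Made-up records, for illustration only.
records = [
    {"path": "/en/products/a", "status": 503, "is_googlebot": True},
    {"path": "/en/products/b", "status": 500, "is_googlebot": True},
    {"path": "/fr/products/a", "status": 502, "is_googlebot": True},
    {"path": "/en/products/c", "status": 200, "is_googlebot": True},
    {"path": "/en/products/d", "status": 500, "is_googlebot": False},
]
print(server_errors_by_subfolder(records))
```

If one subfolder dominates the result, the same function can be rerun on a deeper path level to narrow down the failing template or service.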
  38. 38. Advanced Use Cases + Blended Data
  39. 39. Query: Non-indexable pages with bot hits
  40. 40. Query: Indexable pages without bot hits
  41. 41. Query: Bot hits by indexability
  42. 42. Query: In sitemaps with no bot hits
  43. 43. Query: Empty dynamically generated pages
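Most of the blended queries above come down to set operations between log data and a second source such as a site crawl or the XML sitemaps. The Python sketch below covers the first four (non-indexable pages with hits, indexable pages without hits, hits by indexability, and sitemap URLs without hits). The function name, the input shapes, and the output keys are all assumptions; the slides do not show how these queries were built.

```python
def blended_report(bot_hits, indexable, non_indexable, in_sitemap):
    """Blend bot hits (URL -> hit count) with crawl and sitemap URL sets."""
    indexable = set(indexable)
    non_indexable = set(non_indexable)
    in_sitemap = set(in_sitemap)
    hit_urls = set(bot_hits)
    return {
        "non_indexable_with_hits": sorted(non_indexable & hit_urls),
        "indexable_without_hits": sorted(indexable - hit_urls),
        "hits_by_indexability": {
            "indexable": sum(bot_hits[url] for url in indexable & hit_urls),
            "non_indexable": sum(bot_hits[url] for url in non_indexable & hit_urls),
        },
        "in_sitemap_without_hits": sorted(in_sitemap - hit_urls),
    }


# Made-up inputs, for illustration only.
report = blended_report(
    bot_hits={"/a": 5, "/b": 2, "/old": 7},
    indexable={"/a", "/b", "/c"},
    non_indexable={"/old", "/tmp"},
    in_sitemap={"/a", "/c", "/d"},
)
print(report)
```

URLs must be normalized the same way in every source (protocol, hostname, trailing slash) before comparing, or the set differences will overstate the gaps.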
  44. 44. IT'S CHAOS. BE KIND.
  45. 45. I'm a mentor @ United Search. Want to take the stage as an SEO speaker? Want to stay in the audience but see more diversity in SEO events? United Search is an SEO speaker accelerator designed specifically to aid underrepresented groups, at no cost to students. ● Application - unitedsearch.org/apply ● Mentors - unitedsearch.org/mentors ● Mission - unitedsearch.org/about-us For more info check out unitedsearch.org or @search_united on Twitter.