slideshare.net/Tom-Pool
How To Use Chrome Puppeteer To
Fake Googlebot And Monitor Your Site
Tom Pool // BlueArray //
@cptntommy
Who Am I?
@cptntommy #BrightonSEO
Look After
Technical
Output Of
The
Agency
@cptntommy #BrightonSEO
Always Trying
To Find Ways
To Make My
Team’s Job
Easier
@cptntommy #BrightonSEO
So I Was
Watching
Google I/O 18
(Which Is
Awesome
BTW)
@cptntommy #BrightonSEO
And I Saw A
Really
Really
Really Cool
Talk
@cptntommy #BrightonSEO
Eric Bidelman
@cptntommy #BrightonSEO
This Got Me Thinking
@cptntommy #BrightonSEO
I Can Use This
To Help Me
With My Job!
@cptntommy #BrightonSEO
So I Went Away &
Did A Shit Ton Of
Research
@cptntommy #BrightonSEO
That Included
@cptntommy #BrightonSEO
Headless
Chrome
@cptntommy #BrightonSEO
Chrome
@cptntommy #BrightonSEO
And A Little Bit
Of Coding
@cptntommy #BrightonSEO
(Not Much!)
@cptntommy #BrightonSEO
I Want All Of You
To At Least Take
@cptntommy #BrightonSEO
A Small
Piece Of
Knowledge
From This
@cptntommy #BrightonSEO
I’ll Also Tweet Out
This Deck
@cptntommy #BrightonSEO
So...
@cptntommy #BrightonSEO
What Is
Headless
Chrome?
@cptntommy #BrightonSEO
Headless
Chrome
=
None Of That
Shit
@cptntommy #BrightonSEO
Google Chrome Is
Running, But With
No User Interface
@cptntommy #BrightonSEO
So It Is ‘Headless’
@cptntommy #BrightonSEO
Why Should You Even Care?
@cptntommy #BrightonSEO
You Can:
@cptntommy #BrightonSEO
Scrape The Shit Out Of (JS)
Websites
@cptntommy #BrightonSEO
Copy The DOM, & Paste To A
Text File
@cptntommy #BrightonSEO
Compare Source Code With
DOM & Export Differences
@cptntommy #BrightonSEO
Generate
Screenshots
of Pages
@cptntommy #BrightonSEO
Crawl Single Page
Applications
@cptntommy #BrightonSEO
I Know, JS Is Evil, But
It Ain’t Going Away!
@cptntommy #BrightonSEO
Screaming Frog Does Have JS
Rendering Features.
Utilises (Something Like)
Headless Chrome
@cptntommy #BrightonSEO
Google Can Render JS, But It Is
In No Way Perfect, Or Even
That Effective
@cptntommy #BrightonSEO
Countless Case Studies
@cptntommy #BrightonSEO
Crawl Single Page
Applications
@cptntommy #BrightonSEO
Automate WebPage Checks
@cptntommy #BrightonSEO
Used For Webpage Testing
(Clicking On Buttons, Filling
In Forms, General Fuckery)
@cptntommy #BrightonSEO
Great For Emulating User
Behaviour!
@cptntommy #BrightonSEO
Great For Seeing How Much
Shit A Website Can Take
Before It Breaks!
@cptntommy #BrightonSEO
The Problem Is...
@cptntommy #BrightonSEO
You Have To Run
Basic Headless
Chrome From
Command Line
@cptntommy #BrightonSEO
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome
@cptntommy #BrightonSEO
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless
@cptntommy #BrightonSEO
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222
@cptntommy #BrightonSEO
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu
@cptntommy #BrightonSEO
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --remote-debugging-port=9222 --disable-gpu https://www.bluearray.co.uk
@cptntommy #BrightonSEO
Now
@cptntommy #BrightonSEO
I
Really
Really
Love Using
Command
Line
@cptntommy #BrightonSEO
But This
Really
Really
Made
Me
Cry
@cptntommy #BrightonSEO
So How Do I Make It
Easy?
@cptntommy #BrightonSEO
Like I Said - I’m
Always Trying To
Make My Job Easier
@cptntommy #BrightonSEO
And This
Was Not
Easy!
@cptntommy #BrightonSEO
So I Went Away &
Did A Bigger Shit
Ton Of
Research
@cptntommy #BrightonSEO
Eric Bidelman
@cptntommy #BrightonSEO
What Is
Chrome
Puppeteer?
@cptntommy #BrightonSEO
BlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlahBlahBlahBlah
BlahBlahBlahBlahBlahBlahBlah
@cptntommy #BrightonSEO
OOOOOOOO API
@cptntommy #BrightonSEO
Node Can Be Used
For Making
Applications
@cptntommy #BrightonSEO
And It Can Also
Be Used To Help
Control Headless
Chrome
@cptntommy #BrightonSEO
And Trust Me
It’s Easy!
@cptntommy #BrightonSEO
So How Can I
Get Chrome
Puppeteer?
@cptntommy #BrightonSEO
If You Want To
Run Tests On
Your Local
Machine
@cptntommy #BrightonSEO
You Have To
Install NPM &
Node.js
@cptntommy #BrightonSEO
Someone’s
Made This
Easy!
@cptntommy #BrightonSEO
So If
You
Are On
PC
@cptntommy #BrightonSEO
It’s Pretty
Straightforward
@cptntommy #BrightonSEO
Just Install From
The Node.js
Website
@cptntommy #BrightonSEO
bit.ly/pc-pup-brighton19
@cptntommy #BrightonSEO
If You
Are On
Mac
@cptntommy #BrightonSEO
(Like Me)
@cptntommy #BrightonSEO
It’s Not That
Easy
@cptntommy #BrightonSEO
bit.ly/pupbrighton19
@cptntommy #BrightonSEO
You Wanna
Open Up
Terminal
@cptntommy #BrightonSEO
ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
@cptntommy #BrightonSEO
This Installs
Homebrew,
That Makes
Everything E-Z
@cptntommy #BrightonSEO
When
This
Has
Done Its
Thing
@cptntommy #BrightonSEO
You Have To
Install 2 More
Things, And
We’ll Be Ready
To Rock
@cptntommy #BrightonSEO
brew install node
@cptntommy #BrightonSEO
And Then
@cptntommy #BrightonSEO
npm i puppeteer
@cptntommy #BrightonSEO
Now You Are All
Good!
@cptntommy #BrightonSEO
You Can Now Run
Chrome Puppeteer On
Your Machine!
@cptntommy #BrightonSEO
For Example
@cptntommy #BrightonSEO
If I Wanted To Take A
Screenshot Of A
Single Webpage
@cptntommy #BrightonSEO
There Is A Bunch Of
Code Coming Up
@cptntommy #BrightonSEO
That Can All Be Seen
In The Following Link
(I’ll Also Tweet It)
@cptntommy #BrightonSEO
https://bit.ly/Brighton
SEO19
@cptntommy #BrightonSEO
let browser = await puppeteer.launch({headless: true});
@cptntommy #BrightonSEO
let page = await browser.newPage();
@cptntommy #BrightonSEO
await page.goto('https://www.bluearray.co.uk/');
@cptntommy #BrightonSEO
await page.screenshot({
@cptntommy #BrightonSEO
await page.screenshot({ path: './testimg.jpg',
@cptntommy #BrightonSEO
await page.screenshot({ path: './testimg.jpg', type: 'jpeg'});
@cptntommy #BrightonSEO
await page.close();
await browser.close();
@cptntommy #BrightonSEO
File Is Saved As
Screenshot.js
@cptntommy #BrightonSEO
So To Run This Small
Piece Of Code
@cptntommy #BrightonSEO
Go To Terminal (In
Same Folder As Code),
And Type In
@cptntommy #BrightonSEO
node Screenshot.js
@cptntommy #BrightonSEO
And Then, 5 Seconds
Later,
@cptntommy #BrightonSEO
If You Wanted To See
The Browser Do These
Steps
@cptntommy #BrightonSEO
let browser = await puppeteer.launch({headless: true});
@cptntommy #BrightonSEO
let browser = await puppeteer.launch({headless: false});
@cptntommy #BrightonSEO
You Can Also Provide
A List Of URLs
@cptntommy #BrightonSEO
And Get A Shit Ton Of
Screenshots!
Now I’m Sure You Can
See Where This Is
Headed
@cptntommy #BrightonSEO
Faking Googlebot!
@cptntommy #BrightonSEO
With A Few Tweaks To
The Code
@cptntommy #BrightonSEO
await page.setUserAgent('Googlebot');
@cptntommy #BrightonSEO
Googlebot’s User
Agent Is Not Just
‘Googlebot’
@cptntommy #BrightonSEO
It’s Fuck*** Huge
@cptntommy #BrightonSEO
Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
@cptntommy #BrightonSEO
And Then You Gotta
Set Googlebot’s
Viewport
@cptntommy #BrightonSEO
await page.setViewport
@cptntommy #BrightonSEO
await page.setViewport({width: 1024, height: 1024});
@cptntommy #BrightonSEO
FYI This Is Not Really
Googlebot
@cptntommy #BrightonSEO
As Unfortunately
@cptntommy #BrightonSEO
Can’t Change Chrome
Version That
Puppeteer Uses To 41
:(
@cptntommy #BrightonSEO
As Chrome Puppeteer
Was Released After
Chrome 41
(*Not Backwards Compatible)
@cptntommy #BrightonSEO
However!
@cptntommy #BrightonSEO
Can Be Persuasive In
Getting A Client To
Ensure Their Content
Is SSR’d
(If Needed)
@cptntommy #BrightonSEO
Chrome Puppeteer
Can Be Installed On
The Server
@cptntommy #BrightonSEO
We Can Then Provide
Puppeteer With A List
Of URLs, And It Can
Work Through Them
All
@cptntommy #BrightonSEO
And Show How They
Would Appear To
Google, Instead Of
@cptntommy #BrightonSEO
In The Case Of Some
JS Sites
@cptntommy #BrightonSEO
A Blank Page
@cptntommy #BrightonSEO
Which Is Cool & A
Nice Trick
@cptntommy #BrightonSEO
But The Really Cool
Stuff Is Yet To Come
@cptntommy #BrightonSEO
So Who Here
Has Heard
Of (Or Used)
ContentKing?
@cptntommy #BrightonSEO
It’s Fairly Awesome
@cptntommy #BrightonSEO
Allows You To
Monitor A Site In
Real-Time
@cptntommy #BrightonSEO
With It Letting You
Know Of Any Issues
@cptntommy #BrightonSEO
Meta Changes, New
404 Errors, Updated
Links….
@cptntommy #BrightonSEO
BUT
@cptntommy #BrightonSEO
Like Most Good Tools,
It Costs Money
@cptntommy #BrightonSEO
Maybe You
Don’t Wanna
Eat Into Your
Budget
@cptntommy #BrightonSEO
This Next Example
Shows How We Can
Use Puppeteer
@cptntommy #BrightonSEO
Monitor Your Site
When You Want
&
Report Of Any
Changes To Key Areas
@cptntommy #BrightonSEO
Including
@cptntommy #BrightonSEO
Title Changes
@cptntommy #BrightonSEO
Description Changes
@cptntommy #BrightonSEO
Word Count
Increases/Decreases
@cptntommy #BrightonSEO
Robots Directives
@cptntommy #BrightonSEO
Canonicals
@cptntommy #BrightonSEO
So Basically The
REALLY Important
Shit In The HTML
@cptntommy #BrightonSEO
So I Wrote Some Code
@cptntommy #BrightonSEO
As With All Code, Required A
Bit Of Research
@cptntommy #BrightonSEO
And With A Bit Of Luck,
@cptntommy #BrightonSEO
We Now Have A Way To
Monitor Basic Areas Of Sites!
@cptntommy #BrightonSEO
So.
@cptntommy #BrightonSEO
There Are About 200 Lines Of
Code
@cptntommy #BrightonSEO
And I Don’t Have Time To Go
Through The Full Thing
@cptntommy #BrightonSEO
But
@cptntommy #BrightonSEO
There Are A Few Interesting
Snippets I’d Like To Share
@cptntommy #BrightonSEO
We Launch Headless Chrome
& Puppeteer As Highlighted A
Minute Ago
@cptntommy #BrightonSEO
const browser = await puppeteer.launch();
const page = await browser.newPage();
@cptntommy #BrightonSEO
Provide A List Of URLs For
Puppeteer To Go And Play
With
@cptntommy #BrightonSEO
try {data = fs.readFileSync('/Users/tompool/Desktop/PuppeteerRendering/PageMonitor/urls.txt','utf8');}
@cptntommy #BrightonSEO
And Then Pull Relevant Meta
Data
@cptntommy #BrightonSEO
For Example
@cptntommy #BrightonSEO
Meta Title
@cptntommy #BrightonSEO
try {title = await page.title();}
catch (e1) {title = 'n/a';}
@cptntommy #BrightonSEO
Then Create An Array Of All
The Meta Data
@cptntommy #BrightonSEO
let retArray = [date, url, title, description, canonical, robots, wordCount];
@cptntommy #BrightonSEO
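Only the title snippet is shown on the slides, so the sketch below guesses at how the other fields could be pulled in the same style. The selectors are standard HTML, but `attrOrNa`, `countWords` and `collectMeta` are hypothetical names, and counting words from the body's visible text is an assumption about how the word count was measured.

```javascript
// Count whitespace-separated words in a block of visible text.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Read one attribute via a CSS selector, falling back to 'n/a' like the title snippet.
async function attrOrNa(page, selector, attr) {
  try {
    const value = await page.$eval(selector, (el, name) => el.getAttribute(name), attr);
    return value === null ? 'n/a' : value;
  } catch (e) {
    return 'n/a'; // $eval throws when nothing matches the selector
  }
}

// Visit one URL with an already-open Puppeteer page and return one row of data.
async function collectMeta(page, url) {
  await page.goto(url);
  const date = new Date().toISOString();
  let title;
  try { title = await page.title(); } catch (e1) { title = 'n/a'; }
  const description = await attrOrNa(page, 'meta[name="description"]', 'content');
  const canonical = await attrOrNa(page, 'link[rel="canonical"]', 'href');
  const robots = await attrOrNa(page, 'meta[name="robots"]', 'content');
  const bodyText = await page.$eval('body', (el) => el.innerText);
  const wordCount = countWords(bodyText);
  return [date, url, title, description, canonical, robots, wordCount];
}
```

The returned row has the same column order as `retArray`, so it could be appended to the txt file as a delimited line.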
And Pushed This To A txt File
@cptntommy #BrightonSEO
The Script Then Loops
Through All Provided URLs
@cptntommy #BrightonSEO
And Checks For Differences In
The Returned Data
@cptntommy #BrightonSEO
If There Are Any Differences,
These Get Saved In Another
txt File
@cptntommy #BrightonSEO
That I Can Check Whenever
@cptntommy #BrightonSEO
So I Can See What Has
Changed From
Yesterday/When I Last Ran
The Code.
@cptntommy #BrightonSEO
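The comparison step is not shown in the deck, so this is one possible sketch of it, assuming each run produces rows in the `retArray` column order. The names `FIELDS`, `changedFields` and `diffRuns` and the tab-separated report format are assumptions.

```javascript
// Column order matches retArray on the earlier slide.
const FIELDS = ['date', 'url', 'title', 'description', 'canonical', 'robots', 'wordCount'];

// Compare two rows for the same URL and list the fields whose values differ.
// The date column is skipped because it changes on every run.
function changedFields(previous, current) {
  const changes = [];
  FIELDS.forEach((field, i) => {
    if (field === 'date') return;
    if (String(previous[i]) !== String(current[i])) {
      changes.push({ field, from: previous[i], to: current[i] });
    }
  });
  return changes;
}

// Match rows by URL and collect every change since the last run.
function diffRuns(previousRows, currentRows) {
  const previousByUrl = new Map(previousRows.map((row) => [row[1], row]));
  const report = [];
  for (const row of currentRows) {
    const before = previousByUrl.get(row[1]);
    if (!before) continue; // new URL, nothing to compare against yet
    for (const change of changedFields(before, row)) {
      report.push(`${row[1]}\t${change.field}\t${change.from} -> ${change.to}`);
    }
  }
  return report;
}
```

Writing the result out could be as simple as `fs.writeFileSync('changes.txt', diffRuns(previousRows, currentRows).join('\n'))`, where `changes.txt` is a placeholder file name.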
This Required Me To Run The
Code Each Day
@cptntommy #BrightonSEO
(That I Forgot To Do)
@cptntommy #BrightonSEO
So I Went One Step Further
@cptntommy #BrightonSEO
Chucked It On A Raspberry Pi
@cptntommy #BrightonSEO
And Set Up A CronJob To
Automatically Run The Script
At The Same Time
@cptntommy #BrightonSEO
Every Day
@cptntommy #BrightonSEO
And Then
@cptntommy #BrightonSEO
(This Was The Longest Bit)
@cptntommy #BrightonSEO
Email Me If Anything Changed
@cptntommy #BrightonSEO
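The deck does not say how the email was sent. One common option in Node is the third-party `nodemailer` package, so the sketch below assumes that; the SMTP settings, addresses and function names are all placeholders.

```javascript
// Build the alert message, or return null when nothing changed (so no email goes out).
function buildAlert(changes) {
  if (changes.length === 0) return null;
  return {
    subject: `Site monitor: ${changes.length} change(s) detected`,
    text: changes.join('\n'),
  };
}

// smtp is a nodemailer transport config, e.g. { host, port, auth: { user, pass } }.
async function emailChanges(changes, smtp, from, to) {
  const alert = buildAlert(changes);
  if (!alert) return false;
  // Assumption: nodemailer has been installed with `npm i nodemailer`.
  const nodemailer = require('nodemailer');
  const transporter = nodemailer.createTransport(smtp);
  await transporter.sendMail({ from, to, subject: alert.subject, text: alert.text });
  return true;
}
```

Returning `false` when there are no changes keeps the daily cron run quiet unless something actually moved.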
This Is By No Means A
Finished Product, And Is Still
An Ongoing Project
@cptntommy #BrightonSEO
These Usages Of Chrome
Puppeteer
@cptntommy #BrightonSEO
Barely Scratch The Surface Of
What Is Possible
@cptntommy #BrightonSEO
So, To Recap
@cptntommy #BrightonSEO
Today We Have Covered
@cptntommy #BrightonSEO
Headless Chrome
@cptntommy #BrightonSEO
Puppeteer
@cptntommy #BrightonSEO
Basic Scripts Using Node.js
@cptntommy #BrightonSEO
And Automation Of All Of
These To Save You Valuable
Time
@cptntommy #BrightonSEO
And Hopefully, Allow You To
@cptntommy #BrightonSEO
THANKS!
@cptntommy #BrightonSEO

BrightonSEO April 2019 - Tom Pool - Chrome Puppeteer, Fake Googlebot & Monitor Your Site!

Editor's Notes

  1. Like many of us, I’m constantly trying to find new ways to make my (and my team’s) jobs easier
  2. So this awesome guy - Eric Bidelman - is a software engineer at Google, and works on Headless Chrome, Lighthouse & DevTools.
  3. I can use chrome puppeteer to help me with my job
  4-13. So I went away and did a literal shit ton of research, that is worth sharing.
  14. So, the first thing I was looking for was a basic definition.
  15. Contrary to what I wanted to believe, it did not involve any decapitation
  16. So when you open up Google Chrome normally, you get a wonderful User Interface with bookmarks
  17. And a search bar, plugins, buttons, tabs
  18. And usable functionality.
  19. With headless chrome, you get none of that shit.
  20. So here I am running headless chrome
  21. And we can see that it is in the background, but I have no Chrome windows open.
  22. So Google Chrome is Running, but with NO User Interface.
  23. So it is running without the UX/UI head
  24. Why should you even care about this sort of stuff though?
  25. Through this research journey, I found out that you can do a bunch of stuff with it!
  26. Scrape the literal shit out of Javascript websites (as well as basic HTML scraping)
  27. You can copy the DOM, and then paste it into a text file, with which you can
  28. Compare the source code of the site with the DOM, and then export differences. This can allow you to identify any potential rendering issues.
  29. Can use it to generate screenshots of pages
  30. And effectively crawl single page applications
  31. JS Can be a bit of a pain to work with, but unfortunately, it is not going away!
  32. So Screaming Frog (and the majority of crawling software) utilises something like headless Chrome to emulate a browser, and provide JS rendering features.
  33. And we all know about issues that Google can have with crawling JS, ranging from having slight issues with rendering, to completely drawing a blank.
  34. So there have been a bunch of JS indexing and rendering case studies over the past couple of years.
  35. So it can help you crawl these guys.
  36. We can also use Headless Chrome to automate web page checks, and I provide an in-depth investigation into this later on in this deck.
  37. AND it can be used for general webpage testing. Including clicking on stuff, filling in forms, general fuckery with the mouse and keyboard.
  38. It is really good for emulating user behaviour. So great for pretending to be a user, and browsing around a site.
  39. SO it is basically really great for seeing exactly how much shit a website can take before it breaks!
  40. However, the problem with running all of these tasks is
  41. You have to run basic headless chrome through the command line interface
  42. So first you gotta install some dependencies, and have a shit ton of errors hit you in the face, and you gotta know where chrome is stored on your local machine...
  43. Then you gotta run directly from that location
  44. Then specify headless chrome to launch
  45. Then open a port to use
  46. Then you gotta disable GPU
  47. Then you can add a single URL, or a URL list into the command line
  48. Now then
  49. I really really really love using command line
  50. In fact so much so that I spoke about it at Brighton last year
  51. But doing all of this shit really really really really made me wanna cry
  52. So how do I make utilising headless chrome, which is freaking awesome - easy?
  53. Like I said a few minutes ago, I’m always trying to find ways to make my job easier
  54. And doing all of these boring ass steps was really really not easy. At All.
  55. So I went away and did a bigger shit ton of research.
  56. So, in this talk at Google I/O, Eric mentions something called Google Puppeteer (shoutout Eric)
  57. So what is Chrome Puppeteer?
  58. Doing a simple Google Search for Chrome Puppeteer reveals all.
  59. But the stuff I’m interested in is this. A Node Library, and
  60. Oooooooooo an API
  61. So Node - for those that do not have dev experience, can be used for making some pretty kick-ass applications
  62. It can also be used to help control headless chrome in an easy to digest and utilise package
  63. So Node - for those that do not have dev experience, can be used for making some pretty kick-ass applications
  64. So how can you actually get Chrome Puppeteer?
  65. If you want to run tests on your local machine, you have to install a few things first.
  66. Node.js - which is a runtime environment, and NPM which is a package manager for node.
  67. Chill out though, it’s fairly straightforward
  68. Someone a while ago has made this easy
  69. So If you are on PC it’s fairly simple to get and install,
  70. You’ve just gotta install these things from the Node JS website
  71. I’ve linked to a guide here - that takes you through step by step.
  72-73. If, like me, you are on a Mac
  74. It’s not that easy.
  75. There’s a wicked awesome guide here that takes you through step by step what you need to do.
  76. So you wanna start off by opening up terminal
  77. And then typing in a few lines of shit
  78-80. This installs Homebrew, that makes everything even ez-er
  81. So when Homebrew is downloaded - it shouldn’t take too long - a max of 5 mins
  82. So You Have To Install 2 More Things, And We’ll Be Ready To Rock. These are npm and node.
  83. So just type in this. It installs node through homebrew, directly onto your machine with no fuckery.
  84. So this installs node and npm, you’ll get a nice progress bar telling you how far along it is
  85. Then you wanna use npm to install the latest version of puppeteer.
  86. Now that’s it, you are all good and groovy!
  87. You can
  88. So for example.
  89. If I wanted to take a screenshot of a single page
  90. So just type in this, and you should be good to go.
  93. You’ll need to code some stuff up - but I’ve put everything together into a single Google doc, that makes it simple & easy to understand what each bit does. Explain that you are going to go through it.
  94. So we are starting up a headless browser, in true headless mode, so you won’t see what goes on (running in the background)
  95. And then we are opening up a new tab/page
  96. And then we specify exactly what URL we want to go to. So in this instance, we are testing the BlueArray homepage
  97. Then we are taking a screenshot. We have to specify 2 things to allow the code to work correctly
  98. So the path, so where and what we want the file to be saved as
  99. And then saving as a specific filetype. Can fuck around with this, and get the ideal filetype that is good for you.
  100. And then we close the page, and then close the browser.
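Put together, the steps above could look something like the sketch below. This is not the script from the Google doc, just a minimal version of the same flow. The function name `takeScreenshot`, the `example.com` URL and the `homepage.png` file name are placeholders, and Puppeteer is passed in as an argument (an assumption made here so the flow can also be tried with a stand-in object).

```javascript
// screenshot.js - a minimal sketch, not the exact script from the talk.
// `puppeteer` is passed in so the flow is easy to exercise with a stand-in.
async function takeScreenshot(puppeteer, url, path, headless = true) {
  // Start the browser; headless: true means no window is shown.
  const browser = await puppeteer.launch({ headless });
  // Open a new tab/page.
  const page = await browser.newPage();
  // Go to the URL we want to test.
  await page.goto(url);
  // Save the screenshot: where to save it, and as which file type.
  await page.screenshot({ path, type: 'png' });
  // Close the page, then the browser.
  await page.close();
  await browser.close();
  return path;
}

// Example call (assumes `npm install puppeteer` was run in this folder;
// the URL and file name are placeholders):
// takeScreenshot(require('puppeteer'), 'https://example.com', 'homepage.png');
```

Passing `false` as the last argument launches a visible Chrome window instead of a headless one.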
  102. Go to terminal, make sure you are in the same folder as your code, and type in
  104. node screenshot.js
  105. And then a couple of seconds later, you’ll see
  106. A nice screenshot get added to your folder with your code in
  107. If you wanted to see the browser test this exactly for you,
  108. Just change the headless mode to false. This is great for seeing exactly what the browser sees, and looks pretty cool, having a chrome window doing all sorts of shit in front of you!
  110. You can also modify the script slightly to run through a list of provided URLs
  111. And then get a bunch of screenshots!
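One way the "list of URLs" version might look is sketched below. The helper names `fileNameForUrl` and `screenshotAll` are made up for this example, and `browser` is assumed to be an already-launched Puppeteer browser. Each URL gets its own file name so the screenshots don't overwrite each other.

```javascript
// Turn a URL into a safe file name, e.g. one screenshot file per page.
function fileNameForUrl(url) {
  const { hostname, pathname } = new URL(url);
  const slug = (hostname + pathname)
    .replace(/[^a-z0-9]+/gi, '-') // anything that isn't a letter/digit becomes "-"
    .replace(/^-+|-+$/g, '');     // trim leading/trailing dashes
  return `${slug}.png`;
}

// Visit every URL in turn and save a screenshot for each one.
// `browser` is assumed to come from puppeteer.launch().
async function screenshotAll(browser, urls) {
  const saved = [];
  for (const url of urls) {
    const page = await browser.newPage();
    await page.goto(url);
    const path = fileNameForUrl(url);
    await page.screenshot({ path, type: 'png' });
    await page.close();
    saved.push(path);
  }
  return saved;
}
```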
  112. Now I’m sure that you guys can see where this is headed
  113. Faking Googlebot and seeing what they would see
  114. So with a few little tweaks to the code that we have for the first example
  115. Adding in a user agent string, and setting it to what Googlebot uses
  116. FYI, the Googlebot user agent string is not just ‘Googlebot’; it is fucking massive
  118. And wouldn’t fit on the slide
  119. node screenshot.js (screenshot.js is the name of the file)
  120. Using the await page.setViewport option
  121. So we have to specify the width and the height of the viewport that we want to use
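A hedged sketch of those two tweaks is below. The user agent shown is the desktop Googlebot string as published by Google, which may not be the exact one used on the slide, and the viewport size is an arbitrary assumption. `screenshotAsGooglebot` is a made-up name, and `browser` is assumed to be an already-launched Puppeteer browser.

```javascript
// Desktop Googlebot user agent string (the talk's slide may have used another variant).
const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

// Width and height of the viewport; these numbers are an assumption.
const VIEWPORT = { width: 1024, height: 768 };

// Same screenshot flow as before, but pretending to be Googlebot.
async function screenshotAsGooglebot(browser, url, path) {
  const page = await browser.newPage();
  await page.setUserAgent(GOOGLEBOT_UA); // must be set before page.goto
  await page.setViewport(VIEWPORT);
  await page.goto(url);
  await page.screenshot({ path, type: 'png' });
  await page.close();
  return path;
}
```

Setting the user agent before `page.goto` matters, since the header is sent with the request for the page.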
  122. This isn’t really Googlebot, just a decent attempt at emulation
  123. As unfortunately
  124. As puppeteer was launched way after Chrome 41, we cannot specify it to use this version of Chrome :*(
  126. However
  127. This can be persuasive in getting a client to ensure that their content is Rendered Server Side, as opposed to client side, if needed
  129. We can then provide a list of URLs that we want to get screenshotted
  130. And show how they would appear to Google through puppeteer rendering, instead of
  131. In the case of some rather shit JS sites
  132. Absolutely fuck all
  133. Nothing - a blank page
  134. Which is pretty cool, and allows for bulk page testing
  135. But the really cool stuff is yet to come!
  136. So who here has heard of, or even used Content King?
  137. It’s a fairly awesome piece of software
  138. That allows you to monitor a site in real time (ish),
  139. With it alerting you of any issues such as
  140. Meta data changes, New pages that 404, Updated links, redirects, indexable and non-indexable pages….
  141. However!
  142. Like most really good tools, it costs money
  143. Maybe You Don’t Wanna Eat Into Your Budget For Content King for a personal project site, or you don’t need the level of detail that those guys provide for a smaller, shitter site?
  144. This Next Example Shows How We Can Use puppeteer to
  145. Monitor a chosen site when you want, and report on any changes to key areas
  146. Including some key areas, such as
  147. Meta title changes
  148. Meta description updates
  149. Any increase or decrease in the word count of the page.
  150. Pull out any robots directives, and highlight any differences between them
  151. Any differences in canonical elements
  152. So basically the really important shit from a HTML webpage
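Most of the items above are single tags that can be read straight from the page; word count is the one that needs a little logic. A rough sketch of how it could be measured is below. The talk doesn't say how the real script counts words, so splitting the visible body text on whitespace is an assumption, and both function names are made up.

```javascript
// Count words in a piece of text by splitting on whitespace.
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Grab the visible text of the rendered page and count its words.
// `page` is assumed to be a Puppeteer page that has already loaded a URL.
async function pageWordCount(page) {
  const text = await page.evaluate(() => document.body.innerText);
  return countWords(text);
}
```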
  153. So I wrote some code. I’ll be tweeting this out afterwards for those who are interested.
  154. As with all coding, this required a bit of research
  155. Ahem stackoverflow ahem
  156. And with a little bit of luck
  157. We now have a way to monitor these basic areas for web pages
  158. This is how it works
  159. There are about 200 lines of code in total
  160. Here’s a small snapshot
  161. And I don’t have time to go through the full thing today,
  162. but
  163. There are a few really interesting snippets that I’d really like to share, that can come in handy
  164. So we launch headless chrome as highlighted a few minutes ago
  165. Like so. So we launch the browser, and then create a new page within the browser, awaiting further instruction...
  166. And then we provide a list of URLs for Puppeteer to go and fuck around with
  167. So here we are quoting the file that we will use for this program, we parse (or read it) using a couple more lines, that don’t really look that exciting!
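Reading the URL list might look something like this. The file format isn't shown in the transcript, so one URL per line is an assumption, as are the `readUrlList` name and the `urls.txt` file name in the comment.

```javascript
const fs = require('fs');

// Read a text file with one URL per line and return the URLs as an array.
// Blank lines and stray whitespace are ignored.
function readUrlList(filePath) {
  return fs
    .readFileSync(filePath, 'utf8')
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Example call (the file name is a placeholder):
// const urls = readUrlList('urls.txt');
```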
  168. And then we pull in the relevant meta data that I mentioned
  169. So, for example
  170. Gonna show you guys how we pull in meta titles
  171. So we are just pulling the title from the page. If there isn’t one - we get an error, so add in this - n/a
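A sketch of that idea, assuming the title is read with Puppeteer's `page.$eval` (which throws when nothing matches the selector). The `getTitle` name is made up, and the real script may well do this differently.

```javascript
// Pull the <title> text from a loaded page, falling back to 'n/a'.
// page.$eval throws if no element matches the selector, so we catch that.
async function getTitle(page) {
  try {
    return await page.$eval('title', (el) => el.textContent.trim());
  } catch (err) {
    return 'n/a';
  }
}
```

The same try/catch pattern would presumably work for the meta description, robots and canonical tags, with a different selector each time.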
  172. And then create an array of all the meta data - so a nice, formatted list of data that we can use later on within the script
  173. So this just tells the script to treat all this data as one line, that we can then refer back to later
  174. And we then push all this data to a text file
  175. The script then loops through every URL that is provided, pulling out all data for each
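Roughly, "treat all this data as one line and push it to a text file" could be done like this. The field names, the tab separator and the function names are all assumptions for illustration; the transcript doesn't show the real format.

```javascript
const fs = require('fs');

// Flatten one page's meta data into a single tab-separated line.
// Tabs and line breaks inside a value are replaced so the row stays on one line.
function toLine(record) {
  const fields = [
    record.url,
    record.title,
    record.description,
    record.wordCount,
    record.robots,
    record.canonical,
  ];
  return fields.map((f) => String(f).replace(/[\t\r\n]+/g, ' ')).join('\t');
}

// Append that line to the output file (created if it doesn't exist yet).
function appendRecord(filePath, record) {
  fs.appendFileSync(filePath, toLine(record) + '\n');
}
```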
  176. It then checks for differences in the data - so compares this run with the previous one.
  177. If there are any differences between the two sets of data, these get saved within a changes.txt file
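The comparison step might be sketched like this, assuming each run is held as an object keyed by URL. That data shape and the names `diffRuns` and `saveChanges` are assumptions; only the `changes.txt` file name comes from the talk.

```javascript
const fs = require('fs');

// Compare this run with the previous one and describe every field that changed.
// Both arguments look like { 'https://example.com/': { title: '...', robots: '...' } }.
function diffRuns(previous, current) {
  const changes = [];
  for (const [url, now] of Object.entries(current)) {
    const before = previous[url];
    if (!before) continue; // pages not seen in the previous run are skipped here
    for (const field of Object.keys(now)) {
      if (before[field] !== now[field]) {
        changes.push(`${url}\t${field}\t${before[field]} -> ${now[field]}`);
      }
    }
  }
  return changes;
}

// Write the changes to a file, but only if there is something to report.
function saveChanges(filePath, changes) {
  if (changes.length > 0) {
    fs.writeFileSync(filePath, changes.join('\n') + '\n');
  }
}

// Example call: saveChanges('changes.txt', diffRuns(yesterday, today));
```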
  178. That I can then check whenever
  179. So I can see what has changed from yesterday, or whenever I last ran the code
  180. This required me to run the code each day manually
  181. That I completely forgot to do
  182. So, I went one step further, to make my life even easier
  183. Chucked the code on a Raspberry Pi
  184. And set up a cron job within my local machine to automatically run the script at the same time
  185. Every day
  186. And then
  187. This was the bit that took the most amount of time by faarrr
  188. Send an email to me if there were any changes.