Amazon Web Services at Mendeley
Dan Harvey
Data Architect
twitter: @danharvey
dan.harvey@mendeley.com
Overview
• What do we do?
• System design
• AWS details
• Future plans
• Summary
Mendeley helps researchers work smarter
1) Install Mendeley Desktop
2) Manage your research papers
 –Automatic data extraction
 –External database integration
 –Automatic bibliography generation
 –Tagging and annotation
3) Mendeley aggregates research data in the cloud
By doing this, Mendeley makes science more
collaborative and transparent
Mendeley in numbers
• 1 million users

• 130 million research articles
 –40 million unique

• 14 million unique files uploaded
 –13 TB in total
System Overview
[Diagram: S3, EMR, EC2 (Amazon Web Services); Web Server ×2; Syncing; Browsing; Docs; Usage Logs; MySQL ×3; Data Services; Map Reduce; HBase; HDFS]
File Storage
• Sync to and from clients
 –Backed onto S3

• How to render 13 TB of PDFs?
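
A rough back-of-the-envelope sketch of the scale behind that question. It uses only the two figures from "Mendeley in numbers" (14 million files, 13 TB). The two seconds per file and the worker counts are assumptions invented for illustration, not numbers from the talk, and a terabyte is taken as 10^12 bytes.

```python
TOTAL_FILES = 14_000_000       # from "Mendeley in numbers"
TOTAL_BYTES = 13 * 10**12      # 13 TB, assuming decimal terabytes


def average_file_size_mb(total_bytes: int, files: int) -> float:
    """Mean file size in megabytes (10**6 bytes)."""
    return total_bytes / files / 10**6


def render_days(files: int, seconds_per_file: float, workers: int) -> float:
    """Wall-clock days to render every file once, assuming perfect parallelism."""
    return files * seconds_per_file / workers / 86_400


# Hypothetical render cost of 2 seconds per file.
print(round(average_file_size_mb(TOTAL_BYTES, TOTAL_FILES), 2))  # 0.93
print(round(render_days(TOTAL_FILES, 2.0, 1), 1))                # 324.1
print(round(render_days(TOTAL_FILES, 2.0, 50), 1))               # 6.5
```

Under these assumed numbers a single machine would need most of a year, which is why a backlog like this suits a pool of workers that can be grown and shrunk on demand.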
PDF Previews
• Elastic Beanstalk
• Java servlet
 –Load & render
 –Store into S3
• Quick to prototype
 –Fast iterations
 –No infrastructure to set up
 –Developers in control
 –No upfront cost in hardware
• No dependency on rest of our infrastructure
© Elastic Beanstalk, Matt Wood, AWS, 2011
Adapt to take advantage
• Improve delivery
 –CloudFront
 –Faster worldwide

• Re-working for cost saving
 –SQS
 –Spot instances
 –Render when it’s cheapest!
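
One way to picture "render when it's cheapest" is a worker that drains a job queue only while the going price is under a ceiling. The sketch below is a minimal illustration under stated assumptions: Python's in-process `queue.Queue` stands in for SQS, `current_price` is a hypothetical callable standing in for a spot-price lookup, and the prices and document ids are made up. It is not the implementation described in the talk.

```python
import queue
from typing import Callable, List


def drain_when_cheap(
    jobs: "queue.Queue[str]",
    current_price: Callable[[], float],
    max_price: float,
    render: Callable[[str], None],
) -> List[str]:
    """Render queued documents while the price stays at or below max_price.

    Returns the ids that were rendered. Anything left stays on the queue
    for a later, cheaper window.
    """
    rendered = []
    while current_price() <= max_price:
        try:
            doc_id = jobs.get_nowait()
        except queue.Empty:
            break  # nothing left to render
        render(doc_id)
        rendered.append(doc_id)
    return rendered


# Usage with made-up prices: the third price check exceeds the ceiling.
pending = queue.Queue()
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    pending.put(doc_id)

prices = iter([0.03, 0.04, 0.09])
done = drain_when_cheap(pending, lambda: next(prices), 0.05, lambda d: None)
print(done, pending.qsize())  # ['doc-1', 'doc-2'] 1
```

Because unfinished work simply stays queued, a worker that disappears mid-run loses nothing beyond the job it was holding, which is the property that makes interruptible instances a reasonable fit for batch rendering.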
Article Search
• 40 million papers
• Gives 40GB index in Solr

• Variable load

• Moved to EC2
 –Elastic Load Balancer
 –Auto-scale instances
[Chart: twofold variance in traffic over a week]
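
The sizing arithmetic behind auto-scaling can be sketched as follows. The request rates, the per-instance capacity and the floor of two instances are all hypothetical values chosen for illustration. Only the twofold swing between quiet and busy periods comes from the slide.

```python
import math


def desired_instances(requests_per_sec: float, capacity_per_instance: float,
                      minimum: int = 2) -> int:
    """Smallest instance count that covers the load, never below `minimum`."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, needed)


# Hypothetical load: the busy period is twice the quiet one.
quiet, busy = 100.0, 200.0
print(desired_instances(quiet, 40.0), desired_instances(busy, 40.0))  # 3 5
```

With a fixed fleet sized for the peak, the two extra instances in this made-up example would sit idle through the quiet part of the week.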
Solr Instance Layout
• Master
 –Single instance
 –Matched to indexing load
 –Backed onto EBS

• Slaves
 –HTTP sync to master
 –Pre-built AMIs
 –EC2 auto scaling
[Diagram: Solr Master; three Solr Slaves; Elastic Load Balancer]
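
The split between one indexing master and several query slaves can be sketched as a small router. This is an illustration only: the host names are hypothetical, and round-robin is used as a simple stand-in for load balancing, not as a claim about how Elastic Load Balancer distributes requests.

```python
import itertools
from typing import Sequence


class SolrRouter:
    """Send index updates to the master and spread queries over the slaves."""

    def __init__(self, master: str, slaves: Sequence[str]) -> None:
        if not slaves:
            raise ValueError("at least one slave is required")
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def for_update(self) -> str:
        """All writes go to the single master."""
        return self.master

    def for_query(self) -> str:
        """Reads rotate through the slaves in turn."""
        return next(self._slaves)


# Hypothetical host names.
router = SolrRouter("solr-master", ["solr-slave-1", "solr-slave-2"])
print(router.for_update())                    # solr-master
print([router.for_query() for _ in range(3)])
# ['solr-slave-1', 'solr-slave-2', 'solr-slave-1']
```

Keeping writes on one node means the master can be sized for indexing alone, while the slave pool grows and shrinks with query traffic.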
Desktop Client
• Client Downloads
 –From S3
 –Adding CloudFront


• Crash Reports
 –Stack traces into S3
 –Analytic reports on top
 –More focused bug fixing
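
A minimal sketch of what "analytic reports on top" of stored stack traces could look like: group traces by their top frame and count them, so the most frequent crash surfaces first. The traces and frame names below are invented, and treating the first line as the crash signature is an assumption made for illustration, not the method described in the talk.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_frame(stack_trace: str) -> str:
    """Return the first non-empty line of a trace, used here as its signature."""
    for line in stack_trace.splitlines():
        if line.strip():
            return line.strip()
    return "<empty>"


def crash_report(traces: Iterable[str], limit: int = 3) -> List[Tuple[str, int]]:
    """Most common crash signatures with their counts, most frequent first."""
    return Counter(top_frame(t) for t in traces).most_common(limit)


# Hypothetical traces, one string per uploaded crash report.
traces = [
    "SyncWorker::run\nmain",
    "PdfView::paint\nmain",
    "SyncWorker::run\nEventLoop::exec\nmain",
]
print(crash_report(traces))
# [('SyncWorker::run', 2), ('PdfView::paint', 1)]
```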
The future
• Aim to buy no more hardware

• More Java on Elastic Beanstalk
• SQS - replace queues

• EMR - log analysis
• SimpleDB & S3 for data stores
Problems Faced
• Accounting usage
 –Mix of users on account
 –Start early with this!
 –IAM helps

• Orchestration
 –CloudFormation
 –Elastic Beanstalk
 –Finding we need more
Summary
• Not all or nothing

• Focus on your problem, not “undifferentiated heavy lifting” (Werner Vogels)


• Learn the building blocks provided
• Modular system design helps
Mendeley Binary Battle
• $10,001 prize + $1,000 AWS vouchers
• Collaboration with PLoS
• Prizes for the best use of the API

• Judging panel includes
 –Werner Vogels
 –Tim O'Reilly
We’re hiring
     http://mendeley.com/careers/

             or chat to me after

• Lead Mobile Developer, iOS
• Web Developer, PHP/MySQL
• Software Engineer, Java
