BuzzNumbers Presentation Moving From SQL Server to MongoDB
Today's Presentation: Problems faced with Social Media Monitoring/Analytics; Why choose NoSQL over SQL; Why choose MongoDB; NOSQL vs SQL Schema Design; Infinite scalability with commodity hardware & .NET; Why we still use .NET (why not Ruby/Java/Python); Lessons Learned
NOSQL at BuzzNumbers: About BuzzNumbers
About BuzzNumbers: SaaS Web Product Company; Web and Social Media Analytics; Collect “big data” web content; Near-Realtime data capture; News, Blogs, Social Media, etc.; Scraping, APIs, Feeds; Analytics & Business Intelligence; BI, Text, Sentiment, Locations, NLP, Machine Learning
BuzzNumbers Project Team: Nick Holmes a Court - @nickhac; Brett Anderson - @brehtt; Steve Casey - @stevencasey; Jacinto Santamaria; Chris Fulstow - @chrisfulstow; Josie Kidd - @jose9
NOSQL at BuzzNumbers: Problems Faced at BuzzNumbers
Problems faced at BuzzNumbers: Large and fast growing DB Tables; Lots of Reads/Writes from data collection 24/7; Massive Table Scans for user reports (< 3 sec SLA); Large Joins (10+ Tables) with Nested Views; Complex Queries (Aggregates, Where’s, FullText); FullText Search Indexes needed real-time updates; Read/Write Contention; Rapid Index fragmentation, Slow rebuilds; DB Locks occurring (with no implicit Transactions); Blocking Transactions (both small/large tables)
Outgrew SQL Server Enterprise 2008: “Free” Software from MSFT via BizSpark; Tried everything with SQL Enterprise; Significant SQL Performance Tuning; Dirty Reads (nolock), Offline Index Rebuilds; Replication / Clustering / Multi-Instance. Problems: Schema changes impossible with uptime requirements; DBA tasks made system unavailable for hours/days; Hardware / SQL DBA got very expensive; Web users experienced annoying, unnecessary waits on non-complex queries that were blocked because of joins
BuzzNumbers NOSQL Presentation: Why NOSQL over SQL
What is NOSQL: New generation of “Databases”; “Not Only SQL”, mostly Open Source; NOSQL distributed databases are designed to deliver distributed “Big Data” storage and distributed processing of queries/calculations; NOSQL examples include Google - BigTable; Yahoo - Hadoop (30k+ Nodes); Facebook - Cassandra; FourSquare - MongoDB
Why NoSQL over SQL. SQL: Guaranteed consistency; Transactions; Schemas / DataTypes; Joins / Foreign Keys; TSQL/PL-SQL (Views, Procs); Scale Up (hardware); Many Benefits including Ease of use, Many developers skilled in SQL, Trusted for decades / Proven. NoSQL: Eventual Consistency; No Transaction Support; Key/Value Data (mostly); Flat Data (no joins); Key Lookups / MapReduce / Code; Scale out (distributed); Many Benefits including Performance / Scale, Lower license costs, Solves Web2 problems
Why NoSQL over SQL: CAP Theorem (Consistency, Availability, Partition Tolerance); Only 2 of 3 are Possible; Consistency / Availability: RDBMS; Availability / Partition Tolerance: NOSQL; Consistency / Partition Tolerance: Availability Issues (No one wants this)
BuzzNumbers NOSQL Presentation: Why MongoDB for NOSQL?
NOSQL Providers
Who uses Mongo?
Why Mongo: Proven for multiple usage scenarios; High performance (eventual consistency); Data stored in JSON (not only Key/Value); Supports Multiple Indexes (Anywhere in JSON); Easy to Install, Easy to Use (Linux/Windows); Easy to Scale for High Volume Writes (Sharding); Easy to Scale for High Volume Reads (Replica Sets); Automatic Failover and Redundancy (Replica Sets); REST Interface and Drivers for Ruby/.NET/Java/etc.; Easy to Query via multiple techniques (Key/Value, Mongo Query, JavaScript, MapReduce)
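To make the "scale writes by sharding" point concrete, here is a minimal sketch of the general idea: a shard key decides which node receives a write, so write load spreads across nodes. This is a generic hash-based router written for illustration only; it is not MongoDB's own chunk-placement logic, and `hashKey`, `shardFor`, and the `website-N` keys are hypothetical names.

```javascript
// Hypothetical hash function: folds the characters of the shard key
// into an unsigned 32-bit integer.
function hashKey(key) {
  let hash = 0;
  for (const ch of String(key)) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // wrap to unsigned 32-bit
  }
  return hash;
}

// Hypothetical router: the same key always maps to the same shard index.
function shardFor(key, shardCount) {
  return hashKey(key) % shardCount;
}

// Spread 100 illustrative shard keys over 4 shards and count writes per shard.
const shardCount = 4;
const writesPerShard = new Array(shardCount).fill(0);
for (let i = 0; i < 100; i++) {
  writesPerShard[shardFor(`website-${i}`, shardCount)] += 1;
}
console.log(writesPerShard);
```

Because routing depends only on the key, adding more writers does not funnel every insert through one server, which is the property the slide is pointing at.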
BuzzNumbers NOSQL Presentation: Moving from SQL Schema to No-Schema
BuzzNumbers NOSQL Presentation: RDBMS Schema (Tables) vs Mongo Collection (JSON)
BuzzNumbers NOSQL Presentation: RDBMS Schema vs Mongo JSON Document
BuzzNumbers NOSQL Presentation: RDBMS Schema vs Mongo JSON Document; One Document Per Website Per Day
BuzzNumbers NOSQL Presentation: RDBMS Schema vs Mongo JSON Document; Pre-Aggregate SUM/COUNT/AVG Calculations using UPSERT
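The pre-aggregation idea can be sketched in plain JavaScript without a database: keep one summary document per website per day, create it on first write, and bump counters in place on every later write. The `Map` below stands in for a collection, and `recordPageVisit` is a hypothetical helper; field names follow the sample document in the editor's notes.

```javascript
// In-memory stand-in for a collection of daily summary documents,
// keyed by "WebsiteID|DateSummary" (one document per website per day).
const dailySummaries = new Map();

// Hypothetical helper that mimics an upsert with increment semantics.
function recordPageVisit(websiteId, date, pageName, userId) {
  const key = `${websiteId}|${date}`;
  let summaryDoc = dailySummaries.get(key);
  if (!summaryDoc) {
    // "Insert" half of the upsert: create the document on first sight.
    summaryDoc = {
      WebsiteID: websiteId,
      DateSummary: date,
      UserIDSummary: [],
      PageVisitSummary: {},
    };
    dailySummaries.set(key, summaryDoc);
  }
  // "Update" half: increment the pre-aggregated counter for this page.
  if (!summaryDoc.PageVisitSummary[pageName]) {
    summaryDoc.PageVisitSummary[pageName] = { VisitCount: 0 };
  }
  summaryDoc.PageVisitSummary[pageName].VisitCount += 1;
  // Record each user at most once per website per day.
  if (!summaryDoc.UserIDSummary.includes(userId)) {
    summaryDoc.UserIDSummary.push(userId);
  }
  return summaryDoc;
}

recordPageVisit(12345, "2011-09-22", "Home", 1);
recordPageVisit(12345, "2011-09-22", "Home", 2);
recordPageVisit(12345, "2011-09-22", "About", 1);

const summary = dailySummaries.get("12345|2011-09-22");
console.log(JSON.stringify(summary));
```

Reports then read the counters directly instead of scanning raw rows. In MongoDB itself the same effect is usually obtained with an update that uses the `$inc` and `$addToSet` operators with the upsert option enabled; the exact call shape depends on the shell or driver version.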
BuzzNumbers NOSQL Presentation: RDBMS Schema vs Mongo JSON Document; Store Line Items with rich data as Nested Arrays. Use JavaScript or MapReduce to Query
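As a rough illustration of querying nested line items with a map step and a reduce step, the sketch below runs in plain JavaScript against a single document. It does not call MongoDB's MapReduce; `mapVisits` and `reduceCounts` are hypothetical names, and the document is a trimmed version of the sample in the editor's notes.

```javascript
// A daily document with line items stored as a nested array.
const dailyDoc = {
  WebsiteID: 12345,
  DateSummary: "2011-09-22",
  PageVisits: [
    { UserID: 1, PageName: "Home" },
    { UserID: 2, PageName: "About" },
    { UserID: 3, PageName: "Home" },
    { UserID: 4, PageName: "Contact" },
  ],
};

// Map step: emit a [key, value] pair for every nested line item.
function mapVisits(document) {
  return document.PageVisits.map((visit) => [visit.PageName, 1]);
}

// Reduce step: sum the values emitted for each key.
function reduceCounts(pairs) {
  const totals = {};
  for (const [key, value] of pairs) {
    totals[key] = (totals[key] || 0) + value;
  }
  return totals;
}

const visitsPerPage = reduceCounts(mapVisits(dailyDoc));
console.log(visitsPerPage);
```

Keeping the line items inside the parent document means this kind of roll-up needs no join: everything required is already in one record.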
Basic SQL vs Mongo Syntax: Select * from Clients => db.clients.find() | Select * from Clients where ClientID = 1 => db.clients.find({"ClientID": 1}) | Insert into Clients (ClientID, Name) Values (1, 'ACME') => db.clients.insert({"ClientID": 1, "Name": "ACME"}) | Create Table / Alter Table => Just start inserting: db.clients.insert({JSON HERE}) | Create Index => db.clients.ensureIndex({"ClientID": 1, "Name": 1})
Basic SQL vs Mongo Syntax: Select * from Clients => db.clients.find() | Select * from Clients where ClientID = 1 => db.clients.find({"ClientID": 1}) | Insert into Clients (ClientID, Name) Values (1, 'ACME') => db.clients.insert({"ClientID": 1, "Name": "ACME"}) | Create Table => Just start inserting | Create Index => db.clients.ensureIndex({"ClientID": 1, "Name": 1})
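For readers new to the query-by-example style used in the Mongo column, the following sketch shows the idea with a tiny in-memory collection. `TinyCollection` is a hypothetical class, not the MongoDB driver, and it supports equality matches only; the second client record is made up for illustration.

```javascript
// Hypothetical in-memory collection; not the MongoDB driver.
class TinyCollection {
  constructor() {
    this.docs = [];
  }

  // No CREATE TABLE step: a document of any shape can be inserted.
  insert(doc) {
    this.docs.push({ ...doc });
  }

  // Equality-only query-by-example: every field in the query must match.
  find(query = {}) {
    return this.docs.filter((doc) =>
      Object.entries(query).every(([field, value]) => doc[field] === value)
    );
  }
}

const clients = new TinyCollection();
clients.insert({ ClientID: 1, Name: "ACME" });
// An extra field needs no ALTER TABLE; this record is illustrative only.
clients.insert({ ClientID: 2, Name: "Globex", Country: "AU" });

const allClients = clients.find();             // like: Select * from Clients
const acme = clients.find({ ClientID: 1 });    // like: ... where ClientID = 1
console.log(allClients.length, acme[0].Name);
```

An empty query matches every document, which is why `find()` with no arguments plays the role of `Select *`.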
BuzzNumbers NOSQL Presentation: Infinite Scale with .NET and NOSQL
Infinite Scale with .NET: Use .NET for Rapid Product Development; Web Applications (IIS, ASP.NET, User Databases); Server Applications (Scraping, Apps, Services, Data); Scheduled Tasks / Backend Jobs; Use Open Source for Infinite Scale on Linux; MongoDB for Big Data Storage; SOLR (distributed Lucene) for Full Text Indexing; .NET Drivers Available for Mongo/SOLR
Infinite Scale with .NET: Cloud Hosting for Low Cost Scale; Rackspace Cloud ($200 p/m per 4GB-RAM server); Windows and Ubuntu (Image/Clone/API support); Zabbix Monitoring (notify when near capacity); Amazon/Heroku/dotCloud as alternatives; Tips to deliver fantastic performance at scale; Indexes MUST fit in RAM (Disk Reads are Slow); SSDs are worth the extra price; 4GB RAM / 160GB Disk seems to be the optimum price/performance per node in a distributed system
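The "indexes must fit in RAM" tip lends itself to a back-of-envelope check. The sketch below estimates how many 4GB nodes would be needed to hold an index entirely in memory. Only the 4GB node size comes from the slide; the document count, the 40 bytes per index entry, and the assumption that half of each node's RAM is free for indexes are illustrative guesses, and the function names are hypothetical.

```javascript
const BYTES_PER_GB = 1024 ** 3;

// Rough index size: one entry per document times an assumed entry size.
function estimateIndexBytes(documentCount, bytesPerEntry) {
  return documentCount * bytesPerEntry;
}

// Nodes required so the whole index fits in the RAM left over for indexes.
function nodesNeeded(totalIndexBytes, ramPerNodeGb, usableFraction) {
  const usableBytesPerNode = ramPerNodeGb * BYTES_PER_GB * usableFraction;
  return Math.ceil(totalIndexBytes / usableBytesPerNode);
}

// Assumptions: 1 billion documents, 40 bytes per index entry,
// 4 GB nodes with half the RAM available for indexes.
const indexBytes = estimateIndexBytes(1e9, 40);
const nodes = nodesNeeded(indexBytes, 4, 0.5);
console.log(`${(indexBytes / BYTES_PER_GB).toFixed(1)} GB of index needs ${nodes} nodes`);
```

Under these assumed numbers a single index of about 37 GB calls for 19 nodes, which shows why index size, not raw data size, tends to drive the node count.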
BuzzNumbers NOSQL Presentation: Why we stay with .NET?
Why we stay with .NET: Visual Studio is the best IDE!; SQL Server is a great database for most data; Proven Tech Stack (low corporate risk); Lots of support (MSFT and Consultants); Large online community with code samples; Many Open Source libraries; ASP.NET MVC RAZOR is RAD; Non-Complex Sysadmin for Windows Servers; Drivers/Integration available for most OSS Projects; Lots of Agile/Scrum/TDD/CI/Project Management tools; Lots of smart .NET web developers & engineers
BuzzNumbers NOSQL Presentation: Lessons Learned
Lessons Learned: “Big Data” is not 100M records, but 1BN+; Don’t scale until you need to (premature optimisation costs big time); SQL RDBMS solves most problems, but scale-up costs are prohibitive for startups, so plan in advance when you might need to switch; Mixing SQL for SmallData and NOSQL for BigData delivers both ease/speed of development and performance; Mongo/SOLR works well to solve specific performance problems; Not all problems are equal: optimise each solution per performance problem; Don’t go NOSQL unless you absolutely need to; Very early technology with lots of learning overhead, risks and production issues; Skilled .NET/Mongo/SOLR engineers are very hard to find; If client/data segmentation is possible, multiple SQL instances can deliver; Ensure Indexes fit in Memory; Spend time planning your schema in advance based on query requirements
BuzzNumbers NOSQL Presentation: Interested in learning more?
Thanks for your time: Speak with one of the Buzz Team tonight; Join our Team? We’re Hiring! Web Developers, Software Engineers, UX / Web Designers; Immediate and Future roles… Talk to us!

Editor's Notes

  1. {"WebsiteID": 12345, "DomainName": "buzznumbershq.com", "DateSummary": "2011-09-22", "UserIDSummary": [1, 2, 3, 4, 5, 6, 7, 8], "PageVisitSummary": {"Home": {"VisitCount": 20000, "Uniques": 55}, "About": {"VisitCount": 1667, "Uniques": 44}, "Products": {"VisitCount": 1223, "Uniques": 33}, "Contact": {"VisitCount": 50, "Uniques": 22}}, "PageVisits": [{"UserID": 1, "PageName": "Home"}, {"UserID": 2, "PageName": "About"}, {"UserID": 3, "PageName": "Products"}, {"UserID": 4, "PageName": "Contact"}]} (and so on for further page visits)
     Proven Tech Stack (low risk); Lots of smart web developers/engineers; Visual Studio is the best IDE by miles; Lots of support (MSFT and Consultants); Large online community with code samples; Many Open Source libraries; ASP.NET MVC RAZOR is RAD; Low levels of SysAdmin; Drivers/Integration available for most OSS; Lots of Agile/Scrum/TDD/CI/Project Management tools