© 2019 Frédéric G. MARAND - licensed under a Creative Commons Attribution 4.0 International License.
Scaling up and accelerating Drupal 8 with NoSQL
Frédéric G. MARAND
drupal.org: fgm - irc/twitter: @osinet
<MongoDB module maintainer />
NoSQL
Topic?
Simple idea: “No SQL”
● Alternate storage engines: KV, Structures, Document,
Graph, Columnar…
● No standard, often no fixed schema, no joins, no FKs
● → Engine-specific application design
● Drupal architecture?
Evolved idea: Not Only SQL
● For engines: add SQL-equivalent features
● For Drupal: combine SQL and NoSQL solutions
● Start from the default SQL-based architecture
● Offload services to non-SQL implementations
○ front-end caches, search engines, queue servers
○ specialized storage: cache, KV, lock, sessions…
● Often involves NoSQL as cache for SQL
NoSQL: do you need it?
● Start by observing the current state
○ Database queries → devel + webprofiler
○ Cache → heisencache (D7), webprofiler (D8)
○ Build cacheability → renderviz
● Observe behaviour
○ Core observability built-in: DBTNG logging, cache decorators, QueryInterface for KV, config, content…
○ Monitoring module (400 sites) by Karan Poddar (Google SoC) and MD Systems
○ Add your choice of time-series store (e.g. Prometheus, InfluxDB) and UI (e.g. Grafana)
○ ⇨ Use it!
● You want to see this when it happens ⟶
“If you can’t measure it, you can’t improve it.” (Peter Drucker)
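Core's cache decorators are what make this kind of observation cheap: a decorator exposes the same API as the cache backend it wraps and records what passes through it. A minimal Python sketch of the pattern, assuming a hypothetical dict-backed bin with `get`/`set` methods (not Drupal's `CacheBackendInterface`), producing the kind of hit/miss figures that profiling tools such as heisencache or webprofiler report:

```python
from collections import Counter
from typing import Any, Optional


class DictCacheBackend:
    """Hypothetical in-memory cache bin with a minimal get/set API."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def get(self, cid: str) -> Optional[Any]:
        return self._data.get(cid)

    def set(self, cid: str, value: Any) -> None:
        self._data[cid] = value


class InstrumentedCache:
    """Decorator: same API as the wrapped backend, plus hit/miss/write counters."""

    def __init__(self, inner: DictCacheBackend, bin_name: str) -> None:
        self.inner = inner
        self.bin_name = bin_name
        self.stats = Counter()

    def get(self, cid: str) -> Optional[Any]:
        value = self.inner.get(cid)
        self.stats["hit" if value is not None else "miss"] += 1
        return value

    def set(self, cid: str, value: Any) -> None:
        self.stats["write"] += 1
        self.inner.set(cid, value)

    def hit_ratio(self) -> float:
        reads = self.stats["hit"] + self.stats["miss"]
        return self.stats["hit"] / reads if reads else 0.0


# Callers only see the cache API, so the decorator can be swapped in transparently.
cache = InstrumentedCache(DictCacheBackend(), "render")
for cid in ["node:1", "node:1", "node:2", "node:1"]:
    if cache.get(cid) is None:
        cache.set(cid, f"<rendered {cid}>")
print(cache.bin_name, dict(cache.stats), round(cache.hit_ratio(), 2))
```

Because the decorator changes nothing about behaviour, it can stay enabled long enough to collect real traffic figures before deciding whether a bin is worth moving off SQL.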
Fixing an identified problem is cheaper than “trying things”
Fix from acquired information
● It /MAY/ involve taking queries off the main DB to a NoSQL solution
● But poorly configured NoSQL may make it worse.
“Just do it”?
● Drupal is built on SQL:
○ Views depends on it by default
○ Most sites rely on Views data model awareness
○ → Contrib often assumes SQL, injects @database
○ NoSQL support doable, rarely done
● Contrib support level is limited
○ Most NoSQL contrib not ported from D7 to D8
○ Drupal shop knowledge is limited, except in the biggest or specialized agencies
○ Products may die… e.g. RethinkDB
● Pro support from publishers = costs, and availability varies.
● Extra support needed = costs
NoSQL == added build costs
→ balance gains vs costs
Example case: RethinkDB
At DevDays Milan 2016, after lots of work, Gizra’s @RoySegall
demoed a Drupal 8 ORM/ODM for RethinkDB.
Then, this happened...
http://www.commitstrip.com/en/2012/04/10/what-do-you-mean-its-oversized
Do you really need it?
Front caching
Caching ahead of real work
Default situation with SQL
● Browser caching, limited
● Internal / dynamic page cache in main SQL DB
● Need DB connection, a few SELECT queries
● Fetch cache from DB
● All data from main storage
● ⇨ Serve cached pages in about 20 msec
All this work makes DoS-ing comparatively cheap.
NoSQL improvements
● Add caching ahead of site itself
○ Browser
■ Optimized browser caching (Cache-Control)
■ PWA: use browser local storage
○ CDN
■ CDN module (2k sites)
■ Akamai module (600 sites)
■ ⇨ Serve cached pages in about 15 msec (TTFB)
■ Web-scale
○ Varnish and other reverse proxies
■ ⇨ Serve cached pages in about 10 msec (TTFB)
■ Core support
■ Varnish Purger (3k sites)
● ⇨ Most requests will involve 0 SQL queries
○ DoS-ing becomes more costly, especially with a CDN
● Move page caches off main DB: next section
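The effect of caching ahead of Drupal can be sketched with a toy shared cache honoring `Cache-Control`: once a response is stored, repeat requests within `max-age` (or `s-maxage`, which shared caches prefer) never reach the origin. This is a minimal sketch under stated assumptions: the origin function, path and header value are hypothetical, and a real reverse proxy such as Varnish also handles `Vary`, purging and much more:

```python
import re
from typing import Callable, Optional


def max_age(cache_control: str) -> Optional[int]:
    """Return s-maxage (preferred by shared caches) or max-age, in seconds."""
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "private" in directives:
        return None  # a shared cache must not store these responses
    for name in ("s-maxage", "max-age"):
        for d in directives:
            m = re.fullmatch(rf"{name}=(\d+)", d)
            if m:
                return int(m.group(1))
    return None


class ReverseProxy:
    """Toy shared cache in front of an origin; the clock is injectable for testing."""

    def __init__(self, origin: Callable[[str], tuple[str, str]], clock: Callable[[], float]) -> None:
        self.origin = origin
        self.clock = clock
        self.store: dict[str, tuple[float, str]] = {}  # path -> (expires_at, body)

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.store.get(path)
        if entry and entry[0] > now:
            return entry[1]  # served without touching the origin: 0 SQL queries
        body, cache_control = self.origin(path)
        ttl = max_age(cache_control)
        if ttl:
            self.store[path] = (now + ttl, body)
        return body


origin_calls = 0


def drupal_origin(path: str) -> tuple[str, str]:
    """Hypothetical stand-in for Drupal: each call means a bootstrap plus SQL queries."""
    global origin_calls
    origin_calls += 1
    return f"<html>{path}</html>", "public, max-age=300"


now = [0.0]
proxy = ReverseProxy(drupal_origin, clock=lambda: now[0])
for _ in range(100):
    proxy.get("/node/1")
now[0] = 301.0  # the stored entry has expired
proxy.get("/node/1")
print(origin_calls)  # 2: one fill, one refill after expiry
```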
Choices
Storage
Storage: the “Big 3”
The most active NoSQL suites for Drupal 8.x
Redis
● Type: Key-value (structure server)
● Module
○ redis
● DB-Engines ranking:
○ #1 Key-value store
● Usage
○ Drupal 7: 10k sites
○ Drupal 8: 10k sites
● Supported by
○ Drupal 7: Makina Corpus
○ Drupal 8: MD Systems
Memcached
● Type: Key-value
● Module
○ memcache
● DB-Engines ranking:
○ #3 Key-value store
○ #5 Key-value store (Hazelcast)
● Usage (memcache_storage)
○ Drupal 7: 32k (2k) sites
○ Drupal 8: 15k (800) sites
● Supported by:
○ Acquia
○ Tag1 Consulting
MongoDB / CosmosDB
● Type: Document store
● Module
○ mongodb
● DB-Engines ranking:
○ #1 Document store (MongoDB)
○ #4 Document store (CosmosDB)
● Usage
○ Drupal 7: 300 sites
○ Drupal 8: 50 sites
● Supported by
○ OSInet
Redis
https://www.drupal.org/project/redis
● Driver support
○ phpredis and predis both supported
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Flood
○ Lock
○ Lock.Persistent
○ Queue
● CLI support
○ Not included
● Other modules
○ Redis Watchdog: logger + UI
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with the latest release
● php-redis 5.0 broke the module; fixed in the latest 8.x and 7.x releases
● Module users: please test and report!
Performance / scalability
Redis
https://www.drupal.org/project/redis
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Often the fastest
■ Even with concurrent access
○ Persistent
■ A bit slower even with just RDB
■ Slower with AOF
● Persistence, single instance
○ RDB:
■ compact snapshots, shippable off-site
■ data loss: since latest snapshot
○ AOF
■ data loss: up to the last second (journal fsync’ed every second)
■ less compact
● Fault-tolerance: Sentinel 2
○ master/slave supervision
○ automatic failover possible
○ observability support
● Scaling
○ Cluster-based sharding
○ Master → Slaves → Slaves
○ No strong consistency
○ Recommended config: 6 servers
● Cloud-native:
○ Redis Enterprise Cloud
○ AWS ElastiCache, Azure, Google Memorystore
○ many others
Memcached
https://www.drupal.org/project/memcache
● Driver support
○ memcache extension (limited availability)
○ memcached extension
○ PHP ≥ 5.6
● Supported Services
○ Driver adapter for custom code
○ Cache, including invalidations
○ Lock
○ Lock.Persistent removed in #2995907
○ Sessions ported, then removed in 7.x
○ Monitoring UI
● CLI support
○ Not included: core commands
● Other module: memcache_storage
○ Cache with core SQL invalidations
○ No lock
○ Monitoring UI
Recent events (from @Berdir)
● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with the latest release, based on the Redis fix.
● Performance, single-server
○ Memory-only implementation
■ Usually among the fastest
■ Slower than in-memory Redis
■ A bit faster than MySQL / MongoDB K/V
○ Persistence: extstore NVRAM support
■ No significant slowdown
■ Usually a bad idea (expectations)
■ https://memcached.org/blog/persistent-memory/
● Fault-tolerance
○ Module support for sharded clusters
○ Consistent hashing: avoids the thundering herd problem
○ Replication: with Hazelcast
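Consistent hashing maps both servers and keys onto a ring, so adding an instance only moves the keys that now fall on its segments, whereas `hash(key) % N` remaps most keys whenever N changes, producing the burst of misses behind a thundering herd. A minimal sketch of the ring technique with hypothetical server names; the PHP memcached extension ships its own consistent distribution mode, which this is not:

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 32-bit ring position (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, servers: list[str], replicas: int = 100) -> None:
        self.replicas = replicas  # virtual nodes per server smooth the distribution
        self.ring: list[tuple[int, str]] = []
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self.ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key: str) -> str:
        # First virtual node at or after the key's position, wrapping around.
        idx = bisect.bisect(self.ring, (_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]


keys = [f"cache:render:{n}" for n in range(10_000)]
ring = HashRing(["mc1:11211", "mc2:11211", "mc3:11211"])
before = {k: ring.server_for(k) for k in keys}
ring.add("mc4:11211")
after = {k: ring.server_for(k) for k in keys}

moved = sum(before[k] != after[k] for k in keys)
# Naive modulo placement, going from 3 to 4 servers, for comparison.
mod_moved = sum(_hash(k) % 3 != _hash(k) % 4 for k in keys)
print(f"ring: {moved / len(keys):.0%} moved, modulo: {mod_moved / len(keys):.0%} moved")
```

With the ring, every key that moves goes to the new instance and all other keys keep their server, so only the new server's share of the cache starts cold.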
Performance / scalability
Memcached
https://www.drupal.org/project/memcache
● Scaling
○ Cluster-based sharding
○ Consistent hashing allows elastic scaling
○ Recommended config: 2 instances per cluster, 1 cluster per bin, with some exceptions: usually 10-20 instances per D8 site
○ Some bins must stay in core (form, update)
● Monitoring
○ Instant: module-provided memcache_admin
○ Advanced: phpMemcachedAdmin
● Cloud-native
○ AWS ElastiCache
○ Azure Memcached Cloud
○ Google AppEngine Memcache
Mainstream packages
MongoDB
https://www.drupal.org/project/mongodb
Drupal 7 features
● Driver support:
○ mongo extension for PHP 5.x
○ mongodb extension for PHP 7.x
○ MongoDB 2.x, 3.x
● Supported Services
○ Driver adapter for custom code
○ Block
○ Cache
○ Path
○ Queue
● Unsupported services
○ Field storage
○ Lock
○ (Session)
○ Watchdog = logger + UI
● Other modules
○ Views driver: EFQ Views
Drupal 8.x-2.x features
● Driver support
○ mongodb extension for PHP ≥ 7.1
○ mongodb/mongodb php driver
○ MongoDB 3.x, 4.x
● Supported Services
○ Driver adapter for custom code
○ Key-value (e.g. State)
○ Key-value expirable (e.g. *tempstore*, form_cache)
○ Watchdog = logger + UI
● CLI support
○ Drupal Console 1.9.x
○ Drush 9.x
● Other services
○ Entity/field storage
● Other modules
○ MongoDB Indexer
Exotic packages
MongoDB
https://www.drupal.org/project/mongodb
Drupal 8.x-1.x
● Driver support:
○ mongo extension for PHP 5.x
○ MongoDB 3.x
● Supported services
○ Complete NoSQL distribution
○ @database implementation
○ No SQL DBMS needed
○ Unpatched Drupal core
● Status
○ Sponsored by MongoDB, led by chx
○ Development halted before Drupal 8.0.0
● Performance:
○ About 4x faster than equivalent Drupal core
Drumongous
● Driver support
○ mongo extension for PHP ≥ 5.6
○ MongoDB ≥ 3.6
● Supported Services
○ Complete NoSQL distribution
○ @database implementation
● Source: patched Drupal core + module
○ https://gitlab.com/daffie/drumongous/
○ https://gitlab.com/daffie/mongodb
● CLI support
○ Drupal Console 1.x
○ Drush 9.x
● Status
○ Off-drupal.org
○ No issue queue
○ Active, led by daffie
espace réservé non accepté
Performance / scalability
Engine features
● Fault-tolerance
○ Built-in replication
○ Recommended config: 2+1 servers
● Scaling
○ Read-only replicas
○ Data-center awareness
○ Sharding
● Both supported by existing module
Monitoring / Ops
● In-module: logs
● Cloud: MongoDB Atlas, free monitoring, OpsManager
Cloud native
● Azure: CosmosDB
● MongoDB: Atlas
● mLab (formerly MongoLab)
MongoDB
https://www.drupal.org/project/mongodb
Production example
Custom social network (2M users), migrated from MySQL:
MySQL slow queries: -85%, uncached content build time: -98%
NoSQL storage features
Other NoSQL support modules
| NoSQL product | Module | Wrapper | Features | 7.x | 8.x | Supported? |
|---|---|---|---|---|---|---|
| Neo4J | neo4j | Y | - | Y | Y | N |
| RethinkDB | rethinkdb | Y | ORM | N | Y | ? |
| CouchDB | couchdb | Y | Node export | Y | N | N |
| Couchbase | couchbase | Y | Logger + UI | Y | N | ? |
| ElasticSearch | elasticsearch_connector | Y | Logger + improved UI, Statistics, Views | Y | N | Y |
| ElasticSearch | elasticsearch_connector | Y | Search API | Y | Y | Y |
| AWS DynamoDB | dynamodb | N | Cache | Y | N | ? |
| AWS SimpleDB | awssdk, creeper | Y | - | Y | N | ? |
| Riak | riak_field_storage | Y | Field storage, map-reduce | Y | N | unsupported |
| Apache Cassandra | cassandra | Y | Example app | 6.x | N | unsupported |
| Tokyo Tyrant | node/844354 | N | Logger + UI | 6.x | N | unapproved |
Sessions
NoSQL Sessions?
● Why the weak/removed session support, especially for memcache?
○ Memcache session support is baked into the PHP memcached extension
○ It was popular in Drupal 6.x days
○ It is popular in Symfony, even documented on symfony.com
○ So?
● Experience
○ Session data
○ Instance restart → all sessions data on instance lost
○ Bigger session data saturating bin → evictions
○ LRU means vulnerability to DoS-ing and blocking admins via evictions
○ DB load is bigger in Drupal than in most frameworks
■ Session DB load is a smaller part of load for us
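The eviction risk is easy to reproduce: a memcache bin is a fixed-size store that evicts the least recently used items when full, so enough new anonymous sessions push out an idle administrator's session. A minimal Python sketch of an LRU bin, hypothetical and far simpler than memcached's per-slab-class LRU:

```python
from collections import OrderedDict
from typing import Optional


class LRUBin:
    """Toy memcache-like bin: fixed capacity, evicts the least recently used item."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: OrderedDict[str, str] = OrderedDict()

    def set(self, key: str, value: str) -> None:
        self.items[key] = value
        self.items.move_to_end(key)  # most recently used at the end
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict from the front: least recently used

    def get(self, key: str) -> Optional[str]:
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # a read also refreshes recency
        return self.items[key]


sessions = LRUBin(capacity=1000)
sessions.set("sess:admin", "uid=1")
# A crawler or an attacker opening many anonymous sessions...
for n in range(1000):
    sessions.set(f"sess:anon{n}", "uid=0")
print(sessions.get("sess:admin"))  # None: the administrator has been logged out
```

Nothing here is a bug in the cache: evicting data is exactly what an LRU cache is for, which is why it is a poor home for sessions that must not disappear.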
Logs
Logs in core
The “SQL” problem
● All sites really need some sort of logging feature
● Smaller sites only have a database
○ ⇨ Database Logging default-enabled
● Code is not perfect, throws notices, errors
● Modules are verbose, log debug info
● “Drupal is too slow, please help, agency is stuck”
○ ⇨ Audit: 1500 inserts/min in watchdog table
○ ⇨ Other audits: watchdog > 99% of site size
● DBlog inserts compete with content work
● Owner disables logging
○ ⇨ now misses essential info
● Does not disable logging
○ ⇨ now can’t find essential info buried in noise
The core NoSQL module
● Core has been bundling a syslog client since 6.0
● Decouple logs from DB load
○ ⇨ No more SQL logs workload
● But where do they go?
○ ⇨ Needs OS-level configuration
● How are logs cleaned?
○ ⇨ Needs OS-level configuration
● Where is the UI?
○ ⇨ Needs extra tools
● Solutions?
○ D7 has a logging hook
○ D8 has PSR-3 standard logging
○ ⇨ Contributions
NoSQL on-site logs
(mongodb|redis)_watchdog
● mongodb_watchdog
○ Logger service
■ Standard Drupal PSR-3 logs backend
■ Pre-storage filtering
■ Uses capped collections: auto-rotation, no ops
■ Dedicated database: zero contention
■ Per-request event tracing
○ Improved logs UI
■ Based on core UI
■ Groups recurring events on single line
■ Details page for occurrences
■ Per-HTTP-request log page
○ Most common reason to deploy MongoDB on D8
● redis_watchdog
○ Logger service
○ Logs UI based on core UI
○ Usage: 1 site
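A MongoDB capped collection is a fixed-size collection that keeps insertion order and overwrites its oldest documents once full, which is why no rotation job is needed. The in-memory Python sketch below mimics that behaviour with `deque(maxlen=...)` and adds the pre-storage filtering, grouping and per-request lookup listed above; it is not mongodb_watchdog's code, and the `Event` shape and `CappedLogger` name are hypothetical. Severities follow RFC 5424 (0 = emergency to 7 = debug), the scale Drupal's `RfcLogLevel` uses:

```python
from collections import Counter, deque
from dataclasses import dataclass

# RFC 5424 severities: lower numbers are more severe.
EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, NOTICE, INFO, DEBUG = range(8)


@dataclass(frozen=True)
class Event:
    severity: int
    template: str  # message with placeholders, e.g. "Missing file @file"
    variables: tuple[tuple[str, str], ...] = ()
    request_id: str = ""


class CappedLogger:
    def __init__(self, capacity: int, min_severity: int = NOTICE) -> None:
        self.events: deque[Event] = deque(maxlen=capacity)  # oldest dropped automatically
        self.min_severity = min_severity

    def log(self, event: Event) -> None:
        if event.severity > self.min_severity:
            return  # pre-storage filtering: the event is never written
        self.events.append(event)

    def grouped(self) -> Counter:
        """One entry per recurring event type, keyed on (severity, template)."""
        return Counter((e.severity, e.template) for e in self.events)

    def for_request(self, request_id: str) -> list[Event]:
        return [e for e in self.events if e.request_id == request_id]


logger = CappedLogger(capacity=3)
logger.log(Event(DEBUG, "Cache miss for @cid", (("@cid", "node:1"),), "req-1"))
for i in range(4):
    logger.log(Event(WARNING, "Missing file @file", (("@file", f"img{i}.png"),), f"req-{i}"))
print(len(logger.events), logger.grouped().most_common(1))
```

Grouping on the template rather than on the rendered message is what lets four "Missing file" warnings with different file names collapse into a single line.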
Off-site logs: BELK stack
BELK stack
● Beats (typically FileBeat)
● Elastic Search
● Logstash
● Kibana
Operation
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal
● Filebeat pulls logs, sends to Logstash
● Logstash massages logs, sends to ES
● ES provides storage, indexing
● Kibana provides UI
Deployment
● Hosted with site
● SaaS: Loggly, Logz.io, ...
Off-site logs: Graylog
Graylog
● Dual server: ES (logs, search) + MongoDB (meta, conf)
● Includes GROK log handling
● Accept syslog or GELF input
● Modeled on Splunk
Operation
● Drupal syslog → local syslog server → local logs
● DON’T log straight from Drupal via monolog_gelf
● Local syslog forwards to Graylog2
● Graylog2 massages logs, sends to ES
● ES provides storage, indexing
● Graylog2 provides UI
Deployment
● Hosted with site
● SaaS: StackHero
(source: Graylog)
Off-site logs: BELK vs Graylog design
Non-SQL Logs: do I need them?
● Small site, little traffic, single webmaster: just use dblog
● Any other site: upgrade to something else
○ Hosting company provides a logs dashboard (e.g. Splunk): use it
■ syslog into their stack, via local syslog then pull
○ Have an internal ops team?
■ syslog into internal BELK or Graylog
○ No ops expertise? No time to learn Kibana/Graylog? Hosting company doesn’t provide real-time log access?
■ Want to minimize costs and/or keep logs on-site?
● use mongodb_watchdog
■ Otherwise, use SaaS logs vendor
● Datadog, Scalyr, Loggly or Papertrail (SolarWinds), Logz.io...
Queues
Queue API services
● Core: mostly for Batch API
● General D8 use: proxy invalidation
○ Invalidation queues
● Commerce sites
○ ERP links
○ Third-party catalog/inventory
● Media sites
○ Real time news feeds ingestion
○ Deferred derived media generation
Queue modules
SQL and NoSQL
SQL
● Core bundled: queue.database service
○ used by all Drupal sites
● advancedqueue project
○ created for Drupal Commerce projects
○ used by Commerce 2.x
NoSQL: storage-based
● Core bundled: queue.memory service
● Redis:
○ 7.x: redis_queue project
○ 8.x: redis project
● MongoDB
○ 7.x: mongodb project
NoSQL: message servers
● Beanstalkd
○ 6.x/7.x: popular, used by drupal.org itself
○ 8.x complete port, but no users (?)
● RabbitMQ
○ 7.x: little used, 8.x: most popular
○ Users include public TV, a major French e-tailer
○ Hardened by production at these levels
● AWS SQS
○ 7.x: some use, but no 8.x port
● Apache Kafka
○ 8.x only
○ Created for the largest French retail chain
● Other queue services
○ Less used: Gearman, IronMQ, 0MQ
○ No 8.x versions
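Whichever backend is used, Drupal's Queue API exposes the same lease-based contract through `QueueInterface` (`createItem()`, `claimItem($lease_time)`, `deleteItem()`, `releaseItem()`, `numberOfItems()`): a claimed item is hidden from other workers until its lease expires, so an item held by a crashed worker is eventually retried. A minimal in-memory Python transliteration of that contract, not any module's code, with an injectable clock for testing:

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Item:
    item_id: int
    data: Any
    expire: float = 0.0  # 0 = never claimed; otherwise end of the current lease


class LeaseQueue:
    """Sketch of claim/lease semantics, method names transliterated from QueueInterface."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self.items: dict[int, Item] = {}
        self._ids = itertools.count(1)

    def create_item(self, data: Any) -> int:
        item = Item(next(self._ids), data)
        self.items[item.item_id] = item
        return item.item_id

    def claim_item(self, lease_time: float = 3600) -> Optional[Item]:
        now = self.clock()
        for item in self.items.values():  # dicts keep insertion order: FIFO
            if item.expire <= now:  # unclaimed, or lease expired (crashed worker)
                item.expire = now + lease_time
                return item
        return None

    def delete_item(self, item: Item) -> None:
        self.items.pop(item.item_id, None)  # work done: remove for good

    def release_item(self, item: Item) -> None:
        item.expire = 0.0  # give the item back for another attempt

    def number_of_items(self) -> int:
        return len(self.items)


now = [0.0]
q = LeaseQueue(clock=lambda: now[0])
q.create_item({"purge": "/node/1"})
q.create_item({"purge": "/node/2"})
a = q.claim_item(lease_time=30)  # worker A takes /node/1
b = q.claim_item(lease_time=30)  # worker B takes /node/2
print(q.claim_item())  # None: every item is leased
q.delete_item(a)  # A finished its work
now[0] = 31.0  # B crashed and its lease ran out
c = q.claim_item()
print(c.data, q.number_of_items())
```

What differs between the database, Redis, RabbitMQ or Kafka backends is how claiming is made atomic across processes and how fast it runs, not this contract.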
Queue API modules by usage D7/D8
NoSQL Queue: do I need it?
● Mainstream Drupal site without Varnish / CDN
○ probably not, though advancedqueue is still a nice improvement
● Content site with a lot of generated content, Varnish and/or CDN
○ consider using Redis (D8), MongoDB (D7), RabbitMQ (D8)
○ or use Kafka (D8) if you need to (e.g. corporate mandate)
● Drupal Commerce standalone
○ advancedqueue is normally enough
● Site generating lots of dynamic media (image, video, sound) ...or ingesting fast feeds (> 1 item/sec)
○ need a dedicated message server
NoSQL Queue: which should I use?
● The one your ops team supports best
○ Content management has a low event rate (< 1 event/sec)
● Kafka-class is for high-throughput queues
○ Think LinkedIn, Twitter, Netflix, Spotify, Airbnb, Paypal…
● RabbitMQ is solid
○ usually well known and monitored
○ D8 driver used for years on Cyber Monday, Black Friday, Olympic games...
● Beanstalkd is simple
○ It “just works”
○ Good first queue upgrading from DB
Search
SQL-based search
● Search has long been the weakest core feature in Drupal
○ In spite of improvements with each version
● Relevant issues
○ Good recall, but bad precision
○ Multilingual support, but no language awareness
○ Low awareness of language inflections → preprocessing API
○ Limited ability to handle Asian (CJK) languages
○ Slow updates, cron-based pull mode
○ Indexing costs impacting site users
○ Indexed search for content only → search plugins
○ Other entity types limited to unindexed search by default
○ No support for restricted content search
● Useful complements: porterstemmer, snowball_stemmer
● SQL alternative: Search API database backend, with similar limitations.
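Inflection handling is a preprocessing problem: whatever normalization runs on indexed text must also run on queries, which is what core's `hook_search_preprocess()` lets stemmer modules do. A toy Python inverted index with a pluggable preprocessing step shows the effect; `naive_plural_stemmer` is a hypothetical, deliberately crude stand-in for Porter or Snowball stemming:

```python
import re
from collections import defaultdict
from typing import Callable


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def naive_plural_stemmer(word: str) -> str:
    # Deliberately crude and English-only; real stemmers handle far more cases.
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


class InvertedIndex:
    def __init__(self, preprocess: Callable[[str], str] = lambda w: w) -> None:
        self.preprocess = preprocess  # applied identically at index and query time
        self.postings: dict[str, set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for word in tokenize(text):
            self.postings[self.preprocess(word)].add(doc_id)

    def search(self, query: str) -> set[int]:
        """AND query: documents containing every (preprocessed) term."""
        terms = [self.preprocess(w) for w in tokenize(query)]
        if not terms:
            return set()
        return set.intersection(*(self.postings.get(t, set()) for t in terms))


docs = {1: "Comparing queue servers", 2: "Redis server tuning", 3: "Search API backends"}
plain, stemmed = InvertedIndex(), InvertedIndex(naive_plural_stemmer)
for doc_id, text in docs.items():
    plain.add(doc_id, text)
    stemmed.add(doc_id, text)
print(plain.search("queue server"), stemmed.search("queue server"))
```

The stemmer also mangles "redis" into "redi"; that is harmless here because the same function runs on both sides, but it is the kind of precision loss real stemmers are designed to limit.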
NoSQL search solutions
Cloud-based / SaaS
● SaaS offerings:
○ Algolia
○ Google CSE
● Drupal Hosting offerings (alphabetic order):
○ Acquia Search SOLR
○ Amazee.io SOLR
○ Pantheon SOLR
○ Platform.sh ElasticSearch / SOLR
On-site / near-site
● Core support: Search API (14% of D7, 16% of D8 sites)
● Standard solution:
○ Local SOLR
○ Multilingual search supported
● Alternatives:
○ Elastic Search → heart of BELK suite
○ Xunsearch: Xapian for Chinese
○ Xapian (8.x dev)
● D7 backends not on D8:
○ Elastic Search via Elastica
○ Google Search Appliance: killed by Google
○ MongoDB via MongoDB module
○ Sphinx
● Proprietary search engine publishers have custom, unpublished, non-GPL (!) Drupal modules
SQL and NoSQL search solutions by usage in D8
Non-core search: which should I use?
● Any content deserves search
● SQL
○ Core for small content quantities
○ Search API DB backend used by drupal.org
● SaaS
○ For entry level: Algolia/Google = 0 recurring cost, near 0 set-up cost
○ Both perform better than core, but are proprietary (non-free)
● Drupal PaaS have managed ES/SOLR
● Others: cost equilibrium
○ ES/SOLR have setup costs and a recurring cost of ownership (server load)
○ SaaS has lower set-up costs, but recurring fees
○ Core search has an opportunity cost
Best practices
Best current practice: NoSQL in general
Drupal 8 core tries hard to be SQL-agnostic
● Every use of the DB goes through @database
○ So anything able to pass for a SQL engine may be used
○ The mongodb_dbtng, mongodb 8.x-1.x, and Drumongous projects do just that
● Even Views has a query plugin. Project efq_views (7.x, 8.x) supports NoSQL engines that way
● No service except “storage” services should receive a database connection
○ Write a storage service for your data, defining its interface
○ Write a SQL provider implementing it, receiving @database
○ Tag the service as “backend_overridable”
○ Core mostly does it, custom code should always do it.
● References:
○ https://www.drupal.org/project/drupal/issues/2302617
○ https://www.drupal.org/node/2306083
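A minimal Python sketch of the pattern, with hypothetical names: business code depends only on a storage interface, the SQL provider is the only class that receives a database connection (sqlite3 stands in for @database), and an alternate provider can replace it without touching the consumer. In Drupal the choice of provider is made by service wiring (services.yml and the `backend_overridable` tag); here the loop simply passes each one in turn:

```python
import sqlite3
from abc import ABC, abstractmethod


class VoteStorageInterface(ABC):
    """Hypothetical storage interface: the only thing business code depends on."""

    @abstractmethod
    def add_vote(self, entity_id: int, uid: int) -> None: ...

    @abstractmethod
    def count(self, entity_id: int) -> int: ...


class SqlVoteStorage(VoteStorageInterface):
    """Default provider: the only class that receives the database connection."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS votes "
            "(entity_id INTEGER, uid INTEGER, PRIMARY KEY (entity_id, uid))"
        )

    def add_vote(self, entity_id: int, uid: int) -> None:
        self.db.execute("INSERT OR IGNORE INTO votes VALUES (?, ?)", (entity_id, uid))

    def count(self, entity_id: int) -> int:
        row = self.db.execute(
            "SELECT COUNT(*) FROM votes WHERE entity_id = ?", (entity_id,)
        ).fetchone()
        return row[0]


class KeyValueVoteStorage(VoteStorageInterface):
    """Alternate provider, e.g. for a NoSQL store (here just a dict of sets)."""

    def __init__(self) -> None:
        self.sets: dict[int, set[int]] = {}

    def add_vote(self, entity_id: int, uid: int) -> None:
        self.sets.setdefault(entity_id, set()).add(uid)

    def count(self, entity_id: int) -> int:
        return len(self.sets.get(entity_id, set()))


class VoteService:
    """Business logic: receives the storage service, never the database."""

    def __init__(self, storage: VoteStorageInterface) -> None:
        self.storage = storage

    def vote(self, entity_id: int, uid: int) -> int:
        self.storage.add_vote(entity_id, uid)
        return self.storage.count(entity_id)


results = {}
for storage in (SqlVoteStorage(sqlite3.connect(":memory:")), KeyValueVoteStorage()):
    service = VoteService(storage)
    service.vote(42, 1)
    service.vote(42, 1)  # duplicate vote, ignored by both providers
    results[type(storage).__name__] = service.vote(42, 2)
print(results)
```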
Best current practice: MongoDB
● Connecting to MongoDB with 8.x-2.x
○ Using multiple databases? Use @mongodb.client_factory
■ The client you get is a standard mongodb/mongodb Client instance
■ You have to handle topology
○ Using a single database? Use @mongodb.database_factory
■ The database you get is a standard mongodb/mongodb Database instance
■ Your DB topology is now configurable in settings
○ You probably don’t want to use Doctrine ODM, especially when interacting with Drupal data
● Designing a custom schema
○ Start from the queries, not from a normalized (canonical) model
○ For large scale data sets, consider:
■ Splitting live and archive data for sharding
■ Having a write DB and a read DB, and a CLI-based service between them - read about CQRS
○ Never use a monotonically increasing key for sharding
○ In most cases, joined data in lists don’t need to be as up-to-date as primary views
■ Embed “light” versions of dependent objects for lists, only use $lookup and DBRef joins on full datum view
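With range-based sharding, chunks cover contiguous key ranges, so a monotonically increasing key (auto-increment id, timestamp, default ObjectId) sends every new insert to the chunk holding the highest values, and therefore to a single shard. A small simulation with hypothetical shard names and boundaries, not MongoDB's balancer, comparing that with hashing the key first, which is the idea behind MongoDB's hashed shard keys:

```python
import bisect
import hashlib
from collections import Counter

SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical 4-shard cluster


def range_shard(key: int, boundaries: list[int]) -> str:
    """Shard owning the contiguous range that contains key (split points ascending)."""
    return SHARDS[bisect.bisect_right(boundaries, key)]


def hashed(key: int) -> int:
    # 64-bit hash of the key: neighbouring ids land far apart.
    return int.from_bytes(hashlib.md5(str(key).encode()).digest()[:8], "big")


boundaries_by_id = [2500, 5000, 7500]  # chunks split when ids 0..9999 were loaded
new_ids = range(10_000, 11_000)  # auto-increment / timestamp-like ids keep growing

monotonic = Counter(range_shard(i, boundaries_by_id) for i in new_ids)
quarter = 2**64 // 4
hashed_load = Counter(range_shard(hashed(i), [quarter, 2 * quarter, 3 * quarter]) for i in new_ids)
print(monotonic)  # every new insert lands on the last shard
print(hashed_load)  # inserts spread over all four shards
```

The trade-off is that hashed keys make range queries on the original key scatter across all shards, so the choice again depends on the queries.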
There, I said it!
Contribution is its own reward
Join us for contribution opportunities
Thursday, October 31, 2019
● Mentored Contribution: 9:00-18:00, Room: Europe Foyer 2
● First Time Contributor Workshop: 9:00-14:00, Room: Diamond Lounge
● General Contribution: 9:00-18:00, Room: Europe Foyer 2
#DrupalContributions
What did you think?
Locate this session at the DrupalCon Amsterdam website:
https://drupal.kuoni-congress.info/2019/program/
Take the Survey!
https://www.surveymonkey.com/r/DrupalConAmsterdam

More Related Content

What's hot

What's hot (20)

MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
MySQL Database Architectures - MySQL InnoDB ClusterSet 2021-11
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
MySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELKMySQL Slow Query log Monitoring using Beats & ELK
MySQL Slow Query log Monitoring using Beats & ELK
 
Kafka internals
Kafka internalsKafka internals
Kafka internals
 
Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain Kafka streams windowing behind the curtain
Kafka streams windowing behind the curtain
 
Highly efficient backups with percona xtrabackup
Highly efficient backups with percona xtrabackupHighly efficient backups with percona xtrabackup
Highly efficient backups with percona xtrabackup
 
MySQL Group Replication
MySQL Group ReplicationMySQL Group Replication
MySQL Group Replication
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)MariaDB 10.5 binary install (바이너리 설치)
MariaDB 10.5 binary install (바이너리 설치)
 
ProxySQL High Availability (Clustering)
ProxySQL High Availability (Clustering)ProxySQL High Availability (Clustering)
ProxySQL High Availability (Clustering)
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Reddit/Quora Software System Design
Reddit/Quora Software System DesignReddit/Quora Software System Design
Reddit/Quora Software System Design
 
Iceberg: a fast table format for S3
Iceberg: a fast table format for S3Iceberg: a fast table format for S3
Iceberg: a fast table format for S3
 
Introduction to Kafka Cruise Control
Introduction to Kafka Cruise ControlIntroduction to Kafka Cruise Control
Introduction to Kafka Cruise Control
 
kafka
kafkakafka
kafka
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?
 
Using Processes and Timers for Long-Running Asynchronous Tasks
Using Processes and Timers for Long-Running Asynchronous TasksUsing Processes and Timers for Long-Running Asynchronous Tasks
Using Processes and Timers for Long-Running Asynchronous Tasks
 
SRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon AuroraSRV308 Deep Dive on Amazon Aurora
SRV308 Deep Dive on Amazon Aurora
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
Espresso: LinkedIn's Distributed Data Serving Platform (Paper)
 

Similar to Scaling up and accelerating Drupal 8 with NoSQL

Ukoug 2011 mysql_arch_for_orcl_dba
Ukoug 2011 mysql_arch_for_orcl_dbaUkoug 2011 mysql_arch_for_orcl_dba
Ukoug 2011 mysql_arch_for_orcl_dba
orablue11
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticals
Angela Byron
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
DoKC
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAP
LDAPCon
 

Similar to Scaling up and accelerating Drupal 8 with NoSQL (20)

MySQL and MariaDB Backups
MySQL and MariaDB BackupsMySQL and MariaDB Backups
MySQL and MariaDB Backups
 
Mongo nyc nyt + mongodb
Mongo nyc nyt + mongodbMongo nyc nyt + mongodb
Mongo nyc nyt + mongodb
 
PL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptxPL22 - Backup and Restore Performance.pptx
PL22 - Backup and Restore Performance.pptx
 
Ukoug 2011 mysql_arch_for_orcl_dba
Ukoug 2011 mysql_arch_for_orcl_dbaUkoug 2011 mysql_arch_for_orcl_dba
Ukoug 2011 mysql_arch_for_orcl_dba
 
Scaling Redis: Dmitry Polyakovsky
Scaling Redis: Dmitry PolyakovskyScaling Redis: Dmitry Polyakovsky
Scaling Redis: Dmitry Polyakovsky
 
Doctrine Project
Doctrine ProjectDoctrine Project
Doctrine Project
 
MongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL DatabaseMongoDB: Advantages of an Open Source NoSQL Database
MongoDB: Advantages of an Open Source NoSQL Database
 
Decoupled (Headless) Drupal
Decoupled (Headless) DrupalDecoupled (Headless) Drupal
Decoupled (Headless) Drupal
 
Redis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs TalksRedis Developers Day 2014 - Redis Labs Talks
Redis Developers Day 2014 - Redis Labs Talks
 
Plain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticalsPlain english guide to drupal 8 criticals
Plain english guide to drupal 8 criticals
 
Benchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetesBenchmarking for postgresql workloads in kubernetes
Benchmarking for postgresql workloads in kubernetes
 
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo SeidelOSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
OSDC 2013 | Distributed Storage with GlusterFS by Dr. Udo Seidel
 
Drupal 7 and RDF
Drupal 7 and RDFDrupal 7 and RDF
Drupal 7 and RDF
 
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
Second Skin: Real-Time Retheming a Legacy Web Application with Diazo in the C...
 
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdfLupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
Lupus Decoupled Drupal - Drupal Austria Meetup - 2023-04.pdf
 
Drupal 7 performance and optimization
Drupal 7 performance and optimizationDrupal 7 performance and optimization
Drupal 7 performance and optimization
 
HTML, CSS & Javascript Architecture (extended version) - Jan Kraus
HTML, CSS & Javascript Architecture (extended version) - Jan KrausHTML, CSS & Javascript Architecture (extended version) - Jan Kraus
HTML, CSS & Javascript Architecture (extended version) - Jan Kraus
 
What's New in OpenLDAP
What's New in OpenLDAPWhat's New in OpenLDAP
What's New in OpenLDAP
 
Drupal performance
Drupal performanceDrupal performance
Drupal performance
 
Scaling symfony apps
Scaling symfony appsScaling symfony apps
Scaling symfony apps
 

More from OSInet

Equipe drupal
Equipe drupalEquipe drupal
Equipe drupal
OSInet
 
Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?
OSInet
 

More from OSInet (15)

Interface texte plein écran en Go avec TView
Interface texte plein écran en Go avec TViewInterface texte plein écran en Go avec TView
Interface texte plein écran en Go avec TView
 
Mon site web est hacké ! Que faire ?
Mon site web est hacké ! Que faire ?Mon site web est hacké ! Que faire ?
Mon site web est hacké ! Que faire ?
 
Faster Drupal sites using Queue API
Faster Drupal sites using Queue APIFaster Drupal sites using Queue API
Faster Drupal sites using Queue API
 
Life after the hack
Life after the hackLife after the hack
Life after the hack
 
Delayed operations with queues for website performance
Delayed operations with queues for website performanceDelayed operations with queues for website performance
Delayed operations with queues for website performance
 
Drupal 8 : regards croisés
Drupal 8 : regards croisésDrupal 8 : regards croisés
Drupal 8 : regards croisés
 
Cache speedup with Heisencache for Drupal 7 and Drupal 8
Cache speedup with Heisencache for Drupal 7 and Drupal 8Cache speedup with Heisencache for Drupal 7 and Drupal 8
Cache speedup with Heisencache for Drupal 7 and Drupal 8
 
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
Recueil des mauvaises pratiques constatées lors de l'audit de sites Drupal 7
 
Le groupe PHP-FIG et les standards PSR
Le groupe  PHP-FIG et les standards PSRLe groupe  PHP-FIG et les standards PSR
Le groupe PHP-FIG et les standards PSR
 
Les blocs Drupal de drop.org à Drupal 8
Les blocs Drupal de drop.org à Drupal 8Les blocs Drupal de drop.org à Drupal 8
Les blocs Drupal de drop.org à Drupal 8
 
Utiliser drupal
Utiliser drupalUtiliser drupal
Utiliser drupal
 
Equipe drupal
Equipe drupalEquipe drupal
Equipe drupal
 
Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?Pourquoi choisir un CMS Open Source ?
Pourquoi choisir un CMS Open Source ?
 
Drupal et le NoSQL - drupagora 2011
Drupal et le NoSQL - drupagora 2011Drupal et le NoSQL - drupagora 2011
Drupal et le NoSQL - drupagora 2011
 
Drupal Views development
Drupal Views developmentDrupal Views development
Drupal Views development
 

Recently uploaded

Production 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptxProduction 2024 sunderland culture final - Copy.pptx
Production 2024 sunderland culture final - Copy.pptx
ChloeMeadows1
 
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkkaudience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
audience research (emma) 1.pptxkkkkkkkkkkkkkkkkk
lolsDocherty
 

Recently uploaded (16)

iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWebiThome_CYBERSEC2024_Drive_Into_the_DarkWeb
iThome_CYBERSEC2024_Drive_Into_the_DarkWeb
 
TORTOGEL TELAH MENJADI SALAH SATU PLATFORM PERMAINAN PALING FAVORIT.

Scaling up and accelerating Drupal 8 with NoSQL

  • 1. © 2019 Frédéric G. MARAND - licensed under a Creative Commons Attribution 4.0 International License. Scaling up and accelerating Drupal 8 with NoSQL Frédéric G. MARAND drupal.org: fgm - irc/twitter: @osinet <MongoDB module maintainer />
  • 3. Topic?
    Simple idea: “No SQL”
    ● Alternative storage engines: KV, structures, document, graph, columnar…
    ● No standard, often no fixed schema, no joins, no FKs
    ● → Engine-specific application design
    ● Drupal architecture?
    Evolved idea: “Not Only SQL”
    ● For engines: add features equivalent to SQL
    ● For Drupal: combine SQL and NoSQL solutions
    ● Start from the default SQL-based architecture
    ● Offload services to non-SQL implementations
      ○ front-end caches, search engines, queue servers
      ○ specialized storage: cache, KV, lock, sessions…
    ● Often involves NoSQL as a cache for SQL
  • 4. NoSQL: do you need it?
    ● Start by observing the current state
      ○ Database queries → devel + webprofiler
      ○ Cache → heisencache (D7), webprofiler (D8)
      ○ Build cacheability → renderviz
    ● Observe behaviour
      ○ Core has built-in observability: DBTNG logging, cache decorators, QueryInterface for KV, config, content…
      ○ Monitoring module (400 sites) by Karan Poddar (Google Summer of Code) and MD Systems
      ○ Add your choice of time-series store (e.g. Prometheus, InfluxDB) and UI (e.g. Grafana)
      ○ ⇨ Use it!
    ● You want to see this when it happens ⟶
  • 5. “If you can’t measure it, you can’t improve it.”
    Peter Drucker
  • 6. Fixing an identified problem is cheaper than “trying things” Fix from acquired information ● It /MAY/ involve taking queries off the main DB to a NoSQL solution ● But poorly configured NoSQL may make it worse.
  • 7. “Just do it”?
    ● Drupal is built on SQL:
      ○ Views depends on it by default
      ○ Most sites rely on Views’ awareness of the data model
      ○ → Contrib often assumes SQL, injects @database
      ○ NoSQL support is doable, but rarely done
    ● Contrib support level is limited
      ○ Most NoSQL contrib modules were not ported from D7 to D8
      ○ Drupal shops have limited knowledge, except the biggest or specialized ones
      ○ Products may die… e.g. RethinkDB
    ● Professional support from publishers = costs, and availability varies
    ● Extra support needed = costs
    NoSQL == added build costs → balance gains vs costs
    Example case: RethinkDB
    At DevDays Milan 2016, after lots of work, Gizra’s @RoySegall demoed a Drupal 8 ORM/ODM for RethinkDB. Then, this happened...
  • 10. Caching ahead of real work
    Default situation with SQL
    ● Browser caching, limited
    ● Internal / dynamic page cache in the main SQL DB
    ● Needs a DB connection, a few SELECT queries
    ● Fetch cache from DB
    ● All data from main storage
    ● ⇨ Serve cached pages in about 20 msec
    All this work makes DoS-ing comparatively cheap.
    NoSQL improvements
    ● Add caching ahead of the site itself
      ○ Browser
        ■ Optimized browser caching (Cache-Control)
        ■ PWA: use browser local storage
      ○ CDN
        ■ CDN module (2k sites)
        ■ Akamai module (600 sites)
        ■ ⇨ Serve cached pages in about 15 msec (TTFB)
        ■ Web-scale
      ○ Varnish and other reverse proxies
        ■ ⇨ Serve cached pages in about 10 msec (TTFB)
        ■ Core support
        ■ Varnish Purger (3k sites)
    ● ⇨ Most requests will mean 0 SQL queries
      ○ DoS-ing becomes more costly, especially with a CDN
    ● Move page caches off the main DB: next section
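To make the “0 SQL queries” point concrete, here is a minimal sketch of what a reverse proxy or CDN edge does in front of Drupal. It assumes a simple time-to-live cache keyed by path; `ReverseProxyCache` and `cache_control_header` are hypothetical names, and real Varnish or CDN behaviour (Vary handling, purges, grace mode) is far richer. Only cache misses reach the Drupal backend and its database.

```python
import time
from typing import Callable, Dict, Tuple


def cache_control_header(max_age: int) -> str:
    """Header value letting browsers, CDNs and proxies reuse a page for max_age seconds."""
    return f"public, max-age={max_age}"


class ReverseProxyCache:
    """Toy stand-in for Varnish or a CDN edge: hits never reach Drupal or its database."""

    def __init__(self, backend: Callable[[str], str], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.backend = backend  # the expensive Drupal page build (with SQL queries)
        self.max_age = max_age
        self.clock = clock
        self.store: Dict[str, Tuple[float, str]] = {}
        self.backend_calls = 0

    def get(self, path: str) -> str:
        now = self.clock()
        cached = self.store.get(path)
        if cached is not None and now - cached[0] < self.max_age:
            return cached[1]  # hit: zero backend work, zero SQL queries
        self.backend_calls += 1  # miss or stale entry: one full Drupal request
        body = self.backend(path)
        self.store[path] = (now, body)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    proxy = ReverseProxyCache(lambda p: f"<html>{p}</html>", max_age=300,
                              clock=lambda: fake_now[0])
    for _ in range(1000):
        proxy.get("/node/1")
    print(proxy.backend_calls)  # → 1
```

The injectable clock is only there so the expiry logic can be exercised without waiting; the design point is that the backend call count stays flat however many requests hit the cache.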
  • 13. Storage: the “Big 3”
    The most active NoSQL suites for Drupal 8.x
    Redis
    ● Type: Key-value (structure server)
    ● Module
      ○ redis
    ● DB-Engines ranking:
      ○ #1 Key-value store
    ● Usage
      ○ Drupal 7: 10k sites
      ○ Drupal 8: 10k sites
    ● Supported by
      ○ Drupal 7: Makina Corpus
      ○ Drupal 8: MD Systems
    Memcached
    ● Type: Key-value
    ● Module
      ○ memcache
    ● DB-Engines ranking:
      ○ #3 Key-value store
      ○ #5 Key-value store (Hazelcast)
    ● Usage (memcache_storage)
      ○ Drupal 7: 32k (2k) sites
      ○ Drupal 8: 15k (800) sites
    ● Supported by:
      ○ Acquia
      ○ Tag1 Consulting
    MongoDB / CosmosDB
    ● Type: Document store
    ● Module
      ○ mongodb
    ● DB-Engines ranking:
      ○ #1 Document store (MongoDB)
      ○ #4 Document store (CosmosDB)
    ● Usage
      ○ Drupal 7: 300 sites
      ○ Drupal 8: 50 sites
    ● Supported by
      ○ OSInet
  • 14. Redis https://www.drupal.org/project/redis
    ● Driver support
      ○ phpredis and predis both supported
    ● Supported services
      ○ Driver adapter for custom code
      ○ Cache, including invalidations
      ○ Flood
      ○ Lock
      ○ Lock.Persistent
      ○ Queue
    ● CLI support
      ○ Not included
    ● Other modules
      ○ Redis Watchdog: logger + UI
    Recent events (from @Berdir)
    ● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release
    ● php-redis 5.0 broke the module; fixed in latest 8.x and 7.x releases
    ● Module users: please test and report!
  • 15. Performance / scalability: Redis https://www.drupal.org/project/redis
    ● Performance, single-server
      ○ Memory-only implementation
        ■ Usually among the fastest
        ■ Often the fastest
        ■ Even with concurrent access
      ○ Persistent
        ■ A bit slower, even with just RDB
        ■ Slower with AOF
    ● Persistence, single instance
      ○ RDB:
        ■ compact snapshots, shippable off-site
        ■ data loss: everything since the latest snapshot
      ○ AOF
        ■ journal fsync’ed up to the last second
        ■ less compact
    ● Fault-tolerance: Sentinel 2
      ○ master/slave supervision
      ○ automatic failover possible
      ○ observability support
    ● Scaling
      ○ Cluster-based sharding
      ○ Master → Slaves → Slaves
      ○ No strong consistency
      ○ Recommended config: 6 servers
    ● Cloud-native:
      ○ Redis Enterprise Cloud
      ○ AWS ElastiCache, Azure, Google Memorystore
      ○ many others
  • 16. Memcache https://www.drupal.org/project/memcache
    ● Driver support
      ○ memcache extension (limited availability)
      ○ memcached extension
      ○ PHP ≥ 5.6
    ● Supported services
      ○ Driver adapter for custom code
      ○ Cache, including invalidations
      ○ Lock
      ○ Lock.Persistent: removed in #2995907
      ○ Sessions: ported, then removed in 7.x
      ○ Monitoring UI
    ● CLI support
      ○ Not included: use core commands
    ● Other module: memcache_storage
      ○ Cache with core SQL invalidations
      ○ No lock
      ○ Monitoring UI
    Recent events (from @Berdir)
    ● Deadlock/race condition on node_list invalidations (#2966607) finally fixed in core 8.8.x with latest release, based on the Redis fix.
  • 17. Performance / scalability: Memcache https://www.drupal.org/project/memcache
    ● Performance, single-server
      ○ Memory-only implementation
        ■ Usually among the fastest
        ■ Slower than in-memory Redis
        ■ A bit faster than MySQL / MongoDB K/V
      ○ Persistence: extstore NVRAM support
        ■ No significant slowdown
        ■ Usually a bad idea (expectations)
        ■ https://memcached.org/blog/persistent-memory/
    ● Fault-tolerance
      ○ Module support for sharded clusters
      ○ Consistent hashing: avoids the thundering herd problem
      ○ Replication: with Hazelcast
    ● Scaling
      ○ Cluster-based sharding
      ○ Consistent hashing allows elastic scaling
      ○ Recommended config: 2 instances per cluster, 1 cluster per bin, with some exceptions: usually 10-20 instances per D8 site
      ○ Some bins must stay in core (form, update)
    ● Monitoring
      ○ Instant: module-provided memcache_admin
      ○ Evolved: phpmemcacheadmin
    ● Cloud-native
      ○ AWS ElastiCache
      ○ Azure Memcached Cloud
      ○ Google AppEngine Memcache
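Consistent hashing is what lets a memcache cluster grow or shrink without invalidating most of the cache at once. The sketch below shows the general technique with virtual nodes on a hash ring; it is not the memcache module’s or the PHP memcached extension’s code, and `HashRing`, the server names and the replica count are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(key: str) -> int:
    # MD5 used only for an even spread of points on the ring, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent hashing ring: each server owns many virtual points."""

    def __init__(self, servers: List[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: List[int] = []        # sorted virtual-node hashes
        self._owner: Dict[int, str] = {}  # virtual-node hash -> server
        for server in servers:
            self.add(server)

    def add(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._ring, point)
            self._owner[point] = server

    def remove(self, server: str) -> None:
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._ring.remove(point)
            del self._owner[point]

    def server_for(self, key: str) -> str:
        # First virtual node clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, _hash(key)) % len(self._ring)
        return self._owner[self._ring[idx]]


if __name__ == "__main__":
    ring = HashRing(["mc1", "mc2", "mc3"])
    print(ring.server_for("cache_render:node:1"))
```

When a fourth server joins, only the keys it takes over move (roughly a quarter of them), and all of them move to the new server. With naive `hash(key) % n`, going from 3 to 4 servers remaps about 75% of keys, which is exactly the mass-miss, thundering-herd situation the slide warns about.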
  • 18. Mainstream packages: MongoDB https://www.drupal.org/project/mongodb
    Drupal 7 features
    ● Driver support:
      ○ mongo extension for PHP 5.x
      ○ mongodb extension for PHP 7.x
      ○ MongoDB 2.x, 3.x
    ● Supported services
      ○ Driver adapter for custom code
      ○ Block
      ○ Cache
      ○ Path
      ○ Queue
    ● Unsupported services
      ○ Field storage
      ○ Lock
      ○ (Session)
      ○ Watchdog = logger + UI
    ● Other modules
      ○ Views driver: EFQ Views
    Drupal 8.x-2.x features
    ● Driver support
      ○ mongodb extension for PHP ≥ 7.1
      ○ mongodb/mongodb PHP library
      ○ MongoDB 3.x, 4.x
    ● Supported services
      ○ Driver adapter for custom code
      ○ Key-value (e.g. State)
      ○ Key-value expirable (e.g. *tempstore*, form_cache)
      ○ Watchdog = logger + UI
    ● CLI support
      ○ Drupal Console 1.9.x
      ○ Drush 9.x
    ● Other services
      ○ Entity/field storage
    ● Other modules
      ○ MongoDB Indexer
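The “key-value expirable” service listed above (tempstore, form cache) boils down to entries carrying an expiry time. Below is a minimal in-memory sketch of those semantics; `ExpirableKeyValue`, `set_with_expire` and the key names are hypothetical, loosely modelled on the idea of Drupal’s expirable key-value stores rather than on the mongodb module’s implementation. In MongoDB such data fits a TTL index, but since the server’s TTL monitor only purges periodically (about once a minute), readers should still check expiry themselves, as `get` does here.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ExpirableKeyValue:
    """One collection of key-value pairs, each with an absolute expiry time."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set_with_expire(self, key: str, value: Any, expire: float) -> None:
        """Store value for `expire` seconds from now."""
        self._data[key] = (value, self.clock() + expire)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Stale: drop it now instead of waiting for a background purge.
            del self._data[key]
            return default
        return value
```

A form-cache-style usage would be `store.set_with_expire("form_state:abc", state, 21600)`, after which `store.get("form_state:abc")` returns the state for six hours and `None` afterwards.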
  • 19. Exotic packages: MongoDB https://www.drupal.org/project/mongodb
    Drupal 8.x-1.x
    ● Driver support:
      ○ mongo extension for PHP 5.x
      ○ MongoDB 3.x
    ● Supported services
      ○ Complete NoSQL distribution
      ○ @database implementation
      ○ No SQL DBMS needed
      ○ Unpatched Drupal core
    ● Status
      ○ Sponsored by MongoDB, led by chx
      ○ Development halted before Drupal 8.0.0
    ● Performance:
      ○ About 4x faster than equivalent Drupal core
    Drumongous
    ● Driver support
      ○ mongo extension for PHP ≥ 5.6
      ○ MongoDB ≥ 3.6
    ● Supported services
      ○ Complete NoSQL distribution
      ○ @database implementation
    ● Source: patched Drupal core + module
      ○ https://gitlab.com/daffie/drumongous/
      ○ https://gitlab.com/daffie/mongodb
    ● CLI support
      ○ Drupal Console 1.x
      ○ Drush 9.x
    ● Status
      ○ Off-drupal.org
      ○ No issue queue
      ○ Active, led by daffie
  • 20. Performance / scalability: MongoDB https://www.drupal.org/project/mongodb
    Engine features
    ● Fault-tolerance
      ○ Built-in replication
      ○ Recommended config: 2+1 servers
    ● Scaling
      ○ Read-only replicas
      ○ Data-center awareness
      ○ Sharding
    ● Both supported by the existing module
    Monitoring / Ops
    ● In-module: logs
    ● Cloud: MongoDB Atlas, free monitoring, Ops Manager
    Cloud native
    ● Azure: CosmosDB
    ● MongoDB: Atlas
    ● mLab (née MongoLab)
    Production example
    Custom social network (2M users), migrated from MySQL:
    ● MySQL slow queries: -85%
    ● Uncached content build time: -98%
  • 22. Other NoSQL support modules
    NoSQL product    | Module                  | Wrapper | Features                                | 7.x | 8.x | Supported?
    Neo4J            | neo4j                   | Y       | -                                       | Y   | Y   | N
    RethinkDB        | rethinkdb               | Y       | ORM                                     | N   | Y   | ?
    CouchDB          | couchdb                 | Y       | Node export                             | Y   | N   | N
    Couchbase        | couchbase               | Y       | Logger + UI                             | Y   | N   | ?
    Elasticsearch    | elasticsearch_connector | Y       | Logger + improved UI, statistics, Views | Y   | N   | Y
    Elasticsearch    | elasticsearch_connector | Y       | Search API                              | Y   | Y   | Y
    AWS DynamoDB     | dynamodb                | N       | Cache                                   | Y   | N   | ?
    AWS SimpleDB     | awssdk, creeper         | Y       | -                                       | Y   | N   | ?
    Riak             | riak_field_storage      | Y       | Field storage, map-reduce               | Y   | N   | unsupported
    Apache Cassandra | cassandra               | Y       | Example app                             | 6.x | N   | unsupported
    Tokyo Tyrant     | node/844354             | N       | Logger + UI                             | 6.x | N   | unapproved
  • 24. NoSQL sessions?
    ● Why the weak/removed session support, especially for memcache?
      ○ Memcache session support is baked into the PHP memcached extension
      ○ It was popular in the Drupal 6.x days
      ○ It is popular in Symfony, even documented on symfony.com
      ○ So?
    ● Experience
      ○ Session data
        ■ Instance restart → all session data on that instance is lost
        ■ Bigger session data saturating the bin → evictions
        ■ LRU means vulnerability to DoS-ing and to admins being locked out via evictions
      ○ DB load is bigger in Drupal than in most frameworks
        ■ Session DB load is a smaller part of the load for us
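The LRU point deserves a demonstration: a cache bin evicts the least recently used entries when full, and it cannot tell a logged-in administrator’s session from junk. This minimal sketch assumes a fixed-capacity LRU store (`LruSessionStore` is a hypothetical name, not memcached’s actual slab allocator) and shows an attacker flooding it with throwaway sessions until the admin is logged out.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class LruSessionStore:
    """Fixed-capacity session store with least-recently-used eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()
        self.evicted = 0

    def write(self, sid: str, data: Dict[str, Any]) -> None:
        if sid in self._items:
            self._items.move_to_end(sid)  # refresh recency
        self._items[sid] = data
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used session
            self.evicted += 1

    def read(self, sid: str) -> Optional[Dict[str, Any]]:
        data = self._items.get(sid)
        if data is not None:
            self._items.move_to_end(sid)
        return data


if __name__ == "__main__":
    store = LruSessionStore(capacity=100)
    store.write("admin-sid", {"uid": 1})
    for i in range(100):  # cheap anonymous requests, each creating a session
        store.write(f"bot-{i}", {})
    print(store.read("admin-sid"))  # → None: the admin has been logged out
```

A database-backed session table has no such capacity-driven eviction, which is one reason the slide favours keeping sessions in SQL.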
  • 25. Logs
  • 26. Logs in core
    The “SQL” problem
    ● All sites really need some sort of logging feature
    ● Smaller sites only have a database
      ○ ⇨ Database Logging enabled by default
    ● Code is not perfect, throws notices, errors
    ● Modules are verbose, log debug info
    ● “Drupal is too slow, please help, agency is stuck”
      ○ ⇨ Audit: 1500 inserts/min in watchdog table
      ○ ⇨ Other audits: watchdog > 99% of site size
    ● DBLog inserts compete with content work
    ● Owner disables logging
      ○ ⇨ now misses essential info
    ● Owner does not disable logging
      ○ ⇨ now can’t find essential info buried in noise
    The core NoSQL module
    ● Core has been bundling a syslog client since 6.0
    ● Decouple logs from DB load
      ○ ⇨ No more SQL logs workload
    ● But where do they go?
      ○ ⇨ Needs OS-level configuration
    ● How are logs cleaned?
      ○ ⇨ Needs OS-level configuration
    ● Where is the UI?
      ○ ⇨ Needs extra tools
    ● Solutions?
      ○ D7 has a logging hook
      ○ D8 has PSR-3 standard logging
      ○ ⇨ Contributions
  • 27. NoSQL on-site logs: (mongodb|redis)_watchdog
    ● mongodb_watchdog
      ○ Logger service
        ■ Standard Drupal PSR-3 logs backend
        ■ Pre-storage filtering
        ■ Uses capped collections: auto-rotation, no ops
        ■ Dedicated database: zero contention
        ■ Per-request event tracing
      ○ Improved logs UI
        ■ Based on core UI
        ■ Groups recurring events on a single line
        ■ Details page for occurrences
        ■ Per-HTTP-request log page
      ○ Most common reason to deploy MongoDB on D8
    ● redis_watchdog
      ○ Logger service
      ○ Logs UI based on core UI
      ○ Usage: 1 site
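Two of the mongodb_watchdog ideas above, capped storage and grouping recurring events, can be illustrated without MongoDB. In this sketch a `deque` with `maxlen` plays the role of a capped collection (oldest entries silently drop out, so no cleanup job is needed), and events are grouped by channel plus unformatted message template. `CappedLog` and its methods are hypothetical; they only mimic the behaviour, not the module’s schema.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple

Event = Tuple[str, str, Dict[str, str]]  # (channel, message template, placeholders)


class CappedLog:
    """Bounded event log: appending past max_events discards the oldest event."""

    def __init__(self, max_events: int) -> None:
        self.events: Deque[Event] = deque(maxlen=max_events)

    def log(self, channel: str, template: str, **placeholders: str) -> None:
        # Keep the template separate from its values so repeats can be grouped.
        self.events.append((channel, template, placeholders))

    def grouped(self) -> Counter:
        """One line per (channel, template) with its occurrence count."""
        return Counter((channel, template) for channel, template, _ in self.events)


if __name__ == "__main__":
    log = CappedLog(max_events=3)
    for i in range(4):
        log.log("php", "Notice in @file", file=f"f{i}.php")
    log.log("user", "Login by @name", name="admin")
    for (channel, template), count in log.grouped().items():
        print(channel, template, count)
```

Grouping on the template rather than the rendered message is what lets thousands of “Notice in @file” entries collapse into one overview line.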
  • 28. Off-site logs: BELK stack
    BELK stack
    ● Beats (typically Filebeat)
    ● Elasticsearch
    ● Logstash
    ● Kibana
    Operation
    ● Drupal syslog → local syslog server → local logs
    ● DON’T log straight from Drupal
    ● Filebeat pulls logs, sends them to Logstash
    ● Logstash massages logs, sends them to ES
    ● ES provides storage, indexing
    ● Kibana provides UI
    Deployment
    ● Hosted with site
    ● SaaS: Loggly, Logz.io, ...
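The “Logstash massages logs” step mostly means turning Drupal’s flat syslog payload into named fields that Elasticsearch can index. This sketch assumes the syslog module’s default pipe-separated format (`!base_url|!timestamp|!type|!ip|!request_uri|!referer|!uid|!link|!message`); check the `format` setting on your own site, and note the syslog header (date, host, identity) is assumed to be stripped already. `parse_drupal_syslog` is a hypothetical helper, not a Logstash filter.

```python
from typing import Dict

# Assumed field order: Drupal syslog module's default "format" setting.
FIELDS = ("base_url", "timestamp", "type", "ip", "request_uri",
          "referer", "uid", "link", "message")


def parse_drupal_syslog(payload: str) -> Dict[str, str]:
    """Split one Drupal syslog payload (syslog header already removed) into fields."""
    # maxsplit keeps any '|' inside the free-text message intact
    parts = payload.split("|", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    line = ("https://example.com|1572512400|php|192.0.2.10|"
            "https://example.com/node/1||0||Notice: undefined index|foo")
    print(parse_drupal_syslog(line)["message"])  # → Notice: undefined index|foo
```

In a real BELK pipeline the same split would typically be expressed as a Logstash `dissect` or `grok` pattern; the Python version only shows which fields come out.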
  • 29. Off-site logs: Graylog
    Graylog
    ● Dual server: ES (logs, search) + MongoDB (metadata, configuration)
    ● Includes Grok log handling
    ● Accepts syslog or GELF input
    ● Modeled on Splunk
    Operation
    ● Drupal syslog → local syslog server → local logs
    ● DON’T log straight from Drupal via monolog_gelf
    ● Local syslog forwards to Graylog2
    ● Graylog2 massages logs, sends them to ES
    ● ES provides storage, indexing
    ● Graylog2 provides UI
    Deployment
    ● Hosted with site
    ● SaaS: StackHero
  • 30. Off-site logs: BELK vs Graylog design (diagram; source: Graylog)
  • 31. Non-SQL logs: do I need them?
    ● Small site, little traffic, single webmaster: just use dblog
    ● Any other site: upgrade to something else
      ○ Hosting company provides a logs dashboard (e.g. Splunk)? Use it
        ■ syslog into their stack, via local syslog then pull
      ○ Have an internal ops team?
        ■ syslog into internal BELK or Graylog
      ○ No ops expertise? No time to learn Kibana/Graylog? Hosting company doesn’t provide real-time logs access?
        ■ Want to minimize costs and/or keep logs on-site?
          ● use mongodb_watchdog
        ■ Otherwise, use a SaaS logs vendor
          ● Datadog, Scalyr, Loggly or Papertrail (SolarWinds), Logz.io...
  • 33. Queue API services
    ● Core: mostly for Batch API
    ● General D8 use: proxy invalidation
      ○ Invalidation queues
    ● Commerce sites
      ○ ERP links
      ○ Third-party catalog/inventory
    ● Media sites
      ○ Real-time news feed ingestion
      ○ Deferred derived media generation
  • 34. Queue modules: SQL and NoSQL
    SQL
    ● Core bundled: queue.database service
      ○ used by all Drupal sites
    ● advancedqueue project
      ○ created for Drupal Commerce projects
      ○ used by Commerce 2.x
    NoSQL: storage-based
    ● Core bundled: queue.memory service
    ● Redis:
      ○ 7.x: redis_queue project
      ○ 8.x: redis project
    ● MongoDB
      ○ 7.x: mongodb project
    NoSQL: message servers
    ● Beanstalkd
      ○ 6.x/7.x: popular, used by drupal.org itself
      ○ 8.x: complete port, but no users (?)
    ● RabbitMQ
      ○ 7.x: little used; 8.x: most popular
      ○ Users include a public TV broadcaster and a major French e-tailer
      ○ Hardened by production use at these levels
    ● AWS SQS
      ○ 7.x: some use, but no 8.x port
    ● Apache Kafka
      ○ 8.x only
      ○ Created for the largest French retail chain
    ● Other queue services
      ○ Less used: Gearman, IronMQ, 0MQ
      ○ No 8.x versions
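Whatever the backend (database, memory, Redis, a message server), a Drupal-style queue follows the same cycle: create an item, a worker claims it under a time-limited lease, then deletes it on success or releases it on failure; if the worker dies, the lease expires and the item becomes claimable again. The sketch below models that cycle in memory. `LeasedMemoryQueue` and its snake_case methods are hypothetical, modelled loosely on the create/claim/delete/release cycle of Drupal’s Queue API rather than copied from it.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class QueueItem:
    item_id: int
    data: Any
    lease_expires: Optional[float] = None  # None: not claimed by any worker


class LeasedMemoryQueue:
    """In-memory FIFO queue where claims are time-limited leases."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self._items: Dict[int, QueueItem] = {}  # insertion-ordered, hence FIFO
        self._ids = itertools.count(1)

    def create_item(self, data: Any) -> int:
        item_id = next(self._ids)
        self._items[item_id] = QueueItem(item_id, data)
        return item_id

    def number_of_items(self) -> int:
        return len(self._items)

    def claim_item(self, lease_time: float = 30.0) -> Optional[QueueItem]:
        now = self.clock()
        for item in self._items.values():
            # Claimable if never claimed, or if the previous worker's lease ran out.
            if item.lease_expires is None or item.lease_expires <= now:
                item.lease_expires = now + lease_time
                return item
        return None

    def delete_item(self, item: QueueItem) -> None:
        """Call once the work has succeeded."""
        self._items.pop(item.item_id, None)

    def release_item(self, item: QueueItem) -> None:
        """Hand the item back immediately, e.g. after a recoverable failure."""
        if item.item_id in self._items:
            self._items[item.item_id].lease_expires = None
```

Message servers such as RabbitMQ or Beanstalkd implement the same idea natively (acknowledgements, reservations with timeouts), which is why they can stand in for the database queue without changing calling code.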
  • 35. Queue API modules by usage D7/D8
  • 36. NoSQL queue: do I need it?
    ● Mainstream Drupal site without Varnish / CDN
      ○ probably not, though advancedqueue is still a nice improvement
    ● Content site with a lot of generated content, Varnish and/or CDN
      ○ consider using Redis (D8), MongoDB (D7), RabbitMQ (D8)
      ○ or use Kafka (D8) if you need to (e.g. corporate mandate)
    ● Drupal Commerce standalone
      ○ advancedqueue is normally enough
    ● Site generating lots of dynamic media (image, video, sound), or ingesting fast feeds (> 1 item/sec)
      ○ needs a dedicated message server
  • 37. NoSQL queue: which should I use?
    ● The one your ops team supports best
      ○ Content management has a low event rate (< 1 event/sec)
    ● Kafka-class servers are for high-throughput queues
      ○ Think LinkedIn, Twitter, Netflix, Spotify, Airbnb, PayPal…
    ● RabbitMQ is solid
      ○ usually well known and monitored
      ○ D8 driver used for years on Cyber Monday, Black Friday, Olympic Games...
    ● Beanstalkd is simple
      ○ It “just works”
      ○ Good first queue when upgrading from the DB
  • 39. SQL-based search
    ● Search has long been the weakest core feature in Drupal
      ○ In spite of improvements with each version
    ● Relevant issues
      ○ Good recall, but bad precision
      ○ Multilingual support, but no language awareness
      ○ Low awareness of language inflections → preprocessing API
      ○ Limited ability to handle Asian (CJK) languages
      ○ Slow updates, cron-based pull mode
      ○ Indexing costs impacting site users
      ○ Indexed search for content only → search plugins
      ○ Other entity types limited to unindexed search by default
      ○ No support for restricted content search
    ● Useful complements: porterstemmer, snowball_stemmer
    ● SQL alternative: Search API database search, with similar limitations
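The “inflections → preprocessing API” item is easiest to see on a tiny inverted index: without preprocessing, a query for “index” misses documents that only say “indexing” or “indexes”. This sketch plugs a preprocessing function into indexing and querying alike. `toy_stem` is a deliberately crude suffix stripper labelled as such; it is NOT the Porter or Snowball algorithm that porterstemmer and snowball_stemmer provide, and `InvertedIndex` is a hypothetical class.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Set


def tokenize(text: str) -> Iterable[str]:
    return re.findall(r"\w+", text.lower())


def toy_stem(word: str) -> str:
    """Crude English suffix stripping for illustration; NOT Porter/Snowball."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


class InvertedIndex:
    def __init__(self, preprocess: Callable[[str], str] = lambda w: w) -> None:
        self.preprocess = preprocess  # applied identically at index and query time
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for token in tokenize(text):
            self.postings[self.preprocess(token)].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """AND query: ids of documents containing every preprocessed query term."""
        terms = [self.preprocess(t) for t in tokenize(query)]
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


if __name__ == "__main__":
    docs = {1: "Indexing nodes with cron", 2: "Search indexes are rebuilt", 3: "Unrelated page"}
    plain, stemmed = InvertedIndex(), InvertedIndex(preprocess=toy_stem)
    for doc_id, text in docs.items():
        plain.add(doc_id, text)
        stemmed.add(doc_id, text)
    print(plain.search("index"), stemmed.search("index"))  # → set() {1, 2}
```

The same mechanism also shows the precision trade-off: aggressive stemming merges words that should stay distinct, so real stemmers are language-specific.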
  • 40. NoSQL search solutions
    Cloud-based / SaaS
    ● SaaS offerings:
      ○ Algolia
      ○ Google CSE
    ● Drupal hosting offerings (alphabetical order):
      ○ Acquia Search SOLR
      ○ Amazee.io SOLR
      ○ Pantheon SOLR
      ○ Platform.sh Elasticsearch / SOLR
    On-site / near-site
    ● Core support: Search API (14% of D7, 16% of D8 sites)
    ● Standard solution:
      ○ Local SOLR
      ○ Multilingual search supported
    ● Alternatives:
      ○ Elasticsearch → heart of the BELK suite
      ○ Xunsearch: Xapian for Chinese
      ○ Xapian (8.x dev)
    ● D7 backends not on D8:
      ○ Elasticsearch via Elastica
      ○ Google Search Appliance: killed by Google
      ○ MongoDB via MongoDB module
      ○ Sphinx
    ● Proprietary search engine publishers have custom, unpublished, non-GPL (!) Drupal modules
  • 41. SQL and NoSQL search solutions by usage in D8
  • 42. Non-core search: which should I use?
    ● Any content deserves search
    ● SQL
      ○ Core for small content quantities
      ○ Search API DB backend used by drupal.org
    ● SaaS
      ○ For entry level: Algolia/Google = 0 recurring cost, near 0 set-up cost
      ○ Both perform better than core, but are proprietary (non-free)
    ● Drupal PaaS offerings have managed ES/SOLR
    ● Others: cost equilibrium
      ○ ES/SOLR have set-up and recurring costs of ownership (server load)
      ○ SaaS has lower set-up costs, but recurring fees
      ○ Core search has an opportunity cost
  • 44. Best current practice: NoSQL in general
    Drupal 8 core tries hard to be SQL-agnostic
    ● Every use of the DB goes through @database
      ○ So anything able to pass for a SQL engine may be used
      ○ The mongodb_dbtng, mongodb 8.x-1.x, and Drumongous projects do just that
    ● Even Views has a query plugin. Project efq_views (7.x, 8.x) supports NoSQL engines that way
    ● No service except “storage” services should receive the database connection
      ○ Write a storage service for your data, defining its interface
      ○ Write a SQL provider implementing it, receiving @database
      ○ Tag the service as “backend_overridable”
      ○ Core mostly does it; custom code should always do it.
    ● References:
      ○ https://www.drupal.org/project/drupal/issues/2302617
      ○ https://www.drupal.org/node/2306083
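The storage-service rule above can be sketched in a few lines. This is an analogy in Python, not Drupal’s container API: `VisitCounterStorage`, its two backends, `BACKENDS` and the `visit_counter_backend` setting are all hypothetical. Only the SQL provider ever touches a database connection; the business service depends on the interface, so a NoSQL backend can be swapped in through configuration, which is the role the “backend_overridable” tag plays in Drupal.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Callable, Dict


class VisitCounterStorage(ABC):
    """Hypothetical storage interface: business code depends on this, not on a DB."""

    @abstractmethod
    def increment(self, path: str) -> int:
        """Add one visit for path and return the new total."""

    @abstractmethod
    def count(self, path: str) -> int:
        """Return the number of visits recorded for path (0 if none)."""


class SqlVisitCounterStorage(VisitCounterStorage):
    """SQL provider: the only class that receives a database connection."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS visits (path TEXT PRIMARY KEY, n INTEGER NOT NULL)")

    def increment(self, path: str) -> int:
        with self.db:  # commit both statements as one transaction
            self.db.execute("INSERT OR IGNORE INTO visits (path, n) VALUES (?, 0)", (path,))
            self.db.execute("UPDATE visits SET n = n + 1 WHERE path = ?", (path,))
        return self.count(path)

    def count(self, path: str) -> int:
        row = self.db.execute("SELECT n FROM visits WHERE path = ?", (path,)).fetchone()
        return row[0] if row else 0


class MemoryVisitCounterStorage(VisitCounterStorage):
    """Alternative backend standing in for a NoSQL store (e.g. a Redis hash)."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def increment(self, path: str) -> int:
        self._counts[path] = self._counts.get(path, 0) + 1
        return self._counts[path]

    def count(self, path: str) -> int:
        return self._counts.get(path, 0)


# Backend picked by configuration, in the spirit of a "backend_overridable" service.
BACKENDS: Dict[str, Callable[[], VisitCounterStorage]] = {
    "sql": lambda: SqlVisitCounterStorage(sqlite3.connect(":memory:")),
    "memory": MemoryVisitCounterStorage,
}


class VisitTracker:
    """Business service: receives the storage service, never a database."""

    def __init__(self, storage: VisitCounterStorage) -> None:
        self.storage = storage

    def hit(self, path: str) -> int:
        return self.storage.increment(path)


def build_tracker(settings: Dict[str, str]) -> VisitTracker:
    return VisitTracker(BACKENDS[settings.get("visit_counter_backend", "sql")]())
```

`build_tracker({"visit_counter_backend": "memory"})` yields a tracker that behaves identically to the SQL one, without any change to `VisitTracker`.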
  • 45. Best current practice: MongoDB
    ● Connecting to MongoDB with 8.x-2.x
      ○ Using multiple databases? Use @mongodb.client_factory
        ■ The client you get is a standard mongodb/mongodb Client instance
        ■ You have to handle topology
      ○ Using a single database? Use @mongodb.database_factory
        ■ The database you get is a standard mongodb/mongodb Database instance
        ■ Your DB topology is now configurable in settings
      ○ You probably don’t want to use Doctrine ODM, especially when interacting with Drupal data
    ● Designing a custom schema
      ○ Start from the queries, not from some canonicalization
      ○ For large-scale data sets, consider:
        ■ Splitting live and archive data for sharding
        ■ Having a write DB and a read DB, with a CLI-based service between them: read about CQRS
      ○ Never use a monotonically increasing key for sharding
      ○ In most cases, joined data in lists doesn’t need to be as up to date as in primary views
        ■ Embed “light” versions of dependent objects for lists; only use $lookup and DBRef joins on the full datum view
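Why avoid a monotonically increasing shard key? With range-based sharding, each shard owns a contiguous key range, and the shard holding the top range is open-ended, so every new key (auto-increment id, timestamp, ObjectId-like value) lands on that one shard. Hashing the key scatters neighbouring values. The sketch below is a simplified model (`range_shard`, `hashed_shard` and the boundaries are illustrative assumptions; MongoDB’s real chunk splitting and balancing are more involved, but the insert hot spot is the same).

```python
import hashlib
from collections import Counter
from typing import List


def range_shard(key: int, upper_bounds: List[int]) -> int:
    """Range sharding: shard i holds keys below upper_bounds[i]; the last shard is open-ended."""
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)


def hashed_shard(key: int, n_shards: int) -> int:
    """Hashed sharding: neighbouring keys scatter across shards."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


if __name__ == "__main__":
    # Existing keys 0..2999 are spread over 3 shards; new inserts keep increasing.
    new_keys = range(3000, 4000)
    print(Counter(range_shard(k, [1000, 2000]) for k in new_keys))  # every insert hits shard 2
    print(Counter(hashed_shard(k, 3) for k in new_keys))            # roughly a third each
```

The trade-off: hashed keys spread writes but make range queries on the key fan out to all shards, which is another reason to start the schema design from the queries.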
  • 46. “Contribution is its own reward.”
    There, I said it!
  • 48. Join us for contribution opportunities
    Thursday, October 31, 2019
    ● Mentored Contribution: 9:00-18:00, Room: Europe Foyer 2
    ● First Time Contributor Workshop: 9:00-14:00, Room: Diamond Lounge
    ● General Contribution: 9:00-18:00, Room: Europe Foyer 2
    #DrupalContributions
  • 49. What did you think?
    Locate this session at the DrupalCon Amsterdam website: https://drupal.kuoni-congress.info/2019/program/
    Take the survey! https://www.surveymonkey.com/r/DrupalConAmsterdam