BUSINESS
DASHBOARDS
using Bonobo, Airflow and Grafana
makersquad.fr
makersquad.fr
Romain Dorgueil

romain@makersquad.fr


Maker ~0 years
Containers ~5 years
Python ~10 years
Web ~15 years
Linux ~20 years
Code ~25 years
rdorgueil
Intro. Product
1. Plan.
2. Implement.
3. Visualize.
4. Monitor.
5. Iterate.
Outro. References, Pointers, Sidekick
Content
DISCLAIMERS
If you build a product, do your own research.

Take time to learn and understand tools you consider.





I don’t know nothing, and I recommend nothing.

I assume things and change my mind when I hit a wall.

I’m the creator and main developer of Bonobo ETL.

I’ll try to be objective, but there is a bias here.
PRODUCT
GET https://aprc.it/api/800x600/ep2018.europython.eu/
January 2009
timedelta(years=9)
February 2018
March 2018
April 2018
June 2018
July 2018
not expected
UNDER THE
HOOD
Load Balancer
TCP / L4
Janitor
(asyncio)
Spider
(asyncio)
“Events” Message Queue
Database
(postgres)
“Orders” Message Queue
Object
Storage
Redis
AMQP
HTTP
“CRAWL”
“CREATED”
AMQP/RabbitMQ
HTTP/HTTPS
SQL/Storage
Reverse Proxy
HTTP2 / L7
Website
(django)
APIServer
(tornado)
“MISS”
Local
Cache
Load Balancer
TCP / L4
HTTP
Reverse Proxy
HTTP2 / L7
HTTP/HTTPS
SQL/Storage
Prometheus
AlertManager
Grafana
Weblate
MANAGEMENT
SERVICES
Google Analytics
EXTERNAL SERVICES
Stripe
Slack
Sentry
Mailgun
Drift
MixMax
…
Prometheus
Kubernetes
RabbitMQ
Redis
PostgreSQL
NGINX + VTS
Apercite
…
PROMETHEUS EXPORTERS
Database
(postgres) +
PLAN
- Peter Drucker (?)
« If you can’t measure it,
you can’t improve it. »
« What gets measured
gets improved. »
- Peter Drucker (?)
- Take your time to choose metrics wisely.
- Cause vs Effect.
- Less is More.
- One at a time. Or one by team.
- There is not one answer to this question.
Planning
Vanity metrics will
waste your time
- What business are you in?
- What stage are you at?
- Start with a framework.
- You may build your own, later.
Planning
Pirate Metrics
Pirate Metrics (based on Ash Maurya's version)
Lean Analytics (book by Alistair Croll & Benjamin Yoskovitz)
Plan A
Plan A
- What business? Software as a Service
- What stage? Empathy / Stickiness
- What metric matters?
- Rate from acquisition to activation.
- QOS (both for display and to measure improvements).
IMPLEMENT
Idea
DataSources (anything) → Aggregated (dims, metrics) → Database (metric → value)
Model
Metric: (id) → name
HourlyValue: (metric, date, hour) → value
DailyValue: (metric, date) → value
One Metric has many HourlyValues and many DailyValues (1 → n).
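A minimal SQLAlchemy sketch of that schema; table names, column types and the declarative import are assumptions, not the talk's actual models:

import sqlalchemy as sa
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Metric(Base):
    __tablename__ = 'metric'
    id = sa.Column(sa.Integer, primary_key=True)
    name = sa.Column(sa.String(255), unique=True, nullable=False)

class HourlyValue(Base):
    __tablename__ = 'hourly_value'
    # One Metric has many HourlyValues; (metric, date, hour) identifies a value.
    metric_id = sa.Column(sa.ForeignKey('metric.id'), primary_key=True)
    date = sa.Column(sa.Date, primary_key=True)
    hour = sa.Column(sa.SmallInteger, primary_key=True)
    value = sa.Column(sa.Float, nullable=False)

class DailyValue(Base):
    __tablename__ = 'daily_value'
    # One Metric has many DailyValues; (metric, date) identifies a value.
    metric_id = sa.Column(sa.ForeignKey('metric.id'), primary_key=True)
    date = sa.Column(sa.Date, primary_key=True)
    value = sa.Column(sa.Float, nullable=False)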
Quick to write.
Not the best.
Keywords to read more: Star and Snowflake Schemas
Bonobo
Extract Transform Load
Select('''
SELECT *
FROM …
WHERE …
''')
def qualify(row):
yield (
row,
'active' if …
else 'inactive'
)
Join('''
SELECT count(*)
FROM …
WHERE uid = %(id)s
''')
def report(row):
send_email(
render(
'email.html', row
)
)
Bonobo
- Independent threads.
- Data is passed first in, first out.
- Supports any kind of directed acyclic graphs.
- Standard Python callables and iterators.
- Getting started still fits in here (ok, barely)
$ pip install bonobo
$ bonobo init somejob.py
$ python somejob.py
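To make the bullet points above concrete, here is a minimal sketch of a Bonobo job built from plain callables; the three functions are illustrative, not taken from the talk:

import bonobo

def extract():
    # Any generator or iterable can act as a source node.
    yield from ('alpha', 'beta', 'gamma')

def transform(row):
    return row.upper()

def load(row):
    print(row)

def get_graph(**options):
    # Nodes run in independent threads; rows flow through first in, first out.
    return bonobo.Graph(extract, transform, load)

if __name__ == '__main__':
    bonobo.run(get_graph())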
Bonobo
Let’s write our jobs.
Extract
from bonobo.config import use_context, Service
from bonobo_sqlalchemy.readers import Select
@use_context
class ObjectCountsReader(Select):
engine = Service('website.engine')
query = '''
SELECT count(%(0)s.id) AS cnt
FROM %(0)s
'''
output_fields = ['dims', 'metrics']
def formatter(self, input_row, row):
now = datetime.datetime.now()
return ({
'date': now.date(),
'hour': now.hour,
}, {
'objects.{}.count'.format(input_row[1]): row['cnt']
})
… counts from website’s database
Extract
TABLES_METRICS = {
AsIs('apercite_account_user'): 'users',
AsIs('apercite_account_userprofile'): 'profiles',
AsIs('apercite_account_apikey'): 'apikeys',
}
def get_readers():
return [
TABLES_METRICS.items(),
ObjectCountsReader(),
]
… counts from website’s database
Normalize
bonobo.SetFields(['dims', 'metrics'])
All data should look the same
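Concretely, once SetFields(['dims', 'metrics']) has run, every downstream node receives the same two-part rows. A made-up example of one such row (values invented):

import datetime

row = (
    {'date': datetime.date(2018, 7, 25), 'hour': 14},  # dims
    {'objects.users.count': 42},                        # metrics
)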
Load
class AnalyticsWriter(InsertOrUpdate):
dims = Option(required=True)
filter = Option(required=True)
@property
def discriminant(self):
return ('metric_id', *self.dims)
def get_or_create_metrics(self, context, connection, metrics):
…
def __call__(self, connection, table, buffer, context, row, engine):
dims, metrics = row
if not self.filter(dims, metrics): return
# Get database rows for metric objects.
db_metrics_ids = self.get_or_create_metrics(context, connection, metrics)
# Insert or update values.
for metric, value in metrics.items():
yield from self._put(table, connection, buffer, {
'metric_id': db_metrics_ids[metric],
**{dim: dims[dim] for dim in self.dims},
'value': value,
})
Compose
def get_graph():
normalize = bonobo.SetFields(['dims', 'metrics'])
graph = bonobo.Graph(*get_readers(), normalize)
graph.add_chain(
AnalyticsWriter(
table_name=HourlyValue.__tablename__,
dims=('date', 'hour',),
filter=lambda dims, metrics: 'hour' in dims,
name='Hourly',
), _input=normalize
)
graph.add_chain(
AnalyticsWriter(
table_name=DailyValue.__tablename__,
dims=('date',),
filter=lambda dims, metrics: 'hour' not in dims,
name='Daily',
),
_input=normalize
)
return graph
Configure
def get_services():
return {
'sqlalchemy.engine': EventsDatabase().create_engine(),
'website.engine': WebsiteDatabase().create_engine(),
}
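Running the graph then just means passing those services along; a minimal sketch, assuming the get_graph and get_services shown above live in the same job module:

import bonobo

if __name__ == '__main__':
    # Keys returned by get_services() are injected wherever a node declares
    # Service('sqlalchemy.engine'), Service('website.engine'), and so on.
    bonobo.run(get_graph(), services=get_services())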
bonobo inspect --graph job.py | dot -o graph.png -T png
Inspect
Run
$ python -m apercite.analytics read objects --write
- dict_items in=1 out=3 [done]
- ObjectCountsReader in=3 out=3 [done]
- SetFields(['dims', 'metrics']) in=3 out=3 [done]
- HourlyAnalyticsWriter in=3 out=3 [done]
- DailyAnalyticsWriter in=3 [done]
Got it.
Let’s add readers.
We’ll run through, you’ll have the code.
Google Analytics
@use('google_analytics')
def read_analytics(google_analytics):
reports = google_analytics.reports().batchGet(
body={…}
).execute().get('reports', [])
for report in reports:
dimensions = report['columnHeader']['dimensions']
metrics = report['columnHeader']['metricHeader']['metricHeaderEntries']
rows = report['data']['rows']
for row in rows:
dim_values = zip(dimensions, row['dimensions'])
yield (
{
GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [dim])[0]:
GOOGLE_ANALYTICS_DIMENSIONS.get(dim, [None, IDENTITY])[1](val)
for dim, val in dim_values
},
{
GOOGLE_ANALYTICS_METRICS.get(metric['name'], metric['name']):
GOOGLE_ANALYTICS_TYPES[metric['type']](value)
for metric, value in zip(metrics, row['metrics'][0]['values'])
},
)
Prometheus
class PrometheusReader(Configurable):
http = Service('http')
endpoint = 'http://{}:{}/api/v1'.format(PROMETHEUS_HOST, PROMETHEUS_PORT)
queries = […]
def __call__(self, *, http):
start_at, end_at = self.get_timerange()
for query in self.queries:
for result in http.get(…).json().get('data', {}).get('result', []):
metric = result.get('metric', {})
for ts, val in result.get('values', []):
name = query.target.format(**metric)
_date, _hour = …
yield {
'date': _date,
'hour': _hour,
}, {
name: float(val)
}
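For reference, a self-contained sketch of the kind of range query such a reader sends to Prometheus's /api/v1/query_range endpoint; host, port and the PromQL expression below are placeholders:

import datetime
import requests

end = datetime.datetime.utcnow()
start = end - datetime.timedelta(hours=1)
response = requests.get(
    'http://localhost:9090/api/v1/query_range',
    params={
        'query': 'sum(rate(http_requests_total[5m]))',  # placeholder PromQL
        'start': start.timestamp(),
        'end': end.timestamp(),
        'step': '300',  # resolution, in seconds
    },
)
for result in response.json().get('data', {}).get('result', []):
    print(result['metric'], result['values'][:2])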
Spider counts
class SpidersReader(Select):
kwargs = Option()
output_fields = ['row']
@property
def query(self):
return '''
SELECT spider.value AS name,
spider.created_at AS created_at,
spider_status.attributes AS attributes,
spider_status.created_at AS updated_at
FROM
spider
JOIN …
WHERE spider_status.created_at > %(now)s
ORDER BY spider_status.created_at DESC
'''
def formatter(self, input_row, row):
return (row, )
Spider counts
def spider_reducer(self, left, right):
result = dict(left)
result['spider.total'] += len(right.attributes)
for worker in right.attributes:
if 'stage' in worker:
result['spider.active'] += 1
else:
result['spider.idle'] += 1
return result
Spider counts
now = datetime.datetime.utcnow() - datetime.timedelta(minutes=30)
def get_readers():
return (
SpidersReader(kwargs={'now': now}),
Reduce(spider_reducer, initializer={
'spider.idle': 0,
'spider.active': 0,
'spider.total': 0,
}),
(lambda x: ({'date': now.date(), 'hour': now.hour}, x))
)
You got the idea.
Inspect
We can generate ETL graphs with all readers or only a few.
Run
$ python -m apercite.analytics read all --write
- read_analytics in=1 out=91 [done]
- EventsReader in=1 out=27 [done]
- EventsTimingsReader in=1 out=2039 [done]
- group_timings in=2039 out=24 [done]
- format_timings_for_metrics in=24 out=24 [done]
- SpidersReader in=1 out=1 [done]
- Reduce in=1 out=1 [done]
- <lambda> in=1 out=1 [done]
- PrometheusReader in=1 out=3274 [done]
- dict_items in=1 out=3 [done]
- ObjectCountsReader in=3 out=3 [done]
- SetFields(['dims', 'metrics']) in=3420 out=3420 [done]
- HourlyAnalyticsWriter in=3420 out=3562 [done]
- DailyAnalyticsWriter in=3420 out=182 [done]
Easy to build.
Easy to add or replace parts.
Easy to run.
Told ya, slight bias.
VISUALIZE
Grafana
Analytics & Monitoring
Dashboards
Quality of Service
Quality of Service
Quality of Service
Public Dashboards
Acquisition Rate
+
User Counts New Sessions
Acquisition Rate
We’re just getting started.
MONITOR
Iteration 0
- Cron job runs everything every 30 minutes.
- No way to know if something fails.
- Expensive tasks.
- Hard to run manually.
I’ll handle that.
Monitoring?
Airflow




«Airflow is a platform to
programmatically author,
schedule and monitor
workflows.»
- Official docs
Airflow
- Created by Airbnb, joined Apache incubation.
- Schedules & monitors jobs.
- Distribute workloads through Celery, Dask, K8S…
- Can run anything, not just Python.
Airflow
Webserver Scheduler
Worker
Metadata
Worker
Worker
Worker
Simplified to show high-level concept.

Depends on executor (celery, dask, k8s, local, sequential …)
DAGs
import shlex
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
def _get_bash_command(*args, module='apercite.analytics'):
return '(cd /usr/local/apercite; /usr/local/env/bin/python -m {} {})'.format(
module, ' '.join(map(shlex.quote, args)),
)
def build_dag(name, *args, schedule_interval='@hourly'):
dag = DAG(
name,
schedule_interval=schedule_interval,
default_args=default_args,
catchup=False,
)
dag >> BashOperator(
dag=dag,
task_id=args[0],
bash_command=_get_bash_command(*args),
env=env,
)
return dag
DAGs
# Build datasource-to-metrics-db related dags.
for source in ('google-analytics', 'events', 'events-timings', 'spiders',
'prometheus', 'objects'):
name = 'apercite.analytics.' + source.replace('-', '_')
globals()[name] = build_dag(name, 'read', source, '--write')
# Cleanup dag.
name = 'apercite.analytics.cleanup'
globals()[name] = build_dag(name, 'clean', 'all', schedule_interval='@daily')
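A note on the globals()[name] assignments: Airflow discovers DAGs by scanning module-level objects in the files of its DAGs folder, so each generated DAG has to be bound to a module-level name for the scheduler to pick it up.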
Data Sources
from airflow.models import Connection
from airflow.settings import Session
session = Session()
website = session.query(Connection).filter_by(conn_id='apercite_website').first()
events = session.query(Connection).filter_by(conn_id='apercite_events').first()
session.close()
env = {}
if website:
env['DATABASE_HOST'] = str(website.host)
env['DATABASE_PORT'] = str(website.port)
env['DATABASE_USER'] = str(website.login)
env['DATABASE_NAME'] = str(website.schema)
env['DATABASE_PASSWORD'] = str(website.password)
if events:
env['EVENT_DATABASE_USER'] = str(events.login)
env['EVENT_DATABASE_NAME'] = str(events.schema)
env['EVENT_DATABASE_PASSWORD'] = str(events.password)
Warning: sub-optimal
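One possible cleanup, sketched here assuming an Airflow 1.x import path: let Airflow resolve the connection instead of opening a metadata session by hand. Unlike the if website: guard above, get_connection raises if the connection is missing.

from airflow.hooks.base_hook import BaseHook

website = BaseHook.get_connection('apercite_website')
env = {
    'DATABASE_HOST': str(website.host),
    'DATABASE_PORT': str(website.port),
    'DATABASE_USER': str(website.login),
    'DATABASE_NAME': str(website.schema),
    'DATABASE_PASSWORD': str(website.password),
}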
Airflow
- Where to store the DAGs?
- Build: separate virtualenv
- Everything we do runs locally
- Deployment
Learnings
- Multiple services, not trivial
- Helm charts :-(
- Astronomer Distro :-)
- Read the Source, Luke
ITERATE
Plan N+1
- Create a framework for experiments.
- Timebox constraint
- Objective & Key Result
- Decide upon results.
Plan N+1
Read: Scaling Lean, by Ash Maurya
Tech Side
- Month on Month
- Year on Year
- % Growth
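The arithmetic behind those panels is simple; a quick sketch of the usual percentage-growth calculation (numbers invented):

def pct_growth(current, previous):
    # Percentage growth between two periods: month on month, year on year, ...
    return (current - previous) / previous * 100 if previous else None

print(pct_growth(1320, 1200))  # 10.0 -> +10% month on month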
Ideas …
- Revenue (stripe, billing …)
- Traffic & SEO (analytics, console …)
- Conversion (AARRR)
- Quality of Service
- Processing (AMQP, …)
- Service Level (HTTP statuses, Time of Requests …)
- Vanity metrics
- Business metrics
Pick one.

Rinse.

Repeat.
OUTRO
Airflow helps you manage the whole factory.

Does not care about the jobs’ content.

Bonobo helps you build assembly lines.

Does not care about the surrounding factory.
References
Feedback

+

Bonobo ETL Sprint
Slides, resources, feedback …
apercite.fr/europython
romain@makersquad.fr rdorgueil
