Distilling insights @ Appsflyer
Arnon Rotem-Gal-Oz
Chief Data Officer
Kafka
Columnar Database
(Redshift- evaluating Vertica)
IMDG
(Ignite - evaluating Geode)
Secor
Spark
Aggregations
SparkSQL
...
Data’s hierarchy of needs*
*With apologies to Maslow
Acted upon
presented
Distilled
Usable
Accessible
Exist
Exist
Working off of RAW data
“Malting”
Just slap SQL on everything
Accessible
Fermenting
Usable
Distilling  
Distilled
RT insights
Predictive
Prescriptive
Dashboards
whatnot
presented
Sidetrack: On use of Spark
Hadoop & Mesos
Land data in a queue
All data is time-series
Enrich with foreign keys before persisting
Analyze and balance jobs
Not everything is big data
We’re hiring…
jobs@appsflyer.com

Distilling Insights @ Appsflyer (Data Architecture)

1,774 views

Appsflyer's data architecture

Published in: Technology

  1. Distilling insights @ Appsflyer. Arnon Rotem-Gal-Oz, Chief Data Officer
  2. Architecture diagram (labels only): Kafka; Columnar Database (Redshift, evaluating Vertica); IMDG (Ignite, evaluating Geode); Secor; Spark; Aggregations; SparkSQL (evaluating Drill, Presto); SQL; SQL; Raw (sequence files); DW (parquet files); DM (Aggregations); Application dashboard; Self-serve BI (TBD); Spark ETL; Spark; Spark ML; Latest Events; Scoring; exploration; Agg. logic; Internal tools; installs, clicks, inapp, launches; Accounts
  3. Data’s hierarchy of needs* (*With apologies to Maslow): Acted upon, Presented, Distilled, Usable, Accessible, Exist
  4. Exist
  5. Architecture diagram (same labels as slide 2)
  6. Architecture diagram (same labels as slide 2)
  7. Working off of RAW data
  8. “Malting”: Just slap SQL on everything (Accessible)
  9. Architecture diagram (same labels as slide 2)
  10. Fermenting (Usable)
  11. Architecture diagram (same labels as slide 2)
  12. Distilling (Distilled)
  13. Architecture diagram (same labels as slide 2)
  14. RT insights, Predictive, Prescriptive, Dashboards, whatnot (Presented)
  15. Sidetrack: On use of Spark
  16. Hadoop & Mesos
  17. Land data in a queue
  18. All data is time-series
  19. Enrich with foreign keys before persisting
  20. Analyze and balance jobs
  21. Not everything is big data
  22. We’re hiring… jobs@appsflyer.com
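
Three of the closing slides ("Land data in a queue", "All data is time-series", "Enrich with foreign keys before persisting") describe a flow that can be sketched in a few lines. The following is only an illustration under stated assumptions, not Appsflyer's code: the in-memory `deque` stands in for a real queue such as Kafka, and the event fields, the `ACCOUNTS_BY_APP` lookup, and all function names are hypothetical.

```python
from collections import Counter, deque
from datetime import datetime, timezone

# Hypothetical lookup table: app_id -> account_id (the "foreign key").
ACCOUNTS_BY_APP = {"app-1": "acct-A", "app-2": "acct-B"}


def enrich(event, accounts_by_app):
    """Return a copy of the event with its account foreign key attached."""
    enriched = dict(event)
    enriched["account_id"] = accounts_by_app.get(event["app_id"])
    return enriched


def hour_bucket(ts):
    """Truncate a Unix timestamp (seconds) to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def drain_and_aggregate(queue, accounts_by_app):
    """Drain the queue, enrich each event before 'persisting' it,
    and count events per (account, type, hour) as a time series."""
    persisted = []      # stand-in for the raw store
    counts = Counter()  # stand-in for the aggregations layer
    while queue:
        event = enrich(queue.popleft(), accounts_by_app)
        persisted.append(event)
        key = (event["account_id"], event["type"], hour_bucket(event["ts"]))
        counts[key] += 1
    return persisted, counts


# Events land in the queue first; nothing is written before enrichment.
queue = deque([
    {"type": "install", "app_id": "app-1", "ts": 0},
    {"type": "click", "app_id": "app-1", "ts": 60},
    {"type": "click", "app_id": "app-1", "ts": 120},
    {"type": "install", "app_id": "app-2", "ts": 3600},
])
persisted, counts = drain_and_aggregate(queue, ACCOUNTS_BY_APP)
```

Attaching the account key before persisting means later aggregations can group by account without a join at query time, which is one plausible reading of the slide's advice.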
