Simplifying And Accelerating Data Access for Python With Dremio and Apache Arrow

By Sudheesh Katkam
PyData New York City 2017

Dremio is a new open source project that provides a self-service data fabric. It simplifies and accelerates access to data from any source and of any size, including relational databases, NoSQL, Hadoop, Parquet, and text files. We'll show how you can use Dremio to visually curate data from any source, then quickly access the result from Pandas or a Jupyter notebook.

  1. Sudheesh Katkam: Simplifying Data Access for Python
  2. Introduction • Data comes in all shapes, sizes and formats • Data captured in multiple storage systems • Data takes a complex path to (Python) applications • How do we simplify access to data?
  3. Demo
  4. Memory Layout: Table, Traditional Memory Buffer
  5. Memory Layout: Table, Traditional Memory Buffer, Arrow Memory Buffer
  6. Apache Arrow Goals • Cache-efficient columnar memory • Zero-copy messaging / IPC • Language-agnostic metadata • Complex/nested schema support • Main implementations in C++ and Java, with bindings for C, Python, Ruby, JavaScript
  7. Apache Arrow
  8. Apache Arrow Adoption
  9. About Dremio • Launched in July 2017 • Self-Service Data Platform • Apache License • Built entirely on Apache Arrow, Calcite, Parquet • Narwhal’s name is Gnarly (see me for stickers!)
  10. New Tier in Analytics: Self-Service Data • SQL • Data Virtualization: RDBMS, MongoDB, Elasticsearch, Hadoop, S3, NAS, Excel, JSON • Data Acceleration: OLAP and ad hoc queries at interactive speed, without cubes or BI extracts • Data Curation: Wrangle, prepare, enrich any source without making copies of your data • Data Catalog: Interactive Data Discovery, Enterprise and Personal Data Assets
  11. Demo
  12. Join the Community! • GitHub: • Dremio Community: • Arrow • Twitter: @ApacheArrow, @DremioHQ
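
Slides 4 to 6 contrast a traditional memory buffer with an Arrow memory buffer and list columnar memory and zero-copy access among Arrow's goals, but the diagrams did not survive in this transcript. The sketch below is one way to picture the difference using only the Python standard library. It is not Arrow's actual format or API, and the table, column names, and helper functions are hypothetical, chosen only for illustration.

```python
import struct
from array import array

# Hypothetical three-row table with columns (id: int32, price: float64).
rows = [(1, 9.5), (2, 20.0), (3, 0.25)]

# Row-oriented buffer: the fields of each record are stored side by side.
ROW_FORMAT = "<id"  # little-endian int32 followed by float64, no padding
row_buffer = b"".join(struct.pack(ROW_FORMAT, *row) for row in rows)

# Column-oriented buffers: all values of one column are stored contiguously.
id_column = array("i", (row[0] for row in rows))
price_column = array("d", (row[1] for row in rows))


def sum_prices_row_layout(buffer: bytes) -> float:
    """Sum the price field by decoding every complete record."""
    return sum(price for _id, price in struct.iter_unpack(ROW_FORMAT, buffer))


def sum_prices_column_layout(prices: array) -> float:
    """Sum the price field by scanning a single contiguous column."""
    return sum(prices)


# A memoryview slice shares the column's memory instead of copying it,
# so a later write to price_column is visible through last_two_prices.
last_two_prices = memoryview(price_column)[1:]
```

In the row layout, summing one column still walks over the bytes of every other field, while the column layout reads only the values it needs. That is the intuition behind "cache-efficient columnar memory". The `memoryview` slice stands in for the zero-copy idea: a consumer receives a window onto existing memory rather than a serialized copy.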