Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Incrementally streaming rdbms data to your data lake automagically

274 views

Published on

Incrementally streaming rdbms data to your data lake automagically using Apache NiFi to load Oracle data to Apache Hive, Apache Kudu, Apache Impala, Apache HDFS

Published in: Technology
  • Be the first to comment

Incrementally streaming rdbms data to your data lake automagically

  1. 1. APACHECON @HOME Spt, 29th – Oct. 1st 2020
  2. 2. APACHECON NA Spt, 28th – Oct. 2nd 2020
  3. 3. APACHECON NA Spt, 28th – Oct. 2nd 2020
  4. 4. • • • • Incrementally Streaming RDBMS Data
  5. 5. 5 Future of Data - Princeton @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  6. 6. 6 Speakers John Kuchmek Senior Solutions Engineer
  7. 7. 7 Speakers Tim Spann Principal DataFlow Field Engineer @PaasDev DZone Zone Leader and Big Data MVB Princeton NJ Future of Data Meetup https://github.com/tspannhw https://www.datainmotion.dev/
  8. 8. 8 JDBC Database to Apache Kudu / JDBC Database to HDFS and Hive
  9. 9. 9 Trillions of Messages to SQL Databases and Data Warehouses PROCESS DELIVER
  10. 10. 10 https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache_15.html
  11. 11. 11 Reference Architecture Files to RDBMS
  12. 12. 12 Messages to Databases
  13. 13. ‘QueryRecord’ Processor https://medium.com/@abdelkrim.hadjidj/democratizing-nifi-record-processors-with-automatic-schemas-inferenc e-4f2b2794c427
  14. 14. ADVANCED XML PROCESSING https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html https://pierrevillard.com/2018/06/28/nifi-1-7-xml-reader-writer-and-forkrecord-processor/
  15. 15. • Example • Flat files on an FTP server named by date • Downloads file • HTTP REST API endpoint • Invokes API and downloads data • Legacy/Remote DB • Performs SQL queries 1 5 DBCP Connection Pool to remote SQL Server ExecuteSQLRecord processor
  16. 16. • 1 6 https://www.datainmotion.dev/2019/03/implementing-streaming-use-case-from.html
  17. 17. INGEST RDBMS TABLES https://community.cloudera.com/t5/Community-Articles/Incrementally-Streaming-RDBMS-Data-to-Your-Hadoop-DataLake/ta-p/247927
  18. 18. https://dzone.com/articles/lets-build-a-simple-ingest-to-cloud-data-warehouse

×