Localized Hadoop Development

Slides from the January 2021 St. Louis Big Data IDEA meeting by Tim Bytnar regarding using Docker containers for a localized Hadoop development cluster.

  1. Localized Hadoop Development: How to get up and running quickly, by Tim Bytnar
  2. 2. Tim Bytnar 17 years in the industry Data Engineering Microsoft Development and Application Stack Systems Automation Datacenter Infrastructure Network Engineering Email: Tim.Bytnar@Daugherty.com LinkedIn: https://www.linkedin.com/in/timbytnar/ I have not failed. I've just found 10,000 ways that won't work. - Thomas A. Edison
  3. What is the problem? Hadoop development has a steep requirement: access to an environment where you can freely explore the overwhelming ecosystem.
  4. Are there other options?
       • Cloud provider "free" time
       • Book learning or video training
       • Home lab (if you have one of these lying around like I don't)
  5. What do you propose?
  6. Dockerized Hadoop and Spark Environments
  7. What the environment is for.
       • Learning Hadoop!
       • Developing…
       • BASH Scripts
       • Hive Automations
       • Spark Processing
       • Data Analysis (Tableau, PowerBI, Jupyter, etc.)
       • Rapid Proof of Concept
       • Will this dataset work in Hadoop?
       • What advantages would Spark give me for this workload?
  8. What the environment is NOT for.
  9. Demo Time
  10. How to get started?
      > git clone https://github.com/tbytnar/docker-hive.git
  11. Want any help? The repository is public and open for pull requests or forks.
      Future Plans
        • Keep it updated
        • Add more modularity
        • Add walkthroughs and challenges
        • Improve Cross-platform Portability
        • Baseline Performance Optimized Version
  12. Questions and Answers
  13. Tim Bytnar
      Email: Tim.Bytnar@Daugherty.com
      LinkedIn: https://www.linkedin.com/in/timbytnar/
      > git clone https://github.com/tbytnar/docker-hive.git
      Thank you to: Ivan Ermilov and his team at Big Data Europe
      http://github.com/big-data-europe/docker-hadoop
      http://github.com/big-data-europe/docker-hive
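
The "How to get started?" slide gives only the clone command. A minimal sketch of what the next steps might look like follows. It assumes the repository keeps a docker-compose.yml at its root, as the upstream big-data-europe/docker-hive project does; the slides do not confirm this, so check the repository's README first. The run and start_cluster helpers and the DRY_RUN and CLONE_DIR variables are hypothetical names used for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: clone the repository named in the slides and start its containers.
# ASSUMPTION: a docker-compose.yml exists at the repository root (true for the
# upstream big-data-europe/docker-hive project; not confirmed by the slides).
set -eu

REPO_URL="https://github.com/tbytnar/docker-hive.git"
CLONE_DIR="${CLONE_DIR:-docker-hive}"   # hypothetical local directory name

# Print each command instead of executing it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

start_cluster() {
  # Fetch the project.
  run git clone "$REPO_URL" "$CLONE_DIR"
  # Start all services in the background.
  run docker-compose -f "$CLONE_DIR/docker-compose.yml" up -d
  # List the containers to confirm they are running.
  run docker-compose -f "$CLONE_DIR/docker-compose.yml" ps
}

start_cluster
```

By default the script only prints the commands it would run, which makes it safe to review first. Running it with DRY_RUN=0 executes them and requires git, Docker, and docker-compose to be installed.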
