Mallikarjuna Gandhamsetty

2 Followers

I am a passionate Big Data Engineer who loves building large-scale, real-time data processing pipelines. I currently work as a Big Data Engineer developing such pipelines in the Microsoft Azure cloud using Cloudera CDH tools (Hadoop, Spark, Kudu, Kafka, Flume, HDFS, YARN) together with Azure SQL Data Warehouse, Azure SQL Database, etc. Prior to this, I worked on a data warehouse migration from Oracle to the Hadoop ecosystem on Azure, using Spark for ETL, Oozie as the scheduler, Hive as the target data warehouse with data in Parquet format, and Sqoop to load historical data. I was responsible for designing and implementing a scalable and robust platform. Prior to that, I worked as Data Wareho...


Likes

11 Principles of Applied Analytics

Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production

Fast Data Analytics with Spark and Python

Intro to Spark development

Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San Jose 2015

Spark rdd part 2

IBM Spark Meetup - RDD & Spark Basics

Custom Applications with Spark's RDD: Spark Summit East talk by Tejas Patil

Top 5 Mistakes to Avoid When Writing Apache Spark Applications

(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR

Real time Analytics with Apache Kafka and Apache Spark

Trends for Big Data and Apache Spark in 2017 by Matei Zaharia

Deep Dive: Memory Management in Apache Spark

Install Apache Hadoop for Development/Production

Apache Spark & Hadoop : Train-the-trainer

Apache Spark Architecture

Scala for dummies

Introduction to spark

Distributed computing with spark

Spark after Dark by Chris Fregly of Databricks

Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive

Not Your Father's Database by Vida Ha

IndexedRDD: Efficient Fine-Grained Updates for RDDs (Ankur Dave, UC Berkeley)

Real-Time Event & Stream Processing on MS Azure

Enterprise Cloud Data Platforms - with Microsoft Azure

Large scale ETL with Hadoop