Personal Information
Organization / Workplace
London, United Kingdom
Occupation
Data Science and Big Data
Industry
Technology / Software / Internet
About
Problem solver. Python/Hadoop coder. I have done end-to-end work in Big Data spanning development, administration, and data science.
I have set up Hadoop clusters, built ETL pipelines in MapReduce and Spark, and worked on data science problems. I have used a variety of technologies, including Spark, Hive, Pig, HBase, and R.
I work with Big Data every day and use Hadoop's MapReduce features to solve big data problems and extract useful information from the data. I have done expert work in search quality, analyzing the millions of queries that users search every day.
Here are some data science problems I have worked on so far:
1) Understand the relationships between users wh...
Tags
newbie
pycon
python
programming
pycon2010
Netezza Architecture and Administration
Braja Krishna Das • 7 years ago
Netezza Deep Dives
Rush Shah • 7 years ago
Notes from Coursera Deep Learning courses by Andrew Ng
Tess Ferrandez • 6 years ago
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for large-scale data analytics
Databricks • 8 years ago
Developing Real-Time Data Pipelines with Apache Kafka
Joe Stein • 8 years ago
Scala - The Simple Parts, SFScala presentation
Martin Odersky • 9 years ago
Pragmatic Real-World Scala (short version)
Jonas Bonér • 15 years ago
Scala Data Pipelines @ Spotify
Neville Li • 8 years ago
Recommender Systems (Machine Learning Summer School 2014 @ CMU)
Xavier Amatriain • 9 years ago
Hive tuning
Michael Zhang • 10 years ago
Spark SQL Deep Dive @ Melbourne Spark Meetup
Databricks • 8 years ago
Spark Summit East 2015 Advanced Devops Student Slides
Databricks • 9 years ago
DTCC '14 Spark Runtime Internals
Cheng Lian • 10 years ago
Tuning and Debugging in Apache Spark
Databricks • 9 years ago
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San Jose 2015
Databricks • 9 years ago
Why Scala Is Taking Over the Big Data World
Dean Wampler • 9 years ago
storm at twitter
Krishna Gade • 10 years ago
Collaborative Filtering with Spark
Chris Johnson • 9 years ago
DataFu @ ApacheCon 2014
William Vaughan • 10 years ago
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan • 9 years ago