SlideShare a Scribd company logo
1 of 99
Download to read offline
Avoiding big data anti-patterns
whoami
• Alex Holmes
• Software engineer
• @grep_alex
• grepalex.com
Why should I care about anti-patterns?
big data
^
Agenda
• Walking through anti-patterns
• Looking at why they should
be avoided
• Consider some mitigations
• It’s big data!
• A single tool for the job
• Polyglot data integration
• Full scans FTW!
• Tombstones
• Counting with Java built-in
collections
• It’s open
I’ll cover: by …
Meet your protagonists
Alex Jade
(the amateur) (the pro)
It’s big data!
i want to calculate some
statistics on some static user
data . . .
how big is the data?
it’s big data,
so huge,
20GB!!!
i need to order a hadoop cluster!
What’s the problem?
you think you have big
data ...
but you don’t!
Poll: how much RAM can a single server support?
A. 256 GB
B. 512 GB
C. 1TB
http://yourdatafitsinram.com
keep it simple . . .
use MYSQL or POSTGRES
or R/python/matlab
Summary
• Simplify your analytics toolchain when working with
small data (especially if you don’t already have an
investment in big data tooling)
• Old(er) tools such as OLTP/OLAP/R/Python still
have their place for this type of work
A single tool for the job
that looks
like a nail!!!
nosql
PROBLEM
What’s the problem?
Big data tools are usually designed to do one thing well
(maybe two)
^
Types of workloads
• Low-latency data lookups
• Near real-time processing
• Interactive analytics
• Joins
• Full scans
• Search
• Data movement and integration
• ETL
The old world was simple
OLTP/OLAP
The new world … not so much
You need to research and find the
best-in-class for your function
Best-in-class big data tools (in my opinion)
If you want … Consider …
Low-latency lookups Cassandra, memcached
Near real-time processing Storm
Interactive analytics Vertica, Teradata
Full scans, system of record data,
ETL, batch processing
HDFS, MapReduce, Hive,
Pig
Data movement and integration Kafka
Summary
• There is no single big data tool that does it all
• We live in a polyglot world where new tools are
announced every day - don’t believe the hype!
• Test the claims on your own hardware and data;
start small and fail fast
Polyglot data integration
i need to move
clickstream data
from my application
to hadoop
Application
Hadoop Loader
Hadoop
shoot, i need to
use that same
data in streaming
Application
Hadoop Loader
Hadoop
JMS
JMS
What’s the problem?
OLTP
OLAP /
EDW
HBase
Cassan
dra
Voldem
ort
Hadoop
SecurityAnalytics
Rec.
Engine
Search Monitoring
Social
Graph
that way new consumers can
be added without any work!
we need a central data
repository and pipeline to isolate
consumers from the source
let’s use kafka!
OLTP
OLAP /
EDW
HBase
Cassan
dra
Voldem
ort
Hadoop
SecurityAnalytics
Rec.
Engine
Search Monitoring
Social
Graph
kafka
Background
• Apache project
• Originated from LinkedIn
• Open-sourced in 2011
• Written in Scala and Java
• Borrows concepts in messaging systems and logs
• Foundational data movement and integration
technology
What’s the big whoop about Kafka?
Throughput
http://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
Producer
Consumer Consumer Consumer
Kafka Server Kafka Server Kafka Server
Producer Producer
2,024,032 TPS
2,615,968 TPS
2ms
O.S. page cache is leveraged
OS page cache
...11109876543210
ProducerConsumer A Consumer B
Disk
writesreadsreads
Things to look out for
• Leverages ZooKeeper, which is tricky to configure
• Reads can become slow when the page cache is
missed and disk needs to be hit
• Lack of security
Summary
• Don’t write your own data integration
• Use Kafka for light-weight, fast and scalable data
integration
Full scans FTW!
i heard that hadoop was
designed to work with huge
data volumes!
i’m going to stick my
data on hadoop . . .
and run some joins
SELECT * FROM huge_table
JOIN ON other_huge_table
…
What’s the problem?
yes, hadoop is very efficient
at batch workloads
files are split into large blocks and
distributed throughout the cluster
“data locality” is a first-class concept,
where compute is pushed to storage Scheduler
but hadoop doesn’t negate
all these optimizations we
learned when working on
relational databases
so partition your data
according to how you
will most commonly
access it
hdfs:/data/tweets/date=20140929/
hdfs:/data/tweets/date=20140930/
hdfs:/data/tweets/date=20140931/
disk io is slow
and then make sure to
include a filter in your
queries so that only
those partitions are read
...
WHERE DATE=20151027
include projections to
reduce data that needs
to be read from disk or
pushed over the network
SELECT id, name
FROM...
hash joins require
network io which is slow
65.23VRSN
39.54MSFT
526.62GOOGL
PriceSymbol
RestonVRSN
RedmondMSFT
Mtn ViewGOOGL
HeadquartersSymbol
merge joins are way
more efficient
Records in all datasets
sorted by join key
Headquarters
Redmond
Mtn View
Reston65.23VRSN
39.54MSFT
526.62GOOGL
PriceSymbol
The merge algorithm
streams and performs an
inline merge of the
datasets
you’ll have to bucket
and sort your data
and tell your query
engine to use a sort-
merge-bucket (SMB) join
— Hive properties to enable a SMB join
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
and look at using a
columnar data
format like parquet
Column Strorage
Column 1 (Symbol)
526.62
MSFT
05-10-2014
05-10-2014
39.54
GOOGL
Column 2 (Date)
Column 3 (Price)
Summary
• Partition, filter and project your data (same as you
used to do with relational databases)
• Look at bucketing and sorting your data to support
advanced join techniques such as sort-merge-
bucket
• Consider storing your data in columnar form
Tombstones
i need to store data in a Highly
Available persistent queue
. . .
and we already have Cassandra
deployed
. . .
bingo!!!
What is Cassandra?
• Low-latency distributed database
• Apache project modeled after Dynamo and
BigTable
• Data replication for fault tolerance and scale
• Multi-datacenter support
• CAP
Node
Node
Node
Node
Node
Node
East West
What’s the problem?
V VVV V VVVVV V
KKKKKKKKKKK
V VVV V VVVVV V
KKKKKKKKKKK
tombstone markers indicate that the
column has been deleted
deletes in Cassandra are soft;
deleted columns are marked
with tombstones
these tombstoned columns
slow-down reads
by default tombstones stay
around for 10 days
IF you want to know why, read
up on
gc_grace_secs,
and reappearing deletes
don’t use Cassandra, use
kafka
design your schema and read
patterns to avoid tombstones
getting in the way of your reads
keep track of consumer offsets,
and add a time or bucket semantic
to rows. only delete rows after
some time has elapsed, or once all
consumers have consumed them.
ID bucket
1 2
2 1
msg
81583
723804
offset
723803
81582
msg
81582
723803
msg
81581
723802
bucket
bucket
consumer
consumer
. . .
. . .
ID
1
2
Summary
• Try to avoid use cases that require high volume
deletes and slice queries that scan over tombstone
columns
• Design your schema and delete/read patterns with
tombstone avoidance in mind
Counting with Java’s built-in collections
I’m going to COUNT
THE DISTINCT NUMBER
OF USERS THAT
viewed a tweet
What’s the problem?
Poll: what does HashSet<K> use under the covers?
A. K[]
B. Entry<K>[]
C. HashMap<K,V>
D. TreeMap<K,V>
Memory consumption
HashSet = 32 * SIZE + 4 * CAPACITY
String = 8 * (int) ((((no chars) * 2) + 45) / 8)
Average user is 6 characters long = 64 bytes
number of elements in set set capacity (array length)
For 10,000,000 users this is at least 1GiB
USE HYPERLOGLOG TO WORK
WITH approximate DISTINCt
COUNTS @SCALE
HyperLogLog
• Cardinality estimation algorithm
• Uses (a lot) less space than sets
• Doesn’t provide exact distinct counts (being
“close” is probably good enough)
• Cardinality Estimation for Big Data: http://druid.io/
blog/2012/05/04/fast-cheap-and-98-right-
cardinality-estimation-for-big-data.html
1 billion distinct elements = 1.5kb memory
standard error = 2%
https://www.flickr.com/photos/redwoodphotography/4356518997
Hashes
10110100100101010101100111001011
Good hash functions
should result in each bit
having a 50% probability
of occurring
h(entity):
Bit pattern observations
1xxxxxxxxx..x
01xxxxxxxx..x
001xxxxxxx..x
0001xxxxxx..x
50% of hashed values will look like:
25% of hashed values will look like:
12.5% of hashed values will look like:
6.25% of hashed values will look like:
0 0
0
0 0
0
0
0
0 00
0
0
0
00
000
0 0
00 0
register
0 0
0
0 0
0
0
0
0 00
0
0
0
00
000
0 0
00 0
h(entity):
register index:
4
register value:
1
register
harmonic_mean
(
HLL
estimated
cardinality =
(= 1
01010 01 0 0 01
1
HLL Java library
• https://github.com/aggregateknowledge/java-hll
• Neat implementation - it automatically promotes
internal data structure to HLL once it grows beyond
a certain size
Approximate count algorithms
• HyperLogLog (distinct counts)
• CountMinSketch (frequencies of members)
• Bloom Filter (set membership)
Summary
• Data skew is a reality when working at Internet
scale
• Java’s builtin collections have a large memory
footprint don’t scale
• For high-cardinality data use approximate
estimation algorithms
stepping away . . .
MATH
It’s open
prototyping/VIABILITY - DONE
coding - done
testing - done
performance & scalability testing -
done
monitoring - done
i’m ready to ship!
What’s the problem?
https://www.flickr.com/photos/arjentoet/8428179166
https://www.flickr.com/photos/gowestphoto/3922495716
https://www.flickr.com/photos/joybot/6026542856
How the old world worked
OLTP/OLAP
Authentication
Authorization
DBA
security’s not my job!!!
we disagree
infosec
Important questions to ask
• Is my data encrypted when it’s in motion?
• Is my data encrypted on disk?
• Are there ACL’s defining who has access to what?
• Are these checks enabled by default?
How do tools stack up?
ACL’s
At-rest
encryption
In-motion
encryption
Enabled by
default
Ease of use
Oracle
Hadoop
Cassandra
ZooKeeper
Kafka
Summary
• Enable security for your tools!
• Include security as part of evaluating a tool
• Ask vendors and project owners to step up to the
plate
We’re done!
Conclusions
• Don’t assume that a particular big data technology will
work for your use case - verify it for yourself on your
own hardware and data early on in the evaluation of a
tool
• Be wary of the “new hotness” and vendor claims - they
may burn you
• Make sure that load/scale testing is a required part of
your go-to-production plan
Thanks for your time!

More Related Content

What's hot

Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)Chris Nauroth
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...DataStax
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014hadooparchbook
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureGwen (Chen) Shapira
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop OperationsOwen O'Malley
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkEvan Chan
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangSpark Summit
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Chris Nauroth
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupGwen (Chen) Shapira
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaDataWorks Summit
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsDataWorks Summit/Hadoop Summit
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming apphadooparchbook
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataRahul Jain
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerEvan Chan
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Sparkrhatr
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detectionhadooparchbook
 

What's hot (20)

Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
Getting Started with Apache Cassandra and Apache Zeppelin (DuyHai DOAN, DataS...
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014Application architectures with Hadoop – Big Data TechCon 2014
Application architectures with Hadoop – Big Data TechCon 2014
 
Scaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding FailureScaling ETL with Hadoop - Avoiding Failure
Scaling ETL with Hadoop - Avoiding Failure
 
Next Generation Hadoop Operations
Next Generation Hadoop OperationsNext Generation Hadoop Operations
Next Generation Hadoop Operations
 
Twitter with hadoop for oow
Twitter with hadoop for oowTwitter with hadoop for oow
Twitter with hadoop for oow
 
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and SparkCassandra Day 2014: Interactive Analytics with Cassandra and Spark
Cassandra Day 2014: Interactive Analytics with Cassandra and Spark
 
Using Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene PangUsing Spark with Tachyon by Gene Pang
Using Spark with Tachyon by Gene Pang
 
Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4Keep your hadoop cluster at its best! v4
Keep your hadoop cluster at its best! v4
 
Kafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn MeetupKafka and Hadoop at LinkedIn Meetup
Kafka and Hadoop at LinkedIn Meetup
 
Event Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache KafkaEvent Detection Pipelines with Apache Kafka
Event Detection Pipelines with Apache Kafka
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Performance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data PlatformsPerformance Comparison of Streaming Big Data Platforms
Performance Comparison of Streaming Big Data Platforms
 
What no one tells you about writing a streaming app
What no one tells you about writing a streaming appWhat no one tells you about writing a streaming app
What no one tells you about writing a streaming app
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Tachyon and Apache Spark
Tachyon and Apache SparkTachyon and Apache Spark
Tachyon and Apache Spark
 
Architecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud DetectionArchitecting applications with Hadoop - Fraud Detection
Architecting applications with Hadoop - Fraud Detection
 

Viewers also liked

Searching and reporting with splunk 6.x e learning
Searching and reporting with splunk 6.x   e learningSearching and reporting with splunk 6.x   e learning
Searching and reporting with splunk 6.x e learningJacky Lai
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with SplunkDavid Carasso
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPCMax Alexejev
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVROairisData
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des DonnéesSoft Computing
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonIgor Anishchenko
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streamsjimriecken
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystemconfluent
 
Impact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersImpact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersChristian Hassa
 

Viewers also liked (9)

Searching and reporting with splunk 6.x e learning
Searching and reporting with splunk 6.x   e learningSearching and reporting with splunk 6.x   e learning
Searching and reporting with splunk 6.x e learning
 
Data Mining with Splunk
Data Mining with SplunkData Mining with Splunk
Data Mining with Splunk
 
Modern Distributed Messaging and RPC
Modern Distributed Messaging and RPCModern Distributed Messaging and RPC
Modern Distributed Messaging and RPC
 
Parquet and AVRO
Parquet and AVROParquet and AVRO
Parquet and AVRO
 
La Gouvernance des Données
La Gouvernance des DonnéesLa Gouvernance des Données
La Gouvernance des Données
 
Thrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased ComparisonThrift vs Protocol Buffers vs Avro - Biased Comparison
Thrift vs Protocol Buffers vs Avro - Biased Comparison
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Impact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really mattersImpact Maps and Story Maps: delivering what really matters
Impact Maps and Story Maps: delivering what really matters
 

Similar to Avoiding big data antipatterns

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatternsAnurag S
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNATomas Cervenka
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series DatabasePramit Choudhary
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the Worldjhugg
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10benoitg
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraCaserta
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgDavid Pilato
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewGreat Wide Open
 
Scaling graphite for application metrics
Scaling graphite for application metricsScaling graphite for application metrics
Scaling graphite for application metricsJim Plush
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_SummaryHiram Fleitas León
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BIDenny Lee
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDeltares
 

Similar to Avoiding big data antipatterns (20)

Bigdata antipatterns
Bigdata antipatternsBigdata antipatterns
Bigdata antipatterns
 
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNAFirst Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
First Hive Meetup London 2012-07-10 - Tomas Cervenka - VisualDNA
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Building a Database for the End of the World
Building a Database for the End of the WorldBuilding a Database for the End of the World
Building a Database for the End of the World
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and CassandraLow-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
Low-Latency Analytics with NoSQL – Introduction to Storm and Cassandra
 
Managing your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed LuxembourgManaging your black friday logs Voxxed Luxembourg
Managing your black friday logs Voxxed Luxembourg
 
Modern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An OverviewModern Big Data Analytics Tools: An Overview
Modern Big Data Analytics Tools: An Overview
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Scaling graphite for application metrics
Scaling graphite for application metricsScaling graphite for application metrics
Scaling graphite for application metrics
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 
DSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De BoerDSD-INT 2017 The use of big data for dredging - De Boer
DSD-INT 2017 The use of big data for dredging - De Boer
 
Apache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real TimeApache Eagle - Monitor Hadoop in Real Time
Apache Eagle - Monitor Hadoop in Real Time
 

Recently uploaded

Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxRomil Mishra
 
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Amil baba
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Sumanth A
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...Erbil Polytechnic University
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...IJAEMSJORNAL
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communicationpanditadesh123
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdfsahilsajad201
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier Fernández Muñoz
 
priority interrupt computer organization
priority interrupt computer organizationpriority interrupt computer organization
priority interrupt computer organizationchnrketan
 
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...KrishnaveniKrishnara1
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labsamber724300
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfShreyas Pandit
 
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...shreenathji26
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxStephen Sitton
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliNimot Muili
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.elesangwon
 
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfModule-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfManish Kumar
 
tourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdftourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdfchess188chess188
 

Recently uploaded (20)

Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptxTriangulation survey (Basic Mine Surveying)_MI10412MI.pptx
Triangulation survey (Basic Mine Surveying)_MI10412MI.pptx
 
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
Uk-NO1 kala jadu karne wale ka contact number kala jadu karne wale baba kala ...
 
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
Robotics-Asimov's Laws, Mechanical Subsystems, Robot Kinematics, Robot Dynami...
 
"Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ..."Exploring the Essential Functions and Design Considerations of Spillways in ...
"Exploring the Essential Functions and Design Considerations of Spillways in ...
 
ASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductosASME-B31.4-2019-estandar para diseño de ductos
ASME-B31.4-2019-estandar para diseño de ductos
 
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
Guardians of E-Commerce: Harnessing NLP and Machine Learning Approaches for A...
 
multiple access in wireless communication
multiple access in wireless communicationmultiple access in wireless communication
multiple access in wireless communication
 
Robotics Group 10 (Control Schemes) cse.pdf
Robotics Group 10  (Control Schemes) cse.pdfRobotics Group 10  (Control Schemes) cse.pdf
Robotics Group 10 (Control Schemes) cse.pdf
 
Javier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptxJavier_Fernandez_CARS_workshop_presentation.pptx
Javier_Fernandez_CARS_workshop_presentation.pptx
 
Designing pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptxDesigning pile caps according to ACI 318-19.pptx
Designing pile caps according to ACI 318-19.pptx
 
priority interrupt computer organization
priority interrupt computer organizationpriority interrupt computer organization
priority interrupt computer organization
 
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
22CYT12 & Chemistry for Computer Systems_Unit-II-Corrosion & its Control Meth...
 
Secure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech LabsSecure Key Crypto - Tech Paper JET Tech Labs
Secure Key Crypto - Tech Paper JET Tech Labs
 
Theory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdfTheory of Machine Notes / Lecture Material .pdf
Theory of Machine Notes / Lecture Material .pdf
 
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
Introduction to Artificial Intelligence: Intelligent Agents, State Space Sear...
 
Turn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptxTurn leadership mistakes into a better future.pptx
Turn leadership mistakes into a better future.pptx
 
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot MuiliStructural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
Structural Integrity Assessment Standards in Nigeria by Engr Nimot Muili
 
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
2022 AWS DNA Hackathon 장애 대응 솔루션 jarvis.
 
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdfModule-1-Building Acoustics(Introduction)(Unit-1).pdf
Module-1-Building Acoustics(Introduction)(Unit-1).pdf
 
tourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdftourism-management-srs_compress-software-engineering.pdf
tourism-management-srs_compress-software-engineering.pdf
 

Avoiding big data antipatterns