Apache Accumulo Overview
Bill Havanki
Solutions Architect, Cloudera Government Solutions
©2014 Cloudera, Inc. All rights reserved.
Agenda
• Quick History
• Storage Model
• Loading and Querying
• Daemons
• Getting Started, a.k.a. the Pitch
A Quick History
Google BigTable
Compressed, high-performance, scalable,
distributed sorted map
Google BigTable
• Began development in 2004
• Built on Google File System
• Non-relational
• Byte-oriented and schemaless
• Stores data in the petabyte range
• Research paper published in 2006
Child(ren) of BigTable
• Apache HBase (begun 2006, top-level 2010)
• Apache Cassandra (begun 2008-ish, top-level 2010)
• Apache Accumulo ...
From Cloudbase to Accumulo
• Started in 2008 as a National Security Agency project
• Submitted to Apache Incubator in 2011 (and renamed)
• Top-level project in 2012
Storage Model
Key / Value Store
Accumulo stores tables of key / value pairs
Key / Value Store
A row is a sorted sequence of key / value pairs
Each pair is a cell
The Key
row | column (family, qualifier, visibility) | timestamp
An example key
row: bhavanki | column: family personal, qualifier middle, visibility PII | timestamp: 1401041295
Another example key
row: brees | column: family employment, qualifier salary, visibility FIN | timestamp: 1401041296
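A minimal sketch of how such keys sort, assuming the usual Accumulo ordering: ascending by row, column family, column qualifier, and visibility, then newest timestamp first. `SketchKey` is a hypothetical stand-in for illustration, not Accumulo's own `Key` class, and it compares Java strings where Accumulo compares raw bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical, simplified stand-in for an Accumulo key (strings instead of bytes). */
final class SketchKey {
    final String row;
    final String family;
    final String qualifier;
    final String visibility;
    final long timestamp;

    SketchKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    /** Ascending by row, family, qualifier, visibility; descending by timestamp. */
    static final Comparator<SketchKey> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = a.visibility.compareTo(b.visibility);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp); // newest version first
        return c;
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<SketchKey> sorted(List<SketchKey> keys) {
        List<SketchKey> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }
}
```

Sorting the two example keys with `SketchKey.sorted` places the `bhavanki` key ahead of the `brees` key, because the row is compared first.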
It’s all bytes
All key and value data are stored as bytes, except the timestamp, which is a long
There are no built-in data types, but lexicoders help with common types
Key components are usually UTF-8 strings
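To illustrate why an encoding helper matters when everything is bytes, here is a sketch of one order-preserving encoding for a signed long: write it big-endian with the sign bit flipped, so that unsigned byte-by-byte comparison matches numeric order. `SortableLongSketch` is a hypothetical class written for this example; it shows the general idea behind a lexicoder and is not Accumulo's implementation.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: encodes longs so that byte order equals numeric order. */
final class SortableLongSketch {
    /** Eight big-endian bytes with the sign bit flipped, so negatives sort before positives. */
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value ^ Long.MIN_VALUE).array();
    }

    /** Reverses encode(). */
    static long decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong() ^ Long.MIN_VALUE;
    }

    /** Unsigned lexicographic comparison, the order a sorted byte store would apply. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) return c;
        }
        return Integer.compare(a.length, b.length);
    }
}
```

By contrast, storing numbers as plain decimal strings sorts "10" before "2", which is why a type-aware encoding is needed for range scans over numeric data.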
Some rows for you
row       cf        cq        cv       ts          value
bhavanki  job       employer           2013-09-01  Cloudera
bhavanki  personal  beer               2013-09-15  Omission
bhavanki  personal  house     NOMUGGL  2014-01-25  Ravenclaw
brees     job       employer           2013-10-01  White Cliffs
brees     personal  house     NOMUGGL  2014-01-01  Hufflepuff
Visibility Labels
Boolean expression
Specialist | (Management & SpecTraining)
Authorizations are provided in each scan
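The following sketch shows one way a visibility expression like the one above could be checked against a set of authorizations. `VisibilitySketch` is a hypothetical recursive-descent evaluator written for this example, not Accumulo's own code. It assumes labels are alphanumeric, ignores spaces for readability, and requires parentheses when `&` and `|` are mixed.

```java
import java.util.Set;

/** Hypothetical evaluator for expressions such as Specialist|(Management&SpecTraining). */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", ""); // assumption: spaces are only for readability
        this.auths = auths;
    }

    /** Returns true if the authorizations satisfy the visibility expression. */
    static boolean isVisible(String expression, Set<String> authorizations) {
        VisibilitySketch parser = new VisibilitySketch(expression, authorizations);
        if (parser.expr.isEmpty()) return true; // unlabeled data is visible to everyone
        boolean result = parser.parseExpression();
        if (parser.pos != parser.expr.length())
            throw new IllegalArgumentException("Unexpected character at " + parser.pos);
        return result;
    }

    // expression := term (('&' | '|') term)*, with a single operator kind per level
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next)
                throw new IllegalArgumentException("Mixing & and | requires parentheses");
            op = next;
            boolean term = parseTerm(); // always parse, so syntax errors are never skipped
            result = (op == '&') ? (result && term) : (result || term);
        }
        return result;
    }

    // term := label | '(' expression ')'
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')')
                throw new IllegalArgumentException("Missing ')' at " + pos);
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a label at " + pos);
        return auths.contains(expr.substring(start, pos));
    }
}
```

With this sketch, a scan holding only `Management` would not see the cell, while a scan holding `Specialist`, or both `Management` and `SpecTraining`, would.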
Locality Groups
You can identify sets of one or more column families as
locality groups
Data in a locality group is stored together for improved
read performance
Tablets
A table comprises one or more tablets
employees -> employees;H | employees;S | employees;~
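A small sketch of how a row could be mapped to a tablet, assuming each tablet holds the rows after the previous split point up to and including its own end row. `TabletLocatorSketch` is hypothetical, and the `~` suffix for the last tablet simply follows the notation on the slide.

```java
import java.util.TreeSet;

/** Hypothetical lookup from a row to the tablet that holds it, given sorted split points. */
final class TabletLocatorSketch {
    private final String table;
    private final TreeSet<String> splits = new TreeSet<>();

    TabletLocatorSketch(String table, String... splitPoints) {
        this.table = table;
        for (String s : splitPoints) splits.add(s);
    }

    /** Finds the smallest split point at or after the row; rows past every split go to the last tablet. */
    String tabletFor(String row) {
        String endRow = splits.ceiling(row);
        return table + ";" + (endRow == null ? "~" : endRow); // "~" marks the last tablet, as on the slide
    }
}
```

With split points `H` and `S`, a row such as `Jones` lands in `employees;S`. A lowercase row such as `bhavanki` lands in `employees;~`, because lowercase letters sort after uppercase ones in byte order.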
Tablets
Tablets map to data files in HDFS
employees;H -> rfile 1 | employees;S -> rfile 2 | employees;~ -> rfile 3
Tablets
Data is also kept in write-ahead logs and the memtable
employees;H -> rfile 1, walogs, memtable
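The interplay of write-ahead log, memtable, and files can be sketched as below. This is a deliberately simplified model with hypothetical names (`TabletWriteSketch`, `flushThreshold`): in-memory lists and maps stand in for HDFS files, and the log is cleared at flush time, which a real system would handle more carefully.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical model of a tablet's write path: log first, then memory, then flush to a file. */
final class TabletWriteSketch {
    final List<String> writeAheadLog = new ArrayList<>();          // stand-in for a log in HDFS
    final TreeMap<String, String> memtable = new TreeMap<>();       // sorted in-memory map
    final List<TreeMap<String, String>> files = new ArrayList<>();  // stand-ins for immutable sorted files
    private final int flushThreshold;

    TabletWriteSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        writeAheadLog.add(key + "=" + value); // 1. log first, so the write can be replayed after a failure
        memtable.put(key, value);             // 2. then apply it in memory
        if (memtable.size() >= flushThreshold) flush();
    }

    /** Writes the memtable out as a new sorted file and starts over in memory. */
    void flush() {
        files.add(new TreeMap<>(memtable));
        memtable.clear();
        writeAheadLog.clear();                // simplification: logged entries now live in a file
    }

    /** Reads check the memtable first, then files from newest to oldest. */
    String read(String key) {
        if (memtable.containsKey(key)) return memtable.get(key);
        for (int i = files.size() - 1; i >= 0; i--) {
            String v = files.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

The design point the sketch tries to capture is that recent writes are served from memory, older data from files, and the log exists only to recover what has not yet been flushed.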
Loading and Querying
Java Client API
Read using scanners
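// Assumes conn is a Connector and employeeIds is a Collection<Text>
Scanner s = conn.createScanner("employees", new Authorizations());
s.setRange(new Range("alice", "eve"));      // rows "alice" through "eve", inclusive
s.fetchColumnFamily(new Text("personal"));  // only the "personal" column family
for (Entry<Key, Value> e : s)
    employeeIds.add(e.getKey().getRow());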
Java Client API
Read access via iterator pattern
• server-side system iterators handle timestamps,
authorization checks, and lots more
• iterators almost always wrap other iterators, forming a
chain
• you can define your own, client-side or server-side
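The wrapping idea can be sketched with plain `java.util.Iterator`. `FilteringIteratorSketch` is a hypothetical class for illustration only: Accumulo's server-side iterators implement its own `SortedKeyValueIterator` interface, which is richer than what is shown here.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical wrapper: passes through only the source entries that match a test. */
final class FilteringIteratorSketch<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final Predicate<T> keep;
    private T next;
    private boolean ready;

    FilteringIteratorSketch(Iterator<T> source, Predicate<T> keep) {
        this.source = source;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        // Pull from the wrapped iterator until an entry passes the test or the source runs dry.
        while (!ready && source.hasNext()) {
            T candidate = source.next();
            if (keep.test(candidate)) {
                next = candidate;
                ready = true;
            }
        }
        return ready;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        ready = false;
        return next;
    }
}
```

Because the wrapper is itself an `Iterator`, one instance can wrap another, which is how a chain forms: each stage sees only what the stage beneath it lets through.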
Java Client API
Scanners fetch sorted rows from one range
Batch scanners fetch unsorted rows from multiple
ranges in parallel
Isolated scanners ensure that you do not see a row
mid-change
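To make the scanner versus batch scanner contrast concrete, here is a sketch that fetches several row ranges in parallel from an in-memory sorted map. `BatchScanSketch` and its parameters are hypothetical and do not use Accumulo's `BatchScanner`; the sketch only shows why combined results from multiple ranges are not globally sorted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical parallel range fetch over a sorted in-memory table. */
final class BatchScanSketch {
    /** Each range is {startRow, endRow}, both inclusive. Results are concatenated per range. */
    static List<String> scanRanges(NavigableMap<String, String> table, List<String[]> ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (String[] r : ranges) {
                // One task per range; tasks may run at the same time on different threads.
                Callable<List<String>> task =
                    () -> new ArrayList<>(table.subMap(r[0], true, r[1], true).values());
                pending.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                try {
                    results.addAll(f.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while scanning", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("range scan failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Entries within a single range come back in sorted order, but the combined list follows the order of the requested ranges rather than key order, so callers that need a total order must sort afterwards.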
MapReduce
AccumuloInputFormat
AccumuloOutputFormat
MapReduce
AccumuloRowInputFormat
AccumuloRowOutputFormat
Shell
Command-line / manual access to Accumulo data
• scan, insert, delete
• iterator management
• table management (creation, deletion, cloning)
• user and authorization management
• table splitting and merging
• ... more
Bulk Import
Got lots of data to import quickly?
• Use MR job to format data using
AccumuloFileOutputFormat
• Import files using shell
Trade off latency / availability for throughput
Daemons
Tablet Server
Serves tablets (table data)
• writes data to walog, memtable; deals with compaction
• serves data for reads from files, memtable
• handles recovery from walogs in case of server failure
Most client calls go to tablet servers
Master
• assigns tablets to tablet servers
• detects tablet server failures and reassigns tablets
• balances tablet assignments over time
• coordinates table operations
Multiple masters are supported for failover; only one is active
Everybody Else in Accumulo
Garbage Collector (GC) - identifies and deletes files in
HDFS that are no longer needed
Tracer - listens for and stores distributed trace messages
using a special table
Everybody Else in Accumulo
• Monitor - collects and serves status information
  • server status
  • log inspection
  • performance data
  • table inspection
Everybody Else outside Accumulo
• HDFS (as part of Apache Hadoop)
  • stores tablet files
  • stores write-ahead logs (1.5+)
• MapReduce (Hadoop)
  • bulk import
  • batch processing
• Apache ZooKeeper
Getting Started
a.k.a. the Pitch
Easy as 1-2-3?
1. Install Hadoop (HDFS and MapReduce)
2. Install ZooKeeper
3. Install Accumulo!
Making Steps 1 and 2 Easier
Use a complete, pre-packaged Hadoop distribution
... like CDH!
a leading commercial distribution centered on Apache
Hadoop
• many ecosystem components
• configured / updated to work together
Making Steps 1 and 2 Easier
Cloudera Manager
• deployment
• configuration
• operation
• security
Making Step 3 Easier
Standard Apache Accumulo installation is via tarball
• no longer shipping RPM / DEB / ...
Using CDH/CM you can use:
• a tarball, RPM or DEB with Accumulo packaged for CDH
• a parcel (like RPM / ZIP) for easier upgrades
• 1.4.4 and 1.4.5 available now
• 1.6.0 soon
Where to Go for More
• http://accumulo.apache.org/
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-service
• http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
Accumulo Summit
Join us on June 12
Quick Thanks
• My slide reviewers
  • Sean Busbey
  • Mike Drob
• Accumulo community
• You all for listening
Thank you!
Bill Havanki
bhavanki@clouderagovt.com
44

More Related Content

What's hot

One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)DataWorks Summit
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseHortonworks
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 DataWorks Summit
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNDataWorks Summit
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureHortonworks
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on DockerRakesh Saha
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyPeter Clapham
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessCloudera, Inc.
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudDataWorks Summit
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionCloudera, Inc.
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera managerChris Westin
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016alanfgates
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Hortonworks
 

What's hot (20)

One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)One Click Hadoop Clusters - Anywhere (Using Docker)
One Click Hadoop Clusters - Anywhere (Using Docker)
 
Double Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSenseDouble Your Hadoop Hardware Performance with SmartSense
Double Your Hadoop Hardware Performance with SmartSense
 
Hadoop Operations
Hadoop OperationsHadoop Operations
Hadoop Operations
 
Streamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache AmbariStreamline Hadoop DevOps with Apache Ambari
Streamline Hadoop DevOps with Apache Ambari
 
A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3 A First-Hand Look at What's New in HDP 2.3
A First-Hand Look at What's New in HDP 2.3
 
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARNHadoop {Submarine} Project: Running Deep Learning Workloads on YARN
Hadoop {Submarine} Project: Running Deep Learning Workloads on YARN
 
Apache Ambari: Past, Present, Future
Apache Ambari: Past, Present, FutureApache Ambari: Past, Present, Future
Apache Ambari: Past, Present, Future
 
Hadoop on Docker
Hadoop on DockerHadoop on Docker
Hadoop on Docker
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 
YARN and the Docker container runtime
YARN and the Docker container runtimeYARN and the Docker container runtime
YARN and the Docker container runtime
 
Intel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data SuccessIntel and Cloudera: Accelerating Enterprise Big Data Success
Intel and Cloudera: Accelerating Enterprise Big Data Success
 
Running Enterprise Workloads in the Cloud
Running Enterprise Workloads in the CloudRunning Enterprise Workloads in the Cloud
Running Enterprise Workloads in the Cloud
 
Intro to Apache Spark
Intro to Apache SparkIntro to Apache Spark
Intro to Apache Spark
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
Cloudbreak - Technical Deep Dive
Cloudbreak - Technical Deep DiveCloudbreak - Technical Deep Dive
Cloudbreak - Technical Deep Dive
 
Getting Apache Spark Customers to Production
Getting Apache Spark Customers to ProductionGetting Apache Spark Customers to Production
Getting Apache Spark Customers to Production
 
Cluster management and automation with cloudera manager
Cluster management and automation with cloudera managerCluster management and automation with cloudera manager
Cluster management and automation with cloudera manager
 
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
Hive2.0 sql speed-scale--hadoop-summit-dublin-apr-2016
 
Apache ignite v1.3
Apache ignite v1.3Apache ignite v1.3
Apache ignite v1.3
 
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013Apache Ambari BOF - OpenStack - Hadoop Summit 2013
Apache Ambari BOF - OpenStack - Hadoop Summit 2013
 

Viewers also liked

Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloCloudera, Inc.
 
Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl
 
Introduction to Continuous Integration
Introduction to Continuous IntegrationIntroduction to Continuous Integration
Introduction to Continuous IntegrationBill Havanki
 
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...Accumulo Summit
 
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]Accumulo Summit
 
Accumulo design
Accumulo designAccumulo design
Accumulo designscsorensen
 
Accumulo meetup 20130109
Accumulo meetup 20130109Accumulo meetup 20130109
Accumulo meetup 20130109Sqrrl
 
Apache Accumulo and the Data Lake
Apache Accumulo and the Data LakeApache Accumulo and the Data Lake
Apache Accumulo and the Data LakeAaron Cordova
 
Accumulo Summit 2016: Accumulo in the Enterprise
Accumulo Summit 2016: Accumulo in the EnterpriseAccumulo Summit 2016: Accumulo in the Enterprise
Accumulo Summit 2016: Accumulo in the EnterpriseAccumulo Summit
 
Large Scale Accumulo Clusters
Large Scale Accumulo ClustersLarge Scale Accumulo Clusters
Large Scale Accumulo ClustersAaron Cordova
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit
 
Software Team Hierarchy of Needs
Software Team Hierarchy of NeedsSoftware Team Hierarchy of Needs
Software Team Hierarchy of NeedsBill Havanki
 
Accumulo: A Quick Introduction
Accumulo: A Quick IntroductionAccumulo: A Quick Introduction
Accumulo: A Quick IntroductionJames Salter
 
Accumulo Summit 2016: Embedding Authenticated Data Structures in Accumulo
Accumulo Summit 2016: Embedding Authenticated Data Structures in AccumuloAccumulo Summit 2016: Embedding Authenticated Data Structures in Accumulo
Accumulo Summit 2016: Embedding Authenticated Data Structures in AccumuloAccumulo Summit
 
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]Accumulo Summit
 
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit
 
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big DataOct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big DataYahoo Developer Network
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 

Viewers also liked (20)

Stupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache AccumuloStupid Shell Tricks with Apache Accumulo
Stupid Shell Tricks with Apache Accumulo
 
Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411Sqrrl real time_big_data_20130411
Sqrrl real time_big_data_20130411
 
Introduction to Continuous Integration
Introduction to Continuous IntegrationIntroduction to Continuous Integration
Introduction to Continuous Integration
 
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
Accumulo Summit 2014: Four Orders of Magnitude: Running Large Scale Accumulo ...
 
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
Accumulo Summit 2015: Tracing in Accumulo and HDFS [Internals]
 
Accumulo design
Accumulo designAccumulo design
Accumulo design
 
Accumulo meetup 20130109
Accumulo meetup 20130109Accumulo meetup 20130109
Accumulo meetup 20130109
 
Apache Accumulo and the Data Lake
Apache Accumulo and the Data LakeApache Accumulo and the Data Lake
Apache Accumulo and the Data Lake
 
Accumulo Summit 2016: Accumulo in the Enterprise
Accumulo Summit 2016: Accumulo in the EnterpriseAccumulo Summit 2016: Accumulo in the Enterprise
Accumulo Summit 2016: Accumulo in the Enterprise
 
Large Scale Accumulo Clusters
Large Scale Accumulo ClustersLarge Scale Accumulo Clusters
Large Scale Accumulo Clusters
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
Software Team Hierarchy of Needs
Software Team Hierarchy of NeedsSoftware Team Hierarchy of Needs
Software Team Hierarchy of Needs
 
Accumulo: A Quick Introduction
Accumulo: A Quick IntroductionAccumulo: A Quick Introduction
Accumulo: A Quick Introduction
 
Accumulo Summit 2016: Embedding Authenticated Data Structures in Accumulo
Accumulo Summit 2016: Embedding Authenticated Data Structures in AccumuloAccumulo Summit 2016: Embedding Authenticated Data Structures in Accumulo
Accumulo Summit 2016: Embedding Authenticated Data Structures in Accumulo
 
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
Accumulo Summit 2015: Accumulo In-Depth: Building Bulk Ingest [Sponsored]
 
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
Accumulo Summit 2015: Performance Models for Apache Accumulo: The Heavy Tail ...
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
 
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big DataOct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data
 
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
Accumulo Summit 2016: GeoMesa: Using Accumulo for Optimized Spatio-Temporal P...
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 

Similar to Apache Accumulo Overview

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoopmarkgrover
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationshadooparchbook
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Stefan Lipp
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Cloudera, Inc.
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Hadoop / Spark Conference Japan
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialhadooparchbook
 
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityClouderaUserGroups
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kiteJoey Echeverria
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaSwiss Big Data User Group
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform WebinarCloudera, Inc.
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015hadooparchbook
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impalahuguk
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全Jianwei Li
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera, Inc.
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupAndrei Savu
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsCloudera, Inc.
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudGoDataDriven
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfTimothy Spann
 

Similar to Apache Accumulo Overview (20)

Applications on Hadoop
Applications on HadoopApplications on Hadoop
Applications on Hadoop
 
Strata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applicationsStrata EU tutorial - Architectural considerations for hadoop applications
Strata EU tutorial - Architectural considerations for hadoop applications
 
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
Cloudera Analytics and Machine Learning Platform - Optimized for Cloud
 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 
Part 2: A Visual Dive into Machine Learning and Deep Learning 

Part 2: A Visual Dive into Machine Learning and Deep Learning 

 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
Apache Kudu Fast Analytics on Fast Data (Hadoop / Spark Conference Japan 2016...
 
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorialStrata NY 2014 - Architectural considerations for Hadoop applications tutorial
Strata NY 2014 - Architectural considerations for Hadoop applications tutorial
 
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & ExtensibilityCloudera User Group SF - Cloudera Manager: APIs & Extensibility
Cloudera User Group SF - Cloudera Manager: APIs & Extensibility
 
Building data pipelines with kite
Building data pipelines with kiteBuilding data pipelines with kite
Building data pipelines with kite
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Spark One Platform Webinar
Spark One Platform WebinarSpark One Platform Webinar
Spark One Platform Webinar
 
Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015Hadoop Application Architectures tutorial at Big DataService 2015
Hadoop Application Architectures tutorial at Big DataService 2015
 
Building a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with ImpalaBuilding a Hadoop Data Warehouse with Impala
Building a Hadoop Data Warehouse with Impala
 
Kafka for DBAs
Kafka for DBAsKafka for DBAs
Kafka for DBAs
 
大数据数据治理及数据安全
大数据数据治理及数据安全大数据数据治理及数据安全
大数据数据治理及数据安全
 
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the CloudCloudera Director: Unlock the Full Potential of Hadoop in the Cloud
Cloudera Director: Unlock the Full Potential of Hadoop in the Cloud
 
One Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data MeetupOne Hadoop, Multiple Clouds - NYC Big Data Meetup
One Hadoop, Multiple Clouds - NYC Big Data Meetup
 
One Hadoop, Multiple Clouds
One Hadoop, Multiple CloudsOne Hadoop, Multiple Clouds
One Hadoop, Multiple Clouds
 
Cloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the CloudCloudera GoDataFest Deploying Cloudera in the Cloud
Cloudera GoDataFest Deploying Cloudera in the Cloud
 
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
 

Apache Accumulo Overview

  • 1. Apache Accumulo Overview. Bill Havanki, Solutions Architect, Cloudera Government Solutions
  • 2. Agenda
      • Quick History
      • Storage Model
      • Loading and Querying
      • Daemons
      • Getting Started, a.k.a., the Pitch
  • 4. Google BigTable: Compressed, high-performance, scalable, distributed sorted map
  • 5. Google BigTable
      • Began development in 2004
      • Built on Google File System
      • Non-relational
      • Byte-oriented and schemaless
      • Stores data in the petabyte range
      • Research paper published in 2006
  • 6. Child(ren) of BigTable
      • Apache HBase (begun 2006, top-level 2010)
      • Apache Cassandra (begun 2008-ish, top-level 2010)
      • Apache Accumulo ...
  • 7. From Cloudbase to Accumulo
      • Started in 2008 as National Security Agency project
      • Submitted to Apache Incubator in 2011 (and renamed)
      • Top-level project in 2012
  • 9. Key / Value Store: Accumulo stores tables of key / value pairs
  • 10. Key / Value Store: A row is a sorted sequence of key / value pairs. Each pair is a cell.
  • 11. The Key: row | column (family, qualifier, visibility) | timestamp
  • 12. An example key: row bhavanki | column (family personal, qualifier middle, visibility PII) | timestamp 1401041295
  • 13. Another example key: row brees | column (family employment, qualifier salary, visibility FIN) | timestamp 1401041296
  • 14. It’s all bytes
      • All key and value data are stored as bytes, except timestamp is a long
      • There are no built-in data types, but lexicoders help with common types
      • Key components are usually UTF-8 strings
  • 15. Some rows for you
        row       cf        cq        cv       ts          value
        bhavanki  job       employer           2013-09-01  Cloudera
        bhavanki  personal  beer               2013-09-15  Omission
        bhavanki  personal  house     NOMUGGL  2014-01-25  Ravenclaw
        brees     job       employer           2013-10-01  White Cliffs
        brees     personal  house     NOMUGGL  2014-01-01  Hufflepuff
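The sorted order behind these rows can be illustrated with a small, self-contained Java sketch. It assumes the usual BigTable-style ordering: ascending by row, family, qualifier, and visibility, with the newest timestamp first within a cell. `CellKey` and `KeyOrderSketch` are hypothetical names made up for illustration, not Accumulo classes, and the sketch compares Java strings where the real store compares raw bytes. The second `bhavanki` timestamp is an invented value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class KeyOrderSketch {

    // Hypothetical stand-in for a key; not the real Accumulo Key class.
    static final class CellKey {
        final String row;
        final String family;
        final String qualifier;
        final String visibility;
        final long timestamp;

        CellKey(String row, String family, String qualifier, String visibility, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.visibility = visibility;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
        }
    }

    // Ascending by row, family, qualifier, visibility; newest timestamp first.
    static final Comparator<CellKey> ORDER =
        Comparator.comparing((CellKey k) -> k.row)
            .thenComparing(k -> k.family)
            .thenComparing(k -> k.qualifier)
            .thenComparing(k -> k.visibility)
            .thenComparing(Comparator.comparingLong((CellKey k) -> k.timestamp).reversed());

    // Returns a sorted copy and leaves the input list untouched.
    static List<CellKey> sorted(List<CellKey> keys) {
        List<CellKey> copy = new ArrayList<>(keys);
        copy.sort(ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<CellKey> keys = Arrays.asList(
            new CellKey("brees", "employment", "salary", "FIN", 1401041296L),
            new CellKey("bhavanki", "personal", "middle", "PII", 1401041295L),
            // Invented newer version of the same cell, to show timestamp ordering.
            new CellKey("bhavanki", "personal", "middle", "PII", 1401041999L));
        for (CellKey k : sorted(keys)) {
            System.out.println(k);
        }
    }
}
```

Sorting newest-first means a reader that wants only the current version of a cell can stop at the first entry it sees for that cell.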
  • 16. Visibility Labels
      • Boolean expression: Specialist | (Management & SpecTraining)
      • Authorizations are provided in each scan
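How a label expression is checked against a scan's authorizations can be sketched with a tiny recursive-descent evaluator. This is an illustrative simplification, not Accumulo's own parser: the class name `VisibilitySketch` is hypothetical, labels are limited to letters, digits, and underscores, and the sketch assumes that `&` and `|` may not be mixed at one nesting level without parentheses and that an empty label places no restriction.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr.replace(" ", "");
        this.auths = auths;
    }

    /** Returns true if the given authorizations satisfy the label expression. */
    static boolean isVisible(String expression, Set<String> authorizations) {
        if (expression.trim().isEmpty()) {
            return true; // assumption: an empty label is visible to every scan
        }
        VisibilitySketch parser = new VisibilitySketch(expression, authorizations);
        boolean result = parser.parseExpression();
        if (parser.pos != parser.expr.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return result;
    }

    // expression := term (('&' | '|') term)*, one operator kind per level
    private boolean parseExpression() {
        boolean result = parseTerm();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char next = expr.charAt(pos++);
            if (op != 0 && op != next) {
                throw new IllegalArgumentException("mixing & and | requires parentheses");
            }
            op = next;
            boolean rhs = parseTerm();
            result = (op == '&') ? (result && rhs) : (result || rhs);
        }
        return result;
    }

    // term := '(' expression ')' | label
    private boolean parseTerm() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean inner = parseExpression();
            if (pos >= expr.length() || expr.charAt(pos) != ')') {
                throw new IllegalArgumentException("missing ) at position " + pos);
            }
            pos++;
            return inner;
        }
        int start = pos;
        while (pos < expr.length()
                && (Character.isLetterOrDigit(expr.charAt(pos)) || expr.charAt(pos) == '_')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("expected a label at position " + pos);
        }
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        String label = "Specialist | (Management & SpecTraining)";
        Set<String> manager = new HashSet<>(Arrays.asList("Management"));
        Set<String> trained = new HashSet<>(Arrays.asList("Management", "SpecTraining"));
        System.out.println(isVisible(label, manager));
        System.out.println(isVisible(label, trained));
    }
}
```

A scan holding only `Management` would not see the cell, while one holding both `Management` and `SpecTraining`, or `Specialist` alone, would.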
  • 17. Locality Groups
      • You can identify sets of one or more column families as locality groups
      • Data in a locality group is stored together for improved read performance
  • 18. Tablets: A table is comprised of one or more tablets (diagram: table employees split into tablets employees;H, employees;S, and employees;~)
  • 19. Tablets: Tablets map to data files in HDFS (diagram: employees;H with rfile 1, employees;S with rfile 2, employees;~ with rfile 3)
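The way a row finds its tablet can be sketched with a sorted map keyed by split points. This is a minimal illustration with hypothetical names (`TabletLookupSketch`, `tabletFor`), not how Accumulo itself locates tablets. It assumes each tablet covers the rows after the previous split point up to and including its own, and it borrows the slides' `employees;~` label for the final tablet.

```java
import java.util.Map;
import java.util.TreeMap;

final class TabletLookupSketch {
    // Maps each split point (a tablet's inclusive end row) to a tablet name.
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();
    private final String lastTablet;

    TabletLookupSketch(String table, String... splitPoints) {
        for (String split : splitPoints) {
            tabletsByEndRow.put(split, table + ";" + split);
        }
        // The slides label the final tablet with "~"; it holds every row past the last split.
        lastTablet = table + ";~";
    }

    /** Returns the tablet whose range contains the row. */
    String tabletFor(String row) {
        // The first split point at or after the row is the end row of its tablet.
        Map.Entry<String, String> entry = tabletsByEndRow.ceilingEntry(row);
        return entry != null ? entry.getValue() : lastTablet;
    }

    public static void main(String[] args) {
        TabletLookupSketch employees = new TabletLookupSketch("employees", "H", "S");
        for (String row : new String[] {"Alice", "H", "Miller", "Smith", "bhavanki"}) {
            System.out.println(row + " -> " + employees.tabletFor(row));
        }
    }
}
```

Because the comparison follows character codes, lowercase row IDs such as `bhavanki` sort after the uppercase split points and land in the final tablet, which is worth keeping in mind when choosing splits.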
  • 20. Tablets: Data also kept in write-ahead logs and memtable (diagram: tablet employees;H with rfile 1, walogs, and memtable)
  • 22. Java Client API
  • 23. Java Client API: Read using scanners
        Scanner s = conn.createScanner("employees", new Authorizations());
        s.setRange(new Range("alice", "eve"));
        s.fetchColumnFamily(new Text("personal"));
        for (Entry<Key, Value> e : s)
          employeeIds.add(e.getKey().getRow());
  • 24. Java Client API: Read access via iterator pattern
      • server-side system iterators handle timestamps, authorization checks, and lots more
      • iterators almost always wrap other iterators, forming a chain
      • you can define your own, client-side or server-side
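The chaining idea can be shown with plain `java.util.Iterator` wrappers. This is only a sketch of the pattern under simplifying assumptions: it does not use Accumulo's own iterator interface, every name in it (`IteratorChainSketch`, `Cell`, `FilteringIterator`, `newestVersionOnly`, `columnOnly`) is hypothetical, and the older `Gryffindor` version is invented sample data. The source is assumed to be sorted with the newest version of each cell first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

final class IteratorChainSketch {

    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final String value;

        Cell(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Wraps a source iterator and passes through only the cells that match.
    static final class FilteringIterator implements Iterator<Cell> {
        private final Iterator<Cell> source;
        private final Predicate<Cell> keep;
        private Cell next;

        FilteringIterator(Iterator<Cell> source, Predicate<Cell> keep) {
            this.source = source;
            this.keep = keep;
            advance();
        }

        // Pulls from the source until a matching cell is found or the source is exhausted.
        private void advance() {
            next = null;
            while (source.hasNext()) {
                Cell candidate = source.next();
                if (keep.test(candidate)) {
                    next = candidate;
                    return;
                }
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public Cell next() {
            if (next == null) {
                throw new NoSuchElementException();
            }
            Cell result = next;
            advance();
            return result;
        }
    }

    // Keeps the first (newest) version of each row/column pair.
    static Iterator<Cell> newestVersionOnly(Iterator<Cell> source) {
        final String[] lastSeen = {null};
        return new FilteringIterator(source, cell -> {
            String id = cell.row + "\u0000" + cell.column;
            boolean firstVersion = !id.equals(lastSeen[0]);
            lastSeen[0] = id;
            return firstVersion;
        });
    }

    // Keeps only cells in the requested column.
    static Iterator<Cell> columnOnly(Iterator<Cell> source, String column) {
        return new FilteringIterator(source, cell -> cell.column.equals(column));
    }

    // Drains an iterator into a list of values.
    static List<String> values(Iterator<Cell> cells) {
        List<String> out = new ArrayList<>();
        while (cells.hasNext()) {
            out.add(cells.next().value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> stored = Arrays.asList(
            new Cell("bhavanki", "personal:house", 2L, "Ravenclaw"),
            new Cell("bhavanki", "personal:house", 1L, "Gryffindor"), // invented older version
            new Cell("brees", "job:employer", 1L, "White Cliffs"),
            new Cell("brees", "personal:house", 1L, "Hufflepuff"));
        Iterator<Cell> chain = columnOnly(newestVersionOnly(stored.iterator()), "personal:house");
        System.out.println(values(chain));
    }
}
```

Each wrapper sees only what the one beneath it lets through, so the order of the chain matters: version filtering here runs before column filtering.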
  • 25. Java Client API
      • Scanners fetch sorted rows from one range
      • Batch scanners fetch unsorted rows from multiple ranges in parallel
      • Isolated scanners ensure that you do not see a row mid-change
  • 26. MapReduce: AccumuloInputFormat, AccumuloOutputFormat
  • 27. MapReduce: AccumuloRowInputFormat, AccumuloRowOutputFormat
  • 28. Shell: Command-line / manual access to Accumulo data
      • scan, insert, delete
      • iterator management
      • table management (creation, deletion, cloning)
      • user and authorization management
      • table splitting and merging
      • ... more
  • 29. Bulk Import: Got lots of data to import quickly?
      • Use MR job to format data using AccumuloFileOutputFormat
      • Import files using shell
      • Trade off latency / availability for throughput
  • 31. Tablet Server: Serves tablets (table data)
      • writes data to walog, memtable; deals with compaction
      • serves data for reads from files, memtable
      • handles recovery from walogs in case of server failure
      • Most client calls go to tablet servers
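The tablet server's write, read, and recovery duties listed above can be modeled in a few lines. This is a toy, in-memory sketch under stated assumptions, not the server's actual implementation: `TabletWritePathSketch` and its methods are hypothetical names, "files" are just frozen sorted maps, the flush threshold counts entries, and reads use point lookups where the real server merges sorted sources.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class TabletWritePathSketch {
    private final List<String[]> writeAheadLog = new ArrayList<>();
    private TreeMap<String, String> memtable = new TreeMap<>();
    // Flushed, immutable sorted maps standing in for data files; newest last.
    private final List<TreeMap<String, String>> files = new ArrayList<>();
    private final int flushThreshold;

    TabletWritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        writeAheadLog.add(new String[] {key, value}); // 1. log first, for durability
        memtable.put(key, value);                     // 2. then the sorted in-memory map
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    // Freezes the memtable as a new "file"; its log entries are no longer needed.
    void flush() {
        files.add(memtable);
        memtable = new TreeMap<>();
        writeAheadLog.clear();
    }

    // Checks the memtable first, then files from newest to oldest.
    String read(String key) {
        String value = memtable.get(key);
        for (int i = files.size() - 1; value == null && i >= 0; i--) {
            value = files.get(i).get(key);
        }
        return value;
    }

    // Simulates a crash (memtable lost) followed by replaying the write-ahead log.
    void crashAndRecover() {
        memtable = new TreeMap<>();
        for (String[] entry : writeAheadLog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    int fileCount() {
        return files.size();
    }

    public static void main(String[] args) {
        TabletWritePathSketch tablet = new TabletWritePathSketch(2);
        tablet.write("bhavanki job:employer", "Cloudera");
        tablet.write("brees job:employer", "White Cliffs"); // reaches the threshold and flushes
        tablet.write("bhavanki personal:beer", "Omission"); // only in walog and memtable
        tablet.crashAndRecover();
        System.out.println(tablet.read("bhavanki personal:beer"));
        System.out.println(tablet.read("brees job:employer"));
    }
}
```

Writing to the log before the memtable is what makes recovery possible: anything not yet flushed to a file can be rebuilt by replaying the log.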
  • 32. Master
      • assigns tablets to tablet servers
      • detects tablet server failures and reassigns tablets
      • balances tablet assignments over time
      • coordinates table operations
      • Multiple supported for failover, only one active
  • 33. Everybody Else in Accumulo
      • Garbage Collector (GC) - identifies and deletes files in HDFS that are no longer needed
      • Tracer - listens for and stores distributed trace messages using a special table
  • 34. Everybody Else in Accumulo
      • Monitor - collects and serves status information
          • server status
          • log inspection
          • performance data
          • table inspection
  • 35. Everybody Else outside Accumulo
      • HDFS (as part of Apache Hadoop)
          • stores tablet files
          • stores write-ahead logs (1.5+)
      • MapReduce (Hadoop)
          • bulk import
          • batch processing
      • Apache ZooKeeper
  • 37. Easy as 1-2-3?
      1. Install Hadoop (HDFS and MapReduce)
      2. Install ZooKeeper
      3. Install Accumulo!
  • 38. Making Steps 1 and 2 Easier: Use a complete, pre-packaged Hadoop distribution ... like CDH! a leading commercial distribution centered on Apache Hadoop
      • many ecosystem components
      • configured / updated to work together
  • 39. Making Steps 1 and 2 Easier: Cloudera Manager
      • deployment
      • configuration
      • operation
      • security
  • 40. Making Step 3 Easier
      • Standard Apache Accumulo installation is via tarball
          • no longer shipping RPM / DEB / ...
      • Using CDH/CM you can use:
          • a tarball, RPM or DEB with Accumulo packaged for CDH
          • a parcel (like RPM / ZIP) for easier upgrades
          • 1.4.4 and 1.4.5 available now
          • 1.6.0 soon
  • 41. Where to Go for More
      • http://accumulo.apache.org/
      • http://www.cloudera.com/content/cloudera/en/products-and-service
      • http://www.cloudera.com/content/cloudera/en/products-and-service
      • http://www.cloudera.com/content/cloudera/en/products-and-services/cdh/accumulo.html
  • 42. Accumulo Summit: Join us on June 12
  • 43. Quick Thanks
      • My slide reviewers
          • Sean Busbey
          • Mike Drob
      • Accumulo community
      • You all for listening
  • 44. Thank you! Bill Havanki, bhavanki@clouderagovt.com