Time's Up! Getting Value from Big Data Now

Grab some coffee and enjoy the pre-show banter before the top of the hour!

The Briefing Room

Welcome
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh

Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today's innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!

Big Integration
• Old infrastructure lacking
• New pipes are needed
• Well begun is half done!

Analyst
Robin Bloor is Chief Analyst at The Bloor Group
robin.bloor@bloorgroup.com
@robinbloor

CASK
• CASK offers a unified integration platform for big data applications and data lakes
• Its CDAP architecture provides data containers, program containers and application containers for data and applications on Hadoop
• CASK also offers Hydrator for building and managing data pipelines and data lakes, and Tracker for data lake governance

Guest
Jonathan Gray
Jonathan Gray, Founder & CEO of Cask, is an entrepreneur and software engineer with a background in startups, open source and all things data. Prior to founding Cask, Jonathan was a software engineer at Facebook, where he drove HBase engineering efforts, including Facebook Messages and several other large-scale projects, from inception to production.
An open source evangelist, Jonathan was responsible for helping build the Facebook engineering brand through developer outreach and for refocusing the company's open source strategy. Prior to Facebook, Jonathan founded Streamy.com, where he became an early adopter of Hadoop and HBase; he is now a core contributor and active committer in the community.
Jonathan holds a bachelor’s degree in Electrical and Computer Engineering and Business Administration from Carnegie Mellon University.

Big Data on Tap
cask.co November 1, 2016
The Briefing Room
Jonathan Gray
Founder & CEO

Hadoop Enables New Applications and Architectures

ENTERPRISE DATA LAKES
• Batch and Realtime Data Ingestion: any type of data from any type of source in any volume
• Batch and Streaming ETL: code-free self-service creation and management of pipelines
• SQL Exploration and Data Science: all data is automatically accessible via SQL and client SDKs
• Data as a Service: easily expose generic or custom REST APIs on any data

BIG DATA ANALYTICS
• 360° Customer View: integrate data from any source and expose it through queries and APIs
• Realtime Dashboards: perform realtime OLAP aggregations and serve them through REST APIs
• Time Series Analysis: store, process and serve massive volumes of time-series data
• Realtime Log Analytics: ingestion and processing of high-throughput streaming log events

PRODUCTION DATA APPS
• Recommendation Engines: build models in batch using historical data and serve them in realtime
• Anomaly Detection Systems: process streaming events and predictably compare them in realtime to historical data
• NRT (near-real-time) Event Monitoring: reliably monitor large streams of data and perform defined actions within a specified time
• Internet of Things: highly available, scalable and consistent ingestion, storage and processing of events
Data Applications Drive Meaningful Business Value
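The Anomaly Detection Systems use case on this slide (process streaming events and compare them in realtime to historical data) can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not how CDAP or any Cask product implements it: it assumes a simple z-score test of each new event against a sliding window of the most recent values, and the function name detect_anomalies, the window size of 20 and the threshold of 3.0 are hypothetical choices.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Deque, Iterable, Iterator, Tuple


def detect_anomalies(
    events: Iterable[float],
    window: int = 20,
    threshold: float = 3.0,
) -> Iterator[Tuple[int, float, float]]:
    """Yield (index, value, z_score) for each event whose z-score against the
    previous `window` events is at least `threshold` in absolute value.
    Hypothetical helper for illustration only."""
    history: Deque[float] = deque(maxlen=window)  # oldest values drop off automatically
    for i, value in enumerate(events):
        # Score only once enough history exists to form a baseline.
        if len(history) == window:
            mu = mean(history)
            sigma = pstdev(history)
            # A perfectly flat history has no spread, so no z-score is defined.
            if sigma > 0:
                z = (value - mu) / sigma
                if abs(z) >= threshold:
                    yield i, value, z
        # Every event, anomalous or not, becomes history for later events.
        history.append(value)


readings = [10.0, 11.0] * 10 + [30.0, 10.0]
for idx, value, z in detect_anomalies(readings):
    print(f"event {idx}: value={value}, z={z:.1f}")  # event 20: value=30.0, z=39.0
```

Feeding anomalous values back into the history lets a genuine level shift become the new baseline over time; a real system might instead exclude flagged values or use a more robust baseline.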

But Getting Value from Big Data is Hard
Too much focus on infrastructure and integration, rather than on applications and analytics
• Divergence of distributions and technologies
• Integration silos created by narrow point solutions
• Proliferation of projects, services and APIs
• Complexity of technologies and the learning curve for new users

And There Are Many Faces of Hadoop
• Developer: Architecture & Programming; focused on Apps & Solutions
• Ops: Configuring & Monitoring; focused on Infrastructure & SLAs
• LOB / Product: Driving Revenue & Decision Making; focused on Products & Insights
• Data Scientist: Scripting & Machine Learning; focused on Data & Algorithms
Without a consistent set of tools, IT will not be an effective data enabler for the business.

Enter Cask
• Founded in 2011: by early Hadoop engineers from Facebook and Yahoo!
• Raised $37+ Million: Andreessen Horowitz, Safeguard, Battery Ventures and Ignition Partners
• Strategic Investors: AT&T, Cloudera and Ericsson
• Key Customers & Partners: AT&T, Ericsson, Lotame, Salesforce, Cloudera, Hortonworks, MapR, Microsoft, IBM, Tableau…
• Latest Release: Cask Data Application Platform 3.5, Cask Hydrator and Cask Tracker
• NEW: CDAP 4 Preview, featuring Cask Market, the “big data app store”
• Why “Cask”? A Container Architecture that puts Big Data on Tap

The Evolution of the Cask Platform
Convergence of Big Data Apps and Data Integration

CDAP v2: Big Data App Server (“WebLogic for Hadoop”)
• Abstractions & integrations
• Metrics & logs
• Debugging environment

CDAP v3: Big Data Apps + Data Integration (“WebLogic Meets Informatica”)
• Data ingest
• Data pipelines
• Workflows and metadata

CDAP v4: Unified Integration for Big Data (“Unified Big Data Integration”)
• Security & governance
• Self-service environment
• Enterprise integrations

Introducing Cask Data Application Platform (CDAP)
First Unified Integration Platform for Big Data
Platform for distributed apps, bringing together application management with data integration
• 100% open source and built for extensibility
• Supports all major Hadoop distributions and clouds
• Integrates the latest open source big data technologies
Diagram: example applications (Data Lake, Fraud Detection, Recommendation Engine, Sensor Data Analytics, Customer 360) above four platform areas (Modern Data Integration, Distributed Application Framework, Self-Service User Experience, Enterprise-grade Security & Governance)

Modern Data Integration
• Real-time and Batch
• Reliable and Scalable
• Simple and Self-Service
INGEST: any data from any source
PROCESS: for ETL and machine learning
EXPLORE: for analytics and data science
SERVE: any data to any destination
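The INGEST, PROCESS and SERVE stages on this slide can be illustrated with a minimal batch sketch in plain Python. It does not use CDAP or Cask Hydrator APIs (those are not shown in this deck); the function names ingest, process and serve, the in-memory CSV source and the per-customer totals are hypothetical, chosen only to show how independent stages compose into one pipeline.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def ingest(raw_csv: str) -> Iterator[Dict[str, str]]:
    """INGEST (hypothetical): read raw records from a source, here CSV text in memory."""
    yield from csv.DictReader(io.StringIO(raw_csv))


def process(records: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, float]]:
    """PROCESS (hypothetical): a simple ETL step that normalizes keys,
    drops rows without an amount and casts amounts to float."""
    for rec in records:
        amount = (rec.get("amount") or "").strip()
        if not amount:
            # Skip rows with no amount instead of failing the whole pipeline.
            continue
        yield rec["customer"].strip().lower(), float(amount)


def serve(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """SERVE (hypothetical): materialize per-customer totals that a query
    or REST layer could read."""
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in records:
        totals[customer] += amount
    return dict(totals)


raw = """customer,amount
Alice,10.5
bob,4
 alice ,2.5
carol,
"""
print(serve(process(ingest(raw))))  # {'alice': 13.0, 'bob': 4.0}
```

Because each stage only consumes the previous stage's iterator, a stage can be swapped (for example, a different source in ingest) without touching the others; the EXPLORE stage is omitted from this sketch.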

Distributed Application Framework
• Real-time and Batch
• Memory, Local, Distributed
• Analytics and Applications
DEVELOP: rapidly build applications
TEST: powerful test and CI framework
DEPLOY: run any apps in any environment
SCALE: horizontally scale apps and data

Security and Governance
CAPTURE: store all metadata about your data
DISCOVER: easily locate any of your data
TRACK: every audit plus lineage graphs
ANALYZE: understand usage patterns of data
AUTHENTICATE • AUTHORIZE • ENCRYPT

Self-Service User Experience
Cask Hydrator: a code-free framework to build and run data pipelines
• Drag & drop graphical interface
• Create, debug, deploy and manage
• Separation of logic and execution environment
• Native to Hadoop & Spark; scales out
Cask Tracker: a data discovery tool to explore metadata and usage
• Rich app-level metadata
• Track lineage and audits
• Analyze usage of datasets
• MDM integration framework

The CDAP Architecture
Diagram: Applications built from Programs (MapReduce, Spark, Tigon, Workflow, Service, Worker) and Datasets (Table, Avro, Parquet, Timeseries, OLAP Cube, Geospatial, ObjectStore), with metadata at every layer
• Application Container Architecture
• Reusable Programming Abstractions
• Global User and Machine Metadata
• Highly Extensible Plugin Architecture

CDAP Enables the Full Big Data Application Lifecycle
Single framework for building and running data apps and data lakes on Hadoop and Spark

Rapid Development
• Standardization, deep integrations, tools and docs
• Separation of app logic from data logic and integration logic
• Conceptual integrity within applications and consistency across environments

Production Operations & Governance
• Simplified packaging, deployment and monitoring of apps on Hadoop
• Enhanced security and governance with centralized metrics and logs
• Tracking and exploration of metadata, data provenance, audit trails and usage analytics

Reduces time to develop and deploy big data apps by 80%
Reduces time to insights and accelerates business value
Removes barriers to innovation and future-proofs your apps

Customer Success Stories

Health Insurance Provider: offloading clinical / immunization reporting from Netezza
• Situation: lack of existing Hadoop expertise and frustration with hand-coding and scripting tools
• Cask Solution: Cask Hydrator for rapid creation of data pipelines and Cask Tracker for data discovery
• Outcome: POC in 2 days; production in 2 months

Leading SaaS Platform: taking new real-time, massive-scale products to market
• Situation: small team and significant technical challenges limit pace of development and solution scale
• Cask Solution: CDAP for real-time ingestion and consistent processing with production operations support
• Outcome: development in 1 month; production in 3 months

Large Telco Enterprise: building a centralized, secured, multi-tenant Data Lake
• Situation: multiple teams and technologies with widely varied skillsets and incompatible design choices
• Cask Solution: CDAP for data lake management and orchestration, tightly integrated into existing systems
• Outcome: hundreds of users; thousands of pipelines

Awards and Accolades
• Cask was named a Gartner Cool Vendor 2016
• Cask was certified a Great Place to Work 2016

“… for the rest of us who lack the technological chips or patience to make it all work, there’s good news: it will soon get easier, thanks to the work done by the big data pioneers, as well as vendors like Cask …”
(Alex Woodie, Managing Editor, Datanami)

“… Cask has tilted the playing field, earning a massive unfair advantage over proprietary point products for data integration and ingest …”
(Nik Rouda, Senior Analyst, Enterprise Strategy Group)

“… CDAP is a big win for us … the amount of code we needed to write was minimal with CDAP, and it was much easier and faster than we ever expected …”
(Jia-Long Wu, Data Architect, Lotame)

NEW: CDAP 4 — Big Data Apps on Tap!
Release of CDAP 4 Preview: available for download now!
• Cask Market: “Big Data App Store”
• Cask Wrangler: Interactive Data Preparation
• Resource Center: Interactive Wizards for Common Tasks
• Reimagined CDAP UI: Rewrite based on React

Cask Market: The “App Store for Big Data”
• Goal: time to value in minutes, with no prior experience
• Application and library ecosystem with pre-built Hadoop solutions, reusable templates, and third-party plugins
• Available from anywhere inside the CDAP UI with a click
• Initially, everything in the Cask Market has been bootstrapped by Cask based on ongoing work across our customers, and is 100% open source and available on GitHub
• Eventually, developers and ISVs will be able to showcase and market their own applications and libraries (e.g., Graylog)
Cask Market includes interactive, guided wizards for configuring pre-built templates.

Cask Resources
• Building Data Pipelines on Hadoop with Cask Hydrator
• Data Lake Webinar
• Introduction to Cask Hydrator
• CDAP - Containers on Hadoop
• CDAP Extensions - Cask Hydrator and Cask Tracker
• ESG Solution Spotlight
• CDAP Technical Concepts (video)
• Cask / Cloudera Solution Brief

Summary: Big Data on Tap
● CDAP provides the first unified integration platform for big data
● Cask Hydrator and Cask Tracker are visual extensions of CDAP for self-service access
● CDAP empowers enterprise IT to deliver faster time to value for Hadoop and Spark, from prototype to production
● Cask Market is a “big data app store” available in CDAP 4 with pre-built apps, pipelines and plugins
● CDAP is 100% open source, highly extensible, enterprise-ready, and commercially supported

For more information, go to: cask.co
Thanks!

Perceptions & Questions
Analyst: Robin Bloor

Big Data Foundations?
Robin Bloor, PhD

Neither Hadoop Nor Spark Is a Solution
However, both are useful and increasingly versatile components for Big Data applications.

The Evolution of the Little Elephant
• Hortonworks: Apache pure play. No apparent vision.
• Cloudera: Some proprietary components (Cloudera Manager, Impala, Cloudera Search). Vision is a corporate data hub(?)
• MapR: Also some proprietary components (MapR-FS, MapR Streams, MapR-DB)
• And then there’s the cloud.

The Ship of Fools
Until Hadoop’s direction is controlled by a single “captain,” we may have to tolerate the ship of fools.

The “Big Data Hype Cycle” Is Misleading
• Big Data is an ecosystem, not a technology, which distorts this graph
• Some analytics applications have experienced “absurd acceleration”
• Hadoop is, in many instances, a laggard; so is Spark
• Nevertheless, we seem to be exiting “the trough”

The System Management Issue
Diagram components: Mobile Devices; Desktops; Servers; IoT; The Cloud; Data Capture; Real-Time Streaming?; Staging Area (Hadoop?); Data Stores; Archive; Data Assaying; The Prospecting Domain; Data Management; Data Life Cycle Management; Data Serving; Apps; System Management

The Fundamental Issue
Big Data does not really have a foundation. Neither, imho, does the Data Lake.
Luckily, there are third parties…

• Regarding Hadoop, do you have any “preferred components”?
• How do you stay current with the various distros? What about backward compatibility? Can a customer upgrade at will?
• How does your technology impact performance (if at all)?
• Do you provide a consultancy service?

• Which companies/services do you regard as competitive?
• Do you have any specific partners?
• What does an implementation look like?

THANK YOU for your ATTENTION!
Some images provided courtesy of Wikimedia Commons
