© 2014 Ellen Friedman 1 
Seeing With Your Eyes Closed 
Ellen Friedman 
NoSQL matters Barcelona
22 November 2014
Contact Information 
Ellen Friedman 
Solutions Consultant and Commentator 
Apache Mahout committer, Apache Drill contributor 
Email ellenf@apache.org 
efriedman@maprtech.com 
Twitter @Ellen_Friedman @ApacheDrill 
Hashtag today: #NoSQL14
Thinking With Your Eyes Closed 
When some people think… they close their eyes in order to “see”.
Getting Past the Details 
• Look at your data with an open mind 
• Listen to what data tells you 
• Find the key concepts in what you do 
• Give yourself an opportunity for discovery
NoSQL 
• Founded on discovery 
• Solution-driven 
• Don’t be bound by the tool 
• Flexibility is important 
• How do you keep your ability for invention?
Basic idea:
“Eyes open”: Details
“Eyes closed”: Discovery
Imagination, technology and 
careful reasoning 
Think where this may take you.
Things don’t always turn out the way you predict… 
With exploration into new frontiers, you may 
meet your goal in surprising ways. 
Spanish explorers came to the Americas in
search of riches.
They were looking for gold and silver. 
They found cochineal. 
Red dye worth a fortune. 
A Perfect Red, 
by Amy Butler Greenfield
Big Data and Open Source in the 19th Century 
Here’s a story that combines the power of vision (eyes-closed thinking) with
keen observation and attention to detail (eyes-open thinking).
It’s got: 
• Adventure on the high seas 
• Time series data (a hot topic in the NoSQL world today) 
• Clever community building for open source participation 
• World speed record 
• (but no pirates) 
Here’s the story!
Matthew Fontaine Maury was a sailor
in the 1830s.
After an injury left him unfit for sea duty, the US Navy gave
him a “desk job”.
Oddly, that’s where the
real adventure starts.
Time Series Data – An Old Idea 
Captain’s log book entry 
for the Steam Ship Bear, 
1884 trip to Arctic 
From image digitized by 
www.oldweather.org and 
provided via 
www.naval-history.net . 
Image modified by Ellen 
Friedman and Ted 
Dunning. 
Ship captains kept log books with various comments plus 
measurements recorded at specific times.
Time Series Data – An Old Idea 
The basis of a time series is the repeated measurement of parameters 
over time, together with the times at which the measurements were made.
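As a minimal sketch of that definition, a time series can be modeled as a list of readings, each pairing a measured value with the time it was taken. The `Reading` class, the `regular_readings` helper, the parameter name and the sample values are all hypothetical, chosen only to echo the log book example; they are not taken from any real log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reading:
    """One entry of a time series: what was measured, when, and the value."""
    taken_at: datetime
    parameter: str
    value: float


def regular_readings(start, step, parameter, values):
    """Attach evenly spaced timestamps to a sequence of measured values."""
    return [Reading(start + i * step, parameter, v) for i, v in enumerate(values)]


# Hypothetical values: four air temperature readings taken every four hours.
log = regular_readings(
    start=datetime(1884, 5, 1, 8, 0),
    step=timedelta(hours=4),
    parameter="air_temp_f",
    values=[31.0, 33.5, 35.0, 32.0],
)

for reading in log:
    print(reading.taken_at.isoformat(), reading.parameter, reading.value)
```

Keeping the timestamp inside each reading, rather than relying on position in the list, is what lets readings from different sources be merged and re-sorted later.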
Time Series Data – An Old Idea 
At his desk job in the U.S. Navy Office of Charts, Maury
discovered boxes with hundreds of ships’ logs, largely forgotten.
Big data project: Bring the data together 
• Using the log data, Maury and his team built maps to indicate wind, 
temperature, currents 
– They extracted, transformed and aggregated this huge volume of data 
– By hand! 
• Mariners would be able to predict conditions on various routes at
different times of the year
• His theory was that this would help navigation 
• Maury published his Winds and Currents charts to be widely available
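The extract, transform and aggregate work described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names (`lat`, `lon`, `month`, `wind_knots`), the 5-degree grid and the sample observations are hypothetical and do not reproduce Maury's actual method or data.

```python
import math
from collections import defaultdict
from statistics import mean


def grid_cell(lat, lon, size=5):
    """Snap a position to the south-west corner of a size-degree grid cell."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def build_chart(observations):
    """Average wind speed per (grid cell, month) across many ships' logs."""
    cells = defaultdict(list)
    for obs in observations:
        key = (grid_cell(obs["lat"], obs["lon"]), obs["month"])
        cells[key].append(obs["wind_knots"])
    return {key: mean(speeds) for key, speeds in cells.items()}


# Hypothetical observations pulled from three different log books.
observations = [
    {"lat": -12.3, "lon": -33.1, "month": 3, "wind_knots": 10},
    {"lat": -13.9, "lon": -31.0, "month": 3, "wind_knots": 14},
    {"lat": 36.5, "lon": -70.2, "month": 3, "wind_knots": 22},
]

chart = build_chart(observations)
for (cell, month), avg in sorted(chart.items()):
    print(cell, month, avg)
```

The first two observations fall in the same cell and month, so they are merged into one average; that merging across ships is what gives a picture no single log book contains.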
Big data project: Maury’s Wind and Currents charts 
At first, nobody was
interested in them…
Maury’s Wind and Currents charts 
Using Maury’s carefully compiled data, Captain Jackson got back one month 
early on a trip from Baltimore in the US to Rio de Janeiro in Brazil.
Maury’s Wind and Currents charts 
Now everybody wanted one of his charts. 
Here’s where the open source part comes in…
Maury’s Open Source Project: The Abstract Log 
Maury wanted better data from the ships’ captains. To get one of
Maury’s Winds and Currents charts:
• Captains first had to fill in a special template for one of their trips
• They returned the template, called the Abstract Log, to Maury and
got a chart
• Maury’s team collected new data that was better than before:
regular and systematic time series data
Data-Driven Decisions Set a World Record
• In 1853, clipper ship Flying Cloud set the record for fastest sailing
from New York City to San Francisco
• Maury’s charts played a key role in the navigator’s expert, data-driven
decisions about the route
• Surprisingly, the navigator was a woman, Eleanor Creesy
Key Lessons from Maury’s Work 
• Give to get 
– Give the Abstract Log to captains, get data collected in a careful way
• Big data consortium wins 
– Merging data gives pictures nobody else can see 
• Building open source community is valuable 
– The collective effort builds the basis for exploration and discovery 
• Lessons like today’s, just 150 years before everybody else
Where exploration is 
taking us now!
Exploration takes you to surprising places 
The really scary part is knowing the amount of 
computing power in the Apollo 11 guidance 
system… 
Buzz Aldrin steps onto Moon 
photo by Neil Armstrong, Apollo 11 
20 July 1969 
NASA photo http://1.usa.gov/1uXi53U
Computing power in familiar objects 
For comparison: SIM chip in smart card similar 
to the SIM chip in a cell phone 
Has about 0.5 kilobytes RAM 
16.0 kilobytes ROM 
Only a little less than Apollo…
Phone processor is very powerful: 
1.3 GHz, dual core, 1 GB of RAM
Much more powerful than Apollo
Computing power in familiar objects 
Arduino is a small microcontroller board with enough
power to interact with sensors in the IoT
The question is, what can you use these 
powerful, compact technologies to do?
Things may not turn out the way you predict 
Surprising use for a 
microprocessor: 
Family cat equipped with “smart 
collar” investigates neighborhood 
and reveals weak security for 
local wi-fi 
Humorous glimpse at the potential 
for IoT 
https://www.mapr.com/blog/the-internet-of-cat-toys
Who Needs Time Series Data? 
Utility providers use 
smart meters to monitor 
very short term changes 
in energy usage
Who Needs Time Series Data? 
Manufacturers who monitor 
equipment on the assembly 
line 
Manufacturers who produce 
“smart parts” that report back 
after the parts are in operation
Unmanned Ocean Robot: Wave Glider 
• Made by Liquid Robotics 
http://liquidr.com/technology/waveglider/how-it-works.html 
• Powered by wave motion 
• Onboard sensors solar powered 
• Travelled from San Francisco to 
Hawaii, Japan & Australia 
• Survived shark attack and typhoon 
• Cool
Environmental Monitoring 
• Big trend and growing 
• Companies to collect, store and analyze data 
• Example: Planet OS 
– Multi-sensor, machine data 
– Time series + spatial data 
– https://planetos.com
Smart Shirt 
• Sensors embedded in fabric 
– Measures heart rate & movement 
– Includes time stamp and geo data 
• Smart fabric uses smart phone as hub 
• Fabric also used for other industries 
• Made by Smart Sensing, part of 
Cityzen Sciences Consortium 
• Also cool. 
Feb 2014 article in gizmag 
http://www.gizmag.com/cityzen-smart-shirt-sensing- 
fabric-health-monitoring/30428/
Cityzen Data 
• Spin-off from consortium Cityzen Sciences 
• Provides a data platform for storage & analysis of sensor
data, including data from the smart shirt
• http://www.cityzendata.com 
• Presentation by Cityzen Data CTO Mathias Herberts “From
Thread to API” (Feb 2014)
https://www.youtube.com/watch?v=RV_Wgc-0yOs 
• Presentation in Silicon Valley in June 2014 
http://www.slideshare.net/Mathias-Herberts/20140611-io-tsiliconvalley
When is a NoSQL time series database useful? 
Build a NoSQL time series database when 
• Most of your scans are based on a time range 
• Data is at large scale
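To see why time-range scans matter, here is a minimal in-memory sketch of the idea behind an ordered key-value layout: keys are sorted by series and then by timestamp, so all points in a time range sit next to each other and can be read in one contiguous pass. `TinySeriesStore` and its methods are hypothetical stand-ins written for illustration; this is not the HBase or MapR-DB API, and a real deployment would also have to deal with distribution, persistence and key encoding.

```python
import bisect


class TinySeriesStore:
    """Toy ordered store: keys are (series_id, timestamp), kept sorted."""

    def __init__(self):
        self._keys = []    # sorted list of (series_id, timestamp)
        self._values = {}  # key -> measured value

    def put(self, series_id, timestamp, value):
        key = (series_id, timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, series_id, start, end):
        """Return (timestamp, value) pairs with start <= timestamp < end."""
        lo = bisect.bisect_left(self._keys, (series_id, start))
        hi = bisect.bisect_left(self._keys, (series_id, end))
        return [(k[1], self._values[k]) for k in self._keys[lo:hi]]


store = TinySeriesStore()
# Hypothetical smart meter readings, timestamps in seconds.
for ts, watts in [(30, 410.0), (0, 400.0), (20, 395.5), (10, 402.5)]:
    store.put("meter-42", ts, watts)
store.put("meter-43", 10, 120.0)

print(store.scan("meter-42", 10, 30))
```

Because the scan only needs two binary searches and one slice, its cost depends on the size of the requested range rather than on the total amount of stored data, which is the property that makes this layout attractive at large scale.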
Lesson: 
It’s scary to go to the Moon with the
computing power of a credit card!
Lesson: 
Modern computing + NoSQL methods = 
enormous potential!
Communication matters… 
Like monkeys trying to describe a Capybara… 
Seen on Twitter: https://twitter.com/rudytheelder/status/500471789042954240
Getting Past the Details 
It’s no longer acceptable for technical and non-technical teams to be 
unable to communicate 
• Data science team needs to clearly exchange ideas about project
goals, resources and planning with domain experts
• Find a new language to describe your work appropriately 
• Find the key concepts in what you do 
• Describe them in a way that makes sense to your audience
Basic idea: 
Seeing key concepts leads to 
discovery and implementation!
Time Series Databases 
by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly) 
How to store & access
time series data using a
NoSQL database (HBase
or MapR-DB)
e-books currently available 
courtesy of MapR 
http://bit.ly/1GMk9yY
Innovations in Recommendation 
by Ted Dunning and Ellen Friedman © Feb 2014 (published by O’Reilly) 
A New Look at Anomaly Detection 
by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly) 
Basic idea:
“Eyes open”: Present
“Eyes closed”: Future
Flexibility is a key aspect of NoSQL
How would you like to be able to… 
• Query multiple data types including JSON or Parquet with SQL? 
• Use directory name as a table name when you query so you don’t have to
know in advance the files you’re going for?
• Use standard SQL query on Hadoop or NoSQL, with low-latency? 
• Go schema-less? (shocking!)
• Reduce the distance to your data? 
• This is where Apache Drill comes in… 
Apache Drill 
• Low latency SQL query engine for Apache Hadoop and NoSQL 
• Extremely flexible: 
– 1st and only distributed SQL query engine that does not require a schema
– Uses wide range of data types including nested, JSON, Parquet 
• Convenient: 
– Uses familiar ANSI SQL commands 
– Lets you continue to use standard BI tools 
• Open source community: 
– Approaching graduation from the Apache Incubator
Real SQL instead of “SQL-like” 
• It may be surprising to boast about this at a NoSQL conference, but flexibility
is important: find solutions, don’t be bound by one tool
• Sample TPC-H SQL benchmark query that Drill can run “as is”: 
Schema-less distributed SQL engine 
• Save weeks or months 
– would have been spent on defining schema, ETL and maintaining 
schema 
• Drill automatically understands the structure of data 
• Simply point Drill at data and run queries 
– Works on a file, a directory, an HBase or MapR-DB table, etc.
Query complex, semi-structured data “as is” 
• No need to flatten or transform data prior to query execution 
• Intuitive extensions to SQL to work with nested data 
• Here is a simple query on a JSON file:
Apache Drill 
• Open source, open opportunities 
• What would you use Drill to do? 
• Best use case will be featured in upcoming book on Drill 
Looking Forward: 
Apache Drill SQL on NoSQL!
Big Impact on Society! 
What if you needed to uniquely
identify every person in India?
All 1.2 billion of them?
1.2 B PEOPLE
Largest Biometric Database in the World
The Aadhaar Project: 
• Unique 12-digit number for each person in India
• Proof of identity and address, authenticated anytime, anywhere 
• Runs on NoSQL database MapR-DB
A Day in the Life of the Aadhaar Project 
Data platform must handle: 
• 1 million new enrollments /day 
– After 4 years, ~ 600 million of the 1.2 billion already enrolled 
– 4+ PB of raw data 
• Each new enrollment needs de-duplication 
– 100s of millions of transactions over billions of records doing 100s of trillions of
biometric matches/day
• Online sub-second authentications 
– as many as 100 million per day 
From Pramod Varma, Chief Architect of UIDAI at Strata / Hadoop World NYC Oct 2014 
http://strataconf.com/stratany2014/public/schedule/detail/36305 
Official website of Unique Identification Authority of India (UIDAI) 
http://uidai.gov.in
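To get a feel for these daily figures, it can help to convert them to average per-second rates. The short calculation below uses only the numbers quoted on this slide; the helper name `per_second` is hypothetical, and the results are plain averages that say nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(events_per_day):
    """Average rate per second for a given daily volume."""
    return events_per_day / SECONDS_PER_DAY


enrollments_per_day = 1_000_000        # new enrollments per day (from the slide)
authentications_per_day = 100_000_000  # upper figure quoted on the slide
population = 1_200_000_000
already_enrolled = 600_000_000

print(f"enrollments/s:     {per_second(enrollments_per_day):.1f}")
print(f"authentications/s: {per_second(authentications_per_day):.0f}")

# Days needed to enroll everyone else at the quoted daily rate.
days_remaining = (population - already_enrolled) / enrollments_per_day
print(f"days to enroll the rest: {days_remaining:.0f}")
```

Averaged over a full day, 100 million authentications works out to more than a thousand per second, each of which is expected to return in under a second.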
What does Aadhaar mean for India? 
• Better delivery of welfare services 
• More open society 
– Identification without regard to caste, creed, religion or geography
• Reduction in embezzlement – save billions in government funds 
• NoSQL is changing society for the better
Basic idea:
“Eyes open”: Implementation
“Eyes closed”: Vision
Exploration takes you to surprising places 
Buzz Aldrin steps onto Moon 
photo by Neil Armstrong, Apollo 11 
20 July 1969 
NASA photo http://1.usa.gov/1uXi53U
India’s Space Program: Mission to Mars 
• India’s ISRO reaches Mars orbit on 1st try
• US NASA & India’s ISRO look forward 
to collaboration (while @MarsOrbiter 
chats with @MarsCuriosity) 
• Also cool
India’s Women Engineers at ISRO 
• ISRO and NASA have many women 
engineers 
• Very cool
European Space Agency: Rosetta Mission to Comet 
• Mission took 10 years, 8 mo, 19 days 
• Philae lander touched down on comet 
on 12 November 2014 
• Outrageously cool! 
What do I predict for the NoSQL future?
What future do you want to build?
Thank you!

More Related Content

Similar to Ellen Friedman - Keynote NoSQL matters Barcelona 2014

Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...
Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...
Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...John Sing
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
An Insight to the World of Wearable Computing
An Insight to the World of Wearable ComputingAn Insight to the World of Wearable Computing
An Insight to the World of Wearable ComputingFAIZAL T H
 
From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...Kezhan SHI
 
Lessons learnt from DIY innovation: The story of Public Lab
Lessons learnt from DIY innovation: The story of Public LabLessons learnt from DIY innovation: The story of Public Lab
Lessons learnt from DIY innovation: The story of Public LabCindy Regalado
 
Mirko Lorenz - Data Journalism Event
Mirko Lorenz - Data Journalism EventMirko Lorenz - Data Journalism Event
Mirko Lorenz - Data Journalism EventDan Davies
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
The Planets
The PlanetsThe Planets
The Planetsdeloris1
 
Our solar system
Our solar systemOur solar system
Our solar systemdeloris1
 
From Archives to Climate Science
From Archives to Climate ScienceFrom Archives to Climate Science
From Archives to Climate Sciencelifeofdata
 
Knowledge Management and Open Data for Innovation
Knowledge Management and Open Data for InnovationKnowledge Management and Open Data for Innovation
Knowledge Management and Open Data for InnovationJeanne Holm
 

Similar to Ellen Friedman - Keynote NoSQL matters Barcelona 2014 (16)

Spark
SparkSpark
Spark
 
Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...
Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...
Cloud_Big_Data_Analytics_Mobile_Social_modern_internet_scale_business_models_...
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
An Insight to the World of Wearable Computing
An Insight to the World of Wearable ComputingAn Insight to the World of Wearable Computing
An Insight to the World of Wearable Computing
 
From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...From data and information to knowledge : the web of tomorrow - Serge abitboul...
From data and information to knowledge : the web of tomorrow - Serge abitboul...
 
Lessons learnt from DIY innovation: The story of Public Lab
Lessons learnt from DIY innovation: The story of Public LabLessons learnt from DIY innovation: The story of Public Lab
Lessons learnt from DIY innovation: The story of Public Lab
 
Mirko Lorenz - Data Journalism Event
Mirko Lorenz - Data Journalism EventMirko Lorenz - Data Journalism Event
Mirko Lorenz - Data Journalism Event
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
MOA 2015, Keynote - Open All The Things
MOA 2015, Keynote - Open All The ThingsMOA 2015, Keynote - Open All The Things
MOA 2015, Keynote - Open All The Things
 
A New Horizon: Space Apps in Canada 2019-11-21
A New Horizon: Space Apps in Canada 2019-11-21A New Horizon: Space Apps in Canada 2019-11-21
A New Horizon: Space Apps in Canada 2019-11-21
 
The Planets
The PlanetsThe Planets
The Planets
 
Our solar system
Our solar systemOur solar system
Our solar system
 
FDL 2018 Virtual Briefing 1
FDL 2018 Virtual Briefing 1FDL 2018 Virtual Briefing 1
FDL 2018 Virtual Briefing 1
 
From Archives to Climate Science
From Archives to Climate ScienceFrom Archives to Climate Science
From Archives to Climate Science
 
Research Data Management and Spatial Data
Research Data Management and Spatial DataResearch Data Management and Spatial Data
Research Data Management and Spatial Data
 
Knowledge Management and Open Data for Innovation
Knowledge Management and Open Data for InnovationKnowledge Management and Open Data for Innovation
Knowledge Management and Open Data for Innovation
 

More from NoSQLmatters

Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...NoSQLmatters
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...NoSQLmatters
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015NoSQLmatters
 
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...NoSQLmatters
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015NoSQLmatters
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...NoSQLmatters
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...NoSQLmatters
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015NoSQLmatters
 
Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...NoSQLmatters
 
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...NoSQLmatters
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...NoSQLmatters
 
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015
Bruno Guedes - Hadoop real time for dummies - NoSQL matters Paris 2015NoSQLmatters
 
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...
DuyHai DOAN - Real time analytics with Cassandra and Spark - NoSQL matters Pa...NoSQLmatters
 
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...
Benjamin Guinebertière - Microsoft Azure: Document DB and other noSQL databas...NoSQLmatters
 
David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...David Pilato - Advance search for your legacy application - NoSQL matters Par...
David Pilato - Advance search for your legacy application - NoSQL matters Par...NoSQLmatters
 
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015
Tugdual Grall - From SQL to NoSQL in less than 40 min - NoSQL matters Paris 2015NoSQLmatters
 
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015Gregorry Letribot - Druid at Criteo - NoSQL matters 2015
Gregorry Letribot - Druid at Criteo - NoSQL matters 2015NoSQLmatters
 
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...
Michael Hackstein - Polyglot Persistence & Multi-Model NoSQL Databases - NoSQ...NoSQLmatters
 
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015
Rob Harrop- Key Note The God, the Bad and the Ugly - NoSQL matters Paris 2015NoSQLmatters
 

More from NoSQLmatters (20)

Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
Nathan Ford- Divination of the Defects (Graph-Based Defect Prediction through...
 
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
Stefan Hochdörfer - The NoSQL Store everyone ignores: PostgreSQL - NoSQL matt...
 
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
Adrian Colyer - Keynote: NoSQL matters - NoSQL matters Dublin 2015
 
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
Peter Bakas - Zero to Insights - Real time analytics with Kafka, C*, and Spar...
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
Mark Harwood - Building Entity Centric Indexes - NoSQL matters Dublin 2015
 
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
Prassnitha Sampath - Real Time Big Data Analytics with Kafka, Storm & HBase -...
 
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top...
 
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
Michael Hackstein - NoSQL meets Microservices - NoSQL matters Dublin 2015
 
Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...Chris Ward - Understanding databases for distributed docker applications - No...
Chris Ward - Understanding databases for distributed docker applications - No...
 
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
Philipp Krenn - Host your database in the cloud, they said... - NoSQL matters...
 
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
Lucian Precup - Back to the Future: SQL 92 for Elasticsearch? - NoSQL matters...
 

Ellen Friedman - Keynote NoSQL matters Barcelona 2014

  • 1. © 2014 Ellen Friedman 1 Seeing With Your Eyes Closed Ellen Friedman No SQL Matters Barcelona 22 November 2014
  • 2. Contact Information: Ellen Friedman, Solutions Consultant and Commentator; Apache Mahout committer, Apache Drill contributor. Email: ellenf@apache.org, efriedman@maprtech.com. Twitter: @Ellen_Friedman, @ApacheDrill. Hashtag today: #NoSQL14
  • 3. Thinking With Your Eyes Closed: When some people think… they close their eyes in order to “see”.
  • 4. Getting Past the Details • Look at your data with an open mind • Listen to what data tells you • Find the key concepts in what you do • Give yourself an opportunity for discovery
  • 5. NoSQL • Founded on discovery • Solution-driven • Don’t be bound by the tool • Flexibility is important • How do you keep your ability for invention?
  • 6. Basic idea: “Eyes open”: Details. “Eyes closed”: Discovery.
  • 7. Imagination, technology and careful reasoning. Think where this may take you.
  • 8. Things don’t always turn out the way you predict… With exploration into new frontiers, you may meet your goal in surprising ways. Spanish explorers came to the Americas in search of riches. They were looking for gold and silver. They found cochineal: red dye worth a fortune. (A Perfect Red, by Amy Butler Greenfield)
  • 9. Big Data and Open Source in the 19th Century. Here’s a story with the power of vision (eyes closed thinking) plus keen observation and attention to detail (eyes open thinking). It’s got: • Adventure on the high seas • Time series data (a hot topic in the NoSQL world today) • Clever community building for open source participation • World speed record • (but no pirates)
  • 10. Here’s the story!
  • 11. Matthew Fontaine Maury was a sailor in the 1830s. After an injury left him unfit for sea duty, the US Navy gave him a “desk job”. Oddly, that’s where the real adventure starts.
  • 12. Time Series Data – An Old Idea. Ship captains kept log books with various comments plus measurements recorded at specific times. Image: captain’s log book entry for the Steam Ship Bear, 1884 trip to the Arctic, from an image digitized by www.oldweather.org and provided via www.naval-history.net; image modified by Ellen Friedman and Ted Dunning.
  • 13. Time Series Data – An Old Idea. The basis of a time series is the repeated measurement of parameters over time, together with the times at which the measurements were made.
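
The definition on slide 13 can be made concrete with a small sketch. This is only an illustration: the `LogEntry` type, the `air_temp_f` parameter name and the sample readings are invented here and do not come from the Steam Ship Bear log.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    """One measurement together with the time it was taken."""
    taken_at: datetime   # when the measurement was made
    parameter: str       # what was measured (hypothetical name)
    value: float         # the reading itself


# Invented sample readings, in the spirit of a ship's log book.
log = [
    LogEntry(datetime(1884, 6, 1, 12, 0), "air_temp_f", 44.5),
    LogEntry(datetime(1884, 6, 1, 8, 0), "air_temp_f", 41.0),
    LogEntry(datetime(1884, 6, 1, 16, 0), "air_temp_f", 43.0),
]


def series(entries, parameter):
    """Return (time, value) pairs for one parameter, ordered by time."""
    return sorted((e.taken_at, e.value) for e in entries if e.parameter == parameter)


air_temp = series(log, "air_temp_f")
```

The point of the sketch is that the timestamp is part of the data: without `taken_at`, the three readings could not be put back in order.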
  • 14. Time Series Data – An Old Idea. At his desk job in the U.S. Navy Office of Charts, Maury discovered boxes with hundreds of ships’ logs, largely forgotten.
  • 15. Big data project: Bring the data together • Using the log data, Maury and his team built maps to indicate wind, temperature, currents – They extracted, transformed and aggregated this huge volume of data – By hand! • Mariners would be able to predict conditions on various routes at different times of the year • His theory was that this would help navigation • Maury published his Winds and Currents charts to be widely available
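
As a rough modern analogue of the extract, transform and aggregate work described on slide 15, the sketch below groups observations by map cell and month and averages them. The cell names, months and wind speeds are invented for illustration, and this is not a reconstruction of how Maury's team actually compiled the charts.

```python
from collections import defaultdict
from statistics import mean

# (map_cell, month, wind_speed_knots): invented values standing in for
# readings extracted from many different ships' logs.
observations = [
    ("cell_A", 1, 12.0),
    ("cell_A", 1, 16.0),
    ("cell_A", 7, 6.0),
    ("cell_B", 1, 20.0),
]


def average_by_cell_and_month(rows):
    """Aggregate raw readings into one average per (cell, month)."""
    grouped = defaultdict(list)
    for cell, month, speed in rows:
        grouped[(cell, month)].append(speed)
    return {key: mean(values) for key, values in grouped.items()}


chart = average_by_cell_and_month(observations)
```

A mariner planning a January passage through `cell_A` would look up `chart[("cell_A", 1)]`, which combines readings no single log book contained.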
  • 16. Big data project: Maury’s Wind and Currents charts. At first, nobody was interested in them…
  • 17. Maury’s Wind and Currents charts. Using Maury’s carefully compiled data, Captain Jackson got back one month early on a trip from Baltimore in the US to Rio de Janeiro in Brazil.
  • 18. Maury’s Wind and Currents charts. Now everybody wanted one of his charts. Here’s where the open source part comes in…
  • 19. Maury’s Open Source Project: The Abstract Log. Maury wanted better data from the ships’ captains. To get one of Maury’s Winds and Currents charts: • Captains first had to fill in a special template for one of their trips • They returned the template, called the Abstract Log, to Maury and got a chart • Maury’s team collected new data that was better than before: regular and systematic time series data
  • 20. Data-Driven Decisions Set a World Record • In 1853, clipper ship Flying Cloud set the record for fastest sailing from New York City to San Francisco • Maury’s charts played a key role in the navigator’s expert, data-driven decisions about the route • Surprisingly, the navigator was a woman, Eleanor Creesy
  • 21. Key Lessons from Maury’s Work • Give to get – Give the Abstract Log to captains, get data collected in a careful way • Big data consortium wins – Merging data gives pictures nobody else can see • Building open source community is valuable – The collective effort builds the basis for exploration and discovery • Lessons like today’s, just 150 years before everybody else
  • 22. Where exploration is taking us now
  • 23. Exploration takes you to surprising places. The really scary part is knowing the amount of computing power in the Apollo 11 guidance system… Photo: Buzz Aldrin steps onto the Moon, by Neil Armstrong, Apollo 11, 20 July 1969 (NASA photo, http://1.usa.gov/1uXi53U)
  • 24. Computing power in familiar objects. For comparison: the SIM chip in a smart card, similar to the SIM chip in a cell phone, has about 0.5 kilobytes of RAM and 16.0 kilobytes of ROM. Only a little less than Apollo…
  • 25. Computing power in familiar objects. The SIM chip in a smart card, similar to the SIM chip in a cell phone, has about 0.5 kilobytes of RAM and 16.0 kilobytes of ROM. The phone processor is very powerful: 1.3 GHz, dual core, 1 GB of RAM. Much more powerful than Apollo.
  • 26. Computing power in familiar objects. Arduino is a small microcontroller board with enough power to interact with sensors in the IoT. The question is, what can you use these powerful, compact technologies to do?
  • 27. Things may not turn out the way you predict. Surprising use for a microprocessor: a family cat equipped with a “smart collar” investigates the neighborhood and reveals weak security for local wi-fi. A humorous glimpse at the potential for IoT: https://www.mapr.com/blog/the-internet-of-cat-toys
  • 28. Who Needs Time Series Data? Utility providers use smart meters to monitor very short-term changes in energy usage.
  • 29. Who Needs Time Series Data? Manufacturers who monitor equipment on the assembly line. Manufacturers who produce “smart parts” that report back after the parts are in operation.
  • 30. Unmanned Ocean Robot: Wave Glider • Made by Liquid Robotics http://liquidr.com/technology/waveglider/how-it-works.html • Powered by wave motion • Onboard sensors solar powered • Travelled from San Francisco to Hawaii, Japan & Australia • Survived shark attack and typhoon • Cool
  • 31. Environmental Monitoring • Big trend and growing • Companies to collect, store and analyze data • Example: Planet OS – Multi-sensor, machine data – Time series + spatial data – https://planetos.com
  • 32. Smart Shirt • Sensors embedded in fabric – Measures heart rate & movement – Includes time stamp and geo data • Smart fabric uses smart phone as hub • Fabric also used for other industries • Made by Smart Sensing, part of Cityzen Sciences Consortium • Also cool. Feb 2014 article in gizmag: http://www.gizmag.com/cityzen-smart-shirt-sensing-fabric-health-monitoring/30428/
  • 33. Cityzen Data • Spin-off from consortium Cityzen Sciences • Provides data platform for storage & analysis of sensor data, including the smart shirt • http://www.cityzendata.com • Presentation by Cityzen Data CTO Mathias Herberts, “From Thread to API” (Feb 2014) https://www.youtube.com/watch?v=RV_Wgc-0yOs • Presentation in Silicon Valley in June 2014 http://www.slideshare.net/Mathias-Herberts/20140611-io-tsiliconvalley
  • 34. When is a NoSQL time series database useful? Build a NoSQL time series database when • Most of your scans are based on a time range • Data is at large scale
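
To see why time-range scans favor this kind of store, here is a minimal sketch of one common row-key idea: put the series identifier and a zero-padded timestamp in the key, so that rows for one series sort in time order and a time range becomes a contiguous key range. The `SortedStore` class is an in-memory stand-in written for this illustration; it is not the HBase or MapR-DB API, and the `meter42` identifiers and values are invented.

```python
import bisect


def row_key(series_id: str, epoch_seconds: int) -> str:
    """Build a key whose lexicographic order matches time order within a series."""
    # Zero-padding keeps string order equal to numeric order.
    return f"{series_id}#{epoch_seconds:010d}"


class SortedStore:
    """Tiny in-memory stand-in for a store that keeps rows sorted by key."""

    def __init__(self):
        self._keys = []      # kept sorted at all times
        self._values = {}

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start_key, stop_key):
        """Return rows with start_key <= key < stop_key, in key order."""
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def scan_time_range(store, series_id, start, stop):
    """Read one series for start <= t < stop as a single contiguous scan."""
    rows = store.scan(row_key(series_id, start), row_key(series_id, stop))
    return [value for _, value in rows]


store = SortedStore()
for t, v in [(100, 1.0), (160, 1.5), (220, 2.0)]:
    store.put(row_key("meter42", t), v)
store.put(row_key("meter43", 160), 9.9)   # a different series

readings = scan_time_range(store, "meter42", 100, 200)
```

Because the keys are sorted, the scan touches only the rows inside the requested window and never reads the other series.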
  • 35. [Image-only slide]
  • 36. Lesson: It’s scary to go to the Moon with the computing power of a credit card!
  • 37. Lesson: Modern computing + NoSQL methods = enormous potential!
  • 38. Communication matters…
  • 39. Like monkeys trying to describe a Capybara… Seen on Twitter: https://twitter.com/rudytheelder/status/500471789042954240
  • 40. Getting Past the Details. It’s no longer acceptable for technical and non-technical teams to be unable to communicate • Data science team needs to clearly exchange ideas about project goals, resources and planning with domain experts • Find a new language to describe your work appropriately • Find the key concepts in what you do • Describe them in a way that makes sense to your audience
  • 41. Basic idea: Seeing key concepts leads to discovery and implementation
  • 42. Time Series Databases by Ted Dunning and Ellen Friedman © Oct 2014 (published by O’Reilly). How to store & access time series data using a NoSQL database (HBase or MapR-DB). E-books currently available courtesy of MapR: http://bit.ly/1GMk9yY
  • 43. Innovations in Recommendation by Ted Dunning and Ellen Friedman © Feb 2014 (published by O’Reilly)
  • 44. A New Look at Anomaly Detection by Ted Dunning and Ellen Friedman © June 2014 (published by O’Reilly)
  • 45. Basic idea: “Eyes open”: Present. “Eyes closed”: Future.
  • 46. Flexibility is a key aspect of NoSQL
  • 47. How would you like to be able to… • Query multiple data types including JSON or Parquet with SQL? • Use directory name as a table name when you query so you don’t have to know in advance the files you’re going for? • Use standard SQL query on Hadoop or NoSQL, with low latency? • Go schema-less? (shocking!) • Reduce the distance to your data? • This is where Apache Drill comes in…
  • 48. Apache Drill • Low latency SQL query engine for Apache Hadoop and NoSQL • Extremely flexible: – 1st and only distributed SQL query engine that does not require schema – Uses wide range of data types including nested, JSON, Parquet • Convenient: – Uses familiar ANSI SQL commands – Lets you continue to use standard BI tools • Open source community: – Approaching graduation
  • 49. Real SQL instead of “SQL-like” • May be surprising to boast at a NoSQL conference, but flexibility is important – find solutions, not bound by one tool • Sample TPC-H SQL benchmark query that Drill can run “as is” (the query appears on the slide and is not captured in this transcript)
  • 50. Schema-less distributed SQL engine • Save weeks or months – would have been spent on defining schema, ETL and maintaining schema • Drill automatically understands the structure of data • Simply point Drill at data and run queries – Works on a file, directory, HBase or MapR-DB table, etc.
  • 51. Query complex, semi-structured data “as is” • No need to flatten or transform data prior to query execution • Intuitive extensions to SQL to work with nested data • Here is a simple query on a JSON file (the query appears on the slide and is not captured in this transcript)
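
As a rough illustration of querying nested JSON “as is”, the sketch below pairs a Drill-style SQL string with a plain Python loop that returns what such a query would be expected to select. Everything here is an assumption made for illustration: the file path, the field names and the records are invented, the SQL string is not the query from the slide, and the Python code does not call Drill at all.

```python
import json

# Drill-style query over a JSON file, reaching into a nested field with
# dotted names. The path and field names are hypothetical.
DRILL_STYLE_QUERY = """
SELECT t.name, t.address.city
FROM dfs.`/data/customers.json` t
WHERE t.address.country = 'ES'
"""

# Invented records: one JSON document per line, with nested and optional
# fields and no schema declared anywhere.
RAW_JSON_LINES = """
{"name": "Ana", "address": {"city": "Barcelona", "country": "ES"}}
{"name": "Ben", "address": {"city": "Austin", "country": "US"}, "tags": ["new"]}
{"name": "Cat"}
"""


def query_as_is(lines):
    """Plain-Python equivalent of the query: no flattening step, no schema."""
    results = []
    for line in lines.strip().splitlines():
        record = json.loads(line)
        address = record.get("address", {})   # tolerate a missing nested object
        if address.get("country") == "ES":
            results.append((record["name"], address.get("city")))
    return results


rows = query_as_is(RAW_JSON_LINES)
```

The design point is that the records differ in shape (one has `tags`, one has no `address`), yet nothing had to be declared or transformed before reading them.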
  • 52. Apache Drill • Open source, open opportunities • What would you use Drill to do? • Best use case will be featured in upcoming book on Drill
  • 53. Looking Forward: Apache Drill, SQL on NoSQL
  • 54. Big Impact on Society
  • 55. What if you needed to uniquely identify every person in India? All 1.2 billion of them?
  • 56. 1.2 B people: Largest Biometric Database in the World. The Aadhaar Project: • Unique 12-digit number for each person in India • Proof of identity and address, authenticated anytime, anywhere • Runs on NoSQL database MapR-DB
  • 57. A Day in the Life of the Aadhaar Project. Data platform must handle: • 1 million new enrollments/day – After 4 years, ~600 million of the 1.2 billion already enrolled – 4+ PB of raw data • Each new enrollment needs de-duplication – 100s of millions of transactions over billions of records doing 100s of trillions of biometric matches/day • Online sub-second authentications – as many as 100 million per day. From Pramod Varma, Chief Architect of UIDAI, at Strata / Hadoop World NYC Oct 2014 http://strataconf.com/stratany2014/public/schedule/detail/36305 Official website of Unique Identification Authority of India (UIDAI) http://uidai.gov.in
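
A back-of-the-envelope check shows how the figures on slide 57 fit together. The sketch assumes, purely for illustration, that every new enrollment is compared once against every record already enrolled; the real de-duplication pipeline is not described in the slides and may work quite differently.

```python
# Figures quoted on the slide.
POPULATION = 1_200_000_000
ENROLLED = 600_000_000
NEW_PER_DAY = 1_000_000

# Days left to enroll everyone at the quoted daily rate.
days_remaining = (POPULATION - ENROLLED) // NEW_PER_DAY

# Assumption: one comparison per (new enrollment, existing record) pair.
naive_matches_per_day = NEW_PER_DAY * ENROLLED

print(days_remaining)            # 600
print(naive_matches_per_day)     # 600000000000000
```

Under that assumption the daily comparison count is 6 × 10^14, that is, 600 trillion, which is consistent with the slide's “100s of trillions of biometric matches/day”.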
  • 58. What does Aadhaar mean for India? • Better delivery of welfare services • More open society – Identification without regard to caste, creed, religion or geography • Reduction in embezzlement – save billions in government funds • NoSQL is changing society for the better
  • 59. Basic idea: “Eyes open”: Implementation. “Eyes closed”: Vision.
  • 60. Exploration takes you to surprising places. Photo: Buzz Aldrin steps onto the Moon, by Neil Armstrong, Apollo 11, 20 July 1969 (NASA photo, http://1.usa.gov/1uXi53U)
  • 61. India’s Space Program: Mission to Mars • India’s ISRO gets Mars orbit on 1st try • US NASA & India’s ISRO look forward to collaboration (while @MarsOrbiter chats with @MarsCuriosity) • Also cool
  • 62. India’s Women Engineers at ISRO • ISRO and NASA have many women engineers • Very cool
  • 63. European Space Agency: Rosetta Mission to Comet • Mission took 10 years, 8 mo, 19 days • Philae lander touched down on comet on 12 November 2014 • Outrageously cool!
  • 64. What do I predict for the NoSQL future?
  • 65. What future do you want to build?
  • 66. Contact Information: Ellen Friedman, Solutions Consultant and Commentator; Apache Mahout committer, Apache Drill contributor. Email: ellenf@apache.org, efriedman@maprtech.com. Twitter: @Ellen_Friedman, @ApacheDrill. Hashtag today: #NoSQL14
  • 67. Thank you!