That wonderful new phone in your pocket: so shiny, so useful, so ubiquitous, so invasive. These tools have allowed new industries to form by removing entire layers of service and by building new services around previously unorganized communities of people (Uber, Airbnb, Lyft). Wearable devices are proliferating, sending our biometric data to third parties for analysis. Home security cameras record what happens inside your home 24/7 and alert you to any irregularities. All of these technologies create an immense amount of data: network addresses, locations, things clicked, web searches, people liked, photos shared. But what does all this mean for our society and our personal lives moving forward? Our speaker, John Tomizuka, will discuss the impacts and trends of what is happening in this brave and scary new world.
6. DEFINITIONS
• Big Data: unstructured data; you don’t yet know what the questions are
• Business Intelligence: structured data; you know what questions you want answered
• Statistics: structured data, not real time, no action taken as a result
• Machine Learning: creating algorithms and applying them to data sets in an attempt to learn from the data
• Predictive Analytics: extracting information from existing data to predict trends
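To make the last two definitions concrete, here is a minimal sketch of one of the simplest forms of predictive analytics: learn a straight-line trend from past observations with ordinary least squares, then extrapolate it. The function names and the monthly figures are hypothetical, chosen only for illustration; real predictive systems use far richer models.

```python
# Minimal predictive-analytics sketch: fit a straight-line trend to past
# observations with ordinary least squares, then extrapolate it.
# The data used below is hypothetical and exists only for illustration.

def fit_linear_trend(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    if var == 0:
        raise ValueError("x values must not all be equal")
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical monthly counts (e.g. sign-ups) for months 1..5.
    months = [1, 2, 3, 4, 5]
    counts = [10, 12, 14, 16, 18]
    m, b = fit_linear_trend(months, counts)
    print(predict(m, b, 6))  # → 20.0
```

The "learning" here is just estimating two parameters from data; the "prediction" is applying them to a point the data never contained.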
7. WHY NOW?
• 2003: Doug Cutting & Mike Cafarella, Nutch
• 2004: Google publishes MapReduce
• 2006: Doug Cutting moves to Yahoo and creates Hadoop
• 2008: Hadoop, developed largely at Yahoo, becomes a top-level Apache Software Foundation project
• 2009: Matei Zaharia starts Spark at UC Berkeley
• 2013: Spark moves to the Apache Software Foundation
8. MAPREDUCE
[Diagram: traditional/sequential processing compared with the Map and Reduce phases]
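The slide contrasts traditional, sequential processing with the Map and Reduce phases. The sketch below shows that data flow on word counting, the usual introductory example for MapReduce. It is a single-process illustration under stated assumptions: the function names are hypothetical, and this is not Hadoop's or Spark's API. In a real cluster, the map calls and the reduce calls would each run in parallel on many machines.

```python
from collections import defaultdict
from itertools import chain

# Single-process sketch of the MapReduce model applied to word counting.
# Frameworks such as Hadoop distribute each phase across many machines;
# here each phase is an ordinary function so the data flow is easy to follow.


def map_phase(document: str) -> list[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce_word_count(documents: list[str]) -> dict[str, int]:
    # Each document could be mapped on a different machine in parallel.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    # Each key could likewise be reduced independently of the others.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


def sequential_word_count(documents: list[str]) -> dict[str, int]:
    """The traditional approach: one loop over all the data on one machine."""
    counts: dict[str, int] = defaultdict(int)
    for doc in documents:
        for word in doc.split():
            counts[word.lower()] += 1
    return dict(counts)


if __name__ == "__main__":
    docs = ["big data big questions", "data never sleeps"]
    print(map_reduce_word_count(docs))
    # → {'big': 2, 'data': 2, 'questions': 1, 'never': 1, 'sleeps': 1}
```

Both functions produce the same counts; the difference is that the MapReduce version splits the work into independent pieces, which is what lets it scale out across a cluster.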
20. POLITICS
OBAMA CAMPAIGN 2012
21. SCIENCE
MONTEREY BAY AQUARIUM RESEARCH INSTITUTE
22. HEALTH
APPLE RESEARCHKIT
…Stanford says that it would normally take a national, year-long effort to get that kind of scale. The flood of data…
23. MORE READING
• http://www.domo.com/blog/2014/04/data-never-sleeps-2-0/
• http://www.redorbit.com/education/reference_library/general-2/history-of/1113190638/the-history-of-mobile-phone-technology/
• http://www.forbes.com/sites/gilpress/2013/05/09/a-very-short-history-of-big-data/
• http://www.wired.com/2015/04/robots-roam-earths-imperiled-oceans/
• http://www.allbusiness.com/what-does-your-supermarket-know-about-you-15611312-1.html
• http://www.geekwire.com/2015/baseball-analytics-mystery-mlb-team-uses-a-cray-supercomputer-to-crunch-data/
• http://www.geekwire.com/2015/this-big-data-startup-just-raised-cash-to-analyze-driver-behavior-creating-safety-scores-for-individual-motorists/
• http://www.newyorker.com/culture/culture-desk/the-horror-of-amazons-new-dash-button
• https://www.amazon.com/oc/dash-button
• http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal
• http://www.businessinsider.com/big-data-is-growing-thanks-to-mobile-2013-1
• http://venturebeat.com/2015/04/03/how-microsofts-using-big-data-to-predict-traffic-jams-up-to-an-hour-in-advance/
• http://www.engadget.com/2015/04/13/ibm-watson-health-cloud/
This is the ultimate Big Data scenario. It’s bigger than big data. When the NSA was building its data center in Utah, reports referred to yottabyte-scale storage. Calculations at the time suggested that a storage array of that size would cost trillions of dollars.
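The "trillions" figure follows from simple unit arithmetic. Using decimal (SI) prefixes, a yottabyte is a trillion terabytes, so the raw storage cost scales directly with whatever price per terabyte you assume. Here $c$ is an assumed price, not a figure from the talk:

```latex
\begin{aligned}
1\ \text{YB} &= 10^{24}\ \text{bytes} = 10^{12}\ \text{TB} \\
\text{Cost} &\approx c \times 10^{12}\ \text{dollars}, \quad c = \text{price per TB in dollars}
\end{aligned}
```

Even at an optimistic $c = 1$ dollar per terabyte, the drives alone would come to about one trillion dollars; any realistic price multiplies that.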
Each of these is a case of a data breach in which customer data was stolen. In most cases, the stolen data was things like credit card numbers and addresses. With breaches like Blue Cross, the scenario starts to darken. This is only the beginning: the more of what represents who you are goes online, the greater the risk of having that identity stolen.
For starters, Bolding notes that 95 percent of baseball stats have been created over the last five years thanks to the growing amount of data sensors and innovative methods of analyzing players.
“They are gathering so much data that a single person with an Excel spreadsheet can no longer analyze, in a sophisticated way, all the data they have,” Bolding said. “They need bigger and bigger computers to be able to analyze the data.”
As popularized by Michael Lewis’ Moneyball and the subsequent movie, using baseball data to drive decisions about player personnel, and ultimately to win more games, was a strategy first used successfully by the Oakland Athletics in the early 2000s.
The intent of Media Optimizer was to enable much more targeted ad purchases. Before Media Optimizer, TV ad buys were based on broad demographics, which was both costly and inefficient. With Media Optimizer in place, the campaign could use statistical analysis to identify the target voters in the DNC database. Next, the voter data was enriched with demographic data from TV ratings and with advertisement pricing data. Finally, the results were fed back into Vertica and reanalyzed for further tuning.
With the overall picture combining likely voters for Obama, the shows they watch, and the prices of the ads, along with the analysis feedback loop, it was much easier to determine the most efficient ad buys. One result was that the Obama campaign purchased twice as many cable TV advertisements as the Romney campaign, many of them during niche programs aimed at the precise demographic slices the campaign was trying to reach.
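The core comparison behind "most efficient ad buys" can be sketched as cost per target voter reached: a cheap niche show watched mostly by target voters can beat an expensive prime-time slot. This is a hypothetical illustration, not the campaign's or Media Optimizer's actual method; the `AdSlot` fields, the greedy budget rule, and every show name and number are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical sketch: rank TV ad slots by cost per targeted viewer.
# All show names and numbers are invented for illustration only.


@dataclass
class AdSlot:
    show: str
    price: float         # cost of one spot, in dollars (assumed)
    viewers: int         # total audience, e.g. from ratings data (assumed)
    target_share: float  # fraction of viewers who are target voters (assumed)

    @property
    def target_viewers(self) -> float:
        return self.viewers * self.target_share

    @property
    def cost_per_target_viewer(self) -> float:
        if self.target_viewers == 0:
            return float("inf")  # reaches no target voters at all
        return self.price / self.target_viewers


def rank_slots(slots: list[AdSlot]) -> list[AdSlot]:
    """Order slots so the cheapest way to reach a target voter comes first."""
    return sorted(slots, key=lambda s: s.cost_per_target_viewer)


def plan_buys(slots: list[AdSlot], budget: float) -> list[str]:
    """Greedily buy the most efficient slots that still fit in the budget."""
    plan, spent = [], 0.0
    for slot in rank_slots(slots):
        if spent + slot.price <= budget:
            plan.append(slot.show)
            spent += slot.price
    return plan


if __name__ == "__main__":
    slots = [
        AdSlot("Prime-time drama", 100_000, 5_000_000, 0.02),  # ~$1.00 per target
        AdSlot("Niche cable show", 5_000, 200_000, 0.10),      # ~$0.25 per target
        AdSlot("Late-night rerun", 2_000, 50_000, 0.05),       # ~$0.80 per target
    ]
    print(plan_buys(slots, budget=10_000))
    # → ['Niche cable show', 'Late-night rerun']
```

A greedy rule like this is only a rough heuristic for a budget-constrained selection problem; the feedback loop described above would repeatedly update the audience estimates and re-rank.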
MBARI has a fleet of these robots in three different kinds: autonomous machines that prowl the open oceans gathering data, allowing researchers to monitor the ocean in real time. The machines do not tire, and they cannot drown. They survive shark bites. They can roam for months on end, beaming a steady stream of data to scientists sitting safely onshore.