1. Big Data
A really ‘Big’ deal or just another
hammer looking for a nail?
-Keshav Deshpande
Software Developer
kdeshpande1@verizon.net
2. A little bit of theory - The V’s of Big Data
•Volume: data scaled at terabyte/petabyte levels
•Variety: structured, unstructured, and hybrid data formats
•Velocity: data generated at internet speeds (tera to exa range)
Often, Veracity (the reliability of the data) is added to this list
Implications for IT Solution Architectures
•Current computing paradigm: Data layer/Middleware/UI layer (n-tier architectures)
• fetch data from the Data Layer
• ship data to the Middleware for processing (or to the UI layer for display)
• ship data back to the Data Layer for storage
At ‘Big Data’ scale, this approach simply does not perform or scale!
What is so ‘big’ about Big Data?
3. If you can’t go to the mountain, let the mountain come to you!
Proposed ‘solution’
Ship the processing to where the data is located, instead of shipping the data to where the processing is located
Process smaller chunks of data in parallel, then combine the results
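The "process chunks in parallel, then combine" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-count task, the sample records, and the function names are all hypothetical, and local threads stand in for the remote nodes that would each hold a chunk of the data.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Split a list of records into consecutive chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """Per-chunk step: runs next to the chunk and returns a small partial result."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def combine(partials):
    """Combine step: merge the small partial results into one answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def parallel_word_count(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: threads stand in for the nodes where each chunk is stored.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))
    return combine(partials)


# Hypothetical sample data, purely for illustration.
SAMPLE_RECORDS = [
    "big data is big",
    "data in motion",
    "data at rest",
]
```

Only the small partial counts travel back to the caller; the raw records never leave the worker that processed them, which is the point of shipping the processing to the data.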
OK, so with this scheme, we are assured of ‘scale’ and even
‘performance’ – so what do I do with it?
Remember the hammer and nail?
It seems we have ourselves a hammer,
so let’s look for the ‘nails’…
4. Besides storing/retrieving/processing data at scale:
• parallel and distributed nature, necessitated by the 3 (or 4) V’s
• high level of concurrency in storing, retrieving, or processing
• high level of asynchrony
• non-blocking, fire-and-forget
• call, and then notify when the “answer” is ready
• However, data is still ‘raw’
• Needs to be retrieved (mined) and processed (analyzed) to get at ‘Information’ or ‘Actionable Intelligence’
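The "call and then notify when the answer is ready" style can be illustrated with Python's standard asyncio module. This is a minimal sketch, not a prescribed design: slow_query, the request names, and the delays are hypothetical stand-ins for real storage, retrieval, or processing calls.

```python
import asyncio


async def slow_query(name, delay):
    """Hypothetical long-running request; the sleep stands in for remote work."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def main():
    answers = []

    def notify(task):
        # Invoked by the event loop once this task's "answer" is ready.
        answers.append(task.result())

    # Fire all requests without waiting on each one in turn (non-blocking).
    tasks = []
    for name, delay in [("store", 0.02), ("retrieve", 0.01), ("process", 0.03)]:
        task = asyncio.create_task(slow_query(name, delay))
        task.add_done_callback(notify)
        tasks.append(task)

    # The caller could do other work here. The gather below exists only so
    # that this demo waits for the notifications before it exits.
    await asyncio.gather(*tasks)
    return answers
```

Running asyncio.run(main()) returns the three answers in whatever order the requests completed, not the order in which they were issued.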
Big Data Characteristics
5. Information is:
•not just confined to relationships between data entities (as in a vanilla RDBMS)
• both data and associated metadata are information
• increasingly expressed as graphs (sparse or dense): entity relations are still important, but they are now multi-dimensional
• very rich: data (and metadata) include
• data entities (vertices)
• inter-relationships (links and edges)
• degrees of separation between vertices, links, and edges
•RDBMS-like design approaches fall short, under-perform, and do not scale
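To make the graph view concrete, here is a small sketch that stores entities as vertices, relationships as edges, and computes the degrees of separation between two vertices with a breadth-first search. The sample names and edges are hypothetical, and an in-memory dictionary is assumed only for illustration; a real deployment would keep the graph in a distributed store.

```python
from collections import deque

# Hypothetical sample graph: vertices are entities, edges are relationships.
EDGES = [("ann", "bob"), ("bob", "cara"), ("cara", "dev"), ("ann", "eve")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (vertex, vertex) edges."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def degrees_of_separation(adjacency, start, goal):
    """Return the number of edges on a shortest path, or None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        vertex, distance = queue.popleft()
        for neighbour in adjacency[vertex]:
            if neighbour == goal:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None
```

In a relational design the same question would need one self-join per hop, which is one way to see why RDBMS-like approaches struggle as the number of hops grows.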
The real Big Data challenges, then, are -
6. What is involved?
•Retrieving data from large, distributed data stores: mining of data for nuggets of information
•Analysis of data, but at internet scale, to provide actionable intelligence
• Analytics processing required to wring intelligence out of raw data
•Information Visualization: present the analysis to the user
• Dashboards/UI Composites
All of the above, but in real-time (or near real-time)
Big Data Processing
7. An emerging trend – data in constant motion
• Conventionally, data is at rest. Implication: data is stale instantly
• any analysis of at-rest data is after-the-fact, or post-mortem, if you will…
• Data in motion implies as-it-happens, event-based, very loosely-coupled, asynchronous, non-blocking processing
• Analytics and BI at the point of streaming: real-time, complex event processing
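Analytics "at the point of streaming" can be sketched as a computation that looks at each event once, as it arrives, and keeps only a small window of recent state. The example below flags spikes against a sliding-window average. The window size, the threshold factor, and the class and function names are assumptions chosen for illustration, not part of any particular streaming product.

```python
from collections import deque


class SlidingWindowAverage:
    """Running average over the most recent `size` values."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # When the window is full, the oldest value is about to be evicted.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


def detect_spikes(events, size=3, factor=2.0):
    """Yield each event that exceeds `factor` times the preceding window average."""
    window = SlidingWindowAverage(size)
    baseline = None
    for value in events:
        if baseline is not None and value > factor * baseline:
            yield value
        baseline = window.update(value)
```

Because detect_spikes is a generator, it can consume an unbounded stream and emit alerts as events happen, rather than waiting for the data to come to rest.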
Big Data Processing
8. By no means an exhaustive listing:
•Business Intelligence: derive insights for better decision-making
•Insights: a crystal ball into possible future states
• predictive and prescriptive analytics
•Automating development of such insight, developing algorithms
• machine learning
Outcomes
•Predictive Analytics from both historical and real-time data
•Automated (and perpetual) Machine Learning
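One way to picture prediction from both historical and real-time data is a model that is first fitted on stored history and then keeps updating with every new observation. The sketch below uses a simple one-variable least-squares line maintained from running sums. The class name and the sample numbers are hypothetical, and a real system would likely use a far richer model.

```python
class OnlineLinearModel:
    """Least-squares line y = slope * x + intercept, updated one point at a time."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def learn(self, x, y):
        # Only running sums are kept, so memory use does not grow with the stream.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_xy += x * y

    def predict(self, x):
        if self.n == 0:
            raise ValueError("model has not seen any data yet")
        denominator = self.n * self.sum_xx - self.sum_x ** 2
        if denominator == 0:
            # All x values identical so far: fall back to the mean of y.
            return self.sum_y / self.n
        slope = (self.n * self.sum_xy - self.sum_x * self.sum_y) / denominator
        intercept = (self.sum_y - slope * self.sum_x) / self.n
        return slope * x + intercept


# Hypothetical data: historical records first, then a live observation.
HISTORICAL = [(0, 1), (1, 3), (2, 5)]
LIVE = [(3, 7)]
```

Feeding HISTORICAL and then LIVE through learn() uses the same code path for both, so the model never stops learning once the stream starts.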
Applications of Big Data