Read and subscribe to my blogs at siddhithakkar.com. Details of the slides below:
This document is aimed at explaining the following:
1) How big is big data?
2) Types of big data (with examples)
3) When does your data become big data? (objective parameters)
Remember to look into the notes section for more hints.
2. Key points to be discussed:
• How big is big data?
• Types of big data
• When does your data become big data?
3. How Big is Big Data?
• It isn’t just size!!
• 4 V’s determine if data is really big
4. Types of Big Data
• Structured data
  • Follows a fixed format
  • Based on relational DBs
• Unstructured data
  • Doesn’t follow a fixed format
  • Based on character and binary data
• Semi-structured data
  • Hybrid case
  • Based on XML/JSON
5. When does your data become Big Data?
Parameter         Traditional Data          Big Data
Generation rate   Per hour/day              Rapid
Structure         Structured                Semi-structured and unstructured
Data source       Centralized               Fully distributed
Data store        RDBMS                     HDFS, NoSQL
Scenarios         Repeated read and write   Write once, repeated read
Integration       Easy                      Difficult
6. Big Data Analytics
1) Descriptive analysis
• “What happened?”
2) Diagnostic analysis
• “Why did it happen?”
3) Predictive analysis
• “What will happen next?”
4) Prescriptive analysis
• “What should I do?”
7. What is a data warehouse?
• Often a target environment for ETL tools
• Electronic storage of data
• Designed for query and analysis
• Not meant for transactional processing
• Makes data mining possible
• Core of BI systems
8. Popular Data terms
• Data science is the umbrella term
• Data analytics is examining data to deliver business insights
• Data mining is looking for patterns in data
9. What is data science?
• Umbrella term
• Includes:
• Data munging
• Data exploration
• Data representation
Most of us are aware that big data refers to a huge amount of data, so huge that our traditional data management tools can’t store or process it.
But it is very important to understand from the beginning that the complexity of big data isn’t just about its size. Several other factors make big data the thing it is today.
These factors or characteristics are called the 4 V’s of Big Data.
First is the volume of data, and volume isn’t just about the current size of your data but also the rate at which it is growing.
Second is the variety of data, i.e. how many heterogeneous sources or distributed systems you are collecting this data from.
Third is velocity, the rate at which data is coming in. As an example, billions of transactional records could be generated in a single second.
And lastly, veracity, which refers to the trustworthiness of data. If your data comes from a source you don’t trust, the business insights delivered from it won’t be of much value either.
So, if you want to term your data as big, make sure that it scores fairly well on these four characteristics.
Next are the types of big data, which can be classified into three basic categories:
Structured data is data that follows a pre-defined schema and fits perfectly into our relational databases. That kind of data is the simplest to manage and, of course, the easiest to analyze. You could think of an Excel sheet about the employees of a company as an example of structured data.
Unstructured data, as the name implies, doesn’t follow a fixed format, nor does it fit into our mainstream relational databases. As an example, all the character and binary data generated by our social media tools (text, pictures, Word and PDF files) is extremely unstructured. Managing and analyzing such data is a big challenge.
Semi-structured data is a mix of both types. It carries some organizational structure but doesn’t conform to a rigid relational schema, so it is still harder to analyze than completely structured data. You could think of XML files generated by some software solutions as an example of semi-structured data.
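The distinction between structured and semi-structured data can be sketched with a few lines of Python. The records and field names below are hypothetical, made up purely for illustration: structured rows all share one schema, while semi-structured JSON records share only some fields.

```python
import json

# Structured: every record follows the same fixed schema, like a relational table.
employees_structured = [
    {"id": 1, "name": "Asha", "dept": "HR"},
    {"id": 2, "name": "Ravi", "dept": "IT"},
]

# Semi-structured: JSON records overlap in structure, but fields vary per record.
semi_structured = """
[
  {"id": 1, "name": "Asha", "skills": ["recruiting"]},
  {"id": 2, "name": "Ravi", "manager": "Asha", "projects": {"alpha": "lead"}}
]
"""

records = json.loads(semi_structured)
for rec in records:
    # Fields beyond id/name may or may not exist, so code must handle them defensively.
    extra_fields = sorted(set(rec) - {"id", "name"})
    print(rec["name"], "->", extra_fields)
```

Note how an analysis over the semi-structured records cannot assume a column exists, which is exactly why such data is harder to query than a relational table.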
In this slide, we are going to talk about some objective parameters that can help you identify big data.
If your data is generated at an extremely rapid rate, say billions of transactions per second, this is already a hint that you are moving in the direction of big data. Traditional data, in contrast, grows on a per-day or per-hour basis.
Secondly, big data is mostly highly unstructured, or at best semi-structured, and it is extremely difficult to analyze. Traditional data is structured, and hence our conventional data tools are able to manage it.
Such data mostly comes from complex, heterogeneous, distributed systems, whereas traditional data can be picked up from a single source.
Since big data is not in a form that fits a structured tabular format, there is a huge hue and cry about non-conventional NoSQL and Hadoop systems, whereas structured data can live inside relational databases.
- Traditional data can change at any time, for example bank account information or phone numbers, so you need multiple repeated updates there. Big data, in contrast, mostly represents events, for example a purchase in a store or a web page view, and event data by nature doesn’t change. So there is no need for multiple write statements there.
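The "repeated write" versus "write once, repeated read" contrast above can be sketched in a few lines of Python. The record and event names are hypothetical, chosen only to mirror the bank-account and page-view examples:

```python
# Traditional, mutable record: updated in place whenever the real-world value changes.
account = {"owner": "Asha", "phone": "555-0100"}
account["phone"] = "555-0199"  # repeated writes to the same record

# Big-data style event log: each event is immutable and only ever appended.
events = []

def record_event(event_type, detail):
    """Append an immutable event; existing entries are never rewritten."""
    events.append({"type": event_type, "detail": detail})

record_event("page_view", "/home")
record_event("purchase", "order-42")

# Analysis re-reads the whole log as often as needed: write once, repeated read.
purchases = [e for e in events if e["type"] == "purchase"]
```

Systems like HDFS are built around exactly this append-and-scan access pattern rather than in-place updates.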
- Because data is being collected from several heterogeneous sources, data integration is a difficult task. This is the reason so many ETL tools are doing the rounds in the market, whereas with traditional data you don’t need any such tools.
Analytics, as we know, is the process of examining data sets to discover hidden patterns or unknown correlations. The ultimate stakeholders of such a process are the business owners, because such analytics are supposed to help businesses take informed decisions. There are four different types of big data analytics.
Descriptive analytics describes what happened in a particular situation. It creates reports or simple visualizations to help you understand what happened at one particular time or over a period of time. It is the least complex type and doesn’t involve any AI or ML techniques. (Helps you understand the past.) For example: summarizing the success of a marketing campaign on social media.
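A descriptive-analytics step like the campaign summary above amounts to computing simple aggregates over past records. Here is a minimal sketch in plain Python; the engagement numbers and field names are hypothetical:

```python
from collections import Counter

# Hypothetical social-media engagement records for one marketing campaign.
posts = [
    {"platform": "twitter", "likes": 120, "shares": 30},
    {"platform": "twitter", "likes": 80, "shares": 10},
    {"platform": "facebook", "likes": 200, "shares": 55},
]

# Descriptive analytics answers "what happened?" with plain aggregates.
total_likes = sum(p["likes"] for p in posts)
posts_per_platform = Counter(p["platform"] for p in posts)
avg_shares = sum(p["shares"] for p in posts) / len(posts)

print(f"total likes: {total_likes}")          # total likes: 400
print(f"posts: {dict(posts_per_platform)}")
print(f"avg shares per post: {avg_shares:.1f}")
```

No model is trained and nothing is predicted: the output only summarizes what already happened, which is exactly the boundary between descriptive analytics and the later types.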
Diagnostic analytics answers the question of why something happened at all. It allows analysts to dive deep into data and really understand the root cause of a problem or event. Such analysis is moderately complex and can involve AI and ML techniques. (Helps you understand the past.) For example: why did a marketing campaign succeed? Was it the number of posts made, the number of followers, the number of mentions, and so on?
Predictive analytics helps you answer what can happen in the future. Such analytics is complex because it involves the use of highly advanced algorithms. (Helps you understand the future)
Prescriptive analytics helps you answer what you should do in order to achieve a desired result. Such analysis requires highly complex machine learning techniques that very few tools are able to offer. Example: your GPS device makes use of geospatial data to suggest the route you should take. (Helps you understand the future.)
A data warehouse is the electronic storage of a large amount of information by a business, designed for query and analysis rather than transaction processing. Warehousing is a process of transforming data into information and making it available to users in a timely manner so it can make a difference.
data mining: finding patterns in data
https://www.guru99.com/data-warehousing.html#13
Data science is an umbrella term that includes several data processes. You may think of data science as the process of preparing a meal.
The first step of preparing food is collecting the raw materials, then cleaning and cutting them. Similarly, in data science the first process is munging, i.e. collecting data from several sources using ETL tools and cleansing it.
Second is cooking or processing the food. Similarly, in data science you explore the data to find hidden patterns and unknown correlations.
Thirdly, you serve the food in a presentable way. Similarly, data science involves generating reports or dashboards so that business owners are able to make sense of the data.
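The three stages of the meal analogy, munging, exploration, and representation, can be sketched as one tiny pipeline. Everything here is hypothetical toy data invented for illustration: two "sources" with inconsistent formats, a cleaning step, a simple exploratory statistic, and a readable report.

```python
# Hypothetical raw inputs from two heterogeneous sources with messy formats.
source_a = ["  Alice,34 ", "Bob,29"]                 # CSV-like strings
source_b = [{"name": "carol", "age": "41"}]          # dicts with string ages

# 1) Munging: collect, clean, and normalize everything into one shape.
def munge():
    rows = []
    for line in source_a:
        name, age = line.strip().split(",")
        rows.append({"name": name.strip().title(), "age": int(age)})
    for rec in source_b:
        rows.append({"name": rec["name"].title(), "age": int(rec["age"])})
    return rows

# 2) Exploration: look for a simple pattern (here, just the average age).
def explore(rows):
    return sum(r["age"] for r in rows) / len(rows)

# 3) Representation: report the finding in a form a business owner can read.
rows = munge()
print(f"{len(rows)} people, average age {explore(rows):.1f}")
```

In a real project each stage would use heavier machinery (ETL tools for munging, statistical or ML methods for exploration, dashboards for representation), but the shape of the pipeline is the same.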