NASSCOM Future Skills Training
Course – Data Science & Analytics
Dhruv Saxena
Assistant Professor (TEQIP-NPIU)
Introduction to Data Science
OBJECTIVES
The objective of this course is to impart the necessary knowledge of the
mathematical foundations needed for data science and to develop the
programming skills required to build data science applications.
Duration – 60 Hours (40L + 20C)
LEARNING OUTCOMES
At the end of this course, the students will be able to:
● Demonstrate understanding of the mathematical foundations
needed for data science.
● Collect, explore, clean, munge and manipulate data.
● Implement models such as k-nearest Neighbors, Naïve Bayes,
linear and logistic regression, decision trees, neural networks and
clustering.
● Build data science applications using Python-based toolkits.
Data, Big Data and Challenges
Data Science
◦ Introduction
◦ Why Data Science
Data Scientists
◦ What do they do?
Major/Concentration in Data Science
◦ What courses to take.
Data All Around
Lots of data is being collected and warehoused
◦ Web data, e-commerce
◦ Financial transactions, bank/credit transactions
◦ Online trading and purchasing
◦ Social networks
How Much Data Do We Have?
Google processes 20 PB a day (2008)
Facebook has 60 TB of daily logs
eBay has 6.5 PB of user data + 50 TB/day (5/2009)
1000 genomes project: 200 TB
Cost of 1 TB of disk: $35
Time to read a 1 TB disk: about 3 hrs (at 100 MB/s)
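The read-time figure above can be checked with a little arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a purely sequential scan; the function name `hours_to_read` is only illustrative.

```python
# Back-of-the-envelope check of the disk read time quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6
PB = 10**15


def hours_to_read(size_bytes, throughput_bytes_per_s):
    """Hours needed to scan size_bytes sequentially at the given rate."""
    return size_bytes / throughput_bytes_per_s / 3600


one_tb_hours = hours_to_read(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_tb_hours:.2f} hours")  # 2.78 hours

# The same rate applied to the 20 PB/day figure shows why one disk is not enough.
twenty_pb_days = hours_to_read(20 * PB, 100 * MB) / 24
print(f"20 PB at 100 MB/s: {twenty_pb_days:.0f} days")
```

The result, roughly 2.8 hours, matches the "3 hrs" on the slide and suggests why data at this scale is usually read in parallel across many disks.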
Big Data
Big Data is any data that is expensive to manage and hard to extract value from
◦ Volume
  ◦ The size of the data
◦ Velocity
  ◦ The latency of data processing relative to the growing demand for interactivity
◦ Variety and Complexity
  ◦ The diversity of sources, formats, quality, and structures
Big Data vs Data Science vs Data Analytics
What is Data Science?
Dealing with unstructured and structured data, Data Science is a
field that comprises everything related to data cleansing,
preparation, and analysis.
Data Science is the combination of statistics, mathematics,
programming, problem-solving, capturing data in ingenious ways,
the ability to look at things differently, and the activity of cleansing,
preparing, and aligning the data.
In simple terms, it is the umbrella of techniques used when trying
to extract insights and information from data.
What is Big Data?
Big Data refers to huge volumes of data that cannot be processed effectively with
traditional applications. The processing of Big Data begins with raw data
that isn’t aggregated and that is most often impossible to store in the memory of a single
computer.
A buzzword used to describe immense volumes of data, both unstructured and
structured, Big Data inundates a business on a day-to-day basis. Big Data can be
analyzed for insights that lead to better decisions and strategic business
moves.
Gartner defines Big Data as follows: “Big data is high-volume, high-velocity
and/or high-variety information assets that demand cost-effective, innovative forms of
information processing that enable enhanced insight, decision making, and process
automation.”
What is Data Analytics?
Data Analytics is the science of examining raw data to draw conclusions about that
information.
Data Analytics involves applying an algorithmic or mechanical process to
derive insights, for example running through several data sets to look for
meaningful correlations between them.
It is used in several industries to allow organizations and companies to
make better decisions as well as to verify or disprove existing theories or
models. The focus of Data Analytics lies in inference, which is the process of
deriving conclusions that are solely based on what the researcher already
knows.
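As an illustration of "looking for meaningful correlations" between data sets, here is a minimal sketch that computes the Pearson correlation coefficient with the standard library only. The two series (`ad_spend`, `sales`) are made-up numbers for illustration and are not from the course material.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: weekly advertising spend and weekly sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 52]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value close to +1 or -1 indicates a strong linear relationship; it does not by itself show that one variable causes the other.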
Types of Data We Have
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Graph Data
◦ Social Network, Semantic Web (RDF), …
Streaming Data
◦ You can afford to scan the data only once
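The single-scan constraint on streaming data can be sketched with a one-pass statistic. The example below uses Welford's online algorithm to maintain a running mean and sample variance without storing the stream; the class name `RunningStats` and the sample readings are illustrative assumptions.

```python
class RunningStats:
    """Running mean and sample variance computed in one pass (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Consume one value from the stream; the value itself is not stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of the values seen so far (0.0 for fewer than two)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:  # hypothetical sensor readings
    stats.update(reading)

print(stats.n, stats.mean, round(stats.variance, 4))
```

Memory use stays constant no matter how long the stream is, which is the property a scan-once setting needs.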
What To Do With These Data?
Aggregation and Statistics
◦ Data warehousing and OLAP
Indexing, Searching, and Querying
◦ Keyword-based search
◦ Pattern matching (XML/RDF)
Knowledge discovery
◦ Data Mining
◦ Statistical Modeling
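The "Indexing, Searching, and Querying" item above can be illustrated with a toy inverted index, one common structure behind keyword-based search. This is a minimal sketch: the documents, the whitespace tokenisation, and the function names `build_index` and `search` are assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical mini-corpus.
docs = {
    1: "big data analytics",
    2: "data science",
    3: "big science",
}
index = build_index(docs)
print(search(index, "big data"))  # {1}
```

The index is built once and then answers each query by intersecting small sets instead of rescanning every document.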
Big Data and Data Science
“… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist
The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts
by 2018 (McKinsey Global Institute, June 2011).
India will need 160,000+ data scientists by 2020, and world demand is
predicted to be around 2.7 million by 2020.
New Data Science institutes being created or repurposed – NYU, Columbia, Washington,
UCB,...
New degree programs, courses, boot-camps:
◦ e.g., at Berkeley: Stats, I-School, CS, Astronomy…
◦ One proposal (elsewhere) for an MS in “Big Data Science”
What is Data Science?
An area that manages, manipulates, extracts, and interprets knowledge from
tremendous amounts of data.
Data science (DS) is a multidisciplinary field of study whose goal is to address the challenges
in big data.
Data science principles apply to all data – big and small.
Simply – extraction of knowledge from large volumes of data that are structured or
unstructured.
What is Data Science?
Theories and techniques from many fields and disciplines are used to
investigate and analyze a large amount of data to help decision makers in
many industries such as science, engineering, economics, politics, finance,
and education.
◦ Computer Science
  ◦ Pattern recognition, visualization, data warehousing, high-performance computing, databases, AI
◦ Mathematics
  ◦ Mathematical modeling
◦ Statistics
  ◦ Statistical and stochastic modeling, probability
Why is it sexy?
Gartner’s 2014 Hype Cycle
Real Life Examples
Companies learn your secrets, shopping patterns, and preferences
◦ For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study)
Data science and elections (2008, 2012)
◦ 1 million people installed the Obama Facebook app that gave access to info on “friends”
Applications of Data Science
Internet Search
Search engines make use of data science algorithms to deliver the best results for search queries
in a fraction of a second.
Digital Advertisements
The entire digital marketing spectrum uses data science algorithms, from display banners to
digital billboards. This is the main reason digital ads get a higher CTR than traditional
advertisements.
Recommender Systems
Recommender systems not only make it easy to find relevant products among the billions of
products available but also add a lot to the user experience. Many companies use such systems to
promote their products and suggestions in accordance with the user’s demands and the relevance of
information. The recommendations are based on the user’s previous search results.
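To make the idea of recommending from previous activity concrete, here is a minimal co-occurrence sketch: items are scored by how often they appear in the histories of users who overlap with the current user. It is one simple approach among many, not a description of any particular company's system; the histories and the function name `recommend` are hypothetical.

```python
from collections import Counter


def recommend(history, all_histories, top_n=3):
    """Suggest unseen items from users whose histories overlap with `history`."""
    seen = set(history)
    scores = Counter()
    for other in all_histories:
        other_items = set(other)
        overlap = len(seen & other_items)
        if overlap == 0:
            continue  # this user shares nothing with the current user
        for item in other_items - seen:
            scores[item] += overlap
    # Sort by score (highest first), then by name so ties are deterministic.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical search/purchase histories of other users.
histories = [
    ["laptop", "mouse", "keyboard"],
    ["laptop", "mouse"],
    ["phone", "case"],
]
print(recommend(["laptop"], histories, top_n=2))  # ['mouse', 'keyboard']
```

Production recommenders typically add ratings, recency, and learned models on top of this kind of signal.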
Big Data for Retail
Whether a brick-and-mortar store or an online e-tailer, the answer to staying in the
game and being competitive is understanding the customer better
in order to serve them. This requires the ability to analyze all the disparate
data sources that companies deal with every day, including
weblogs, customer transaction data, social media, store-branded
credit card data, and loyalty program data.
Applications of Big Data
Big Data for Financial Services
Credit card companies, retail banks, private wealth management
advisories, insurance firms, venture funds, and institutional investment
banks use big data for their financial services. The common problem
among them all is the massive amount of multi-structured data living
in multiple disparate systems, a problem that big data technologies can
address. Big data is used in several ways, such as:
◦ Customer analytics
◦ Compliance analytics
◦ Fraud analytics
◦ Operational analytics
Big Data in Communications
Gaining new subscribers, retaining customers, and
expanding within current subscriber bases are top
priorities for telecommunication service providers. The
solutions to these challenges lie in the ability to combine
and analyze the masses of customer-generated data and
machine-generated data that is being created every day.
Applications of Data Analytics
Healthcare
The main challenge for hospitals, as cost pressures tighten, is to treat as many patients
as they can efficiently while also improving the quality of care. Instrument
and machine data are increasingly being used to track and optimize patient flow,
treatment, and equipment use in hospitals. It is estimated that a 1%
efficiency gain could yield more than $63 billion in global healthcare savings.
Travel
Data analytics can optimize the buying experience through mobile/weblog and social
media data analysis. Travel sites can gain insights into the customer’s desires and
preferences. Products can be up-sold by correlating current sales with subsequent
browsing, increasing browse-to-buy conversions via customized packages and offers.
Personalized travel recommendations can also be delivered by data analytics based on
social media data.
Gaming
Data Analytics helps in collecting data to optimize spend within as well as
across games. Game companies gain insight into the likes, dislikes, and
relationships of their users.
Energy Management
Most firms are using data analytics for energy management, including smart-grid
management, energy optimization, energy distribution, and building
automation in utility companies. The application here is centered on
controlling and monitoring network devices, dispatching crews, and managing
service outages. Utilities gain the ability to integrate millions of data
points on network performance, which lets engineers use the analytics to
monitor the network.
Data Scientists
Data Scientist
◦ The Sexiest Job of the 21st Century
“They find stories, extract knowledge. They are not reporters.”
Data Scientists
Data scientists are the key to realizing the opportunities presented by big data. They bring
structure to it, find compelling patterns in it, and advise executives on the implications for
products, processes, and decisions.
What do Data Scientists do?
National Security
Cyber Security
Business Analytics
Engineering
Healthcare
And more…
Concentration in Data Science
Mathematics and Applied Mathematics
Applied Statistics/Data Analysis
Solid Programming Skills (R, Python, Julia, SQL)
Data Mining
Database Storage and Management
Machine Learning and discovery
Machine Learning
What is Machine Learning?
Machine learning (ML) is the study of computer algorithms
that improve automatically through experience.
It is seen as a subset of artificial intelligence.
Machine learning algorithms build a mathematical model
based on sample data, known as "training data", in order to
make predictions or decisions without being explicitly
programmed to do so.
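A small sketch can show what "building a model from training data" looks like in practice. The example uses k-nearest neighbours, one of the models named in the learning outcomes, written with the standard library only. The labelled points are invented for illustration, and `knn_predict` is a hypothetical helper name.

```python
from collections import Counter
from math import dist


def knn_predict(training_data, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest points."""
    neighbours = sorted(training_data, key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (feature vector, label) pairs.
training_data = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"),
    ((5.2, 4.9), "B"),
    ((4.8, 5.1), "B"),
]

print(knn_predict(training_data, (1.1, 1.0)))  # A
print(knn_predict(training_data, (5.0, 5.1)))  # B
```

No rule for separating "A" from "B" is written by hand; the prediction comes entirely from the labelled examples, which is the sense of "without being explicitly programmed".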
What is Machine Learning?
Machine learning algorithms are used in a wide variety of
applications, such as email filtering and computer vision,
where it is difficult or infeasible to develop conventional
algorithms to perform the needed tasks.
Machine learning is closely related to computational
statistics, which focuses on making predictions using
computers.
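Email filtering is a case where hand-written rules are hard to maintain but a statistical model is easy to train. The sketch below is a minimal Naïve Bayes text classifier with add-one (Laplace) smoothing; the four training messages and the function names `train` and `classify` are assumptions for illustration, and a real filter would need far more data.

```python
from collections import Counter
from math import log


def train(messages):
    """Count words per label from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab |= set(counts)
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for `text`."""
    word_counts, label_counts, vocab = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = log(label_counts[label] / total_messages)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training messages.
messages = [
    ("win money now", "spam"),
    ("free money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train(messages)
print(classify("free money", model))     # spam
print(classify("lunch at noon", model))  # ham
```

The word statistics, not a fixed list of banned phrases, decide the outcome, so the filter adapts when it is retrained on new messages.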
Real-time applications
Video
NASSCOM Formative Assessments (Mid-training)
● Formative assessment of students shall be conducted for 100 marks and the test duration shall be
between 45-60 min.
● Post-training assessment and certification shall be conducted after the successful completion of
training.
● Only those students who are registered for and attending training on Future Skills shall be eligible for
mid-training and post-training assessment.
● All assessments shall be conducted online and auto-proctored through NASSCOM SSC.
● The assessment results shall be shared within 3 working days with the SPOC of the institute.
● Formative Assessment scores are independent and shall not be counted in the final assessment
scores for certification.
Tentative Date – 16th August 2020
NASSCOM Formative Assessment
Syllabus for Data Sci. & Analytics
Module | No. of Questions | Type of Questions | Indicative Time/Module | Marks
Introduction to Data Science | 2 | MCQ & DC | 2 min | 6
Mathematical Foundations | 18 | MCQ, DC & ScB | 20 min | 44
Multiple Choice Questions (MCQ)
In this type of question, the candidate is asked to choose one or more
responses from a limited list of choices. It also includes True/False (T/F)
questions, depending on the level of difficulty.
Scenario-based (ScB)
This type of question asks the candidate to describe how they might respond
to a hypothetical situation.
Direct Concept (DC)
This type of question revolves around the concept that a particular subject
deals with. The candidate is asked a direct question pertaining
to the concept of that particular subject. This can be an MCQ, Fill in
the Blank, or Multiple Response question.
Next Lecture
Mathematical Foundations
Introduction & Syllabus
Linear Algebra – Vectors & Matrices
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 

Recently uploaded (20)

MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 

Introduction to Data Science and Analytics

  • 1. NASSCOM Future Skills Training Course – Data Science & Analytics. Dhruv Saxena, Assistant Professor (TEQIP-NPIU)
  • 8. Introduction to Data Science. Mr. Dhruv Saxena, Assistant Professor (TEQIP-NPIU)
  • 10. OBJECTIVES: The objective of this course is to impart the necessary knowledge of the mathematical foundations needed for data science and to develop the programming skills required to build data science applications. Duration: 60 hours (40L + 20C).
  • 11. LEARNING OUTCOMES: At the end of this course, the students will be able to: ● Demonstrate understanding of the mathematical foundations needed for data science. ● Collect, explore, clean, munge, and manipulate data. ● Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering. ● Build data science applications using Python-based toolkits.
  • 12. Data, Big Data and Challenges. Data Science: ◦ Introduction ◦ Why Data Science. Data Scientists: ◦ What do they do? Major/Concentration in Data Science: ◦ What courses to take.
  • 13. Data All Around: Lots of data is being collected and warehoused. ◦ Web data, e-commerce ◦ Financial transactions, bank/credit transactions ◦ Online trading and purchasing ◦ Social networks
  • 14. How Much Data Do We Have? Google processes 20 PB a day (2008). Facebook has 60 TB of daily logs. eBay has 6.5 PB of user data + 50 TB/day (5/2009). 1000 Genomes Project: 200 TB. Cost of 1 TB of disk: $35. Time to read a 1 TB disk: 3 hrs (at 100 MB/s).
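The read-time figure on this slide can be checked with simple arithmetic. The sketch below is only a back-of-the-envelope illustration: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a steady sequential throughput, and the helper name `read_time_hours` is made up for this example.

```python
# Back-of-the-envelope check of the slide's disk figures.
# Assumption: decimal units and a constant sequential read rate.
MB = 10**6   # bytes in one megabyte (decimal convention)
TB = 10**12  # bytes in one terabyte (decimal convention)
PB = 10**15  # bytes in one petabyte (decimal convention)


def read_time_hours(size_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to read size_bytes at the given throughput."""
    return size_bytes / throughput_bytes_per_s / 3600


# One 1 TB disk read at 100 MB/s: roughly the "3 hrs" quoted on the slide.
one_disk = read_time_hours(1 * TB, 100 * MB)
print(f"1 TB at 100 MB/s: {one_disk:.2f} hours")

# The same single disk reading 20 PB (one day of Google's 2008 volume).
google_day = read_time_hours(20 * PB, 100 * MB)
print(f"20 PB at 100 MB/s: {google_day / 24:.0f} days")
```

A single disk would need years to read what was processed in one day, which is why data at this scale is spread across many machines and read in parallel.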
  • 15. Big Data: Big Data is any data that is expensive to manage and hard to extract value from. ◦ Volume: the size of the data ◦ Velocity: the latency of data processing relative to the growing demand for interactivity ◦ Variety and Complexity: the diversity of sources, formats, quality, and structures
  • 16. Big Data vs Data Science vs Data Analytics
  • 17. What is Data Science? Dealing with both unstructured and structured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis. Data Science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing, and aligning the data. In simple terms, it is the umbrella of techniques used when trying to extract insights and information from data.
  • 18. What is Big Data? Big Data refers to humongous volumes of data that cannot be processed effectively with existing traditional applications. The processing of Big Data begins with raw data that is not aggregated and is most often impossible to store in the memory of a single computer. A buzzword used to describe immense volumes of data, both unstructured and structured, Big Data inundates a business on a day-to-day basis. Big Data can be analyzed for insights that lead to better decisions and strategic business moves. The definition of Big Data given by Gartner is: “Big data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.”
  • 20. Big Data
  • 21. What is Data Analytics? Data Analytics is the science of examining raw data to draw conclusions about that information. Data Analytics involves applying an algorithmic or mechanical process to derive insights, for example, running through several data sets to look for meaningful correlations between them. It is used in several industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The focus of Data Analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows.
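As a small illustration of "looking for meaningful correlations" between two data sets, the sketch below computes the Pearson correlation coefficient in plain Python. The function name `pearson` and the advertising and sales figures are invented for this example; they are not from the course material.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: monthly advertising spend and sales, same units.
ad_spend = [10, 20, 30, 40, 50]
sales = [12, 24, 33, 46, 55]

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A value close to 1 or -1 indicates a strong linear relationship and a value near 0 a weak one. A high correlation alone does not show that one variable causes the other.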
  • 22. Types of Data We Have: Relational data (tables/transaction/legacy data). Text data (Web). Semi-structured data (XML). Graph data: social networks, Semantic Web (RDF), … Streaming data: you can afford to scan the data only once.
  • 23. What To Do With These Data? Aggregation and statistics: ◦ Data warehousing and OLAP. Indexing, searching, and querying: ◦ Keyword-based search ◦ Pattern matching (XML/RDF). Knowledge discovery: ◦ Data mining ◦ Statistical modeling.
  • 24. Big Data and Data Science: “… the sexy job in the next 10 years will be statisticians,” Hal Varian, Google Chief Economist. The U.S. will need 140,000-190,000 predictive analysts and 1.5 million managers/analysts by 2018 (McKinsey Global Institute, June 2011). India will need 160,000+ data scientists by 2020, and world demand is predicted to be around 2.7 million by 2020. New Data Science institutes are being created or repurposed: NYU, Columbia, Washington, UCB, ... New degree programs, courses, and boot camps: ◦ e.g., at Berkeley: Stats, I-School, CS, Astronomy… ◦ One proposal (elsewhere) for an MS in “Big Data Science”.
  • 25. What is Data Science? An area that manages, manipulates, extracts, and interprets knowledge from a tremendous amount of data. Data science (DS) is a multidisciplinary field of study with the goal of addressing the challenges in big data. Data science principles apply to all data, big and small. Simply put: the extraction of knowledge from large volumes of data that are structured or unstructured.
  • 26. What is Data Science? Theories and techniques from many fields and disciplines are used to investigate and analyze a large amount of data to help decision makers in many areas such as science, engineering, economics, politics, finance, and education. ◦ Computer Science: pattern recognition, visualization, data warehousing, high-performance computing, databases, AI ◦ Mathematics: mathematical modeling ◦ Statistics: statistical and stochastic modeling, probability
  • 27. Why is it sexy? Gartner’s 2014 Hype Cycle
  • 28. Data Science
  • 30. Real Life Examples: Companies learn your secrets, shopping patterns, and preferences. ◦ For example, can we know if a woman is pregnant, even if she doesn’t want us to know? (Target case study.) Data Science and elections (2008, 2012): ◦ 1 million people installed the Obama Facebook app that gave access to info on “friends”.
  • 31. Applications of Data Science. Internet Search: Search engines make use of data science algorithms to deliver the best results for search queries in a fraction of a second. Digital Advertisements: The entire digital marketing spectrum uses data science algorithms, from display banners to digital billboards. This is the main reason digital ads get a higher CTR than traditional advertisements. Recommender Systems: Recommender systems not only make it easy to find relevant products among the billions of products available but also add a lot to the user experience. Many companies use such systems to promote their products and suggestions in accordance with the user’s demands and the relevance of information. The recommendations are based on the user’s previous search results.
  • 32. Big Data for Retail: Whether a brick-and-mortar store or an online e-tailer, the answer to staying in the game and being competitive is understanding the customer better in order to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data.
  • 33. Applications of Big Data. Big Data for Financial Services: Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The common problem among them all is the massive amounts of multi-structured data living in multiple disparate systems, which can be solved by big data. Thus big data is used in several ways, such as: customer analytics, compliance analytics, fraud analytics, and operational analytics.
  • 34. Big Data in Communications: Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer-generated and machine-generated data that are being created every day.
  • 35. Applications of Data Analytics. Healthcare: The main challenge for hospitals, as cost pressures tighten, is to treat as many patients as they can efficiently while also improving the quality of care. Instrument and machine data are increasingly being used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global healthcare savings. Travel: Data analytics can optimize the buying experience through mobile/weblog and social media data analysis. Travel sites can gain insights into the customer’s desires and preferences. Products can be up-sold by correlating current sales with subsequent browsing, increasing browse-to-buy conversions via customized packages and offers. Personalized travel recommendations can also be delivered by data analytics based on social media data.
  • 36. Gaming Data Analytics helps in collecting data to optimize and spend within as well as across games. Game companies gain insight into the dislikes, the relationships, and the likes of the users. Energy Management Most firms are using data analytics for energy management, including smart- grid management, energy optimization, energy distribution, and building automation in utility companies. The application here is centered on the controlling and monitoring of network devices, dispatch crews, and manage service outages. Utilities are given the ability to integrate millions of data points in the network performance and lets the engineers use the analytics to monitor the network. 36
  • 37. Data Scientists: Data Scientist ◦ The Sexiest Job of the 21st Century. “They find stories, extract knowledge. They are not reporters.”
  • 38. Data Scientists: Data scientists are the key to realizing the opportunities presented by big data. They bring structure to it, find compelling patterns in it, and advise executives on the implications for products, processes, and decisions.
  • 39. What do Data Scientists do? National security, cyber security, business analytics, engineering, healthcare, and more.
  • 40. Concentration in Data Science: Mathematics and applied mathematics. Applied statistics/data analysis. Solid programming skills (R, Python, Julia, SQL). Data mining. Database storage and management. Machine learning and discovery.
  • 41. Machine Learning
  • 42. What is Machine Learning? Machine learning (ML) is the study of computer algorithms that improve automatically through experience. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as "training data", in order to make predictions or decisions without being explicitly programmed to do so.
  • 43. What is Machine Learning? Machine learning algorithms are used in a wide variety of applications, such as email filtering and computer vision, where it is difficult or infeasible to develop conventional algorithms to perform the needed tasks. Machine learning is closely related to computational statistics, which focuses on making predictions using computers.
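To make "predictions from training data without being explicitly programmed" concrete, here is a minimal sketch of a k-nearest neighbors classifier, one of the models named in the learning outcomes, applied to a toy email-filtering task. Everything in it is an assumption made for illustration: the function name `knn_predict`, the two made-up features (number of links, number of exclamation marks), and the six labelled examples. It uses `math.dist`, which needs Python 3.8 or later.

```python
from collections import Counter
from math import dist  # Euclidean distance, available from Python 3.8


def knn_predict(training_data, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort the labelled examples by distance to the query and keep the k closest.
    neighbours = sorted(training_data, key=lambda item: dist(item[0], query))[:k]
    # Count the labels of those neighbours and return the most frequent one.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (links, exclamation marks) -> label.
training_data = [
    ((0, 1), "ham"), ((1, 0), "ham"), ((1, 2), "ham"),
    ((7, 9), "spam"), ((8, 7), "spam"), ((9, 8), "spam"),
]

print(knn_predict(training_data, (8, 8)))
print(knn_predict(training_data, (1, 1)))
```

No rule such as "more than five links means spam" is written anywhere in the code. The decision comes entirely from the labelled examples, so changing the training data changes the predictions.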
  • 50. NASSCOM Formative Assessments (Mid-training): A formative assessment of students shall be conducted for 100 marks, and the test duration shall be 45-60 min. Post-training assessment and certification shall be conducted after the successful completion of training. Only those students who are registered for and attending the training on Future Skills shall be eligible for mid-training and post-training assessment. All assessments shall be conducted online and auto-proctored through NASSCOM SSC. The assessment results shall be shared with the SPOC of the institute within 3 working days. Formative assessment scores are independent and shall not be counted in the final assessment scores for certification. Tentative date: 16th August 2020.
  • 51. NASSCOM Formative Assessment Syllabus for Data Sci. & Analytics
      Module                        | No. of Questions | Type of Questions | Indicative Time/Module | Marks
      Introduction to Data Science  | 2                | MCQ & DC          | 2 min                  | 6
      Mathematical Foundations      | 18               | MCQ, DC & ScB     | 20 min                 | 44
  • 52. Multiple Choice Questions (MCQ): In this type of question, the candidate is asked to choose one or more responses from a limited list of choices. It also includes True/False (T/F) questions, depending on the level of difficulty. Scenario Based (ScB): This question asks the candidate to describe how they might respond to a hypothetical situation. Direct Concept (DC): This type of question revolves around the concept that a particular subject deals with. The candidate is asked a direct question pertaining to the concept of that subject. It can be an MCQ, a fill-in-the-blank, or a multiple-response question.
  • 53. Next Lecture: Mathematical Foundations. Introduction & Syllabus. Linear Algebra – Vectors & Matrices.
  • 54. Mr. Dhruv Saxena, Asst. Professor (TEQIP-NPIU)