Difference in goal: machine learning lets the machine learn, while statistics gives a fact to a human so the human can make a decision. Difference in methodology: statistics reduces data in two directions, the number of data points (sampling) and the number of features (simplification).
Implementing Artificial Intelligence with Big Data
SoCal Data Science Conference 2017
Los Angeles, CA
AI vs. Machine Learning
vs. Deep Learning
Artificial Intelligence - A machine thinks, talks,
and behaves like a human.
Machine Learning - A computer makes decisions
without being explicitly programmed.
Deep Learning - A network of multi-layer
non-linear processing units capable of adapting
itself to new data.
“AI problem is a Data Problem. The
more data, the merrier.”
- Raymond Fu
Machine Learning vs. Statistics
Machine Learning:
Goal: “learning” from data of all sorts
No assumptions about data distributions
Generalization is through training,
validation and test datasets
Tolerant of redundant features
Does not promote data reduction prior to
Statistics:
Goal: analyzing and summarizing data
Tight assumptions about data
Generalization is pursued using statistical
tests on the training dataset
Preferable to use fewer input features
Promotes data reduction as much as
possible before modeling
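The training, validation and test split named on the machine learning side can be sketched as follows. This is a minimal illustration, not the speaker's method: the function name `split_dataset`, the 60/20/20 fractions, and the fixed seed are all assumptions chosen for the example.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into train, validation, and test lists.

    The fractions and seed are illustrative defaults, not values from the talk.
    Whatever is left after the train and validation slices becomes the test set.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("train_frac and val_frac must be positive and sum to less than 1")

    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_dataset(range(100))
    print(len(train), len(val), len(test))
```

The model is fit on the training list, tuned on the validation list, and scored once on the test list, so generalization is measured on data the model has not seen.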
Large Scale Data Processing
Labeled data is a group of samples, each carrying a specific meaning or tag.
● Label an image with objects in it.
● Label an X-ray photo with whether or not the patient has
● Join datasets that may correlate with each other.
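Attaching labels to samples and joining related datasets both come down to matching records on a shared key. The sketch below shows one way to do that with plain Python dictionaries. The helper `join_on_key`, the `patient_id` field, and the sample records are hypothetical and exist only for illustration.

```python
def join_on_key(left, right, key):
    """Inner-join two lists of dicts on a shared key.

    Rows without a match on the other side are dropped.
    """
    # Index the right-hand rows by key so each lookup is a dictionary access.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge the two records; on a name clash the right-hand value wins.
            joined.append({**row, **match})
    return joined


# Hypothetical records: unlabeled images and a separate table of labels.
xrays = [
    {"patient_id": 1, "image": "scan_001.png"},
    {"patient_id": 2, "image": "scan_002.png"},
    {"patient_id": 3, "image": "scan_003.png"},
]
labels = [
    {"patient_id": 1, "label": "positive"},
    {"patient_id": 2, "label": "negative"},
]

labeled = join_on_key(xrays, labels, "patient_id")
for record in labeled:
    print(record)
```

Patient 3 has no label, so it does not appear in the joined result. At large scale the same join would normally run in a distributed engine rather than in memory.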
Big Data Engineering
1. Data Cleansing: Create both better features and better
2. Self Service Analytics: Give data analysts tools to easily
prepare their data
3. Data Storage: Build performance and cost efficient data
4. Streaming: Fast data feed + AI = Fast decision making.
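The streaming item can be illustrated with a generator that makes a decision on each record as it arrives, instead of waiting for a full batch. In this sketch a rolling-mean threshold stands in for the AI model. The function `stream_decisions`, the window size, and the threshold are assumptions for illustration only.

```python
from collections import deque


def stream_decisions(readings, window=3, threshold=50.0):
    """Yield (reading, rolling_mean, alert) for each reading as it arrives.

    A simple threshold rule is used here in place of a trained model.
    """
    # A bounded deque discards the oldest reading once the window is full.
    recent = deque(maxlen=window)
    for value in readings:
        recent.append(value)
        rolling_mean = sum(recent) / len(recent)
        yield value, rolling_mean, rolling_mean > threshold


if __name__ == "__main__":
    # Hypothetical sensor feed; any iterable, including a live source, would work.
    for value, mean, alert in stream_decisions([10, 20, 90, 100], window=2):
        print(value, mean, alert)
```

Because the function is a generator, each decision is available as soon as its reading is consumed, which is the property the slide summarizes as fast data feed plus AI giving fast decision making.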