2. Machine learning is a subset of artificial
intelligence whose goal is to give computers
the ability to teach themselves, whereas
artificial intelligence is the general concept
of smart machines.
In other words, artificial intelligence is
implemented through machine learning or,
to be more precise, through machine learning
algorithms.
RUBYGARAGE2017
TECHNOLOGYMATTERS
[Figure: artificial intelligence. TEACH YOUR COMPUTER]
3. EXAMPLES OF HOW MACHINE LEARNING IS USED IN THE REAL WORLD
- Facial recognition
- Voice recognition
- Text recognition
- Diagnostics in medicine
- Self-driving cars
- Robot behavior adjustment
- Ad targeting
- Predictions in financial trading
- Virtual and augmented reality
- Astronomy and space
4. WHY THE FUTURE BELONGS TO MACHINE LEARNING
The 21st century is the age of data.
Data is literally everywhere. In fact,
the volume of data has grown exponentially
over the past decade; the total amount
of data doubles every two years.
Most of it, however, isn’t used.
Huge volumes of data can be tagged,
structured, and analyzed,
revealing a lot of valuable information.
At this scale, only machine learning
algorithms can cope with the task easily.
5. HOW MACHINE LEARNING WORKS
- Preprocessing (putting data into the necessary shape): Raw Data and Labels are split into a Training Dataset and a Test Dataset
- Learning (creating a model with the help of training data): a Learning Algorithm is fit on the Training Dataset
- Evaluation (model assessment using test data): the Final Model is checked against the Test Dataset
- Prediction (application of the model): the Final Model is applied to New Data
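The four stages can be sketched end to end in plain Python. This is only an illustration under stated assumptions: the toy data, the 70/30 split, and the nearest-centroid "learning algorithm" are all hypothetical choices that do not come from the slides.

```python
import random
from statistics import mean

# Hypothetical raw data: (feature_1, feature_2) paired with a label.
raw_data = [
    ((1.0, 1.2), "small"), ((0.8, 1.0), "small"), ((1.1, 0.9), "small"),
    ((1.3, 1.1), "small"), ((0.9, 1.4), "small"),
    ((4.0, 4.2), "large"), ((3.8, 4.1), "large"), ((4.3, 3.9), "large"),
    ((4.1, 4.4), "large"), ((3.9, 3.7), "large"),
]

# Preprocessing: shuffle, then split into a training and a test dataset.
rng = random.Random(0)
shuffled = raw_data[:]
rng.shuffle(shuffled)
split = int(0.7 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]


def fit(samples):
    """Learning: compute one centroid (mean point) per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(model, features):
    """Prediction: return the label whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = fit(train)

# Evaluation: share of test samples that the model labels correctly.
accuracy = sum(predict(model, x) == y for x, y in test) / len(test)

# Prediction on new, unlabeled data.
new_label = predict(model, (4.2, 4.0))
print(accuracy, new_label)
```

The test dataset is never shown to `fit`, which is what makes the accuracy figure a fair estimate of how the model behaves on new data.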
6. TOOLS
- Python
- Pandas - Data analysis library for Python
Pandas provides fast, flexible data structures for processing “relational” or “labeled” data.
It is a fundamental data analysis toolkit in Python.
- Scikit-learn - Machine Learning in Python
Scikit-learn provides simple and efficient open-source tools for data mining and data analysis.
- Statsmodels
Statsmodels is a Python module that provides functions and classes for estimating different statistical models,
conducting statistical tests, and exploring statistical data. It offers a comprehensive list of result statistics.
- Matplotlib
Matplotlib is a Python 2D plotting library that produces publication-quality figures in multiple
formats and in interactive environments across platforms.
7. DATA PREPROCESSING
The quality of the data and the amount of useful information that it
contains are key factors that determine how well a machine learning
algorithm can learn. It is therefore critical to examine and preprocess
a dataset before feeding it to a learning algorithm. Typical steps include:
- Removing or imputing missing values in the dataset
- Getting categorical data into shape for machine learning algorithms
- Selecting relevant features for model construction
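As an illustration of getting categorical data into shape, one common option is one-hot encoding, where each category becomes its own 0/1 column. The slides do not name a specific technique, so treat this as one possible sketch; the helper name `one_hot_encode` and the sample values are hypothetical.

```python
def one_hot_encode(values):
    """Turn a list of category names into 0/1 indicator rows.

    Returns the sorted category list (the column order) and one row per value.
    """
    categories = sorted(set(values))
    encoded = [[1 if value == category else 0 for category in categories]
               for value in values]
    return categories, encoded


# Hypothetical categorical feature.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(encoded)     # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Indicator columns avoid implying an order between categories, which a single numeric code such as red=0, green=1, blue=2 would do.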
11. DEALING WITH MISSING DATA
Most computational tools cannot handle missing values
or will produce unpredictable results if we simply ignore them.
Therefore, it is crucial to take care of missing values
before proceeding with further analysis.
12. DEALING WITH MISSING DATA
- Eliminating samples or features with missing values
The easiest solution is simply to remove the samples (rows) or features (columns)
that contain missing values from the dataset.
However, this seemingly handy approach has drawbacks.
For example, removing too many samples is likely to compromise
the quality of the analysis.
- Imputing missing values
An alternative is to use interpolation techniques that estimate
the missing values from the other samples in the dataset.
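Both options can be compared on a small example. The sketch below assumes missing values are stored as `None` and uses mean imputation, which is only one of several possible interpolation strategies; the column names and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical dataset in which None marks a missing value.
rows = [
    {"age": 25.0, "salary": 50000.0},
    {"age": None, "salary": 62000.0},
    {"age": 40.0, "salary": None},
    {"age": 35.0, "salary": 58000.0},
]


def drop_incomplete(rows):
    """Option 1: eliminate every sample that has a missing value."""
    return [row for row in rows if all(v is not None for v in row.values())]


def impute_mean(rows):
    """Option 2: replace each missing value with the mean of its column."""
    columns = list(rows[0].keys())
    means = {
        column: mean(row[column] for row in rows if row[column] is not None)
        for column in columns
    }
    return [
        {c: (row[c] if row[c] is not None else means[c]) for c in columns}
        for row in rows
    ]


complete = drop_incomplete(rows)  # half of the samples are lost
imputed = impute_mean(rows)       # all samples are kept
print(len(complete), len(imputed))
```

Dropping rows keeps only 2 of the 4 samples here, which shows the drawback mentioned above; imputation keeps all 4 at the cost of introducing estimated values.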
28. REGRESSION
Regression models (both linear and non-linear) are used for predicting a real value,
such as a salary. If the independent variable is time,
you are forecasting future values; otherwise, the model is predicting
present but unknown values.
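A minimal sketch of the linear case, assuming a single independent variable and ordinary least squares. The salary figures are hypothetical and deliberately lie on a straight line so that the result is easy to check by hand.

```python
from statistics import mean

# Hypothetical data: years of experience vs. salary in thousands.
years = [1.0, 2.0, 3.0, 4.0, 5.0]
salary = [40.0, 45.0, 50.0, 55.0, 60.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(years, salary)

# Predict a real value for an input the model has not seen.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

With this data the fitted line is salary = 5 * years + 35, so six years of experience maps to a predicted 65.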
37. BACKWARD ELIMINATION
STEP 1: Select a significance level to stay in the model (e.g. SL = 0.05)
STEP 2: Fit the full model with all possible predictors
STEP 3: Consider the predictor with the highest P-value. If P > SL, go to STEP 4, otherwise go to FIN
STEP 4: Remove the predictor
STEP 5: Fit the model without this variable, then go back to STEP 3
FIN: Your model is ready
59. ENSEMBLE LEARNING. RANDOM FOREST REGRESSION.
STEP 1: Pick K data points at random from the training set.
STEP 2: Build the decision tree associated with these K data points.
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.
STEP 4: For a new data point, make each of your Ntree trees predict the value of Y
for the data point in question, and assign the new data point the average of
all the predicted Y values.
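The steps above can be sketched in plain Python. Several simplifications are assumptions rather than part of the slides: there is a single feature x, the K points are drawn with replacement, and each tree is a shallow regression tree (at most three levels of splits) chosen by squared error.

```python
import random
from statistics import mean


def sse(points):
    """Sum of squared errors when every point is predicted by the mean y."""
    m = mean(y for _, y in points)
    return sum((y - m) ** 2 for _, y in points)


def build_tree(points, depth=0, max_depth=3):
    """STEP 2: build a small regression tree on (x, y) pairs.

    A leaf is a number (the mean y); an inner node is a tuple
    (threshold, left_subtree, right_subtree).
    """
    xs = sorted({x for x, _ in points})
    if depth == max_depth or len(xs) < 2:
        return mean(y for _, y in points)
    best = None
    for low, high in zip(xs, xs[1:]):
        threshold = (low + high) / 2
        left = [p for p in points if p[0] <= threshold]
        right = [p for p in points if p[0] > threshold]
        error = sse(left) + sse(right)
        if best is None or error < best[0]:
            best = (error, threshold, left, right)
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))


def predict_tree(tree, x):
    """Walk down the tree until a leaf (a plain number) is reached."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def build_forest(training_set, n_trees, k, seed=0):
    """STEPS 1-3: build n_trees trees, each on k randomly picked points."""
    rng = random.Random(seed)
    return [build_tree([rng.choice(training_set) for _ in range(k)])
            for _ in range(n_trees)]


def predict_forest(forest, x):
    """STEP 4: average the predictions of all trees."""
    return mean(predict_tree(tree, x) for tree in forest)


# Hypothetical training set: y jumps from 10 to 20 at x = 5.
training_set = [(float(x), 10.0 if x < 5 else 20.0) for x in range(10)]
forest = build_forest(training_set, n_trees=25, k=10)
prediction_low = predict_forest(forest, 1.0)
prediction_high = predict_forest(forest, 8.0)
print(prediction_low, prediction_high)
```

Averaging across trees smooths out the quirks of any single tree, since each one sees a different random sample of the training set.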
76. K-NEAREST NEIGHBORS
STEP 1: Choose the number K of neighbors
STEP 2: Take the K nearest neighbors of the new data point, according to the Euclidean distance
STEP 3: Among these K neighbors, count the number of data points in each category
STEP 4: Assign the new data point to the category where you counted the most neighbors
Your Model is Ready
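The four steps map almost line by line onto a short function. This is a minimal sketch: the function name `knn_classify`, the sample points, and the categories "A" and "B" are hypothetical, and `math.dist` (Python 3.8+) supplies the Euclidean distance.

```python
from collections import Counter
from math import dist


def knn_classify(training_set, new_point, k):
    """Classify new_point by majority vote among its k nearest neighbors."""
    # STEP 2: take the k nearest neighbors by Euclidean distance.
    neighbors = sorted(training_set,
                       key=lambda item: dist(item[0], new_point))[:k]
    # STEP 3: count the neighbors in each category.
    votes = Counter(label for _, label in neighbors)
    # STEP 4: assign the category with the most neighbors.
    return votes.most_common(1)[0][0]


# Hypothetical labeled points.
training_set = [
    ((1.0, 1.0), "A"), ((1.0, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((6.0, 7.0), "B"), ((7.0, 6.0), "B"),
]

# STEP 1: choose the number K of neighbors.
k = 3
print(knn_classify(training_set, (1.5, 1.5), k))  # A
print(knn_classify(training_set, (6.5, 6.5), k))  # B
```

An odd K is a common choice for two categories because it rules out tied votes.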
115. RANDOM FOREST CLASSIFICATION
STEP 1: Pick K data points at random from the training set.
STEP 2: Build the decision tree associated with these K data points.
STEP 3: Choose the number Ntree of trees you want to build and repeat STEPS 1 & 2.
STEP 4: For a new data point, make each of your Ntree trees predict the category to
which the data point belongs, and assign the new data point to the category that wins
the majority vote.
122. CLUSTERING
Clustering is similar to classification, but the basis is different.
In clustering, you don’t know what you are looking for;
you are trying to identify segments or clusters in your data.
When you run clustering algorithms on a dataset,
unexpected things can surface, such as structures,
clusters, and groupings you would never have thought of otherwise.
124. K-MEANS CLUSTERING
STEP 1: Choose the number K of clusters
STEP 2: Select at random K points, the centroids (not necessarily from your dataset)
STEP 3: Assign each data point to the closest centroid -> That forms K clusters
STEP 4: Compute and place the new centroid of each cluster
STEP 5: Reassign each data point to the new closest centroid.
If any reassignment took place, go to STEP 4, otherwise go to FIN.
Your Model is Ready
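A compact sketch of this loop in plain Python. As an assumption for simplicity, the initial centroids are drawn from the dataset itself (the slide notes that they do not have to be), and the `max_iterations` cap is an added safety limit, not part of the slide.

```python
import random
from math import dist
from statistics import mean


def k_means(points, k, seed=0, max_iterations=100):
    """Return (centroids, assignment) for a list of coordinate tuples."""
    rng = random.Random(seed)
    # STEP 2: select K starting centroids at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(max_iterations):
        # STEPS 3 and 5: assign each point to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: dist(point, centroids[c]))
            for point in points
        ]
        if new_assignment == assignment:
            break  # no reassignment took place: FIN
        assignment = new_assignment
        # STEP 4: move each centroid to the mean of its cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(mean(axis) for axis in zip(*members))
    return centroids, assignment


# Hypothetical data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]

# STEP 1: choose the number K of clusters.
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

Which group ends up as cluster 0 depends on the random start, so only the grouping is meaningful, not the cluster numbers.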
145. AGGLOMERATIVE HIERARCHICAL CLUSTERING
STEP 1: Make each data point a single-point cluster -> That forms N clusters
STEP 2: Take the two closest data points and make them one cluster -> That forms N-1 clusters
STEP 3: Take the two closest clusters and make them one cluster -> That forms N-2 clusters
STEP 4: Repeat STEP 3 until there is only one cluster
FIN
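A sketch of the merge loop follows. The slides do not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the two closest members); the `n_clusters` parameter is a hypothetical addition that lets the loop stop early.

```python
from math import dist


def agglomerative(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the list of merges performed,
    each recorded as (cluster_a, cluster_b, distance).
    """
    # STEP 1: every data point starts as its own cluster.
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        # STEPS 2-3: find the pair of clusters with the smallest distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges


# Hypothetical data: two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

two_clusters, _ = agglomerative(points, n_clusters=2)
one_cluster, merges = agglomerative(points)  # STEP 4: down to one cluster
print(two_clusters)
print(len(merges))
```

The recorded merge distances are the information a dendrogram is drawn from: a large jump between consecutive merges suggests a natural place to cut.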
166. REINFORCEMENT LEARNING
Reinforcement Learning is a branch of Machine Learning,
sometimes also called Online Learning. It is used to solve interactive
problems in which the data observed up to time t is used
to decide which action to take at time t + 1.
It is also used in Artificial Intelligence to train machines to perform
tasks such as walking. Desired outcomes give the AI a reward,
undesired outcomes a punishment. Machines learn through trial and error.
167. THE MULTI-ARMED BANDIT PROBLEM
How to bet to maximize your return
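One simple way to approach the problem is an epsilon-greedy strategy: most of the time play the arm that has paid best so far, and occasionally try a random arm. The slide does not name a strategy, so this choice is an assumption, as are the win probabilities, the number of rounds, and the value of `epsilon`.

```python
import random


def epsilon_greedy(win_probabilities, rounds, epsilon=0.1, seed=0):
    """Simulate a bandit; return (pulls, estimates, total_reward)."""
    rng = random.Random(seed)
    n_arms = len(win_probabilities)
    pulls = [0] * n_arms        # how often each arm was played
    estimates = [0.0] * n_arms  # running average reward per arm
    total_reward = 0
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            # exploit: pick the arm with the best estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1 if rng.random() < win_probabilities[arm] else 0
        pulls[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]
        total_reward += reward
    return pulls, estimates, total_reward


# Hypothetical machines; the true probabilities are hidden from the player.
pulls, estimates, total_reward = epsilon_greedy([0.2, 0.5, 0.8], rounds=5000)
print(pulls, estimates, total_reward)
```

The trade-off is visible in `epsilon`: a larger value learns the arms faster but wastes more bets on arms already known to be worse.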
195. NATURAL LANGUAGE PROCESSING
A very well-known model in NLP is the Bag of Words model.
It is used to preprocess texts before classification: each text is
turned into word counts, and the classification algorithm is then
fit on the observations built from those texts.
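A minimal sketch of the preprocessing step, assuming lowercase word tokens and a vocabulary built from the texts themselves. The two sample sentences and the helper names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(texts):
    """Return the sorted vocabulary and one count vector per text."""
    tokenized = [tokenize(text) for text in texts]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


texts = ["The food was great", "The service was not great"]
vocabulary, vectors = bag_of_words(texts)

print(vocabulary)  # ['food', 'great', 'not', 'service', 'the', 'was']
print(vectors)     # [[1, 1, 0, 0, 1, 1], [0, 1, 1, 1, 1, 1]]
```

Each row of `vectors` is one observation and each column one word, so the result can be passed to a classification algorithm as an ordinary feature matrix. Word order is lost, which is where the name "bag" comes from.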