Machine Learning
Math Essentials

Jeff Howbert

Introduction to Machine Learning

Winter 2012

Areas of math essential to machine learning
Machine learning is part of both statistics and computer science
– Probability
– Statistical inference
– Validation
– Estimates of error, confidence intervals
Linear algebra
– Hugely useful for compact representation of linear
  transformations on data
– Dimensionality reduction techniques
Optimization theory
Why worry about the math?
There are lots of easy-to-use machine learning
packages out there.
After this course, you will know how to apply
several of the most general-purpose algorithms.
HOWEVER
To get really useful results, you need good
mathematical intuitions about certain general
machine learning principles, as well as the inner
workings of the individual algorithms.
Why worry about the math?
These intuitions will allow you to:
– Choose the right algorithm(s) for the problem
– Make good choices on parameter settings,
  validation strategies
– Recognize over- or underfitting
– Troubleshoot poor / ambiguous results
– Put appropriate bounds of confidence /
  uncertainty on results
– Do a better job of coding algorithms or
  incorporating them into more complex
  analysis pipelines
Notation
a ∈ A      set membership: a is member of set A
| B |      cardinality: number of items in set B
|| v ||    norm: length of vector v
∑          summation
∫          integral
ℜ          the set of real numbers
ℜn         real number space of dimension n
               n = 2 : plane or 2-space
               n = 3 : 3- (dimensional) space
               n > 3 : n-space or hyperspace

Notation
x, y, z, u, v   vector (bold, lower case)
A, B, X         matrix (bold, upper case)
y = f( x )      function (map): assigns unique value in
                range of y to each value in domain of x
dy / dx         derivative of y with respect to single
                variable x
y = f( x )      function on multiple variables, i.e. a
                vector of variables; function in n-space
∂y / ∂xi        partial derivative of y with respect to
                element i of vector x
The concept of probability

Intuition:
In some process, several outcomes are possible.
When the process is repeated a large number of
times, each outcome occurs with a characteristic
relative frequency, or probability. If a particular
outcome happens more often than another
outcome, we say it is more probable.

The concept of probability
Arises in two contexts:
In actual repeated experiments.
– Example: You record the color of 1000 cars driving
  by. 57 of them are green. You estimate the
  probability of a car being green as 57 / 1000 = 0.057.
In idealized conceptions of a repeated process.
– Example: You consider the behavior of an unbiased
  six-sided die. The expected probability of rolling a 5 is
  1 / 6 = 0.1667.
– Example: You need a model for how people’s heights
  are distributed. You choose a normal distribution
  (bell-shaped curve) to represent the expected relative
  probabilities.
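Below is a minimal sketch (not part of the original slides; assumes Python 3) of the frequency view of probability: repeat an idealized process many times and estimate an outcome's probability as its relative frequency.

```python
# Sketch: estimate p( rolling a 5 ) on a fair die as a relative frequency.
import random

random.seed(0)
n_trials = 100000
count_fives = sum(1 for _ in range(n_trials) if random.randint(1, 6) == 5)

estimate = count_fives / n_trials
print(f"estimated p(5) = {estimate:.4f}")   # close to the idealized 1 / 6 = 0.1667
```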
Probability spaces
A probability space is a random process or experiment with
three components:
– Ω, the set of possible outcomes O
    number of possible outcomes = | Ω | = N
– F, the set of possible events E
    an event comprises 0 to N outcomes
    number of possible events = | F | = 2^N
– P, the probability distribution
    function mapping each outcome and event to a real number
    between 0 and 1 (the probability of O or E)
    probability of an event is the sum of probabilities of the possible
    outcomes in the event

Axioms of probability
1. Non-negativity:
   for any event E ∈ F, p( E ) ≥ 0

2. All possible outcomes:
   p( Ω ) = 1

3. Additivity of disjoint events:
   for all events E, E’ ∈ F where E ∩ E’ = ∅,
   p( E ∪ E’ ) = p( E ) + p( E’ )

Types of probability spaces
Define | Ω | = number of possible outcomes
Discrete space: | Ω | is finite
– Analysis involves summations ( ∑ )
Continuous space: | Ω | is infinite
– Analysis involves integrals ( ∫ )

Example of discrete probability space
Single roll of a six-sided die
– 6 possible outcomes: O = 1, 2, 3, 4, 5, or 6
– 2^6 = 64 possible events
    example: E = ( O ∈ { 1, 3, 5 } ), i.e. outcome is odd
– If die is fair, then probabilities of outcomes are equal
    p( 1 ) = p( 2 ) = p( 3 ) =
    p( 4 ) = p( 5 ) = p( 6 ) = 1 / 6
    example: probability of event E = ( outcome is odd ) is
    p( 1 ) + p( 3 ) + p( 5 ) = 1 / 2

Example of discrete probability space
Three consecutive flips of a coin
– 8 possible outcomes: O = HHH, HHT, HTH, HTT,
  THH, THT, TTH, TTT
– 2^8 = 256 possible events
    example: E = ( O ∈ { HHT, HTH, THH } ), i.e. exactly two flips
    are heads
    example: E = ( O ∈ { THT, TTT } ), i.e. the first and third flips
    are tails
– If coin is fair, then probabilities of outcomes are equal
    p( HHH ) = p( HHT ) = p( HTH ) = p( HTT ) =
    p( THH ) = p( THT ) = p( TTH ) = p( TTT ) = 1 / 8
    example: probability of event E = ( exactly two heads ) is
    p( HHT ) + p( HTH ) + p( THH ) = 3 / 8
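A small sketch (illustrative, not from the slides) that enumerates this outcome space and recovers the event probabilities by counting equally likely outcomes:

```python
# Sketch: enumerate the outcome space for three fair coin flips and compute
# event probabilities by counting outcomes (each outcome has probability 1/8).
from itertools import product

outcomes = [''.join(flips) for flips in product('HT', repeat=3)]   # 8 outcomes
p_outcome = 1 / len(outcomes)

exactly_two_heads = [o for o in outcomes if o.count('H') == 2]
first_and_third_tails = [o for o in outcomes if o[0] == 'T' and o[2] == 'T']

print(len(outcomes))                            # 8
print(len(exactly_two_heads) * p_outcome)       # 3 / 8 = 0.375
print(len(first_and_third_tails) * p_outcome)   # 2 / 8 = 0.25
```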
Example of continuous probability space
Height of a randomly chosen American male
– Infinite number of possible outcomes: O has some
  single value in range 2 feet to 8 feet
– Infinite number of possible events
    example: E = ( O | O < 5.5 feet ), i.e. individual chosen is less
    than 5.5 feet tall
– Probabilities of outcomes are not equal, and are
  described by a continuous function, p( O )

Example of continuous probability space
Height of a randomly chosen American male
– Probabilities of outcomes O are not equal, and are
  described by a continuous function, p( O )
– p( O ) is a relative, not an absolute probability
    p( O ) for any particular O is zero
    ∫ p( O ) from O = -∞ to ∞ (i.e. area under curve) is 1
    example: p( O = 5’8” ) > p( O = 6’2” )
    example: p( O < 5’6” ) = ( ∫ p( O ) from O = -∞ to 5’6” ) ≈ 0.25
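A hedged sketch of the same idea using a normal model of heights; the mean (69 in) and standard deviation (4 in) are illustrative assumptions, not values from the slides.

```python
# Sketch: p( O < 5'6" ) from the cumulative distribution function of an
# assumed normal model of heights (mean and std dev are illustrative).
import math

def normal_cdf(x, mu, sigma):
    """P( X < x ) for a normal distribution with mean mu and std dev sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 69.0, 4.0                 # inches (assumed values)
print(normal_cdf(66.0, mu, sigma))    # 5'6" = 66 in; roughly 0.23, close to the ~0.25 above
```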

Probability distributions
Discrete: probability mass function (pmf)
    example: sum of two fair dice

Continuous: probability density function (pdf)
    example: waiting time between eruptions of Old Faithful (minutes)

[ figures: pmf and pdf plots, with probability on the vertical axis ]
Random variables
A random variable X is a function that associates a number x with
each outcome O of a process
– Common notation: X( O ) = x, or just X = x
Basically a way to redefine (usually simplify) a probability space to a
new probability space
– X must obey axioms of probability (over the possible values of x)
– X can be discrete or continuous
Example: X = number of heads in three flips of a coin
– Possible values of X are 0, 1, 2, 3
– p( X = 0 ) = p( X = 3 ) = 1 / 8
  p( X = 1 ) = p( X = 2 ) = 3 / 8
– Size of space (number of “outcomes”) reduced from 8 to 4
Example: X = average height of five randomly chosen American men
– Size of space unchanged (X can range from 2 feet to 8 feet), but
  pdf of X different than for single man
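An illustrative sketch (not from the slides; assumes Python 3) that builds the random variable X = number of heads by grouping the 8 outcomes of the original space:

```python
# Sketch: pmf of X = number of heads in three flips of a fair coin,
# obtained by grouping the equally likely outcomes by their X value.
from itertools import product
from collections import Counter

outcomes = list(product('HT', repeat=3))                     # 8 equally likely outcomes
pmf = Counter(sum(1 for f in o if f == 'H') for o in outcomes)

for x in sorted(pmf):
    print(x, pmf[x] / len(outcomes))   # 0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8
```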
Multivariate probability distributions
Scenario
– Several random processes occur (doesn’t matter
  whether in parallel or in sequence)
– Want to know probabilities for each possible
  combination of outcomes
Can describe as joint probability of several random
variables
– Example: two processes whose outcomes are
  represented by random variables X and Y. Probability
  that process X has outcome x and process Y has
  outcome y is denoted as:
      p( X = x, Y = y )
Example of multivariate distribution
joint probability: p( X = minivan, Y = European ) = 0.1481

Multivariate probability distributions
Marginal probability
– Probability distribution of a single variable in a
joint distribution
– Example: two random variables X and Y:
p( X = x ) = ∑ b = all values of Y  p( X = x, Y = b )
Conditional probability
– Probability distribution of one variable given
that another variable takes a certain value
– Example: two random variables X and Y:
p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
Example of marginal probability
marginal probability: p( X = minivan ) = 0.0741 + 0.1111 + 0.1481 = 0.3333

Example of conditional probability
conditional probability: p( Y = European | X = minivan ) =
0.1481 / ( 0.0741 + 0.1111 + 0.1481 ) = 0.4444

[ figure: bar chart of the joint distribution p( X = model type, Y = manufacturer ),
  with X ∈ { sedan, minivan, SUV, sport } and Y ∈ { American, Asian, European } ]
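An illustrative sketch of marginal and conditional probabilities computed from a joint distribution stored as a dictionary; only the three minivan cells from the slides are filled in, the rest of the table is left as an assumption.

```python
# Sketch: marginal and conditional probabilities from a joint distribution.
# The three minivan entries match the slides; other cells would complete the table.
joint = {('minivan', 'American'): 0.0741,
         ('minivan', 'Asian'):    0.1111,
         ('minivan', 'European'): 0.1481}
# ... remaining (model, manufacturer) cells omitted here

p_minivan = sum(p for (model, _), p in joint.items() if model == 'minivan')
p_european_given_minivan = joint[('minivan', 'European')] / p_minivan

print(p_minivan)                   # 0.3333
print(p_european_given_minivan)    # ~0.444
```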
Continuous multivariate distribution
Same concepts of joint, marginal, and conditional
probabilities apply (except use integrals)
Example: three-component Gaussian mixture in two
dimensions

Expected value
Given:
    A discrete random variable X, with possible
    values x = x1, x2, … xn
    Probabilities p( X = xi ) that X takes on the
    various values of xi
    A function yi = f( xi ) defined on X
The expected value of f is the probability-weighted
“average” value of f( xi ):
    E( f ) = ∑i p( xi ) ⋅ f( xi )
Example of expected value
Process: game where one card is drawn from the deck
– If face card, dealer pays you $10
– If not a face card, you pay dealer $4
Random variable X = { face card, not face card }
– p( face card ) = 3/13
– p( not face card ) = 10/13
Function f( X ) is payout to you
– f( face card ) = 10
– f( not face card ) = -4
Expected value of payout is:
    E( f ) = ∑i p( xi ) ⋅ f( xi ) = 3/13 ⋅ 10 + 10/13 ⋅ -4 = -0.77
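A short sketch (not from the slides) of the same probability-weighted sum for this game:

```python
# Sketch: expected payout E( f ) = sum_i p( xi ) * f( xi ) for the card game.
p = {'face card': 3 / 13, 'not face card': 10 / 13}
f = {'face card': 10, 'not face card': -4}

expected_payout = sum(p[x] * f[x] for x in p)
print(round(expected_payout, 2))   # -0.77
```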
Expected value in continuous spaces
E( f ) = ∫ x = a → b  p( x ) ⋅ f( x ) dx

Common forms of expected value (1)
Mean (μ)
    f( xi ) = xi  ⇒
    μ = E( f ) = ∑i p( xi ) ⋅ xi
– Average value of X = xi, taking into account probability
  of the various xi
– Most common measure of “center” of a distribution
Compare to formula for mean of an actual sample
    μ = ( 1 / N ) ∑ i = 1..N  xi

Common forms of expected value (2)
Variance (σ2)
    f( xi ) = ( xi - μ )2  ⇒
    σ2 = ∑i p( xi ) ⋅ ( xi - μ )2
– Average value of squared deviation of X = xi from
  mean μ, taking into account probability of the various xi
– Most common measure of “spread” of a distribution
– σ is the standard deviation
Compare to formula for variance of an actual sample
    σ2 = ( 1 / ( N - 1 ) ) ∑ i = 1..N  ( xi - μ )2
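A small sketch (assuming NumPy is available; not part of the slides) of the sample mean and the sample variance with the N - 1 denominator, checked against NumPy's built-ins:

```python
# Sketch: sample mean and sample variance (N - 1 denominator) on made-up data.
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mu = x.sum() / len(x)                          # sample mean
var = ((x - mu) ** 2).sum() / (len(x) - 1)     # sample variance

print(mu, var)
print(np.mean(x), np.var(x, ddof=1))           # same results via numpy
```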

Common forms of expected value (3)
Covariance
    f( xi ) = ( xi - μx ), g( yi ) = ( yi - μy )  ⇒
    cov( x, y ) = ∑i p( xi, yi ) ⋅ ( xi - μx ) ⋅ ( yi - μy )
– Measures tendency for x and y to deviate from their means in
  same (or opposite) directions at same time

[ figures: scatter plots showing high (positive) covariance vs. no covariance ]

Compare to formula for covariance of actual samples
    cov( x, y ) = ( 1 / ( N - 1 ) ) ∑ i = 1..N  ( xi - μx )( yi - μy )
Correlation
Pearson’s correlation coefficient is covariance normalized
by the standard deviations of the two variables
    corr( x, y ) = cov( x, y ) / ( σx σy )
– Always lies in range -1 to 1
– Only reflects linear dependence between variables

[ figures: scatter plots of linear dependence with noise, linear dependence
  without noise, and various nonlinear dependencies ]
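A small sketch (assuming NumPy is available; data are made up for illustration) of sample covariance and Pearson's correlation:

```python
# Sketch: sample covariance and Pearson's correlation coefficient.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])      # roughly linear in x, with noise

def sample_cov(a, b):
    return ((a - a.mean()) * (b - b.mean())).sum() / (len(a) - 1)

cov_xy = sample_cov(x, y)
corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

print(cov_xy, corr_xy)              # correlation close to 1
print(np.corrcoef(x, y)[0, 1])      # numpy's equivalent
```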
Complement rule
Given: event A, which can occur or not

p( not A ) = 1 - p( A )

[ Venn diagram: Ω partitioned into A and not A; areas represent relative probabilities ]
Product rule
Given: events A and B, which can co-occur (or not)

p( A, B ) = p( A | B ) ⋅ p( B )
(same expression given previously to define conditional probability)

[ Venn diagram: Ω containing overlapping events A and B, with regions ( A, B ),
  ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities ]
Example of product rule
Probability that a man has white hair (event A)
and is over 65 (event B)
– p( B ) = 0.18
– p( A | B ) = 0.78
– p( A, B ) = p( A | B ) ⋅ p( B ) =
0.78 ⋅ 0.18 =
0.14

Rule of total probability
Given: events A and B, which can co-occur (or not)

p( A ) = p( A, B ) + p( A, not B )
(same expression given previously to define marginal probability)

[ Venn diagram: Ω containing overlapping events A and B; areas represent relative probabilities ]
Independence
Given: events A and B, which can co-occur (or not)

p( A | B ) = p( A )

or

p( A, B ) = p( A ) ⋅ p( B )

[ Venn diagram: independent events A and B within Ω; areas represent relative probabilities ]
Examples of independence / dependence
Independence:
– Outcomes on multiple rolls of a die
– Outcomes on multiple flips of a coin
– Height of two unrelated individuals
– Probability of getting a king on successive draws from
  a deck, if card from each draw is replaced
Dependence:
– Height of two related individuals
– Duration of successive eruptions of Old Faithful
– Probability of getting a king on successive draws from
  a deck, if card from each draw is not replaced
Example of independence vs. dependence
Independence: All manufacturers have identical product
mix. p( X = x | Y = y ) = p( X = x ).
Dependence: American manufacturers love SUVs,
European manufacturers don’t.

Bayes rule
A way to find conditional probabilities for one variable when
conditional probabilities for another variable are known.
p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
where p( A ) = p( A, B ) + p( A, not B )

[ Venn diagram: Ω containing overlapping events A and B, with regions ( A, B ),
  ( A, not B ), ( not A, B ), ( not A, not B ) ]
Bayes rule
posterior probability ∝ likelihood × prior probability
p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )

Example of Bayes rule
Marie is getting married tomorrow at an outdoor ceremony in the
desert. In recent years, it has rained only 5 days each year.
Unfortunately, the weatherman is forecasting rain for tomorrow. When
it actually rains, the weatherman has forecast rain 90% of the time.
When it doesn't rain, he has forecast rain 10% of the time. What is the
probability it will rain on the day of Marie's wedding?
Event A: The weatherman has forecast rain.
Event B: It rains.
We know:
– p( B ) = 5 / 365 = 0.0137 [ It rains 5 days out of the year. ]
– p( not B ) = 360 / 365 = 0.9863
– p( A | B ) = 0.9 [ When it rains, the weatherman has forecast
  rain 90% of the time. ]
– p( A | not B ) = 0.1 [ When it does not rain, the weatherman has
  forecast rain 10% of the time. ]
Example of Bayes rule, cont’d.

We want to know p( B | A ), the probability it will rain on
the day of Marie's wedding, given a forecast for rain by
the weatherman. The answer can be determined from
Bayes rule:

1. p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
2. p( A ) = p( A | B ) ⋅ p( B ) + p( A | not B ) ⋅ p( not B ) =
   ( 0.9 )( 0.0137 ) + ( 0.1 )( 0.9863 ) = 0.111
3. p( B | A ) = ( 0.9 )( 0.0137 ) / 0.111 = 0.111

The result seems unintuitive but is correct. Even when the
weatherman predicts rain, it rains only about 11% of
the time. Despite the weatherman's gloomy prediction, it
is unlikely Marie will get rained on at her wedding.
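The same calculation written out in code (an illustrative sketch, not part of the slides):

```python
# Sketch: Bayes rule for the wedding-day example.
p_rain = 5 / 365                    # p( B )
p_dry = 1 - p_rain                  # p( not B )
p_forecast_given_rain = 0.9         # p( A | B )
p_forecast_given_dry = 0.1          # p( A | not B )

# p( A ) via the rule of total probability
p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_dry * p_dry
p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast

print(round(p_forecast, 3), round(p_rain_given_forecast, 3))   # ~0.111 and ~0.111
```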

Probabilities: when to add, when to multiply
ADD: When you want to allow for occurrence of
any of several possible outcomes of a single
process. Comparable to logical OR.
MULTIPLY: When you want to allow for
simultaneous occurrence of particular outcomes
from more than one process. Comparable to
logical AND.
– But only if the processes are independent.

Linear algebra applications
1) Operations on or between vectors and matrices
2) Coordinate transformations
3) Dimensionality reduction
4) Linear regression
5) Solution of linear systems of equations
6) Many others

Applications 1) – 4) are directly relevant to this
course. Today we’ll start with 1).
Why vectors and matrices?
Most common form of data
organization for machine
learning is a 2D array, where
– rows represent samples
  (records, items, datapoints)
– columns represent attributes
  (features, variables)
Natural to think of each sample
as a vector of attributes, and
whole array as a matrix
Example (each row is a vector of attributes; the whole array is a matrix):

Refund   Marital Status   Taxable Income   Cheat
Yes      Single           125K             No
No       Married          100K             No
No       Single           70K              No
Yes      Married          120K             No
No       Divorced         95K              Yes
No       Married          60K              No
Yes      Divorced         220K             No
No       Single           85K              Yes
No       Married          75K              No
No       Single           90K              Yes
Vectors
Definition: an n-tuple of values (usually real
numbers).
– n referred to as the dimension of the vector
– n can be any positive integer, from 1 to infinity
Can be written in column form or row form
– Column form is conventional
– Vector elements referenced by subscript

    x = ⎛ x1 ⎞        xT = ( x1 ⋯ xn )        T means "transpose"
        ⎜ ⋮  ⎟
        ⎝ xn ⎠
Vectors
Can think of a vector as:
– a point in space or
– a directed line segment with a magnitude and
direction

Vector arithmetic
Addition of two vectors
– add corresponding elements
    z = x + y = ( x1 + y1  ⋯  xn + yn )T
– result is a vector
Scalar multiplication of a vector
– multiply each element by scalar
    y = a x = ( a x1  ⋯  a xn )T
– result is a vector

Vector arithmetic
Dot product of two vectors
– multiply corresponding elements, then add products
    a = x ⋅ y = ∑ i = 1..n  xi yi
– result is a scalar
Dot product alternative form
    a = x ⋅ y = || x || || y || cos( θ )

[ figure: vectors x and y with angle θ between them ]
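A quick sketch of these vector operations using NumPy (an assumption; the slides themselves are library-agnostic):

```python
# Sketch: vector addition, scalar multiplication, dot product, and the angle form.
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([1.0, 2.0])

print(x + y)           # elementwise addition
print(2.5 * x)         # scalar multiplication
print(np.dot(x, y))    # dot product: 3*1 + 4*2 = 11

# dot product alternative form: x . y = ||x|| ||y|| cos(theta)
cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_theta)
```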
Matrices
Definition: an m x n two-dimensional array of
values (usually real numbers).
– m rows
– n columns
Matrix referenced by two-element subscript
– first element in subscript is row
– second element in subscript is column
– example: A24 or a24 is element in second row,
  fourth column of A

    A = ⎛ a11 ⋯ a1n ⎞
        ⎜  ⋮  ⋱  ⋮  ⎟
        ⎝ am1 ⋯ amn ⎠
Matrices
A vector can be regarded as a special case of a
matrix, where one of the matrix dimensions = 1.
Matrix transpose (denoted T)
– swap columns and rows
    row 1 becomes column 1, etc.
– m x n matrix becomes n x m matrix
– example:

    A = ⎛ 2  7  -1  0  3 ⎞        AT = ⎛  2   4 ⎞
        ⎝ 4  6  -3  1  8 ⎠             ⎜  7   6 ⎟
                                       ⎜ -1  -3 ⎟
                                       ⎜  0   1 ⎟
                                       ⎝  3   8 ⎠
Matrix arithmetic
Addition of two matrices
– matrices must be same size
– add corresponding elements:  cij = aij + bij
– result is a matrix of same size

    C = A + B = ⎛ a11 + b11  ⋯  a1n + b1n ⎞
                ⎜     ⋮      ⋱      ⋮     ⎟
                ⎝ am1 + bm1  ⋯  amn + bmn ⎠

Scalar multiplication of a matrix
– multiply each element by scalar:  bij = d ⋅ aij
– result is a matrix of same size

    B = d ⋅ A = ⎛ d ⋅ a11  ⋯  d ⋅ a1n ⎞
                ⎜    ⋮     ⋱     ⋮    ⎟
                ⎝ d ⋅ am1  ⋯  d ⋅ amn ⎠
Matrix arithmetic
Matrix-matrix multiplication
– vector-matrix multiplication just a special case
  TO THE BOARD!!
Multiplication is associative
    A ⋅ ( B ⋅ C ) = ( A ⋅ B ) ⋅ C
Multiplication is not commutative
    A ⋅ B ≠ B ⋅ A  (generally)
Transposition rule:
    ( A ⋅ B )T = BT ⋅ AT
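A quick NumPy check of these properties on small random matrices (illustrative; the shapes are chosen arbitrarily):

```python
# Sketch: associativity, non-commutativity, and the transposition rule.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 5))
C = rng.standard_normal((5, 4))
M = A.T @ A                                          # a 5 x 5 matrix built from A

print(np.allclose(A @ (B @ C), (A @ B) @ C))         # associative: True
print(np.allclose(B @ M, M @ B))                     # generally not commutative: False
print(np.allclose((A @ B).T, B.T @ A.T))             # transposition rule: True
```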
Matrix arithmetic
RULE: In any chain of matrix multiplications, the
column dimension of one matrix in the chain must
match the row dimension of the following matrix
in the chain.
Examples:  A is 3 x 5,  B is 5 x 5,  C is 3 x 1
Right:   A ⋅ B ⋅ AT    CT ⋅ A ⋅ B    AT ⋅ A ⋅ B    C ⋅ CT ⋅ A
Wrong:   A ⋅ B ⋅ A     C ⋅ A ⋅ B     A ⋅ AT ⋅ B    CT ⋅ C ⋅ A
Vector projection
Orthogonal projection of y onto x
– Can take place in any space of dimensionality ≥ 2
– Unit vector in direction of x is  x / || x ||
– Length of projection of y in direction of x is  || y || ⋅ cos( θ )
– Orthogonal projection of y onto x is the vector
    projx( y ) = x ⋅ || y || ⋅ cos( θ ) / || x || = [ ( x ⋅ y ) / || x ||2 ] x
    (using dot product alternative form)

[ figure: vectors x and y with angle θ between them; projx( y ) lies along x ]
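An illustrative NumPy sketch of the projection formula (not part of the slides):

```python
# Sketch: orthogonal projection of y onto x, proj_x( y ) = [ ( x . y ) / ||x||^2 ] x.
import numpy as np

x = np.array([2.0, 0.0, 0.0])
y = np.array([1.0, 3.0, 1.0])

proj = (np.dot(x, y) / np.dot(x, x)) * x
print(proj)                    # [1. 0. 0.]
print(np.dot(y - proj, x))     # residual is orthogonal to x: ~0
```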

Optimization theory topics
Maximum likelihood
Expectation maximization
Gradient descent

More Related Content

What's hot

Imprecise probability theory - Summer School 2014
Imprecise probability theory - Summer School 2014Imprecise probability theory - Summer School 2014
Imprecise probability theory - Summer School 2014Sebastien Destercke
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Umberto Picchini
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...Umberto Picchini
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Umberto Picchini
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Introduction to random variables
Introduction to random variablesIntroduction to random variables
Introduction to random variablesHadley Wickham
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemErika G. G.
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...Christian Robert
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problemsSSA KPI
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Christian Robert
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Pythonfreshdatabos
 
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...Umberto Picchini
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum EntropyJiawang Liu
 
Constructing List Homomorphisms from Proofs
Constructing List Homomorphisms from ProofsConstructing List Homomorphisms from Proofs
Constructing List Homomorphisms from ProofsYun-Yan Chi
 

What's hot (20)

Imprecise probability theory - Summer School 2014
Imprecise probability theory - Summer School 2014Imprecise probability theory - Summer School 2014
Imprecise probability theory - Summer School 2014
 
Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...Accelerated approximate Bayesian computation with applications to protein fol...
Accelerated approximate Bayesian computation with applications to protein fol...
 
Pareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQPareto Models, Slides EQUINEQ
Pareto Models, Slides EQUINEQ
 
My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...My data are incomplete and noisy: Information-reduction statistical methods f...
My data are incomplete and noisy: Information-reduction statistical methods f...
 
Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...Inference for stochastic differential equations via approximate Bayesian comp...
Inference for stochastic differential equations via approximate Bayesian comp...
 
Lecture5 xing
Lecture5 xingLecture5 xing
Lecture5 xing
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Introduction to random variables
Introduction to random variablesIntroduction to random variables
Introduction to random variables
 
Jsm09 talk
Jsm09 talkJsm09 talk
Jsm09 talk
 
A Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation ProblemA Maximum Entropy Approach to the Loss Data Aggregation Problem
A Maximum Entropy Approach to the Loss Data Aggregation Problem
 
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
RuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative SystemsRuleML2015: Input-Output STIT Logic for Normative Systems
RuleML2015: Input-Output STIT Logic for Normative Systems
 
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...On the vexing dilemma of hypothesis testing and the predicted demise of the B...
On the vexing dilemma of hypothesis testing and the predicted demise of the B...
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problems
 
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
Discussion of ABC talk by Stefano Cabras, Padova, March 21, 2013
 
Slides econ-lm
Slides econ-lmSlides econ-lm
Slides econ-lm
 
An introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using PythonAn introduction to Bayesian Statistics using Python
An introduction to Bayesian Statistics using Python
 
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
A likelihood-free version of the stochastic approximation EM algorithm (SAEM)...
 
Principle of Maximum Entropy
Principle of Maximum EntropyPrinciple of Maximum Entropy
Principle of Maximum Entropy
 
Constructing List Homomorphisms from Proofs
Constructing List Homomorphisms from ProofsConstructing List Homomorphisms from Proofs
Constructing List Homomorphisms from Proofs
 

Viewers also liked

Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Alex Pinto
 
高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan台灣資料科學年會
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...Sri Ambati
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Chris Fregly
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Chris Fregly
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用Mark Chang
 
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Chris Fregly
 
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...Chris Fregly
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰台灣資料科學年會
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Chris Fregly
 
Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Data Science Thailand
 
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Chris Fregly
 
Machine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math RefresherMachine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math Refresherbutest
 
Machine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningMachine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningArshad Ahmed
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly ProblemMark Chang
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Chris Fregly
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探台灣資料科學年會
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial NetworksMark Chang
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterMark Chang
 

Viewers also liked (20)

[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
 
高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan高嘉良/Open Innovation as Strategic Plan
高嘉良/Open Innovation as Strategic Plan
 
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr...
 
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
Kafka Summit SF Apr 26 2016 - Generating Real-time Recommendations with NiFi,...
 
Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016Boston Spark Meetup May 24, 2016
Boston Spark Meetup May 24, 2016
 
TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用TensorFlow 深度學習快速上手班--電腦視覺應用
TensorFlow 深度學習快速上手班--電腦視覺應用
 
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
Big Data Spain - Nov 17 2016 - Madrid Continuously Deploy Spark ML and Tensor...
 
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
High Performance Distributed TensorFlow with GPUs - TensorFlow Chicago Meetup...
 
陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰陸永祥/全球網路攝影機帶來的機會與挑戰
陸永祥/全球網路攝影機帶來的機會與挑戰
 
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
Gradient Descent, Back Propagation, and Auto Differentiation - Advanced Spark...
 
Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)Machine Learning Essentials (dsth Meetup#3)
Machine Learning Essentials (dsth Meetup#3)
 
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
Advanced Spark and TensorFlow Meetup 08-04-2016 One Click Spark ML Pipeline D...
 
Machine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math RefresherMachine Learning Preliminaries and Math Refresher
Machine Learning Preliminaries and Math Refresher
 
Machine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine LearningMachine Learning without the Math: An overview of Machine Learning
Machine Learning without the Math: An overview of Machine Learning
 
The Genome Assembly Problem
The Genome Assembly ProblemThe Genome Assembly Problem
The Genome Assembly Problem
 
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
Deploy Spark ML and Tensorflow AI Models from Notebooks to Microservices - No...
 
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
[DSC 2016] 系列活動:李泳泉 / 星火燎原 - Spark 機器學習初探
 
Generative Adversarial Networks
Generative Adversarial NetworksGenerative Adversarial Networks
Generative Adversarial Networks
 
DRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive WriterDRAW: Deep Recurrent Attentive Writer
DRAW: Deep Recurrent Attentive Writer
 

Similar to 02 math essentials

Machine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative InvestingMachine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative InvestingShengyuan Wang Steven
 
Pre-Cal 40S Slides December 21, 2007
Pre-Cal 40S Slides December 21, 2007Pre-Cal 40S Slides December 21, 2007
Pre-Cal 40S Slides December 21, 2007Darren Kuropatwa
 
Optimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesOptimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesEmmanuel Hadoux
 
STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)2010111
 
Pre-Cal 40S Slides May 15, 2007
Pre-Cal 40S Slides May 15, 2007Pre-Cal 40S Slides May 15, 2007
Pre-Cal 40S Slides May 15, 2007Darren Kuropatwa
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
CPSC531-Probability.pptx
CPSC531-Probability.pptxCPSC531-Probability.pptx
CPSC531-Probability.pptxRidaIrfan10
 
Probability and probability distributions ppt @ bec doms
Probability and probability distributions ppt @ bec domsProbability and probability distributions ppt @ bec doms
Probability and probability distributions ppt @ bec domsBabasab Patil
 
Probability Theory.pdf
Probability Theory.pdfProbability Theory.pdf
Probability Theory.pdfcodewithmensah
 
Theory of Probability
Theory of Probability Theory of Probability
Theory of Probability ArchanaKadam19
 
Probability distribution for Dummies
Probability distribution for DummiesProbability distribution for Dummies
Probability distribution for DummiesBalaji P
 

Similar to 02 math essentials (20)

Introduction to Machine Learning
Introduction to Machine Learning Introduction to Machine Learning
Introduction to Machine Learning
 
Machine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative InvestingMachine Learning, Financial Engineering and Quantitative Investing
Machine Learning, Financial Engineering and Quantitative Investing
 
pattern recognition
pattern recognition pattern recognition
pattern recognition
 
Pre-Cal 40S Slides December 21, 2007
Pre-Cal 40S Slides December 21, 2007Pre-Cal 40S Slides December 21, 2007
Pre-Cal 40S Slides December 21, 2007
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
Optimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processesOptimization of probabilistic argumentation with Markov processes
Optimization of probabilistic argumentation with Markov processes
 
STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)STOMA FULL SLIDE (probability of IISc bangalore)
STOMA FULL SLIDE (probability of IISc bangalore)
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Pre-Cal 40S Slides May 15, 2007
Pre-Cal 40S Slides May 15, 2007Pre-Cal 40S Slides May 15, 2007
Pre-Cal 40S Slides May 15, 2007
 
Probability
ProbabilityProbability
Probability
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
CPSC531-Probability.pptx
CPSC531-Probability.pptxCPSC531-Probability.pptx
CPSC531-Probability.pptx
 
Chapter 13
Chapter 13Chapter 13
Chapter 13
 
Probability and probability distributions ppt @ bec doms
Probability and probability distributions ppt @ bec domsProbability and probability distributions ppt @ bec doms
Probability and probability distributions ppt @ bec doms
 
Probability Theory.pdf
Probability Theory.pdfProbability Theory.pdf
Probability Theory.pdf
 
PTSP PPT.pdf
PTSP PPT.pdfPTSP PPT.pdf
PTSP PPT.pdf
 
ML-04.pdf
ML-04.pdfML-04.pdf
ML-04.pdf
 
C2.0 propositional logic
C2.0 propositional logicC2.0 propositional logic
C2.0 propositional logic
 
Theory of Probability
Theory of Probability Theory of Probability
Theory of Probability
 
Probability distribution for Dummies
Probability distribution for DummiesProbability distribution for Dummies
Probability distribution for Dummies
 

Recently uploaded

EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 

Recently uploaded (20)

Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 

02 math essentials

  • 1. Machine Learning Math Essentials Jeff Howbert Introduction to Machine Learning Winter 2012 1
  • 2. Areas of math essential to machine learning Machine learning is part of both statistics and computer science – Probability – Statistical inference – Validation – Estimates of error, confidence intervals Linear algebra Li l b – Hugely useful for compact representation of linear transformations on data – Dimensionality reduction techniques Optimization theory p y Jeff Howbert Introduction to Machine Learning Winter 2012 2
  • 3. Why worry about the math? There are lots of easy-to-use machine learning packages out there. After this course, you will know how to apply several of the most general-purpose algorithms. g p p g HOWEVER To get really useful results, you need good mathematical intuitions about ce ta ge e a at e at ca tu t o s certain general machine learning principles, as well as the inner workings of the individual algorithms. Jeff Howbert Introduction to Machine Learning Winter 2012 3
  • 4. Why worry about the math? These intuitions will allow you to: – Choose the right algorithm(s) for the problem – Make good choices on parameter settings, validation strategies g – Recognize over- or underfitting – Troubleshoot poor / ambiguous results – Put appropriate bounds of confidence / uncertainty on results – Do a better job of coding algorithms or incorporating them into more complex p g p analysis pipelines Jeff Howbert Introduction to Machine Learning Winter 2012 4
  • 5. Notation a∈A |B| || v || ∑ ∫ ℜ ℜn set membership: a is member of set A cardinality: number of items in set B norm: length of vector v summation integral the t f th set of real numbers l b real number space of dimension n n = 2 : plane or 2-space n = 3 : 3- (dimensional) space p yp p n > 3 : n-space or hyperspace Jeff Howbert Introduction to Machine Learning Winter 2012 5
  • 6. Notation x, y, z, vector (bold, lower case) u, v A, B, X matrix (bold, upper case) y = f( x ) function (map): assigns unique value in range of y to each value in domain of x dy / dx derivative of y with respect to single y p g variable x y = f( x ) function on multiple variables, i.e. a ( p vector of variables; function in n-space ∂y / ∂xi partial derivative of y with respect to element i of vector x Jeff Howbert Introduction to Machine Learning Winter 2012 6
  • 7. The concept of probability Intuition: In some process, several outcomes are possible. When the process is repeated a large number of times, each outcome occurs with a characteristic relative frequency or probability If a particular frequency, probability. outcome happens more often than another outcome, outcome we say it is more probable probable. Jeff Howbert Introduction to Machine Learning Winter 2012 7
  • 8. The concept of probability Arises in two contexts: In actual repeated experiments. – Example: You record the color of 1000 cars driving by. 57 of them are green. You estimate the probability of a car being green as 57 / 1000 = 0 0057 0.0057. In idealized conceptions of a repeated process. – Example: You consider the behavior of an unbiased six-sided die. The expected probability of rolling a 5 is 1 / 6 = 0.1667. – Example: You need a model for how people’s heights are distributed. You choose a normal distribution ( (bell-shaped curve) to represent the expected relative p ) p p probabilities. Jeff Howbert Introduction to Machine Learning Winter 2012 8
  • 9. Probability spaces A probability space is a random process or experiment with three components: – Ω, the set of possible outcomes O number of possible outcomes = | Ω | = N – F th set of possible events E F, the t f ibl t an event comprises 0 to N outcomes number of possible events = | F | = 2N – P, the probability distribution function mapping each outcome and event to real number between 0 and 1 (th probability of O or E) b t d (the b bilit f probability of an event is sum of probabilities of possible outcomes in event Jeff Howbert Introduction to Machine Learning Winter 2012 9
  • 10. Axioms of probability 1. Non-negativity: for any event E ∈ F p( E ) ≥ 0 F, 2. 2 All possible outcomes: p( Ω ) = 1 3. Additivity of disjoint events: for all events E, E’ ∈ F where E ∩ E’ = ∅, p( E U E’ ) = p( E ) + p( E’ ) Jeff Howbert Introduction to Machine Learning Winter 2012 10
  • 11. Types of probability spaces Define | Ω | = number of possible outcomes Discrete space | Ω | is finite – Analysis involves summations ( ∑ ) Continuous space | Ω | is infinite C ti i i fi it – Analysis involves integrals ( ∫ ) Jeff Howbert Introduction to Machine Learning Winter 2012 11
  • 12. Example of discrete probability space Single roll of a six-sided die – 6 possible outcomes: O = 1, 2, 3, 4, 5, or 6 p , , , , , – 26 = 64 possible events example: E = ( O ∈ { 1, 3, 5 } ), i.e. outcome is odd – If die is fair, then probabilities of outcomes are equal p( 1 ) = p( 2 ) = p( 3 ) = p( 4 ) = p( 5 ) = p( 6 ) = 1 / 6 example: probability of event E = ( outcome is odd ) is p( 1 ) + p( 3 ) + p( 5 ) = 1 / 2 Jeff Howbert Introduction to Machine Learning Winter 2012 12
  • 13. Example of discrete probability space
    Three consecutive flips of a coin
    – 8 possible outcomes: O = HHH, HHT, HTH, HTT, THH, THT, TTH, TTT
    – 2^8 = 256 possible events
      example: E = ( O ∈ { HHT, HTH, THH } ), i.e. exactly two flips are heads
      example: E = ( O ∈ { THT, TTT } ), i.e. the first and third flips are tails
    – If the coin is fair, then the probabilities of the outcomes are equal:
      p( HHH ) = p( HHT ) = p( HTH ) = p( HTT ) = p( THH ) = p( THT ) = p( TTH ) = p( TTT ) = 1 / 8
      example: probability of event E = ( exactly two heads ) is p( HHT ) + p( HTH ) + p( THH ) = 3 / 8
  • 14. Example of continuous probability space
    Height of a randomly chosen American male
    – Infinite number of possible outcomes: O has some single value in the range 2 feet to 8 feet
    – Infinite number of possible events
      example: E = ( O | O < 5.5 feet ), i.e. the individual chosen is less than 5.5 feet tall
    – Probabilities of outcomes are not equal, and are described by a continuous function, p( O )
  • 15. Example of continuous probability space
    Height of a randomly chosen American male
    – Probabilities of outcomes O are not equal, and are described by a continuous function, p( O )
    – p( O ) is a relative, not an absolute probability
      p( O ) for any particular O is zero
      ∫ p( O ) from O = -∞ to ∞ (i.e. area under the curve) is 1
      example: p( O = 5’8” ) > p( O = 6’2” )
      example: p( O < 5’6” ) = ( ∫ p( O ) from O = -∞ to 5’6” ) ≈ 0.25
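  A hedged sketch of the same idea with a normal pdf. The mean and standard deviation below (68 in and 3 in) are assumptions chosen to roughly reproduce the slide's ≈ 0.25 figure, not values given in the slides:

  ```python
  # Continuous probability via a density: point densities compare relative
  # likelihood, while actual probabilities come from areas (the cdf).
  from scipy.stats import norm

  height = norm(loc=68, scale=3)        # assumed: mean 68 in (5'8"), sd 3 in

  print(height.pdf(68) > height.pdf(74))   # p(O = 5'8") > p(O = 6'2"): True
  print(height.cdf(66))                    # p(O < 5'6") ≈ 0.25
  print(height.cdf(float("inf")))          # total area under the curve = 1.0
  ```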
  • 16. Probability distributions
    Discrete: probability mass function (pmf)
    – example: sum of two fair dice
    Continuous: probability density function (pdf)
    – example: waiting time between eruptions of Old Faithful (minutes)
    [plots of the two example distributions omitted]
  • 17. Random variables
    A random variable X is a function that associates a number x with each outcome O of a process
    – Common notation: X( O ) = x, or just X = x
    Basically a way to redefine (usually simplify) a probability space to a new probability space
    – X must obey the axioms of probability (over the possible values of x)
    – X can be discrete or continuous
    Example: X = number of heads in three flips of a coin
    – Possible values of X are 0, 1, 2, 3
    – p( X = 0 ) = p( X = 3 ) = 1 / 8
      p( X = 1 ) = p( X = 2 ) = 3 / 8
    – Size of space (number of “outcomes”) reduced from 8 to 4
    Example: X = average height of five randomly chosen American men
    – Size of space unchanged (X can range from 2 feet to 8 feet), but the pdf of X is different than for a single man
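  A minimal sketch (my own, not from the slides) that derives the pmf of X = number of heads in three fair flips by enumerating the 8 equally likely outcomes:

  ```python
  # Build the pmf of a random variable defined on a discrete outcome space.
  from itertools import product
  from collections import Counter

  pmf = Counter()
  for flips in product("HT", repeat=3):        # HHH, HHT, ..., TTT
      x = flips.count("H")                     # value of the random variable
      pmf[x] += 1 / 8                          # each outcome has probability 1/8

  print(dict(pmf))   # p(X=0) = p(X=3) = 0.125, p(X=1) = p(X=2) = 0.375
  ```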
  • 18. Multivariate probability distributions
    Scenario
    – Several random processes occur (doesn’t matter whether in parallel or in sequence)
    – Want to know the probabilities for each possible combination of outcomes
    Can describe as a joint probability of several random variables
    – Example: two processes whose outcomes are represented by random variables X and Y. The probability that process X has outcome x and process Y has outcome y is denoted as: p( X = x, Y = y )
  • 19. Example of multivariate distribution
    joint probability: p( X = minivan, Y = European ) = 0.1481
    [table of joint probabilities over X = model type and Y = manufacturer omitted]
  • 20. Multivariate probability distributions
    Marginal probability
    – Probability distribution of a single variable in a joint distribution
    – Example: two random variables X and Y:
      p( X = x ) = ∑_{b = all values of Y} p( X = x, Y = b )
    Conditional probability
    – Probability distribution of one variable given that another variable takes a certain value
    – Example: two random variables X and Y:
      p( X = x | Y = y ) = p( X = x, Y = y ) / p( Y = y )
  • 21. Example of marginal probability
    marginal probability: p( X = minivan ) = 0.0741 + 0.1111 + 0.1481 = 0.3333
    [joint probability table over X = model type and Y = manufacturer omitted]
  • 22. Example of conditional probability
    conditional probability: p( Y = European | X = minivan ) = 0.1481 / ( 0.0741 + 0.1111 + 0.1481 ) = 0.4443
    [bar chart of the conditional distribution over Y = manufacturer (American, Asian, European) and X = model type (SUV, minivan, sedan, sport) omitted]
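  A hedged sketch using only the joint probabilities that actually appear on these slides (the minivan column of the joint table; the remaining entries are not shown):

  ```python
  # Marginal and conditional probability from a slice of a joint table.
  joint_minivan = {"American": 0.0741, "Asian": 0.1111, "European": 0.1481}

  # marginal: p(X = minivan) = sum over Y of p(X = minivan, Y = y)
  p_minivan = sum(joint_minivan.values())
  print(p_minivan)                                  # 0.3333

  # conditional: p(Y = European | X = minivan) = p(minivan, European) / p(minivan)
  print(joint_minivan["European"] / p_minivan)      # ~0.4443
  ```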
  • 23. Continuous multivariate distribution
    Same concepts of joint, marginal, and conditional probabilities apply (except use integrals)
    Example: three-component Gaussian mixture in two dimensions
    [contour / surface plots of the mixture omitted]
  • 24. Expected value
    Given:
    – A discrete random variable X, with possible values x = x1, x2, … xn
    – Probabilities p( X = xi ) that X takes on the various values of xi
    – A function yi = f( xi ) defined on X
    The expected value of f is the probability-weighted “average” value of f( xi ):
    E( f ) = ∑i p( xi ) ⋅ f( xi )
  • 25. Example of expected value
    Process: game where one card is drawn from the deck
    – If face card, dealer pays you $10
    – If not a face card, you pay dealer $4
    Random variable X = { face card, not face card }
    – p( face card ) = 3/13
    – p( not face card ) = 10/13
    Function f( X ) is the payout to you
    – f( face card ) = 10
    – f( not face card ) = -4
    Expected value of payout is:
    E( f ) = ∑i p( xi ) ⋅ f( xi ) = 3/13 ⋅ 10 + 10/13 ⋅ (-4) = -0.77
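  The same calculation as a short sketch, computed directly from E( f ) = ∑i p( xi ) ⋅ f( xi ):

  ```python
  # Expected payout of the card game.
  p = {"face card": 3 / 13, "not face card": 10 / 13}
  f = {"face card": 10, "not face card": -4}

  expected_payout = sum(p[x] * f[x] for x in p)
  print(round(expected_payout, 2))                  # -0.77
  ```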
  • 26. Expected value in continuous spaces
    E( f ) = ∫_{x = a → b} p( x ) ⋅ f( x ) dx
  • 27. Common forms of expected value (1)
    Mean (μ)
    f( xi ) = xi  ⇒  μ = E( f ) = ∑i p( xi ) ⋅ xi
    – Average value of X = xi, taking into account the probability of the various xi
    – Most common measure of the “center” of a distribution
    Compare to the formula for the mean of an actual sample:
    μ = ( 1 / N ) ∑_{i=1}^{N} xi
  • 28. Common forms of expected value (2)
    Variance (σ²)
    f( xi ) = ( xi − μ )²  ⇒  σ² = ∑i p( xi ) ⋅ ( xi − μ )²
    – Average value of the squared deviation of X = xi from the mean μ, taking into account the probability of the various xi
    – Most common measure of the “spread” of a distribution
    – σ is the standard deviation
    Compare to the formula for the variance of an actual sample:
    σ² = ( 1 / ( N − 1 ) ) ∑_{i=1}^{N} ( xi − μ )²
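  A hedged sketch contrasting the probability-weighted mean and variance (here for a fair die) with the sample formulas applied to a made-up set of observed rolls:

  ```python
  import numpy as np

  xs = np.arange(1, 7)                 # possible values of X
  p = np.full(6, 1 / 6)                # fair die probabilities

  mu = np.sum(p * xs)                  # E(X) = 3.5
  var = np.sum(p * (xs - mu) ** 2)     # E((X - mu)^2) ≈ 2.9167

  rolls = np.array([2, 6, 3, 3, 5, 1, 4, 6])        # an illustrative sample
  sample_mu = rolls.mean()                          # (1/N) * sum(x_i)
  sample_var = rolls.var(ddof=1)                    # (1/(N-1)) * sum((x_i - mu)^2)
  print(mu, var, sample_mu, sample_var)
  ```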
  • 29. Common forms of expected value (3)
    Covariance
    f( xi ) = ( xi − μx ), g( yi ) = ( yi − μy )  ⇒  cov( x, y ) = ∑i p( xi, yi ) ⋅ ( xi − μx ) ⋅ ( yi − μy )
    – Measures the tendency for x and y to deviate from their means in the same (or opposite) directions at the same time
    [scatterplots contrasting high (positive) covariance with no covariance omitted]
    Compare to the formula for the covariance of actual samples:
    cov( x, y ) = ( 1 / ( N − 1 ) ) ∑_{i=1}^{N} ( xi − μx )( yi − μy )
  • 30. Correlation
    Pearson’s correlation coefficient is covariance normalized by the standard deviations of the two variables:
    corr( x, y ) = cov( x, y ) / ( σx σy )
    – Always lies in the range -1 to 1
    – Only reflects linear dependence between variables
    [scatterplots omitted: linear dependence with noise, linear dependence without noise, various nonlinear dependencies]
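  A hedged sketch of sample covariance and Pearson correlation on made-up data (not from the slides):

  ```python
  import numpy as np

  x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
  y = 2.0 * x + np.array([0.1, -0.2, 0.0, 0.3, -0.1])   # roughly linear in x

  cov_xy = np.cov(x, y, ddof=1)[0, 1]                   # (1/(N-1)) * sum((x-mu_x)(y-mu_y))
  corr_xy = cov_xy / (x.std(ddof=1) * y.std(ddof=1))    # cov normalized by std devs

  print(corr_xy)                        # close to 1: strong linear dependence
  print(np.corrcoef(x, y)[0, 1])        # numpy's built-in Pearson correlation agrees
  ```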
  • 31. Complement rule
    Given: event A, which can occur or not
    p( not A ) = 1 − p( A )
    [Venn diagram of Ω partitioned into A and not A; areas represent relative probabilities]
  • 32. Product rule
    Given: events A and B, which can co-occur (or not)
    p( A, B ) = p( A | B ) ⋅ p( B )
    (same expression given previously to define conditional probability)
    [Venn diagram of Ω partitioned into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
  • 33. Example of product rule
    Probability that a man has white hair (event A) and is over 65 (event B)
    – p( B ) = 0.18
    – p( A | B ) = 0.78
    – p( A, B ) = p( A | B ) ⋅ p( B ) = 0.78 ⋅ 0.18 = 0.14
  • 34. Rule of total probability
    Given: events A and B, which can co-occur (or not)
    p( A ) = p( A, B ) + p( A, not B )
    (same expression given previously to define marginal probability)
    [Venn diagram of Ω partitioned into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
  • 35. Independence
    Given: events A and B, which can co-occur (or not)
    p( A | B ) = p( A )   or, equivalently,   p( A, B ) = p( A ) ⋅ p( B )
    [Venn diagram of Ω with regions ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B ); areas represent relative probabilities]
  • 36. Examples of independence / dependence
    Independence:
    – Outcomes on multiple rolls of a die
    – Outcomes on multiple flips of a coin
    – Height of two unrelated individuals
    – Probability of getting a king on successive draws from a deck, if the card from each draw is replaced
    Dependence:
    – Height of two related individuals
    – Duration of successive eruptions of Old Faithful
    – Probability of getting a king on successive draws from a deck, if the card from each draw is not replaced
  • 37. Example of independence vs. dependence
    Independence: All manufacturers have an identical product mix. p( X = x | Y = y ) = p( X = x ).
    Dependence: American manufacturers love SUVs, European manufacturers don’t.
  • 38. Bayes rule
    A way to find conditional probabilities for one variable when conditional probabilities for another variable are known.
    p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
    where p( A ) = p( A, B ) + p( A, not B )
    [Venn diagram of Ω partitioned into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B )]
  • 39. Bayes rule
    posterior probability ∝ likelihood × prior probability
    p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
    [Venn diagram of Ω partitioned into ( A, B ), ( A, not B ), ( not A, B ), ( not A, not B )]
  • 40. Example of Bayes rule
    Marie is getting married tomorrow at an outdoor ceremony in the desert. In recent years, it has rained only 5 days each year. Unfortunately, the weatherman is forecasting rain for tomorrow. When it actually rains, the weatherman has forecast rain 90% of the time. When it doesn't rain, he has forecast rain 10% of the time. What is the probability it will rain on the day of Marie's wedding?
    Event A: The weatherman has forecast rain.
    Event B: It rains.
    We know:
    – p( B ) = 5 / 365 = 0.0137 [ It rains 5 days out of the year. ]
    – p( not B ) = 360 / 365 = 0.9863
    – p( A | B ) = 0.9 [ When it rains, the weatherman has forecast rain 90% of the time. ]
    – p( A | not B ) = 0.1 [ When it does not rain, the weatherman has forecast rain 10% of the time. ]
  • 41. Example of Bayes rule, cont’d.
    We want to know p( B | A ), the probability it will rain on the day of Marie's wedding, given a forecast for rain by the weatherman. The answer can be determined from Bayes rule:
    1. p( B | A ) = p( A | B ) ⋅ p( B ) / p( A )
    2. p( A ) = p( A | B ) ⋅ p( B ) + p( A | not B ) ⋅ p( not B ) = ( 0.9 )( 0.0137 ) + ( 0.1 )( 0.9863 ) = 0.111
    3. p( B | A ) = ( 0.9 )( 0.0137 ) / 0.111 = 0.111
    The result seems unintuitive but is correct. Even when the weatherman predicts rain, it rains only about 11% of the time. Despite the weatherman's gloomy prediction, it is unlikely Marie will get rained on at her wedding.
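  The same wedding-day calculation as a short numerical sketch:

  ```python
  # Bayes rule: p(B | A) = p(A | B) * p(B) / p(A).
  p_rain = 5 / 365                      # p(B)
  p_dry = 1 - p_rain                    # p(not B)
  p_forecast_given_rain = 0.9           # p(A | B)
  p_forecast_given_dry = 0.1            # p(A | not B)

  # Rule of total probability for p(A), then Bayes rule for p(B | A).
  p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_dry * p_dry
  p_rain_given_forecast = p_forecast_given_rain * p_rain / p_forecast
  print(round(p_rain_given_forecast, 3))            # ~0.111
  ```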
  • 42. Probabilities: when to add, when to multiply
    ADD: When you want to allow for the occurrence of any of several possible outcomes of a single process. Comparable to logical OR.
    MULTIPLY: When you want to allow for the simultaneous occurrence of particular outcomes from more than one process. Comparable to logical AND.
    – But only if the processes are independent.
  • 43. Linear algebra applications
    1) Operations on or between vectors and matrices
    2) Coordinate transformations
    3) Dimensionality reduction
    4) Linear regression
    5) Solution of linear systems of equations
    6) Many others
    Applications 1) – 4) are directly relevant to this course. Today we’ll start with 1).
  • 44. Why vectors and matrices?
    Most common form of data organization for machine learning is a 2D array, where
    – rows represent samples (records, items, datapoints)
    – columns represent attributes (features, variables)
    Natural to think of each sample as a vector of attributes, and the whole array as a matrix.
    Example data (each row is a sample vector; the full array is a matrix):
      Refund  Marital Status  Taxable Income  Cheat
      Yes     Single          125K            No
      No      Married         100K            No
      No      Single          70K             No
      Yes     Married         120K            No
      No      Divorced        95K             Yes
      No      Married         60K             No
      Yes     Divorced        220K            No
      No      Single          85K             Yes
      No      Married         75K             No
      No      Single          90K             Yes
  • 45. Vectors
    Definition: an n-tuple of values (usually real numbers).
    – n referred to as the dimension of the vector
    – n can be any positive integer, from 1 to infinity
    Can be written in column form or row form
    – Column form is conventional
    – Vector elements referenced by subscript
    x = ( x1, …, xn )T written as a column;  xT = ( x1 ⋯ xn ) is the row form
    T means “transpose”
  • 46. Vectors
    Can think of a vector as:
    – a point in space, or
    – a directed line segment with a magnitude and direction
  • 47. Vector arithmetic
    Addition of two vectors
    – add corresponding elements: z = x + y = ( x1 + y1, …, xn + yn )T
    – result is a vector
    Scalar multiplication of a vector
    – multiply each element by the scalar: y = a x = ( a x1, …, a xn )T
    – result is a vector
  • 48. Vector arithmetic
    Dot product of two vectors
    – multiply corresponding elements, then add the products:
      a = x ⋅ y = ∑_{i=1}^{n} xi yi
    – result is a scalar
    Dot product alternative form:
      a = x ⋅ y = || x || || y || cos( θ ), where θ is the angle between x and y
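  A minimal sketch (my own, with vectors chosen so the angle between them is 45°) showing that the element-wise and geometric forms of the dot product agree:

  ```python
  import numpy as np

  x = np.array([3.0, 0.0])
  y = np.array([2.0, 2.0])              # 45 degrees from x

  elementwise = np.dot(x, y)                                     # sum_i x_i * y_i = 6.0
  geometric = np.linalg.norm(x) * np.linalg.norm(y) * np.cos(np.pi / 4)
  print(elementwise, round(geometric, 6))                        # both 6.0
  ```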
  • 49. Matrices
    Definition: an m x n two-dimensional array of values (usually real numbers).
    – m rows
    – n columns
    Matrix referenced by a two-element subscript
    – first element in the subscript is the row
    – second element in the subscript is the column
    – example: A24 or a24 is the element in the second row, fourth column of A
    A = [ a11 … a1n ; … ; am1 … amn ]
  • 50. Matrices
    A vector can be regarded as a special case of a matrix, where one of the matrix dimensions = 1.
    Matrix transpose (denoted T)
    – swap columns and rows: row 1 becomes column 1, etc.
    – m x n matrix becomes n x m matrix
    – example:
      A  = [  2  7 -1  0  3 ;
              4  6 -3  1  8 ]
      AT = [  2  4 ;
              7  6 ;
             -1 -3 ;
              0  1 ;
              3  8 ]
  • 51. Matrix arithmetic
    Addition of two matrices
    – matrices must be the same size
    – add corresponding elements: cij = aij + bij
    – result is a matrix of the same size: C = A + B
    Scalar multiplication of a matrix
    – multiply each element by the scalar: bij = d ⋅ aij
    – result is a matrix of the same size: B = d ⋅ A
  • 52. Matrix arithmetic
    Matrix-matrix multiplication
    – vector-matrix multiplication is just a special case
    – TO THE BOARD!!
    Multiplication is associative: A ⋅ ( B ⋅ C ) = ( A ⋅ B ) ⋅ C
    Multiplication is not commutative: A ⋅ B ≠ B ⋅ A (generally)
    Transposition rule: ( A ⋅ B )T = BT ⋅ AT
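  A hedged numerical sketch (random matrices of my own choosing) checking these three properties with numpy:

  ```python
  import numpy as np

  rng = np.random.default_rng(0)
  A, B, C = rng.random((2, 2)), rng.random((2, 2)), rng.random((2, 2))

  print(np.allclose(A @ (B @ C), (A @ B) @ C))      # True: associative
  print(np.allclose(A @ B, B @ A))                  # generally False: not commutative
  print(np.allclose((A @ B).T, B.T @ A.T))          # True: (AB)^T = B^T A^T
  ```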
  • 53. Matrix arithmetic
    RULE: In any chain of matrix multiplications, the column dimension of one matrix in the chain must match the row dimension of the following matrix in the chain.
    Examples, with A (3 x 5), B (5 x 5), C (3 x 1):
    Right: A ⋅ B ⋅ AT,  CT ⋅ A ⋅ B,  AT ⋅ A ⋅ B,  C ⋅ CT ⋅ A
    Wrong: A ⋅ B ⋅ A,  C ⋅ A ⋅ B,  A ⋅ AT ⋅ B,  CT ⋅ C ⋅ A
  • 54. Vector projection
    Orthogonal projection of y onto x
    – Can take place in a space of any dimensionality
    – Unit vector in the direction of x is x / || x ||
    – Length of the projection of y in the direction of x is || y || ⋅ cos( θ )
    – Orthogonal projection of y onto x is the vector
      projx( y ) = x ⋅ || y || ⋅ cos( θ ) / || x || = [ ( x ⋅ y ) / || x ||² ] x
      (using the dot product alternative form)
    [diagram of x, y, θ, and projx( y ) omitted]
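  A minimal sketch (my own) of the projection formula, with a check that the residual y − projx( y ) is orthogonal to x:

  ```python
  import numpy as np

  def project(y, x):
      """Orthogonal projection of vector y onto vector x: ((x.y)/||x||^2) x."""
      return (np.dot(x, y) / np.dot(x, x)) * x

  x = np.array([2.0, 0.0, 0.0])
  y = np.array([1.0, 3.0, 4.0])

  p = project(y, x)
  print(p)                              # [1. 0. 0.]: the component of y along x
  print(np.dot(y - p, x))               # ~0: the residual is orthogonal to x
  ```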
  • 55. Optimization theory topics
    Maximum likelihood
    Expectation maximization
    Gradient descent