2. About me
• Education
• NCU (MIS), NCCU (CS)
• Work Experience
• Telecom big data innovation
• AI projects
• Retail marketing technology
• User Group
• TW Spark User Group
• TW Hadoop User Group
• Taiwan Data Engineer Association Director
• Research
• Big Data / ML / AIoT / AI Columnist
"How can you not get romantic about baseball?"
5. Simulation vs. Machine Learning
• Simulation and machine learning are related in that they both revolve
around models, but they are very different.
• In fact, simulation and machine learning are almost opposites.
6. Simulation vs. Machine Learning
• Simulation Characteristics
• With simulation, the model is often known (we know how to take input values,
make a calculation, and determine the output).
• Inputs are uncertain: at least some of them are random variables, and
we don't know their values exactly.
• Historical data is used to fit a probability distribution to each input, or a
probability distribution is constructed from expert estimates.
• The goal is to find a range of outcomes by randomly sampling input values
and calculating output repeatedly.
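To make these characteristics concrete, here is a minimal sketch using only the Python standard library. The model is a known formula, while the inputs are drawn from distributions on every trial. The `profit` formula, the `simulate` helper, and all distribution parameters are hypothetical, chosen only for illustration.

```python
import random
import statistics

def profit(price, units, unit_cost):
    # The model is known: a simple deterministic formula (hypothetical).
    return (price - unit_cost) * units

def simulate(n_trials=10_000, seed=42):
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        # Inputs are uncertain, so each one is drawn from an assumed distribution.
        price = rng.uniform(9.0, 11.0)      # uniform between a min and a max
        units = rng.gauss(1_000, 100)       # normal: mean 1000, std dev 100
        unit_cost = rng.uniform(5.0, 7.0)
        outcomes.append(profit(price, units, unit_cost))
    return outcomes

outcomes = simulate()
q = statistics.quantiles(outcomes, n=20)    # 19 cut points: 5%, 10%, ..., 95%
print(f"mean profit: {statistics.mean(outcomes):,.0f}")
print(f"5th to 95th percentile: {q[0]:,.0f} to {q[-1]:,.0f}")
```

The result is not a single number but a distribution of outcomes, summarized here by its mean and a percentile range.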
8. Simulation vs. Machine Learning
• Machine Learning Characteristics
• In a machine learning problem, the model is unknown initially.
• We have no known formula for determining the output value from the input values.
• If we have a set of data where the inputs and the corresponding outputs are
known, we can use supervised learning to train a machine learning model.
• Supervised learning means we keep track of how well the model's predictions
match the known outputs. Each iteration of the learning process refines the
model to improve its predictions.
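The iterative refinement described above can be sketched with the simplest possible model, a straight line fitted by gradient descent. This is a hedged illustration, not a method from the slides: the synthetic data (y = 3x + 2 plus noise), the learning rate, and the helper names `make_data` and `train` are assumptions.

```python
import random

def make_data(n=100, seed=0):
    # Hypothetical labeled data: the known outputs follow y = 3x + 2 plus noise.
    rng = random.Random(seed)
    xs = [rng.uniform(0, 1) for _ in range(n)]
    ys = [3 * x + 2 + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def train(xs, ys, lr=0.5, epochs=200):
    w, b = 0.0, 0.0  # the model starts out unknown
    n = len(xs)
    history = []
    for _ in range(epochs):
        # Compare the current predictions with the known outputs.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in errors) / n)  # mean squared error
        # Refine the parameters with one gradient-descent step on the MSE.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history

xs, ys = make_data()
w, b, history = train(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
print(f"MSE: first iteration {history[0]:.3f}, last iteration {history[-1]:.4f}")
```

The learned `w` and `b` end up close to the 3 and 2 used to generate the data, and the error shrinks from iteration to iteration.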
10. Simulation vs. Machine Learning
• Summary
• In simulation, the main source of uncertainty is in the inputs.
• We have to repeatedly simulate to get a range of possible outputs and make
statements about outcome probabilities.
• In machine learning, the main source of uncertainty is in the model. When
making a prediction, the model is often not 100% certain of its prediction.
Model Performance Evaluation
11. Monte Carlo Simulation
• Unlike a conventional forecasting model, which uses a set of fixed input
values, Monte Carlo simulation predicts a set of outcomes based on an
estimated range of values for each input.
• It assigns a probability distribution, such as a uniform or normal
distribution, to every variable that has inherent uncertainty. It then
recalculates the results over and over, each time using a different set
of random values drawn from those distributions (for a uniform
distribution, between the minimum and maximum values).
• This exercise can be repeated thousands of times to produce a large
number of likely outcomes.
12. Monte Carlo Simulation
• Monte Carlo simulations are also used for long-term predictions
because of their accuracy. As the number of inputs increases, the number
of forecasts also grows, allowing you to project outcomes farther out
in time with more accuracy. When a Monte Carlo simulation is
complete, it yields a range of possible outcomes with the probability
of each result occurring.
• One simple example of a Monte Carlo simulation is calculating the
probabilities of the possible sums when rolling two standard dice. There
are 36 combinations of dice rolls. Using a Monte Carlo simulation, you can
simulate rolling the dice 10,000 times (or more) to estimate these
probabilities more accurately.
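The dice example can be checked directly: enumerate the 36 combinations for the exact probabilities, then compare them with 10,000 simulated rolls. A minimal sketch; the function names and the seed are illustrative choices.

```python
import random
from collections import Counter
from fractions import Fraction

def exact_sum_probabilities():
    # Enumerate all 36 equally likely (die1, die2) combinations.
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {s: Fraction(c, 36) for s, c in sorted(counts.items())}

def simulated_sum_probabilities(n_rolls=10_000, seed=1):
    # Roll two dice n_rolls times and record the relative frequency of each sum.
    rng = random.Random(seed)
    counts = Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls))
    return {s: counts[s] / n_rolls for s in range(2, 13)}

exact = exact_sum_probabilities()
sim = simulated_sum_probabilities()
for s in range(2, 13):
    print(f"sum {s:2d}: exact {float(exact[s]):.4f}  simulated {sim[s]:.4f}")
```

Increasing `n_rolls` brings the simulated column closer to the exact one.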
13. Monte Carlo Simulation
• First, define the target.
• Let the computer throw N darts blindfolded (at uniformly random
positions on the board), then count the X darts that hit the
target.
• Use N and X to estimate how big the target is: its area is roughly
X/N of the board's area.
14. Monte Carlo Simulation
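• Assuming N = 100 and 78 darts hit the target (X = 78), we can
estimate π. If the target is a circle inscribed in a square board (or a
quarter circle inside a unit square), it covers π/4 of the board, so
X/N ≈ π/4 and π ≈ (78/100) × 4 = 3.12.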
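A minimal sketch of the dart experiment, assuming the target is the quarter circle of radius 1 inside the unit square (the slide's exact target layout isn't given). The function name `estimate_pi` and the seed are illustrative.

```python
import random

def estimate_pi(n_darts, seed=None):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        # Throw a dart uniformly at the unit square [0, 1) x [0, 1).
        x, y = rng.random(), rng.random()
        # It hits the target if it lands inside the quarter circle of radius 1.
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle covers pi/4 of the square, so pi is about 4 * X / N.
    return 4 * hits / n_darts

for n in (100, 10_000, 1_000_000):
    print(n, estimate_pi(n, seed=7))
```

With N = 100 the estimate is rough (like the 3.12 above); it tightens around π as N grows.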
15. Monte Carlo Simulation
• Monte Carlo simulation is based on the law of large numbers.
• It is inherently a risk analysis tool, since we assign probability
distributions to all random model inputs.
• The input variables are randomly sampled and the output is
recorded. This is repeated thousands of times, resulting in a histogram
of output values and their frequencies.
• As the number of repetitions grows, the average of the outputs gets
closer to the theoretical (expected) value.
https://en.wikipedia.org/wiki/Law_of_large_numbers
Rolling 2 dice 1000 times
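The law of large numbers is easy to watch in action: the running average of the sum of two dice drifts toward its theoretical value of 7 as rolls accumulate. A minimal sketch; `running_means` and the seed are illustrative names and values.

```python
import random

def running_means(n_rolls, seed=3):
    # Average of the two-dice sum after each of n_rolls rolls.
    rng = random.Random(seed)
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6) + rng.randint(1, 6)  # one roll of two dice
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>6} rolls: average sum = {means[n - 1]:.3f}  (theoretical 7)")
```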
16. Monte Carlo Simulation
• As long as you have a probability
distribution, you can use Monte
Carlo simulation to sample values from it.
monte_carlo_simpling_distribution.ipynb
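The notebook's contents aren't reproduced here. As a hedged sketch of the idea, the code below samples from two kinds of distribution: a continuous one via inverse transform sampling (feeding uniform random numbers through the inverse CDF of an exponential) and a discrete one via the standard library's `random.choices`. The rate, values, and weights are hypothetical.

```python
import math
import random

def sample_exponential(rate, rng):
    # Inverse transform: if U ~ Uniform(0, 1), then -ln(1 - U) / rate ~ Exponential(rate).
    u = rng.random()                  # u is in [0, 1), so 1 - u is in (0, 1]
    return -math.log(1.0 - u) / rate

def sample_discrete(values, probs, rng, k):
    # random.choices draws k values with the given relative weights.
    return rng.choices(values, weights=probs, k=k)

rng = random.Random(11)
waits = [sample_exponential(rate=0.5, rng=rng) for _ in range(50_000)]
print("mean wait:", sum(waits) / len(waits))            # theoretical mean: 1 / rate = 2

demand = sample_discrete([10, 20, 30], [0.2, 0.5, 0.3], rng, k=50_000)
print("share of 20s:", demand.count(20) / len(demand))  # theoretical share: 0.5
```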
18. Monte Carlo Simulation Application
• Production lines
• Sales forecast
• Reliability Analysis
• Waiting Lines
• Budget forecast
• Project Management
• Cost estimation
• Industrial process
• Project selection
• Acceptance sampling
• Markov chain Monte Carlo (MCMC)
• Imbalanced dataset
• And more
19. Monte Carlo Simulation Application
• PMBOK (Project Management Body of Knowledge)
• It is an important risk management technique covered in many PMP and
PMI-RMP exam study books.
• It is a quantitative risk analysis technique used to assess the level of
risk in achieving project objectives.
https://www.pmi.org/pmbok-guide-standards
https://pmstudycircle.com/2015/02/monte-carlo-simulation/
20. Monte Carlo Simulation Application
• Project Management – risk
assessment
• The blue bars show the probability of
completing on each day; day 58 is
the most likely completion
day (9%). The probability of
completing within 54 days is
22%, so the probability of not
completing within 54 days is
100% - 22% = 78%.
• The orange line (S curve) shows the
cumulative probability of completion,
the quantitative risk assessment
described in PMBOK; it is produced by
Monte Carlo simulation with normal
distributions.
simulation.xlsx Homework 1
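The spreadsheet isn't reproduced here, but the same kind of analysis can be sketched in Python. Assume, hypothetically, a project of three sequential tasks whose durations are normally distributed; the task means and standard deviations below are made up, so the printed percentages will not match the 9% and 22% in the chart. The per-day counts correspond to the blue bars, and the share of trials finishing within a deadline is one point on the S curve.

```python
import random
from collections import Counter

# Hypothetical tasks: (mean days, standard deviation in days). Not the slide's data.
TASKS = [(15, 3), (20, 4), (20, 3)]

def simulate_project(n_trials=10_000, seed=5):
    rng = random.Random(seed)
    totals = []
    for _ in range(n_trials):
        # Tasks run one after another, so the project duration is their sum
        # (a negative draw is clipped to 0 days).
        total = sum(max(0.0, rng.gauss(mu, sd)) for mu, sd in TASKS)
        totals.append(round(total))  # bucket into whole days
    return totals

totals = simulate_project()
n = len(totals)
per_day = Counter(totals)                             # the "blue bars"
most_likely_day, count = per_day.most_common(1)[0]
p_within_54 = sum(1 for t in totals if t <= 54) / n   # one point on the "S curve"

print(f"most likely completion day: {most_likely_day} ({count / n:.0%})")
print(f"P(complete within 54 days): {p_within_54:.0%}")
print(f"P(not complete within 54 days): {1 - p_within_54:.0%}")
```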
21. Monte Carlo Simulation Application
• Imbalanced dataset
• In many practical situations, data is scarce or
difficult to generate.
• We can generate synthetic ("fake") samples using
a chosen generation method. The generation rests
on a basic idea of statistics: drawing random
observations from a limited population.
• If the process itself is random, or can be
expressed in terms of random numbers, we
can simulate it with Monte Carlo and then
use the simulated samples, for example to
augment an under-represented class.
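A minimal sketch of one simple generation method, offered as an illustration rather than the deck's own approach: fit a normal distribution to each feature of the minority class and sample new rows from it. This treats features as independent, which is a strong assumption; established oversampling techniques such as SMOTE interpolate between real neighbors instead. The data and the helper name `synthesize` are hypothetical.

```python
import random
import statistics

def synthesize(minority_rows, n_new, seed=0):
    """Draw n_new synthetic rows by sampling each feature from a normal
    distribution fitted to the minority class (features treated as independent)."""
    rng = random.Random(seed)
    columns = list(zip(*minority_rows))
    params = [(statistics.mean(c), statistics.stdev(c)) for c in columns]
    return [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n_new)]

# Hypothetical data: 95 majority rows vs 5 minority rows, two numeric features each.
rng = random.Random(1)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(95)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(5)]

synthetic = synthesize(minority, n_new=90)
balanced_minority = minority + synthetic
print(len(majority), len(balanced_minority))  # 95 95
```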
22. Monte Carlo Simulation Application
• Predicting next year's sales commissions
Commission Amount = Actual Sales * Commission Rate
23. Monte Carlo Simulation Application
• Assume everyone makes 100% of their target and earns a 4%
commission rate
• Or
Budget forecast!!
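The single-number budget implied by that assumption is just the commission formula applied once per person. A minimal sketch with hypothetical sales targets (the slides don't list any):

```python
# Hypothetical sales targets for a small team (not from the slides).
targets = [100_000, 200_000, 75_000, 400_000, 500_000]
COMMISSION_RATE = 0.04

def commission(actual_sales, rate):
    # Commission Amount = Actual Sales * Commission Rate
    return actual_sales * rate

# Deterministic assumption: everyone hits exactly 100% of target.
budget = sum(commission(t * 1.00, COMMISSION_RATE) for t in targets)
print(f"commission budget: {budget:,.0f}")  # 51,000
```

This yields one fixed figure with no sense of how likely it is to be over- or under-spent, which is the gap Monte Carlo simulation fills.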
24. Monte Carlo Simulation Application
• A Monte Carlo simulation is a useful tool for predicting future results
by calculating a formula multiple times with different random inputs.
• Using numpy and pandas to build a model and generate multiple
potential results and analyze them is relatively straightforward.
• Another benefit is that analysts can run many scenarios by
changing the inputs, and can move on to much more sophisticated
models in the future if the need arises.
monde_carlo_simulation_with_commission_prediction.ipynb
Homework 2
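The notebook isn't reproduced here. As a hedged sketch of how such a numpy/pandas model might look: only the Commission Amount formula and the 4% top rate come from the slides. The tiered rates below the top rate, the target values and their probabilities, `num_simulations`, and `avg` are hypothetical. `num_reps` and `std_dev` reuse the homework's parameter names, but their values here are assumptions.

```python
import numpy as np
import pandas as pd

# Assumed parameters (hypothetical; names mirror the homework's num_reps / std_dev).
num_reps = 500           # number of sales reps
num_simulations = 1000   # number of simulated years
avg = 1.0                # mean percent-to-target
std_dev = 0.1            # spread of percent-to-target
sales_target_values = [75_000, 100_000, 200_000, 300_000, 400_000, 500_000]
sales_target_prob = [0.3, 0.3, 0.2, 0.1, 0.05, 0.05]

def commission_rate(pct_to_target, top_rate=0.04):
    # Hypothetical tiers: 2% up to 90% of target, 3% up to 99%, top_rate above.
    return np.select(
        [pct_to_target <= 0.90, pct_to_target <= 0.99],
        [0.02, 0.03],
        default=top_rate,
    )

def simulate(seed=0, top_rate=0.04):
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(num_simulations):
        # Random inputs for one simulated year.
        pct_to_target = rng.normal(avg, std_dev, num_reps).round(2)
        sales_target = rng.choice(sales_target_values, num_reps, p=sales_target_prob)
        df = pd.DataFrame({"Pct_To_Target": pct_to_target, "Sales_Target": sales_target})
        df["Sales"] = df["Pct_To_Target"] * df["Sales_Target"]
        df["Commission_Rate"] = commission_rate(df["Pct_To_Target"].to_numpy(), top_rate)
        # Commission Amount = Actual Sales * Commission Rate
        df["Commission_Amount"] = df["Sales"] * df["Commission_Rate"]
        results.append([df["Sales"].sum(), df["Commission_Amount"].sum(), df["Sales_Target"].sum()])
    return pd.DataFrame(results, columns=["Sales", "Commission_Amount", "Sales_Target"])

results = simulate()
print(results.describe().round(0))
```

Each homework scenario is a small change to this sketch: `simulate(top_rate=0.05)` for a 5% top rate, a new `num_reps` or `std_dev` value at the top, or replacing `rng.normal` with another generator method such as `rng.uniform`.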
25. Homework
• Please fill in the table with the values from your simulation experiments.
• Some simple input changes you can make to see how the results change:
• Increase top commission rate to 5% (original is 4%)
• Decrease the number of salespeople (change num_reps=500)
• Change the expected standard deviation to a higher amount (change std_dev=.1)
• Modify the distribution of targets (change the distribution from normal to another one)