The presentation will be given by Leonardo Mauro P. Moraes, Team Leader at Amaris Consulting and PhD candidate in Artificial Intelligence at the Universidade de São Paulo. 🎓
Machine Learning has been widely used in #datascience for some years now, but how can it be implemented and maintained reliably and effectively? Machine Learning Operations (#mlops) seeks to answer this question by drawing on software and data engineering to create a life cycle that covers the modeling, implementation, monitoring, distribution, and scalability of Machine Learning, bridging the gap between model development and model operation.
Recording: https://www.youtube.com/live/iwmaEABBeYw?si=R_YujavuSxec8MtF&t=265
5. 5
New tech world
Without realizing it, people generate data all the time
on the Internet
● Marketplaces - Site, payment method, search...
● Media player - Search, recommendations, advertisements...
offline
● Supermarket - What we buy, how the aisles are organized…
● Payment - Debit card, credit card, PIX, TED, DOC...
6. 6
According to Gartner Research
● In 2020, about 44 trillion gigabytes
(44 zettabytes) of data in the world
● 2.2 million terabytes generated per day
New tech world
How can we analyze it?
1. Business Specialist
2. Data Analyst
3. Data Scientist
7. 7
Example - Borrowing money
Traditional design
● If you are 25 years old or older
and have an income of R$ 3,000.00
● You can borrow R$ 15,000.00 to pay back in 5 years
Scientific design
● Set an acceptable risk (e.g., 5%)
so that the bank does not lose money
● Input: Age, gender, gross salary,...
● Output: Value and financing period.
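The two designs above can be contrasted in a minimal Python sketch. The fixed rule follows the slide, reading the income figure as a minimum (an assumption). The names `scientific_offer` and `toy_risk_model`, the candidate offers, and the risk formula are hypothetical and stand in for a model trained on real data.

```python
from typing import Callable, List, Optional, Tuple

Offer = Tuple[float, int]  # (amount in R$, repayment period in years)


def traditional_offer(age: int, income: float) -> Optional[Offer]:
    """Fixed rule from the slide (income threshold read as a minimum: assumption)."""
    if age >= 25 and income >= 3000.0:
        return (15000.0, 5)
    return None


def scientific_offer(
    applicant: dict,
    risk_model: Callable[[dict, float, int], float],
    candidates: List[Offer],
    max_risk: float = 0.05,
) -> Optional[Offer]:
    """Return the largest candidate offer whose estimated risk is within max_risk."""
    acceptable = [
        (amount, years)
        for amount, years in candidates
        if risk_model(applicant, amount, years) <= max_risk
    ]
    return max(acceptable, key=lambda offer: offer[0], default=None)


def toy_risk_model(applicant: dict, amount: float, years: int) -> float:
    """Hypothetical stand-in for a trained model: risk grows with the
    share of yearly income needed to repay the loan."""
    yearly_payment = amount / years
    yearly_income = applicant["gross_salary"] * 12
    return min(1.0, 0.25 * yearly_payment / yearly_income)
```

The point of the contrast: the rule returns the same offer for everyone who qualifies, while the risk-based version adapts the offer to each applicant's inputs.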
8. What is Data Science?
Data science is a multidisciplinary
field that uses scientific processes,
algorithms, and systems to extract
knowledge and insights from data.
by Google Bard
Logo generated by ideogram
10. 10
Knowledge Discovery in Databases
Data Science Procedure
Advances in Knowledge Discovery and Data Mining.
Fayyad, U. M., et al. (1996)
Procedure executed by
● Data Scientist
● Data Engineer
● ML Engineer
11. 11
Selection
● Identification of the subset
of data that should be
considered in the process
12. 12
Preprocessing
● Data cleaning: any processing
that addresses data quality
and data integrity
13. 13
Transformation
● Data aggregation and
transformation: encode
data into inputs recognized
by algorithms
14. 14
Data Mining
● Practice of analyzing large
databases in order to
generate new information
● Tasks: classification,
clustering, image
recognition, etc.
15. 15
Interpretation
● Transform identified/inferred
patterns into knowledge
● Make new knowledge
available to customers
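The five steps of the procedure (selection, preprocessing, transformation, data mining, interpretation) can be sketched end to end with the Python standard library. This is only an illustration under stated assumptions: the records in `RAW`, the column names, and the choice of a one-threshold classifier as the mining task are all hypothetical.

```python
# Hypothetical raw records; None marks a missing value.
RAW = [
    {"id": 1, "income": 1500.0, "channel": "site", "defaulted": True},
    {"id": 2, "income": None, "channel": "app", "defaulted": False},
    {"id": 3, "income": 2000.0, "channel": "app", "defaulted": True},
    {"id": 4, "income": 3500.0, "channel": "site", "defaulted": False},
    {"id": 5, "income": 5000.0, "channel": "site", "defaulted": False},
    {"id": 6, "income": 2800.0, "channel": "app", "defaulted": True},
    {"id": 7, "income": 4200.0, "channel": "site", "defaulted": False},
]


def select(rows, columns):
    """Selection: keep only the subset of fields relevant to the question."""
    return [{column: row[column] for column in columns} for row in rows]


def preprocess(rows):
    """Preprocessing: drop records with missing values (one simple cleaning policy)."""
    return [row for row in rows if all(value is not None for value in row.values())]


def transform(rows):
    """Transformation: encode each record as (income in thousands, 0/1 label)."""
    return [(row["income"] / 1000.0, int(row["defaulted"])) for row in rows]


def mine(samples):
    """Data mining: fit a one-threshold classifier that predicts a default
    when income is below the threshold; return (threshold, accuracy)."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted({x for x, _ in samples}):
        hits = sum((x < threshold) == bool(y) for x, y in samples)
        accuracy = hits / len(samples)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def interpret(threshold, accuracy):
    """Interpretation: turn the mined pattern into a statement people can use."""
    return (
        f"In this sample, incomes under R$ {threshold * 1000:,.0f} "
        f"separate defaults with {accuracy:.0%} accuracy."
    )


rows = preprocess(select(RAW, ["income", "defaulted"]))
threshold, accuracy = mine(transform(rows))
print(interpret(threshold, accuracy))
```

Each function maps to one slide of the procedure, which is also roughly where the work is handed from Data Engineer to Data Scientist to ML Engineer.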
16. 16
[Figure labels: Data Engineer, Data Scientist, ML Engineer]
● Data Engineer - Prepares
and organizes the data
● Data Scientist - Generates
knowledge and insights
● ML Engineer - Automates
and delivers a product
18. 18
MLOps
In the world of data science and machine
learning, the process of developing, deploying,
and maintaining models can be complex and
challenging. MLOps, short for Machine Learning
Operations, has emerged as a crucial discipline
that aims to streamline this process and make it
more manageable, efficient, and effective.
Essential MLOps (2023),
by Data Science Horizons
19. 19
MLOps in Data Science
MLOps: Continuous delivery and
automation pipelines in machine learning
(Google, NIPS 2014 Workshop)
Challenge
● Technical debt
● High monetary risk factors
Goal
● Apply DevOps principles to
ML systems (MLOps)
● Automation and monitoring
20. 20
MLOps x DevOps
● CI is no longer only about testing and validating code, but
also testing and validating data, data schemas, and models.
● CT is a new property, unique to ML systems, that's concerned
with automatically retraining and serving the models.
● CD is no longer about a single software package or a service,
but an ML training pipeline that should automatically deploy
another service (model prediction service).
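As a rough illustration of the CI and CD points above, the sketch below validates a data schema and only lets a candidate model through when it is at least as accurate as the current baseline. `SCHEMA`, `validate_schema`, and `ci_gate` are hypothetical names for this example, not part of any tool mentioned in the talk.

```python
from typing import Callable, Dict, List

# Hypothetical schema: column name -> expected Python type.
SCHEMA: Dict[str, type] = {"age": int, "gross_salary": float, "defaulted": bool}


def validate_schema(rows: List[dict], schema: Dict[str, type]) -> List[str]:
    """CI on data: return a list of schema violations (empty means valid)."""
    errors = []
    for i, row in enumerate(rows):
        missing = schema.keys() - row.keys()
        if missing:
            errors.append(f"row {i}: missing columns {sorted(missing)}")
        for column, expected in schema.items():
            # Exact type check, so that a bool is not accepted where an int is expected.
            if column in row and type(row[column]) is not expected:
                errors.append(f"row {i}: {column} should be {expected.__name__}")
    return errors


def accuracy(model: Callable[[dict], bool], rows: List[dict]) -> float:
    """Share of rows where the model's prediction matches the label."""
    hits = sum(model(row) == row["defaulted"] for row in rows)
    return hits / len(rows)


def ci_gate(
    candidate: Callable[[dict], bool],
    baseline: Callable[[dict], bool],
    rows: List[dict],
    schema: Dict[str, type],
) -> bool:
    """Pass only if the data is valid and the candidate is not worse than the baseline."""
    if not rows or validate_schema(rows, schema):
        return False
    return accuracy(candidate, rows) >= accuracy(baseline, rows)
```

In a pipeline, a `False` result would stop the release before the prediction service is deployed, in the same way a failing unit test stops a code release.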
21. 21
MLOps Maturity Levels
MLOps: Continuous delivery and automation
pipelines in machine learning (Google)
MLOps Maturity Model with
Azure Machine Learning (Azure)
Maturity Level: Training Process / Release Process / Technology
Level 0 - No MLOps
Training Process: Untracked file
Release Process: Manual, hand-off
Technology:
● Manual builds and deployments
● Manual testing of model and application
● No tracking of model performance
Level 1 - DevOps no MLOps
Training Process: Untracked file
Release Process: Semi-automated
Technology:
● Automated builds
● Automated tests for application code
Level 2 - Automated Training
Training Process: Tracked, run results and model artifacts
Release Process: Automated release, code is version controlled
Technology:
● Automated model training
● Tracking of model training performance
● Model orchestration and management
Level 3 - Full MLOps Automated Retraining
Training Process: Tracked, run results and model artifacts, retraining set up based on metrics
Release Process: Automated, CI/CD pipeline set up, A/B testing has been added
Technology:
● Automated model training and testing
● Centralized metrics from deployed model
● Automated tests for all code
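From Level 2 onward, run results and model artifacts are tracked. A minimal sketch of what that can mean in practice, using only the Python standard library: each training run appends one JSON line with its parameters, metrics, and a hash of the model artifact. The file layout and the names `track_run`, `best_run`, and `demo` are assumptions for illustration; dedicated experiment-tracking tools offer far more.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


def track_run(log_path: Path, params: dict, metrics: dict, artifact: bytes) -> dict:
    """Append one training run (params, metrics, artifact hash) to a JSON Lines log."""
    record = {
        "timestamp": time.time(),
        "params": params,
        "metrics": metrics,
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def best_run(log_path: Path, metric: str) -> dict:
    """Return the logged run with the highest value for the given metric."""
    with log_path.open(encoding="utf-8") as log:
        runs = [json.loads(line) for line in log if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])


def demo() -> dict:
    """Log two hypothetical runs to a temporary file and return the best one."""
    with tempfile.TemporaryDirectory() as folder:
        log_path = Path(folder) / "runs.jsonl"
        track_run(log_path, {"max_depth": 3}, {"accuracy": 0.81}, b"model-v1")
        track_run(log_path, {"max_depth": 5}, {"accuracy": 0.86}, b"model-v2")
        return best_run(log_path, "accuracy")
```

Storing the artifact hash next to the metrics makes it possible to tell which model file produced which result, which the "untracked file" levels cannot do.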
22. 22
Maturity L.0 - no MLOps
Blogpost - MLOps Maturity Model with Azure Machine Learning (Azure)
23. 23
Maturity L.1 - DevOps no MLOps
Blogpost - MLOps Maturity Model with Azure Machine Learning (Azure)
24. 24
Maturity L.2 - Automated Training
Blogpost - MLOps Maturity Model with Azure Machine Learning (Azure)
25. 25
Maturity L.3 - Full MLOps
Blogpost - MLOps Maturity Model with Azure Machine Learning (Azure)
27. 27
Machine Learning Engineer
1. Learn the basics of Data Science
2. Strong programming skills in Python, CI/CD, Git, Linux
General
1. Build a portfolio, such as personal projects and contributions
2. Networking - Participation in communities and events
3. Stay updated - Lifelong learning!
How To Become MLOps Engineer in 2024
by Asad iqbal
30. End!
MLOps: making ML more efficient, reliable, and scalable.
● Observability: The ability to monitor models helps
identify degraded performance or anomalies quickly.
● Optimization: enhance the performance, efficiency,
and scalability of machine learning systems.
● Automation: allows for smooth and lower-risk
rollouts of new ML model versions.
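The observability point can be illustrated with a small rolling-window monitor that flags degraded performance once labelled outcomes arrive. The class name `AccuracyMonitor`, the window size, and the threshold are assumptions for this sketch; a production setup would typically also watch input drift and latency.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Track accuracy over the most recent predictions of a deployed model."""

    def __init__(self, window: int = 100, threshold: float = 0.8) -> None:
        self.outcomes = deque(maxlen=window)  # True for each correct prediction
        self.threshold = threshold

    def record(self, prediction, actual) -> None:
        """Store whether one prediction matched the label observed later."""
        self.outcomes.append(prediction == actual)

    def accuracy(self) -> Optional[float]:
        """Accuracy over the current window, or None if nothing was recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def degraded(self) -> bool:
        """True once the window is full and accuracy is below the threshold."""
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.accuracy() < self.threshold
```

A `degraded()` signal like this is one possible trigger for the metric-based retraining described at the highest maturity level.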