Reproducible AI using MLflow and PyTorch

R E P R O D U C I B L E A I
U S I N G P Y T O R C H
A N D M L F L O W
Geeta Chauhan
PyTorch Partner
Engineering, Facebook AI
@ C H A U H A N G

MLFLOW + PyTorch
A G E N D A 0 1
R E P R O D U C I B L E A I C H A L L E N G E
0 2
M L F L O W + P Y T O R C H
0 3
D E M O

PyTorch + MLflow
Reproducible AI Challenge

PyTorch + MLflow
• Continuous Iterative process, Optimize for a metric
• Quality depends on data and running parameters
• Experiment tracking is difficult
• Over time data changes, model drift
• Model artifacts getting lost
• Compare & combine many libraries and models
• Diverse deployment environments
TRADITIONAL SOFTWARE VS MACHINE LEARNING

PyTorch + MLflow
mlflow + PyTorch

PyTorch + MLflow
AN OPEN SOURCE PLATFORM FOR MACHINE LEARNING LIFECYCLE MANAGEMENT
I N T R O D U C I N G
Record and query
experiments: code,
data, config, and
results.
TRACKING
Package data science
code in a format that
enables reproducible
runs on many
platforms
PROJECTS
Deploy machine
learning models in
diverse serving
environments
MODELS
Store, annotate, and
manage models in a
central repository
MODEL REGISTRY

PyTorch + MLflow
MLFLow + Pytorch for reproducibility
Record and query
experiments: code,
data, config, and
results.
TRACKING
Package data science
code in a format that
enables reproducible
runs on many
platforms
PROJECTS
Deploy machine
learning models in
diverse serving
environments
MODELS
Store, annotate, and
manage models in a
central repository
MODEL REGISTRY
PYTORCH AUTO
LOGGING
PYTORCH EXAMPLES
W/ MLPROJECTS
TORCHSCRIPTED MODELS,
SAVE/LOAD ARTIFACTS
MLFLOW TORCHSERVE
DEPLOYMENT PLUGIN

PyTorch + MLflow
M L F L O W A U T O L O G G I N G
• PyTorch auto logging with Lightning training loop
• Model hyper-params like LR, model summary,
optimizer name, min delta, best score
• Early stopping and other callbacks
• Log every N iterations
• User defined metrics like F1 score, test accuracy
import mlflow.pytorch
parser =
LightningMNISTClassifier.add_model_specific_args(parent_parser=parser)
#just add this and your autologging should work!
mlflow.pytorch.autolog()
model = LightningMNISTClassifier(**dict_args)
dm = MNISTDataModule(**dict_args)
dm.prepare_data()
dm.setup(stage="fit")
early_stopping = EarlyStopping(monitor="val_loss", mode="min",
verbose=True)
checkpoint_callback = ModelCheckpoint(
filepath=os.getcwd(), save_top_k=1, verbose=True,
monitor="val_loss", mode="min", prefix="",
)
lr_logger = LearningRateLogger()
trainer = pl.Trainer.from_argparse_args(
args, callbacks=[lr_logger, early_stopping],
checkpoint_callback=checkpoint_callback
)
trainer.fit(model)
trainer.test()

PyTorch + MLflow
C O M P A R E E X P E R I M E N T R U N S

PyTorch + MLflow
mlflow.pytorch.save_model(
model,
path=args.model_save_path,
requirements_file="requirements.txt",
extra_files=["class_mapping.json", "bert_base_uncased_vocab.txt"],
)
:param requirements_file: An (optional) string containing the path to requirements file.
If ``None``, no requirements file is added to the model.
:param extra_files: An (optional) list containing the paths to corresponding extra files.
For example, consider the following ``extra_files`` list::
extra_files = ["s3://my-bucket/path/to/my_file1",
"s3://my-bucket/path/to/my_file2"]
In this case, the ``"my_file1 & my_file2"`` extra file is downloaded from S3.
If ``None``, no extra files are added to the model.
S A V E
A R T I F A C T S
• Additional artifacts for
model reproducibility
• For Example: vocabulary files
for NLP models,
requirements.txt and other
extra files for torchserve
deployment

PyTorch + MLflow
model = LightningMNISTClassifier(**dict_args)
# Convert to TorchScripted model
scripted_model = torch.jit.script(model)
mlflow.start_run()
# Log the scripted model using log_model
mlflow.pytorch.log_model(scripted_model, "scripted_model")
# If you need to reload the model just call load_model
uri_path = mlflow.get_artifact_uri()
scripted_loaded_model =
mlflow.pytorch.load_model(os.path.join(uri_path, "scripted_model"))
mlflow.end_run()
T O R C H S C R I P T E D M O D E L
• Log TorchScripted model
• Static subset of the python language
specialized for ML applications
• Serialize and Optimize models for python-
free process
• Recommended for production inference

PY TORCH DEVELOPER DAY 2020 #PTD2
TORCHSERVE
• Default handlers for common use
cases (e.g., image segmentation,
text classification) along with
custom handlers support for other
use cases and a Model Zoo
•
• Multi-model serving, Model
versioning and ability to roll back
to an earlier version
• Automatic batching of individual
inferences across HTTP requests
• Logging including common
metrics, and the ability to
incorporate custom metrics
• Robust HTTP APIS -
Management and Inference
model1.pth
model1.pth
model1.pth
torch-model-archiver
HTTP
HTTP
http://localhost:8080/ …
http://localhost:8081/ …
Logging Metrics
model1.mar model2.mar model3.mar
model4.mar model5.mar
<path>/model_store
Inference API
Management API
TorchServe
Metrics API
Inference
API
Serving Model 3
Serving Model 2
Serving Model 1
torchserve --start

PyTorch + MLflow
# deploy model
mlflow deployments create --name mnist_test --target torchserve ——
model-uri mnist.pt -C "MODEL_FILE=mnist_model.py" -C
"HANDLER=mnist_handler.py"
# do prediction
mlflow deployments predict --name mnist_test --target torchserve --
input_path sample.json --output_path output.json
D E P L O Y M E N T P L U G I N
New TorchServe Deployment Plugin
Test models during development cycle, pull
models from MLflow Model repository and run
• CLI
• Run with Local vs remote TorchServe
• Python API
import os
import matplotlib.pyplot as plt
from torchvision import transforms
from mlflow.deployments import get_deploy_client
img = plt.imread(os.path.join(os.getcwd(), "test_data/one.png"))
mnist_transforms = transforms.Compose([
transforms.ToTensor()
])
image = mnist_transforms(img)
plugin = get_deploy_client("torchserve")
config = {
'MODEL_FILE': "mnist_model.py",
'HANDLER_FILE': 'mnist_handler.py'
}
plugin.create_deployment(name="mnist_test", model_uri="mnist_cnn.pt",
config=config)
prediction = plugin.predict("mnist_test", image)

CAPTUM
Text Contributions: 7.54
Image Contributions: 11.19
Total Contributions: 18.73
0 200 400 600 800
400
300
200
100
0
S U P P O R T F O R AT T R I B U T I O N A LG O R I T H M S
T O I N T E R P R E T:
• Output predictions with respect to inputs
• Output predictions with respect to layers
• Neurons with respect to inputs
• Currently provides gradient & perturbation based
approaches (e.g. Integrated Gradients)
Model interpretability library for PyTorch
https://captum.ai/

GradientSHAP
DeepLiftSHAP
SHAP Methods Integrated Gradients
Saliency
GuidedGradCam
Attribute model output (or internal neurons) to input
features
LayerGradientSHAP
LayerDeepLiftSHAP
SHAP Methods
LayerConductance
InternalInfluence
GradCam
Attribute model output to the layers of the model
DeepLift
NoiseTunnel (Smoothgrad, Vargrad, Smoothgrad Square)
LayerActivation
LayerGradientXActivation
LayerDeepLift
FeatureAblation /
FeaturePermutation
GuidedBackprop /
Deconvolution
AT TRIBUTION ALGORITHMS
Input * Gradient LayerFeatureAblation
LayerIntegratedGradients
Occlusion
Shapely Value Sampling
Gradient
Perturbation
Other

NEW FEATURES
Integrations and new samples for:
- Model Interpretability using Captum
- Model Signature
- Hyper Parameter Optimization using Ax/Botorch
- Iterative Pruning Example using Ax/Botorch
#Captum
ig = IntegratedGradients(net)
test_input_tensor.requires_grad_()
attr, _ = ig.attribute(test_input_tensor, target=1,
return_convergence_delta=True)
attr = attr.detach().numpy()
# To understand attributions, average across all inputs, print
and visualize average attribution for each feature.
feature_imp, feature_imp_dict =
visualize_importances(feature_names, np.mean(attr, axis=0))
mlflow.log_metrics(feature_imp_dict)
mlflow.log_text(str(feature_imp), "feature_imp_summary.txt")
fig, (ax1, ax2) = plt.subplots(2, 1)
fig.tight_layout(pad=3)
ax1.hist(attr[:, 1], 100)
ax1.set(title="Distribution of Sibsp Attribution Values")
#Model Signature
from mlflow.models.signature import infer_signature
train = df.drop_column("target_label")
predictions = ... # compute model predictions
signature = infer_signature(train, predictions)

EXPERIMENT
CANDIDATES
OFFLINE
SIMULATION
ONLINE
A / B TEST
AX / BOTORCH
MULTITASK MODEL
ADAPTIVE EXPERIMENTATION FOR NEWS
FEED RANKING

NEW FEATURES - HPO WITH AX
with mlflow.start_run(run_name="Parent Run"):
train_evaluate(params=params, max_epochs=max_epochs)
ax_client = AxClient()
ax_client.create_experiment(
parameters=[
{"name": "weight_decay", "type": "range", "bounds": [1e-4, 1e-3]},
{"name": "momentum", "type": "range", "bounds": [0.7, 1.0]},
],
objective_name="test_accuracy",
)
for i in range(total_trials):
with mlflow.start_run(nested=True, run_name="Trial " + str(i)) as child_run:
parameters, trial_index = ax_client.get_next_trial()
test_accuracy = train_evaluate(params=parameters, max_epochs=max_epochs)
# completion of trial
ax_client.complete_trial(trial_index=trial_index,
raw_data=test_accuracy.item())
best_parameters, metrics = ax_client.get_best_parameters()
for param_name, value in best_parameters.items():
mlflow.log_param("optimum_" + param_name, value)

PY TORCH DEVELOPER DAY 2020 #PTD2
MLOPS WORKFLOW: MLFLOW + PY TORCH + TORCHSERVE
Deployment
TorchServe
Management
Inference
Build PyTorch Model
Data Scientist
Training/
Distributed Training
PyTorch Model
Optimized Model:
TorchScript
Autolog
Experiment Runs
Model Registry
MLflow TorchServe
Plugin
+

PyTorch + MLflow
FUTURE
• Captum Interpretability for Inference in mlflow-torchserve
• Captum Insights visualizations in MLflow UI
• PyTorch Profiler integration with Tensorboard in MLflow UI
• More examples

PyTorch + MLflow
RESOURCES
• Reproducibility Checklist: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf
• NeurIPS Reproducibility updates: https://ai.facebook.com/blog/new-code-completeness-checklist-and-
reproducibility-updates/
• arXiv + Papers with code: https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-
ecc362883167
• MLflow + PyTorch Autolog blog: https://medium.com/pytorch/mlflow-and-pytorch-where-cutting-edge-ai-
meets-mlops-1985cf8aa789
• MLflow TorchServe deployment plugin: https://github.com/mlflow/mlflow-torchserve
• MLflow + PyTorch Examples: https://github.com/mlflow/mlflow/tree/master/examples/pytorch

T H A N K Y O U
Contact:
Email: gchauhan@fb.com
Linkedin: https://www.linkedin.com/in/geetachauhan/

Reproducible AI using MLflow and PyTorch

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Reproducible AI using MLflow and PyTorch

Similar to Reproducible AI using MLflow and PyTorch (20)

More from Databricks

More from Databricks (20)

Recently uploaded

Recently uploaded (20)

Reproducible AI using MLflow and PyTorch