Reproducible AI Using PyTorch and MLflow

Model reproducibility is becoming the next frontier for building and deploying successful AI models, in both research and production. This talk shows how to build reproducible AI models and workflows with PyTorch and MLflow that can be shared across teams with full traceability, speeding up collaboration on AI projects.

  1. REPRODUCIBLE AI USING PYTORCH AND MLFLOW. Geeta Chauhan, AI Partner Engineering, Facebook AI. November 2020
  2. AGENDA: 01 PyTorch Community Growth, 02 Reproducible AI Challenge, 03 MLflow + PyTorch, 04 Demo, 05 References
  3. PYTORCH COMMUNITY GROWTH
  4. ~1,619 contributors, 50%+ YoY growth, 34K+ PyTorch forum users
  5. GROWING USAGE IN OPEN SOURCE. Source: https://paperswithcode.com/trends
  6. INDUSTRY USAGE: https://medium.com/pytorch
  7. GROWTH OF ML TRAINING AT FACEBOOK: workflows up 5x, unique users up 2x, compute consumed up 8x
  8. REPRODUCIBLE AI CHALLENGE
  9. TRADITIONAL SOFTWARE VS MACHINE LEARNING
     • Continuous, iterative process; optimize for a metric
     • Quality depends on data and run parameters
     • Experiment tracking is difficult
     • Data changes over time and models drift
     • Model artifacts get lost
     • Many libraries and models to compare and combine
     • Diverse deployment environments
  10. REPRODUCIBILITY CHALLENGE
      • Research: difficult to reproduce the results of a paper; data, model weights, or scripts are missing
      • Production: hyperparameters, features, data, vocabulary, and other artifacts are lost when people leave the company
      • The reproducibility matrix (data vs. code & analysis): same data + same code = reproducible; different data + same code = replicable; same data + different code = robust; different data + different code = generalisable
  11. REPRODUCIBLE RESEARCH
  12. REPRODUCIBILITY CHALLENGE
  13. ML CODE COMPLETENESS CHECKLIST (https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501)
      • Dependencies: does the repository list its dependencies or include instructions on how to set up the environment?
      • Training scripts: does the repository contain a way to train/fit the model(s) described in the paper?
      • Evaluation scripts: does the repository contain a script to calculate the performance of the trained model(s) or run experiments on them?
      • Pretrained models: does the repository provide free access to pretrained model weights?
      • Results: does the repository contain a table/plot of the main results and a script to reproduce them?
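The checklist above can be turned into a quick automated check. The sketch below is a toy illustration (not from the talk); the expected file names are assumptions, and real repositories vary widely.

```python
import os

# Map each checklist item to file names that commonly satisfy it.
# These names are assumptions for illustration only.
CHECKLIST = {
    "dependencies": ["requirements.txt", "environment.yml", "Dockerfile"],
    "training script": ["train.py"],
    "evaluation script": ["eval.py", "evaluate.py"],
    "pretrained weights": ["weights.pth", "model.pth"],
    "results": ["results.md", "RESULTS.md"],
}

def completeness_report(repo_dir):
    """Return {checklist item: True/False} for a repository directory."""
    present = set(os.listdir(repo_dir))
    return {item: any(name in present for name in candidates)
            for item, candidates in CHECKLIST.items()}
```

Running this on a repository that ships only `requirements.txt` and `train.py` would flag the missing evaluation script, pretrained weights, and results.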
  14. ML CODE COMPLETENESS CHECKLIST. https://medium.com/paperswithcode/ml-code-completeness-checklist-e9127b168501
  15. ARXIV + PAPERS WITH CODE -> REPRODUCIBLE RESEARCH. https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167
  16. MLFLOW + PYTORCH
  17. INTRODUCING MLFLOW: AN OPEN SOURCE PLATFORM FOR MACHINE LEARNING LIFECYCLE MANAGEMENT
      • TRACKING: record and query experiments (code, data, config, and results)
      • PROJECTS: package data science code in a format that enables reproducible runs on many platforms
      • MODELS: deploy machine learning models in diverse serving environments
      • MODEL REGISTRY: store, annotate, and manage models in a central repository
  18. MLFLOW + PYTORCH FOR REPRODUCIBILITY
      • TRACKING: PyTorch auto logging
      • PROJECTS: PyTorch examples with MLprojects
      • MODELS: TorchScripted models, save/load artifacts
      • MODEL REGISTRY: MLflow TorchServe deployment plugin
  19. MLFLOW AUTO LOGGING
      • PyTorch auto logging with the Lightning training loop
      • Model hyperparameters (e.g. learning rate), model summary, optimizer name, min delta, best score
      • Early stopping and other callbacks
      • Log every N iterations
      • User-defined metrics such as F1 score and test accuracy

      import mlflow.pytorch

      parser = LightningMNISTClassifier.add_model_specific_args(parent_parser=parser)

      # just add this and your autologging should work!
      mlflow.pytorch.autolog()

      model = LightningMNISTClassifier(**dict_args)
      early_stopping = EarlyStopping(monitor="val_loss", mode="min", verbose=True)
      checkpoint_callback = ModelCheckpoint(
          filepath=os.getcwd(),
          save_top_k=1,
          verbose=True,
          monitor="val_loss",
          mode="min",
          prefix="",
      )
      lr_logger = LearningRateLogger()
      trainer = pl.Trainer.from_argparse_args(
          args,
          callbacks=[lr_logger],
          early_stop_callback=early_stopping,
          checkpoint_callback=checkpoint_callback,
          train_percent_check=0.1,
      )
      trainer.fit(model)
      trainer.test()
  20. COMPARE EXPERIMENT RUNS
  21. SAVE ARTIFACTS
      • Additional artifacts for model reproducibility
      • For example: vocabulary files for NLP models, requirements.txt and other extra files for TorchServe deployment

      mlflow.pytorch.save_model(
          model,
          path=args.model_save_path,
          requirements_file="requirements.txt",
          extra_files=["class_mapping.json", "bert_base_uncased_vocab.txt"],
      )

      :param requirements_file: An (optional) string containing the path to a requirements file.
          If ``None``, no requirements file is added to the model.
      :param extra_files: An (optional) list containing the paths to corresponding extra files.
          For example, given the following ``extra_files`` list::

              extra_files = ["s3://my-bucket/path/to/my_file1", "s3://my-bucket/path/to/my_file2"]

          the extra files ``my_file1`` and ``my_file2`` are downloaded from S3.
          If ``None``, no extra files are added to the model.
  22. TORCHSCRIPTED MODEL
      • Log a TorchScripted model
      • TorchScript is a static subset of Python specialized for ML applications
      • Serialize and optimize models for a Python-free process
      • Recommended for production inference

      model = LightningMNISTClassifier(**dict_args)

      # Convert to a TorchScripted model
      scripted_model = torch.jit.script(model)

      mlflow.start_run()

      # Log the scripted model using log_model
      mlflow.pytorch.log_model(scripted_model, "scripted_model")

      # To reload the model, just call load_model
      uri_path = mlflow.get_artifact_uri()
      scripted_loaded_model = mlflow.pytorch.load_model(os.path.join(uri_path, "scripted_model"))

      mlflow.end_run()
  23. TORCHSERVE (PyTorch Developer Day 2020 #PTD2)
      • Default handlers for common use cases (e.g. image segmentation, text classification), support for custom handlers for other use cases, and a Model Zoo
      • Multi-model serving, model versioning, and the ability to roll back to an earlier version
      • Automatic batching of individual inferences across HTTP requests
      • Logging of common metrics, with the ability to incorporate custom metrics
      • Robust HTTP APIs: Management, Inference, and Metrics
      (Architecture diagram: models such as model1.pth are packaged into .mar files with torch-model-archiver, placed in a <path>/model_store, and served by torchserve --start, which exposes the Inference API at http://localhost:8080/ and the Management API at http://localhost:8081/, plus logging and metrics.)
  24. DEPLOYMENT PLUGIN
      New TorchServe deployment plugin: test models during the development cycle, pull models from the MLflow model repository and run them
      • CLI
      • Python API
      • Run against a local or a remote TorchServe

      # deploy model
      mlflow deployments create --name mnist_test --target torchserve --model-uri mnist.pt -C "MODEL_FILE=mnist_model.py" -C "HANDLER=mnist_handler.py"

      # do prediction
      mlflow deployments predict --name mnist_test --target torchserve --input-path sample.json --output-path output.json

      import os
      import matplotlib.pyplot as plt
      from torchvision import transforms
      from mlflow.deployments import get_deploy_client

      img = plt.imread(os.path.join(os.getcwd(), "test_data/one.png"))
      mnist_transforms = transforms.Compose([transforms.ToTensor()])
      image = mnist_transforms(img)

      plugin = get_deploy_client("torchserve")
      config = {"MODEL_FILE": "mnist_model.py", "HANDLER_FILE": "mnist_handler.py"}
      plugin.create_deployment(name="mnist_test", model_uri="mnist_cnn.pt", config=config)
      prediction = plugin.predict("mnist_test", image)
  25. DEMO
  26. RESEARCH TO PRODUCTION CYCLE AT FACEBOOK
      A new idea or paper goes through PyText model authoring, training, evaluation, and parameter sweeping to produce a PyTorch model, served from a Python service for small-scale metrics; export to TorchScript, export validation, and performance tuning then yield a TorchScript model for the C++ inference service.
  27. MLOPS WORKFLOW: MLFLOW + PYTORCH + TORCHSERVE
      A data scientist builds a PyTorch model; training (including distributed training) produces the PyTorch model and an optimized TorchScript model; experiment runs are autologged and models are stored in the MLflow model registry; the MLflow TorchServe plugin then deploys them to TorchServe for management and inference.
  28. FUTURE
      • More examples
      • Model interpretability: Captum
      • Hyperparameter optimization: Ax / BoTorch
  29. REFERENCES
      • PyTorch 1.7: https://pytorch.org/blog/pytorch-1.7-released/
      • Reproducibility Checklist: https://www.cs.mcgill.ca/~jpineau/ReproducibilityChecklist.pdf
      • NeurIPS reproducibility updates: https://ai.facebook.com/blog/new-code-completeness-checklist-and-reproducibility-updates/
      • arXiv + Papers with Code: https://medium.com/paperswithcode/papers-with-code-partners-with-arxiv-ecc362883167
      • NeurIPS 2020 RC: https://paperswithcode.com/rc2020
      • MLflow PyTorch autolog: https://github.com/mlflow/mlflow/tree/master/mlflow/p
      • MLflow TorchServe deployment plugin: https://github.com/mlflow/mlflow-torchs
      • MLflow + PyTorch examples: https://github.com/mlflow/mlflow/tree/master/exam pytorch
      • PyTorch on Medium: https://medium.com/pytorch
  30. QUESTIONS? Contact: gchauhan@fb.com, LinkedIn: https://www.linkedin.com/in/geetachauhan/
  31. FEEDBACK. Your feedback is important to us. Don't forget to rate and review the sessions.
