This document outlines the steps to build your own natural language processing (NLP) system: creating a streaming consumer, launching a message queue service, creating a data pre-processing service, serving an ML model, and publishing predictions to a messaging app. It discusses separating components for modularity and for ease of testing and extension. The presenter recommends tools such as Anaconda, Docker, Redis, fast.ai and spaCy, and walks through setting up the environment and each step in a Jupyter notebook. The goal is to experiment with building your own end-to-end NLP system in a modular, reusable way.
3. @jeremimucha | https://create.ml
About me
Data Science and Data Engineering - consulting and training
Academic research (mobile phone data, smart meter data)
Commercial projects (decision simulation, revenue modeling, visualization, building apps, data strategy)
Husband and dad
❤ boxing, cycling, hiking in the mountains ⛰ and traveling
Call me Michael or Michał 🙃
High level steps
Create a Streaming Consumer
Launch and Integrate a Message Queue Service
Create the First Subscriber - a Data Pre-processing Service
Serve a Machine Learning Model
Publish or broadcast predictions to a Messaging App
Organize and bundle all services into a system
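The steps above can be sketched end to end in plain Python. In the real system Redis connects the services; in this sketch `queue.Queue` stands in for the message queue, and `preprocess` and `predict` are illustrative stand-ins, not functions from the talk's repo:

```python
from queue import Queue

# A plain Queue stands in for the Redis message queue.
message_queue = Queue()

def consume_stream(posts):
    """Streaming consumer: push incoming posts onto the queue."""
    for post in posts:
        message_queue.put(post)

def preprocess(text):
    """First subscriber: minimal text clean-up."""
    return text.lower().strip()

def predict(text):
    """Stand-in model: flag posts that mention python."""
    return "relevant" if "python" in text else "irrelevant"

def run_pipeline():
    """Drain the queue: preprocess, predict, 'publish' results."""
    results = []
    while not message_queue.empty():
        text = preprocess(message_queue.get())
        results.append((text, predict(text)))
    return results

consume_stream(["  New PyData talk!  ", "Python NLP systems"])
print(run_pipeline())
```

Keeping each stage a separate function mirrors the modularity goal: any stage can be swapped or tested in isolation before being split into its own service.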
Requirements
https://github.com/MichaMucha/pydata2019-nlp-system/
Software:
Anaconda Python
Git
Docker
Docker-compose
Telegram mobile app or desktop app
API keys and environment preparation
Check out this talk’s git repo
Create the Conda environment
Reddit CLIENT_ID and CLIENT_SECRET
Telegram Bot and API key
Optional - appreciated but not required:
Your own NLP model + Idea what you want to monitor in Reddit
Examine the conda-env.yml file that you used to create the new environment
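For orientation, a Conda environment file generally has the shape sketched below; the actual conda-env.yml in the repo is the authoritative version, and its name, channels and package pins will differ:

```yaml
# Illustrative shape only -- see the repo's conda-env.yml
name: nlp-system
channels:
  - conda-forge
dependencies:
  - python=3.7
  - jupyterlab
  - pip
```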
Benefits of Conda environments
Easy, self contained recipes
Installs prebuilt binaries - no compiling, no build dependencies
Makes shipping and sharing easier
Step 1.1 - spawn Redis
Nice and clean - one line and we’re done
Not wasting time on things we don’t want to do!
Getting all the benefits
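The one line in question is presumably the standard Docker command for starting Redis, along these lines:

```shell
# Start a Redis server in the background on the default port
docker run -d --name redis -p 6379:6379 redis
```

The official `redis` image comes preconfigured, which is exactly the point of the slide: no installing, building, or configuring a database by hand.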
Step 3 - NLP models
BYOM (Bring Your Own Model) today
Assumption:
your model is all trained and tested,
developed and signed off by important executives
Ready to use in the real world
Open “step3” in JupyterLab
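Because the session assumes a bring-your-own model, anything with a predict-style interface will plug in. A minimal Python sketch of wrapping such a model - all names here are illustrative placeholders, where the notebook would use a trained fast.ai or spaCy model:

```python
class SentimentModel:
    """Placeholder for a trained, tested, signed-off model.

    In the notebook this would wrap a fast.ai or spaCy model;
    here a trivial word-list rule stands in for the weights.
    """

    POSITIVE_WORDS = {"great", "excellent", "love"}

    def predict(self, text: str) -> str:
        words = set(text.lower().split())
        return "positive" if words & self.POSITIVE_WORDS else "neutral"

def serve(model, texts):
    """Run the model over a batch of incoming messages."""
    return [model.predict(t) for t in texts]

model = SentimentModel()
print(serve(model, ["I love fast.ai", "Redis started"]))
```

As long as the service exposes a stable `predict` interface, the model behind it can be swapped without touching the rest of the system.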
Important resources
https://fast.ai
Excellent course + framework
Releases the genius within you
https://spacy.io
Fantastic piece of engineering
Very widely used, open source
Step 4 - beyond my lab
“Works on my machine” - o rly
ImportError - “just don’t move the files”
Another day another version
Dependency tracking
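Docker is the talk's answer to these reproducibility pains. A sketch of what a service image might look like - the file names and entry point here are assumptions, not taken from the repo:

```dockerfile
# Illustrative service image -- file names are assumptions
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "service.py"]
```

Pinning dependencies inside the image turns “works on my machine” into “works in this image”, wherever it runs.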
Step 6 - Orchestration
Making friends with the Operations team
Fast and easy prototyping
Configure and run sophisticated setups quickly
Build your own NLP system!
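With docker-compose, the whole multi-service setup can be declared in one file and started with a single `docker-compose up`. A minimal sketch with illustrative service names - the repo's compose file will differ:

```yaml
# Illustrative layout -- service names are assumptions
version: "3"
services:
  redis:
    image: redis
    ports:
      - "6379:6379"
  preprocessor:
    build: ./preprocessor
    depends_on:
      - redis
  model:
    build: ./model
    depends_on:
      - redis
```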
Share your work!
Use your new knowledge to jumpstart your own solution
Please share what you built :)
Write a blog post!
Let’s stay in touch