There are many great tools for training machine learning models, ranging from scikit-learn to Apache Spark and TensorFlow. However, many of these systems largely leave open the question of how to use our models outside of the batch world (like in a reactive application). Different options exist for persisting the results and using them for live serving, and we will explore the trade-offs of the different formats and their corresponding serving/prediction layers.
Intro - End to end ML with Kubeflow @ SignalConf 2018
1. End to End ML
With Kubeflow
& friends
@holdenkarau
Signal
2018
Legit-enough
2. Some links (slides & recordings will be at):
http://bit.ly/2QgsqF9
^ Slides & code-lab links
(after)
CatLoversShow
3. Holden:
● Preferred pronouns are she/her
● Developer Advocate at Google
● Apache Spark PMC/Committer, contribute to many other projects
● previously IBM, Alpine, Databricks, Google, Foursquare & Amazon
● co-author of Learning Spark & High Performance Spark
● Twitter: @holdenkarau
● Slide share http://www.slideshare.net/hkarau
● Code review livestreams: https://www.twitch.tv/holdenkarau /
https://www.youtube.com/user/holdenkarau
● Spark Talk Videos http://bit.ly/holdenSparkVideos
4.
5. Who do I think you all are?
● Nice people*
● Interested in Machine Learning
● Possibly Familiar with one of Java, Scala, or Python
Amanda
6. What is in store for our adventure?
● We have 30 minutes :)
● Brief intros to what Kubernetes, Spark, and Kubeflow are
● How to train a model (ish)
● How to serve a model (ish)
● Scaling (ish)
● Updating models and other scary thoughts
Ada Doglace
8. What is Spark?
● General purpose distributed system
○ With a really nice API including Python :)
● Apache project
● Faster than Hadoop Map/Reduce
● Good when too big for a single machine
● Built on top of two abstractions for distributed data: RDDs & Datasets
● Has ML libraries
● WIP Kubeflow integration (PR 1467)
9. The different pieces of Spark
[Diagram: Apache Spark core, with SQL, DataFrames & Datasets; Structured Streaming; Streaming; Spark ML; MLLib; Bagel & GraphX; GraphFrames; language bindings for Scala, Java, Python, & R]
Paul Hudson
18. What is Kubeflow?
“Kubeflow is a Cloud Native platform for machine learning based on
Google’s internal machine learning pipelines.”
or:
● The recognition that just a bunch of model weights isn’t enough
● Designed to support the ecosystem of tools needed (from data
prep to serving)
● Open source project :)
Ada Doglace
21. What’s Next?!
● Step away from the keyboard
● Think about the type(s) of model
● Look at the components directory and see what’s a fit tool-wise
● Don’t know? Choose Jupyter and deal with the details live
● Can’t find it?
23. What about just the basics?*
./scripts/kfctl.sh init ${KFAPP} --platform gcp --project ${PROJECT}
cd ${KFAPP}
../scripts/kfctl.sh generate platform
../scripts/kfctl.sh apply platform
../scripts/kfctl.sh generate k8s
../scripts/kfctl.sh apply k8s
24. What about just tensorflow?*
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${VERSION}/kubeflow
ks pkg install kubeflow/core@${VERSION}
ks pkg install kubeflow/tf-serving@${VERSION}
ks pkg install kubeflow/tf-job@${VERSION}
25. Ok well I need to be able to access Jupyter too...
kubectl port-forward -n ${NAMESPACE} `kubectl get pods -n ${NAMESPACE} --selector=service=ambassador -o jsonpath='{.items[0].metadata.name}'` 8080:80
26. Your Special ML Training Goes here
Don’t have any pressing projects but still want to have fun? Check
out Michelle’s notebook for Github Issue summarization.
Or want to see mnist again? here :)
27. Your Special ML Training Goes here
...
from keras.callbacks import CSVLogger, ModelCheckpoint

script_name_base = 'tutorial_seq2seq'
csv_logger = CSVLogger('{:}.log'.format(script_name_base))
model_checkpoint = ModelCheckpoint(
    '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base),
    save_best_only=True)
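The doubled braces in that filename template are easy to misread: the outer `.format()` fills in the script name now, while the escaped `{epoch}` / `{val_loss}` placeholders survive for Keras to fill in each time it saves a checkpoint. A quick pure-Python check of how the template expands (the epoch and loss values here are made up):

```python
# Outer .format() fills in the script name; doubled braces survive
# as single braces for Keras to fill in per-epoch.
script_name_base = 'tutorial_seq2seq'
template = '{:}.epoch{{epoch:02d}}-val{{val_loss:.5f}}.hdf5'.format(script_name_base)
print(template)  # tutorial_seq2seq.epoch{epoch:02d}-val{val_loss:.5f}.hdf5

# Keras then formats it with the current epoch & validation loss:
print(template.format(epoch=3, val_loss=0.12345))
# tutorial_seq2seq.epoch03-val0.12345.hdf5
```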
28. Your Special ML Training Goes here
history = seq2seq_Model.fit(
    [encoder_input_data, decoder_input_data],
    np.expand_dims(decoder_target_data, -1),
    batch_size=batch_size,
    epochs=epochs,
    validation_split=0.12,
    callbacks=[csv_logger, model_checkpoint])
Really just check out Michelle’s notebook for Github Issue summarization.
29. But what about [special foo-baz-inator] or [special-yak-shaving-tool]?
Write a Dockerfile and build an image; use FROM so you’re not starting from scratch.
FROM gcr.io/kubeflow-images-public/tensorflow-1.6.0-notebook-cpu
RUN pip install py-special-yak-shaving-tool
Then set it as a param for your training/serving job as needed:
ks param set tfjob-v1alpha2 image "my-special-image-goes-here"
30. What about that magical feature prep?
For now it’s a mostly write-by-hand situation
However TFX has some cool tools we can use today (like
TF.Transform) if we’re ok with DirectRunner or Dataflow (with Flink
support in the works indirectly)
31. Enter: TF.Transform
● For pre-processing of your data
● e.g. where you spend 90% of your dev time anyways
● Integrates into serving time :D
● OSS
● Runs on top of Apache Beam, but current release not yet scalable outside of GCP
● On Apache Beam master this can run-ish on Flink, but rough
● Please don’t use this in production today unless you’re on GCP/Dataflow
Kathryn Yengel
32. Defining a Transform processing function
def preprocessing_fn(inputs):
x = inputs['x']
y = inputs['y']
s = inputs['s']
x_centered = x - tft.mean(x)
y_normalized = tft.scale_to_0_1(y)
s_int = tft.string_to_int(s)
return { 'x_centered': x_centered,
'y_normalized': y_normalized, 's_int': s_int}
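The tft analyzers used above each need a full pass over the dataset (that's why they run as Beam jobs). A hedged pure-Python sketch of just the math they compute; these toy stand-ins are not the real tft API, and the real tft.string_to_int orders its vocabulary by frequency rather than alphabetically:

```python
# Toy stand-ins for the tft analyzers above -- the real ones run a
# full pass over the dataset via Beam; these only mirror the arithmetic.
def mean(xs):
    return sum(xs) / len(xs)

def scale_to_0_1(ys):
    # Min-max scale the whole column into [0, 1].
    lo, hi = min(ys), max(ys)
    return [(y - lo) / (hi - lo) for y in ys]

def string_to_int(ss):
    # Assign each distinct string a stable integer id (vocabulary lookup).
    vocab = {s: i for i, s in enumerate(sorted(set(ss)))}
    return [vocab[s] for s in ss]

def preprocessing_fn(inputs):
    x, y, s = inputs['x'], inputs['y'], inputs['s']
    return {'x_centered': [v - mean(x) for v in x],
            'y_normalized': scale_to_0_1(y),
            's_int': string_to_int(s)}

out = preprocessing_fn({'x': [1.0, 2.0, 3.0],
                        'y': [10.0, 20.0, 30.0],
                        's': ['cat', 'dog', 'cat']})
print(out['x_centered'])   # [-1.0, 0.0, 1.0]
print(out['y_normalized']) # [0.0, 0.5, 1.0]
print(out['s_int'])        # [0, 1, 0]
```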
35. Some common use-cases...
● Scale to z-score: tft.scale_to_z_score
● Bag of Words / N-Grams: tft.string_to_int, tf.string_split, tft.ngrams
● Bucketization: tft.quantiles, tft.apply_buckets
● Feature Crosses: tf.string_join, tft.string_to_int
...
36. BEAM Beyond the JVM: Current release
● Non JVM BEAM doesn’t work outside of Google’s environment yet
● tl;dr : uses grpc / protobuf
○ Similar to the common design but with more efficient representations (often)
● But exciting new plans to unify the runners and ease the support of different languages (portable SDKs)
○ See https://beam.apache.org/contribute/portability/
● If this is exciting, you can come join me on making BEAM work in Python3
○ Yes we still don’t have that :(
○ But we're getting closer & you can come join us on BEAM-2874 :D
Emma
37. Serving: TF is probably easiest for now...
MODEL_COMPONENT=my-model-server
MODEL_NAME=cat-finder-3k
ks generate tf-serving ${MODEL_COMPONENT} --name=${MODEL_NAME}
ks param set ${MODEL_COMPONENT} deployHttpProxy true
ks param set ${MODEL_COMPONENT} modelPath ${MODEL_PATH}
ks apply ${KF_ENV} -c ${MODEL_COMPONENT}
38. Or use Seldon Core & friends*
Seldon Core is an OSS platform for deploying ML models on
Kubernetes supported by Kubeflow.
Supports Many Model types/formats:
● Tensorflow
● Sklearn
● Spark ML**
● R
● H2O
39. Set up seldon core for serving
# Gives cluster-admin role to the default service account
kubectl create clusterrolebinding seldon-admin
--clusterrole=cluster-admin
--serviceaccount=${NAMESPACE}:default
# Install the kubeflow/seldon package
ks pkg install kubeflow/seldon
# Generate the seldon component and deploy it
ks generate seldon seldon --name=seldon
40. Build an image with your model*
docker run -v $(pwd):/my_model seldonio/core-python-wrapper:0.7 \
  /my_model IssueSummarization 0.1 gcr.io \
  --base-image=python:3.6 \
  --image-name=gcr-repository-name/my-image-name
41. And kick off the new model:
ks generate seldon-serve-simple new-serving-magic \
  --name=model-name \
  --image=gcr.io/gcr-repository-name/model:version \
  --namespace=${NAMESPACE} \
  --replicas=2
ks apply ${KF_ENV} -c new-serving-magic
42. Wait so how do I use this?
Your favourite REST library goes here*
Timeouts matter!
Doing recommendations? Have fall-backs
Have multiple models? Fall-backs
*Need to use it in batch? Maybe skip Seldon, TF Serving & friends and integrate the library into your code. Or not.
Trish Hamme
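The "timeouts + fall-backs" advice above can be sketched in a few lines of stdlib Python; the endpoint URL, response shape, and fallback recommendations here are all made up for illustration:

```python
import json
import urllib.request
import urllib.error

# Hypothetical model-serving endpoint -- substitute your own.
MODEL_URL = 'http://model-serving.example.invalid/predict'
# Cheap, always-available fallback (e.g. most-popular items).
FALLBACK_RECOMMENDATIONS = ['item-1', 'item-2', 'item-3']

def recommend(payload, timeout_seconds=0.5):
    """Call the model server, but never block the user on a slow model."""
    body = json.dumps(payload).encode('utf-8')
    req = urllib.request.Request(MODEL_URL, data=body,
                                 headers={'Content-Type': 'application/json'})
    try:
        with urllib.request.urlopen(req, timeout=timeout_seconds) as resp:
            return json.load(resp)['recommendations']
    except (urllib.error.URLError, TimeoutError, KeyError, ValueError):
        # Model down, slow, or returned junk: serve the fallback instead.
        return FALLBACK_RECOMMENDATIONS

print(recommend({'user_id': 42}))  # the fake URL fails, so this falls back
```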
43. Scaling - or ruh roh people are using this!
replicas: 1
Becomes
replicas: 10
Factor of 10 =~ “science”
44. Wait really?
● Early: switch from mini-kube to ${cloud provider} with GPUs
○ “Vertical” scaling
● Next: increase # of workers for training
○ “Horizontal” scaling
○ Auto-scaling also WIP per-backend for the most part
● Serving, # of replicas
○ Auto-scaling is a WIP -
https://github.com/kubeflow/kubeflow/issues/1219
Jennifer C.
45. What about validation?
TensorFlow Data Validation (TFDV)
Or Roll your own?
● Counters & execution time most common
● Please also check % of data change
Spark-validator (proof of concept)
Please validate your pipelines, and not just for data changes: code changes too.
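The "counters & execution time" plus "% of data change" checks above can be rolled by hand in a few lines; the counter names and the 25% threshold here are arbitrary examples:

```python
def validate_run(prev_counters, curr_counters, max_change=0.25):
    """Compare this run's counters against the previous run's.

    Returns a list of human-readable problems; empty means the run
    passed. max_change is the allowed relative change (25% here).
    """
    problems = []
    for name, prev in prev_counters.items():
        curr = curr_counters.get(name)
        if curr is None:
            problems.append('missing counter: ' + name)
            continue
        if prev == 0:
            continue  # can't compute a relative change from zero
        change = abs(curr - prev) / prev
        if change > max_change:
            problems.append('{} changed {:.0%} (prev={}, curr={})'
                            .format(name, change, prev, curr))
    return problems

# Yesterday's pipeline wrote 1M records in ~10 minutes...
prev = {'records_written': 1_000_000, 'run_seconds': 600}
# ...today it wrote half as many: flag it before anyone serves this model.
curr = {'records_written': 500_000, 'run_seconds': 580}
print(validate_run(prev, curr))
```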
48. Previously live demos recorded
● Kubeflow intro https://codelabs.developers.google.com/codelabs/kubeflow-introduction/index.html & streamed http://bit.ly/kfIntroStream
● Kubeflow E2E with Github issue summarization https://codelabs.developers.google.com/codelabs/cloud-kubeflow-e2e-gis/ & streamed http://bit.ly/kfGHStream
● You can tell they were live streamed by how poorly it went, I promise no video editing has occurred.
● You can do these yourself too (including one of them at our booth)!
49. Join me & Boo @ Google’s booth @ 5PM
And join my co-worker Casey West @ 6 talking about:
Building Captain Obvious: Understand Faster with Machine Learning APIs
50. Want to watch working on a Kubeflow PR?
● Join Holden Friday @ 2pm Pacific for live coding, continuing work on her Apache Spark to Kubeflow integration (using the existing Spark operator as a base) https://www.youtube.com/watch?v=zHnTdqbjPik
● Or just https://youtube.com/user/holdenkarau & like + subscribe + click the bell :p
51. k thnx bye :)
Give feedback on this presentation
http://bit.ly/holdenTalkFeedback