2. Deep Learning with GPUs in Production
AI By the Bay 2017
DEEPLEARNING4J &
KAFKA
April 2019
3. | OBJECTIVES
By the end of this presentation, you should…
1. Know the Deeplearning4j stack and how it works
2. Understand why aggregation is useful
3. Have an example of using Deeplearning4j and Kafka together
5. DL4J Ecosystem
Deeplearning4j, ScalNet
Build, train, and deploy neural
networks on JVM and in Spark.
ND4J /libND4J
High performance linear algebra
on GPU/CPU. Numpy for JVM.
DataVec
Data ingestion, normalization, and
vectorization. Pandas integration.
SameDiff
Symbolic differentiation and
computation graphs.
Arbiter
Hyperparameter search for optimizing
neural networks.
RL4J
Reinforcement learning on JVM.
Model Import
Import neural nets from ONNX,
TensorFlow, Keras (Theano, Caffe).
Jumpy
Python API for ND4J.
6. DL4J Training API
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.updater(new AMSGrad(0.05))
.l2(5e-4).activation(Activation.RELU)
.list(
new ConvolutionLayer.Builder(5, 5).stride(1, 1).nOut(20).build(),
new SubsamplingLayer.Builder(PoolingType.MAX).kernelSize(2, 2).build(),
new ConvolutionLayer.Builder(5, 5).stride(1, 1).nOut(50).build(),
new SubsamplingLayer.Builder(PoolingType.MAX).kernelSize(2, 2).padding(2,2).build(),
new DenseLayer.Builder().nOut(500).build(),
new DenseLayer.Builder().nOut(nClasses).activation(Activation.SOFTMAX).build(),
new LossLayer.Builder().lossFunction(LossFunction.MCXENT).build()
)
.setInputType(InputType.convolutionalFlat(28, 28, 1))
.build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.fit(...);
7. DL4J Training Features
An extensive, feature-rich library:
- Large set of layers, including VAE
- Elaborate architectures, e.g. center loss
- Listeners: score and performance, checkpoint
- Extensive Eval classes
- Custom Activation, Custom Layers
- Learning Rate Schedules
- Dropout, WeightNoise, WeightConstraints
- Transfer Learning
- And so much more
8. Inference with imported models
//Import model
model = KerasModelImport.import...
//Featurize input data into an INDArray
INDArray features = …
//Get prediction
INDArray prediction = model.output(features);
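Filled out, the flow above might look like the sketch below. The model path, input shape, and use of a Keras Sequential model are assumptions for illustration, not part of the original slide.

```java
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class KerasInference {
    public static void main(String[] args) throws Exception {
        // Import a Keras Sequential model saved with model.save("model.h5")
        MultiLayerNetwork model =
            KerasModelImport.importKerasSequentialModelAndWeights("model.h5");

        // Featurize input data into an INDArray (here: one random 784-feature row)
        INDArray features = Nd4j.rand(1, 784);

        // Get prediction
        INDArray prediction = model.output(features);
        System.out.println(prediction);
    }
}
```

For a full ComputationGraph (functional API) model, `KerasModelImport.importKerasModelAndWeights(...)` is the analogous entry point.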
9. Featurizing Data
DataVec: A tool for ETL
Runs natively on Spark with GPUs and CPUs
Designed to support all major types of input data (text, CSV, audio,
image, and video), each with its own input format
Define Schemas and Transform Process
Serialize transform processes so they stay portable when they’re
needed in production environments.
10. DataVec Schema
Define Schemas
Schema inputDataSchema = new Schema.Builder()
.addColumnsString("CustomerID", "MerchantID")
.addColumnInteger("NumItemsInTransaction")
.addColumnCategorical("MerchantCountryCode",
Arrays.asList("USA","CAN","FR","MX"))
.addColumnDouble("TransactionAmountUSD", 0.0, null, false, false)
// $0.00 or more, no maximum limit, no NaN and no Infinite values
.addColumnCategorical("FraudLabel", Arrays.asList("Fraud","Legit"))
.build();
11. DataVec Transform Process
Basic Transform Example
- Filter rows by column value
- Handle invalid values with replacement (e.g. negative dollar amounts)
- Handle datetime, extract hour of day etc
- Operate on columns in place
- Derive new columns from existing columns
- Join multiple sources of data
- AND much more...
Serialize to JSON!!
https://gist.github.com/eraly/3b15d35eb4285acd444f2f18976dd226
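The linked gist follows this shape; below is a hedged sketch of a transform process over the schema from the previous slide. It assumes `inputDataSchema` is in scope, and the specific transforms are illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;

import org.datavec.api.transform.TransformProcess;
import org.datavec.api.transform.condition.ConditionOp;
import org.datavec.api.transform.condition.column.CategoricalColumnCondition;
import org.datavec.api.transform.condition.column.DoubleColumnCondition;
import org.datavec.api.transform.filter.ConditionFilter;
import org.datavec.api.writable.DoubleWritable;

TransformProcess tp = new TransformProcess.Builder(inputDataSchema)
    // Filter rows by column value: keep only USA and CAN transactions
    .filter(new ConditionFilter(new CategoricalColumnCondition(
        "MerchantCountryCode", ConditionOp.NotInSet,
        new HashSet<>(Arrays.asList("USA", "CAN")))))
    // Replace invalid (negative) dollar amounts with 0.0
    .conditionalReplaceValueTransform("TransactionAmountUSD",
        new DoubleWritable(0.0),
        new DoubleColumnCondition("TransactionAmountUSD",
            ConditionOp.LessThan, 0.0))
    // Drop columns the model doesn't need
    .removeColumns("CustomerID", "MerchantID")
    .build();

// Serialize to JSON for portability to production
String json = tp.toJson();
```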
12. DataVec Data Analysis
DataAnalysis dataAnalysis =
AnalyzeSpark.analyze(schema, parsedInputData, maxHistogramBuckets);
HtmlAnalysis.createHtmlAnalysisFile(dataAnalysis, new File("DataVecAnalysis.html"));
13. Parallel Inference
Model model =
ModelSerializer.restoreComputationGraph("PATH_TO_YOUR_MODEL_FILE", false);
ParallelInference pi = new ParallelInference.Builder(model)
.inferenceMode(InferenceMode.BATCHED)
.batchLimit(32)
.workers(2)
.build();
INDArray result = pi.output(...);
14. DL4J Transfer Learning API
- Ability to freeze layers
- Modify layers, add new layers; change graph structure etc
- FineTuneConfiguration for changing learning
- Helper functions to presave featurized frozen layer outputs
(.featurize method in TransferLearningHelper)
Example with VGG16 that keeps the bottleneck and everything below it
frozen and edits the new layers:
https://github.com/deeplearning4j/dl4j-examples/blob/5381c5f86170dc544522eb7926d8fbf8119bec67/dl4j-examples/src/main/java/org/deeplearning4j/examples/transferlearning/vgg16/EditAtBottleneckOthersFrozen.java#L74-L90
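In the spirit of the linked example, a sketch of the API. It assumes a pretrained `vgg16` ComputationGraph, a `numClasses` value, and the layer names `"fc2"` and `"predictions"`; the updater and seed are illustrative.

```java
import org.deeplearning4j.nn.graph.ComputationGraph;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.transferlearning.FineTuneConfiguration;
import org.deeplearning4j.nn.transferlearning.TransferLearning;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions.LossFunction;

// FineTuneConfiguration changes how the unfrozen layers learn
FineTuneConfiguration fineTuneConf = new FineTuneConfiguration.Builder()
    .updater(new Nesterovs(5e-5))
    .seed(123)
    .build();

// Freeze "fc2" and everything below it; replace the output layer
ComputationGraph vgg16Transfer = new TransferLearning.GraphBuilder(vgg16)
    .fineTuneConfiguration(fineTuneConf)
    .setFeatureExtractor("fc2")
    .removeVertexKeepConnections("predictions")
    .addLayer("predictions",
        new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
            .nIn(4096).nOut(numClasses)
            .activation(Activation.SOFTMAX).build(),
        "fc2")
    .build();
```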
15. DL4J Training UI
Helps with training and tuning by tracking gradients and updates; works with Spark
17. Commercial Performance
• Skymind integrates Deeplearning4j into its commercial model server, SKIL
• Underlying code uses the ParallelInference class
• Promising scalability as minibatch size and the number of local devices increase
[Chart: inference throughput vs. minibatch size]
18. Parallel GPUs
• ParallelInference class automatically picks up available GPUs and balances requests across them
• Backpressure can be handled by “batching” the requests in a queue
• Single-node: it’s up to the programmer to scale out, or use a commercial solution like SKIL
[Diagram: ParallelInference routing requests to multiple GPUs]
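The “batching in a queue” idea can be sketched with plain Java. ParallelInference implements this internally; the standalone class below is only illustrative, with a hypothetical request type of `double[]`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchingQueue {
    private final BlockingQueue<double[]> queue;
    private final int batchLimit;

    public BatchingQueue(int capacity, int batchLimit) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.batchLimit = batchLimit;
    }

    // Producers block when the queue is full: that blocking IS the backpressure.
    public void submit(double[] request) throws InterruptedException {
        queue.put(request);
    }

    // A GPU worker drains up to batchLimit requests and runs them as one minibatch.
    public List<double[]> nextBatch() throws InterruptedException {
        List<double[]> batch = new ArrayList<>();
        batch.add(queue.take());              // wait for at least one request
        queue.drainTo(batch, batchLimit - 1); // grab whatever else is waiting
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BatchingQueue bq = new BatchingQueue(64, 32);
        for (int i = 0; i < 40; i++) bq.submit(new double[]{i});
        System.out.println(bq.nextBatch().size()); // 32
        System.out.println(bq.nextBatch().size()); // 8
    }
}
```

Under burst load the producer threads simply block until a worker drains the queue, which bounds memory without dropping requests.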
21. Prerequisites
What is anomaly detection?
In layman’s terms, anomaly detection is the identification of rare
events or items that are significantly different from the “normal” of a
dataset.
Something is not like the others...
22. The Problem
How to monitor 1 terabyte of CDN logs per day and detect anomalies.
We want to monitor the health of a live sports score websocket API.
Let’s analyze packet logs from a server farm streaming the latest
NFL game. It produces 1 TB of logs per day with files that look like:
91739747923947 live.nfl.org GET /panthers_chargers 0 1554863750 250 6670 wss 0
Let’s do some math. This line is 73 bytes...
23. Analysis
What’s the most efficient way to monitor for system disruptions?
I’ve seen attempts to perform anomaly detection on every single
packet! If we have 1 TB of logs per day and each line is 73 bytes,
how many lines is that?
1e+12 bytes / 73 bytes =
13,698,630,137 log lines
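The arithmetic, spelled out (1 TB taken as 10^12 bytes):

```java
public class LogVolume {
    public static void main(String[] args) {
        double bytesPerDay = 1e12;  // 1 TB, decimal
        double bytesPerLine = 73;
        // ~13.7 billion log lines per day
        long lines = Math.round(bytesPerDay / bytesPerLine);
        System.out.println(lines);  // 13698630137
    }
}
```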
24. Available Hardware
I have a 2 x Titan X Pascal GPU Workstation at home.
Titan X has 342.9 GFLOPS of FP64 (double) computing power.
Sounds like a lot. Can we process a terabyte of logs per day?
Let’s benchmark it!
25. Data Vectorization
Format of log file is:
{id} {domain} {http_method} {uri} {server_errors}
{timestamp} {round_trip} {payload_size} {protocol}
{client_errors}
How anomalous is our packet when comparing errors, timing, and
round trip?
Let’s build an input using the above...
26. MLP Architecture
We need to encode our data into a representation that has some sort
of computational meaning. Potentially a small MLP encoder can work.
Model size: 158 parameters (very small)
Benchmarks: 43,166 logs/sec on 2xGPU
Total Capacity: 3,729,542,400 logs/day
We need at least 8 GPUs!!! And backpressure!
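The capacity math behind these numbers:

```java
public class MlpCapacity {
    public static void main(String[] args) {
        long logsPerSec = 43_166;                 // benchmarked on 2 GPUs
        long capacityPerDay = logsPerSec * 86_400;
        System.out.println(capacityPerDay);       // 3729542400

        // Scale the 2-GPU benchmark up to the full daily load
        long linesPerDay = 13_698_630_137L;
        double gpusNeeded = 2.0 * linesPerDay / capacityPerDay;
        System.out.println((long) Math.ceil(gpusNeeded)); // 8
    }
}
```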
27. Analysis
What if there was a better way?
We already know we can leverage Kafka for backpressure. That
eliminates high burst loads. What if there was a way we could turn 13
billion packet logs into a fraction of that?
Aggregate!
We can add a Spark streaming component, use microbatching and
aggregate into smaller sequences.
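A toy sketch of the aggregation idea in plain Java rather than Spark streaming: keep a 30-second rolling window, evict expired packets, and emit one aggregate per second instead of one record per packet. The packet fields and the round-trip average are simplified assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class RollingAggregator {
    // One (timestampSec, roundTripMs) entry per packet
    static class Packet {
        final long ts, roundTrip;
        Packet(long ts, long roundTrip) { this.ts = ts; this.roundTrip = roundTrip; }
    }

    private final Deque<Packet> window = new ArrayDeque<>();
    private final long windowSec;

    RollingAggregator(long windowSec) { this.windowSec = windowSec; }

    // Called once per second: add arrivals, evict packets older than the window,
    // and emit a single aggregate (mean round trip) for the whole window.
    double aggregate(long nowSec, Packet... arrivals) {
        for (Packet p : arrivals) window.addLast(p);
        while (!window.isEmpty() && window.peekFirst().ts <= nowSec - windowSec) {
            window.removeFirst();
        }
        return window.stream().mapToLong(p -> p.roundTrip).average().orElse(0);
    }

    public static void main(String[] args) {
        RollingAggregator agg = new RollingAggregator(30);
        System.out.println(agg.aggregate(0,
            new Packet(0, 100), new Packet(0, 300)));   // 200.0
        // 40 s later the old packets have left the 30 s window
        System.out.println(agg.aggregate(40, new Packet(40, 50))); // 50.0
    }
}
```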
28. LSTM Architecture
Our MLP encoder turns into an LSTM sequence encoder. We
aggregate across a rolling window of 30 seconds, every second. Do
we become more efficient?
Model size: 14,178 parameters (small)
Benchmarks: 1,494 aggregations/sec on 2xGPU
Total Capacity: 129,081,600 aggregations/day
Aggregation gains significant efficiency.
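Checking the capacity figure and the headroom: emitting one aggregation per second yields 86,400 aggregations per day per stream, far below the benchmarked capacity.

```java
public class LstmCapacity {
    public static void main(String[] args) {
        long aggPerSec = 1_494;                  // benchmarked on 2 GPUs
        long capacityPerDay = aggPerSec * 86_400;
        System.out.println(capacityPerDay);      // 129081600

        // A 30 s rolling window emitted every second = one aggregation/second
        long workloadPerDay = 86_400;
        System.out.println(capacityPerDay / workloadPerDay); // 1494
    }
}
```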
29. Lessons
Still need additional hardware.
Spark streaming will still require additional hardware. However, you’re
optimizing that layer instead of requiring expensive GPU usage. Aggregating
across all packets also gives the big picture, which is a strong indicator of
system health.
Number of parameters.
While the models used for this thought experiment are small, you
could very well increase the size by 10x for performance or
dimensionality. That requires additional hardware.
31. Github Example
Kafka, Keras, and Deeplearning4j.
A simplified real-world pipeline: a data science team trains a model
in Python via Keras, imports it into Deeplearning4j and Java, and
deploys it to perform inference on data fed by Kafka.
Repository.
https://github.com/crockpotveggies/kafka-streams-machine-learning-examples