ENGEL, founded in 1945, is now the world's leading manufacturer of injection moulding machines. Since then, and especially in the current era, the amount of data has grown immensely and has become more and more heterogeneous with each new generation of machine controls. A closer look at the conglomeration of every machine's log files reveals 13 different timestamp formats, different archive types and further peculiarities of each control generation. Unsurprisingly, this has made automatically processing and analysing the data difficult.
4. Setting the Scene
The ENGEL Customer Service
Workflow: the customer reports a problem at the machine to 1st level support; if the problem cannot be solved immediately, a field engineer is sent out for repair & maintenance (collect error reports, analyse & fix errors); feedback is collected afterwards.
7. The Use Case
Goal: Self Service Tools
Workflow (target state): the customer performs DIY error analysis via self-service tools before involving 1st level support, field engineers and repair & maintenance.
8. The Use Case
▪ Use data science to assist customer support
▪ Classified error documentation (symptoms, errors, solutions)
▪ Detect error patterns automatically / rule-based
▪ Reduce maintenance/repair times
▪ Predict future errors
▪ Detect / discover serial defects
▪ Generate sustainable knowledge
▪ Fast onboarding of new employees
▪ Focus on fixing the problems efficiently
▪ Create data-driven solutions
Fault Discovery Assistance
10. Challenges of the Use Case
▪ Zipped collection of (serialised) logfiles
  ▪ Snapshot of the machine's parameters
  ▪ Last X errors on the machine
▪ Fault discovery & documentation
  ▪ Customer support can derive wrong settings from the reports
▪ Different data formats for different control generations
▪ Legacy data (no standard in chosen data formats)
  ▪ Collected since approx. 1990
  ▪ Ranging from simple text files to recursive archives
ENGEL Error Reports
11. Challenges of the Use Case
Recursive Archive Structure
▪ Logfiles (partially binary serialised)
▪ Memory Dumps
▪ Parameter Snapshots
▪ …
Issues
▪ 13 different timestamp formats
▪ Different structure for each control generation
▪ Broken archives
▪ Missing files
▪ …
Report Structure
12. Prototyping a Solution
▪ The Hortonworks stack was promising
  ▪ Apache NiFi, Spark, Kafka and HDFS as our core components
▪ Starting with a small cluster
  ▪ 5x Raspberry Pis
▪ Establishing a production environment on dedicated hardware
  ▪ On-premises hosting
Hadoop seemed to be in fashion
13. Prototyping a Solution
Lambda Architecture on Hortonworks HDP
Pipeline: Upload report → Apache NiFi (data ingestion & routing) writes meta attributes + file path to Kafka → Apache Kafka (event stream) → HDFS batch & stream processing (process metadata, process parameters into Parquet, store the raw data blob) → BI, web & mobile apps.
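To make the hand-off concrete, here is a minimal sketch of the kind of metadata message the ingestion layer publishes to Kafka. The broker address, topic name and all JSON fields except "hdfs.filepath" (which the Spark consumer on the next slide selects) are illustrative assumptions, not the actual NiFi configuration.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

Properties props = new Properties();
props.put("bootstrap.servers", "broker:9092"); // broker address is an assumption
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

// One small JSON message per uploaded report: meta attributes plus the
// HDFS path of the raw blob, so only metadata flows through Kafka.
try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    String value = "{\"hdfs.filepath\": \"/reports/raw/report-123.tar\", \"fabNr\": \"200153\"}";
    producer.send(new ProducerRecord<>("error-reports", value)); // topic name is an assumption
}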
14. New Difficulties Arise
▪ Maintaining streaming + batch jobs
  ▪ Kafka and large files
  ▪ Reading from multiple systems (Kafka + HDFS)
  ▪ Hadoop and small files
▪ Legacy binary deserialisation
  ▪ Pascal(!) JNA wrapper
▪ Unpredictable (parameter) data
  ▪ Binaries with 200,000 and up to 3 million variables per error report
Non-standard use case
import java.util.List;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.streaming.api.java.JavaDStream;

// Extract the JSON payload from each Kafka record.
JavaDStream<String> json = kafkaStream.map(ConsumerRecord::value);
json.foreachRDD(rdd -> {
    Dataset<Row> df = sparkSession.read().json(rdd);
    if (df.count() >= 1) {
        // Collect the HDFS paths referenced by the metadata messages;
        // the backticks escape the dot in the column name.
        List<String> hdfsPaths = df
                .select("`hdfs.filepath`")
                .javaRDD()
                .map(row -> row.getString(0))
                .collect();
        String joinedPaths = String.join(",", hdfsPaths);
        SampleProcessor sampleProcessor = new SampleProcessor();
        // Load the raw report archives from HDFS and deserialise them.
        JavaRDD<String> binaries = javaSparkContext.binaryFiles(joinedPaths)
                .map(report -> new TarArchiveInputStream(report._2.open()))
                .map(sampleProcessor::call);
    } else {
        Log.info("No records in this batch");
    }
});
High complexity and workaround for streaming large binaries
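The Pascal JNA wrapper mentioned above bound the legacy deserialisation routine into the JVM. A minimal sketch of what such a binding can look like; the library name, function name and signature are all hypothetical, assuming the Pascal library exports a C-compatible function:

import com.sun.jna.Library;
import com.sun.jna.Native;

// Hypothetical JNA binding to the legacy Pascal deserialisation library;
// library name, function name and signature are illustrative assumptions.
public interface LegacyDeserialiser extends Library {
    LegacyDeserialiser INSTANCE =
            Native.load("legacydeserialiser", LegacyDeserialiser.class);

    // Deserialises a raw report buffer into the caller-supplied output
    // buffer and returns the number of bytes written.
    int deserialiseReport(byte[] input, int inputLen, byte[] output, int outputLen);
}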
15. Partitioning System Variables
▪ Spark and Parquet files to store system variables
▪ No efficient and economical database was found
Flattening the tree structure
Timestamp,FabNr,VarName,Unit,IntValue,DoubleValue,StringValue,BoolValue
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure,,,0.174087137,,
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure_sim,,,,,
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.ai_Pressure_stat,,,,,false
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuChargeMainPump,,,,,false
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuInject,,,,,
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuOff,,,,,
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.do_AccuSafety,,,,,false
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.er_AccuPressMin,,,,,
2020-01-23T14:14:57.000+01:00,200153,AccuGeneral1.evAnaDisEn,,,,,
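A minimal sketch of the flattening step that produces rows like the ones above; the ParameterNode tree type is a hypothetical stand-in for the deserialised parameter structure:

import java.util.ArrayList;
import java.util.List;

// Hypothetical node of the deserialised parameter tree; the real
// structure coming out of the report is an assumption here.
class ParameterNode {
    String name;
    Object value; // set on leaves only
    List<ParameterNode> children = new ArrayList<>();
}

class TreeFlattener {
    // Walk the tree depth-first and emit one (VarName, value) pair per
    // leaf, joining node names into the dotted path used above,
    // e.g. "AccuGeneral1.ai_Pressure".
    static void flatten(ParameterNode node, String prefix, List<String[]> out) {
        String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
        if (node.children.isEmpty()) {
            out.add(new String[] { path, String.valueOf(node.value) });
        } else {
            for (ParameterNode child : node.children) {
                flatten(child, path, out);
            }
        }
    }
}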
16. Partitioning System Variables
▪ Ideally partitioned by function unit (the first segment of the variable name)
  ▪ Grouped by machine components
▪ Custom hash-based partitioning (see the sketch after this list)
▪ Not time series data
▪ Very good for point lookups – not so much for regex searches
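A minimal sketch of the hash-based layout, assuming a "variables" DataFrame with the flattened schema shown earlier; the bucket count and output path are illustrative assumptions:

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.hash;
import static org.apache.spark.sql.functions.lit;
import static org.apache.spark.sql.functions.pmod;
import static org.apache.spark.sql.functions.split;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Derive the function unit from the first segment of the variable name,
// e.g. "AccuGeneral1" from "AccuGeneral1.ai_Pressure".
Dataset<Row> withUnit = variables.withColumn(
        "functionUnit", split(col("VarName"), "\\.").getItem(0));

// Hash each function unit into a fixed number of buckets so that all
// variables of one unit land in the same partition; a point lookup can
// then compute the bucket and skip everything else.
Dataset<Row> bucketed = withUnit.withColumn(
        "bucket", pmod(hash(col("functionUnit")), lit(64)));

bucketed.write().partitionBy("bucket").parquet("/data/variables");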
Optimising for point lookups
-- Point lookup for a single variable on a specific machine
SELECT *
FROM variables
WHERE varName = "AccuGeneral1.ai_Pressure"
  AND fabNr = "XXX"

-- Point lookup for the same variable across all machines
SELECT *
FROM variables
WHERE varName = "AccuGeneral1.ai_Pressure"
Query Examples
17. Issues in This Architecture
▪ Upgrades and Migrations
▪ Job Monitoring / Ganglia Metrics
▪ Repartitioning Job
▪ Merging Streaming and Batch Files
▪ Unpredictable errors in batch jobs
▪ Memory Bombs / Issues
▪ Spark 2.x has no binary file support => working with RDDs
▪ This architecture feels like a big workaround
The Real Show Stoppers
WARN TaskSetManager: Lost task 53.0 in stage 49.0 (TID 32715, XXXXXXXXXX):
ExecutorLostFailure (executor 23 exited caused by one of the running tasks)
Reason: Container killed by YARN for exceeding memory limits.
12.4 GB of 12 GB physical memory used.
Consider boosting spark.yarn.executor.memoryOverhead.
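As the message suggests, one mitigation is to raise the off-heap overhead allowance per executor. A minimal sketch using the Spark 2.x property name from the log; the 4096 MB value and the app name are illustrative assumptions, not recommendations:

import org.apache.spark.SparkConf;

SparkConf conf = new SparkConf()
        .setAppName("error-report-processing")
        // Extra off-heap headroom per executor (in MB) so YARN does not
        // kill the container when native allocations exceed the heap.
        .set("spark.yarn.executor.memoryOverhead", "4096");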
18. Working Towards an Optimal Solution?
Discovering Azure & Databricks
Pipeline: Upload report → Azure Data Lake Storage → Autoloader → process metadata, process parameters (Parquet), store the raw data blob → Azure Cosmos DB → BI, web & mobile apps.
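A minimal Auto Loader sketch for the ingestion step, assuming a SparkSession "spark" and reports arriving as files in an ADLS container; the storage account, paths and checkpoint location are illustrative assumptions:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// Auto Loader ("cloudFiles") incrementally picks up new report files as
// they land in the lake, replacing the NiFi + Kafka hand-off.
Dataset<Row> reports = spark.readStream()
        .format("cloudFiles")
        .option("cloudFiles.format", "binaryFile") // ingest raw report archives
        .load("abfss://reports@<storage-account>.dfs.core.windows.net/incoming");

// Micro-batch the raw blobs into a Delta table.
reports.writeStream()
        .option("checkpointLocation", "/checkpoints/raw-reports")
        .start("/delta/raw_reports");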
19. Working Towards an Optimal Solution?
▪ Equi-distant range partitioning
  ▪ Based on the lexical order of parameter names
  ▪ Roughly the same number of parameters per partition
▪ Allows searching for variables in the same component / root node
▪ Better data skipping
▪ OPTIMIZE instead of repartitioning jobs (see the sketch below)
Parameter Partitioning
Example: A.xxx, B.xxx, C.xxx → Partition 1; D.xxx, E.xxx, F.xxx → Partition 2; …
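A minimal sketch of this layout, assuming the "variables" DataFrame from before is written to a Delta table; partition count, paths and the ZORDER column are illustrative assumptions:

import static org.apache.spark.sql.functions.col;

// repartitionByRange splits the rows into ranges of roughly equal size
// ordered lexically by parameter name, so variables that share a root
// node stay physically close together.
variables
        .repartitionByRange(64, col("VarName"))
        .write().format("delta").mode("overwrite").save("/delta/variables");

// Instead of a dedicated repartitioning job, Delta's OPTIMIZE compacts
// and re-clusters the files in place.
spark.sql("OPTIMIZE delta.`/delta/variables` ZORDER BY (VarName)");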
20. Working Towards an Optimal Solution?
▪ Reduced complexity
  ▪ One single configurable Spark job
  ▪ Kafka replaced by Autoloader
  ▪ Unified batch & streaming
  ▪ ….
▪ Reduced memory pressure
  ▪ Micro-batches instead of full batches
▪ Stable jobs
▪ Monitoring
  ▪ JVM / Ganglia metrics
Azure & Databricks Benefits
21. The Current State
Self-service tools instead of manual work
Currently
▪ Established self-service tools
▪ Send a PDF summary to the field engineers
  ▪ Steps that could solve the problem
  ▪ Things to also consider at the machine
In future
▪ Automatically detect and classify errors
22. Key Takeaways
▪ Don’t underestimate the effort to process legacy data
▪ The unknown in this data jungle is quite daunting
▪ Unforeseeable things will happen
▪ Change management in a traditional company is really demanding
▪ Running an unmanaged cluster without dedicated resources leads to pure frustration
▪ Moving to the cloud reduced the complexity in our pipelines by a lot