Automotive Information Research driven by Apache Solr

Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
@LeanderReimer

Agenda
Reverse Data Engineering and Exploration with MIR
Aftersales Information Research with AIR
Architecture, Requirements, Challenges
Solutions for the Problem of Combinatorial Explosion
Data Consistency and Timeliness
BOM Explosions and Demand Forecasts with ZEBRA

Reverse Data Engineering and Exploration with MIR

How do we find the originating data silo for the desired data?
System A System B System C System D
Vehicle data
Other data
Where to find the vehicle data?
60 potential systems with 5000 entities.

How do we find the hidden relations between the systems?
How is the data linked to each other?
400,000 potential relations.
Vehicle data
Other data
System A System B System C System D
Parts
Documents

Reverse Data Engineering and Analysis with MIR and Solr
MIR manages the meta information, data models and record descriptions of all our source systems (RDBMS, XML, SOAP, …).
MIR lets you navigate and search the metadata and easily drill into it using facets.
MIR also manages the target data model and the Solr schema description.

Search results
Tree view of systems, tables and attributes
Drill down via facets
Wildcard search
Found potential synonyms for the chassis number

Aftersales Information Research with AIR

Find the right information in less than 3 clicks.
The initial situation:
Users had to use up to 7 different applications for their daily work.
Systems were not really integrated nicely.
Finding the correct information was laborious and error prone.
The project vision:
Combine the data into a consistent information network.
Make the information network and its data searchable and navigable.
Replace the existing applications with one easy-to-use application.

"But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data."
– Anonymous IT person

Solr outperformed Oracle in query time as well as index size.
Test data set: 150,000 records
Disk space: 132 MB Solr vs. 385 MB Oracle

Engine | Query | Query times (three measurements)
Oracle | SELECT * FROM VEHICLE WHERE VIN LIKE 'V%' | 383 ms | 384 ms | 383 ms
Solr | INFO_TYPE:VEHICLE AND VIN:V* | 38 ms | 0 ms | 0 ms
Oracle | SELECT * FROM MEASURE WHERE TEXT='engine' | 389 ms | 387 ms | 386 ms
Solr | INFO_TYPE:MEASURE AND TEXT:engine | 92 ms | 0 ms | 0 ms
Oracle | SELECT * FROM VEHICLE WHERE VIN LIKE '%X%' | 859 ms | 379 ms | 383 ms
Solr | INFO_TYPE:VEHICLE AND VIN:*X* | 39 ms | 0 ms | 0 ms

The dirt race use case:
• No internet connection
• Low-end devices

Solr and AIR on Raspberry Pi Model B as PoC worked like a charm!
Running Debian Linux + JDK 8
Jetty servlet container with the Solr and AIR web apps deployed
A reduced offline data set with ~1.5 million Solr documents
Model B hardware specs: ARMv6 CPU at 700 MHz, 512 MB RAM, 32 GB SD card
And now try this with Oracle!

A careful schema design is crucial for your Solr performance.

Naive denormalization quickly leads to combinatorial explosion!
Entities in the information network (linked by relationships and navigation):
Vehicles: 33,071,137
Flat Rate Units: 14,830,197
Packages: 1,678,667
FRU Groups: 5,078,411
Repair Instructions: 18,573
Technical Documents: 648,129
Parts: 55,000
Measures: 648,129
Types: 41,385
Fault Indications: 6,180
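
To make the explosion concrete, the following sketch compares the number of rows a fully flattened index would need for one vehicle with the number of documents needed when each entity is stored once and linked via relation fields. The fan-out values and function names are hypothetical and chosen only for illustration; they are not taken from the AIR data set.

```python
from math import prod

# Hypothetical number of related entities per vehicle (illustrative values only).
fan_out = {
    "packages": 20,
    "flat_rate_units": 50,
    "technical_documents": 30,
    "parts": 40,
}


def flattened_rows(fan_out: dict[str, int]) -> int:
    """Rows needed if every combination of related entities becomes one document."""
    return prod(fan_out.values())


def linked_documents(fan_out: dict[str, int]) -> int:
    """Documents needed if the vehicle and each related entity are stored once."""
    return 1 + sum(fan_out.values())


print(flattened_rows(fan_out))    # 1200000 rows for a single vehicle
print(linked_documents(fan_out))  # 141 documents for the same vehicle
```

The product grows with every additional relation type, while the linked representation only grows by the number of related entities.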

Multi-valued fields can efficiently store 1..n relations, but may result in false positives.
{
"INFO_TYPE":"AWPOS_GROUP",
"NUMMER" :[ "1134190" , "1235590" ],
"BAUSTAND" :["1969-12-31T23:00:00Z","1975-12-31T23:00:00Z"],
"E_SERIES" :[ "F10" , "E30" ]
}
The first value of each field sits at index 0, the second at index 1.
Both of the following filter queries match this document, although the second one combines values from different index positions (a false positive):
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30
If false positives at the Solr level are tolerable, post-filter the results in your application.
Alternative: current Solr versions support nested child documents; use them instead.
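
The false positive and the application-side post filter can be sketched as follows. This is a minimal illustration, not the AIR implementation: the function names are hypothetical, and it assumes the multi-valued fields are returned in the order in which they were indexed, so that equal index positions belong together.

```python
# The document from the slide; values at the same index position belong together.
doc = {
    "INFO_TYPE": "AWPOS_GROUP",
    "NUMMER": ["1134190", "1235590"],
    "E_SERIES": ["F10", "E30"],
}


def solr_like_match(doc: dict, nummer: str, e_series: str) -> bool:
    """Mimics multi-valued matching: each field is checked independently."""
    return nummer in doc["NUMMER"] and e_series in doc["E_SERIES"]


def post_filter(doc: dict, nummer: str, e_series: str) -> bool:
    """Keeps a hit only if both values occur at the same index position."""
    return any(
        n == nummer and e == e_series
        for n, e in zip(doc["NUMMER"], doc["E_SERIES"])
    )


print(solr_like_match(doc, "1134190", "E30"))  # True: the false positive
print(post_filter(doc, "1134190", "E30"))      # False: rejected by the post filter
print(post_filter(doc, "1134190", "F10"))      # True: a genuine match
```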

Technical documents and their validity were expressed and stored
in a binary representation.
Validity expressions may have up to 46 characteristics
Validity expressions use 5 different boolean operators (AND, NOT, …)
Validity expressions can be nested and complex
Some characteristics are dynamic and not even known at index time
The solution: transform the validity expressions into the equivalent
ternary JavaScript terms and evaluate these terms at query time using
a custom function query filter.

Binary validity expression example.
Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'
Type(53088651) = 'E-Series', Value(53161483) = 'F10'
Type(64555275) = 'Transmission', Value(53161483) = 'MECH'

Transformation of the binary validity terms into their JavaScript
equivalent at index time.
((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH')
{
"INFO_TYPE": "TECHNISCHES_DOKUMENT",
"DOKUMENT_TITEL": "Getriebe aus- und einbauen",
"DOKUMENT_ART": "reparaturanleitung",
"VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))",
"BRAND": ["BMW PKW"]
}
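
The index-time transformation can be sketched as a small recursive function. This is an illustration under assumptions, not the actual ETL code: the nested-tuple input format and the function name `to_js_term` are hypothetical, and of the five boolean operators only AND and NOT are named on the slides, so OR is included here as an assumed third operator.

```python
def to_js_term(expr: tuple) -> str:
    """Renders a nested validity expression as a JavaScript boolean term."""
    op = expr[0]
    if op == "EQ":
        _, name, value = expr
        # Escape backslashes and single quotes so the literal stays valid JavaScript.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"({name}=='{escaped}')"
    if op == "NOT":
        return f"(!{to_js_term(expr[1])})"
    if op in ("AND", "OR"):
        joiner = "&&" if op == "AND" else "||"
        return "(" + joiner.join(to_js_term(e) for e in expr[1:]) + ")"
    raise ValueError(f"unsupported operator: {op}")


validity = (
    "AND",
    ("EQ", "BRAND", "BMW PKW"),
    ("EQ", "E_SERIES", "F10"),
    ("EQ", "TRANSMISSION", "MECH"),
)
print(to_js_term(validity))
# ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
```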

The JavaScript validity term is evaluated at query time using a
custom function query.
&fq=INFO_TYPE:TECHNISCHES_DOKUMENT
&fq=DOKUMENT_ART:reparaturanleitung
&fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}
jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU
1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bG
wsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkV
kJBVVJFSUhFIjoiWCcifQ==)
Base64 decode:
{
"BRAND":"BMW PKW",
"E_SERIES":"F10",
"TRANSMISSION":"MECH"
}
http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
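
On the client side, the filter query above could be assembled roughly like this. It is a sketch under assumptions: `jsTerm` is the custom function named on the slide, the helper names are hypothetical, and the use of compact JSON as the encoded payload is an assumption about what the function expects.

```python
import base64
import json


def encode_context(context: dict) -> str:
    """Serializes the vehicle context to compact JSON and Base64-encodes it."""
    raw = json.dumps(context, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def validity_filter(context: dict) -> str:
    """Builds the frange filter query from the slide for a given vehicle context."""
    return (
        "{!frange l=1 u=1 incl=true incu=true cache=false cost=500}"
        f"jsTerm(VALIDITY,{encode_context(context)})"
    )


context = {"BRAND": "BMW PKW", "E_SERIES": "F10", "TRANSMISSION": "MECH"}
print(validity_filter(context))

# Round trip: decoding the payload yields the original context again.
decoded = json.loads(base64.b64decode(encode_context(context)))
print(decoded == context)  # True
```

Standard Base64 output can contain `+`, `/` and `=`, so the parameter value still needs URL encoding when it is sent over HTTP.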

Custom ETL combined with Continuous Delivery and DevOps
ensure data consistency and timeliness.

BOM Explosions and Demand Forecasts with ZEBRA

Bills of Materials (BOMs) explained

BOMs are required for …
Production planning
Forecasting demand
Scenario-based planning
Simulations

The Big Picture of ZEBRA
Parts / abstract demands
Orders / actual demands
Analytics
BOMs / dependent demands
Demand Resolver
Production Planning
Figures in the diagram: 7 million, 2 million, 21 billion

The most essential Solr optimizations in ZEBRA
Purpose: computing BOM explosions.
Bulk RequestHandler: solve thousands of orders with one request.
Binary DocValue support: speed up access to persisted data dramatically using binary doc values.
Boolean interpreter as post filter: enable Solr to filter documents with custom post filters using stored boolean expressions.
Mass data binary response format: use the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8.
Search components with custom JOIN algorithm: store data efficiently using our own JOIN implementation.
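
The combination of resolving many orders in one pass and filtering BOM lines with stored boolean expressions can be illustrated in plain Python. This is a deliberately simplified, single-level sketch and not the ZEBRA implementation: the part numbers, the `AUTO` transmission value, the `BomLine` layout and the `explode` function are all hypothetical, and the validity rules are Python lambdas standing in for stored boolean expressions.

```python
from collections import Counter
from typing import Callable

# Hypothetical BOM line: (part number, quantity, validity rule over order features).
BomLine = tuple[str, int, Callable[[dict], bool]]

bom: list[BomLine] = [
    ("WHEEL", 4, lambda order: True),
    ("GEARBOX_MECH", 1, lambda order: order["TRANSMISSION"] == "MECH"),
    ("GEARBOX_AUTO", 1, lambda order: order["TRANSMISSION"] == "AUTO"),
]


def explode(orders: list[dict], bom: list[BomLine]) -> Counter:
    """Sums part demand over all orders, keeping only BOM lines valid for each order."""
    demand: Counter = Counter()
    for order in orders:
        for part, quantity, is_valid in bom:
            if is_valid(order):
                demand[part] += quantity
    return demand


orders = [
    {"E_SERIES": "F10", "TRANSMISSION": "MECH"},
    {"E_SERIES": "F10", "TRANSMISSION": "AUTO"},
    {"E_SERIES": "E30", "TRANSMISSION": "MECH"},
]
demand = explode(orders, bom)
print(dict(demand))
```

For the three sample orders this yields a demand of 12 wheels, 2 manual gearboxes and 1 automatic gearbox.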

Low Level Optimizations can yield great boosts in performance
Development of the processing time of the Demand Calculation Service PoC (time to calculate the BOM for one order)
Measurement points: October 14, January 15, May 15, October 15
Values shown: 24 ms, 4.9 ms, 0.28 ms, 0.08 ms
Profiling result and some of the improvements that reduced the query time:
Scoring (-8%)
Default Query Parser (-25%)
Stat-Cache (-8%)
String DocValues (-28%)

Solr has become a powerful tool for building enterprise
and data analytics applications. Be creative!

Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
https://www.qaware.de
https://slideshare.net/MarioLeanderReimer/
https://speakerdeck.com/lreimer/
https://twitter.com/leanderreimer/
