Automotive Information Research driven by Apache Solr
Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
@LeanderReimer
Agenda
Reverse Data Engineering and Exploration with MIR
Aftersales Information Research with AIR
Architecture, Requirements, Challenges
Solutions for the Problem of Combinatorial Explosion
Data Consistency and Timeliness
BOM Explosions and Demand Forecasts with ZEBRA
Reverse Data Engineering and Exploration with MIR
How do we find the originating data silo for the desired data?
System A System B System C System D
Vehicle data
Other data
Where to find the vehicle data?
60 potential systems with 5000 entities.
How do we find the hidden relations between the systems?
How are the data items linked to each other?
400,000 potential relations.
Vehicle data
Other data
System A System B System C System D
Parts
Documents
Reverse Data Engineering and Analysis with MIR and Solr
MIR manages the meta information, data models and record descriptions of all our source systems (RDBMS, XML, SOAP, …).
MIR lets users navigate and search the metadata and drill into it easily using facets.
MIR also manages the target data model and the Solr schema description.
[Screenshot of MIR. Callouts: search results; tree view of systems, tables and attributes; drill-down via facets; wildcard search; potential synonyms found for the chassis number.]
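A minimal sketch of how a wildcard search with facet drill-down could be expressed as Solr request parameters. The parameters `q`, `rows`, `facet`, `facet.field` and `facet.mincount` are standard Solr; the field names `ATTRIBUTE_NAME`, `SYSTEM` and `TABLE_NAME` are hypothetical and not taken from the MIR schema.

```python
from urllib.parse import urlencode, parse_qs


def build_metadata_query(term, facet_fields, rows=20):
    """Build the query string for a faceted wildcard search in Solr."""
    params = [
        # ATTRIBUTE_NAME is a hypothetical field holding attribute names.
        ("q", f"ATTRIBUTE_NAME:*{term}*"),
        ("rows", str(rows)),
        ("facet", "true"),
        ("facet.mincount", "1"),
    ]
    # One facet.field parameter per field the user can drill into.
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


query_string = build_metadata_query("chassis", ["SYSTEM", "TABLE_NAME"])
print(query_string)
```

The resulting string would be appended to the `/select` URL of the metadata core. Leading wildcards can be expensive on large indexes, so this is only reasonable for a metadata index of modest size.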
Aftersales Information Research with AIR
Find the right information in less than 3 clicks.
The initial situation:
Users had to use up to 7 different applications for their daily work.
The systems were poorly integrated.
Finding the correct information was laborious and error-prone.
The project vision:
Combine the data into a consistent information network.
Make the information network and its data searchable and navigable.
Replace the existing applications with a single easy-to-use application.
"But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data."
- Anonymous IT person
Solr outperformed Oracle in query time as well as index size.
Test data set: 150,000 records
Disk space: 132 MB Solr vs. 385 MB Oracle
Query | Measured query times
Solr: INFO_TYPE:VEHICLE AND VIN:V* | 038 ms | 000 ms | 000 ms
Oracle: SELECT * FROM VEHICLE WHERE VIN LIKE 'V%' | 383 ms | 384 ms | 383 ms
Solr: INFO_TYPE:MEASURE AND TEXT:engine | 092 ms | 000 ms | 000 ms
Oracle: SELECT * FROM MEASURE WHERE TEXT='engine' | 389 ms | 387 ms | 386 ms
Solr: INFO_TYPE:VEHICLE AND VIN:*X* | 039 ms | 000 ms | 000 ms
Oracle: SELECT * FROM VEHICLE WHERE VIN LIKE '%X%' | 859 ms | 379 ms | 383 ms
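The SQL and Solr queries compared above differ mainly in their wildcard syntax. A minimal sketch of the mapping, assuming the usual SQL semantics (`%` for any run of characters, `_` for a single character) and the Lucene wildcard syntax (`*` and `?`); the helper name `like_to_solr` is made up for illustration.

```python
# Characters with a special meaning in the Lucene/Solr query syntax.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\/ ')


def like_to_solr(field, pattern):
    """Translate a SQL LIKE pattern into a Solr wildcard term query."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append("*")        # any run of characters
        elif ch == "_":
            out.append("?")        # exactly one character
        elif ch in SPECIAL_CHARS:
            out.append("\\" + ch)  # escape query syntax characters
        else:
            out.append(ch)
    return f"{field}:{''.join(out)}"


prefix_query = "INFO_TYPE:VEHICLE AND " + like_to_solr("VIN", "V%")
infix_query = "INFO_TYPE:VEHICLE AND " + like_to_solr("VIN", "%X%")
print(prefix_query)
print(infix_query)
```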
The dirt race use case:
• No internet connection
• Low-end devices
Solr and AIR on Raspberry Pi Model B as PoC worked like a charm!
Running Debian Linux + JDK 8
Jetty servlet container with the Solr and AIR web apps deployed
A reduced offline data set with ~1.5 million Solr documents
Model B hardware specs:
ARMv6 CPU, 700 MHz
512 MB RAM
32 GB SD card
And now try this with Oracle!
A careful schema design is crucial for your Solr performance.
Naive denormalization quickly leads to combinatorial explosion!
33,071,137 Vehicles
14,830,197 Flat Rate Units
1,678,667 Packages
5,078,411 FRU Groups
18,573 Repair Instructions
648,129 Technical Documents
55,000 Parts
648,129 Measures
41,385 Types
6,180 Fault Indications
Relationship
Navigation
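A rough sketch of why naive denormalization explodes. The entity counts are the ones listed above; the average fan-outs per vehicle are purely hypothetical and only serve to show how the document count multiplies when every combination becomes its own flat document.

```python
from math import prod

# Entity counts as listed on the slide.
ENTITY_COUNTS = {
    "Vehicles": 33_071_137,
    "Flat Rate Units": 14_830_197,
    "Packages": 1_678_667,
    "FRU Groups": 5_078_411,
    "Repair Instructions": 18_573,
    "Technical Documents": 648_129,
    "Parts": 55_000,
    "Measures": 648_129,
    "Types": 41_385,
    "Fault Indications": 6_180,
}

# Hypothetical average number of related entities per vehicle
# (assumption for illustration, NOT taken from the source).
ASSUMED_FAN_OUTS = {"Packages": 5, "Flat Rate Units": 20, "Technical Documents": 10}


def linked_document_count(entity_counts):
    """One Solr document per entity; relations are stored as references."""
    return sum(entity_counts.values())


def flattened_document_count(root_count, fan_outs):
    """One Solr document per combination of a root entity and its relations."""
    return root_count * prod(fan_outs)


linked = linked_document_count(ENTITY_COUNTS)
flattened = flattened_document_count(
    ENTITY_COUNTS["Vehicles"], ASSUMED_FAN_OUTS.values()
)
print(f"linked documents:    {linked:,}")
print(f"flattened documents: {flattened:,}")
```

Even with these modest assumed fan-outs, the flattened variant needs orders of magnitude more documents than one document per entity with navigable relations.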
Multi-valued fields can efficiently store 1..n relations, but may result in false positives.
{
"INFO_TYPE":"AWPOS_GROUP",
"NUMMER" :[ "1134190" , "1235590" ],
"BAUSTAND" :["1969-12-31T23:00:00Z","1975-12-31T23:00:00Z"],
"E_SERIES" :[ "F10" , "E30" ]
}
The values at index 0 belong together, as do the values at index 1. Solr does not know this, so both of the following filters match the document:
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10 (correct match, both values at index 0)
fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30 (false positive, values from index 0 and index 1)
If this is acceptable, post-filter the results in your application.
Alternative: current Solr versions support nested child documents. Use them instead.
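A minimal sketch of such an application-side post filter, assuming the multi-valued fields are position-aligned as in the example document. The function name `positional_match` is made up for illustration.

```python
def positional_match(doc, criteria):
    """Return True if a single array index satisfies all criteria at once."""
    fields = list(criteria)
    # Re-assemble the relation tuples: one row per array index.
    rows = zip(*(doc[field] for field in fields))
    return any(
        all(value == criteria[field] for field, value in zip(fields, row))
        for row in rows
    )


doc = {
    "INFO_TYPE": "AWPOS_GROUP",
    "NUMMER": ["1134190", "1235590"],
    "BAUSTAND": ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"],
    "E_SERIES": ["F10", "E30"],
}

true_hit = positional_match(doc, {"NUMMER": "1134190", "E_SERIES": "F10"})
false_positive = positional_match(doc, {"NUMMER": "1134190", "E_SERIES": "E30"})
print(true_hit, false_positive)
```

Solr returns the document for both filter queries; the post filter keeps it only when the requested values share the same index.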
Technical documents and their validity were expressed and stored
in a binary representation.
Validity expressions may have up to 46 characteristics
Validity expressions use 5 different boolean operators (AND, NOT, …)
Validity expressions can be nested and complex
Some characteristics are dynamic and not even known at index time
The solution: transform the validity expressions into the equivalent
ternary JavaScript terms and evaluate these terms at query time using
a custom function query filter.
Binary validity expression example.
Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'
Type(53088651) = 'E-Series', Value(53161483) = 'F10'
Type(64555275) = 'Transmission', Value(53161483) = 'MECH'
Transformation of the binary validity terms into their JavaScript
equivalent at index time.
AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH')
becomes
((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH'))
and is stored with the document:
{
"INFO_TYPE": "TECHNISCHES_DOKUMENT",
"DOKUMENT_TITEL": "Getriebe aus- und einbauen",
"DOKUMENT_ART": "reparaturanleitung",
"VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))",
"BRAND": ["BMW PKW"]
}
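A minimal sketch of such an index-time transformation. The nested tuple representation of the decoded validity expression and the operator set (AND, OR, NOT, EQ) are assumptions; the source only states that there are five boolean operators and does not show the decoder or the handling of unknown (ternary) values.

```python
# Mapping of n-ary boolean operators to their JavaScript counterparts.
JS_OPERATORS = {"AND": "&&", "OR": "||"}


def to_js_term(expr):
    """Render a nested validity expression as a JavaScript boolean term."""
    op = expr[0]
    if op == "EQ":
        _, name, value = expr
        # 'E-Series' becomes the identifier E_SERIES.
        field = name.upper().replace("-", "_")
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"({field}=='{escaped}')"
    if op == "NOT":
        return f"(!{to_js_term(expr[1])})"
    # AND / OR: join all child terms with the JavaScript operator.
    joined = JS_OPERATORS[op].join(to_js_term(child) for child in expr[1])
    return f"({joined})"


validity = ("AND", [
    ("EQ", "Brand", "BMW PKW"),
    ("EQ", "E-Series", "F10"),
    ("EQ", "Transmission", "MECH"),
])
js_term = to_js_term(validity)
print(js_term)
```

The generated string would be stored in the `VALIDITY` field of the document at index time.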
The JavaScript validity term is evaluated at query time using a
custom function query.
&fq=INFO_TYPE:TECHNISCHES_DOKUMENT
&fq=DOKUMENT_ART:reparaturanleitung
&fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}
jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU
1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bG
wsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkV
kJBVVJFSUhFIjoiWCcifQ==)
Base64 decode:
{
"BRAND":"BMW PKW",
"E_SERIES":"F10",
"TRANSMISSION":"MECH"
}
http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
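A minimal sketch of how a client could build this filter query. `jsTerm` is the custom function from the slide and `VALIDITY` the field holding the JavaScript term; the helper names and the exact JSON serialization are assumptions. In Solr, `cache=false` together with a `cost` of 100 or more asks a filter that supports it (such as `frange`) to run as a post filter.

```python
import base64
import json


def encode_context(context):
    """Serialize the vehicle context as compact JSON and Base64-encode it."""
    raw = json.dumps(context, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def validity_filter(context, cost=500):
    """Build the fq value that evaluates the stored term for this context."""
    return (
        "{!frange l=1 u=1 incl=true incu=true cache=false cost=%d}"
        "jsTerm(VALIDITY,%s)" % (cost, encode_context(context))
    )


vehicle = {"BRAND": "BMW PKW", "E_SERIES": "F10", "TRANSMISSION": "MECH"}
fq = validity_filter(vehicle)
print(fq)
```

Encoding the context keeps quotes, spaces and braces out of the function query syntax; the server-side function decodes it before evaluating the term.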
Custom ETL combined with Continuous Delivery and DevOps
ensure data consistency and timeliness.
BOM Explosions and Demand Forecasts with ZEBRA
Bills of Materials (BOMs) explained
BOMs are required for …
Production planning, demand forecasting, scenario-based planning, simulations
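A BOM explosion resolves a demand for a product into the dependent demands for its parts. A minimal, generic sketch with a made-up three-level BOM; it does not reflect the ZEBRA data model and ignores validity expressions.

```python
from collections import defaultdict


def explode(bom, part, quantity=1, demand=None):
    """Recursively resolve a part into the quantities of its leaf parts."""
    if demand is None:
        demand = defaultdict(int)
    children = bom.get(part, [])
    if not children:
        # Leaf part: this is an actual dependent demand.
        demand[part] += quantity
        return demand
    for child, per_parent in children:
        explode(bom, child, quantity * per_parent, demand)
    return demand


# Hypothetical BOM: parent -> list of (child, quantity per parent).
bom = {
    "car": [("engine", 1), ("wheel", 4)],
    "engine": [("piston", 4), ("bolt", 12)],
    "wheel": [("rim", 1), ("bolt", 5)],
}

demand = explode(bom, "car", quantity=2)
print(dict(demand))
```

An order for two cars thus turns into demands for pistons, bolts and rims; the bolt demand adds up across both the engine and the wheels.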
The Big Picture of ZEBRA
Parts / abstract demands
Orders / actual demands
BOMs / dependent demands
Demand Resolver
Production Planning
Analytics
7 million, 2 million, 21 billion
The most essential Solr optimizations in ZEBRA
Goal: computing BOM explosions
Bulk RequestHandler: solve thousands of orders with one request.
Binary DocValue support: speed up access to persisted data dramatically using binary doc values.
Boolean interpreter as post filter: enable Solr, via custom post filters, to filter documents using stored boolean expressions.
Mass data binary response format: use the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8.
Search components with custom JOIN algorithm: store data efficiently using our own JOIN implementation.
Low-level optimizations can yield great boosts in performance
Development of the processing time of the Demand Calculation Service PoC (time to calculate the BOM for one order):
October 14: 24 ms
January 15: 4.9 ms
May 15: 0.28 ms
October 15: 0.08 ms
Profiling result and some of the improvements to reduce the query time:
Scoring (-8%)
Default Query Parser (-25%)
Stat-Cache (-8%)
String DocValues (-28%)
Solr has become a powerful tool for building enterprise
and data analytics applications. Be creative!
Mario-Leander Reimer
Chief Technologist, QAware GmbH
mario-leander.reimer@qaware.de
https://www.qaware.de
https://slideshare.net/MarioLeanderReimer/
https://speakerdeck.com/lreimer/
https://twitter.com/leanderreimer/

 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Automotive Information Research driven by Apache Solr

  • 1. Automotive Information Research driven by Apache Solr Mario-Leander Reimer Chief Technologist, QAware GmbH mario-leander.reimer@qaware.de @LeanderReimer
  • 2. Agenda: Reverse Data Engineering and Exploration with MIR; Aftersales Information Research with AIR; Architecture, Requirements, Challenges; Solutions for the Problem of Combinatorial Explosion; Data Consistency and Timeliness; BOM Explosions and Demand Forecasts with ZEBRA
  • 3.
  • 4. Reverse Data Engineering and Exploration with MIR
  • 5. How do we find the originating data silo for the desired data? Where to find the vehicle data? 60 potential systems with 5000 entities. (Diagram: Systems A, B, C and D, holding vehicle data and other data.)
  • 6. How do we find the hidden relations between the systems? How is the data linked to each other? 400,000 potential relations. (Diagram: Systems A, B, C and D, holding vehicle data, parts, documents and other data.)
  • 7. Reverse Data Engineering and Analysis with MIR and Solr. MIR manages the meta information, data models and record descriptions for all our source systems (RDBMS, XML, SOAP, …). MIR allows users to navigate and search the metadata and to drill into it easily using facets. MIR also manages the target data model and the Solr schema description.
  • 8. (Screenshot annotations:) Search results; tree view of systems, tables and attributes; drill-down via facets; wildcard search; found potential synonyms for the chassis number.
  • 10. Find the right information in less than 3 clicks. The initial situation: Users had to use up to 7 different applications for their daily work. The systems were not well integrated. Finding the correct information was laborious and error-prone. The project vision: Combine the data into a consistent information network. Make the information network and its data searchable and navigable. Replace the existing applications with one easy-to-use application.
  • 11.
  • 12.
  • 13. "But Apache Solr is only a full-text search engine. You have to use an Oracle database for your application data." – Anonymous IT person
  • 14. Solr outperformed Oracle in query time as well as index size. Query pairs (SQL vs. Solr): SELECT * FROM VEHICLE WHERE VIN LIKE 'V%' vs. INFO_TYPE:VEHICLE AND VIN:V*; SELECT * FROM MEASURE WHERE TEXT='engine' vs. INFO_TYPE:MEASURE AND TEXT:engine; SELECT * FROM VEHICLE WHERE VIN LIKE '%X%' vs. INFO_TYPE:VEHICLE AND VIN:*X*. Measured times in the order listed on the slide, one row per query: 038 ms, 000 ms, 000 ms, 383 ms, 384 ms, 383 ms; 092 ms, 000 ms, 000 ms, 389 ms, 387 ms, 386 ms; 039 ms, 000 ms, 000 ms, 859 ms, 379 ms, 383 ms. Disk space: 132 MB Solr vs. 385 MB Oracle. Test data set: 150,000 records.
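
The Solr side of the comparison on slide 14 can be sent as a plain HTTP request. The following is a minimal Python sketch of how such a request URL might be built. The host, port and core name `air` are assumptions for illustration and do not come from the talk; `q`, `rows` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical Solr core location; not taken from the talk.
SOLR_CORE_URL = "http://localhost:8983/solr/air"


def solr_select_url(core_url, query, rows=10):
    """Build a /select URL for a Lucene-syntax query string."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{core_url}/select?{urlencode(params)}"


# Prefix query from slide 14, the counterpart of the SQL prefix search on VIN.
PREFIX_QUERY = "INFO_TYPE:VEHICLE AND VIN:V*"
url = solr_select_url(SOLR_CORE_URL, PREFIX_QUERY)
print(url)
```

`urlencode` takes care of escaping the colon, the spaces and the wildcard, so the query string can be written exactly as it appears on the slide.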
  • 15. The dirt race use case: no internet connection, low-end devices.
  • 16. Solr and AIR on a Raspberry Pi Model B as a PoC worked like a charm! Running Debian Linux + JDK 8. Jetty servlet container with the Solr and AIR web apps deployed. A reduced offline data set with ~1.5 million Solr documents. Model B hardware specs: ARMv6 CPU at 700 MHz, 512 MB RAM, 32 GB SD card. And now try this with Oracle!
  • 17. A careful schema design is crucial for your Solr performance.
  • 18. Naive denormalization quickly leads to combinatorial explosion! Relationship navigation across: 33,071,137 Vehicles; 14,830,197 Flat Rate Units; 1,678,667 Packages; 5,078,411 FRU Groups; 18,573 Repair Instructions; 648,129 Technical Documents; 55,000 Parts; 648,129 Measures; 41,385 Types; 6,180 Fault Indications.
  • 19. Multi-valued typed fields can efficiently store 1..n relations, but may result in false positives. { "INFO_TYPE": "AWPOS_GROUP", "NUMMER": ["1134190", "1235590"], "BAUSTAND": ["1969-12-31T23:00:00Z", "1975-12-31T23:00:00Z"], "E_SERIES": ["F10", "E30"] } The values at index 0 belong together, as do the values at index 1. fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:F10 is a correct match (both values at index 0). fq=INFO_TYPE:AWPOS_GROUP AND NUMMER:1134190 AND E_SERIES:E30 also matches, although the values sit at different indexes (false positive). If false positives in the Solr result are acceptable, post-filter the results in your application. Alternative: current Solr versions support nested child documents. Use them instead.
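
The false positive on slide 19 and the suggested application-side post filter can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes the application receives the stored multi-valued fields in their original order, so that values at the same index belong together.

```python
def matches_multivalued(doc, nummer, e_series):
    """Mimic the Solr filter: each multi-valued field is checked on its own."""
    return nummer in doc["NUMMER"] and e_series in doc["E_SERIES"]


def matches_aligned(doc, nummer, e_series):
    """Post filter: both values must occur at the same index."""
    pairs = zip(doc["NUMMER"], doc["E_SERIES"])
    return any(n == nummer and e == e_series for n, e in pairs)


# Document from slide 19 (BAUSTAND omitted for brevity).
doc = {
    "INFO_TYPE": "AWPOS_GROUP",
    "NUMMER": ["1134190", "1235590"],
    "E_SERIES": ["F10", "E30"],
}

# Solr-style matching accepts the mixed-index combination ...
false_positive = matches_multivalued(doc, "1134190", "E30")
# ... while the index-aligned post filter rejects it.
rejected = not matches_aligned(doc, "1134190", "E30")
kept = matches_aligned(doc, "1134190", "F10")
print(false_positive, rejected, kept)
```

The post filter only needs to run on the documents Solr already returned, so its cost scales with the result size rather than the index size.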
  • 20. Technical documents and their validity were expressed and stored in a binary representation. Validity expressions may have up to 46 characteristics. Validity expressions use 5 different boolean operators (AND, NOT, …). Validity expressions can be nested and complex. Some characteristics are dynamic and not even known at index time. The solution: transform the validity expressions into the equivalent ternary JavaScript terms and evaluate these terms at query time using a custom function query filter.
  • 21. Binary validity expression example. Type(53078923) = 'Brand', Value(53086475) = 'BMW PKW'; Type(53088651) = 'E-Series', Value(53161483) = 'F10'; Type(64555275) = 'Transmission', Value(53161483) = 'MECH'
  • 22. Transformation of the binary validity terms into their JavaScript equivalent at index time. AND(Brand='BMW PKW', E-Series='F10', Transmission='MECH') becomes ((BRAND=='BMW PKW')&&(E_SERIES=='F10')&&(TRANSMISSION=='MECH')). Indexed document: { "INFO_TYPE": "TECHNISCHES_DOKUMENT", "DOKUMENT_TITEL": "Getriebe aus- und einbauen", "DOKUMENT_ART": "reparaturanleitung", "VALIDITY": "((BRAND=='BMW PKW')&&((E_SERIES=='F10')&&(...))", "BRAND": ["BMW PKW"] }
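
The index-time transformation on slide 22 can be illustrated with a small recursive renderer. This is a sketch under assumptions, not the project's actual ETL code: the nested-tuple input format and the operator names `EQ`, `AND`, `OR` and `NOT` are hypothetical stand-ins for the decoded binary validity expression.

```python
def to_js_term(expr):
    """Render a nested validity expression as a JavaScript boolean term."""
    op = expr[0]
    if op == "EQ":
        _, name, value = expr
        # Escape backslashes and single quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"({name}=='{escaped}')"
    if op == "NOT":
        return f"(!{to_js_term(expr[1])})"
    if op in ("AND", "OR"):
        joiner = "&&" if op == "AND" else "||"
        return "(" + joiner.join(to_js_term(e) for e in expr[1:]) + ")"
    raise ValueError(f"unsupported operator: {op}")


# The example from slide 22 in the assumed tuple format.
validity = (
    "AND",
    ("EQ", "BRAND", "BMW PKW"),
    ("EQ", "E_SERIES", "F10"),
    ("EQ", "TRANSMISSION", "MECH"),
)
js_term = to_js_term(validity)
print(js_term)
```

The resulting string is what would be stored in the `VALIDITY` field of the indexed document.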
  • 23. The JavaScript validity term is evaluated at query time using a custom function query. &fq=INFO_TYPE:TECHNISCHES_DOKUMENT &fq=DOKUMENT_ART:reparaturanleitung &fq={!frange l=1 u=1 incl=true incu=true cache=false cost=500}jsTerm(VALIDITY,eyJNT1RPUl9LUkFGVFNUT0ZGQVJUX01PVE9SQVJCRUlUU1ZFUkZBSFJFTiI6IkIiLCJFX01BU0NISU5FX0tSQUZUU1RPRkZBUlQiOm51bGwsIlNJQ0hFUkhFSVRTRkFIUlpFVUciOiIwIiwiQU5UUklFQiI6IkFXRCIsIkVkJBVVJFSUhFIjoiWCcifQ==) The second argument is the Base64-encoded vehicle context. Base64 decode: { "BRAND":"BMW PKW", "E_SERIES":"F10", "TRANSMISSION":"MECH" } http://qaware.blogspot.de/2014/11/how-to-write-postfilter-for-solr-49.html
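
The query-time step on slide 23 can be mimicked outside Solr to show the data flow. The sketch below is an assumption-laden illustration in Python: the real `jsTerm` function evaluates a JavaScript term inside Solr, whereas this version evaluates a hypothetical nested-tuple expression, and all function names are invented for the example. It returns 1 for a valid document and 0 otherwise, which is the value range the `{!frange l=1 u=1}` filter selects on.

```python
import base64
import json


def encode_context(context):
    """Serialize the vehicle context to Base64-encoded JSON."""
    raw = json.dumps(context).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_context(token):
    """Inverse of encode_context."""
    return json.loads(base64.b64decode(token).decode("utf-8"))


def evaluate(expr, context):
    """Evaluate a nested validity expression against a vehicle context."""
    op = expr[0]
    if op == "EQ":
        # A characteristic missing from the context counts as a mismatch here.
        return context.get(expr[1]) == expr[2]
    if op == "NOT":
        return not evaluate(expr[1], context)
    if op == "AND":
        return all(evaluate(e, context) for e in expr[1:])
    if op == "OR":
        return any(evaluate(e, context) for e in expr[1:])
    raise ValueError(f"unsupported operator: {op}")


def validity_score(expr, token):
    """Return 1 if the document is valid for the encoded context, else 0."""
    return 1 if evaluate(expr, decode_context(token)) else 0


validity = (
    "AND",
    ("EQ", "BRAND", "BMW PKW"),
    ("EQ", "E_SERIES", "F10"),
    ("EQ", "TRANSMISSION", "MECH"),
)
token = encode_context(
    {"BRAND": "BMW PKW", "E_SERIES": "F10", "TRANSMISSION": "MECH"}
)
print(validity_score(validity, token))
```

Passing the context as a single encoded argument keeps the filter query free of characters that would otherwise need escaping in Solr's local-params syntax.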
  • 24. Custom ETL combined with Continuous Delivery and DevOps ensures data consistency and timeliness.
  • 25. BOM Explosions and Demand Forecasts with ZEBRA
  • 26. Bills of Materials (BOMs) explained
  • 27. BOMs are required for: production planning, demand forecasting, scenario-based planning, simulations.
  • 28. The Big Picture of ZEBRA: Parts / abstract demands; Orders / actual demands; BOMs / dependent demands; Demand Resolver; Analytics; Production Planning. Volumes shown on the slide: 7 million, 2 million, 21 billion.
  • 29. The most essential Solr optimizations in ZEBRA. Bulk RequestHandler: solving thousands of orders with one request. Search components with custom JOIN algorithm: being able to store data effectively using our own JOIN implementation. Binary DocValue support: speed up the access to persisted data dramatically using binary doc values. Mass data binary response format: use the standard Solr binary codec with an optimized data model that reduces the amount of data by a factor of 8. Boolean interpreter as postfilter (computing BOM explosions): enable Solr with custom post filters to filter documents using stored boolean expressions.
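
The idea behind slide 29, filtering BOM lines by stored boolean expressions and resolving many orders in one pass, can be sketched in plain Python. Everything here is hypothetical: the part names, the option codes, the tuple-based condition format and the function names are invented for illustration, and the real ZEBRA logic runs inside Solr as custom components.

```python
from collections import Counter


def holds(condition, options):
    """Evaluate a stored boolean condition against an order's option set."""
    op = condition[0]
    if op == "HAS":
        return condition[1] in options
    if op == "NOT":
        return not holds(condition[1], options)
    if op == "AND":
        return all(holds(c, options) for c in condition[1:])
    if op == "OR":
        return any(holds(c, options) for c in condition[1:])
    raise ValueError(f"unsupported operator: {op}")


def explode(options, bom_lines):
    """Compute the part demand of one order by filtering the BOM lines."""
    demand = Counter()
    for part, quantity, condition in bom_lines:
        if holds(condition, options):
            demand[part] += quantity
    return demand


def explode_bulk(orders, bom_lines):
    """Resolve many orders in one call and sum up the part demand."""
    total = Counter()
    for options in orders:
        total.update(explode(options, bom_lines))
    return total


# Hypothetical BOM lines: (part, quantity, condition on the order options).
BOM_LINES = [
    ("wheel", 4, ("HAS", "BASE")),
    ("sunroof_motor", 1, ("HAS", "SUNROOF")),
    ("manual_gearbox", 1, ("AND", ("HAS", "BASE"), ("NOT", ("HAS", "AUTOMATIC")))),
]
ORDERS = [{"BASE", "SUNROOF"}, {"BASE", "AUTOMATIC"}]

total_demand = explode_bulk(ORDERS, BOM_LINES)
print(dict(total_demand))
```

Evaluating the conditions next to the data, as the post filter does in Solr, avoids shipping every candidate BOM line to the client just to discard most of them.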
  • 30. Low-level optimizations can yield great boosts in performance. Development of the processing time (time to calculate the BOM for one order) in the Demand Calculation Service PoC: chart with data points for October 14, January 15, May 15 and October 15; values shown on the slide: 4.9 ms, 0.28 ms, 24 ms, 0.08 ms. Profiling result and some of the improvements made to reduce the query time: Scoring (-8%), Default Query Parser (-25%), Stat-Cache (-8%), String DocValues (-28%).
  • 31. Solr has become a powerful tool for building enterprise and data analytics applications. Be creative!
  • 32. Mario-Leander Reimer, Chief Technologist, QAware GmbH. mario-leander.reimer@qaware.de https://www.qaware.de https://slideshare.net/MarioLeanderReimer/ https://speakerdeck.com/lreimer/ https://twitter.com/leanderreimer/