Big Data and Machine Learning with FIWARE

Session 9 - Big Data and Machine Learning with FIWARE
Fernando López, Cloud & Platform Senior Expert
fernando.lopez@fiware.org
@flopezaguilar
FIWARE Foundation, e.V.

Learning Goals
● Introduction to Big Data
● Differences between Apache Flink and Apache Spark
● FIWARE connectors
● (Work in Progress) Machine Learning in FIWARE

Big Data Analytics
[Chart: Big Data technologies placed by size of the data handled (per second) against time to act]
▪ Axis "Time to Act": milliseconds, seconds, minutes, hours, days
▪ Axis "Size of the Data Handled (per second)": 100 events (10 KB/s), 1k events (1 MB/s), 100k events (100 MB/s)
▪ Indexed Storage (RDBMS, Apache Solr)
▪ Interactive Processing (e.g. Drill, BigQuery, OLAP)
▪ MapReduce (e.g. Spark, Hadoop)
▪ Realtime Analytics (CEP, Stream Processing)
▪ In-Memory Computing (e.g. Spark, SAP Hana, VoltDB)

NGSI-LD
Based on
Source: https://docbox.etsi.org/ISG/CIM/Open/NGSI-LD_introduction.pdf
https://www.webfirst.com/services/open-data-solutions

ETL architecture
Source: https://www.red-gate.com/simple-talk/sql/database-delivery/database-lifecycle-management-for-etl-systems/

Kappa architecture
Source: Siddharth Mittal

Simple Smart solutions: Reference Architecture
[Diagram: reference architecture; components shown: Draco, Kurento, Wirecloud, QuantumLeap, Knowage, Flink, CrateDB]

FIWARE Cosmos: Orion Flink Connector

Features
▪ The Cosmos Generic Enabler makes Big Data analysis of context data easier by integrating
with some of the most popular Big Data platforms.
▪ Batch Processing
▪ Stream Processing (Real-time)
▪ Direct data ingestion
▪ Direct connection with Context Broker
▪ Multiple Sinks

Apache Flink
▪ Framework and distributed processing engine for stateful computations over unbounded
and bounded data streams.
▪ Designed to run in all common cluster environments, perform computations at in-memory
speed and at any scale.

Connection
[Diagram: the Orion Context Broker and a Flink cluster running a Flink job (JAR) that embeds the orion-flink-connector]
▪ Orion Context Broker → OrionSource: HTTP POST (Notification)
▪ OrionSink → Orion Context Broker: HTTP POST/PUT/PATCH

OrionSource
▪ Receives data from the Orion Context Broker on a given port.
▪ The received data is a stream of NgsiEvent objects.
val eventStream = env.addSource(new OrionSource(9001))
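
For notifications to reach that port, Orion needs a subscription whose notification URL points at the host running the job. The following is a minimal sketch of creating one through the NGSI v2 `/v2/subscriptions` endpoint, using only the JDK. The object name, the entity type `Room` and the host `flink-taskmanager` used in the note below are assumptions for illustration and do not come from the slides.

```scala
object OrionSubscriptionSketch {
  // Builds the JSON body of an NGSI v2 subscription that asks Orion to notify
  // `notifyUrl` whenever `attr` changes on any entity of type `entityType`.
  def subscriptionBody(entityType: String, attr: String, notifyUrl: String): String =
    s"""{
       |  "description": "Notify the Flink job of $attr changes",
       |  "subject": {
       |    "entities": [{ "idPattern": ".*", "type": "$entityType" }],
       |    "condition": { "attrs": ["$attr"] }
       |  },
       |  "notification": {
       |    "http": { "url": "$notifyUrl" },
       |    "attrs": ["$attr"]
       |  }
       |}""".stripMargin

  // Posts the body to Orion and returns the HTTP status code
  // (Orion answers 201 Created when the subscription is accepted).
  def createSubscription(orionBaseUrl: String, body: String): Int = {
    val conn = new java.net.URL(orionBaseUrl + "/v2/subscriptions")
      .openConnection().asInstanceOf[java.net.HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val out = conn.getOutputStream
    try out.write(body.getBytes("UTF-8")) finally out.close()
    conn.getResponseCode
  }
}
```

A call such as `OrionSubscriptionSketch.subscriptionBody("Room", "temperature", "http://flink-taskmanager:9001")` would produce a body matching the port used by `new OrionSource(9001)`; the host name has to be reachable from the Orion container.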

OrionSink
▪ Sends data back to the Orion Context Broker.
▪ Takes a stream of OrionSinkObject items as input, each with the following fields:
• content: Message content in String format. If it is a JSON, it needs to be stringified.
• url: URL to which the message should be sent
• contentType: Type of HTTP content of the message (JSON, Plain)
• method: HTTP method of the message (POST, PUT, PATCH)
OrionSink.addSink( processedDataStream )
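
The four fields can be illustrated without the connector on the classpath. The sketch below uses a stand-in case class, `SinkMessage`, which is hypothetical and is not the library's `OrionSinkObject`; it only shows how the stringified JSON content and the target URL for an attribute update might be assembled. The base URL and entity ID are placeholders.

```scala
object OrionSinkMessageSketch {
  // Stand-in for the connector's OrionSinkObject, with plain strings for
  // contentType and method. This is NOT the library class.
  final case class SinkMessage(content: String, url: String, contentType: String, method: String)

  // Builds a message that writes `value` into attribute `attr` of entity `entityId`.
  def attributeUpdate(orionBaseUrl: String, entityId: String, attr: String, value: Double): SinkMessage = {
    // The content must already be a JSON string ("stringified").
    val content = s"""{"$attr": {"value": $value, "type": "Number"}}"""
    // NGSI v2: POST on /v2/entities/{id}/attrs updates or appends attributes.
    val url = s"$orionBaseUrl/v2/entities/$entityId/attrs"
    SinkMessage(content, url, "application/json", "POST")
  }
}
```

With the real connector, the same content and URL would be passed to `OrionSinkObject` together with `ContentType.JSON` and `HTTPMethod.POST`, as in the basic example.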

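Basic example
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
// Connector classes: OrionSource, OrionSink, OrionSinkObject, ContentType, HTTPMethod
import org.fiware.cosmos.orion.flink.connector._

// The wrapper object name is illustrative; the slides show only its members.
object BasicExample {
  final val URL_CB = "http://flinkexample_orion_1:1026/v2/entities/"
  final val CONTENT_TYPE = ContentType.JSON
  final val METHOD = HTTPMethod.POST

  // Assumed definition (not shown on the slides): one temperature reading per entity.
  // toString yields the JSON body that is sent back to Orion.
  case class Temp_Node(id: String, temperature: Float) extends Serializable {
    override def toString: String =
      "{\"temperature_min\": {\"value\": " + temperature + ", \"type\": \"Float\"}}"
  }

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // Create Orion Source. Receive notifications on port 9001
    val eventStream = env.addSource(new OrionSource(9001))
    // Process event stream
    val processedDataStream = eventStream
      .flatMap(event => event.entities)
      .map(entity => {
        val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
        new Temp_Node(entity.id, temp)
      })
      .keyBy("id")
      // Sliding window: 5 seconds long, evaluated every 2 seconds
      .timeWindow(Time.seconds(5), Time.seconds(2))
      .min("temperature")
      .map(tempNode => {
        val url = URL_CB + tempNode.id + "/attrs"
        OrionSinkObject(tempNode.toString, url, CONTENT_TYPE, METHOD)
      })
    // Add Orion Sink
    OrionSink.addSink(processedDataStream)
    // … (elided on the slide; a Flink job only starts once env.execute is called)
  }
}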
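
The sliding window in the example (5 seconds long, advancing every 2 seconds) means each reading contributes to more than one window. The plain-Scala sketch below illustrates that assignment and the per-key minimum without a Flink cluster. It uses explicit timestamps in milliseconds, whereas the job above relies on Flink's own clock, and all names in it are made up for this illustration.

```scala
object SlidingWindowSketch {
  // Start times (ms) of all sliding windows of length `size` and step `slide`
  // that contain `timestamp`. Windows are half-open: [start, start + size).
  def windowStarts(timestamp: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = timestamp - (timestamp % slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(start => start > timestamp - size)
      .toList
  }

  // Minimum temperature per (entity id, window start), mirroring
  // keyBy("id") followed by a sliding window and min("temperature").
  def minPerWindow(readings: Seq[(String, Long, Float)],
                   size: Long, slide: Long): Map[(String, Long), Float] =
    readings
      .flatMap { case (id, ts, temp) =>
        windowStarts(ts, size, slide).map(start => ((id, start), temp))
      }
      .groupBy(_._1)
      .map { case (key, values) => key -> values.map(_._2).min }
}
```

For instance, a reading at 7000 ms falls into the windows starting at 6000 ms and 4000 ms, while a reading at 6500 ms also falls into the one starting at 2000 ms.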

FIWARE Cosmos: Orion Spark Connector

Spark Scheduler
[Diagram: a DAG of datasets A to G built with map, groupBy, union and join, split into Stages 1 to 3; the legend marks cached data partitions]
▪ Dryad-like DAGs
▪ Pipelines functions within a stage
▪ Cache-aware work reuse & locality
▪ Partitioning-aware to avoid shuffles

Motivation of Spark
▪ Iterative algorithms (machine learning, graphs)
▪ Interactive data mining tools (R, Excel, Python)

Connection
[Diagram: the Orion Context Broker and a Spark cluster running a Spark job (JAR) that embeds the orion-spark-connector]
▪ Orion Context Broker → OrionReceiver: HTTP POST (Notification)
▪ OrionSink → Orion Context Broker: HTTP POST/PUT/PATCH

Machine Learning Development Lifecycle

Machine Learning Algorithms
▪ Some solutions have high algorithmic complexity.
▪ Some can be parallelized in a cluster (FlinkML).
▪ Others can use GPUs (e.g. TensorFlow).
▪ Even though each case may be different, we try to set up a generic life cycle.

ML Standard Solution
▪ Each problem requires an analysis of which ML algorithm suits our data.
▪ Later, the training dataset needs to be set up.
▪ Each problem may be slightly different (“same same but different”).
▪ We can provide some solutions for some cases and use a proper dataset.
▪ The tool to use (Spark, Flink, TensorFlow) depends on the chosen ML algorithm, since not
every ML algorithm is available on every platform.

Current Status
Orion Connector
Orion Source/Receiver + Orion Sink ✔ ✔
RTD Documentation ✔ ✔
Unit Tests ✔ ✔
Examples ✔ ✔
Step-by-step tutorial ✔
Support NGSI LD

Summary: Terms
● OLAP, Online Analytical Processing.
● OLTP, Online Transaction Processing.
● RDBMS, Relational Database Management System.
● ETL, Extract, Transform, Load.
● ERP, Enterprise Resource Planning.
● CRM, Customer Relationship Management.

● OSV, Output Slot Vector.
● BI, Business Intelligence.
● HDFS, Hadoop Distributed File System.
● DAG, Directed Acyclic Graph. The DAG defines the dataflow of the application, and the vertices of the
graph define the operations that are to be performed on the data.

References
▪ FIWARE Catalogue
• https://www.fiware.org/developers/catalogue
▪ FIWARE Academy:
• https://fiware-academy.readthedocs.io/en/latest/processing/wirecloud
▪ Installation, administration & reference documentation is available on Read The Docs:
• https://fiware-cosmos-flink.readthedocs.io
▪ GitHub
• https://github.com/ging/fiware-cosmos-orion-flink-connector
• https://github.com/ging/fiware-cosmos-orion-spark-connector
• https://github.com/ging/fiware-cosmos-orion-flink-connector-examples
• https://github.com/ging/fiware-cosmos-orion-spark-connector-examples

Question & Answer
fiware-tech-help@lists.fiware.org
