Introduction to Big Data and how FIWARE manages it through different approaches. Differences between the Apache Flink and Apache Spark approaches. Introduction to the FIWARE connectors for managing NGSI context information. Brief introduction to Machine Learning with FIWARE technology.
1. Session 9 - Big Data and Machine Learning with FIWARE
Fernando López, Cloud & Platform Senior Expert
fernando.lopez@fiware.org
@flopezaguilar
FIWARE Foundation, e.V.
2. Learning Goals
● Introduction to Big Data
● Differences between Apache Flink and Spark
● FIWARE connectors
● (Work in Progress) Machine Learning in FIWARE
12. Features
▪ The Cosmos Generic Enabler simplifies Big Data analysis of context data and
integrates with some of the most popular Big Data platforms.
▪ Batch Processing
▪ Stream Processing (Real-time)
▪ Direct data ingestion
▪ Direct connection with Context Broker
▪ Multiple Sinks
13. Apache Flink
▪ Framework and distributed processing engine for stateful computations over unbounded
and bounded data streams.
▪ Designed to run in all common cluster environments, perform computations at in-memory
speed and at any scale.
16. OrionSource
▪ Receives notifications from the Orion Context Broker on a given port.
▪ The received data is a stream of NgsiEvent objects.
val eventStream = env.addSource(new OrionSource(9001))
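For OrionSource to receive anything, the Context Broker has to be told where to send its notifications, which in NGSI v2 is done with a subscription. The sketch below only builds the JSON body of such a subscription and prints it; it does not send it. The Orion host is the one that appears in URL_CB in the basic example, while the name SubscriptionSketch, the helper subscriptionBody and the host flink-taskmanager are assumptions made for illustration.

```scala
object SubscriptionSketch {
  // Orion host taken from URL_CB in the basic example; /v2/subscriptions is the NGSI v2 subscriptions resource.
  val subscriptionsUrl = "http://flinkexample_orion_1:1026/v2/subscriptions"

  // Builds the JSON body of an NGSI v2 subscription that matches every entity
  // and notifies `notifyUrl`, including only the listed attributes.
  def subscriptionBody(notifyUrl: String, attrs: Seq[String]): String = {
    val attrList = attrs.map(a => "\"" + a + "\"").mkString(", ")
    s"""{
       |  "description": "Notify the Flink job of context changes",
       |  "subject": { "entities": [ { "idPattern": ".*" } ] },
       |  "notification": {
       |    "http": { "url": "$notifyUrl" },
       |    "attrs": [ $attrList ]
       |  }
       |}""".stripMargin
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical host name of the machine where OrionSource listens on port 9001.
    val body = subscriptionBody("http://flink-taskmanager:9001", Seq("temperature"))
    println(s"POST $subscriptionsUrl")
    println(body)
  }
}
```

Posting this body with Content-Type application/json to the subscriptions URL would make Orion deliver matching changes to port 9001, where the source turns them into the event stream.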
17. OrionSink
▪ Sends data back to the Orion Context Broker.
▪ Takes a stream of OrionSinkObject instances as input, each with:
• content: Message content in String format. If it is JSON, it needs to be stringified.
• url: URL to which the message should be sent
• contentType: HTTP content type of the message (JSON, Plain)
• method: HTTP method of the message (POST, PUT, PATCH)
OrionSink.addSink( processedDataStream )
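To make the four fields concrete, here is a minimal sketch of one message that updates an attribute of an entity. The types below are local stand-ins that mirror the fields listed above, not the connector's own classes, and the entity id Room1, the attribute name temperature_min and the helper attrUpdate are assumptions made for illustration.

```scala
// Stand-ins mirroring the fields listed on the slide; NOT the connector's own classes.
object ContentType extends Enumeration { val JSON, Plain = Value }
object HTTPMethod extends Enumeration { val POST, PUT, PATCH = Value }

final case class OrionSinkObject(
  content: String,                 // stringified message body
  url: String,                     // where the message is sent
  contentType: ContentType.Value,  // JSON or Plain
  method: HTTPMethod.Value         // POST, PUT or PATCH
)

object OrionSinkObjectSketch {
  // Stringifies an NGSI v2 attribute update by hand (hypothetical helper).
  def attrUpdate(name: String, value: Float): String =
    s"""{"$name": {"value": $value, "type": "Float"}}"""

  def main(args: Array[String]): Unit = {
    val msg = OrionSinkObject(
      content = attrUpdate("temperature_min", 21.5f),
      // Room1 is a hypothetical entity id; the host matches URL_CB in the basic example.
      url = "http://flinkexample_orion_1:1026/v2/entities/Room1/attrs",
      contentType = ContentType.JSON,
      method = HTTPMethod.POST
    )
    println(s"${msg.method} ${msg.url}")
    println(msg.content)
  }
}
```

In NGSI v2, a POST to /v2/entities/{id}/attrs updates or appends the attributes given in the body, which is why the basic example builds its URL from URL_CB, the entity id and "/attrs".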
18. Basic example
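// Imports are not shown on the slide; these are assumed from the Flink Scala API and the connector's package
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time
import org.fiware.cosmos.orion.flink.connector._

final val URL_CB = "http://flinkexample_orion_1:1026/v2/entities/"
final val CONTENT_TYPE = ContentType.JSON
final val METHOD = HTTPMethod.POST

def main(args: Array[String]): Unit = {
  val env = StreamExecutionEnvironment.getExecutionEnvironment
  // Create Orion Source. Receive notifications on port 9001
  val eventStream = env.addSource(new OrionSource(9001))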
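  // Process event stream
  val processedDataStream = eventStream
    .flatMap(event => event.entities)
    .map(entity => {
      // Read the "temperature" attribute as a Float
      val temp = entity.attrs("temperature").value.asInstanceOf[Number].floatValue()
      // Temp_Node is defined elsewhere in the example (not shown on the slides)
      new Temp_Node(entity.id, temp)
    })
    .keyBy("id")
    // Sliding window: 5 seconds long, evaluated every 2 seconds
    .timeWindow(Time.seconds(5), Time.seconds(2))
    .min("temperature")
    .map(tempNode => {
      val url = URL_CB + tempNode.id + "/attrs"
      OrionSinkObject(tempNode.toString, url, CONTENT_TYPE, METHOD)
    })
  // Add Orion Sink
  OrionSink.addSink(processedDataStream)
  // The slide is cut off here; a Flink job also needs execute() to start (job name assumed)
  env.execute("Basic example")
}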
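The keyBy, timeWindow and min steps in the pipeline above can be hard to picture, so here is a minimal sketch of the same idea on a plain Scala collection, without Flink. It assumes window starts aligned to multiples of the slide and explicit timestamps in milliseconds; the names Reading and slidingMin are hypothetical, and the real job relies on Flink's own window assigner and time handling rather than this code.

```scala
// Hypothetical stand-in for a keyed temperature reading with an explicit timestamp.
final case class Reading(id: String, timestampMs: Long, temperature: Float)

object SlidingMinSketch {
  // Assigns each reading to every window [start, start + sizeMs) that contains it,
  // with window starts aligned to multiples of slideMs, then keeps the minimum
  // temperature per (id, window start).
  def slidingMin(readings: Seq[Reading], sizeMs: Long, slideMs: Long): Map[(String, Long), Float] = {
    val assigned = for {
      r <- readings
      // Latest window start that is not after the reading's timestamp
      lastStart = r.timestampMs - Math.floorMod(r.timestampMs, slideMs)
      // Walk back one slide at a time while the window still covers the reading
      start <- lastStart to (r.timestampMs - sizeMs + 1) by (-slideMs)
    } yield ((r.id, start), r.temperature)

    assigned
      .groupBy { case (key, _) => key }
      .map { case (key, values) => key -> values.map(_._2).reduce((a, b) => math.min(a, b)) }
  }

  def main(args: Array[String]): Unit = {
    val readings = Seq(
      Reading("Room1", 1000L, 22f),
      Reading("Room1", 3000L, 20f),
      Reading("Room1", 6000L, 25f)
    )
    // 5-second windows sliding every 2 seconds, as in the basic example
    slidingMin(readings, 5000L, 2000L).toSeq.sortBy(_._1).foreach(println)
  }
}
```

With these three readings, the window starting at 0 ms covers the readings at 1000 ms and 3000 ms, so its minimum is 20.0. Because windows overlap, each reading contributes to more than one result, which is also why the Flink job emits a new minimum every 2 seconds.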
29. Machine Learning Algorithms
▪ Some solutions have high algorithmic complexity
▪ Some can be parallelized in a cluster (FlinkML)
▪ Others can use a GPU (e.g. TensorFlow)
▪ Even though each case may be different, we try to set up a generic life cycle.
30. ML Standard Solution
▪ Each problem requires an analysis of which ML algorithm suits our data.
▪ Later, the training dataset needs to be set up.
▪ Each problem may be slightly different (“same same but different”).
▪ We can provide some solutions for some cases and use a proper dataset.
▪ The tool to use (Spark, Flink, TensorFlow) depends on the chosen ML algorithm (not every
ML algorithm is available in every framework).
31. Current Status
Orion Connector
Orion Source/Receiver + Orion Sink ✔ ✔
RTD Documentation ✔ ✔
Unit Tests ✔ ✔
Examples ✔ ✔
Step-by-step tutorial ✔
Support NGSI LD
33. Summary: Terms
● OSV, Output Slot Vector.
● BI, Business Intelligence.
● HDFS, Hadoop Distributed File System
● DAG, Directed Acyclic Graph. The DAG defines the dataflow of the application, and the vertices of the
graph define the operations to be performed on the data.
34. References
▪ FIWARE Catalogue
• https://www.fiware.org/developers/catalogue
▪ FIWARE Academy:
• https://fiware-academy.readthedocs.io/en/latest/processing/wirecloud
▪ Installation, administration & reference documentation is available on Read The Docs:
• https://fiware-cosmos-flink.readthedocs.io