Flink Case Study: OKKAM
1. A Semantic Big Data Companion
Stefano Bortoli
bortoli@okkam.it
Flavio Pompermaier
pompermaier@okkam.it
2. The company (briefly)
• Okkam is
– an SME based in Trento, Italy
– started as a spin-off of the
University of Trento and FBK (2010)
• Okkam core business is
– large-scale data integration using
semantic technologies and
an Entity Name System
• Okkam operative sectors
– Services for public administration
– Services for restaurants (and more)
– Research projects
• FP7, H2020, and Local agencies
3. Who we are
• Stefano Bortoli, PhD
– works as technical director and researcher at Okkam S.R.L.
(Trento, Italy). His research and development interests are in the
area of Information Integration, with a special focus on entity-centric
applications exploiting semantic technologies.
• Flavio Pompermaier, MSc.
– works as senior software engineer at Okkam S.R.L. (Trento, Italy).
Flavio is a passionate developer working with state-of-the-art
technologies, combining semantic and big data technologies.
5. Why we need Flink
• Entiton data model: database record + RDF statement
• Storage: triplestore, NoSQL store & index (vs. an expensive data warehouse)
• Each quad carries: subject local IRI, subject ENS IRI, predicate, object, object type, RDF type, provenance IRI
[diagram: Entiton data model]
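As a rough illustration, the Entiton quad sketched on this slide can be modeled as a plain record. This is only a sketch reconstructed from the slide's labels; every field name and IRI below is hypothetical, assuming a quad that pairs the subject's local IRI with its global ENS identifier and carries object type, RDF type, and provenance alongside the usual subject/predicate/object:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EntitonQuad:
    """One statement in an Entiton: an RDF quad enriched with entity
    identifiers and type information (field names are illustrative)."""
    subject_local_iri: str    # subject IRI within the local dataset
    subject_ens_iri: str      # global subject IRI from the Entity Name System
    predicate: str            # property IRI
    obj: str                  # object value (IRI or literal)
    obj_type: Optional[str]   # datatype or class of the object
    rdf_type: Optional[str]   # RDF type of the subject
    provenance_iri: str       # named graph / source of the statement

# Hypothetical statement about a restaurant entity (all IRIs invented)
q = EntitonQuad(
    subject_local_iri="http://data.example.org/restaurant/42",
    subject_ens_iri="http://example.org/ens/entity-123",
    predicate="http://xmlns.com/foaf/0.1/name",
    obj="Trattoria Example",
    obj_type="http://www.w3.org/2001/XMLSchema#string",
    rdf_type="http://schema.org/Restaurant",
    provenance_iri="http://data.example.org/graph/source-a",
)
```

Because each quad keeps both the local IRI and the ENS IRI, statements from different sources about the same real-world entity can be grouped without a centralized warehouse schema.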
6. Why we are here
• We want to build and manage (very) large
entity-centric knowledge bases
• We have used Flink as our data
processing framework since its Stratosphere days (during the DOPA FP7 project)
• Our use cases for Apache Flink:
– Domain reasoning (Flink + Parquet + Thrift)
– RDF data lifecycle (Flink + Parquet + Jena/Sesame)
– RDF data intelligence (Flink + ELKiBi)
– Duplicate record detection (Flink + HBase + Solr)
– Entiton Record linkage (Flink + MongoDB + Kryo)
– Telemetry analysis (Flink + MongoDB + Weka)
7. Come to our session!
• We are the last to present, don’t leave us ALONE!
• We are hiring! (maybe ;-)