Session 1 and 2 "Challenges and Opportunities with Big Linked Data Visualization" tutorial @ISWC 2018

CHALLENGES AND
OPPORTUNITIES WITH BIG
LINKED DATA VISUALIZATION
Laura Po
“Enzo Ferrari” Engineering Department
University of Modena and Reggio Emilia
ITALY
laura.po@unimore.it
Download the slides at
https://sites.google.com/view/tutorial-iswc-2018/materials

INTRO
• Staggering growth in the production/consumption of Linked Open Data (LOD)
• Increasingly large dimension of the datasets
• Datasets get continuously updated with newer versions
• Exploring, visualizing and analysing Big Linked Data (BLD) is a core task for a variety of users in
numerous scenarios.

VISUALIZATION AS A POWERFUL
TOOL
Visualization for…
• visually presenting the internal structure in the data
• showing the relationship between the data
• allowing the users to identify any unreasonable, incorrect or duplicate data and links
in the Linked Data

THE LOD CLOUD
The LOD CLOUD:
• Linked Open Data (LOD) are publicly available
RDF data on the Web, identifiable via URIs and
accessible via HTTP; each dataset in the cloud
contains more than 1,000 triples
1,224 datasets [lod-cloud.net 2018]
> 28 billion unique triples [ISWC 2017]
http://lod-cloud.net/

PRE-REQUISITES
• Some basic knowledge of Linked Data
• Uniform Resource Identifiers (URIs)
• the Hypertext Transfer Protocol (HTTP)
• the Resource Description Framework (RDF)
• RDF Schema.
• Knowledge of the SPARQL Protocol and the SPARQL Query Language is not mandatory
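
As a rough illustration of how the pre-requisites fit together, the sketch below builds (but does not send) a SPARQL Protocol request: a query is passed to an endpoint over HTTP GET in the `query` parameter. The endpoint URL and the counting query are assumptions chosen for illustration; they are not taken from the tutorial material.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed example endpoint; any SPARQL endpoint URL could be used here.
ENDPOINT = "https://dbpedia.org/sparql"

# Illustrative query: count all triples visible at the endpoint.
QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_sparql_get_url(endpoint, query):
    """Return the URL of a SPARQL Protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, QUERY)

# Round trip: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == QUERY)
```

The URL can then be fetched with any HTTP client; large endpoints may reject or time out on a full triple count, so this is only a sketch of the request shape.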

AT THE END …
You will be able
• to get started with your own experiments on the LOD Cloud
• to select the most appropriate tool for a defined type of analysis
… be aware
• of the open issues and challenging problems that remain unsolved in the scenario
of the exploration of Big Linked Data

WHAT WILL NOT BE COVERED
• Data Visualization is a broader topic
• dataviz.tools and datavizcatalogue list a large number of visualization tools, libraries and
resources
Data Visualization
BOLD Visualization

SCHEDULE OF THE TUTORIAL
• Session 1: The exploration of Big Linked Data (15 min)
• Session 2: Big Linked Data tools for visualization, exploration and navigation (25 min)
• Session 3: Hands-on-session on exploration of Linked Data by using online tools (30 min)
** COFFEE BREAK 15.20-16.00 **
• Session 3 (continued): Hands-on-session on exploration of Linked Data by using online tools (40 min)
• Session 4: Closing and Free Discussion (20 min)
All slides and references are available at the tutorial website

SESSION 1: THE EXPLORATION OF BIG LINKED
DATA

Exploring LOD is not exploring your own dataset
You do not know the dataset
You do not know if the dataset is relevant for you

ISSUES
1. Large size and the dynamic nature of data
2. Exploratory search
3. Variety of tasks and users

LARGE SIZE & DYNAMIC DATASETS
Examples
• DBpedia - 6 million triples in English - 7 billion RDF triples in total
• BBC Music - 27 billion triples (http://lod.openlinksw.com)
• Linked Geo Data - 400 million geographic elements - 20 billion triples
(http://linkedgeodata.org)
• PubMed - 186 million concepts - 1.3 billion triples (http://pubmed.bio2rdf.org)
• and many others…

LARGE SIZE DATASETS
• Problems with
• Load /Memory
• Navigation
• Visualization

Users do not know exactly what they are
searching for

EXPLORATION-DRIVEN SETTING
Lookup search ≠ Exploratory search
• Lookup search - focused searches where the user has a specific goal
in mind and an idea of the expected result
• “Exploratory search (ES) is performed whenever a user wants to discover a
domain, increase his knowledge, learn about new topics, etc.” [Marie 2014 bis]
• ES is open-ended, with an unclear information need, a search with
multiple targets

VARIETY OF USERS
• An increasingly large number of diverse users
• politicians, citizens, researchers, decision makers, practitioners
• Different preferences and skills
• A plethora of different scenarios
A tool that does not require technical skills can also be useful for domain or
technology experts

IMPACT
High potential value of OPEN DATA
• the economic impact of open data has a value of € 140
billion a year between direct and indirect effects [EU
Commission 2011]
• the social impact of open data: increasing
transparency, and enhancing public services, creating new
opportunities for citizens and organizations
[http://odimpact.org]
• Big Data can introduce innovative solutions through the
development of data driven infrastructures and
applications.
OPEN + LINKED + BIG

WHAT DO WE NEED TO EXPLORE BOLD?
• Provide a glimpse of the dataset
• Implement the exploratory search
• Encourage user comprehension
• Offer customization capabilities to different user-defined scenarios
• Deal with large datasets
• Highlight the evolution over time of the dataset
• Provide multiple visual perspectives (foster discovery of patterns using different views)
• Allow a panoramic and specific view on demand over the data
• Provide real-time response and progressive results - partial and preferably representative results, as
soon as possible
• …

SESSION 2: BIG LINKED DATA TOOLS FOR
VISUALIZATION, EXPLORATION AND
NAVIGATION

EVOLUTION OVER TIME
[Figure: timeline of Linked Data visualization tools, starting from the DBpedia first version (September)]
• Tool categories shown: Linked Data browsers, Linked Data Exploration Systems, Linked Data Graph Tools,
Ontology Visualization Systems
• Tools shown: Disco, VizBoard, Rhizomer, SemLens, LOD Viewer, Payola, SynopsViz, H-BOLD, Lodlive,
LODWheel, Balloon synopsis, LDVizWiz, Aemoo, Fenfire, Gephi, graphVizdb, LODeX, Vis Wizard, RelFinder,
ViziQuer, CropCircles, FlexViz, GLOW, OntoGraf, OntoTrix, OWLViz, VOWL 2, Explorator, Marbles, Tabulator,
gFacet
• Milestones and surveys shown:
• Definition of Linked Data Aesthetics in Interface Design for Linked Data [Mazumdar]
• Big linked data visualization tool survey [Bikakis]
• Surveys on visualising Linked Data [Dadzie]
• Exploratory search surveys [Marie 2014, Palagi 2017]

IN THE BEGINNING WAS…
LINKED DATA BROWSERS
• Linked Data browsers provide the functionality for link
navigation and representation of Web of Data (WoD) resources
and their properties; browsers such as Disco,
Tabulator or Explorator allow users to navigate
the graph structures and display property-value
pairs in tables.
• They provide a view of a subject, or a set of
subjects and their properties, but no
additional support for getting a broader view of
the dataset being explored.

GENERIC EXPLORATION SYSTEMS
• support different types of data
• provide different types of visualization
• Tree Maps, Graphs, Diagrams …
• visual scalability: most systems do not adopt
approximation techniques such as sampling,
filtering or aggregation.
• exceptions are SynopsViz and VizBoard, which
exploit external memory at runtime
Payola
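
The three approximation techniques named above (sampling, filtering, aggregation) can be sketched on a toy triple set. This is only an illustration under stated assumptions: the triples, prefixes and function names are hypothetical and are not taken from any of the systems mentioned.

```python
import random
from collections import Counter

# Hypothetical toy triples (subject, predicate, object), for illustration only.
TRIPLES = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:bob", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:carol", "rdf:type", "foaf:Person"),
    ("ex:carol", "foaf:name", "Carol"),
]


def sample_triples(triples, k, seed=0):
    """Sampling: keep at most k randomly chosen triples."""
    rng = random.Random(seed)
    return rng.sample(triples, min(k, len(triples)))


def filter_by_predicate(triples, predicate):
    """Filtering: keep only the triples that use the given predicate."""
    return [t for t in triples if t[1] == predicate]


def aggregate_by_predicate(triples):
    """Aggregation: replace individual triples by a count per predicate."""
    return Counter(p for _, p, _ in triples)


print(sample_triples(TRIPLES, 3))
print(filter_by_predicate(TRIPLES, "foaf:knows"))
print(aggregate_by_predicate(TRIPLES))
```

Each function reduces the number of visual elements to draw: sampling and filtering shrink the set of triples, while aggregation replaces it with a summary.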

GRAPH BASED TOOLS
• A large number of systems visualize
LOD adopting a graph-based (a.k.a.,
node-link) approach.
• Some systems provide keyword search
functionality or mechanisms for data
filtering.
H-BOLD

ONTOLOGY VISUALIZATION
SYSTEMS
• The problems of ontology
visualization and exploration have
been extensively studied in several
research areas (e.g., biology,
chemistry)
• Some graph-based ontology
visualization systems have been
developed in the LOD context
VOWL2

DOMAIN / DEVICE SPECIFIC
VISUALIZATION SYSTEMS
• Several systems focus on visualizing and
exploring geo-spatial data.
• For example the LinkedGeoData
Browser [Auer 2009, Stadler 2012] is a
faceted browser and editor derived from
Open Street Map.
• DBpedia Atlas [Valsecchi 2015] offers
exploration over the DBpedia dataset by
exploiting the dataset’s spatial data.
DBpedia Atlas

DOMAIN SPECIFIC LOD VISUALIZER
• A visualization system for the
linked biomedical data to exhibit
the relationships among targets,
compounds, and diseases.
• Repository of biomedical data:
Open PHACTS

SCALABILITY ISSUE
In order to handle large graphs
• hierarchical aggregation approaches - the graph is recursively decomposed into
smaller subgroups [Archambault 2007, Auber 2004, Tong 2013, Li 2015];
• Clustering/Partitioning techniques/Hierarchy of levels of abstraction
• edge grouping techniques – aggregate the edges of the graph into bundles [Cui
2008, Gansner 2011]
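
A minimal sketch of hierarchical aggregation, assuming a naive recursive bisection of the sorted node list as a stand-in for a real clustering or partitioning algorithm (the cited approaches use more sophisticated decompositions). Node names, edges and function names are hypothetical.

```python
from collections import Counter


def build_hierarchy(nodes, max_leaf=2):
    """Recursively split the node set until each group has at most max_leaf nodes."""
    nodes = sorted(nodes)
    if len(nodes) <= max_leaf:
        return {"nodes": nodes, "children": []}
    mid = len(nodes) // 2
    return {
        "nodes": nodes,
        "children": [
            build_hierarchy(nodes[:mid], max_leaf),
            build_hierarchy(nodes[mid:], max_leaf),
        ],
    }


def super_edges(edges, groups):
    """Count the edges that connect two different groups (one 'super edge' per group pair)."""
    index = {n: i for i, group in enumerate(groups) for n in group}
    counts = Counter()
    for u, v in edges:
        gu, gv = index[u], index[v]
        if gu != gv:
            counts[(min(gu, gv), max(gu, gv))] += 1
    return counts


# Hypothetical toy graph.
NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("a", "e")]

root = build_hierarchy(NODES)
top_groups = [set(child["nodes"]) for child in root["children"]]
print(top_groups)
print(super_edges(EDGES, top_groups))
```

At the top level the viewer would draw only the two groups and one weighted super edge between them; expanding a group reveals its children, so the number of elements on screen stays small at every level.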
In order to show on-the-fly results as soon as possible
• progressive techniques - The results/visual elements are computed/constructed
incrementally based on user interaction or as time progresses [Bikakis 2017], also using
incremental and approximate techniques
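
The progressive idea can be sketched as a generator that emits a partial result after every chunk of input, so that a view can be refreshed before the whole dataset has been read. The chunk size, the toy triples and the per-predicate count are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import islice


def progressive_counts(triples, chunk_size):
    """Yield running per-predicate counts after each chunk of triples."""
    counts = Counter()
    it = iter(triples)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        counts.update(p for _, p, _ in chunk)
        # Each yielded snapshot is a partial result that is already usable.
        yield dict(counts)


# Hypothetical toy triples.
DATA = [
    ("ex:a", "rdf:type", "ex:City"),
    ("ex:a", "rdfs:label", "A"),
    ("ex:b", "rdf:type", "ex:City"),
    ("ex:b", "rdfs:label", "B"),
    ("ex:c", "rdf:type", "ex:City"),
]

for snapshot in progressive_counts(DATA, chunk_size=2):
    print(snapshot)
```

The last snapshot equals the exact result; the earlier ones are approximations that become available sooner.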

BIG DATA VISUALIZATION TOOLS
Modern visualization and exploration systems should effectively and efficiently handle the
following aspects
• Real-time Interaction. Efficient and scalable techniques should support the interaction with
billion-object datasets, while maintaining the system response in the range of a few
milliseconds.
• On-the-fly Processing. Support of on-the-fly visualizations over large and dynamic sets of
volatile raw (i.e., not preprocessed) data is required.
• Visual Scalability. Provision of effective data abstraction mechanisms is necessary for
addressing problems related to visual information overloading (a.k.a. overplotting).
• User Assistance and Personalization. Encouraging user comprehension and offering
customization capabilities to different user-defined exploration scenarios and preferences
according to the analysis needs are important.
[Bikakis 2018]
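
One simple data abstraction mechanism against overplotting is spatial binning: instead of drawing every point, the view draws one mark per grid cell, sized or coloured by the number of points it contains. The sketch below assumes 2D points and a fixed cell size; it is an illustration, not the method of any specific system.

```python
from collections import Counter


def bin_points(points, cell_size):
    """Map each (x, y) point to a grid cell and count the points per cell."""
    return Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points
    )


# Hypothetical toy points.
POINTS = [(0.1, 0.2), (0.4, 0.9), (1.5, 0.1), (1.7, 0.3), (2.2, 2.8)]

cells = bin_points(POINTS, cell_size=1.0)
print(cells)
```

Five points collapse into three cells here; on a large dataset the number of marks is bounded by the number of cells rather than by the number of objects.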
