"Challenges and Opportunities with Big Linked Data Visualization" tutorial @ISWC 2018
The author has also published a book on the topic:
"Linked Data Visualization: Techniques, Tools and Big Data"
Laura Po, Nikos Bikakis, Federico Desimoni & George Papastefanatos
Synthesis Lectures on Data, Semantics and Knowledge
Morgan & Claypool, 2020
ISBN: 9781681737256 | 9781681737263 (ebook)
DOI: 10.2200/S00967ED1V01Y201911WBE019
Morgan & Claypool: https://www.morganclaypool.com/doi/abs/10.2200/S00967ED1V01Y201911WBE019
Homepage: http://www.linkeddatavisualization.com
Session 1 and 2 "Challenges and Opportunities with Big Linked Data Visualization" tutorial @ISWC 2018
1. CHALLENGES AND
OPPORTUNITIES WITH BIG
LINKED DATA VISUALIZATION
Laura Po
‘‘Enzo Ferrari’’ Engineering Department
University of Modena and Reggio Emilia
ITALY
laura.po@unimore.it
Slides are available for download at
https://sites.google.com/view/tutorial-iswc-2018/materials
2. INTRO
• Staggering growth in the production/consumption of Linked Open Data (LOD)
• Increasingly large dimension of the datasets
• Datasets get continuously updated with newer versions
• Exploring, visualizing and analysing BLD is a core task for a variety of users in
numerous scenarios.
3. VISUALIZATION AS A POWERFUL
TOOL
Visualization for…
• visually presenting the internal structure in the data
• showing the relationship between the data
• allowing the users to identify any unreasonable, incorrect or duplicate data and links
in the Linked Data
4. THE LOD CLOUD
The LOD CLOUD:
• Linked Open Data (LOD) are publicly available
RDF data on the Web, identifiable via URIs and
accessible via HTTP; datasets in the cloud
contain more than 1000 triples
1,224 datasets [lod-cloud.net 2018]
> 28 billion unique triples [ISWC 2017]
http://lod-cloud.net/
6. PRE-REQUISITES
• Some basic knowledge of Linked Data
• Uniform Resource Identifiers (URIs)
• the Hypertext Transfer Protocol (HTTP)
• the Resource Description Framework (RDF)
• RDF Schema.
• Knowledge of the SPARQL Protocol and the SPARQL Query Language is not mandatory
7. AT THE END …
You will be able
• to get started with your own experiments on the LOD Cloud
• to select the most appropriate tool for a defined type of analysis
… be aware
• of the open issues and challenging problems that remain unsolved in the scenario
of the exploration of Big Linked Data
8. WHAT WILL NOT BE COVERED
• Data Visualization is a broader topic
• dataviz.tools and datavizcatalogue list a large number of visualization tools, libraries and
resources
Data Visualization
BOLD Visualization
9. SCHEDULE OF THE TUTORIAL
• Session 1: The exploration of Big Linked Data (15 min)
• Session 2: Big Linked Data tools for visualization, exploration and navigation (25 min)
• Session 3: Hands-on session on exploration of Linked Data by using online tools (30 min)
** COFFEE BREAK 15.20-16.00 **
• Session 3: Hands-on session on exploration of Linked Data by using online tools (40 min)
• Session 4: Closing and free discussion (20 min)
All slides and references are available at the tutorial website
11. Exploring LOD is not exploring your own dataset
You do not know the dataset
You do not know if the dataset is relevant for you
12. ISSUES
1. Large size and the dynamic nature of data
2. Exploratory search
3. Variety of tasks and users
13. LARGE SIZE & DYNAMIC DATASETS
Examples
• DBpedia - 6 million triples in English - 7 billion RDF triples in total
• BBC Music - 27 billion triples (http://lod.openlinksw.com)
• Linked Geo Data - 400 million geographic elements - 20 billion triples
(http://linkedgeodata.org)
• PubMed - 186 million concepts - 1.3 billion triples (http://pubmed.bio2rdf.org)
• and many others…
16. users do not know
what exactly they are
searching for
17. EXPLORATION-DRIVEN SETTING
≠
Lookup search - focused searches
where the user has a specific goal
in mind and an idea of the
expected result
“Exploratory search (ES) is performed
whenever a user wants to discover a
domain, increase his knowledge,
learn about new topics, etc.” [Marie
2014 bis]
ES is open-ended, with an unclear
information need, a search with
multiple targets
18. VARIETY OF USERS
• An increasingly large number of diverse users
• politicians, citizens, researchers, decision makers, practitioners
• Different preferences and skills
• A plethora of different scenarios
A tool that does not require technical skills can also be useful for domain or
technology experts
19. IMPACT
High potential value of OPEN DATA
• the economic impact of open data has a value of € 140
billion a year between direct and indirect effects [EU
Commission 2011]
• the social impact of open data: increasing
transparency, and enhancing public services, creating new
opportunities for citizens and organizations
[http://odimpact.org ]
• Big Data can introduce innovative solutions through the
development of data driven infrastructures and
applications.
OPEN +
LINKED
+
BIG
20. WHAT DO WE NEED TO EXPLORE BOLD?
• Provide a glimpse of the dataset
• Implement the exploratory search
• Encourage user comprehension
• Offer customization capabilities for different user-defined scenarios
• Deal with large datasets
• Highlight the evolution over time of the dataset
• Provide multiple visual perspectives (foster discovery of patterns using different views)
• Allow a panoramic and specific view on demand over the data
• Provide real-time response and progressive results - partial and preferably representative results, as
soon as possible
• …
21. SESSION 2: BIG LINKED DATA TOOLS FOR
VISUALIZATION, EXPLORATION AND
NAVIGATION
22. Disco Linked Data browsers
VizBoard
Rhizomer
SemLens Linked Data Exploration Systems
LOD Viewer
Payola
Linked Data Graph Tools
Definition of Linked Data Aesthetics in Interface Design for Linked Data [Mazumdar]
SynopsViz
H-BOLD
Lodlive
LODWheel
Balloon synopsis
LDVizWiz
Aemoo
Fenfire
Gephi
graphVizdb
LODeX
Vis Wizard
RelFinder
ViziQuer
Ontology Visualization Systems
CropCircles FlexViz GLOW
OntoGraf
OntoTrix
OWLViz
VOWL 2
Explorator
Marbles
Tabulator
gFacet
EVOLUTION OVER TIME
DBpedia first version (September)
Big linked data visualization tool survey [Bikakis]
Surveys on visualising Linked Data [Dadzie]
Exploratory search surveys [Marie 2014, Palagi 2017]
23. IN THE BEGINNING WAS…
LINKED DATA BROWSERS
• Linked Data browsers provide the functionality for
link navigation and representation of WoD resources
and their properties; browsers such as Disco,
Tabulator or Explorator allow users to navigate
the graph structures and display property-value
pairs in tables.
• They provide a view of a subject, or a set of
subjects and their properties, but no
additional support for getting a broader view of
the dataset being explored.
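The browsing model described above can be sketched in a few lines. This is only an illustrative sketch, not how Disco, Tabulator or Explorator are implemented: the `ex:` resources, the sample triples and the function names are all hypothetical.

```python
# Hypothetical sample triples (subject, property, value); not from a real dataset.
TRIPLES = [
    ("ex:Modena", "rdf:type", "ex:City"),
    ("ex:Modena", "ex:country", "ex:Italy"),
    ("ex:Modena", "rdfs:label", "Modena"),
    ("ex:Italy", "rdf:type", "ex:Country"),
]


def describe(subject, triples):
    """Return the (property, value) pairs of one subject, sorted for display."""
    return sorted((p, o) for s, p, o in triples if s == subject)


def render_table(subject, triples):
    """Format the property-value pairs of a subject as a plain-text table."""
    rows = describe(subject, triples)
    width = max((len(p) for p, _ in rows), default=0)
    lines = [f"Resource: {subject}"]
    lines += [f"{p.ljust(width)} | {o}" for p, o in rows]
    return "\n".join(lines)


def outgoing_links(subject, triples):
    """Values that are themselves described in the data, i.e. links a user could follow."""
    subjects = {s for s, _, _ in triples}
    return [o for _, o in describe(subject, triples) if o in subjects]


if __name__ == "__main__":
    print(render_table("ex:Modena", TRIPLES))
    print("Follow:", outgoing_links("ex:Modena", TRIPLES))
```

The sketch also shows the limitation named in the second bullet: each call only looks at one subject at a time, so nothing here summarizes the dataset as a whole.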
24. GENERIC EXPLORATION SYSTEMS
• support different types of data
• provide different types of visualization
• Tree Maps, Graphs, Diagrams …
• visual scalability: most systems do not adopt
approximation techniques such as sampling,
filtering or aggregation.
• exceptions are SynopsViz and VizBoard, which
exploit external memory at runtime
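A minimal sketch of the three approximation techniques named above (sampling, filtering, aggregation), assuming the data fits in a Python list. The function names are hypothetical and do not correspond to the API of any of the listed systems.

```python
import random
from collections import Counter


def sample(values, k, seed=0):
    """Sampling: keep at most k randomly chosen items."""
    values = list(values)
    if len(values) <= k:
        return values
    return random.Random(seed).sample(values, k)


def keep_matching(triples, prop):
    """Filtering: keep only the triples that use the given property."""
    return [t for t in triples if t[1] == prop]


def histogram(values, bins):
    """Aggregation: count values in equal-width bins instead of drawing each one."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    counts = Counter(min(int((v - lo) / span * bins), bins - 1) for v in values)
    return [counts.get(i, 0) for i in range(bins)]
```

In each case the number of visual elements is bounded by a parameter (`k` or `bins`) rather than by the size of the dataset, which is the point of visual scalability.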
Payola
25. GRAPH BASED TOOLS
• A large number of systems visualize
LOD adopting a graph-based (a.k.a.,
node-link) approach.
• Some systems provide keyword search
functionality or mechanisms for data
filtering.
H-BOLD
26. ONTOLOGY VISUALIZATION
SYSTEMS
• The problems of ontology
visualization and exploration have
been extensively studied in several
research areas (e.g., biology,
chemistry).
• Some graph-based ontology
visualization systems have been
developed in the LOD context
VOWL2
27. DOMAIN / DEVICE SPECIFIC
VISUALIZATION SYSTEMS
• Several systems focus on visualizing and
exploring geo-spatial data.
• For example, the LinkedGeoData
Browser [Auer 2009, Stadler 2012] is a
faceted browser and editor derived from
OpenStreetMap.
• DBpedia Atlas [Valsecchi 2015] offers
exploration over the DBpedia dataset by
exploiting the dataset’s spatial data.
DBpedia Atlas
28. DOMAIN SPECIFIC LOD VISUALIZER
• A visualization system for the
linked biomedical data to exhibit
the relationships among targets,
compounds, and diseases.
• Repository of biomedical data:
Open PHACTS
29. SCALABILITY ISSUE
In order to handle large graphs
• hierarchical aggregation approaches - the graph is recursively decomposed into
smaller subgroups [Archambault 2007, Auber 2004, Tong 2013, Li 2015];
• Clustering/Partitioning techniques/Hierarchy of levels of abstraction
• edge grouping techniques – aggregate the edges of the graph into bundles [Cui
2008, Gansner 2011]
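One level of such an aggregation can be sketched as follows. This is an illustrative sketch only: it assumes the grouping of nodes is already given (for example by class or by a clustering step), and it is not the algorithm of any of the cited papers.

```python
from collections import Counter


def collapse(edges, group_of):
    """Build one coarser level: nodes become groups, parallel edges are merged and counted."""
    weights = Counter()
    for u, v in edges:
        gu, gv = group_of[u], group_of[v]
        if gu != gv:  # edges inside a group are hidden at this level
            weights[tuple(sorted((gu, gv)))] += 1
    return dict(weights)


# Hypothetical toy graph and grouping.
EDGES = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
GROUP_OF = {"a": "G1", "b": "G1", "c": "G2", "d": "G2", "e": "G3"}

if __name__ == "__main__":
    print(collapse(EDGES, GROUP_OF))
```

Applying `collapse` again, with a mapping from groups to larger groups, yields the next level of abstraction. The merged edge counts play a role similar to edge bundles: many edges are drawn as one weighted line.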
In order to show on-the-fly results as soon as possible
• progressive techniques - The results/visual elements are computed/constructed
incrementally based on user interaction or as time progresses [Bikakis 2017], also using
incremental and approximate techniques
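The progressive idea can be illustrated with a running aggregate. This is a minimal sketch, assuming the data is a list processed in chunks; `progressive_mean` is a hypothetical name and the approach is a generic one, not the specific method of [Bikakis 2017].

```python
def progressive_mean(values, chunk_size):
    """Yield (items_seen, running_mean) after each chunk of the input."""
    total, seen = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += sum(chunk)
        seen += len(chunk)
        yield seen, total / seen


if __name__ == "__main__":
    for seen, estimate in progressive_mean([1, 2, 3, 4], chunk_size=2):
        # A real tool would redraw the chart here and let the user stop early.
        print(seen, estimate)
```

Because the function is a generator, the caller can stop consuming results at any time and keep the latest approximate value, without waiting for the exact one.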
30. BIG DATA VISUALIZATION TOOLS
Modern visualization and exploration systems should effectively and efficiently handle the
following aspects
• Real-time Interaction. Efficient and scalable techniques should support the interaction with
billion-object datasets, while maintaining the system response in the range of a few
milliseconds.
• On-the-fly Processing. Support of on-the-fly visualizations over large and dynamic sets of
volatile raw (i.e., not preprocessed) data is required.
• Visual Scalability. Provision of effective data abstraction mechanisms is necessary for
addressing problems related to visual information overloading (a.k.a. overplotting).
• User Assistance and Personalization. Encouraging user comprehension and offering
customization capabilities to different user-defined exploration scenarios and preferences
according to the analysis needs are important
[Bikakis 2018]
Editor's Notes
Today, we are witnessing a staggering growth in the production and consumption of Linked Open Data (LOD) and the generation of increasingly large datasets.
In this scenario, it is crucial to provide intuitive tools for researchers, domain experts, but also businessmen and citizens to view and interact with LOD resources.
Linked Data already spans a wide range of application areas, a strong indication that its potential value is already largely acknowledged
Representing, querying, and visualizing linked data is crucial.
High potential of LOD WHO
The lack of development environments for interdisciplinary research conducted on large-scale datasets hampers research at every stage. Projects incur large startup costs as disparate infrastructure is assembled; experimentation slows when software components and environment are mismatched for specific research tasks; and findings are disseminated in forms that are hard to examine, learn from, and reuse. Behind these problems is a common cause — the lack of good tools. When large, heterogeneous and distributed data is added to the equation, further frustration, at the least, ensues. As a result using existing platforms, the programmers of 21st century interactive visualizations are reduced to working in the same fashion with the same tools as 20th century database programmers. Our contribution is to bring the tools of digital artists to bear on the aforementioned data analysis and visualization challenges. Here we report on the current state of progress in adapting Field for large-scale, web-based scientific data analysis and visualization with an emphasis on
visualization can be a reasonable way to visually present the internal structure in the data and the relationship between the data; friendly visualization interfaces allow the users to identify any unreasonable, incorrect or duplicate data and links in the Linked Data
Only dataset with >1000 triples and
We will not take into consideration visual exploration tools that are not specialized for LOD, such as Tableau, Qlik …
What remains unsolved
---- 7 min without going into detail
10 min
temporal dimension of linked data is crucial
DBpedia is a leading project for publishing LD started by individuals at the Free University of Berlin and Leipzig
University in cooperation with OpenLink Software
The dynamic nature of today's data (e.g., streaming data),
hinders the application of a preprocessing phase, such as traditional database
loading and indexing. Hence, systems should provide on-the-fly processing
over large sets of raw data.
with limited computational and memory resources
(e.g., laptops).
The mapping activity of the ontology enrichment process along with the editing of the ten most active mapping language communities is depicted in Figure 6. It is interesting to notice that the high mapping activity peaks coincide wi
users attempt to find something interesting without knowing what exactly they are searching for
Searching within a LOD dataset is not just about finding an answer to a specific question
Progressiveness can significantly improve efficiency in exploration scenarios,
users perform a sequence of operations (e.g., queries), where the result
of each operation determines the formulation of the next operation
In each operation, after inspecting the results produced so far, the user is able to interrupt the execution and define the next operation, without waiting for the exact result to be computed.
ES is a particular information seeking activity.
It is a loosely defined concept as its definition is not stable and continues to evolve every time new systems are being developed.
Many papers use this dichotomy to define ES
ES is described as open-ended and ill-structured, with an unclear information need. This search activity is evolving and can occur over time. For example, a user wants to know more about Senegal: she does not really know what kind of information she wants or what she will discover in this search session; she only knows she wants to learn more about that topic.
“learning in exploratory search is not only about memorization of salient
facts, but rather the development of higher-level intellectual capabilities” [White 2016]
The main goal in ES is learning.
with the exponential increase in data sets we can only assume that the variety of users and profiles will increase
Data plays a fundamental role in all aspects of human activity and social interest.
DATA on a socio-economic level
the impact of open data and technology-enabled transparency does not lie solely in the economic sphere. Government openness produces tremendous other benefits for our societies through increasing state or institutional responsiveness, reducing levels of corruption, building new democratic spaces for citizens, empowering local and disadvantaged voices or enhancing service delivery and effective service utilization.
Big Linked Data can introduce innovative solutions in the public and private sectors, through the development of data driven infrastructures and applications.
-
differences from version to version
Take into account the human cognition model
Provide a direct interaction (interaction to be provided without interfering with the user’s train of thought);
the best known systems in each category are shown
In this second session, we describe the state of the art of Linked Data visualization systems, with particular attention to those tools able to navigate vast amounts of data [Dadzie 2011, Marie 2014]. We start by describing generic systems and then focus on graph-oriented systems; in the end, we turn to the scalability issues.
WoD browsers were the first systems developed for WoD utilization and analysis [35, 4]. Similarly to the traditional ones,
WoD browsers provide the functionality for link navigation and
representation of WoD resources and their properties; thus enabling
browsing and exploration of WoD in a most intuitive way. WoD
browsers mainly use tabular views and links to provide navigation
over the WoD resources.
Disco (2007) renders all information related to a particular RDF resource as an HTML table with property-value pairs.
Explorator (2009) is a WoD exploratory tool that allows users to browse a dataset by combining search and facets.
Tabulator [Berners-Lee 2006], another WoD browser, additionally provides maps and timeline visualizations.
there is a large number of generic visualization frameworks that offer a wide range of
visualization types and operations
types of data (for example, numbers, temporal, graphical, spatial)
provide a graphical
representation of the data, using bubbles, circles, charts or graphs. These tools are the most interesting when
talking about big linked open data visualization, and that is why tables with some of the principal characteristics
of these tools are inserted in this section. In particular, the aspects taken into consideration are
Some offer recommendation mechanisms suggesting the most suitable form of visualization depending on the input data
With regard to
Existing approaches assume that all objects can be presented on the screen and managed through traditional visualization techniques, thus limiting their applicability to data sets of limited size.
Thematic map shows the depth of the classes in the DBpedia ontology hierarchy (the darker, the deeper).
An interface for biological scientists to consume large amounts of biomedical data
A tree-like structure to show the query results
Iterative query approach
The width of the lines reflects the degrees of relationships between compounds and diseases
The first layer is the query input, the second layer is the compounds that are related to the input, and the third layer is the diseases that are related to these compounds
The width of the lines in the uppermost layer is the sum of all the degrees of reactions on the paths to the diseases that are contained in the subtrees of the rooted tree. This design is to better reflect the degrees of relationships between compounds and diseases, thus reducing the impact of the intermediate variables, to make the relations between two entities more clear.
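The width rule described above can be read as a sum over each compound's subtree. The sketch below is one possible interpretation, not the visualizer's actual code: the compound and disease names, the degree values and the function name are all hypothetical.

```python
# Hypothetical degrees of relationship: compound -> {disease: degree}.
DEGREES = {
    "compoundA": {"disease1": 2.0, "disease2": 1.0},
    "compoundB": {"disease2": 0.5},
}


def top_layer_widths(degrees):
    """Width of each query-to-compound line: sum of the degrees to the diseases in its subtree."""
    return {compound: sum(diseases.values()) for compound, diseases in degrees.items()}


if __name__ == "__main__":
    print(top_layer_widths(DEGREES))
```

Under this reading, a compound linked strongly to many diseases gets a thick line from the query node, so the intermediate layer does not hide how strongly the query relates to the diseases below it.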
, modern systems should adopt more sophisticated techniques such as
and deepen disk-based implementations
Scalability and performance should be considered as key requirements [Tong 2006, Sundara 2010].