Article: "Improved Knowledge from Data: Building an Immersive Data Analysis Platform"
Felipe Augusto Pedroso* Paula Dornhofer Paro Costa†
Dept. of Computer Engineering and Industrial Automation (DCA)
School of Electrical and Computer Engineering - University of Campinas (UNICAMP)
ABSTRACT
We are facing unprecedented growth in data production, with
increasing degrees of complexity, which makes data analysis a
difficult task. At the same time, Virtual Reality (VR) technology is
becoming more popular, with affordable devices and easier content
production tools. In this scenario, this research project proposes
the development of an open data analysis platform using VR, allowing
researchers to evaluate how the immersion of this environment could
add value to the Visual Analytics (VA) process and offer a better
way to extract knowledge from data.
Index Terms: Visual Analytics—Virtual Reality—Visualization—
Immersive Analytics;
1 INTRODUCTION
The growth of data production and complexity is creating new
challenges for analyzing large volumes of data and extracting
knowledge from them. The Visual Analytics (VA) field addresses these
challenges by providing tools and techniques that aid the data
analysis process and represent the knowledge acquired. Meanwhile,
Virtual Reality (VR) applications are becoming more accessible, since
head-mounted displays (HMDs) are getting cheaper and VR content
development tools are becoming easier to use. Our research goal is to
explore and evaluate the application of VA techniques in VR
environments.
In particular, this research project focuses on developing an open
data analysis platform using VR, where researchers could evaluate
how the immersion and the experience provided by this kind of
environment could improve the VA process and provide a better way
to extract knowledge from data.
2 MOTIVATION
Even with the growth of the Immersive Analytics field, researchers
are still implementing immersive data visualizations from scratch,
without a proper tool or framework to aid the process [2,7,9]. This
practice can bias the results of their evaluations toward the context
or limitations of their own solutions.
Our proposal is to create an open and extensible VR platform that
allows researchers to work in an immersive environment, manipulating
data and creating visualizations more easily.
With this platform in hand, we hope that researchers can focus
their work on extracting knowledge from large amounts of data or on
evaluating the feasibility of VR as a VA tool.
3 RELATED WORK
The current production of data is reaching unprecedented levels [8],
and together with the growth in volume, data complexity is increasing
dramatically [3]. One of the biggest challenges presented by this
data deluge is how to discover and understand meaningful patterns
hidden in the data [3].
*e-mail: felipe.pedroso@live.com
†e-mail: paula@dca.fee.unicamp.br
Figure 1: The VA process proposed by Keim et al. [6].
Gorodov and Gubarev [4] stated that “Graphical Thinking” is a very
simple and natural type of data processing for human beings,
supporting decision-making in an effective and understandable way.
They also noted that, in the Big Data domain, visualization may
not be an effective or applicable option, as it presents the following
problems: visual noise; large image perception; information loss;
high performance requirements; high rate of image change.
According to Keim et al., the Visual Analytics field tries to address
these challenges and problems by combining automated analysis
techniques with interactive visualizations to support effective
understanding, reasoning, and decision making on the basis of very
large and complex data sets [6]. The techniques and practices
developed by VA are well established in: emergency management;
astronomy; climate and weather monitoring; security; scientific
applications; biology and medicine; business intelligence; and fraud
detection [6].
While data volume and complexity keep growing, VR technology is
becoming more affordable for VA researchers, allowing them to explore
user experience and immersion as new ways to see the data [2,3].
Opinions differ on the use of VR for data visualization: some
studies report positive results [7], while others indicate that a
conventional 2D desktop performs better than an immersive solution
[9]. The authors of these studies did not use any standard solution
to produce the visualizations for their experiments, so their
conclusions may be biased by the context and limitations of their
own solutions.
4 PROPOSED SOLUTION
Our work proposes an immersive VA platform capable of implementing
the processes depicted in Figure 1. The technologies to be
adopted and supported by the platform are still under evaluation, but
we plan to use Unity to handle the visualization step and Python
to handle data management and analysis. This approach
lets the platform take advantage of the best of both worlds:
the well-established data analysis environment of Python and the
ready-to-use VR integration available in Unity.
Figure 2: The proposed user interaction with the platform.
One big challenge that we want to tackle is to build this platform
on an open, free, and extensible foundation. This will allow other
researchers to use or adapt it according to their needs and contexts
without worrying about the underlying technologies.
4.1 Proposed User Experience
The user interaction will happen entirely in the VR environment.
This will demand more effort on the VR Human Computer Interaction
(HCI) factors, as this type of interface has its own peculiarities
and affordances [2,7].
We propose the user interaction illustrated in Figure 2, where
the user will pass through four steps: data source selection; data
set loading; data visualization; and automatic analysis.
In the first step, the user will select the origin of the data
set. At the moment, we plan to support some of the most
common data sources: JSON files, CSV files, and Microsoft Excel
documents.
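As an illustration of this step, the following is a minimal sketch of a loader that picks a reader from the file extension. It is not the platform's implementation: the function name `load_records` is hypothetical, JSON input is assumed to be an array of flat objects, CSV input is assumed to have a header row, and Excel support is left out because it would require a third-party reader.

```python
import csv
import json
from pathlib import Path


def load_records(path):
    """Load a data set as a list of dict records, chosen by file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        # Assumption: the file holds a JSON array of flat objects.
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    if suffix == ".csv":
        # Assumption: the first row contains the column names.
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if suffix in {".xls", ".xlsx"}:
        # Excel needs a third-party reader, which this sketch leaves out.
        raise NotImplementedError("Excel support is not part of this sketch")
    raise ValueError(f"unsupported data source: {suffix}")
```

Note that the CSV reader returns every value as a string, so numeric conversion is left to a later step.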
After this selection, the platform will present a summary of the
available data, with the option to choose what will be used to
build the visualization. In this step, the user will select which
dimensions will be used, how many records will be shown, whether the
data needs some cleaning, etc.
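The summary and selection described here can be sketched as follows. This is only an illustration under stated assumptions: records are dictionaries, a missing value is `None` or an empty string, and the names `summarize` and `select` are hypothetical.

```python
def summarize(records):
    """Return per-column counts of present, missing, and numeric values."""
    summary = {}
    for record in records:
        for name, value in record.items():
            stats = summary.setdefault(
                name, {"present": 0, "missing": 0, "numeric": 0}
            )
            if value is None or value == "":
                stats["missing"] += 1
                continue
            stats["present"] += 1
            try:
                float(value)
                stats["numeric"] += 1
            except (TypeError, ValueError):
                pass  # Present but not convertible to a number.
    return summary


def select(records, dimensions, max_records=None, drop_incomplete=False):
    """Keep only the chosen dimensions, optionally cleaning and truncating."""
    selected = []
    for record in records:
        if max_records is not None and len(selected) >= max_records:
            break
        row = {name: record.get(name) for name in dimensions}
        # Simplest possible cleaning: skip rows with a missing value.
        if drop_incomplete and any(v is None or v == "" for v in row.values()):
            continue
        selected.append(row)
    return selected
```

Dropping incomplete rows stands in for "cleaning" here; a real platform would likely offer more options, such as imputing values.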
With the data ready, the user will choose whether the next step is
an automatic analysis or a visualization of the data as is.
If the user chooses the automatic analysis, the platform will
present the option to execute operations that improve the
understanding of the data or create a perspective to be visualized.
Some operations that we plan to offer are basic statistics, pattern
recognition, data modeling, and dimensionality reduction.
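Of the operations listed, basic statistics is the simplest to illustrate. The sketch below uses only Python's standard `statistics` module; the function name `describe` and the choice of measures are assumptions made for this example, and the other operations (pattern recognition, data modeling, dimensionality reduction) are not covered.

```python
import statistics


def describe(records, dimension):
    """Compute basic statistics for one numeric dimension of the records."""
    values = [
        float(record[dimension])
        for record in records
        if record.get(dimension) not in (None, "")
    ]
    if not values:
        raise ValueError(f"no values found for dimension {dimension!r}")
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }
```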
In the visualization step, users will be immersed in the data and
will be able to explore, zoom, manipulate, obtain information about
specific data points, etc. We are considering offering
some kind of local analysis here, so the user can
filter, identify outliers, or even perform simple clustering.
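One possible form of such a local analysis is outlier identification with Tukey's fences, sketched below. The paper does not specify a method, so the interquartile-range rule, the function name `iqr_outliers`, and the default factor `k=1.5` are all assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside Tukey's fences."""
    if len(values) < 2:
        return []  # statistics.quantiles needs at least two data points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [value for value in values if value < low or value > high]
```

The same fences could also drive a filter, by keeping the values that fall inside them instead of outside.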
At any time, the user will be able to go back and forth between
analysis and visualization. We also plan to add the ability to take
a “snapshot” of the current state of the visualization, allowing users
to share their results or revisit them later.
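A snapshot of this kind could be stored as a small serializable record. The sketch below is only one possible shape: the `Snapshot` class and its fields (`dimensions`, `camera_position`, `filters`) are hypothetical, since the paper does not define what the visualization state contains.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Snapshot:
    """Hypothetical visualization state that can be saved and restored."""

    dimensions: list
    camera_position: tuple = (0.0, 0.0, 0.0)
    filters: dict = field(default_factory=dict)

    def to_json(self):
        # JSON keeps the snapshot easy to share and to inspect by hand.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        # JSON has no tuple type, so the position comes back as a list.
        data["camera_position"] = tuple(data["camera_position"])
        return cls(**data)
```

Restoring a snapshot with `Snapshot.from_json` yields an object equal to the one that was saved, which is what revisiting a result later requires.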
4.2 Experiments and Evaluation
After the development of the platform, we plan to run an experiment
with potential users to observe how it is used and to
collect feedback. This information will be used to make small
adjustments and some fine tuning.
At this point, with an almost “ready-to-use” platform, we will run
a second experiment aiming to compare the effectiveness of a VR
visualization with a conventional 2D desktop visualization.
The experiment will have a design very similar to the ones found
in the literature, with users executing data analysis tasks using the
VR environment and a traditional 2D desktop [7,9].
We plan to position our platform's capabilities among other VA
tools using an approach similar to those found in the literature
[5, 10]. This will help us to identify mandatory features and gaps
that we could fill.
5 PRELIMINARY RESULTS
The project is at an early stage of development, but during the
literature review we investigated the technologies involved to
understand the technical aspects of the scenario. The activities
carried out so far are: A) creating a sample data visualization
using Unity; B) testing Unity’s performance when rendering a large
number of data points; C) investigating WebVR libraries as an
alternative for implementing data visualizations; D) evaluating
other Visual Analytics tools such as Metabase and Tableau.
We are currently evaluating how to make Python and Unity3D
communicate effectively. For this integration, we are evaluating
the use of Remote Procedure Calls (RPC) through the gRPC library
[1], which allows seamless communication between clients and servers
built with different technologies and supports data streaming with
serialization out of the box.
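To give a feel for the streaming side of this integration, the sketch below splits a sequence of data points into serialized batches, which is the shape a streaming handler on the Python side would typically produce. It deliberately contains no gRPC code: the service definition and generated stubs are omitted, JSON stands in for the Protocol Buffers serialization that gRPC would normally provide, and the name `stream_batches` and the default batch size are assumptions.

```python
import json


def stream_batches(points, batch_size=1000):
    """Yield serialized batches of points, one message per batch."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for point in points:
        batch.append(point)
        if len(batch) == batch_size:
            # JSON is a stand-in for the real wire format.
            yield json.dumps(batch).encode("utf-8")
            batch = []
    if batch:
        # Send the final, partially filled batch.
        yield json.dumps(batch).encode("utf-8")
```

Sending points in batches rather than one at a time keeps the number of messages small when a large data set has to reach the visualization side.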
6 CONCLUSION
Our main goal is to contribute to the VA field by creating an open
data analysis platform that allows researchers to evaluate or adopt
VR visualizations. We hope that the Immersive Analytics field can
also benefit from it, as we plan to create something that can be
extended to support other immersive interfaces.
REFERENCES
[1] gRPC open-source universal RPC framework. https://grpc.io/.
[Online; accessed 30-August-2018].
[2] T. Chandler, M. Cordeil, T. Czauderna, T. Dwyer, J. Glowacki,
C. Goncu, M. Klapperstueck, K. Klein, K. Marriott, F. Schreiber, and
E. Wilson. Immersive analytics. In 2015 Big Data Visual Analytics
(BDVA), pp. 1–8, Sept 2015. doi: 10.1109/BDVA.2015.7314296
[3] C. Donalek, S. G. Djorgovski, A. Cioc, A. Wang, J. Zhang, E. Lawler,
S. Yeh, A. Mahabal, M. Graham, A. Drake, et al. Immersive and
collaborative data visualization using virtual reality platforms. In Big
Data (Big Data), 2014 IEEE International Conference on, pp. 609–614.
IEEE, 2014.
[4] E. Y. Gorodov and V. V. Gubarev. Analytical review of data visualiza-
tion methods in application to big data. JECE, 2013:22:2–22:2, Jan.
2013. doi: 10.1155/2013/969458
[5] P. J. C. John R Harger. Comparison of open-source visual analytics
toolkits. vol. 8294, pp. 8294 – 8294 – 10, 2012.
doi: 10.1117/12.911901
[6] D. Keim, J. Kohlhammer, and G. Ellis. Mastering the Information Age:
Solving Problems with Visual Analytics. Eurographics Association, 1st
ed., 2010.
[7] O. Kwon, C. Muelder, K. Lee, and K. Ma. A study of layout, rendering,
and interaction methods for immersive graph visualization. IEEE
Transactions on Visualization and Computer Graphics, 22(7):1802–
1815, July 2016. doi: 10.1109/TVCG.2016.2520921
[8] B. Marr. Big data: 20 mind-boggling facts everyone must read.
https://www.forbes.com/sites/bernardmarr/2015/09/30/
big-data-20-mind-boggling-facts-everyone-must-read,
Nov 2015. [Online; accessed 30-August-2018].
[9] J. A. Wagner Filho, M. F. Rey, C. M. Freitas, and L. Nedel. Immersive
analytics of dimensionally-reduced data scatterplots. In 2nd Workshop
on Immersive Analytics. IEEE, 2017.
[10] L. Zhang, A. Stoffel, M. Behrisch, S. Mittelstadt, T. Schreck, R. Pompl,
S. Weber, H. Last, and D. Keim. Visual analytics for the big data era: a
comparative review of state-of-the-art commercial systems. In Visual
Analytics Science and Technology (VAST), 2012 IEEE Conference on,
pp. 173–182. IEEE, 2012.