Carrying out OLAP analyses in hands-free scenarios requires lean forms of communication between the users and the system, based for instance on natural language. In this paper we introduce VOOL, a framework specifically devised for vocalizing the insights resulting from OLAP sessions. VOOL is self-configurable, extensible, and is aware of the user's intentions expressed by OLAP operators. To avoid overwhelming the user with very long descriptions, we pursue the vocalization of selected insights automatically extracted from query results. These insights are detected by a set of modules, each returning a set of independent insights that characterize data. After describing and formalizing our approach, we evaluate it in terms of efficiency and effectiveness.
[ADBIS2022] Insight-based Vocalization of OLAP Sessions
1. ADBIS 2022
Insight-based vocalization of OLAP sessions
Matteo Francia, Enrico Gallinucci, Matteo Golfarelli, Stefano Rizzi
University of Bologna, Italy
26th European Conference on Advances in Databases and Information Systems (ADBIS 2022)
2. ADBIS 2022
Motivation
Augmented analytics and smart assistants are cutting-edge applications
- Shift of human-machine interaction towards voice interfaces
- E.g., to support the needs of specific user groups, such as the visually-impaired
- E.g., to enable analytics where hands-free interaction is mandatory (e.g., augmented reality [1])
We introduce VOOL (VOcalization of OLap sessions)
- Describe (sequences of) multidimensional query results through natural language…
- … by returning interesting insights to the user
Matteo Francia – University of Bologna 2
Introduction
[1] Matteo Francia, Matteo Golfarelli, Stefano Rizzi: A-BI+: A framework for Augmented Business Intelligence. Information Systems. (2020)
3. ADBIS 2022
Related work
Exploration/querying of multidimensional cubes
- OLAP comes with low-level operators to query multidimensional cubes
- Additional operators have been introduced to automatically extract interesting patterns [1]
- E.g., CineCubes [2] compares the result of a query to the results obtained over sibling values or drill-downs
- However: these operators can be plugged into VOOL
Conversational systems
- Single OLAP queries are sampled, and a speech is produced out of fixed templates [3]
- An end-to-end dialog system is introduced for the vocalization of single queries [4]
- However: limited support for the vocalization of analytic sessions
[1] Golab, L., Srivastava, D.: Exploring data using patterns: A survey and open problems. In: Proc. DOLAP@EDBT/ICDT. pp. 116–120 (2021)
[2] Gkesoulis, D., Vassiliadis, P., Manousis, P.: CineCubes: Aiding data workers gain insights from OLAP queries. Inf. Syst. 53, 60–86 (2015)
[3] Trummer, I., Wang, Y., Mahankali, S.: A holistic approach for query evaluation and result vocalization in voice-based OLAP. In: Proc. SIGMOD. pp. 936–953 (2019)
[4] Lyons, G., Tran, V., Binnig, C., Çetintemel, U., Kraska, T.: Making the case for query-by-voice with EchoQuery. In: Proc. SIGMOD. pp. 2129–2132 (2016)
4. ADBIS 2022
VOOL: Overview
Desiderata for the vocalization of analytic sessions
- #D1 Intention-awareness, vocalization should:
- Describe single query results as well as compare subsequent query results
- Consider the user’s intention between subsequent queries
- #D2 Extensibility: rely on interfaces that make operators easily pluggable
- #D3 Timeliness: produce vocalizations responsively
- #D4 Conciseness: produce vocalizations that take a limited time
5. ADBIS 2022
VOOL: Overview
Figure: the user query ("Sales by Customer and Year") is handled by the Querying component [1], which queries the DW
[1] Matteo Francia, Enrico Gallinucci, Matteo Golfarelli: COOL: A Framework for Conversational OLAP. Information Systems. (2021)
6. ADBIS 2022
VOOL: Overview
Modules are implementations of (insight) operators
- Given subsequent cubes…
- … retrieve self-contained insights…
- … with different levels of interestingness (relevance to the user query)
The module abstraction enables easy extensibility
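The module abstraction can be pictured as a small plug-in interface. The Python sketch below is only an illustration under stated assumptions: the names `Cube`, `Insight`, `Module`, `AverageModule`, and `run_modules` are hypothetical, as is the row-based cube representation; the slides do not describe VOOL's actual interfaces.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional

# Assumption: a cube (query result) is a list of fact rows,
# e.g. {"Product": "Beer", "Quantity": 80}.
Cube = list[dict]


@dataclass(frozen=True)
class Insight:
    module: str             # name of the module that produced the insight
    text: str               # self-contained natural language description
    interestingness: float  # module-specific score


class Module(ABC):
    """Hypothetical plug-in interface: one subclass per insight operator."""

    name: str

    @abstractmethod
    def generate(self, current: Cube, previous: Optional[Cube]) -> list[Insight]:
        """Return insights for `current`; `previous` is None for the first query."""


class AverageModule(Module):
    """Toy statistics module describing the average of one measure."""

    name = "statistics"

    def __init__(self, measure: str) -> None:
        self.measure = measure

    def generate(self, current: Cube, previous: Optional[Cube]) -> list[Insight]:
        values = [row[self.measure] for row in current]
        average = sum(values) / len(values)
        # Placeholder score: how a real module rates its insights is not specified here.
        return [Insight(self.name, f"The average {self.measure} is {average:.1f}", 1.0)]


def run_modules(modules: list[Module], current: Cube,
                previous: Optional[Cube] = None) -> list[Insight]:
    """Collect the insights of every registered module."""
    insights: list[Insight] = []
    for module in modules:
        insights.extend(module.generate(current, previous))
    return insights


sales = [{"Product": p, "Quantity": q} for p, q in
         [("Beer", 80), ("Wine", 70), ("Cola", 30), ("Bagel", 8), ("Pizza", 6), ("Bread", 5)]]
for insight in run_modules([AverageModule("Quantity")], sales):
    print(insight.text)
```

Under this sketch, adding an operator amounts to writing one more `Module` subclass and appending it to the list passed to `run_modules`.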
Figure: the query result and the session history feed insight generation, which runs Module 1, Module 2, and Module 3
7. ADBIS 2022
VOOL: Overview
Figure: the insights returned by the modules are passed to insight selection
8. ADBIS 2022
VOOL: Overview
Figure: the selected insights are passed to vocalization, e.g., "The average sale is ...", "Outstanding products are ...", "The worst product is ..."
9. ADBIS 2022
VOOL: Overview
Figure: for a subsequent query ("Drill down to Month"), the previous query result from the session history is also fed to insight generation
10. ADBIS 2022
Insight Generation
Modules (e.g., top-k) are executed to extract insights (e.g., top-3 facts) out of the query results
- The execution of a module depends on the query/OLAP operator (#D1)
An insight s consists of
- A set of components c (a single/group of facts)
- NL(s): natural language description of s
- cov(s): fraction of data covered by the insight
- cost(s): vocalization cost (i.e., words in NL(s))
Assumptions
- Modules are black boxes, but…
- … NL(s) is self-contained
- … insights from the same module are incremental
Example (top-k)
s3 = {c1 = Beer, c2 = Wine, c3 = Cola}
NL(s3) = “The 3 facts with highest Quantity are Beer with 80, Wine with 70, and Cola with 30”
cov(s3) = 0.5
cost(s3) = 17
s2 = {c1 = Beer, c2 = Wine} ⊂ s3
Query: "Sales by Product" (the top-k module highlights the first three facts)
Product  Quantity
Beer     80
Wine     70
Cola     30
Bagel    8
Pizza    6
Bread    5
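The top-3 example can be reproduced with a short Python sketch. It assumes that cov is the fraction of facts mentioned by the insight and that cost is the number of whitespace-separated words in NL(s); the names `Insight`, `join_with_and`, and `top_k_insight` are hypothetical and are not taken from VOOL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insight:
    components: tuple  # the facts (here: product names) described by the insight
    nl: str            # natural language description NL(s)
    cov: float         # fraction of the facts covered by the insight
    cost: int          # vocalization cost, counted as words in NL(s)


def join_with_and(items: list[str]) -> str:
    """Join phrases as 'a', 'a and b', or 'a, b, and c'."""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} and {items[1]}"
    return ", ".join(items[:-1]) + f", and {items[-1]}"


def top_k_insight(facts: list[tuple[str, int]], measure: str, k: int) -> Insight:
    """Build the insight describing the k facts with the highest measure value."""
    ranked = sorted(facts, key=lambda fact: fact[1], reverse=True)
    top = ranked[:k]
    parts = [f"{name} with {value}" for name, value in top]
    # The fixed template is kept simple; it does not adapt the plural for k = 1.
    nl = f"The {k} facts with highest {measure} are {join_with_and(parts)}"
    return Insight(
        components=tuple(name for name, _ in top),
        nl=nl,
        cov=len(top) / len(facts),
        cost=len(nl.split()),
    )


facts = [("Beer", 80), ("Wine", 70), ("Cola", 30), ("Bagel", 8), ("Pizza", 6), ("Bread", 5)]
s3 = top_k_insight(facts, "Quantity", 3)
print(s3.nl)
print(s3.cov, s3.cost)  # 0.5 17
```

Insights built this way are incremental in the sense of the slide: the components of `top_k_insight(facts, "Quantity", 2)` are a subset of those of `s3`.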
11. ADBIS 2022
Insight Generation
Insight interestingness
- Sum of the interestingness of the components of each insight
$$\mathrm{int}(s) = \sum_{c \in s} \mathrm{int}(c)$$
- The component interestingness int(c) is module-specific (black box)
- Constraint: int(c) ∈ (0, 1]
For instance, for top-k
- Beer retains 44% of sales
- Beer, Wine, and Cola retain 98% of sales
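$$\mathrm{int}(c_k) = \frac{\gamma_k.m - \gamma_{\min}.m}{\sum_{i=1}^{n} (\gamma_i.m - \gamma_{\min}.m)}$$
where γ_i.m is the Quantity of the i-th fact, γ_min is the fact with the lowest Quantity (Bread, 5), and n = 6 is the number of facts in the result
int(s1) = int(c1 = Beer) = 75 / 169 = 0.44
int(s3) = 165 / 169 = 0.98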
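The two values above can be checked numerically. The sketch below reads the formula from the worked numbers, as an assumption: each Quantity is shifted by the lowest Quantity in the result (5) and normalized over all six facts. The function names are hypothetical.

```python
def component_interestingness(values: list[float]) -> list[float]:
    """Share of each value in the total, after subtracting the lowest value."""
    lowest = min(values)
    total = sum(value - lowest for value in values)
    if total == 0:
        # All values are equal: no component stands out.
        return [0.0 for _ in values]
    return [(value - lowest) / total for value in values]


def top_k_interestingness(values: list[float], k: int) -> float:
    """Interestingness of the top-k insight: sum over its k components."""
    ranked = sorted(values, reverse=True)
    return sum(component_interestingness(ranked)[:k])


quantities = [80, 70, 30, 8, 6, 5]
print(round(top_k_interestingness(quantities, 1), 2))  # 0.44
print(round(top_k_interestingness(quantities, 3), 2))  # 0.98
```

With raw shares instead of shifted ones, Beer would account for 80/199 ≈ 0.40 of the sales, so the subtraction of the lowest value is what yields the 0.44 reported on the slide.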
12. ADBIS 2022
Insight Generation
The component interestingness changes between subsequent query results C and C’
For instance, for top-k
- Peculiarity [1] measures to what extent values from facts in C’ deviate from the originating facts in C
- The higher the deviation, the higher the peculiarity
- Beer (the best-selling product from the best-selling category) is less peculiar than Cola (the worst-selling product from the best-selling category)
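$$\mathrm{int}(c_k) = \frac{(\gamma_k.m - \gamma_{\min}.m) \cdot \mathrm{pec}(\gamma_k)}{\sum_{i=1}^{n} (\gamma_i.m - \gamma_{\min}.m) \cdot \mathrm{pec}(\gamma_i)}$$
where γ_min is the fact with the lowest Quantity in C’ (Bread, 5) and n is the number of facts in C’
$$\mathrm{int}(s_3) = \frac{75 \cdot 0.47 + 65 \cdot 0.17 + 25 \cdot 1.00}{72.07} = 0.99$$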
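A minimal Python sketch of the peculiarity-weighted variant follows. The function name and its signature are hypothetical, and peculiarity values are taken as given inputs rather than computed. Since the slide reports pec only for Beer, Wine, and Cola, the usage lines recompute int(s3) from the slide's own numerator terms and its reported denominator (72.07).

```python
def weighted_top_k_interestingness(values: list[float], pec: list[float], k: int) -> float:
    """Top-k interestingness where each shifted value is weighted by its peculiarity.

    `values` must be sorted in descending order and `pec[i]` is the
    peculiarity of the fact holding `values[i]`.
    """
    lowest = min(values)
    weights = [(value - lowest) * p for value, p in zip(values, pec)]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(weights[:k]) / total


# Check of the worked example, using only numbers that appear on the slide.
numerator = 75 * 0.47 + 65 * 0.17 + 25 * 1.00
print(round(numerator / 72.07, 2))  # 0.99
```

When every peculiarity equals 1, the function reduces to the unweighted top-k interestingness.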
C: "Sales by Category"
Category   Quantity
Beverages  180
Food       19
C’: "Specialize Sales by Product"
Product  Quantity
Beer     80
Wine     70
Cola     30
Bagel    8
Pizza    6
Bread    5
[1] Francia, M., Marcel, P., Peralta, V., Rizzi, S.: Enhancing cubes with models to describe multidimensional data. Inf. Syst. Frontiers 24(1), 31–48 (2022)
13. ADBIS 2022
Insight Selection
Insights (𝒮) are too many to be vocalized
- Insights (S_F) from the same module (F) are incremental by construction
- Modules (F’s) have different semantics
GOAL: return the insights that maximize interestingness while not exceeding a budget t_voc (#D4)
- This is a multiple-choice knapsack problem
- The set of insights 𝒮 is partitioned into classes (S_F’s)
- Select at most one insight s_F ∈ S_F out of each class
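The selection step can be sketched as a textbook dynamic program for the multiple-choice knapsack problem. This is an illustration, not necessarily the solver used in VOOL. It assumes that the budget t_voc has been converted into an integer number of words, so that it is comparable with cost(s); the function name and the interestingness value of the statistics insight are hypothetical.

```python
def select_insights(classes: dict[str, list[tuple[str, float, int]]],
                    budget: int) -> tuple[float, dict[str, str]]:
    """Pick at most one insight per module, maximizing total interestingness.

    `classes` maps a module name to its insights, each given as
    (label, interestingness, cost in words); `budget` is the maximum total cost.
    Returns the best total interestingness and the chosen label per module.
    """
    # best[b] = (value, choice) achievable with total cost <= b
    # using only the modules processed so far.
    best: list[tuple[float, dict[str, str]]] = [(0.0, {}) for _ in range(budget + 1)]
    for module, insights in classes.items():
        new_best = list(best)  # skipping the module is always allowed
        for b in range(budget + 1):
            for label, value, cost in insights:
                if cost <= b:
                    previous_value, previous_choice = best[b - cost]
                    candidate = previous_value + value
                    if candidate > new_best[b][0]:
                        new_best[b] = (candidate, {**previous_choice, module: label})
        best = new_best
    return best[budget]


# Illustrative input: the top-k values come from the slides (int and cost of s3),
# the cost of s1 and the statistics entry are made up for the example.
classes = {
    "top-k": [("s1", 0.44, 9), ("s3", 0.98, 17)],
    "statistics": [("average", 0.5, 5)],
}
print(select_insights(classes, 20))  # (0.98, {'top-k': 's3'})
print(select_insights(classes, 25)[1])  # {'top-k': 's3', 'statistics': 'average'}
```

With a budget of 20 words, s3 alone beats the cheaper pair s1 plus average (0.94); with 25 words both s3 and average fit.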
Figure: the set of insights 𝒮 partitioned by module (#D1 and #D2): Statistics, Clustering, Outlier detection [1], Correlation, Slicing variance, Aggregation variance [2], top-k [3], Assess [4]
- E.g., for F = top-k, S_top-k = {s1, …, sn}, with s1.int = 1.3 and s1.cov = 0.5
[1] Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation forest. In: Proc. Eighth IEEE International Conference on Data Mining (2008)
[2] Das, M., Amer-Yahia, S., Das, G., Yu, C.: MRI: meaningful interpretations of collaborative ratings. Proc. VLDB Endow. 4(11), 1063–1074 (2011)
[3] Francia, M., Marcel, P., Peralta, V., Rizzi, S.: Enhancing cubes with models to describe multidimensional data. Inf. Syst. Frontiers 24(1), 31–48 (2022)
[4] Francia, M., et al.: Suggesting assess queries for interactive analysis of multidimensional data. IEEE Transactions on Knowledge and Data Engineering (2022)
14. ADBIS 2022
Insight Vocalization
Start with a preamble that describes the query
- The preamble acts as a context for subsequent insights
- The vocalization of the preamble takes t_gen
- Insight selection starts slightly before t_gen elapses, so that the user does not perceive any pause in the vocalization (#D3)
Vocalization
- Selected insights are sorted by descending cov
- Natural language descriptions NL's are concatenated
Example
1. Preamble: "The query result shows the sum of quantity grouped by product"
2. Statistics (s.cov = 1.0): "The average Quantity is 33.2"
3. Top-K (s.cov = 0.5): "The three facts with highest Quantity are Beer with 80, Wine with 70, and Cola with 30"
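The ordering and concatenation step can be sketched in a few lines of Python. The function name `vocalize`, the dictionary-based insight representation, and the choice of a period as sentence separator are assumptions made for the illustration.

```python
def vocalize(preamble: str, selected: list[dict]) -> str:
    """Concatenate the preamble and the insights, sorted by descending cov."""
    # sorted() is stable, so insights with equal cov keep their selection order.
    ordered = sorted(selected, key=lambda insight: insight["cov"], reverse=True)
    sentences = [preamble] + [insight["nl"] for insight in ordered]
    return ". ".join(sentences) + "."


selected = [
    {"module": "top-k", "cov": 0.5,
     "nl": "The three facts with highest Quantity are Beer with 80, "
           "Wine with 70, and Cola with 30"},
    {"module": "statistics", "cov": 1.0, "nl": "The average Quantity is 33.2"},
]
speech = vocalize("The query result shows the sum of quantity grouped by product", selected)
print(speech)
```

Sorting by descending cov places the insight that covers the whole result (statistics) before the one that covers only half of it (top-k), which matches the order of the example.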
15. ADBIS 2022
Experimental Evaluation
Scalability with respect to query result cardinality, up to 10^4 tuples
- 10^4 tuples is unrealistic for OLAP: results are constrained by the visualization/interaction [1]
- 10 OLAP sessions, each involving 3 OLAP steps (Foodmart cube)
- The computation of all modules requires less than 1 second
- Single exception: Clustering requires 7 seconds for results with cardinality 10^4
[1] Francia, Matteo, Matteo Golfarelli, and Stefano Rizzi. "A-BI+: a framework for Augmented Business Intelligence." Information Systems 92 (2020): 101520.
16. ADBIS 2022
Experimental Evaluation
Preliminary tests with 10 users
- Master students in data science with basic/advanced knowledge of BI
- Users were assigned three OLAP sessions with different analysis goals
- E.g., “As a shop owner, you are analyzing the performance of each product department”
- Rating on a scale from 1 (very poor) to 5 (very high)
- User experience: 4.2 ± 0.6
- Quality of the description of query results: 3.8 ± 0.9
- Lowest appreciation: Statistics is sometimes too simple to describe the whole result
- Highest appreciation: Aggregation variance to describe how aggregation changes value distributions
17. ADBIS 2022
Conclusion and Research Directions
So far
- We introduced the desiderata for a vocalization system
- We implemented VOOL to vocalize insights out of the results of analytic sessions
- User-based and efficiency evaluations show promising results
Besides refining and extending the modules, future directions are:
- Handling redundancy over single queries (e.g., insights vocalizing the same tuples)
- … and sessions (e.g., vocalizing the same insight twice or more reduces its interestingness)
- “Tell me more”: users can ask for details and for the insights retrieved after the time budget
- Assess the correlation between insights and users' intentions