Data Driven Innovation - Rome
Self-Service Data Preparation
Dr. Michele Stecca
24 Feb., 2017
SELF-SERVICE DATA PREPARATION
• IoT systems generate massive amounts of data
• We store this huge amount of information in big data platforms
• Then we extract value from it
So far, so good. BUT who knows how to do this?
• A "citizen data scientist" is a person who creates or generates
models that leverage predictive or prescriptive analytics but whose
primary job function is outside of the field of statistics and
analytics¹
• The person is not typically a member of an analytics team. Citizen
data scientists are typically in a line of business, outside of IT and
outside of a BI team
¹ Gartner 2015 Research – “Smart Data Discovery Will Enable a New Class of Citizen Data Scientist”
• Big data discovery will help expand the use of big data analytics
because exploration of big data sources will occur more often,
much faster and at a lower cost per analysis, delivered by a
broader range of users with more rudimentary technical skills
• The global trend is to give less-skilled users (i.e., citizen data
scientists) the ability to solve more complex problems
or access more insights using easier and quicker methods
• Through 2017, the number of citizen data scientists will grow five
times faster than the number of highly skilled data scientists
• The blending, in a single tool or tightly coupled portfolio, of the
ease of use, interactivity and agility of data discovery with the
richness of analysis and the scale, diversity or immediacy of big
data will be the inception of big data discovery
• Gartner has developed the concept of smart big data discovery
• Preparing data, finding patterns in large, complex data and sharing
findings with other users remain largely manual
• Smart self-service data preparation is a smart data discovery
capability, where algorithms are used to find relationships in data
and to profile and recommend to users the best approaches to
minimize modeling time and improve quality
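The profiling step mentioned above can be illustrated with a minimal sketch. It is not the method of any specific product: the function names `profile_column` and `profile_table`, the three statistics, and the sample rows are all assumptions made for illustration.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: null rate, distinct ratio, dominant inferred type."""
    non_null = [v for v in values if v not in (None, "")]

    def kind(v):
        # Treat anything float() can parse as numeric; everything else is text.
        try:
            float(v)
            return "numeric"
        except (TypeError, ValueError):
            return "text"

    kinds = Counter(kind(v) for v in non_null)
    return {
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
        "dominant_type": kinds.most_common(1)[0][0] if kinds else "unknown",
    }


def profile_table(rows):
    """Profile every column found in a list of dict-shaped records."""
    columns = sorted({key for row in rows for key in row})
    return {c: profile_column([row.get(c) for row in rows]) for c in columns}


# Hypothetical sensor readings, invented for this example.
readings = [
    {"device_id": "a1", "temp": "21.5"},
    {"device_id": "a2", "temp": ""},
    {"device_id": "a1", "temp": "19"},
]
for name, stats in profile_table(readings).items():
    print(name, stats)
```

A profile like this is the kind of input a recommendation step could use, for example to flag columns with a high null rate before a user starts modeling.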
• doolytic simplifies access to big data with a modern BI user
experience and functionality
• doolytic enables smart data discovery on both structured and
unstructured data
• doolytic offers sophisticated advanced query capabilities required by
power users/citizen data scientists
• doolytic leverages supervised and unsupervised machine learning
features for further investigation
• Native Datalake Dictionary
• Join Recommender
• Not based on field name conventions, unlike traditional BI tools
• Searches for links between fields and draws graphs, with a
confidence level, from the Datalake Dictionary
How can doolytic help to discover unknown correlations?
• The algorithm suggests potential correlations to the user,
associating a degree of confidence with each one
• The user can accept/reject recommendations
• Graphical visualization for usability
• The algorithm is scalable
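The slides do not describe how the Join Recommender computes its confidence, so the sketch below swaps in a simple, well-known stand-in: it scores every pair of columns by the overlap of their values (Jaccard index) and ignores field names entirely. The functions `recommend_joins` and `review`, the 0.5 threshold, and the sample tables are hypothetical.

```python
from itertools import product


def column_values(rows, column):
    """Distinct non-empty values of one column."""
    return {row[column] for row in rows if row.get(column) not in (None, "")}


def recommend_joins(left, right, min_confidence=0.5):
    """Rank column pairs by value overlap (Jaccard index), ignoring field names."""
    left_cols = sorted({k for row in left for k in row})
    right_cols = sorted({k for row in right for k in row})
    suggestions = []
    for lc, rc in product(left_cols, right_cols):
        lv, rv = column_values(left, lc), column_values(right, rc)
        union = lv | rv
        if not union:
            continue
        confidence = len(lv & rv) / len(union)
        if confidence >= min_confidence:
            suggestions.append((lc, rc, round(confidence, 2)))
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


def review(suggestions, accept):
    """Keep only the suggestions the user accepts (accept is a callback)."""
    return [s for s in suggestions if accept(s)]


# Hypothetical tables: the join key has a different name in each one.
probes = [
    {"imsi": "u1", "bytes": 120},
    {"imsi": "u2", "bytes": 300},
    {"imsi": "u3", "bytes": 80},
]
customers = [
    {"subscriber": "u1", "plan": "gold"},
    {"subscriber": "u2", "plan": "basic"},
    {"subscriber": "u4", "plan": "gold"},
]
print(recommend_joins(probes, customers))  # [('imsi', 'subscriber', 0.5)]
```

Comparing all column pairs on full value sets does not scale to large datasets; a scalable variant would more plausibly compare samples or compact sketches of each column, which is again an assumption rather than something the slides state.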
Challenge
• The network planning department needs to optimize the bandwidth allocation by user and traffic type
• The existing technology stack limits citizen data scientists to high aggregation levels and small
fractions of the data when performing statistical analyses
• Citizen data scientists must manually correlate data coming from different data sources (including
network probes)
Solution
• Business users keep track of frequently used queries with responsive interactive dashboards and
visualizations
• Citizen data scientists drill into data on the fly at maximum granularity (user, device and traffic
package level) and discover new paths and rules for network optimization through the Relation-Action
model
• Relationships among datasets are automatically recommended by a specific component
Benefits
• More accurate and effective network optimization algorithms are enabled by a wider and
deeper set of inputs
• Citizen data scientists are free to do big data discovery on their own
• Lower TCO than legacy tools
• IT department redirected from support to custom data inquiries
• ROI realized through smaller required investment in optimized network equipment
• Moving from manual data preparation to smart data preparation is
an important trend for IoT/big data applications
• This is particularly true when dealing with heterogeneous data such
as sensor data, structured/unstructured data, etc.
• doolytic supports the citizen data scientist by providing advanced
tools for data preparation on large datasets with the Join
Recommender
@steccami
• Senior Big Data Analyst, doolytic
• Ph.D. Computer Engineering, Univ. of Genoa, Italy
• Visiting Researcher, ICSI - UC Berkeley, USA
• Principal Investigator, FP6 & FP7 projects co-funded by EU
• Author of 30+ scientific papers in Computer Science
• Main interests: Big data (Hadoop, Spark, etc.), IoT
Self-service Big Data Preparation - Michele Stecca

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 

Self-service Big Data Preparation - Michele Stecca

  • 1. Data Driven Innovation - Rome: Self-Service Data Preparation. Dr. Michele Stecca, 24 Feb. 2017
  • 2. SELF-SERVICE DATA PREPARATION
      • IoT systems generate massive amounts of data
      • We store this huge amount of information in big data platforms
      So far, so good, BUT
      • Then we extract value from it: who knows how to do this?
  • 3. 
      • A "citizen data scientist" is a person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics and analytics¹
      • The person is not typically a member of an analytics team. Citizen data scientists are typically in a line of business, outside of IT and outside of a BI team
      ¹ Gartner 2015 Research: “Smart Data Discovery Will Enable a New Class of Citizen Data Scientist”
  • 4. 
      • Big data discovery will help expand the use of big data analytics because exploration of big data sources will occur more often, much faster and at a lower cost per analysis, delivered by a broader range of users with more rudimentary technical skills
      • The global trend is to give less-skilled users (i.e., citizen data scientists) the ability to solve more complex problems or access more insights using easier and quicker methods
      • Through 2017, the number of citizen data scientists will grow five times faster than the number of highly skilled data scientists
      • Blending, in a single tool or tightly coupled portfolio, the ease of use, interactivity and agility of data discovery with the richness of analysis and the scale, diversity or immediacy of big data will be the inception of big data discovery
  • 5. 
      • Gartner has developed the concept of smart big data discovery
      • Preparing data, finding patterns in large, complex data, and sharing findings with other users remain largely manual
      • Smart self-service data preparation is a smart data discovery capability, where algorithms are used to find relationships in data and to profile and recommend to users the best approaches to minimize modeling time and improve quality
  • 6. 
      • doolytic simplifies access to big data with a modern BI user experience and functionality
      • doolytic enables smart data discovery on both structured and unstructured data
      • doolytic offers sophisticated advanced query capabilities required by power users/citizen data scientists
      • doolytic leverages supervised and unsupervised machine learning features for further investigation
  • 8. How can doolytic help to discover unknown correlations?
      • Native Datalake Dictionary
      • Join Recommender
      • Not based on field name conventions, unlike traditional BI tools
      • Searches for links between fields and draws graphs with confidence levels from the Datalake Dictionary
  • 9. 
      • The algorithm suggests potential correlations to the user, associating a degree of confidence with each
      • The user can accept or reject recommendations
      • Graphical visualization for usability
      • The algorithm is scalable
  • 10. 
      Challenge
      • The network planning department needs to optimize the bandwidth allocation by user and traffic type
      • Citizen data scientists are limited by the existing technology stack to high aggregation levels and small fractions of data while performing statistical analyses
      • Citizen data scientists must manually correlate data coming from different data sources (including network probes)
      Solution
      • Business users keep track of frequently used queries with responsive interactive dashboards and visualizations
      • Citizen data scientists drill into data on the fly at maximum granularity (at user, device and traffic package level) and discover new paths and rules for network optimization through the Relation-Action model
      • Relationships among datasets are automatically recommended by a specific component
      Benefits
      • More accurate and effective network optimization algorithms are enabled with a wider and deeper set of inputs
      • Citizen data scientists are free to do big data discovery on their own
      • Lower TCO than legacy tools
      • IT department redirected from support to custom data inquiries
      • ROI realized through smaller required investment in optimized network equipment
  • 11. 
      • Moving from manual data preparation to smart data preparation is an important trend for IoT/big data applications
      • This is particularly true when dealing with heterogeneous data such as sensor data, structured/unstructured data, etc.
      • doolytic supports the citizen data scientist by providing advanced tools for data preparation on large datasets with the Join Recommender
  • 12. @steccami
      • Senior Big Data Analyst, doolytic
      • Ph.D. Computer Engineering, Univ. of Genoa, Italy
      • Visiting Researcher, ICSI - UC Berkeley, USA
      • Principal Investigator, FP6 & FP7 projects co-funded by EU
      • Author of 30+ scientific papers in Computer Science
      • Main interests: Big data (Hadoop, Spark, etc.), IoT
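
Slides 8 and 9 describe a Join Recommender that proposes links between fields without relying on field names and attaches a degree of confidence to each suggestion. The slides do not say how doolytic computes this, so the following is only a minimal sketch of one common approach, value-overlap containment, and not the product's algorithm. The function name `recommend_joins`, the `tables` layout, the `min_confidence` threshold and the sample data are all hypothetical.

```python
from itertools import combinations


def recommend_joins(tables, min_confidence=0.5):
    """Suggest join candidates from value overlap, ignoring field names.

    tables: {table_name: {column_name: [values, ...]}}  (hypothetical layout)
    Returns (confidence, (table, column), (table, column)) tuples, best first.
    """
    # Flatten to one entry per column, keeping only distinct non-null values.
    columns = [
        ((table, column), {v for v in values if v is not None})
        for table, cols in tables.items()
        for column, values in cols.items()
    ]
    suggestions = []
    for (left, left_vals), (right, right_vals) in combinations(columns, 2):
        if left[0] == right[0] or not left_vals or not right_vals:
            continue  # skip same-table pairs and empty columns
        # Containment: share of the smaller column's values found in the other.
        shared = len(left_vals & right_vals)
        confidence = shared / min(len(left_vals), len(right_vals))
        if confidence >= min_confidence:
            suggestions.append((confidence, left, right))
    return sorted(suggestions, key=lambda s: s[0], reverse=True)


# Made-up sample data: the two key columns have unrelated names.
tables = {
    "probes": {"dev": ["a1", "b2", "c3", "d4"], "mbps": [10, 20, 30, 40]},
    "subscribers": {"device_id": ["a1", "b2", "c3"], "plan": ["gold", "basic", "gold"]},
}
for confidence, left, right in recommend_joins(tables):
    print(f"{left} <-> {right}: {confidence:.2f}")
```

In this sketch the user's accept/reject step would simply filter the returned list. Comparing every pair of columns is quadratic in the number of columns, so a scalable version would presumably rely on sampling or sketches of the column values rather than full sets.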