Tetherless World Constellation
Broad Data
Jim Hendler
Tetherless World Professor of Computer and Cognitive Science
Director, The Rensselaer Institute for
Data Exploration and Applications (IDEA)
Rensselaer Polytechnic Institute
http://www.cs.rpi.edu/~hendler
@jahendler (twitter)
This talk
• What I’m not going to talk about much
– The Semantic Web (per se)
• http://www.slideshare.net/jahendler/semantic-web-the-inside-story
– Social Machines
• http://www.slideshare.net/jahendler/social-machines-oxford-hendler
– My work with Watson and Cognitive Computing
• http://www.slideshare.net/jahendler/watson-an-academics-perspective
• http://www.slideshare.net/jahendler/watson-summer-review82013final
• What I am going to present
– The rest of the big data story…
Data is important!
• Roughly every 50 years a new power
source for the human race is found.
Once upon a time it was chemical,
then it was electrical, then nuclear,
etc.
• Information – so not just data, but
data being used – is the new power
source for our generation.
http://www.slideshare.net/jahendler/the-science-of-data-science
The Rensselaer Institute for Data Exploration and Applications
• Business Systems
• Built and Natural Environments
• Cyber-Resiliency
• Policy, Ethics and Stewardship
• Materials Informatics
• Data-driven Physical/Life Sciences
• Healthcare Analytics and Mobile Health
• Social Network Analytics
• Agents and Augmented Reality
Developing a “Data Science” Research Agenda
Multiscale
Sparsity
Abductive Agent-oriented
BIG Data
• The term “Big Data” is widely used
nowadays to refer to a whole bunch of
machine-readable data in one accessible
(to the researcher) place
– 3 main contexts
• The large data collections of “big science” projects
– in traditional data warehouse or database formats
• The enterprise data of large, non-Web-based
companies (IBM, TATA, etc.)
– Generally in multiple data formats, stores, warehouses, etc.
• The data holdings of a Google, Facebook or other
large Web company
– Include large “unstructured” holdings
– Include “graph” data
But wait, there’s more!
• 4th context: Broad Data
– The huge amount of freely available, but widely varied,
Open Data on the World Wide Web (Structured and
Semi-structured)
• Example: The extended Facebook OGP graph (the
part outside Facebook’s datasets)
• Example: DBpedia, YAGO, Wikidata, and other
indexed information sources
• Example: The growing linked open data cloud of
freely available linked data from many domains
• Example: millions of datasets made freely available
on the Web by governments around
the world
The V’s
Volume
Velocity
BROAD data challenges
• For broad data the new challenges
that emerge include
– (Web-scale) data search
– “Crowd-sourced” modeling and user testing
– rapid (and potentially ad hoc) integration of
datasets
– visualization and analysis of only-partially
modeled datasets
– policies for data use, reuse and combination.
• Which are an overlooked but critical
part of the KDD world
KDD Pipeline – as usually presented
[Figure: KDD pipeline diagram built around “Data Storage (Big Data Warehouse)”]
KDD Pipeline – in the real world
• Data is increasingly being
brought in from external
sources, with mixed
provenance, and
increasingly outside the
analyzers’ control.
• At increasing rates and scales
[Figure: pipeline diagram with many “Data Storage” nodes fed by multiple “Data Sources”: Sensors … apps, Social Media, Customer Behaviors, Web Partners, Open data. Annotation: “Formatting, standards use, data cleansing, data bias analysis, …”]
Tough data integration challenges
Enterprise
analytics
Open Data
Integration
Hard
problems!
DIVE into Data
Discover
Integrate
Validate
Explore
Thinking outside the Database box
IDEA
Discovery needs semantics
How do you find the data you need?
The answer isn’t:
Middle Eastern Terrorists for $800 …
Discovery – there’s a lot out there
Discovery challenge: keyword search won’t work
World Bank: Africa
US Data.gov: Crop
Africover: Agriculture
Kenya: Agricultural
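The four catalog labels above describe related material with different words, which is why plain keyword matching misses most of it. A minimal sketch of the difference, assuming a small hand-written concept table (the `CONCEPTS` mapping and both function names are hypothetical; a real system would draw the mapping from a shared vocabulary or ontology):

```python
# Toy catalog: source and label come from the slide; nothing else is known
# about the underlying datasets.
CATALOG = [
    {"source": "World Bank", "label": "Africa"},
    {"source": "US Data.gov", "label": "Crop"},
    {"source": "Africover", "label": "Agriculture"},
    {"source": "Kenya", "label": "Agricultural"},
]

# Hypothetical concept table mapping surface terms to one shared concept.
CONCEPTS = {
    "crop": "agriculture",
    "agriculture": "agriculture",
    "agricultural": "agriculture",
}


def keyword_search(query, catalog):
    """Return sources whose label contains the query string verbatim."""
    q = query.lower()
    return [d["source"] for d in catalog if q in d["label"].lower()]


def concept_search(query, catalog):
    """Return sources with a label term that maps to the query's concept."""
    target = CONCEPTS.get(query.lower(), query.lower())
    hits = []
    for d in catalog:
        terms = d["label"].lower().split()
        if any(CONCEPTS.get(t) == target for t in terms):
            hits.append(d["source"])
    return hits


if __name__ == "__main__":
    print(keyword_search("agriculture", CATALOG))
    print(concept_search("agriculture", CATALOG))
```

The keyword version only finds the entry whose label literally contains the query, while the concept version also reaches the "Crop" and "Agricultural" entries. Neither reaches the "Africa" entry, which would need geographic knowledge as well.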
Integration challenge: need to understand the data
Person
– RIN: 660125137
– Address #: 1118
– Address St: Pinehurst
– Address zip: 12203
– Course topic: CSCI
– Course #: 4961
Campus Personnel
– RPI ID: 660125137
– Name: Hendler
Campus Classes
– CRN: 1118
– Name: Intro to Physics
YES
NO!!!!
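The YES and NO above appear to mark two value matches: RIN and RPI ID share 660125137 and identify the same person, while Address # and CRN share 1118 but mean unrelated things. A minimal sketch of guarding a link with a semantic type, assuming each field has been annotated by hand (the `semantic_type` labels and both function names are hypothetical, not part of the talk):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    table: str
    name: str
    value: str
    semantic_type: str  # hypothetical annotation supplied by a person or a model


# Values are taken from the slide; the semantic types are assumptions.
FIELDS = [
    Field("Person", "RIN", "660125137", "person_id"),
    Field("Person", "Address #", "1118", "street_number"),
    Field("Campus Personnel", "RPI ID", "660125137", "person_id"),
    Field("Campus Classes", "CRN", "1118", "course_reference_number"),
]


def value_matches(fields):
    """Pairs of fields from different tables whose raw values are equal."""
    return [
        (a, b)
        for i, a in enumerate(fields)
        for b in fields[i + 1:]
        if a.table != b.table and a.value == b.value
    ]


def safe_links(fields):
    """Keep only the value matches whose semantic types also agree."""
    return [
        (a, b)
        for a, b in value_matches(fields)
        if a.semantic_type == b.semantic_type
    ]


if __name__ == "__main__":
    for a, b in safe_links(FIELDS):
        print(f"{a.table}.{a.name} <-> {b.table}.{b.name}")
```

Matching on values alone proposes both links; adding the type check keeps the RIN to RPI ID link and drops the Address # to CRN one.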
Semantic Web and Linked Data (UK)
County Council
Ordnance Survey
Royal Mail
IOGDC Open Data Tutorial
http://logd.tw.rpi.edu
Semantic Web and Linked Data (US examples)
Validation challenge: easy for humans
Easy for us
But very hard for machines without people (or knowledge)
Head to head comparison shows that burglaries in Avon
and Somerset (UK) far exceed those in Los Angeles,
California
* one of the most dangerous places in the US
vs. one of the safest in the UK
* fails the “smell test”
Data + everything else you know
Same or
different?
Do the terms mean the same? Are they collected in the same way? Are
they processed differently? …
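One way to make those questions checkable is to attach them to each statistic as metadata and refuse a head-to-head comparison until they agree. A minimal sketch, assuming a hand-filled metadata record (the `StatMetadata` fields, the function name, and the placeholder values in the usage lines are all hypothetical; no real crime figures or definitions are used):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatMetadata:
    region: str
    term: str        # what the publisher calls the measure
    definition: str  # how the publisher defines it
    basis: str       # e.g. raw count vs. rate per population
    period: str      # reporting period


def comparability_issues(a, b):
    """List the ways two statistics differ in meaning or collection."""
    issues = []
    for attr in ("definition", "basis", "period"):
        left, right = getattr(a, attr), getattr(b, attr)
        if left != right:
            issues.append(f"{attr} differs: {left!r} vs {right!r}")
    return issues


if __name__ == "__main__":
    # Placeholder metadata only; the definitions are deliberately left abstract.
    uk = StatMetadata("Avon and Somerset", "burglary",
                      "publisher definition A", "count", "period X")
    us = StatMetadata("Los Angeles", "burglary",
                      "publisher definition B", "count", "period X")
    for issue in comparability_issues(uk, us):
        print(issue)
```

Both records use the term "burglary", so a term-level match would pass; the check only flags a problem because the definitions behind the term are recorded separately.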
Exploration challenge: develop/test earlier in pipeline
[Figure: the same pipeline diagram (Sensors and apps, Social Media, Customer Behaviors, Web Partners, Open data feeding many “Data Storage” nodes), now marked “Explore”]
Can we develop mechanisms
to rapidly develop/test
hypotheses prior to entering
the full analytics pipeline?
Can human perceptual
apparatus help?
Exploration challenge is to
improve human/data interaction
Were there really no fires in 1985?
How do we attack these challenges?
DOH? DO!
OR
Traditional Metadata
• Traditionally metadata tries to be
comprehensive
– Example: ISO 19115
(GIS standard)
• >400 elements
• 14 “packages”
• Dozens of UML models
(not all consistent w/
each other)
• After 50 years this still doesn’t work!
The alternative: Not your “father’s metadata”
• Big Data on the web
– is moving away from
traditional relational
models (cf. NoSQL)
– moving towards third-party
application and
extension (cf. JSON)
– Focus on interoperability
and exchange with
“lightweight” semantics
• Using ideas from the Semantic Web
– Search: Schema.org
– Social Networking: OGP
Semantic Web to Knowledge graph
Knowledge graph and schema.org
Google 2014
Google finds embedded metadata on >20% of its crawl – Guha, 2014
• The schema.org hierarchy and
details are all available online
– https://schema.org/docs/full.html
Schema.org/Dataset
Human-readable database
description (HTML)
Schema.org/Dataset
Embedded metadata (RDFa)
Dataset extension to schema.org - April, 2013
Schema.org/Dataset – add this to your pages!
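A minimal sketch of what adding Schema.org/Dataset markup to a page can look like. The slides show RDFa; this sketch swaps in JSON-LD, another serialization that schema.org supports, because it is easy to generate from a script. The function names and every argument value in the usage lines are hypothetical; the property names (`name`, `description`, `url`, `distribution`, `contentUrl`, `encodingFormat`) are schema.org terms.

```python
import json


def dataset_jsonld(name, description, url, download_url, encoding_format):
    """Build a schema.org Dataset description as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
            "encodingFormat": encoding_format,
        },
    }


def as_script_tag(record):
    """Wrap the record in the script element used to embed JSON-LD in HTML."""
    body = json.dumps(record, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    # Placeholder values; example.org is a reserved demonstration domain.
    record = dataset_jsonld(
        name="Example dataset",
        description="Placeholder description of an open dataset.",
        url="https://example.org/datasets/example",
        download_url="https://example.org/datasets/example.csv",
        encoding_format="text/csv",
    )
    print(as_script_tag(record))
```

The printed element can be pasted into the HTML page that already carries the human-readable description, so the same page serves both people and crawlers.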
Schema.org/Dataset
(Schema-labs, data search engine)
Big Deal!
USA “Project Data” – metadata
JSON
Aimed at developers
Based on DCAT
USA “Project Data” – metadata
RDFa
Embedded metadata for
Search, Web Apps
Based on Schema.org/Dataset
EU moving in similar direction
ADMS
Not just Govt sector
• IPTC rNews
– Embedded format for online news publications
Not just Govt sector
• GoodRelations
– Embedded format for online products/catalogs
Not just Govt sector
• Open Graph Protocol
– Embedded format for Facebook
relationships
OGP Use
Next steps
Person              Date
Smith James         June 4
Jones Fred          May 17
O’Connell Frank     April 3
Chang Wu            February 21
Hoffman Bernd       December 9
It’s not enough just to describe the data elements…
Describing a dataset … requires a context
Person              Date
Smith James         June 4
Jones Fred          May 17
O’Connell Frank     April 3
Chang Wu            February 21
Hoffman Bernd       December 9
1976 Dates of Birth
Describing a dataset … requires a context
How do we capture more of this information?
Person              Date
Smith James         June 4
Jones Fred          May 17
O’Connell Frank     April 3
Chang Wu            February 21
Hoffman Bernd       December 9
1976 Cancer Mortality dates
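The two captions show the same rows and the same element labels carrying opposite meanings. A minimal sketch of keeping that context next to the element-level description, assuming a simple record type (the `DatasetDescription` class, its field names, and both comparison functions are hypothetical):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetDescription:
    columns: tuple     # element-level labels, as on the slide
    year: int          # dataset-level context
    date_meaning: str  # dataset-level context: what the Date column records


# The two readings of the same table given on the slides.
births = DatasetDescription(("Person", "Date"), 1976, "date of birth")
deaths = DatasetDescription(("Person", "Date"), 1976, "cancer mortality date")


def same_elements(a, b):
    """True when the element-level descriptions are indistinguishable."""
    return a.columns == b.columns


def same_meaning(a, b):
    """True only when the dataset-level context agrees as well."""
    return same_elements(a, b) and (a.year, a.date_meaning) == (b.year, b.date_meaning)


if __name__ == "__main__":
    print(same_elements(births, deaths))
    print(same_meaning(births, deaths))
```

A description limited to "Person" and "Date" cannot tell these datasets apart; only the dataset-level fields do.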
Scalable Data Integration (via metadata)
Semantic Linking
ARL Network-Science CTA
[Figure: four log-log panels (A to D) of Count vs. Time interval (# of days), comparing “Mentorship first” with “Housing first” / “Housing trust first”; four further panels (A to D) of Count vs. Time interval (# of days), spanning -300 to 300 days]
Algorithms designed in Y3 were tested against 220 GB of
data from the EverQuest II game, looking for proxy
measures of trust. Performance results on real data
showed good correspondence with theoretical results
(but 220 GB = 1 month of our 2 years of data).
Scaling inference for discovery, integration & validation
AI “rules on graphs” bring (limited) KR
languages to supercomputing models
Weaver (PhD 2013) showed power of BlueGene/Q for AI
computations
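To illustrate what "rules on graphs" means at small scale, here is a minimal sketch of forward chaining over a set of triples with two RDFS-style rules: subclass transitivity and type propagation. It is a naive single-machine loop, not the method used in the supercomputing work cited above, and the `ex:` identifiers in the usage lines are hypothetical.

```python
def infer(triples):
    """Apply two RDFS-style rules repeatedly until no new facts appear.

    Rule 1: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    Rule 2: (x type A),       (A subClassOf B) => (x type B)
    """
    facts = set(triples)
    while True:
        new = set()
        for (s, p, o) in facts:
            for (s2, p2, o2) in facts:
                # Only chain through a subClassOf edge that starts where
                # the first triple ends.
                if o != s2 or p2 != "subClassOf":
                    continue
                if p == "subClassOf":
                    new.add((s, "subClassOf", o2))
                elif p == "type":
                    new.add((s, "type", o2))
        if new <= facts:
            return facts
        facts |= new


if __name__ == "__main__":
    # Hypothetical graph: one dataset and a two-step class hierarchy.
    graph = {
        ("ex:d1", "type", "ex:CrimeDataset"),
        ("ex:CrimeDataset", "subClassOf", "ex:GovDataset"),
        ("ex:GovDataset", "subClassOf", "ex:Dataset"),
    }
    for fact in sorted(infer(graph)):
        print(fact)
```

The nested loop compares every pair of facts on every pass, which is fine for a handful of triples and is exactly the cost that motivates running this kind of inference on parallel hardware.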
From visualization to exploration
… Unfortunately, visualization too often becomes an end product of scientific analysis,
rather than an exploration tool that scientists can use throughout the research life cycle.
However, new database technologies, coupled with emerging Web-based technologies,
may hold the key to lowering the cost of visualization generation and allow it to become
a more integral part of the scientific process.
Conclusions
• Our data challenge is becoming “Broad Data”
– World Wide Web trend towards more and more varied
data
• In many domains
– E-commerce, Open Govt, many more (cf. Health/Medical care)
• Broad data requires
– Modern, Web-oriented metadata
– LINKING the metadata, not the data
• Broad data requires thinking outside the
“Database” box
– DIVE: discover, integrate, validate and
– especially: EXPLORE (early, often, rapidly)
Questions?

Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?IES VE
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsSafe Software
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAshyamraj55
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 

Recently uploaded (20)

Empowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership BlueprintEmpowering Africa's Next Generation: The AI Leadership Blueprint
Empowering Africa's Next Generation: The AI Leadership Blueprint
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?How Accurate are Carbon Emissions Projections?
How Accurate are Carbon Emissions Projections?
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration WorkflowsIgniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPAAnypoint Code Builder , Google Pub sub connector and MuleSoft RPA
Anypoint Code Builder , Google Pub sub connector and MuleSoft RPA
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 

Broad Data (India 2015)

  • 1. Tetherless World Constellation Broad Data Jim Hendler Tetherless World Professor of Computer and Cognitive Science Director, The Rensselaer Institute of Data Exploration and Applications (IDEA) Rensselaer Polytechnic Institute http://www.cs.rpi.edu/~hendler @jahendler (twitter)
  • 2. Tetherless World Constellation This talk • What I’m not going to talk about much – The Semantic Web (per se) • http://www.slideshare.net/jahendler/semantic-web-the-inside-story – Social Machines • http://www.slideshare.net/jahendler/social-machines-oxford-hendler – My work with Watson and Cognitive Computing • http://www.slideshare.net/jahendler/watson-an-academics-perspective • http://www.slideshare.net/jahendler/watson-summer-review82013final • What I am going to present – The rest of the big data story…
  • 3. Tetherless World Constellation Data is important! • Roughly every 50 years a new power source for the human race is found. Once upon a time it was chemical, then it was electrical, then nuclear, etc. • Information – so not just data, but data being used – is the new power source for our generation. http://www.slideshare.net/jahendler/the-science-of-data-science
  • 4. The Rensselaer Institute for Data Exploration and Applications: Business Systems; Built and Natural Environments; Cyber-Resiliency; Policy, Ethics and Stewardship; Materials Informatics; Data-driven Physical/Life Sciences; Healthcare Analytics and Mobile Health; Social Network Analytics; Agents and Augmented Reality
  • 5. Office of Research Developing a “Data Science” Research Agenda: Multiscale; Sparsity; Abductive; Agent-oriented
  • 6. Tetherless World Constellation BIG Data • The term “Big Data” is widely used nowadays to refer to a whole bunch of machine-readable data in one accessible (to the researcher) place – 3 main contexts • The large data collections of “big science” projects – in traditional data warehouse or database formats • The enterprise data of large, non-Web-based companies (IBM, TATA, etc.) – Generally in multiple data formats, stores, warehouses, etc. • The data holdings of a Google, Facebook or other large Web company – Include large “unstructured” holdings – Include “graph” data
  • 7. Tetherless World Constellation But wait, there’s more! • 4th context: Broad Data – The huge amount of freely available, but widely varied, Open Data on the World Wide Web (Structured and Semi-structured) • Example: The extended Facebook OGP graph (the part outside Facebook’s datasets) • Example: DBpedia, YAGO, Wikidata, and other indexed information sources • Example: The growing linked open data cloud of freely available linked data from many domains • Example: millions of datasets made freely available on the Web by governments around the world
  • 8. Tetherless World Constellation The V’s Volume Velocity
  • 9. Tetherless World Constellation BROAD data challenges • For broad data the new challenges that emerge include – (Web-scale) data search – “Crowd-sourced” modeling and user testing – rapid (and potentially ad hoc) integration of datasets – visualization and analysis of only-partially modeled datasets – policies for data use, reuse and combination. • Which are an overlooked but critical part of the KDD world
  • 10. Tetherless World Constellation KDD Pipeline – as usually presented Data Storage (Big Data Warehouse)
  • 11. Tetherless World Constellation KDD Pipeline – in the real world • Data is increasingly being brought in from external sources, with mixed provenance, and increasingly outside the analyzers’ control. • At increasing rates and scales [Figure labels: Data Sources (Sensors, …, apps, Social Media, Customer Behaviors, Web, Partners, Open data); Formatting, standards use, data cleansing, data bias analysis, …; multiple Data Storage nodes]
  • 12. Tetherless World Constellation Tough data integration challenges Enterprise analytics Open Data Integration Hard problems!
  • 13. Tetherless World Constellation DIVE into Data Discover Integrate Validate Explore Thinking outside the Database box
  • 14. IDEA Discovery needs semantics How do you find the Data you need? The answer isn’t: Middle Eastern Terrorists for $800 …
  • 15. IDEA Discovery – there’s a lot out there
  • 16. IDEA Discovery challenge: keyword search won’t work World Bank: Africa US Data.gov: Crop Africover: Agriculture Kenya: Agricultural
  • 17. IDEA Integration challenge: need to understand the data [Table “Person”: RIN 660125137; Address # 1118; Address St Pinehurst; Address zip 12203; Course topic CSCI; Course # 4961] [Table “Campus Personnel”: RPI ID 660125137; Name Hendler] [Table “Campus Classes”: CRN 1118; Name Intro to Physics] YES NO!!!!
  • 18. IDEA Semantic Web and Linked Data (UK) County Council Ordnance Survey Royal Mail IOGDC Open Data Tutorial 18
  • 20. IDEA Validation challenge: easy for humans Easy for us
  • 21. IDEA But very hard for machines without people (or knowledge) Head to head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California * one of the most dangerous places in the US vs. one of the safest in the UK * fails the “smell test”
  • 22. IDEA Data + everything else you know Same or different? Do the terms mean the same? Are they collected in the same way? Are they processed differently? …
  • 23. Office of Research Exploration challenge: develop/test earlier in pipeline [Figure labels: Sensors and apps, Social Media, Customer Behaviors, Web, Partners, Open data; Formatting, standards use, data cleansing, data bias analysis, …; multiple Data Storage nodes; Explore] Can we develop mechanisms to rapidly develop/test hypotheses prior to entering the full analytics pipeline? Can human perceptual apparatus help?
  • 24. Tetherless World Constellation Exploration challenge is to improve human/data interaction Were there really no fires in 1985?
  • 25. Tetherless World Constellation How do we attack these challenges? DOH? DO! OR
  • 26. Tetherless World Constellation Traditional Metadata • Traditionally metadata tries to be comprehensive – Example:ISO 19115 (GIS standard) • >400 elements • 14 “packages” • Dozens of UML models (not all consistent w/ each other) • After 50 years this still doesn’t work!
  • 27. Tetherless World Constellation The alternative: Not your “father’s metadata” • Big Data on the web – Is moving away from traditional relational models (cf. NoSQL) – Is moving towards third party application and extension (cf. JSON) – Focuses on interoperability and exchange with “lightweight” semantics • Using ideas from the Semantic Web – Search: Schema.org – Social Networking: OGP
  • 28. Tetherless World Constellation Semantic Web to Knowledge graph
  • 30. Tetherless World Constellation Google 2014 Google finds embedded metadata on >20% of its crawl – Guha, 2014
  • 31. Tetherless World Constellation • The schema.org hierarchy and details are all available online – https://schema.org/docs/full.html
  • 34. Tetherless World Constellation Dataset extension to schema.org - April, 2013 Schema.org/Dataset – add this to your pages!
  • 37. Tetherless World Constellation USA “Project Data” – metadata JSON Aimed at developers Based on DCAT
  • 38. Tetherless World Constellation USA “Project Data” – metadata RDFa Embedded metadata for Search, Web Apps Based on Schema.org/Dataset
  • 39. Tetherless World Constellation EU moving in similar direction ADMS
  • 40. Tetherless World Constellation Not just Govt sector • IPTC rNews – Embedded format for online news publications
  • 41. Tetherless World Constellation Not just Govt sector • GoodRelations – Embedded format for online products/catalogs
  • 42. Tetherless World Constellation Not just Govt sector • Open Graph Protocol – Embedded format for Facebook relationships
  • 44. Tetherless World Constellation Next steps [Table, columns Person, Date: Smith James, June 4; Jones Fred, May 17; O’Connell Frank, April 3; Chang Wu, February 21; Hoffman Bernd, December 9] It’s not enough just to describe the data elements…
  • 45. Tetherless World Constellation Describing a dataset … requires a context [Table titled “1976 Dates of Birth”, columns Person, Date: Smith James, June 4; Jones Fred, May 17; O’Connell Frank, April 3; Chang Wu, February 21; Hoffman Bernd, December 9]
  • 46. Tetherless World Constellation Describing a dataset … requires a context How do we capture more of this information? [Table titled “1976 Cancer Mortality dates”, columns Person, Date: Smith James, June 4; Jones Fred, May 17; O’Connell Frank, April 3; Chang Wu, February 21; Hoffman Bernd, December 9]
  • 49. IDEA ARL Network-Science CTA [Figure: two sets of panels A–D plotting Count against Time interval (# of days), comparing “Mentorship first” with “Housing trust first”] Algorithms designed in Y3 were tested against 220GB of data from the Everquest II game, looking for proxy measures of trust. Performance results on real data showed good correspondence with theoretical results (but 220GB = 1 month of our 2 yrs of data).
  • 50. IDEA Scaling inference for discovery, integration & validation AI “rules on graphs” bring (limited) KR languages to supercomputing models Weaver (PhD 2013) showed power of BlueGene/Q for AI computations
  • 51. From visualization to exploration … Unfortunately, visualization too often becomes an end product of scientific analysis, rather than an exploration tool that scientists can use throughout the research life cycle. However, new database technologies, coupled with emerging Web-based technologies, may hold the key to lowering the cost of visualization generation and allow it to become a more integral part of the scientific process.
  • 52. Tetherless World Constellation Conclusions • Our data challenge is becoming “Broad Data” – World Wide Web trend towards more and more varied data • In many domains – E-commerce, Open Govt, many more (cf. Health/Medical care) • Broad data requires – Modern, Web-oriented metadata – LINKING the metadata, not the data • Broad data requires thinking outside the “Database” box – DIVE: discover, integrate, validate and – especially: EXPLORE (early, often, rapidly)
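
The point of slides 34 and 44 to 46 (publish Schema.org/Dataset metadata, and describe the context of a dataset, not only its columns) can be illustrated with a small sketch. This is not taken from the talk: the function names, dataset names and descriptions below are hypothetical, and only the property names (`name`, `description`, `temporalCoverage`, `variableMeasured`) come from the schema.org Dataset vocabulary. It shows two datasets with identical Person/Date columns that differ only in their contextual metadata.

```python
import json


def describe_dataset(name, description, variables, temporal_coverage):
    """Build a schema.org/Dataset description as a JSON-LD dictionary.

    The helper itself is hypothetical; the keys are schema.org properties.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "temporalCoverage": temporal_coverage,
        "variableMeasured": list(variables),
    }


def to_script_tag(metadata):
    """Wrap JSON-LD metadata in a script element for embedding in a web page."""
    body = json.dumps(metadata, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Same columns, different meaning: only the context tells them apart.
# Names and descriptions are made-up illustrations of slides 45 and 46.
columns = ["Person", "Date"]

births = describe_dataset(
    name="1976 Dates of Birth",
    description="Date of birth for each listed person (hypothetical example).",
    variables=columns,
    temporal_coverage="1976",
)

deaths = describe_dataset(
    name="1976 Cancer Mortality dates",
    description="Date of death from cancer for each listed person (hypothetical example).",
    variables=columns,
    temporal_coverage="1976",
)

if __name__ == "__main__":
    print(to_script_tag(births))
    print(to_script_tag(deaths))
```

A crawler or integration tool that looked only at `variableMeasured` would treat the two datasets as interchangeable; the `name` and `description` fields carry the context that slide 46 asks how to capture.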

Editor's Notes

  1. This is the data science agenda. Basically, these are the hard problems in closing the loop: how to go from the correlation on one side to the causal on the other. I don’t love the term agent-oriented, but we mean a combination of unstructured, AI, etc. Abductive is usually where I talk about these being hard inverse problems where we don’t know a specific function, but rather are looking for an explanation.