Not all data is born equal - B.C. Open Data Summit 2013

Not all open data is born equal

Some context

• Open North: a Canadian nonprofit that builds websites and tools
to help governments and citizens engage with each other

• Follows two main strategies:

– Improve access to government information via open data

– Make participation easy and meaningful

Ongoing projects
• Citizen Budget: an online consultative budget simulator for
municipalities and civil society organizations

• Represent: the largest database and open API of elected
Canadian officials, with two Drupal modules for easy website
integration

• MaMairie/MyCityHall: an online portal for tracking and
interacting with your city hall

• Open511: an open data standard for traffic data, with basic
related tools

Data = Natural Resources?

[Images: a diamond, captioned "Value!", and a lump of bauxite, captioned "Meh?" (hint: this is bauxite). Sources: USGS; James St-Jones (CC BY)]

Value extraction
• Diamond: extract → cut → tada!

• Aluminum (from bauxite): discover it's valuable → develop a process
→ industrialize the process → … → cans, car parts, etc.

Traffic and Transit data
• A case study of sorts

– San Francisco Bay Area: two leading organizations

• Bay Area Rapid Transit (BART): 80+ apps

• Metropolitan Transportation Commission (MTC): a handful
of apps

– Same region (full of geeks and startups)

– Same “type” of data (transportation)

– Both organizations are innovative

Let’s look at “intrinsic” data value

1. Standardization
• Transit data

– GTFS & SIRI: open data-oriented standards

– Used by 250 transit/transportation agencies

• Traffic

– Several standards (TMDD, TPEG, etc.), but difficult to use
in an open data context

⇒ Standard = low barrier to entry

⇒ Tools/apps built for these standards can reach lots of
customers
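To make the "reach lots of customers" point concrete: a tool written once against a standard works for every agency that publishes in it. The minimal Python sketch below assumes two hypothetical agencies, each publishing a GTFS `stops.txt` file (a CSV with `stop_id`, `stop_name`, `stop_lat` and `stop_lon` columns); the sample rows and the `load_stops` helper are made up for illustration. The same parser reads both feeds unchanged.

```python
import csv
import io

# Hypothetical stops.txt contents from two different agencies' GTFS feeds.
# Both use the standard column names; agency B adds an optional zone_id column.
AGENCY_A_STOPS = """stop_id,stop_name,stop_lat,stop_lon
S1,Downtown,49.2827,-123.1207
S2,Harbour,49.2860,-123.1115
"""

AGENCY_B_STOPS = """stop_id,stop_name,stop_lat,stop_lon,zone_id
100,Main St,45.5017,-73.5673,Z1
"""


def load_stops(stops_txt):
    """Parse the body of a GTFS stops.txt file into simple stop records.

    Only the standard column names are used, so the same function reads
    any agency's file; extra optional columns are simply ignored.
    """
    reader = csv.DictReader(io.StringIO(stops_txt))
    return [
        {
            "id": row["stop_id"],
            "name": row["stop_name"],
            "lat": float(row["stop_lat"]),
            "lon": float(row["stop_lon"]),
        }
        for row in reader
    ]


for feed in (AGENCY_A_STOPS, AGENCY_B_STOPS):
    for stop in load_stops(feed):
        print(stop["id"], stop["name"], stop["lat"], stop["lon"])
```

With traffic data spread across several non-open-data-friendly standards, the equivalent tool would need one parser per format and per publisher.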

2. Self-sufficiency
• Transit data

– Data can be interpreted on its own, with no need for external
data

• Traffic

– Several subsets of related data (accidents, construction,
road data, etc.)

– Data managed by several jurisdictions (local, regional,
provincial, federal)

⇒ Managing several sources and several datasets is always…
complex
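Self-sufficiency can be illustrated with a transit feed: its files reference each other by identifiers defined inside the same feed, so a useful output can be built without any outside source. The minimal Python sketch below assumes a hypothetical GTFS feed with a `stops.txt` and a `stop_times.txt` file (sample rows made up) and resolves every `stop_id` in `stop_times.txt` against the feed's own `stops.txt` to list departures per stop. The `departures_by_stop_name` helper is hypothetical.

```python
import csv
import io

# Two files from one hypothetical GTFS feed. stop_times.txt refers to stops
# by stop_id, and each stop is defined in the same feed's stops.txt.
STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Downtown,49.2827,-123.1207
S2,Harbour,49.2860,-123.1115
"""

STOP_TIMES_TXT = """trip_id,arrival_time,departure_time,stop_id,stop_sequence
T1,08:00:00,08:00:00,S1,1
T1,08:07:00,08:08:00,S2,2
T2,08:30:00,08:30:00,S1,1
"""


def read(text):
    """Parse a CSV file body into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def departures_by_stop_name(stops_txt, stop_times_txt):
    """Map each stop name to its departure times, using only the feed itself."""
    names = {row["stop_id"]: row["stop_name"] for row in read(stops_txt)}
    board = {}
    for row in read(stop_times_txt):
        # The stop_id is resolved inside the feed: no external lookup needed.
        name = names[row["stop_id"]]
        board.setdefault(name, []).append(row["departure_time"])
    # Plain string sort: assumes zero-padded HH:MM:SS times, as in the sample.
    return {name: sorted(times) for name, times in board.items()}


print(departures_by_stop_name(STOPS_TXT, STOP_TIMES_TXT))
```

A comparable traffic view would first have to merge road events, construction and road network data from several jurisdictions, each with its own identifiers.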

3. Complexity
• Transit

– (Quite) simple: some schedules, some fares, some spatial
data

• Traffic

– Complex: road networks are wide and intertwined, with lots of
rules and lots of "free" actors

⇒ Modeling complex data is… complex, and more prone to
discrepancies

4. Reliability
• Transit

– Buses and trains usually follow their schedules

– Adding a GPS unit to every bus is simple and gives
almost 100% data reliability

• Traffic

– Impossible to monitor every single road segment

⇒ Lack of reliability has a strong, negative impact on data value

Techno-utopian dream
Your iPhone 8S

"Dear smartphone, I need to pick up the kids at school as fast as
possible. What's the best choice?"

A wealth of data
• Road events, road data, gas prices

• Parking data, crowdsourced data

• Realtime traffic sensors (gov), realtime traffic (business)

• Planned trip, car efficiency

• Personal data: car, location, habits

Multiplicative effect
• "Diamond" data is self-sufficient: a strength for adoption

• For all data, the real value lies in cross-use with other datasets

• Some datasets will find their value only because other datasets
exist

• Adding new datasets has a multiplier effect on existing
related datasets
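One crude way to see the multiplier effect, assuming each pair of datasets counts as one potential cross-use (a simplifying model for illustration, not one from the talk): with n datasets there are n(n-1)/2 pairs, so adding one more dataset creates n new pairings at once. The minimal Python sketch below uses dataset names from the slides; the `pairings` helper is hypothetical.

```python
from itertools import combinations


def pairings(datasets):
    """All unordered pairs of datasets that could be cross-used."""
    return list(combinations(datasets, 2))


existing = ["road events", "road data", "parking data", "realtime traffic"]
before = pairings(existing)
after = pairings(existing + ["gas price"])

# Pairings that exist only because the new dataset was added.
new = [pair for pair in after if pair not in before]

print(len(before), len(after))
print(new)
```

Going from 4 to 5 datasets takes the pair count from 6 to 10: the new dataset pairs with each of the 4 existing ones, which is also why some datasets only become valuable once others are published.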

Not only gov data
• Usually open data = open government data

• But open data can be much more

• Open gov data: road events, road data, traffic data, parking data

• Open personal data: car, transit pass, bike share; transportation
habits; planned trip

• Crowdsourced data

• Open (?) data from companies: bike share, gas prices, traffic data,
parking data, vehicle efficiency

Some innovation theory
Gartner’s hype cycle of innovation (but it is not only about hype)

[Figure: Gartner's hype cycle, running from innovation trigger to peak of inflated expectations, trough of disillusionment, slope of enlightenment and plateau of productivity. Annotations: "Stairway to heaven (internet-style)", "Abyssal crash", and "You might be here… …or here".]

Conclusion
• Assess your datasets, using the diamond vs. bauxite analogy or any
other analysis framework

• Not all datasets are born equal; some may take more time to
show their value

• Help the discovery and value extraction process

• Follow "open" standards when they exist, or participate in their
development

• Improve data reliability where possible

• Be patient… but active!

Stéphane Guidoin
@hoedic

Twitter: @opennorth Facebook: OpenNorth.NordOuvert
Blog: www.opennorth.ca/blog
