Gil Elbaz, CEO and founder of Factual, gave a talk at the 2011 Web 2.0 Conference in San Francisco. His talk was entitled: "Big Data Challenges: Getting Some."
3. Networks Underlying Information Flow
• Density: number of connecting paths
• Plasticity: ease of forming new paths
• Speed & Flow: rate of information transfer
http://www.bloggingforbusinessbook.com/blogging_for_business/
Confidential
4. The Internet
http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918
6. Social Networks: Facebook
http://2.bp.blogspot.com/
600 million Facebook users
130 average friends
8 friend requests / month
15 messages / day / user
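The Facebook figures can be read against the density and flow measures from slide 3. The short sketch below works out two aggregates from the quoted numbers. It assumes that every friendship is mutual (so each one is counted once per pair) and that the message rate applies uniformly to all users; the function names are illustrative only.

```python
# Figures quoted on the slide.
USERS = 600_000_000
AVG_FRIENDS = 130
MESSAGES_PER_USER_PER_DAY = 15


def friendship_edges(users, avg_friends):
    # Assumption: friendships are mutual, so each pair is counted once.
    return users * avg_friends // 2


def messages_per_day(users, per_user):
    # Assumption: the per-user rate applies uniformly to every user.
    return users * per_user


edges = friendship_edges(USERS, AVG_FRIENDS)
daily_messages = messages_per_day(USERS, MESSAGES_PER_USER_PER_DAY)

print(f"{edges:,} friendships (density)")
print(f"{daily_messages:,} messages per day (flow)")
```

Under these assumptions the network holds 39 billion friendship links and carries 9 billion messages per day.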
12. Web Scale Data = More Pain
Findability
Access
Rights
Economics
Standards
Integration & Aggregation
Trust
13. Web 2.0 Model: Scale-Free Networks
www.futureexploration.net
14. Book Data: Progress Being Made
Google Book Search API
Open Library Books API
ISBNdb
Amazon API
LibraryThing
GoodReads
WorldCat
15. Google Book Search API, Amazon API, Open Library Books API, LibraryThing, ISBNdb, WorldCat, GoodReads
Findability
Access
Rights
Economics
Standards
Trust
16. Another Case Study: Local Data
http://stevecheney.posterous.com/
17. Another Case Study: Local Data
Possibilities are endless but… how do you get the data?
• Examine Twitter sentiment (avoid dirty coffee shops)
• Identify areas of highest bike thefts
• Correlate check-ins with property values
Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, Home Junction
19. Factual is Example of New Information Network
Citizens, Enterprises, Government, Machine Learned
Aggregate, Mash, Curate, Dedupe, Canonicalize
Developers, Publishers, Search Engines
New Services, Capabilities, Software Apps
20. Factual’s Open Data Model
Free, access via APIs, SDKs, and downloads BUT…
we ask you to contribute back into ecosystem.
Benefits
• Drive down costs
• Rapid iteration
• Differentiate on user experience
• Only need small % participation from world (e.g. Wikipedia)
21. Equivalence Measurements
Subway Sandwiches      =?   Subway
52 E Court St               52 West Court St
Cincinnati, OH 45202
(513)-241-6699              (800)-653-2323
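One way to make the "=?" question concrete is to score the two records field by field. The sketch below is a minimal illustration, not Factual's method: the abbreviation table, the field weights, and the function names are all assumptions, and it uses plain character-level similarity from Python's standard library.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real system would need a far larger one.
ABBREVIATIONS = {"e": "east", "w": "west", "st": "street"}


def normalize(text):
    """Lowercase, strip punctuation, and expand a few abbreviations."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def similarity(a, b):
    """Character-level similarity in [0, 1] after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def digits(phone):
    """Keep only the digits of a phone number."""
    return re.sub(r"\D", "", phone)


def equivalence_score(a, b, weights=(0.4, 0.4, 0.2)):
    """Weighted blend of name, address, and phone agreement (weights are assumed)."""
    w_name, w_addr, w_phone = weights
    name = similarity(a["name"], b["name"])
    address = similarity(a["address"], b["address"])
    phone = 1.0 if digits(a["phone"]) == digits(b["phone"]) else 0.0
    return w_name * name + w_addr * address + w_phone * phone


# The two records shown on the slide.
record_a = {"name": "Subway Sandwiches", "address": "52 E Court St", "phone": "(513)-241-6699"}
record_b = {"name": "Subway", "address": "52 West Court St", "phone": "(800)-653-2323"}

score = equivalence_score(record_a, record_b)
print(f"equivalence score: {score:.2f}")
```

The score lands strictly between 0 and 1 for this pair: the names overlap, the addresses differ only in direction, and the phone numbers disagree. Where to put the match threshold is a separate decision that this sketch leaves open.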
26. Large-Scale Deduping
• Specialized data compression & folding techniques
• Eliminate redundant entities - endpoints and authority pages
• Improves precision & recall
• Enables real-time dedupe and crosswalks
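The idea of eliminating redundant entities and producing crosswalks can be sketched with a simple grouping pass. This is not the compression and folding technique the slide alludes to; it swaps in exact matching on a hypothetical canonical key (normalized name plus phone digits). The record fields, source names, and sample data are all invented for illustration.

```python
import re
from collections import defaultdict


def canonical_key(record):
    """Hypothetical key: name without case or punctuation, plus phone digits."""
    name = re.sub(r"[^a-z0-9]", "", record["name"].lower())
    phone = re.sub(r"\D", "", record["phone"])
    return (name, phone)


def dedupe(records):
    """Fold records that share a key into one entity and build a crosswalk."""
    groups = defaultdict(list)
    for record in records:
        groups[canonical_key(record)].append(record)

    entities = []
    crosswalk = {}  # (source, source_id) -> canonical entity id
    for canonical_id, members in enumerate(groups.values()):
        first = members[0]
        entities.append({"id": canonical_id, "name": first["name"], "phone": first["phone"]})
        for member in members:
            crosswalk[(member["source"], member["source_id"])] = canonical_id
    return entities, crosswalk


# Invented sample records from two hypothetical feeds.
records = [
    {"source": "feed_a", "source_id": "17", "name": "Joe's Cafe", "phone": "(555) 010-0000"},
    {"source": "feed_b", "source_id": "x9", "name": "JOES CAFE", "phone": "555-010-0000"},
    {"source": "feed_b", "source_id": "y2", "name": "Main St Books", "phone": "555-010-0001"},
]

entities, crosswalk = dedupe(records)
print(len(entities), "entities from", len(records), "records")
```

The crosswalk maps each source identifier to its canonical entity, so a lookup by either feed's ID resolves to the same record. Because grouping is a single dictionary pass, new records can be folded in as they arrive.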
27. Shared Foundational Data
• Commoditization of data
• Head attributes for people, places, things decreasing in value
• hCard data value driven to zero (visual of local data being identical on thousands of apps)
• Entertainment: IMDB exposed all their data for non-commercial use (link to site map)
• Yet, there are still lots of errors in foundation data – thus need “living” model
28. LA Neighborhoods: Another Crowdsourcing Example
• LA Times started with 87 neighborhoods based on census tracts
• Incorporated 650+ user maps
• Ended with 114 neighborhoods for LA City
• Added additional 158 neighborhoods for LA County
29. Ownership & Rights: LA Neighborhoods
• Terms of Service: Creative Commons Attribution, Noncommercial, Share-Alike license
• Can share and remix as long as it’s for noncommercial uses, attributed to the LA Times, and shared under the same terms
30. Evolving “Buy” Model
• Data Marketplaces (“iTunes of data?”)
• Data Search Engines
• Microformats / Semantic Web Markups / Other Standards
• Electronic Forms of T&Cs
31. Summary: Road to the Information Singularity
• Rise in community storage and access
• New common schemas and standards
• Definitive, accountable sources of “open” data
• Trends towards sharing of foundational data
• 'Buy' models based on unique data, novel access methods, SLAs, value-added services
32. Thank you!
Questions......
Gil Elbaz
@factual
@gilelbaz