Big Data Challenges: Getting Some
March 31, 2011




Gil Elbaz
   @factual
   @gilelbaz
Road to Information Singularity




                          Confidential
Networks Underlying Information Flow


  • Density: number of connecting paths
  • Plasticity: ease of forming new paths
  • Speed & Flow: rate of information transfer

 !""#$%%&&&'()*++,-+.*/(01,-211(**3'4*5%()*++,-+6.*/6(01,-211%


The Internet




               http://www.amazon.com/Digital-Crossroads-American-Telecommunications-Internet/dp/0262140918




Search Engines




Social Networks: Facebook




                            http://2.bp.blogspot.com/



             600 million Facebook users
                 130 average friends
              8 friend requests / month

              15 messages / day / user
Trending of Unfriending




Unfriending




Another Network: The Brain




                  100 billion neurons

               1000 ‘hardwired’ synapses




                       !""#$%%&2)4*52"*G57/"'4*5%A@CC%@C




Web 3.0: Data Web




Web Scale Data = More Pain


                     Findability
                       Access
                       Rights
                     Economics
                     Standards
             Integration & Aggregation
                        Trust
Web 2.0 Model: Scale-Free Networks




www.futureexploration.net
Book Data: Progress Being Made




    Google Book Search API
      Open Library Books API
         ISBNdb
           Amazon API
             LibraryThing
                GoodReads
                   WorldCat


Google Book Search API                 Amazon API
    Open Library Books API                LibraryThing
         ISBNdb     WorldCat           GoodReads



            Findability            Access
            Rights                 Economics
            Standards              Trust




Another Case Study: Local Data




                                        http://stevecheney.posterous.com/




Another Case Study: Local Data


    !"##$%$&$'$(#)*+()(,-&(##)%.'/!"#$%"$&"'$()*$*!)$%+*+0

    Twitter, Facebook, Loopt, Foursquare, Yelp, Zillow, Google, HomeJunction

    Examine Twitter sentiment (avoid dirty coffee shops)
    Identify areas of highest bike thefts
    Correlate check-ins with property values

HomeJunction




Factual is Example of New Information Network

      Citizens   Enterprises    Government       Machine-Learned

                           ,-."'-%$%+*+




                 Aggregate      Mash Curate
                   Dedupe       Canonicalize



          Developers      Publishers          Search Engines


        New Services, Capabilities, Software Apps

Factual’s Open Data Model

  Free, access via APIs, SDKs, and downloads BUT…
     we ask you to contribute back into ecosystem.

                                           Benefits

                                           • Drive down costs
                                           • Rapid iteration
                                           • Differentiate on user experience

                                           • Only need small % participation from world
                                             (e.g. Wikipedia)



Equivalence Measurements




                     =?
    Subway Sandwiches                 Subway
    52 E Court St                     52 West Court St
    Cincinnati, OH                    45202
    (513)-241-6699                    (800)-653-2323
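
One way to make the question on this slide concrete is to score each field separately instead of comparing whole records. The sketch below is a minimal illustration in Python, not Factual's actual method: the field names (`name`, `address`, `phone`), the helper functions, and the choice of token-set (Jaccard) overlap are all assumptions made for the example. Only the two Subway records come from the slide.

```python
import re

def tokens(text):
    """Lowercase a string and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Token-set overlap in [0, 1]; 0.0 when both strings have no tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def digits(text):
    """Keep only the digits, so phone punctuation does not matter."""
    return re.sub(r"\D", "", text)

def compare(rec_a, rec_b):
    """Return per-field evidence rather than a single yes/no answer."""
    return {
        "name": jaccard(rec_a["name"], rec_b["name"]),
        "address": jaccard(rec_a["address"], rec_b["address"]),
        "phone_equal": digits(rec_a["phone"]) == digits(rec_b["phone"]),
    }

# The two records from the slide (hypothetical field names).
left = {"name": "Subway Sandwiches", "address": "52 E Court St",
        "phone": "(513)-241-6699"}
right = {"name": "Subway", "address": "52 West Court St",
         "phone": "(800)-653-2323"}

evidence = compare(left, right)
print(evidence)
```

For these two records the name and street tokens overlap only partly and the phone numbers differ, which is why the slide poses equivalence as a question rather than an exact-match test.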


Large-Scale Aggregation Technologies




Large-Scale Aggregation Technologies

                      Apartments = Apts
                      Center = Ctr
                      Corp = Corporation
                      Service = Svc
                      Attorney = Atty
                      Assoc = Associates
                      Inc = Incorporated
                      Assn = Association
                      Co = Company
                      Mount = Mt
                      Bros = Brothers
                      BBQ = Barbeque .....
                           U*/,KPK>2<
Large-Scale Aggregation Technologies

                  Restaurant = Rstrnt
                  Restaurant = Restuarant
                  Hosp = Hospital
                  Billards = Billiards
                  Salon = Sln
                  Buffet = Buffett
                  Center = Ctr
                  Apartments = Apts
                  Boutique = Btq
                  Jewelers = Jewlers
                  Cleaners = Clnrs
                  Market = Mktz .....
                  Kragen = O'Reilly ?
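
Pairs like these can be applied as a token-level canonicalization step before records are compared. The following Python sketch is a hedged illustration only: the `CANONICAL` table is a small hand-copied subset of the pairs on these two slides, the direction of the mapping (variant to longer form) is an assumption, and `canonicalize` is a hypothetical helper, not part of any Factual API.

```python
import re

# Hand-copied subset of the slide pairs; the variant -> long form direction
# is an assumption made for this example.
CANONICAL = {
    "apts": "apartments",
    "ctr": "center",
    "corp": "corporation",
    "inc": "incorporated",
    "co": "company",
    "mt": "mount",
    "bros": "brothers",
    "rstrnt": "restaurant",
    "restuarant": "restaurant",
    "jewlers": "jewelers",
    "clnrs": "cleaners",
}

def canonicalize(name):
    """Lowercase a business name and replace known variants token by token."""
    words = re.findall(r"[a-z0-9']+", name.lower())
    return " ".join(CANONICAL.get(word, word) for word in words)

print(canonicalize("Smith Bros. Jewlers"))
print(canonicalize("Joe's Restuarant"))
```

A pair such as Kragen and O'Reilly shares no tokens, so a table of abbreviations and spelling variants alone would not connect them.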
Kragen O'Reilly?




Large-Scale Deduping




   • Specialized data compression & folding techniques
   • Eliminate redundant entities - endpoints and authority pages
   • Improves precision & recall
   • Enables real-time dedupe and crosswalks
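
The bullets above do not say how the folding works, so the following is only a minimal sketch of one common approach, assuming that "folding" means reducing each record to a compact comparison key. The record layout, the `fold` and `dedupe` helpers, and the sample rows are hypothetical and are not taken from the talk.

```python
import re
from collections import defaultdict

def fold(text):
    """Fold a string to a compact key: lowercase, letters and digits only."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def dedupe(records):
    """Group records whose folded name and postcode agree.

    Returns (entities, crosswalk): one representative record per group,
    and a map from every source id to its group's representative id.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(fold(rec["name"]), rec["postcode"])].append(rec)

    entities, crosswalk = [], {}
    for members in groups.values():
        keep = members[0]  # first record seen represents the group
        entities.append(keep)
        for rec in members:
            crosswalk[rec["id"]] = keep["id"]
    return entities, crosswalk

# Hypothetical rows from two made-up sources, "a" and "b".
records = [
    {"id": "a:1", "name": "Joe's Cafe", "postcode": "90001"},
    {"id": "b:7", "name": "JOES CAFE", "postcode": "90001"},
    {"id": "b:8", "name": "Joe's Cafe", "postcode": "10001"},
]

entities, crosswalk = dedupe(records)
print(len(entities), crosswalk)
```

Grouping by key avoids comparing every record with every other one, and the returned `crosswalk` maps each source identifier to the entity that survives.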

Shared Foundational Data

  • Commoditization of data
  • Head attributes for people, places, things decreasing in value
    • hCard data value driven to zero (visual of local data being
       identical on thousands of apps)
    • Entertainment: IMDB exposed all their data for non-commercial
       use (link to site map)
    • Yet, there are still lots of errors in foundation data – thus
       need “living” model




LA Neighborhoods: Another Crowdsourcing Example




 • LA Times started with 87 neighborhoods based on census tracts
 • Incorporated 650+ user maps
 • Ended with 114 neighborhoods for LA City
 • Added additional 158 neighborhoods for LA County




Ownership & Rights: LA Neighborhoods


  • Terms of Service: Creative Commons Attribution,
    Noncommercial, Share-Alike license
  • Can share and remix as long as it’s for noncommercial uses,
    attributed to the LA Times, and shared under the same terms



Evolving “Buy” Model


 • Data Marketplaces (“iTunes of data?”)
 • Data Search Engines
 • Microformats / Semantic Web Markups / Other Standards
 • Electronic Forms of T&Cs




Summary: Road to the Information Singularity


 • Rise in community storage and access
 • New common schemas and standards
 • Definitive, accountable sources of “open” data
 • Trends towards sharing of foundational data
 • 'Buy' models based on unique data, novel access
   methods, SLAs, value-added services




Thank you!
              Questions......

Gil Elbaz
  @factual
  @gilelbaz

Factual 2011 Web 2.0 Presentation
