A presentation I gave at I-Semantics 2010 on Sigma EE, an RDF-based data integration front-end.
Sigma EE is now available for download here: http://sig.ma/?page=help
Sigma EE: Reaping low-hanging fruits in RDF-based data integration
1. Sigma EE: Reaping low-hanging fruits in RDF-based data integration Richard Cyganiak I-Semantics 2010, Graz
2. Intro: Semantic Technologies conferences (In-use Tracks, Applications session). D2RQ: Expose contents of relational databases as RDF/SPARQL. Just a format converter; what do people use it for?
3. The common theme … Integration of data across the organization/project
7. Sigma EE: Originally built not for enterprise data but for web data. Sindice, a search engine for the Web of Data: Microformats, RDFa, Linked Data on the Web. For building apps on top of data search API: http://sindice.com/ How to show the richness of all that data? http://sig.ma/
10. Background: The problem: How to provide uniform access to heterogeneous data sources? Value-added services: search, browsing, recommendations of related items, reporting, dashboarding, notifications, …
11. Solutions? Data Warehousing; Enterprise Information Integration; Enterprise Search. A middle ground in between?
12. Data Warehousing, EII: Integrate enterprise data sources into a new data source. Data Warehouse: materialized (new DB). Enterprise Information Integration: virtual (distributed queries). Focus on data; tight integration; high up-front cost.
13. Enterprise Search: Provides the most sought-after service (search). Focus on documents; full-text search; lower up-front cost (no schema alignment). Providing value-added services on top is difficult.
14. A middle ground: Start by providing access to data on a per-business-object basis without prior schema alignment. Services: browsing of the catalog of objects; search. Align, link and reconcile as required to enable more services, e.g., expressive queries.
15. A middle ground: No accepted term yet. Data Spaces? Pay-as-you-go Data Integration? Linked Enterprise Data?
16. The RDF technology stack: A standards-based “data-first” approach. RDF, SPARQL, OWL – W3C standards. Off-the-shelf components. Integrates well with web data sources.
17. The “RDF Bus”: Various implementation strategies: ETL + One Big Triple Store with SPARQL endpoint; several SPARQL endpoints (SPARQL 1.1 SERVICE feature?); Linked Data style (resolvable URIs). Bus details determine what services can be provided: Can you do high-performance SPARQL? Can you do full-text search? Real-time up-to-date information or significant delay? Where is alignment handled? Who can hook in new data sources?
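The "several SPARQL endpoints" strategy could look roughly like the sketch below, which builds one SPARQL 1.1 query that fans out to each endpoint through a SERVICE clause. This is an illustration only, not something shown in the talk: the endpoint URLs, the function name, and the choice of rdfs:label as the lookup property are all assumptions.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def federated_label_query(endpoints, label):
    """Build a SPARQL 1.1 query asking every endpoint for entities with this label."""
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Escape backslashes and double quotes so the label is a valid SPARQL string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    # One SERVICE block per endpoint; UNION combines the per-endpoint results.
    blocks = [
        '  { SERVICE <%s> { ?entity rdfs:label "%s" } }' % (url, escaped)
        for url in endpoints
    ]
    body = "\n  UNION\n".join(blocks)
    return "PREFIX rdfs: <%s>\nSELECT ?entity WHERE {\n%s\n}" % (RDFS, body)


# Hypothetical endpoint URLs, for illustration only.
query = federated_label_query(
    ["http://crm.example.org/sparql", "http://hr.example.org/sparql"],
    "Graz",
)
print(query)
```

The query would be sent to any SPARQL 1.1 processor that supports federation. Whether that is fast enough is exactly the kind of "bus detail" the slide asks about.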
18. Sigma EE: Services: search, browsing. Strengths: minimal requirements for the RDF bus; strong support for provenance; dynamic UI. Bus has to provide search and entity descriptions, e.g., a SPARQL endpoint with full-text search; Solr; Sindice + (part of) the Web; custom Java classes; or multiple of the above.
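The two things the bus has to provide (search and entity descriptions) can be pictured as a two-method contract. The sketch below is a Python illustration of that idea, not Sigma EE's actual interface, which the slide says can be implemented with custom Java classes. The names `Bus`, `InMemoryBus`, `CombinedBus`, and the sample URIs are hypothetical.

```python
from typing import List, Protocol, Tuple

Triple = Tuple[str, str, str]  # (subject URI, property, value)


class Bus(Protocol):
    """Hypothetical minimal contract: keyword search plus entity descriptions."""

    def search(self, text: str) -> List[str]: ...

    def describe(self, uri: str) -> List[Triple]: ...


class InMemoryBus:
    """Toy backend over a list of triples; stands in for a triple store or Solr."""

    def __init__(self, triples: List[Triple]) -> None:
        self.triples = triples

    def search(self, text: str) -> List[str]:
        # Case-insensitive substring match on values; returns matching subjects once each.
        needle = text.lower()
        hits: List[str] = []
        for subject, _prop, value in self.triples:
            if needle in value.lower() and subject not in hits:
                hits.append(subject)
        return hits

    def describe(self, uri: str) -> List[Triple]:
        return [t for t in self.triples if t[0] == uri]


class CombinedBus:
    """The 'multiple of the above' case: query several backends and pool the results."""

    def __init__(self, buses: List[Bus]) -> None:
        self.buses = list(buses)

    def search(self, text: str) -> List[str]:
        hits: List[str] = []
        for bus in self.buses:
            for uri in bus.search(text):
                if uri not in hits:
                    hits.append(uri)
        return hits

    def describe(self, uri: str) -> List[Triple]:
        return [t for bus in self.buses for t in bus.describe(uri)]


crm = InMemoryBus([("urn:ex:alice", "name", "Alice Smith")])
hr = InMemoryBus([
    ("urn:ex:alice", "dept", "Research"),
    ("urn:ex:bob", "name", "Bob Smith"),
])
bus = CombinedBus([crm, hr])
print(bus.search("smith"))
print(bus.describe("urn:ex:alice"))
```

Keeping the contract this small is what makes the requirements on the bus minimal: anything that can answer a keyword query and return a description for a URI can be plugged in.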
20. Sigma UI: Full-text search; on-the-fly fuzzy merge of data sources; empower user to evaluate provenance, reject and accept data sources; show/hide/rearrange properties and values; browse to related entities; permalinks, embeddable widgets.
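A rough sketch of what fuzzy merging with provenance might involve: values from different sources are grouped when they are near-identical, each merged value keeps the list of sources that asserted it, and a rejected source drops out of the result. The slides do not describe Sigma's actual merge algorithm. The string-similarity test from Python's `difflib`, the 0.85 threshold, and all names and sample data here are stand-in assumptions.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.85):
    """Treat two values as the same if their case-folded similarity ratio is high enough."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge(records, rejected=frozenset()):
    """Merge (source, property, value) records into {property: [(value, [sources])]}.

    Records from sources the user has rejected are skipped entirely.
    """
    merged = {}
    for source, prop, value in records:
        if source in rejected:
            continue
        values = merged.setdefault(prop, [])
        for existing, sources in values:
            if similar(existing, value):
                # Same value seen before: just record the additional provenance.
                if source not in sources:
                    sources.append(source)
                break
        else:
            # No near-duplicate found: keep as a new value with its source.
            values.append((value, [source]))
    return merged


# Hypothetical sample data from three sources.
records = [
    ("crm", "name", "Alice Smith"),
    ("web", "name", "alice smith"),
    ("web", "city", "Graz"),
    ("hr", "city", "Vienna"),
]
merged = merge(records)
print(merged["name"])  # both sources agree, so one value with two sources
print(merge(records, rejected={"web"}))  # the user has rejected the "web" source
```

Because provenance travels with every value, accepting or rejecting a source is just a matter of re-running the merge with a different `rejected` set.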
21. Summary: Sigma EE is a front-end for your RDF Bus, e.g., for your triple store. Off-the-shelf UI with minimum configuration. Available under GPL or other licenses (http://sig.ma/?page=help). Running at http://sig.ma/