Search Engines, Analytics and Semantics
Developments In Analytics And Big Data
Who am I?
•CEO of iO1
•I have worked across Government and large enterprises for 20+ years
•The focus will be on giving you an understanding of how to combine key areas using open source tools to enhance your organisation's knowledge management
•Presentation is based on Invotra, our intranet product, which we are working with the Home Office to roll out
•Everything is available freely and built on open standards
•You should be able to go away and build this by the end
•Targeted at simple solutions that can be broadly adopted to encourage participation by experts (like you).
Why Open source?
•Open is “usually”
•Easier to integrate, massively scalable
•Easier to fix - you can see the code
•You can see what people are planning on doing in future
•Communities are fantastic, it's easy to contribute
•An internal system
•Unstructured and ad-hoc data
•An everyday anytime tool for everyone
•Different usages in nearly every scenario
•Apart from the lunch menu ;-)
•Rarely recognised for the knowledge it contains
•A place for everything else, i.e. not covered by a line of business app
•Made up of multiple systems (search, cms, semantics, analytics)
•It's never the same
Areas the stack is focussed on
•Handling staff turnover
•Sustainable knowledge management
•Extracting knowledge as opposed to defining a million reports
•Understanding context and capturing knowledge, not just data
•Helping users to help organisations
•Discover information / knowledge
•Input information / knowledge not just data
•Create relationships between information
•Maintain information / knowledge
•By being an interface to everything else
•Essentially the user's tool
•Help people find
•Better information and knowledge
•Tailored to your own requirements
•Biased the way you want
•Leveraging your knowledge
•Find knowledge inside people's heads by looking at what they write to infer what they know
•Real world knowledge
•Discover real usages
•See patterns in data and usage
•See usage impact factors
•Do analysis on creation; this gives you insights into what's happening now
•Analysis of versions
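The idea of looking at what people write to infer what they know can be sketched roughly as below. This is a minimal illustration only: the sample posts, the stopword list and the function names (terms_by_author, likely_experts) are all hypothetical, and a real system would use a search index rather than in-memory counting.

```python
from collections import Counter, defaultdict
import re

# Hypothetical sample data: (author, text) pairs standing in for intranet posts.
POSTS = [
    ("alice", "Notes on search indexing and search relevance"),
    ("bob", "Lunch menu feedback and canteen opening hours"),
    ("alice", "Tagging content to improve search"),
]

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"and", "on", "for", "the", "of", "a", "to", "in"}


def terms_by_author(posts):
    """Count the non-stopword terms each author has written."""
    counts = defaultdict(Counter)
    for author, text in posts:
        words = re.findall(r"[a-z]+", text.lower())
        counts[author].update(w for w in words if w not in STOPWORDS)
    return counts


def likely_experts(posts, term):
    """Rank authors by how often they have used a term (most frequent first)."""
    counts = terms_by_author(posts)
    ranked = [(author, c[term]) for author, c in counts.items() if c[term] > 0]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))
```

Calling likely_experts(POSTS, "search") ranks alice first because she is the only author who has written about search. Raw term frequency is a crude proxy for knowledge, so treat the ranking as a starting point for finding people, not a verdict.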
•According to the W3C
•"The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."
Models for adding semantic data
•Forced on after input
•Added during input
•Users know what they meant
•Forced on at display
•Useful when dealing with intelligent UIs
•Don’t start from scratch
•Spend effort on modelling using Linked Data to save you hassle
•Use search to extract from existing data
•Store your vocabularies in a source that is open ;-)
•Allow users to tag content and analyse this against the content they are tagging to gain insights
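One simple way to analyse user tags against the content they are attached to is to separate tags that already appear in the text from tags that add context the text does not contain. The sketch below assumes single-word tags and plain-text content; the function name split_tags and the sample data are hypothetical.

```python
import re


def split_tags(text, tags):
    """Split tags into those found in the text and those that add new context.

    Assumes single-word tags; matching is case-insensitive.
    """
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    unique_tags = {t.lower() for t in tags}
    in_text = sorted(t for t in unique_tags if t in words)
    added = sorted(t for t in unique_tags if t not in words)
    return in_text, added


# Hypothetical example content and user-supplied tags.
content = "Guidance on expenses claims for travel"
user_tags = ["expenses", "finance", "Travel"]
in_text, added = split_tags(content, user_tags)
```

Here "expenses" and "travel" confirm what the content already says, while "finance" is background the author or reader supplied. Tags in the second group are the interesting ones, since they capture what users know that the text alone does not state.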
What problems does all this solve?
•Better quality answers to user questions/searches
•Giving users the ability to provide background to their content
•Storing the information within the content so it's shared more easily between systems / departments
•Retaining knowledge after authors have left the organisation
•Giving organisations the ability to intelligently discover data
Semantics = Stanbol
•Apache Stanbol's main features are:
•Content Enhancement: Services that add semantic information to “non-semantic” pieces of content.
•Reasoning: Services that are able to retrieve additional semantic information about the content based on the semantic information retrieved via content enhancement.
•Knowledge Models: Services that are used to define and manipulate the data models (e.g. ontologies) that are used to store the semantic information.
•Persistence: Services that store (or cache) semantic information, i.e. enhanced content, entities, facts, and make it searchable.
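As a rough illustration of the Content Enhancement service, the sketch below builds an HTTP POST that sends plain text to a Stanbol enhancer endpoint and asks for the enhancements back as Turtle. It assumes a Stanbol instance running locally at http://localhost:8080/enhancer; the URL, the function names (build_enhancer_request, enhance) and the choice of response format are assumptions for this example, so check them against your own installation.

```python
import urllib.request

# Assumption: a local Stanbol instance exposing the enhancer at this URL.
STANBOL_ENHANCER_URL = "http://localhost:8080/enhancer"


def build_enhancer_request(text, url=STANBOL_ENHANCER_URL, accept="text/turtle"):
    """Build (but do not send) a POST request carrying plain text to the enhancer."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={
            "Content-Type": "text/plain; charset=utf-8",
            "Accept": accept,  # RDF serialisation requested for the response
        },
        method="POST",
    )


def enhance(text, **kwargs):
    """Send the text to the enhancer and return the raw response body.

    Requires a reachable Stanbol server; raises urllib.error.URLError otherwise.
    """
    request = build_enhancer_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With a server running, enhance("Paris is the capital of France") would return RDF describing whatever entities the configured enhancement engines detect. Building the request separately from sending it keeps the example easy to inspect without a live server.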