5. Introducing XeniT
Managing content in a smart way
2009 - Proprietary and Confidential Information of Xenit Solutions
6. From our home base
With an enthusiastic and experienced team
In collaboration with our customers
7. The corporate story of XeniT
[Timeline figure, 2007 to 2013. Milestones shown: IWT project; concurrent collaboration; 3.5M docs; 8M docs; Alfresco-As-A-Service]
8. Maidenhead, UK: Global Headquarters; Atlanta, US: Headquarters
Alfresco is the largest private, pure-play open source software company
in the world.
● 4 million+ downloads of Alfresco Community
● 75,000+ sites running Community
● 2000+ Enterprise customers from 43+ countries
● 200+ channel partners
● 20 consecutive quarters of revenue growth
● Founded in 2005
10. What is Alfresco?
● Enterprise Content Management (ECM)
is a formalized means of organizing and storing
an organization's documents, and other content,
that relate to the organization's processes. The
term encompasses strategies, methods, and
tools used throughout the lifecycle of the
content.
12. FAQ
● How does an open-source company like Alfresco
generate revenue?
● Alfresco vs Microsoft SharePoint
13. Alfresco demo
14. Search in Alfresco
● Many search engines exist; few are really good,
and fewer still are open source
● Requirements:
○ accurate
○ performant
○ flexible
○ cross-platform
○ scalable
○ mature
● Lucene
○ https://lucene.apache.org/
● Starting with Alfresco 4.0: Solr
○ http://lucene.apache.org/solr/
15. Lucene
● Java-based indexing and search library that also
provides spellchecking, hit highlighting and
advanced analysis/tokenization capabilities
● History
Doug Cutting originally wrote Lucene in 1999. It was initially available for download from its home at the SourceForge
web site. It joined the Apache Software Foundation's Jakarta family of open-source Java products in September 2001
and became its own top-level Apache project in February 2005.
● Many projects based on Lucene: Solr,
Nutch, Elasticsearch
16. Lucene
Indexing
● over 150GB/hour on modern hardware
● small RAM requirements -- only 1MB heap
● incremental indexing as fast as batch indexing
● index size roughly 20-30% the size of text indexed
Searching
● ranked searching -- best results returned first
● many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
● fielded searching (e.g. title, author, contents)
● sorting by any field
● multiple-index searching with merged results
● allows simultaneous update and searching
● flexible faceting, highlighting, joins and result grouping
● fast, memory-efficient and typo-tolerant suggesters
● pluggable ranking models, including the Vector Space Model and Okapi BM25
● configurable storage engine (codecs)
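To make "ranked searching" and the Okapi BM25 model mentioned above concrete, here is a minimal Python sketch of BM25 scoring. It is a toy illustration, not Lucene's code: the function names are invented for this example, documents are plain token lists, and `k1=1.2`, `b=0.75` are simply commonly used defaults.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document (a list of tokens) against a query (a list of tokens)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # a term absent from the document contributes nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents need more hits for the same score.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranked(query, docs):
    """Return document positions ordered from best to worst match."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Calling `ranked(["lucene"], docs)` returns document positions with the best match first; documents that do not contain the term score zero and end up last.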
17. Lucene in Alfresco
http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation
22. Lucene in Alfresco
http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation
How information is stored in the Lucene
index is specified in Alfresco's data models
Main concept: tokenization
23. Lucene in Alfresco
http://www.slideshare.net/JM.Pascal/alfresco-search-tutorial-presentation
[Figure: index contents without tokenization vs. with tokenization]
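The difference between the two cases can be sketched with a toy inverted index in Python. This is only an illustration of the concept, not Alfresco's or Lucene's indexing code; `index_field` and the sample titles are made up for the example, and the "analyzer" is a simple lowercase word split.

```python
import re
from collections import defaultdict


def index_field(values, tokenized):
    """Build a toy inverted index mapping index terms to document ids.

    values: dict of document id -> field value (a string).
    tokenized: if True, split the value into lowercase words;
               if False, store the whole value as a single term.
    """
    index = defaultdict(set)
    for doc_id, value in values.items():
        if tokenized:
            terms = re.findall(r"\w+", value.lower())
        else:
            terms = [value]
        for term in terms:
            index[term].add(doc_id)
    return index


titles = {1: "Annual Report 2009", 2: "Report Template"}
tokenized = index_field(titles, tokenized=True)
untokenized = index_field(titles, tokenized=False)
```

With tokenization, the term `report` finds both documents. Without it, only the exact value `Annual Report 2009` matches, which is what you want for identifiers and for fields used as facets.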
24. Lucene in Alfresco
● Out of the box search:
○ search in all items, in a certain property or in the
content (full text search)
○ additionally: PATH, ASPECT, CATEGORY searches
○ Lucene syntax allowed:
■ boolean queries
■ wildcard queries
■ range queries
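As a rough illustration of the query forms listed above, the Python helpers below assemble Alfresco-style Lucene query strings. The helper names are invented for this sketch, `my:pageCount` is a hypothetical property, and the exact syntax accepted (quoting, wildcards, date formats) should be checked against the Alfresco version in use.

```python
def _escape_qname(qname):
    # In property queries the colon of a prefixed name (cm:name) is backslash-escaped.
    return qname.replace(":", "\\:")


def text_query(term):
    """Full text search in the content."""
    return 'TEXT:"{}"'.format(term)


def property_query(qname, value):
    """Exact search in a single property."""
    return '@{}:"{}"'.format(_escape_qname(qname), value)


def wildcard_query(qname, pattern):
    """Wildcard search in a single property (pattern is left unquoted)."""
    return "@{}:{}".format(_escape_qname(qname), pattern)


def range_query(qname, low, high):
    """Inclusive range search in a single property."""
    return "@{}:[{} TO {}]".format(_escape_qname(qname), low, high)


def path_query(path):
    return 'PATH:"{}"'.format(path)


def aspect_query(aspect):
    return 'ASPECT:"{}"'.format(aspect)


def all_of(*clauses):
    """Boolean AND of several clauses."""
    return " AND ".join("({})".format(c) for c in clauses)


query = all_of(
    path_query("/app:company_home//*"),
    aspect_query("cm:versionable"),
    wildcard_query("cm:name", "report*"),
    range_query("my:pageCount", 1, 10),  # hypothetical custom property
)
```

The resulting string combines a PATH restriction, an ASPECT restriction, a wildcard on `cm:name` and a numeric range in one boolean query.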
25. Solr
● Standalone full-text search server that runs
within a servlet container such as Tomcat.
It uses the Lucene library, exposes REST-like
HTTP/XML and JSON APIs, and has an
extensive plugin architecture.
● In 2004, Solr was created by Yonik Seeley at CNET Networks and in January 2006 the source code was donated to the
Apache Software Foundation under the Lucene top-level project. In March 2010, the Lucene and Solr projects merged and
consequently in 2011, the Solr version number scheme was changed in order to match that of Lucene.
● Many users:
○ http://wiki.apache.org/solr/PublicServers
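To show what the REST-like HTTP API looks like from a client, this Python sketch builds a search URL using Solr's standard `q`, `rows` and `wt` parameters. The host, port and core name are placeholders, and the `/select` path layout can differ between Solr versions and deployments, so treat the URL shape as an assumption.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build the URL of a Solr search request that returns JSON.

    base_url and core are deployment specific; the values used below are
    placeholders, not a real server.
    """
    params = [
        ("q", query),    # the query, in Lucene/Solr query syntax
        ("rows", rows),  # maximum number of hits to return
        ("wt", "json"),  # response writer: ask for JSON instead of XML
    ]
    return "{}/{}/select?{}".format(base_url.rstrip("/"), core, urlencode(params))


url = solr_select_url("http://localhost:8983/solr", "documents", "title:report")
```

The URL can then be fetched with any HTTP client; the response body is a JSON document containing the matching hits.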
26. Solr
● Uses the Lucene library for full-text search
● Faceted navigation
● Hit highlighting
● Query language supports structured as well as textual search
● JSON, XML, PHP, Ruby, Python, XSLT, Velocity and custom Java binary output formats over HTTP
● HTML administration interface
● Replication to other Solr servers - enables scaling QPS
● Distributed Search through Sharding - enables scaling content volume
● Search results clustering based on Carrot2
● Extensible through plugins
● Pluggable relevance - boost through formula
● Caching
● Embeddable in a Java Application
27. Faceted Search in Alfresco
● A way to navigate through the documents,
showing counts per property value and
offering the possibility to drill down in the
data
● Faceted search supported by Lucene/Solr,
not yet supported by Alfresco
● Implemented by Xenit in Fred
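The idea of counts per property value plus drill-down can be sketched in a few lines of Python. This is a conceptual illustration over in-memory dictionaries, not the Fred implementation and not how Lucene/Solr compute facets internally; the function names and sample documents are invented.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of `field`."""
    return Counter(d[field] for d in docs if field in d)


def drill_down(docs, field, value):
    """Keep only the documents whose `field` equals `value`."""
    return [d for d in docs if d.get(field) == value]


docs = [
    {"name": "offer.pdf", "type": "Offer", "author": "An"},
    {"name": "invoice-1.pdf", "type": "Invoice", "author": "An"},
    {"name": "invoice-2.pdf", "type": "Invoice", "author": "Bert"},
]

by_type = facet_counts(docs, "type")
invoices = drill_down(docs, "type", "Invoice")
by_author_within_invoices = facet_counts(invoices, "author")
```

After drilling down into one value, the counts of the other facets are recomputed on the reduced result set, which is what lets a user narrow a search step by step.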
29. Faceted Search in Alfresco
● Questions
○ which fields should be facetable?
■ only the ones with a limited set of possible values
■ only the ones which are untokenized
■ plus ranges: dates and numbers
○ how to navigate inside facets?
● Current implementation
○ facetable fields configurable in a file
○ date ranges and number ranges not supported yet
○ drilling down into a single value is possible
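One way to approach the "limited set of possible values" question is to measure how many distinct values each candidate field actually has. The Python sketch below is a hypothetical heuristic, not the configuration-file mechanism of the current implementation; `max_values` is an arbitrary threshold, and whether a field is untokenized still has to be checked in the data model.

```python
def facetable_fields(docs, candidate_fields, max_values=20):
    """Return the candidate fields that have few enough distinct values to facet on."""
    result = []
    for field in candidate_fields:
        values = {d[field] for d in docs if field in d}
        # Skip fields that never occur and fields with too many distinct values.
        if values and len(values) <= max_values:
            result.append(field)
    return result


docs = [
    {"name": "offer.pdf", "type": "Offer", "author": "An"},
    {"name": "invoice-1.pdf", "type": "Invoice", "author": "An"},
    {"name": "invoice-2.pdf", "type": "Invoice", "author": "Bert"},
]

fields = facetable_fields(docs, ["name", "type", "author", "status"], max_values=2)
```

Here `name` is rejected because every document has a different value, and `status` because no document carries it, leaving `type` and `author` as facet candidates.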