Presented by
Veera Shekar G
Google Search vs. Advanced Search (Enterprise Search Implementation)
8/6/2015
11/05/2015
• The processes of a typical search engine.
• You will understand how a search engine works.
• I am a beginner in this subject.
• Top 5 requirements for an effective enterprise search implementation.
• Problems with implementations.
Introduction
• Topic 1: How a search engine works.
▫ We will look at the architecture and component details.
• Topic 2: Google Search.
▫ Phases of implementation. Indexing architecture.
• Topic 3: Top 5 requirements for implementing Enterprise search.
▫ Options available for implementations.
Session Outline
• The architecture of a typical search engine.
• Factors that determine the architecture of a search engine.
• Indexing Process.
Topic 1: Objectives
• The architecture of a search engine can be viewed as two layers.
Topic 1: Content – Normal Search engine Architecture
• The architecture of a search engine is determined by two requirements:
▫ Effectiveness (quality of results)
▫ Efficiency (response time and throughput)
Topic 1: Content - Factors
• Text acquisition: identifies and stores documents for indexing.
• Text transformation: transforms documents into index terms or features.
• Index creation: takes index terms and creates data structures (indexes) to support fast searching.
Topic 1: Content
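The three indexing steps can be illustrated with a minimal sketch. It assumes a tiny in-memory document set, a simple lowercase tokenizer and a plain inverted index; the sample documents, the stopword list and all function names are hypothetical and are not taken from the slides.

```python
import re
from collections import defaultdict

# Hypothetical sample documents standing in for acquired text.
DOCUMENTS = {
    1: "Enterprise search needs secure indexing.",
    2: "Indexing makes search fast.",
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "the", "and", "of", "to"}


def acquire(documents):
    """Text acquisition: yield (doc_id, raw_text) pairs to be indexed."""
    yield from documents.items()


def transform(text):
    """Text transformation: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_index(documents):
    """Index creation: map each term to the sorted ids of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in acquire(documents):
        for term in transform(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Querying: return ids of documents that contain every query term."""
    terms = transform(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


index = build_index(DOCUMENTS)
```

Calling `search(index, "secure search")` intersects the posting lists of both terms, which is why the lookup stays fast even when the document set grows: the query never scans the raw text.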
• A search engine has two main processes: the indexing process and the querying process.
• Questions?
Topic 1: Wrap-up
• High Level Architecture of Google search.
• Web Crawlers.
• Technologies Used.
Topic 2: Google Search
Topic 2: Content - High Level Architecture
• A web crawler is a program that, given one or more seed URLs, downloads the web pages associated with these URLs, extracts any hyperlinks contained in them, and recursively continues to download the web pages identified by these hyperlinks.
• Web crawlers are an important component of web search engines, where they are used to collect the corpus of web pages indexed by the search engine.
Topic 2: Content - Web Crawlers
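The crawl loop described on this slide can be sketched as follows. To keep the example runnable without network access, it assumes a hypothetical in-memory "web" (`FAKE_WEB`) in place of real HTTP downloads, and it uses an explicit queue rather than literal recursion; neither choice comes from the slides.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" so the sketch runs without network access.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Stand-in for an HTTP download; returns None for unknown URLs."""
    return FAKE_WEB.get(url)


def crawl(seed_urls, max_pages=100):
    """Breadth-first crawl from the seeds; returns {url: html} for fetched pages."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    corpus = {}
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        corpus[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return corpus


corpus = crawl(["http://example.test/"])
```

The `seen` set is what keeps the crawl from looping forever on pages that link back to each other; the resulting `corpus` is the collection that would be handed to the indexing process.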
• Google visualizes its infrastructure as a three-layer stack:
▫ Products: search, advertising, email, maps, video, chat, Blogger.
▫ Distributed systems infrastructure: GFS, MapReduce and BigTable.
▫ Computing platforms: a bunch of machines in a bunch of different data centers.
• Make it easy for people in the company to deploy at a low cost.
• Look at price-performance data on a per-application basis: spend more money on hardware to avoid losing log data, but less on other types of data. Having said that, they don't lose data.
Topic 2: Content – Technologies Stack
• Google Technology stack.
• Web-crawlers.
Topic 2: Wrap-up
• Top 5 requirements for implementing Enterprise search.
• Options available for each requirement.
Topic 3: Objectives
• Diverse Content: Ability to crawl, index and search diverse content repositories.
▫ The Web, Microsoft SQL databases and SharePoint content management systems.
• Secured Search: Ability to crawl secured content and make it accessible only to authorized people and/or groups.
▫ Single sign-on, forms-based authentication.
• User Interface: Ability to provide various user interface (UI) components that serve end users precise results.
▫ Guided navigation, related search terms, related articles and best bets.
▫ AutoSuggest with terms combined from real-time search and custom (user-configurable) terms in data stores.
• Desktop Search: Ability to integrate with content stored on the desktop.
• Social Search: Ability to find other people, ratings and expertise within the organization.
Topic 3: Content - Top 5 requirements for implementing Enterprise search
• Google Web crawler for crawling and indexing Web content (GOOTB).
• Google DB connector for crawling and indexing Microsoft SQL database (GOOTB).
• Google SharePoint connector for crawling and indexing SharePoint content (GOOTB).
• Google forms authentication for index-time authorization and serve-time authentication (GOOTB).
• Google front-end configuration for:
> Faceted search, aka guided navigation (limited OOTB).
> Related search terms (GOOTB).
> Related articles (GOOTB).
> Best bets (GOOTB).
> Autosuggest (GOOTB and custom application).
• Google desktop search component integration (external Google component).
• Google results integration with internal rating system
Topic 3: Content – Google implementing requirements
• Google Web Crawler.
• Disadvantage: Efficient as it is, the Google Web crawler does not reveal the exact page that is currently being processed.
• Alternative: The OS console monitor and/or tracking log files are some ways that could help track URL crawl status.
• At any point in time, a developer should be able to view the current URL being crawled and any security issues encountered. Almost all tools, such as Solr, FAST, Endeca and Autonomy, provide this feature.
Topic 3: Content – Web crawler
• Database Connector.
• Disadvantage:
▫ Google does not allow end implementers to schedule DB crawls.
▫ Poor diagnostics for connector/XML-fed content.
▫ Google's way of removing content from the index is quite primitive and time-consuming.
• Alternative: Compared to the Google Search Appliance (GSA), Apache Solr was found to be a better option for indexing the database, via its Data Import Handler.
• Solr provides an effective way to remove content from the index, either via the admin console or via XML import (/update with the delete option).
Topic 3: Content – Database Connector
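As a rough illustration of the "/update with delete option" route, the sketch below builds Solr XML delete messages and prepares, without sending, the HTTP POST that would carry them. The host, port and core name in `SOLR_UPDATE_URL` and the sample ids are assumptions for illustration; adjust them to the actual installation.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

# Assumed endpoint for illustration only; "mycore" is a hypothetical core name.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def delete_by_ids(doc_ids):
    """Build an XML delete message listing document ids."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build an XML delete message removing all documents that match a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


def build_update_request(xml_body, url=SOLR_UPDATE_URL):
    """Prepare (but do not send) the HTTP POST carrying the delete message."""
    return Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


request = build_update_request(delete_by_ids(["doc-1", "doc-2"]))
```

Passing `request` to `urllib.request.urlopen` would submit the delete against a running Solr instance; the sketch stops short of that so it can be run safely anywhere.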
• Google provides connectors to very few CMS systems out of the box.
• Disadvantage: Even if Google executes bulk late binding, performance issues at query time are inevitable when the document volume is high.
• Alternative: One alternative is to treat site/page/document-level security as additional metadata and develop an application that post-filters the results based on end-user security attributes. This is again a primitive method and has its own disadvantages in terms of query-time latency.
Topic 3: Content – SharePoint Connector (for Document Management system)
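The post-filtering alternative can be sketched as below, assuming each result carries the groups allowed to see it as metadata. The `Hit` shape, the group names and the URLs are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """A search result carrying its security metadata (hypothetical shape)."""
    url: str
    allowed_groups: frozenset


def post_filter(hits, user_groups, page_size=10):
    """Keep only hits the user may see, preserving ranking order."""
    user_groups = set(user_groups)
    visible = []
    for hit in hits:
        # A hit is visible if the user shares at least one group with it.
        if hit.allowed_groups & user_groups:
            visible.append(hit)
            if len(visible) == page_size:
                break
    return visible


hits = [
    Hit("http://intranet.test/hr/salaries", frozenset({"hr"})),
    Hit("http://intranet.test/handbook", frozenset({"hr", "engineering"})),
    Hit("http://intranet.test/design-doc", frozenset({"engineering"})),
]
visible = post_filter(hits, user_groups={"engineering"})
```

The latency cost mentioned on the slide shows up in the loop: when a user can see only a small fraction of the results, the filter has to walk through many hits before a single page is filled.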
• At query time, Google uses the query-time configuration to make a HEAD request that allows the logged-in user (within a specific domain) to view only the content that he or she is authorized to view.
• Disadvantage: With this late-binding security model, performance degradation is inevitable at higher QPS and/or higher result counts.
• Alternative: There are tools that support an early-binding security model, which allows the search engine to cache the user security groups along with the content.
Topic 3: Content – Forms Authentication
• One disadvantage with Apache Solr is that it does not handle secured
content. The only way to serve secured content is to store the security
tags/groups as one of the metadata and implement a field (or
metadata) constrained search.
• That is where ACLs come into the picture.
Note
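A field-constrained search of this kind can be sketched as a Solr filter query (`fq`) built from the user's groups. The field name `acl_groups` and the group names are assumptions; the slides do not name a schema.

```python
from urllib.parse import urlencode


def quote_term(value):
    """Quote a value for a Solr query, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def acl_filter(user_groups, field="acl_groups"):
    """Build a filter query matching documents tagged with any of the user's groups."""
    if not user_groups:
        raise ValueError("user must belong to at least one group")
    terms = " OR ".join(quote_term(g) for g in sorted(user_groups))
    return f"{field}:({terms})"


def search_params(user_query, user_groups):
    """Combine the user's query with the security filter as URL parameters."""
    return urlencode({"q": user_query, "fq": acl_filter(user_groups)})


fq = acl_filter({"hr", "all-staff"})
```

Because the group tags are stored with each document at index time, the restriction is applied inside the engine, which is the idea behind the early-binding model mentioned earlier. The trade-off is that group changes only take effect once the affected documents are re-indexed.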
• GSA provides an open source component called “search-as-you-type” which
allows end implementers to fetch real-time results from the appliance.
• Disadvantage:
Onebox modules are designed to respond within one second. This could
result in no results from TermFederator if there is any delay at the
database.
• Alternative: The TermsComponent in Apache Solr is an effective autosuggest tool. Terms stored in any local text file can be made available to Solr at startup. A separate component can then merge the terms from both sources alphabetically.
Topic 3: Content – Auto Suggest
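The alphabetical merge of indexed terms with custom terms can be sketched in plain Python. This is not Solr's own API, only an illustration of the merge-and-prefix-lookup idea; both term lists are hypothetical and are assumed to be sorted already.

```python
import heapq
from bisect import bisect_left

# Hypothetical term sources; both lists are kept sorted.
INDEX_TERMS = ["search", "security", "sharepoint", "solr"]
CUSTOM_TERMS = ["seo", "sharepoint", "single sign-on"]


def merge_terms(*sorted_sources):
    """Merge sorted term lists alphabetically, dropping duplicates."""
    merged = []
    for term in heapq.merge(*sorted_sources):
        if not merged or merged[-1] != term:
            merged.append(term)
    return merged


def suggest(terms, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix` from a sorted list."""
    start = bisect_left(terms, prefix)  # first position that could match
    matches = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


ALL_TERMS = merge_terms(INDEX_TERMS, CUSTOM_TERMS)
```

Keeping the merged list sorted lets `suggest` jump to the first candidate with a binary search and stop at the first non-matching term, which matters for a feature that runs on every keystroke.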
• Best bets: aka KeyMatches, aka AdWords.
• Related search terms: the same as synonyms.
• Faceted search, aka guided navigation: GSA does not support faceted search, but this feature can be achieved via metadata-constrained search at query time, similar to how it is implemented in Solr.
• Disadvantage: Facet counts are not available OOTB in GSA.
• Alternative: Faceted search is one of Apache Solr's strongest features and is implemented on many e-commerce websites. (Oracle) Endeca and (HP) Autonomy also maintain a content hierarchy for guided navigation.
Topic 3: Content – User Interface
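A minimal sketch of what metadata-constrained search and facet counts amount to, assuming documents are plain dictionaries with metadata fields. The sample documents and field names are hypothetical, and a real engine would compute this inside the index rather than in application code.

```python
from collections import Counter

# Hypothetical product documents with metadata fields.
DOCS = [
    {"id": 1, "brand": "Acme", "color": "red"},
    {"id": 2, "brand": "Acme", "color": "blue"},
    {"id": 3, "brand": "Globex", "color": "red"},
]


def constrain(docs, **filters):
    """Metadata-constrained search: keep docs whose fields match every filter."""
    return [d for d in docs if all(d.get(k) == v for k, v in filters.items())]


def facet_counts(docs, field):
    """Count how many docs carry each value of `field`."""
    return Counter(d[field] for d in docs if field in d)


red_docs = constrain(DOCS, color="red")
brand_facets = facet_counts(red_docs, "brand")
```

The counts are the part the slide flags as missing out of the box in GSA: the constraint alone narrows the result set, while the per-value counts are what lets the UI show how many results each refinement would leave.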
• The InfoValuator component captures end-user ratings and saves a combination of user identity, content URI and rating value in the backend data store.
Topic 3: Content – InfoValuator
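The stored combination of user identity, content URI and rating value could look like the sketch below. The class names, the in-memory store, the one-rating-per-user rule and the averaging method are all assumptions made for illustration; the slides do not describe how InfoValuator is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One end-user rating: who rated which content, and how."""
    user_id: str
    content_uri: str
    value: int


class RatingStore:
    """In-memory stand-in for the backend data store."""

    def __init__(self):
        self._ratings = {}

    def save(self, rating):
        # Assumption: one rating per (user, content); a later one replaces it.
        self._ratings[(rating.user_id, rating.content_uri)] = rating

    def average(self, content_uri):
        """Average rating for a content URI, or None if it has no ratings."""
        values = [r.value for r in self._ratings.values()
                  if r.content_uri == content_uri]
        return sum(values) / len(values) if values else None


store = RatingStore()
store.save(Rating("alice", "http://intranet.test/handbook", 4))
store.save(Rating("bob", "http://intranet.test/handbook", 2))
store.save(Rating("alice", "http://intranet.test/handbook", 5))  # replaces the 4
```

Keying on the content URI is what would allow the ratings to be joined back onto search results, as in the "results integration with internal rating system" item listed earlier.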
• There is no one search engine that fulfills all enterprise search
requirements. HP Autonomy claims this lofty perch but it comes with a
huge cost overhead, with the base cost crossing half a million dollars.
• Google is not the right fit for many of the requirements we have seen so far. Custom search application development is inevitable and, if it is well planned, we can use basically any tool on the market to implement enterprise search as a full-fledged application.
Summary of Session
Google search vs Solr search for Enterprise search

  • 1. Presented by Veera Shekar G Google Search VS Advanced Search (Enterprise Search implemtation) 8/6/2015 11/05/2015
  • 2. • A Normal Search engine processes. • You will understand how search Engine Works. • I am beginner at this subject. • 5 Top requirements for Effective Enterprise search implementation. • Problem with implementations. Introduction 8/6/2015 11/05/2015
  • 3. • Topic 1: How Search engine works. ▫ Will see architecture and component details. • Topic 2: Google Search. ▫ Phases of implementation. Indexing architecture. • Topic 3: Top 5 requirements for implementing Enterprise search. ▫ Options available for implementations. Session Outline 8/6/2015 11/05/2015
  • 4. • A Normal Search Engine Architecture. • Architecture of a search engine factors determined . • Indexing Process. Topic 1: Objectives 8/6/2015 11/05/2015
  • 5. • Architecture of a search engine can be viewed as 2 Layered Topic 1: Content – Normal Search engine Architecture 8/6/2015 11/05/2015
  • 6. • Architecture of a search engine determined by 2 requirements – effectiveness (quality of results) efficiency (response time and throughput) Topic 1: Content - Factors 8/6/2015 11/05/2015
  • 7. • Text acquisition –identifies and stores documents for indexing. • Text transformation –transforms documents into index terms or features • Index creation – takes index terms and creates data structures (indexes) to support fast searching Topic 1: Content 8/6/2015 11/05/2015
  • 8. • Search engine will have two main processes Indexing process and Querying Process. • Questions? Topic 1: Wrap-up 8/6/2015 11/05/2015
  • 9. • High Level Architecture of Google search. • Web Crawlers. • Technologies Used. Topic 2: Google Search
  • 10. Topic 2: Content - High Level Architecture
  • 11. • A web crawler is a program that, given one or more seed URLs, downloads the web pages associated with these URLs and extracts any hyperlinks contained in them. • It then recursively continues to download the web pages identified by these hyperlinks. Web crawlers are an important component of web search engines, where they are used to collect the corpus of web pages indexed by the search engine. Topic 2: Content - Web Crawlers
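
The crawl loop described on slide 11 can be sketched as follows. This is a simplified illustration, not Google's crawler: the `fetch` callable, the `FAKE_WEB` dictionary and the `example.test` URLs are assumptions that replace real HTTP downloads so the sketch runs offline.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl: download each page, extract links, queue unseen ones."""
    frontier = deque(seeds)
    seen = set(seeds)
    corpus = {}
    while frontier and len(corpus) < max_pages:
        url = frontier.popleft()
        html = fetch(url)  # hypothetical downloader; returns None on failure
        if html is None:
            continue
        corpus[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links against the page URL
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return corpus


# Hypothetical three-page site used instead of the network.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">gone</a>',
}
pages = crawl(["http://example.test/"], FAKE_WEB.get)
```

The `seen` set is what keeps the recursion from looping forever on pages that link back to each other; the resulting `pages` dictionary is the corpus that would be handed to the indexer.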
  • 12. • Google visualizes its infrastructure as a three-layer stack: • Products: search, advertising, email, maps, video, chat, Blogger. • Distributed systems infrastructure: GFS, MapReduce, and BigTable. • Computing platforms: a large number of machines in a number of different data centers. • Make it easy for people in the company to deploy at a low cost. • Look at price/performance data on a per-application basis. Spend more money on hardware to not lose log data, but spend less on other types of data. Having said that, they don't lose data. Topic 2: Content – Technologies Stack
  • 13. • Google Technology stack. • Web-crawlers. Topic 2: Wrap-up
  • 14. • Top 5 requirements for implementing Enterprise search. • Options available at each requirement. Topic 3: Objectives
  • 15. • Diverse Content: Ability to crawl, index and search diverse content repositories, for example the Web, Microsoft SQL databases and SharePoint content management systems. • Secured Search: Ability to crawl secured content and make it accessible only to authorized people and/or groups. Single sign-on, forms-based authentication. • User Interface: Ability to provide various user interface (UI) components to serve end users with precise results. Guided navigation, related search terms, related articles and best bets. AutoSuggest with terms combined from real-time search and custom (user-configurable) terms in data stores. • Desktop Search: Ability to integrate with content stored on the desktop. • Social Search: Ability to find other people, ratings and expertise within the organization. Topic 3: Content - Top 5 requirements for implementing Enterprise search
  • 16. • Google Web crawler for crawling and indexing Web content (GOOTB). • Google DB connector for crawling and indexing Microsoft SQL database (GOOTB). • Google SharePoint connector for crawling and indexing SharePoint content (GOOTB). • Google forms authentication for index-time authorization and serve-time authentication (GOOTB). • Google front-end configuration for: > Faceted search, aka guided navigation (limited OOTB). > Related search terms (GOOTB). > Related articles (GOOTB). > Best bets (GOOTB). > Autosuggest (GOOTB and custom application). • Google desktop search component integration (external Google component). • Google results integration with internal rating system. Topic 3: Content – Google implementing requirements
  • 18. • Google Web Crawler. • Disadvantage: As efficient as it sounds, one disadvantage of the Web crawler is Google’s inability to reveal the exact page that is currently being processed. • Alternative: The OS console monitor and/or tracking log files are some ways to track URL crawl status. • At any point in time, a developer should be able to view the URL currently being crawled and any security issues encountered. Almost all tools provide this feature, such as Solr, FAST, Endeca and Autonomy. Topic 3: Content – Web crawler
  • 19. • Database Connector. • Disadvantage: Google does not allow end implementers to schedule a DB crawl. Diagnostics for connector/XML-fed content are poor. Google’s way of removing content from the index is quite primitive and time-consuming. • Alternative: Compared to GSA, Apache Solr was found to be a better option for indexing the database via the Data Import Handler. • Solr provides an effective way to remove content from the index, either via the admin console or via XML import (/update with the delete option). Topic 3: Content – Database Connector
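
As a rough illustration of the "/update with delete option" mentioned on slide 19, the sketch below builds the XML delete messages that Solr's update handler accepts (`<delete><id>…</id></delete>` and `<delete><query>…</query></delete>`). Only the message bodies are built here; posting them to a core's `/update` endpoint and committing is left out, and the `source:crm` query and the document ids are made-up examples.

```python
import xml.etree.ElementTree as ET


def delete_by_id(doc_ids):
    """Build a Solr XML delete message listing one <id> element per document."""
    root = ET.Element("delete")
    for doc_id in doc_ids:
        ET.SubElement(root, "id").text = str(doc_id)
    return ET.tostring(root, encoding="unicode")


def delete_by_query(query):
    """Build a Solr XML delete message that removes every document matching a query."""
    root = ET.Element("delete")
    ET.SubElement(root, "query").text = query
    return ET.tostring(root, encoding="unicode")


# Hypothetical ids and field name, for illustration only.
by_id = delete_by_id([42, 43])
by_query = delete_by_query("source:crm")
```

Using an XML builder rather than string concatenation means characters such as `&` or `<` in a query are escaped correctly.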
  • 20. • Google provides connectors to very few CMS systems out of the box. • Disadvantage: Even if Google executes a bulk late binding, performance issues at query time are inevitable when the document volume is high. • Alternative: One alternative is to treat the site/page/document-level security as additional metadata and develop an application that post-filters the results based on end-user security attributes. This is again a primitive method and has its own disadvantages in terms of query-time latency. Topic 3: Content – SharePoint Connector (for Document Management system)
  • 21. • At query time, Google uses the query-time configuration to make a HEAD request that allows the logged-in user (within a specific domain) to view only the content that he or she is authorized to view. • Disadvantage: With this late-binding security model, performance degradation is inevitable at higher QPS and/or higher result counts. • Alternative: There are tools that support an early-binding security model, which allows the search engine to cache the user security groups along with the content. Topic 3: Content – Forms Authentication
  • 22. • One disadvantage of Apache Solr is that it does not handle secured content. The only way to serve secured content is to store the security tags/groups as one of the metadata fields and implement a field-constrained (or metadata-constrained) search. • That is where ACLs come into the picture. Note
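
The field-constrained approach on slide 22 might look like the sketch below. It assumes each document carries a multi-valued `acl` field listing the groups allowed to read it and that unrestricted documents are tagged `public`; both the field name and the `public` convention are assumptions, not something the deck specifies. The resulting string is meant to be passed as a Solr filter query (`fq`), and `visible` shows the equivalent check done in application code as a post-filter.

```python
def user_tags(groups):
    """Tags a user may match: their own groups plus the assumed 'public' tag."""
    return sorted(set(groups) | {"public"})


def acl_filter_query(groups, field="acl"):
    """Build a filter-query string matching documents readable by any of the groups."""
    quoted = [
        '"' + tag.replace("\\", "\\\\").replace('"', '\\"') + '"'
        for tag in user_tags(groups)
    ]
    return f"{field}:({' OR '.join(quoted)})"


def visible(results, groups):
    """Post-filter: keep only results whose ACL shares a tag with the user."""
    allowed = set(user_tags(groups))
    return [doc for doc in results if allowed & set(doc["acl"])]


# Hypothetical result set, for illustration only.
results = [
    {"id": 1, "acl": ["hr"]},
    {"id": 2, "acl": ["finance"]},
    {"id": 3, "acl": ["public"]},
]
fq = acl_filter_query(["sales", "hr"])
```

Filtering inside the engine keeps result counts and paging correct, whereas the post-filter variant has to over-fetch and trim, which is the query-time latency cost slide 20 warns about.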
  • 23. • GSA provides an open source component called “search-as-you-type”, which allows end implementers to fetch real-time results from the appliance. • Disadvantage: OneBox modules are designed to respond within one second. This could result in no results from TermFederator if there is any delay at the database. • Alternative: The TermsComponent in Apache Solr is an effective autosuggest tool. Terms stored in any local text file can be made available to Solr at startup. A separate component is designed to merge the terms alphabetically. Topic 3: Content – Auto Suggest
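
The alphabetical merge of index terms and custom terms from slide 23 can be sketched in a few lines. This is a stand-alone illustration of the idea, not the Solr component itself; the term lists are invented and both inputs are assumed to be sorted already.

```python
import heapq
from bisect import bisect_left


def merge_terms(*sorted_lists):
    """Merge alphabetically sorted term lists into one sorted list without duplicates."""
    merged = []
    for term in heapq.merge(*sorted_lists):
        if not merged or merged[-1] != term:
            merged.append(term)
    return merged


def suggest(terms, prefix, limit=5):
    """Return up to `limit` terms starting with `prefix`, using binary search."""
    start = bisect_left(terms, prefix)
    suggestions = []
    for term in terms[start:]:
        if not term.startswith(prefix) or len(suggestions) == limit:
            break
        suggestions.append(term)
    return suggestions


# Hypothetical term sources: terms from the index and user-configured terms.
index_terms = ["search", "security", "solr"]
custom_terms = ["seo", "sharepoint", "solr"]
terms = merge_terms(index_terms, custom_terms)
```

Because the merged list stays sorted, every prefix lookup is a binary search followed by a short scan, so suggestions can be served without querying the database on each keystroke.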
  • 24. • Best Bets: aka KeyMatches, aka AdWords. • Related search terms: the same as synonyms. • Faceted search, aka Guided Navigation: GSA does not support faceted search, but this feature can be achieved via metadata-constrained search at query time, similar to how it is implemented in Solr. • Disadvantage: Facet counts in GSA are not available OOTB. • Alternative: Faceted search is one of Apache Solr’s strongest features and is implemented on many e-commerce websites. (Oracle) Endeca and (HP) Autonomy maintain a content hierarchy for guided navigation. Topic 3: Content – User Interface
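
To make the facet-count point on slide 24 concrete, here is a small sketch of what a faceted search layer computes: a count per metadata value, plus a metadata-constrained filter applied when the user picks a facet. The documents and the `type` field are hypothetical, and this runs in application code rather than inside any particular engine.

```python
from collections import Counter


def facet_counts(docs, field):
    """Count how many documents carry each value of a metadata field."""
    return Counter(doc[field] for doc in docs if field in doc)


def constrain(docs, field, value):
    """Metadata-constrained search: keep documents whose field equals the value."""
    return [doc for doc in docs if doc.get(field) == value]


# Hypothetical search results with a 'type' metadata field.
docs = [
    {"id": 1, "type": "pdf"},
    {"id": 2, "type": "html"},
    {"id": 3, "type": "pdf"},
    {"id": 4},
]
counts = facet_counts(docs, "type")
pdf_only = constrain(docs, "type", "pdf")
```

An engine without built-in facet counts can still do the constrained search; the counts are the part that has to be computed separately, which is the gap the slide describes for GSA.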
  • 25. • The InfoValuator component captures end-user ratings and saves a combination of user identity, content URI and rating value in the backend data store. Topic 3: Content – InfoValuator
  • 26. • No single search engine fulfills all enterprise search requirements. HP Autonomy claims this lofty perch, but it comes with a huge cost overhead, with the base cost crossing half a million dollars. • Google is not the right fit for many of the requirements we have seen so far. Custom search application development is inevitable, and if it is well planned, we can use basically any tool on the market to implement enterprise search as a full-fledged application. Summary of Session
