www.edureka.co/apache-solr 
Introduction to APACHE SOLR 
View Apache Solr course details at www.edureka.co/apache-solr 
For Queries during the session and class recording: 
Post on Twitter @edurekaIN: #askEdureka 
Post on Facebook /edurekaIN 
For more details please contact us: 
US : 1800 275 9730 (toll free) 
INDIA : +91 88808 62004 
Email Us : sales@edureka.co
How it Works?
LIVE Online Class
Class Recording in LMS
24/7 Post Class Support
Module Wise Quiz
Project Work
Verifiable Certificate
Slide 2 www.edureka.co/apache-solr
Objectives 
At the end of this module, you will be able to: 
Understand the need for a search engine in enterprise-grade applications
Understand the objectives and challenges of a search engine
What are indexing and searching, and why do you need them?
What is Lucene? An overview
How are indexing and searching handled in Lucene?
What is Solr, and what are its features?
What is the Solr schema, and how is it structured?
Understand how to meet Big Data/NoSQL needs using SolrCloud
Explore job opportunities for Solr developers
Slide 3 www.edureka.co/apache-solr
Introduction to Apache Lucene
Slide 4 www.edureka.co/apache-solr
What is Lucene ? 
 Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications 
 Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy ) 
 Scalable & High-performance Indexing 
 Powerful, Accurate and Efficient Search Algorithms 
 Cross-Platform Solution 
» Open Source & 100% pure Java 
» Index-compatible implementations are available in other programming languages
Creator: Doug Cutting
Slide 5 www.edureka.co/apache-solr
Why Indexing ? 
 Search engine indexing collects, parses, and stores data to facilitate fast and 
accurate information retrieval 
 The purpose of storing an index is to optimize speed and performance in 
finding relevant documents for a search query 
 Without an index, the search engine would scan every document in the 
corpus, which would require considerable time and computing power 
 For example, while an index of 10,000 documents can be queried within 
milliseconds, a sequential scan of every word in 10,000 large documents could 
take hours 
Slide 6 www.edureka.co/apache-solr
Indexing: Flow
Document → analysis → Tokens → indexing → Inverted Index
We can get a better idea of the flow of indexing from the following example:
Tokenization of “edureka hadoop” produces two tokens:
“edureka” (Term Vector): Position 0, Offset 0, Length 7
“hadoop” (Term Vector): Position 1, Offset 8, Length 6
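The tokenization example above can be reproduced with a short sketch. This is plain Python, not Lucene code: the `Token` type and the `tokenize` function are names invented here for illustration, and splitting on whitespace is an assumed, simplified analysis step.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int  # ordinal of the token in the stream
    offset: int    # character index where the token starts
    length: int    # number of characters in the token


def tokenize(text: str) -> List[Token]:
    """Split text on whitespace, recording position, offset and length."""
    return [
        Token(match.group(), position, match.start(), len(match.group()))
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]


tokens = tokenize("edureka hadoop")
for token in tokens:
    print(token)
```

The two tokens it yields carry the same position, offset and length values as the slide.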
Slide 7 www.edureka.co/apache-solr
Lucene: Writing to Index
Document (made up of Fields) → Analyzer → IndexWriter → Directory
Classes used when indexing documents with Lucene
Slide 8 www.edureka.co/apache-solr
Lucene: Searching In Index
 QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching
Expression → QueryParser → Query object → IndexSearcher
QueryParser uses an Analyzer to break the expression into text fragments
Slide 9 www.edureka.co/apache-solr
Lucene: Inverted Indexing Technique 
 Indexing uses the inverted index technique (like the index at the back of a book), because looking a term up in an index is faster than reading through the documents
 A new segment is written for each new document insertion
 Segments are merged when too many of them accumulate in the index (a merge-sort technique is used to merge them into the store)
 Single updates are costly; bulk updates are preferred because of merging
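The segment-and-merge idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lucene's implementation: each segment is a plain dictionary from term to document ids, `MAX_SEGMENTS` is a made-up merge threshold, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

Segment = Dict[str, Set[int]]  # term -> ids of documents containing it


def build_segment(doc_id: int, text: str) -> Segment:
    """Index one document into its own small segment."""
    segment: Segment = defaultdict(set)
    for term in text.lower().split():
        segment[term].add(doc_id)
    return dict(segment)


def merge_segments(segments: List[Segment]) -> Segment:
    """Combine several segments into one by unioning their postings."""
    merged: Segment = defaultdict(set)
    for segment in segments:
        for term, doc_ids in segment.items():
            merged[term] |= doc_ids
    return dict(merged)


def search(segments: List[Segment], term: str) -> List[int]:
    """Look a term up in every segment; no document text is scanned."""
    hits: Set[int] = set()
    for segment in segments:
        hits |= segment.get(term.lower(), set())
    return sorted(hits)


MAX_SEGMENTS = 3  # hypothetical merge threshold
docs = ["Solr extends Lucene", "Lucene indexes documents", "Hadoop stores documents"]
segments: List[Segment] = []
for doc_id, text in enumerate(docs):
    segments.append(build_segment(doc_id, text))
    if len(segments) >= MAX_SEGMENTS:
        segments = [merge_segments(segments)]  # too many segments: merge

print(search(segments, "lucene"))
print(len(segments))
```

Because every insertion adds a segment and merging rewrites postings, adding documents in batches amortises the merge cost, which is the point the last bullet makes.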
Slide 10 www.edureka.co/apache-solr
Lucene: Storage Schema
 Unlike databases, Lucene does not have a common global schema
 Lucene has indexes, which contain documents
 Each document can have multiple fields
 Different documents can have different fields
 A field can be used only for indexing and searching, or it can be stored for retrieval
 You can add new fields at any point in time
Index-1
Document-1: <Field1>, <Field2>, <Field3>
Document-2: <Field2>, <Field3>, <Field4>
Slide 11 www.edureka.co/apache-solr
Analyzers
 Analyzers handle the job of breaking text into tokens or keywords to be searched / indexed
 An Analyzer builds TokenStreams, which analyze text; it represents a policy for extracting index terms from text
 Lucene provides a few default Analyzers, which can be used at indexing time or at query time
 Analyzers are provided to parse and analyze different languages, such as Chinese and Japanese
Reader → Tokenizer → TokenFilter → TokenFilter → TokenFilter → Tokens
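The tokenizer-plus-filters chain can be illustrated with plain Python generators. This is only a sketch of the pipeline shape, not the Lucene API: the stop-word list is an arbitrary sample, and `plural_filter` is a crude stand-in for a real stemmer such as the Porter stemmer.

```python
import re
from typing import Callable, Iterable, Iterator, List

TokenFilter = Callable[[Iterable[str]], Iterator[str]]

STOP_WORDS = {"a", "an", "and", "the", "of", "to"}  # illustrative list only


def tokenizer(text: str) -> Iterator[str]:
    """Tokenizer stage: break raw text into word tokens."""
    return (match.group() for match in re.finditer(r"\w+", text))


def lowercase_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Transform every token to lower case."""
    return (token.lower() for token in tokens)


def stop_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Discard tokens that are stop words."""
    return (token for token in tokens if token not in STOP_WORDS)


def plural_filter(tokens: Iterable[str]) -> Iterator[str]:
    """Crude stand-in for a stemmer: strips one trailing 's'."""
    return (
        token[:-1] if token.endswith("s") and len(token) > 3 else token
        for token in tokens
    )


def analyze(text: str, filters: List[TokenFilter]) -> List[str]:
    """Run the tokenizer, then each filter in order, over the text."""
    stream: Iterable[str] = tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)


terms = analyze("The Analyzers of Lucene", [lowercase_filter, stop_filter, plural_filter])
print(terms)
```

Filter order matters in this sketch: the stop filter compares against lower-case words, so it has to run after the lower-casing step.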
Slide 12 www.edureka.co/apache-solr
Analyzers (Contd.) 
Core Class Examples (org.apache.lucene.analysis.Analyzer)
 SmartChineseAnalyzer
 SnowballAnalyzer
 SynonymAnalyzer
 StandardAnalyzer
 StopAnalyzer
 WhitespaceAnalyzer
 LowerCaseFilter
 PorterStemFilter
 ChineseAnalyzer
 CzechAnalyzer
 ShingleAnalyzerWrapper
 SimpleAnalyzer
Slide 13 www.edureka.co/apache-solr
Querying: Key Types / Classes 
Query (base class), with key subclasses:
 TermQuery
 BooleanQuery
 WildcardQuery
 PhraseQuery
 PrefixQuery
 MultiPhraseQuery
 FuzzyQuery
 RegexpQuery
 TermRangeQuery
 NumericRangeQuery
 ConstantScoreQuery
 DisjunctionMaxQuery
 MatchAllDocsQuery
Slide 14 www.edureka.co/apache-solr
Scoring: Score Boosting
 A document's weight / score can be changed from its default; this is called boosting
 Lucene allows influencing search results by "boosting" at different times:
Index time: boost by calling Field.setBoost() before a document is added to the index
Query time: boost by setting a boost on a query clause, calling Query.setBoost()
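To see how the two kinds of boost can change a ranking, here is a toy scoring sketch. It is not Lucene's scoring formula: the score is simply term frequency multiplied by an assumed per-field weight (`FIELD_BOOST`, standing in for an index-time boost) and a per-clause weight (standing in for a query-time boost). The documents and all names are made up for illustration.

```python
from typing import Dict, List, Tuple

# Each document maps a field name to its text.
docs: Dict[str, Dict[str, str]] = {
    "doc1": {"title": "apache solr search", "body": "solr uses lucene"},
    "doc2": {"title": "lucene library", "body": "solr search server"},
}

# Stand-in for index-time boosts: hypothetical per-field weights.
FIELD_BOOST = {"title": 2.0, "body": 1.0}


def score(doc: Dict[str, str], clauses: List[Tuple[str, float]]) -> float:
    """Sum term frequency x field boost x query-clause boost."""
    total = 0.0
    for term, clause_boost in clauses:
        for field, text in doc.items():
            term_frequency = text.split().count(term)
            total += term_frequency * FIELD_BOOST[field] * clause_boost
    return total


def rank(clauses: List[Tuple[str, float]]) -> List[str]:
    """Return document names ordered by descending score."""
    return sorted(docs, key=lambda name: score(docs[name], clauses), reverse=True)


unboosted = rank([("solr", 1.0), ("lucene", 1.0)])
boosted = rank([("solr", 1.0), ("lucene", 3.0)])  # query-time boost on "lucene"
print(unboosted)
print(boosted)
```

With equal clause weights doc1 scores 4.0 against 3.0 for doc2; tripling the weight of the "lucene" clause gives 6.0 against 7.0, so the order flips.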
Slide 15 www.edureka.co/apache-solr
Key Features 
Faceting 
Highlighting 
Grouping 
Joins 
Spatial Search 
Apache Tika Support 
Slide 16 www.edureka.co/apache-solr
Introduction to Apache Solr
Slide 17 www.edureka.co/apache-solr
Search Engine: Why do I need one?
1. Text Based Search
2. Filter
3. Documents
Slide 18 www.edureka.co/apache-solr
Solr: Introduction
 Solr is an open source enterprise search server / web application
 Solr uses the Lucene search library and extends it
 Solr exposes Lucene's Java APIs as RESTful services
 You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP
 You query it via HTTP GET and receive XML, JSON, CSV or binary results
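As a rough illustration of the HTTP interface, the sketch below prepares an indexing request and a query URL using only the Python standard library. Nothing is sent over the network. The server address, the core name `courses` and the document fields are assumptions made for this example; adjust them to match a real installation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr address
CORE = "courses"                          # hypothetical core name


def build_update_request(docs: list) -> Request:
    """Prepare (but do not send) a JSON indexing request."""
    return Request(
        url=f"{SOLR_BASE}/{CORE}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_query_url(query: str, response_format: str = "json") -> str:
    """Build the HTTP GET URL for a search."""
    params = urlencode({"q": query, "wt": response_format})
    return f"{SOLR_BASE}/{CORE}/select?{params}"


update = build_update_request([{"id": "1", "title": "Introduction to Apache Solr"}])
query_url = build_query_url("title:solr")
print(update.get_method(), update.full_url)
print(query_url)
# To actually send a request, pass it to urllib.request.urlopen()
# while a Solr server is running at SOLR_BASE.
```

The `wt` parameter selects the response format, which is how the same query can come back as XML, JSON or CSV.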
Slide 19 www.edureka.co/apache-solr
Solr: History
 In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add
search capability to the company website
 In January 2006, CNET Networks decided to openly publish the source code by donating it to 
the Apache Software Foundation under the Lucene top-level project 
 In September 2008, Solr 1.3 was released with many enhancements including distributed 
search capabilities and performance enhancements among many others 
 In October 2012 Solr version 4.0 was released, including the new SolrCloud feature 
Yonik Seeley 
Slide 20 www.edureka.co/apache-solr
Solr: Key Features 
Advanced Full-Text Search Capabilities 
Optimized for High Volume Web Traffic 
Standards Based Open Interfaces - XML, JSON and HTTP 
Comprehensive HTML Administration Interfaces 
Server statistics exposed over JMX for monitoring 
Near Real-time indexing
Adaptable with XML Configuration
Linearly scalable, with auto index replication
Extensible Plugin Architecture
Slide 21 www.edureka.co/apache-solr
Solr: Architecture 
Slide 22 www.edureka.co/apache-solr
Solr: Admin UI 
Slide 23 www.edureka.co/apache-solr
Solr: Schema Hierarchy
Solr Instance → Core/Index (one or more) → Documents → Fields
Indexing & Querying: Schema.xml
Slide 24 www.edureka.co/apache-solr
Solr: Core
 Solr Core: also referred to as just a "Core"
 This is a running instance of a Lucene index along with all the Solr configuration (solrconfig.xml, schema.xml, etc.) required to use it
 A single Solr application can contain zero or more cores
 Cores run largely in isolation but can communicate with each other if necessary via the CoreContainer
 Solr initially supported only one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr
Slide 25 www.edureka.co/apache-solr
Solr: Documents & Fields 
 Solr's basic unit of information is a document, which is a set of data that describes something 
Documents are composed of fields, which are more specific pieces of information 
 Fields can contain different kinds of data. A name field, for example, is text (character data) 
The field type tells Solr how to interpret the field and how it can be queried 
Slide 26 www.edureka.co/apache-solr
Solr: Indexing Data 
 A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data 
extracted from tables in a database, and files in common file formats such as Microsoft Word or PDFs 
Here are the most common ways of loading data into a Solr index:
 Uploading XML files by sending HTTP requests to the Solr server
 Using index handlers to import from databases
 Using the Solr Cell framework
 Writing a custom Java application to ingest data through Solr's Java client
Slide 27 www.edureka.co/apache-solr
Solr: Analysis
 There are three main concepts in analysis: analyzers, tokenizers, and filters
 Analyzers are used both at index time, when a document is indexed, and at query time
» The same analysis process need not be used for both operations
» An analyzer examines the text of fields and generates a token stream
» Analyzers may be a single class or they may be composed of a series of tokenizer and filter classes
 Tokenizers break field data into lexical units, or tokens
 Filters examine a stream of tokens and keep them, transform or discard them, or create new ones
Slide 28 www.edureka.co/apache-solr
Solr: solrconfig.xml
Activates version-dependent features in Lucene
Lib directives indicate where Solr can find JAR files for extensions
Index management settings
Enable JMX instrumentation of Solr MBeans
Update handler for indexing documents
Cache-management settings
Register event handlers for searcher events; for example, queries to execute to warm new searchers
Slide 29 www.edureka.co/apache-solr
Solr: Search Process
Request flow: Request Handler → Query Parser → Index → Response Writer
qt: selects a RequestHandler for a query sent to /select (by default, the standard SearchHandler is used)
defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler)
qf: selects which fields to query in the index (by default, all fields are required)
fq: filters the query by applying an additional query to the initial query's results, and caches the results
start: specifies an offset (by default 0) into the query results where the returned response should begin
rows: specifies the number of rows to be displayed at one time
wt: selects a response writer for formatting the query response
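The request parameters above combine into an ordinary HTTP query string. The sketch below shows one way to assemble them, including how `start` and `rows` together give pagination. The helper name `search_params` and the field `level` are hypothetical; only the parameter names come from the slide.

```python
from typing import List, Optional, Tuple
from urllib.parse import urlencode


def search_params(
    q: str,
    filters: Optional[List[str]] = None,
    page: int = 0,
    rows: int = 10,
    wt: str = "json",
    def_type: Optional[str] = None,
) -> str:
    """Encode common Solr request parameters as a query string."""
    params: List[Tuple[str, str]] = [("q", q)]
    if def_type:
        params.append(("defType", def_type))    # choose a query parser
    for fq in filters or []:
        params.append(("fq", fq))               # one fq parameter per filter
    params.append(("start", str(page * rows)))  # offset of the first result
    params.append(("rows", str(rows)))          # page size
    params.append(("wt", wt))                   # response writer
    return urlencode(params)


query_string = search_params("solr", filters=["level:beginner"], page=2, rows=10)
print(query_string)
```

Page numbers here are zero-based, so page 2 with 10 rows per page starts at offset 20.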
Slide 30 www.edureka.co/apache-solr
Solr Features
 Faceting
 Highlighting
 Spell Checking
 Query Re-ranking
 Transforming
 Suggesters
 More Like This
 Pagination
 Grouping & Clustering
 Spatial Search
 Components
 Real time (Get & Update)
 LABS
Slide 31 www.edureka.co/apache-solr
Configuring Solr Instances / Cores 
Solr Configurations 
solrconfig.xml, solr.xml, core.properties, schema.xml
Slide 32 www.edureka.co/apache-solr
SolrCloud Introduction
 Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability, called SolrCloud
 SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas
 Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas
 Documents can be sent to any server and ZooKeeper will figure it out
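One way to picture why a document can be sent to any server is hash-based routing: every node can compute the same destination shard from the document id alone. The sketch below is a simplified model under stated assumptions, not SolrCloud's actual router. The shard names are invented, and MD5 is used only as a convenient stable hash; Solr's own routing differs in detail.

```python
import hashlib
from typing import Dict, List

SHARDS = ["shard1", "shard2", "shard3"]  # hypothetical shard names


def shard_for(doc_id: str, shards: List[str] = SHARDS) -> str:
    """Pick a shard from a stable hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:4], "big") % len(shards)]


def route(doc_ids: List[str]) -> Dict[str, List[str]]:
    """Group ids by destination shard, whichever node received them."""
    placement: Dict[str, List[str]] = {name: [] for name in SHARDS}
    for doc_id in doc_ids:
        placement[shard_for(doc_id)].append(doc_id)
    return placement


placement = route([f"doc-{n}" for n in range(9)])
for shard, ids in placement.items():
    print(shard, ids)
```

Because the hash depends only on the id, the same document always lands on the same shard, so updates and deletes reach the copy that was indexed earlier.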
Slide 33 www.edureka.co/apache-solr
Features
 Horizontal Scaling (For Sharding & Replication)
 Elastic Scaling
 High Availability
 Distributed Indexing
 Distributed Searching
 Central Configuration For Entire Cluster
 Automatic Load Balancing
 Automatic Failover For Queries
 ZooKeeper Integration For Coordination & Configurations
Slide 34 www.edureka.co/apache-solr
Architecture 
Slide 35 www.edureka.co/apache-solr
Job trends for Apache Solr 
Slide 36 www.edureka.co/apache-solr
Demo 
Slide 37 www.edureka.co/apache-solr
Disclaimer 
Criteria and guidelines mentioned in this presentation may change. Please visit our website for 
latest and additional information on Apache Solr 
Slide 38 www.edureka.co/apache-solr
Course Topics
 Module 1
» Introduction to Apache Lucene
 Module 2
» Exploring Lucene
 Module 3
» Introduction to Apache Solr
 Module 4
» Solr Indexing
 Module 5
» Solr Searching
 Module 6
» Solr Extended Features
 Module 7
» Solr Cloud & Administration
 Module 8
» Final Project
Slide 39 www.edureka.co/apache-solr
References 
 http://www.indeed.com/jobtrends 
 Office.com Clip Art/ 
Slide 40 www.edureka.co/apache-solr
 
Blue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | EdurekaBlue Prism Top Interview Questions | Edureka
Blue Prism Top Interview Questions | Edureka
 
Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka Big Data on AWS Tutorial | Edureka
Big Data on AWS Tutorial | Edureka
 
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | EdurekaA star algorithm | A* Algorithm in Artificial Intelligence | Edureka
A star algorithm | A* Algorithm in Artificial Intelligence | Edureka
 
Kubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | EdurekaKubernetes Installation on Ubuntu | Edureka
Kubernetes Installation on Ubuntu | Edureka
 
Introduction to DevOps | Edureka
Introduction to DevOps | EdurekaIntroduction to DevOps | Edureka
Introduction to DevOps | Edureka
 

Recently uploaded

Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxVanesaIglesias10
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 

Recently uploaded (20)

FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
ROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptxROLES IN A STAGE PRODUCTION in arts.pptx
ROLES IN A STAGE PRODUCTION in arts.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 

Introduction to APACHE SOLR

  • 1. Introduction to APACHE SOLR. View Apache Solr course details at www.edureka.co/apache-solr. For queries during the session and class recording: post on Twitter @edurekaIN (#askEdureka) or on Facebook /edurekaIN. For more details, please contact us: US: 1800 275 9730 (toll free); INDIA: +91 88808 62004; email: sales@edureka.co
  • 2. How it Works? LIVE Online Class; Class Recording in LMS; 24/7 Post Class Support; Module Wise Quiz; Project Work; Verifiable Certificate.
  • 3. Objectives. At the end of this module, you will be able to: understand the need for a search engine in enterprise-grade applications; understand the objectives and challenges of a search engine; explain what indexing and searching are and why you need them; give an overview of Lucene; explain how indexing and searching are handled in Lucene; describe Solr and its features; describe the Solr schema and its structure; understand how to meet Big Data/NoSQL needs using SolrCloud; explore job opportunities for Solr developers.
  • 4. Introduction to Apache Lucene
  • 5. What is Lucene? Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications. It is used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy). It offers scalable, high-performance indexing and powerful, accurate and efficient search algorithms. It is a cross-platform solution: open source and 100% pure Java, with index-compatible implementations available in other programming languages. Creator: Doug Cutting.
  • 6. Why Indexing? Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document in the corpus, which would require considerable time and computing power. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours.
  • 7. Indexing: Flow. Document → analysis → tokens → indexing → inverted index. The following example gives a better idea of the flow: tokenizing the text “edureka hadoop” yields the term vector entries “edureka” (Position: 0, Offset: 0, Length: 7) and “hadoop” (Position: 1, Offset: 8, Length: 6).
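The tokenization step on this slide can be sketched in a few lines of standard-library Python. This is only an illustration of how position, offset and length relate to the input text, assuming a plain whitespace split; the `Token` type and `tokenize` function are hypothetical names, not Lucene classes.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    term: str
    position: int  # ordinal of the token within the text
    offset: int    # character index where the token starts
    length: int    # number of characters in the token


def tokenize(text: str) -> List[Token]:
    """Split on whitespace, recording position, start offset and length."""
    return [
        Token(match.group(), position, match.start(), len(match.group()))
        for position, match in enumerate(re.finditer(r"\S+", text))
    ]


for token in tokenize("edureka hadoop"):
    print(token)
```

For the slide's input, the two tokens carry the same position, offset and length values that the slide lists.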
  • 8. Lucene: Writing to Index. Classes used when indexing documents with Lucene: Document, Field, Analyzer, IndexWriter, Directory.
  • 9. Lucene: Searching in Index. QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching. Flow: Expression → QueryParser → Query object → IndexSearcher; the QueryParser passes text fragments to the Analyzer.
  • 10. Lucene: Inverted Indexing Technique. [Figure: segment merge diagram.] Indexing uses the inverted index technique (for example, a book index), because looking a term up in an index is faster than reading the documents. A new segment is written for each new document insertion. Segments are merged when there are too many of them in the index (a merge-sort technique merges the index into the store). Single updates are costly; bulk updates are preferred because of merging.
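A toy version of the idea on this slide, in plain Python: each document becomes its own small inverted index (a "segment"), and segments are later merged into one. It follows the slide's simplified description rather than Lucene's actual on-disk format or merge policy, and `build_segment` and `merge_segments` are hypothetical helper names.

```python
from collections import defaultdict


def build_segment(doc_id, text):
    """Build a one-document inverted index: term -> list of document ids."""
    return {term: [doc_id] for term in set(text.lower().split())}


def merge_segments(segments):
    """Merge several segments into one by unioning the postings of each term."""
    merged = defaultdict(set)
    for segment in segments:
        for term, postings in segment.items():
            merged[term].update(postings)
    return {term: sorted(doc_ids) for term, doc_ids in merged.items()}


docs = {1: "edureka hadoop", 2: "edureka solr", 3: "lucene solr"}
segments = [build_segment(doc_id, text) for doc_id, text in docs.items()]
index = merge_segments(segments)
print(index["solr"])  # documents that contain the term "solr"
```

Looking up a term is a single dictionary access, which is the reason the slide gives for preferring an index over scanning documents. The merge step touches every posting, which hints at why many single updates are more expensive than one bulk update.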
  • 11. Lucene: Storage Schema. Unlike databases, Lucene does not have a common global schema. Lucene has indexes, which contain documents. Each document can have multiple fields, and different documents can have different fields. A field can be used only for indexing and searching, or it can also be stored for retrieval. You can add new fields at any point in time. Example: Index-1 contains Document-1 (<Field1>, <Field2>, <Field3>) and Document-2 (<Field2>, <Field3>, <Field4>).
  • 12. Analyzers. Analyzers handle the job of analyzing text into tokens or keywords to be searched or indexed. An Analyzer builds TokenStreams, which analyze text; it represents a policy for extracting index terms from text. Lucene provides a few default Analyzers, which can be used at indexing or query time. Analyzers are also provided to parse and analyze different languages (Chinese, Japanese, etc.). Pipeline: Reader → Tokenizer → TokenFilter → TokenFilter → TokenFilter → Tokens.
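The Reader → Tokenizer → TokenFilter pipeline can be imitated with ordinary Python functions. The sketch below only shows the chaining idea; it does not use Lucene, the function names are made up for illustration, and the stop-word list is an assumption rather than Lucene's default list.

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "to"}  # illustrative list, not Lucene's


def whitespace_tokenizer(text):
    """Tokenizer stage: break raw text into tokens."""
    return re.findall(r"\S+", text)


def lowercase_filter(tokens):
    """Filter stage: transform each token."""
    return (token.lower() for token in tokens)


def stop_filter(tokens):
    """Filter stage: discard tokens that carry little meaning."""
    return (token for token in tokens if token not in STOP_WORDS)


def analyze(text, tokenizer=whitespace_tokenizer,
            filters=(lowercase_filter, stop_filter)):
    """Run the tokenizer, then feed its output through each filter in order."""
    stream = tokenizer(text)
    for token_filter in filters:
        stream = token_filter(stream)
    return list(stream)


print(analyze("The Solr server is built on Lucene"))
```

Filter order matters here: lowercasing runs first so that "The" is matched by the lowercase stop-word list.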
  • 13. Analyzers (Contd.). Core class examples (org.apache.lucene.analysis.Analyzer): SmartChineseAnalyzer, SnowballAnalyzer, SynonymAnalyzer, StandardAnalyzer, StopAnalyzer, WhitespaceAnalyzer, LowerCaseFilter, PorterStemFilter, ChineseAnalyzer, CzechAnalyzer, ShingleAnalyzerWrapper, SimpleAnalyzer.
  • 14. Querying: Key Types / Classes. Subclasses of Query: TermQuery, BooleanQuery, WildcardQuery, PhraseQuery, PrefixQuery, MultiPhraseQuery, FuzzyQuery, RegexpQuery, TermRangeQuery, NumericRangeQuery, ConstantScoreQuery, DisjunctionMaxQuery, MatchAllDocsQuery.
  • 15. Scoring: Score Boosting. A document's weight or score can be changed from its default; this is called boosting. Lucene allows influencing search results by boosting at two different times: at index time, by calling Field.setBoost() before a document is added to the index; and at query time, by setting a boost on a query clause with Query.setBoost().
  • 16. Key Features: Faceting, Highlighting, Grouping, Joins, Spatial Search, Apache Tika Support.
  • 17. Introduction to Apache Solr
  • 18. Search Engine: Why do I need one? 1. Text-based search 2. Filter 3. Documents
  • 19. Solr: Introduction. Solr is an open source enterprise search server / web application. Solr uses the Lucene search library and extends it. Solr exposes Lucene's Java APIs as RESTful services. You put documents in it (called "indexing") via XML, JSON, CSV or binary over HTTP. You query it via HTTP GET and receive XML, JSON, CSV or binary results.
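As a rough sketch of "putting documents in over HTTP", the snippet below builds (but does not send) a JSON indexing request using only the Python standard library. The host, port and core name `demo` are assumptions, as are the document fields; `/update` with `commit=true` is the usual Solr update endpoint, but check the reference guide for the version you run.

```python
import json
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr/demo"  # assumed host, port and core name


def build_index_request(docs):
    """Prepare a POST of a JSON array of documents to the core's /update handler."""
    return Request(
        f"{SOLR_BASE}/update?commit=true",
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document; field names depend on the core's schema.
index_request = build_index_request(
    [{"id": "1", "title": "Introduction to Apache Solr"}]
)
print(index_request.get_method(), index_request.full_url)
```

Passing `index_request` to `urllib.request.urlopen` would send it to a running Solr instance; that step is left out so the sketch runs without a server.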
  • 20. Solr: History. In 2004, Solr was created by Yonik Seeley at CNET Networks as an in-house project to add search capability for the company website. In January 2006, CNET Networks decided to openly publish the source code by donating it to the Apache Software Foundation under the Lucene top-level project. In September 2008, Solr 1.3 was released with many enhancements, including distributed search capabilities and performance enhancements. In October 2012, Solr version 4.0 was released, including the new SolrCloud feature.
  • 21. Solr: Key Features. Advanced full-text search capabilities; optimized for high-volume web traffic; standards-based open interfaces (XML, JSON and HTTP); comprehensive HTML administration interfaces; server statistics exposed over JMX for monitoring; near real-time indexing; adaptable with XML configuration; linearly scalable, with auto index replication; extensible plugin architecture.
  • 22. Solr: Architecture
  • 23. Solr: Admin UI
  • 24. Solr: Schema Hierarchy. Solr Instance → Core/Index → Documents → Fields; indexing and querying are driven by Schema.xml.
  • 25. Solr: Core. A Solr Core is also referred to as just a "Core". It is a running instance of a Lucene index along with all the Solr configuration (SolrConfigXml, SchemaXml, etc.) required to use it. A single Solr application can contain 0 or more cores. Cores run largely in isolation but can communicate with each other if necessary via the CoreContainer. Solr initially supported only one index, and the SolrCore class was a singleton for coordinating the low-level functionality at the "core" of Solr.
  • 26. Solr: Documents & Fields. Solr's basic unit of information is a document, which is a set of data that describes something. Documents are composed of fields, which are more specific pieces of information. Fields can contain different kinds of data; a name field, for example, is text (character data). The field type tells Solr how to interpret the field and how it can be queried.
  • 27. Solr: Indexing Data. A Solr index can accept data from many different sources, including XML files, comma-separated value (CSV) files, data extracted from tables in a database, and files in common file formats such as Microsoft Word or PDF. The most common ways of loading data into a Solr index are: uploading XML files by sending HTTP requests to the Solr server; using index handlers to import from databases; using the Solr Cell framework; and writing a custom Java application to ingest data through Solr's Java client.
  • 28. Solr: Analysis. There are three main concepts in analysis: analyzers, tokenizers, and filters. Analyzers are used both at index time, when a document is indexed, and at query time. The same analysis process need not be used for both operations. An analyzer examines the text of fields and generates a token stream. Analyzers may be a single class, or they may be composed of a series of tokenizer and filter classes. Tokenizers break field data into lexical units, or tokens. Filters examine a stream of tokens and keep them, transform or discard them, or create new ones.
  • 29. Solr: solrconfig.xml. Lib directives indicate where Solr can find JAR files for extensions. Other settings: register event handlers for searcher events (for example, queries to execute to warm new searchers); activate version-dependent features in Lucene; index management settings; enable JMX instrumentation of Solr MBeans; update handler for indexing documents; cache-management settings.
  • 30. Solr: Search Process. Flow: Request Handler → Query Parser → Index → Response Writer. qt: selects a RequestHandler for a query using /select (by default, the DisMaxRequestHandler is used). defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler). qf: selects which fields to query in the index (by default, all fields are required). wt: selects a response writer for formatting the query response. fq: filters the query by applying an additional query to the initial query's results, and caches the results. rows: specifies the number of rows to be displayed at one time. start: specifies an offset (by default 0) into the query results where the returned response should begin.
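The request parameters on this slide can be assembled into a query string as sketched below, in standard-library Python. The field names (`title`, `description`, `category`), the `edismax` parser choice and the helper `build_search_params` are illustrative assumptions; only the parameter names come from the slide.

```python
from urllib.parse import parse_qs, urlencode


def build_search_params(text, page=0, page_size=10):
    """Encode the slide's parameters; start and rows together give pagination."""
    return urlencode({
        "q": text,                    # the user's query text
        "defType": "edismax",         # query parser (assumed choice)
        "qf": "title description",    # fields to search (hypothetical names)
        "fq": "category:training",    # filter query (hypothetical field)
        "start": page * page_size,    # offset of the first returned result
        "rows": page_size,            # number of results per page
        "wt": "json",                 # response writer
    })


params = build_search_params("apache solr", page=2)
print(params)
print(parse_qs(params)["start"])  # offset for the third page of ten results
```

Appending this string to a core's `/select` URL and issuing an HTTP GET would run the search. Computing `start` from the page number keeps the paging arithmetic in one place.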
  • 31. Solr Features: Faceting; Highlighting; Spell Checking; Query Re-ranking; Transforming; Suggesters; More Like This; Pagination; Grouping & Clustering; Spatial Search; Components; Real time (Get & Update); LABS.
  • 32. Configuring Solr Instances / Cores. Solr configuration files: solrconfig.xml, solr.xml, core.properties, schema.xml.
  • 33. SolrCloud Introduction. Apache Solr includes the ability to set up a cluster of Solr servers that combines fault tolerance and high availability; this capability is called SolrCloud. SolrCloud provides flexible distributed search and indexing, without a master node to allocate nodes, shards and replicas. Solr uses ZooKeeper to manage these locations, depending on configuration files and schemas. Documents can be sent to any server and ZooKeeper will figure it out.
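One way to picture how a document sent to any server can still land on a predictable shard is hash-based routing on the document id. The sketch below is a deliberately simplified stand-in: it uses CRC32 modulo the shard count, which is an assumption made for illustration and not SolrCloud's actual router, and `pick_shard` is a hypothetical helper.

```python
import zlib


def pick_shard(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number; the same id always maps to the same shard."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


NUM_SHARDS = 3  # assumed cluster layout
shards = {shard: [] for shard in range(NUM_SHARDS)}
for doc_id in ("doc-1", "doc-2", "doc-3", "doc-4"):
    shards[pick_shard(doc_id, NUM_SHARDS)].append(doc_id)
print(shards)
```

Because the shard is a pure function of the id, every node computes the same answer without asking a master node, which matches the slide's point about not needing one.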
  • 34. Features: Horizontal Scaling (for Sharding & Replication); Elastic Scaling; High Availability; Distributed Indexing; Distributed Searching; Central Configuration for Entire Cluster; Automatic Load Balancing; Automatic Failover for Queries; ZooKeeper Integration for Coordination & Configurations.
  • 35. Architecture
  • 36. Job trends for Apache Solr
  • 37. Demo
  • 38. Disclaimer. Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr.
  • 39. Course Topics. Module 1: Introduction to Apache Lucene; Module 2: Exploring Lucene; Module 3: Introduction to Apache Solr; Module 4: Solr Indexing; Module 5: Solr Searching; Module 6: Solr Extended Features; Module 7: Solr Cloud & Administration; Module 8: Final Project.
  • 40. References: http://www.indeed.com/jobtrends; Office.com Clip Art.