This talk, "An introduction to RediSearch", explains how and when to use RediSearch in different scenarios.
YouTube: https://www.youtube.com/watch?v=RlY-tprKzxg
Disclaimer: No theories, no stories. Each part of this presentation is made with ❤️ and is based on personal experience.
Agenda
1. Introduction to Redis Modules.
2. Deep dive into RediSearch.
3. Where can RediSearch be used? (use cases).
4. Installation & configuration.
5. Hands-on session on RediSearch.
6. Garbage collection.
7. Q & A session.
Problem Statement
Why do we need search?
github : vikram-sahu twitter : vikramsahu_
History behind Search? UX got better!
Inspiration while building a search engine?
Why is it the best?
- There are a number of reasons
- Algorithms
- Minimalist design
- Best autocomplete suggestions so far.
Things that matter the most
- Type of data to be used in search.
- Common requirements for any application
- UX
- Key Pointers
- Add value to the user.
- Easy to access.
- Speed matters.
How fast is fast?
0 - 5ms >> Too fast
5ms - 500ms >> fast
500ms - 1s >> not fast but not that slow
1s - 2s >> slow
2s - 3s >> your user may leave
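The thresholds above can be turned into a small helper for checking measured query latency. This is a minimal Python sketch. The bucket boundaries come from the slide. Two things are assumptions: each range includes its lower bound and excludes its upper bound, and the helper names classify_latency and timed are hypothetical.

```python
import time

# (upper bound in ms, label) pairs taken from the "How fast is fast?" slide.
# Assumption: each range includes its lower bound and excludes its upper bound.
LATENCY_BUCKETS = [
    (5, "too fast"),
    (500, "fast"),
    (1000, "not fast but not that slow"),
    (2000, "slow"),
    (3000, "your user may leave"),
]


def classify_latency(ms: float) -> str:
    """Return the slide's label for a latency given in milliseconds."""
    for upper, label in LATENCY_BUCKETS:
        if ms < upper:
            return label
    return "beyond the slide's range"  # 3 s or more is not covered by the slide


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

For example, classify_latency(120) returns "fast". Wrapping a search call in timed gives the number to classify.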
What are Modules?
- Add-ons / dynamic libraries.
- Every new module brings new commands and data types.
- Modules are written in systems programming languages (C, C++, Rust, Go), so they generally perform well.
- Modules can be loaded without major changes to a running Redis server.
- Recommended: always configure modules in redis.conf.
How do modules help us?
- Redis has implemented its own algorithms and data structures to build these modules.
- Less memory usage.
- Less CPU time.
- They introduce new data structures and functionality.
Redis Modules
- neural-redis
- RediSearch
- RedisJSON
- rediSQL
- redis-cell
- RedisGraph
- RedisML
- RedisTimeSeries
- RedisBloom
- Cthulhu
- redis-cuckoofilter
- Many more at https://redis.io/modules
What is RediSearch?
- A search engine built on top of Redis.
- Wide range of client libraries and community support.
- Features
- Fast, since it is written in C.
- Full-text search
- Secondary indexing
- Autocomplete
- Stores documents as hashes
- Scales up to billions of documents
- Non-blocking updates and inserts.
- Optimized data structures.
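To make "full-text search" concrete, here is a toy inverted index in plain Python. Each term maps to the set of documents that contain it, and a multi-term query intersects those sets. This illustrates the general technique only and is not RediSearch's C data structures. The class name TinyIndex, the tokenizer regex and the second document "client3" are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase alphanumeric runs; punctuation separates terms.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list:
    return TOKEN_RE.findall(text.lower())


class TinyIndex:
    """Toy inverted index: term -> set of document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id: str, fields: dict) -> None:
        # Index every term of every field under this document ID.
        for text in fields.values():
            for term in tokenize(text):
                self.postings[term].add(doc_id)

    def search(self, query: str) -> list:
        # A multi-term query matches documents containing all terms (intersection).
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


idx = TinyIndex()
idx.add("client2", {"subj": "Welcome to RedisMeetup-2", "body": "lorem ipsum"})
idx.add("client3", {"subj": "Hello", "body": "lorem dolor"})  # hypothetical document
print(idx.search("lorem"))        # ['client2', 'client3']
print(idx.search("lorem ipsum"))  # ['client2']
```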
A few stats and facts
Benchmark : 58-60% faster than other NoSQL databases.
Upload time : 50K indices in 201 sec.
Runs on : DRAM and persistent memory.
Query medium : RESP (REdis Serialization Protocol).
Autocomplete : uses a radix tree (highly optimized, 50% compression rate).
Note : the data above is taken from the official Redis page.
What are a trie and a radix tree?
Insert : O(k), Find : O(k), where k is the length of the key (independent of the number of keys stored)
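A trie stores one character per edge. A radix tree compresses chains of single-child nodes so that each edge holds a substring. The Python sketch below is a minimal radix tree with two operations:
- insertion, which splits an edge when a new key diverges partway along it;
- prefix lookup, the operation autocomplete relies on.

Both walk at most the length of the key. This shows the technique under our own naming. It is not RediSearch's C implementation, which also has to track suggestion scores and payloads.

```python
from __future__ import annotations


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


class Node:
    def __init__(self) -> None:
        self.children = {}     # edge label (a substring) -> child Node
        self.terminal = False  # True if a stored key ends at this node


class RadixTree:
    """Minimal radix tree (compressed trie) supporting insert and prefix lookup."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, key: str) -> None:
        node = self.root
        while key:
            for label, child in list(node.children.items()):
                shared = common_prefix_len(label, key)
                if shared == 0:
                    continue
                if shared < len(label):
                    # Split the edge "label" into label[:shared] -> label[shared:].
                    middle = Node()
                    middle.children[label[shared:]] = child
                    del node.children[label]
                    node.children[label[:shared]] = middle
                    child = middle
                node, key = child, key[shared:]
                break
            else:
                # No edge starts with key's first character: add a new leaf edge.
                leaf = Node()
                leaf.terminal = True
                node.children[key] = leaf
                return
        node.terminal = True  # the key ends exactly at an existing node

    def keys_with_prefix(self, prefix: str) -> list[str]:
        node, path, rest = self.root, "", prefix
        while rest:
            for label, child in node.children.items():
                if rest.startswith(label):  # consume the whole edge
                    node, path, rest = child, path + label, rest[len(label):]
                    break
                if label.startswith(rest):  # prefix ends inside this edge
                    node, path, rest = child, path + label, ""
                    break
            else:
                return []  # no edge matches: no stored key has this prefix
        found: list[str] = []
        self._collect(node, path, found)
        return sorted(found)

    def _collect(self, node: Node, path: str, out: list[str]) -> None:
        if node.terminal:
            out.append(path)
        for label, child in node.children.items():
            self._collect(child, path + label, out)


tree = RadixTree()
for word in ["hello", "help", "hell", "world"]:
    tree.insert(word)
print(tree.keys_with_prefix("hel"))  # ['hell', 'hello', 'help']
```

After these inserts the root has only two edges, "hel" and "world". That sharing of common prefixes is where the compression comes from.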
Installing Redis
- wget http://download.redis.io/redis-stable.tar.gz
- tar xvzf redis-stable.tar.gz
- cd redis-stable
- make
OR
- yum install -y redis (CentOS)
- apt-get install -y redis-server (Ubuntu)
- dnf install -y redis (Fedora)
Starting service :
- sudo systemctl enable redis
- sudo systemctl start redis && systemctl status redis
Installing RediSearch
- git clone https://github.com/RedisLabsModules/RediSearch.git
- cd RediSearch
- make build
OR
- make all
- cd src
After compilation, check that the module file exists:
ls -l ./redisearch.so
Configuration
- In redis.conf (/etc/redis.conf)
loadmodule /path/if/any/to/redisearch.so OPTX OPTY
- In redis-cli
127.0.0.1:6379> MODULE LOAD redisearch.so OPTX OPTY
- In command line
redis-server --loadmodule ./redisearch.so OPTX OPTY
Simple Query
Creating an index
FT.CREATE pepi1 SCHEMA to TEXT WEIGHT 5.0 fromadd TEXT subj TEXT body TEXT
> OK
TYPE idx:pepi1
> ft_index0
Adding a document
FT.ADD pepi1 client2 1.0 FIELDS to "vikram2@pepipost.com" fromadd "inf2o@google.com" subj "Welcome to RedisMeetup-2" body "lorem ipsum"
Searching within documents
FT.SEARCH pepi1 "lorem"
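The same three commands can be sent from an application. The sketch below makes three assumptions:
- a client such as redis-py; any object with an execute_command(*args) method works;
- a RediSearch version that still accepts FT.ADD, as used on the slide;
- the default raw FT.SEARCH reply shape: the total count followed by alternating document IDs and [field, value, ...] lists.

The helper names are hypothetical.

```python
def _text(value):
    """redis-py returns bytes unless decode_responses=True; normalize to str."""
    return value.decode() if isinstance(value, bytes) else value


def parse_search_reply(reply):
    """Convert [total, id1, [f1, v1, ...], id2, [...], ...] to (total, {id: {field: value}})."""
    total, rest = reply[0], reply[1:]
    docs = {}
    for i in range(0, len(rest), 2):
        doc_id, flat = rest[i], rest[i + 1]
        docs[_text(doc_id)] = {
            _text(flat[j]): _text(flat[j + 1]) for j in range(0, len(flat), 2)
        }
    return total, docs


def simple_query_demo(client):
    """Replay the slide's commands through any client with execute_command()."""
    client.execute_command(
        "FT.CREATE", "pepi1", "SCHEMA",
        "to", "TEXT", "WEIGHT", "5.0",
        "fromadd", "TEXT", "subj", "TEXT", "body", "TEXT",
    )
    client.execute_command(
        "FT.ADD", "pepi1", "client2", "1.0", "FIELDS",
        "to", "vikram2@pepipost.com",
        "fromadd", "inf2o@google.com",
        "subj", "Welcome to RedisMeetup-2",
        "body", "lorem ipsum",
    )
    reply = client.execute_command("FT.SEARCH", "pepi1", "lorem")
    return parse_search_reply(reply)
```

With redis-py installed, you would pass redis.Redis(host="localhost", port=6379) as the client. Running the demo a second time fails on FT.CREATE because the index already exists.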
Query Processing
Exact match : "hello world"
OR, expressed as (|) : hell|hey|hello
NOT, expressed as (-) : -foo or -@title:(foo|bar)
Prefix match (*) : hell*
Querying a specific field : @fieldname:hello or @fieldname:(hello geeks)
Numeric range : @field:[{min} {max}]
Geo radius : @field:[{lon} {lat} {radius} {m|km|mi|ft}]
Tag filter : @field:{tag | tag | ...}
Optional search : foo ~bar
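When queries are assembled in application code, small helpers keep the syntax above consistent. This is a hypothetical set of Python helpers that only build query strings; each function's output matches one row above. They do not escape special characters in user input, which real code should do.

```python
def phrase(text: str) -> str:
    """Exact match: "hello world"."""
    return f'"{text}"'


def any_of(*terms: str) -> str:
    """OR (union): hell|hey|hello."""
    return "|".join(terms)


def group(expr: str) -> str:
    """Parenthesize a sub-expression, e.g. before restricting it to a field."""
    return f"({expr})"


def negate(expr: str) -> str:
    """NOT: -foo."""
    return f"-{expr}"


def prefix(stem: str) -> str:
    """Prefix match: hell*."""
    return f"{stem}*"


def in_field(name: str, expr: str) -> str:
    """Restrict an expression to one field: @fieldname:hello."""
    return f"@{name}:{expr}"


def numeric_range(name: str, low, high) -> str:
    return f"@{name}:[{low} {high}]"


def geo_radius(name: str, lon: float, lat: float, radius: float, unit: str = "km") -> str:
    if unit not in ("m", "km", "mi", "ft"):
        raise ValueError(f"unsupported unit: {unit}")
    return f"@{name}:[{lon} {lat} {radius} {unit}]"


def tag_filter(name: str, *tags: str) -> str:
    joined = " | ".join(tags)
    return "@" + name + ":{" + joined + "}"


def optional(term: str) -> str:
    """Optional term: ranks matching documents higher but is not required."""
    return f"~{term}"
```

For example, negate(in_field("title", group(any_of("foo", "bar")))) produces -@title:(foo|bar).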
How do we create an index?
FT.CREATE {index}
[MAXTEXTFIELDS] [TEMPORARY {seconds}] [NOOFFSETS] [NOHL] [NOFIELDS] [NOFREQS]
[STOPWORDS {num} {stopword} ...]
SCHEMA {field} [TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}] | NUMERIC | GEO | TAG [SEPARATOR {sep}] ] [SORTABLE] [NOINDEX] ...
Complexity : O(1)
Example:
FT.CREATE idx SCHEMA name TEXT SORTABLE age NUMERIC SORTABLE myTag TAG SORTABLE
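The FT.CREATE argument list can also be generated from a schema description, which is convenient when passing it to a client's execute_command. This sketch covers the field types and a subset of the options listed above; MAXTEXTFIELDS, TEMPORARY and PHONETIC are left out for brevity. The Field dataclass and the ft_create_args function are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

FIELD_TYPES = ("TEXT", "NUMERIC", "GEO", "TAG")


@dataclass
class Field:
    name: str
    kind: str                        # one of FIELD_TYPES
    weight: Optional[float] = None   # TEXT only
    nostem: bool = False             # TEXT only
    separator: Optional[str] = None  # TAG only
    sortable: bool = False
    noindex: bool = False

    def tokens(self) -> list[str]:
        if self.kind not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {self.kind}")
        out = [self.name, self.kind]
        if self.kind == "TEXT":
            if self.nostem:
                out.append("NOSTEM")
            if self.weight is not None:
                out += ["WEIGHT", str(self.weight)]
        if self.kind == "TAG" and self.separator is not None:
            out += ["SEPARATOR", self.separator]
        if self.sortable:
            out.append("SORTABLE")
        if self.noindex:
            out.append("NOINDEX")
        return out


def ft_create_args(index: str, fields: list[Field], *, nooffsets: bool = False,
                   nohl: bool = False, nofields: bool = False, nofreqs: bool = False,
                   stopwords: Optional[list[str]] = None) -> list[str]:
    """Build the FT.CREATE argument list in the order of the syntax above."""
    args = ["FT.CREATE", index]
    for flag, enabled in (("NOOFFSETS", nooffsets), ("NOHL", nohl),
                          ("NOFIELDS", nofields), ("NOFREQS", nofreqs)):
        if enabled:
            args.append(flag)
    if stopwords is not None:
        args += ["STOPWORDS", str(len(stopwords)), *stopwords]
    args.append("SCHEMA")
    for field in fields:
        args += field.tokens()
    return args
```

With the three SORTABLE fields from the example, the function returns exactly the tokens of the FT.CREATE idx command shown above. You can send the result with client.execute_command(*args).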
Adding AutoComplete
FT.SUGADD {key} {string} {score} [INCR] [PAYLOAD {payload}]
Example:
FT.SUGADD ac "hello world" 1
FT.SUGGET {key} {prefix} [FUZZY] [WITHSCORES] [WITHPAYLOADS] [MAX num]
Example:
FT.SUGGET ac hell FUZZY MAX 3 WITHSCORES
FT.SUGDEL {key} {string} #deletes the string from the suggestion dictionary
FT.SUGLEN {key} #gets the number of entries in the suggestion dictionary
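To show what these four commands do, here is an in-memory Python model of a suggestion dictionary. It follows the semantics of the RediSearch docs as we understand them:
- INCR adds to an existing score instead of replacing it;
- SUGADD returns the dictionary size;
- SUGDEL returns 1 or 0;
- MAX defaults to 5.

Several parts are simplifications of our own: ranking purely by score, case-sensitive matching, a linear prefix scan, and no FUZZY option. The module itself uses a radix tree, and the class name is hypothetical.

```python
class SuggestionDict:
    """In-memory model of FT.SUGADD / FT.SUGGET / FT.SUGDEL / FT.SUGLEN behaviour."""

    def __init__(self) -> None:
        self._scores = {}  # suggestion string -> score

    def sugadd(self, string: str, score: float, incr: bool = False) -> int:
        # INCR adds to an existing score; otherwise the score is replaced.
        if incr and string in self._scores:
            self._scores[string] += score
        else:
            self._scores[string] = score
        return len(self._scores)  # current size of the dictionary

    def sugget(self, prefix: str, max_results: int = 5, withscores: bool = False):
        # Linear scan for clarity; ties are broken alphabetically for determinism.
        matches = [(s, sc) for s, sc in self._scores.items() if s.startswith(prefix)]
        matches.sort(key=lambda item: (-item[1], item[0]))
        top = matches[:max_results]
        return top if withscores else [s for s, _ in top]

    def sugdel(self, string: str) -> int:
        if string in self._scores:
            del self._scores[string]
            return 1
        return 0

    def suglen(self) -> int:
        return len(self._scores)
```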
Garbage Collection
- Why do we need GC?
- To avoid unnecessary memory usage.
- To keep queries fast over time.
- GC for a single-term index
- Single-term index : an array of blocks, each holding an encoded list of records.
- Any specific algorithm?
- GC & concurrency
- Multi-threaded concurrent query execution model
- How did POV and POC conflict?
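Here is a rough illustration of why GC matters for a term index stored as blocks of encoded records. The sketch delta-encodes sorted document IDs into fixed-size blocks. Garbage collection then:
1. decodes each block;
2. drops deleted IDs;
3. re-encodes the survivors;
4. discards blocks that end up empty.

It is a single-threaded toy under our own assumptions: plain integer deltas and hypothetical function names. RediSearch's real GC also has to coordinate with concurrently running queries, which this code does not attempt.

```python
from __future__ import annotations


def encode(doc_ids: list[int]) -> list[int]:
    """Delta-encode sorted doc IDs: the first value is absolute, the rest are gaps."""
    out, prev = [], 0
    for doc_id in doc_ids:
        out.append(doc_id - prev)
        prev = doc_id
    return out


def decode(deltas: list[int]) -> list[int]:
    """Invert encode() by taking running sums."""
    out, current = [], 0
    for delta in deltas:
        current += delta
        out.append(current)
    return out


def build_term_index(doc_ids: list[int], block_size: int) -> list[list[int]]:
    """Split sorted doc IDs into fixed-size blocks, each delta-encoded on its own."""
    ids = sorted(doc_ids)
    return [encode(ids[i:i + block_size]) for i in range(0, len(ids), block_size)]


def collect_garbage(blocks: list[list[int]], deleted: set[int]) -> tuple[list[list[int]], int]:
    """Drop deleted doc IDs, re-encode surviving records and discard empty blocks.

    Returns the new block list and the number of records removed.
    """
    new_blocks, removed = [], 0
    for block in blocks:
        ids = decode(block)
        live = [doc_id for doc_id in ids if doc_id not in deleted]
        removed += len(ids) - len(live)
        if live:
            new_blocks.append(encode(live))
    return new_blocks, removed


blocks = build_term_index([1, 2, 5, 9, 10, 20], block_size=2)  # [[1, 1], [5, 4], [10, 10]]
blocks, removed = collect_garbage(blocks, deleted={2, 5, 9})   # [[1], [10, 10]], 3
```

Removing a record changes the gaps around it. That is why the sketch decodes and re-encodes each block instead of deleting entries in place.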