RediSearch Mumbai Meetup 2020

{ "RediSearch" : " " }
github : vikram-sahu twitter : vikramsahu_
< Vikram Sahu
Developer Evangelist
Pepipost />

Any Newbies?
What is Redis?
- NoSQL database
- Key-value database
- In-memory database
Key Features:
- Caching.
- Message broker.
- Queues.
- Counters.

Disclaimer: No theories, No stories. Each part of this presentation is made with ❤️ and is based on personal experience.
Agenda
1. Introduction to Redis Modules.
2. Deep dive into RediSearch.
3. Where can RediSearch be used? (use cases).
4. Installation & configuration.
5. Hands-on session on RediSearch.
6. Garbage collection.
7. Q & A session.

Problem Statement
Why do we need search?
History behind Search? UX got better!

Inspiration while building a search engine?
Why is it the best?
- There are a number of reasons
- Algorithms
- Minimalist design
- Best autocomplete suggestions so far.

Things that matter the most
- Type of data to be used in search.
- Common requirement for any application
- UX
- Key Pointers
- Add value to the user.
- Easy to access.
- Speed matters.

How fast is fast?
0 - 5ms >> Too fast
5ms - 500ms >> fast
500ms - 1s >> not fast but not that slow
1s - 2s >> slow
2s - 3s >> the user will possibly leave
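
The scale above can be written down as a small lookup. This is only a sketch: the function name is made up for illustration, the boundaries are treated as half-open (an assumption, the slide does not say which side a boundary belongs to), and anything above 2s is folded into the last bucket.

```python
def classify_latency(ms: float) -> str:
    """Map a response time in milliseconds to the rough labels from the slide."""
    if ms < 0:
        raise ValueError("latency cannot be negative")
    if ms < 5:
        return "too fast"
    if ms < 500:
        return "fast"
    if ms < 1000:
        return "not fast but not that slow"
    if ms < 2000:
        return "slow"
    # Assumption: the slide stops at 3s, so everything from 2s up lands here.
    return "user may leave"


if __name__ == "__main__":
    for sample in (2, 120, 750, 1500, 2500):
        print(sample, "ms ->", classify_latency(sample))
```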

What are Modules?
- Add-ons / Dynamic libraries.
- Every new library brings new commands & data types.
- These libraries are written in systems programming languages (C, C++, Rust, Go), so performance is generally high.
- Modules are simple to load without making many changes to a running Redis server.
- Recommended : Always configure modules in redis.conf

How do modules help us?
- Redis has implemented its own algorithms & data structures to build these modules.
- Less memory usage.
- Less CPU time.
- Introduce new data structures and functionality.

Redis Modules.
- neural-redis
- RediSearch
- RedisJSON
- rediSQL
- redis-cell
- RedisGraph
- RedisML
- RedisTimeSeries
- RedisBloom
- Cthulhu
- redis-cuckoofilter
- Many more; you can find them at https://redis.io/modules

What is RediSearch ?
- A search engine built on top of Redis.
- Wide range of client libraries and community support.
- Features
- Fast, since it is written in C.
- Full text search
- Secondary indexing
- Autocomplete
- Stores documents as hashes
- Scales up to billions of documents
- Non blocking updates and inserts.
- Optimized data structures.

Note : The data above is taken from the official Redis page.
Few Stats and Facts
Benchmark : 58-60% faster than other NoSQL databases.
Upload time : 50K indices in 201 sec.
Runs on : DRAM and persistent memory.
Medium of Query : RESP (Redis Serialization Protocol).
Autocomplete : It uses a radix tree (highly optimized, 50% compression rate).

What is Trie and Radix tree?
Insert : O(n) && Find : O(n), where n is the length of the key
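
A minimal trie sketch, to make the idea concrete. This is a plain character trie, not RediSearch's actual structure: a radix tree additionally merges chains of single-child nodes into one edge to save memory. Class and method names are illustrative. Each operation follows one edge per character of the key.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # one outgoing edge per character
        self.is_word = False  # True if a stored word ends at this node


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """Walk or create one node per character, then mark the end."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def find(self, word: str) -> bool:
        """Return True only if the exact word was inserted."""
        node = self._walk(word)
        return node is not None and node.is_word

    def complete(self, prefix: str) -> list:
        """Return every stored word starting with prefix, sorted."""
        start = self._walk(prefix)
        if start is None:
            return []
        results = []
        stack = [(start, prefix)]
        while stack:
            node, text = stack.pop()
            if node.is_word:
                results.append(text)
            for ch, child in node.children.items():
                stack.append((child, text + ch))
        return sorted(results)

    def _walk(self, text: str):
        """Follow text from the root; None if the path does not exist."""
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node


if __name__ == "__main__":
    trie = Trie()
    for word in ("hello", "help", "hey"):
        trie.insert(word)
    print(trie.find("help"), trie.complete("hel"))
```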

Installing Redis
- wget http://download.redis.io/redis-stable.tar.gz
- tar xvzf redis-stable.tar.gz
- cd redis-stable
- make
OR
- yum install -y redis (CentOS)
- apt-get install -y redis-server (Ubuntu)
- dnf install -y redis (Fedora)
Starting service :
- sudo systemctl enable redis
- sudo systemctl start redis && systemctl status redis

Installing RediSearch dependencies
CentOS
- sudo yum group install "Development Tools"
- sudo yum --setopt=group_package_types=mandatory,default,optional groupinstall "Development Tools"
- gcc --version
Ubuntu
- sudo apt install build-essential
- sudo apt-get install manpages-dev
- gcc --version
MacOS
- xcode-select --install

Installing RediSearch
- git clone https://github.com/RedisLabsModules/RediSearch.git
- cd RediSearch
- make build
OR
- make all
- cd src
Check for the redisearch.so file after compilation.
ls -l ./redisearch.so

Configuration
- In redis.conf (/etc/redis.conf)
loadmodule /path/if/any/to/redisearch.so OPTX OPTY
- In redis-cli
127.0.0.1:6379> MODULE LOAD /path/if/any/to/redisearch.so OPTX OPTY
- In command line
redis-server --loadmodule ./redisearch.so OPTX OPTY

Configuration Options
- TIMEOUT (default : 500)
- ON_TIMEOUT {policy} (default : RETURN)
- EXTLOAD {filename}
- MINPREFIX (default : 2)
- SAFEMODE (default : OFF) (deprecated)
- CONCURRENT_WRITE_MODE (default : disabled)
- MAXEXPANSIONS (default : 200)
- FRISOINI
- CURSOR_MAX_IDLE (default : 300000)
- GC_SCANSIZE (default : 200)
- GC_POLICY (default : FORK)
- NOGC
Simple Query
Creating Index
FT.CREATE pepi1 SCHEMA to TEXT WEIGHT 5.0 fromadd TEXT subj TEXT body TEXT
> OK
type idx:pepi1
ft_index0
Adding Document
FT.ADD pepi1 client2 1.0 FIELDS to "vikram2@pepipost.com" fromadd "inf2o@google.com" subj "Welcome to RedisMeetup-2" body "lorem ipsum"
Searching within document
FT.SEARCH pepi1 "lorem"
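
From application code, the same three commands are just argument lists. The helpers below are an illustrative sketch (their names are not part of any client library); they only build the arguments. Assuming the redis-py client and a server with RediSearch loaded, each list could be sent with redis.Redis().execute_command(*args).

```python
def ft_create(index, fields):
    """fields: list of (name, type, extra_args) tuples, e.g. ("to", "TEXT", ["WEIGHT", "5.0"])."""
    args = ["FT.CREATE", index, "SCHEMA"]
    for name, field_type, extra in fields:
        args += [name, field_type, *extra]
    return args


def ft_add(index, doc_id, score, doc):
    """doc: dict of field name -> value, flattened after FIELDS."""
    args = ["FT.ADD", index, doc_id, str(score), "FIELDS"]
    for field, value in doc.items():
        args += [field, value]
    return args


def ft_search(index, query):
    return ["FT.SEARCH", index, query]


if __name__ == "__main__":
    commands = [
        ft_create("pepi1", [("to", "TEXT", ["WEIGHT", "5.0"]),
                            ("fromadd", "TEXT", []),
                            ("subj", "TEXT", []),
                            ("body", "TEXT", [])]),
        ft_add("pepi1", "client2", 1.0, {"body": "lorem ipsum"}),
        ft_search("pepi1", "lorem"),
    ]
    for args in commands:
        # With redis-py (assumption): redis.Redis().execute_command(*args)
        print(args)
```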

Query Processing
Exact match
"hello world"
OR expressed as (|)
hell|hey|hello
NOT expressed as (-)
-foo or -@title:(foo|bar)
Prefix match (*)
hell*
Querying specific field
@fieldname:hello,Geeks
Numeric range
@field:[{min} {max}]
Geo Radius
@field:[{lon} {lat} {radius} {m|km|mi|ft}]
Tag filter
@field:{tag | tag | ...}
Optional search
foo ~bar
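
The operators above are plain string syntax, so a query can be composed from small pieces. A sketch with made-up helper names; it does no escaping of special characters, which real input would need.

```python
def exact(phrase):
    return '"' + phrase + '"'


def any_of(*terms):
    return "|".join(terms)


def negate(expr):
    return "-" + expr


def prefix(stem):
    return stem + "*"


def in_field(field, expr):
    return "@" + field + ":" + expr


def numeric_range(field, low, high):
    return "@{}:[{} {}]".format(field, low, high)


def geo_radius(field, lon, lat, radius, unit="km"):
    return "@{}:[{} {} {} {}]".format(field, lon, lat, radius, unit)


def tags(field, *values):
    return "@" + field + ":{" + " | ".join(values) + "}"


def optional(expr):
    return "~" + expr


if __name__ == "__main__":
    # Terms separated by a space are intersected (AND).
    query = " ".join([any_of("hell", "hey", "hello"), negate("foo"), optional("bar")])
    print(query)
```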

How did we create an index?
FT.CREATE {index}
[MAXTEXTFIELDS] [TEMPORARY {seconds}] [NOOFFSETS] [NOHL]
[NOFIELDS] [NOFREQS]
[STOPWORDS {num} {stopword} ...]
SCHEMA {field} [TEXT [NOSTEM] [WEIGHT {weight}] [PHONETIC {matcher}] | NUMERIC | GEO | TAG [SEPARATOR {sep}] ] [SORTABLE][NOINDEX] ...
Complexity : O(1)
Example:
FT.CREATE idx SCHEMA name TEXT SORTABLE age NUMERIC SORTABLE myTag TAG SORTABLE

Adding Document
FT.ADD {index} {docId} {score}
[NOSAVE]
[REPLACE [PARTIAL] [NOCREATE]]
[LANGUAGE {language}]
[PAYLOAD {payload}]
[IF {condition}]
FIELDS {field} {value} [{field} {value}...]
Complexity : O(n)
Example:
FT.ADD pepi1 client2 1.0 FIELDS to "vikram2@pepipost.com" fromadd "inf2o@google.com" subj "Welcome to RedisMeetup-2" body "lorem ipsum"

Alter Index
FT.ALTER {index} SCHEMA ADD {field} {options}
Complexity : O(1)
Example:
FT.ALTER idx SCHEMA ADD id2 NUMERIC SORTABLE

Adding Aliases
FT.ALIASADD {name} {index}
FT.ALIASUPDATE {name} {index}
FT.ALIASDEL {name}
FT.ALTER {index} ALIAS DEL {alias}
Complexity : O(1)

Getting Info about Index
FT.INFO {index}
Complexity : O(1)
Example :
FT.INFO pepi1

Advanced Search
FT.SEARCH {index} {query} [NOCONTENT] [VERBATIM] [NOSTOPWORDS] [WITHSCORES]
[WITHPAYLOADS] [WITHSORTKEYS]
[FILTER {numeric_field} {min} {max}] ...
[GEOFILTER {geo_field} {lon} {lat} {radius} m|km|mi|ft]
[INKEYS {num} {key} ... ]
[INFIELDS {num} {field} ... ]
[RETURN {num} {field} ... ]
[SUMMARIZE [FIELDS {num} {field} ... ] [FRAGS {num}] [LEN {fragsize}] [SEPARATOR {separator}]]
[HIGHLIGHT [FIELDS {num} {field} ... ] [TAGS {open} {close}]]
[SLOP {slop}] [INORDER]
[LANGUAGE {language}]
[EXPANDER {expander}]
[SCORER {scorer}] [EXPLAINSCORE]
[PAYLOAD {payload}]
[SORTBY {field} [ASC|DESC]]
[LIMIT offset num]

Aggregate Query
FT.AGGREGATE {index_name}
{query_string}
[VERBATIM]
[LOAD {nargs} {property} ...]
[GROUPBY {nargs} {property} ...
REDUCE {func} {nargs} {arg} ... [AS {name:string}]
...
] ...
[SORTBY {nargs} {property} [ASC|DESC] ... [MAX {num}]]
[APPLY {expr} AS {alias}] ...
[LIMIT {offset} {num}] ...
[FILTER {expr}] ...
Example:
FT.AGGREGATE idx '@email:"vikram@"'
APPLY "@timestamp - (@timestamp % 86400)" AS day
GROUPBY 2 @day @country
REDUCE count 0 AS num_visits
SORTBY 4 @day ASC @country DESC
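
To see what this pipeline computes, here is the same logic in plain Python over a list of records. It is only an emulation for illustration, not how RediSearch executes it, and the sample records are invented. The APPLY step rounds a Unix timestamp down to midnight by subtracting the remainder of a division by 86400 (seconds per day).

```python
from collections import Counter

SECONDS_PER_DAY = 86400


def visits_per_day_and_country(records):
    """records: iterable of dicts with 'timestamp' (Unix seconds) and 'country'."""
    counts = Counter()
    for rec in records:
        # APPLY "@timestamp - (@timestamp % 86400)" AS day
        day = rec["timestamp"] - (rec["timestamp"] % SECONDS_PER_DAY)
        # GROUPBY 2 @day @country + REDUCE count 0 AS num_visits
        counts[(day, rec["country"])] += 1
    rows = [{"day": day, "country": country, "num_visits": n}
            for (day, country), n in counts.items()]
    # SORTBY 4 @day ASC @country DESC: sort by the secondary key first,
    # then rely on the stable sort for the primary key.
    rows.sort(key=lambda row: row["country"], reverse=True)
    rows.sort(key=lambda row: row["day"])
    return rows


if __name__ == "__main__":
    sample = [  # hypothetical data
        {"timestamp": 172810, "country": "IN"},
        {"timestamp": 173300, "country": "IN"},
        {"timestamp": 172900, "country": "US"},
        {"timestamp": 259201, "country": "IN"},
    ]
    for row in visits_per_day_and_country(sample):
        print(row)
```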

Explain Query
FT.EXPLAIN {index} {query}
Output:
127.0.0.1:6379> FT.EXPLAIN pepi1 "(foo bar)|(hello world) @date:[100 200]|@date:[500 +inf]"
INTERSECT {
  UNION {
    INTERSECT {
      foo
      bar
    }
    INTERSECT {
      hello
      world
    }
  }
  UNION {
    NUMERIC {100.000000 <= x <= 200.000000}
    NUMERIC {500.000000 <= x <= inf}
  }
}

Get Single Document
FT.GET {index} {DocId}
Example: FT.GET pepi1 client2
FT.MGET {index} {DocId} ...
Example: FT.MGET pepi1 client1 client2

Danger Zone
FT.DEL {index} {docId}
Example:
FT.DEL pepi1 client1
FT.DROP {index} [KEEPDOCS]
Example:
FT.DROP pepi1 KEEPDOCS

Adding AutoComplete
FT.SUGADD {key} {string} {score} [INCR] [PAYLOAD {payload}]
Example:
FT.SUGADD ac "hello world" 1
FT.SUGGET {key} {prefix} [FUZZY] [WITHSCORES] [WITHPAYLOADS] [MAX num]
Example:
FT.SUGGET ac hell FUZZY MAX 3 WITHSCORES
FT.SUGDEL {key} {string} #deletes the string from the suggestion dictionary
FT.SUGLEN {key} #gets the number of entries in the suggestion dictionary
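
A toy model of the four commands, to show how they relate. It keeps scores in a dict and ranks purely by score, which is a simplification: real RediSearch stores suggestions in a trie, has its own ranking, and supports FUZZY and payloads, none of which are modelled here. The class name is made up.

```python
class SuggestionDict:
    def __init__(self):
        self._scores = {}  # suggestion string -> score

    def add(self, string, score, incr=False):
        """Like FT.SUGADD: set the score, or add to it when incr is True.

        Returns the current number of suggestions.
        """
        if incr and string in self._scores:
            self._scores[string] += score
        else:
            self._scores[string] = score
        return len(self._scores)

    def get(self, prefix, max_results=5):
        """Like FT.SUGGET without FUZZY: best-scoring prefix matches first."""
        matches = [(s, sc) for s, sc in self._scores.items() if s.startswith(prefix)]
        matches.sort(key=lambda pair: (-pair[1], pair[0]))
        return [s for s, _ in matches[:max_results]]

    def delete(self, string):
        """Like FT.SUGDEL: True if the string was present and removed."""
        return self._scores.pop(string, None) is not None

    def __len__(self):
        """Like FT.SUGLEN."""
        return len(self._scores)


if __name__ == "__main__":
    ac = SuggestionDict()
    ac.add("hello world", 1)
    ac.add("hello there", 3)
    print(ac.get("hell", max_results=3), len(ac))
```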

Garbage Collection
- Why do we need GC?
- To avoid unnecessary memory usage.
- To keep queries fast over time.
- GC for single Term index
- Single Term index : array of blocks with an encoded list of records.
- Any specific Algorithm ?
- GC & concurrency
- Multi threaded concurrent query execution model
- How POV and POC got conflict?
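
A toy inverted index that shows why GC is needed. Deleting a document only marks it as deleted; its records stay in every term's list until a collection pass rewrites the lists. This is a conceptual sketch with invented names: it uses plain Python lists instead of encoded blocks and does not model the fork-based collector or concurrency.

```python
class ToyIndex:
    def __init__(self):
        self.postings = {}    # term -> list of doc ids containing the term
        self.deleted = set()  # doc ids marked as deleted

    def add(self, doc_id, text):
        for term in set(text.lower().split()):
            self.postings.setdefault(term, []).append(doc_id)

    def delete(self, doc_id):
        # Cheap delete: only mark the document, leave the term lists alone.
        self.deleted.add(doc_id)

    def search(self, term):
        # Every query has to skip the stale records until GC runs.
        return [d for d in self.postings.get(term.lower(), [])
                if d not in self.deleted]

    def stored_records(self):
        return sum(len(ids) for ids in self.postings.values())

    def collect_garbage(self):
        """Rewrite each term list without deleted docs; return records freed."""
        removed = 0
        for term in list(self.postings):
            live = [d for d in self.postings[term] if d not in self.deleted]
            removed += len(self.postings[term]) - len(live)
            if live:
                self.postings[term] = live
            else:
                del self.postings[term]
        self.deleted.clear()
        return removed


if __name__ == "__main__":
    idx = ToyIndex()
    idx.add("client1", "lorem ipsum")
    idx.add("client2", "lorem dolor")
    idx.delete("client1")
    print(idx.stored_records(), idx.collect_garbage(), idx.stored_records())
```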

Any Question?

Thank You
