Radix Tree, IDR APIs
and their test suite
Rehas Sachdeva & Sandhya Bankar
Overview
• What is a Radix tree?
• Applications of radix tree
• Kernel radix tree API
• Enhancing the test suite
What is a Radix tree?
Radix tree
Space optimized trie
• Stores a key-to-value mapping.
• Edges are labelled by a sequence of characters or bits.
• The root-to-leaf path holds the key and the leaf holds the value.
• Space optimized.
• Fast lookup.
https://en.wikipedia.org/wiki/Radix_tree
Radix tree applications
• General applications
– IP routing
• Hierarchical organization of IP addresses.
– Search
• Inverted indexes for text documents.
• Kernel-specific uses
– Page cache
• Check whether a page is present in the cache, tagged dirty, or under writeback.
– Resizable arrays
• Drivers, filesystems, interrupt controllers.
Kernel radix tree API
Node structure
Node Info: shift, offset,
count, parent pointer,
root pointer, tags etc.
Array of slots
• Each node contains 2^map_shift pointers in its slots array.
• In a leaf node, slots point to items; in an internal node, they point to the next, deeper node.
• A node's depth determines which chunk of the key's bits is used to index its slots.
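The slot indexing described above can be sketched in user space. This is a minimal illustration, not kernel code: MAP_SHIFT is an assumed value of 6 and slot_offset() is a hypothetical helper name.

```c
#include <assert.h>

/* Assumed value for illustration; the kernel fixes its map shift at build time. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/*
 * Hypothetical helper: return the slot offset used at a node whose
 * 'shift' field says how many key bits remain below it.
 */
static unsigned int slot_offset(unsigned long index, unsigned int shift)
{
	/* In this sketch node shifts are always multiples of MAP_SHIFT. */
	assert(shift % MAP_SHIFT == 0);
	return (unsigned int)((index >> shift) & MAP_MASK);
}
```

With a 6-bit map shift, index 4660 (0x1234) is resolved through slot 1 at shift 12, slot 8 at shift 6, and slot 52 in the leaf, since 1*4096 + 8*64 + 52 = 4660.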
#define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT)
...
struct radix_tree_node {
unsigned char shift; /* Bits remaining in each slot */
unsigned char offset; /* Slot offset in parent */
unsigned char count; /* Total entry count */
unsigned char exceptional; /* Exceptional entry count */
...
void __rcu *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
Initializing a radix tree
• #define RADIX_TREE(name, mask) \
	struct radix_tree_root name = RADIX_TREE_INIT(mask)
Example: RADIX_TREE(tree, GFP_KERNEL);
– initializes a radix tree with the given name.
– gfp_mask to tell the code how memory allocations are to be performed.
– GFP_ATOMIC for atomic insertions, GFP_KERNEL for kernel-internal
allocations and so on.
Inserting an entry
• A tree of height N can contain any index between 0 and (2^(map_shift*N))-1.
• If the new index to be inserted is larger than the current max index, insert new nodes
above the current top node to create a deeper tree.
• Failure cases: should a memory allocation fail (-ENOMEM) or an entry already exists
at the index (-EEXIST).
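The height bound above can be worked through with a small user-space sketch. MAP_SHIFT is an assumed value and both helper names are hypothetical; the kernel's own bookkeeping may differ.

```c
#include <assert.h>
#include <limits.h>

#define MAP_SHIFT 6	/* assumed value, only for this sketch */

/* Largest index a tree of the given height can hold: 2^(MAP_SHIFT*height) - 1. */
static unsigned long height_to_maxindex(unsigned int height)
{
	unsigned int bits = height * MAP_SHIFT;

	/* Avoid shifting by the full width of unsigned long. */
	if (bits >= sizeof(unsigned long) * CHAR_BIT)
		return ULONG_MAX;
	return (1UL << bits) - 1;
}

/* Smallest height (at least 1) whose range covers the given index. */
static unsigned int height_for_index(unsigned long index)
{
	unsigned int height = 1;

	while (index > height_to_maxindex(height))
		height++;
	return height;
}
```

For example, with a 6-bit map shift a tree of height 1 holds indices 0 to 63, so inserting index 64 needs a second level and inserting index 4096 needs a third.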
Consider the following tree as an example. Only 1 bit is used to index the slots at each node.
H is inserted; only the first 2 bits need to be considered to look it up uniquely.
I is inserted. New nodes are created because all 5 bits need to be considered.
• root: radix tree root
• index: index key
• order: key covers the 2^order indices around index
• item: item to insert
• static inline int radix_tree_insert(struct radix_tree_root *root,
unsigned long index, void *entry);
– For inserting an entry. Wrapper around __radix_tree_insert for 0 order entry.
• int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
unsigned order, void *item);
– For inserting an entry of arbitrary order.
Deleting an entry
• If deleting an element results in a top node with only one child at offset 0, replace the top
node with its only child, creating a shallower tree. Consider the following tree as example.
Deleting an entry
• root: radix tree root
• index: index key
• void *radix_tree_delete(struct radix_tree_root *root, unsigned long index);
• item: expected item
• void *radix_tree_delete_item(struct radix_tree_root *root, unsigned long index, void *item);
– Delete if the entry at index is expected item.
• iter: iterator state
• slot: pointer to slot
• void radix_tree_iter_delete(struct radix_tree_root *root,
struct radix_tree_iter *iter, void __rcu **slot);
– Delete the entry at this iterator position
Lookup
• root: radix tree root
• index: index key
• void *radix_tree_lookup(const struct radix_tree_root *root, unsigned long index);
– looks for key in the tree and returns the associated item (or NULL on failure).
• results: where the results of the lookup are placed
• first_index: start the lookup from this key
• max_items: place up to this many items at *results
• unsigned int radix_tree_gang_lookup(const struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
– perform multiple lookups.
• void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *root,
unsigned long index);
– lookup a slot at index.
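The insertion and lookup behaviour described in the preceding slides (growing the tree upward for a larger index, -ENOMEM and -EEXIST on failure, NULL for a missing key) can be sketched as a toy user-space tree. All toy_* names and the 2-bit TOY_SHIFT are assumptions for illustration; this is not the kernel implementation and it omits tags, RCU and multiorder entries.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define TOY_SHIFT 2			/* assumed: 4 slots per node */
#define TOY_SIZE  (1UL << TOY_SHIFT)
#define TOY_MASK  (TOY_SIZE - 1)

struct toy_node {
	unsigned int shift;		/* key bits remaining below this node */
	void *slots[TOY_SIZE];		/* child nodes, or items when shift == 0 */
};

struct toy_root {
	struct toy_node *top;		/* NULL for an empty tree */
};

static struct toy_node *toy_node_new(unsigned int shift)
{
	struct toy_node *n = calloc(1, sizeof(*n));

	if (n)
		n->shift = shift;
	return n;
}

/* Largest index reachable below a node with the given shift. */
static unsigned long toy_max_index(unsigned int shift)
{
	unsigned int bits = shift + TOY_SHIFT;

	if (bits >= sizeof(unsigned long) * CHAR_BIT)
		return ULONG_MAX;
	return (1UL << bits) - 1;
}

static int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
	struct toy_node *n;

	if (!root->top && !(root->top = toy_node_new(0)))
		return -ENOMEM;

	/* Grow upward: the old top becomes slot 0 of a new top node. */
	while (index > toy_max_index(root->top->shift)) {
		n = toy_node_new(root->top->shift + TOY_SHIFT);
		if (!n)
			return -ENOMEM;
		n->slots[0] = root->top;
		root->top = n;
	}

	/* Walk down, creating missing interior nodes on the way. */
	n = root->top;
	while (n->shift > 0) {
		unsigned int off = (index >> n->shift) & TOY_MASK;

		if (!n->slots[off]) {
			n->slots[off] = toy_node_new(n->shift - TOY_SHIFT);
			if (!n->slots[off])
				return -ENOMEM;
		}
		n = n->slots[off];
	}
	if (n->slots[index & TOY_MASK])
		return -EEXIST;
	n->slots[index & TOY_MASK] = item;
	return 0;
}

static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
	const struct toy_node *n = root->top;

	if (!n || index > toy_max_index(n->shift))
		return NULL;
	while (n && n->shift > 0)
		n = n->slots[(index >> n->shift) & TOY_MASK];
	return n ? n->slots[index & TOY_MASK] : NULL;
}
```

Inserting index 3 and then index 100 into an empty toy tree first creates a single leaf and then stacks three new top nodes above it, with the old leaf still reachable through slot 0 at every new level.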
Iteration
Tags
Multiorder
Radix Tree Test Suite
Test Suite
• Merged into Linux 4.6.
• Location: tools/testing/radix-tree.
• Regression tests, functional tests and performance tests.
• Short run or long run.
• Levels of verbose output.
Regression tests
Functional tests
Performance tests
Enhancements as part of Outreachy Project
• Adding different levels of verbosity to the output of the test suite.
• #define printv(verbosity_level, fmt, ...) \
	if(test_verbose >= verbosity_level) \
		printf(fmt, ##__VA_ARGS__)
– The idea extends to many parts of the kernel, for debugging, testing, etc.
• Config option in makefile to test for various values of map shift.
mapshift:
	@if ! grep -qw $(SHIFT) generated/map-shift.h; then \
		echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > \
			generated/map-shift.h; \
	fi
• Config option to build tests for 32 bit or 64 bit machine.
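A user-space sketch of the verbosity idea behind printv follows. It wraps the body in do { } while (0) so the macro is safe inside if/else, which is a variation on the slide's version rather than the test suite's code. The lines_printed counter and demo_printv() are hypothetical additions for illustration, and ##__VA_ARGS__ is a GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>

static int test_verbose;	/* a real test run would set this from the command line */
static int lines_printed;	/* hypothetical counter, only for this sketch */

/* Print only when the configured verbosity reaches the message's level. */
#define printv(verbosity_level, fmt, ...)			\
	do {							\
		if (test_verbose >= (verbosity_level)) {	\
			printf(fmt, ##__VA_ARGS__);		\
			lines_printed++;			\
		}						\
	} while (0)

/* Emit one message per level and report how many were actually printed. */
static int demo_printv(int verbosity)
{
	test_verbose = verbosity;
	lines_printed = 0;
	printv(1, "starting test\n");
	printv(2, "inserted %d items\n", 1000);
	printv(3, "checked index %lu\n", 7UL);
	return lines_printed;
}
```

Calling demo_printv(2) prints the first two messages and suppresses the third.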
• Automate generation of .gcov files to check their test coverage.
• Adding new functional tests.
– idr_get_next()
– ida_simple_get()
– ida_simple_remove()
– radix_tree_clear_tags()
• Adding new performance tests.
–For radix tree insertion, deletion, tagging, join and split.
• Functional test example
void radix_tree_clear_tags_test(void)
{
	...
	/* Single entry: set a tag, clear all tags, check the tag is gone. */
	item_insert(&tree, 0);
	item_tag_set(&tree, 0, 0);
	__radix_tree_lookup(&tree, 0, &node, &slot);
	radix_tree_clear_tags(&tree, node, slot);
	assert(item_tag_get(&tree, 0, 0) == 0);

	/* Many entries: tag each one, then clear the tags while iterating. */
	for (index = 0; index < 1000; index++) {
		item_insert(&tree, index);
		item_tag_set(&tree, index, 0);
	}
	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
		radix_tree_clear_tags(&tree, iter.node, slot);
		assert(item_tag_get(&tree, iter.index, 0) == 0);
	}
	...
}
• Performance test example
static long long __benchmark_split(unsigned long index,
				   int old_order, int new_order)
{
	struct timespec start, finish;
	long long nsec;
	...
	item_insert_order(&tree, index, old_order);

	/* Time only the split itself, not the setup. */
	clock_gettime(CLOCK_MONOTONIC, &start);
	radix_tree_split(&tree, index, new_order);
	clock_gettime(CLOCK_MONOTONIC, &finish);

	nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
	       (finish.tv_nsec - start.tv_nsec);
	...
}
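The timing pattern used by the benchmark can be tried in user space without the radix tree. This is a sketch under assumptions: time_operation(), timespec_diff_ns() and spin() are hypothetical names, and spin() is only a stand-in for the operation being measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() under strict C modes */
#endif
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Elapsed nanoseconds between two timestamps, as in the benchmark above. */
static long long timespec_diff_ns(const struct timespec *start,
				  const struct timespec *finish)
{
	return (finish->tv_sec - start->tv_sec) * NSEC_PER_SEC +
	       (finish->tv_nsec - start->tv_nsec);
}

/* Run op(arg) once and return how long it took on the monotonic clock. */
static long long time_operation(void (*op)(void *), void *arg)
{
	struct timespec start, finish;

	clock_gettime(CLOCK_MONOTONIC, &start);
	op(arg);
	clock_gettime(CLOCK_MONOTONIC, &finish);
	return timespec_diff_ns(&start, &finish);
}

/* Stand-in workload: increment a counter a fixed number of times. */
static void spin(void *arg)
{
	volatile unsigned long *counter = arg;
	unsigned long i;

	for (i = 0; i < 100000; i++)
		(*counter)++;
}
```

CLOCK_MONOTONIC is used because it does not jump when the wall-clock time is adjusted, so the difference stays non-negative.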
References
• https://lwn.net/Articles/175432/
• http://events.linuxfoundation.org/sites/events/files/slides/LinuxConNA2016%20-%20Radix%20Tree.pdf
• Paul McKenney on RCU: https://vimeo.com/113961292
• http://ppt-online.org/15597
Questions
Thank You
Implement IDR in
__alloc_fd()
Sandhya Bankar
About me
• Linux Kernel Intern through Outreachy at The Linux Foundation, with the support of mentors Rik van Riel and Matthew Wilcox.
• Master of Engineering in Computer Networks.
• Bachelor of Engineering in Electronics and Communication.
Overview
• Goal
• IDR
• Allocate and manage file descriptors using IDR
• IDR API used in project
• Testing
• Result
• Conclusion
• Reference
Goal of the Project
• The Linux kernel has lots of special allocators.
• However, there is now an IDR library that can do allocation of numbers for us.
• Simplify the kernel by replacing custom allocators with common allocation code.
IDR
- IDR is a type of radix tree that maps integer IDs to specific pointer values.
- Originally written for the POSIX timer system call implementation, where it generates the ID that refers to a specific timer object. It is now widely used in various device drivers.
- IDR takes a given pointer and creates the corresponding integer ID. With that ID, you can quickly find the original pointer.
Allocate and manage file
descriptor using IDR
About project
- Implement IDR in file descriptor allocation code path
- Replace custom allocator with IDR
- Remove struct fdtable
- Convert select() to use idr_get_tag_batch()
- Replace close_on_exec bitmap with an IDR tag
- Use idr_tag_set() and idr_tag_get() for close_on_exec
operation.
- Rewrite close_files()
- Use idr_tag_get in fd_is_open()
- Remove full_fds_bits, open_fds bitmaps
Cont…
- Replace array of file pointer with IDR
- Remove next_fd
- Memory Saving
- Performance improvement
File Descriptor
- A file descriptor is used to access a file or other I/O
resource (e.g. a pipe or socket)
- A file descriptor is a non-negative integer, generally
represented in the C programming language as the
type int (negative values being reserved to indicate "no
value" or an error condition).
Cont...
- Each Linux process should expect to have three standard
POSIX file descriptors, corresponding to the
three standard streams
- stdin
- stdout
- stderr
Operations on file descriptors
- open() - open a file
- creat() - create a new file / rewrite an existing one
- pipe() - creates a pipe
- read() - read from a file descriptor
- write() - write to a file descriptor
- close() - close a file descriptor
- lseek() - reposition read/write file offset
- select() - synchronous I/O multiplexing
- socket() - create an endpoint for communication
- accept() - accept a connection on a socket
- dup(), dup2() - duplicate an open file descriptor
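The allocation rule that any fd allocator must preserve can be observed from user space: POSIX requires calls such as dup() to return the lowest-numbered unused descriptor. The sketch below is a hypothetical check (lowest_fd_is_reused() is not from the talk) and assumes a single-threaded process, so no other thread opens descriptors in between.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for pipe() and dup() under strict C modes */
#endif
#include <assert.h>
#include <unistd.h>

/*
 * Open a pipe, close its read end, then duplicate the write end.
 * The duplicate should reuse the number that was just freed.
 * Returns 1 if it did, 0 if it did not, -1 on error.
 */
static int lowest_fd_is_reused(void)
{
	int fds[2];
	int freed, dupfd;

	if (pipe(fds) < 0)
		return -1;

	freed = fds[0];
	close(fds[0]);		/* frees the number held by the read end */
	dupfd = dup(fds[1]);	/* must pick the lowest free number */

	close(fds[1]);
	if (dupfd < 0)
		return -1;
	close(dupfd);
	return dupfd == freed;
}
```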
Before IDR implementation – open()
IDR API used in
Project
• static inline void idr_init(struct idr *idr)
- Initialize the IDR
- @idr – idr handle
• static inline void idr_preload(gfp_t gfp_mask)
- Preload for idr_alloc()
- Preallocate memory to use for the next call to
idr_alloc(). This function returns with preemption
disabled; preemption is re-enabled by idr_preload_end().
- @gfp_mask: allocation mask to use for preloading
• static inline void idr_preload_end(void)
- End the preload section started with idr_preload()
- Re-enables preemption
• int idr_alloc(struct idr *idr, void *ptr, int start, int end,
gfp_t gfp)
- Allocates an unused ID in the range [start, end). Returns
-ENOSPC if there are no unused IDs in that range.
- @idr: idr handle
- @ptr: pointer to be associated with the new id
- @start: the minimum id (inclusive)
- @end: the maximum id (exclusive)
- @gfp: memory allocation flags
• static inline bool idr_check_preload(const struct idr
*src)
- Check the preload is still sufficient
- @src: IDR to be copied from
- Between the successful allocation of memory and
acquiring the lock that protects @src, the IDR may have
expanded. If this function returns false, more memory
needs to be preallocated.
- Return: true if enough memory remains allocated, false to
retry the preallocation.
• #define idr_for_each_entry(idr, entry, id)
- iterate over an idr's elements of a given type
- @idr: idr handle
- @entry: the type * to use as cursor
- @id: id entry's key
- @entry and @id do not need to be initialized before the
loop, and after normal termination @entry is left with the
value NULL. This is convenient for a "not found" value.
• static inline void *idr_find(const struct idr *idr, int
id)
- return pointer for given id
- @idr: idr handle
- @id: lookup key
- Return the pointer given the id it has been registered
with. A %NULL return indicates that @id is not valid or
you passed %NULL in idr_get_new().
• void idr_destroy(struct idr *idr)
- release all internal memory from an IDR
- @idr: idr handle
- After this function is called, the IDR is empty, and may be
reused or the data structure containing it may be freed.
- A typical clean-up sequence for objects stored in an idr
tree will use idr_for_each() to free all objects, if
necessary, then idr_destroy() to free the memory used to
keep track of those objects.
• static inline void *idr_remove(struct idr *idr, int id)
- Remove specific ID
- @idr - IDR handle
- @id - ID to be removed
• void *idr_replace(struct idr *idr, void *ptr, int id)
- replace pointer for given id
- @idr: idr handle
- @ptr: New pointer to associate with the ID
- @id: Lookup key
- Replace the pointer registered with an ID and return the
old value.
- Returns: the old pointer on success. ERR_PTR(-ENOENT) indicates
that @id was not found. ERR_PTR(-EINVAL) indicates that @id or
@ptr were not valid.
• static inline void *idr_tag_set(struct idr *idr, int id,
unsigned int tag)
- Set a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to set
- If there is an entry at @id in this IDR, set a tag on it
and return the address of the entry. If @id is outside
the range of the IDR, return NULL.
• static inline bool idr_tag_get(const struct idr *idr,
int id, unsigned int tag)
- Return whether a particular entry has a tag set
- @idr: IDR pointer
- @id: ID of entry to check
- @tag: Tag index to check
- Returns true/false depending whether @tag is set on
this ID.
• static inline void *idr_tag_clear(struct idr *idr, int
id, unsigned int tag)
- Clear a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to clear
- If there is an entry at @id in this IDR, clear its tag and
return the address of the entry. If @id is outside the
range of the IDR, return NULL.
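The semantics described for the IDR calls above (lowest free ID in [start, end), -ENOSPC when the range is full, NULL for an unknown ID, a tag bit per entry) can be mimicked with a toy user-space allocator. Everything named toy_* is hypothetical, and the fixed 64-entry array stands in for the radix tree. Unlike this sketch, which uses NULL to mark a free ID, the kernel IDR can store a NULL pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IDR_MAX 64	/* assumed capacity, only for this sketch */

struct toy_idr {
	void *ptrs[TOY_IDR_MAX];	/* NULL means the ID is free */
	bool tagged[TOY_IDR_MAX];	/* a single tag bit per ID */
};

static bool toy_id_valid(int id)
{
	return id >= 0 && id < TOY_IDR_MAX;
}

/* Bind ptr to the lowest free ID in [start, end); end <= 0 means no limit. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
	int id;

	if (start < 0 || !ptr)
		return -EINVAL;
	if (end <= 0 || end > TOY_IDR_MAX)
		end = TOY_IDR_MAX;
	for (id = start; id < end; id++) {
		if (!idr->ptrs[id]) {
			idr->ptrs[id] = ptr;
			return id;
		}
	}
	return -ENOSPC;
}

static void *toy_idr_find(const struct toy_idr *idr, int id)
{
	return toy_id_valid(id) ? idr->ptrs[id] : NULL;
}

/* Free the ID and return the pointer it held, or NULL if it was unused. */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
	void *old = toy_idr_find(idr, id);

	if (old) {
		idr->ptrs[id] = NULL;
		idr->tagged[id] = false;
	}
	return old;
}

/* Set the tag if the entry exists; return the entry, or NULL if absent. */
static void *toy_idr_tag_set(struct toy_idr *idr, int id)
{
	void *entry = toy_idr_find(idr, id);

	if (entry)
		idr->tagged[id] = true;
	return entry;
}

static bool toy_idr_tag_get(const struct toy_idr *idr, int id)
{
	return toy_idr_find(idr, id) != NULL && idr->tagged[id];
}
```

In the file descriptor use case, the ID plays the role of the fd number, the pointer the role of the struct file, and the tag the role of the close_on_exec bit.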
After implementing IDR – open()
Testing
- Performance benchmark
- Test cases checking the following system calls
- open()/close() syscall behaviour
- dup(), dup2() syscall behaviour
- select() syscall behaviour
- pipe() syscall behaviour
- Open file descriptor limit
- Test case which sets close_on_exec tag
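A user-space check of the close_on_exec behaviour could look like the sketch below. It is not the project's test case: the helper names are hypothetical, and it relies only on the POSIX fcntl() F_GETFD/F_SETFD commands and on dup() clearing FD_CLOEXEC on the new descriptor.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for pipe(), dup() and fcntl() under strict C modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error. */
static int fd_cloexec_get(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}

/* Set or clear FD_CLOEXEC on fd. Returns 0 on success, -1 on error. */
static int fd_cloexec_set(int fd, int on)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
	return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}

/* Returns 1 if the flag behaves as expected, 0 if not, -1 on error. */
static int cloexec_roundtrip(void)
{
	int fds[2], copy, ok;

	if (pipe(fds) < 0)
		return -1;

	ok = fd_cloexec_get(fds[0]) == 0;		/* pipe() leaves it clear */
	ok = ok && fd_cloexec_set(fds[0], 1) == 0;
	ok = ok && fd_cloexec_get(fds[0]) == 1;

	copy = dup(fds[0]);				/* the duplicate starts with the flag clear */
	ok = ok && copy >= 0 && fd_cloexec_get(copy) == 0;
	ok = ok && fd_cloexec_get(fds[0]) == 1;		/* the original keeps it */

	if (copy >= 0)
		close(copy);
	close(fds[0]);
	close(fds[1]);
	return ok;
}
```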
Result
Before implementing IDR
struct / bitmap               Size in bytes
struct files_struct                     704
struct fdtable                           64
struct file pointers                   2048
bitmap                                   96
Total                                  2912

After implementing IDR
struct / radix tree           Size in bytes
struct files_struct                      32
radix_tree node (3 required)   576 x 3 = 1728
Total                                  1760
- Total memory saving is 1152 bytes (2912 - 1760), roughly 1 KB
- It also reduces the size of the tinyconfig build on i386
by 672 bytes of code and 192 bytes of data.
Conclusion
- Implementing IDR in __alloc_fd() and the related code
paths saved memory and slightly improved performance.
- With the current changes, about 1 KB (1152 bytes) of kernel memory is saved.
- The fd allocation code in the kernel is smaller and
much more readable than before.
- Wherever the kernel needs to map a number to a
pointer of any type, IDR is a good option.
- Custom allocators can be replaced with IDR.
Reference
- https://lwn.net/Articles/103209/
- https://lwn.net/Articles/536293/
- https://lwn.net/Articles/721395/
- https://en.wikipedia.org/wiki/File_descriptor
- Linux Kernel Development - Robert Love
- Understanding the Linux Kernel - Daniel P. Bovet and Marco Cesati
Thank You !
Questions?
We are Linux Kernel Newbies
Advanced Test Driven-Development @ php[tek] 2024
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 

Radix Tree, IDR APIs and Their Test Suite

  • 1. Radix Tree, IDR APIs and their test suite Rehas Sachdeva & Sandhya Bankar
  • 2. Overview • What is a Radix tree? • Applications of radix tree • Kernel radix tree API • Enhancing the test suite
  • 3. What is a Radix tree?
  • 4. Radix tree Space optimized trie • Stores a key to value mapping. • the edges are labelled by a sequence of characters or bits. • Root to leaf path holds the key and the leaf holds the value. • Space optimized. • Fast lookup. https://en.wikipedia.org/wiki/Radix_tree
  • 5. Radix tree applications • General applications –IP routing •hierarchical organization of IP addresses. –Search •inverted indexes for text documents • Kernel specific uses – Page Cache •Check presence in cache, dirty tag or under writeback etc. – As resizeable arrays •drivers, filesystems, interrupt controllers.
  • 7. Node structure Node Info: shift, offset, count, parent pointer, root pointer, tags etc. Array of slots • Each node contains (2^map_shift) pointers in its slots array. • In a leaf node, each slot points to an item; in an internal node, it points to the next, deeper node. • The depth of a node determines which chunk of the key's bits is used to index its slots. #define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT) ... struct radix_tree_node { unsigned char shift; /* Bits remaining in each slot */ unsigned char offset; /* Slot offset in parent */ unsigned char count; /* Total entry count */ unsigned char exceptional; /* Exceptional entry count */ ... void __rcu *slots[RADIX_TREE_MAP_SIZE]; unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS]; };
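The relationship between a node's shift and the slot it selects can be sketched in user space. This is only an illustration: MAP_SHIFT, MAP_SIZE, MAP_MASK and slot_offset() are names made up here, and the value 6 is an assumption (the kernel chooses its map shift at build time).

```c
#include <assert.h>

/* Assumed value, for illustration; the kernel sets its map shift at build time. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/*
 * Slot offset used at a node whose 'shift' field is node_shift:
 * discard the bits handled by deeper levels, then keep MAP_SHIFT bits.
 */
static unsigned int slot_offset(unsigned long index, unsigned int node_shift)
{
    return (unsigned int)((index >> node_shift) & MAP_MASK);
}
```

With these assumed values, index 0x1234 selects slot 1 at the node with shift 12, slot 8 at shift 6, and slot 52 at the leaf level (shift 0).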
  • 8. Initializing a radix tree • #define RADIX_TREE(name, mask) struct radix_tree_root name = RADIX_TREE_INIT(mask) Example: RADIX_TREE(tree, GFP_KERNEL); – initializes a radix tree with the given name. – gfp_mask to tell the code how memory allocations are to be performed. – GFP_ATOMIC for atomic insertions, GFP_KERNEL for kernel-internal allocations and so on.
  • 9. Inserting an entry • A tree of height N can contain any index between 0 and (2^(map_shift*N))-1. • If the new index to be inserted is larger than the current max index, insert new nodes above the current top node to create a deeper tree. • Insertion fails if a memory allocation fails (-ENOMEM) or if an entry already exists at the index (-EEXIST).
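The capacity formula above can be checked with a small user-space sketch. The helper names max_index() and height_for() and the map shift of 6 are assumptions made for illustration; they are not kernel functions.

```c
#include <assert.h>
#include <limits.h>

#define MAP_SHIFT 6  /* assumed value, for illustration only */

/* Largest index a tree of the given height can hold: 2^(MAP_SHIFT*height) - 1. */
static unsigned long max_index(unsigned int height)
{
    unsigned int bits = MAP_SHIFT * height;

    if (bits >= sizeof(unsigned long) * CHAR_BIT)
        return ULONG_MAX;          /* the whole index space is covered */
    return (1UL << bits) - 1;
}

/* Smallest height whose index range includes 'index'. */
static unsigned int height_for(unsigned long index)
{
    unsigned int height = 1;

    while (index > max_index(height))
        height++;                  /* corresponds to adding a node above the top */
    return height;
}
```

Under these assumptions a tree of height 1 covers indices 0 to 63, so inserting index 64 would require growing the tree to height 2.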
  • 10. Inserting an entry Consider the following tree as example. Only 1 bit is used to index the slots at each node.
  • 11. Inserting an entry H is inserted; only the first 2 bits need to be considered to look it up uniquely.
  • 12. Inserting an entry I is inserted. Nodes are created as all 5 bits need to be considered.
  • 13. Inserting an entry • root: radix tree root • index: index key • order: key covers the 2^order indices around index • item: item to insert • static inline int radix_tree_insert(struct radix_tree_root *root, unsigned long index, void *entry); – For inserting an entry. Wrapper around __radix_tree_insert for an order-0 entry. • int __radix_tree_insert(struct radix_tree_root *, unsigned long index, unsigned order, void *item); – For inserting an entry of arbitrary order.
  • 14. Deleting an entry • If deleting an element results in a top node with only one child at offset 0, replace the top node with its only child, creating a shallower tree. Consider the following tree as example.
  • 17. Deleting an entry • root: radix tree root • index: index key • void *radix_tree_delete(struct radix_tree_root *root, unsigned long index); • item: expected item • void *radix_tree_delete_item(struct radix_tree_root *root, unsigned long index, void *item); – Delete if the entry at index is expected item. • iter: iterator state • slot: pointer to slot • void radix_tree_iter_delete(struct radix_tree_root *root, struct radix_tree_iter *iter, void __rcu **slot); – Delete the entry at this iterator position
  • 18. Lookup • root: radix tree root • index: index key • void *radix_tree_lookup(const struct radix_tree_root *root, unsigned long index); – looks for key in the tree and returns the associated item (or NULL on failure). • results: where the results of the lookup are placed • first_index: start the lookup from this key • max_items: place up to this many items at *results • unsigned int radix_tree_gang_lookup(const struct radix_tree_root *root, void **results, unsigned long first_index, unsigned int max_items); – perform multiple lookups. • void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *root, unsigned long index); – lookup a slot at index.
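To make the insert and lookup semantics concrete, here is a toy user-space tree. It is a sketch under simplifying assumptions and not the kernel implementation: the height is fixed rather than grown on demand, there is no RCU or tagging, and every name (toy_root, toy_insert, toy_lookup, TOY_SHIFT and so on) is invented for this example. It mirrors only the return conventions described above: -EEXIST for an occupied index, -ENOMEM on allocation failure, and NULL from a failed lookup.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_SHIFT  2                          /* 4 slots per node */
#define TOY_SIZE   (1U << TOY_SHIFT)
#define TOY_HEIGHT 3                          /* fixed height: indices 0..63 */
#define TOY_MAX    ((1UL << (TOY_SHIFT * TOY_HEIGHT)) - 1)

struct toy_node {
    void *slots[TOY_SIZE];                    /* child nodes, or items at level 0 */
};

struct toy_root {
    struct toy_node *top;
};

/* Chunk of 'index' that selects the slot at the given level (0 = leaf). */
static unsigned int toy_offset(unsigned long index, int level)
{
    return (unsigned int)((index >> (level * TOY_SHIFT)) & (TOY_SIZE - 1));
}

/* Returns 0, -ENOMEM, -EEXIST, or -EINVAL (index out of range or NULL item). */
static int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
    struct toy_node *node;
    int level;

    if (index > TOY_MAX || !item)
        return -EINVAL;
    if (!root->top) {
        root->top = calloc(1, sizeof(*root->top));
        if (!root->top)
            return -ENOMEM;
    }
    node = root->top;
    for (level = TOY_HEIGHT - 1; level > 0; level--) {
        unsigned int off = toy_offset(index, level);

        if (!node->slots[off]) {              /* create the missing inner node */
            node->slots[off] = calloc(1, sizeof(*node));
            if (!node->slots[off])
                return -ENOMEM;
        }
        node = node->slots[off];
    }
    if (node->slots[toy_offset(index, 0)])
        return -EEXIST;
    node->slots[toy_offset(index, 0)] = item;
    return 0;
}

/* Returns the item stored at 'index', or NULL if there is none. */
static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
    const struct toy_node *node = root->top;
    int level;

    if (index > TOY_MAX)
        return NULL;
    for (level = TOY_HEIGHT - 1; node && level > 0; level--)
        node = node->slots[toy_offset(index, level)];
    return node ? node->slots[toy_offset(index, 0)] : NULL;
}
```

A caller would start from a zero-initialized struct toy_root, insert item pointers by index, and treat a NULL result from toy_lookup() as "not present".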
  • 19. Iteration
  • 20. Tags
  • 21. Multiorder
  • 23. Test Suite • Merged into Linux 4.6. • Location: tools/testing/radix-tree. • Regression tests, functional tests and performance tests. • Short run or long run. • Levels of verbose output.
  • 24. Regression tests
  • 25. Functional tests
  • 26. Performance tests
  • 27. Enhancements as part of Outreachy Project • Adding different levels of verbosity to the output of the test suite. • #define printv(verbosity_level, fmt, ...) if(test_verbose >= verbosity_level) printf(fmt, ##__VA_ARGS__) – The idea extends to many other parts of the kernel, for debugging, testing etc. • Config option in the Makefile to test various values of map shift. mapshift: @if ! grep -qw $(SHIFT) generated/map-shift.h; then echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > generated/map-shift.h; fi • Config option to build tests for a 32-bit or 64-bit machine.
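A stand-alone variant of the printv idea might look like the sketch below. It differs from the one-line macro on the slide in one respect that is an addition here, not something the slides show: the body is wrapped in do { } while (0) so that the macro behaves as a single statement inside an unbraced if/else. The test_verbose variable is declared locally for the example.

```c
#include <assert.h>
#include <stdio.h>

static int test_verbose;   /* a real test program would set this from its options */

/*
 * do { } while (0) makes the macro act like one statement, so it is safe
 * in an unbraced if/else. ##__VA_ARGS__ is a GNU extension (accepted by
 * GCC and Clang) that drops the comma when no arguments follow fmt.
 */
#define printv(verbosity_level, fmt, ...)            \
    do {                                             \
        if (test_verbose >= (verbosity_level))       \
            printf(fmt, ##__VA_ARGS__);              \
    } while (0)
```

With test_verbose set to 1, a printv(1, ...) call prints and a printv(2, ...) call is skipped entirely, including evaluation of its arguments.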
  • 28. Enhancements as part of Outreachy Project • Automate generation of .gcov files to check their test coverage. • Adding new functional tests. – idr_get_next() – ida_simple_get() – ida_simple_remove() – radix_tree_clear_tags() • Adding new performance tests. –For radix tree insertion, deletion, tagging, join and split.
  • 29. Enhancements as part of Outreachy Project • Functional test example void radix_tree_clear_tags_test(void) { ... item_insert(&tree, 0); item_tag_set(&tree, 0, 0); __radix_tree_lookup(&tree, 0, &node, &slot); radix_tree_clear_tags(&tree, node, slot); assert(item_tag_get(&tree, 0, 0) == 0); for (index = 0; index < 1000; index++) { item_insert(&tree, index); item_tag_set(&tree, index, 0); } radix_tree_for_each_slot(slot, &tree, &iter, 0) { radix_tree_clear_tags(&tree, iter.node, slot); assert(item_tag_get(&tree, iter.index, 0) == 0); }
  • 30. Enhancements as part of Outreachy Project • Performance test example static long long __benchmark_split(unsigned long index, int old_order, int new_order) { struct timespec start, finish; long long nsec; ... item_insert_order(&tree, index, old_order); clock_gettime(CLOCK_MONOTONIC, &start); radix_tree_split(&tree, index, new_order); clock_gettime(CLOCK_MONOTONIC, &finish); nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC + (finish.tv_nsec - start.tv_nsec); ... }
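The timing pattern in the benchmark above can be tried outside the kernel tree. The following is a minimal sketch, assuming a POSIX system with clock_gettime(); elapsed_ns(), time_call_ns() and busy_loop() are hypothetical helpers, with busy_loop() standing in for the operation being measured.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes clock_gettime() under strict -std modes */
#endif
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds between two timestamps; the tv_nsec difference may be negative. */
static long long elapsed_ns(const struct timespec *start,
                            const struct timespec *finish)
{
    return (finish->tv_sec - start->tv_sec) * NSEC_PER_SEC +
           (finish->tv_nsec - start->tv_nsec);
}

/* Time one call of fn(arg) with the monotonic clock. */
static long long time_call_ns(void (*fn)(void *), void *arg)
{
    struct timespec start, finish;

    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &finish);
    return elapsed_ns(&start, &finish);
}

/* Stand-in workload for illustration: count up to *arg. */
static void busy_loop(void *arg)
{
    volatile unsigned long i, n = *(unsigned long *)arg;

    for (i = 0; i < n; i++)
        ;
}
```

CLOCK_MONOTONIC is used, as on the slide, because it is not affected by wall-clock adjustments while the measurement is running.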
  • 38. • Linux Kernel Intern through Outreachy in The Linux Foundation with the support of mentors Rik van Riel and Matthew Wilcox. • Master of Engineering in Computer networks • Bachelor of Engineering in Electronics and communication.
  • 39. Overview • Goal • IDR • Allocate and manage file descriptor using IDR • IDR API used in project • Testing • Result • Conclusion • Reference
  • 40. Goal of the Project • The Linux kernel has lots of special allocators. • However, there is now an IDR library that can allocate numbers for us. • Simplify the kernel by replacing custom allocators with common allocation code.
  • 41. IDR - IDR is a type of radix tree that maps integer IDs to pointer values. - Originally written for the POSIX timer system call implementation, where it generates the ID that refers to a specific timer object. It is now widely used in various device drivers. - IDR takes a given pointer and creates the corresponding integer ID. With that ID, you can quickly find the original pointer.
  • 42. Allocate and manage file descriptor using IDR
  • 43. About project - Implement IDR in file descriptor allocation code path - Replace custom allocator with IDR - Remove struct fdtable - Convert select() to implement idr_get_tag_batch() - Replace close_on_exec bitmap with an IDR tag - Use idr_tag_set() and idr_tag_get() for close_on_exec operation. - Rewrite close_files() - Use idr_tag_get in fd_is_open() - Remove full_fds_bits, open_fds bitmaps
  • 44. Cont… - Replace array of file pointer with IDR - Remove next_fd - Memory Saving - Performance improvement
  • 45. File Descriptor - A file descriptor is used to access a file or other I/O resource (e.g. a pipe or socket). - A file descriptor is a non-negative integer, generally represented in the C programming language as the type int (negative values being reserved to indicate "no value" or an error condition).
  • 46. Cont... - Each Linux process should expect to have three standard POSIX file descriptors, corresponding to the three standard streams: - stdin - stdout - stderr
  • 47. Operations on file descriptors - open() - open a file - creat() - create a new file / rewrite an existing one - pipe() - creates a pipe - read() - read from a file descriptor - write() - write to a file descriptor - close() - close a file descriptor - lseek() - reposition read/write file offset - select() - synchronous I/O multiplexing - socket() - create an endpoint for communication - accept() - accept a connection on a socket - dup(), dup2() - duplicate an open file descriptor
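The allocation behaviour that the project's fd allocator has to preserve is visible from user space: a new descriptor gets the lowest unused number, so a number freed by close() is handed out again. The sketch below demonstrates this with pipe() and dup(); it assumes a single-threaded POSIX process, and lowest_fd_is_reused() is a name made up for this example.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Returns 1 if a freed descriptor number is handed out again by the next
 * allocation, 0 if not, and -1 on error.
 */
static int lowest_fd_is_reused(void)
{
    int p[2], copy, reused;

    if (pipe(p) < 0)             /* takes the two lowest unused descriptors */
        return -1;
    close(p[0]);                 /* frees the number p[0] */
    copy = dup(p[1]);            /* dup() returns the lowest unused descriptor */
    if (copy < 0) {
        close(p[1]);
        return -1;
    }
    reused = (copy == p[0]);
    close(copy);
    close(p[1]);
    return reused;
}
```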
  • 49. IDR API used in Project
  • 50. • static inline void idr_init(struct idr *idr) - Initialize the IDR - @idr – idr handle
  • 51. • static inline void idr_preload(gfp_t gfp_mask) - Preload for idr_alloc() - Preallocate memory to use for the next call to idr_alloc(). This function returns with preemption disabled; preemption is re-enabled by idr_preload_end(). - @gfp_mask: allocation mask to use for preloading
  • 52. • static inline void idr_preload_end(void) - end preload section started with idr_preload() - Enable preemption
  • 53. • int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp) - Allocates an unused ID in the range [start, end). Returns -ENOSPC if there are no unused IDs in that range. - @idr: idr handle - @ptr: pointer to be associated with the new id - @start: the minimum id (inclusive) - @end: the maximum id (exclusive) - @gfp: memory allocation flags
  • 54. • static inline bool idr_check_preload(const struct idr *src) - Check the preload is still sufficient - @src: IDR to be copied from - Between the successful allocation of memory and acquiring the lock that protects @src, the IDR may have expanded. If this function returns false, more memory needs to be preallocated. - Return: true if enough memory remains allocated, false to retry the preallocation.
  • 55. • #define idr_for_each_entry(idr, entry, id) - iterate over an idr's elements of a given type - @idr: idr handle - @entry: the type * to use as cursor - @id: id entry's key - @entry and @id do not need to be initialized before the loop, and after normal termination @entry is left with the value NULL. This is convenient for a "not found" value.
  • 56. • static inline void *idr_find(const struct idr *idr, int id) - return pointer for given id - @idr: idr handle - @id: lookup key - Return the pointer given the id it has been registered with. A %NULL return indicates that @id is not valid or you passed %NULL in idr_get_new().
  • 57. • void idr_destroy(struct idr *idr) - release all internal memory from an IDR - @idr: idr handle - After this function is called, the IDR is empty, and may be reused or the data structure containing it may be freed. - A typical clean-up sequence for objects stored in an idr tree will use idr_for_each() to free all objects, if necessary, then idr_destroy() to free the memory used to keep track of those objects.
  • 58. • static inline void *idr_remove(struct idr *idr, int id) - Remove a specific ID - @idr: IDR handle - @id: ID to be removed
  • 59. • void *idr_replace(struct idr *idr, void *ptr, int id) - replace pointer for given id - @idr: idr handle - @ptr: New pointer to associate with the ID - @id: Lookup key - Replace the pointer registered with an ID and return the old value. - Returns: the old pointer on success. ERR_PTR(-ENOENT) indicates that @id was not found. ERR_PTR(-EINVAL) indicates that @id or @ptr were not valid.
  • 60. • static inline void *idr_tag_set(struct idr *idr, int id, unsigned int tag) - Set a tag on an entry - @idr: IDR pointer - @id: ID of entry to tag - @tag: Tag index to set - If there is an entry at @id in this IDR, set a tag on it and return the address of the entry. If @id is outside the range of the IDR, return NULL.
  • 61. • static inline bool idr_tag_get(const struct idr *idr, int id, unsigned int tag) - Return whether a particular entry has a tag set - @idr: IDR pointer - @id: ID of entry to check - @tag: Tag index to check - Returns true/false depending whether @tag is set on this ID.
  • 62. • static inline void *idr_tag_clear(struct idr *idr, int id, unsigned int tag) - Clear a tag on an entry - @idr: IDR pointer - @id: ID of entry to tag - @tag: Tag index to clear - If there is an entry at @id in this IDR, clear its tag and return the address of the entry. If @id is outside the range of the IDR, return NULL.
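The allocate, find and remove calls listed above can be mimicked with a toy user-space ID map. This is a sketch only: it uses a fixed-size array instead of a radix tree, has no preloading, locking or tags, and every name in it (toy_idr, toy_idr_alloc and so on) is hypothetical. What it does reproduce is the [start, end) range convention of idr_alloc() and the -ENOSPC result when the range is full. Unlike the toy, which uses NULL to mean "unused", it rejects NULL pointers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IDR_MAX 64                   /* fixed capacity, for illustration only */

struct toy_idr {
    void *ptrs[TOY_IDR_MAX];             /* ptrs[id] == NULL means "id unused" */
};

static void toy_idr_init(struct toy_idr *idr)
{
    for (int i = 0; i < TOY_IDR_MAX; i++)
        idr->ptrs[i] = NULL;
}

/* Lowest unused id in [start, end); end <= 0 means "up to the capacity". */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
    if (start < 0 || !ptr)
        return -EINVAL;
    if (end <= 0 || end > TOY_IDR_MAX)
        end = TOY_IDR_MAX;
    for (int id = start; id < end; id++) {
        if (!idr->ptrs[id]) {
            idr->ptrs[id] = ptr;
            return id;
        }
    }
    return -ENOSPC;
}

/* Pointer registered under id, or NULL if the id is unused or out of range. */
static void *toy_idr_find(const struct toy_idr *idr, int id)
{
    return (id >= 0 && id < TOY_IDR_MAX) ? idr->ptrs[id] : NULL;
}

/* Frees the id and returns the pointer that was stored there (or NULL). */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
    void *old = toy_idr_find(idr, id);

    if (old)
        idr->ptrs[id] = NULL;
    return old;
}
```

A removed ID becomes the lowest unused one again, which is the same reuse behaviour expected from file descriptor numbers.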
  • 65. Testing - Performance benchmark - Test cases to check the system calls below - open()/close() syscall behaviour - dup(), dup2() syscall behaviour - select() syscall behaviour - pipe() syscall behaviour - Open file descriptor limit - Test case which sets the close_on_exec tag
  • 67. Result, before implementing IDR (struct / bitmap: size in bytes) - struct files_struct: 704 - struct fdtable: 64 - struct file pointers: 2048 - bitmap: 96 - Total: 2912
  • 68. Result, after implementing IDR (struct / radix_tree: size in bytes) - struct files_struct: 32 - radix_tree node (3 required): 576 each - Total: 1760
  • 69. - Total memory saving is 1152 bytes (2912 - 1760, about 1.1 KB). - It also reduces the size of the tinyconfig build on i386 by 672 bytes of code and 192 bytes of data.
  • 71. Conclusion - Implementing IDR in __alloc_fd() and the related code path saved memory and slightly improved performance. - With the current changes, about 1.1 KB of kernel memory is saved. - The fd allocation code (kernel code) is smaller and much more readable than before. - Wherever the kernel needs to map a number to a pointer of any type, IDR can be the best option. - Custom allocators can be replaced with IDR.
  • 75. Questions? We are Linux Kernel Newbies