Radix Tree, IDR APIs and Their Test Suite

Rehas Sachdeva & Sandhya Bankar

Overview
• What is a Radix tree?
• Applications of radix tree
• Kernel radix tree API
• Enhancing the test suite

Radix tree
Space optimized trie
• Stores a key-to-value mapping.
• The edges are labelled by a sequence of characters or bits.
• The root-to-leaf path holds the key and the leaf holds the value.
• Space optimized.
• Fast lookup.
https://en.wikipedia.org/wiki/Radix_tree

Radix tree applications
• General applications
–IP routing
•hierarchical organization of IP addresses.
–Search
•inverted indexes for text documents
• Kernel specific uses
– Page Cache
•Check whether a page is present in the cache, dirty, or under writeback.
– As resizeable arrays
•drivers, filesystems, interrupt controllers.

Node structure
Node Info: shift, offset,
count, parent pointer,
root pointer, tags etc.
Array of slots
• Each node contains (2^map_shift) pointers in slots array.
• In a leaf node, slots point to items; in an internal node, they point to the next, deeper node.
• The depth of a node determines which chunk of the key's bits is used to index its slots.
#define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT)
...
struct radix_tree_node {
unsigned char shift; /* Bits remaining in each slot */
unsigned char offset; /* Slot offset in parent */
unsigned char count; /* Total entry count */
unsigned char exceptional; /* Exceptional entry count */
...
void __rcu *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
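To make the "chunk of bits" idea concrete, here is a small userspace sketch (not kernel code). It assumes a map shift of 6, and the helper names slot_offset() and print_path() are hypothetical; they only illustrate how a node's shift selects the bits of the index that pick a slot.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption for this sketch: 6 bits per level, i.e. 64 slots per node. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Hypothetical helper: slot used in a node whose shift is `shift`. */
unsigned long slot_offset(unsigned long index, unsigned int shift)
{
	return (index >> shift) & MAP_MASK;
}

/* Print the slot chosen at each level, from the top node down to the leaf. */
void print_path(unsigned long index, unsigned int height)
{
	assert(height > 0);
	for (unsigned int level = height; level-- > 0; ) {
		unsigned int shift = level * MAP_SHIFT;

		printf("shift %2u -> slot %lu\n", shift,
		       slot_offset(index, shift));
	}
}
```

Calling print_path(65, 2) walks a two-level tree: the top node (shift 6) uses the upper chunk of the index and the leaf (shift 0) uses the lowest 6 bits.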

Initializing a radix tree
• #define RADIX_TREE(name, mask) \
	struct radix_tree_root name = RADIX_TREE_INIT(mask)
Example: RADIX_TREE(tree, GFP_KERNEL);
– initializes a radix tree with the given name.
– gfp_mask to tell the code how memory allocations are to be performed.
– GFP_ATOMIC for atomic insertions, GFP_KERNEL for kernel-internal
allocations and so on.

Inserting an entry
• A tree of height N can contain any index between 0 and (2^(map_shift*N))-1.
• If the new index to be inserted is larger than the current max index, insert new nodes
above the current top node to create a deeper tree.
• Failure cases: a memory allocation fails (-ENOMEM) or an entry already exists
at the index (-EEXIST).
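The growth rule and the two failure cases can be sketched in userspace. This is a toy tree, not the kernel implementation: the names toy_node, toy_root, toy_insert() and toy_lookup() are hypothetical, the map shift is assumed to be 2 so that growth is easy to trigger, NULL marks an empty slot, and nodes are never freed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAP_SHIFT 2                     /* assumption: 4 slots per node */
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

struct toy_node {
	unsigned int shift;             /* index bits handled below this node */
	void *slots[MAP_SIZE];          /* children, or items when shift == 0 */
};

struct toy_root {
	struct toy_node *top;           /* NULL while the tree is empty */
};

/* Largest index that fits under a top node with the given shift. */
unsigned long toy_max_index(unsigned int shift)
{
	return (MAP_SIZE << shift) - 1;
}

static struct toy_node *toy_node_alloc(unsigned int shift)
{
	struct toy_node *node = calloc(1, sizeof(*node));

	if (node)
		node->shift = shift;
	return node;
}

int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
	struct toy_node *node;

	assert(root && item);           /* toy limitation: NULL means "empty" */
	if (!root->top) {
		root->top = toy_node_alloc(0);
		if (!root->top)
			return -ENOMEM;
	}

	/* Grow upwards: the old top becomes slot 0 of a new, taller top. */
	while (index > toy_max_index(root->top->shift)) {
		node = toy_node_alloc(root->top->shift + MAP_SHIFT);
		if (!node)
			return -ENOMEM;
		node->slots[0] = root->top;
		root->top = node;
	}

	/* Walk down, creating missing internal nodes on the way. */
	node = root->top;
	while (node->shift > 0) {
		unsigned long offset = (index >> node->shift) & MAP_MASK;

		if (!node->slots[offset]) {
			node->slots[offset] =
				toy_node_alloc(node->shift - MAP_SHIFT);
			if (!node->slots[offset])
				return -ENOMEM;
		}
		node = node->slots[offset];
	}

	if (node->slots[index & MAP_MASK])
		return -EEXIST;
	node->slots[index & MAP_MASK] = item;
	return 0;
}

void *toy_lookup(const struct toy_root *root, unsigned long index)
{
	const struct toy_node *node = root->top;

	if (!node || index > toy_max_index(node->shift))
		return NULL;
	while (node && node->shift > 0)
		node = node->slots[(index >> node->shift) & MAP_MASK];
	return node ? node->slots[index & MAP_MASK] : NULL;
}
```

With a map shift of 2, a single node covers indices 0 to 3, so inserting index 100 into a tree that only holds index 3 adds three levels above the original node.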

Inserting an entry
Consider the following tree as example. Only 1 bit is used to index the slots at each node.

Inserting an entry
H is inserted. Only the first 2 bits need to be considered to look it up uniquely.

Inserting an entry
I is inserted. Nodes are created as all 5 bits need to be considered.

Inserting an entry
• root: radix tree root
• index: index key
• order: key covers the 2^order indices around index
• item: item to insert
• static inline int radix_tree_insert(struct radix_tree_root *root,
unsigned long index, void *entry);
– For inserting an entry. Wrapper around __radix_tree_insert for 0 order entry.
• int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
unsigned order, void *item);
– For inserting an entry of arbitrary order.

Deleting an entry
• If deleting an element results in a top node with only one child at offset 0, replace the top
node with its only child, creating a shallower tree. Consider the following tree as example.
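The shrinking rule can be sketched as a loop over the top node. This is a userspace illustration under assumed names (toy_node, toy_shrink()) and an assumed 4 slots per node; it is not the kernel's code.

```c
#include <assert.h>
#include <stdlib.h>

#define MAP_SIZE 4                      /* assumption: 4 slots per node */

struct toy_node {
	unsigned int shift;             /* 0 for a leaf node */
	unsigned int count;             /* number of non-NULL slots */
	void *slots[MAP_SIZE];
};

/*
 * While the top node is an internal node whose only child sits at
 * offset 0, drop that level and promote the child. Returns the new top.
 */
struct toy_node *toy_shrink(struct toy_node *top)
{
	while (top && top->shift > 0 && top->count == 1 && top->slots[0]) {
		struct toy_node *child = top->slots[0];

		free(top);
		top = child;
	}
	return top;
}
```

A lone child at any other offset must not be promoted, because the bits consumed by the top node would then be non-zero and are still needed to form the index.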

Deleting an entry
• void *radix_tree_delete(struct radix_tree_root *root, unsigned long index);
• item: expected item
• void *radix_tree_delete_item(struct radix_tree_root *root, unsigned long index, void *item);
– Delete if the entry at index is expected item.
• iter: iterator state
• slot: pointer to slot
• void radix_tree_iter_delete(struct radix_tree_root *root,
struct radix_tree_iter *iter, void __rcu **slot);
– Delete the entry at this iterator position

Lookup
• void *radix_tree_lookup(const struct radix_tree_root *root, unsigned long index);
– looks for key in the tree and returns the associated item (or NULL on failure).
• results: where the results of the lookup are placed
• first_index: start the lookup from this key
• max_items: place up to this many items at *results
• unsigned int radix_tree_gang_lookup(const struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
– perform multiple lookups.
• void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *root,
unsigned long index);
– lookup a slot at index.

Iteration

Tags

Multiorder

Test Suite
• Merged into Linux 4.6.
• Location: tools/testing/radix-tree.
• Regression tests, functional tests and performance tests.
• Short run or long run.
• Levels of verbose output.

Regression tests

Functional tests

Performance tests

Enhancements as part of Outreachy Project
• Adding different levels of verbosity to output of test suite.
• #define printv(verbosity_level, fmt, ...) \
	if (test_verbose >= verbosity_level) \
		printf(fmt, ##__VA_ARGS__)
– The idea extends to many other parts of the kernel, for debugging, testing etc.
• Config option in makefile to test for various values of map shift.
mapshift:
	@if ! grep -qw $(SHIFT) generated/map-shift.h; then \
		echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > \
			generated/map-shift.h; \
	fi
• Config option to build tests for 32 bit or 64 bit machine.

• Automate generation of .gcov files to check their test coverage.
• Adding new functional tests.
– idr_get_next()
– ida_simple_get()
– ida_simple_remove()
– radix_tree_clear_tags()
• Adding new performance tests.
–For radix tree insertion, deletion, tagging, join and split.
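The verbosity idea from the first bullet can be tried in userspace. The sketch below is a variation rather than the test suite's macro: it writes printv() as an expression so that it is safe inside an unbraced if/else and its effect can be counted. The function report_progress() is hypothetical, and ##__VA_ARGS__ is a GNU extension accepted by gcc and clang.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: a test harness would set this from a -v style option. */
int test_verbose = 1;

/*
 * Expression form of printv(): evaluates to the printf() return value
 * when the message is shown and to 0 when it is suppressed.
 */
#define printv(verbosity_level, fmt, ...)			\
	((test_verbose >= (verbosity_level)) ?			\
		printf(fmt, ##__VA_ARGS__) : 0)

/* Hypothetical example: report progress at two verbosity levels. */
int report_progress(int done, int total)
{
	int shown = 0;

	assert(total > 0);
	shown += printv(1, "progress: %d/%d\n", done, total) > 0;
	shown += printv(2, "  detail: %d%% complete\n",
			done * 100 / total) > 0;
	return shown;           /* number of messages actually printed */
}
```

With test_verbose set to 1 only the first message appears; raising it to 2 also shows the detail line, and 0 silences both.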

• Functional test example
void radix_tree_clear_tags_test(void)
{
	...
	item_insert(&tree, 0);
	item_tag_set(&tree, 0, 0);
	__radix_tree_lookup(&tree, 0, &node, &slot);
	radix_tree_clear_tags(&tree, node, slot);
	assert(item_tag_get(&tree, 0, 0) == 0);
	for (index = 0; index < 1000; index++) {
		item_insert(&tree, index);
		item_tag_set(&tree, index, 0);
	}
	radix_tree_for_each_slot(slot, &tree, &iter, 0) {
		radix_tree_clear_tags(&tree, iter.node, slot);
		assert(item_tag_get(&tree, iter.index, 0) == 0);
	}
	...
}

• Performance test example
static long long __benchmark_split(unsigned long index,
int old_order, int new_order)
{
struct timespec start, finish;
long long nsec;
...
item_insert_order(&tree, index, old_order);
clock_gettime(CLOCK_MONOTONIC, &start);
radix_tree_split(&tree, index, new_order);
clock_gettime(CLOCK_MONOTONIC, &finish);
nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
(finish.tv_nsec - start.tv_nsec);
...
}
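The timing pattern in the benchmark above works the same way for any operation. Below is a minimal userspace sketch, assuming a POSIX system; time_call(), elapsed_nsec() and sample_workload() are hypothetical names, and the workload is a stand-in for the tree operation being measured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Difference between two timestamps in nanoseconds. */
long long elapsed_nsec(const struct timespec *start,
		       const struct timespec *finish)
{
	return (finish->tv_sec - start->tv_sec) * NSEC_PER_SEC +
	       (finish->tv_nsec - start->tv_nsec);
}

/* Time a single call of fn(arg) with the monotonic clock. */
long long time_call(void (*fn)(void *), void *arg)
{
	struct timespec start, finish;

	assert(fn);
	clock_gettime(CLOCK_MONOTONIC, &start);
	fn(arg);
	clock_gettime(CLOCK_MONOTONIC, &finish);
	return elapsed_nsec(&start, &finish);
}

/* Stand-in workload: sums the first *arg integers. */
void sample_workload(void *arg)
{
	volatile unsigned long sum = 0;
	unsigned long n = *(unsigned long *)arg;

	for (unsigned long i = 0; i < n; i++)
		sum += i;
}
```

CLOCK_MONOTONIC is used rather than wall-clock time so that clock adjustments during the run do not distort the measurement. Note that the nanosecond term may be negative when the second boundary is crossed; the sum is still correct.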

References
• https://lwn.net/Articles/175432/
• http://events.linuxfoundation.org/sites/events/files/slides/LinuxConNA2016%20-%20Radix%20Tree.pdf
• Paul McKenney on RCU: https://vimeo.com/113961292
• http://ppt-online.org/15597

Implement IDR in
__alloc_fd()
Sandhya Bankar

• Linux Kernel Intern through Outreachy in The Linux Foundation with the
support of mentors Rik van Riel and Matthew Wilcox.
• Master of Engineering in Computer Networks
• Bachelor of Engineering in Electronics and Communication.

Overview
• Goal
• IDR
• Allocate and manage file descriptors using IDR
• IDR API used in the project
• Testing
• Result
• Conclusion
• Reference

Goal of the Project
• The Linux kernel has many special-purpose allocators
• However, there is now an IDR library that can allocate numbers for us
• Simplify the kernel by replacing custom allocators with common allocation code

IDR
- IDR is a type of radix tree that maps integer IDs to pointers.
- Originally written for the POSIX timer system call implementation, where it
generates the ID that refers to a specific timer object. It is now
widely used in various device drivers.
- IDR takes a given pointer and creates the corresponding integer
ID. With that ID, you can quickly find the original pointer.

Allocate and manage file
descriptor using IDR

About project
- Implement IDR in file descriptor allocation code path
- Replace custom allocator with IDR
- Remove struct fdtable
- Convert select() to use idr_get_tag_batch()
- Replace close_on_exec bitmap with an IDR tag
- Use idr_tag_set() and idr_tag_get() for close_on_exec
operation.
- Rewrite close_files()
- Use idr_tag_get in fd_is_open()
- Remove full_fds_bits, open_fds bitmaps

Cont…
- Replace array of file pointer with IDR
- Remove next_fd
- Memory Saving
- Performance improvement

File Descriptor
- A file descriptor is used to access a file or another I/O
resource (e.g. a pipe or socket)
- A file descriptor is a non-negative integer, generally
represented in the C programming language as the
type int (negative values being reserved to indicate "no
value" or an error condition).

Cont...
- Each Linux process should expect to have three standard
POSIX file descriptors, corresponding to the
three standard streams
- stdin
- stdout
- stderr

Operations on file descriptors
- open() - open a file
- creat() - create a new file / rewrite an existing one
- pipe() - creates a pipe
- read() - read from a file descriptor
- write() - write to a file descriptor
- close() - close a file descriptor
- lseek() - reposition read/write file offset
- select() - synchronous I/O multiplexing
- socket() - create an endpoint for communication
- accept() - accept a connection on a socket
- dup(), dup2() - duplicate an open file descriptor
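A few of these calls can be exercised together in a short userspace program. The sketch below assumes a POSIX system and a single-threaded caller; fd_demo() and lowest_fd_reused() are hypothetical names. The second function shows the "lowest unused number" behaviour that any file descriptor allocator has to preserve.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Write through a dup()ed descriptor and read the data back from the pipe.
 * Returns 0 on success, -1 on any failure. */
int fd_demo(void)
{
	int fds[2];
	char buf[8] = { 0 };

	if (pipe(fds) < 0)
		return -1;

	int copy = dup(fds[1]);         /* second descriptor for the write end */
	if (copy < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	assert(copy != fds[0] && copy != fds[1]);

	ssize_t written = write(copy, "hello", 5);
	ssize_t got = read(fds[0], buf, sizeof(buf) - 1);

	close(copy);
	close(fds[1]);
	close(fds[0]);
	return (written == 5 && got == 5 && strcmp(buf, "hello") == 0) ? 0 : -1;
}

/* After closing the lower of two pipe descriptors, dup() hands the same
 * number out again. Returns 1 if it did, 0 if not, -1 on error. */
int lowest_fd_reused(void)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;

	int lower = fds[0] < fds[1] ? fds[0] : fds[1];
	int other = fds[0] < fds[1] ? fds[1] : fds[0];

	close(lower);
	int fresh = dup(other);         /* lowest available descriptor */
	int reused = (fresh == lower);

	if (fresh >= 0)
		close(fresh);
	close(other);
	return fresh < 0 ? -1 : reused;
}
```

The reuse check relies on nothing else opening or closing descriptors between the calls, which is why it assumes a single thread.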

Before IDR implementation – open()

• static inline void idr_init(struct idr *idr)
- Initialize the IDR
- @idr – idr handle

• static inline void idr_preload(gfp_t gfp_mask)
- Preload for idr_alloc()
- Preallocate memory to use for the next call to
idr_alloc(). This function returns with preemption
disabled. Preemption is re-enabled by idr_preload_end().
- @gfp_mask: allocation mask to use for preloading

• static inline void idr_preload_end(void)
- end preload section started with idr_preload()
- Enable preemption

• int idr_alloc(struct idr *idr, void *ptr, int start, int end,
gfp_t gfp)
- Allocates an unused ID in the range [start, end). Returns
-ENOSPC if there are no unused IDs in that range.
- @idr: idr handle
- @ptr: pointer to be associated with the new id
- @start: the minimum id (inclusive)
- @end: the maximum id (exclusive)
- @gfp: memory allocation flags
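The range semantics are easy to get wrong, so here is a toy userspace model of them. It is not the IDR implementation: toy_idr is a fixed-size array, the names are hypothetical, there is no gfp argument, NULL pointers cannot be stored because NULL marks a free ID, and treating end <= 0 as "no upper limit" is a convention of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IDR_MAX 64                  /* assumption: tiny fixed capacity */

struct toy_idr {
	void *ptrs[TOY_IDR_MAX];        /* NULL means the ID is unused */
};

/* Hand out the lowest unused ID in [start, end) and bind it to ptr. */
int toy_idr_alloc(struct toy_idr *idr, void *ptr, int start, int end)
{
	assert(idr && ptr);
	if (start < 0)
		return -EINVAL;
	if (end <= 0 || end > TOY_IDR_MAX)
		end = TOY_IDR_MAX;      /* sketch-only rule: clamp to capacity */

	for (int id = start; id < end; id++) {
		if (!idr->ptrs[id]) {
			idr->ptrs[id] = ptr;
			return id;
		}
	}
	return -ENOSPC;
}

/* Return the pointer bound to id, or NULL if the ID is unused. */
void *toy_idr_find(const struct toy_idr *idr, int id)
{
	if (id < 0 || id >= TOY_IDR_MAX)
		return NULL;
	return idr->ptrs[id];
}

/* Release id and return the pointer that was bound to it. */
void *toy_idr_remove(struct toy_idr *idr, int id)
{
	void *old = toy_idr_find(idr, id);

	if (old)
		idr->ptrs[id] = NULL;
	return old;
}
```

With start = 0 and end = 2 only IDs 0 and 1 can be handed out; a third request fails with -ENOSPC until one of them is removed.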

• static inline bool idr_check_preload(const struct idr
*src)
- Check the preload is still sufficient
- @src: IDR to be copied from
- Between the successful allocation of memory and
acquiring the lock that protects @src, the IDR may have
expanded. If this function returns false, more memory
needs to be preallocated.
- Return: true if enough memory remains allocated, false to
retry the preallocation.

• #define idr_for_each_entry(idr, entry, id)
- iterate over an idr's elements of a given type
- @idr: idr handle
- @entry: the type * to use as cursor
- @id: id entry's key
- @entry and @id do not need to be initialized before the
loop, and after normal termination @entry is left with the
value NULL. This is convenient for a "not found" value.

• static inline void *idr_find(const struct idr *idr, int
id)
- return pointer for given id
- @idr: idr handle
- @id: lookup key
- Return the pointer given the id it has been registered
with. A %NULL return indicates that @id is not valid or
you passed %NULL in idr_alloc().

• void idr_destroy(struct idr *idr)
- release all internal memory from an IDR
- @idr: idr handle
- After this function is called, the IDR is empty, and may be
reused or the data structure containing it may be freed.
- A typical clean-up sequence for objects stored in an idr
tree will use idr_for_each() to free all objects, if
necessary, then idr_destroy() to free the memory used to
keep track of those objects.

• static inline void *idr_remove(struct idr *idr, int id)
- Remove specific ID
- @idr - IDR handle
- @id - ID to remove

• void *idr_replace(struct idr *idr, void *ptr, int id)
- replace pointer for given id
- @idr: idr handle
- @ptr: New pointer to associate with the ID
- @id: Lookup key
- Replace the pointer registered with an ID and return the
old value.
- Returns: the old pointer on success. An error pointer encoding
%-ENOENT indicates that @id was not found; %-EINVAL indicates
that @id or @ptr were not valid.

• static inline void *idr_tag_set(struct idr *idr, int id,
unsigned int tag)
- Set a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to set
- If there is an entry at @id in this IDR, set a tag on it
and return the address of the entry. If @id is outside
the range of the IDR, return NULL.

• static inline bool idr_tag_get(const struct idr *idr,
int id, unsigned int tag)
- Return whether a particular entry has a tag set
- @idr: IDR pointer
- @id: ID of entry to check
- @tag: Tag index to check
- Returns true/false depending whether @tag is set on
this ID.

• static inline void *idr_tag_clear(struct idr *idr, int
id, unsigned int tag)
- Clear a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to clear
- If there is an entry at @id in this IDR, clear its tag and
return the address of the entry. If @id is outside the
range of the IDR, return NULL.
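The set/get/clear contract described above can be modelled with one bitmap per tag. This is a flat userspace sketch with hypothetical names (toy_tagged_map, toy_tag_set() and so on); the radix tree itself keeps its tag bitmaps inside each node, as the tags[] field in the node structure shows, so the layout here is only an illustration of the return values.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_IDS   128               /* assumption: small fixed ID space */
#define TOY_MAX_TAGS  3
#define TOY_LONG_BITS (sizeof(unsigned long) * CHAR_BIT)
#define TOY_TAG_LONGS ((TOY_MAX_IDS + TOY_LONG_BITS - 1) / TOY_LONG_BITS)

struct toy_tagged_map {
	void *entries[TOY_MAX_IDS];     /* NULL means no entry at this ID */
	unsigned long tags[TOY_MAX_TAGS][TOY_TAG_LONGS];
};

static void *toy_entry(const struct toy_tagged_map *map, int id)
{
	if (id < 0 || id >= TOY_MAX_IDS)
		return NULL;
	return map->entries[id];
}

/* Set tag on the entry at id; returns the entry, or NULL if there is none. */
void *toy_tag_set(struct toy_tagged_map *map, int id, unsigned int tag)
{
	void *entry = toy_entry(map, id);

	assert(tag < TOY_MAX_TAGS);
	if (entry)
		map->tags[tag][id / TOY_LONG_BITS] |=
			1UL << (id % TOY_LONG_BITS);
	return entry;
}

/* Report whether tag is set on id. */
bool toy_tag_get(const struct toy_tagged_map *map, int id, unsigned int tag)
{
	assert(tag < TOY_MAX_TAGS);
	if (id < 0 || id >= TOY_MAX_IDS)
		return false;
	return map->tags[tag][id / TOY_LONG_BITS] &
	       (1UL << (id % TOY_LONG_BITS));
}

/* Clear tag on the entry at id; returns the entry, or NULL if there is none. */
void *toy_tag_clear(struct toy_tagged_map *map, int id, unsigned int tag)
{
	void *entry = toy_entry(map, id);

	assert(tag < TOY_MAX_TAGS);
	if (entry)
		map->tags[tag][id / TOY_LONG_BITS] &=
			~(1UL << (id % TOY_LONG_BITS));
	return entry;
}
```

In the file descriptor project a single tag would play the role of the close_on_exec bit, replacing a separate bitmap that has to be kept in step with the table of file pointers.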

After implementing IDR – open()

Testing
- Performance benchmark
- Test cases covering the following system calls
- open()/close() syscall behaviour
- dup(), dup2() syscall behaviour
- select() syscall behaviour
- pipe() syscall behaviour
- Open file descriptor limit
- Test case which sets close_on_exec tag
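A userspace test for the close_on_exec case could look like the sketch below. It is an illustration, not the project's test: the helper names are hypothetical, and it only relies on POSIX behaviour, namely that pipe() returns descriptors with FD_CLOEXEC clear and that dup() clears the flag on the copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if clear, -1 on error. */
int fd_cloexec_get(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	return flags < 0 ? -1 : !!(flags & FD_CLOEXEC);
}

/* Sets FD_CLOEXEC on fd. Returns 0 on success, -1 on error. */
int fd_cloexec_set(int fd)
{
	int flags = fcntl(fd, F_GETFD);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0 ? -1 : 0;
}

/* Returns 0 if the flag behaves as expected, -1 otherwise. */
int cloexec_test(void)
{
	int fds[2];

	if (pipe(fds) < 0)
		return -1;

	int before = fd_cloexec_get(fds[0]);    /* clear on a fresh pipe fd */
	int set = fd_cloexec_set(fds[0]);
	int after = fd_cloexec_get(fds[0]);     /* now set */

	int copy = dup(fds[0]);                 /* the flag is per descriptor */
	int on_copy = copy < 0 ? -1 : fd_cloexec_get(copy);

	if (copy >= 0)
		close(copy);
	close(fds[0]);
	close(fds[1]);

	assert(fd_cloexec_get(-1) == -1);       /* invalid fd is reported */
	return (before == 0 && set == 0 && after == 1 && on_copy == 0) ? 0 : -1;
}
```

Because the flag belongs to the descriptor and not to the open file, the dup() check is a useful guard when the storage of that bit changes.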

Result
Before implementing IDR
struct / bitmap          Size in bytes
struct files_struct      704
struct fdtable           64
struct file pointers     2048
bitmap                   96
Total                    2912

After implementing IDR
struct / radix_tree            Size in bytes
struct files_struct            32
radix_tree node (3 required)   576 each
Total                          1760

- Total memory saving is 1152 bytes (~1 KB)
- It also reduces the size of the tinyconfig build on i386
by 672 bytes of code and 192 bytes of data.
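The totals can be re-derived from the per-structure sizes on the result slides. The snippet below only restates that arithmetic; the function names are hypothetical, and the sizes are the ones reported in the tables, not measured here.

```c
#include <assert.h>

/* Sizes in bytes as reported in the "before" table. */
int bytes_before(void)
{
	return 704      /* struct files_struct */
	     + 64       /* struct fdtable */
	     + 2048     /* array of struct file pointers */
	     + 96;      /* bitmaps */
}

/* Sizes in bytes as reported in the "after" table. */
int bytes_after(void)
{
	return 32       /* struct files_struct */
	     + 3 * 576; /* three radix tree nodes */
}

int bytes_saved(void)
{
	return bytes_before() - bytes_after();
}

/* Compile-time check that the two totals match the slides. */
static_assert(704 + 64 + 2048 + 96 == 2912, "before total");
static_assert(32 + 3 * 576 == 1760, "after total");
```

The difference is 2912 - 1760 = 1152 bytes, a little over 1 KB.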

Conclusion
- Implementing IDR in __alloc_fd() and the related code
path saved memory and slightly improved
performance.
- With the current changes, about 1 KB (1152 bytes) of kernel memory is saved
- The fd allocation code (kernel code) is smaller and
more readable than before
- Wherever the kernel needs to map a number to a pointer
of any type, IDR can be the best option.
- Custom allocators can be replaced with IDR

Reference
- https://lwn.net/Articles/103209/
- https://en.wikipedia.org/wiki/File_descriptor
- Linux Kernel Development - Robert Love
- Understanding the Linux Kernel - Daniel Bovet and Marco
Cesati

Questions?
We are Linux Kernel Newbies
