4. Radix tree
Space optimized trie
• Stores a key-to-value mapping.
• Edges are labelled by a sequence of characters or bits.
• Root to leaf path holds the key
and the leaf holds the value.
• Space optimized.
• Fast lookup.
https://en.wikipedia.org/wiki/Radix_tree
5. Radix tree applications
• General applications
–IP routing
•hierarchical organization of IP addresses.
–Search
•inverted indexes for text documents
• Kernel specific uses
– Page Cache
•Check whether a page is present in the cache, tagged dirty, under writeback, etc.
– As resizeable arrays
•drivers, filesystems, interrupt controllers.
7. Node structure
Node layout: node info (shift, offset, count, parent pointer, root pointer, tags, etc.) followed by an array of slots.
• Each node contains (2^map_shift) pointers in its slots array.
• In a leaf node, slots point to items; in an internal node, they point to the next, deeper node.
• The depth of a node determines which chunk of the key's bits is used to index its slots.
#define RADIX_TREE_MAP_SIZE (1UL << RADIX_TREE_MAP_SHIFT)
...
struct radix_tree_node {
        unsigned char shift;       /* Bits remaining in each slot */
        unsigned char offset;      /* Slot offset in parent */
        unsigned char count;       /* Total entry count */
        unsigned char exceptional; /* Exceptional entry count */
        ...
        void __rcu *slots[RADIX_TREE_MAP_SIZE];
        unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
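How the shift field selects a slot can be sketched in user space. This is a minimal illustration, not kernel code: MAP_SHIFT is assumed to be 6 here, and slot_offset() is a hypothetical helper name.

```c
#include <assert.h>
#include <limits.h>

/* Assumed for illustration; the kernel derives RADIX_TREE_MAP_SHIFT from its configuration. */
#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/*
 * Slot offset selected at a node whose 'shift' field is node_shift:
 * drop the bits handled by deeper levels, then keep MAP_SHIFT bits.
 */
static unsigned int slot_offset(unsigned long index, unsigned int node_shift)
{
    assert(node_shift < sizeof(unsigned long) * CHAR_BIT);
    return (unsigned int)((index >> node_shift) & MAP_MASK);
}
```

With these assumptions, index 4660 (0x1234) selects slot 1 at shift 12, slot 8 at shift 6 and slot 52 at shift 0, since 1*4096 + 8*64 + 52 = 4660.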
8. Initializing a radix tree
• #define RADIX_TREE(name, mask) \
        struct radix_tree_root name = RADIX_TREE_INIT(mask)
Example: RADIX_TREE(tree, GFP_KERNEL);
– Declares and initializes a radix tree with the given name.
– mask is the gfp_mask that tells the code how memory allocations are to be performed.
– For example, GFP_ATOMIC for atomic insertions, GFP_KERNEL for ordinary kernel-internal
allocations, and so on.
9. Inserting an entry
• A tree of height N can contain any index between 0 and (2^(map_shift*N))-1.
• If the new index to be inserted is larger than the current max index, insert new nodes
above the current top node to create a deeper tree.
• Failure cases: insertion returns -ENOMEM if a memory allocation fails, or -EEXIST if an
entry already exists at the index.
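The capacity rule above can be checked with a small user-space sketch. The helper names max_index() and height_for_index() are hypothetical and not part of the kernel API.

```c
#include <assert.h>
#include <limits.h>

/* Largest index a tree of the given height can hold: 2^(map_shift * height) - 1. */
static unsigned long max_index(unsigned int map_shift, unsigned int height)
{
    unsigned int bits = map_shift * height;

    /* Avoid shifting by the full width of unsigned long. */
    if (bits >= sizeof(unsigned long) * CHAR_BIT)
        return ULONG_MAX;
    return (1UL << bits) - 1;
}

/* Smallest height (at least 1) whose capacity covers the index. */
static unsigned int height_for_index(unsigned int map_shift, unsigned long index)
{
    unsigned int height = 1;

    assert(map_shift > 0);
    while (index > max_index(map_shift, height))
        height++;
    return height;
}
```

For example, with a map shift of 6 a tree of height 1 covers indices 0 to 63, so inserting index 64 needs a tree of height 2.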
10. Inserting an entry
Consider the following tree as example. Only 1 bit is used to index the slots at each node.
11. Inserting an entry
H is inserted; only the first 2 bits need to be considered to look it up uniquely.
12. Inserting an entry
I is inserted. New nodes are created because all 5 bits need to be considered.
13. Inserting an entry
• root: radix tree root
• index: index key
• order: key covers the 2^order indices around index
• item: item to insert
• static inline int radix_tree_insert(struct radix_tree_root *root,
unsigned long index, void *entry);
– Inserts an entry. Wrapper around __radix_tree_insert() for an order-0 entry.
• int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
unsigned order, void *item);
– For inserting an entry of arbitrary order.
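The insertion behaviour described on the preceding slides (grow the tree upward when the index does not fit, fail with -ENOMEM or -EEXIST) can be sketched with a toy user-space tree. This is not the kernel implementation: every toy_* name is hypothetical, each node has only 4 slots, there are no tags, RCU or multiorder entries, and nodes are never freed.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define TOY_SHIFT 2                     /* 4 slots per node, to keep the tree small */
#define TOY_SIZE  (1u << TOY_SHIFT)
#define TOY_MASK  (TOY_SIZE - 1)

struct toy_node {
    unsigned int shift;                 /* index bits handled below this node */
    void *slots[TOY_SIZE];              /* child nodes, or items when shift == 0 */
};

struct toy_root {
    struct toy_node *top;
};

static struct toy_node *toy_node_new(unsigned int shift)
{
    struct toy_node *node = calloc(1, sizeof(*node));

    if (node)
        node->shift = shift;
    return node;
}

/* Largest index reachable through this node. */
static unsigned long toy_max_index(const struct toy_node *node)
{
    unsigned int bits = node->shift + TOY_SHIFT;

    if (bits >= sizeof(unsigned long) * CHAR_BIT)
        return ULONG_MAX;
    return (1UL << bits) - 1;
}

static int toy_insert(struct toy_root *root, unsigned long index, void *item)
{
    struct toy_node *node;

    assert(item != NULL);               /* NULL marks an empty slot */
    if (!root->top && !(root->top = toy_node_new(0)))
        return -ENOMEM;

    /* Grow: stack new top nodes until the index fits. */
    while (index > toy_max_index(root->top)) {
        node = toy_node_new(root->top->shift + TOY_SHIFT);
        if (!node)
            return -ENOMEM;
        node->slots[0] = root->top;     /* the old tree covers the lowest indices */
        root->top = node;
    }

    /* Descend, creating missing internal nodes on the way. */
    for (node = root->top; node->shift > 0; ) {
        unsigned int offset = (index >> node->shift) & TOY_MASK;

        if (!node->slots[offset]) {
            node->slots[offset] = toy_node_new(node->shift - TOY_SHIFT);
            if (!node->slots[offset])
                return -ENOMEM;
        }
        node = node->slots[offset];
    }

    if (node->slots[index & TOY_MASK])
        return -EEXIST;
    node->slots[index & TOY_MASK] = item;
    return 0;
}

static void *toy_lookup(const struct toy_root *root, unsigned long index)
{
    const struct toy_node *node = root->top;

    if (!node || index > toy_max_index(node))
        return NULL;
    while (node && node->shift > 0)
        node = node->slots[(index >> node->shift) & TOY_MASK];
    return node ? node->slots[index & TOY_MASK] : NULL;
}
```

In this sketch, inserting index 1 into an empty tree creates a single leaf node. Inserting index 37 afterwards stacks two new top nodes, and the old leaf stays reachable through slot 0 at each new level.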
14. Deleting an entry
• If deleting an element results in a top node with only one child at offset 0, replace the top
node with its only child, creating a shallower tree. Consider the following tree as example.
17. Deleting an entry
• root: radix tree root
• index: index key
• void *radix_tree_delete(struct radix_tree_root *root, unsigned long index);
• item: expected item
• void *radix_tree_delete_item(struct radix_tree_root *root, unsigned long index, void *item);
– Deletes the entry at index only if it is the expected item.
• iter: iterator state
• slot: pointer to slot
• void radix_tree_iter_delete(struct radix_tree_root *root,
struct radix_tree_iter *iter, void __rcu **slot);
– Delete the entry at this iterator position
18. Lookup
• root: radix tree root
• index: index key
• void *radix_tree_lookup(const struct radix_tree_root *root, unsigned long index);
– looks for key in the tree and returns the associated item (or NULL on failure).
• results: where the results of the lookup are placed
• first_index: start the lookup from this key
• max_items: place up to this many items at *results
• unsigned int radix_tree_gang_lookup(const struct radix_tree_root *root, void **results,
unsigned long first_index, unsigned int max_items);
– performs multiple lookups in one call, starting at first_index.
• void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *root,
unsigned long index);
– lookup a slot at index.
19. Iteration
20. Tags
21. Multiorder
23. Test Suite
• Merged into Linux 4.6.
• Location: tools/testing/radix-tree.
• Regression tests, functional tests and performance tests.
• Short run or long run.
• Levels of verbose output.
24. Regression tests
25. Functional tests
26. Performance tests
27. Enhancements as part of Outreachy Project
• Adding different levels of verbosity to output of test suite.
• #define printv(verbosity_level, fmt, ...) \
        if(test_verbose >= verbosity_level) \
                printf(fmt, ##__VA_ARGS__)
– The idea extends to many other parts of the kernel, for debugging, testing, etc.
• Config option in the Makefile to test various values of map shift.
mapshift:
        @if ! grep -qw $(SHIFT) generated/map-shift.h; then \
                echo "#define RADIX_TREE_MAP_SHIFT $(SHIFT)" > \
                        generated/map-shift.h; \
        fi
• Config option to build tests for 32 bit or 64 bit machine.
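A user-space sketch of the verbosity idea follows. It assumes a global test_verbose variable as in the macro above and wraps the body in do { } while (0), a common C idiom that keeps the macro safe inside if/else chains. The report() function is a hypothetical caller, and ##__VA_ARGS__ is a GNU extension accepted by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>

static int test_verbose;                /* 0 = quiet, higher = chattier */

/* Print only when the configured verbosity reaches the message's level. */
#define printv(verbosity_level, fmt, ...)               \
    do {                                                \
        if (test_verbose >= (verbosity_level))          \
            printf(fmt, ##__VA_ARGS__);                 \
    } while (0)

/* Hypothetical caller: the if/else below would be fragile with a bare-if macro. */
static int report(int failures)
{
    assert(failures >= 0);
    printv(1, "checked tree\n");
    if (failures)
        printv(1, "%d failures\n", failures);
    else
        printv(2, "all items verified\n");
    return failures;
}
```

With test_verbose set to 0 nothing is printed; at 1 the summary lines appear, and at 2 the per-run detail appears as well.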
28. Enhancements as part of Outreachy Project
• Automate generation of .gcov files to check their test coverage.
• Adding new functional tests.
– idr_get_next()
– ida_simple_get()
– ida_simple_remove()
– radix_tree_clear_tags()
• Adding new performance tests.
–For radix tree insertion, deletion, tagging, join and split.
29. Enhancements as part of Outreachy Project
• Functional test example
void radix_tree_clear_tags_test(void) {
        ...
        item_insert(&tree, 0);
        item_tag_set(&tree, 0, 0);
        __radix_tree_lookup(&tree, 0, &node, &slot);
        radix_tree_clear_tags(&tree, node, slot);
        assert(item_tag_get(&tree, 0, 0) == 0);
        for (index = 0; index < 1000; index++) {
                item_insert(&tree, index);
                item_tag_set(&tree, index, 0);
        }
        radix_tree_for_each_slot(slot, &tree, &iter, 0) {
                radix_tree_clear_tags(&tree, iter.node, slot);
                assert(item_tag_get(&tree, iter.index, 0) == 0);
        }
        ...
}
30. Enhancements as part of Outreachy Project
• Performance test example
static long long __benchmark_split(unsigned long index,
                                   int old_order, int new_order)
{
        struct timespec start, finish;
        long long nsec;
        ...
        item_insert_order(&tree, index, old_order);
        clock_gettime(CLOCK_MONOTONIC, &start);
        radix_tree_split(&tree, index, new_order);
        clock_gettime(CLOCK_MONOTONIC, &finish);
        nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
               (finish.tv_nsec - start.tv_nsec);
        ...
}
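The timing pattern used in the performance test can be tried in user space with a stand-in workload. In this sketch elapsed_nsec(), spin() and benchmark() are hypothetical names, and NSEC_PER_SEC is defined locally because the kernel's definition is not available outside the kernel tree.

```c
#define _POSIX_C_SOURCE 200809L         /* for clock_gettime() */
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Nanoseconds between two timestamps; the tv_nsec difference may be negative. */
static long long elapsed_nsec(const struct timespec *start,
                              const struct timespec *finish)
{
    return (finish->tv_sec - start->tv_sec) * NSEC_PER_SEC +
           (finish->tv_nsec - start->tv_nsec);
}

/* Hypothetical workload standing in for the operation under test. */
static void spin(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long i, n = *(unsigned long *)arg;

    for (i = 0; i < n; i++)
        sink += i;
}

/* Time one call of fn(arg) with the monotonic clock. */
static long long benchmark(void (*fn)(void *), void *arg)
{
    struct timespec start, finish;

    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &finish);
    return elapsed_nsec(&start, &finish);
}
```

CLOCK_MONOTONIC is used because it does not jump when the wall clock is adjusted. Only the operation under test sits between the two clock_gettime() calls, so setup such as inserting the item is not measured.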
38. • Linux Kernel Intern through Outreachy in The Linux Foundation with the support of mentors Rik van Riel and Matthew Wilcox.
• Master of Engineering in Computer networks
• Bachelor of Engineering in Electronics and communication.
39. Overview
• Goal
• IDR
• Allocate and manage file descriptors using IDR
• IDR API used in project
• Testing
• Result
• Conclusion
• Reference
40. Goal of the Project
• The Linux kernel has lots of special allocators.
• However, there is now an IDR library that can allocate numbers for us.
• Simplify the kernel by replacing custom allocators with common allocation code.
41. IDR
- IDR is a type of radix tree that maps integer IDs to specific
pointer values.
- Originally written for the POSIX timer system call implementation, where it
generates the ID that refers to a specific timer object. It is now
widely used in various device drivers.
- IDR takes a given pointer and creates the corresponding integer
ID. With that ID, you can quickly find the original pointer.
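The pointer-to-ID round trip can be modelled in a few lines of user-space C. This is only a conceptual sketch: the toy_idr_* names are hypothetical, and a fixed array stands in for the radix tree that the real IDR uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_IDR_SIZE 8                  /* fixed capacity, for the sketch only */

struct toy_idr {
    void *ptrs[TOY_IDR_SIZE];           /* ptrs[id] == NULL means the ID is free */
};

/* Store ptr under the lowest free ID and return that ID, or -ENOSPC if full. */
static int toy_idr_alloc(struct toy_idr *idr, void *ptr)
{
    int id;

    assert(ptr != NULL);                /* NULL is reserved for "free" */
    for (id = 0; id < TOY_IDR_SIZE; id++) {
        if (!idr->ptrs[id]) {
            idr->ptrs[id] = ptr;
            return id;
        }
    }
    return -ENOSPC;
}

/* Return the pointer registered under id, or NULL if there is none. */
static void *toy_idr_find(const struct toy_idr *idr, int id)
{
    if (id < 0 || id >= TOY_IDR_SIZE)
        return NULL;
    return idr->ptrs[id];
}

/* Free the ID and return the pointer it held, or NULL if it was unused. */
static void *toy_idr_remove(struct toy_idr *idr, int id)
{
    void *old = toy_idr_find(idr, id);

    if (old)
        idr->ptrs[id] = NULL;
    return old;
}
```

Allocating two pointers yields IDs 0 and 1. After ID 0 is removed, the next allocation returns 0 again, the same lowest-free-number behaviour that file descriptor allocation needs.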
43. About project
- Implement IDR in file descriptor allocation code path
- Replace custom allocator with IDR
- Remove struct fdtable
- Convert select() to use idr_get_tag_batch()
- Replace close_on_exec bitmap with an IDR tag
- Use idr_tag_set() and idr_tag_get() for close_on_exec
operation.
- Rewrite close_files()
- Use idr_tag_get() in fd_is_open()
- Remove full_fds_bits, open_fds bitmaps
44. Cont…
- Replace array of file pointer with IDR
- Remove next_fd
- Memory Saving
- Performance improvement
45. File Descriptor
- A file descriptor is used to access a file or another I/O
resource (e.g. a pipe or a socket)
- A file descriptor is a non-negative integer, generally
represented in the C programming language as the
type int (negative values being reserved to indicate "no
value" or an error condition).
46. Cont...
- Each Linux process should expect to have three standard
POSIX file descriptors, corresponding to the
three standard streams
- stdin
- stdout
- stderr
47. Operations on file descriptors
- open() - open a file
- creat() - create a new file / rewrite an existing one
- pipe() - creates a pipe
- read() - read from a file descriptor
- write() - write to a file descriptor
- close() - close a file descriptor
- lseek() - reposition read/write file offset
- select() - synchronous I/O multiplexing
- socket() - create an endpoint for communication
- accept() - accept a connection on a socket
- dup(), dup2() - duplicate an open file descriptor
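Several of these calls hand out the lowest-numbered descriptor that is currently free, which is the behaviour an fd allocator has to preserve. The sketch below demonstrates this with pipe(), close() and dup(). The function name freed_fd_is_reused() is hypothetical, and the code assumes a single-threaded process.

```c
#include <assert.h>
#include <unistd.h>

/* Returns 1 if a freed descriptor number is handed out again, 0 if not, -1 on error. */
static int freed_fd_is_reused(void)
{
    int fds[2], freed, dup_fd, reused;

    if (pipe(fds) < 0)
        return -1;

    freed = fds[0];
    close(fds[0]);                      /* fds[0] is now the lowest free descriptor */

    dup_fd = dup(fds[1]);               /* dup() returns the lowest free descriptor */
    if (dup_fd < 0) {
        close(fds[1]);
        return -1;
    }
    assert(dup_fd != fds[1]);           /* a duplicate always gets a distinct number */

    reused = (dup_fd == freed);
    close(dup_fd);
    close(fds[1]);
    return reused;
}
```

On a POSIX system this is expected to return 1, because dup() must return the lowest-numbered descriptor not currently open in the process.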
51. • static inline void idr_preload(gfp_t gfp_mask)
- Preload for idr_alloc()
- Preallocates memory to use for the next call to
idr_alloc(). This function returns with preemption
disabled; preemption is re-enabled by idr_preload_end().
- @gfp_mask: allocation mask to use for preloading
52. • static inline void idr_preload_end(void)
- end preload section started with idr_preload()
- Enable preemption
53. • int idr_alloc(struct idr *idr, void *ptr, int start, int end,
gfp_t gfp)
- Allocates an unused ID in the range [start, end). Returns
-ENOSPC if there are no unused IDs in that range.
- @idr: idr handle
- @ptr: pointer to be associated with the new id
- @start: the minimum id (inclusive)
- @end: the maximum id (exclusive)
- @gfp: memory allocation flags
54. • static inline bool idr_check_preload(const struct idr
*src)
- Check the preload is still sufficient
- @src: IDR to be copied from
- Between the successful allocation of memory and
acquiring the lock that protects @src, the IDR may have
expanded. If this function returns false, more memory
needs to be preallocated.
- Return: true if enough memory remains allocated, false to
retry the preallocation.
55. • #define idr_for_each_entry(idr, entry, id)
- iterate over an idr's elements of a given type
- @idr: idr handle
- @entry: the type * to use as cursor
- @id: id entry's key
- @entry and @id do not need to be initialized before the
loop, and after normal termination @entry is left with the
value NULL. This is convenient for a "not found" value.
56. • static inline void *idr_find(const struct idr *idr, int
id)
- return pointer for given id
- @idr: idr handle
- @id: lookup key
- Return the pointer given the id it has been registered
with. A %NULL return indicates that @id is not valid or
you passed %NULL in idr_get_new().
57. • void idr_destroy(struct idr *idr)
- release all internal memory from an IDR
- @idr: idr handle
- After this function is called, the IDR is empty, and may be
reused or the data structure containing it may be freed.
- A typical clean-up sequence for objects stored in an idr
tree will use idr_for_each() to free all objects, if
necessary, then idr_destroy() to free the memory used to
keep track of those objects.
58. • static inline void *idr_remove(struct idr *idr, int id)
- Remove a specific ID
- @idr: IDR handle
- @id: ID to be removed
59. • void *idr_replace(struct idr *idr, void *ptr, int id)
- replace pointer for given id
- @idr: idr handle
- @ptr: New pointer to associate with the ID
- @id: Lookup key
- Replace the pointer registered with an ID and return the
old value.
- Returns: the old value on success. %-ENOENT indicates that @id
was not found. %-EINVAL indicates that @id or @ptr
were not valid. Because the return type is void *, these errors
are returned as error pointers.
60. • static inline void *idr_tag_set(struct idr *idr, int id,
unsigned int tag)
- Set a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to set
- If there is an entry at @id in this IDR, set a tag on it
and return the address of the entry. If @id is outside
the range of the IDR, return NULL.
61. • static inline bool idr_tag_get(const struct idr *idr,
int id, unsigned int tag)
- Return whether a particular entry has a tag set
- @idr: IDR pointer
- @id: ID of entry to check
- @tag: Tag index to check
- Returns true/false depending whether @tag is set on
this ID.
62. • static inline void *idr_tag_clear(struct idr *idr, int
id, unsigned int tag)
- Clear a tag on an entry
- @idr: IDR pointer
- @id: ID of entry to tag
- @tag: Tag index to clear
- If there is an entry at @id in this IDR, clear its tag and
return the address of the entry. If @id is outside the
range of the IDR, return NULL.
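Tags such as a close_on_exec marker can be pictured as one bit per slot and per tag, in the style of the tags[][] array in struct radix_tree_node. The user-space sketch below models a single node only. The tag_* helper names, MAP_SIZE of 64 and MAX_TAGS of 3 are assumptions for illustration, not the IDR implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAP_SIZE      64                /* assumed number of slots per node */
#define MAX_TAGS      3                 /* assumed number of distinct tags */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define TAG_LONGS     ((MAP_SIZE + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct tag_node {
    unsigned long tags[MAX_TAGS][TAG_LONGS];    /* one bit per slot, per tag */
};

static void tag_set(struct tag_node *node, unsigned int tag, unsigned int offset)
{
    assert(tag < MAX_TAGS && offset < MAP_SIZE);
    node->tags[tag][offset / BITS_PER_LONG] |= 1UL << (offset % BITS_PER_LONG);
}

static void tag_clear(struct tag_node *node, unsigned int tag, unsigned int offset)
{
    assert(tag < MAX_TAGS && offset < MAP_SIZE);
    node->tags[tag][offset / BITS_PER_LONG] &= ~(1UL << (offset % BITS_PER_LONG));
}

static bool tag_get(const struct tag_node *node, unsigned int tag, unsigned int offset)
{
    assert(tag < MAX_TAGS && offset < MAP_SIZE);
    return (node->tags[tag][offset / BITS_PER_LONG] >> (offset % BITS_PER_LONG)) & 1UL;
}

/* True if any slot in the node carries the tag; lets a search skip untagged nodes. */
static bool any_tagged(const struct tag_node *node, unsigned int tag)
{
    size_t i;

    assert(tag < MAX_TAGS);
    for (i = 0; i < TAG_LONGS; i++)
        if (node->tags[tag][i])
            return true;
    return false;
}
```

Storing a tag as a bit next to the slots means a close_on_exec style flag needs no separate bitmap, and a whole node can be skipped when none of its tag bits is set.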
65. Testing
- Performance benchmark
- Test cases covering the following system calls
- open()/close() system call behaviour
- dup(), dup2() syscall behaviour
- select() syscall behaviour
- pipe() syscall behaviour
- Open file descriptor limit
- Test case which sets close_on_exec tag
71. Conclusion
- Implementing IDR in __alloc_fd() and the related code
path saved memory and slightly improved
performance.
- With the current changes, ~1M of kernel memory is saved.
- The fd allocation code (kernel code) is smaller and
much more readable than before.
- Wherever the kernel needs to map a number to a pointer of any
type, IDR can be the best option.
- Custom allocators can be replaced with IDR.