2. Notes
• Linux kernel version: 3.19
• Quoted source code comes from kernel/module.c
unless otherwise noted.
3. Kernel Module
• A feature for dynamically adding/removing kernel
features while the kernel is running
• Benefits
• To update the kernel features while running
• To reduce memory consumption (and CPU overhead) by
loading only necessary kernel modules
• To avoid the GPL (modules are not required to comply with
the GPL; e.g. proprietary drivers)
• Many kernel features can be built either statically linked
into the kernel or as independent modules
• File systems, device drivers, etc.
• “tristate” in Kconfig (y, m, or n)
4. Where is the kernel module?
• Linux kernel modules are ELF binaries with an
extension “.ko”
• Many distributions locate the kernel modules
under /lib/modules
• e.g. /lib/modules/3.13.0-44-generic/kernel (Ubuntu
14.04)
• “depmod” scans the kernel modules under this directory
and creates the module dependency map (modules.dep)
• “modprobe” loads a kernel module together with the modules
it depends on by looking them up in modules.dep
• However, a module located anywhere can be loaded into
the kernel if its path is specified explicitly.
5. What is the “dependency?”
• A kernel module can export “symbols” that may be
used by another kernel module
• A symbol : a name for a location in the memory; a global
variable or a function in C
• If a module (B) uses a symbol exported by another module
(A), then module B depends on module A
• Thus, module A must be loaded before module B
• (There seems to be no way to load modules that have circular
dependencies, e.g. A depends on B and B also depends on A)
Kernel module A:
void f(void) {
}
EXPORT_SYMBOL(f);
Kernel module B (depends on A):
void g(void) {
	f();
}
6. Exported Symbols
• The symbols explicitly marked as “export” can be
accessed by other kernel modules
• The Linux kernel itself has “export”-ed symbols.
• Kernel modules are allowed to use only the exported symbols
in the kernel
• Not all the global functions are available for the modules!
• The symbols to be exported are declared with the
EXPORT_SYMBOL and EXPORT_SYMBOL_GPL macros.
• The latter makes the symbol available only for GPL modules.
struct task_struct *pid_task(struct pid *pid, enum pid_type type)
{ ... }
EXPORT_SYMBOL(pid_task);
...
struct task_struct *get_pid_task(struct pid *pid, enum pid_type type)
{ ... }
EXPORT_SYMBOL_GPL(get_pid_task);
(kernel/pid.c)
8. Make a kernel module!
• Out-of-tree module
• The only necessary files are
• Makefile
• C source file(s)
• Example Makefile
obj-m += hello.o
KERN_BUILD=/lib/modules/$(shell uname -r)/build
all:
	$(MAKE) -C $(KERN_BUILD) M=$(PWD) modules
clean:
	$(MAKE) -C $(KERN_BUILD) M=$(PWD) clean
cf. in-tree Makefile entry:
obj-$(CONFIG_SHIMOS) += shimos.o
10. Sections
Section Name                 Description
.gnu.linkonce.this_module    Module structure
.modinfo                     String-style module information (licenses, etc.)
__versions                   Expected (compile-time) versions (CRCs) of the
                             symbols that this module depends on
__ksymtab*                   Table of symbols that this module exports
__kcrctab*                   Table of versions (CRCs) of the symbols that this
                             module exports
.init*                       Sections used only during initialization (__init)
.text, .data, etc.           The code and data
* in __ksymtab* / __kcrctab* : (none), _gpl, _gpl_future, _unused, _unused_gpl
(License restriction / attribute of the symbols)
11. Module load and unload
• The simplest way: the “insmod” and “rmmod” commands
• A more sophisticated way: “modprobe” and “modprobe -r”
• The former also loads the modules which the specified module
depends on
• The latter also tries to unload the modules which the specified
module depends on
# insmod (file name) [parameters…]
(e.g.) # insmod helloworld.ko msg=hoge
# rmmod (module name)
(e.g.) # rmmod helloworld
12. How insmod calls the kernel?
• Source: kmod-19
KMOD_EXPORT int kmod_module_insert_module(struct kmod_module *mod,
unsigned int flags,
const char *options)
{
...
if (kmod_file_get_direct(mod->file)) {
unsigned int kernel_flags = 0;
if (flags & KMOD_INSERT_FORCE_VERMAGIC)
kernel_flags |= MODULE_INIT_IGNORE_VERMAGIC;
if (flags & KMOD_INSERT_FORCE_MODVERSION)
kernel_flags |= MODULE_INIT_IGNORE_MODVERSIONS;
err = finit_module(kmod_file_get_fd(mod->file), args, kernel_flags);
if (err == 0 || errno != ENOSYS)
goto init_finished;
}
...
(libkmod/libkmod-module.c)
13. System calls
• 3 Module-related System Calls
• init_module
• finit_module
• To load a module
• delete_module
• To unload a module
int init_module(void *module_image, unsigned long len,
const char *param_values);
int finit_module(int fd, const char *param_values,
int flags);
int delete_module(const char *name, int flags);
(from man pages)
14. init_module / finit_module
• Load a kernel module
• How to specify the module?
• init_module : by user memory buffer that contains the
kernel module image
• finit_module : by file descriptor for the kernel module
file
• By using finit_module, some flags can be specified
flags
MODULE_INIT_IGNORE_MODVERSIONS Ignore symbol version hashes
MODULE_INIT_IGNORE_VERMAGIC Ignore kernel version magic
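As a rough user-space illustration of the finit_module path, the sketch below opens a module file and passes the descriptor to the system call through syscall(2). The helper name load_module_file is hypothetical, and the fallback flag values are assumptions that mirror the kernel's uapi header; loading a real module additionally requires CAP_SYS_MODULE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallback definitions (assumed to match the kernel's uapi values). */
#ifndef MODULE_INIT_IGNORE_MODVERSIONS
#define MODULE_INIT_IGNORE_MODVERSIONS 1
#endif
#ifndef MODULE_INIT_IGNORE_VERMAGIC
#define MODULE_INIT_IGNORE_VERMAGIC 2
#endif

/*
 * Hypothetical helper: load the module file at `path` via finit_module.
 * Returns 0 on success, -1 on failure (errno comes from open or the syscall).
 */
int load_module_file(const char *path, const char *params, int flags)
{
	int fd = open(path, O_RDONLY | O_CLOEXEC);
	int saved_errno;
	long ret;

	if (fd < 0)
		return -1;

	/* The kernel reads the whole file from the descriptor itself. */
	ret = syscall(SYS_finit_module, fd, params, flags);

	/* Preserve the syscall's errno across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret == 0 ? 0 : -1;
}
```

For example, load_module_file("helloworld.ko", "msg=hoge", 0) would correspond to the insmod example shown earlier; passing MODULE_INIT_IGNORE_VERMAGIC asks the kernel to skip the version magic check.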
15. delete_module
• Unload a kernel module
• Specifies a module to be unloaded by its “name”
• Some flags can be specified
• (Why is the module specified differently from finit_module…?)
flags                   Behavior
O_NONBLOCK | O_TRUNC    Forcefully unloads the module (even when the
                        ref count is not zero; taints the kernel)
O_NONBLOCK              Returns immediately with an error (EWOULDBLOCK)
                        if the ref count is not zero
O_NONBLOCK not set      Stops the module, and waits until the ref count
                        reaches zero (uninterruptible)
16. Data structures for modules
• struct load_info
• Used while initializing a module
• Most members are ELF-related.
struct load_info {
Elf_Ehdr *hdr;
unsigned long len;
Elf_Shdr *sechdrs;
char *secstrings, *strtab;
unsigned long symoffs, stroffs;
struct _ddebug *debug;
unsigned int num_debug;
bool sig_ok;
struct {
unsigned int sym, str, mod, vers, info, pcpu;
} index;
};
(include/linux/module.h)
17. Data structures for modules
• struct module (too large..)
struct module {
enum module_state state;
/* Member of list of modules */
struct list_head list;
/* Unique handle for this module */
char name[MODULE_NAME_LEN];
/* Sysfs stuff. */
struct module_kobject mkobj;
...
/* Exported symbols */
const struct kernel_symbol *syms;
const unsigned long *crcs;
unsigned int num_syms;
/* Kernel parameters. */
struct kernel_param *kp;
unsigned int num_kp;
(Callouts: list = “modules” list; syms = exported symbols; crcs = symbol CRCs)
18. Data structures for modules
/* GPL-only exported symbols. */
unsigned int num_gpl_syms;
const struct kernel_symbol *gpl_syms;
const unsigned long *gpl_crcs;
...
#ifdef CONFIG_MODULE_SIG
/* Signature was verified. */
bool sig_ok;
#endif
...
/* Exception table */
unsigned int num_exentries;
struct exception_table_entry *extable;
/* Startup function. */
int (*init)(void);
/* If this is non-NULL, vfree after init() returns */
void *module_init;
...
/* Here is the actual code + data, vfree'd on unload. */
void *module_core;
(Callouts: gpl_syms = GPL symbols; init = “init” function; module_init = “init” sections; module_core = other (core) sections)
19. Data structures for modules
/* Here are the sizes of the init and core sections */
unsigned int init_size, core_size;
/* The size of the executable code in each section. */
unsigned int init_text_size, core_text_size;
/* Size of RO sections of the module (text+rodata) */
unsigned int init_ro_size, core_ro_size;
/* Arch-specific module values */
struct mod_arch_specific arch;
...
/* The command line arguments (may be mangled). People like
keeping pointers to this stuff */
char *args;
...
#ifdef CONFIG_SMP
/* Per-cpu data. */
void __percpu *percpu;
unsigned int percpu_size;
#endif
(Callouts: *_size = sizes of sections; args = command line parameters; percpu = per-CPU data)
20. Data structures for modules
...
#ifdef CONFIG_MODULE_UNLOAD
/* What modules depend on me? */
struct list_head source_list;
/* What modules do I depend on? */
struct list_head target_list;
/* Destruction function. */
void (*exit)(void);
atomic_t refcnt;
#endif
#ifdef CONFIG_CONSTRUCTORS
/* Constructor functions. */
ctor_fn_t *ctors;
unsigned int num_ctors;
#endif
};
(include/linux/module.h)
(Callout: source_list / target_list = lists to manage dependencies; present only when module unloading is enabled)
21. Module state
• state in struct module
• During its load, state becomes
(created) -> UNFORMED -> COMING -> LIVE.
• During its unload, state becomes
LIVE -> GOING -> (removed)
State                    Description
MODULE_STATE_UNFORMED    Added to the modules list, but still being set up.
MODULE_STATE_COMING      Fully formed; running module_init.
MODULE_STATE_LIVE        Normal state.
MODULE_STATE_GOING       Being unloaded.
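The transitions listed above can be sketched as a small checker. This is only an illustration: the enum names follow the kernel, but the numeric values and the helper module_state_transition_ok are assumptions of this sketch, and failure paths (for example a module whose init function fails) are deliberately not modeled.

```c
#include <stdbool.h>

/* Names as in the kernel's enum module_state; values are illustrative. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Hypothetical checker: is `from -> to` one of the transitions listed above? */
bool module_state_transition_ok(enum module_state from, enum module_state to)
{
	switch (from) {
	case MODULE_STATE_UNFORMED:
		return to == MODULE_STATE_COMING;   /* load: fully formed */
	case MODULE_STATE_COMING:
		return to == MODULE_STATE_LIVE;     /* load: init finished */
	case MODULE_STATE_LIVE:
		return to == MODULE_STATE_GOING;    /* unload started */
	case MODULE_STATE_GOING:
		return false;                       /* next step is removal */
	}
	return false;
}
```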
22. Global module information
Variable                      Description
LIST_HEAD(modules)            List of modules that are in the kernel.
DEFINE_MUTEX(module_mutex)    Protects “modules”, etc.
• Add: RCU list operations
• Remove: stop_machine (up to 3.18)
/*
* Mutex protects:
* 1) List of modules (also safely readable with preempt_disable),
* 2) module_use links,
* 3) module_addr_min/module_addr_max.
* (delete uses stop_machine/add uses RCU list operations). */
DEFINE_MUTEX(module_mutex);
EXPORT_SYMBOL_GPL(module_mutex);
23. Loading a Module
• Load the whole module file into memory
• Parse the ELF and module information
• Check the module information to
determine whether the module is
loadable or not
• Lay out the sections and copy them to the final
location
• Add the module to the kernel
• Resolve the symbols and apply
relocations
• Copy module parameters
• Call the init function
Call flow:
System calls (init_module / finit_module)
  load_module
    layout_and_allocate
      setup_load_info
      check_modinfo
      layout_sections
      layout_symtab
      move_module
    add_unformed_module
    simplify_symbols
    apply_relocations
    do_init_module
Module state along the way: UNFORMED -> COMING -> LIVE
24. Unloading a Module
• Check if the reference count of the
module is zero
• If zero or it is forced unloading, then set
the state to GOING
• If not zero, it fails
• Call the “exit” function
• Free and cleanup everything
Call flow:
sys_delete_module
  try_stop_module
    __try_stop_module
  free_module
25. stop_machine (-3.18)
• Up to Linux 3.18, the reference count check and the
removal of the module from the list during unloading
were implemented with stop_machine.
static int try_stop_module(struct module *mod, int flags, int *forced)
{
struct stopref sref = { mod, flags, forced };
return stop_machine(__try_stop_module, &sref, NULL);
}
static void free_module(struct module *mod)
{
...
mutex_lock(&module_mutex);
stop_machine(__unlink_module, mod, NULL);
mutex_unlock(&module_mutex);
...
}
26. Now (3.19)
• Reference count is now atomic_t (was per-cpu int
before) and checked without stop_machine
• (thanks to a mysterious guy)
static int try_stop_module(struct module *mod, int flags, int *forced)
{
/* If it's not unused, quit unless we're forcing. */
if (try_release_module_ref(mod) != 0) {
*forced = try_force_unload(flags);
if (!(*forced))
return -EWOULDBLOCK;
}
/* Mark it as dying. */
mod->state = MODULE_STATE_GOING;
return 0;
}
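To make the atomic reference count check more concrete, here is a user-space sketch of how a try_release_module_ref-style function could work with C11 atomics. It is modeled on the idea above rather than copied from the kernel: struct refcnt_module and the compare-and-swap loop are assumptions of this sketch, and MODULE_REF_BASE is assumed to be 1.

```c
#include <stdatomic.h>

#define MODULE_REF_BASE 1   /* assumed base reference taken at load time */

/* Hypothetical stand-in for the refcnt member of struct module. */
struct refcnt_module {
	atomic_int refcnt;
};

/*
 * Drop the base reference. Returns 0 if the count reached zero (the module
 * is unused), nonzero if it is still in use; in that case the base
 * reference is put back, unless the count dropped to zero in the meantime.
 */
int try_release_module_ref(struct refcnt_module *mod)
{
	/* atomic_fetch_sub returns the old value, so compute the new one. */
	int ret = atomic_fetch_sub(&mod->refcnt, MODULE_REF_BASE)
		  - MODULE_REF_BASE;

	if (ret) {
		int cur = atomic_load(&mod->refcnt);

		/* "Add unless zero": retry until the add succeeds or cur == 0. */
		while (cur != 0 &&
		       !atomic_compare_exchange_weak(&mod->refcnt, &cur,
						     cur + MODULE_REF_BASE))
			;
		ret = (cur != 0);
	}
	return ret;
}
```

The point of the design is that no other CPU has to be stopped: the check and the decrement are a single atomic operation, and a racing module_put is handled by the retry loop.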
27. Now (3.19)
• stop_machine is also gone from the module removal path
static void free_module(struct module *mod)
{
...
/* Now we can delete it from the lists */
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
/* Remove this module from bug list, this uses list_del_rcu */
module_bug_cleanup(mod);
/* Wait for RCU synchronizing before releasing mod->list and
buglist. */
synchronize_rcu();
mutex_unlock(&module_mutex);
...
}
29. sys_init_module/sys_finit_module
• Initialize a load_info structure
• Check whether module load is permitted or not.
(may_init_module function)
• [finit only] Flags check
• [init only] Copy module data in user memory to
kernel memory (copy_module_from_user function)
• [finit only] Read from the fd into kernel memory
(copy_module_from_fd function)
• Call the load_module function
30. may_init_module
• Capability: CAP_SYS_MODULE
• “modules_disabled” parameter
• Blocks loading and unloading of modules
/* Block module loading/unloading? */
int modules_disabled = 0;
core_param(nomodule, modules_disabled, bint, 0);
...
static int may_init_module(void)
{
if (!capable(CAP_SYS_MODULE) || modules_disabled)
return -EPERM;
return 0;
}
(kernel/module.c)
# sysctl kernel.modules_disabled
kernel.modules_disabled = 0
31. copy_module_from_fd
• Pass the file struct to the security module
• vmalloc an area for the module data
• Load the whole module file into the area
• Set the pointer to info->hdr
static int copy_module_from_fd(int fd, struct load_info *info)
{
...
err = security_kernel_module_from_file(f.file);
if (err)
goto out;
...
info->hdr = vmalloc(stat.size);
if (!info->hdr) {
err = -ENOMEM;
goto out;
}
...
while (pos < stat.size) {
bytes = kernel_read(f.file, pos, (char *)(info->hdr) + pos,
stat.size - pos);
...
}
info->len = pos;
32. copy_module_from_user
• Differences:
• Pass “NULL” pointer to the security module
• Just copy_from_user instead of kernel_read
static int copy_module_from_user(const void __user *umod, unsigned long len,
struct load_info *info)
{...
info->len = len;
...
err = security_kernel_module_from_file(NULL);
if (err)
return err;
...
/* Suck in entire file: we'll want most of it. */
info->hdr = vmalloc(info->len);
if (!info->hdr)
return -ENOMEM;
...
if (copy_from_user(info->hdr, umod, info->len) != 0) {
vfree(info->hdr);
return -EFAULT;
}
return 0;
33. load_module function (1)
• Signature check (module_sig_check)
• ELF header check (elf_header_check)
• Layout and allocate the final location for the module
(layout_and_allocate)
• Add the module to the “modules” list
(add_unformed_module)
• Allocate per-cpu areas used in the module
(percpu_modalloc)
• Initialize linked lists used for dependency management and
unloading features (module_unload_init)
• Find optional sections (find_module_sections)
• License and version dirty hack
(check_module_license_and_versions)
• Setup MODINFO_ATTR fields (setup_modinfo)
34. load_module function (2)
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
35. module_sig_check
• Check the signature in the module (if
CONFIG_MODULE_SIG=y)
• If a module is signed, the “signature” and the “marker” reside at the
tail of the module file.
• If signature is OK, module->sig_ok is set to true.
• If no signature is found (-ENOKEY) and signature is not
enforced, it returns success(0).
• Signature is enforced either
• When CONFIG_MODULE_SIG_FORCE is Y
• When “sig_enforce” parameter is set
File layout: [ Module (ELF) ][ Signature ][ Marker ]
Marker: “~Module signature appended~\n”
$ hd /lib/modules/3.13.0-45-generic/kernel/fs/btrfs/btrfs.ko
0014b470 f8 a6 b7 74 01 06 01 1e 14 00 00 00 00 00 02 02 |...t............|
0014b480 7e 4d 6f 64 75 6c 65 20 73 69 67 6e 61 74 75 72 |~Module signatur|
0014b490 65 20 61 70 70 65 6e 64 65 64 7e 0a |e appended~.|
0014b49c
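The marker check can be illustrated in user space as follows. This is a minimal sketch that only tests whether an image ends with the marker string seen in the hex dump; the function name module_has_sig_marker is hypothetical, and the signature itself is not parsed or verified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The marker string found at the very end of a signed module file. */
#define MODULE_SIG_STRING "~Module signature appended~\n"

/* Returns true if the `len`-byte image ends with the signature marker. */
bool module_has_sig_marker(const void *image, size_t len)
{
	const size_t markerlen = sizeof(MODULE_SIG_STRING) - 1;

	if (len < markerlen)
		return false;   /* too short to carry a marker */

	return memcmp((const char *)image + len - markerlen,
		      MODULE_SIG_STRING, markerlen) == 0;
}
```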
36. elf_header_check
• Sanity check for the ELF header
• The magic number is correct
• The architecture is correct
• The length is large enough to contain all the section headers,
etc.
static int elf_header_check(struct load_info *info)
{
if (info->len < sizeof(*(info->hdr)))
return -ENOEXEC;
if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
|| info->hdr->e_type != ET_REL
|| !elf_check_arch(info->hdr)
|| info->hdr->e_shentsize != sizeof(Elf_Shdr))
return -ENOEXEC;
if (info->hdr->e_shoff >= info->len
|| (info->hdr->e_shnum * sizeof(Elf_Shdr) >
info->len - info->hdr->e_shoff))
return -ENOEXEC;
return 0;
}
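The same checks can be tried out in user space against a file image read into memory. The sketch below uses the glibc <elf.h> definitions and assumes a 64-bit module; the function name looks_like_module_elf is hypothetical, and the architecture check (elf_check_arch) is omitted because it is kernel-internal.

```c
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* User-space analogue of the sanity checks for a 64-bit relocatable ELF. */
bool looks_like_module_elf(const void *image, size_t len)
{
	Elf64_Ehdr hdr;

	/* The image must at least contain a full ELF header. */
	if (len < sizeof(hdr))
		return false;
	memcpy(&hdr, image, sizeof(hdr));   /* avoids unaligned access */

	/* Magic number, object type (modules are ET_REL), section header size. */
	if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0
	    || hdr.e_type != ET_REL
	    || hdr.e_shentsize != sizeof(Elf64_Shdr))
		return false;

	/* The section header table must lie entirely inside the image. */
	if (hdr.e_shoff >= len
	    || (size_t)hdr.e_shnum * sizeof(Elf64_Shdr) > len - hdr.e_shoff)
		return false;

	return true;
}
```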
38. layout_and_allocate
• Fill the section information of the load_info, and
create a module structure pointing to the
temporary location (setup_load_info)
• Check the module information and report if the
module taints the kernel (check_modinfo)
• Calculate the size required for the final location of
the module (layout_sections / layout_symtab)
• Allocate the memory of the calculated size, and
copy the contents of the module, and move the
pointer of the module structure there
(move_module).
39. setup_load_info
• Set the following members according to the ELF header
and section headers.
• sechdrs (Pointer to the section header)
• secstrings (Pointer to the string section that contains section
names)
• index.info, index.vers (Section indices of modinfo, versions)
• index.sym, index.str (Section indices of symbols, strings)
• strtab (Pointer to the string section)
• index.mod (Section index of the module section)
• “.gnu.linkonce.this_module” section
• Set the module pointer to this section (temporarily)
• index.pcpu (Section index of the per-cpu section)
• “.data..percpu” section (if it exists)
• Return a pointer to a (temporary) module structure
41. check_modinfo (1)
• Check “modinfo” in the module, and check if the
version magic is identical to the current kernel, and
mark “tainted” if it taints the kernel.
• “Modinfo” resides in the “.modinfo” section, and is
composed of zero-terminated strings of key-value
pairs connected by “=”.
description=Hello world kernel module\0
author=Taku Shimosawa <shimos@shimos.net>\0
license=GPL v2\0
srcversion=8D5BACDC1EA9421ABFF79DD\0
depends=\0
vermagic=3.13.0-44-generic SMP mod_unload modversions
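A lookup over such a section can be sketched in user space as below. The function modinfo_lookup is a hypothetical stand-in for the kernel's get_modinfo; it assumes the section contents are available as a byte buffer whose last byte is a NUL terminator.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up `tag` in a buffer of NUL-terminated "key=value" strings.
 * Returns a pointer to the value, or NULL if the tag is absent.
 */
const char *modinfo_lookup(const char *sec, size_t size, const char *tag)
{
	size_t taglen = strlen(tag);
	const char *p = sec;
	const char *end = sec + size;

	/* Refuse buffers that are not NUL-terminated (strlen would overrun). */
	if (size == 0 || sec[size - 1] != '\0')
		return NULL;

	while (p < end) {
		/* Match "tag=" exactly; a longer key with the same prefix fails. */
		if (strncmp(p, tag, taglen) == 0 && p[taglen] == '=')
			return p + taglen + 1;
		p += strlen(p) + 1;   /* skip to the next string */
	}
	return NULL;
}
```

With the example section above, looking up "license" would yield "GPL v2", and looking up "intree" would yield NULL, which is what makes check_modinfo treat the module as out-of-tree.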
42. check_modinfo (2)
• First, check the version magic in the module
static int check_modinfo(struct module *mod, struct load_info *info,
			 int flags)
{
	const char *modmagic = get_modinfo(info, "vermagic");
	...
	if (flags & MODULE_INIT_IGNORE_VERMAGIC)
		modmagic = NULL;
	...
	if (!modmagic) {
		err = try_to_force_load(mod, "bad vermagic");
		if (err)
			return err;
	} else if (!same_magic(modmagic, vermagic, info->index.vers)) {
		pr_err("%s: version magic '%s' should be '%s'\n",
		       mod->name, modmagic, vermagic);
		return -ENOEXEC;
	}
43. check_modinfo (3)
• Version magic
• Example: 3.13.0-44-generic SMP mod_unload modversions
• same_magic function
• Compares the vermagic strings. If the module has symbol
CRCs (a __versions section), the leading kernel release is
skipped and only the rest is compared.
#define VERMAGIC_STRING \
	UTS_RELEASE " " \
	MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \
	MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \
	MODULE_ARCH_VERMAGIC
(include/linux/vermagic.h)
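The comparison can be sketched in user space like this. The sketch is modeled on the kernel's same_magic as I understand it: when the module carries symbol CRCs, everything up to the first space (the kernel release) is skipped in both strings before comparing. Treat the exact behavior as an assumption to verify against kernel/module.c.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Compare two vermagic strings. With has_crcs set, the release part
 * (everything before the first space) is ignored on both sides, because
 * per-symbol CRCs already guard against ABI mismatches.
 */
bool same_magic(const char *amagic, const char *bmagic, bool has_crcs)
{
	if (has_crcs) {
		amagic += strcspn(amagic, " ");
		bmagic += strcspn(bmagic, " ");
	}
	return strcmp(amagic, bmagic) == 0;
}
```

So a module built for 3.13.0-44-generic could pass this check on 3.13.0-45-generic when modversions is in use, as long as the remaining flags (SMP, mod_unload, ...) are identical.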
44. check_modinfo (4)
• …and mark the kernel as tainted if necessary
	if (!get_modinfo(info, "intree"))
		add_taint_module(mod, TAINT_OOT_MODULE, LOCKDEP_STILL_OK);
	if (get_modinfo(info, "staging")) {
		add_taint_module(mod, TAINT_CRAP, LOCKDEP_STILL_OK);
		pr_warn("%s: module is from the staging directory, the quality "
			"is unknown, you have been warned.\n", mod->name);
	}
	/* Set up license info based on the info section */
	set_license(mod, get_modinfo(info, "license"));
45. check_modinfo (5)
• License information is also important
static void set_license(struct module *mod, const char *license)
{
	if (!license)
		license = "unspecified";
	if (!license_is_gpl_compatible(license)) {
		if (!test_taint(TAINT_PROPRIETARY_MODULE))
			pr_warn("%s: module license '%s' taints kernel.\n",
				mod->name, license);
		add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
				 LOCKDEP_NOW_UNRELIABLE);
	}
}
47. check_modinfo (7)
• Also, the kernel is marked tainted when the module
is loaded forcefully
static int try_to_force_load(struct module *mod, const char *reason)
{
#ifdef CONFIG_MODULE_FORCE_LOAD
	if (!test_taint(TAINT_FORCED_MODULE))
		pr_warn("%s: %s: kernel tainted.\n", mod->name, reason);
	add_taint_module(mod, TAINT_FORCED_MODULE,
			 LOCKDEP_NOW_UNRELIABLE);
	return 0;
#else
	return -ENOEXEC;
#endif
}
48. Taints!
• The tainted mask is composed of several flags that
identify the reasons for tainting
• Lockdep is disabled if it will no longer work reliably
• e.g. ignoring the version magic, proprietary drivers, forceful unload
void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
{
if (lockdep_ok == LOCKDEP_NOW_UNRELIABLE && __debug_locks_off())
pr_warn("Disabling lock debugging due to kernel taint\n");
set_bit(flag, &tainted_mask);
}
(kernel/panic.c)
static inline void add_taint_module(struct module *mod, unsigned flag,
enum lockdep_ok lockdep_ok)
{
add_taint(flag, lockdep_ok);
mod->taints |= (1U << flag);
}
(kernel/module.c)
(Callouts: tainted_mask = kernel-global flags; mod->taints = per-module flags)
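The split between the global mask and the per-module mask can be illustrated with a small user-space sketch. The struct tainted_module type is hypothetical, the lockdep handling is left out, and the flag bit numbers are assumptions of this sketch (they are meant to mirror the kernel's, but check include/linux/kernel.h before relying on them).

```c
#include <stdbool.h>

/* Taint flag bit numbers (assumed; see lead-in). */
enum {
	TAINT_PROPRIETARY_MODULE = 0,
	TAINT_FORCED_MODULE      = 1,
	TAINT_OOT_MODULE         = 12,
};

/* Kernel-global taint flags: once set, they stay set. */
static unsigned long tainted_mask;

/* Hypothetical stand-in for the taints member of struct module. */
struct tainted_module {
	unsigned int taints;
};

void add_taint(unsigned flag)
{
	tainted_mask |= 1UL << flag;
}

bool test_taint(unsigned flag)
{
	return (tainted_mask & (1UL << flag)) != 0;
}

/* Taint the kernel and remember which module was responsible. */
void add_taint_module(struct tainted_module *mod, unsigned flag)
{
	add_taint(flag);
	mod->taints |= 1U << flag;
}
```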
50. layout_sections
• Calculate the size of final memory to load the module
• Load only sections with “SHF_ALLOC” flags set
• Calculate sizes for “core” and “init”
• “init” sections are determined when the section name starts with
“.init”
• Sets the following members of struct module
• core_size : sum of the sizes of the “core” sections to be
loaded
• core_text_size, core_ro_size : sum of the sizes of the text and
R/O “core” sections
• init_size : sum of the sizes of the “init” sections to be loaded
• init_text_size, init_ro_size : … of “init” sections
• sh_entsize in Elf_Shdr is (re)used to hold the offset in the
memory where the section will be loaded.
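The size calculation boils down to walking the sections in order, aligning the running size to each section's alignment, and recording the resulting offset. A simplified user-space sketch follows; struct section_sketch and layout_region are hypothetical, a single region is handled (the kernel does this separately for "core" and "init"), and the page alignment between text, read-only and writable groups is not modeled.

```c
#include <stddef.h>

/* Hypothetical, reduced view of a section header. */
struct section_sketch {
	size_t size;     /* like sh_size */
	size_t align;    /* like sh_addralign; 0 is treated as 1 */
	size_t offset;   /* assigned offset (the kernel stores it in sh_entsize) */
};

/* Round x up to the next multiple of a (a must be nonzero). */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) / a * a;
}

/* Assign an offset to each section in order; return the total region size. */
size_t layout_region(struct section_sketch *secs, size_t n)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		size_t a = secs[i].align ? secs[i].align : 1;

		secs[i].offset = align_up(total, a);
		total = secs[i].offset + secs[i].size;
	}
	return total;
}
```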
51. layout_sections
• The sections in the example “hello.ko” are
categorized as follows:
Part    Type    Sections
Core    Text    .text, .exit.text
Core    R/O     __ksymtab, __kcrctab, .rodata.str1.1, __ksymtab_strings,
                __mcount_loc
Core    R/W     .data, .gnu.linkonce.this_module, .bss
Init    Text    .init.text
Init    R/O     (none)
Init    R/W     (none)
(Others) Not loaded:
.rela.text, .rela.init.text, .rela__ksymtab, .rela__kcrctab,
.rela__mcount_loc, .rela.gnu.linkonce.this_module,
.comment, .note.GNU-stack, .shstrtab, .symtab, .strtab,
.modinfo, __versions (*)
(*) These two sections originally have SHF_ALLOC, but the flag is
dropped by rewrite_section_headers
52. layout_symtab
• Put the symtab and strtab at the end of the init part
• (Actually, this function does not copy anything; it only
increases init_size by the size of the symtab)
• Put the symtab and strtab for the core symbols at
the end of the core part.
53. move_module
• Allocate the final memory of the module, and
update the boundary addresses for the modules
(module_alloc_update_bounds)
• Copy the section contents and update sh_addr’s
static void *module_alloc_update_bounds(unsigned long size)
{
void *ret = module_alloc(size);
if (ret) {
mutex_lock(&module_mutex);
if ((unsigned long)ret < module_addr_min)
module_addr_min = (unsigned long)ret;
if ((unsigned long)ret + size > module_addr_max)
module_addr_max = (unsigned long)ret + size;
mutex_unlock(&module_mutex);
}
return ret;
}
54. module_alloc : x86
• x86
• get_module_load_offset() picks a random load offset on its
first call if KASLR is enabled
#define MODULES_VADDR VMALLOC_START
#define MODULES_END VMALLOC_END
(arch/x86/include/asm/pgtable_32_types.h)
#define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE)
#define MODULES_END _AC(0xffffffffff000000, UL)
(arch/x86/include/asm/pgtable_64_types.h)
void *module_alloc(unsigned long size)
{
if (PAGE_ALIGN(size) > MODULES_LEN)
return NULL;
return __vmalloc_node_range(size, 1,
MODULES_VADDR + get_module_load_offset(),
MODULES_END, GFP_KERNEL | __GFP_HIGHMEM,
PAGE_KERNEL_EXEC, NUMA_NO_NODE,
__builtin_return_address(0));
}
(arch/x86/kernel/module.c)
56. module to final place
• Until now, the struct module pointer for the module being
loaded pointed into the temporary module image
• Now that the module has been copied to its final location,
the pointer is switched to the final location as well
/* Module has been copied to its final place now: return it. */
mod = (void *)info->sechdrs[info->index.mod].sh_addr;
57. load_module function (1) [RE]
• Signature check (module_sig_check)
• ELF header check (elf_header_check)
• Layout and allocate the final location for the module
(layout_and_allocate)
• Add the module to the “modules” list
(add_unformed_module)
• Allocate per-cpu areas used in the module
(percpu_modalloc)
• Initialize linked lists used for dependency management and
unloading features (module_unload_init)
• Find optional sections (find_module_sections)
• License and version dirty hack
(check_module_license_and_versions)
• Setup MODINFO_ATTR fields (setup_modinfo)
58. add_unformed_module
• Add the module to the “modules” list
• Checks whether the same module is already loaded
• If the same module is still being loaded, this waits for
that load to complete and then tries again
• This covers the case where the earlier load fails
60. When loading occurs concurrently
Timeline (time runs left to right):
Module A (first load):     UNFORMED -> COMING -> LIVE, then wake_up_all (in do_init_module)
Module A (second load):    UNFORMED -> (fail)
Module B (depends on A):   UNFORMED -> Resolve -> Resolve -> LIVE
61. percpu_modalloc
• Allocate per-cpu area for the size of the per-cpu
section
static int percpu_modalloc(struct module *mod, struct load_info *info)
{
Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu];
unsigned long align = pcpusec->sh_addralign;
if (!pcpusec->sh_size)
return 0;
...
mod->percpu = __alloc_reserved_percpu(pcpusec->sh_size, align);
if (!mod->percpu) {
pr_warn("%s: Could not allocate %lu bytes percpu data\n",
mod->name, (unsigned long)pcpusec->sh_size);
return -ENOMEM;
}
mod->percpu_size = pcpusec->sh_size;
return 0;
}
62. module_unload_init
• Initialize a reference counter for the module
• After this function, it becomes 2.
• Initialize the lists that manage dependencies
• source_list : list of “usages” whose “source” modules use
this module (= the modules that use the symbols of this
module)
• target_list : list of “usages” whose “target” modules this
module uses (= the modules whose symbols this module
uses)
static int module_unload_init(struct module *mod)
{
atomic_set(&mod->refcnt, MODULE_REF_BASE);
INIT_LIST_HEAD(&mod->source_list);
INIT_LIST_HEAD(&mod->target_list);
atomic_inc(&mod->refcnt);
return 0;
}
63. find_module_sections
• Find additional sections in the module
• Mostly related to symbol tables, and tracers
Sections
__param
__ksymtab
__kcrctab
__ksymtab_gpl
__kcrctab_gpl
__ksymtab_gpl_future
__kcrctab_gpl_future
__ksymtab_unused
__kcrctab_unused
__ksymtab_unused_gpl
__kcrctab_unused_gpl
Sections
.ctors / .init_array
__tracepoints_ptrs
__jump_table
_ftrace_events
__trace_printk_fmt
__mcount_loc
__ex_table
__verbose
64. check_module_license_and_versions
• Some hacks on specific modules
• e.g.) ndiswrapper driver may be GPL (it needs symbols
exported only to GPL modules), but the driver it loads
will not be GPL, so mark tainted
static int check_module_license_and_versions(struct module *mod)
{
if (strcmp(mod->name, "ndiswrapper") == 0)
add_taint(TAINT_PROPRIETARY_MODULE, LOCKDEP_NOW_UNRELIABLE);
/* driverloader was caught wrongly pretending to be under GPL */
if (strcmp(mod->name, "driverloader") == 0)
add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
LOCKDEP_NOW_UNRELIABLE);
/* lve claims to be GPL but upstream won't provide source */
if (strcmp(mod->name, "lve") == 0)
add_taint_module(mod, TAINT_PROPRIETARY_MODULE,
LOCKDEP_NOW_UNRELIABLE);
65. check_module_license_and_versions
• Checks whether the symbols have CRCs (versions)
#ifdef CONFIG_MODVERSIONS
if ((mod->num_syms && !mod->crcs)
|| (mod->num_gpl_syms && !mod->gpl_crcs)
|| (mod->num_gpl_future_syms && !mod->gpl_future_crcs)
#ifdef CONFIG_UNUSED_SYMBOLS
|| (mod->num_unused_syms && !mod->unused_crcs)
|| (mod->num_unused_gpl_syms && !mod->unused_gpl_crcs)
#endif
) {
return try_to_force_load(mod,
"no versions for exported symbols");
}
#endif
return 0;
67. load_module function (2) [Re]
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
68. simplify_symbols
• Change the address of the unresolved symbols in
the “symtab” section to the actual addresses
static int simplify_symbols(struct module *mod, const struct load_info *info)
{
Elf_Shdr *symsec = &info->sechdrs[info->index.sym];
Elf_Sym *sym = (void *)symsec->sh_addr;
...
for (i = 1; i < symsec->sh_size / sizeof(Elf_Sym); i++) {
const char *name = info->strtab + sym[i].st_name;
...
case SHN_UNDEF:
ksym = resolve_symbol_wait(mod, info, name);
/* Ok if resolved. */
if (ksym && !IS_ERR(ksym)) {
sym[i].st_value = ksym->value;
break;
}
/* Ok if weak. */
if (!ksym && ELF_ST_BIND(sym[i].st_info) == STB_WEAK)
break;
69. resolve_symbol_wait
• Waits if the symbol belongs to a module that is still
being initialized.
static const struct kernel_symbol *
resolve_symbol_wait(struct module *mod,
const struct load_info *info,
const char *name)
{
const struct kernel_symbol *ksym;
char owner[MODULE_NAME_LEN];
if (wait_event_interruptible_timeout(module_wq,
!IS_ERR(ksym = resolve_symbol(mod, info, name, owner))
|| PTR_ERR(ksym) != -EBUSY,
30 * HZ) <= 0) {
pr_warn("%s: gave up waiting for init of module %s.\n",
mod->name, owner);
}
return ksym;
}
70. resolve_symbol
• Find the symbol in the kernel’s symbol tables
and other modules’ symbol tables (find_symbol)
• If found, check that the version (CRC) of the symbol
matches the one that the module expects
(check_versions)
• Add a dependency from the module being loaded to the
module that owns the symbol (ref_module)
74. ref_module
• If the target module is NULL (=the symbol is in the
kernel) or the module already uses the target module,
it immediately returns.
• Increment the reference counter of the target module
(if the target module is in the middle of initialization,
returns -EBUSY)
• Add usage
• Source : the module
• Target : the target module
static int add_module_usage(struct module *a, struct module *b)
{
	struct module_use *use;

	use = kmalloc(sizeof(*use), GFP_ATOMIC);
	if (!use)
		return -ENOMEM;
	use->source = a;
	use->target = b;
	list_add(&use->source_list, &b->source_list);
	list_add(&use->target_list, &a->target_list);
	return 0;
}
75. Usage example
Kernel module A:
void f(void) {
}
Kernel module B (depends on A):
void g(void) {
	f();
}
struct module A: refcnt: 2
struct module B: refcnt: 1
struct module_use: source: &B, target: &A
• linked through its source_list into A's source_list
• linked through its target_list into B's target_list
77. Relocation
• Example
• This function uses
the “printk” symbol
outside the module.
(And also __fentry__)
0000000000000000 <say_hello>:
0: e8 00 00 00 00 callq 5 <say_hello+0x5>
1: R_X86_64_PC32 __fentry__-0x4
5: 55 push %rbp
6: 48 c7 c7 00 00 00 00 mov $0x0,%rdi
9: R_X86_64_32S .rodata.str1.1
d: 31 c0 xor %eax,%eax
f: 48 89 e5 mov %rsp,%rbp
12: e8 00 00 00 00 callq 17 <say_hello+0x17>
13: R_X86_64_PC32 printk-0x4
17: 5d pop %rbp
18: c3 retq
void say_hello(void)
{
	printk(KERN_INFO "Hello, World.\n");
}
Note: RIP-relative addressing is based on the address of the
next instruction.
78. apply_relocate[_add]
• Addressing is architecture-dependent, so the
relocation is also architecture-dependent
• x86_64 (RELA)
• A RELA section is an array of Elf64_Rela entries
• In the “printk” example
• r_offset = 0x13
• r_info = R_X86_64_PC32 (RIP-relative in x86_64)
• r_addend = -0x04
typedef struct elf64_rela {
Elf64_Addr r_offset; /* Location at which to apply the action */
Elf64_Xword r_info; /* index and type of relocation */
Elf64_Sxword r_addend; /* Constant addend used to compute value */
} Elf64_Rela;
79. apply_relocate_add in x86_64
int apply_relocate_add(Elf64_Shdr *sechdrs,
const char *strtab,
unsigned int symindex,
unsigned int relsec,
struct module *me)
{
...
for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
/* This is where to make the change */
loc = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
+ rel[i].r_offset;
/* This is the symbol it is referring to. Note that
all undefined symbols have been resolved. */
sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
+ ELF64_R_SYM(rel[i].r_info);
...
val = sym->st_value + rel[i].r_addend;
80. apply_relocate_add in x86_64
switch (ELF64_R_TYPE(rel[i].r_info)) {
...
case R_X86_64_64:
*(u64 *)loc = val;
break;
...
case R_X86_64_32S:
*(s32 *)loc = val;
if ((s64)val != *(s32 *)loc)
goto overflow;
break;
case R_X86_64_PC32:
val -= (u64)loc;
*(u32 *)loc = val;
#if 0
if ((s64)val != *(s32 *)loc)
goto overflow;
#endif
break;
(Callout for R_X86_64_PC32: calculate the delta between the
current address and the target address)
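As a worked example of the R_X86_64_PC32 case, the sketch below computes S + A - P for the “printk” relocation and stores it into the 32-bit field. The helper apply_pc32 and the sample addresses are hypothetical; unlike the kernel excerpt above, where the overflow check is compiled out, this sketch rejects values that do not fit in 32 signed bits.

```c
#include <stdint.h>
#include <string.h>

/*
 * Apply an R_X86_64_PC32-style relocation.
 *   loc      : where the 32-bit field lives in our buffer
 *   loc_addr : the address that field will have at run time (P)
 *   sym      : the resolved symbol address (S)
 *   addend   : r_addend (A)
 * Returns 0 on success, -1 if S + A - P does not fit in a signed 32-bit field.
 */
int apply_pc32(uint8_t *loc, uint64_t loc_addr, uint64_t sym, int64_t addend)
{
	int64_t val = (int64_t)(sym + (uint64_t)addend - loc_addr);
	int32_t field;

	if (val < INT32_MIN || val > INT32_MAX)
		return -1;

	field = (int32_t)val;
	memcpy(loc, &field, sizeof(field));   /* field may be unaligned */
	return 0;
}
```

Suppose say_hello is loaded at 0x1000 and printk resolves to 0x2000 (made-up addresses). The field at offset 0x13 then has P = 0x1013, and with A = -4 the stored value is 0x2000 - 4 - 0x1013 = 0xfe9, which is exactly the distance from the next instruction (0x1017) to printk.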
81. post_relocation
• Sort the exception table (sort_extable)
• Exception table: the list of instruction addresses for which
the page fault handler handles faults specially
• e.g. get_user
• Copy the per-cpu section contents to every possible
CPU (percpu_modcopy)
• Set the kallsyms-related members to their final locations, and
copy the core symtab out of the whole symtab
(add_kallsyms)
• Call the architecture-dependent finalizing function
(module_finalize)
for_each_possible_cpu(cpu)
	memcpy(per_cpu_ptr(mod->percpu, cpu), from, size);
82. module_finalize in x86_64
• Alternatives, paravirt and so on.
int module_finalize(const Elf_Ehdr *hdr,
		    const Elf_Shdr *sechdrs,
		    struct module *me)
{
	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
		*para = NULL;
	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;

	/* Look up the sections of interest by name */
	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
		if (!strcmp(".text", secstrings + s->sh_name))
			text = s;
		if (!strcmp(".altinstructions", secstrings + s->sh_name))
			alt = s;
		if (!strcmp(".smp_locks", secstrings + s->sh_name))
			locks = s;
		if (!strcmp(".parainstructions", secstrings + s->sh_name))
			para = s;
	}
	if (alt) {
		/* patch .altinstructions */
		void *aseg = (void *)alt->sh_addr;
		apply_alternatives(aseg, aseg + alt->sh_size);
	}
	...
83. flush_module_icache
• Flush the instruction cache for the text area so that the
newly loaded code is executed correctly
static void flush_module_icache(const struct module *mod)
{
	mm_segment_t old_fs;

	/* flush the icache in correct context */
	old_fs = get_fs();
	set_fs(KERNEL_DS);
	if (mod->module_init)
		flush_icache_range((unsigned long)mod->module_init,
				   (unsigned long)mod->module_init
				   + mod->init_size);
	flush_icache_range((unsigned long)mod->module_core,
			   (unsigned long)mod->module_core + mod->core_size);
	set_fs(old_fs);
}
84. complete_formation
• Check if the exported symbols are already exported
by another module (verify_export_symbols)
• Add section information of symbols for BUG report
(module_bug_finalize)
• Set NX and RO for core and init area.
• Set the module state to MODULE_STATE_COMING
mod->state = MODULE_STATE_COMING;
85. load_module function (2) [Re]
• Resolve the symbols (simplify_symbols)
• Fix up the addresses in the module (apply_relocations)
• Extable and per-cpu initialization (post_relocation)
• Flush I-cache for the module area
(flush_module_icache)
• Copy the module parameters to mod->args.
• Check duplication of symbols, and setup NX attributes.
(complete_formation)
• Parse the module parameters (parse_args)
• sysfs setup (mod_sysfs_setup)
• Free the copy in the load_info structure (free_copy)
• Call the init function of the module (do_init_module)
86. do_init_module (1)
• Make a structure for call_rcu to free init area
• And call the init function in the module
• Set the module state to MODULE_STATE_LIVE
struct mod_initfree *freeinit;
freeinit = kmalloc(sizeof(*freeinit), GFP_KERNEL);
...
freeinit->module_init = mod->module_init;
do_mod_ctors(mod);
/* Start the module */
if (mod->init != NULL)
	ret = do_one_initcall(mod->init);
mod->state = MODULE_STATE_LIVE;
87. do_init_module (2)
• Wait for outstanding async work, but only if the module init
queued any (this condition avoids a deadlock)
• Drop the initial reference
• Clear the init-related data
if (current->flags & PF_USED_ASYNC)
	async_synchronize_full();
mutex_lock(&module_mutex);
/* Drop initial reference. */
module_put(mod);
trim_init_extable(mod);
#ifdef CONFIG_KALLSYMS
mod->num_symtab = mod->core_num_syms;
mod->symtab = mod->core_symtab;
mod->strtab = mod->core_strtab;
#endif
unset_module_init_ro_nx(mod);
module_arch_freeing_init(mod);
88. do_init_module (3)
• Finally, free the init area (deferred through call_rcu)
• Wake up anyone waiting for the completion
of the initialization
call_rcu(&freeinit->rcu, do_free_init);
mutex_unlock(&module_mutex);
wake_up_all(&module_wq);
90. sys_delete_module
• Check the capability and the module blocking parameter
• Find the specified module by name
• If the module has an init function but no exit function,
and the unload is not forced, fail with -EBUSY
• Try to stop the module (try_stop_module)
• Call the exit function
• Free the module
91. Now (3.19) [RE]
• Reference count is now atomic_t (was per-cpu int
before) and checked without stop_machine
• (thanks to a mysterious guy)
static int try_stop_module(struct module *mod, int flags, int *forced)
{
	/* If it's not unused, quit unless we're forcing. */
	if (try_release_module_ref(mod) != 0) {
		*forced = try_force_unload(flags);
		if (!(*forced))
			return -EWOULDBLOCK;
	}
	/* Mark it as dying. */
	mod->state = MODULE_STATE_GOING;
	return 0;
}
92. try_release_module_ref
• Decrements the reference counter and checks whether it
reaches zero (= can be unloaded).
static int try_release_module_ref(struct module *mod)
{
	int ret;

	/* Try to decrement refcnt which we set at loading */
	ret = atomic_sub_return(MODULE_REF_BASE, &mod->refcnt);
	BUG_ON(ret < 0);
	if (ret)
		/* Someone can put this right now, recover with
		   checking */
		ret = atomic_add_unless(&mod->refcnt, MODULE_REF_BASE,
					0);
	return ret;
}
96. CRC sections
• Declare CRC symbols in CRC sections with the weak
attribute.
#ifndef __GENKSYMS__
#ifdef CONFIG_MODVERSIONS
/* Mark the CRC weak since genksyms apparently decides not to
 * generate a checksums for some symbols */
#define __CRC_SYMBOL(sym, sec)						\
	extern __visible void *__crc_##sym __attribute__((weak));	\
	static const unsigned long __kcrctab_##sym			\
	__used								\
	__attribute__((section("___kcrctab" sec "+" #sym), unused))	\
	= (unsigned long) &__crc_##sym;
#else
#define __CRC_SYMBOL(sym, sec)
#endif
(include/linux/export.h)
97. Build Steps (2) : .c -> .o
• Create the __mcount_loc list (if -pg is enabled)
• The list of pointers to the locations where “mcount” is called
• Fix up the dep file
• Link into a single object file (<module>.o) if the
module is composed of multiple object files
98. Build Steps (3) – Stage 2
• Create <module>.mod.c and <module>.symvers by modpost
command
• Compile the <module>.mod.c
• Link the <module>.mod.o and <module>.o into a module
<module>.ko
modpost = scripts/mod/modpost \
 $(if $(CONFIG_MODVERSIONS),-m) \
 $(if $(CONFIG_MODULE_SRCVERSION_ALL),-a,) \
 $(if $(KBUILD_EXTMOD),-i,-o) $(kernelsymfile) \
 $(if $(KBUILD_EXTMOD),-I $(modulesymfile)) \
 $(if $(KBUILD_EXTRA_SYMBOLS), $(patsubst %, -e %,$(KBUILD_EXTRA_SYMBOLS))) \
 $(if $(KBUILD_EXTMOD),-o $(modulesymfile)) \
 $(if $(CONFIG_DEBUG_SECTION_MISMATCH),,-S) \
 $(if $(KBUILD_EXTMOD)$(KBUILD_MODPOST_WARN),-w)
MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
# We can go over command line length here, so be careful.
quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
      cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
99. modpost (1)
• Collects module information, symbol information,
and versions from the kernel symbols and object files, and
generates the module source file and the symvers file.
• Arguments
$ modpost [Options...] [(Module object files...)]
• Options
Option              Description
-m                  CONFIG_MODVERSIONS (symbol versions)
-a                  CONFIG_MODULE_SRCVERSION_ALL (“srcversion” in modinfo):
                    MD4 of the source files that made the module
-I (symvers file)   Input symbol versions (kernel symbols)
-e (symvers file)   Input extra symbol versions
-o (symvers file)   Output symbol versions (for exported symbols of the module)
-T (files)          Source (object) file list
101. modpost (3)
• Dump the symbol versions
static void write_dump(const char *fname)
{
	struct buffer buf = { };
	struct symbol *symbol;
	int n;

	for (n = 0; n < SYMBOL_HASH_SIZE ; n++) {
		symbol = symbolhash[n];
		while (symbol) {
			if (dump_sym(symbol))
				buf_printf(&buf, "0x%08x\t%s\t%s\t%s\n",
					   symbol->crc, symbol->name,
					   symbol->module->name,
					   export_str(symbol->export));
			symbol = symbol->next;
		}
	}
	write_if_changed(&buf, fname);
}
(scripts/mod/modpost.c)
Example output line (fields are tab-separated):
0xb37b83db	say_hello	/home/shimos/test_module/hello	EXPORT_SYMBOL
102. Generated <module>.mod.c (1)
• Example
#include <linux/module.h>
#include <linux/vermagic.h>
#include <linux/compiler.h>
MODULE_INFO(vermagic, VERMAGIC_STRING);
__visible struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
	.name = KBUILD_MODNAME,
	.init = init_module,
#ifdef CONFIG_MODULE_UNLOAD
	.exit = cleanup_module,
#endif
	.arch = MODULE_ARCH_INIT,
};
static const struct modversion_info ____versions[]
__used
__attribute__((section("__versions"))) = {
	{ 0x9412fa01, __VMLINUX_SYMBOL_STR(module_layout) },
	{ 0x27e1a049, __VMLINUX_SYMBOL_STR(printk) },
	{ 0xbdfb6dbb, __VMLINUX_SYMBOL_STR(__fentry__) },
}; ...
Additional modinfo is included
Base of struct module
Symbols and (expected) versions
which this module depends on.
103. Generated <module>.mod.c (2)
• Example
static const char __module_depends[]
__used
__attribute__((section(".modinfo"))) =
"depends=";
MODULE_INFO(srcversion, "8D5BACDC1EA9421ABFF79DD");
Modinfo about dependency
(but the kernel does not use this)
Modinfo “srcversion”
104. modinfo
• Each modinfo string is created by a macro; the strings end up
concatenated because they are all placed in a single section (.modinfo)
#define __MODULE_INFO(tag, name, info)					\
static const char __UNIQUE_ID(name)[]					\
  __used __attribute__((section(".modinfo"), unused, aligned(1)))	\
  = __stringify(tag) "=" info
(include/linux/moduleparam.h)
#define MODULE_INFO(tag, info) __MODULE_INFO(tag, tag, info)
...
#define MODULE_LICENSE(_license) MODULE_INFO(license, _license)
...
#define MODULE_AUTHOR(_author) MODULE_INFO(author, _author)
...
(include/linux/module.h)