SlideShare a Scribd company logo
1 of 17
Download to read offline
SFO17- 421 The Linux Kernel
Scheduler
Viresh Kumar (PMWG)
ENGINEERS AND DEVICES
WORKING TOGETHER
Topics
● CPU Scheduler
● The O(1) scheduler
● Current scheduler design
● Scheduling classes
● schedule()
● Scheduling classes and policies
● Sched class: STOP
● Sched class: Deadline
● Sched class: Realtime
● Sched class: CFS
● Sched class: Idle
● Runqueue
ENGINEERS AND DEVICES
WORKING TOGETHER
CPU Scheduler
● Shares CPU between tasks.
● Selects next task to run (on each CPU) when:
○ Running task terminates
○ Running task sleeps (waiting for an event)
○ A new task created or sleeping task wakes up
○ Running task’s timeslice is over
● Goals:
○ Be *fair*
○ Timeslice based on task priority
○ Low task response time
○ High (task completion) throughput
○ Balance load among CPUs
○ Power efficient
○ Low overhead of running scheduler’s code
● Work with mainframes, servers, desktops/laptops, embedded/phones.
ENGINEERS AND DEVICES
WORKING TOGETHER
The O(1) scheduler
● 140 priorities. 0-99: RT tasks and 100-139: User tasks.
● Each CPU’s runqueue had 2 arrays: Active and Expired.
● Each arrays had 140 entries, one per priority level.
● Each entry was a linked list serviced in FIFO order.
● Bitmap (of 140 bits) used to find which priority lists aren’t empty.
● Timeslices were assigned to tasks based on these priorities.
● Expired tasks moved from Active to Passive array.
● Once Active array was empty, the arrays were swapped.
● Enqueue and dequeue of tasks and next task selection done in constant time.
● Replaced by CFS in Linux 2.6.23 (2007).
ENGINEERS AND DEVICES
WORKING TOGETHER
The O(1) scheduler (Cont..)
Priority 0
Priority 1
Priority 139
...
Priority 99
...
CPU Active array
Priority 0
Priority 1
Priority 139
...
Priority 99
...
CPU Expired array
Real Time task priorities
User task priorities
ENGINEERS AND DEVICES
WORKING TOGETHER
Current scheduler design
● Introduced by Ingo Molnar in 2.6.23 (2007).
● Scheduling policies within scheduling classes.
● Scheduling class with higher priority served first.
● Task can migrate between CPUs, scheduling policies and scheduling classes.
ENGINEERS AND DEVICES
WORKING TOGETHER
Scheduling Classes
● Represented by following structure:
struct sched_class {
const struct sched_class *next;
void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags);
void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags);
struct task_struct * (*pick_next_task) (struct rq *rq, struct task_struct *prev, struct rq_flags *rf);
…
};
STOP
next
DL RT CFS IDLE
nullnext next next next
ENGINEERS AND DEVICES
WORKING TOGETHER
Schedule()
● Picks the next runnable task to run on a CPU.
● Searches for a task starting from the highest priority class (stop).
● Helper: for_each_class().
● pick_next_task():
again:
for_each_class(class) {
p = class->pick_next_task(rq, prev, rf);
if (p) {
if (unlikely(p == RETRY_TASK))
goto again;
return p;
}
}
/* The idle class should always have a runnable task: */
BUG();
ENGINEERS AND DEVICES
WORKING TOGETHER
Scheduling classes and policies
● Stop
○ No policy
● Deadline
○ SCHED_DEADLINE
● Real Time
○ SCHED_FIFO
○ SCHED_RR
● Fair
○ SCHED_NORMAL
○ SCHED_BATCH
○ SCHED_IDLE
● Idle
○ No policy
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: STOP
● Highest priority class.
● Only available for SMP (stop_machine() isn’t used in UP).
● Can preempt everything and is preempted by nothing.
● Mechanism to stop running everything else and run a specific function on
CPU(s).
● No scheduling policies.
● One kernel thread per CPU: “migration/N”.
● Used by task migration, CPU hotplug, RCU, ftrace, clockevents, etc.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Deadline (DL)
● Introduced by Dario Faggioli & Juri Lelli, v3.14 (2013).
● Highest priority tasks in the system.
● Scheduling policy: SCHED_DEADLINE.
● Implemented with red-black tree (self balancing).
● Used for periodic real time tasks, eg. Video encoding/decoding.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Realtime (RT)
● POSIX “real-time” tasks.
● Task priorities: 0 - 99.
● Inverted priorities: 0 highest in kernel, lowest in user space.
● Scheduling policies for tasks at same priority:
○ SCHED_FIFO
○ SCHED_RR, 100ms timeslice by default.
● Implemented with Linked lists.
● Used for short latency sensitive tasks, eg. IRQ threads.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: CFS (Completely fair scheduler)
● Introduced by Ingo Molnar (motivated by Rotating Staircase Deadline
Scheduler by Con Kolivas).
● Scheduling policies:
○ SCHED_NORMAL: Normal Unix tasks
○ SCHED_BATCH: Batch (non-interactive) tasks
○ SCHED_IDLE: Low priority tasks
● Implemented with red-black tree (self balancing).
● Tracks Virtual runtime of tasks (amount of time task has run).
● Tasks with shortest vruntime runs first.
● Priority is used to set task’s weight, that affects vruntime.
● Higher the weight, slower will vruntime increase.
● Task’s priority is calculated by 120 + nice (-20 to +19).
● Used for all other system tasks, eg: shell.
ENGINEERS AND DEVICES
WORKING TOGETHER
Sched class: Idle
● Lowest priority scheduling class.
● No scheduling policies.
● One kernel thread (idle) per CPU: “swapper/N”.
● Idle thread runs only when nothing else is runnable on a CPU.
● Idle thread may take the CPU to low power state.
ENGINEERS AND DEVICES
WORKING TOGETHER
Runqueue
● Each CPU has an instance of “struct rq”.
● Each “rq” has DL, RT and CFS runqueues within it.
● Runnable tasks are enqueued on those runqueues.
● Lots of other information, stats are available in struct rq.
struct rq {
…
struct cfs_rq cfs;
struct rt_rq rt;
struct dl_rq dl;
...
}
ENGINEERS AND DEVICES
WORKING TOGETHER
Runqueue (Cont.)
STOP
CFS
IDLE
DEADLINE
REALTIME
RQ CPU 0
STOP
CFS
IDLE
DEADLINE
REALTIME
RQ CPU 1
Thank You
#SFO17
BUD17 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

More Related Content

What's hot

Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Ni Zo-Ma
 
Linux internal
Linux internalLinux internal
Linux internal
mcganesh
 

What's hot (20)

WALT vs PELT : Redux - SFO17-307
WALT vs PELT : Redux  - SFO17-307WALT vs PELT : Redux  - SFO17-307
WALT vs PELT : Redux - SFO17-307
 
Linux scheduling and input and output
Linux scheduling and input and outputLinux scheduling and input and output
Linux scheduling and input and output
 
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with schedulerLCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
LCA14: LCA14-306: CPUidle & CPUfreq integration with scheduler
 
Browsing Linux Kernel Source
Browsing Linux Kernel SourceBrowsing Linux Kernel Source
Browsing Linux Kernel Source
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 
Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)Linux Memory Management with CMA (Contiguous Memory Allocator)
Linux Memory Management with CMA (Contiguous Memory Allocator)
 
Embedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernelEmbedded Linux Kernel - Build your custom kernel
Embedded Linux Kernel - Build your custom kernel
 
LAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEELAS16-406: Android Widevine on OP-TEE
LAS16-406: Android Widevine on OP-TEE
 
Linux Internals - Part II
Linux Internals - Part IILinux Internals - Part II
Linux Internals - Part II
 
Network Drivers
Network DriversNetwork Drivers
Network Drivers
 
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
HKG15-107: ACPI Power Management on ARM64 Servers (v2)HKG15-107: ACPI Power Management on ARM64 Servers (v2)
HKG15-107: ACPI Power Management on ARM64 Servers (v2)
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Linux Preempt-RT Internals
Linux Preempt-RT InternalsLinux Preempt-RT Internals
Linux Preempt-RT Internals
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
Linux Internals - Part I
Linux Internals - Part ILinux Internals - Part I
Linux Internals - Part I
 
Basic Linux Internals
Basic Linux InternalsBasic Linux Internals
Basic Linux Internals
 
Prerequisite knowledge for shared memory concurrency
Prerequisite knowledge for shared memory concurrencyPrerequisite knowledge for shared memory concurrency
Prerequisite knowledge for shared memory concurrency
 
Linux internal
Linux internalLinux internal
Linux internal
 
Linux Kernel Module - For NLKB
Linux Kernel Module - For NLKBLinux Kernel Module - For NLKB
Linux Kernel Module - For NLKB
 
Linux Crash Dump Capture and Analysis
Linux Crash Dump Capture and AnalysisLinux Crash Dump Capture and Analysis
Linux Crash Dump Capture and Analysis
 

Similar to The Linux Kernel Scheduler (For Beginners) - SFO17-421

Similar to The Linux Kernel Scheduler (For Beginners) - SFO17-421 (20)

Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptxEmbedded_ PPT_4-5 unit_Dr Monika-edited.pptx
Embedded_ PPT_4-5 unit_Dr Monika-edited.pptx
 
Scheduling in Android
Scheduling in AndroidScheduling in Android
Scheduling in Android
 
Scheduling in Android
Scheduling in AndroidScheduling in Android
Scheduling in Android
 
Understanding of linux kernel memory model
Understanding of linux kernel memory modelUnderstanding of linux kernel memory model
Understanding of linux kernel memory model
 
MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103MOVED: The challenge of SVE in QEMU - SFO17-103
MOVED: The challenge of SVE in QEMU - SFO17-103
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
multiprocessor real_ time scheduling.ppt
multiprocessor real_ time scheduling.pptmultiprocessor real_ time scheduling.ppt
multiprocessor real_ time scheduling.ppt
 
Os2
Os2Os2
Os2
 
Job Queues Overview
Job Queues OverviewJob Queues Overview
Job Queues Overview
 
When the OS gets in the way
When the OS gets in the wayWhen the OS gets in the way
When the OS gets in the way
 
RTAI - Earliest Deadline First
RTAI - Earliest Deadline FirstRTAI - Earliest Deadline First
RTAI - Earliest Deadline First
 
An End to Order
An End to OrderAn End to Order
An End to Order
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
 
Gluster dev session #6 understanding gluster's network communication layer
Gluster dev session #6  understanding gluster's network   communication layerGluster dev session #6  understanding gluster's network   communication layer
Gluster dev session #6 understanding gluster's network communication layer
 
RTX Kernal
RTX KernalRTX Kernal
RTX Kernal
 
05_Performance Optimization.pdf
05_Performance Optimization.pdf05_Performance Optimization.pdf
05_Performance Optimization.pdf
 
Real Time System
Real Time SystemReal Time System
Real Time System
 
An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)An End to Order (many cores with java, session two)
An End to Order (many cores with java, session two)
 
Rt linux-lab1
Rt linux-lab1Rt linux-lab1
Rt linux-lab1
 
Distributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and CassandraDistributed Task Scheduling with Akka, Kafka and Cassandra
Distributed Task Scheduling with Akka, Kafka and Cassandra
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Recently uploaded

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

The Linux Kernel Scheduler (For Beginners) - SFO17-421

  • 1. SFO17- 421 The Linux Kernel Scheduler Viresh Kumar (PMWG)
  • 2. ENGINEERS AND DEVICES WORKING TOGETHER Topics ● CPU Scheduler ● The O(1) scheduler ● Current scheduler design ● Scheduling classes ● schedule() ● Scheduling classes and policies ● Sched class: STOP ● Sched class: Deadline ● Sched class: Realtime ● Sched class: CFS ● Sched class: Idle ● Runqueue
  • 3. ENGINEERS AND DEVICES WORKING TOGETHER CPU Scheduler ● Shares CPU between tasks. ● Selects next task to run (on each CPU) when: ○ Running task terminates ○ Running task sleeps (waiting for an event) ○ A new task created or sleeping task wakes up ○ Running task’s timeslice is over ● Goals: ○ Be *fair* ○ Timeslice based on task priority ○ Low task response time ○ High (task completion) throughput ○ Balance load among CPUs ○ Power efficient ○ Low overhead of running scheduler’s code ● Work with mainframes, servers, desktops/laptops, embedded/phones.
  • 4. ENGINEERS AND DEVICES WORKING TOGETHER The O(1) scheduler ● 140 priorities. 0-99: RT tasks and 100-139: User tasks. ● Each CPU’s runqueue had 2 arrays: Active and Expired. ● Each arrays had 140 entries, one per priority level. ● Each entry was a linked list serviced in FIFO order. ● Bitmap (of 140 bits) used to find which priority lists aren’t empty. ● Timeslices were assigned to tasks based on these priorities. ● Expired tasks moved from Active to Passive array. ● Once Active array was empty, the arrays were swapped. ● Enqueue and dequeue of tasks and next task selection done in constant time. ● Replaced by CFS in Linux 2.6.23 (2007).
  • 5. ENGINEERS AND DEVICES WORKING TOGETHER The O(1) scheduler (Cont..) Priority 0 Priority 1 Priority 139 ... Priority 99 ... CPU Active array Priority 0 Priority 1 Priority 139 ... Priority 99 ... CPU Expired array Real Time task priorities User task priorities
  • 6. ENGINEERS AND DEVICES WORKING TOGETHER Current scheduler design ● Introduced by Ingo Molnar in 2.6.23 (2007). ● Scheduling policies within scheduling classes. ● Scheduling class with higher priority served first. ● Task can migrate between CPUs, scheduling policies and scheduling classes.
  • 7. ENGINEERS AND DEVICES WORKING TOGETHER Scheduling Classes ● Represented by following structure: struct sched_class { const struct sched_class *next; void (*enqueue_task) (struct rq *rq, struct task_struct *p, int flags); void (*dequeue_task) (struct rq *rq, struct task_struct *p, int flags); struct task_struct * (*pick_next_task) (struct rq *rq, struct task_struct *prev, struct rq_flags *rf); … }; STOP next DL RT CFS IDLE nullnext next next next
  • 8. ENGINEERS AND DEVICES WORKING TOGETHER Schedule() ● Picks the next runnable task to run on a CPU. ● Searches for a task starting from the highest priority class (stop). ● Helper: for_each_class(). ● pick_next_task(): again: for_each_class(class) { p = class->pick_next_task(rq, prev, rf); if (p) { if (unlikely(p == RETRY_TASK)) goto again; return p; } } /* The idle class should always have a runnable task: */ BUG();
  • 9. ENGINEERS AND DEVICES WORKING TOGETHER Scheduling classes and policies ● Stop ○ No policy ● Deadline ○ SCHED_DEADLINE ● Real Time ○ SCHED_FIFO ○ SCHED_RR ● Fair ○ SCHED_NORMAL ○ SCHED_BATCH ○ SCHED_IDLE ● Idle ○ No policy
  • 10. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: STOP ● Highest priority class. ● Only available for SMP (stop_machine() isn’t used in UP). ● Can preempt everything and is preempted by nothing. ● Mechanism to stop running everything else and run a specific function on CPU(s). ● No scheduling policies. ● One kernel thread per CPU: “migration/N”. ● Used by task migration, CPU hotplug, RCU, ftrace, clockevents, etc.
  • 11. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Deadline (DL) ● Introduced by Dario Faggioli & Juri Lelli, v3.14 (2013). ● Highest priority tasks in the system. ● Scheduling policy: SCHED_DEADLINE. ● Implemented with red-black tree (self balancing). ● Used for periodic real time tasks, eg. Video encoding/decoding.
  • 12. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Realtime (RT) ● POSIX “real-time” tasks. ● Task priorities: 0 - 99. ● Inverted priorities: 0 highest in kernel, lowest in user space. ● Scheduling policies for tasks at same priority: ○ SCHED_FIFO ○ SCHED_RR, 100ms timeslice by default. ● Implemented with Linked lists. ● Used for short latency sensitive tasks, eg. IRQ threads.
  • 13. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: CFS (Completely fair scheduler) ● Introduced by Ingo Molnar (motivated by Rotating Staircase Deadline Scheduler by Con Kolivas). ● Scheduling policies: ○ SCHED_NORMAL: Normal Unix tasks ○ SCHED_BATCH: Batch (non-interactive) tasks ○ SCHED_IDLE: Low priority tasks ● Implemented with red-black tree (self balancing). ● Tracks Virtual runtime of tasks (amount of time task has run). ● Tasks with shortest vruntime runs first. ● Priority is used to set task’s weight, that affects vruntime. ● Higher the weight, slower will vruntime increase. ● Task’s priority is calculated by 120 + nice (-20 to +19). ● Used for all other system tasks, eg: shell.
  • 14. ENGINEERS AND DEVICES WORKING TOGETHER Sched class: Idle ● Lowest priority scheduling class. ● No scheduling policies. ● One kernel thread (idle) per CPU: “swapper/N”. ● Idle thread runs only when nothing else is runnable on a CPU. ● Idle thread may take the CPU to low power state.
  • 15. ENGINEERS AND DEVICES WORKING TOGETHER Runqueue ● Each CPU has an instance of “struct rq”. ● Each “rq” has DL, RT and CFS runqueues within it. ● Runnable tasks are enqueued on those runqueues. ● Lots of other information, stats are available in struct rq. struct rq { … struct cfs_rq cfs; struct rt_rq rt; struct dl_rq dl; ... }
  • 16. ENGINEERS AND DEVICES WORKING TOGETHER Runqueue (Cont.) STOP CFS IDLE DEADLINE REALTIME RQ CPU 0 STOP CFS IDLE DEADLINE REALTIME RQ CPU 1
  • 17. Thank You #SFO17 BUD17 keynotes and videos on: connect.linaro.org For further information: www.linaro.org