Type: Systems Research CS Education Operating Systems
Author: Zirov
Date: 2025-03-30
When multiple processes compete for a single CPU, the scheduler decides who runs, when, and for how long — a deceptively simple problem with profound consequences for performance, fairness, and responsiveness.
Abstract
Modern operating systems routinely manage dozens to thousands of concurrent processes on hardware that may expose only a handful of CPU cores. CPU scheduling is the policy layer that resolves this contention — it decides which ready process gets the processor at any given moment, for how long, and under what conditions it can be preempted. This article covers the foundational scheduling algorithms (FCFS, SJF, SRTF, and Round Robin), derives their performance characteristics analytically using a shared example dataset, and traces how their theoretical models evolved into the schedulers shipping inside Linux and Windows today. It also surfaces the real-world pathologies — starvation, convoy effect, excessive context-switch overhead — that make production scheduling considerably harder than textbook examples suggest. By the end, the reader should be able to reason critically about scheduler trade-offs, implement a simulator in C, and understand why no single algorithm is universally optimal.
Introduction
The Real-World Problem
A web server can receive many requests at the same time. Behind the scenes, several processes wait in the ready queue for CPU time to run application logic and generate a response. But since the hardware has a limited number of CPU cores, something has to decide which process runs first.
This is the baseline state of any modern multitasking operating system. Even on a personal laptop, background services, the user interface, browser tabs, antivirus scans, and user applications are all competing for CPU time. The component responsible for managing this competition is the CPU scheduler.
Why It Matters for Performance and Responsiveness
The scheduling policy chosen by an OS has a direct and measurable impact on the user experience. A poorly chosen scheduler can cause an interactive application to freeze for several seconds while a background job monopolizes the CPU. A well-designed one ensures that every process receives timely, fair access to the processor — keeping the system responsive under load. Understanding scheduling is therefore essential not just for OS designers, but for any systems programmer who needs to reason about latency, throughput, and resource contention.
CPU Scheduling
CPU scheduling is the mechanism an operating system uses to determine which process or task gets access to the CPU at a given moment. This is necessary because the CPU can execute only one task at a time, while multiple tasks are usually waiting to run.
Why It Is Necessary
Scheduling is necessary because multiple processes may be ready to run simultaneously while the CPU can execute only one at a time. The operating system must decide which process gets the CPU next, ensuring that the processor does not remain idle and that system resources are used efficiently. Without a scheduling policy, the CPU would either sit idle or execute processes in an arbitrary order, leading to unpredictable and often poor performance.
Goals
As a general rule, the objective of scheduling is to:
- Maximize CPU utilization and processing throughput
- Minimize execution time, waiting time, and response time
- Ensure fairness — no process should be indefinitely denied access to the CPU
- Meet deadlines in real-time systems where timing guarantees are required
Types of Scheduling
Scheduling can be classified into two main types.
Preemptive
In a preemptive scheduler, the OS can forcibly reclaim the CPU from a running process — typically on a timer interrupt, on the arrival of a higher-priority process, or when a better candidate appears in the ready queue.
If a process with high priority arrives in the ready queue, it does not have to wait for the current process to complete its burst time. Instead, the current process is interrupted in the middle of execution and placed back in the ready queue until the higher-priority process finishes using the CPU. This makes preemptive scheduling flexible, but it increases the overhead associated with switching a process from the running state to the ready state and vice versa.
Algorithms that operate under preemptive scheduling include Round Robin and Shortest Remaining Time First (SRTF). Shortest Job First (SJF) and Priority Scheduling may or may not be preemptive depending on the implementation.
To better understand how preemptive scheduling works, consider the following example with four processes:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 2 | 4 ms |
| P3 | 3 | 2 ms |
| P4 | 5 | 3 ms |
At time 0, only P1 has arrived in the ready queue, so the CPU begins executing P1.
At time 2, process P2 arrives. Because the system uses a preemptive scheduling algorithm, the scheduler compares the remaining execution time of P1 with the burst time of P2. Since P2 requires less time, the CPU preempts P1 and starts executing P2.
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P2 | 4 ms |
At time 3, process P3 arrives with an even shorter burst time. The scheduler compares the remaining execution times and selects the shortest one. As a result, P2 is interrupted, and the CPU is assigned to P3.
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P2 | 3 ms |
| P3 | 2 ms |
Process P3 continues executing until it finishes at time 5.
At time 5, P4 arrives. The scheduler now compares the remaining processes:
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P2 | 3 ms |
| P4 | 3 ms |
Since P2 and P4 have the same remaining time, the scheduler selects P2 because it arrived earlier. Therefore, P2 resumes execution and continues until it finishes at time 8.
After P2 completes, the scheduler compares the remaining processes:
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P4 | 3 ms |
The CPU is then allocated to P4, which executes from time 8 to time 11.
Finally, P1 is the only process left in the ready queue, so it resumes execution and finishes at time 16.
The resulting Gantt Chart is:
Non-Preemptive
In a non-preemptive scheduler, a process runs until it finishes or moves to a waiting state. The OS cannot forcibly take the CPU away from a running process.
Unlike preemptive scheduling, non-preemptive scheduling does not interrupt a process in the middle of execution. Instead, it waits for the process to complete its CPU burst time before allocating the CPU to another process. This eliminates the overhead of context switching mid-burst, but it also means that a high-priority process arriving while a long process is running must wait — making the system less responsive.
To understand how non-preemptive scheduling works, consider the same set of processes used in the previous example:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 2 | 4 ms |
| P3 | 3 | 2 ms |
| P4 | 5 | 3 ms |
At time 0, only P1 has arrived in the ready queue, so the CPU starts executing P1.
In non-preemptive scheduling, once a process starts executing, it cannot be interrupted until it finishes its CPU burst. Therefore, even if other processes arrive with shorter burst times, they must wait until the current process completes.
Between time 2 and time 5, processes P2, P3, and P4 arrive in the ready queue. However, P1 continues executing because the scheduler cannot preempt it.
| Process | Arrival Time | Burst Time |
|---|---|---|
| P2 | 2 | 4 ms |
| P3 | 3 | 2 ms |
| P4 | 5 | 3 ms |
At time 7, P1 finishes execution. The scheduler now selects the next process from the ready queue. Assuming a First Come First Serve (FCFS) order among the waiting processes, P2 is selected because it arrived first.
P2 executes from time 7 to time 11.
After P2 completes, the scheduler selects the next process in the ready queue, which is P3.
P3 executes from time 11 to time 13.
Finally, P4 is the last remaining process in the queue and executes from time 13 to time 16.
The resulting Gantt Chart is:
Scheduling Algorithms
First Come First Served (FCFS)
FCFS is the simplest and most straightforward scheduling algorithm. The principle is exactly as it sounds: processes are executed in the order they arrive, without any prioritization. This method is analogous to standing in a queue at a supermarket checkout — the first person in line is the first to be served.
Advantages
- Trivial to implement (a single FIFO queue)
- No starvation — every process eventually reaches the front
- No parameters to tune
Disadvantages
- Convoy effect: One long CPU-bound process forces all shorter processes behind it to wait, dramatically inflating average waiting time
- Poor average waiting time for mixed workloads
- Zero responsiveness guarantee for interactive processes
Typical Use Cases
- Simple batch processing queues
- Print spoolers
- Foundational building block for more complex schedulers
Execution Example
Consider the following set of processes:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 6 ms |
| P2 | 1 | 2 ms |
| P3 | 2 | 4 ms |
| P4 | 3 | 3 ms |
| P5 | 4 | 1 ms |
Since FCFS executes processes strictly in the order they arrive, once a process starts executing it runs until completion before the next process begins.
At time 0, only P1 has arrived, so the CPU starts executing P1. Even though P2, P3, P4, and P5 arrive while P1 is running, they must wait in the ready queue.
Execution proceeds as follows:
- P1 executes from 0 → 6
- P2 executes from 6 → 8
- P3 executes from 8 → 12
- P4 executes from 12 → 15
- P5 executes from 15 → 16
The resulting Gantt Chart is:
Implementation in C
#include <stdio.h>
int main() {
int n, i;
int at[20], bt[20], wt[20], ct[20], tat[20];
float wtavg = 0, tatavg = 0;
printf("Enter the number of processes: ");
scanf("%d", &n);
for (i = 0; i < n; i++) {
printf("Enter arrival time and burst time for process P%d: ", i + 1);
scanf("%d %d", &at[i], &bt[i]);
}
// First process starts immediately at its arrival time
ct[0] = at[0] + bt[0];
tat[0] = ct[0] - at[0];
wt[0] = 0;
tatavg = tat[0];
for (i = 1; i < n; i++) {
// If CPU is idle between processes, account for the gap
ct[i] = (ct[i - 1] > at[i] ? ct[i - 1] : at[i]) + bt[i];
wt[i] = ct[i] - at[i] - bt[i];
tat[i] = ct[i] - at[i];
wtavg += wt[i];
tatavg += tat[i];
}
wtavg /= n;
tatavg /= n;
printf("\n%-10s %-12s %-10s %-14s %-12s %-16s\n",
"Process", "Arrival_T", "Burst_T", "Complete_T", "Waiting_T", "Turnaround_T");
for (i = 0; i < n; i++) {
printf("%-10s %-12d %-10d %-14d %-12d %-16d\n",
(char[]){'P', '0' + i + 1, '\0'}, at[i], bt[i], ct[i], wt[i], tat[i]);
}
printf("\nGantt Chart:\n|");
for (i = 0; i < n; i++) printf(" P%d |", i + 1);
printf("\n0");
for (i = 0; i < n; i++) printf("\t%d", ct[i]);
printf("\n");
printf("\nAverage Waiting Time : %.2f ms\n", wtavg);
printf("Average Turnaround Time: %.2f ms\n", tatavg);
return 0;
}
Shortest Job First (SJF)
SJF selects the process in the ready queue with the smallest burst time. It is non-preemptive: once a process starts, it runs to completion. Among processes with equal burst times, FCFS tie-breaking applies.
SJF is provably optimal for minimizing average waiting time among non-preemptive algorithms — but it requires knowing burst times in advance, which is generally impossible in practice. Real systems estimate them from historical CPU usage using exponential averaging.
Advantages
- Minimizes average waiting time among all non-preemptive algorithms
- High throughput for mixed-length workloads
Disadvantages
- Starvation: Long processes may never execute if shorter ones keep arriving
- Requires accurate burst time prediction — estimation error propagates into suboptimal decisions
- Not suitable for interactive workloads in its non-preemptive form
Typical Use Cases
- Batch schedulers where job run-time estimates are available (e.g., HPC clusters with declared job times)
- Database query planners (analogous "shortest query first" heuristics)
Execution Example
Consider the following set of processes:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 1 | 3 ms |
| P3 | 2 | 4 ms |
| P4 | 3 | 2 ms |
| P5 | 4 | 6 ms |
Since SJF always selects the process with the smallest burst time among those currently in the ready queue, scheduling decisions are based on job length rather than arrival order.
At time 0, only P1 has arrived, so the CPU begins executing P1. Because this version of SJF is non-preemptive, P1 continues running until it finishes, even though shorter jobs arrive later.
During the execution of P1, the remaining processes arrive and wait in the ready queue.
At time 7, P1 finishes. The scheduler selects the process with the shortest burst time from the queue:
| Process | Burst Time |
|---|---|
| P2 | 3 ms |
| P3 | 4 ms |
| P4 | 2 ms |
| P5 | 6 ms |
The shortest job is P4, so it executes next. Execution proceeds as follows:
- P1 executes from 0 → 7
- P4 executes from 7 → 9
- P2 executes from 9 → 12
- P3 executes from 12 → 16
- P5 executes from 16 → 22
The resulting Gantt Chart is:
Implementation in C
#include <stdio.h>
int main() {
int n, i, j, pos;
int at[20], bt[20], wt[20], ct[20], tat[20], temp_bt[20], temp_at[20], order[20];
float wtavg = 0, tatavg = 0;
int current_time = 0, completed = 0;
int visited[20] = {0};
printf("Enter the number of processes: ");
scanf("%d", &n);
for (i = 0; i < n; i++) {
printf("Enter arrival time and burst time for process P%d: ", i + 1);
scanf("%d %d", &at[i], &bt[i]);
temp_bt[i] = bt[i];
temp_at[i] = at[i];
}
// Non-preemptive SJF: simulate step by step
int exec_order[20];
int exec_count = 0;
while (completed < n) {
// Find shortest available job at current_time
int min_bt = 99999, sel = -1;
for (i = 0; i < n; i++) {
if (!visited[i] && at[i] <= current_time) {
if (bt[i] < min_bt) {
min_bt = bt[i];
sel = i;
} else if (bt[i] == min_bt && at[i] < at[sel]) {
sel = i;
}
}
}
if (sel == -1) {
// CPU idle — advance to next arrival
int next = 99999;
for (i = 0; i < n; i++)
if (!visited[i] && at[i] < next) next = at[i];
current_time = next;
continue;
}
visited[sel] = 1;
current_time += bt[sel];
ct[sel] = current_time;
tat[sel] = ct[sel] - at[sel];
wt[sel] = tat[sel] - bt[sel];
wtavg += wt[sel];
tatavg += tat[sel];
exec_order[exec_count++] = sel;
completed++;
}
wtavg /= n;
tatavg /= n;
printf("\n%-10s %-12s %-10s %-14s %-12s %-16s\n",
"Process", "Arrival_T", "Burst_T", "Complete_T", "Waiting_T", "Turnaround_T");
for (i = 0; i < n; i++) {
printf("%-10s %-12d %-10d %-14d %-12d %-16d\n",
(char[]){'P', '0' + i + 1, '\0'}, at[i], bt[i], ct[i], wt[i], tat[i]);
}
printf("\nGantt Chart:\n|");
for (i = 0; i < exec_count; i++) printf(" P%d |", exec_order[i] + 1);
printf("\n");
printf("\nAverage Waiting Time : %.2f ms\n", wtavg);
printf("Average Turnaround Time: %.2f ms\n", tatavg);
return 0;
}
Shortest Remaining Time First (SRTF)
SRTF is the preemptive version of SJF. Whenever a new process arrives, the scheduler compares its burst time against the remaining burst time of the currently running process. If the newcomer is shorter, it immediately preempts the current process.
SRTF is theoretically optimal for minimizing average waiting time among all scheduling algorithms — preemptive or otherwise — for the same reasons SJF is optimal in the non-preemptive case.
Advantages
- Theoretically optimal average waiting time
- Short jobs get immediate service regardless of when they arrive
Disadvantages
- Starvation of long processes can be severe
- Requires continuous tracking of remaining burst time
- High context-switch frequency increases overhead
- Burst time prediction error is more damaging than in SJF
Typical Use Cases
- Theoretical baseline for evaluating other algorithms
- Approximated in systems with accurate CPU profiling (some batch schedulers)
Execution Example
Consider the following set of processes:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 2 | 4 ms |
| P3 | 3 | 2 ms |
| P4 | 5 | 3 ms |
| P5 | 6 | 1 ms |
At time 0, only P1 has arrived in the ready queue, so the CPU starts executing P1.
At time 2, P2 arrives. The scheduler compares the remaining time of P1 (5 ms) with the burst time of P2 (4 ms). Since P2 is shorter, it preempts P1 and begins execution.
At time 3, P3 arrives with a burst time of 2 ms. The scheduler compares all remaining processes:
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P2 | 3 ms |
| P3 | 2 ms |
Since P3 has the shortest remaining time, P2 is preempted and the CPU is assigned to P3.
P3 executes until time 5, when it finishes. At this moment P4 arrives.
Remaining processes:
| Process | Remaining Time |
|---|---|
| P1 | 5 ms |
| P2 | 3 ms |
| P4 | 3 ms |
The scheduler selects P2 (tie broken by earlier arrival), which executes from time 5 to time 6.
At time 6, P5 arrives with a burst time of 1 ms, which is shorter than the remaining time of P2 (2 ms). Therefore, P2 is preempted again, and the CPU is allocated to P5.
Execution continues as follows:
- P1 executes from 0 → 2
- P2 executes from 2 → 3
- P3 executes from 3 → 5
- P2 executes from 5 → 6
- P5 executes from 6 → 7
- P2 executes from 7 → 9
- P4 executes from 9 → 12
- P1 executes from 12 → 17
The resulting Gantt Chart is:
Implementation in C
#include <stdio.h>
int main() {
int n, i;
int at[20], bt[20], remaining[20], wt[20], ct[20], tat[20];
float wtavg = 0, tatavg = 0;
int completed = 0, current_time = 0, min_rem, sel;
int done[20] = {0};
printf("Enter the number of processes: ");
scanf("%d", &n);
for (i = 0; i < n; i++) {
printf("Enter arrival time and burst time for process P%d: ", i + 1);
scanf("%d %d", &at[i], &bt[i]);
remaining[i] = bt[i];
}
while (completed < n) {
min_rem = 99999;
sel = -1;
for (i = 0; i < n; i++) {
if (!done[i] && at[i] <= current_time && remaining[i] < min_rem) {
min_rem = remaining[i];
sel = i;
}
}
if (sel == -1) {
current_time++;
continue;
}
remaining[sel]--;
current_time++;
if (remaining[sel] == 0) {
done[sel] = 1;
ct[sel] = current_time;
tat[sel] = ct[sel] - at[sel];
wt[sel] = tat[sel] - bt[sel];
wtavg += wt[sel];
tatavg += tat[sel];
completed++;
}
}
wtavg /= n;
tatavg /= n;
printf("\n%-10s %-12s %-10s %-14s %-12s %-16s\n",
"Process", "Arrival_T", "Burst_T", "Complete_T", "Waiting_T", "Turnaround_T");
for (i = 0; i < n; i++) {
printf("%-10s %-12d %-10d %-14d %-12d %-16d\n",
(char[]){'P', '0' + i + 1, '\0'}, at[i], bt[i], ct[i], wt[i], tat[i]);
}
printf("\nAverage Waiting Time : %.2f ms\n", wtavg);
printf("Average Turnaround Time: %.2f ms\n", tatavg);
return 0;
}
Round Robin (RR)
Round Robin assigns each process a fixed time quantum (also called a time slice). Processes execute in FIFO order, but no process runs for more than one quantum before being preempted and moved to the back of the ready queue. If a process completes before its quantum expires, the CPU is immediately given to the next process.
Round Robin is the backbone of modern interactive schedulers. Its fairness is structural — every process gets CPU time within at most (n − 1) × quantum time units.
Advantages
- Strong fairness guarantee — no process waits more than
(n-1) × Qtime units - Good response time for interactive workloads
- No starvation possible
Disadvantages
- Average waiting time is often higher than SJF for identical workloads
- Performance is sensitive to quantum size:
- Too small: Context-switch overhead dominates useful work
- Too large: Degenerates toward FCFS behavior
- Higher context-switch cost than non-preemptive algorithms
Typical Use Cases
- General-purpose desktop and server operating systems (as a component of more complex schedulers)
- Time-sharing systems where equal treatment of interactive users is the primary goal
- Virtual machine CPU time-sharing between guests
Execution Example
Consider the following processes and a time quantum (Q) = 3 ms:
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 1 | 5 ms |
| P3 | 2 | 3 ms |
| P4 | 3 | 4 ms |
At time 0, only P1 is in the ready queue, so it begins execution. Since the quantum is 3 ms, P1 executes for 3 ms and is then preempted because it still has remaining burst time.
Remaining processes after the first quantum:
| Process | Remaining Time |
|---|---|
| P1 | 4 ms |
| P2 | 5 ms |
| P3 | 3 ms |
| P4 | 4 ms |
Execution proceeds in FIFO order, with each process receiving at most one quantum per turn:
- P1 executes from 0 → 3 (remaining: 4)
- P2 executes from 3 → 6 (remaining: 2)
- P3 executes from 6 → 9 (finishes)
- P4 executes from 9 → 12 (remaining: 1)
- P1 executes from 12 → 15 (remaining: 1)
- P2 executes from 15 → 17 (finishes)
- P4 executes from 17 → 18 (finishes)
- P1 executes from 18 → 19 (finishes)
The resulting Gantt Chart is:
Implementation in C
#include <stdio.h>
int main() {
int n, i, quantum;
int at[20], bt[20], remaining[20], wt[20], ct[20], tat[20];
float wtavg = 0, tatavg = 0;
int current_time = 0, completed = 0;
int queue[200], front = 0, rear = 0;
int in_queue[20] = {0};
printf("Enter the number of processes: ");
scanf("%d", &n);
for (i = 0; i < n; i++) {
printf("Enter arrival time and burst time for process P%d: ", i + 1);
scanf("%d %d", &at[i], &bt[i]);
remaining[i] = bt[i];
}
printf("Enter the time quantum: ");
scanf("%d", &quantum);
// Enqueue process 0 if it arrives at time 0
for (i = 0; i < n; i++) {
if (at[i] == 0) {
queue[rear++] = i;
in_queue[i] = 1;
}
}
while (completed < n) {
if (front == rear) {
// Queue empty — advance time to next arrival
current_time++;
for (i = 0; i < n; i++) {
if (!in_queue[i] && remaining[i] > 0 && at[i] <= current_time) {
queue[rear++] = i;
in_queue[i] = 1;
}
}
continue;
}
int p = queue[front++];
int run_time = (remaining[p] < quantum) ? remaining[p] : quantum;
current_time += run_time;
remaining[p] -= run_time;
// Enqueue newly arrived processes during this slice
for (i = 0; i < n; i++) {
if (!in_queue[i] && remaining[i] > 0 && at[i] <= current_time) {
queue[rear++] = i;
in_queue[i] = 1;
}
}
if (remaining[p] == 0) {
ct[p] = current_time;
tat[p] = ct[p] - at[p];
wt[p] = tat[p] - bt[p];
wtavg += wt[p];
tatavg += tat[p];
completed++;
} else {
// Re-enqueue for the next round
queue[rear++] = p;
}
}
wtavg /= n;
tatavg /= n;
printf("\n%-10s %-12s %-10s %-14s %-12s %-16s\n",
"Process", "Arrival_T", "Burst_T", "Complete_T", "Waiting_T", "Turnaround_T");
for (i = 0; i < n; i++) {
printf("%-10s %-12d %-10d %-14d %-12d %-16d\n",
(char[]){'P', '0' + i + 1, '\0'}, at[i], bt[i], ct[i], wt[i], tat[i]);
}
printf("\nAverage Waiting Time : %.2f ms\n", wtavg);
printf("Average Turnaround Time: %.2f ms\n", tatavg);
return 0;
}
Algorithm Comparison
| Algorithm | Type | Fairness | Complexity | Starvation Risk | Optimal For |
|---|---|---|---|---|---|
| FCFS | Non-preemptive | Arrival-order fair | O(1) enqueue | None | Simple batch workloads |
| SJF | Non-preemptive | Biased toward short jobs | O(n) selection | Long processes | Batch with known burst times |
| SRTF | Preemptive | Biased toward short jobs | O(n) on arrival | Long processes | Minimizing avg WT (theoretical) |
| Round Robin | Preemptive | Time-equal | O(1) | None | Interactive, time-sharing |
Comparative Example
To better understand the behavior of each scheduling algorithm, consider the same set of processes executed under different schedulers.
| Process | Arrival Time | Burst Time |
|---|---|---|
| P1 | 0 | 7 ms |
| P2 | 1 | 4 ms |
| P3 | 2 | 2 ms |
| P4 | 3 | 3 ms |
| P5 | 4 | 1 ms |
FCFS
Processes execute strictly in arrival order.
Execution order:
P1 → P2 → P3 → P4 → P5
- Avg Waiting Time: 9.2 ms
- Avg Turnaround Time: 12.6 ms
SJF (Non-Preemptive)
After the first process finishes, the scheduler selects the shortest job in the ready queue.
Execution order:
P1 → P5 → P3 → P4 → P2
- Avg Waiting Time: 7.0 ms
- Avg Turnaround Time: 10.4 ms
SRTF (Preemptive SJF)
Processes can interrupt the currently running job if they have a shorter remaining time.
Execution order (timeline):
P1 → P3 → P5 → P3 → P4 → P2 → P1
- Avg Waiting Time: 3.6 ms
- Avg Turnaround Time: 7.0 ms
Round Robin (Q = 3)
Each process runs for at most 3 ms before being preempted and moved to the back of the queue.
Execution order:
P1 → P2 → P3 → P4 → P5 → P1 → P2 → P4 → P1
- Avg Waiting Time: 9.0 ms
- Avg Turnaround Time: 12.4 ms
Summary
| Algorithm | Avg WT | Avg TAT | Characteristic |
|---|---|---|---|
| FCFS | 9.2 | 12.6 | Simple but prone to convoy effect |
| SJF | 7.0 | 10.4 | Better for short jobs |
| SRTF | 3.6 | 7.0 | Best average waiting time |
| RR (Q=3) | 9.0 | 12.4 | Fair and responsive |
The results confirm the theoretical expectations: SRTF delivers the lowest average waiting time, but at the cost of frequent preemptions and potential starvation of long jobs. Round Robin provides strong fairness with no starvation risk, though its average waiting time is comparable to FCFS for this workload. SJF sits in the middle — better than FCFS without the overhead of continuous preemption.
Real OS Scheduling
The algorithms covered so far are clean abstractions. Real OS schedulers are more complex — they must handle hundreds of concurrent processes, NUMA topologies, heterogeneous core types, energy management, and security isolation. Here is how the two dominant platforms approach the problem.
Linux: Completely Fair Scheduler (CFS)
Introduced: Linux 2.6.23 (2007), replacing the O(1) scheduler.
Fairness Model
CFS does not use time slices in the traditional sense. Instead, it tracks each process's virtual runtime (vruntime) — the amount of CPU time it has consumed, normalized by its priority weight. The scheduler always picks the process with the lowest vruntime — the one that has received the least CPU time relative to its entitlement.
This creates a fair allocation without requiring explicit time quantum bookkeeping: a process that has been waiting automatically accumulates a lower vruntime, making it the preferred candidate.
Red-Black Tree Structure
Ready processes are stored in a self-balancing red-black tree, indexed by vruntime. The leftmost node (minimum vruntime) is cached for O(1) access. When a process is selected to run, it is removed from the tree; when it is preempted, its updated vruntime is inserted back in the correct position in O(log n).
This gives CFS O(log n) scheduling decisions and O(1) best-candidate lookup — efficient even with thousands of runnable tasks.
Priority Integration
Linux nice values (−20 to +19) are translated into load weight multipliers. A process with nice=-20 has approximately 10× the weight of a nice=0 process. CFS advances vruntime more slowly for high-weight processes, giving them more real CPU time per unit of virtual time.
Key Characteristics
| Property | CFS Value |
|---|---|
| Algorithm base | Weighted fair queuing |
| Data structure | Red-black tree (per run queue, per NUMA node) |
| Preemption | Yes, on tick and on wakeup |
| Default time slice | Dynamic (scales with number of runnable tasks) |
| Starvation | Prevented by design (vruntime normalization) |
Windows: Priority-Based Scheduler with Dynamic Feedback
Priority Model
Windows uses a 32-level priority scheme (0–31). Real-time processes occupy levels 16–31; normal user processes occupy levels 1–15; the idle thread sits at level 0. The kernel always runs the highest-priority ready thread. Among equal priorities, Round Robin with a time quantum applies.
Multi-Level Feedback Queues
Windows maintains a separate ready queue for each of the 32 priority levels. The scheduler scans from level 31 downward to find the highest non-empty queue. This gives O(1) scheduling decisions for the common case.
Dynamic Priorities
To prevent starvation of lower-priority threads and to reward I/O-bound behavior, Windows applies temporary priority boosts:
- A thread woken from I/O wait receives a boost proportional to the I/O type (disk: +1, keyboard: +6, etc.)
- A thread that has not run for a long time (starvation detection) receives a temporary boost to level 15 for two quanta
- Foreground application threads receive a larger quantum multiplier (configurable, typically 3×)
Key Characteristics
| Property | Windows Value |
|---|---|
| Algorithm base | Preemptive multi-level priority + RR within level |
| Priority levels | 32 (0=idle, 1–15=normal, 16–31=real-time) |
| Quantum | Variable (foreground: ~60–120ms, background: ~20–40ms) |
| Starvation mitigation | Aging-based priority boost |
| API | SetThreadPriority(), SetPriorityClass() |
Conclusion
CPU scheduling sits at the intersection of theory and engineering pragmatism. The classical algorithms — FCFS, SJF, SRTF, and Round Robin — define a clean design space with well-understood trade-offs: you can optimize for minimum average waiting time (SRTF), for fairness (RR), or for simplicity (FCFS), but not all three simultaneously.
Real schedulers are composites. Linux CFS applies weighted fair queuing with an elegant vruntime abstraction that eliminates starvation while accommodating priorities. Windows combines strict priority preemption with dynamic feedback to balance responsiveness for interactive threads against throughput for background work. Both systems have evolved significantly over decades precisely because the problem is genuinely hard — workloads are diverse, hardware is heterogeneous, and the cost of a wrong decision scales with the number of users sharing the machine.
For the systems programmer, the practical takeaways are:
- Profile before assuming. The scheduler that looks optimal on paper may behave differently under your specific workload's arrival and burst distributions.
- Match the scheduler to the workload class. Batch jobs, interactive processes, and real-time tasks have fundamentally different requirements and belong in different scheduling classes.
- Respect the overhead model. Context switches are not free. Spinning up thousands of threads to "maximize parallelism" on a 4-core machine will often be slower than a smaller, carefully scheduled pool.
- Understand the tools your OS gives you.
nice,taskset,cgroups,SCHED_FIFO,SCHED_RR,SCHED_DEADLINE— these are production handles on the scheduler that every senior engineer should be comfortable reaching for.
Scheduling is not a solved problem. Multi-core NUMA machines, heterogeneous CPU architectures, and the pressure to reduce energy consumption continue to push the state of the art. The foundational algorithms in this article are the vocabulary — the real sentences are still being written.
References
- Silberschatz, A., Galvin, P. B., & Gagne, G. — Operating System Concepts, 10th ed. (Wiley, 2018)
- Tanenbaum, A. S. — Modern Operating Systems, 4th ed. (Pearson, 2014)
- Linux CFS Scheduler Documentation
- Linux
sched(7)man page - Jones, M. T. — Inside the Linux Scheduler (IBM Developer, 2009)
- Windows Scheduling — Microsoft Docs
- Molnar, I. — CFS Scheduler Design (kernel.org, 2007)
