Systems Performance, 2nd Edition, by Brendan Gregg: Chapters 3 and 5

Chapter 3. Operating Systems

3.2.1 Kernel

Unix-like operating systems, including Linux and BSD, have a monolithic kernel that manages CPU scheduling, memory, file systems, network protocols, and system devices.

The kernel is effectively the driver layer for the CPU, memory, and file systems. As the figure in the book shows, applications can issue system calls directly, without going through the system libraries. In Java, the JNA (Java Native Access) facility is one example of this.


3.2.2 Kernel and User Modes

In a traditional kernel, a system call is performed by switching to kernel mode and then executing the system call code.

Switching between user and kernel modes is a mode switch.


A mode switch, like a context switch, incurs CPU overhead.

Since mode and context switches cost a small amount of overhead (CPU cycles), there are various optimizations to avoid them, including:

  • User-mode syscalls: It is possible to implement some syscalls in a user-mode library alone. The Linux kernel does this by exporting a virtual dynamic shared object (vDSO) that is mapped into the process address space, which contains syscalls such as gettimeofday(2) and getcpu(2) [Drysdale 14].

  • Kernel bypass: This allows user-mode programs to access devices directly, bypassing syscalls and the typical kernel code path. For example, DPDK for networking: the Data Plane Development Kit.

System calls are basically executed in kernel mode, but the kernel provides mechanisms to minimize switching (mode switches) between kernel mode and user mode.

 

3.2.9 Schedulers

A commonly used scheduling policy dating back to UNIX identifies CPU-bound workloads and decreases their priority, allowing I/O-bound workloads—where low-latency responses are more desirable—to run sooner.

Surprising that the time-consuming CPU-bound workloads are the ones that get deferred; I didn't expect that.

 

3.4 Linux

Linux kernel developments, especially those related to performance, include the following:

  • CPU scheduling classes: Various advanced CPU scheduling algorithms have been developed, including scheduling domains (2.6.7) to make better decisions regarding non-uniform memory access (NUMA).
  • TCP congestion algorithms: Linux allows different TCP congestion control algorithms to be configured, and supports Reno, Cubic, and more in later kernels mentioned in this list.
  • splice (2.6.17): A system call to move data quickly between file descriptors and pipes, without a trip through user-space.

and many others...
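To get a feel for splice(2), here is a sketch of my own (not from the book): it moves bytes from one file descriptor to another through a pipe, so the data never has to be copied into a user-space buffer. One side of each splice call must be a pipe, hence the intermediate pipe.

```c
// splice(2) sketch: copy fd_in to fd_out without a user-space buffer.
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Copy everything readable from fd_in to fd_out; returns bytes moved, or -1.
long splice_copy(int fd_in, int fd_out) {
    int p[2];
    if (pipe(p) < 0) return -1;
    long total = 0;
    for (;;) {
        long n = splice(fd_in, NULL, p[1], NULL, 65536, 0);
        if (n < 0) { total = -1; break; }
        if (n == 0) break;                       // EOF on fd_in
        long moved = 0;
        while (moved < n) {                      // drain the pipe fully
            long w = splice(p[0], NULL, fd_out, NULL, n - moved, 0);
            if (w <= 0) { total = -1; goto out; }
            moved += w;
        }
        total += n;
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

sendfile(2) covers the common file-to-socket case with one call; splice is the more general building block.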

Since its earliest versions, Linux has received performance improvements from many angles: networking, the CPU, file management, and more.

 

Chapter 5. Applications

For application performance, you can start with what operations the application performs (as described earlier) and what the goal for performance is. The goal may be:

  • Latency: A low or consistent application response time
  • Throughput: A high application operation rate or data transfer rate
  • Resource utilization: Efficiency for a given application workload
  • Price: Improving the performance/price ratio, lowering computing costs

Setting the goal first is what matters. Performance tuning is not only about improving response time.

 

5.1.2 Optimize the Common Case

One way to efficiently improve application performance is to find the most common code path for the production workload and begin by improving that. 

Obvious, but important.

 

5.2.2 Caching

Instead of always performing an expensive operation, the results of commonly performed operations may be stored in a local cache for future use. An example is the database buffer cache.

Tuning the Database Buffer Cache

If I can deepen my understanding of Oracle's buffer cache, it would likely be useful in my day-to-day work.
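The buffer-cache idea can be sketched in a few lines. This is my own toy illustration (hypothetical names, a direct-mapped table instead of the hashing plus LRU a real database buffer cache uses): check an in-memory slot before doing the expensive read.

```c
// Toy buffer-cache sketch: serve repeated reads of the same block from
// memory instead of redoing the expensive operation each time.
#include <stdio.h>
#include <string.h>

#define CACHE_SLOTS 8

struct buf { int block; char data[64]; int valid; };
static struct buf cache[CACHE_SLOTS];

// Stand-in for a slow disk read (real code would do actual I/O here).
static void slow_disk_read(int block, char *out) {
    snprintf(out, 64, "data-for-block-%d", block);
}

const char *read_block(int block) {
    struct buf *b = &cache[block % CACHE_SLOTS];  // direct-mapped for brevity
    if (!b->valid || b->block != block) {         // miss: fill from "disk"
        slow_disk_read(block, b->data);
        b->block = block;
        b->valid = 1;
    }
    return b->data;                               // hit: served from memory
}
```

Two blocks that map to the same slot evict each other here; real caches avoid that with a hash table and an LRU (or similar) replacement policy.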

 

5.2.5 Concurrency and Parallelism

Fibers: Also called lightweight threads, these are a user-mode version of threads where each fiber represents a schedulable program. The application can use its own scheduling logic to choose which fiber to run.

In Java, asynchronous work has been done with the Thread class, but the OpenJDK community is now developing, under the name Project Loom, lightweight threads whose scheduling the developer can control.