Systems Performance, 2nd Edition, by Brendan Gregg

Chapter 1. Introduction

1.5.1 Subjectivity

Performance, on the other hand, is often subjective. With performance issues, it can be unclear whether there is an issue to begin with, and if so, when it has been fixed. What may be considered “bad” performance for one user, and therefore an issue, may be considered “good” performance for another.

When the answer is subjective, it really is hard to judge good versus bad. That is why performance tuning needs a quantitative goal.
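
To make "quantitative goal" concrete for myself, here is a minimal sketch that checks a latency percentile against a target. The sample latencies, the 99th percentile, and the 200 ms target are all made-up assumptions, not values from the book.

```python
import math

def percentile(samples, pct):
    """Return the nearest-rank percentile (0 < pct <= 100) of a non-empty list."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank method: 1-based rank, rounded up.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def meets_goal(latencies_ms, pct=99, target_ms=200.0):
    """True when the chosen percentile is at or below the target."""
    return percentile(latencies_ms, pct) <= target_ms

# Made-up response times in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 240, 14, 13, 16, 12, 18, 17]
print("p99:", percentile(latencies_ms, 99), "goal met:", meets_goal(latencies_ms))
```

With a goal written down like this, "is it fixed?" becomes a yes/no check instead of an opinion.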

1.7.2 Profiling

An effective visualization of CPU profiles is flame graphs. CPU flame graphs can help you find more performance wins than any other tool, after metrics.

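So there is such a thing as flame graphs. This is the first I have heard of them.

They can be generated from profiles collected with the perf command.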

CPU Flame Graphs
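
As far as I understand it, a flame graph is drawn from stack samples that have been aggregated into "folded" lines: frames joined by semicolons, followed by a sample count. The sketch below shows only that aggregation step on hand-written data. The function names in the samples are made up, and the root-first frame order is an assumption of this sketch, not something taken from the book.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples into 'frame;frame;frame count' lines.

    Each sample is a list of frames ordered root-first.
    """
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]

# Hypothetical stack samples, as a profiler might collect them.
samples = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "query_db"],
    ["main", "idle"],
]
for line in fold_stacks(samples):
    print(line)
```

The width of each box in the final graph corresponds to these counts, so the widest stacks are where the CPU time went.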

2.1 Terminology

Bottleneck: In systems performance, a bottleneck is a resource that limits the performance of the system. Identifying and removing systemic bottlenecks is a key activity of systems performance.

When it comes to performance improvement, I tend to start with whatever looks easiest to fix, but I need to properly check whether that spot is really the bottleneck.

2.2.1 System Under Test

The performance of a system under test (SUT) is shown in Figure 2.1.

Figure 2.1 Block diagram of system under test

It is important to be aware that perturbations (interference) can affect results, including those caused by scheduled system activity, other users of the system, and other workloads.

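Perturbations: the dictionary gives meanings like "agitation" or "confusion", but in this book it seems to mean any load on the system during a performance test other than the workload itself.

When performance testing in a production environment, the data source will inevitably be subject to perturbations, so I need to take that into account properly.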
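
One simple way to notice perturbations is to repeat the same test and look at how much the results spread. This is only a sketch under my own assumptions: the run times are made up, and the 5% threshold is an arbitrary choice, not a recommendation from the book.

```python
import statistics

def run_spread(run_times_s):
    """Return (median, coefficient of variation) for repeated benchmark runs."""
    mean = statistics.mean(run_times_s)
    cv = statistics.stdev(run_times_s) / mean  # sample standard deviation / mean
    return statistics.median(run_times_s), cv

def looks_perturbed(run_times_s, max_cv=0.05):
    """Flag a set of runs whose relative spread exceeds max_cv (hypothetical threshold)."""
    _, cv = run_spread(run_times_s)
    return cv > max_cv

# Made-up run times in seconds; the fourth run was hit by something else.
runs = [10.1, 9.9, 10.0, 14.8, 10.2]
median, cv = run_spread(runs)
print(f"median={median:.1f}s cv={cv:.3f} perturbed={looks_perturbed(runs)}")
```

A large spread does not say what interfered, only that the runs should not be trusted as-is.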

2.3.6 When to Stop Analysis

When the potential ROI is less than the cost of analysis. Some performance issues I work on can deliver wins measured in tens of millions of dollars per year. For these I can justify spending months of my own time (engineering cost) on analysis.

I need to think harder about whether my own time (engineering cost) is in proportion to what I deliver.
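
The comparison itself is simple arithmetic. A rough sketch, where the saving, the number of days, and the daily cost are all made-up numbers for illustration:

```python
def analysis_worth_it(expected_annual_saving, engineer_days, cost_per_day):
    """Compare the expected saving with the engineering cost of the analysis.

    Returns (worth_it, net_benefit) in the same currency unit as the inputs.
    """
    cost = engineer_days * cost_per_day
    return expected_annual_saving > cost, expected_annual_saving - cost

# Hypothetical case: 20 days of analysis at 800 per day for a 50,000/year saving.
worth_it, net = analysis_worth_it(50_000, engineer_days=20, cost_per_day=800)
print(worth_it, net)
```

The hard part is estimating the expected saving before doing the analysis, which this sketch does not attempt.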

2.3.9 Scalability

Figure 2.7 Performance degradation

Higher response time is, of course, bad. The “fast” degradation profile may occur for memory load, when the system begins moving memory pages to disk to free main memory. The “slow” degradation profile may occur for CPU load.

I see. I will remember this.

Overhead

Performance metrics are not free; at some point, CPU cycles must be spent to gather and store them. This causes overhead, which can negatively affect the performance of the target of measurement. This is called the observer effect.

This reminds me of the double-slit experiment. Observing is active, not passive.
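
The overhead can be seen in a toy example: time the same loop once as-is and once with a timer call around every iteration. This is just a sketch using `time.perf_counter` from the standard library. The workload and the iteration count are arbitrary, and the actual numbers will differ from machine to machine.

```python
import time

def work(n):
    """Plain workload: sum of squares."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def work_instrumented(n, samples):
    """Same workload, but each iteration is timed and the duration stored."""
    total = 0
    for i in range(n):
        start = time.perf_counter()
        total += i * i
        samples.append(time.perf_counter() - start)
    return total

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

n = 100_000
samples = []
plain_result, plain_s = timed(work, n)
instr_result, instr_s = timed(work_instrumented, n, samples)
print(f"plain: {plain_s:.4f}s  instrumented: {instr_s:.4f}s")
```

Both versions compute the same result, but the instrumented one spends extra CPU time and memory just on gathering and storing the measurements.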

2.3.15 Known-Unknowns

Known-knowns: These are things you know. You know you should be checking a performance metric, and you know its current value. For example, you know you should be checking CPU utilization, and you also know that the value is 10% on average.

Known-unknowns: These are things you know that you do not know. You know you can check a metric or the existence of a subsystem, but you haven’t yet observed it. For example, you know you could use profiling to check what is making the CPUs busy, but have yet to do so.

Unknown-unknowns: These are things you do not know that you do not know. For example, you may not know that device interrupts can become heavy CPU consumers, so you are not checking them.

Rather than just reading this book, I want to really master it and reduce my unknown-unknowns further.