解讀 Golang pprof 報告

【導讀】在讀 go 的 pprof 報告時候如何理解裏面的數字？本文在官方文檔基礎上進一步解釋了這些指標的含義。

Profiling Go Programs(https://blog.golang.org/profiling-go-programs) 裏詳細舉例說明了如何看 pprof 報告，但並沒有清晰簡明的告知讀者提供數字的是什麼意思，所以本文做一個歸納筆記。

解讀 CPU

以文中提供的 CPU Profile 來舉例說明，我們使用go tool pprof -http=0.0.0.0:4231 havlak1 havalk1.prof來觀察

解讀 Top

(pprof) top10
Total: 2525 samples
    Flat  Flat%   Sum%      Cum   Cum% Name
     298  11.8%  11.8%      345  13.7% runtime.mapaccess1_fast64
     268  10.6%  22.4%     2124  84.1% main.FindLoops
     251   9.9%  32.4%      451  17.9% scanblock
     178   7.0%  39.4%      351  13.9% hash_insert
     131   5.2%  44.6%      158   6.3% sweepspan
     119   4.7%  49.3%      350  13.9% main.DFS
      96   3.8%  53.1%       98   3.9% flushptrbuf
      95   3.8%  56.9%       95   3.8% runtime.aeshash64
      95   3.8%  60.6%      101   4.0% runtime.settype_flush
      88   3.5%  64.1%      988  39.1% runtime.mallocgc

先了解是如何採樣的：

採樣頻率是每秒 100 次
一個樣本包含 goroutine 棧的程序計數器（program counters）
每次只會採樣調用棧的前 100 行

原文中沒有給出列名，這裏給了出來，下面是解釋：

Total：總共採樣次數，這裏是 2525 次。
Flat：函數在樣本中處於運行狀態的次數。簡單來說就是函數出現在棧頂的次數，而函數在棧頂則意味着它在使用 CPU。
Flat%：Flat / Total。
Sum%：自己以及所有前面的 Flat% 的累積值。解讀方式：表中第 3 行 Sum% 32.4%，意思是前 3 個函數（運行狀態）的計數佔了總樣本數的 32.4%
Cum：函數在樣本中出現的次數。只要這個函數出現在棧中那麼就算進去，這個和 Flat 不同（必須是棧頂才能算進去）。也可以解讀爲這個函數的調用次數。
Cum%：Cum / Total

解讀圖

方框：函數
方框尺寸：代表了 Flat 的次數
箭頭：X 調用 Y
線條：記錄了 X 調用 Y 的次數。數字越大，線條越粗。圖中 main.DFS 有一個指向自己的箭頭，說明存在遞歸調用，而且調用了 21342 次。
方框第一行數字：Flat (Flat%)，棧頂次數
方框第二行數字：Cum (Cum%)，調用次數

解讀源碼

下面是在 pprof 交互 cli 界面看到的報告：

(pprof) list DFS
Total: 2525 samples
ROUTINE ====================== main.DFS in /home/rsc/g/benchgraffiti/havlak/havlak1.go
   119    697 Total samples (flat / cumulative)
     3      3  240: func DFS(currentNode *BasicBlock, nodes []*UnionFindNode, number map[*BasicBlock]int, last []int, current int) int {
     1      1  241:     nodes[current].Init(currentNode, current)
     1     37  242:     number[currentNode] = current
     .      .  243:
     1      1  244:     lastid := current
    89     89  245:     for _, target := range currentNode.OutEdges {
     9    152  246:             if number[target] == unvisited {
     7    354  247:                     lastid = DFS(target, nodes, number, last, lastid+1)
     .      .  248:             }
     .      .  249:     }
     7     59  250:     last[number[currentNode]] = lastid
     1      1  251:     return lastid
(pprof)

第一列：Flat
第二列：Cum
第三列：行號

下面是在 Web 界面看到的報告（基本差不多，見這裏）：

havlak1
Total: 5758 samples
main.DFS
/home/rsc/g/benchgraffiti/havlak/havlak1.go

Total:   225   2296 (flat / cumulative samples)
  235                return false
  236               }
  237               
  238               // DFS - Depth-First-Search and node numbering.
  239               //
  240      3      3 func DFS(currentNode *BasicBlock, nodes []*UnionFindNode, number map[*BasicBlock]int, last []int, current int) int {
  241     18     19  nodes[current].Init(currentNode, current)
  242           166  number[currentNode] = current
  243               
  244      2      2  lastid := current
  245    167    167  for _, target := range currentNode.OutEdges {
  246     17    508   if number[target] == unvisited {
  247     10   1157    lastid = DFS(target, nodes, number, last, lastid+1)
  248                 }
  249                }
  250      7    273  last[number[currentNode]] = lastid
  251      1      1  return lastid
  252               }
  253               
  254               // FindLoops
  255               //
  256               // Find loops and build loop forest using Havlak's algorithm, which

第一列：行號
第二列：Flat
第三列：Cum

解讀內存

以文中提供的內存 Profile 來舉例說明，我們使用go tool pprof -http=0.0.0.0:4231 havlak3 havalk3.mprof來觀察。

pprof 提供了 4 種視角，默認是-inuse_space：

-inuse_space ：live object 佔用內存
-inuse_objects ：live object 的數量
-alloc_space ：程序啓動到現在，總共分配的內存
-alloc_objects ：程序啓動到現在總共 object 的數量

解讀 Top

(pprof) top5
Total: 82.4 MB
    Flat  Flat%   Sum%      Cum   Cum% Name
    56.3  68.4%  68.4%     56.3  68.4% main.FindLoops
    17.6  21.3%  89.7%     17.6  21.3% main.(*CFG).CreateNode
     8.0   9.7%  99.4%     25.6  31.0% main.NewBasicBlockEdge
     0.5   0.6% 100.0%      0.5   0.6% itab
     0.0   0.0% 100.0%      0.5   0.6% fmt.init
(pprof)

採樣頻率：

每分配 512K，採樣一個 block（具體啥意思不知道）

照例我們加上列：

Total：總共佔用內存
Flat：函數分配的內存，不包含它調用其他函數造成的內存分配。
Flat%：Flat / Total
Sum%：自己和前面所有的 Flat% 累積值
Cum：這個函數分配的內存，以及它調用其他函數分配的內存之和。可以解讀爲因爲這個函數所造成的所有內存分配。
Cum%：Cum / Total

解讀源碼

和 CPU 源碼解讀差不多：

(pprof) list FindLoops
Total: 82.4 MB
ROUTINE ====================== main.FindLoops in /home/rsc/g/benchgraffiti/havlak/havlak3.go
  56.3   56.3 Total MB (flat / cumulative)
...
   1.9    1.9  268:     nonBackPreds := make([]map[int]bool, size)
   5.8    5.8  269:     backPreds := make([][]int, size)
     .      .  270:
   1.9    1.9  271:     number := make([]int, size)
   1.9    1.9  272:     header := make([]int, size, size)
   1.9    1.9  273:     types := make([]int, size, size)
   1.9    1.9  274:     last := make([]int, size, size)
   1.9    1.9  275:     nodes := make([]*UnionFindNode, size, size)
     .      .  276:
     .      .  277:     for i := 0; i < size; i++ {
   9.5    9.5  278:             nodes[i] = new(UnionFindNode)
     .      .  279:     }
...
     .      .  286:     for i, bb := range cfgraph.Blocks {
     .      .  287:             number[bb.Name] = unvisited
  29.5   29.5  288:             nonBackPreds[i] = make(map[int]bool)
     .      .  289:     }
...

可以發現 L288 佔用了 29.5M 內存。用-inuse_objects來觀察，可以看到分配次數：

$ go tool pprof --inuse_objects havlak3 havlak3.mprof
Adjusting heap profiles for 1-in-524288 sampling rate
Welcome to pprof!  For help, type 'help'.
(pprof) list FindLoops
Total: 1763108 objects
ROUTINE ====================== main.FindLoops in /home/rsc/g/benchgraffiti/havlak/havlak3.go
720903 720903 Total objects (flat / cumulative)
...
     .      .  277:     for i := 0; i < size; i++ {
311296 311296  278:             nodes[i] = new(UnionFindNode)
     .      .  279:     }
     .      .  280:
     .      .  281:     // Step a:
     .      .  282:     //   - initialize all nodes as unvisited.
     .      .  283:     //   - depth-first traversal and numbering.
     .      .  284:     //   - unreached BB's are marked as dead.
     .      .  285:     //
     .      .  286:     for i, bb := range cfgraph.Blocks {
     .      .  287:             number[bb.Name] = unvisited
409600 409600  288:             nonBackPreds[i] = make(map[int]bool)
     .      .  289:     }
...
(pprof)

分析 GC

你可以通過 CPU Profile 來分析 GC：

(pprof) top10
Total: 1173 samples
     205  17.5%  17.5%     1083  92.3% main.FindLoops
     138  11.8%  29.2%      215  18.3% scanblock
      88   7.5%  36.7%       96   8.2% sweepspan
      76   6.5%  43.2%      597  50.9% runtime.mallocgc
      75   6.4%  49.6%       78   6.6% runtime.settype_flush
      74   6.3%  55.9%       75   6.4% flushptrbuf
      64   5.5%  61.4%       64   5.5% runtime.memmove
      63   5.4%  66.8%      524  44.7% runtime.growslice
      51   4.3%  71.1%       51   4.3% main.DFS
      50   4.3%  75.4%      146  12.4% runtime.MCache_Alloc
(pprof)

可以看到runtime.mallocgc的調用次數佔了 50.9%。

查看系統爲何進行垃圾收集的另一種方法是查看導致收集的分配，這些分配在 mallocgc 中花費了大部分時間。使用--nodefraction=0.1去掉佔比小於 10% 的結果：

可以看到main.FindLoops導致了大多數 GC。

在線 Profile

如果你是一個 Web 應用，你可以使用 net/http/pprof 來添加一個 Handler，訪問http://<host>:<port>/debug/pprof/可以得到功能列表：

/debug/pprof/

Types of profiles available:
Count Profile
8   allocs
0   block
0   cmdline
10 goroutine
8   heap
0   mutex
0   profile
19 threadcreate
0   trace
...

然後你可以通過這樣來用go tool pprof <url>來分析，比如：

# 分析 CPU
go tool pprof http://localhost:9090/debug/pprof/profile
# 打開網頁分析 heap
go tool pprof -http=0.0.0.0:4231 http://localhost:9090/debug/pprof/heap

在生產中，你需要對/debug/pprof/*做 HTTP BasicAuth 保護（很簡單，一個響應頭和請求頭罷了），那麼你去抓取數據分析的時候得這樣：

go tool pprof http://<user>:<password>@localhost:9090/debug/pprof/profile

轉自： chanjarster.github.io/post/go/pprof-explained/

本文由 Readfog 進行 AMP 轉碼，版權歸原作者所有。
來源：https://mp.weixin.qq.com/s/aAtbKJGnyMG477dwsA4nGg

解讀 CPU

解讀 Top

解讀圖

解讀源碼

解讀內存

解讀 Top

解讀源碼

分析 GC

在線 Profile

猜你喜歡