Golang Heap 運行時性能分析

通常，我們只會在兩種情況下，會去分析一個程序的表現：

你遇到了問題；
你閒的沒事幹；

好了，開個玩笑，其實研究程序的性能對於每一個工程師來說，都很重要，我甚至可以這麼說：這是一個工程師的必備技能。

下面來說說，我們如何去研究 Golang 程序的性能問題。

介紹

Golang 的性能分析工具 pprof 可謂是豪華了，它內建支持以下幾種分析：

heap：當前內存中所有存活對象的採樣（幾乎是帶 GC 語言的必備了），可以用來分析內存問題；
profile：與 heap 相對的，是 CPU 的採樣，可以用來分析程序的耗時瓶頸；
goroutine：所有當前 goroutine 的棧追蹤；
allocs：所有過去的內存申請採樣；
threadcreate：系統層面的線程棧追蹤；
block：同步原語上的堵塞的棧追蹤；
mutex：所有競爭關係的 mutex 棧追蹤；
trace：當前程序執行情況的追蹤；

憑藉良好的工具帶來的調試體驗也是非常棒的，整個過程只需幾個簡單的命令，你就能進行分析個大概了。不過受限於篇幅，以及之前也多次提到過 CPU 的分析，因此今天只說說如何分析內存，也就是 Heap。

Heap 的使用一般是內存泄露，或者是你想優化內存的使用。

內存泄露與內存優化

對於內存泄露，這類問題往往難以發現與分析，因爲需要監控 Go 程序本身，或者看 Linux 的 dmesg 裏面的 OOM 記錄才能發現。

dmesg | grep oom-killer

當你發現一次 OOM 記錄時，你就要考慮給本身忽略的監控加上了，因爲這種問題會復現的（但是往往難以在自己的機器以及預發佈環境中復現）。如果不知道是是什麼監控參數，你可以看監控數據，簡單定一個比例，比如當你的程序初始化的時候佔用 10% 的內存，那麼一旦 Go 程序的內存使用達到一定比例比如機器內存 50% 時，就要馬上進行告警了，你也可以進場分析了。

不過，也不用大費周章，因爲你只需用幾行簡單的代碼，就能給你的 Go 程序增加 pprof 支持，不會影響程序的運行，並且是支持 Web 訪問的：

import (
  "net/http"
  _ "net/http/pprof"
)

func main() {
  go func() {
    http.ListenAndServe("localhost:8080", nil)
  }()
}

然後，使用 go 提供的 pprof 工具就能進行分析了，比如對於內存泄露問題：

go tool pprof http://localhost:8080/debug/pprof/heap

就會進入 pprof 的 REPL，在這裏用一些簡單的命令你就能定位問題所在。不過爲了更好的分析體驗，有兩個地方需要注意：

如果你的編譯參數重加了 -trimpath 以及 -ldflag "-s -w"，最好去掉，不然會影響到你定位問題；
在編譯機器上執行這條命令，這樣可以直接分析到每一行代碼的級別；

接下來的我用的實際例子是屬於內存使用分析優化，由於還沒遇到 OOM，先用我遇到的一個小例子來代替，因爲兩個問題的分析方法是一致的。

如何使用 pprof

第一步，先看 top10：

(pprof) top10
Showing nodes accounting for 3759.91kB, 100% of 3759.91kB total
Showing top 5 nodes out of 24
      flat  flat%   sum%        cum   cum%
 2345.25kB 62.38% 62.38%  2345.25kB 62.38%  io.ReadAll
  902.59kB 24.01% 86.38%   902.59kB 24.01%  compress/flate.NewWriter
         0     0%   100%   902.59kB 24.01%  bufio.(*Writer).Flush
         0     0%   100%   902.59kB 24.01%  compress/gzip.(*Writer).Write
(以下省略)...

這裏需要提示下，flat 表示目前最右邊的調用仍舊沒有被釋放的空間，而 cum 表示累計 (cumulative) 申請的空間。top 的默認排序是按照 flat 排序，你可以通過參數來切換排序方式：top10 -cum 。

如果在這裏看不到什麼異常的地方，那麼還有別的地方可以看，因爲 Golang heap 的採樣統計會區分成四個部分：

alloc_objects：申請過的對象
alloc_space ：申請過的空間
inuse_objects：正在使用的對象
inuse_space：正在使用的空間（默認）

你可以通過類似於 sample_index=inuse_objects 的命令來切換。

在我的這個例子中，由於我這裏確定第一項 io.ReadAll 爲什麼會在我的程序中，但是第二項的 compress/flate.NewWriter 讓我覺得有異常，但是不知到是哪裏調用的。因此，在確定異常項後，第二步可以通過 tree 來進一步確認調用鏈條：

(pprof) tree 10 compress
Active filters:
   focus=compress
Showing nodes accounting for 2354.01kB, 29.36% of 8018.09kB total
Showing top 10 nodes out of 11
----------------------------------------------------------+-------------
      flat  flat%   sum%        cum   cum%   calls calls% + context              
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   compress/gzip.(*Writer).Write
 1805.17kB 22.51% 22.51%  2354.01kB 29.36%                | compress/flate.NewWriter
                                          548.84kB 23.32% |   compress/flate.(*compressor).init
----------------------------------------------------------+-------------
                                          548.84kB   100% |   compress/flate.(*compressor).init (inline)
  548.84kB  6.85% 29.36%   548.84kB  6.85%                | compress/flate.(*compressor).initDeflate
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText.func1
         0     0% 29.36%  2354.01kB 29.36%                | bufio.(*Writer).Flush
                                         2354.01kB   100% |   compress/gzip.(*Writer).Write
----------------------------------------------------------+-------------
                                          548.84kB   100% |   compress/flate.NewWriter
         0     0% 29.36%   548.84kB  6.85%                | compress/flate.(*compressor).init
                                          548.84kB   100% |   compress/flate.(*compressor).initDeflate (inline)
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   bufio.(*Writer).Flush
         0     0% 29.36%  2354.01kB 29.36%                | compress/gzip.(*Writer).Write
                                         2354.01kB   100% |   compress/flate.NewWriter
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.NewEncoder.func7
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.MetricFamilyToText
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText.func1
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.MetricFamilyToText.func1
                                         2354.01kB   100% |   bufio.(*Writer).Flush
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.encoderCloser.Encode
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.NewEncoder.func7
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.MetricFamilyToText
----------------------------------------------------------+-------------
                                         2354.01kB   100% |   xizhibei-app/controllers/internal_rpc.(*SystemCtrl).GetMetrics
         0     0% 29.36%  2354.01kB 29.36%                | github.com/prometheus/common/expfmt.encoderCloser.Encode
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.NewEncoder.func7
----------------------------------------------------------+-------------
         0     0% 29.36%  2354.01kB 29.36%                | xizhibei-app/controllers/internal_rpc.(*SystemCtrl).GetMetrics
                                         2354.01kB   100% |   github.com/prometheus/common/expfmt.encoderCloser.Encode
----------------------------------------------------------+-------------

現在，我們基本可以確認是在我實現的 GetMetrics 中，處理 prometheus 客戶端的序列化壓縮時候出了點小問題（但是還沒有到內存泄露的地步）。另外，這裏你也可以加個第三步：用 list 加上關鍵詞的命令來查看精確到每一行代碼級別的分析。

定位到問題後，就是最後一步解決，我的解決方案是用 sync.Pool。在之前，我是直接使用 gzip.NewWriter 來壓縮每次從 prometheus 中取出的指標文本，但是這樣會造成 gzip 多次重複的內存申請以及初始化，所以當改用 sync.Pool 後，我的代碼從：

buf := new(bytes.Buffer)
gzipWritter := gzip.NewWriter(buf)

變爲：

var (
 gzipPool = sync.Pool{
  New: func() interface{} {
   return gzip.NewWriter(nil)
  },
 }
 bufferPool = sync.Pool{
  New: func() interface{} {
   return new(bytes.Buffer)
  },
 }
)

...

gzipWritter := gzipPool.Get().(*gzip.Writer)
defer gzipPool.Put(gzipWritter)

buf := bufferPool.Get().(*bytes.Buffer)
defer bufferPool.Put(buf)

buf.Reset()
gzipWritter.Reset(buf)

我們可以寫個 benchmark 來測試下：

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i9-9820X CPU @ 3.30GHz
BenchmarkEncode-20                          2422            504022 ns/op          851822 B/op        129 allocs/op
BenchmarkEncodeWithSyncPool-20              7654            150188 ns/op           48799 B/op        108 allocs/op

可以看到，內存的 allocs 從 129 降到了 108。

好了，分析就暫時到這。

P.S.

對於大多數人來說，在網頁上用鼠標點擊分析問題更簡單，因爲目前 Go pprof 這個工具做到了一條龍服務，你可以直接在網頁上看到調用圖表以及火焰圖（這裏需要着重艾特下 C/C++，咱還能不能把調試體驗做好點了）。

go tool pprof -http=:6000 http://localhost:8080/debug/pprof/heap

Go 會打開一個本地 6000 端口的網頁，但如果你在雲服務器上，你有兩種選擇：

用 wget 下載 heap 文件 wget http://localhost:8080/debug/pprof/heap，然後拷貝到本地進行分析；
用 ssh 代理 ssh -L 8080:127.0.0.1:8080 user@server；

轉自：

blog.xizhibei.me/2021/06/27/golang-heap-profiling/

本文由 Readfog 進行 AMP 轉碼，版權歸原作者所有。
來源：https://mp.weixin.qq.com/s/lFBZdpZVlyZNTe6R-kLCKw

介紹

內存泄露與內存優化

如何使用 pprof

P.S.

猜你喜歡