Golang Profiling: About pprof

Go ships with a performance analysis tool, pprof. It is implemented in two places:

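runtime/pprof: the collector; it gathers the application's runtime data and feeds it to the pprof visualization tool.
net/http/pprof: exposes the profile data through an HTTP server so that the visualization tool can fetch and analyze it.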

Go has several built-in profilers, each responsible for collecting one kind of data:

Goroutine: stack traces of all current goroutines. Goroutine analysis.
CPU: stack traces of CPU returned by runtime. CPU analysis.
Heap: a sampling of memory allocations of live objects. Memory analysis: heap usage, memory leaks, and so on.
Allocation: a sampling of all past memory allocations. Memory analysis: object allocation statistics, memory leaks, and so on.
Thread: stack traces that led to the creation of new OS threads. OS thread creation.
Block: stack traces that led to blocking on synchronization primitives. Blocking analysis: records where goroutines block, including waits and synchronization such as timers and channel communication.
Mutex: stack traces of holders of contended mutexes. Mutex analysis, covering lock contention.
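As a rough illustration of the list above, the following sketch asks runtime/pprof which named profiles it currently has registered. The helper names profileNames and hasProfile are made up for this example. Note that the CPU profile does not appear among the named profiles, because it is streamed through StartCPUProfile/StopCPUProfile instead.

```go
package main

import (
	"fmt"
	"runtime/pprof"
)

// profileNames returns the names of all profiles currently registered
// with runtime/pprof, in the order pprof.Profiles reports them.
func profileNames() []string {
	var names []string
	for _, p := range pprof.Profiles() {
		names = append(names, p.Name())
	}
	return names
}

// hasProfile reports whether a profile with the given name is registered.
func hasProfile(name string) bool {
	return pprof.Lookup(name) != nil
}

func main() {
	for _, name := range profileNames() {
		// Count is the number of execution stacks currently in the profile.
		fmt.Printf("%-14s %d\n", name, pprof.Lookup(name).Count())
	}
}
```

The names printed here (goroutine, heap, allocs, threadcreate, block, mutex) are the ones used later in the /debug/pprof/ URLs.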

Collecting profile data in an app

To profile an app, the first question is how to collect the profile data. In Go, the data described above can be collected in several ways:

The go test approach

go test can collect profile data from your app without touching its code. Use a command line like this:

go test -cpuprofile cpu.prof -memprofile mem.prof -bench .

By convention, an app's performance is measured with benchmark tests, so the command above runs the benchmarks. You can also collect profiles during a coverage run:

go test . -v -race -coverprofile=coverage.txt -covermode=atomic -timeout=20m -test.short -cpuprofile cpu.prof -memprofile mem.prof | tee coverage.log
go tool cover -html=coverage.txt -o cover.html

Whichever go test approach you use, you can add command-line flags to collect profile data.
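For readers who have not written a benchmark yet, here is a minimal sketch of the kind of function that go test -bench picks up. Dots and BenchmarkDots are hypothetical names standing in for your own code. In a real project the benchmark would live in a file ending in _test.go; here testing.Benchmark drives it from main only so that the sketch runs on its own.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// Dots builds a string of n dots; it stands in for the code under test.
func Dots(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		sb.WriteByte('.')
	}
	return sb.String()
}

// BenchmarkDots has the signature that go test -bench looks for when the
// function lives in a file whose name ends in _test.go.
func BenchmarkDots(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = Dots(10000)
	}
}

func main() {
	// Init registers the testing flags; the testing docs ask for it when
	// Benchmark is called outside of go test.
	testing.Init()
	result := testing.Benchmark(BenchmarkDots)
	fmt.Println(result)
}
```

With BenchmarkDots in a _test.go file, the go test -cpuprofile cpu.prof -memprofile mem.prof -bench . command shown earlier would profile exactly this loop.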

The available command-line flags are:

-memprofile mem.prof
-memprofilerate 4096
-cpuprofile cpu.prof
-blockprofile block.prof
-blockprofilerate 4096
-mutexprofile mutex.prof
-mutexprofilefraction 1
-trace trace.out

You can also consult the source: https://golang.org/src/testing/testing.go#L289
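Outside of go test, the three sampling flags above have runtime counterparts. The sketch below sets them by hand; configureSampling and currentMutexFraction are hypothetical helper names, and the values simply mirror the flag list.

```go
package main

import (
	"fmt"
	"runtime"
)

// configureSampling sets the runtime knobs that correspond to the go test
// flags -memprofilerate, -blockprofilerate and -mutexprofilefraction.
// It returns the previous mutex fraction.
func configureSampling(memRate, blockRate, mutexFraction int) (prevMutexFraction int) {
	// Sample on average one allocation per memRate bytes allocated.
	// This should be set as early as possible in the program.
	runtime.MemProfileRate = memRate
	// Sample on average one blocking event per blockRate nanoseconds blocked.
	runtime.SetBlockProfileRate(blockRate)
	// Report on average 1/mutexFraction of mutex contention events.
	return runtime.SetMutexProfileFraction(mutexFraction)
}

// currentMutexFraction reads the mutex fraction without changing it;
// a negative argument only returns the current value.
func currentMutexFraction() int {
	return runtime.SetMutexProfileFraction(-1)
}

func main() {
	prev := configureSampling(4096, 4096, 1)
	fmt.Println("previous mutex fraction:", prev)
	fmt.Println("current mutex fraction:", currentMutexFraction())
	fmt.Println("memory profile rate:", runtime.MemProfileRate)
}
```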

Web applications

For web apps, system services, and other long-running apps, we can analyze profile data in real time, which is known as live profile download or live profiling.

The simplest way is to import the "net/http/pprof" package. If you start your HTTP service with the standard DefaultServeMux, nothing else is needed:

package main

import (
    "net/http"
    _ "net/http/pprof"
)

func main() {
    http.ListenAndServe(":8080", nil)
}

The "net/http/pprof" package automatically adds a set of API endpoints. Its init function does the following:

func init() {
    http.HandleFunc("/debug/pprof/", Index)
    http.HandleFunc("/debug/pprof/cmdline", Cmdline)
    http.HandleFunc("/debug/pprof/profile", Profile)
    http.HandleFunc("/debug/pprof/symbol", Symbol)
    http.HandleFunc("/debug/pprof/trace", Trace)
}

Your HTTP server therefore gets these endpoints automatically. You can open them directly in a browser, or point pprof at them for interactive analysis:

go tool pprof http://localhost:8080/debug/pprof/heap
> top
...
> exit

Further details are covered in later sections.

If you are not using the default DefaultServeMux, you have to wire the endpoints into your own mux by hand. With gin, for example, it might look like this:

r := gin.Default()
r.GET("/debug/pprof/allocs", WrapH(pprof.Handler("allocs")))
r.GET("/debug/pprof/block", WrapH(pprof.Handler("block")))
r.GET("/debug/pprof/goroutine", WrapH(pprof.Handler("goroutine")))
r.GET("/debug/pprof/heap", WrapH(pprof.Handler("heap")))
r.GET("/debug/pprof/mutex", WrapH(pprof.Handler("mutex")))
r.GET("/debug/pprof/threadcreate", WrapH(pprof.Handler("threadcreate")))
r.Run(":8080")

func WrapH(h http.Handler) gin.HandlerFunc {
    return func(c *gin.Context) {
        h.ServeHTTP(c.Writer, c.Request)
    }
}
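The same wiring can be sketched for a plain net/http mux using only the standard library. newPprofMux and statusOf are hypothetical helper names; statusOf uses an in-memory request so the example runs without opening a port. Importing net/http/pprof still registers the handlers on DefaultServeMux as a side effect, which is harmless here.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newPprofMux returns a ServeMux with the pprof endpoints registered
// explicitly, mirroring what the package's init does for DefaultServeMux.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	// Index also serves the named profiles (heap, goroutine, allocs, ...)
	// found under /debug/pprof/<name>.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// statusOf sends an in-memory GET request to the mux and returns the
// HTTP status code; no real network listener is involved.
func statusOf(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newPprofMux()
	fmt.Println(statusOf(mux, "/debug/pprof/goroutine?debug=1"))
	fmt.Println(statusOf(mux, "/debug/pprof/heap?debug=1"))
	// In a real server: http.ListenAndServe(":8080", mux)
}
```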

General applications

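If your app is not a long-running web service, you can use the runtime/pprof package to insert profiling code into the application by hand. When the run finishes, take the resulting files such as cpu.prof or mem.prof to pprof for analysis.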

Using `pkg/profile`

Here is a simple application:

package main

import (
    "fmt"
    "github.com/pkg/profile"
)

func main() {
    defer profile.Start(profile.ProfilePath(".")).Stop()
    a()
}

func a() {
    for i := 0; i < 10000; i++ {
        fmt.Print(".")
    }
    fmt.Println()
}

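It simply collects CPU data and writes a profile file into the current directory (pkg/profile names it cpu.pprof), which you can analyze directly:

go tool pprof -http=:6060 cpu.pprof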

You can collect other kinds of data too, by placing one of these calls at the start of main:

// CPUProfile enables cpu profiling. Note: Default is CPU
defer profile.Start(profile.CPUProfile).Stop()

// GoroutineProfile enables goroutine profiling.
// It returns all Goroutines alive when defer occurs.
defer profile.Start(profile.GoroutineProfile).Stop()

// BlockProfile enables block (contention) profiling.
defer profile.Start(profile.BlockProfile).Stop()

// ThreadcreationProfile enables thread creation profiling.
defer profile.Start(profile.ThreadcreationProfile).Stop()

// MemProfile changes which type of memory profiling to 
// profile the heap.
defer profile.Start(profile.MemProfile).Stop()

// MutexProfile enables mutex profiling.
defer profile.Start(profile.MutexProfile).Stop()

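Only one of these calls can be used at a time. Start does accept several options, so a call like the following compiles:

defer profile.Start(profile.MutexProfile, profile.MemProfile).Stop()

Be aware, however, that as far as the pkg/profile source shows, each mode option overwrites the mode set by the previous one, so only the last mode listed (here MemProfile) is expected to take effect. To collect several kinds of data, run the app once per mode.

Start(...) follows the functional options pattern, so non-mode options such as profile.ProfilePath(".") or profile.NoShutdownHook can be freely combined with a mode.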

If you need more customization than that, you will probably have to write the code yourself on top of runtime/pprof.

Writing the code yourself

Here is a simple demo program that shows how to write custom code to collect profile data. You can improve on it further:

package main

import (
    "fmt"
    "github.com/hedzr/log"
    stdLog "log"
    "os"
    "runtime"
    "runtime/pprof"
    "sync"
)

func main() {

    if cpuProfile != "" {
        defer enableCpuProfile(cpuProfile)()
    }
    if memProfile != "" {
        defer enableMemProfile(memProfile)()
    }

    var wg sync.WaitGroup
    wg.Add(1)
    go a(&wg)
    wg.Add(1)
    go b(&wg)
    wg.Wait()
}

func a(wg *sync.WaitGroup) {
    for i := 0; i < 10000; i++ {
        fmt.Print(".")
    }
    fmt.Println()
    wg.Done()
}

func b(wg *sync.WaitGroup) {
    for i := 0; i < 10000; i++ {
        fmt.Print("_")
    }
    fmt.Println()
    wg.Done()
}

var cpuProfile, memProfile string

func init() {
    stdLog.SetFlags(stdLog.LstdFlags | stdLog.Llongfile)
    cpuProfile, memProfile = "cpu.prof", "mem.prof"
}

//
// And review the pprof result in a web ui:
//
//    go tool pprof -http=:8555 ./cpu.prof
//
// Now you can open 'http://localhost:8555/ui' in a browser
//
func enableCpuProfile(cpuProfilePath string) (closer func()) {
    closer = func() {}
    if cpuProfilePath != "" {
        f, err := os.Create(cpuProfilePath)
        if err != nil {
            log.Fatalf("could not create cpu profile: %v", err)
        }
        err = pprof.StartCPUProfile(f)
        if err != nil {
            log.Fatalf("could not start cpu profile: %v", err)
        }
        // Stop profiling first so buffered samples are flushed, then close the file.
        closer = func() {
            pprof.StopCPUProfile()
            f.Close()
        }
    }
    // Sample blocking events (about one per 20ns spent blocked) for the block profile.
    runtime.SetBlockProfileRate(20)
    return
}

func enableMemProfile(memProfilePath string) (closer func()) {
    closer = func() {}
    if memProfilePath != "" {
        closer = func() {
            f, err := os.Create(memProfilePath)
            if err != nil {
                log.Fatal("could not create memory profile: ", err)
            }
            defer f.Close()
            runtime.GC() // get up-to-date statistics
            if err := pprof.WriteHeapProfile(f); err != nil {
                log.Fatal("could not write memory profile: ", err)
            }
        }
    }
    return
}
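The demo above covers the CPU and heap profiles. As one possible extension, the sketch below writes other named profiles (goroutine, block, mutex) through pprof.Lookup. writeNamedProfile and demo are hypothetical names, and the files go to a temporary directory only to keep the example self-contained.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"runtime/pprof"
)

// writeNamedProfile writes the named runtime/pprof profile (for example
// "goroutine", "block", "mutex" or "threadcreate") to path in the binary
// format that go tool pprof reads.
func writeNamedProfile(name, path string) error {
	p := pprof.Lookup(name)
	if p == nil {
		return fmt.Errorf("unknown profile %q", name)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// debug=0 selects the compressed protobuf format.
	if err := p.WriteTo(f, 0); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// demo enables block and mutex sampling, writes three profiles into a
// temporary directory, and returns the total number of bytes written.
func demo() (int64, error) {
	runtime.SetBlockProfileRate(1)     // record every blocking event
	runtime.SetMutexProfileFraction(1) // record every contention event

	dir, err := os.MkdirTemp("", "profiles")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	var total int64
	for _, name := range []string{"goroutine", "block", "mutex"} {
		path := filepath.Join(dir, name+".prof")
		if err := writeNamedProfile(name, path); err != nil {
			return 0, err
		}
		info, err := os.Stat(path)
		if err != nil {
			return 0, err
		}
		total += info.Size()
	}
	return total, nil
}

func main() {
	n, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("bytes written:", n)
}
```

In a real program you would call writeNamedProfile from the same kind of deferred closer used by enableMemProfile, with a path of your choosing.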

That is all there is to it.

`cmdr` integration

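In hedzr/cmdr (v1.7.46+), we provide a complete add-on package, pprof, that simplifies integrating go tool pprof into ordinary applications. If you are building a command-line application with cmdr, simply do this: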

package main

import (
    "github.com/hedzr/cmdr"
    "github.com/hedzr/cmdr/plugin/pprof"
    "github.com/hedzr/log"
    "github.com/hedzr/logex/build"
    "gopkg.in/hedzr/errors.v2"
)

func main() { Entry() }

func Entry() {
    if err := cmdr.Exec(buildRootCmd(),
        cmdr.WithLogx(build.New(build.NewLoggerConfigWith(true, "logrus", "debug"))),
        pprof.GetCmdrProfilingOptions(),
    ); err != nil {
        log.Fatalf("error occurs in app running: %+v\n", err)
    }
}

func buildRootCmd() (rootCmd *cmdr.RootCommand) {
    root := cmdr.Root(appName, cmdr.Version).
        Copyright(copyright, "hedzr").
        Description(desc, longDesc).
        Examples(examples)
    rootCmd = root.RootCommand()

    cmdr.NewBool(false).
        Titles("enable-ueh", "ueh").
        Description("Enables the unhandled exception handler?").
        AttachTo(root)

    //pprof.AttachToCmdr(root.RootCmdOpt())
    return
}

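You can integrate the pprof add-on simply by passing pprof.GetCmdrProfilingOptions() to cmdr.Exec, as above, or attach it to root explicitly with pprof.AttachToCmdr(root.RootCmdOpt()), shown commented out in buildRootCmd. Either way it gives you a set of command-line flags such as -ep.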

For such an application, profiling and pprof can be enabled simply with:

app -ep
app --pprof
app --enable-profile

By default it produces several profile files such as cpu.prof and mem.prof. If you need to change which kinds of profile data are collected, call it like this:

pprof.GetCmdrProfilingOptions("cpu", "mem", "mutex", "block", "thread-create", "trace", "go-routine"),
// Or
pprof.AttachToCmdr(root.RootCmdOpt(), "cpu", "mem", "mutex", "block", "thread-create", "trace", "go-routine")

We naturally recommend cmdr for simplifying profiling integration. Beyond that, it also brings many more CLI helper features.

The pprof visualization tool

Generally there are two ways to start the pprof visualization tool: run it directly, or write a small piece of code that starts it.

However you obtained the profile data, you can start the visualization tool directly on it:

go tool pprof -http=:6060 cpu.prof

Using the pprof command-line tool

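pprof is also available as a standalone command-line tool, which you can install like this (recent Go versions install binaries with go install rather than go get -u):

go install github.com/google/pprof@latest

This command fetches the pprof source, compiles it into an executable named pprof, and places it in $GOPATH/bin. It requires a complete Go toolchain, which hardly needs saying.

That said, the standalone tool is essentially the same as go tool pprof, which bundles a copy of it. So once we have profile data, either of the following starts the visualization tool: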

pprof -http=:6060 cpu.prof
go tool pprof -http=:6060 cpu.prof

Generating reports

In pprof's interactive mode, the pdf command generates a report (Graphviz must be installed).

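Through an interactive TTY terminal

If neither -web nor -http is given (nor another output-format flag such as -top or -pdf), pprof enters interactive mode.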

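Viewing live data

If your app is an HTTP service and imports the "net/http/pprof" package, a set of endpoints (/debug/pprof/*) is available. You can then download profile data straight from the corresponding endpoint and analyze it in interactive mode: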

go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
(pprof) top

In practice, though, you can often just open these endpoints in a browser to look at a snapshot of the profile data.
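The text a browser shows for such an endpoint can also be produced in-process, which may help when no HTTP server is running. The sketch below prints the goroutine snapshot in the human-readable form that /debug/pprof/goroutine?debug=1 serves; goroutineSnapshot, firstLine and isGoroutineHeader are hypothetical helper names.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
	"strings"
)

// goroutineSnapshot returns the human-readable goroutine profile.
func goroutineSnapshot() (string, error) {
	var buf bytes.Buffer
	// debug=1 asks for text with function names instead of binary protobuf.
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 1); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// firstLine returns the summary line at the top of a snapshot.
func firstLine(s string) string {
	line, _, _ := strings.Cut(s, "\n")
	return line
}

// isGoroutineHeader reports whether line looks like the summary header
// that the text format starts with.
func isGoroutineHeader(line string) bool {
	return strings.HasPrefix(line, "goroutine profile: total")
}

func main() {
	snapshot, err := goroutineSnapshot()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(firstLine(snapshot))
	fmt.Println("header recognized:", isGoroutineHeader(firstLine(snapshot)))
}
```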

In interactive mode, these commands are used most often:

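q/exit/quit

Leaves interactive mode and returns to the shell.

tree

Shows a text version of the call structure.

top

Shows the top entries by time. You can append a number, such as top5 or top10, and add -cum to sort by cumulative time, which is often useful for revealing the call sequence.

list

list shows the profile data, line by line, for the specified function in the specified package. Its argument is a regular expression matched against function names.

peek

peek is similar to list, but it takes a fuzzy keyword (a regular expression) and shows the callers and callees of the matching functions.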
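To see why sorting top by cumulative time matters, consider this small sketch. handle does almost no work of its own (low flat time) but calls spin, where the samples land, so handle only rises to the top under -cum. All function names are hypothetical, the iteration count is an arbitrary assumption chosen to burn a noticeable amount of CPU, and //go:noinline is used so both functions keep their own stack frames.

```go
package main

import (
	"bytes"
	"fmt"
	"runtime/pprof"
)

// spin burns CPU in a tight arithmetic loop; CPU samples land here,
// so spin shows a high flat time.
//
//go:noinline
func spin(iterations int) uint64 {
	var x uint64 = 1
	for i := 0; i < iterations; i++ {
		x = x*6364136223846793005 + 1442695040888963407
	}
	return x
}

// handle does almost nothing itself but calls spin, so its flat time is
// near zero while its cumulative time is about the same as spin's.
//
//go:noinline
func handle() uint64 {
	return spin(300000000)
}

// profileHandle runs handle under the CPU profiler and returns the raw
// profile bytes (the same format as a cpu.prof file).
func profileHandle() ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	handle()
	pprof.StopCPUProfile()
	return buf.Bytes(), nil
}

func main() {
	data, err := profileHandle()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("profile size in bytes:", len(data))
}
```

If the bytes were written to a file and opened with go tool pprof, top would be expected to rank spin first, while top -cum would also surface handle and main.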

Viewing in the browser

When you inspect CPU profile data in a browser, pprof's web UI offers several views.

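Top view

This view is fairly simple, but because Go runtime calls and system calls take up a large share, you need to filter them out patiently before you find the top N of your business logic.

For a better top N, run top10 -cum or top5 -cum in interactive mode, or click the cum column header in the web view to re-sort. Business code is then usually listed first.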

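Graph view

The Graph view is laid out along the app's call paths, callers above callees, so it reads almost like a flowchart. In this view, a larger font means more time spent.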

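Flame Graph view

The flame graph is also laid out top to bottom along the call path. The width of a function is proportional to the time it takes: the wider, the more time. As a rule of thumb, the functions whose signatures you can read in full are the expensive ones. However, because program structure varies (goroutines, timerproc, and so on), this simple reading is not always accurate. In a large program you need to filter out the noise carefully and REFINE correctly before you get a meaningful performance verdict.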

Peek view

The Peek view lists timing statistics as a text table, which you can filter.

Source View

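The Source view lists the source code of the called functions. For this view to display correctly, use go tool pprof; the standalone pprof may not be able to show the source. You must also pass your app's executable when loading the view: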

go tool pprof -http :6060 bin/mdx-cli ref/cpu.prof

Disassemble View

Much like the Source view, except that it shows assembly code.

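How to analyze pprof profile data

This is a hard topic to explain.

In general, most profile data can be read at face value.

To filter and focus, use the regular-expression filtering in pprof's web UI.

For a more detailed reference, read Profiling Go Programs - The Go Blog (https://blog.golang.org/pprof).

Pinpointing bad code, however, takes long debugging experience.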

Reposted from:

juejin.cn/post/6951078043922202631#heading-25

Source: https://mp.weixin.qq.com/s/YpUUj4xqlaZ9paEJe7VPYg