Golang benchmark source code analysis

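The testing package provides support for automated testing of Go packages. It works hand in hand with the go test command, which automatically runs every function of the form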

func TestXXX(t *testing.T)

When go test is run with -bench="." (the flag is required!), the benchmarks are executed sequentially. A function of the following form is treated as a benchmark:

func BenchmarkXxx(b *testing.B)

Result of running go test -bench=".":

// PASS means every test passed
PASS
// Benchmark name - GOMAXPROCS      iterations (b.N)     average time per iteration
BenchmarkLoops-2                  100000             20628 ns/op
BenchmarkLoopsParallel-2          100000             10412 ns/op
//      package under test         total elapsed time
ok      swap/lib                   2.279s
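
As a minimal sketch of the function form described above, the program below defines one benchmark and runs it through testing.Benchmark, so it can be tried with plain go run. The loops helper and the sink variable are hypothetical stand-ins; they are not the code that produced the sample output.

```go
package main

import (
	"fmt"
	"testing"
)

// loops is a hypothetical workload, assumed only for illustration.
func loops(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i
	}
	return sum
}

// sink keeps the compiler from discarding the result of loops.
var sink int

// BenchmarkLoops has the BenchmarkXxx(b *testing.B) form that go test looks for.
func BenchmarkLoops(b *testing.B) {
	for i := 0; i < b.N; i++ {
		sink = loops(1000)
	}
}

func main() {
	// testing.Benchmark runs a single benchmark function outside of go test.
	r := testing.Benchmark(BenchmarkLoops)
	fmt.Println(r.N, "iterations,", r.NsPerOp(), "ns/op")
}
```

In a real project the benchmark function would normally live in a file whose name ends in _test.go and be run with go test -bench=".".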

Source location: src/testing/benchmark.go

Files in the testing directory:

allocs.go               helper_test.go          quick
allocs_test.go          helperfuncs_test.go     run_example.go
benchmark.go            internal                run_example_js.go
benchmark_test.go       iotest                  sub_test.go
cover.go                match.go                testing.go
example.go              match_test.go           testing_test.go
export_test.go          panic_test.go

testing.T

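Methods that mark failure

Fail marks the test as failed and continues
FailNow marks the test as failed and stops it

Methods that print information

Log prints its arguments (similar to cout)
Logf prints formatted output (similar to printf)
SkipNow skips the current test
Skipped reports whether the test was skipped

Combined methods:

Error / Errorf report an error and continue [ Log / Logf + Fail ]
Fatal / Fatalf report an error and stop [ Log / Logf + FailNow ]
Skip / Skipf report and skip [ Log / Logf + SkipNow ]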

testing.B

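First, testing.B offers the same failure, logging and skip methods as testing.T listed above.

SetBytes(n int64) records how many bytes a single operation processes, so that the report can include throughput in MB/s.
SetParallelism(p int) sets the degree of parallelism: RunParallel uses p*GOMAXPROCS goroutines.
StartTimer / StopTimer / ResetTimer control the timer.
testing.PB
Next() reports whether the loop should continue.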

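Let's read the source with three questions in mind:

How is b.N adjusted automatically?
How are the memory statistics implemented?
What is SetBytes() used for?

B defines the data structure of a benchmark. We pick out its more important members for analysis: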

type B struct {
  common                         // testing.common, shared with testing.T; records logs, status and so on
  importPath       string // import path of the package containing the benchmark
  context          *benchContext
  N                int            // number of times the target code runs; adjusted automatically, so users need not know the exact value
  previousN        int           // number of iterations in the previous run
  previousDuration time.Duration // total duration of the previous run
  benchFunc        func(b *B)   // the benchmark function
  benchTime        time.Duration // minimum time the benchmark function runs; 1s by default, set with '-benchtime 10s'
  bytes            int64         // bytes processed per iteration
  missingBytes     bool // one of the subbenchmarks does not have bytes set.
  timerOn          bool // whether the timer is running
  showAllocResult  bool
  result           BenchmarkResult // benchmark result
  parallelism      int // RunParallel creates parallelism*GOMAXPROCS goroutines
  // The initial states of memStats.Mallocs and memStats.TotalAlloc.
  startAllocs uint64  // cumulative count of heap objects allocated when timing started
  startBytes  uint64  // cumulative count of heap bytes allocated when timing started
  // The net total of this test after being run.
  netAllocs uint64 // heap objects allocated while the timer was running
  netBytes  uint64 // heap bytes allocated while the timer was running
}

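Starting the timer: B.StartTimer()

StartTimer() starts timing and initializes the memory counters. It is called automatically when the benchmark runs, so users normally do not need to call it.

func (b *B) StartTimer() {
  if !b.timerOn {
    runtime.ReadMemStats(&memStats)     // read the current heap allocation statistics
    b.startAllocs = memStats.Mallocs    // record the number of heap objects allocated so far
    b.startBytes = memStats.TotalAlloc  // record the number of heap bytes allocated so far
    b.start = time.Now()                // record the start time
    b.timerOn = true                   // mark the timer as running
  }
}

StartTimer() starts timing and records the current state of memory allocation. Memory is tracked whether or not the -benchmem flag is given; the flag only decides whether the figures appear in the output.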
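
The before/after bookkeeping can be reproduced outside the testing package. The sketch below is not taken from the source: countAllocs, allocateTen and sink are hypothetical names, and the helper simply takes two runtime.ReadMemStats snapshots and subtracts the same two fields that StartTimer() records.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink forces the buffers below onto the heap.
var sink []byte

// countAllocs is a hypothetical helper: it snapshots the cumulative counters
// before and after f and returns the difference.
func countAllocs(f func()) (allocs, bytes uint64) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before) // starting snapshot
	f()
	runtime.ReadMemStats(&after) // ending snapshot
	return after.Mallocs - before.Mallocs, after.TotalAlloc - before.TotalAlloc
}

// allocateTen performs ten heap allocations of 1 KiB each.
func allocateTen() {
	for i := 0; i < 10; i++ {
		sink = make([]byte, 1024)
	}
}

func main() {
	allocs, bytes := countAllocs(allocateTen)
	fmt.Println("allocations:", allocs, "bytes:", bytes)
}
```

Mallocs and TotalAlloc only ever grow, which is why a simple subtraction gives the amount allocated in between, regardless of what the garbage collector freed.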

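Stopping the timer: B.StopTimer()

StopTimer() stops timing and adds the measured values to the running totals.

func (b *B) StopTimer() {
  if b.timerOn {
    b.duration += time.Since(b.start)                   // accumulate elapsed time
    runtime.ReadMemStats(&memStats)                     // read the current heap allocation statistics
    b.netAllocs += memStats.Mallocs - b.startAllocs     // accumulate the number of heap objects allocated
    b.netBytes += memStats.TotalAlloc - b.startBytes    // accumulate the number of heap bytes allocated
    b.timerOn = false                                  // mark the timer as stopped
  }
}

Note that StopTimer() does not necessarily mean the benchmark is over. A benchmark may contain several timed phases, which is why the statistics are accumulated rather than overwritten.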
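
A benchmark with two timed phases might look like the sketch below. The sum helper, the sink variable and the 10ms pause are hypothetical; the pause stands in for any work between phases that should not be measured.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sink keeps the compiler from discarding the results.
var sink int

// sum is a hypothetical workload.
func sum(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
	}
	return total
}

// BenchmarkTwoPhases has two timed phases separated by an untimed pause.
func BenchmarkTwoPhases(b *testing.B) {
	for i := 0; i < b.N; i++ { // first timed phase
		sink += sum(1000)
	}
	b.StopTimer()                     // phase one is added to the totals
	time.Sleep(10 * time.Millisecond) // untimed: not part of the result
	b.StartTimer()                    // phase two adds to the same totals
	for i := 0; i < b.N; i++ {
		sink += sum(1000)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkTwoPhases)
	fmt.Println(r.N, "iterations,", r.NsPerOp(), "ns/op")
}
```

The timer is stopped once per call rather than once per loop iteration on purpose: each StopTimer() and StartTimer() call reads the memory statistics, so calling them inside a tight loop can make the benchmark take far longer than the time it reports.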

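Resetting the timer: B.ResetTimer()

ResetTimer() resets the timer and the other statistics along with it.

func (b *B) ResetTimer() {
  if b.timerOn {
    runtime.ReadMemStats(&memStats)     // read the current heap allocation statistics
    b.startAllocs = memStats.Mallocs    // record the number of heap objects allocated so far
    b.startBytes = memStats.TotalAlloc  // record the number of heap bytes allocated so far
    b.start = time.Now()                // record the start time
  }
  b.duration = 0                          // clear the elapsed time
  b.netAllocs = 0                         // clear the allocated object count
  b.netBytes = 0                          // clear the allocated byte count
}

ResetTimer() is used fairly often. The typical case is a benchmark whose setup is slow: run the setup first, then reset the timer so that only the code after it is measured.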
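
A minimal sketch of that pattern follows. The lookup table, its size and the names are hypothetical; the point is only the position of the b.ResetTimer() call.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the lookups.
var sink int

// BenchmarkLookup measures map lookups but not the cost of building the map.
func BenchmarkLookup(b *testing.B) {
	const size = 100000
	// Slow setup that should not count toward the result.
	table := make(map[int]int, size)
	for i := 0; i < size; i++ {
		table[i] = i * 2
	}
	b.ResetTimer() // discard the time and allocations spent so far
	for i := 0; i < b.N; i++ {
		sink = table[i%size]
	}
}

func main() {
	r := testing.Benchmark(BenchmarkLookup)
	fmt.Println(r.N, "iterations,", r.NsPerOp(), "ns/op")
}
```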

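Setting the number of bytes processed: B.SetBytes(n int64)

// SetBytes records the number of bytes processed in a single operation.
// If this is called, the benchmark will report ns/op and MB/s.
func (b *B) SetBytes(n int64) {
  b.bytes = n
}

This function is somewhat obscure, and its doc comment alone does not make its purpose obvious.

It sets the number of bytes processed in a single iteration. Once it is set, the report includes an "xxx MB/s" figure that expresses how fast the function under test processes bytes. Only the user knows how many bytes the function handles per iteration, so the user has to set it.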
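
As an illustration, the sketch below benchmarks copying a buffer and tells the framework how many bytes each iteration handles. The 64 KiB buffer size and the names are hypothetical. The throughput printed at the end is computed by hand from the result fields, assuming 1 MB = 1e6 bytes.

```go
package main

import (
	"fmt"
	"testing"
)

const bufSize = 64 * 1024 // hypothetical buffer size

// BenchmarkCopy copies bufSize bytes per iteration.
func BenchmarkCopy(b *testing.B) {
	src := make([]byte, bufSize)
	dst := make([]byte, bufSize)
	b.SetBytes(bufSize) // each iteration processes bufSize bytes
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		copy(dst, src)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkCopy)
	// Total bytes processed divided by elapsed seconds, in units of 1e6 bytes.
	mbPerSec := float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
	fmt.Printf("%d iterations, %d ns/op, %.2f MB/s\n", r.N, r.NsPerOp(), mbPerSec)
}
```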

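Reporting memory statistics:

func (b *B) ReportAllocs() {
  b.showAllocResult = true
}

ReportAllocs() turns on printing of memory statistics. It has the same effect as the -benchmem command-line flag, except that it applies only to the benchmark function that calls it.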
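
The following sketch calls ReportAllocs() in a benchmark that allocates once per iteration. The names and the 64-byte size are hypothetical. Because it runs through testing.Benchmark rather than go test, nothing is printed automatically, so the program reads the per-operation figures from the result itself.

```go
package main

import (
	"fmt"
	"testing"
)

// sink forces each buffer onto the heap.
var sink []byte

// BenchmarkAlloc performs one heap allocation per iteration.
func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs() // under go test, adds B/op and allocs/op to this benchmark's line
	for i := 0; i < b.N; i++ {
		sink = make([]byte, 64)
	}
}

func main() {
	r := testing.Benchmark(BenchmarkAlloc)
	fmt.Println(r.AllocsPerOp(), "allocs/op,", r.AllocedBytesPerOp(), "B/op")
}
```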

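How a benchmark is started

A benchmark goes through several rounds. Each round may use a different b.N and calls the benchmark function once; the result of that round decides whether another round is needed.

Let's first look at the method used for each round, runN():

func (b *B) runN(n int) {
  b.N = n                       // set B.N
  b.ResetTimer()                // clear the statistics
  b.StartTimer()                // start timing
  b.benchFunc(b)                // run the benchmark
  b.StopTimer()                 // stop timing
}

This method sets b.N and runs the benchmark function once.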

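Like T.Run(), B.Run() starts a sub-benchmark. In fact every benchmark a user writes is started through Run(). Here is pseudocode for B.Run():

func (b *B) Run(name string, f func(b *B)) bool {
  sub := &B{                          // create the sub-benchmark's data structure
    common: common{
      signal:  make(chan bool),
      name:    name,
      parent:  &b.common,
    },
    benchFunc:  f,
  }
  if sub.run1() { // run the sub-benchmark once; if it did not fail and has no sub-benchmarks of its own, continue with sub.run()
    sub.run()     // run() decides how many times runN() is called
  }
  b.add(sub.result) // add the result to the parent benchmark
  return !sub.failed
}

Every benchmark is first run once through run1(), which in effect calls runN(1); only after that single run is it decided whether to keep going.

The reported result is effectively the data from the last round, because runN() clears the statistics at the start of every round. The last round usually has the largest b.N, so it is also the most accurate.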
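
From the user's side, sub-benchmarks are created by calling b.Run inside a benchmark function. The sketch below is a hypothetical example with two input sizes; sum, sink and the sub-benchmark names are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the results.
var sink int

// sum is a hypothetical workload.
func sum(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		total += i
	}
	return total
}

// BenchmarkSum starts one sub-benchmark per input size.
func BenchmarkSum(b *testing.B) {
	for _, size := range []int{10, 1000} {
		size := size // capture the loop variable for the closure
		b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				sink = sum(size)
			}
		})
	}
}

func main() {
	// With sub-benchmarks, the returned result is an estimate for running
	// all of them in sequence.
	r := testing.Benchmark(BenchmarkSum)
	fmt.Println(r.N, "iterations,", r.NsPerOp(), "ns/op")
}
```

Under go test -bench="." each sub-benchmark gets its own line in the report, named after the parent and the string passed to Run.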

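How is B.N adjusted?

The B.launch() method makes the final decision on B.N. Here is its pseudocode:

func (b *B) launch() { // works out the iteration count automatically; run1 must have been called first so that there is timing data to predict from
  d := b.benchTime
  for n := 1; !b.failed && b.duration < d && n < 1e9; { // run for at least b.benchTime (1s by default) and at most 1e9 iterations
    last := n
    n = int(d.Nanoseconds()) // predict the next iteration count: b.benchTime / time per operation
    if nsop := b.nsPerOp(); nsop != 0 {
      n /= int(nsop)
    }
    n = max(min(n+n/5, 100*last), last+1) // add 20% to the prediction, grow by at most 100x per round, and always run at least one more than last time
    n = roundUp(n) // round the next iteration count up to a round number that is easy to read
    b.runN(n)
  }
}

Setting aside failures and cases where the user stops the benchmark, each benchmark runs for at least b.benchTime, which is 1s by default. The point of the initial single run is to see how long one execution of the user's code takes. If it is short, b.N has to be large enough for an accurate measurement; if it is long, b.N is reduced accordingly so that the benchmark does not take too long.

The final b.N settles on a round number such as 10000 or 500000, which makes the report easier to read.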

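How is memory counted?

As we have seen, when timing starts the current allocation counters are stored in b.startAllocs and b.startBytes. When timing stops, the starting values are subtracted from the final values, and the net increase is added to b.netAllocs and b.netBytes.

When a benchmark finishes, its result is saved in a BenchmarkResult, which holds the statistics needed for the report:

type BenchmarkResult struct {
  N         int           // number of times the user code ran
  T         time.Duration // elapsed time
  Bytes     int64         // bytes processed per operation, the value set by SetBytes()
  MemAllocs uint64        // net increase in allocated objects
  MemBytes  uint64        // net increase in allocated bytes
}

MemAllocs and MemBytes correspond to b.netAllocs and b.netBytes respectively.

So the final report only has to divide the net increase by b.N to get the memory allocated per operation.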

Objects allocated per operation:

func (r BenchmarkResult) AllocsPerOp() int64 {
  if r.N <= 0 { // guard against division by zero
    return 0
  }
  return int64(r.MemAllocs) / int64(r.N)
}

Bytes allocated per operation:

func (r BenchmarkResult) AllocedBytesPerOp() int64 {
  if r.N <= 0 {
    return 0
  }
  return int64(r.MemBytes) / int64(r.N)
}
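
The division can be checked with a hand-built result. The figures below are hypothetical and chosen only to make the arithmetic easy to follow; sample is a helper introduced for this sketch.

```go
package main

import (
	"fmt"
	"testing"
	"time"
)

// sample returns a hand-built result with made-up figures.
func sample() testing.BenchmarkResult {
	return testing.BenchmarkResult{
		N:         1000,
		T:         2 * time.Second,
		MemAllocs: 3000,
		MemBytes:  48000,
	}
}

func main() {
	r := sample()
	// 2s / 1000 iterations
	fmt.Println(r.NsPerOp(), "ns/op") // prints 2000000 ns/op
	// 3000 allocations / 1000 iterations
	fmt.Println(r.AllocsPerOp(), "allocs/op") // prints 3 allocs/op
	// 48000 bytes / 1000 iterations
	fmt.Println(r.AllocedBytesPerOp(), "B/op") // prints 48 B/op
}
```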

Transcoded to AMP by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/LcIIiFdl-oXEAiH5iI4Urw
