Go: Performance comparison of four ways to convert between string and bytes

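Yesterday a colleague in our company chat group mentioned that in Go 1.22, converting between string and bytes no longer needs the unsafe package; a plain conversion is enough. I went through the Go 1.22 release notes and found nothing about it, but people pointed to a Kubernetes issue [1] that makes this claim: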

As of go 1.22, for string to bytes conversion, we can replace the usage of unsafe.Slice(unsafe.StringData(s), len(s)) with type casting []bytes(str), without the worry of losing performance.

As of go 1.22, string to bytes conversion []bytes(str) is faster than using the unsafe package. Both methods have 0 memory allocation now.

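In other words, the issue claims that since Go 1.22 a plain type conversion is at least as fast as unsafe.Slice(unsafe.StringData(s), len(s)) for string to bytes, and that both now allocate nothing. (The quote writes []bytes(str); the actual Go syntax is []byte(str).)

The claim made me curious, so I wanted to verify it.

Note that it only covers string to bytes and says nothing about bytes to string. This article looks at both directions.

First, let's look at several ways to convert between string and bytes, and then write benchmarks to compare their performance.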

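1. Direct conversion

A string and a byte slice can be converted to each other directly, and the compiler takes care of it internally. The code is as follows: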

func toRawBytes(s string) []byte {
 if len(s) == 0 {
  return nil
 }
 return []byte(s)
}

func toRawString(b []byte) string {
 if len(b) == 0 {
  return ""
 }
 return string(b)
}

Here we add one small optimization: an empty string or byte slice is handled up front.
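To make the semantics of direct conversion concrete, here is a minimal sketch showing that the converted slice has its own backing array when it is modified afterwards. The helper name copyToBytes is made up for this illustration and is not part of the benchmark code.

```go
package main

import "fmt"

// copyToBytes is a hypothetical helper: a plain conversion whose result
// the caller is free to modify.
func copyToBytes(s string) []byte {
	return []byte(s)
}

func main() {
	s := "hello, world"
	b := copyToBytes(s)
	b[0] = 'H' // safe: b does not share memory with s

	fmt.Println(s)         // hello, world
	fmt.Println(string(b)) // Hello, world
}
```

This copy is exactly what the unsafe variants below try to avoid, and it is also what keeps the original string intact.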

2. Legacy unsafe approach

The reflect package defines SliceHeader and StringHeader, which correspond to the in-memory layout of a slice and a string respectively:

type SliceHeader struct {
 Data uintptr
 Len  int
 Cap  int
}
type StringHeader struct {
 Data uintptr
 Len  int
}

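Following these layouts, we can convert between string and bytes. Let's call this the reflect approach for now: the code below does not import the reflect package, but the conversion relies on the two structures it defines: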

func toReflectBytes(s string) []byte {
 if len(s) == 0 {
  return nil
 }

 x := (*[2]uintptr)(unsafe.Pointer(&s))
 h := [3]uintptr{x[0], x[1], x[1]}
 return *(*[]byte)(unsafe.Pointer(&h))
}

func toReflectString(b []byte) string {
 if len(b) == 0 {
  return ""
 }
 return *(*string)(unsafe.Pointer(&b))
}
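The legacy approach depends on a string header being two machine words and a slice header being three. As a quick sanity check of that assumption (it holds for the standard gc toolchain, and the helper name headerWords is invented for this sketch), the sizes can be printed like this:

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports how many machine words the string and []byte
// headers occupy. Hypothetical helper, for illustration only.
func headerWords() (stringWords, sliceWords uintptr) {
	var s string
	var b []byte
	word := unsafe.Sizeof(uintptr(0))
	return unsafe.Sizeof(s) / word, unsafe.Sizeof(b) / word
}

func main() {
	sw, bw := headerWords()
	fmt.Println(sw, bw) // 2 3
}
```

This is why toReflectBytes has to build a 3-element array (data, len, cap), while toReflectString can simply reinterpret the first two words of the slice header.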

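3. New unsafe approach

In an article I wrote two years ago, "Keeping up with the times: the efficient conversion has changed again in Go 1.20" [2], I introduced the new unsafe approach, since SliceHeader and StringHeader in the reflect package are being deprecated. Here is the new way: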

func toBytes(s string) []byte {
 if len(s) == 0 {
  return nil
 }
 return unsafe.Slice(unsafe.StringData(s), len(s))
}

func toString(b []byte) string {
 if len(b) == 0 {
  return ""
 }
 return unsafe.String(unsafe.SliceData(b), len(b))
}

It uses unsafe.Slice, unsafe.String, unsafe.StringData and unsafe.SliceData to build the slice or string and to obtain the pointer to the underlying data.
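To illustrate what "zero copy" means here, the following sketch checks that the slice returned by toBytes points at the same bytes as the string. It requires Go 1.20 or later; sharesStorage is a hypothetical helper added only for this check.

```go
package main

import (
	"fmt"
	"unsafe"
)

// toBytes is the new unsafe conversion from the article.
func toBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

// sharesStorage reports whether b starts at the same address as s.
// Hypothetical helper, for illustration only.
func sharesStorage(s string, b []byte) bool {
	return unsafe.StringData(s) == unsafe.SliceData(b)
}

func main() {
	s := "hello, world"
	b := toBytes(s)
	fmt.Println(sharesStorage(s, b), len(b)) // true 12
}
```

Because the memory is shared, the returned slice must be treated as read-only.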

4. The Kubernetes implementation

k8s uses the following optimized conversion:

func toK8sBytes(s string) []byte {
 return *(*[]byte)(unsafe.Pointer(&s))
}

func toK8sString(b []byte) string {
 return *(*string)(unsafe.Pointer(&b))
}

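As you can see, the k8s implementation is more concise than the legacy unsafe approach: it does not build a temporary 3-element array for the string to bytes direction, and instead reinterprets the pointer to the string or byte slice directly.

But a string header has only two fields and a slice header has three, so how is the cap of the []byte returned by toK8sBytes determined? We will come back to this at the end; first let's settle the performance of these implementations.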

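Performance comparison

We benchmark each implementation to see how they differ. Using one simple string and its corresponding bytes, we convert string to bytes and bytes to string. The sub-benchmark names in the code and output are 強制轉換 (direct conversion), 傳統轉換 (legacy unsafe), 新型轉換 (new unsafe) and k8s轉換 (k8s).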

var s = "hello, world"
var bts = []byte("hello, world")

func BenchmarkStringToBytes(b *testing.B) {
 var fns = map[string]func(string) []byte{
  "強制轉換":  toRawBytes,
  "傳統轉換":  toReflectBytes,
  "新型轉換":  toBytes,
  "k8s轉換": toK8sBytes,
 }

 for name, fn := range fns {
  b.Run(name, func(b *testing.B) {
   for i := 0; i < b.N; i++ {
    bts = fn(s)
   }
  })
 }
}

func BenchmarkBytesToString(b *testing.B) {
 var fns = map[string]func([]byte) string{
  "強制轉換":  toRawString,
  "傳統轉換":  toReflectString,
  "新型轉換":  toString,
  "k8s轉換": toK8sString,
 }

 for name, fn := range fns {
  b.Run(name, func(b *testing.B) {
   for i := 0; i < b.N; i++ {
    s = fn(bts)
   }
  })
 }
}

Running on a Mac mini M2 with go1.22.6 darwin/arm64 gives:

goos: darwin
goarch: arm64
pkg: github.com/smallnest/study/str2bytes

BenchmarkStringToBytes/強制轉換-8               78813638         14.73 ns/op       16 B/op        1 allocs/op
BenchmarkStringToBytes/傳統轉換-8               599346962          2.010 ns/op        0 B/op        0 allocs/op
BenchmarkStringToBytes/新型轉換-8               624976126          1.929 ns/op        0 B/op        0 allocs/op
BenchmarkStringToBytes/k8s轉換-8              887370499          1.211 ns/op        0 B/op        0 allocs/op

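For string to bytes, the k8s approach is fastest, the new and legacy unsafe approaches are close (the new one slightly ahead), and direct conversion is slowest.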

BenchmarkBytesToString/強制轉換-8               92011309         12.68 ns/op       16 B/op        1 allocs/op
BenchmarkBytesToString/傳統轉換-8               815922964          1.471 ns/op        0 B/op        0 allocs/op
BenchmarkBytesToString/新型轉換-8               624965414          1.922 ns/op        0 B/op        0 allocs/op
BenchmarkBytesToString/k8s轉換-8              1000000000          1.194 ns/op        0 B/op        0 allocs/op

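For bytes to string, the k8s approach is again fastest, followed by the legacy approach and then the new approach, while direct conversion is far behind.

Running on Linux amd64 with go1.22.0 linux/amd64 gives: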

goos: linux
goarch: amd64
pkg: test
cpu: Intel(R) Xeon(R) Platinum
BenchmarkStringToBytes/強制轉換-2                  30606319         42.02 ns/op       16 B/op        1 allocs/op
BenchmarkStringToBytes/傳統轉換-2                  315913948          3.779 ns/op        0 B/op        0 allocs/op
BenchmarkStringToBytes/新型轉換-2                  411972518          2.753 ns/op        0 B/op        0 allocs/op
BenchmarkStringToBytes/k8s轉換-2                 449640819          2.770 ns/op        0 B/op        0 allocs/op


BenchmarkBytesToString/強制轉換-2                  38716465         29.18 ns/op       16 B/op        1 allocs/op
BenchmarkBytesToString/傳統轉換-2                  458832459          2.593 ns/op        0 B/op        0 allocs/op
BenchmarkBytesToString/新型轉換-2                  439537762          2.762 ns/op        0 B/op        0 allocs/op
BenchmarkBytesToString/k8s轉換-2                 478885546          2.375 ns/op        0 B/op        0 allocs/op

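Overall, the k8s, legacy and new approaches all perform well, and direct conversion is the slowest. k8s is fastest for bytes to string.

Performance analysis

But wait: didn't the Kubernetes discussion say that as of Go 1.22 string to bytes can simply use []byte(str)? Why is direct conversion so slow in these benchmarks?

You can also see that direct conversion performs one memory allocation per operation (1 allocs/op), which hurts its performance badly.

Now suppose we write two more benchmark functions: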

func BenchmarkStringToBytesRaw(b *testing.B) {
 for i := 0; i < b.N; i++ {
  _ = toRawBytes(s)
 }
}

func BenchmarkBytesToStringRaw(b *testing.B) {
 for i := 0; i < b.N; i++ {
  _ = toRawString(bts)
 }
}

Running them gives:

goos: darwin
goarch: arm64
pkg: github.com/smallnest/study/str2bytes
BenchmarkStringToBytesRaw-8    1000000000          0.2921 ns/op        0 B/op        0 allocs/op
BenchmarkBytesToStringRaw-8    506502222          2.363 ns/op        0 B/op        0 allocs/op

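Surprisingly, direct conversion now performs very well, with no extra memory allocation (zero copy); string to bytes in particular is dramatically faster.

What is going on?

As you may have guessed, the compiler must be optimizing here. Through inlining it expands the call to toRawBytes, which lets it see that the result of converting s does not escape: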

# go test -gcflags="-m=2" -bench Raw -benchmem
...
./convert_test.go:48:6: can inline toRawBytes with cost 10 as: func(string) []byte { if len(s) == 0 { return nil }; return ([]byte)(s) }
./convert_test.go:55:6: can inline toRawString with cost 10 as: func([]byte) string { if len(b) == 0 { return "" }; return string(b) }
...
./convert_test.go:101:17: ([]byte)(s) does not escape
./convert_test.go:101:17: zero-copy string->[]byte conversion
...

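With -gcflags="-m=2" we can inspect the inlining and escape analysis results. The compiler has optimized the direct conversion function and turned the string to bytes conversion into a zero-copy operation.

In the benchmark from the previous section, however, bts = fn(s) (with fn being toRawBytes) stores the result in a package-level variable, so ([]byte)(s) escapes to the heap. That costs one memory allocation per call, and performance is poor.

So now it is clear: Go 1.22 did optimize direct conversion, but the optimization relies on the compiler's inlining and escape analysis, and not every scenario can be reduced to zero copy.

Who is going to keep this optimization in mind while writing code, let alone judge accurately whether a value escapes? So for now we will probably keep using the other three approaches when this conversion needs to be fast.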
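The difference between the escaping and non-escaping cases can also be observed outside of go test. The sketch below is only an illustration under stated assumptions: it uses testing.AllocsPerRun from the standard library, and the names input, sink, countL and storeGlobal are made up for this example.

```go
package main

import (
	"fmt"
	"testing"
)

var input = "hello, world" // package-level variable, not a constant
var sink []byte            // storing here forces the converted slice to escape

// countL converts s and only reads the result locally, so the slice
// never escapes the function.
func countL(s string) int {
	b := []byte(s)
	n := 0
	for _, c := range b {
		if c == 'l' {
			n++
		}
	}
	return n
}

// storeGlobal converts s and publishes the result in a global,
// so the slice escapes to the heap.
func storeGlobal(s string) {
	sink = []byte(s)
}

// allocsLocal reports the average allocations per call of countL.
func allocsLocal() float64 {
	return testing.AllocsPerRun(1000, func() { _ = countL(input) })
}

// allocsEscaped reports the average allocations per call of storeGlobal.
func allocsEscaped() float64 {
	return testing.AllocsPerRun(1000, func() { storeGlobal(input) })
}

func main() {
	fmt.Println("local use:      ", allocsLocal(), "allocs/op")
	fmt.Println("stored globally:", allocsEscaped(), "allocs/op")
}
```

On a recent toolchain I would expect the first line to report no allocation and the second to report one, which matches the 0 allocs/op and 1 allocs/op figures in the two sets of benchmarks. Note that for a string this short, older Go versions can also avoid the heap in the local case by copying into a small stack buffer, so zero allocations alone does not prove zero copy.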

Go 1.23 appears to optimize this further; see this CL: cmd/compile: restore zero-copy string->[]byte optimization [3]

Problems with the k8s implementation

Earlier we left a question open: what is the cap of the []byte returned by toK8sBytes?

func toK8sBytes(s string) []byte {
 return *(*[]byte)(unsafe.Pointer(&s))
}

len is well defined, since it maps onto the len field of the string, but what about cap? A string has no cap field.

We can check with the following code:

func Test_toK8sBytes(t *testing.T) {
 a := *(*[3]int64)(unsafe.Pointer(&s))
 fmt.Printf("%d, %d, %d\n", a[0], a[1], a[2])

 b := *(*[]byte)(unsafe.Pointer(&s))
 fmt.Printf("%d, %d, %d\n", unsafe.SliceData(b), len(b), cap(b))
}

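First we forcibly read three fields. The first should be the pointer to the string's underlying data and the second its length, but what is the third? Then we reinterpret the string as a byte slice in the same way and print the slice's data pointer, length and capacity.

The output looks like this (it may differ from run to run):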

4375580047, 12, 4375914624
4375580047, 12, 4375914624

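The two lines match. The first value is the data pointer and the second is the length, 12. The third is not a real field at all: it is whatever happens to sit in memory right after the string header, so it is effectively arbitrary and not the capacity 12.

So a slice obtained this way has an undefined capacity, which can cause trouble, for example when appending to the slice.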
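For contrast, here is a sketch of what happens when the capacity is pinned to the length, as unsafe.Slice does. The function name boundedBytes is hypothetical; it is the same construction as the new unsafe approach above. With cap equal to len, append has to allocate a fresh backing array, so the string's memory is only ever read.

```go
package main

import (
	"fmt"
	"unsafe"
)

// boundedBytes returns a zero-copy view of s whose capacity equals its
// length. Hypothetical name, for illustration only.
func boundedBytes(s string) []byte {
	if len(s) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	s := "hello, world"
	b := boundedBytes(s)
	fmt.Println(len(b), cap(b)) // 12 12

	// cap == len, so append copies into newly allocated memory
	// instead of writing past the end of the string's data.
	grown := append(b, '!')
	fmt.Println(string(grown)) // hello, world!
	fmt.Println(s)             // hello, world
}
```

Writing to b itself (for example b[0] = 'H') is still not allowed, because b continues to alias the string's data.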

1. If the resulting slice has such a large capacity, can we append to it freely?

 b := *(*[]byte)(unsafe.Pointer(&s))
 fmt.Printf("%d, %d, %d\n", unsafe.SliceData(b), len(b), cap(b))

 b = append(b, '!')

Running the test above crashes the program with a fatal error:

unexpected fault address 0x105020dfb
fatal error: fault
[signal SIGBUS: bus error code=0x1 addr=0x105020dfb pc=0x10501ee98]

2. If we modify the returned bytes, does the original string that shares the underlying data change as well?

 b := *(*[]byte)(unsafe.Pointer(&s))
 fmt.Printf("%d, %d, %d\n", unsafe.SliceData(b), len(b), cap(b))
 b[0] = 'H'

Does running this change the value of the string s? No. This code crashes as well:

unexpected fault address 0x104f1cdcf
fatal error: fault
[signal SIGBUS: bus error code=0x1 addr=0x104f1cdcf pc=0x104f1ae74]

3. If we modify the original bytes, does the returned string change too? We know that strings are supposed to be immutable, so what is the answer? The test code:

 c := *(*string)(unsafe.Pointer(&bts))
 fmt.Printf("%s\n", c)
 bts[0] = 'H'
 fmt.Printf("%s\n", c)

The original bytes bts change; does the returned string c change? The code above prints the same string before and after the modification:

hello, world
Hello, world

Ha, the string has become "mutable" too.
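The same effect can be put side by side with a copying conversion. This is a minimal sketch, with the hypothetical names aliasString and mutateAfterConvert, that converts one buffer both ways and then modifies the buffer:

```go
package main

import (
	"fmt"
	"unsafe"
)

// aliasString is a zero-copy conversion: the result shares memory with b.
// Hypothetical name, for illustration only.
func aliasString(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// mutateAfterConvert converts the same buffer with and without copying,
// then changes the buffer's first byte.
func mutateAfterConvert() (aliased, copied string) {
	buf := []byte("hello, world")
	aliased = aliasString(buf) // shares buf's memory
	copied = string(buf)       // independent copy
	buf[0] = 'H'
	return aliased, copied
}

func main() {
	a, c := mutateAfterConvert()
	fmt.Println(a) // Hello, world
	fmt.Println(c) // hello, world
}
```

Only the copied string keeps the immutability that Go code normally takes for granted, for instance when a string is used as a map key.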

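Summary

In Go 1.22, conversion between string and bytes is optimized in some scenarios (when the result does not escape): it becomes zero-copy and performs very well. But not every scenario gets this optimization, so we can wait a few more releases until it is complete before replacing the legacy conversion approaches.

When converting between string and bytes, we have to know whether the bytes may be modified, to avoid surprises. If that cannot be guaranteed, just use direct conversion: safety first.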

References

[1] issue: https://github.com/kubernetes/kubernetes/issues/124656

[2] Keeping up with the times: the efficient conversion has changed again in Go 1.20: https://colobu.com/2022/09/06/string-byte-convertion/

[3] cmd/compile: restore zero-copy string->[]byte optimization: https://github.com/golang/go/commit/925d2fb36c8e4c9c0e6e240a1621db36c34e5d31

Source: https://mp.weixin.qq.com/s/vXQcbZV5MI0FrYMG4hsn6w