GO: sync-Pool 的實現與演進

一般 sync.Pool 用作小對像池，比如前公司同事，在 thrift golang lib 增加了 sync.Pool 實現 []byte 等對象的複用。網上也有很多 objectPool 的輪子，但總體實現都不如 sync.Pool 高效。

基本原理與演進初探

想象一下如果我們自己實現，該怎麼做呢？用一個定長的 channel 保存對象，拿到了就用，拿不到就 new 創建一個，僞代碼大致如下：

type ObjectPool struct {
    ch chan {}interface
    newFunc func() {}interface
}

func (o *ObjectPool) Get() {}interface {
    select {
        v := <-o.ch:
          return v
        default:
    }
    return o.newFunc()
}

func (o *ObjectPool) Put(v {}interface) {
    select {
        o.ch <- v:
        default:
    }
}

代碼很簡潔，利用 select default 語法實現無阻塞操作。這裏最大的問題就是 channel 也是有代價的，一把大鎖讓性能會變得很低，參考我之前的關 dpvs 性能優化。那怎麼優化呢？多核 cpu 高併發編程，就是要每個 cpu 擁有自己的本地數據，這樣就避免了鎖爭用的開銷。而事實上 sync.Pool 也是這麼做的。

看了下提交記錄，從增加該功能後實現的大方現基本沒變：

每個 P (邏輯併發模型，參考 GMP) 擁有本地緩存隊列，如果本地獲取不到對象，再從其它 P 去偷一個，其它 P 也沒的話，調 new factory 創建新的返回。
Pool 裏的對象不是永生的，老的實現，對象如果僅由 Pool 引用，那麼會在下次 GC 之間被銷燬。但是最新優化 22950 裏，爲了優化 GC 後 Pool 爲空導致的冷啓動性能抖動，增加了 victim cache, 用來保存上一次 GC 本應被銷燬的對象，也就是說，對象至少存活兩次 GC 間隔。
性能優化，將本地隊列變成無鎖隊列 (單生產者，多消費者模型，嚴格來講不通用)，還有一些 fix bug...

數據結構及演進

type Pool struct {
    local     unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
    localSize uintptr        // size of the local array

    // New optionally specifies a function to generate
    // a value when Get would otherwise return nil.
    // It may not be changed concurrently with calls to Get.
    New func() interface{}
}

// Local per-P Pool appendix.
type poolLocal struct {
    private interface{}   // Can be used only by the respective P.
    shared  []interface{} // Can be used by any P.
    Mutex                 // Protects shared.
    pad     [128]byte     // Prevents false sharing.
}

對象是存儲在 poolLocal 裏的，private 字段表示最新生成的單個對象，只能由本地 P 訪問，shared 是一個 slice, 可以被任意 P 訪問，Mutex 用來保護 shared. pad 用來對齊，作用參考我之前的 [cpu cache] https://www.jianshu.com/p/dc4b5562aad2

再加頭看 Pool 結構體，New 是創建對象的工廠方法。local 是一個指向 []poolLocal 的指針 (準確說，是 slice 底層數組的首地址)，localSize 是 slice 的長度，由於 P 的個數是可以在線調整的，所以 localSize 運行時可能會變化。訪問時，P 的 id 對應 []poolLocal 下標索引。

type Pool struct {
    noCopy noCopy

    local     unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
    localSize uintptr        // size of the local array

    victim     unsafe.Pointer // local from previous cycle
    victimSize uintptr        // size of victims array

    // New optionally specifies a function to generate
    // a value when Get would otherwise return nil.
    // It may not be changed concurrently with calls to Get.
    New func() interface{}
}

// Local per-P Pool appendix.
type poolLocalInternal struct {
    private interface{} // Can be used only by the respective P.
    shared  poolChain   // Local P can pushHead/popHead; any P can popTail.
}

type poolLocal struct {
    poolLocalInternal

    // Prevents false sharing on widespread platforms with
    // 128 mod (cache line size) = 0 .
    pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}

Pool 增加了 noCopy 字段，Pool 默認創建後禁止拷貝，必須使用指針。noCopy 用來編繹時 go vet 檢查，靜態語言就是爽，編繹期幹了好多髒活累活。參考 [issue 8005]https://links.jianshu.com/go?to=https%3A%2F%2Fgolang.org%2Fissues%2F8005%23issuecomment-190753527, 裏面有很多討論，關於禁止拷貝如何實現。
增加 victim cache, 以減少 GC 後冷啓動導致的性能抖動。
poolLocal 拆成了兩個結構體，pad 實現也稍微變了下，爲了兼容更多硬件 cache line size. 另外最重要的優化，就是 shared slice 變成了無鎖隊列。

第一版本實現

對象 put

// Put adds x to the pool.
func (p *Pool) Put(x interface{}) {
    if raceenabled {
        // Under race detector the Pool degenerates into no-op.
        // It's conforming, simple and does not introduce excessive
        // happens-before edges between unrelated goroutines.
        return
    }
    if x == nil {
        return
    }
    l := p.pin()
    if l.private == nil {
        l.private = x
        x = nil
    }
    runtime_procUnpin()
    if x == nil {
        return
    }
    l.Lock()
    l.shared = append(l.shared, x)
    l.Unlock()
}

邏輯很簡單，先 pin 住，如果 private 字段爲空，將對象放到 private 字段，否則添加到 share 池裏。

對象 get

func (p *Pool) Get() interface{} {
    if raceenabled { // race 檢測時禁用 Pool 功能，後續去掉了這個
        if p.New != nil {
            return p.New()
        }
        return nil
    }
    l := p.pin() // pin 會禁止 P 被搶佔，並返回本地 P 對應的 poolLocal 信息。
    x := l.private
    l.private = nil
    runtime_procUnpin()
    if x != nil { // 如果 private 有了，就不用去看 share 直接返回就好
        return x
    }
    l.Lock() // 上鎖保護 share
    last := len(l.shared) - 1
    if last >= 0 {
        x = l.shared[last]
        l.shared = l.shared[:last]
    }
    l.Unlock()
    if x != nil { // 此時從 share 中拿到了對象，返回即可
        return x
    }
    return p.getSlow() // 走慢的邏輯：從其它 P 偷或是調用 new 工廠方法創建
}

func (p *Pool) getSlow() (x interface{}) {
    // See the comment in pin regarding ordering of the loads.
    size := atomic.LoadUintptr(&p.localSize) // load-acquire
    local := p.local                         // load-consume
    // Try to steal one element from other procs.
    pid := runtime_procPin()
    runtime_procUnpin()
    for i := 0; i < int(size); i++ { // 輪循從下一個 P 本地隊列偷數據
        l := indexLocal(local, (pid+i+1)%int(size))
        l.Lock()
        last := len(l.shared) - 1
        if last >= 0 {
            x = l.shared[last]
            l.shared = l.shared[:last]
            l.Unlock()
            break
        }
        l.Unlock()
    }

    if x == nil && p.New != nil { // 其它 P 中也沒偷到，New 一個
        x = p.New()
    }
    return x
}

從這裏，可以看到大體邏輯，和之前描述基本一致，那具體 pin 如何實現的呢？有什麼作用呢？接着看源碼

func sync·runtime_procPin() (p int) {
    M *mp;

    mp = m;
    // Disable preemption.
    mp->locks++;
    p = mp->p->id;
}

func sync·runtime_procUnpin() {
    m->locks--;
}

實際上 sync·runtime_procPin 和 sync·runtime_procUnpin 就是針對 M 進行加鎖，防止被 runtime 搶佔而己。Pin 除了上鎖，會返回 P 的 id

// pin pins the current goroutine to P, disables preemption and returns poolLocal pool for the P.
// Caller must call runtime_procUnpin() when done with the pool.
func (p *Pool) pin() *poolLocal {
    pid := runtime_procPin()
    // In pinSlow we store to localSize and then to local, here we load in opposite order.
    // Since we've disabled preemption, GC can not happen in between.
    // Thus here we must observe local at least as large localSize.
    // We can observe a newer/larger local, it is fine (we must observe its zero-initialized-ness).
    s := atomic.LoadUintptr(&p.localSize) // load-acquire 獲取 []poolLocal slice 長度
    l := p.local                          // load-consume 獲取 []poolLocal 首地址
    if uintptr(pid) < s { // 由於 P 的 id 就是 []poolLocal 下標
        return indexLocal(l, pid)
    }
    return p.pinSlow()
}

func (p *Pool) pinSlow() *poolLocal {
    // Retry under the mutex.
    // Can not lock the mutex while pinned.
    runtime_procUnpin()
    allPoolsMu.Lock()
    defer allPoolsMu.Unlock()
    pid := runtime_procPin()
    // poolCleanup won't be called while we are pinned.
    s := p.localSize
    l := p.local
    if uintptr(pid) < s { // pid 就是 slice 的下村，所以如果 pid 小於 s 就查找 slice
        return indexLocal(l, pid)
    }
    if p.local == nil { // 第一次使用，把 Pool 添加到全局 allPools 
        allPools = append(allPools, p)
    }
    // If GOMAXPROCS changes between GCs, we re-allocate the array and lose the old one. 走擴容邏輯
    size := runtime.GOMAXPROCS(0)
    local := make([]poolLocal, size)
    atomic.StorePointer((*unsafe.Pointer)(&p.local), unsafe.Pointer(&local[0])) // store-release
    atomic.StoreUintptr(&p.localSize, uintptr(size))                            // store-release
    return &local[pid]
}
    // l 是指針地地，做類型轉換，然後返回下標 i 的 poolLocal
func indexLocal(l unsafe.Pointer, i int) *poolLocal {
    return &(*[1000000]poolLocal)(l)[i]
}

pin 的作用將當前 goroutine 和 P 進行綁定，禁止搶佔，然後返回當前 P 所對應的 poolLocal 結構體。

localSize 是 []poolLocal slice 長度，由於是用 pid 做下標索引，所以如果 pid 小於 localSize，直接返回，否則走 pinSlow 邏輯
pinSlow 觸發有兩點：Pool 第一次被使用，GOMAXPROCS 運行時個改。這時可以看到 p.local 直接用一個新的 slice 覆蓋了，舊的對象池會被丟棄。

可以看到，整體實現不是很複雜，最新版本與第一版變化不太大。

對象 cleanup

func poolCleanup() {
    // This function is called with the world stopped, at the beginning of a garbage collection.
    // It must not allocate and probably should not call any runtime functions.
    // Defensively zero out everything, 2 reasons:
    // 1. To prevent false retention of whole Pools.
    // 2. If GC happens while a goroutine works with l.shared in Put/Get,
    //    it will retain whole Pool. So next cycle memory consumption would be doubled.
    for i, p := range allPools {
        allPools[i] = nil
        for i := 0; i < int(p.localSize); i++ {
            l := indexLocal(p.local, i)
            l.private = nil
            for j := range l.shared {
                l.shared[j] = nil
            }
            l.shared = nil
        }
    }
    allPools = []*Pool{}
}

var (
    allPoolsMu Mutex
    allPools   []*Pool
)

func init() {
    runtime_registerPoolCleanup(poolCleanup)
}

代碼很簡單，init 函數會將 poolCleanup 註冊到 runtime, 在 GC 開始，STW 後執行，遍歷 poolLocal 然後解引用即可。

indexLocal 性能優化

參見官方 commit（https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fgolang%2Fgo%2Fcommit%2Faf5c95117b26e22d942a12e15bdc8e25607f738c），修改如下

 func indexLocal(l unsafe.Pointer, i int) *poolLocal {
-       return &(*[1000000]poolLocal)(l)[i]
+       lp := unsafe.Pointer(uintptr(l) + uintptr(i)*unsafe.Sizeof(poolLocal{}))
+       return (*poolLocal)(lp)
 }

    Performance results on linux/amd64:

    name            old time/op  new time/op  delta
    Pool-4          19.1ns ± 2%  10.1ns ± 1%  -47.15%  (p=0.000 n=10+8)
    PoolOverflow-4  3.11µs ± 1%  2.10µs ± 2%  -32.66%  (p=0.000 n=10+10)

    Performance results on linux/386:

    name            old time/op  new time/op  delta
    Pool-4          20.0ns ± 2%  13.1ns ± 1%  -34.59%  (p=0.000 n=10+9)
    PoolOverflow-4  3.51µs ± 1%  2.49µs ± 0%  -28.99%  (p=0.000 n=10+8)

可以看到，修改後性能大幅提升，那麼這次性能優化的原理是什麼呢？？？原版本是轉化成 [1000000]poolLocal 定長數組後尋址，一個是直接根據 offset 定位到指定內存，然後做 poolLocal 類型轉換。先看下彙編實現

"".indexLocal STEXT nosplit size=20 args=0x18 locals=0x0
    0x0000 00000 (test.go:11)   TEXT    "".indexLocal(SB), NOSPLIT|ABIInternal, $0-24
    0x0000 00000 (test.go:11)   FUNCDATA    $0, gclocals·9fad110d66c97cf0b58d28cccea80b12(SB)
    0x0000 00000 (test.go:11)   FUNCDATA    $1, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
    0x0000 00000 (test.go:11)   FUNCDATA    $3, gclocals·9a26515dfaeddd28bcbc040f1199f48d(SB)
    0x0000 00000 (test.go:12)   PCDATA  $2, $0
    0x0000 00000 (test.go:12)   PCDATA  $0, $0
    0x0000 00000 (test.go:12)   MOVQ    "".i+16(SP), AX
    0x0005 00005 (test.go:12)   PCDATA  $2, $1
    0x0005 00005 (test.go:12)   PCDATA  $0, $1
    0x0005 00005 (test.go:12)   MOVQ    "".l+8(SP), CX
    0x000a 00010 (test.go:12)   PCDATA  $2, $2
    0x000a 00010 (test.go:12)   LEAQ    (CX)(AX*8), AX
    0x000e 00014 (test.go:13)   PCDATA  $2, $0
    0x000e 00014 (test.go:13)   PCDATA  $0, $2
    0x000e 00014 (test.go:13)   MOVQ    AX, "".~r2+24(SP)
    0x0013 00019 (test.go:13)   RET
    0x0000 48 8b 44 24 10 48 8b 4c 24 08 48 8d 04 c1 48 89  H.D$.H.L$.H...H.
    0x0010 44 24 18 c3                                      D$..
"".indexLocal2 STEXT nosplit size=58 args=0x18 locals=0x8
    0x0000 00000 (test.go:16)   TEXT    "".indexLocal2(SB), NOSPLIT|ABIInternal, $8-24
    0x0000 00000 (test.go:16)   SUBQ    $8, SP
    0x0004 00004 (test.go:16)   MOVQ    BP, (SP)
    0x0008 00008 (test.go:16)   LEAQ    (SP), BP
    0x000c 00012 (test.go:16)   FUNCDATA    $0, gclocals·9fad110d66c97cf0b58d28cccea80b12(SB)
    0x000c 00012 (test.go:16)   FUNCDATA    $1, gclocals·7d2d5fca80364273fb07d5820a76fef4(SB)
    0x000c 00012 (test.go:16)   FUNCDATA    $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
    0x000c 00012 (test.go:17)   PCDATA  $2, $1
    0x000c 00012 (test.go:17)   PCDATA  $0, $1
    0x000c 00012 (test.go:17)   MOVQ    "".l+16(SP), AX
    0x0011 00017 (test.go:17)   TESTB   AL, (AX)
    0x0013 00019 (test.go:17)   MOVQ    "".i+24(SP), CX
    0x0018 00024 (test.go:17)   CMPQ    CX, $1000000
    0x001f 00031 (test.go:17)   JCC 51
    0x0021 00033 (test.go:17)   LEAQ    (AX)(CX*8), AX
    0x0025 00037 (test.go:17)   PCDATA  $2, $0
    0x0025 00037 (test.go:17)   PCDATA  $0, $2
    0x0025 00037 (test.go:17)   MOVQ    AX, "".~r2+32(SP)
    0x002a 00042 (test.go:17)   MOVQ    (SP), BP
    0x002e 00046 (test.go:17)   ADDQ    $8, SP
    0x0032 00050 (test.go:17)   RET
    0x0033 00051 (test.go:17)   PCDATA  $0, $1
    0x0033 00051 (test.go:17)   CALL    runtime.panicindex(SB)
    0x0038 00056 (test.go:17)   UNDEF

indexLocal 是優化之後的，indexLocal2 是優化前的代碼。可以看多，老版本多了個 CMPQ, 也就是查看是否數組越界的檢查，多了層分支預測的邏輯。想不到吧，兩種轉換方式還有性能差距。

增加無鎖隊列

poolLocal.share 字段由 []interface{} 變成了 poolChain, 這個隊列專爲 Pool 而設計，單生產者多消費者，多消費者消費時使用 CAS 實現無鎖。

Currently, Pool stores each per-P shard's overflow in a slice
protected by a Mutex. In order to store to the overflow or steal from
another shard, a P must lock that shard's Mutex. This allows for
simple synchronization between Put and Get, but has unfortunate
consequences for clearing pools.

Pools are cleared during STW sweep termination, and hence rely on
pinning a goroutine to its P to synchronize between Get/Put and
clearing. This makes the Get/Put fast path extremely fast because it
can rely on quiescence-style coordination, which doesn't even require
atomic writes, much less locking.

The catch is that a goroutine cannot acquire a Mutex while pinned to
its P (as this could deadlock). Hence, it must drop the pin on the
slow path. But this means the slow path is not synchronized with
clearing. As a result,

1) It's difficult to reason about races between clearing and the slow
path. Furthermore, this reasoning often depends on unspecified nuances
of where preemption points can occur.

2) Clearing must zero out the pointer to every object in every Pool to
prevent a concurrent slow path from causing all objects to be
retained. Since this happens during STW, this has an O(# objects in
Pools) effect on STW time.

3) We can't implement a victim cache without making clearing even
slower.

This CL solves these problems by replacing the locked overflow slice
with a lock-free structure. This allows Gets and Puts to be pinned the
whole time they're manipulating the shards slice (Pool.local), which
eliminates the races between Get/Put and clearing. This, in turn,
eliminates the need to zero all object pointers, reducing clearing to
O(# of Pools) during STW.

In addition to significantly reducing STW impact, this also happens to
speed up the Get/Put fast-path and the slow path. It somewhat
increases the cost of PoolExpensiveNew, but we'll fix that in the next
CL.

name                 old time/op     new time/op     delta
Pool-12                 3.00ns ± 0%     2.21ns ±36%  -26.32%  (p=0.000 n=18+19)
PoolOverflow-12          600ns ± 1%      587ns ± 1%   -2.21%  (p=0.000 n=16+18)
PoolSTW-12              71.0µs ± 2%      5.6µs ± 3%  -92.15%  (p=0.000 n=20+20)
PoolExpensiveNew-12     3.14ms ± 5%     3.69ms ± 7%  +17.67%  (p=0.000 n=19+20)

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-12               70.7k ± 1%       5.5k ± 2%  -92.25%  (p=0.000 n=20+20)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-12               73.1k ± 2%       6.7k ± 4%  -90.86%  (p=0.000 n=18+19)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-12       0.38 ± 1%       0.39 ± 1%   +2.07%  (p=0.000 n=20+18)

name                 old New/op      new New/op      delta
PoolExpensiveNew-12       33.9 ± 6%       40.0 ± 6%  +17.97%  (p=0.000 n=19+20)

完整的看下 Get 代碼實現：

func (p *Pool) Get() interface{} {
    if race.Enabled {
        race.Disable()
    }
    l, pid := p.pin()
    x := l.private
    l.private = nil
    if x == nil {
        // Try to pop the head of the local shard. We prefer
        // the head over the tail for temporal locality of
        // reuse.
        x, _ = l.shared.popHead()
        if x == nil {
            x = p.getSlow(pid)
        }
    }
    runtime_procUnpin()
    if race.Enabled {
        race.Enable()
        if x != nil {
            race.Acquire(poolRaceAddr(x))
        }
    }
    if x == nil && p.New != nil {
        x = p.New()
    }
    return x
}

func (p *Pool) getSlow(pid int) interface{} {
    // See the comment in pin regarding ordering of the loads.
    size := atomic.LoadUintptr(&p.localSize) // load-acquire
    local := p.local                         // load-consume
    // Try to steal one element from other procs.
    for i := 0; i < int(size); i++ {
        l := indexLocal(local, (pid+i+1)%int(size))
        if x, _ := l.shared.popTail(); x != nil {
            return x
        }
    }
    return nil
}

具體無鎖隊列怎麼實現的，就不貼了，各種 CAS... 沒啥特別的。

增加 victim cache

爲什麼要增加 victim cache 看這個 22950(https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fgolang%2Fgo%2Fissues%2F22950)，說白了，就是要減少 GC 清除所有 Pool 後的冷啓動問題，讓分配對象更平滑。參見 commit(https://links.jianshu.com/go?to=https%3A%2F%2Fgithub.com%2Fgolang%2Fgo%2Fcommit%2F2dcbf8b3691e72d1b04e9376488cef3b6f93b286).

Currently, every Pool is cleared completely at the start of each GC.
This is a problem for heavy users of Pool because it causes an
allocation spike immediately after Pools are clear, which impacts both
throughput and latency.

This CL fixes this by introducing a victim cache mechanism. Instead of
clearing Pools, the victim cache is dropped and the primary cache is
moved to the victim cache. As a result, in steady-state, there are
(roughly) no new allocations, but if Pool usage drops, objects will
still be collected within two GCs (as opposed to one).

This victim cache approach also improves Pool's impact on GC dynamics.
The current approach causes all objects in Pools to be short lived.
However, if an application is in steady state and is just going to
repopulate its Pools, then these objects impact the live heap size *as
if* they were long lived. Since Pooled objects count as short lived
when computing the GC trigger and goal, but act as long lived objects
in the live heap, this causes GC to trigger too frequently. If Pooled
objects are a non-trivial portion of an application's heap, this
increases the CPU overhead of GC. The victim cache lets Pooled objects
affect the GC trigger and goal as long-lived objects.

This has no impact on Get/Put performance, but substantially reduces
the impact to the Pool user when a GC happens. PoolExpensiveNew
demonstrates this in the substantially reduction in the rate at which
the "New" function is called.

name                 old time/op     new time/op     delta
Pool-12                 2.21ns ±36%     2.00ns ± 0%     ~     (p=0.070 n=19+16)
PoolOverflow-12          587ns ± 1%      583ns ± 1%   -0.77%  (p=0.000 n=18+18)
PoolSTW-12              5.57µs ± 3%     4.52µs ± 4%  -18.82%  (p=0.000 n=20+19)
PoolExpensiveNew-12     3.69ms ± 7%     1.25ms ± 5%  -66.25%  (p=0.000 n=20+19)

name                 old p50-ns/STW  new p50-ns/STW  delta
PoolSTW-12               5.48k ± 2%      4.53k ± 2%  -17.32%  (p=0.000 n=20+20)

name                 old p95-ns/STW  new p95-ns/STW  delta
PoolSTW-12               6.69k ± 4%      5.13k ± 3%  -23.31%  (p=0.000 n=19+18)

name                 old GCs/op      new GCs/op      delta
PoolExpensiveNew-12       0.39 ± 1%       0.32 ± 2%  -17.95%  (p=0.000 n=18+20)

name                 old New/op      new New/op      delta
PoolExpensiveNew-12       40.0 ± 6%       12.4 ± 6%  -68.91%  (p=0.000 n=20+19)

重點在註釋的第一段，以前 Pool 的原理：如果對象在 GC 時只有 Pool 引用這個對象，那麼會在 GC 時被釋放掉。但是對於 Pool 重度用戶來講，GC 後會有大量的對象分配創建，影響吞吐和性能。這個 patch 就是爲了讓更平滑，變成了對象至少存活兩個 GC 區間。

func poolCleanup() {
    // This function is called with the world stopped, at the beginning of a garbage collection.
    // It must not allocate and probably should not call any runtime functions.

    // Because the world is stopped, no pool user can be in a
    // pinned section (in effect, this has all Ps pinned).

    // Drop victim caches from all pools.
    for _, p := range oldPools {
        p.victim = nil
        p.victimSize = 0
    }

    // Move primary cache to victim cache.
    for _, p := range allPools {
        p.victim = p.local
        p.victimSize = p.localSize
        p.local = nil
        p.localSize = 0
    }

    // The pools with non-empty primary caches now have non-empty
    // victim caches and no pools have primary caches.
    oldPools, allPools = allPools, nil
}

可以看下新版 poolCleanup 函數最後一行。使用時 Get 會在 slow path 邏輯裏調用 victim cache.

總結

衡量一個基礎組件，不僅要看他的性能，還要考濾穩定性，尤其是這種語言標準庫。

轉自：jianshu.com/p/2e08332481c5

鏈接：董澤潤

本文由 Readfog 進行 AMP 轉碼，版權歸原作者所有。
來源：https://mp.weixin.qq.com/s/6yO3xBsZm7y6-7O9IIdXDA