eBPF Talk: Inlining Optimization for bpf map helpers

From eBPF Talk: The Other Side of eBPF helpers, we know that eBPF helpers are implemented through the bpf_func_proto struct. Each helper has its own bpf_func_proto, and the same helper can even have a different bpf_func_proto for different bpf prog types.
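The relationship between helper IDs, program types, and bpf_func_proto can be sketched as a small userspace model. Every demo_* name below is hypothetical and heavily simplified; the real bpf_func_proto carries more than a function pointer, and the real lookup is done by per-program-type callbacks in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All demo_* names are hypothetical stand-ins, not kernel definitions. */
enum demo_prog_type { DEMO_PROG_SKB, DEMO_PROG_XDP };
enum demo_helper_id { DEMO_FUNC_map_lookup_elem = 1, DEMO_FUNC_event_output = 2 };

typedef uint64_t (*demo_helper_fn)(uint64_t r1, uint64_t r2);

/* Simplified analogue of bpf_func_proto: only the func pointer is modelled. */
struct demo_func_proto {
    demo_helper_fn func;
};

static uint64_t demo_map_lookup(uint64_t r1, uint64_t r2) { return r1 + r2; }
static uint64_t demo_skb_event_output(uint64_t r1, uint64_t r2) { (void)r1; (void)r2; return 1; }
static uint64_t demo_xdp_event_output(uint64_t r1, uint64_t r2) { (void)r1; (void)r2; return 2; }

static const struct demo_func_proto demo_map_lookup_proto = { demo_map_lookup };
static const struct demo_func_proto demo_skb_event_output_proto = { demo_skb_event_output };
static const struct demo_func_proto demo_xdp_event_output_proto = { demo_xdp_event_output };

/* Pick the proto for a helper ID; the answer may depend on the program type. */
const struct demo_func_proto *demo_get_func_proto(enum demo_prog_type type,
                                                  enum demo_helper_id id)
{
    switch (id) {
    case DEMO_FUNC_map_lookup_elem:
        /* Same proto regardless of program type. */
        return &demo_map_lookup_proto;
    case DEMO_FUNC_event_output:
        /* Same helper ID, different proto per program type. */
        return type == DEMO_PROG_XDP ? &demo_xdp_event_output_proto
                                     : &demo_skb_event_output_proto;
    }
    return NULL; /* helper not available */
}

/* Resolve the proto, then call through its func pointer. */
uint64_t demo_call_helper(enum demo_prog_type type, enum demo_helper_id id,
                          uint64_t r1, uint64_t r2)
{
    const struct demo_func_proto *proto = demo_get_func_proto(type, id);

    assert(proto != NULL && proto->func != NULL);
    return proto->func(r1, r2);
}
```

In this model, demo_call_helper(DEMO_PROG_SKB, DEMO_FUNC_event_output, 0, 0) and the same call with DEMO_PROG_XDP reach different functions, while the map lookup helper resolves to one shared proto.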

For the bpf map helpers, which function is actually called at runtime: the func in the bpf_func_proto of helpers such as bpf_map_lookup_elem, bpf_map_update_elem, and bpf_map_delete_elem, or the map_lookup_elem, map_update_elem, and map_delete_elem functions in the bpf_map_ops of the bpf map itself?

Inlining bpf map helpers

bpf: avoid retpoline for lookup/update/delete calls on maps[1], in the kernel since 4.18

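That commit rewrites the helper function ID of a bpf map helper call so that it points directly at the corresponding function in bpf_map_ops. This avoids the indirect call through map->ops inside the generic helper, which is very expensive when retpolines are in use.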
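The difference between the two call paths can be illustrated with a minimal userspace sketch. The demo_* types and functions are hypothetical stand-ins for bpf_map, bpf_map_ops, and an array map; they only model the shape of the calls, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_map;

/* Simplified analogue of bpf_map_ops (hypothetical). */
struct demo_map_ops {
    void *(*map_lookup_elem)(struct demo_map *map, const void *key);
};

/* Simplified analogue of bpf_map backed by a tiny array (hypothetical). */
struct demo_map {
    const struct demo_map_ops *ops;
    int values[4];
};

typedef void *(*demo_lookup_fn)(struct demo_map *map, const void *key);

/* Map-type specific lookup: bounds-checked array access. */
static void *demo_array_lookup(struct demo_map *map, const void *key)
{
    uint32_t idx = *(const uint32_t *)key;

    return idx < 4 ? (void *)&map->values[idx] : NULL;
}

static const struct demo_map_ops demo_array_ops = { demo_array_lookup };

/* Build a zero-initialised array map wired to its ops table. */
struct demo_map demo_make_array_map(void)
{
    struct demo_map map = { &demo_array_ops, { 0, 0, 0, 0 } };

    return map;
}

/* Path 1: generic helper. Every call pays for an indirect call via ops. */
void *demo_generic_lookup(struct demo_map *map, const void *key)
{
    return map->ops->map_lookup_elem(map, key);
}

/* Path 2: resolve the target once, when the map type is already known,
 * so that later calls can go straight to the map-specific function.
 */
demo_lookup_fn demo_resolve_lookup(const struct demo_map *map)
{
    assert(map->ops != NULL && map->ops->map_lookup_elem != NULL);
    return map->ops->map_lookup_elem;
}
```

Both paths return the same element. The second one mirrors what the verifier can do because it already knows which map a given call site uses.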

In the final stage of bpf_check(), the main function in verifier.c, after all checks have been completed, fixup_bpf_calls() is called to fix up the BPF_CALL instructions.

In that commit, the inlining is implemented as follows:

// https://github.com/torvalds/linux/commit/09772d92cd5ad998b0d5f6f46cd1658f8cb698cf

/* fixup insn->imm field of bpf_call instructions
 * and inline eligible helpers as explicit sequence of BPF instructions
 *
 * this function is called after eBPF program passed verification
 */
static int fixup_bpf_calls(struct bpf_verifier_env *env)
{
    // ...

        /* BPF_EMIT_CALL() assumptions in some of the map_gen_lookup
         * and other inlining handlers are currently limited to 64 bit
         * only.
         */
        if (prog->jit_requested && BITS_PER_LONG == 64 &&
            (insn->imm == BPF_FUNC_map_lookup_elem ||
             insn->imm == BPF_FUNC_map_update_elem ||
             insn->imm == BPF_FUNC_map_delete_elem)) {
            aux = &env->insn_aux_data[i + delta];
            if (bpf_map_ptr_poisoned(aux))
                goto patch_call_imm;

            map_ptr = BPF_MAP_PTR(aux->map_state);
            ops = map_ptr->ops;

            // ...

            switch (insn->imm) {
            case BPF_FUNC_map_lookup_elem:
                insn->imm = BPF_CAST_CALL(ops->map_lookup_elem) -
                        __bpf_call_base;
                continue;
            case BPF_FUNC_map_update_elem:
                insn->imm = BPF_CAST_CALL(ops->map_update_elem) -
                        __bpf_call_base;
                continue;
            case BPF_FUNC_map_delete_elem:
                insn->imm = BPF_CAST_CALL(ops->map_delete_elem) -
                        __bpf_call_base;
                continue;
            }

            goto patch_call_imm;
        }

    // ...
}

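What the snippet above does: it converts the imm value of the BPF_CALL instruction into the offset of the target function relative to __bpf_call_base, so that the JIT or the interpreter can use it later.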
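The offset encoding can be tried out in userspace with a small sketch. The demo_* functions are hypothetical: demo_call_base plays the role of __bpf_call_base and demo_map_lookup plays the role of a map's lookup function. The sketch assumes a platform where function pointers convert to intptr_t and back, and where both functions sit within 2 GiB of each other.

```c
#include <assert.h>
#include <stdint.h>

/* Helpers are modelled as taking five 64-bit arguments (R1..R5). */
typedef uint64_t (*demo_call_fn)(uint64_t r1, uint64_t r2, uint64_t r3,
                                 uint64_t r4, uint64_t r5);

/* Hypothetical stand-in for __bpf_call_base: only its address matters. */
uint64_t demo_call_base(uint64_t r1, uint64_t r2, uint64_t r3,
                        uint64_t r4, uint64_t r5)
{
    (void)r1; (void)r2; (void)r3; (void)r4; (void)r5;
    return 0;
}

/* Hypothetical stand-in for ops->map_lookup_elem. */
uint64_t demo_map_lookup(uint64_t r1, uint64_t r2, uint64_t r3,
                         uint64_t r4, uint64_t r5)
{
    (void)r3; (void)r4; (void)r5;
    return r1 + r2;
}

/* "Verifier" side: store the target as a signed 32-bit offset from the base. */
int32_t demo_encode_imm(demo_call_fn target)
{
    intptr_t off = (intptr_t)target - (intptr_t)demo_call_base;

    /* imm is only 32 bits wide, so the target must be in range. */
    assert(off >= INT32_MIN && off <= INT32_MAX);
    return (int32_t)off;
}

/* "Interpreter/JIT" side: add the base back to recover the target. */
demo_call_fn demo_decode_imm(int32_t imm)
{
    return (demo_call_fn)((intptr_t)demo_call_base + imm);
}
```

Decoding the encoded imm yields the original function pointer, which can then be called with the five argument registers, for example demo_decode_imm(imm)(40, 2, 0, 0, 0).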

JIT function calls

Unlike interpreting bpf insns one by one, the JIT translates each BPF_CALL instruction into a machine-level function call at JIT time.

P.S. The code below is based on a fairly recent version of the kernel repository on the bpf-next branch.

// ${KERNEL}/arch/x86/net/bpf_jit_comp.c

static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image, u8 *rw_image,
          int oldproglen, struct jit_context *ctx, bool jmp_padding)
{
    // ...

        case BPF_JMP | BPF_CALL:
            func = (u8 *) __bpf_call_base + imm32;
            if (tail_call_reachable) {
                /* mov rax, qword ptr [rbp - rounded_stack_depth - 8] */
                EMIT3_off32(0x48, 0x8B, 0x85,
                        -round_up(bpf_prog->aux->stack_depth, 8) - 8);
                if (!imm32 || emit_call(&prog, func, image + addrs[i - 1] + 7))
                    return -EINVAL;
            } else {
                if (!imm32 || emit_call(&prog, func, image + addrs[i - 1]))
                    return -EINVAL;
            }
            break;

    // ...
}

static int emit_patch(u8 **pprog, void *func, void *ip, u8 opcode)
{
    u8 *prog = *pprog;
    s64 offset;

    offset = func - (ip + X86_PATCH_SIZE);
    if (!is_simm32(offset)) {
        pr_err("Target call %p is out of range\n", func);
        return -ERANGE;
    }
    EMIT1_off32(opcode, offset);
    *pprog = prog;
    return 0;
}

static int emit_call(u8 **pprog, void *func, void *ip)
{
    return emit_patch(pprog, func, ip, 0xE8); // 0xE8 is x86 call opcode
}

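What the snippet above does: it adds __bpf_call_base to the imm value of the BPF_CALL instruction to get the address of the target function, and then emit_call() encodes that address as an x86 call instruction whose operand is the 32-bit offset from the end of the call instruction to the target.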
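The offset arithmetic in emit_patch() can be reproduced in a standalone sketch. demo_emit_call and demo_call_target are hypothetical userspace functions working on plain integers instead of real code addresses; DEMO_CALL_INSN_SIZE mirrors the 5 bytes of an x86 call rel32 (one opcode byte plus a 4-byte little-endian displacement).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_CALL_INSN_SIZE 5 /* 0xE8 opcode + 32-bit relative offset */

/* Encode "call func" as if the instruction were placed at address ip.
 * Returns the number of bytes written, or -1 if func is out of range.
 */
int demo_emit_call(uint8_t *buf, uint64_t ip, uint64_t func)
{
    /* rel32 is relative to the address of the NEXT instruction. */
    int64_t offset = (int64_t)(func - (ip + DEMO_CALL_INSN_SIZE));
    uint32_t rel;

    assert(buf);
    if (offset < INT32_MIN || offset > INT32_MAX)
        return -1;

    rel = (uint32_t)(int32_t)offset;
    buf[0] = 0xE8;                  /* x86 call opcode */
    buf[1] = (uint8_t)(rel & 0xff); /* little-endian rel32 */
    buf[2] = (uint8_t)((rel >> 8) & 0xff);
    buf[3] = (uint8_t)((rel >> 16) & 0xff);
    buf[4] = (uint8_t)((rel >> 24) & 0xff);
    return DEMO_CALL_INSN_SIZE;
}

/* Decode the target of a call instruction located at address ip. */
uint64_t demo_call_target(const uint8_t *buf, uint64_t ip)
{
    uint32_t rel = (uint32_t)buf[1] | ((uint32_t)buf[2] << 8) |
                   ((uint32_t)buf[3] << 16) | ((uint32_t)buf[4] << 24);

    return ip + DEMO_CALL_INSN_SIZE + (uint64_t)(int64_t)(int32_t)rel;
}
```

For example, a call placed at 0x1000 that targets 0x2000 has the offset 0x2000 - 0x1005 = 0xFFB, so the emitted bytes are E8 FB 0F 00 00. A target more than 2 GiB away is rejected, which corresponds to the -ERANGE path in emit_patch().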

Summary

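From the analysis above, we can see that:

At compile time, a helper call is compiled into a BPF_CALL instruction whose imm holds the helper function ID.
During verification, the imm value of the BPF_CALL instruction is converted into an offset relative to __bpf_call_base. For the map lookup/update/delete helpers, when JIT is requested on a 64-bit kernel, that offset points directly at the function in the map's bpf_map_ops.
At JIT time, __bpf_call_base is added to the imm value of the BPF_CALL instruction to get the address of the target function, and emit_call() then encodes that address as an x86 call instruction.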

References

[1] bpf: avoid retpoline for lookup/update/delete calls on maps: https://github.com/torvalds/linux/commit/09772d92cd5ad998b0d5f6f46cd1658f8cb698cf

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/POiH2oWwrVPMED-kNdnbYQ
