Understanding kube-scheduler in Depth Through 6 Diagrams
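What kube-scheduler does
As the name suggests, it is responsible for scheduling Pods onto Nodes.
Pod creation flow:
Note: all components interact only with the Apiserver, and the Apiserver then writes the updates to etcd.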
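- The user sends a create command (create/apply) to the Kubernetes API server.
- The Apiserver receives the manifest, validates it, and stores the configuration data in etcd.
- The Controller-manager watches the Apiserver for changes. When the submitted object is a workload such as a Deployment, the corresponding controller creates the Pod objects; a newly created Pod starts in the Pending phase.
- The Scheduler also watches the Apiserver and discovers new Pods that have not yet been assigned a node. Based on its filtering (predicate) and scoring (priority) policies, it selects the most suitable Node to run the new Pod.
- Once the Pod is scheduled to a Node, the kubelet on that Node follows the Pod spec: it pulls the images, starts the app, and runs the readiness probes.
- The kubelet reports the Ready status to the Apiserver, and the Apiserver writes it to etcd.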
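The Scheduler's part in the flow above (finding Pods that have no node yet) can be pictured with a minimal sketch. The `Pod` type and the `pendingUnscheduled` helper are hypothetical simplifications for illustration; the real scheduler works on `v1.Pod` objects delivered by informers and looks at `spec.nodeName`.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical stand-in for v1.Pod.
type Pod struct {
	Name     string
	NodeName string // empty until the scheduler binds the Pod to a Node
	Phase    string
}

// pendingUnscheduled returns the names of Pending Pods that have no Node assigned yet.
func pendingUnscheduled(pods []Pod) []string {
	var names []string
	for _, p := range pods {
		if p.Phase == "Pending" && p.NodeName == "" {
			names = append(names, p.Name)
		}
	}
	return names
}

func main() {
	pods := []Pod{
		{Name: "web-0", NodeName: "node-1", Phase: "Running"},
		{Name: "web-1", Phase: "Pending"},                     // waiting for the scheduler
		{Name: "web-2", NodeName: "node-2", Phase: "Pending"}, // bound, kubelet still starting it
	}
	fmt.Println(pendingUnscheduled(pods))
}
```

Only `web-1` is a scheduling candidate here: `web-2` is still Pending, but it already has a node, so the rest of its startup is the kubelet's job.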
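Scheduler scheduling flow
The Scheduler's job is to schedule Pods onto Nodes.
If you were asked to design this component, how would you design it so that it runs stably and efficiently?
1) It must detect in real time that new Pods are waiting to be scheduled.
2) When many Pods are waiting at the same time, how are they handled, how do we make sure none is missed, which Pod goes first, and what happens when scheduling fails? So it needs a queue with a retry mechanism.
3) Scheduling relies on real-time Node and Pod information to decide which Node suits a Pod. Calling the Apiserver for every scheduling decision is clearly inefficient, so the data has to be cached locally.
4) Node selection involves too many factors to anticipate them all, so extensibility must be designed in from the start.
5) Binding a Pod may depend on steps such as PVC binding, which can take a long time, so binding has to be asynchronous. The algorithm that decides which Node fits, however, has to run synchronously. Hence two cycles, a scheduling cycle and a binding cycle: scheduling cycles run serially, binding cycles run in parallel.
And just like that, the framework takes shape.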
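The five design points can be compressed into a toy loop: a queue of Pods, a local cache of node capacity, a serial scheduling cycle, and a binding cycle that runs in its own goroutine. This is a sketch under loud assumptions: `Pod`, `Cache`, `assume`, and `schedule` are hypothetical names, the "algorithm" is just "most free CPU", and none of it mirrors the real kube-scheduler types.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Pod and Cache are simplified, hypothetical stand-ins, not kube-scheduler types.
type Pod struct {
	Name string
	CPU  int
}

// Cache is the local copy of node state that scheduling decisions are made against.
type Cache struct {
	mu   sync.Mutex
	free map[string]int // node name -> free CPU
}

// assume picks the node with the most free CPU that fits the Pod and reserves
// the capacity immediately, so the next scheduling cycle sees the updated state.
func (c *Cache) assume(p Pod) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	names := make([]string, 0, len(c.free))
	for n := range c.free {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic tie-breaking
	best := ""
	for _, n := range names {
		if c.free[n] >= p.CPU && (best == "" || c.free[n] > c.free[best]) {
			best = n
		}
	}
	if best == "" {
		return "", false
	}
	c.free[best] -= p.CPU
	return best, true
}

// schedule drains the queue. The scheduling cycle (assume) runs serially;
// each binding cycle (bind) runs in its own goroutine.
func schedule(queue []Pod, cache *Cache, bind func(Pod, string) error) (map[string]string, []Pod) {
	bound := map[string]string{}
	var unschedulable []Pod
	var mu sync.Mutex
	var wg sync.WaitGroup
	for _, p := range queue {
		node, ok := cache.assume(p)
		if !ok {
			unschedulable = append(unschedulable, p) // would be retried later
			continue
		}
		wg.Add(1)
		go func(p Pod, node string) {
			defer wg.Done()
			if err := bind(p, node); err != nil {
				return // a real scheduler would undo the reservation and requeue
			}
			mu.Lock()
			bound[p.Name] = node
			mu.Unlock()
		}(p, node)
	}
	wg.Wait()
	return bound, unschedulable
}

func main() {
	cache := &Cache{free: map[string]int{"node-a": 4, "node-b": 2}}
	queue := []Pod{{Name: "p1", CPU: 2}, {Name: "p2", CPU: 2}, {Name: "p3", CPU: 3}}
	bound, failed := schedule(queue, cache, func(Pod, string) error { return nil })
	fmt.Println(bound, failed)
}
```

Reserving capacity inside `assume` before the bind finishes is what lets the next Pod be scheduled without waiting for a slow binding to complete.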
Source code call chain
The original diagram is on GitHub, so help yourself. It was drawn with draw.io and can be opened and edited further.
https://github.com/clay-wangzhi/draw/blob/main/k8s-scheduler.png
# Source locations for the 42 numbered steps, in order:
1 Locate the main entry function
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/scheduler.go#L30
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L81
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/scheduler.go#L31
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L134
2 Setup initialization
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L153
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L384
3, 16 Initialize the scheduler instance
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L413
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L363
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L65
4, 5 Initialize the snapshot instance
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L293
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/snapshot.go#L48
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/snapshot.go#L29
6, 7, 8, 9 Initialize the profiles and fwk instances
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L304
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/profile/profile.go#L49
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/profile/profile.go#L38
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L260
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L53
10, 11, 12 Initialize the podQueue instance
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L340
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L134
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L372
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L155
13, 14, 15 Initialize the schedulerCache instance
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L357
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/cache.go#L41
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/cache.go#L87
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/cache.go#L57
17, 18 Run the scheduler
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L159
https://github.com/kubernetes/kubernetes/blob/v1.31.0/cmd/kube-scheduler/app/server.go#L163
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L460
19 Run the SchedulingQueue
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L462
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L417
20, 21 Take a Pod from the queue for scheduling
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/scheduler.go#L470
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L65
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L944
Get the fwk
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L85
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L368
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35 Enter the scheduling cycle
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L110
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L138
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L148
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L400
Update the Snapshot
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L403
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/cache.go#L185
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L412
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/cache/snapshot.go#L173
Run PreFilterPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L463
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L698
Run FilterPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L507
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L582
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L616
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L649
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L973
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L861
Run PreScorePlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L435
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L754
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L777
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1052
Run ScorePlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L783
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1101
Run ReservePluginsReserve
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L208
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1359
Run PermitPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L230
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1443
36, 37, 38, 39, 40, 41 Enter the binding cycle
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L124
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L264
Run WaitOnPermit
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L277
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1503
Run PreBindPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L293
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1232
Run BindPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L309
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L967
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L977
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1275
Run PostBindPlugins
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L322
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/framework/runtime/framework.go#L1324
42 Mark the Pod as done scheduling so that it does not go back to the queue
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/schedule_one.go#L131
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L981
https://github.com/kubernetes/kubernetes/blob/v1.31.0/pkg/scheduler/internal/queue/scheduling_queue.go#L988
The code itself is too long to paste here. Use the call chain diagram to work through it, and ask GPT about anything you don't understand.
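The Scheduler Framework
The call chain diagram above clearly shows the extension points of the Framework's scheduling algorithm.
Which plugins each extension point contains, and which extension points each plugin can act on, are shown in the figure below: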
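To make the idea of extension points concrete, the following sketch wires two Filter plugins and one Score plugin into a tiny selection function. Everything here is an assumption for illustration: the interfaces are heavily simplified versions of what lives in `k8s.io/kubernetes/pkg/scheduler/framework`, and `fitsCPU`, `matchLabels`, `mostFree`, and `pick` are made-up names.

```go
package main

import "fmt"

// Node and Pod are hypothetical, simplified types.
type Node struct {
	Name    string
	FreeCPU int
	Labels  map[string]string
}

type Pod struct {
	Name     string
	CPU      int
	Selector map[string]string
}

// FilterPlugin rejects nodes that cannot run the Pod.
type FilterPlugin interface{ Filter(p Pod, n Node) bool }

// ScorePlugin ranks the nodes that passed filtering.
type ScorePlugin interface{ Score(p Pod, n Node) int }

type fitsCPU struct{}

func (fitsCPU) Filter(p Pod, n Node) bool { return n.FreeCPU >= p.CPU }

type matchLabels struct{}

func (matchLabels) Filter(p Pod, n Node) bool {
	for k, v := range p.Selector {
		if n.Labels[k] != v {
			return false
		}
	}
	return true
}

type mostFree struct{}

func (mostFree) Score(p Pod, n Node) int { return n.FreeCPU - p.CPU }

// pick runs every Filter plugin on each node, sums the Score plugins over
// the survivors, and returns the highest-scoring node.
func pick(p Pod, nodes []Node, filters []FilterPlugin, scorers []ScorePlugin) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		feasible := true
		for _, f := range filters {
			if !f.Filter(p, n) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		score := 0
		for _, s := range scorers {
			score += s.Score(p, n)
		}
		if !found || score > bestScore {
			best, bestScore, found = n.Name, score, true
		}
	}
	return best, found
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 8, Labels: map[string]string{"disk": "hdd"}},
		{Name: "node-b", FreeCPU: 4, Labels: map[string]string{"disk": "ssd"}},
		{Name: "node-c", FreeCPU: 6, Labels: map[string]string{"disk": "ssd"}},
	}
	pod := Pod{Name: "db", CPU: 2, Selector: map[string]string{"disk": "ssd"}}
	node, ok := pick(pod, nodes, []FilterPlugin{fitsCPU{}, matchLabels{}}, []ScorePlugin{mostFree{}})
	fmt.Println(node, ok)
}
```

Adding a new rule means adding one more value to the `filters` or `scorers` slice, which is the property the real framework gets from its plugin registry and profile configuration.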
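How to extend it
To implement our own plugin, we must register it with the scheduling framework, enable it in the configuration, and implement the extension point interfaces.
1) Register the plugin with the scheduling framework and implement the extension point interface, as follows:
Example of an out-of-tree extension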
main.go
package main

import (
	"os"

	"k8s.io/component-base/cli"
	_ "k8s.io/component-base/metrics/prometheus/clientgo" // for rest client metric registration
	_ "k8s.io/component-base/metrics/prometheus/version"  // for version metric registration
	"k8s.io/kubernetes/cmd/kube-scheduler/app"

	"xxx/pkg/example"

	// Ensure scheme package is initialized.
	_ "sigs.k8s.io/scheduler-plugins/apis/config/scheme"
)

func main() {
	// Register custom plugins to the scheduler framework.
	// Later they can consist of scheduler profile(s) and hence
	// used by various kinds of workloads.
	command := app.NewSchedulerCommand(
		// The factory must be the constructor exported by the example package.
		app.WithPlugin(example.Name, example.NewExamplePlugin),
	)
	code := cli.Run(command)
	os.Exit(code)
}
example.go
package example

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

// Name is the plugin name used for registration and in the scheduler configuration.
const Name = "example"

// Compile-time check that ExamplePlugin implements framework.FilterPlugin.
var _ framework.FilterPlugin = &ExamplePlugin{}

type ExamplePlugin struct{}

// NewExamplePlugin initializes a new plugin and returns it.
// Recent releases, including the v1.31.0 code linked above, pass a
// context.Context as the first factory argument; older releases omit it.
func NewExamplePlugin(_ context.Context, _ runtime.Object, _ framework.Handle) (framework.Plugin, error) {
	return &ExamplePlugin{}, nil
}

// Filter only logs the node's allocatable CPU and memory and accepts every node.
func (e *ExamplePlugin) Filter(ctx context.Context, state *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	cpu := nodeInfo.Allocatable.MilliCPU
	memory := nodeInfo.Allocatable.Memory
	klog.InfoS("tanjunchen-scheduler Filter", "pod_name", pod.Name, "current node", nodeInfo.Node().Name, "cpu", cpu, "memory", memory)
	return framework.NewStatus(framework.Success, "")
}

// Name returns the plugin name.
func (e *ExamplePlugin) Name() string {
	return Name
}
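2) Configure and enable the plugin
Write the kube-scheduler startup configuration file, example-cm.yaml.
A single KubeSchedulerConfiguration (ksc) can describe multiple profiles, and each profile behaves as an independent scheduler within the same kube-scheduler process.
A Pod selects a scheduler by setting schedulerName to the matching profile name. If none is specified, default-scheduler is used.
This configuration is consumed by kube-scheduler, not by kube-apiserver,
so neither
kubectl api-resources
nor kubectl get KubeSchedulerConfiguration
will find this resource.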
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-scheduler-config
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    leaderElection:
      leaderElect: false
    clientConnection:
      acceptContentTypes: ""
      burst: 100
      contentType: application/vnd.kubernetes.protobuf
      qps: 100
    profiles:
      - schedulerName: example-scheduler
        plugins:
          filter:
            enabled:
              - name: "example"
3) Use the custom scheduler
Set schedulerName: example-scheduler in the Pod YAML
to use the custom Scheduler.
RBAC authorization and the deployment manifest for the custom Scheduler: omitted
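How a Pod's schedulerName ends up selecting a profile can be sketched as a simple map lookup. This is an illustration only: `Profile` and `profileFor` are hypothetical names, and the empty-name fallback is written inline here to mirror the "default-scheduler if not specified" rule described earlier.

```go
package main

import (
	"errors"
	"fmt"
)

const defaultSchedulerName = "default-scheduler"

// Profile is a hypothetical, simplified view of one entry under `profiles`.
type Profile struct {
	SchedulerName string
	FilterPlugins []string
}

// profileFor returns the profile responsible for a Pod, based on the Pod's
// schedulerName. An empty name falls back to the default scheduler name.
func profileFor(profiles map[string]Profile, podSchedulerName string) (Profile, error) {
	if podSchedulerName == "" {
		podSchedulerName = defaultSchedulerName
	}
	p, ok := profiles[podSchedulerName]
	if !ok {
		return Profile{}, errors.New("no profile for scheduler name " + podSchedulerName)
	}
	return p, nil
}

func main() {
	profiles := map[string]Profile{
		"default-scheduler": {SchedulerName: "default-scheduler"},
		"example-scheduler": {SchedulerName: "example-scheduler", FilterPlugins: []string{"example"}},
	}
	for _, name := range []string{"example-scheduler", "", "missing-scheduler"} {
		p, err := profileFor(profiles, name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q -> profile %s, extra filters %v\n", name, p.SchedulerName, p.FilterPlugins)
	}
}
```

A Pod whose schedulerName matches no profile is not picked up by this scheduler, so a typo in the name typically shows up as a Pod that stays Pending.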
enjoy~
References:
- Understanding the Kubernetes Scheduler Framework in Depth (Part 2): https://tanjunchen.github.io/post/2024-04-07-scheduler-framework-02/
- K8s Scheduling Framework Design and scheduler plugins Development and Deployment Example (2024): https://arthurchiao.art/blog/k8s-scheduling-plugins-zh/
Source: https://mp.weixin.qq.com/s/2elOZD0yaBf-WvMCSD5zHQ