Renovate: Handling Postgres Schema Migrations

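Last October, while reviewing database migration code, I kept going back through more than a dozen existing migration files, struggling to work out what the database schema currently looked like. That is when the idea of building a schema migration tool took hold. The mainstream migration tools at the time, whether you wrote raw SQL or a DSL in some language, all required developers to take the state left by the previous migration as the baseline and write changes against it. For example, to add a created_at column to an existing todos table, I had to create a new migration file and write something like this: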

ALTER TABLE todos ADD COLUMN created_at timestamptz;

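After a database has been maintained for several years, there can be dozens or even hundreds of these migration scripts, which makes them hard to read and maintain. More importantly, hand-writing migration scripts is counterintuitive: it is disconnected from the way we normally modify and update code.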

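So in October I started thinking about how to solve this problem. I looked through some existing open source projects and studied atlas (https://github.com/ariga/atlas), written in Go, in detail. It was the closest thing to the tool I wanted: you describe the current database schema, and it generates the migration script automatically. However, atlas's support for Postgres was not very good. The migration plans it generated were often destructive (for example, drop table followed by create table), which is simply unusable in production. In addition, atlas describes the schema in HCL, similar to Terraform, which drove me crazy: I had to learn a new syntax and build a mental mapping between SQL DDL and HCL before I could modify the schema comfortably.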

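After exploring open source projects without much to show for it, I began thinking about how to solve the problem myself. I had two hard requirements: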

Use SQL to describe the schema, rather than inventing a new language
Avoid destructive updates in the generated migration plan wherever possible

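So I gave the project a name, Renovate, and started writing an RFC (see: https://github.com/tyrchen/renovate/blob/master/rfcs/0001-sql-migration.md) to sort out my thinking and sketch my own solution. This is the full user flow I wrote down at the time: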

# dump all the schemas into a folder
$ renovate schema init --url postgres://user@localhost:5432/hello
Database schema has successfully dumped into ./hello.
# if schema already exists, before modifying it, it is always a good practice to fetch the latest schema. Fetch will fail if current folder is not under git or it is not up to date with remote repository.
$ renovate schema fetch
# do whatever schema changes you want
# then run plan to see what changes will be applied. When redirected to a file, it will just print all the SQL statements for the migration.
$ renovate schema plan
Table auth.users changed:
create table auth.users(
    id uuid primary key,
    name text not null,
    email text not null,
    password text not null,
-   created_at timestamptz not null,
+   created_at timestamptz not null default now(),
+   updated_at timestamptz not null
);
The following SQLs will be applied:
    alter table auth.users add column updated_at timestamptz not null;
    alter table auth.users alter column created_at set default now();
# then apply the changes
$ renovate apply
Your repo is dirty. Please commit the changes before applying.
$ git commit -a -m "add updated_at column and set default value for created_at"
# now you can directly apply
# apply can use -p to run a previously saved plan or manually edited plan
# the remote schema and the plan being executed will be saved in _meta/plans/202109301022/.
$ renovate apply
The following SQLs will be applied:
    alter table auth.users add column updated_at timestamptz not null;
    alter table auth.users alter column created_at set default now();
Continue (y/n)? y
Successfully applied migration to postgres://user@localhost:5432/hello.
Your repo is updated with the latest schema. See `git diff HEAD~1` for details.

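My rough idea was this: users create a db schema repo and manage schema changes with git. They do not need to think about schema migration at all; they simply edit the existing schema. When renovate schema plan runs, Renovate fetches the remote schema via pg_dump, parses both the local and the remote SQL into ASTs, and compares the two at the AST level to find the differences.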
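The comparison step can be pictured with a minimal sketch. It assumes each side has already been parsed into a map from object name to some comparable node; the `Change` enum and the `compare` function are hypothetical names for illustration, not Renovate's actual types.

```rust
use std::collections::BTreeMap;

/// How one named object differs between the local repo and the remote database.
/// (Hypothetical type for this sketch.)
#[derive(Debug, PartialEq, Eq)]
pub enum Change {
    /// Exists locally only, so it must be created remotely.
    Added(String),
    /// Exists remotely only, so it must be dropped.
    Removed(String),
    /// Exists on both sides, but the definitions differ.
    Altered(String),
}

/// Compare two snapshots keyed by object name.
/// `T` stands in for a parsed AST node; any `PartialEq` type works here.
pub fn compare<T: PartialEq>(
    local: &BTreeMap<String, T>,
    remote: &BTreeMap<String, T>,
) -> Vec<Change> {
    let mut changes = Vec::new();
    // Walk the local side: anything missing remotely is new,
    // anything present but unequal has been altered.
    for (name, node) in local {
        match remote.get(name) {
            None => changes.push(Change::Added(name.clone())),
            Some(r) if r != node => changes.push(Change::Altered(name.clone())),
            Some(_) => {}
        }
    }
    // Walk the remote side: anything missing locally was removed.
    for name in remote.keys() {
        if !local.contains_key(name) {
            changes.push(Change::Removed(name.clone()));
        }
    }
    changes
}
```

Equality on whole nodes only tells us that something changed. Producing a non-destructive plan requires descending into the node to see exactly what changed, which is where the per-level planner comes in.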

With that idea in place, the next step was to define the major data structures. For example, a schema in Postgres can be described like this:

pub struct Schema {
    pub types: BTreeMap<String, DataType>,
    pub tables: BTreeMap<String, Table>,
    pub views: BTreeMap<String, View>,
    pub functions: BTreeMap<String, Function>,
    pub triggers: BTreeMap<String, Trigger>,
}

A table can be described like this:

pub struct Table {
    pub columns: BTreeMap<String, Column>,
    pub constraints: BTreeMap<String, Constraint>,
    pub privileges: BTreeMap<String, Privilege>,
}

Data at every level needs to implement the Planner trait:

pub trait Planner {
    fn diff(&self, remote: &Self) -> Vec<Diff>;
    fn plan(&self, diff: &[Diff]) -> Vec<Plan>;
}
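To make the trait concrete, here is a minimal sketch of it implemented for a heavily simplified table. The `Diff` variants, the `Plan` alias, and the reduced `Table` (just a name plus a column map) are assumptions made for illustration; the sketch only detects added and dropped columns and ignores changed definitions.

```rust
use std::collections::BTreeMap;

/// A column-level difference (hypothetical, simplified).
#[derive(Debug, PartialEq, Eq)]
pub enum Diff {
    AddColumn { name: String, definition: String },
    DropColumn { name: String },
}

/// In this sketch a plan step is just one SQL statement.
pub type Plan = String;

pub trait Planner {
    fn diff(&self, remote: &Self) -> Vec<Diff>;
    fn plan(&self, diff: &[Diff]) -> Vec<Plan>;
}

/// A table reduced to its name and a map of column name -> column definition.
pub struct Table {
    pub name: String,
    pub columns: BTreeMap<String, String>,
}

impl Planner for Table {
    fn diff(&self, remote: &Self) -> Vec<Diff> {
        let mut diffs = Vec::new();
        // Columns that exist locally but not remotely must be added.
        for (name, definition) in &self.columns {
            if !remote.columns.contains_key(name) {
                diffs.push(Diff::AddColumn {
                    name: name.clone(),
                    definition: definition.clone(),
                });
            }
        }
        // Columns that exist remotely but not locally must be dropped.
        for name in remote.columns.keys() {
            if !self.columns.contains_key(name) {
                diffs.push(Diff::DropColumn { name: name.clone() });
            }
        }
        diffs
    }

    fn plan(&self, diff: &[Diff]) -> Vec<Plan> {
        // Each diff becomes an ALTER TABLE statement; the table itself
        // is never dropped and recreated.
        diff.iter()
            .map(|d| match d {
                Diff::AddColumn { name, definition } => {
                    format!("ALTER TABLE {} ADD COLUMN {} {}", self.name, name, definition)
                }
                Diff::DropColumn { name } => {
                    format!("ALTER TABLE {} DROP COLUMN {}", self.name, name)
                }
            })
            .collect()
    }
}
```

For a local todos table with columns id and created_at against a remote one with only id, the plan is the single statement ALTER TABLE todos ADD COLUMN created_at timestamptz, the same statement one would otherwise write by hand.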

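This way, we can walk down from the top-level schema, layer by layer, to a constraint on a column of a table, diff each level, and produce a migration plan. The overall architecture is shown below (I drew the diagram today, but the general idea has not changed):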

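With the approach settled, I started writing, on and off, some basic parsers for each data structure and then implementing its migration planner trait. At first I handled only the easier cases. For example, when a user modifies an index, we can drop the old index and create the new one, as shown here: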

#[test]
fn changed_index_should_generate_migration() {
    let sql1 = "CREATE INDEX foo ON bar (baz)";
    let sql2 = "CREATE INDEX foo ON bar (ooo)";
    let old: TableIndex = sql1.parse().unwrap();
    let new: TableIndex = sql2.parse().unwrap();
    let diff = old.diff(&new).unwrap().unwrap();
    let migrations = diff.plan().unwrap();
    assert_eq!(migrations[0], "DROP INDEX foo");
    assert_eq!(migrations[1], "CREATE INDEX foo ON bar USING btree (ooo)");
}
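The drop-and-recreate strategy exercised by this test can be sketched without a SQL parser. The `IndexDef` struct and its `plan_change` method below are hypothetical stand-ins for the parsed `TableIndex` and its planner, assuming the index has already been broken into name, table, access method, and columns.

```rust
/// A simplified index description (hypothetical; in the article the real
/// `TableIndex` is parsed from SQL text).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexDef {
    pub name: String,
    pub table: String,
    pub method: String,
    pub columns: Vec<String>,
}

impl IndexDef {
    /// Render the normalized CREATE INDEX statement for this definition.
    fn create_sql(&self) -> String {
        format!(
            "CREATE INDEX {} ON {} USING {} ({})",
            self.name,
            self.table,
            self.method,
            self.columns.join(", ")
        )
    }

    /// Statements that turn `self` (the old index) into `new`.
    /// Returns an empty plan when nothing changed.
    pub fn plan_change(&self, new: &IndexDef) -> Vec<String> {
        if self == new {
            return Vec::new();
        }
        // An index holds no data of its own, so dropping and
        // recreating it does not lose any rows.
        vec![format!("DROP INDEX {}", self.name), new.create_sql()]
    }
}
```

Dropping and recreating is acceptable for an index because it can always be rebuilt from the table. The same shortcut applied to a table would destroy data, which is why tables need the finer-grained handling described next.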

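After nearly two thousand lines of code written in fits and starts, I got stuck on table migration. The data structures and states involved are numerous and daunting. Many column-level changes require picking through AST details bit by bit, which is torturous. So I set the project aside.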

Last Thursday, our annual Hackathon at Tubi kicked off again. I had several projects I wanted to try:

Continue developing Renovate and push it toward being a usable product
Build a tool that generates UI from JSON
Build a serverless framework using pulumi + CloudFront function + CloudFront + lambda function (deno layer + deno code)

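After much deliberation I chose to keep working on Renovate, because I was not sure whether, left any longer, it would follow my other unfinished projects and be shelved forever.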

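So, over four days in total including the weekend, minus meetings, interviews, and shuttling my kid to after-school classes, I spent about 30 hours on the project and wrote more than 2,500 additional lines of code: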

That includes 57 unit tests and 1 CLI test (covering 5 CLI test cases), and the project has 73% coverage overall:

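The end result is already very close to what I picture a database migration tool to be. You can try it yourself at the https://github.com/tyrchen/renovate repository. I recorded a simple demo with asciinema (https://asciinema.org/a/N7Pd3gDPGFcpCddREJKAKTtbx); take a look if you can. Otherwise, here is a low-resolution gif: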

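In this demo, I first used pgcli to create a todo table in an empty neon db, then used renovate schema init to fetch the neon db's schema and create a local schema repo. Next I modified the database schema, adding columns, and used renovate schema plan and renovate schema apply to generate and execute the migration. Everything went as smooth as silk.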

Some takeaways

From 1 to 100

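Technically, the Renovate project is not much of a challenge: once the approach is settled, the rest is workload. The workload has two parts: 1) supporting AST diffs for the sprawling variety of SQL statements, and 2) hiding as much detail as possible to give users a good experience. Yet I often focus too much on going from 0 to 1, happily building PoC after PoC, and neglect going from 1 to 100. If not for this Hackathon, Renovate would nearly have become yet another of my PoCs. Over the past 4 days, I essentially fixed one detail and then moved on to the next, releasing nearly 20 unremarkable minor versions along the way. Those versions did nothing more than add support for a default constraint or fix the parsing of varchar(256)[], but it is exactly these trivial features that together make up Renovate's reasonably good user experience today.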

Treat trait design as part of architecture and design

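Traits are the soul of software development in Rust, and we should think about traits as early as architecture design. Beyond that, traits can also be introduced for local pieces of code during implementation (local design). While handling the plan for the whole db schema, I ran into a DRY problem: the data structures could be BTreeMap<_, T>, BTreeMap<_, BTreeMap<_, T>>, BTreeMap<_, BTreeSet<T>>, and so on. Their diffs have a similar structure. Writing nearly identical code for each shape would be costly to maintain, while using macros (macro_rules) would make the code painful to read and to refactor later. Here, a trait is the best solution: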

trait SchemaPlan {
    fn diff_altered(&self, remote: &Self, verbose: bool) -> Result<Vec<String>>;
    fn diff_added(&self, verbose: bool) -> Result<Vec<String>>;
    fn diff_removed(&self, verbose: bool) -> Result<Vec<String>>;
}
impl<T> SchemaPlan for T
where
    T: NodeItem + Clone + FromStr<Err = anyhow::Error> + PartialEq + Eq + 'static,
    NodeDiff<T>: MigrationPlanner<Migration = String> { ... }
impl<T> SchemaPlan for BTreeMap<String, T>
where
    T: NodeItem + Clone + FromStr<Err = anyhow::Error> + PartialEq + Eq + 'static,
    NodeDiff<T>: MigrationPlanner<Migration = String> { ... }
impl<T> SchemaPlan for BTreeSet<T>
where
    T: NodeItem + Clone + FromStr<Err = anyhow::Error> + PartialEq + Eq + Ord + Hash + 'static,
    NodeDiff<T>: MigrationPlanner<Migration = String> { ... }
fn schema_diff<K, T>(
    local: &BTreeMap<K, T>,
    remote: &BTreeMap<K, T>,
    verbose: bool,
) -> Result<Vec<String>>
where
    K: Hash + Eq + Ord,
    T: SchemaPlan,
{ ... }

With the trait in place, a single schema_diff does the work of several, and I do not have to worry about maintainability.
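A reduced, standard-library-only sketch shows why one schema_diff can serve several shapes. It differs from the signatures above in several ways that are assumptions of this sketch: no verbose flag, no Result, a concrete hypothetical leaf type `View` instead of the blanket impl over NodeItem, and `CREATE OR REPLACE VIEW` as a stand-in for a real per-object plan.

```rust
use std::collections::BTreeMap;

pub trait SchemaPlan {
    /// SQL for an object present on both sides.
    fn diff_altered(&self, remote: &Self) -> Vec<String>;
    /// SQL for an object present only locally.
    fn diff_added(&self) -> Vec<String>;
    /// SQL for an object present only remotely.
    fn diff_removed(&self) -> Vec<String>;
}

/// Hypothetical leaf object: a view with a qualified name and its query text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct View {
    pub name: String,
    pub query: String,
}

impl View {
    pub fn new(name: &str, query: &str) -> Self {
        View { name: name.to_string(), query: query.to_string() }
    }
}

impl SchemaPlan for View {
    fn diff_altered(&self, remote: &Self) -> Vec<String> {
        if self == remote {
            Vec::new()
        } else {
            vec![format!("CREATE OR REPLACE VIEW {} AS {}", self.name, self.query)]
        }
    }
    fn diff_added(&self) -> Vec<String> {
        vec![format!("CREATE VIEW {} AS {}", self.name, self.query)]
    }
    fn diff_removed(&self) -> Vec<String> {
        vec![format!("DROP VIEW {}", self.name)]
    }
}

/// A map of plannable things is itself plannable, so any depth of nesting
/// (schema -> object, schema -> table -> object, ...) is handled by recursion.
impl<T: SchemaPlan> SchemaPlan for BTreeMap<String, T> {
    fn diff_altered(&self, remote: &Self) -> Vec<String> {
        schema_diff(self, remote)
    }
    fn diff_added(&self) -> Vec<String> {
        self.values().flat_map(|v| v.diff_added()).collect()
    }
    fn diff_removed(&self) -> Vec<String> {
        self.values().flat_map(|v| v.diff_removed()).collect()
    }
}

/// The single generic diff: removals first, then alterations and additions.
pub fn schema_diff<K: Ord, T: SchemaPlan>(
    local: &BTreeMap<K, T>,
    remote: &BTreeMap<K, T>,
) -> Vec<String> {
    let mut sql = Vec::new();
    for (key, r) in remote {
        if !local.contains_key(key) {
            sql.extend(r.diff_removed());
        }
    }
    for (key, l) in local {
        match remote.get(key) {
            Some(r) => sql.extend(l.diff_altered(r)),
            None => sql.extend(l.diff_added()),
        }
    }
    sql
}
```

Because the map impl delegates back to schema_diff, calling it on a BTreeMap<String, BTreeMap<String, View>> walks both levels with no extra code. Supporting another container, such as BTreeSet<T>, would mean adding one more impl rather than another copy of the diff loop.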

In the Renovate project, I designed the following traits in total:

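CommandExecutor: handles the CLI in a uniform way
SchemaLoader: loads and parses SQL from the local repo and the remote db
SqlSaver: saves SQL
NodeItem: unifies the behavior of db objects
Differ: diffs objects in the database
MigrationPlanner: handles migrations
DeltaItem: generates fine-grained migrations
SqlFormatter: formats SQL
MigrationExecutor: executes migrations
SchemaPlan: see above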

Together they form the backbone of Renovate.

Avoid macro_rules; prefer generic functions

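I used to have a bad habit: whenever I hit complex, repetitive logic, I would casually turn it into a macro_rules macro for reuse. But macros are hard to read and awkward to unit test, and many tools (lint tools, for example) support them poorly. So before reaching for macro_rules, consider whether a generic function could replace it. The schema_diff above was originally implemented as a macro; only after some major refactoring did it take its current form. Generic functions do come with eye-watering type annotations, but the benefits are well worth the refactoring.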
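The trade-off is easier to see side by side. The following is an illustrative sketch, not code from Renovate: `changed_keys_macro!` and `changed_keys` are hypothetical names for the same small piece of logic, written once as a macro and once as a generic function.

```rust
use std::collections::BTreeMap;
use std::fmt::Display;

/// Macro version: it works, but its requirements on the arguments are
/// implicit, and mistakes only surface as errors at each call site.
macro_rules! changed_keys_macro {
    ($local:expr, $remote:expr) => {{
        let mut changed = Vec::new();
        for (k, v) in $local.iter() {
            if $remote.get(k) != Some(v) {
                changed.push(k.to_string());
            }
        }
        changed
    }};
}

/// Generic version: the same logic, with its requirements spelled out
/// in the signature where tools and readers can see them.
pub fn changed_keys<K, V>(local: &BTreeMap<K, V>, remote: &BTreeMap<K, V>) -> Vec<String>
where
    K: Ord + Display,
    V: PartialEq,
{
    local
        .iter()
        // Keep entries whose remote value is missing or different.
        .filter(|(k, v)| remote.get(*k) != Some(*v))
        .map(|(k, _)| k.to_string())
        .collect()
}
```

The where clause is noisier than the macro's pattern, but it documents exactly what the function needs, and the function can be unit tested and linted like any other item.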

Do it, and you will win

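There is a famous line in the film Let the Bullets Fly: "Fight, and you will win." I adapted it slightly for this heading. When the hackathon started, I had no idea where Renovate would end up. But by quickly building the most basic user interface for a naive first version and showing it to others (I recorded my screen and posted it to the company's hackathon Slack channel), I got meaningful feedback. Based on that feedback, I adjusted the CLI user experience and thought about how to make Renovate work across different environments (development, production, and so on).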

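To record my screen, I picked up asciinema again after a long time away from it. Later, to make the recording display better on GitHub, I found agg, a tool that converts asciinema recordings to gif. Bit by bit I improved the user experience and the documentation, and while making the product better I picked up some new tools without really trying.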

Along the way I also became more fluent in Rust and more adept at using recursion to handle headache-inducing ASTs.

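Sometimes it is genuinely hard to tell whether "the capable do more" or "those who do more become capable." On a journey like this, the destination certainly matters, but the scenery along the way is a reward well worth having too. Without this hackathon, I most likely would not have written this article, and would have missed a chance to look in the mirror, take stock, and examine myself. So whatever happens, just do it. Do it, and you are already winning along the way.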

Cover image: AI-generated from the prompt "optimus prime is cooking Italian noodles for a cute toddler, bumblebee makes laughs at him. Digital art"

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/FZxbr19xKm65o5kO9vBiUg
