What does babel actually do behind the scenes?

Most front-end developers are familiar with babel, but how it works internally is a black box.

Because babel is used so widely in our day-to-day development, it is worth understanding the principles behind it.

1. A simple babel example

[1,2,3].map(n => n+1);

After babel transpiles it, the code becomes:

[1,2,3].map(function (n) {
  return n + 1;
});

This shows babel's role: it converts syntax newly introduced in ES6 into ES5 syntax that browsers can run.

2. What happens behind babel

The babel process: parse → transform → generate.

(Figure: the process behind babel)

The figure shows something called an AST (abstract syntax tree).

There are three main stages:

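• Parse: convert the code (a string) into an AST (abstract syntax tree).
• Transform: visit the nodes of the AST and modify them to produce a new AST.
• Generate: produce code from the new AST.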
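To make "AST" concrete, here is a hand-written sketch of roughly what an ESTree-style parser produces for the arrow function `n => n + 1` from the example. Location fields such as `start`/`end` are omitted, and Babel's own AST differs in places (for example it uses `NumericLiteral` where ESTree uses `Literal`). The `collectTypes` helper is hypothetical; it only walks the tree depth-first and records each node's type.

```javascript
// Hand-written ESTree-style AST for `n => n + 1` (location fields omitted).
const arrowAst = {
  type: 'ArrowFunctionExpression',
  params: [{ type: 'Identifier', name: 'n' }],
  body: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'Identifier', name: 'n' },
    right: { type: 'Literal', value: 1 },
  },
  expression: true, // the body is an expression, not a { block }
};

// Hypothetical helper: depth-first walk that records each node's `type`.
function collectTypes(node, out = []) {
  out.push(node.type);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // Only objects with a string `type` field are treated as AST nodes.
      if (child && typeof child === 'object' && typeof child.type === 'string') {
        collectTypes(child, out);
      }
    }
  }
  return out;
}

console.log(collectTypes(arrowAst));
// → ['ArrowFunctionExpression', 'Identifier', 'BinaryExpression', 'Identifier', 'Literal']
```

The tree nests the operator's operands under the `BinaryExpression`, which is exactly the structure a flat string or token list does not have.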

3. Stage 1: code parsing (parse)

Code parsing turns a piece of code into a data structure. Its key steps are:

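• Lexical analysis: split the code (a string) into a token stream, i.e. an array of syntax units.
• Syntax analysis: analyze the token stream (the generated array) to produce the AST.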
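To see how the two steps connect, here is a minimal sketch for the tiny expression `a + b - 1`: a hand-written token array (the `{ type, value }` token shape is an assumption) and a small parser that turns it into a left-associative `BinaryExpression` tree. It supports only identifiers, numbers, `+` and `-`, and is meant as an illustration, not as how Babel's parser is organized.

```javascript
// Token stream for `a + b - 1`, whitespace already dropped (output of lexical analysis).
const exampleTokens = [
  { type: 'identifier', value: 'a' },
  { type: 'operator', value: '+' },
  { type: 'identifier', value: 'b' },
  { type: 'operator', value: '-' },
  { type: 'number', value: '1' },
];

// Syntax analysis: tokens -> AST for `operand (('+' | '-') operand)*`.
// Each new operator wraps the tree built so far, so `a + b - 1` means `(a + b) - 1`.
function parseAdditive(tokens) {
  let pos = 0;

  const parseOperand = () => {
    const token = tokens[pos++];
    if (!token) throw new SyntaxError('Unexpected end of input');
    if (token.type === 'identifier') return { type: 'Identifier', name: token.value };
    if (token.type === 'number') return { type: 'Literal', value: Number(token.value) };
    throw new SyntaxError('Unexpected token: ' + token.value);
  };

  let node = parseOperand();
  while (pos < tokens.length) {
    const op = tokens[pos++];
    if (op.type !== 'operator' || (op.value !== '+' && op.value !== '-')) {
      throw new SyntaxError('Expected + or -, got: ' + op.value);
    }
    node = { type: 'BinaryExpression', operator: op.value, left: node, right: parseOperand() };
  }
  return node;
}

const additiveAst = parseAdditive(exampleTokens);
console.log(JSON.stringify(additiveAst));
```

The token array is flat, while the result nests `a + b` inside the outer `-` node; recovering that structure is the job of syntax analysis.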

3.1 Lexical analysis

For lexical analysis, we first need to know which syntax units (tokens) exist in JS:

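• Numbers: in JS, both scientific notation and ordinary numbers are tokens.
• Parentheses: `(` and `)` are tokens wherever they appear, regardless of their meaning.
• Identifiers: runs of consecutive characters, typically variables, constants, keywords and so on.
• Operators: `+`, `-`, `*`, `/` and so on.
• Comments and square brackets.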
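The token categories above can also be recognized with regular expressions. The sketch below is an alternative to the character-by-character loop that follows, not the approach used in this article or in Babel: at each position it tries a list of patterns compiled with the sticky `y` flag. The token type names and the exact patterns (for example, only decimal numbers with an optional exponent) are assumptions made for illustration.

```javascript
// Each rule: [token type, sticky regex]. Order matters: comments come before
// operators so that `//` is not read as two `/` operators.
const TOKEN_RULES = [
  ['whitespace', /\s+/y],
  ['comment', /\/\/[^\n]*|\/\*[\s\S]*?\*\//y],
  ['number', /(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?/y], // 12, 1.5, .5, 1e3
  ['identifier', /[A-Za-z_$][\w$]*/y],
  ['paren', /[()]/y],
  ['bracket', /[[\]]/y],
  ['operator', /=>|[+\-*\/=]/y],
  ['punctuation', /[,;.]/y],
];

function tokenizeWithRegex(code) {
  const tokens = [];
  let pos = 0;
  outer: while (pos < code.length) {
    for (const [type, re] of TOKEN_RULES) {
      re.lastIndex = pos; // a sticky regex only matches exactly at lastIndex
      const match = re.exec(code);
      if (match) {
        // Whitespace is skipped; every other category becomes a token.
        if (type !== 'whitespace') tokens.push({ type, value: match[0] });
        pos = re.lastIndex;
        continue outer;
      }
    }
    throw new SyntaxError(`Unexpected character ${code[pos]} at position ${pos}`);
  }
  return tokens;
}

const regexTokens = tokenizeWithRegex('[1,2,3].map(n => n+1e3); // done');
console.log(regexTokens.map((t) => t.value).join(' '));
// → [ 1 , 2 , 3 ] . map ( n => n + 1e3 ) ; // done
```

Listing the patterns as data keeps each category from the list above in one place, at the cost of relying on rule order to resolve overlaps.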

Let's look at a simple lexer (tokenizer):

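// Lexer (tokenizer): takes a source string and returns an array of tokens
export const tokenizer = (code) => {
    // Array that stores the tokens
    const tokens = [];
    // Cursor into the source string
    let current = 0;
    while (current < code.length) {
        // Character the cursor currently points at
        const char = code[current];
        // Handle single-character tokens first, such as `(` and `)`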
        if (char === '(' || char === ')') {
            tokens.push({
                type: 'parens',
                value: char,
            });
            current ++;
            continue;
        }
        // Next, identifiers: a run of characters starting with a letter, `_` or `$`
        if (/[a-zA-Z\$\_]/.test(char)) {
            let value = '';
            value += char;
            current ++;
            // While the following characters still belong to the identifier, append them and move the cursor forward
            while (current < code.length && /[a-zA-Z0-9\$\_]/.test(code[current])) {
                value += code[current];
                current ++;
            }
            tokens.push({
                type: 'identifier',
                value,
            });
            continue;
        }
        // Whitespace
        if (/\s/.test(char)) {
            let value = '';
            value += char;
            current ++;
            // Same idea as above: collect consecutive whitespace characters
            while (current < code.length && /\s/.test(code[current])) {
                value += code[current];
                current ++;
            }
            tokens.push({
                type: 'whitespace',
                value,
            });
            continue;
        }
        // Comma separator
        if (/,/.test(char)) {
            tokens.push({
                type: ',',
                value: ',',
            });
            current ++;
            continue;
        }
        // Operators
        if (/=|\+|>/.test(char)) {
            let value = '';
            value += char;
            current ++;
            while (current < code.length && /=|\+|>/.test(code[current])) {
                value += code[current];
                current ++;
            }
            // `=` followed by `>` is an arrow function, not an operator
            if (value === '=>') {
                tokens.push({
                    type: 'ArrowFunctionExpression',
                    value,
                });
                continue;
            }
            tokens.push({
                type: 'operator',
                value,
            });
            continue;
        }
        // Any character this lexer does not recognize is an error
        throw new TypeError("I don't know what this character is: " + char);
    }
    return tokens;
};

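This lexer is written mainly for the arrow function example. It recognizes only parentheses, identifiers, whitespace, commas and operators built from `=`, `+` and `>` (with `=>` marked as an arrow token); any other character, including digits, throws an error.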

3.2 Syntax analysis

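Syntax analysis is complex because it has to account for every possible syntactic form. Using the information in the token stream (the token array generated in the previous section), the developer has to work out the logical relationships between pieces of code; only after syntax analysis does the token stream become a structured abstract syntax tree.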

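Syntax analysis is best done according to a standard. Most JavaScript parsers follow the ESTree specification, which groups nodes into these main categories: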

1. Statements: statements are very common in JavaScript. Loops, `if` conditionals, exception handling statements, `with` statements and so on are all statements.

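2. Expressions: an expression is a piece of code that evaluates to a value. Expressions are the other very common kind of syntax, and a function expression is a typical example. If you are unsure what an expression is, MDN has a detailed explanation.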

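3. Declarations: these are divided into variable declarations and function declarations. For example, a function written in declaration form rather than as a function expression looks like `function add(a, b) { return a + b; }`.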
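To see the three categories side by side, here is a hand-written sketch of an ESTree-shaped AST for `const add = (a, b) => a + b;` (location fields omitted), plus a hypothetical `categorize` helper. Its category sets list only a few node types for illustration, not the full specification. In ESTree, `VariableDeclaration` is a declaration that is also allowed wherever a statement is, which is why it sits directly in the `Program` body.

```javascript
// ESTree-style AST for `const add = (a, b) => a + b;` (location fields omitted).
const declarationAst = {
  type: 'Program',
  body: [
    {
      type: 'VariableDeclaration', // a Declaration, usable as a Statement
      kind: 'const',
      declarations: [
        {
          type: 'VariableDeclarator', // one `name = init` pair
          id: { type: 'Identifier', name: 'add' },
          init: {
            type: 'ArrowFunctionExpression', // an Expression
            params: [
              { type: 'Identifier', name: 'a' },
              { type: 'Identifier', name: 'b' },
            ],
            body: {
              type: 'BinaryExpression',
              operator: '+',
              left: { type: 'Identifier', name: 'a' },
              right: { type: 'Identifier', name: 'b' },
            },
            expression: true,
          },
        },
      ],
    },
  ],
};

// Hypothetical category table: only a handful of node types, for illustration.
const DECLARATIONS = new Set(['VariableDeclaration', 'FunctionDeclaration']);
const STATEMENTS = new Set(['ExpressionStatement', 'ReturnStatement', 'IfStatement', 'BlockStatement']);
const EXPRESSIONS = new Set(['ArrowFunctionExpression', 'FunctionExpression', 'BinaryExpression', 'Identifier']);

function categorize(type) {
  if (DECLARATIONS.has(type)) return 'Declaration';
  if (STATEMENTS.has(type)) return 'Statement';
  if (EXPRESSIONS.has(type)) return 'Expression';
  return 'Other';
}

const decl = declarationAst.body[0];
const arrowInit = decl.declarations[0].init;
console.log(categorize(decl.type), categorize(arrowInit.type), categorize(arrowInit.body.type));
// → Declaration Expression Expression
```

Compared with the simplified parser that follows, ESTree keeps the variable name and initializer on a separate `VariableDeclarator` node, because a single `const` can declare several variables.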
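
The simplified parser below targets input of the form `const add = (a, b) => a + b`, as a token stream produced by the lexer above. It is not a full ESTree parser: node shapes are simplified, for example identifiers carry `identifierName` instead of ESTree's `name`.
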
const parser = tokens => {
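    // Global cursor into the token array; it persists across all helper calls
    let current = -1;
    // Stack of saved cursor positions, used for backtracking
    const tem = [];
    // Token the cursor currently points at
    let token = tokens[current];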
    const parseDeclarations = () => {
        // Save the current cursor position
        setTem();
        // Move the cursor forward
        next();
        // A `const` identifier means this is a declaration
        if (token.type === 'identifier' && token.value === 'const') {
            const declarations = {
                type: 'VariableDeclaration',
                kind: token.value
            };
            next();
            // `const` must be followed by a variable name; otherwise throw
            if (token.type !== 'identifier') {
                throw new Error('Expected Variable after const');
            }
            // We now have the variable name
            declarations.identifierName = token.value;
            next();
            // After `=` an expression or constant should follow; the extra checks are skipped and we parse a function expression directly
            if (token.type === 'operator' && token.value === '=') {
                declarations.init = parseFunctionExpression();
            }
            return declarations;
        }
    };
    const parseFunctionExpression = () => {
        next();
        let init;
        // If `=` is followed by a parenthesis or an identifier, it is most likely an expression
        if (
            (token.type === 'parens' && token.value === '(') ||
            token.type === 'identifier'
        ) {
            setTem();
            next();
            while (token.type === 'identifier' || token.type === ',') {
                next();
            }
            // If the closing parenthesis is followed by an arrow, this is an arrow function expression
            if (token.type === 'parens' && token.value === ')') {
                next();
                if (token.type === 'ArrowFunctionExpression') {
                    init = {
                        type: 'ArrowFunctionExpression',
                        params: [],
                        body: {}
                    };
                    backTem();
                    // Parse the arrow function's parameters
                    init.params = parseParams();
                    // Parse the arrow function's body
                    init.body = parseExpression();
                } else {
                    backTem();
                }
            }
        }
        return init;
    };
    const parseParams = () => {
        const params = [];
        if (token.type === 'parens' && token.value === '(') {
            next();
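            // Stop at the closing `)`, or at end of input so malformed code cannot loop forever
            while (!(token.type === 'parens' && token.value === ')') && token.type !== 'eof') {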
                if (token.type === 'identifier') {
                    params.push({
                        type: token.type,
                        identifierName: token.value
                    });
                }
                next();
            }
        }
        return params;
    };
    const parseExpression = () => {
        next();
        let body;
        while (token.type === 'ArrowFunctionExpression') {
            next();
        }
        // A body starting with `(` or a variable is not a BlockStatement; only the variable case is handled here, as a binary expression
        if (token.type === 'identifier') {
            body = {
                type: 'BinaryExpression',
                left: {
                    type: 'identifier',
                    identifierName: token.value
                },
                operator: '',
                right: {
                    type: '',
                    identifierName: ''
                }
            };
            next();
            if (token.type === 'operator') {
                body.operator = token.value;
            }
            next();
            if (token.type === 'identifier') {
                body.right = {
                    type: 'identifier',
                    identifierName: token.value
                };
            }
        }
        return body;
    };
    // Move the cursor forward, skipping whitespace tokens
    const next = () => {
        do {
            ++current;
            token = tokens[current]
                ? tokens[current]
                : { type: 'eof', value: '' };
        } while (token.type === 'whitespace');
    };
    // Save the cursor position
    const setTem = () => {
        tem.push(current);
    };
    // Restore the most recently saved cursor position
    const backTem = () => {
        current = tem.pop();
        token = tokens[current];
    };
    const ast = {
        type: 'Program',
        body: []
    };
    while (current < tokens.length) {
        const statement = parseDeclarations();
        if (!statement) {
            break;
        }
        ast.body.push(statement);
    }
    return ast;
};

4. Stage 2: code transformation

• Code parsing and code generation are handled by babel itself.
• Code transformation is handled by babel plugins.

For example, Taro uses babel to convert code into mini-program syntax.

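The key to code transformation is to take the current abstract syntax tree and build a new one according to rules we define; transformation is the process of producing that new tree.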

Code transformation consists of two concrete steps:

• Traverse the abstract syntax tree (implement a traverser)
• Transform the code (implement a transformer)
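A minimal sketch of these two pieces, under simplifying assumptions: `traverse` (a hypothetical name) walks any ESTree-shaped object depth-first and calls `enter`/`exit` hooks from a visitor keyed by node type. This is similar in spirit to Babel's visitors but far simpler, with no `path` objects and no scope tracking. The hypothetical `transformArrows` transformer rewrites arrow functions into ordinary function expressions in place; it ignores `this`/`arguments` semantics, which a real plugin has to handle.

```javascript
// Traverser: depth-first walk calling visitor[node.type].enter / .exit around each node.
function traverse(node, visitor, parent = null) {
  const hooks = visitor[node.type] || {};
  if (hooks.enter) hooks.enter(node, parent);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      // Only objects with a string `type` are AST nodes; names, operators and flags are skipped.
      if (child && typeof child === 'object' && typeof child.type === 'string') {
        traverse(child, visitor, node);
      }
    }
  }
  if (hooks.exit) hooks.exit(node, parent);
}

// Transformer: turns each ArrowFunctionExpression into a FunctionExpression.
// Rewriting on `exit` means nested arrows inside the body are converted first.
function transformArrows(ast) {
  traverse(ast, {
    ArrowFunctionExpression: {
      exit(node) {
        // An expression body `n + 1` becomes the block `{ return n + 1; }`.
        const body = node.body.type === 'BlockStatement'
          ? node.body
          : { type: 'BlockStatement', body: [{ type: 'ReturnStatement', argument: node.body }] };
        node.type = 'FunctionExpression';
        node.id = null;
        node.body = body;
        delete node.expression;
      },
    },
  });
  return ast;
}

// ESTree-style AST for `[1, 2, 3].map(n => n + 1);`
const programAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        object: { type: 'ArrayExpression', elements: [1, 2, 3].map((value) => ({ type: 'Literal', value })) },
        property: { type: 'Identifier', name: 'map' },
        computed: false,
      },
      arguments: [{
        type: 'ArrowFunctionExpression',
        params: [{ type: 'Identifier', name: 'n' }],
        body: { type: 'BinaryExpression', operator: '+', left: { type: 'Identifier', name: 'n' }, right: { type: 'Literal', value: 1 } },
        expression: true,
      }],
    },
  }],
};

transformArrows(programAst);
const callback = programAst.body[0].expression.arguments[0];
console.log(callback.type, callback.body.body[0].type); // → FunctionExpression ReturnStatement
```

Keeping traversal and transformation separate means new rules only need a new visitor object, which is the same division of labour the section describes between babel and its plugins.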

5. Stage 3: generating code from the transformed AST (implement a generator)

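Code generation produces new code from the transformed abstract syntax tree. We implement a function that takes an object (the AST) and builds the final code string recursively.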
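A minimal sketch of such a generator, assuming ESTree-shaped input and covering only the node types needed to print the transformed version of the opening example; any other node type throws. The two-space indentation is an assumption, and real generators such as `@babel/generator` also deal with comments, parentheses required by operator precedence, and source maps.

```javascript
// Recursive code generator for a small subset of ESTree node types.
function generate(node, indent = '') {
  switch (node.type) {
    case 'Program':
      return node.body.map((stmt) => generate(stmt, indent)).join('\n');
    case 'ExpressionStatement':
      return indent + generate(node.expression, indent) + ';';
    case 'ReturnStatement':
      return indent + 'return ' + generate(node.argument, indent) + ';';
    case 'BlockStatement': {
      // Statements inside a block are indented one level deeper.
      const inner = node.body.map((stmt) => generate(stmt, indent + '  ')).join('\n');
      return '{\n' + inner + '\n' + indent + '}';
    }
    case 'FunctionExpression':
      return 'function (' + node.params.map((p) => generate(p)).join(', ') + ') ' + generate(node.body, indent);
    case 'CallExpression':
      return generate(node.callee, indent) + '(' + node.arguments.map((a) => generate(a, indent)).join(', ') + ')';
    case 'MemberExpression':
      return generate(node.object, indent)
        + (node.computed ? '[' + generate(node.property, indent) + ']' : '.' + generate(node.property));
    case 'ArrayExpression':
      return '[' + node.elements.map((e) => generate(e, indent)).join(', ') + ']';
    case 'BinaryExpression':
      // No precedence handling: nested binary expressions are printed without parentheses.
      return generate(node.left, indent) + ' ' + node.operator + ' ' + generate(node.right, indent);
    case 'Identifier':
      return node.name;
    case 'Literal':
      return JSON.stringify(node.value); // adequate for numbers and strings
    default:
      throw new TypeError('Unsupported node type: ' + node.type);
  }
}

// AST for the transformed example: `[1, 2, 3].map(function (n) { return n + 1; });`
const paramN = { type: 'Identifier', name: 'n' };
const transformedAst = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        object: { type: 'ArrayExpression', elements: [1, 2, 3].map((value) => ({ type: 'Literal', value })) },
        property: { type: 'Identifier', name: 'map' },
        computed: false,
      },
      arguments: [{
        type: 'FunctionExpression',
        id: null,
        params: [paramN],
        body: {
          type: 'BlockStatement',
          body: [{
            type: 'ReturnStatement',
            argument: { type: 'BinaryExpression', operator: '+', left: paramN, right: { type: 'Literal', value: 1 } },
          }],
        },
      }],
    },
  }],
};

console.log(generate(transformedAst));
// [1, 2, 3].map(function (n) {
//   return n + 1;
// });
```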

6. Core principle

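Babel's core code lives in the babel-core package (published as @babel/core since Babel 7). Babel exposes an interface that lets us define our own Visitor, which is called while the AST is being transformed. That is why the Babel repository also contains many plugins: the actual syntax conversion is done by these plugins, not by babel-core itself.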
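As an illustration of that plugin interface, here is a hedged sketch of a Babel 7 plugin that rewrites arrow functions into function expressions. The plugin shape (a function receiving `{ types }` and returning an object with a `visitor`), `path.replaceWith`, and the `@babel/types` helpers `isBlockStatement`, `blockStatement`, `returnStatement` and `functionExpression` are part of Babel's public API. The plugin and its name are hypothetical, and unlike the official `@babel/plugin-transform-arrow-functions` it does not handle lexical `this`, `arguments` or other arrow-specific semantics.

```javascript
// Hypothetical Babel 7 plugin sketch: arrow functions -> function expressions.
// Simplified: lexical `this`/`arguments` are NOT handled, unlike the official plugin.
function arrowToFunctionPlugin({ types: t }) {
  return {
    name: 'arrow-to-function-sketch',
    visitor: {
      // Babel calls this method for every ArrowFunctionExpression it traverses.
      ArrowFunctionExpression(path) {
        const { node } = path;
        // An expression body such as `n + 1` must become the block `{ return n + 1; }`.
        const body = t.isBlockStatement(node.body)
          ? node.body
          : t.blockStatement([t.returnStatement(node.body)]);
        // functionExpression(id, params, body, generator, async)
        path.replaceWith(t.functionExpression(null, node.params, body, false, node.async));
      },
    },
  };
}
```

With `@babel/core` installed, a plugin like this could be applied via `babel.transformSync(code, { plugins: [arrowToFunctionPlugin] })`; the transformation logic lives entirely in the visitor, while babel itself handles parsing, traversal and code generation.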

Thanks for your support

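Songbao, author of the WeChat official account "松寶寫代碼" (Songbao Writes Code), also known online as saucxs. The watermark-dom package has 700+ stars. Former member of a university ACM team; works on A/B experiments at ByteDance, serves as an interviewer and writes programming problems for campus recruitment. Enjoys tinkering, is committed to full-stack development and likes to challenge himself. The official account has sections for selected articles, advanced learning, a daily question, a lab, A/B experiments and ByteDance referrals; you are welcome to follow and get in touch, and write code together with Songbao. Let's go!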

This article was transcoded to AMP by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/Jd7sX1yNYdXPgepwlq-XLw