Introduction to Static Program Analysis: Notes on Practicing with Go

Compilation and Static Analysis

Introduction to the Go compiler[5] describes a similar pipeline. Static analysis happens mainly at the IR level. Generating machine code in the back end is the compiler's concern.

• The scanner performs lexical analysis, using regular expressions to translate source code into tokens.
• The parser performs syntax analysis according to a context-free grammar, parsing the tokens into an abstract syntax tree (AST).
• The semantic analyzer (type checker) performs semantic analysis, guided by an attribute grammar, and turns the AST into a decorated AST.
• The translator converts the decorated AST into an intermediate representation (IR), usually three-address code (3AC). Static analysis is then performed on that IR.
• The code generator translates the IR into machine code.
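You can try the front-end stages above with nothing but Go's standard library. The sketch below assumes go/parser stands in for the scanner and parser; it drives go/scanner internally. It also assumes go/types stands in for the semantic analyzer. The helper name exprTypes is made up for this example. The types.Info side tables are what "decorate" the AST: after checking, every expression node has a recorded type.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// exprTypes runs the front end over src: parsing (scanner + parser),
// then type checking (semantic analysis). It returns the type the
// checker recorded for each expression, keyed by its source text.
func exprTypes(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "foo.go", src, 0)
	if err != nil {
		return nil, err
	}
	// types.Info is the "decoration": side tables that the checker
	// fills in for the nodes of the otherwise undecorated AST.
	info := &types.Info{Types: make(map[ast.Expr]types.TypeAndValue)}
	var conf types.Config // src imports nothing, so no Importer is needed
	if _, err := conf.Check("foo", fset, []*ast.File{file}, info); err != nil {
		return nil, err
	}
	out := make(map[string]string)
	for expr, tv := range info.Types {
		out[types.ExprString(expr)] = tv.Type.String()
	}
	return out, nil
}

func main() {
	m, err := exprTypes(`package foo

func add(a, b int) int { return a + b }

var x = add(1, 2)
`)
	if err != nil {
		panic(err)
	}
	fmt.Println(m["add"])       // func(a int, b int) int
	fmt.Println(m["add(1, 2)"]) // int
}
```

The parser alone only knows that add(1, 2) is a call expression. Its type, int, exists only after the checker has resolved add and its signature.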

Understanding Go's Internals

Scanner

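Go's tokens are defined in token.go[6], which also provides checks for identifiers[7], keywords[8], and so on.

The Scan[9] method implements lexical analysis, translating source code into tokens.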

Parser

With the token stream in hand, the parser applies the grammar to build an AST (Abstract Syntax Tree). The Go Programming Language Specification[10] describes the grammar as follows:

SourceFile = PackageClause ";" { ImportDecl ";" } { TopLevelDecl ";" } .
PackageClause = "package" PackageName .
PackageName   = identifier .

ImportDecl = "import" ( ImportSpec | "(" { ImportSpec ";" } ")" ) .
ImportSpec = [ "." | PackageName ] ImportPath .
ImportPath = string_lit .

Declaration  = ConstDecl | TypeDecl | VarDecl .
TopLevelDecl = Declaration | FunctionDecl | MethodDecl .

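Every Go source file is ultimately parsed into its own abstract syntax tree. The top-level structure of the grammar, that is, its start symbol, is therefore SourceFile.
Every file contains a package clause, optional import declarations, and other top-level declarations. These include constants, types, aliases, variables, functions, methods, and so on.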

Beyond these, the Go Programming Language Specification[11] also gives the grammar for Types, Blocks, Declarations, Expressions, Statements, and more.

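go/ast[12] defines the interfaces Node, Expr, Stmt, and Decl. Expressions, statements, and declarations are the three main syntactic categories. Node is the base interface: every node, whatever its category, is a Node, and it records where that node starts and ends in the source (its Pos and End methods).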

Let's parse some code with go/parser and look at the structure of its AST:

package main

import (
    "go/ast"
    "go/parser"
    "go/token"
)

// parse an in-memory source file and dump its AST
func main() {
    fs := token.NewFileSet()
    src := `package foo

import (
    "fmt"
    "time"
)

var a string

func foo() {
    if a != "" {
        fmt.Println(a)
    }

    for i:=0; i<10; i++{
        fmt.Println(i)
    }

    fmt.Println(time.Now())
}
`
    fast, err := parser.ParseFile(fs, "foo.go", src, parser.ParseComments)
    if err != nil {
        panic(err)
    }
    // print every node, resolving positions through the FileSet
    ast.Print(fs, fast)
}

Besides printing the tree with ast.Print, you can explore it with the visualization tool goast-viewer[13]:

The AST of a Go file is structured roughly as shown in the figure (from 《Go 語言設計與實現》[14]):

IR

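IR is an intermediate form into which a compiler or static analysis tool converts source code so that it is convenient to analyze and optimize. It is usually independent of both the source language and the target platform. It preserves the program's semantics, which makes later analysis, optimization, and code generation easier.

IR can take many forms, such as an AST (abstract syntax tree), three-address code, a graph, or bytecode.

Kinds of IR:

• Tree IR: AST (Abstract Syntax Tree)
• Linear IR: 3AC (Three-Address Code)
• Graph IR: CFG (Control Flow Graph), SSA (Static Single Assignment form), PDG (Program Dependence Graph), ...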

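Why does static analysis use IR rather than the AST?

The AST is a syntax tree: a high-level form that stays close to the source code and is language-specific. It suits quick type checking, but it lacks control-flow and data-flow information.

• The AST is high-level and close to the grammatical structure, whereas IR is low-level and closer to machine code.

• The AST is language-dependent, whereas IR is usually language-independent. Analyzers pay particular attention to three-address code because front ends for different languages can all translate into the same IR, which can then be optimized in one place.

• The AST suits quick type checking, whereas IR is more compact and uniform. The AST contains many nodes for non-terminals (body, assign, etc.) that IR does not need.

• The AST lacks control-flow information, whereas IR contains it. An AST node only records that a construct is, say, a do-while loop, without exposing the control flow. In IR, the goto (jump) instructions make the control flow explicit.

IR is therefore a better foundation for static analysis.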

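Go SSA

The Go compiler uses SSA (Static Single Assignment) form for its intermediate representation (IR).

The figure shows, from left to right, the AST, the CFG, and the SSA form. SSA is built on top of the CFG:

A CFG (control-flow graph) is an abstract representation of a procedure or program. It is a data structure that the compiler maintains internally, and it represents every path that might be traversed while the program runs. The graph shows all possible flows of execution between the basic blocks of a procedure, so any actual execution of the procedure follows one path through it.

• The nodes of a CFG are basic blocks, each a sequence of statements executed in order. The edges represent control-flow transfers (if branches, loops, goto, and so on).
• To build a CFG, a function's statements are grouped into basic blocks. Each basic block is a sequence of instructions executed in order, with no jumps or branches in the middle.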

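The Go Tools[15] repository contains a variety of tools and packages, mostly for static analysis of Go programs. Its go/cfg package builds a simple control-flow graph for a Go function from its AST: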

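package main

import (
    "fmt"
    "go/ast"
    "go/parser"
    "go/token"

    "golang.org/x/tools/go/cfg"
)

func main() {
    fs := token.NewFileSet()
    src := `package foo

func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}
`
    fast, err := parser.ParseFile(fs, "foo.go", src, parser.ParseComments)
    if err != nil {
        panic(err)
    }
    // find the declaration of max
    var funcDecl *ast.FuncDecl
    for _, decl := range fast.Decls {
        if f, ok := decl.(*ast.FuncDecl); ok && f.Name.Name == "max" {
            funcDecl = f
            break
        }
    }
    // build the CFG from the function body; the callback reports whether a
    // call may return, and returning true treats every call as returning normally
    g := cfg.New(funcDecl.Body, func(call *ast.CallExpr) bool { return true })
    // print the CFG in Graphviz dot syntax
    fmt.Println(string(g.Dot(fs)))
}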

Rendered with GraphvizOnline[16] (unreachable nodes omitted):

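SSA is an IR built on top of the CFG. Its basic blocks correspond directly to the CFG's basic blocks, and the control-flow edges (block.Succs) are preserved. The difference is that every variable is assigned exactly once: SSA creates a unique version for each assignment of a variable (for example x1, x2) and tracks the definitions (Def) and uses (Use) of variables in each basic block. At control-flow merge points, such as the point where the two branches of an if rejoin, a φ (phi) function is inserted to select the variable's value. The dominator tree, through the dominance frontiers derived from it, determines where φ functions are inserted.

The Phi instruction (named after the Greek letter φ) is a core feature of SSA. It handles the case where a variable receives different values along different control-flow paths. At a merge point, it selects the correct value according to which predecessor block control arrived from. Put simply, Phi acts like a "selector" that picks the variable's value based on the path the program actually took.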

The Go toolchain honors a GOSSAFUNC environment variable that dumps the SSA of a named function. Take the following code as an example:

package main

func max(a, b int) int {
    if a > b {
        return a
    }
    return b
}

func main() {
    println(max(1, 2))
}

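Running GOSSAFUNC=max go build foo.go generates an ssa.html file, which you can open in a browser:

The entry If v6 → b3 b2 (5) means that control jumps to b3 when v6 is true and to b2 when it is false. (Tip: (5) refers to line 5 of the source.)

Now suppose v6 is true, so control jumps to b3. Its terminator Plain → b2 (8) is an unconditional jump to b2. In b2, v10 (8) = Phi <int> v8 v9 (x[int]) defines v10, which is the value that is returned. The Phi chooses between v8 and v9 according to the control flow: if control came from b3, it chooses v9.

How phi is actually implemented, and the many optimization passes that follow, are hard to understand without studying the compiler back end.

Next, let's look at existing tools to see how Go static analysis is done in practice.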

gosec

gosec[17] is a security checker for Go. It detects security problems by analyzing the AST and SSA representations of Go code.

gosec's rules are listed below:

• AST-based rules:

G101: Look for hardcoded credentials
G102: Bind to all interfaces
G103: Audit the use of unsafe block
G104: Audit errors not checked
G106: Audit the use of ssh.InsecureIgnoreHostKey function
G107: Url provided to HTTP request as taint input
G108: Profiling endpoint is automatically exposed
G109: Converting strconv.Atoi result to int32/int16
G110: Detect io.Copy instead of io.CopyN when decompression
G111: Detect http.Dir('/') as a potential risk
G112: Detect ReadHeaderTimeout not configured as a potential risk
G114: Use of net/http serve function that has no support for setting timeouts
G201: SQL query construction using format string
G202: SQL query construction using string concatenation
G203: Use of unescaped data in HTML templates
G204: Audit use of command execution
G301: Poor file permissions used when creating a directory
G302: Poor file permissions used when creation file or using chmod
G303: Creating tempfile using a predictable path
G304: File path provided as taint input
G305: File path traversal when extracting zip archive
G306: Poor file permissions used when writing to a file
G307: Poor file permissions used when creating a file with os.Create
G401: Detect the usage of MD5 or SHA1
G402: Look for bad TLS connection settings
G403: Ensure minimum RSA key length of 2048 bits
G404: Insecure random number source (rand)
G405: Detect the usage of DES or RC4
G406: Detect the usage of deprecated MD4 or RIPEMD160
G501: Import blocklist: crypto/md5
G502: Import blocklist: crypto/des
G503: Import blocklist: crypto/rc4
G504: Import blocklist: net/http/cgi
G505: Import blocklist: crypto/sha1
G506: Import blocklist: golang.org/x/crypto/md4
G507: Import blocklist: golang.org/x/crypto/ripemd160
G601: Implicit memory aliasing in RangeStmt

• SSA-based rules:

G115: Type conversion which leads to integer overflow
G407: Use of hardcoded IV/nonce for encryption
G602: Possible slice bounds out of range

Let's take G101 (Look for hardcoded credentials) as an example of how gosec uses the AST for detection:

• AST rule definition

func Generate(trackSuppressions bool, filters ...RuleFilter) RuleList {
    rules := []RuleDefinition{
        {"G101", "Look for hardcoded credentials", NewHardcodedCredentials},
        ...
    }
}

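• Rule initialization. The returned []ast.Node lists the node types the rule applies to. Here they are assignment statements (=, :=, +=, etc.), variable and constant declarations, and binary expressions.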

func NewHardcodedCredentials(id string, conf gosec.Config) (gosec.Rule, []ast.Node) {
    pattern := `(?i)passwd|pass|password|pwd|secret|token|pw|apiKey|bearer|cred`
    return &credentials{
        pattern:          regexp.MustCompile(pattern),
        ...
        MetaData: issue.MetaData{
            ID:         id,
            What:       "Potential hardcoded credentials",
            Confidence: issue.Low,
            Severity:   issue.High,
        },
    }, []ast.Node{(*ast.AssignStmt)(nil), (*ast.ValueSpec)(nil), (*ast.BinaryExpr)(nil)}
}

• Analyzer initialization and rule registration

analyzer := gosec.NewAnalyzer(config, *flagScanTests, *flagExcludeGenerated, *flagTrackSuppressions, *flagConcurrency, logger)
analyzer.LoadRules(ruleList.RulesInfo())

• Rules are registered grouped by ast.Node type

func (r RuleSet) Register(rule Rule, isSuppressed bool, nodes ...ast.Node) {
    for _, n := range nodes {
        t := reflect.TypeOf(n)
        if rules, ok := r.Rules[t]; ok {
            r.Rules[t] = append(rules, rule)
        } else {
            r.Rules[t] = []Rule{rule}
        }
    }
    ...
}

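• The analyzer uses the "golang.org/x/tools/go/packages" module to load Go source. The module loads and parses Go packages together with their metadata, including the AST of the source, type information, and other related data.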

pkgs, err := packages.Load(conf, packageFiles...)

• Walk the AST of each file in the package and apply the rules

// gosec.CheckRules(pkg)

for _, file := range pkg.Syntax {
    ...
    ast.Walk(gosec, file)
}

• gosec implements the ast.Visitor interface

func (gosec *Analyzer) Visit(n ast.Node) ast.Visitor {
    ...
    // fetch the rules registered for this ast.Node's type
    for _, rule := range gosec.ruleset.RegisteredFor(n) {
        issue, err := rule.Match(n, gosec.context)
        ...
        gosec.updateIssues(issue)
    }
    return gosec
}

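• This is where the rule's Match function is called. Continuing with hardcoded credentials as the example:
• matchAssign first checks whether the left-hand expression is an ast.Ident whose name matches (?i)passwd|pass|password|pwd|secret|token|pw|apiKey|bearer|cred. It also checks whether the right-hand side, if it is a string, matches the secretsPatterns rules.
• matchValueSpec works much the same way, matching valueSpec.Names and valueSpec.Values respectively.
• matchEqualityCheck, when Op is "==" or "!=", tries to match the binaryExpr.X and Y nodes.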

func (r *credentials) Match(n ast.Node, ctx *gosec.Context) (*issue.Issue, error) {
    switch node := n.(type) {
    case *ast.AssignStmt:
        return r.matchAssign(node, ctx)
    case *ast.ValueSpec:
        return r.matchValueSpec(node, ctx)
    case *ast.BinaryExpr:
        return r.matchEqualityCheck(node, ctx)
    }
    return nil, nil
}

With the AST mechanism roughly clear, let's take G602 (Possible slice bounds out of range) as an example of how gosec uses SSA for detection:

• SSA rule definition

var defaultAnalyzers = []AnalyzerDefinition{
    {"G115", "Type conversion which leads to integer overflow", newConversionOverflowAnalyzer},
    {"G602", "Possible slice bounds out of range", newSliceBoundsAnalyzer},
    ...
}

• SSA rule registration

// analyzer.LoadAnalyzers(analyzerList.AnalyzersInfo())
...
for id, def := range analyzerDefinitions {
    r := def.Create(def.ID, def.Description)
    gosec.analyzerSet.Register(r, analyzerSuppressed[id])
}
// the Register function
func (a *AnalyzerSet) Register(analyzer *analysis.Analyzer, isSuppressed bool) {
    a.Analyzers = append(a.Analyzers, analyzer)
    ...
}

• Running the checks

// gosec.CheckAnalyzers(pkg)
...
ssaResult, err := gosec.buildSSA(pkg) // convert pkg to SSA
// prepare the results the analyzers will read, keyed by buildssa.Analyzer
resultMap := map[*analysis.Analyzer]interface{}{
    buildssa.Analyzer: &analyzers.SSAAnalyzerResult{
        Config: gosec.Config(),
        Logger: gosec.logger,
        SSA:    ssaResult.(*buildssa.SSA),
    },
}
// run each registered Analyzer in turn
for _, analyzer := range gosec.analyzerSet.Analyzers {
    pass := &analysis.Pass{
        Analyzer:          analyzer,
        Fset:              pkg.Fset,
        Files:             pkg.Syntax,
        OtherFiles:        pkg.OtherFiles,
        IgnoredFiles:      pkg.IgnoredFiles,
        Pkg:               pkg.Types,
        TypesInfo:         pkg.TypesInfo,
        TypesSizes:        pkg.TypesSizes,
        ResultOf:          resultMap,
        ...
    }
    result, err := pass.Analyzer.Run(pass)
    ...
}

• Implementation of the slice bounds check

func newSliceBoundsAnalyzer(id string, description string) *analysis.Analyzer {
    return &analysis.Analyzer{
        Name:     id,
        Doc:      description,
        Run:      runSliceBounds,
        Requires: []*analysis.Analyzer{buildssa.Analyzer},
    }
}

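func runSliceBounds(pass *analysis.Pass) (interface{}, error) {
    ssaResult, err := getSSAResult(pass)
    ...
    // Analyze every source function (SrcFuncs), visiting its basic blocks in dominator-tree preorder (DomPreorder).
    for _, mcall := range ssaResult.SSA.SrcFuncs {
        for _, block := range mcall.DomPreorder() {
            for _, instr := range block.Instrs {
                // only *ssa.Alloc instructions (allocations backing a slice) are of interest
                alloc, ok := instr.(*ssa.Alloc)
                if !ok {
                    continue
                }
                // extract the capacity from the allocation
                sliceCap, err := extractSliceCapFromAlloc(alloc.String())
                ...
                // find the instructions that refer to the allocation
                allocRefs := alloc.Referrers()
                for _, ref := range *allocRefs {
                    slice, ok := ref.(*ssa.Slice) // keep only slicing operations
                    ...
                    l, h := extractSliceBounds(slice) // lower and upper bounds of the slice expression (low, high)
                    newCap := computeSliceNewCap(l, h, sliceCap) // capacity of the new slice, from the bounds and the original capacity
                    violations := []ssa.Instruction{}
                    // recursively follow later uses of the slice and record out-of-range operations,
                    // including re-slicing, index accesses, function calls, and if checks on the length
                    trackSliceBounds(0, newCap, slice, &violations, ifs)
                }
            }
        }
    }
    // examine the recorded if instructions to eliminate false positives
    for ifref, binop := range ifs {
        bound, value, err := extractBinOpBound(binop) // extract the bound kind and the compared value
        for i, block := range ifref.Block().Succs { // successors of the block containing the if
            if i == 1 { // successor 0 is the true branch, 1 is the false branch
                bound = invBound(bound) // invert the bound on the false branch
            }
            var processBlock func(block *ssa.BasicBlock, depth int)
            ...
            // processBlock walks the block's instructions (block.Instrs) and, depending on the
            // bound kind (lowerUnbounded, upperUnbounded, unbounded, upperBounded):
            // 1. removes false positives from issues;
            // 2. checks whether slicing (ssa.Slice) or indexing (ssa.IndexAddr) stays within bounds.
            // Nested if instructions (ssa.If) are handled by recursing into their successors,
            // with the depth parameter limiting the recursion.
        }
    }
    ...
}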

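Limitations of gosec

To sum up, gosec mainly inspects Go's abstract syntax tree (AST), and some rules use go/ssa for simple control-flow and data-flow analysis. It does not support global (inter-procedural) data-flow analysis and is weak at complex cross-function or cross-module tracking. On the other hand, it is lightweight and well suited to quick scans.

Even More Powerful Tools

A good SAST tool needs a powerful taint analysis engine, support for both local and global data-flow analysis, and support for multiple languages.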

Joern

https://github.com/joernio/joern

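Joern is an open-source code analysis platform focused on C/C++, Java, and other languages. It performs static analysis by generating a code property graph (CPG) and offers a Scala-based query language.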

CodeQL

https://codeql.github.com/docs/

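CodeQL is built on a data flow graph. It supports local and global data-flow analysis and precisely tracks taint propagation across functions and modules. QL queries can define sources, sinks, and sanitizers.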

CodeQL zero to hero

• CodeQL zero to hero part 1: The fundamentals of static analysis for vulnerability research[18]
• CodeQL zero to hero part 2: Getting started with CodeQL[19]
• CodeQL zero to hero part 3: Security research with CodeQL[20]

Go-specific

• https://codeql.github.com/docs/codeql-language-guides/codeql-library-for-go/

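References

• Go 語言設計與實現 (Go Language Design and Implementation) [21]
• Unveiling the Power of Intermediate Representations for Static Analysis: A Survey[22]
• 深入理解 LLVM 代碼生成 (Understanding LLVM Code Generation in Depth) [23]
• Why your code is a Graph[24]
• lorexxar - sast2024[25]

Conclusion

If you spot any errors or omissions in this article, please point them out or get in touch to discuss them. Your feedback helps me keep improving!

Cited Links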

[1] Software Analysis (《軟件分析》), course by Yue Li and Tian Tan, Nanjing University: https://tai-e.pascal-lab.net/lectures.html
[2] Static Program Analysis Book: https://ranger-nju.gitbook.io/static-program-analysis-book
[3] 靜態分析入門 (Introduction to Static Analysis): https://fushuling.com/index.php/2025/01/08/%e9%9d%99%e6%80%81%e5%88%86%e6%9e%90%e5%85%a5%e9%97%a8/
[4] geekby - 靜態程序分析 (Static Program Analysis): https://www.geekby.site/2022/02/%E9%9D%99%E6%80%81%E7%A8%8B%E5%BA%8F%E5%88%86%E6%9E%90%E7%B3%BB%E5%88%97%E4%B8%80/
[5] Introduction to the Go compiler: https://go.dev/src/cmd/compile/README
[6] token.go: https://github.com/golang/go/blob/master/src/go/token/token.go
[7] Identifiers: https://github.com/golang/go/blob/master/src/go/token/token.go#L331-L341
[8] Keywords: https://github.com/golang/go/blob/master/src/go/token/token.go#L322-L326
[9] Scan: https://github.com/golang/go/blob/master/src/go/scanner/scanner.go#L80-L974
[10] The Go Programming Language Specification: https://go.dev/ref/spec
[11] The Go Programming Language Specification: https://go.dev/ref/spec
[12] go/ast: https://github.com/golang/go/blob/master/src/go/ast/ast.go#L32-L54
[13] goast-viewer: https://yuroyoro.github.io/goast-viewer/
[14] 《Go 語言設計與實現》 (Go Language Design and Implementation): https://draven.co/golang/docs/part1-prerequisite/ch02-compile/golang-compile-intro/
[15] Go Tools: https://cs.opensource.google/go/x/tools
[16] GraphvizOnline: https://dreampuf.github.io/GraphvizOnline
[17] gosec: https://github.com/securego/gosec
[18] CodeQL zero to hero part 1: The fundamentals of static analysis for vulnerability research: https://github.blog/developer-skills/github/codeql-zero-to-hero-part-1-the-fundamentals-of-static-analysis-for-vulnerability-research/
[19] CodeQL zero to hero part 2: Getting started with CodeQL: https://github.blog/developer-skills/github/codeql-zero-to-hero-part-2-getting-started-with-codeql/
[20] CodeQL zero to hero part 3: Security research with CodeQL: https://github.blog/security/vulnerability-research/codeql-zero-to-hero-part-3-security-research-with-codeql/
[21] Go 語言設計與實現 (Go Language Design and Implementation): https://draven.co/golang/docs/part1-prerequisite/ch02-compile/golang-compile-intro/
[22] Unveiling the Power of Intermediate Representations for Static Analysis: A Survey: https://arxiv.org/abs/2405.12841
[23] 深入理解 LLVM 代碼生成 (Understanding LLVM Code Generation in Depth): https://www.bilibili.com/video/BV1GCo4YmEK6/
[24] Why your code is a Graph: https://blog.shiftleft.io/why-your-code-is-a-graph-f7b980eab740
[25] lorexxar - sast2024: https://lorexxar.cn/2023/12/18/sast2024/

This article was converted to AMP by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/mRRiYu7U20aDgjHCBsfsvQ