Hands-On: Building a Streaming AI Q&A System with LangChainGo + Gin

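In this article, we will use LangChainGo and the Gin framework, together with a large language model served by Ollama, to build a streaming AI Q&A system. At the end we will test it with curl, and since this is a hands-on example, I also provide a simple HTML + CSS + JS frontend that displays the AI's answer in real time.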

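Why streaming responses matter

With a traditional API call, we usually have to wait until the large language model (LLM) has finished generating before the complete answer comes back. This leads to:

Long waits: the user sees nothing until the whole answer is ready, which makes for a poor experience.
No real-time feedback: the content cannot be shown to the user piece by piece while the model is still generating it.

A streaming response, by contrast, can:

Output the generated content incrementally, so the frontend can render it as it arrives and the user experience improves.
Spread the transfer over time instead of sending one large payload at once, which eases the pressure on the system.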

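Backend: streaming responses with Gin + LangChainGo

Enough preamble; let's get to today's topic. The steps are as follows:

1. First, create a project named robot-go and install the dependencies it needs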

mkdir robot-go
cd robot-go/
go mod init github.com/xxx/robot-go
go get github.com/gin-gonic/gin
go get github.com/tmc/langchaingo@v0.1.13
go get github.com/tmc/langchaingo/llms@v0.1.13

2. Enter the following code

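package main

import (
    "context"
    "fmt"
    "log"
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/tmc/langchaingo/llms"
    "github.com/tmc/langchaingo/llms/ollama"
)

// llmClient is the shared Ollama client used by the handler. Its setup is not
// shown in the article; initializing it in main like this is an assumption.
var llmClient *ollama.LLM

func main() {
    var err error
    // qwen2:7b is the model used in this article; Ollama must be running locally.
    llmClient, err = ollama.New(ollama.WithModel("qwen2:7b"))
    if err != nil {
        log.Fatalf("Failed to create Ollama client: %v", err)
    }
    r := gin.Default()
    r.POST("/api/chat", chatHandler)
    // Route and port match the curl test in this article.
    log.Fatal(r.Run(":9527"))
}

func chatHandler(c *gin.Context) {
    var request struct {
        Question string `json:"question"`
    }
    if err := c.ShouldBindJSON(&request); err != nil {
        log.Printf("Invalid request: %v", err)
        c.JSON(http.StatusBadRequest, gin.H{"error": "invalid request"})
        return
    }
    if request.Question == "" {
        log.Print("Empty question received")
        c.JSON(http.StatusBadRequest, gin.H{"error": "question cannot be empty"})
        return
    }
    // Use the request context so generation stops if the client disconnects.
    ctx := c.Request.Context()
    // Set the SSE headers.
    c.Writer.Header().Set("Content-Type", "text/event-stream")
    c.Writer.Header().Set("Cache-Control", "no-cache")
    c.Writer.Header().Set("Connection", "keep-alive")
    c.Writer.Flush()
    content := []llms.MessageContent{
        llms.TextParts(llms.ChatMessageTypeHuman, request.Question),
    }
    // Call the streaming API; the callback runs for every chunk the model produces.
    _, err := llmClient.GenerateContent(ctx, content, llms.WithStreamingFunc(func(ctx context.Context, chunk []byte) error {
        if _, err := fmt.Fprintf(c.Writer, "data: %s\n\n", chunk); err != nil {
            return err
        }
        c.Writer.Flush()
        return nil
    }))
    if err != nil {
        log.Printf("Failed to generate content: %v", err)
        // The status line and headers are already sent, so report the failure as an SSE event.
        fmt.Fprint(c.Writer, "event: error\ndata: failed to get response\n\n")
        c.Writer.Flush()
        return
    }
    fmt.Fprint(c.Writer, "data: [DONE]\n\n")
    c.Writer.Flush()
}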

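Code walkthrough

llmClient.GenerateContent: calls the LLM through LangChainGo and produces the streamed output.
llms.WithStreamingFunc: registers a callback that is invoked with each new piece of text as the LLM generates it.
c.Writer.Flush(): pushes the data to the client immediately instead of leaving it in the buffer.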
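
The handler above writes each chunk as a single data: line. In the SSE format every line of a payload needs its own data: prefix and an event ends with a blank line, so a chunk that itself contains a newline would not survive that framing intact. The following is a minimal sketch of a framing helper under that assumption; formatSSE is a hypothetical name introduced here for illustration and is not part of Gin or LangChainGo.

```go
package main

import (
	"fmt"
	"strings"
)

// formatSSE is a hypothetical helper (not part of Gin or LangChainGo).
// It frames one chunk as a Server-Sent Event: every line of the payload
// gets its own "data: " prefix, and a blank line terminates the event.
func formatSSE(chunk string) string {
	var b strings.Builder
	for _, line := range strings.Split(chunk, "\n") {
		b.WriteString("data: ")
		b.WriteString(line)
		b.WriteString("\n")
	}
	b.WriteString("\n") // blank line marks the end of the event
	return b.String()
}

func main() {
	// Print the framed events with escapes visible.
	fmt.Printf("%q\n", formatSSE("Hello"))
	fmt.Printf("%q\n", formatSSE("line one\nline two"))
}
```

Inside the streaming callback, a call such as fmt.Fprint(c.Writer, formatSSE(string(chunk))) could then take the place of the Fprintf call. An SSE client joins the data: lines of one event back together with newlines.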

Testing with curl

First start the backend service with the following command:

go run main.go

Then test it with curl:

curl -X POST http://localhost:9527/api/chat \
     -H "Content-Type: application/json" \
     -d '{"question": "請介紹一下Go語言"}' \
     --no-buffer

⚠️ Note: --no-buffer makes curl print the streamed data as soon as it arrives.

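The response arrives as a series of data: events, one per generated chunk, and ends with data: [DONE].

⚠️ Note: don't forget to run Ollama locally. The model I use here is qwen2:7b!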

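Frontend implementation

Because this is a simple example, the frontend is built without a framework such as React; plain HTML, CSS, and JS render the answer as it streams in.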

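Summary

In this article we went from the backend implementation to streaming rendering in the frontend and built a complete streaming AI Q&A system:

✅ LangChainGo + Ollama handle the LLM calls

✅ Gin provides a streaming SSE (Server-Sent Events) API

✅ curl tests it from the terminal, with the AI-generated text returned chunk by chunk

✅ A simple HTML + CSS + JS frontend displays the answer in real time

🚀 The complete code is open source. You can try improving and extending it, for example:

Support multi-turn conversations that carry conversational context and memory.
Plug in a more powerful LLM, such as DeepSeek.
Improve the frontend UI and interaction, for instance with a framework like React.
Build more complex application scenarios, such as Q&A search.

Source code for this example: https://github.com/JXLSP/robot-go. A star is always welcome ☺️. That's all for today, thanks for reading! 🌹🌹🌹🦀🦀🦀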

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/k29VYhXsg3RlCvtBuJCZFA
