Stop being misled! MCP does not depend on Function Calling! Digging through Cline's source code to show you

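Function calling refers to an LLM's ability to decide on its own, based on the user's natural-language input, which tools to call, and to emit those tool calls in a structured format. The Model Context Protocol (MCP) is a standardized protocol for interaction between LLM agent applications and external systems.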

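MCP only standardizes how tools are executed; which tools to call, and with what arguments, still has to come from the LLM. This is why many people, and even many technical articles, mistakenly assume that MCP must depend on the LLM's function calling ability. In fact, MCP does not care about any capability of the LLM itself: as long as the LLM application sends a request in the MCP tool-call format to an MCP server through an MCP client, MCP works correctly.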
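To make the point concrete, here is a minimal sketch of what such a request looks like on the wire. MCP messages are JSON-RPC 2.0, and a tool is invoked with the tools/call method, whose params carry the tool name and its arguments. The helper buildToolCallRequest, the id counter, and the get_forecast example values are assumptions made for illustration; they are not taken from Cline or from any SDK.

```typescript
// Shape of a JSON-RPC 2.0 request for MCP's "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextRequestId = 1; // hypothetical request-id counter

// Hypothetical helper: it does not care whether the tool name and arguments
// came from native function calling or were parsed out of plain model text.
function buildToolCallRequest(
  name: string,
  args: Record<string, unknown>,
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Example values are illustrative only.
const forecastRequest = buildToolCallRequest("get_forecast", {
  city: "San Francisco",
  days: 3,
});
```

Nothing in this structure records how the name and arguments were obtained, which is why the protocol is indifferent to whether the model supports function calling.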

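Although MCP does not depend on the LLM's function calling ability, the LLM application still needs the LLM to produce tool calls in a fixed format before it can build an MCP-compliant request. So what can be done for LLMs that lack function calling?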

Cline

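Cline[1], a VSCode extension, is an LLM application that conforms to the MCP client standard. It lets you choose which LLM to use through aggregation APIs such as OpenRouter[2]. Among the hundreds of LLMs available on OpenRouter, evidently only a minority support function calling, yet this does not prevent MCP servers from being connected and used. Cline's approach is to give the LLM a powerful system prompt.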

A brief look at the source

Browsing Cline's open-source code [3], you can see that the SYSTEM_PROMPT function in src/core/prompts/system.ts provides a system prompt of nearly 1,000 lines.

Around line 20, it tells the LLM the format for tool calls, which is a prescribed XML format:

TOOL USE

You have access to a set of tools that are executed upon the user's approval. You can use one tool per message, and will receive the result of that tool use in the user's response. You use tools step-by-step to accomplish a given task, with each tool use informed by the result of the previous tool use.

# Tool Use Formatting

Tool use is formatted using XML-style tags. The tool name is enclosed in opening and closing tags, and each parameter is similarly enclosed within its own set of tags. Here's the structure:

<tool_name>
<parameter1_name>value1</parameter1_name>
<parameter2_name>value2</parameter2_name>
...
</tool_name>

For example:

<read_file>
<path>src/main.js</path>
</read_file>

Always adhere to this format for the tool use to ensure proper parsing and execution.
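On the application side, a reply in this format has to be parsed back into a tool name and parameters. The following is a minimal sketch of that step, not Cline's actual parser: parseToolUse and ParsedToolUse are hypothetical names, the list of known tool names is passed in by the caller, and the regular expressions assume tool names contain no regex metacharacters and that a parameter value never contains its own closing tag.

```typescript
interface ParsedToolUse {
  name: string;
  params: Record<string, string>;
}

// Hypothetical parser: finds the first known tool block in the model's reply
// and collects its <param>value</param> pairs.
function parseToolUse(text: string, toolNames: string[]): ParsedToolUse | null {
  for (const name of toolNames) {
    const block = new RegExp(`<${name}>([\\s\\S]*?)</${name}>`).exec(text);
    if (!block) continue;

    const body = block[1] ?? "";
    const params: Record<string, string> = {};
    // Each parameter is enclosed in its own pair of tags inside the block.
    const paramPattern = /<(\w+)>([\s\S]*?)<\/\1>/g;
    let match: RegExpExecArray | null;
    while ((match = paramPattern.exec(body)) !== null) {
      const [, key = "", value = ""] = match;
      params[key] = value.trim();
    }
    return { name, params };
  }
  return null; // the reply contains no recognised tool use
}

// Illustrative reply: free text followed by one tool use.
const fileReply =
  "I'll read the file first.\n<read_file>\n<path>src/main.js</path>\n</read_file>";
const parsedFileRead = parseToolUse(fileReply, ["execute_command", "read_file"]);
```

Because the format is plain text, any model that can follow instructions can emit it; the cost is that the application has to handle malformed or missing tags itself.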

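The system prompt shows that Cline predefines a number of tools, such as execute_command, read_file, and write_to_file, to implement Cline's basic functionality (according to the source [4], these tools are executed through the VSCode API rather than through MCP):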

# Tools

## execute_command
Description: Request to execute a CLI command on the system. Use this when you need to perform system operations or run specific commands to accomplish any step in the user's task. You must tailor your command to the user's system and provide a clear explanation of what the command does. For command chaining, use the appropriate chaining syntax for the user's shell. Prefer to execute complex CLI commands over creating executable scripts, as they are more flexible and easier to run. Commands will be executed in the current working directory: ${cwd.toPosix()}
Parameters:
- command: (required) The CLI command to execute. This should be valid for the current operating system. Ensure the command is properly formatted and does not contain any harmful instructions.
- requires_approval: (required) A boolean indicating whether this command requires explicit user approval before execution in case the user has auto-approve mode enabled. Set to 'true' for potentially impactful operations like installing/uninstalling packages, deleting/overwriting files, system configuration changes, network operations, or any commands that could have unintended side effects. Set to 'false' for safe operations like reading files/directories, running development servers, building projects, and other non-destructive operations.
Usage:
<execute_command>
<command>Your command here</command>
<requires_approval>true or false</requires_approval>
</execute_command>

## read_file
Description: Request to read the contents of a file at the specified path. Use this when you need to examine the contents of an existing file you do not know the contents of, for example to analyze code, review text files, or extract information from configuration files. Automatically extracts raw text from PDF and DOCX files. May not be suitable for other types of binary files, as it returns the raw content as a string.
Parameters:
- path: (required) The path of the file to read (relative to the current working directory ${cwd.toPosix()})
Usage:
<read_file>
<path>File path here</path>
</read_file>

## write_to_file
Description: Request to write content to a file at the specified path. If the file exists, it will be overwritten with the provided content. If the file doesn't exist, it will be created. This tool will automatically create any directories needed to write the file.
Parameters:
- path: (required) The path of the file to write to (relative to the current working directory ${cwd.toPosix()})
- content: (required) The content to write to the file. ALWAYS provide the COMPLETE intended content of the file, without any truncation or omissions. You MUST include ALL parts of the file, even if they haven't been modified.
Usage:
<write_to_file>
<path>File path here</path>
<content>
Your file content here
</content>
</write_to_file>
...

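One of the parameters of the SYSTEM_PROMPT function is mcpHub, which encapsulates MCP management. The mcpHub.getMode() method tells whether MCP is enabled (a user setting). If it is, the tools that expose MCP server capabilities, namely use_mcp_tool and access_mcp_resource, are added to the system prompt (it appears that only MCP server tools and resources are currently supported, not prompts):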

mcpHub.getMode() !== "off"
  ? `
## use_mcp_tool
Description: Request to use a tool provided by a connected MCP server. Each MCP server can provide multiple tools with different capabilities. Tools have defined input schemas that specify required and optional parameters.
Parameters:
- server_name: (required) The name of the MCP server providing the tool
- tool_name: (required) The name of the tool to execute
- arguments: (required) A JSON object containing the tool's input parameters, following the tool's input schema
Usage:
<use_mcp_tool>
<server_name>server name here</server_name>
<tool_name>tool name here</tool_name>
<arguments>
{
  "param1": "value1",
  "param2": "value2"
}
</arguments>
</use_mcp_tool>

## access_mcp_resource
Description: Request to access a resource provided by a connected MCP server. Resources represent data sources that can be used as context, such as files, API responses, or system information.
Parameters:
- server_name: (required) The name of the MCP server providing the resource
- uri: (required) The URI identifying the specific resource to access
Usage:
<access_mcp_resource>
<server_name>server name here</server_name>
<uri>resource URI here</uri>
</access_mcp_resource>
`
  : ""
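A use_mcp_tool block emitted by the model contains everything needed to call an MCP server: which server, which tool, and a JSON object of arguments. The sketch below shows one way such a block could be turned into a structured call. It is an illustration under assumptions, not Cline's code: extractTag, parseUseMcpTool, and McpToolCall are hypothetical names, and the sample reply reuses the weather example values for demonstration only.

```typescript
interface McpToolCall {
  serverName: string;
  toolName: string;
  args: Record<string, unknown>;
}

// Returns the trimmed inner text of the first <tag>...</tag> pair, or null.
function extractTag(text: string, tag: string): string | null {
  const match = new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`).exec(text);
  return match ? (match[1] ?? "").trim() : null;
}

// Hypothetical parser for the use_mcp_tool format described in the prompt.
function parseUseMcpTool(text: string): McpToolCall | null {
  const block = extractTag(text, "use_mcp_tool");
  if (block === null) return null;

  const serverName = extractTag(block, "server_name");
  const toolName = extractTag(block, "tool_name");
  const rawArgs = extractTag(block, "arguments");
  if (serverName === null || toolName === null || rawArgs === null) return null;

  try {
    const args: unknown = JSON.parse(rawArgs);
    // The prompt asks for a JSON object, so reject arrays and primitives.
    if (typeof args !== "object" || args === null || Array.isArray(args)) return null;
    return { serverName, toolName, args: args as Record<string, unknown> };
  } catch {
    return null; // the model produced malformed JSON
  }
}

const mcpReply = `<use_mcp_tool>
<server_name>example-weather-server</server_name>
<tool_name>get_forecast</tool_name>
<arguments>
{ "city": "San Francisco", "days": 3 }
</arguments>
</use_mcp_tool>`;
const parsedMcpCall = parseUseMcpTool(mcpReply);
```

The toolName and args fields map directly onto the name and arguments of an MCP tools/call request, while serverName selects which connected server receives it.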

The code that fetches each MCP server's tools and resources and generates the corresponding part of the system prompt is at around line 398 of the source:

mcpHub.getServers().length > 0
  ? `${mcpHub
    .getServers()
    .filter((server) => server.status === "connected")
    .map((server) => {
     const tools = server.tools
      ?.map((tool) => {
       const schemaStr = tool.inputSchema
        ? `    Input Schema:
    ${JSON.stringify(tool.inputSchema, null, 2).split("\n").join("\n    ")}`
        : ""

       return `- ${tool.name}: ${tool.description}\n${schemaStr}`
      })
      .join("\n\n")

     const templates = server.resourceTemplates
      ?.map((template) => `- ${template.uriTemplate} (${template.name}): ${template.description}`)
      .join("\n")

     const resources = server.resources
      ?.map((resource) => `- ${resource.uri} (${resource.name}): ${resource.description}`)
      .join("\n")

     const config = JSON.parse(server.config)

     return (
      `## ${server.name} (\`${config.command}${config.args && Array.isArray(config.args) ? ` ${config.args.join(" ")}` : ""}\`)` +
      (tools ? `\n\n### Available Tools\n${tools}` : "") +
      (templates ? `\n\n### Resource Templates\n${templates}` : "") +
      (resources ? `\n\n### Direct Resources\n${resources}` : "")
     )
    })
    .join("\n\n")}`
  : "(No MCP servers currently connected)"
}`

Based on this code, if an MCP server is connected, a system prompt section like the following is produced:

## example-weather-server (`node /path/to/weather-server/build/index.js`)

### Available Tools
- get_forecast: Get weather forecast for a city
    Input Schema:
    {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "City name"
        },
        "days": {
          "type": "number",
          "description": "Number of days (1-5)",
          "minimum": 1,
          "maximum": 5
        }
      },
      "required": ["city"]
    }

### Resource Templates
- weather://{city}/current (Current weather for a given city): Real-time weather data for a specified city

### Direct Resources
- weather://San Francisco/current (Current weather in San Francisco): Real-time weather data for San Francisco including temperature, conditions, humidity, and wind speed
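The excerpt from the source is a fragment of a larger template string, so it cannot be run on its own. As a rough, self-contained approximation of the tools part of that logic, consider the sketch below. ToolInfo and ServerInfo are simplified, hypothetical shapes (Cline's real types carry more fields), resources and templates are left out, and unlike the excerpt this version filters for connected servers before deciding whether to print the fallback text.

```typescript
// Hypothetical, simplified shapes for illustration.
interface ToolInfo {
  name: string;
  description?: string;
  inputSchema?: object;
}
interface ServerInfo {
  name: string;
  status: "connected" | "disconnected";
  tools?: ToolInfo[];
}

// Renders one server heading plus its tool list, indenting each JSON schema.
function renderServerSection(server: ServerInfo): string {
  const tools = (server.tools ?? [])
    .map((tool) => {
      const schema = tool.inputSchema
        ? `\n    Input Schema:\n    ${JSON.stringify(tool.inputSchema, null, 2)
            .split("\n")
            .join("\n    ")}`
        : "";
      return `- ${tool.name}: ${tool.description ?? ""}${schema}`;
    })
    .join("\n\n");
  return `## ${server.name}` + (tools ? `\n\n### Available Tools\n${tools}` : "");
}

// Renders every connected server, or a fallback line when there is none.
function renderMcpSection(servers: ServerInfo[]): string {
  const connected = servers.filter((server) => server.status === "connected");
  return connected.length > 0
    ? connected.map(renderServerSection).join("\n\n")
    : "(No MCP servers currently connected)";
}
```

Every tool's full JSON schema ends up in the prompt text, so the prompt grows with each connected server and each tool it exposes.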

Some thoughts on using a system prompt

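By relying on the system prompt, Cline achieves compatibility with all LLMs, but this approach clearly has its problems. Token consumption is a major one: the source shows that Cline's system prompt takes up roughly 60KB, and it grows as more MCP servers are added. For models with a low token limit, it occupies a large part of the context window, leaving little room for the user's prompt.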
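For a sense of scale, here is a back-of-the-envelope estimate. It assumes roughly 4 bytes of English text per token, which is a common rule of thumb rather than a real tokenizer, and it uses a 32,768-token context window purely as a hypothetical example; actual numbers depend on the model's tokenizer and limits.

```typescript
// Rough heuristic, NOT a real tokenizer: assume ~4 bytes of English text per token.
const ASSUMED_BYTES_PER_TOKEN = 4;

function estimateTokens(byteLength: number): number {
  return Math.ceil(byteLength / ASSUMED_BYTES_PER_TOKEN);
}

// Fraction of the context window consumed by the system prompt alone.
function promptShare(promptBytes: number, contextWindowTokens: number): number {
  return estimateTokens(promptBytes) / contextWindowTokens;
}

const systemPromptBytes = 60 * 1024; // the ~60KB figure cited above
const estimatedPromptTokens = estimateTokens(systemPromptBytes); // 15360 under the assumed ratio
const shareOfWindow = promptShare(systemPromptBytes, 32768); // hypothetical 32K-token window
```

Under these assumptions the system prompt alone would take a little under half of such a window before the user has typed anything.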

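Cline could consider first "asking" whether the LLM supports function calling, and if it does, use that ability directly instead of supplying such a huge system prompt. Whether an LLM supports function calling could be determined from the model name, or by negotiating with the LLM through a suitable API.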
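The model-name variant of this idea could look something like the sketch below. Everything in it is hypothetical: chooseToolStrategy, the ToolStrategy type, and the prefix allowlist are made up for illustration, and a real client would have to maintain such a list or obtain capability information from the provider instead.

```typescript
type ToolStrategy = "native-function-calling" | "system-prompt";

// Hypothetical allowlist of model-name prefixes assumed to support function calling.
const ASSUMED_FC_MODEL_PREFIXES = ["gpt-4o", "claude-3"];

function chooseToolStrategy(modelId: string): ToolStrategy {
  // OpenRouter-style ids look like "vendor/model"; keep only the model part.
  const model = modelId.slice(modelId.lastIndexOf("/") + 1).toLowerCase();
  const supportsFunctionCalling = ASSUMED_FC_MODEL_PREFIXES.some(
    (prefix) => model.indexOf(prefix) === 0,
  );
  // Fall back to the prompt-based format for anything not on the list.
  return supportsFunctionCalling ? "native-function-calling" : "system-prompt";
}
```

Defaulting to the system prompt for unknown models keeps the behaviour that already works everywhere, and only known-capable models skip the large prompt.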

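Another question readers may have: for LLMs that do support function calling, does using the system prompt approach as Cline does waste the LLM's ability and therefore fall short of the ideal result? On this point, take a look at the Berkeley Function-Calling Leaderboard[5], which compares the function calling ability of LLMs across multiple evaluation metrics. The leaderboard also includes results where a system prompt is used to give the LLM function calling indirectly. Surprisingly, models such as OpenAI's GPT-4o and o1 actually perform better with the system prompt! On some test dimensions, o1 is clearly better with the system prompt.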

[Figure: GPT-4o, native function calling (FC) vs. system prompt (Prompt)]

[Figure: o1, native function calling (FC) vs. system prompt (Prompt)]

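Most LLMs still perform better with their native function calling, though the gap is not large. So as long as the system prompt is well written, Cline's current approach strikes a balance between generality across LLMs and tool-call accuracy.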

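Summary

This article aims to show that implementing MCP does not require the LLM's function calling ability; conceptually, the two have no dependency on each other. A brief look at the source of Cline, a popular VSCode LLM extension, shows that a powerful system prompt can give an LLM the ability to call tools without relying on its native function calling.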

References

[1] Cline: https://marketplace.visualstudio.com/items?itemName=saoudrizwan.claude-dev

[2] OpenRouter: https://openrouter.ai/

[3] Cline source code: https://github.com/cline/cline/blob/dbe5f74884fddbf31091457210abce531cbeadbc/src/core/prompts/system.ts#L7

[4] Tool invocation source code: https://github.com/cline/cline/blob/main/src/core/task/index.ts

[5] Berkeley Function-Calling Leaderboard: https://gorilla.cs.berkeley.edu/leaderboard.html

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/Y7wXwq-anNCCxyeh0oSA6w