The Chinese version of Llama-3-8B is here: try running it on your own device

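Llama 3-8B Chinese[1] is a Chinese version fine-tuned from Meta's recently released Llama-3-8b model. It was fine-tuned on the firefly-train-1.1M, moss-003-sft-data, school_math_0.25M, and ruozhiba datasets so that it can answer user questions in Chinese.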

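In this article, we use LlamaEdge to run the Llama-3-8B Chinese model locally. With LlamaEdge, the application is a single Wasm file, so there is no need to install complex Python packages or a C++ toolchain. Click here [2] to learn why we chose Rust + Wasm.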

Run the Llama-3-8B Chinese model on your device

Step 1: Install WasmEdge[3] with the following command.

curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

Step 2: Download the Llama-3-8B-Chinese-Chat model GGUF[4] file. The model is 5.73 GB, so the download may take a while.

curl -LO https://huggingface.co/zhouzr/Llama3-8B-Chinese-Chat-GGUF/resolve/main/Llama3-8B-Chinese-Chat.q4_k_m.GGUF
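Because the file is large, an interrupted download can leave a truncated file behind. The following is a minimal sketch of a sanity check, not part of LlamaEdge: check_model_file is a hypothetical helper, and the 5000000000-byte threshold is an assumption derived from the 5.73 GB figure above.

```shell
# Hypothetical helper: succeed only if FILE exists and is at least MIN_BYTES large.
check_model_file() {
  file="$1"
  min_bytes="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # wc -c prints the size in bytes; tr strips the padding some platforms add.
  size=$(wc -c < "$file" | tr -d ' ')
  if [ "$size" -lt "$min_bytes" ]; then
    echo "too small: $file ($size bytes)" >&2
    return 1
  fi
  echo "ok: $file ($size bytes)"
}

# Example (not run here; the threshold is an assumed lower bound):
#   check_model_file Llama3-8B-Chinese-Chat.q4_k_m.GGUF 5000000000
```

If the check fails, rerunning the curl command with the additional -C - option asks curl to resume the partial download instead of starting over.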

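Step 3: Download a cross-platform, portable Wasm file for the chat application. The application lets you chat with the model on the command line. Its Rust source code is here [5].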

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

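That's it. Enter the following command to chat with the model in your terminal. This portable Wasm application automatically uses the hardware accelerators on your device (for example, the GPU).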

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama3-8B-Chinese-Chat.q4_k_m.GGUF llama-chat.wasm -p llama-3-chat
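The value passed to --nn-preload has four colon-separated fields: an alias (default), the inference backend (GGML), the execution target (AUTO), and the path to the model file. The path must match the file name saved in Step 2 exactly. As a rough sketch, a small wrapper can build that value and refuse to continue when the file is missing; nn_preload_arg is a hypothetical helper name, not a WasmEdge or LlamaEdge command.

```shell
# Hypothetical helper: build the --nn-preload value for a local GGUF file.
# Fields: alias (default), backend (GGML), execution target (AUTO), file path.
nn_preload_arg() {
  model_file="$1"
  if [ ! -f "$model_file" ]; then
    echo "model file not found: $model_file" >&2
    return 1
  fi
  printf 'default:GGML:AUTO:%s\n' "$model_file"
}

# Example (not run here): only launch the chat app if the model file was found.
#   arg=$(nn_preload_arg Llama3-8B-Chinese-Chat.q4_k_m.GGUF) &&
#     wasmedge --dir .:. --nn-preload "$arg" llama-chat.wasm -p llama-3-chat
```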

Create an API server for the Llama-3-8B Chinese model

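We also provide an OpenAI API-compatible server. It lets Llama-3-8B-Chinese integrate with development frameworks and tools such as flows.network[6], LangChain, and LlamaIndex, which opens up a wider range of applications. You can also use its code as a reference for writing your own API server or other LLM applications.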

To start the API server, follow these steps:

Download the API server application. It is a cross-platform, portable Wasm application that runs on a variety of CPU and GPU devices.

curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-api-server.wasm

Next, download the chatbot web UI so that you can interact with the model through a chat interface.

curl -LO https://github.com/LlamaEdge/chatbot-ui/releases/latest/download/chatbot-ui.tar.gz
tar xzf chatbot-ui.tar.gz
rm chatbot-ui.tar.gz

Then start the API server for the model with the following command. Once it is running, open http://localhost:8080 in your browser to start chatting!

wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama3-8B-Chinese-Chat.q4_k_m.GGUF llama-api-server.wasm -p llama-3-chat -m Llama-3-8B-Chinese --log-all

In a separate terminal window, you can use curl to interact with the API server.

curl -X POST http://localhost:8080/v1/chat/completions \
  -H 'accept:application/json' \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"system", "content": "You are a sentient, superintelligent artificial general intelligence, here to teach and assist me."}, {"role":"user", "content": "你是誰"}], "model":"Llama-3-8B-Chinese"}'
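To send several prompts without retyping the JSON by hand, the request can be wrapped in small shell functions. This is a sketch under stated assumptions: json_escape, chat_request_body, and ask are hypothetical helper names, the body carries only a single user message (no system message), and the escaping covers backslashes and double quotes but not newlines or other control characters.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build a chat completion body with one user message.
chat_request_body() {
  model="$1"
  prompt=$(json_escape "$2")
  printf '{"messages":[{"role":"user","content":"%s"}],"model":"%s"}\n' "$prompt" "$model"
}

# Hypothetical helper: send one prompt to the local API server started above.
# curl reads the request body from stdin via "-d @-".
ask() {
  chat_request_body "Llama-3-8B-Chinese" "$1" |
    curl -s -X POST http://localhost:8080/v1/chat/completions \
      -H 'accept:application/json' \
      -H 'Content-Type: application/json' \
      -d @-
}

# Example (not run here; requires the API server to be running):
#   ask '你是誰'
```

In the OpenAI-compatible response format, the reply text is found under choices[0].message.content in the returned JSON.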

That's all. WasmEdge is the easiest, fastest, and safest way to run LLM applications [7]. Give it a try!

References

[1] Llama 3-8B Chinese: https://huggingface.co/shenzhi-wang/Llama3-8B-Chinese-Chat

[2] Why we chose Rust + Wasm: https://www.secondstate.io/articles/fast-llm-inference/

[3] WasmEdge: https://github.com/WasmEdge/WasmEdge

[4] Llama-3-8B-Chinese-Chat model GGUF: https://huggingface.co/zhouzr/Llama3-8B-Chinese-Chat-GGUF

[5] Rust source code of the chat application: https://github.com/LlamaEdge/LlamaEdge/tree/main/chat

[6] flows.network: http://flows.network/

[7] The easiest, fastest, and safest way to run LLM applications: https://www.secondstate.io/articles/fast-llm-inference/

This article was AMP-transcoded by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/jf0eXp8slhHfZIMkJykyEQ