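Mastering Rust: An Ownership Journey from Scratch

Ownership is one of Rust's most interesting language features, but it can be a challenging topic for beginners.

This post uses code examples to walk through what ownership in Rust is and why it exists. I hope it helps those who are just getting started.

Tips: the code in this post is commented. On a first read, don't get bogged down in the details; focus on the big picture, then come back and study each snippet one by one.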

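Contents

Move or copy?
Scope and drop
Borrowing
Mutation
Mutable borrows
Ownership rules
Interior mutability
Lifetimes
Summary

Move or copy?

Let's first see how an ordinary assignment behaves in Rust.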

println!("start");
// code 1:
let a = 1;
let _b = a;
let _c = a;

// code 2:
let d = String::from("hello");
let _e = d;
let _f = d;

The result:

error[E0382]: use of moved value: `d`
  --> src/main.rs:12:10
   |
10 | let d = String::from("hello");
   |     - move occurs because `d` has type `String`, which does not implement the `Copy` trait
11 | let _e = d;
   |          - value moved here
12 | let _f = d;
   |          ^ value used here after move
   |
help: consider cloning the value if the performance cost is acceptable
   |
11 | let _e = d.clone();
   |           ++++++++

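Why does code 2 fail while code 1 does not?

Both look like plain initializing assignments: the number a and the string d are each assigned to other variables more than once. So why does only the string assignment fail?

This is where Rust's distinction between copying a value and moving its ownership comes in.

For any variable, when you pass it to another variable or to a function, the value is duplicated if its type can be copied (it implements Copy); otherwise ownership of the value is moved (Move) to the destination.

Here a is an integer, and integers are Copy, so code 1 compiles. d is a String, which is not Copy, so the first assignment moves ownership to _e. A value can only be moved out of a variable once, so the second use of d in code 2 does not compile.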
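
As the compiler hint suggests, one way to make code 2 compile is to clone the string explicitly. The following is a minimal sketch contrasting the two cases; the function names copy_twice and clone_twice are made up for this illustration.

```rust
// Hypothetical helper names, used only for this illustration.

// i32 implements Copy, so each assignment duplicates the value
// and `a` stays usable afterwards.
fn copy_twice() -> (i32, i32, i32) {
    let a = 1;
    let b = a;
    let c = a;
    (a, b, c)
}

// String does not implement Copy. clone() makes a deep copy explicitly,
// so `d` is still valid after the clone and can then be moved once.
fn clone_twice() -> (String, String) {
    let d = String::from("hello");
    let e = d.clone(); // explicit copy, `d` is still valid
    let f = d; // move: `d` cannot be used after this line
    (e, f)
}

fn main() {
    println!("{:?}", copy_twice());
    println!("{:?}", clone_twice());
}
```

Cloning a String allocates a new buffer, which is why Rust makes you ask for it explicitly instead of doing it on every assignment.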

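Why copy or move at all? A small spoiler: Rust has no garbage collector (GC). Its memory management relies on ownership: whoever owns a value is responsible for freeing its memory when the value is dropped.

Let's use some code to see how values get dropped.

Scope and drop

Here we focus on when values are dropped.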

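// Because of the orphan rule we cannot implement Drop for the standard String directly,
// so we wrap it in our own type to observe when it is dropped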
#[derive(Debug)]
struct MyString(String);
impl MyString {
    fn from(name: &str) -> Self {
        MyString(String::from(name))
    }
}
struct MyData {
    data: MyString,
}
// Print the string when it is dropped
impl Drop for MyString {
    fn drop(&mut self) {
        println!("Dropping MyString with value: {:?}", self.0);
    }
}
// Print the struct (and the string it contains) when it is dropped
impl Drop for MyData {
    fn drop(&mut self) {
        println!("Dropping MyData with value: {:?}", self.data);
    }
}

fn main() {
    {
        let _ = MyData {
            data: MyString::from("not used"),
        };
        let _wrapper = MyData {
            data: MyString::from("used as variable"),
        };
        println!("End of the scope inside main.");
    }

    println!("End of the scope.");
}

Output:

Dropping MyData with value: MyString("not used")
Dropping MyString with value: "not used"
End of the scope inside main.
Dropping MyData with value: MyString("used as variable")
Dropping MyString with value: "used as variable"
End of the scope.

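The code has two scopes.

Tips: strictly speaking there are more, since each let can also be viewed as starting a scope of its own. To keep things simple we only distinguish two here.

The scope of the main function itself
The inner scope inside main

Within the inner scope, the struct assigned to _ and the string it contains are dropped right away. let _ does not bind the value to a variable at all, so nothing owns it, nobody can use it afterwards, and it is dropped at the end of that statement.

On leaving the inner scope, the local variable _wrapper is dropped as well.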
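
The difference between let _ and a named variable such as _wrapper can be checked directly. The sketch below is an illustration only: Noisy is a hypothetical type that records its name in a shared log when dropped, so the order of events can be returned and inspected.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical type: records its name in a shared log when it is dropped.
struct Noisy {
    name: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the events in the order they happened.
fn drop_order() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        // `_` is a pattern that binds nothing: the value is dropped
        // at the end of this statement.
        let _ = Noisy { name: "ignored", log: Rc::clone(&log) };
        // `_kept` is a real variable: the value lives until the scope ends.
        let _kept = Noisy { name: "kept", log: Rc::clone(&log) };
        log.borrow_mut().push("end of inner scope");
    }
    let events = log.borrow().clone();
    events
}

fn main() {
    println!("{:?}", drop_order());
}
```

The log comes back as "ignored", "end of inner scope", "kept", which matches the output of the MyData example above.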

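Combined with the earlier observation that a string cannot be moved more than once, this shows two of Rust's memory-management rules:

A value has exactly one owner, and when the owner goes out of scope the value is dropped.
Ownership can be transferred.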
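
Transfer of ownership also happens across function calls. A minimal sketch, with hypothetical helper functions consume and add_suffix:

```rust
// Hypothetical helpers for illustration.

// Takes ownership of `s`; the String is dropped when this function returns.
fn consume(s: String) -> usize {
    s.len()
}

// Takes ownership, modifies the value, and hands ownership back to the caller.
fn add_suffix(mut s: String) -> String {
    s.push_str(" world");
    s
}

fn main() {
    let a = String::from("hello");
    let b = add_suffix(a); // ownership: a -> add_suffix -> b
    // println!("{a}"); // would not compile: `a` was moved
    println!("{b}");
    let n = consume(b); // `b` is moved in and dropped inside `consume`
    println!("{n}");
}
```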

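This does make memory management straightforward.

But what if you only want to refer to a value without moving it? (After all, it can only be moved once.)

Borrowing

Let's start with an ordinary "reference".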

println!("start");
let a = String::from("hello");
let d = &a;
// equivalent to
// let ref d = a;
let _e = d;
let _f = d;

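This code compiles.

Tips: Rust catches many problems at compile time. That is also why "start" was never printed in the earlier failing example: compilation failed, so nothing ran.

Rust is more specific about "references" and calls this borrowing (Borrow). We'll get to why later.

From the code so far, once a variable borrows the string, that borrow can itself be assigned to several variables. This works because a shared reference is Copy, so assigning d copies the reference rather than moving it.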
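
The same applies when passing a borrow to a function. In this sketch, char_count is a hypothetical helper that borrows the string instead of taking ownership:

```rust
// Hypothetical helper: borrows the string instead of taking ownership.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let a = String::from("hello");
    let d = &a;
    // Shared references are Copy, so `d` can be handed out repeatedly.
    let e = d;
    let f = d;
    let total = char_count(d) + char_count(e) + char_count(f);
    println!("{total}");
    // `a` still owns the string and is usable here.
    println!("{a}");
}
```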

When you don't need to move the whole string and only want to borrow its value, this is much more convenient. So when is a borrow released?

// Add a struct that holds a borrow
struct MyDataRef<'a> {
    reference: &'a MyData,
}

// The corresponding Drop implementation
impl Drop for MyDataRef<'_> {
    fn drop(&mut self) {
        println!("Dropping MyDataRef");
    }
}

fn main() {
    {
        let a = MyData {
            data: MyString::from("used as variable"),
        };
        let b = MyDataRef { reference: &a };
        let c = MyDataRef { reference: &a };
        println!("End of the scope inside main.");
    }

    println!("End of the scope.");
}

Output:

End of the scope inside main.
Dropping MyDataRef
Dropping MyDataRef
Dropping MyData with value: MyString("used as variable")
Dropping MyString with value: "used as variable"
End of the scope.

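All the borrows were dropped before the borrowed value was. You can have multiple borrows (immutable borrows, to be precise; more on that later). Local variables are dropped in reverse order of declaration, so b and c, which were declared after a, go first. More generally, the compiler rejects any borrow that could outlive the value it borrows, which guarantees you never use a value that has already been freed (use after free).

Mutation

So far we have not modified a single variable.

Can Rust reassign a variable the way other languages do?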

let d = String::from("hello");
d = String::from("world");

The answer is no:

error[E0384]: cannot assign twice to immutable variable `d`
  --> src/main.rs:33:5
   |
32 |     let d = String::from("hello");
   |         -
   |         |
   |         first assignment to `d`
   |         help: consider making this binding mutable: `mut d`
33 |     d = String::from("world");
   |     ^ cannot assign twice to immutable variable

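Rust distinguishes reading from modifying. As the error message suggests,

you need the mut keyword to declare that a variable can be modified: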

let mut d = String::from("hello");
d = String::from("world");

What does dropping look like in that case?

fn main() {
    {
        let mut wrapper = MyData {
            data: MyString::from("used as mut variable1"),
        };
        wrapper.data = MyString::from("used as mut variable2");
        println!("[Mutable] End of the scope inside main.");
    }

    println!("End of the scope.");
}

Output:

Dropping MyString with value: "used as mut variable1"
[Mutable] End of the scope inside main.
Dropping MyData with value: MyString("used as mut variable2")
Dropping MyString with value: "used as mut variable2"
End of the scope.

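This is much like dropping an immutable variable. The one difference is that on assignment the previous value is dropped immediately. Memory is managed quite meticulously.

Now that we have covered borrowing and mutability, we can return to the earlier remark that borrows come in different kinds: there is also the mutable borrow.

Mutable borrows

A mutable variable can have a corresponding mutable borrow: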

let mut d = String::from("hello");
let g = &mut d;
*g = "world".to_string();
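
A mutable borrow is also how a function modifies a value it does not own. A minimal sketch, where shout and shouted are hypothetical names chosen for this example:

```rust
// Hypothetical helper: mutates the caller's String through a mutable borrow.
fn shout(s: &mut String) {
    s.make_ascii_uppercase();
    s.push('!');
}

fn shouted(input: &str) -> String {
    let mut d = String::from(input);
    shout(&mut d); // the mutable borrow ends when `shout` returns
    d
}

fn main() {
    println!("{}", shouted("hello"));
}
```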

What if there are mutable and immutable borrows at the same time? Does the following code compile?

fn main() {
    let mut d = String::from("hello");
    let e = &d;
    let f = &d;
    let g = &mut d;
    *g = "world".to_string();
    println!("{f}");
}

The answer is no:

error[E0502]: cannot borrow `d` as mutable because it is also borrowed as immutable
 --> src/main.rs:5:13
  |
4 |     let f = &d;
  |             -- immutable borrow occurs here
5 |     let g = &mut d;
  |             ^^^^^^ mutable borrow occurs here
6 |     *g = "world".to_string();
7 |     println!("{f}");
  |               --- immutable borrow later used here

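The compiler tells us plainly: you cannot take a mutable borrow while an immutable borrow is still in use.

Why? An analogy with a read-write lock makes it easy to understand. Holding a mutable borrow is like holding the write lock, and no read locks are allowed at that moment; otherwise what I write and what you read could be inconsistent.

This gives us the borrowing rules of ownership:

There can be multiple immutable borrows at the same time.
There can be only one mutable borrow at a time, and it is mutually exclusive with immutable borrows.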
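
"At the same time" here means while a borrow is still going to be used: current compilers end a borrow at its last use rather than at the end of the enclosing block. The sketch below is a variant of the failing example that compiles because the shared borrows are read before the mutable borrow is taken; read_then_write is a hypothetical name.

```rust
// Returns what the shared borrows saw and the value after mutation.
fn read_then_write() -> (String, String) {
    let mut d = String::from("hello");
    let e = &d;
    let f = &d;
    // Last use of the shared borrows: copy out what they see.
    let seen = format!("{e} {f}");
    // `e` and `f` are not used after this point, so the mutable borrow is allowed.
    let g = &mut d;
    *g = "world".to_string();
    (seen, d)
}

fn main() {
    let (before, after) = read_then_write();
    println!("{before} -> {after}");
}
```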

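Ownership rules

That completes the three rules of ownership:

A value has one and only one owner, and when the owner goes out of scope the value is dropped.
Ownership can be transferred.
A value can be borrowed: either by any number of immutable borrows, or by exactly one mutable borrow at a time, but not both at once.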

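These rules define who owns a value, whether it can be freed when a scope ends, and what constraints apply to modifying and borrowing it. They prevent freed memory from being borrowed, and they prevent data races in which a write and a read see inconsistent content.

Best of all, everything is checked at compile time, so problems surface early. Otherwise you would only hit them as crashes after the code ships, when tracking them down is far too late.

Is that all there is to ownership? Not quite, but we're nearly there. Bear with me a little longer.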

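Interior mutability

So far a borrow has been either read-only or writable, and the restrictions are strict. What if I normally only need a read-only borrow but want to write through it when necessary? Can that be done?

Yes!

Rust provides types such as Cell (values are copied or moved in and out as a whole; its get method requires Copy, so it suits small simple types) and RefCell (works for any type and checks the borrowing rules at run time) to support interior mutability. Mutex and RwLock are also forms of interior mutability, aimed at multithreaded code, where they are usually shared through Arc (a thread-safe reference-counted pointer). Arc on its own only provides shared ownership, not mutation.

Tips: essentially these can be seen as wrappers around read/write exclusion at different granularities. You don't declare a mutable borrow explicitly, but the type can still mutate its contents internally.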
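
Before turning to RefCell, here is a rough sketch of the other two combinations mentioned above: Cell for a small Copy value, and Arc with a Mutex for sharing across threads. The functions bump and parallel_sum are hypothetical and exist only for this illustration.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};
use std::thread;

// Cell: replace a small Copy value through a shared reference.
fn bump(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}

// Arc shares ownership across threads; the Mutex inside supplies the mutability.
fn parallel_sum(workers: u32) -> u32 {
    let total = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (1..=workers)
        .map(|i| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                *total.lock().unwrap() += i;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let sum = *total.lock().unwrap();
    sum
}

fn main() {
    let counter = Cell::new(0); // not declared `mut`
    bump(&counter);
    println!("counter = {}", bump(&counter));
    println!("sum = {}", parallel_sum(4));
}
```

Note that neither counter nor total is declared mut; the mutation happens inside the wrapper type.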

Let's look at interior mutability using RefCell as an example.

use std::cell::RefCell;

fn main() {
    // `value` is not declared mut, yet its contents can still change
    let value = RefCell::new(5);
    // Read
    let borrowed = value.borrow();
    println!("Before mutation: {}", *borrowed);
    // Release the shared borrow; borrow_mut() below would panic otherwise
    drop(borrowed);
    // Interior mutation
    {
        // Write
        let mut borrowed_mut = value.borrow_mut();
        *borrowed_mut += 1;
    }
    // Read
    let borrowed = value.borrow();
    println!("After mutation: {}", *borrowed);
}
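
The borrowing rules still apply to RefCell; they are just enforced at run time instead of compile time. A small sketch using try_borrow_mut, which reports a conflict as an Err instead of panicking (runtime_check is a hypothetical name):

```rust
use std::cell::RefCell;

// Returns (could we write while a reader was active?, could we write afterwards?).
fn runtime_check() -> (bool, bool) {
    let value = RefCell::new(5);
    let reader = value.borrow();
    // A shared borrow is active, so the write is refused at run time.
    // borrow_mut() would panic here; try_borrow_mut() returns Err instead.
    let while_reading = value.try_borrow_mut().is_ok();
    drop(reader);
    // No outstanding borrows now, so writing succeeds.
    let after_release = value.try_borrow_mut().is_ok();
    (while_reading, after_release)
}

fn main() {
    println!("{:?}", runtime_check());
}
```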

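Lifetimes

Finally, the last topic: lifetimes.

Below is a simple function that compares the lengths of two string slices.

Can you figure out why it does not compile?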

fn longest(str1: &str, str2: &str) -> &str {
    if str1.len() > str2.len() {
        str1
    } else {
        str2
    }
}

fn main() {
    let str1 = "hello";
    let str2 = "world!";

    let result = longest(str1, str2);
    println!("The longest string is: {}", result);
}

The error:

error[E0106]: missing lifetime specifier
 --> src/main.rs:1:39
  |
1 | fn longest(str1: &str, str2: &str) -> &str {
  |                  ----        ----     ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `str1` or `str2`
help: consider introducing a named lifetime parameter
  |
1 | fn longest<'a>(str1: &'a str, str2: &'a str) -> &'a str {
  |           ++++        ++             ++          ++

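Once again the compiler gives a helpful hint: the function takes two borrows and returns one, and it cannot tell which parameter's lifetime the return value is tied to.

A new concept appears: the lifetime.

A lifetime is how Rust describes how long a reference stays valid. It lets the compiler check that a borrow is still valid in the scope where it is used, since dangling pointers (borrows of memory that is no longer valid) must not be allowed.

In this example the function returns a borrow. Whether that borrow is valid in the caller's scope, and how it relates to the two input references, is expressed by lifetime annotations. If the inputs and the output share one lifetime, the returned borrow can only be used while both inputs are valid, so as long as the inputs are valid the output is too. By contrast, if the output borrowed a variable local to the function, that variable would be dropped when the function returns and the output would be a dangling pointer.

You can think of it simply as an extra parameter attached to a borrow that tells the compiler whether using the borrow in a given scope is valid.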
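
Applying the annotation that the compiler suggested gives a version that compiles:

```rust
// Both inputs and the output share the lifetime 'a: the result is valid
// only as long as both arguments are.
fn longest<'a>(str1: &'a str, str2: &'a str) -> &'a str {
    if str1.len() > str2.len() {
        str1
    } else {
        str2
    }
}

fn main() {
    let str1 = "hello";
    let str2 = "world!";
    let result = longest(str1, str2);
    println!("The longest string is: {}", result);
}
```

The annotation does not change how long anything lives. It only tells the compiler how the returned borrow relates to the inputs so that it can check callers.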

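As an aside, if you know Go's escape analysis: when a variable local to a function has to be returned for use outside it, its scope (that is, its lifetime) effectively has to be extended. The function's stack frame alone can no longer hold it, so it escapes to the heap. That is also a form of lifetime analysis, one that avoids dangling pointers at the cost of heap allocation. The difference is that in Go it is an optimization on top of a GC, whereas Rust decides at compile time, with the help of lifetime annotations, whether a borrow is valid. For cases where you want to hand a local value to the caller, Rust lets you return the value itself, which moves ownership out, and also provides Box for heap-allocated values; we won't go into that here.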
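
For a quick taste of what that looks like, here is a minimal sketch with hypothetical functions make_owned and make_boxed. The commented-out dangling function is the shape the compiler rejects.

```rust
// Hypothetical helpers for illustration.

// Would not compile: the returned reference would point at `local`,
// which is dropped when the function returns.
// fn dangling() -> &String {
//     let local = String::from("inner");
//     &local
// }

// Returning the value itself moves ownership to the caller.
fn make_owned() -> String {
    let local = String::from("inner");
    local
}

// Box puts the value on the heap; the caller owns the Box.
fn make_boxed() -> Box<[u8; 4]> {
    let local = [1, 2, 3, 4];
    Box::new(local)
}

fn main() {
    println!("{}", make_owned());
    println!("{:?}", make_boxed());
}
```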

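Does every borrow need an annotation?

No. By default Rust fills in lifetimes automatically (lifetime elision); you only need to annotate by hand when the compiler cannot work them out unambiguously. If you're interested, take a deeper look at Subtyping and Variance[1] to learn about some of the constraints on lifetimes.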
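
A sketch of the simplest case: with a single reference parameter, the compiler ties the returned reference to it, so both functions below have the same signature. first_word and first_word_explicit are hypothetical names.

```rust
// One reference parameter: the compiler assigns its lifetime to the
// returned reference, so no annotation is needed.
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

// The same signature with the lifetime written out by hand.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    let text = String::from("hello world");
    println!("{}", first_word(&text));
    println!("{}", first_word_explicit(&text));
}
```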

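Finally, let's look at the following code, which does not compile. From the compiler error you should be able to see why lifetime annotations exist: they let the compiler check that a borrow is valid in the scope where it is used.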

fn get_longest<'a>(str1: &'a str, str2: &'a str) -> &'a str {
    if str1.len() > str2.len() {
        str1
    } else {
        str2
    }
}

fn main() {
    let result;
    {
        let str1 = String::from("hello");
        let str2 = "world!";
        result = get_longest(str1.as_str(), str2);
    }

    println!("The longest string is: {}", result);
}

The error:

error[E0597]: `str1` does not live long enough
  --> src/main.rs:15:30
   |
13 |         let str1 = String::from("hello");
   |             ---- binding `str1` declared here
14 |         let str2 = "world!";
15 |         result = get_longest(str1.as_str(), str2);
   |                              ^^^^^^^^^^^^^ borrowed value does not live long enough
16 |     }
   |     - `str1` dropped here while still borrowed
17 |
18 |     println!("The longest string is: {}", result);
   |                                           ------ borrow later used here
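
One possible way to satisfy the compiler, sketched here under the assumption that an owned copy of the result is acceptable, is to convert the winner into a String before the inner scope ends. longest_owned is a hypothetical name for this variant.

```rust
fn get_longest<'a>(str1: &'a str, str2: &'a str) -> &'a str {
    if str1.len() > str2.len() {
        str1
    } else {
        str2
    }
}

// Hypothetical variant: copy the winner into an owned String
// so the result no longer borrows from either input.
fn longest_owned() -> String {
    let result;
    {
        let str1 = String::from("hello");
        let str2 = "world!";
        result = get_longest(str1.as_str(), str2).to_string();
    } // `str1` is dropped here, but `result` owns its own data
    result
}

fn main() {
    println!("The longest string is: {}", longest_owned());
}
```

The other obvious fix is to declare str1 in the outer scope so that it lives at least as long as result.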

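Summary

To wrap up:

Ownership is about who owns and manages a value.
The borrow checker guarantees at compile time that borrows are valid and safe.
Lifetimes are about how long a borrow stays valid and whether a reference is legal.

Together they form Rust's powerful memory-management model. It rules out dangling pointers and use after free, makes most memory leaks unlikely (safe Rust can still leak, for example through reference cycles), and avoids the performance cost of a GC.

So, what do you think? Isn't Rust's ownership design rather interesting? A single concept of ownership makes memory management clear and explicit!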

References

[1] Subtyping and Variance: https://doc.rust-lang.org/nomicon/subtyping.html

Source: https://mp.weixin.qq.com/s/-Kdg1OAS5MGpY71J27Zuvg