A Detailed Look at the Principles of the Linux Virtual File System

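In the Unix world there is a classic saying: everything is a file. It means that every object in a Unix operating system can be treated as a file and manipulated through the file interface. Linux, as a Unix-like operating system, strives for the same goal.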

Introduction to the Virtual File System

To achieve the goal of "everything is a file", the Linux kernel provides an intermediate layer: the Virtual File System (VFS).

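If you have used an object-oriented language such as C++ or Java, the concept of an interface will be familiar. The virtual file system plays the role of an interface in that sense: it defines a standard set of operations, and a developer only has to implement them for an object to become usable through the file interface, as shown in the figure below: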

The blue region in the figure above is where the virtual file system sits.

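As the figure shows, the virtual file system gives applications a unified interface. If a file system implements the VFS interface, applications can operate on it with functions such as open(), read() and write().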
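As a rough user-space illustration of that unified interface, the sketch below reads from a pipe and from a temporary regular file with the very same read() loop. The helper names (read_all, read_from_pipe, read_from_tmpfile) are made up for this example; only the POSIX calls are real.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Read up to cap bytes from any file descriptor until EOF.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_all(int fd, char *buf, size_t cap)
{
    size_t total = 0;

    assert(buf != NULL);
    while (total < cap) {
        ssize_t n = read(fd, buf + total, cap - total);
        if (n < 0)
            return -1;
        if (n == 0)            /* end of file */
            break;
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Write "hello" into a pipe and read it back through read_all(). */
ssize_t read_from_pipe(char *buf, size_t cap)
{
    int fds[2];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "hello", 5) != 5) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);             /* closing the write end lets the reader see EOF */
    n = read_all(fds[0], buf, cap);
    close(fds[0]);
    return n;
}

/* Write "hello" to a temporary regular file and read it back the same way. */
ssize_t read_from_tmpfile(char *buf, size_t cap)
{
    FILE *fp = tmpfile();
    ssize_t n = -1;
    int fd;

    if (fp == NULL)
        return -1;
    fd = fileno(fp);
    if (write(fd, "hello", 5) == 5 && lseek(fd, 0, SEEK_SET) == 0)
        n = read_all(fd, buf, cap);
    fclose(fp);
    return n;
}
```

read_all() never learns whether the descriptor refers to a pipe or to a file on disk; the kernel routes each read() to the right implementation.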

In this article we look at how the virtual file system works and how it is implemented.

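How the Virtual File System Works

Before describing how the virtual file system works, let's look at a Java example that makes the idea easier to grasp.

A Java Example

If you have written programs in Java, the virtual file system is easy to understand. We can model its definition with a Java interface: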

public interface VFSFile {
  int open(String file, int mode);
  int read(int fd, byte[] buffer, int size);
  int write(int fd, byte[] buffer, int size);
  ...
}

The code above defines an interface named VFSFile with methods such as open(), read() and write(). Next we define a class named Ext3File that implements it:

public class Ext3File implements VFSFile {
  @Override
  public int open(String file, int mode) {
    ...
  }
  
  @Override
  public int read(int fd, byte[] buffer, int size) {
    ...
  }
  
  @Override
  public int write(int fd, byte[] buffer, int size) {
    ...
  }
  
  ...
}

Now an Ext3File object can be used through the VFSFile interface, as in the following code:

public class Main {
  public static void main(String[] args) {
    VFSFile file = new Ext3File();
    
    int fd = file.open("/tmp/file.txt", 0);
    ...
  }
}

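As the example shows, once an underlying class implements VFSFile, its objects can be operated on through the VFSFile methods, and the caller needs to know nothing about how they are implemented.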

How the Linux Kernel Implements It

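The Java example already conveys the basic idea, but Linux is written in C, and C has no notion of an interface. The Linux kernel therefore uses a few techniques to emulate one.

Let's see how the kernel does it.

1. The file structure

To emulate an interface, the Linux kernel defines a structure named file: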

struct file {
    ...
    const struct file_operations *f_op;
    ...
};

The most important field of the file structure is f_op, a pointer to a file_operations structure. file_operations consists of a set of function pointers and is defined as follows:

struct file_operations {
    ...
    loff_t (*llseek) (struct file *, loff_t, int);
    ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
    ...
    int (*open) (struct inode *, struct file *);
    ...
};

The definition of file_operations already has the outline of an interface, so we can guess that anything implementing its methods can plug into the virtual file system.
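To make the "interface in C" idea concrete, here is a minimal user-space sketch of the same pattern: a table of function pointers plus one in-memory implementation. Every name in it (toy_file, toy_file_ops, mem_read and so on) is hypothetical and far simpler than the kernel's real definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* The "interface": a table of function pointers. */
struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t len);
    long (*write)(struct toy_file *f, const char *buf, size_t len);
};

/* An open file: a pointer to its method table plus private state. */
struct toy_file {
    const struct toy_file_ops *ops;
    char   data[64];   /* backing storage of the in-memory file */
    size_t size;       /* bytes currently stored in data */
    size_t pos;        /* current read offset */
};

/* One implementation of the interface: an in-memory file. */
static long mem_read(struct toy_file *f, char *buf, size_t len)
{
    size_t left = f->size - f->pos;

    if (len > left)
        len = left;
    memcpy(buf, f->data + f->pos, len);
    f->pos += len;
    return (long)len;
}

static long mem_write(struct toy_file *f, const char *buf, size_t len)
{
    size_t room = sizeof(f->data) - f->size;

    if (len > room)
        len = room;
    memcpy(f->data + f->size, buf, len);
    f->size += len;
    return (long)len;
}

const struct toy_file_ops mem_file_ops = {
    .read  = mem_read,
    .write = mem_write,
};

/* Callers go through the table and never name mem_read()/mem_write(). */
long toy_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    return f->ops->read(f, buf, len);
}

long toy_write(struct toy_file *f, const char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL);
    return f->ops->write(f, buf, len);
}
```

Swapping mem_file_ops for another table changes the behaviour of toy_read() and toy_write() without touching their callers, which is the role an interface plays in Java.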

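In the Linux kernel, a file structure represents an open file. So pointing the f_op field of a file structure at the method set of a particular file system is all it takes to use that file system's functionality.

This happens in the __dentry_open() function, shown below: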

static struct file *
__dentry_open(struct dentry *dentry, 
              struct vfsmount *mnt, 
              struct file *f,
              int (*open)(struct inode *, struct file *), 
              const struct cred *cred)
{
    ...
    inode = dentry->d_inode;
    ...
    // Point the file's f_op field at the method set implemented by the underlying file system
    f->f_op = fops_get(inode->i_fop);
    ...
    return f;
}

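Once the f_op field is set, the virtual file system can operate on the file through the generic interface. For example, a read() on the file is dispatched through f->f_op->read to the method supplied by the underlying file system.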
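The binding and dispatch steps can be mimicked in a few lines of user-space C. This is only a sketch under simplified assumptions: toy_open() stands in for the assignment made in __dentry_open(), toy_vfs_read() stands in for the generic read path, and all toy_* and xfill_* names are invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_file;

struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_inode {
    const struct toy_file_operations *i_fop;  /* filled in by the file system */
};

struct toy_file {
    struct toy_inode *inode;
    const struct toy_file_operations *f_op;   /* copied from the inode at open time */
};

/* Mirrors the assignment in __dentry_open(): f_op is taken from i_fop. */
void toy_open(struct toy_inode *inode, struct toy_file *f)
{
    assert(inode != NULL && f != NULL);
    f->inode = inode;
    f->f_op = inode->i_fop;
}

/* Generic entry point: dispatch through f_op, or fail if no method exists. */
long toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -EINVAL;
    return f->f_op->read(f, buf, len);
}

/* A hypothetical file system whose files read back as a run of 'x' bytes. */
static long xfill_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f;
    for (size_t i = 0; i < len; i++)
        buf[i] = 'x';
    return (long)len;
}

const struct toy_file_operations xfill_fops = {
    .read = xfill_read,
};
```

The generic code in toy_vfs_read() contains no knowledge of xfill_read(); it only follows the pointer that toy_open() installed.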

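2. The file_operations structure

An underlying file system must implement the VFS interface before the VFS can use it. In other words, it must implement the methods in the file_operations structure.

A file system usually defines its own file_operations instance and fills in the function pointers. The minix file system, for example, defines one named minix_file_operations: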

// File: fs/minix/file.c

const struct file_operations minix_file_operations = {
    .llseek         = generic_file_llseek,
    .read           = do_sync_read,
    .aio_read       = generic_file_aio_read,
    .write          = do_sync_write,
    .aio_write      = generic_file_aio_write,
    .mmap           = generic_file_mmap,
    .fsync          = generic_file_fsync,
    .splice_read    = generic_file_splice_read,
};

So when the minix file system is in use, a read() call on one of its files ends up calling do_sync_read() to read the file's contents.
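Note that most entries in minix_file_operations point at generic helpers rather than minix-specific code. The sketch below imitates that style with invented names (toy_ops, toy_generic_llseek, fs_a_ops, fs_b_ops): two method tables share one generic helper, each supplies its own read, and any member left out of a designated initializer is NULL.

```c
#include <assert.h>
#include <stddef.h>

struct toy_ops {
    long (*llseek)(long pos, long offset);
    long (*read)(char *buf, size_t len);
    long (*write)(const char *buf, size_t len);
};

/* Generic helper shared by file systems with no special seek logic. */
long toy_generic_llseek(long pos, long offset)
{
    return pos + offset;
}

/* Fill buf with len copies of c and report how many bytes were "read". */
static long fill(char *buf, size_t len, char c)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = c;
    return (long)len;
}

static long fs_a_read(char *buf, size_t len) { return fill(buf, len, 'a'); }
static long fs_b_read(char *buf, size_t len) { return fill(buf, len, 'b'); }

/* Both tables reuse the generic seek; .write is omitted and therefore NULL. */
const struct toy_ops fs_a_ops = {
    .llseek = toy_generic_llseek,
    .read   = fs_a_read,
};

const struct toy_ops fs_b_ops = {
    .llseek = toy_generic_llseek,
    .read   = fs_b_read,
};
```

Because omitted members are NULL, code that dispatches through such a table has to check a pointer before calling it.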

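3. The dentry structure

That covers the core of how the virtual file system works, but two more important structures deserve an introduction: dentry and inode.

A dentry structure represents a directory entry in use. When we open the file /usr/local/lib/libc.so, the kernel creates a dentry structure for each component of the path, as shown in the figure below:

As the figure shows, the file structure holds a pointer to a dentry structure through its f_path field: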

struct file {
    ...
    struct path f_path;
    ...
    const struct file_operations *f_op;
    ...
};

struct path {
    ...
    struct dentry *dentry;
};

Like files, directory entries have their own operations, so the dentry structure also carries a method set:

struct dentry {
    ...
    struct dentry *d_parent;              // pointer to the parent directory entry
    struct qstr d_name;                   // name of this directory entry
    struct inode *d_inode;                // points to the inode structure
    ...
    const struct dentry_operations *d_op; // method set
    ...
};

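The d_op field is the method set for the directory entry.

When opening a file, the kernel creates a dentry structure for each component of the path and points its d_parent field at the parent entry, so following d_parent leads all the way back to the root directory.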
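The d_parent chain is enough to reconstruct a full path. Below is a small user-space sketch of that walk, using a stripped-down, hypothetical toy_dentry with a plain string name. It assumes, as the kernel does for the real structure, that the root entry is its own parent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_dentry {
    struct toy_dentry *d_parent;  /* the root entry points to itself */
    const char *d_name;
};

/* Build the absolute path of d in buf by following d_parent up to the root.
 * The path is assembled right to left, then moved to the start of buf.
 * Returns 0 on success, -1 if buf is too small. */
int toy_dentry_path(const struct toy_dentry *d, char *buf, size_t cap)
{
    char *end = buf + cap;
    char *p = end;

    assert(d != NULL && buf != NULL);
    if (cap < 2)
        return -1;
    *--p = '\0';
    if (d->d_parent == d)          /* asked for the root itself */
        *--p = '/';
    while (d->d_parent != d) {
        size_t n = strlen(d->d_name);

        if ((size_t)(p - buf) < n + 1)
            return -1;             /* no room for "/name" */
        p -= n;
        memcpy(p, d->d_name, n);
        *--p = '/';
        d = d->d_parent;
    }
    memmove(buf, p, (size_t)(end - p));
    return 0;
}
```

For a chain root, usr, local, lib, libc.so the function produces /usr/local/lib/libc.so, visiting one dentry per path component.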

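4. The inode structure

In the Linux kernel, an inode structure represents an actual file. Why is inode needed when dentry already exists? Because Linux supports hard links.

For example, the following command creates a hard link to /usr/local/lib/libc.so: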

ln /usr/local/lib/libc.so /tmp/libc.so

Now /usr/local/lib/libc.so and /tmp/libc.so refer to the same file even though their paths differ. This is why the inode structure is needed, as shown in the figure below:

Because /usr/local/lib/libc.so and /tmp/libc.so refer to the same file, they share a single inode object.
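The sharing is observable from user space: stat() reports the same device and inode number for both names. The sketch below checks this on a throwaway file; the helper names and the /tmp/vfs_demo_ prefix are arbitrary choices for the example, and it assumes /tmp is writable and supports hard links.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both paths name the same inode, 0 if not, -1 on error. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a != NULL && b != NULL);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Returns the hard-link count of path, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Create a temporary file and a hard link to it, compare the two names,
 * then remove both. Returns 1 if they share an inode with link count 2,
 * 0 if not, -1 if the files could not be created. */
int hard_link_demo(void)
{
    char orig[] = "/tmp/vfs_demo_XXXXXX";
    char alias[64];
    int fd, ok;

    fd = mkstemp(orig);
    if (fd < 0)
        return -1;
    close(fd);
    snprintf(alias, sizeof alias, "%s.lnk", orig);
    if (link(orig, alias) != 0) {
        unlink(orig);
        return -1;
    }
    ok = same_inode(orig, alias) == 1 && link_count(orig) == 2;
    unlink(alias);
    unlink(orig);
    return ok;
}
```

Two names, two directory entries, one inode: the link count stored in the inode records how many names currently refer to it.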

The inode structure stores the file's attributes, such as its timestamps, owner and size. It is defined as follows:

struct inode {
    ...
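    uid_t           i_uid;               // owning user
    gid_t           i_gid;               // owning group
    ...
    struct timespec i_atime;             // time of last access
    struct timespec i_mtime;             // time of last modification
    struct timespec i_ctime;             // time of last inode (status) change
    ...
    unsigned short  i_bytes;             // leftover bytes for space accounting (the file size itself is in i_size)
    ...
    const struct file_operations *i_fop; // file method set (used to initialize the file structure)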
    ...
};

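Notice that the inode structure has a field i_fop of type file_operations, which holds the file's method set. When a user opens a file with the open() system call, the kernel assigns the inode's i_fop field to the f_op field of the file structure. Let's revisit that assignment: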

static struct file *
__dentry_open(struct dentry *dentry, 
              struct vfsmount *mnt, 
              struct file *f,
              int (*open)(struct inode *, struct file *), 
              const struct cred *cred)
{
    ...
    // The inode object that corresponds to the file
    inode = dentry->d_inode;
    ...
    // Assign the inode's i_fop field to the file's f_op field
    f->f_op = fops_get(inode->i_fop);
    ...
    return f;
}

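Summary

This article covered the basic principles of the virtual file system. The analysis shows that it relies on a concept much like the interface of object-oriented languages, and it is this layer that lets Linux support such a wide variety of file systems.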

Converted to AMP by Readfog; copyright belongs to the original author.
Source: https://mp.weixin.qq.com/s/M07G-Fm249OaHvk9k8_DrA
