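How to Improve Code Readability

This article is compiled from a talk that taowen gave internally at Didi.

1. Why

For front-line developers, most of the daily work is piling more code onto an existing project. When the project can no longer take any more piling, you have to find a payoff that justifies refactoring. Since most of our time is spent sitting in front of a monitor reading and writing code, unreadable code is slowly murdering our own lives or those of our colleagues. So we may as well distill some techniques early and try to write good code from the start ;)

2. How

To improve readability, first look at where code actually runs. It runs in two places: on the CPU and in the human brain. For the CPU, optimizing code means understanding how the CPU works and writing code that plays to its characteristics. For the brain, reading code works like an interpreter that executes it line by line. Seen this way, improving readability starts with understanding how the brain operates.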

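Here is what the brain is well suited to and what it is not:

What the brain is good at

[Figure: things the brain is good at (image not preserved)]

What the brain is bad at

[Figure: things the brain is bad at (image not preserved)]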

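Theory of code optimization

Once we understand the strengths and weaknesses of the brain, we can improve readability by writing code that fits them. Three principles come out of this:

Align Models: the data and algorithm models in the code should match the mental models in the reader's head
Shorten Process: keep the "Sherlock Holmes casebook" chain of deduction short, that is, do not write long stretches of code
Isolate Process: handle one process at a time; do not describe the evolution of several processes simultaneously

The examples below explain each of the three in detail: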

Align Models

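In code, models are nothing more than data structures and algorithms. In the brain, their counterpart is the mental model: what a person thinks about an object or an event. The way we normally talk is the outward expression of our mental models. When writing code, map the nouns in the code onto real-world nouns, so that the brain pays less to translate from the requirements document to the code. Take the noun "bank account": many variable names could stand for it, such as bankAccount, bank_account, account, BankAccount, BA, bank_acc, item, row, record, model. Pick one name that links back to the real-world object and use it consistently throughout the code.

Naming techniques

Name a variable after what it actually means. There is no point in picking an arbitrary name and then doing the real work on the sly in a comment.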

// bad
var d int // elapsed time in days

// good
var elapsedTimeInDays int // the same name is used everywhere

Name functions as verb + noun, and give your custom types names that say what they are:

// bad
func getThem(theList [][]int) [][]int {
 var list1 [][]int // what is list1? no idea
 for _, x := range theList {
  if x[0] == 4 { // what is 4? no idea
   list1 = append(list1, x)
  }
 }
 return list1
}

// good
type Cell []int // names what the []int stands for

func (cell Cell) isFlagged() bool { // explains what 4 means
 return cell[0] == 4
}

func getFlaggedCells(gameBoard []Cell) []Cell { // meaningful names
 var flaggedCells []Cell
 for _, cell := range gameBoard {
  if cell.isFlagged() {
   flaggedCells = append(flaggedCells, cell)
  }
 }
 return flaggedCells
}

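Decomposition techniques

Spatial decomposition: the code below is all logic related to Page. Look closely and it can be split along the parts of the page:

// bad
// ...then...and then...and then... // narrates the whole process in one flat sequence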
func RenderPage(request *http.Request) map[string]interface{} {
 page := map[string]interface{}{}
 name := request.Form.Get("name")
 page["name"] = name
 urlPathName := strings.ToLower(name)
 urlPathName = regexp.MustCompile(`['.]`).ReplaceAllString(
  urlPathName, "")
 urlPathName = regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(
  urlPathName, "-")
 urlPathName = strings.Trim(urlPathName, "-")
 page["url"] = "/biz/" + urlPathName
 page["date_created"] = time.Now().In(time.UTC)
 return page
}

// good
// Decomposed by space: the benefit is that you can focus on the one part you care about
var page = map[string]pageItem{
 "name":         pageName,
 "url":          pageUrl,
 "date_created": pageDateCreated,
}

type pageItem func(*http.Request) interface{}

func pageName(request *http.Request) interface{} { // everything about the name
 return request.Form.Get("name")
}

func pageUrl(request *http.Request) interface{} { // everything about the URL
 name := request.Form.Get("name")
 urlPathName := strings.ToLower(name)
 urlPathName = regexp.MustCompile(`['.]`).ReplaceAllString(
  urlPathName, "")
 urlPathName = regexp.MustCompile(`[^a-z0-9]+`).ReplaceAllString(
  urlPathName, "-")
 urlPathName = strings.Trim(urlPathName, "-")
 return "/biz/" + urlPathName
}

func pageDateCreated(request *http.Request) interface{} { // everything about the date
 return time.Now().In(time.UTC)
}
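
The table of item functions still needs something to walk it. As a minimal sketch, the hypothetical renderPage below (a name that does not appear in the talk) evaluates every entry of the table against the request. sampleRequest is also made up, only to keep the sketch runnable; the date item is left out so the output is deterministic, and hoisting the two regular expressions into package variables is a small liberty.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"regexp"
	"strings"
)

// pageItem computes one field of the page from the request.
type pageItem func(*http.Request) interface{}

var (
	quotesAndDots = regexp.MustCompile(`['.]`)
	nonAlnumRuns  = regexp.MustCompile(`[^a-z0-9]+`)
)

func pageName(request *http.Request) interface{} {
	return request.Form.Get("name")
}

func pageUrl(request *http.Request) interface{} {
	urlPathName := strings.ToLower(request.Form.Get("name"))
	urlPathName = quotesAndDots.ReplaceAllString(urlPathName, "")
	urlPathName = nonAlnumRuns.ReplaceAllString(urlPathName, "-")
	return "/biz/" + strings.Trim(urlPathName, "-")
}

// pageItems is the spatial layout of the page: one entry per field.
var pageItems = map[string]pageItem{
	"name": pageName,
	"url":  pageUrl,
}

// renderPage (hypothetical) builds the page by evaluating every item.
func renderPage(request *http.Request) map[string]interface{} {
	page := make(map[string]interface{}, len(pageItems))
	for key, item := range pageItems {
		page[key] = item(request)
	}
	return page
}

// sampleRequest (hypothetical) fakes a request whose form is already parsed.
func sampleRequest(name string) *http.Request {
	return &http.Request{Form: url.Values{"name": {name}}}
}

func main() {
	page := renderPage(sampleRequest("Joe's Diner & Bar"))
	fmt.Println(page["name"]) // Joe's Diner & Bar
	fmt.Println(page["url"])  // /biz/joes-diner-bar
}
```

Adding a field then means adding one function and one table entry; renderPage itself does not change.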

Temporal decomposition: the code below mixes computing the bill with printing it. It can be split along the order in which things happen:

// bad
func (customer *Customer) statement() string {
 totalAmount := float64(0)
 frequentRenterPoints := 0
 result := "Rental Record for " + customer.name + "\n"

 for _, rental := range customer.rentals {
  thisAmount := float64(0)
  switch rental.movie.priceCode {
  case REGULAR:
   thisAmount += 2
  case NEW_RELEASE:
   thisAmount += float64(rental.daysRented) * 3
  case CHILDRENS:
   thisAmount += 1.5
  }
  frequentRenterPoints += 1
  totalAmount += thisAmount
 }
 result += strconv.FormatFloat(totalAmount, 'g', 10, 64) + "\n"
 result += strconv.Itoa(frequentRenterPoints)

 return result
}

// good: the same logic after decomposition
func statement(customer *Customer) string {
 bill := calcBill(customer)

 statement := bill.print()

 return statement
}

type RentalBill struct {
 rental Rental
 amount float64
}

type Bill struct {
 customer             *Customer
 rentals              []RentalBill
 totalAmount          float64
 frequentRenterPoints int
}

func calcBill(customer *Customer) Bill {
 bill := Bill{customer: customer} // print() reads bill.customer, so it must be set
 for _, rental := range customer.rentals {
  rentalBill := RentalBill{
   rental: rental,
   amount: calcAmount(rental),
  }
  bill.frequentRenterPoints += calcFrequentRenterPoints(rental)
  bill.totalAmount += rentalBill.amount
  bill.rentals = append(bill.rentals, rentalBill)
 }
 return bill
}

func (bill Bill) print() string {
 result := "Rental Record for " + bill.customer.name + "\n"

 for _, rentalBill := range bill.rentals {
  result += "\t" + rentalBill.rental.movie.title + "\t" +
   strconv.FormatFloat(rentalBill.amount, 'g', 10, 64) + "\n"
 }

 result += "Amount owed is " +
  strconv.FormatFloat(bill.totalAmount, 'g', 10, 64) + "\n"

 result += "You earned " +
  strconv.Itoa(bill.frequentRenterPoints) + " frequent renter points"

 return result
}

func calcAmount(rental Rental) float64 {
 thisAmount := float64(0)
 switch rental.movie.priceCode {
 case REGULAR:
  thisAmount += 2
  if rental.daysRented > 2 {
   thisAmount += (float64(rental.daysRented) - 2) * 1.5
  }
 case NEW_RELEASE:
  thisAmount += float64(rental.daysRented) * 3
 case CHILDRENS:
  thisAmount += 1.5
  if rental.daysRented > 3 {
   thisAmount += (float64(rental.daysRented) - 3) * 1.5
  }
 }
 return thisAmount
}

func calcFrequentRenterPoints(rental Rental) int {
 frequentRenterPoints := 1
 switch rental.movie.priceCode {
 case NEW_RELEASE:
  if rental.daysRented > 1 {
   frequentRenterPoints++
  }
 }
 return frequentRenterPoints
}

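Layer decomposition: the function below does two jobs at different levels, a generic "find the smallest" loop and the spherical distance formula. Splitting them into layers lets each be read on its own: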

// bad
func findSphericalClosest(lat float64, lng float64, locations []Location) *Location {
 var closest *Location
 closestDistance := math.MaxFloat64
 for i := range locations {
  location := &locations[i] // point at the slice element, not at the loop variable
  latRad := radians(lat)
  lngRad := radians(lng)
  lat2Rad := radians(location.Lat)
  lng2Rad := radians(location.Lng)
  dist := math.Acos(math.Sin(latRad)*math.Sin(lat2Rad) +
   math.Cos(latRad)*math.Cos(lat2Rad)*
    math.Cos(lng2Rad-lngRad))
  if dist < closestDistance {
   closest = location
   closestDistance = dist
  }
 }
 return closest
}

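// good
type Location struct {
 Lat float64
 Lng float64
}

type compare func(left Location, right Location) int

// Generic layer: finds the smallest element and knows nothing about geometry.
func min(objects []Location, compare compare) *Location {
 var min *Location
 for i := range objects {
  if min == nil || compare(objects[i], *min) < 0 {
   min = &objects[i]
  }
 }
 return min
}

func radians(degrees float64) float64 {
 return degrees * math.Pi / 180
}

// Geometry layer: angular distance between (lat, lng) and a location.
func sphericalDistance(lat float64, lng float64, location Location) float64 {
 latRad, lngRad := radians(lat), radians(lng)
 lat2Rad, lng2Rad := radians(location.Lat), radians(location.Lng)
 return math.Acos(math.Sin(latRad)*math.Sin(lat2Rad) +
  math.Cos(latRad)*math.Cos(lat2Rad)*math.Cos(lng2Rad-lngRad))
}

func findSphericalClosest(lat float64, lng float64, locations []Location) *Location {
 isCloser := func(left Location, right Location) int {
  leftDistance := sphericalDistance(lat, lng, left)
  rightDistance := sphericalDistance(lat, lng, right)
  if leftDistance < rightDistance {
   return -1
  }
  return 0
 }
 return min(locations, isCloser)
}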

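Comments

Comments should not repeat what the code already says. They should explain how the code's model maps to the mental model, and why this particular model was chosen. The following examples show what not to do: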

// bad
/** the name. */
var name string
/** the version. */
var Version string
/** the info. */
var info string

// Find the Node in the given subtree, with the given name, using the given depth.
func FindNodeInSubtree(subTree *Node, name string, depth *int) *Node {
 // ...
 return nil
}

The following are good examples:

// Impose a reasonable limit - no human can read that much anyway
const MAX_RSS_SUBSCRIPTIONS = 1000

// Runtime is O(number_tags * average_tag_depth), 
// so watch out for badly nested inputs.
func FixBrokenHTML(HTML string) string {
 // ...
}

Shorten Process

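Shorten Process means shortening the brain's "compile the code" pass. Avoid code that is as long and winding as a maze for a lab mouse. Long and winding shows up as having to trace across expressions, across a many-line function, across several member functions, across files, across compilation units, and even across code repositories.

The matching techniques are: introduce variables, split functions, return early, and narrow variable scope. They all aim to let the brain catch its breath instead of tracing for too long in one go. Again, some concrete examples: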

Examples
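
Of the four techniques, introducing variables is the easiest to show in a few lines. The sketch below is not from the talk: the Order type and the pricing rule are made up for illustration. Both functions compute the same price, but the second names each intermediate result so that every line can be checked on its own.

```go
package main

import (
	"fmt"
	"math"
)

// Order is a hypothetical type used only in this sketch.
type Order struct {
	Quantity  int
	ItemPrice float64
}

// priceInline forces the reader to decode one long expression.
func priceInline(order Order) float64 {
	return float64(order.Quantity)*order.ItemPrice -
		math.Max(0, float64(order.Quantity-500))*order.ItemPrice*0.05 +
		math.Min(float64(order.Quantity)*order.ItemPrice*0.1, 100)
}

// price introduces one variable per concept in the pricing rule.
func price(order Order) float64 {
	basePrice := float64(order.Quantity) * order.ItemPrice
	quantityDiscount := math.Max(0, float64(order.Quantity-500)) * order.ItemPrice * 0.05
	shipping := math.Min(basePrice*0.1, 100)
	return basePrice - quantityDiscount + shipping
}

func main() {
	order := Order{Quantity: 600, ItemPrice: 2}
	fmt.Println(priceInline(order), price(order))
}
```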

In the code below, several compound conditions are combined. You can stare at it until you are dizzy and still not see when it returns true and when false.

// bad
func (rng *Range) overlapsWith(other *Range) bool {
 return (rng.begin >= other.begin && rng.begin < other.end) ||
  (rng.end > other.begin && rng.end <= other.end) ||
  (rng.begin <= other.begin && rng.end >= other.end)
}

But if you break the cases apart and handle each condition on its own, the logic becomes clear.

// good
func (rng *Range) overlapsWith(other *Range) bool {
 if other.end <= rng.begin {
  return false // they end before we begin
 }
 if other.begin >= rng.end {
  return false // they begin after we end
 }
 return true // Only possibility left: they overlap
}
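
One way to gain confidence that the compound and the early-return forms agree is to compare them exhaustively on small inputs. The sketch below assumes half-open ranges [begin, end) with begin < end, so ranges that merely touch do not overlap and the first check uses <=. The names overlapsWithCompound and countMismatches are invented for this check.

```go
package main

import "fmt"

// Range is assumed to be half-open: [begin, end) with begin < end.
type Range struct {
	begin, end int
}

// overlapsWithCompound is the single-expression form.
func (rng *Range) overlapsWithCompound(other *Range) bool {
	return (rng.begin >= other.begin && rng.begin < other.end) ||
		(rng.end > other.begin && rng.end <= other.end) ||
		(rng.begin <= other.begin && rng.end >= other.end)
}

// overlapsWith is the early-return form.
func (rng *Range) overlapsWith(other *Range) bool {
	if other.end <= rng.begin {
		return false // they end before we begin
	}
	if other.begin >= rng.end {
		return false // they begin after we end
	}
	return true // only possibility left: they overlap
}

// countMismatches compares both forms on every pair of non-empty
// ranges whose endpoints lie in [0, limit].
func countMismatches(limit int) int {
	var ranges []Range
	for begin := 0; begin < limit; begin++ {
		for end := begin + 1; end <= limit; end++ {
			ranges = append(ranges, Range{begin, end})
		}
	}
	mismatches := 0
	for i := range ranges {
		for j := range ranges {
			if ranges[i].overlapsWith(&ranges[j]) != ranges[i].overlapsWithCompound(&ranges[j]) {
				mismatches++
			}
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", countMismatches(6)) // mismatches: 0
}
```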

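Another example. When you first wrote the code there was perhaps a single if ... else .... Later the PM asked for permission checks, so you happily nested another if inside the if, shipped the patch and went home. The code ended up like this: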

// bad: the problem of deep nesting
func handleResult(reply *Reply, userResult int, permissionResult int) {
  if userResult == SUCCESS {
    if permissionResult != SUCCESS {
      reply.WriteErrors("error reading permissions")
      reply.Done()
      return
    }
    reply.WriteErrors("")
  } else {
    reply.WriteErrors("User Result")
  }
  reply.Done()
}

This kind of code is easy to fix: usually you just invert the if conditions and return early on the failure cases:

// good
func handleResult(reply *Reply, userResult int, permissionResult int) {
  defer reply.Done()
  if userResult != SUCCESS {
    reply.WriteErrors("User Result")
    return 
  }
  if permissionResult != SUCCESS {
    reply.WriteErrors("error reading permissions")
    return
  }
  reply.WriteErrors("")
}

The problem in the next example is more subtle: everything is stored on the MooDriver object.

// bad
type MooDriver struct {
 gradient Gradient
  splines []Spline
}
func (driver *MooDriver) drive(reason string) {
  driver.saturateGradient()
  driver.reticulateSplines()
  driver.diveForMoog(reason)
}

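A better approach is to shrink the shared, object-wide scope as much as possible and pass values along explicitly as context variables.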

// good
type ExplicitDriver struct {
}

// intermediate results are passed along explicitly
func (driver *ExplicitDriver) drive(reason string) {
  gradient := driver.saturateGradient()
  splines := driver.reticulateSplines(gradient)
  driver.diveForMoog(splines, reason)
}

Isolate Process

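One weakness of the brain is that it is bad at tracking several things at once. Following several evolving processes "simultaneously" goes against how the brain is built. But scattering logic across many places is not brain-friendly either, because the brain has to "piece things together" to see one piece of logic as a whole. Hence the classic platitude every computer science student has heard: make your code high-cohesion and low-coupling and you are golden! -_-||| But ask the person saying it what high cohesion and low coupling actually mean, and they may need to think for a while. Let's think it through with some examples.

First, the hand-wavy part. If your code is shaped like this, it will not be very readable:

In most cases, depending on the business scenario, we can work to reshape it into this:

A few examples. The following kind of code is very common. Here version means that different client versions need different handling.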

func (query *Query) doQuery() {
  if query.sdQuery != nil {
    query.sdQuery.clearResultSet()
  }
  // version 5.2 control
  if query.sd52 {
    query.sdQuery = sdLoginSession.createQuery(SDQuery.OPEN_FOR_QUERY)
  } else {
    query.sdQuery = sdSession.createQuery(SDQuery.OPEN_FOR_QUERY)
  }
  query.executeQuery()
}

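The problem with this code is that, because of the version difference, several flows have been merged into one, so the logic forks in the middle. The fix is simple: wrap it in an adapter. Extract the version-specific logic behind an interface, then implement it per version.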
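
The adapter idea can be sketched roughly as follows. Everything here is an assumption made for illustration: the querySession interface, the two session types, and the use of plain strings in place of real query objects. The point is only that the version check moves into one constructor and doQuery becomes a straight line.

```go
package main

import "fmt"

// querySession hides which kind of session creates the query.
type querySession interface {
	createQuery(mode string) string
}

// legacySession stands in for the pre-5.2 sdSession.
type legacySession struct{}

func (legacySession) createQuery(mode string) string {
	return "sdSession query (" + mode + ")"
}

// loginSession52 stands in for the 5.2 sdLoginSession.
type loginSession52 struct{}

func (loginSession52) createQuery(mode string) string {
	return "sdLoginSession query (" + mode + ")"
}

// newQuerySession is the only place that still branches on the version.
func newQuerySession(sd52 bool) querySession {
	if sd52 {
		return loginSession52{}
	}
	return legacySession{}
}

type Query struct {
	session querySession
	sdQuery string
}

// doQuery no longer forks on the version.
func (query *Query) doQuery() {
	query.sdQuery = query.session.createQuery("OPEN_FOR_QUERY")
}

func main() {
	for _, sd52 := range []bool{false, true} {
		query := &Query{session: newQuerySession(sd52)}
		query.doQuery()
		fmt.Println(query.sdQuery)
	}
}
```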

Another example: in the code below, product logic that differs by expiry and maturity also causes forking, so your code ends up like this:

// bad
type Loan struct {
 start    time.Time
 expiry   *time.Time
 maturity *time.Time
 rating   int
}

const secondsPerYear = 365 * 24 * 60 * 60

// duration is expressed in years.
func (loan *Loan) duration() float64 {
 if loan.expiry == nil {
  return float64(loan.maturity.Unix()-loan.start.Unix()) / secondsPerYear
 } else if loan.maturity == nil {
  return float64(loan.expiry.Unix()-loan.start.Unix()) / secondsPerYear
 }
 toExpiry := float64(loan.expiry.Unix() - loan.start.Unix())
 fromExpiryToMaturity := float64(loan.maturity.Unix() - loan.expiry.Unix())
 revolverDuration := toExpiry / secondsPerYear
 termDuration := fromExpiryToMaturity / secondsPerYear
 return revolverDuration + termDuration
}

func (loan *Loan) unusedPercentage() float64 {
 if loan.expiry != nil && loan.maturity != nil {
  if loan.rating > 4 {
   return 0.95
  } else {
   return 0.50
  }
 } else if loan.maturity != nil {
  return 1
 } else if loan.expiry != nil {
  if loan.rating > 4 {
   return 0.75
  } else {
   return 0.25
  }
 }
 panic("invalid loan")
}

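The best practice for handling several kinds of product logic is the Strategy pattern, shown in the code below: create a different strategy for each product type behind a common interface, and have each one implement the two methods duration and unusedPercentage.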

// good
type LoanApplication struct {
 expiry   *time.Time
 maturity *time.Time
}

type CapitalStrategy interface {
 duration() float64
 unusedPercentage() float64
}

func createLoanStrategy(loanApplication LoanApplication) CapitalStrategy {
 if loanApplication.expiry != nil && loanApplication.maturity != nil {
  return createRCTL(loanApplication)
 }
 if loanApplication.expiry != nil {
  return createRevolver(loanApplication)
 }
 if loanApplication.maturity != nil {
  return createTermLoan(loanApplication)
 }
 panic("invalid loan application")
}
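
The listing stops at the factory, so here is a hedged sketch of what the strategies behind it might look like. The percentages are copied from the bad Loan listing. The rest is assumption: LoanApplication is given start and rating fields, durations are computed in 365-day years, and the types termLoan, revolver and rctl as well as the helpers yearsBetween and sampleTermLoan are invented names.

```go
package main

import (
	"fmt"
	"time"
)

const secondsPerYear = 365 * 24 * 60 * 60

// LoanApplication also carries start and rating here (an assumption).
type LoanApplication struct {
	start    time.Time
	expiry   *time.Time
	maturity *time.Time
	rating   int
}

type CapitalStrategy interface {
	duration() float64 // in years
	unusedPercentage() float64
}

func yearsBetween(from, to time.Time) float64 {
	return float64(to.Unix()-from.Unix()) / secondsPerYear
}

// termLoan: only a maturity date.
type termLoan struct{ start, maturity time.Time }

func (loan termLoan) duration() float64         { return yearsBetween(loan.start, loan.maturity) }
func (loan termLoan) unusedPercentage() float64 { return 1 }

// revolver: only an expiry date.
type revolver struct {
	start, expiry time.Time
	rating        int
}

func (loan revolver) duration() float64 { return yearsBetween(loan.start, loan.expiry) }
func (loan revolver) unusedPercentage() float64 {
	if loan.rating > 4 {
		return 0.75
	}
	return 0.25
}

// rctl: both an expiry and a maturity date.
type rctl struct {
	start, expiry, maturity time.Time
	rating                  int
}

func (loan rctl) duration() float64 {
	return yearsBetween(loan.start, loan.expiry) + yearsBetween(loan.expiry, loan.maturity)
}
func (loan rctl) unusedPercentage() float64 {
	if loan.rating > 4 {
		return 0.95
	}
	return 0.50
}

// createLoanStrategy is the single place where the product is chosen.
func createLoanStrategy(app LoanApplication) CapitalStrategy {
	switch {
	case app.expiry != nil && app.maturity != nil:
		return rctl{app.start, *app.expiry, *app.maturity, app.rating}
	case app.expiry != nil:
		return revolver{app.start, *app.expiry, app.rating}
	case app.maturity != nil:
		return termLoan{app.start, *app.maturity}
	}
	panic("invalid loan application")
}

// sampleTermLoan (hypothetical) matures after the given number of 365-day years.
func sampleTermLoan(years int) LoanApplication {
	start := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	maturity := start.Add(time.Duration(years) * 365 * 24 * time.Hour)
	return LoanApplication{start: start, maturity: &maturity}
}

func main() {
	strategy := createLoanStrategy(sampleTermLoan(2))
	fmt.Println(strategy.duration(), strategy.unusedPercentage()) // 2 1
}
```

Each strategy now reads as one product from top to bottom, and the nil checks appear exactly once.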

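Reality is not that simple, though, because things as you see them run as multiple processes and threads. Take the product-logic example above: design patterns move the execution logic into separate places, but as long as the code contains several products, a product-selection step still happens at run time. The logic occurs at the same time and in the same place, so it "naturally" ends up written together:

When displaying a feature, several kinds of information have to be shown, which creates concurrent processes
When writing code, the business includes functional and non-functional requirements, and both normal and error handling
When considering run-time efficiency, we turn to asynchronous I/O and multiple threads or coroutines
When considering flow reuse, version differences and product strategies also create merged concurrent processes

When several features are tangled together, as in the RenderPage function above, the fix is not to do everything in one lump: make each piece of functionality cohesive, then couple the pieces into one unit.

When several I/O operations proceed at the same time, coroutines can pull the tangled processes apart: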

// bad: two I/O flows written together
func sendToPlatforms() {
 httpSend("bloomberg", func(err error) {
  if err == nil {
   increaseCounter("bloomberg_sent", func(err error) {
    if err != nil {
     log("failed to record counter", err)
    }
   })
  } else {
   log("failed to send to bloomberg", err)
  }
 })
 ftpSend("reuters", func(err error) {
  if err == DIRECTORY_NOT_FOUND {
   httpSend("reutersHelp", func(err error) {})
  }
 })
}

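For concurrent I/O like this, the best approach is to give each flow its own function. At run time they execute "simultaneously", but in the code they are kept apart.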

// good: coroutine (goroutine) style
func sendToPlatforms() {
 go sendToBloomberg()
 go sendToReuters()
}

func sendToBloomberg() {
 err := httpSend("bloomberg")
 if err != nil {
  log("failed to send to bloomberg", err)
  return
 }
 err = increaseCounter("bloomberg_sent")
 if err != nil {
  log("failed to record counter", err)
 }
}

func sendToReuters() {
 err := ftpSend("reuters")
 if err == DIRECTORY_NOT_FOUND {
  httpSend("reutersHelp")
 }
}
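
One thing the goroutine version leaves open is that sendToPlatforms returns immediately, so the caller cannot tell when, or whether, the sends finished. A possible variant, assuming the caller wants to wait and inspect the errors, uses sync.WaitGroup from the standard library. The stub senders and errDirectoryNotFound are made up so the sketch runs on its own.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errDirectoryNotFound = errors.New("directory not found")

// Stub senders standing in for the real I/O (hypothetical).
func sendToBloomberg() error { return nil }
func sendToReuters() error   { return errDirectoryNotFound }

// sendToPlatforms runs every sender in its own goroutine and waits for all
// of them. Each flow is still written as its own sequential function.
func sendToPlatforms(senders ...func() error) []error {
	errs := make([]error, len(senders))
	var wg sync.WaitGroup
	for i, send := range senders {
		wg.Add(1)
		go func(i int, send func() error) {
			defer wg.Done()
			errs[i] = send() // each goroutine writes only its own slot
		}(i, send)
	}
	wg.Wait()
	return errs
}

func main() {
	errs := sendToPlatforms(sendToBloomberg, sendToReuters)
	fmt.Println(errs[0], errs[1])
}
```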

Sometimes logic has to be merged into one Process. For example, buying and selling a product both have to check the request first:

// bad
func buyProduct(req *http.Request) error {
 err := checkAuth(req)
 if err != nil {
  return err
 }
 // ...
}

func sellProduct(req *http.Request) error {
 err := checkAuth(req)
 if err != nil {
  return err
 }
 // ...
}

The classic fix for shared logic at the head of a function is a Decorator that handles the permission check on its own and wraps the real logic:

// good: decorator style
func init() {
 buyProduct = checkAuthDecorator(buyProduct)
 sellProduct = checkAuthDecorator(sellProduct)
}

func checkAuthDecorator(f func(req *http.Request) error) func(req *http.Request) error {
 return func(req *http.Request) error {
  err := checkAuth(req)
  if err != nil {
   return err
  }
  return f(req)
 }
}

var buyProduct = func(req *http.Request) error {
 // ...
}

var sellProduct = func(req *http.Request) error {
 // ...
}

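Your code now looks like this:

Shared logic does not only live at the head, of course. Think about patterns such as Strategy and Template: they handle the same kind of shared logic at other points in the flow.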
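
As a rough illustration of shared logic that wraps a varying middle step, here is a small template-style sketch. It is not from the talk: settle, flatFee and percentFee are hypothetical names, and the fee rules are arbitrary. The fixed head and tail live in one function, and only the middle step is passed in.

```go
package main

import (
	"errors"
	"fmt"
)

// settle is the template: the shared head and tail live here, and only
// the fee calculation in the middle varies.
func settle(amount int, calcFee func(amount int) int) (int, error) {
	if amount <= 0 { // shared head: validation
		return 0, errors.New("amount must be positive")
	}
	fee := calcFee(amount)   // varying middle step
	return amount - fee, nil // shared tail: net amount
}

func flatFee(amount int) int    { return 2 }
func percentFee(amount int) int { return amount / 100 }

func main() {
	net, _ := settle(1000, flatFee)
	fmt.Println(net) // 998
	net, _ = settle(1000, percentFee)
	fmt.Println(net) // 990
}
```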

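This brings in a new concept: signal-to-noise ratio. It is a relative notion: signal is what is useful to me, noise is what is not. Which logic belongs together in code depends not only on who the reader is, but also on what that reader is trying to accomplish at the moment.

Compare these C++ and Python snippets: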

void sendMessage(const Message &msg) const {...}

def sendMessage(msg):

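If you are doing business development right now, the Python reads as nice and concise. If you are doing performance work, the C++ clearly gives you more information.

Or take the code below. In terms of business logic it looks perfectly clear: iterate over the books and fetch each Publisher.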

for _, book := range books {
  book.getPublisher()
}

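But then you look at the SQL log from production and you are stunned. This ORM really is **: it executes the SQL one row at a time. This one line of code could trigger DB alerts and get your DBA colleague out of bed in the middle of the night to fix the database.

SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id
SELECT * FROM Publisher WHERE PublisherId = book.publisher_id

If the code is rewritten like this, it becomes much more obvious that it is loading an entity on every loop iteration.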

for _, book := range books {
  loadEntity("publisher", book.publisher_id)
}

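To sum up:

Prefer giving each Process its own function; do not merge them and compute everything together
  Try splitting the UI into components
  Try splitting an order into several documents and tracking each flow independently
  Try expressing concurrent I/O with coroutines rather than callbacks
If you have no choice but to handle several relatively independent things in one Process
  Try copying the code instead of reusing the same Process
  Try explicit insertion: state / adapter / strategy / template / visitor / observer
  Try implicit insertion: decorator / AOP
Improving the signal-to-noise ratio is always relative to a specific goal: raising it for one goal lowers it for another

Conclusion

When we complain that some code is unreadable, we should not simply blame a lack of comments or a lack of OO. Start from how the brain works, use the figure below to locate the problem in the code, and then try to improve it. (Old code that has been running for years is better left alone: change one line and production might blow up :)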

Source: https://mp.weixin.qq.com/s/_fg_oS74DEd4jKqJFHKHzg