The Sanitizer Interceptor Mechanism Explained in Simple Terms

Published: 2022-12-13 16:02:04, ByteDance SYS Tech

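Background

C++ developers routinely run into memory errors such as buffer overflows and dangling pointers, and multithreading errors such as data races and deadlocks. These errors often cause unexpected program behavior and undermine safety and stability, and locating them quickly has long been a headache. The sanitizers, dynamic analysis tools open-sourced by Google, help C/C++ developers locate such problems efficiently and improve development productivity. Sanitizers are now widely used at ByteDance for crash/coredump analysis in core server-side services such as search, advertising, and recommendation, and have resolved hundreds of hard problems caused by memory errors and multithreaded data races. This article explains how the sanitizer interceptor mechanism works, to help readers better understand and use the sanitizers.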

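Introduction to Sanitizers

The sanitizers are a family of dynamic code analysis tools open-sourced by Google. They have been integrated into Clang since version 3.1 and into GCC since version 4.8, and they help programmers locate memory errors and multithreading errors quickly and accurately at run time. The toolset includes:

AddressSanitizer (ASan): detects memory errors such as buffer overflows, accesses to freed memory, and null pointer dereferences
LeakSanitizer (LSan): detects memory leaks
ThreadSanitizer (TSan): detects multithreaded data races and deadlocks
UndefinedBehaviorSanitizer (UBSan): detects undefined behavior
MemorySanitizer (MSan): detects accesses to uninitialized memory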

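In terms of implementation, every sanitizer consists of two parts: compile-time instrumentation and a run-time library.

Taking ASan as an example:

At compile time, ASan inserts code before every memory read and write. The inserted code checks the state of the shadow memory that corresponds to the accessed address (shadow memory is extra memory used to record the state of regular memory) to decide whether the access is legal. ASan also places extra memory around stack variables and global variables as redzones, which it uses to detect overflows.
The ASan run-time library replaces the implementations of memory allocation functions such as malloc/free and operator new/delete, so that all of the application's memory allocation is handled by ASan's own allocator. The ASan allocator places extra memory around each heap block it hands out to detect heap overflows, and it puts freed memory into a quarantine first so that it can detect heap errors such as heap-use-after-free.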
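The shadow-memory check described above can be pictured with a toy model. The sketch below is not ASan's code: it assumes the commonly documented ASan encoding (one shadow byte per 8 application bytes, 0 meaning fully addressable, 1 to 7 meaning only the first k bytes are addressable, a negative value meaning poisoned), and every name in it (g_shadow, PoisonGranule, SetPartial, IsAccessOk) is hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model only: 64 bytes of "application memory" tracked by 8 shadow bytes.
// Assumed encoding: 0 = all 8 bytes addressable, k in 1..7 = only the first
// k bytes addressable, negative = the whole granule is a redzone.
constexpr std::size_t kMemSize = 64;
constexpr std::size_t kGranule = 8;
std::array<std::int8_t, kMemSize / kGranule> g_shadow{};

// Mark the whole 8-byte granule containing `offset` as poisoned (a redzone).
void PoisonGranule(std::size_t offset) { g_shadow[offset / kGranule] = -1; }

// Mark only the first `k` bytes (1..7) of the granule as addressable.
void SetPartial(std::size_t offset, std::int8_t k) {
  g_shadow[offset / kGranule] = k;
}

// The check that instrumentation would run before an access of `size` bytes
// (1..8, not crossing a granule boundary) at `offset`.
bool IsAccessOk(std::size_t offset, std::size_t size) {
  const std::int8_t shadow = g_shadow[offset / kGranule];
  if (shadow == 0) return true;   // fully addressable
  if (shadow < 0) return false;   // poisoned
  const std::size_t last = (offset % kGranule) + size - 1;
  return last < static_cast<std::size_t>(shadow);
}
```

In this model, an access that touches a poisoned or partially addressable granule beyond its valid prefix is rejected, which is the situation in which ASan would print a report.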

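In fact, the ASan run-time library replaces far more than malloc/free and operator new/delete. It also replaces many other library functions, such as memcpy, memmove, strcpy, strcat, and pthread_create.

So how do the sanitizers manage to replace the implementations of functions like malloc/free? The answer is the sanitizer interceptor mechanism.

Using ASan as the example, this article analyzes how the sanitizer interceptor is implemented on Linux x86_64.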

Symbol interposition

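Before explaining how the sanitizer interceptor is implemented, we first need some background: symbol interposition.

Consider this question: how can an application replace libc's malloc with its own implementation?

The simplest way is to define a function with the same name, malloc, in the application itself.
Another way is to implement our malloc in libmymalloc.so and set the environment variable LD_PRELOAD=/path/to/libmymalloc.so before running the application.

Why do these two approaches work? The answer is symbol interposition.

Chapter 5 of the ELF specification, Program Loading and Dynamic Linking, states: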

When resolving symbolic references, the dynamic linker examines the symbol tables with a breadth-first search. That is, it first looks at the symbol table of the executable program itself, then at the symbol tables of the DT_NEEDED entries (in order), and then at the second level DT_NEEDED entries, and so on.

When binding symbol references, the dynamic linker (dynamic linker/loader) looks up symbols in breadth-first order: executable, needed0.so, needed1.so, needed2.so, needed0_of_needed0.so, needed1_of_needed0.so, ...

If LD_PRELOAD is set, the lookup order becomes: executable, preload0.so, preload1.so, needed0.so, needed1.so, needed2.so, needed0_of_needed0.so, needed1_of_needed0.so, ...

If a symbol is defined in more than one component (executable or shared object), the dynamic linker picks the first definition it sees.

An example makes the process concrete:

$ cat main.c
extern int W(), X();

int main() { return (W() + X()); }

$ cat W.c
extern int b();

int a() { return (1); }
int W() { return (a() - b()); }

$ cat w.c
int b() { return (2); }

$ cat X.c
extern int b();

int a() { return (3); }
int X() { return (a() - b()); }

$ cat x.c
int b() { return (4); }

$ gcc -o libw.so -shared w.c
$ gcc -o libW.so -shared W.c -L. -lw -Wl,-rpath=.
$ gcc -o libx.so -shared x.c
$ gcc -o libX.so -shared X.c -L. -lx -Wl,-rpath=.
$ gcc -o test-symbind main.c -L. -lW -lX -Wl,-rpath=.

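The dependencies between the executable and the shared libraries in this example are as follows: test-symbind depends on libW.so, libX.so, and libc.so.6; libW.so depends on libw.so; libX.so depends on libx.so.

As described above, when the dynamic linker binds symbol references in this example, it looks up symbol definitions in the order test-symbind, libW.so, libX.so, libc.so.6, libw.so, libx.so.

The dynamic linker provides the environment variable LD_DEBUG for printing debugging information. Setting LD_DEBUG="symbols:bindings" shows the symbol binding process of test-symbind: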

$ LD_DEBUG="symbols:bindings" ./test-symbind
   1884890:        symbol=a;  lookup in file=./test-symbind [0]
   1884890:        symbol=a;  lookup in file=./libW.so [0]
   1884890:        binding file ./libW.so [0] to ./libW.so [0]: normal symbol `a'
   1884890:        symbol=b;  lookup in file=./test-symbind [0]
   1884890:        symbol=b;  lookup in file=./libW.so [0]
   1884890:        symbol=b;  lookup in file=./libX.so [0]
   1884890:        symbol=b;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
   1884890:        symbol=b;  lookup in file=./libw.so [0]
   1884890:        binding file ./libW.so [0] to ./libw.so [0]: normal symbol `b'
   1884890:        symbol=a;  lookup in file=./test-symbind [0]
   1884890:        symbol=a;  lookup in file=./libW.so [0]
   1884890:        binding file ./libX.so [0] to ./libW.so [0]: normal symbol `a'
   1884890:        symbol=b;  lookup in file=./test-symbind [0]
   1884890:        symbol=b;  lookup in file=./libW.so [0]
   1884890:        symbol=b;  lookup in file=./libX.so [0]
   1884890:        symbol=b;  lookup in file=/lib/x86_64-linux-gnu/libc.so.6 [0]
   1884890:        symbol=b;  lookup in file=./libw.so [0]
   1884890:        binding file ./libX.so [0] to ./libw.so [0]: normal symbol `b'

Function a is defined in both libW.so and libX.so. Because symbol definitions are looked up in the order test-symbind, libW.so, libX.so, libc.so.6, libw.so, libx.so, every reference to a is bound to the implementation in libW.so.
Function b is defined in both libw.so and libx.so. Because symbol definitions are looked up in the same order, every reference to b is bound to the implementation in libw.so.
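The lookup order and the "first definition wins" rule can be modeled in a few lines. The following is only a sketch of the search order, not of the dynamic linker itself; DemoObject, DemoGraph, SearchOrder, Resolve, and SymbindGraph are hypothetical names, and the graph is transcribed from the build commands of the test-symbind example.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a loaded object: its DT_NEEDED list (in order) and
// the symbols it defines.
struct DemoObject {
  std::vector<std::string> needed;
  std::set<std::string> defines;
};
using DemoGraph = std::map<std::string, DemoObject>;

// Breadth-first order: executable, then preloads, then dependencies level by
// level. Each object appears once, at its first position.
std::vector<std::string> SearchOrder(const DemoGraph& graph,
                                     const std::string& exe,
                                     const std::vector<std::string>& preloads = {}) {
  std::vector<std::string> order;
  std::set<std::string> seen;
  std::deque<std::string> queue;
  auto visit = [&](const std::string& name) {
    if (seen.insert(name).second) {
      order.push_back(name);
      queue.push_back(name);
    }
  };
  visit(exe);
  for (const std::string& p : preloads) visit(p);
  while (!queue.empty()) {
    const std::string current = queue.front();
    queue.pop_front();
    auto it = graph.find(current);
    if (it == graph.end()) continue;
    for (const std::string& dep : it->second.needed) visit(dep);
  }
  return order;
}

// The first object in the search order that defines `symbol` wins.
// Returns an empty string if no object defines it.
std::string Resolve(const DemoGraph& graph,
                    const std::vector<std::string>& order,
                    const std::string& symbol) {
  for (const std::string& name : order) {
    auto it = graph.find(name);
    if (it != graph.end() && it->second.defines.count(symbol)) return name;
  }
  return "";
}

// Dependency graph of the test-symbind example.
DemoGraph SymbindGraph() {
  DemoGraph g;
  g["test-symbind"].needed = {"libW.so", "libX.so", "libc.so.6"};
  g["libW.so"].needed = {"libw.so", "libc.so.6"};
  g["libW.so"].defines = {"a", "W"};
  g["libX.so"].needed = {"libx.so", "libc.so.6"};
  g["libX.so"].defines = {"a", "X"};
  g["libw.so"].defines = {"b"};
  g["libx.so"].defines = {"b"};
  g["libc.so.6"];  // defines nothing relevant to this example
  return g;
}
```

Resolving a and b against SearchOrder(SymbindGraph(), "test-symbind") yields libW.so and libw.so, matching the bindings in the LD_DEBUG output. Passing a preload list places those objects immediately after the executable, which is why an LD_PRELOAD library can shadow libc.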

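This explains why the two ways of replacing malloc mentioned at the start of this section work:

Approach 1: define a function named malloc in the application. The executable comes before libc.so.6 in the dynamic linker's lookup order, so every reference to malloc is bound to the implementation in the executable.
Approach 2: implement malloc in libmymalloc.so and set the environment variable LD_PRELOAD=/path/to/libmymalloc.so before running the application. libmymalloc.so comes before libc.so.6 in the dynamic linker's lookup order, so every reference to malloc is bound to the implementation in libmymalloc.so.

The sanitizers rely on exactly this property, symbol interposition, to replace library functions such as malloc/free. Let's verify this with ASan.

Consider the following code: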

// test.cpp
#include <iostream>
int main() {
    std::cout << "Hello AddressSanitizer!\n";
}

We first look at GCC's behavior.

Compiling test.cpp with GCC and ASan enabled, g++ -fsanitize=address test.cpp -o test-gcc-asan, produces test-gcc-asan. Because GCC links the ASan run-time library dynamically by default, we can use objdump -p test-gcc-asan | grep NEEDED to list the shared objects that test-gcc-asan depends on:

$ objdump -p test-gcc-asan | grep NEEDED
  NEEDED               libasan.so.5
  NEEDED               libstdc++.so.6
  NEEDED               libm.so.6
  NEEDED               libgcc_s.so.1
  NEEDED               libc.so.6

libasan.so clearly comes before libc.so.6 in the list of shared objects that test-gcc-asan depends on. In fact, passing -fsanitize=address at link time makes libasan.so the program's first dependency.

Setting the environment variable LD_DEBUG="bindings" shows the symbol binding process of test-gcc-asan:

$ LD_DEBUG="bindings" ./test-gcc-asan
   3309213:        binding file /lib/x86_64-linux-gnu/libc.so.6 [0] to /usr/lib/x86_64-linux-gnu/libasan.so.5 [0]: normal symbol `malloc' [GLIBC_2.2.5]
   3309213:        binding file /lib64/ld-linux-x86-64.so.2 [0] to /usr/lib/x86_64-linux-gnu/libasan.so.5 [0]: normal symbol `malloc' [GLIBC_2.2.5]
   3309213:        binding file /usr/lib/x86_64-linux-gnu/libstdc++.so.6 [0] to /usr/lib/x86_64-linux-gnu/libasan.so.5 [0]: normal symbol `malloc' [GLIBC_2.2.5]

The dynamic linker binds the references to malloc in libc.so.6, ld-linux-x86-64.so.2, and libstdc++.so to the malloc implementation in libasan.so.

Now let's look at Clang. Because Clang links the ASan run-time library statically by default, we skip the list of shared objects that test-clang-asan depends on and go straight to the symbol binding process:

[LD_DEBUG output for test-clang-asan not available]

Likewise, the dynamic linker binds the references to malloc in libc.so.6, ld-linux-x86-64.so.2, and libstdc++.so to the malloc implementation in test-clang-asan (the ASan run-time library implements malloc, and Clang links that library statically into test-clang-asan).

Sanitizer interceptor

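Next we study the implementation of the sanitizer interceptor from the source code.

A very effective way to read and learn LLVM code is to study it together with the corresponding tests.

The sanitizer interceptor has a test file, interception_linux_test.cpp: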

#include "interception/interception.h"
#include "gtest/gtest.h"

static int InterceptorFunctionCalled;

DECLARE_REAL(int, isdigit, int);

INTERCEPTOR(int, isdigit, int d) {
  ++InterceptorFunctionCalled;
  return d >= '0' && d <= '9';
}

namespace __interception {

TEST(Interception, Basic) {
  EXPECT_TRUE(INTERCEPT_FUNCTION(isdigit));

  // After interception, the counter should be incremented.
  InterceptorFunctionCalled = 0;
  EXPECT_NE(0, isdigit('1'));
  EXPECT_EQ(1, InterceptorFunctionCalled);
  EXPECT_EQ(0, isdigit('a'));
  EXPECT_EQ(2, InterceptorFunctionCalled);

  // Calling the REAL function should not affect the counter.
  InterceptorFunctionCalled = 0;
  EXPECT_NE(0, REAL(isdigit)('1'));
  EXPECT_EQ(0, REAL(isdigit)('a'));
  EXPECT_EQ(0, InterceptorFunctionCalled);
}

}  // namespace __interception

This test uses the sanitizer interceptor mechanism to replace the implementation of isdigit. The replacement defined in the test file increments the variable InterceptorFunctionCalled each time isdigit is called, and the test then checks the value of InterceptorFunctionCalled to verify that the interceptor mechanism works correctly.

The core of interception_linux_test.cpp that replaces isdigit is the following snippet:

INTERCEPTOR(int, isdigit, int d) {
  ++InterceptorFunctionCalled;
  return d >= '0' && d <= '9';
}

INTERCEPT_FUNCTION(isdigit);

DECLARE_REAL(int, isdigit, int);
REAL(isdigit)('1');

INTERCEPTOR(int, isdigit, int d) { ... } replaces the implementation of isdigit with the body { ... }.
INTERCEPT_FUNCTION(isdigit) must be called before isdigit is called in the code. If INTERCEPT_FUNCTION(isdigit) returns true, the implementation of isdigit in libc has been replaced successfully.
REAL(isdigit)('1') calls the real isdigit implementation, but DECLARE_REAL(int, isdigit, int) must appear before REAL(isdigit)('1') is used.

After macro expansion, the code above looks like this:

[Macro-expanded code not available]

Let's first look at what the INTERCEPTOR macro does:
- First, it defines a function pointer real_isdigit in the __interception namespace. The INTERCEPT_FUNCTION macro later sets this pointer to the address of the real isdigit function.
- Then it makes isdigit a weak symbol and an alias of __interceptor_isdigit.
- Finally, it puts our own version of the isdigit logic in the function __interceptor_isdigit.
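The weak-alias step in the list above can be sketched as follows. This is an illustration of the pattern rather than the macro's actual expansion: it relies on the GCC/Clang weak and alias attributes on ELF targets, and it uses the hypothetical names demo_isdigit, demo_interceptor_isdigit, and g_interceptor_calls so that it does not collide with libc's isdigit or with the sanitizer's own symbols.

```cpp
#include <cassert>

// Counts how often the wrapper runs (hypothetical, for demonstration only).
int g_interceptor_calls = 0;

// The wrapper holds the replacement logic, playing the role that
// __interceptor_isdigit plays in the sanitizer runtime.
extern "C" int demo_interceptor_isdigit(int d) {
  ++g_interceptor_calls;
  return d >= '0' && d <= '9';
}

// The public name is a weak alias of the wrapper: calls to demo_isdigit land
// in demo_interceptor_isdigit unless a strong definition of demo_isdigit is
// linked in from elsewhere.
extern "C" int demo_isdigit(int d)
    __attribute__((weak, alias("demo_interceptor_isdigit")));
```

Calling demo_isdigit('7') runs the wrapper and increments the counter. Because the alias is weak, another object file could supply its own strong demo_isdigit without a link error and still reach the wrapper by calling demo_interceptor_isdigit directly.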

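From the section on symbol interposition we know that, to replace the implementation of some function in libc.so.6 (call it foo), it is enough to define a function with the same name foo in the sanitizer run-time library and make sure the sanitizer run-time library comes before libc.so.6 in the dynamic linker's lookup order.

Then why is our isdigit logic implemented in the function __interceptor_isdigit, with isdigit made an alias of __interceptor_isdigit?

Consider this scenario: suppose user code also replaces isdigit and adds its own logic. The dynamic linker then chooses the isdigit in user code rather than the one in the sanitizer run-time library, and the sanitizer can no longer work correctly. (The sanitizer run-time library does not actually replace isdigit; isdigit is used here only as a convenient example.)

If, however, the sanitizer run-time library makes isdigit an alias of __interceptor_isdigit, user code that replaces isdigit can call __interceptor_isdigit explicitly. Users can still replace library functions, and the sanitizer keeps working correctly: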

#include <cstdio>

extern "C" int __interceptor_isdigit(int d);
extern "C" int isdigit(int d) {
  fprintf(stderr, "my_isdigit_interceptor\n");
  return __interceptor_isdigit(d);
}

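Why does the sanitizer run-time library make the replaced function a weak symbol? Because without the weak attribute, statically linking the sanitizer run-time library would fail with a multiple definition error whenever another object, such as user code, defines the same function.

Next, let's look at what the INTERCEPT_FUNCTION macro does:
- The INTERCEPT_FUNCTION macro expands to a call to the function __interception::InterceptFunction, which is defined as follows: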

namespace __interception {
static void *GetFuncAddr(const char *name, uptr wrapper_addr) {
  void *addr = dlsym(RTLD_NEXT, name);
  if (!addr) {
    // If the lookup using RTLD_NEXT failed, the sanitizer runtime library is
    // later in the library search order than the DSO that we are trying to
    // intercept, which means that we cannot intercept this function. We still
    // want the address of the real definition, though, so look it up using
    // RTLD_DEFAULT.
    addr = dlsym(RTLD_DEFAULT, name);

    // In case `name' is not loaded, dlsym ends up finding the actual wrapper.
    // We don't want to intercept the wrapper and have it point to itself.
    if ((uptr)addr == wrapper_addr)
      addr = nullptr;
  }
  return addr;
}

bool InterceptFunction(const char *name, uptr *ptr_to_real, uptr func,
                       uptr wrapper) {
  void *addr = GetFuncAddr(name, wrapper);
  *ptr_to_real = (uptr)addr;
  return addr && (func == wrapper);
}
}  // namespace __interception

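InterceptFunction itself is simple: it calls GetFuncAddr to obtain the address of the original function named name, and then stores that address in the memory that ptr_to_real points to.

GetFuncAddr is also simple. Its core is dlsym:

When the first argument of dlsym is RTLD_DEFAULT, the function named name is looked up in the order described earlier: executable, preload0.so, preload1.so, needed0.so, needed1.so, needed2.so, needed0_of_needed0.so, needed1_of_needed0.so, ...
When the first argument of dlsym is RTLD_NEXT, the lookup for name starts from the shared objects that come after the current object.

This is why GetFuncAddr first uses dlsym(RTLD_NEXT, name) to find the real address of the replaced function: as a dependency, the sanitizer run-time library comes before the shared object that actually contains name.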
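The same RTLD_NEXT-then-RTLD_DEFAULT lookup can be tried from an ordinary program. The sketch below assumes Linux with glibc; FindNextDefinition and IntFn are hypothetical names, and the int(int) signature is chosen only so that the result can be called as isdigit.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // glibc exposes RTLD_NEXT and RTLD_DEFAULT under this macro
#endif
#include <dlfcn.h>

#include <cassert>

using IntFn = int (*)(int);

// Look for `name` in the objects that come after the calling object in the
// search order; fall back to the default order if that finds nothing.
// Returns nullptr when the symbol is not found at all.
IntFn FindNextDefinition(const char* name) {
  void* addr = dlsym(RTLD_NEXT, name);
  if (!addr) addr = dlsym(RTLD_DEFAULT, name);
  return reinterpret_cast<IntFn>(addr);
}
```

Called from the executable, FindNextDefinition("isdigit") skips the executable itself and finds the definition in libc. On older glibc versions the program may need to be linked with -ldl.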

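Finally, let's look at what the DECLARE_REAL and REAL macros do.

DECLARE_REAL expands to a declaration stating that the __interception namespace contains a function pointer to the real implementation of the replaced function. The REAL macro calls the real implementation through that function pointer.

In the test case, for example, DECLARE_REAL(int, isdigit, int); declares that the __interception namespace contains a function pointer real_isdigit that points to the address of the real isdigit function, and REAL(isdigit) calls the real isdigit through it.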
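A pair of macros with this behavior could look roughly like the sketch below. It is modeled on the description above rather than copied from the sanitizer headers: the DEMO_ macros, the demo_interception namespace, and the stand-in function twice are all hypothetical, and the pointer is initialized directly instead of being filled in by INTERCEPT_FUNCTION.

```cpp
#include <cassert>

// Hypothetical helper macros modeled on the description above.
#define DEMO_FUNC_TYPE(x) x##_type
#define DEMO_PTR_TO_REAL(x) real_##x
#define DEMO_REAL(x) demo_interception::DEMO_PTR_TO_REAL(x)

// Declares the function-pointer type and announces that a pointer named
// real_<func> exists in the demo_interception namespace.
#define DEMO_DECLARE_REAL(ret_type, func, ...)            \
  typedef ret_type (*DEMO_FUNC_TYPE(func))(__VA_ARGS__);  \
  namespace demo_interception {                           \
  extern DEMO_FUNC_TYPE(func) DEMO_PTR_TO_REAL(func);     \
  }

// Stand-in for the real implementation of a replaced function.
int twice(int x) { return 2 * x; }

DEMO_DECLARE_REAL(int, twice, int)

// Definition of the pointer. In the sanitizer, per the text above, the
// INTERCEPTOR macro defines it and INTERCEPT_FUNCTION stores the address.
namespace demo_interception {
twice_type real_twice = &twice;
}

// Calls the real implementation through the pointer.
int CallRealTwice(int x) { return DEMO_REAL(twice)(x); }
```

Here DEMO_REAL(twice) expands to demo_interception::real_twice, so CallRealTwice(21) returns 42 by calling twice through the pointer.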

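Summary

We now understand how the sanitizer interceptor mechanism works under the hood on Linux.

ASan uses the sanitizer interceptor mechanism to replace memory allocation and deallocation functions such as malloc/free, so that all allocation and deallocation is handled by ASan's own allocator. This makes it easy for ASan to detect heap errors such as heap-use-after-free and double-free.

For sanitizer users, knowing how the sanitizers work makes the tools easier to understand and lets us use their mechanisms to track down hard bugs more efficiently.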
