Daily Trash Post: On the Process Safety of logging#
Had a few drinks at a team dinner, so here is a casual filler post from a lousy documentation engineer.
Main Text#
Our team's log collection has changed from shipping logs straight to logstash over TCP to writing everything into a single file and letting FileBeat act as the collection front end. But this frequently brings a problem: lost logs.
Well, our production services run under Gunicorn with multiple workers, and that raises a question. We all know the `logging` module is thread safe: the standard log handlers take a series of internal locks to guarantee thread safety. But is writing straight to a file with `logging` also process safe?
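The thread-safety half of that statement is easy to confirm: every `Handler` allocates an internal `RLock` at construction time, and `Handler.handle` wraps each `emit` call in `acquire()`/`release()`. A quick sketch (nothing here is specific to our service):

```python
import logging
import threading

h = logging.StreamHandler()

# Handler.__init__ calls createLock(), which installs a threading.RLock;
# Handler.handle() acquires it around every emit() call.
print(type(h.lock))
assert isinstance(h.lock, type(threading.RLock()))
```

Note that this lock lives inside a single process, which is exactly why it says nothing about multiple Gunicorn workers appending to the same file.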
Analysis#
We write to the file using the `FileHandler` that ships with the `logging` module, so let's start with its source code:
```python
class FileHandler(StreamHandler):
    """
    A handler class which writes formatted logging records to disk files.
    """
    def __init__(self, filename, mode='a', encoding=None, delay=False):
        """
        Open the specified file and use it as the stream for logging.
        """
        # Issue #27493: add support for Path objects to be passed in
        filename = os.fspath(filename)
        #keep the absolute path, otherwise derived classes which use this
        #may come a cropper when the current directory changes
        self.baseFilename = os.path.abspath(filename)
        self.mode = mode
        self.encoding = encoding
        self.delay = delay
        if delay:
            #We don't open the stream, but we still need to call the
            #Handler constructor to set level, formatter, lock etc.
            Handler.__init__(self)
            self.stream = None
        else:
            StreamHandler.__init__(self, self._open())

    def close(self):
        """
        Closes the stream.
        """
        self.acquire()
        try:
            try:
                if self.stream:
                    try:
                        self.flush()
                    finally:
                        stream = self.stream
                        self.stream = None
                        if hasattr(stream, "close"):
                            stream.close()
            finally:
                # Issue #19523: call unconditionally to
                # prevent a handler leak when delay is set
                StreamHandler.close(self)
        finally:
            self.release()

    def _open(self):
        """
        Open the current base file with the (original) mode and encoding.
        Return the resulting stream.
        """
        return open(self.baseFilename, self.mode, encoding=self.encoding)

    def emit(self, record):
        """
        Emit a record.

        If the stream was not opened because 'delay' was specified in the
        constructor, open it before calling the superclass's emit.
        """
        if self.stream is None:
            self.stream = self._open()
        StreamHandler.emit(self, record)

    def __repr__(self):
        level = getLevelName(self.level)
        return '<%s %s (%s)>' % (self.__class__.__name__, self.baseFilename, level)
```
The points to focus on here are the `_open` method and the `emit` method. First, a bit of background: when we output a log record with `logging`, the module calls the `handle` method of the corresponding `Handler`, and inside `handle` the `emit` method is called to produce the final output. So with `FileHandler`, `handle` is triggered first, which in turn triggers `emit`; after `_open` is called to obtain a file object, the `emit` method of the parent class (more precisely, the next class up the MRO), `StreamHandler`, is invoked.
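The chain just described can be observed with a tiny logger. A minimal sketch (the logger name and file path are arbitrary choices), using `delay=True` so we can watch `_open` happen lazily inside `emit`:

```python
import logging
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.log")

logger = logging.getLogger("demo")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(path, delay=True)
logger.addHandler(handler)

# delay=True: no stream yet; FileHandler.emit() will call _open() lazily.
assert handler.stream is None

# Logger.info -> Logger.handle -> Handler.handle -> FileHandler.emit
#   -> _open() -> StreamHandler.emit -> stream.write(msg + terminator)
logger.info("hello")

assert handler.stream is not None
handler.close()

with open(path) as f:
    content = f.read()
print(content)
```

With no formatter set, the default format is just `%(message)s`, so the file ends up containing `hello` plus the handler's terminator `\n`.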
Let's take a look at the `emit` method of `StreamHandler`:
```python
class StreamHandler(Handler):
    """
    A handler class which writes logging records, appropriately formatted,
    to a stream. Note that this class does not close the stream, as
    sys.stdout or sys.stderr may be used.
    """

    terminator = '\n'

    def __init__(self, stream=None):
        """
        Initialize the handler.

        If stream is not specified, sys.stderr is used.
        """
        Handler.__init__(self)
        if stream is None:
            stream = sys.stderr
        self.stream = stream

    def flush(self):
        """
        Flushes the stream.
        """
        self.acquire()
        try:
            if self.stream and hasattr(self.stream, "flush"):
                self.stream.flush()
        finally:
            self.release()

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline. If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream. If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            stream.write(msg)
            stream.write(self.terminator)
            self.flush()
        except Exception:
            self.handleError(record)

    def __repr__(self):
        level = getLevelName(self.level)
        name = getattr(self.stream, 'name', '')
        if name:
            name += ' '
        return '<%s %s(%s)>' % (self.__class__.__name__, name, level)
```
Simple enough: it just uses the file object we obtained earlier to write the data into the file.
The problem lies here. In `FileHandler`, when `_open` calls the built-in `open`, the chosen mode is `'a'`, which at the OS level means `O_APPEND`. Conventionally, `O_APPEND` can be regarded as process safe, because `O_APPEND` guarantees that the written data will not be overwritten by other `O_APPEND` writes. So why are we still losing logs?
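As an aside, `open(path, 'a')` corresponds to `os.open(path, O_WRONLY | O_CREAT | O_APPEND)` at the syscall level: every write lands at the current end of the file, so appends never overwrite each other. A minimal sketch (the temporary path is an arbitrary choice):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "append-demo.log")

# The flag combination that mode "a" maps to.
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
os.write(fd, b"first\n")
os.write(fd, b"second\n")  # always appended at the current end of file
os.close(fd)

with open(path) as f:
    data = f.read()
print(data)
```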
The reason is a special piece of design in POSIX, described in *The POSIX Programmers Guide* as follows:
- A write of fewer than PIPE_BUF bytes is atomic; the data will not be interleaved with data from other processes writing to the same pipe. A write of more than PIPE_BUF may have data interleaved in arbitrary ways.
Roughly translated: POSIX defines a constant called `PIPE_BUF`, required to be at least 512 bytes. A write of fewer than `PIPE_BUF` bytes is atomic, i.e. it cannot be interrupted, so it will not get tangled with data written by other processes; for writes larger than `PIPE_BUF`, the operating system gives no such guarantee. (Strictly speaking, `PIPE_BUF` governs pipes and FIFOs rather than regular files, but the practical lesson carries over: only a single, sufficiently small `write()` is safe from interleaving.)
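You can query the value your system actually uses via `fpathconf` on a pipe (since `PIPE_BUF` is formally a property of pipes):

```python
import os

# POSIX requires PIPE_BUF to be at least 512 bytes.
r, w = os.pipe()
try:
    pipe_buf = os.fpathconf(r, "PC_PIPE_BUF")
finally:
    os.close(r)
    os.close(w)
print(pipe_buf)  # typically 4096 on Linux
```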
On Linux, this value is a little different:
- POSIX.1 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)
That is, writes larger than 4 KB are not guaranteed to be atomic, and the data may end up interleaved.
Once the data is interleaved, the affected log records no longer have a fixed format, the parsing side fails to parse them, and those logs are ultimately lost.
Let's reproduce this. First, the test code:
Finally#
I had never thought about this before; today really opened a new door for me. Finally, thanks to @依雲 for the pointer = = Without that reminder, I would never have imagined that even in `O_APPEND` mode, the data is not guaranteed to be safe.
Reference#
Two reference materials were consulted for this post; links below.