
A Daily Trash Post: On the Process Safety of logging#

Had a few drinks at the team dinner, so here's a casual daily post from a trashy documentation engineer.

Main Text#

Our team's log collection has changed from shipping logs directly to logstash over TCP to writing everything into a single file and using FileBeat as the collection frontend. This setup, however, regularly runs into one problem: lost log lines.

Our production services run under Gunicorn with multiple workers, which raises a question. We all know that the logging module is thread safe: the standard log handlers take a lock around their I/O internally to guarantee it. But is writing straight to a file with logging process safe?
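As a quick illustration of that thread-safety guarantee, here is a minimal sketch (the file name, thread count, and record size are arbitrary): several threads log through a single FileHandler, and no two records ever interleave within the process, because the handler's internal RLock serializes the emit calls.

import logging
import threading

logger = logging.getLogger("thread-demo")
logger.addHandler(logging.FileHandler("thread_demo.log"))
logger.setLevel(logging.INFO)

def worker(tag):
    # Handler.handle() holds the handler's RLock around each emit(),
    # so records from different threads never interleave in-process.
    for _ in range(1000):
        logger.info("%d %s", tag, "x" * 8192)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()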

Analysis#

We write the file through the FileHandler that ships with the logging module, so let's start with its source:

class FileHandler(StreamHandler):
    """
    A handler class which writes formatted logging records to disk files.
    """
    def __init__(self, filename, mode='a', encoding=None, delay=False):
        """
        Open the specified file and use it as the stream for logging.
        """
        # Issue #27493: add support for Path objects to be passed in
        filename = os.fspath(filename)
        #keep the absolute path, otherwise derived classes which use this
        #may come a cropper when the current directory changes
        self.baseFilename = os.path.abspath(filename)
        self.mode = mode
        self.encoding = encoding
        self.delay = delay
        if delay:
            #We don't open the stream, but we still need to call the
            #Handler constructor to set level, formatter, lock etc.
            Handler.__init__(self)
            self.stream = None
        else:
            StreamHandler.__init__(self, self._open())

    def close(self):
        """
        Closes the stream.
        """
        self.acquire()
        try:
            try:
                if self.stream:
                    try:
                        self.flush()
                    finally:
                        stream = self.stream
                        self.stream = None
                        if hasattr(stream, "close"):
                            stream.close()
            finally:
                # Issue #19523: call unconditionally to
                # prevent a handler leak when delay is set
                StreamHandler.close(self)
        finally:
            self.release()

    def _open(self):
        """
        Open the current base file with the (original) mode and encoding.
        Return the resulting stream.
        """
        return open(self.baseFilename, self.mode, encoding=self.encoding)

    def emit(self, record):
        """
        Emit a record.

        If the stream was not opened because 'delay' was specified in the
        constructor, open it before calling the superclass's emit.
        """
        if self.stream is None:
            self.stream = self._open()
        StreamHandler.emit(self, record)

    def __repr__(self):
        level = getLevelName(self.level)
        return '<%s %s (%s)>' % (self.__class__.__name__, self.baseFilename, level)

The methods to focus on are _open and emit. First, some background: when we output a log line through logging, the module calls the handle method of the corresponding Handler, and handle in turn calls emit to write the final record. So with a FileHandler, handle is triggered first, then emit; emit calls _open to obtain a file object (if the stream isn't open yet) and then invokes the emit method of its parent class (more precisely, the next class up the MRO), StreamHandler.
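For reference, that dispatch lives in Handler.handle, shown here condensed from the same CPython logging/__init__.py the excerpts above come from; note that it is also where the thread lock mentioned earlier is taken:

def handle(self, record):
    """
    Conditionally emit the specified logging record.

    Emission depends on filters which may have been added to the handler.
    Wrap the actual emission of the record with acquisition/release of
    the I/O thread lock. Returns whether the filter passed the record for
    emission.
    """
    rv = self.filter(record)
    if rv:
        self.acquire()
        try:
            self.emit(record)
        finally:
            self.release()
    return rv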

Now let's look at the emit method in StreamHandler:

class StreamHandler(Handler):
    """
    A handler class which writes logging records, appropriately formatted,
    to a stream. Note that this class does not close the stream, as
    sys.stdout or sys.stderr may be used.
    """

    terminator = '\n'

    def __init__(self, stream=None):
        """
        Initialize the handler.

        If stream is not specified, sys.stderr is used.
        """
        Handler.__init__(self)
        if stream is None:
            stream = sys.stderr
        self.stream = stream

    def flush(self):
        """
        Flushes the stream.
        """
        self.acquire()
        try:
            if self.stream and hasattr(self.stream, "flush"):
                self.stream.flush()
        finally:
            self.release()

    def emit(self, record):
        """
        Emit a record.

        If a formatter is specified, it is used to format the record.
        The record is then written to the stream with a trailing newline.  If
        exception information is present, it is formatted using
        traceback.print_exception and appended to the stream.  If the stream
        has an 'encoding' attribute, it is used to determine how to do the
        output to the stream.
        """
        try:
            msg = self.format(record)
            stream = self.stream
            stream.write(msg)
            stream.write(self.terminator)
            self.flush()
        except Exception:
            self.handleError(record)

    def __repr__(self):
        level = getLevelName(self.level)
        name = getattr(self.stream, 'name', '')
        if name:
            name += ' '
        return '<%s %s(%s)>' % (self.__class__.__name__, name, level)

Simple enough: it uses the file object obtained earlier to write the data into the file.

Here is the problem. When _open in FileHandler calls open, the mode it picks is 'a', i.e. the usual O_APPEND mode. Conventionally, O_APPEND can be treated as process safe, because an O_APPEND write is guaranteed not to be overwritten by other O_APPEND writes. So why are we still losing log lines?
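To make the mode concrete, here is a small equivalence sketch ("app.log" is just an example name): Python's mode 'a' corresponds to an OS-level open with the O_APPEND flag, which makes the kernel position every write(2) at end-of-file atomically, so concurrent appends never overwrite one another.

import os

# High-level: effectively what FileHandler._open() does with mode='a'
f = open("app.log", "a", encoding="utf-8")

# Low-level equivalent: O_APPEND repositions to end-of-file atomically
# before every write(2), so appends cannot overwrite each other
fd = os.open("app.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

f.close()
os.close(fd)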

The reason is a particular piece of POSIX design, described in the POSIX Programmers Guide as follows:

  • A write of fewer than PIPE_BUF bytes is atomic; the data will not be interleaved with data from other processes writing to the same pipe. A write of more than PIPE_BUF may have data interleaved in arbitrary ways.

Roughly: POSIX defines a constant named PIPE_BUF, required to be at least 512 bytes. A write smaller than PIPE_BUF is atomic, meaning it cannot be interrupted, so its data will not get mixed with data written by other processes. For a write larger than PIPE_BUF, the operating system makes no such guarantee.

On Linux, this value is a bit different:

  • POSIX.1 says that write(2)s of less than PIPE_BUF bytes must be atomic: the output data is written to the pipe as a contiguous sequence. Writes of more than PIPE_BUF bytes may be nonatomic: the kernel may interleave the data with data written by other processes. POSIX.1 requires PIPE_BUF to be at least 512 bytes. (On Linux, PIPE_BUF is 4096 bytes.)

In other words, on Linux any write larger than 4 KB is not guaranteed to be atomic, and the data may end up interleaved.
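You can check the limit on your own machine; on Unix, Python exposes it both as select.PIPE_BUF and through fpathconf (the pipe below is just a scratch descriptor to query against):

import os
import select

print(select.PIPE_BUF)                  # 4096 on Linux; POSIX minimum is 512

r, w = os.pipe()
print(os.fpathconf(r, "PC_PIPE_BUF"))   # the same limit, queried per descriptor
os.close(r)
os.close(w)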

Once the data is interleaved, the resulting lines no longer follow a fixed format, the parsing side cannot parse them, and those log entries are ultimately lost.

Let's reproduce this; first, the test code.
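What follows is a minimal sketch rather than the exact test harness (the file name, record size, and process count are arbitrary, and whether interleaving actually shows up depends on the OS, libc buffering, and filesystem): each worker process appends records far larger than PIPE_BUF through its own FileHandler, every record being one digit repeated, so any resulting line that mixes characters must have been interleaved by a concurrent write.

import logging
import multiprocessing
import os

LOG_FILE = "repro.log"       # illustrative file name
RECORD_SIZE = 64 * 1024      # far above PIPE_BUF (4096 on Linux)

def worker(tag):
    logger = logging.getLogger(f"repro-{tag}")
    logger.addHandler(logging.FileHandler(LOG_FILE))
    logger.setLevel(logging.INFO)
    for _ in range(200):
        # A clean record is RECORD_SIZE copies of one digit on one line
        logger.info(str(tag) * RECORD_SIZE)

if __name__ == "__main__":
    if os.path.exists(LOG_FILE):
        os.remove(LOG_FILE)
    procs = [multiprocessing.Process(target=worker, args=(i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    # Any line containing more than one distinct character was interleaved
    # with a write from another process
    interleaved = sum(
        1 for line in open(LOG_FILE) if len(set(line.rstrip("\n"))) > 1
    )
    print("interleaved lines:", interleaved)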

Finally#

I had never thought about this behavior before; today opened a new door for me. Finally, thanks to @依雲 for the pointer = =. Without that reminder, it would never have occurred to me that even in O_APPEND mode the data isn't guaranteed to be safe.

Reference#

Two reference sources were consulted for this post, listed below:

1. O'Reilly: POSIX Programmers Guide

2. Linux man page: pipe(7)
