A Detailed Look at How Template Engines Work

Original link: How a template engine works

Original author: Shipeng Feng
Translation from: Gold Mining Translation Project
Translator: Zheaoli
Proofreaders: Kulbear, hpoenixf

I have been using various template engines for a long time, and now I finally have time to study how template engines actually work.

Introduction

Simply put, a template engine is a tool for programming tasks that involve producing large amounts of text. The most common use is generating HTML in a web application. In Python there are quite a few options to choose from, such as Jinja2 or Mako. Here we will use the template engine in tornado to explain how template engines work. Tornado's built-in template engine is relatively simple, which makes it easy to dig into its internals.

Before we study how the template engine is implemented, let’s look at a simple example of its API.

    from tornado import template

    PAGE_HTML = """
    <html>
      Hello, {{ username }}!
      <ul>
        {% for job in job_list %}
          <li>{{ job }}</li>
        {% end %}
      </ul>
    </html>
  
    """
    t = template.Template(PAGE_HTML)
    print t.generate(username='John', job_list=['engineer'])

In this code, both the username and the job list are filled in dynamically. You can install tornado and run the code to see the result.

Detailed Explanation

If you look closely at PAGE_HTML, you will see that the template string consists of two kinds of content: fixed text, and parts that are generated dynamically. The dynamic parts are marked with special symbols. Throughout its workflow, the template engine must output the fixed text unchanged while replacing the marked parts with the correct results.

The simplest way to think of a template engine is as a single Python function like this:

    def template_engine(template_string, **context):
        # process here
        return result_string
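As a rough illustration of that interface, here is a minimal sketch that supports only {{ expression }} substitution. It assumes a regex-based approach, which is not how tornado works, and it ignores control blocks such as {% for %}. The pattern name _EXPR is made up for this sketch.

```python
import re

# Hypothetical helper pattern: matches {{ expression }} on a single line
_EXPR = re.compile(r"\{\{(.+?)\}\}")


def template_engine(template_string, **context):
    """Replace every {{ expression }} with the string form of its value."""
    def substitute(match):
        # eval is for illustration only; never use it on untrusted templates
        return str(eval(match.group(1).strip(), {}, context))
    return _EXPR.sub(substitute, template_string)


result = template_engine(
    "Hello, {{ username }}! You have {{ len(jobs) }} job(s).",
    username="John", jobs=["engineer"])
print(result)
```

A single pass like this cannot handle loops or conditionals, which is one reason real engines split the work into the stages described next.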

In the entire working process, the template engine will operate on our string in the following two stages:

Parsing
Rendering

In the parsing stage, we parse the template string and turn it into something that can be rendered. Think of the template string as source code: the parsing tool may then be either an interpreter or a compiler for that language. If the parser is an interpreter, parsing produces a special data structure, and the renderer traverses that structure to produce the output. The parser in the Django template engine, for example, is interpreter-based. Alternatively, the parser may generate executable code, and the renderer then simply executes that code to produce the result. The template engines in Jinja2, Mako, and Tornado all use a compiler as the parsing tool.
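To make the interpreter-style approach concrete, here is a small sketch under simplifying assumptions: only {{ name }} lookups are supported, and the node tuples and the function names parse and render are invented for this example rather than taken from any real engine.

```python
import re


def parse(template_string):
    """Parsing stage: split the template into ("text", s) and ("expr", s) nodes."""
    nodes = []
    # With a capturing group, re.split keeps the captured parts at odd indexes
    for i, part in enumerate(re.split(r"\{\{(.+?)\}\}", template_string)):
        if i % 2:
            nodes.append(("expr", part.strip()))
        elif part:
            nodes.append(("text", part))
    return nodes


def render(nodes, context):
    """Rendering stage: walk the node list and build the output."""
    output = []
    for kind, value in nodes:
        if kind == "text":
            output.append(value)
        else:
            output.append(str(context[value]))  # plain variable lookup only
    return "".join(output)


nodes = parse("Hello, {{ username }}!")
print(nodes)
print(render(nodes, {"username": "John"}))
```

A compiler-style engine would instead turn the same nodes into Python source code and execute it, which is the approach the rest of this article follows.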

Compilation

As mentioned above, we need to parse the template string, and the template parser in tornado compiles it into executable Python code. The parsing tool itself is a single Python function that generates Python source code:

    def parse_template(template_string):
      # compilation
      return python_source_code

Before we analyze the code of parse_template, let’s first look at an example of a template string:

    <html>
      Hello, {{ username }}!
      <ul>
        {% for job in jobs %}
          <li>{{ job.name }}</li>
        {% end %}
      </ul>
    </html>

The parse_template function in the template engine compiles the string above into Python source code. In its simplest form, the generated code looks something like this:

    def _execute():
        _buffer = []
        _buffer.append('\n<html>\n  Hello, ')
        _tmp = username
        _buffer.append(str(_tmp))
        _buffer.append('!\n  <ul>\n    ')
        for job in jobs:
            _buffer.append('\n      <li>')
            _tmp = job.name
            _buffer.append(str(_tmp))
            _buffer.append('</li>\n    ')
        _buffer.append('\n  </ul>\n</html>\n')
        return ''.join(_buffer)

The template is now handled by the _execute function. This function can use every variable available in the global namespace. It builds a list of strings and joins them into a single string before returning. Looking up a local name is faster than looking up a global one, and other optimizations can be applied to the generated code as well, for example:

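    # before: the append attribute is looked up on every call
    _buffer.append('hello')

    # after: bind the method to a local name once
    _append_buffer = _buffer.append
    # faster for repeated use
    _append_buffer('hello')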
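If you want to check this on your own interpreter, a sketch like the following compares the two variants with timeit. The function names are made up for this example, and the size of the gap depends on the Python version, so it may turn out to be small.

```python
import timeit


def with_attribute_lookup(n=1000):
    _buffer = []
    for _ in range(n):
        _buffer.append('hello')      # looks up .append on every iteration
    return _buffer


def with_local_alias(n=1000):
    _buffer = []
    _append_buffer = _buffer.append  # bound method resolved once
    for _ in range(n):
        _append_buffer('hello')
    return _buffer


# Both variants build the same list; only the lookup cost differs
assert with_attribute_lookup() == with_local_alias()
print(timeit.timeit(with_attribute_lookup, number=200))
print(timeit.timeit(with_local_alias, number=200))
```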

The expressions in {{ ... }} are extracted and appended to the string list. In the tornado template module, there are no restrictions on the expressions you can write in {{ ... }}, and if and for blocks are converted directly into Python code.

Let’s take a look at the concrete implementation

Let’s take a look at the concrete implementation of the template engine. The core variables are declared in the Template class. When we create a Template object, the template string is compiled, and we can then render it from the compiled result. A template string only needs to be compiled once, so the compilation result can be cached. Below is a simplified version of the constructor of the Template class:

    class Template(object):
        def __init__(self, template_string):
            self.code = parse_template(template_string)
            self.compiled = compile(self.code, '<string>', 'exec')

The compile function in the code above compiles the source string into a code object, which we can later run with exec. Now let’s look at how parse_template is implemented. First, we need to convert the template string into a list of independent nodes, in preparation for generating Python code. This requires a _parse function, which we will set aside for now and come back to later. Before that, we need some helpers for reading the template. Let’s look at the _TemplateReader class, which reads data from our template string:

    class _TemplateReader(object):
      def __init__(self, text):
          self.text = text
          self.pos = 0

      def find(self, needle, start=0, end=None):
          pos = self.pos
          start += pos
          if end is None:
              index = self.text.find(needle, start)
          else:
              end += pos
              index = self.text.find(needle, start, end)
          if index != -1:
              index -= pos
          return index

      def consume(self, count=None):
          if count is None:
              count = len(self.text) - self.pos
          newpos = self.pos + count
          s = self.text[self.pos:newpos]
          self.pos = newpos
          return s

      def remaining(self):
          return len(self.text) - self.pos

      def __len__(self):
          return self.remaining()

      def __getitem__(self, key):
          if key < 0:
              return self.text[key]
          else:
              return self.text[self.pos + key]

      def __str__(self):
          return self.text[self.pos:]
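The key point of the reader is that find returns offsets relative to the current position, and consume advances that position. The sketch below uses a condensed stand-in called TinyReader (a hypothetical name, with only find and consume) so that it runs on its own.

```python
class TinyReader(object):
    """Condensed stand-in for _TemplateReader (hypothetical name)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def find(self, needle, start=0):
        # Search from the current position; report the offset relative to it
        index = self.text.find(needle, self.pos + start)
        return index if index == -1 else index - self.pos

    def consume(self, count=None):
        # Return the next `count` characters (or the rest) and advance
        if count is None:
            count = len(self.text) - self.pos
        s = self.text[self.pos:self.pos + count]
        self.pos += count
        return s


reader = TinyReader("Hello, {{ username }}!")
offset = reader.find("{")        # distance from the current position
text = reader.consume(offset)    # the static text before the marker
brace = reader.consume(2)        # the opening marker itself
end = reader.find("}}")          # relative to the new position
expression = reader.consume(end).strip()
reader.consume(2)                # skip the closing marker
rest = reader.consume()          # everything that is left
print(repr(text), repr(brace), repr(expression), repr(rest))
```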

To generate Python code, we use the _CodeWriter class, which writes lines of code, manages indentation, and also acts as a Python context manager:

    import cStringIO  # Python 2 module; io.StringIO is the Python 3 counterpart


    class _CodeWriter(object):
      def __init__(self):
          self.buffer = cStringIO.StringIO()
          self._indent = 0

      def indent(self):
          return self

      def indent_size(self):
          return self._indent

      def __enter__(self):
          self._indent += 1
          return self

      def __exit__(self, *args):
          self._indent -= 1

      def write_line(self, line, indent=None):
          if indent is None:
              indent = self._indent
          for i in xrange(indent):
              self.buffer.write("    ")
          # Write the line itself, followed by a newline, into the buffer
          self.buffer.write(line + "\n")

      def __str__(self):
          return self.buffer.getvalue()
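The following sketch shows how the context manager drives indentation. It is a Python 3 adaptation under the name CodeWriter, with io.StringIO standing in for cStringIO, and it is not tornado's actual class.

```python
import io


class CodeWriter(object):
    """Python 3 sketch of _CodeWriter; io.StringIO replaces cStringIO."""

    def __init__(self):
        self.buffer = io.StringIO()
        self._indent = 0

    def indent(self):
        return self            # the writer is its own context manager

    def __enter__(self):
        self._indent += 1
        return self

    def __exit__(self, *args):
        self._indent -= 1

    def write_line(self, line):
        self.buffer.write("    " * self._indent + line + "\n")

    def __str__(self):
        return self.buffer.getvalue()


writer = CodeWriter()
writer.write_line("def _execute():")
with writer.indent():
    writer.write_line("for job in jobs:")
    with writer.indent():
        writer.write_line("print(job)")
    writer.write_line("return None")
print(writer)
```

Each with block raises the indentation level by one and restores it on exit, so the nesting of the with statements mirrors the nesting of the generated code.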

In the parse_template function, we first need to create a _TemplateReader object:

    def parse_template(template_string):
        reader = _TemplateReader(template_string)
        file_node = _File(_parse(reader))
        writer = _CodeWriter()
        file_node.generate(writer)
        return str(writer)

Then we pass the _TemplateReader object to the _parse function, which produces a list of nodes. All of these nodes are children of the template file node. Next, we create a _CodeWriter object, and the file_node object writes the generated Python code into it. Finally, we return the generated Python source code as a string. Each _Node subclass generates its Python source code in its own way; we will set that aside for now and come back to it later. Now let’s return to the _parse function mentioned earlier:

    def _parse(reader, in_block=None):
      body = _ChunkList([])
      while True:
          # Find next template directive
          curly = 0
          while True:
              curly = reader.find("{", curly)
              if curly == -1 or curly + 1 == reader.remaining():
                  # EOF
                  if in_block:
                      raise ParseError("Missing {%% end %%} block for %s" %
                                       in_block)
                  body.chunks.append(_Text(reader.consume()))
                  return body
              # If the first curly brace is not the start of a special token,
              # start searching from the character after it
              if reader[curly + 1] not in ("{", "%"):
                  curly += 1
                  continue
              # When there are more than 2 curlies in a row, use the
              # innermost ones.  This is useful when generating languages
              # like latex where curlies are also meaningful
              if (curly + 2 < reader.remaining() and
                  reader[curly + 1] == '{' and reader[curly + 2] == '{'):
                  curly += 1
                  continue
              break

We loop over the template looking for the special markers. When we reach the end of the file, we append the remaining text as a text node and return.

    # Append any text before the special token
    if curly > 0:
      body.chunks.append(_Text(reader.consume(curly)))

Before we process the special marker itself, we first add the static text that precedes it to the node list.

    start_brace = reader.consume(2)

When we encounter {{ or {%, we start processing the corresponding expression or block:

    # Expression
    if start_brace == "{{":
        end = reader.find("}}")
        if end == -1 or reader.find("\n", 0, end) != -1:
            raise ParseError("Missing end expression }}")
        contents = reader.consume(end).strip()
        reader.consume(2)
        if not contents:
            raise ParseError("Empty expression")
        body.chunks.append(_Expression(contents))
        continue

When we encounter {{, an expression follows. We only need to extract the expression and add it to the node list as an _Expression node.

      # Block
      assert start_brace == "{%", start_brace
      end = reader.find("%}")
      if end == -1 or reader.find("\n", 0, end) != -1:
          raise ParseError("Missing end block %}")
      contents = reader.consume(end).strip()
      reader.consume(2)
      if not contents:
          raise ParseError("Empty block tag ({% %})")
      operator, space, suffix = contents.partition(" ")
      # End tag
      if operator == "end":
          if not in_block:
              raise ParseError("Extra {% end %} block")
          return body
      elif operator in ("try", "if", "for", "while"):
          # parse inner body recursively
          block_body = _parse(reader, operator)
          block = _ControlBlock(contents, block_body)
          body.chunks.append(block)
          continue
      else:
          raise ParseError("unknown operator: %r" % operator)

When we encounter a block in the template, we parse its body recursively and add the result to the node list as a _ControlBlock node. When we encounter {% end %}, the block is finished, and we return from the current call.
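The recursive structure is easier to see in a reduced form. The sketch below swaps the character-by-character scan for a regex tokenizer, which is an assumption of this example and not tornado's technique. It also uses plain tuples for nodes and does not validate operators.

```python
import re

# Hypothetical tokenizer: splits out {{ ... }} and {% ... %} markers
TOKEN = re.compile(r"(\{\{.*?\}\}|\{%.*?%\})")


def parse(tokens, in_block=None):
    """Consume tokens from an iterator and return a nested list of nodes."""
    body = []
    for token in tokens:
        if token.startswith("{{"):
            body.append(("expr", token[2:-2].strip()))
        elif token.startswith("{%"):
            contents = token[2:-2].strip()
            if contents == "end":
                if not in_block:
                    raise ValueError("Extra {% end %} block")
                return body
            # Recurse: the nested call keeps reading from the same iterator
            body.append(("block", contents, parse(tokens, contents)))
        elif token:
            body.append(("text", token))
    if in_block:
        raise ValueError("Missing {%% end %%} block for %s" % in_block)
    return body


source = "<ul>{% for job in jobs %}<li>{{ job }}</li>{% end %}</ul>"
tree = parse(iter(TOKEN.split(source)))
print(tree)
```

Because the outer and nested calls share one iterator, the outer loop resumes right after the {% end %} that closed the inner block.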

Now, let’s look at the _Node class mentioned earlier. Don’t worry, it’s actually quite simple:

    class _Node(object):
      def generate(self, writer):
          raise NotImplementedError()


    class _ChunkList(_Node):
      def __init__(self, chunks):
          self.chunks = chunks

      def generate(self, writer):
          for chunk in self.chunks:
              chunk.generate(writer)

_ChunkList is just a list of nodes.

    class _File(_Node):
      def __init__(self, body):
          self.body = body

      def generate(self, writer):
          writer.write_line("def _execute():")
          with writer.indent():
              writer.write_line("_buffer = []")
              self.body.generate(writer)
              writer.write_line("return ''.join(_buffer)")

_File writes the _execute function into the _CodeWriter.

    class _Expression(_Node):
        def __init__(self, expression):
            self.expression = expression

        def generate(self, writer):
            writer.write_line("_tmp = %s" % self.expression)
            writer.write_line("_buffer.append(str(_tmp))")


    class _Text(_Node):
        def __init__(self, value):
            self.value = value

        def generate(self, writer):
            value = self.value
            if value:
                writer.write_line('_buffer.append(%r)' % value)

The implementations of _Text and _Expression nodes are also very simple; they just add the data we get from the template into the list.

    class _ControlBlock(_Node):
        def __init__(self, statement, body=None):
            self.statement = statement
            self.body = body

        def generate(self, writer):
            writer.write_line("%s:" % self.statement)
            with writer.indent():
                self.body.generate(writer)

_ControlBlock writes the block statement followed by its indented body, so that the output follows Python syntax.
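To see what the node classes produce together, here is a condensed sketch that uses tuples instead of classes and a list of lines instead of a writer object. The names generate and generate_file, and the trailing pass that keeps an empty block valid, are choices made for this example.

```python
def generate(node, lines, indent):
    """Append Python source lines for one (hypothetical) tuple-based node."""
    pad = "    " * indent
    kind = node[0]
    if kind == "text":
        lines.append("%s_buffer.append(%r)" % (pad, node[1]))
    elif kind == "expr":
        lines.append("%s_tmp = %s" % (pad, node[1]))
        lines.append("%s_buffer.append(str(_tmp))" % pad)
    elif kind == "block":
        lines.append("%s%s:" % (pad, node[1]))   # e.g. "for job in jobs:"
        for child in node[2]:
            generate(child, lines, indent + 1)
        lines.append("%s    pass" % pad)         # keeps an empty body valid


def generate_file(body):
    """Wrap the generated lines in an _execute function, like _File does."""
    lines = ["def _execute():", "    _buffer = []"]
    for node in body:
        generate(node, lines, 1)
    lines.append("    return ''.join(_buffer)")
    return "\n".join(lines) + "\n"


tree = [
    ("text", "<ul>"),
    ("block", "for job in jobs",
     [("text", "<li>"), ("expr", "job"), ("text", "</li>")]),
    ("text", "</ul>"),
]
source = generate_file(tree)
print(source)
```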

Now let’s take a look at the rendering part of the template engine mentioned earlier. We call the generate method of the Template object to run the Python code compiled from the template.

    def generate(self, **kwargs):
        namespace = {}
        namespace.update(kwargs)
        exec self.compiled in namespace
        execute = namespace["_execute"]
        return execute()

exec runs the compiled code object in the given global namespace, which defines the _execute function there. We then fetch _execute from that namespace and call it.
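The same render step can be tried on its own in Python 3, where exec is a function rather than a statement. In this sketch the generated source is written out by hand, and generate is a standalone function instead of a method, purely for illustration.

```python
# Hand-written stand-in for the source that parse_template would return
source = (
    "def _execute():\n"
    "    _buffer = []\n"
    "    _buffer.append('Hello, ')\n"
    "    _tmp = username\n"
    "    _buffer.append(str(_tmp))\n"
    "    _buffer.append('!')\n"
    "    return ''.join(_buffer)\n"
)

# Compile once; the code object can be reused for every render
compiled = compile(source, "<string>", "exec")


def generate(**kwargs):
    namespace = {}
    namespace.update(kwargs)
    # Python 3 spelling of the Python 2 statement `exec compiled in namespace`
    exec(compiled, namespace)
    # Running the code defined _execute inside namespace; that dict is also
    # the function's globals, so `username` is resolved there
    return namespace["_execute"]()


print(generate(username="John"))
print(generate(username="Jane"))
```

Each call gets a fresh namespace, so values from one render do not leak into the next, while the compiled code object is shared.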

Conclusion

With the steps above, we can compile a template and render the corresponding result. The tornado template engine has many more features that we have not discussed, but we now understand its basic working mechanism. From here you can study the parts that interest you, such as:

Template inheritance
Template inclusion
Other logical control statements, such as else, elif, try, etc.
Whitespace control
Special character escaping
More template directives not mentioned (Translator's note: Please refer to the tornado official documentation for more details.)