Manjusaka

Manjusaka

Implementing the simplest object model with Python

A Simple Object Model#

Carl Friedrich Bolz is a researcher at King's College London, deeply engrossed in the implementation and optimization of dynamic languages. He is one of the core developers of PyPy/RPython, and at the same time, he contributes code to languages such as Prolog, Racket, Smalltalk, PHP, and Ruby. This is his Twitter @cfbolz.

Introduction#

Object-oriented programming is a widely used programming paradigm that is supported by many modern programming languages. Although most languages provide similar object-oriented mechanisms for programmers, a closer examination reveals many differences among them. A commonality among most languages is that they have object handling and inheritance mechanisms. However, not every language perfectly supports classes. For example, languages like Self or JavaScript, which use prototype inheritance, do not have the concept of classes; their inheritance behavior occurs between objects.

Understanding the object models of different languages is a fascinating endeavor. This allows us to appreciate the similarities among different programming languages. It must be said that such experiences can help us leverage our existing knowledge when learning new languages, enabling us to grasp them more quickly.

This article will guide you in implementing a simple object model. First, we will create a simple class and its instances, allowing access to some methods through these instances. This is the object-oriented model adopted by early object-oriented languages such as Simula 67 and Smalltalk. Then we will gradually expand this model, where you will see how the next two steps will showcase different language design philosophies, and the final step will focus on optimizing the performance of our object model. Ultimately, the model we arrive at is not one adopted by any real existing language; however, if one must say, you can view our final model as a low-spec version of the Python object model.

The object model presented in this article is implemented in Python. The code runs perfectly on both Python 2.7 and Python 3.4. To help everyone better understand the design philosophy behind the model, this article also provides unit tests for the object model we designed, which can be run using py.test or nose.

To be honest, using Python as the implementation language for the object model is not a good choice. Generally speaking, the virtual machines of languages are implemented in lower-level languages like C/C++, and many details need to be carefully considered to ensure execution efficiency. However, a simple language like Python allows us to focus our main efforts on different behavioral expressions rather than getting bogged down in implementation details.

Basic Method Model#

We will start by explaining our object model with a very simple implementation from Smalltalk. Smalltalk is an object-oriented language developed in the 1970s by a team led by Alan Kay at Xerox PARC. It popularized object-oriented programming, and many features from that time can still be seen in today's programming languages. One of the core design principles of Smalltalk is: "Everything is an object." The most well-known descendant of Smalltalk is Ruby, a language that retains the Smalltalk object model while using syntax similar to C.

In this section, the object model we implement will include classes, instances, property access and modification, method invocation, and will allow for the existence of subclasses. Before we begin, it should be noted that the classes here are ordinary classes with their own properties and methods.

A very good practice is to write test code first to constrain the behavior of the specific implementation. The test code written in this article consists of two parts. The first part consists of conventional Python code, which may use classes and other advanced features in Python. The second part will use our self-defined object model to replace Python's classes.

When writing test code, we need to manually maintain the mapping between conventional Python classes and our self-defined classes. For example, in our custom class, we will use obj.read_attr("attribute") as a substitute for obj.attribute in Python. In real life, such mapping relationships would be implemented by the language's compiler/interpreter.

In this article, we further simplify the model so that the code for implementing the object model looks similar to the code for writing methods in objects. In reality, this is generally impossible, as these two are usually implemented by different languages.

First, let's write a piece of code to test reading and modifying object fields:


def test_read_write_field():
    # Python code
    class A(object):
            pass
    obj = A()
    obj.a = 1
    assert obj.a == 1
    obj.b = 5
    assert obj.a == 1
    assert obj.b == 5
    obj.a = 2
    assert obj.a == 2
    assert obj.b == 5

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("a", 1)
    assert obj.read_attr("a") == 1
    obj.write_attr("b", 5)
    assert obj.read_attr("a") == 1
    assert obj.read_attr("b") == 5
    obj.write_attr("a", 2)
    assert obj.read_attr("a") == 2
    assert obj.read_attr("b") == 5

The test code above includes three things we must implement. The Class and Instance classes represent the classes and instances in our object model, respectively. There are also two special instances of classes: OBJECT and TYPE. OBJECT corresponds to the object class, which serves as the starting point for Python's inheritance system (Note: In Python 2.x, there are actually two class systems, one referred to as new style class and the other as old style class, with object being the base class of new style class). TYPE corresponds to the type in Python's type system.

To provide general operation support for instances of the Class and Instance classes, both classes will inherit from a base class called Base, which provides a series of methods:

class Base(object):
    """ The base class that all of the object model classes inherit from. """
    def __init__(self, cls, fields):
        """ Every object has a class. """
        self.cls = cls
        self._fields = fields
    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        return self._read_dict(fieldname)
    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        self._write_dict(fieldname, value)
    def isinstance(self, cls):
        """ return True if the object is an instance of class cls """
        return self.cls.issubclass(cls)
    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.cls._read_from_class(methname)
        return meth(self, *args)
    def _read_dict(self, fieldname):
        """ read an field 'fieldname' out of the object's dict """
        return self._fields.get(fieldname, MISSING)
    def _write_dict(self, fieldname, value):
        """ write a field 'fieldname' into the object's dict """
        self._fields[fieldname] = value

MISSING = object()

Base implements the storage of object classes and uses a dictionary to store the values of object fields. Now, we need to implement the Class and Instance classes. The constructor of Instance will complete the instantiation of the class and initialize fields and dict. In other words, Instance is just a subclass of Base and does not add any extra methods.

The constructor of Class will accept the class name, base class, class dictionary, and metaclass as parameters. For classes, these variables will be passed to the constructor by the user during class initialization. The constructor will also obtain default values for variables from its base class. However, we will discuss this point in the next chapter.

class Instance(Base):
    """Instance of a user-defined class. """
    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, {})

class Class(Base):
    """ A User-defined class. """
    def __init__(self, name, base_class, fields, metaclass):
        Base.__init__(self, metaclass, fields)
        self.name = name
        self.base_class = base_class

You may have noticed that classes are still a special type of object; they indirectly inherit from Base. Therefore, classes are also special instances of a special class, which is called a metaclass.

Now, we can successfully pass our first set of tests. However, we have not yet defined instances of Type and OBJECT. For these, we will not build according to Smalltalk's object model, as it is too complex for us. Instead, we will adopt the type system from ObjVlisp1, from which Python's type system has absorbed many elements.

In ObjVlisp's object model, OBJECT and TYPE are intertwined. OBJECT is the parent class of all classes, meaning OBJECT has no parent class. TYPE is a subclass of OBJECT. Generally, every class is an instance of TYPE. In specific cases, both TYPE and OBJECT are instances of TYPE. However, programmers can derive a class from TYPE to serve as a metaclass:

# set up the base hierarchy as in Python (the ObjVLisp model)
# the ultimate base class is OBJECT
OBJECT = Class(name="object", base_class=None, fields={}, metaclass=None)
# TYPE is a subclass of OBJECT
TYPE = Class(name="type", base_class=OBJECT, fields={}, metaclass=None)
# TYPE is an instance of itself
TYPE.cls = TYPE
# OBJECT is an instance of TYPE
OBJECT.cls = TYPE

To write a new metaclass, we need to derive from TYPE. However, in this article, we will not do this; we will only use TYPE as the metaclass for each of our classes.

Figure 14.1 - Inheritance

Now, the first set of tests has been completely passed. Let's take a look at the second set of tests, where we will test whether object property reading and writing works correctly. This code is also straightforward to write.

def test_read_write_field_class():
    # classes are objects too
    # Python code
    class A(object):
        pass
    A.a = 1
    assert A.a == 1
    A.a = 6
    assert A.a == 6

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={"a": 1}, metaclass=TYPE)
    assert A.read_attr("a") == 1
    A.write_attr("a", 5)
    assert A.read_attr("a") == 5

isinstance Check#

So far, we have not utilized the characteristic of objects having classes. The next test code will automatically implement isinstance.

def test_isinstance():
    # Python code
    class A(object):
        pass
    class B(A):
        pass
    b = B()
    assert isinstance(b, B)
    assert isinstance(b, A)
    assert isinstance(b, object)
    assert not isinstance(b, type)

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    b = Instance(B)
    assert b.isinstance(B)
    assert b.isinstance(A)
    assert b.isinstance(OBJECT)
    assert not b.isinstance(TYPE)

We can determine whether the obj object is an instance of certain classes cls by checking if cls is the class of obj or one of its superclasses. To check if a class is a superclass of another class, we can see if the class exists in the superclass chain. This chain, which includes both the superclass and the class itself, is called the method resolution order (MRO). It can be easily computed recursively:

 class Class(Base):
     ...

     def method_resolution_order(self):
         """ compute the method resolution order of the class """
         if self.base_class is None:
             return [self]
         else:
             return [self] + self.base_class.method_resolution_order()

     def issubclass(self, cls):
         """ is self a subclass of cls? """
         return cls in self.method_resolution_order()

After modifying the code, the tests can pass successfully.

Method Invocation#

The object model established earlier is still missing the important feature of method invocation. In this chapter, we will establish a simple inheritance model.

def test_callmethod_simple():
    # Python code
    class A(object):
        def f(self):
            return self.x + 1
    obj = A()
    obj.x = 1
    assert obj.f() == 2

    class B(A):
        pass
    obj = B()
    obj.x = 1
    assert obj.f() == 2 # works on subclass too

    # Object model code
    def f_A(self):
        return self.read_attr("x") + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("f") == 2

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 2)
    assert obj.callmethod("f") == 3

To find the correct implementation for calling object methods, we now begin discussing the method resolution order for class objects. The first method found in the class object dictionary according to the MRO will be called:

class Class(Base):
    ...

    def _read_from_class(self, methname):
        for cls in self.method_resolution_order():
            if methname in cls._fields:
                return cls._fields[methname]
        return MISSING

After completing the implementation of callmethod in the Base class, the tests can pass.

To ensure that function parameters are passed correctly and to ensure that our previous code can achieve method overloading, we can write the following test code, which passes the tests perfectly:

def test_callmethod_subclassing_and_arguments():
    # Python code
    class A(object):
        def g(self, arg):
            return self.x + arg
    obj = A()
    obj.x = 1
    assert obj.g(4) == 5

    class B(A):
        def g(self, arg):
            return self.x + arg * 2
    obj = B()
    obj.x = 4
    assert obj.g(4) == 12

    # Object model code
    def g_A(self, arg):
        return self.read_attr("x") + arg
    A = Class(name="A", base_class=OBJECT, fields={"g": g_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("g", 4) == 5

    def g_B(self, arg):
        return self.read_attr("x") + arg * 2
    B = Class(name="B", base_class=A, fields={"g": g_B}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 4)
    assert obj.callmethod("g", 4) == 12

Basic Property Model#

Now that the simplest version of the object model is operational, we need to continue improving it. This section will introduce the differences between the basic method model and the basic property model. This is also the core difference among Smalltalk, Ruby, JavaScript, Python, and Lua.

The basic method model will call methods in the most primitive way:

result = obj.f(arg1, arg2)

The basic property model will divide the calling process into two steps: finding the property and returning the execution result:

    method = obj.f
    result = method(arg1, arg2)

You can experience the differences mentioned above in the following tests:

def test_bound_method():
    # Python code
    class A(object):
        def f(self, a):
            return self.x + a + 1
    obj = A()
    obj.x = 2
    m = obj.f
    assert m(4) == 7

    class B(A):
        pass
    obj = B()
    obj.x = 1
    m = obj.f
    assert m(10) == 12 # works on subclass too

    # Object model code
    def f_A(self, a):
        return self.read_attr("x") + a + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 2)
    m = obj.read_attr("f")
    assert m(4) == 7

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 1)
    m = obj.read_attr("f")
    assert m(10) == 12

We can set up property calls in the same way as we did for method calls, but some changes occur here. First, we will look for the method name corresponding to the function name in the object. This lookup process results in what is called a bound method, which is a special object that binds the method to a specific object. This bound method will then be called in the subsequent operations.

To implement this operation, we need to modify the implementation of Base.read_attr. If the corresponding property is not found in the instance dictionary, we need to look it up in the class dictionary. If the property is found in the class dictionary, we will perform the method binding operation. We can easily simulate the binding method using a closure. Besides modifying the implementation of Base.read_attr, we can also modify the Base.callmethod method to ensure our code passes the tests.

class Base(object):
    ...
    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        raise AttributeError(fieldname)

    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.read_attr(methname)
        return meth(*args)

def _is_bindable(meth):
    return callable(meth)

def _make_boundmethod(meth, self):
    def bound(*args):
        return meth(self, *args)
    return bound

The rest of the code does not need to be modified.

Metaclass Protocol#

In addition to regular class methods, many dynamic languages also support special methods. These methods are called by the object system rather than through regular calls. In Python, you can see these methods' names start and end with two underscores, such as __init__. Special methods can be used to overload some regular operations and provide custom functionality. Therefore, their existence can inform the object model how to automatically handle different tasks. For related special methods in Python, you can refer to this documentation.

The concept of the metaclass protocol was introduced by Smalltalk and is widely used in object models of general Lisp languages like CLOS. This concept includes a collection of special methods (Note: The reference to coined3 here is unclear; please refer to the proofreader for assistance).

In this chapter, we will add three metacall operations to our object model. They will be used to provide finer control over our operations of reading and modifying objects. The first two methods we need to add are __getattr__ and __setattr__, which look quite similar to the method names of functions with the same functionality in Python.

Custom Property Read/Write Operations#

The __getattr__ method will be called when a property cannot be found through regular means, in other words, when the corresponding property cannot be found in the instance dictionary, class dictionary, superclass dictionary, etc. We will pass the name of the property being searched for as a parameter to this method. In early Smalltalk, this method was called doesNotUnderstand:.

Things may change a bit with __setattr__. First, we need to clarify that setting a property usually means we need to create it, and at this time, the __setattr__ method is typically triggered when setting a property. To ensure the existence of __setattr__, we need to implement the __setattr__ method in the OBJECT object. This basic implementation completes the operation of writing properties to the corresponding dictionary. This allows users to delegate their defined __setattr__ to the OBJECT.__setattr__ method.

Here are the test cases for these two special methods:

def test_getattr():
    # Python code
    class A(object):
        def __getattr__(self, name):
            if name == "fahrenheit":
                return self.celsius * 9. / 5. + 32
            raise AttributeError(name)

        def __setattr__(self, name, value):
            if name == "fahrenheit":
                self.celsius = (value - 32) * 5. / 9.
            else:
                # call the base implementation
                object.__setattr__(self, name, value)
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86 # test __getattr__
    obj.celsius = 40
    assert obj.fahrenheit == 104

    obj.fahrenheit = 86 # test __setattr__
    assert obj.celsius == 30
    assert obj.fahrenheit == 86

    # Object model code
    def __getattr__(self, name):
        if name == "fahrenheit":
            return self.read_attr("celsius") * 9. / 5. + 32
        raise AttributeError(name)
    def __setattr__(self, name, value):
        if name == "fahrenheit":
            self.write_attr("celsius", (value - 32) * 5. / 9.)
        else:
            # call the base implementation
            OBJECT.read_attr("__setattr__")(self, name, value)

    A = Class(name="A", base_class=OBJECT,
              fields={"__getattr__": __getattr__, "__setattr__": __setattr__},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86 # test __getattr__
    obj.write_attr("celsius", 40)
    assert obj.read_attr("fahrenheit") == 104
    obj.write_attr("fahrenheit", 86) # test __setattr__
    assert obj.read_attr("celsius") == 30
    assert obj.read_attr("fahrenheit") == 86

To pass the tests, we need to modify the Base.read_attr and Base.write_attr methods:

class Base(object):
    ...

    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        meth = self.cls._read_from_class("__getattr__")
        if meth is not MISSING:
            return meth(self, fieldname)
        raise AttributeError(fieldname)

    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        meth = self.cls._read_from_class("__setattr__")
        return meth(self, fieldname, value)

The process of obtaining properties becomes a call to the __getattr__ method, passing the field name as a parameter. If the field does not exist, an exception will be raised. Note that __getattr__ can only be called in the class (the same applies to special methods in Python), and care must be taken to avoid recursive calls like self.read_attr("__getattr__"), as this would cause infinite recursion if the __getattr__ method is not defined.

Modification operations for properties will also be executed by the __setattr__ method, just like reading. To ensure that this method can execute normally, OBJECT needs to implement the default behavior of __setattr__, for example:

def OBJECT__setattr__(self, fieldname, value):
    self._write_dict(fieldname, value)
OBJECT = Class("object", None, {"__setattr__": OBJECT__setattr__}, None)

The specific implementation of OBJECT.__setattr__ is similar to the previous implementation of the write_attr method. After completing these modifications, we can successfully pass our tests.

Descriptor Protocol#

In the tests above, we frequently switched between different temperature scales. It must be said that it can be quite tedious to check the property names used during modification operations in __getattr__ and __setattr__. To solve this problem, Python introduced the concept of the descriptor protocol.

We will obtain specific properties from the __getattr__ and __setattr__ methods, and the descriptor protocol will trigger a special method when the property call process ends and returns a result. The descriptor protocol can be seen as a special means of binding classes and methods, and we can use the descriptor protocol to complete the binding of methods to objects. In addition to binding methods, one of the most important use cases for descriptors in Python is staticmethod, classmethod, and property.

In the following text, we will introduce how to use descriptors for object binding. We can achieve this by using the __get__ method, as shown in the test code below:

def test_get():
    # Python code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.celsius * 9. / 5. + 32

    class A(object):
        fahrenheit = FahrenheitGetter()
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86

    # Object model code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.read_attr("celsius") * 9. / 5. + 32

    A = Class(name="A", base_class=OBJECT,
              fields={"fahrenheit": FahrenheitGetter()},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86

The __get__ method will be called by the FahrenheitGetter instance after the property lookup is complete. The parameters passed to __get__ are the instance at the end of the lookup process.

Implementing this functionality is quite simple; we can easily modify the _is_bindable and _make_boundmethod methods:

def _is_bindable(meth):
    return hasattr(meth, "__get__")

def _make_boundmethod(meth, self):
    return meth.__get__(self, None)

Now, this simple modification ensures that we pass the tests. The previous tests regarding method binding can also pass. In Python, after the execution of the __get__ method, a bound method object will be returned.

In practice, the descriptor protocol does appear to be quite complex. It also includes a __set__ method for setting properties. Additionally, the version you see implemented here is somewhat simplified. Note that the previous _make_boundmethod method calls __get__ as an implementation-level operation, rather than using meth.read_attr('__get__'). This is necessary because our object model borrows functions and methods from Python rather than showcasing Python's object model. Further refining the model could effectively address this issue.

Instance Optimization#

The establishment of the object model in the first three parts involved many behavioral changes, while the final part of optimization will not involve behavioral changes. This optimization method is called map, which is widely used in self-booting language virtual machines. It is one of the most important optimization techniques for object models, applied in PyPy and modern JavaScript virtual machines like V8 (in V8, this method is referred to as hidden classes).

This optimization technique is based on the observation that in the object model we have implemented so far, all instances use a complete dictionary to store their properties. Dictionaries are implemented based on hash tables, which can consume a lot of memory. Often, instances of the same class will have the same properties. For example, a class Point may have instances that all contain the same properties x and y.

The Map optimization takes advantage of this fact. It will split each instance's dictionary into two parts. One part stores property names that can be shared among all instances. The other part only stores a reference to the Map generated from the first part and the actual values. The map that stores property names will serve as an index for the values.

We will write some test cases for the above requirements, as follows:

def test_maps():
    # white box test inspecting the implementation
    Point = Class(name="Point", base_class=OBJECT, fields={}, metaclass=TYPE)
    p1 = Instance(Point)
    p1.write_attr("x", 1)
    p1.write_attr("y", 2)
    assert p1.storage == [1, 2]
    assert p1.map.attrs == {"x": 0, "y": 1}

    p2 = Instance(Point)
    p2.write_attr("x", 5)
    p2.write_attr("y", 6)
    assert p1.map is p2.map
    assert p2.storage == [5, 6]

    p1.write_attr("x", -1)
    p1.write_attr("y", -2)
    assert p1.map is p2.map
    assert p1.storage == [-1, -2]

    p3 = Instance(Point)
    p3.write_attr("x", 100)
    p3.write_attr("z", -343)
    assert p3.map is not p1.map
    assert p3.map.attrs == {"x": 0, "z": 1}

Note that the style of the test code here looks somewhat different from our previous test code. Previously, all tests were conducted through the interfaces that had been implemented to test the functionality of the classes. Here, the tests inspect the internal properties of the classes and compare them with preset values. This testing method is known as white-box testing.

The map containing attrs in p1 stores the properties x and y, with values 0 and 1 stored in p1, respectively. Then, when creating the second instance p2, the same properties are added to the same map in the same way. In other words, if different properties are added, the corresponding map will not be shared.

The Map class looks like this:

class Map(object):
    def __init__(self, attrs):
        self.attrs = attrs
        self.next_maps = {}

    def get_index(self, fieldname):
        return self.attrs.get(fieldname, -1)

    def next_map(self, fieldname):
        assert fieldname not in self.attrs
        if fieldname in self.next_maps:
            return self.next_maps[fieldname]
        attrs = self.attrs.copy()
        attrs[fieldname] = len(attrs)
        result = self.next_maps[fieldname] = Map(attrs)
        return result

EMPTY_MAP = Map({})

The Map class has two methods: get_index and next_map. The former is used to look up the index of the corresponding property name in the object's storage space. The latter should be used when new properties are added to the object. In this case, different instances will need to calculate different mappings using next_map. This method will use next_maps to check for existing mappings. Thus, similar instances will use similar Map objects.

image

Figure 14.2 - Map transitions

The implementation of Instance using map is as follows:

class Instance(Base):
    """Instance of a user-defined class. """

    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, None)
        self.map = EMPTY_MAP
        self.storage = []   

    def _read_dict(self, fieldname):
        index = self.map.get_index(fieldname)
        if index == -1:
            return MISSING
        return self.storage[index]

    def _write_dict(self, fieldname, value):
        index = self.map.get_index(fieldname)
        if index != -1:
            self.storage[index] = value
        else:
            new_map = self.map.next_map(fieldname)
            self.storage.append(value)
            self.map = new_map

Now this class will pass None as the field dictionary to the Base class because Instance will construct the storage dictionary in a different way. Thus, it needs to override _read_dict and _write_dict. In practical operations, we will refactor the Base class so that it no longer handles the storage of field dictionaries. However, for now, passing None as a parameter is sufficient.

When a new instance is created, it uses EMPTY_MAP, which contains no objects. After implementing _read_dict, we will look up the index of the property name from the instance's map, and then map it to the corresponding storage table.

Writing data to the field dictionary is divided into two situations. The first is modifying the value of an existing property, which simply modifies the corresponding value in the mapped list. If the corresponding property does not exist, then a map transformation will occur (as shown in the diagram above), and the next_map method will be called, followed by storing the new value in the storage list.

You may wonder what this optimization method actually optimizes. Generally speaking, it can significantly optimize memory usage when there are many instances with similar structures. However, remember that this is not a universal optimization method. Sometimes, when the code is filled with instances of different structures, this method may consume even more space.

This is a common issue in dynamic language optimization. Generally, it is unlikely to find a one-size-fits-all method to optimize code for speed and space efficiency. Therefore, specific situations should be analyzed, and optimization methods should be chosen accordingly.

An interesting point about the Map optimization is that although it only consumes memory, it can also significantly improve program performance when the VM uses JIT technology. To achieve this, JIT technology uses mappings to find the offset of properties in the storage space, completely eliminating the need for dictionary lookups.

Potential Extensions#

Extending our object model and introducing design choices from different languages is quite easy. Here are some possible directions:

  • The simplest is to add more special methods, such as __init__, __getattribute__, and __set__, which are very easy and interesting to implement.

  • Extend the model to support multiple inheritance. To achieve this, each class would need a list of parent classes. Then, Class.method_resolution_order would need to be modified to support method lookups. A simple MRO calculation rule could use depth-first principles, while a more complex one could adopt the C3 algorithm, which better handles issues arising from diamond inheritance structures.

  • A more radical idea is to switch to a prototype model, which would require eliminating the distinction between classes and instances.

Conclusion#

The core of object-oriented programming language design lies in the details of its object model. Writing a simple object model is a straightforward and enjoyable task. You can use this approach to understand how existing languages work and delve into the design principles of object-oriented languages. Writing different object models to validate various design philosophies is a fantastic method. You no longer need to focus on other trivial matters, such as parsing and executing code.

The work of writing object models is also very useful in practice. Besides serving as experimental tools, they can also be used by other languages. There are many examples of this: for instance, the GObject model, written in C, is used in GLib and other Gnome projects, as well as various object models implemented in JavaScript.

References#

  1. P. Cointe, “Metaclasses are first class: The ObjVlisp Model,” SIGPLAN Not, vol. 22, no. 12, pp. 156–162, 1987.↩

  2. It seems that the attribute-based model is conceptually more complex because it needs both method lookup and call. In practice, calling something is defined by looking up and calling a special attribute __call__, so conceptual simplicity is regained. This won't be implemented in this chapter, however.↩

  3. G. Kiczales, J. des Rivieres, and D. G. Bobrow, The Art of the Metaobject Protocol. Cambridge, Mass: The MIT Press, 1991.↩

  4. A. Goldberg, Smalltalk-80: The Language and its Implementation. Addison-Wesley, 1983, page 61.↩

  5. In Python, the second argument is the class where the attribute was found, though we will ignore that here.↩

  6. C. Chambers, D. Ungar, and E. Lee, “An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes,” in OOPSLA, 1989, vol. 24.↩

  7. How that works is beyond the scope of this chapter. I tried to give a reasonably readable account of it in a paper I wrote a few years ago. It uses an object model that is basically a variant of the one in this chapter: C. F. Bolz, A. Cuni, M. Fijałkowski, M. Leuschel, S. Pedroni, and A. Rigo, “Runtime feedback in a meta-tracing JIT for efficient dynamic languages,” in Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, New York, NY, USA, 2011, pp. 9:1–9:8.↩

Loading...
Ownership of this post data is guaranteed by blockchain and smart contracts to the creator alone.