I heard you know Python?

Preface#

Recently, I felt that Python was too "simple," so I boldly declared in front of Master Chuan: "I think Python is the simplest language in the world!" At that moment, a hint of disdain flashed across Chuan's lips (inner monologue: Naive! As a Python developer, I must teach you a lesson, or you will never know how high the sky is and how deep the earth!). So Chuan gave me a set of questions worth a full score of 100, and this article records the pitfalls I ran into while working through them.

1. List Comprehensions#

Description#

The following code will raise an error; why?

class A(object):
    x = 1
    gen = (x for _ in xrange(10))  # Python 3: gen = (x for _ in range(10))


if __name__ == "__main__":
    print(list(A.gen))

Answer#

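The issue is variable scope. In gen = (x for _ in xrange(10)), gen is a generator expression, and its body runs in its own function scope. Names defined in a class body are not visible from nested scopes, so when the generator is consumed, x is not found and you get NameError: name 'x' is not defined. Only the outermost iterable, here xrange(10), is evaluated in the class scope. So what is the solution? One answer is to pass x in through a lambda: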

class A(object):
    x = 1
    gen = (lambda x: (x for _ in xrange(10)))(x)  # Python 3: gen = (lambda x: (x for _ in range(10)))(x)


if __name__ == "__main__":
    print(list(A.gen))

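Or refer to the attribute through the class, whose name is looked up as a global when the generator is consumed: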

class A(object):
    x = 1
    gen = (A.x for _ in xrange(10))  # Python 3: gen = (A.x for _ in range(10))


if __name__ == "__main__":
    print(list(A.gen))

Supplement#

Thanks to the comments from several users, here is an explanation from the official documentation:
The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:

class A:
    a = 42
    b = list(a + i for i in range(10))

Reference links: Python2 Execution-Model, Python3 Execution-Model. This behavior is said to date back to the nested scopes proposal in PEP 227, which I will look into further. Thanks again to commenters @没头脑很着急 @涂伟忠 @Cholerae for their corrections.
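
The rule quoted above can be checked with a small experiment. This is a minimal Python 3 sketch; the class name `Scoped` and its attributes are made up for illustration. It shows that the outermost iterable of a generator expression is evaluated in the class scope, while the element expression is not.

```python
class Scoped:
    n = 3
    x = 1
    # range(n) is the outermost iterable, so it is evaluated right away
    # in the class scope and n resolves normally.
    outer_ok = list(i for i in range(n))
    # Here x is only looked up when the generator runs, from inside its
    # own function scope, where class-level names are invisible.
    lazy = (x for _ in range(n))


try:
    list(Scoped.lazy)
    lookup_error = None
except NameError as exc:
    lookup_error = str(exc)

print(Scoped.outer_ok)
print(lookup_error)
```

Creating `lazy` succeeds; the NameError only appears once the generator is consumed, which is why the original example fails at `list(A.gen)` rather than at class definition time.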

2. Decorators#

Description#

I want to write a class decorator to measure the running time of functions/methods.

import time

class Timeit(object):
    def __init__(self, func):
        self._wrapped = func

    def __call__(self, *args, **kws):
        start_time = time.time()
        result = self._wrapped(*args, **kws)
        print("elapsed time is %s " % (time.time() - start_time))
        return result

This decorator can run on regular functions:

@Timeit
def func():
    time.sleep(1)
    return "invoking function func"


if __name__ == '__main__':
    func()  # output: elapsed time is 1.00044410133

But running it on a method will raise an error; why?

class A(object):
    @Timeit
    def func(self):
        time.sleep(1)
        return 'invoking method func'


if __name__ == '__main__':
    a = A()
    a.func()  # Boom!

If I insist on using a class decorator, how should I modify it?

Answer#

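A function is a descriptor: when it is accessed through an instance, its __get__ returns a bound method that passes the instance as self. A Timeit instance has no __get__, so a.func is just the Timeit object stored on the class. Calling a.func() then invokes __call__ without the instance, and the wrapped function fails with a TypeError about the missing self argument. So what is the solution? Descriptor to the rescue.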

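class Timeit(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        # Time the wrapped callable, as in the original decorator.
        start_time = time.time()
        result = self.func(*args, **kwargs)
        print("elapsed time is %s " % (time.time() - start_time))
        return result

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, nothing to bind
        # Bind the instance so that __call__ passes it on as self.
        return lambda *args, **kwargs: self(instance, *args, **kwargs)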
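
To check that a descriptor-based class decorator behaves the same on plain functions and on methods, here is a minimal Python 3 sketch. The names `TimedCall`, `last_elapsed`, `plain`, and `Service` are hypothetical and not part of the original questions. It binds the instance with `types.MethodType` instead of a lambda and stores the elapsed time rather than printing it.

```python
import time
import types


class TimedCall:
    """Class decorator that records how long the wrapped callable took."""

    def __init__(self, func):
        self.func = func
        self.last_elapsed = None

    def __call__(self, *args, **kwargs):
        start = time.perf_counter()
        result = self.func(*args, **kwargs)
        self.last_elapsed = time.perf_counter() - start
        return result

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class: hand back the decorator itself.
            return self
        # Accessed on an instance: bind it, so __call__ receives it first.
        return types.MethodType(self, instance)


@TimedCall
def plain():
    return "function"


class Service:
    @TimedCall
    def method(self):
        return "method of %s" % type(self).__name__


function_result = plain()
method_result = Service().method()
print(function_result, method_result, Service.method.last_elapsed)
```

Because `__get__` returns the decorator itself when accessed on the class, `Service.method.last_elapsed` stays reachable for inspection.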

3. Python Calling Mechanism#

Description#

We know that the __call__ method can be used to overload the call operator (). Good, but is the problem really that simple? Naive!

class A(object):
    def __call__(self):
        print("invoking __call__ from A!")


if __name__ == "__main__":
    a = A()
    a()  # output: invoking __call__ from A!

Now a() seems equivalent to a.__call__(). Looks easy, right? Good, so I pushed my luck and wrote the following code:

a.__call__ = lambda: "invoking __call__ from lambda"
a.__call__()
# output: invoking __call__ from lambda
a()
# output: invoking __call__ from A!

Can the experts explain why a() did not call a.__call__()? (This question was raised by USTC senior Wang Zibo.)

Answer#

The reason is that in Python, implicit invocations of special methods on new-style classes bypass the instance's attribute dictionary. The official documentation describes this situation:

For new-style classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behavior is the reason why the following code raises an exception (unlike the equivalent example with old-style classes):

The official documentation also provides an example:

class C(object):
    pass


c = C()
c.__len__ = lambda: 5
len(c)


# Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
# TypeError: object of type 'C' has no len()

Returning to our example, when we execute a.__call__ = lambda: "invoking __call__ from lambda", we do add a new entry with the key __call__ to a.__dict__. But a() is an implicit invocation of a special method, so the lookup skips a.__dict__ and goes straight to type(a).__dict__. That is why a() still runs the __call__ defined on class A.
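
The lookup rule can be seen side by side in a short Python 3 sketch. The class name `Greeter` is made up for illustration. An explicit attribute access finds the instance entry, the implicit call goes through the type, and patching the class is what actually changes the behavior of `g()`.

```python
class Greeter:
    def __call__(self):
        return "from class"


g = Greeter()
g.__call__ = lambda: "from instance"

explicit = g.__call__()          # ordinary attribute lookup: instance dict wins
implicit = g()                   # implicit special-method lookup: uses type(g)
via_type = type(g).__call__(g)   # roughly what g() does under the hood

# Patching the class, not the instance, affects the implicit call.
Greeter.__call__ = lambda self: "patched class"
patched = g()

print(explicit, implicit, via_type, patched)
```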

4. Descriptors#

Description#

I want to write an Exam class where the attribute math is an integer in the range [0,100]. If the value assigned is outside this range, an exception should be raised. I decided to use a descriptor to implement this requirement.

class Grade(object):
    def __init__(self):
        self._score = 0

    def __get__(self, instance, owner):
        return self._score

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._score = value
        else:
            raise ValueError('grade must be between 0 and 100')


class Exam(object):
    math = Grade()

    def __init__(self, math):
        self.math = math


if __name__ == '__main__':
    niche = Exam(math=90)
    print(niche.math)
    # output : 90
    snake = Exam(math=75)
    print(snake.math)
    # output : 75
    snake.math = 120
    # raises ValueError: grade must be between 0 and 100

Everything seems normal. However, there is a huge problem; can you try to explain what the problem is?
To solve this problem, I rewrote the Grade descriptor as follows:

class Grad(object):
    def __init__(self):
        self._grade_pool = {}

    def __get__(self, instance, owner):
        return self._grade_pool.get(instance, None)

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            _grade_pool = self.__dict__.setdefault('_grade_pool', {})
            _grade_pool[instance] = value
        else:
            raise ValueError('grade must be between 0 and 100')

However, this leads to a bigger problem. How can I solve this issue?

Answer#

The first problem is actually quite simple. If you run print(niche.math) again, you will find that the output is 75, not 90. Why? This has to do with Python's attribute lookup. Ignoring descriptors for a moment, the lookup first checks the instance's __dict__ and, if the name is not found there, checks the class dictionary and then the dictionaries of the parent classes. In class Exam, self.math is not found in the instance's __dict__, so the lookup reaches the Grade object stored on the class Exam. That single Grade object is shared by every Exam instance and keeps the score in its own _score attribute, so every assignment to self.math overwrites the same value. This is the variable pollution we observed. So how should we solve this? Many might say: just set the value in the instance dictionary inside __set__.
Is that enough? By itself, no, and the reason lies in how descriptors work. A descriptor is an object whose class implements any of the descriptor protocol methods __get__, __set__, and __delete__; Python 3.6 added the optional __set_name__ hook. A descriptor that defines __set__ or __delete__ is a data descriptor, while one that defines only __get__ is a non-data descriptor; __set_name__ plays no part in this distinction. So what is the difference?
Once descriptors are taken into account, the lookup order for an instance attribute is as follows. First, if the class dictionary (or a parent class dictionary) holds a data descriptor for the name, its __get__ is called unconditionally, whether or not the instance dictionary has an entry with that name. Second, if there is no data descriptor and the name is in the instance's __dict__, that value is returned without triggering any descriptor. Third, if the name is not in the instance dictionary either, a non-data descriptor on the class has its __get__ called, and a plain class attribute is returned as is.
Returning to the previous problem, even if __set__ writes the value into the instance dictionary, reading the math attribute still goes through the data descriptor in the class dictionary, so __get__ has to read from that per-instance storage as well.
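
The precedence between the instance dictionary and the two kinds of descriptors can be verified directly. This is a minimal Python 3 sketch; `DataDesc`, `NonDataDesc`, and `Host` are hypothetical names. It plants entries with the same names straight into the instance's `__dict__` and then reads the attributes back.

```python
class DataDesc:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, instance, owner=None):
        return "data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class NonDataDesc:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, instance, owner=None):
        return "non-data descriptor"


class Host:
    data = DataDesc()
    non_data = NonDataDesc()


h = Host()
before = (h.data, h.non_data)

# Write to the instance dictionary directly, bypassing __set__.
h.__dict__["data"] = "instance value"
h.__dict__["non_data"] = "instance value"
after = (h.data, h.non_data)

print(before)
print(after)
```

The data descriptor keeps winning after the instance dictionary is populated, while the non-data descriptor is shadowed by the instance entry.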
The improved approach uses the instance as a dict key to bind each value to its instance, but this introduces a memory leak. Why? First, recall how dict works. Any hashable object can be a key, and dict tells keys apart by their hash value together with an equality check. More importantly, a dict holds strong references to its keys. Because the descriptor lives on the class, _grade_pool keeps every Exam instance that was ever assigned a grade alive, so those instances are never garbage collected even after all other references to them are gone. So how can we solve this? There are two methods:
The first:

import weakref


class Grad(object):
    def __init__(self):
        # Weak keys: an entry vanishes once its instance is garbage collected.
        self._grade_pool = weakref.WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self._grade_pool.get(instance, None)

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._grade_pool[instance] = value
        else:
            raise ValueError('grade must be between 0 and 100')

The WeakKeyDictionary from the weakref module holds only weak references to its keys, which do not increase the reference count of the key objects, so the instances can still be garbage collected and the leak is avoided. Similarly, if we want to avoid strong references to the values, we can use WeakValueDictionary.
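
The difference between strong and weak keys can be observed by counting entries after the key objects are deleted. This is a minimal Python 3 sketch; `Student`, `strong_pool`, and `weak_pool` are made-up names. The `gc.collect()` call is only there for interpreters that do not free objects immediately.

```python
import gc
import weakref


class Student:
    pass


strong_pool = {}
weak_pool = weakref.WeakKeyDictionary()

s1, s2 = Student(), Student()
strong_pool[s1] = 90
weak_pool[s2] = 75

# Drop our own references to both objects.
del s1, s2
gc.collect()

strong_count = len(strong_pool)  # the dict itself still keeps its key alive
weak_count = len(weak_pool)      # the entry disappeared together with its key

print(strong_count, weak_count)
```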
The second: In Python 3.6, PEP 487 added the __set_name__ hook to the descriptor protocol. It tells the descriptor the attribute name it was assigned to, which we can use as the key for storing the value in each instance's own __dict__:

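class Grad(object):
    def __set_name__(self, owner, name):
        # Called once when the owner class is created (Python 3.6+).
        self.key = name

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__[self.key]

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            instance.__dict__[self.key] = value
        else:
            raise ValueError('grade must be between 0 and 100')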
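
As a quick check that the __set_name__ approach keeps scores separate per instance, here is a minimal Python 3.6+ sketch. The descriptor name `BoundedScore` is hypothetical, and the `Exam` class mirrors the one from the question.

```python
class BoundedScore:
    """Data descriptor that stores its value in the owning instance."""

    def __set_name__(self, owner, name):
        self.key = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__[self.key]

    def __set__(self, instance, value):
        if not 0 <= value <= 100:
            raise ValueError("grade must be between 0 and 100")
        instance.__dict__[self.key] = value


class Exam:
    math = BoundedScore()

    def __init__(self, math):
        self.math = math


niche = Exam(math=90)
snake = Exam(math=75)

try:
    snake.math = 120
    rejected = False
except ValueError:
    rejected = True

print(niche.math, snake.math, vars(niche), rejected)
```

Each score lives in the instance's own `__dict__`, so it is released together with the instance and no extra bookkeeping dictionary is needed.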

This question involves a lot of content. Here are some reference links: invoking-descriptors, Descriptor HowTo Guide, PEP 487, what's new in Python 3.6.

5. Python Inheritance Mechanism#

Description#

What is the output of the following code?

class Init(object):
    def __init__(self, value):
        self.val = value


class Add2(Init):
    def __init__(self, val):
        super(Add2, self).__init__(val)
        self.val += 2


class Mul5(Init):
    def __init__(self, val):
        super(Mul5, self).__init__(val)
        self.val *= 5


class Pro(Mul5, Add2):
    pass


class Incr(Pro):
    csup = super(Pro)

    def __init__(self, val):
        self.csup.__init__(val)
        self.val += 1


p = Incr(5)
print(p.val)

Answer#

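The output is 36. super(Pro) is an unbound super object; because it is stored as a class attribute, accessing self.csup binds it to the instance, which is equivalent to super(Pro, self). The method resolution order of Incr is Incr, Pro, Mul5, Add2, Init, object, so self.csup.__init__(val) calls Mul5.__init__, whose super call continues to Add2.__init__ and then Init.__init__. The value therefore goes from 5 to 5 + 2 = 7, then 7 * 5 = 35, and finally 35 + 1 = 36. For more details, refer to New-style Classes, multiple-inheritance.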
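
The chain of calls can be confirmed by printing the method resolution order. This is a Python 3 rendering of the classes from the question, with `object` bases dropped; the variable names `mro_names` and `result` are added for illustration.

```python
class Init:
    def __init__(self, value):
        self.val = value


class Add2(Init):
    def __init__(self, val):
        super(Add2, self).__init__(val)
        self.val += 2


class Mul5(Init):
    def __init__(self, val):
        super(Mul5, self).__init__(val)
        self.val *= 5


class Pro(Mul5, Add2):
    pass


class Incr(Pro):
    # Unbound super object; it gets bound when accessed through an instance.
    csup = super(Pro)

    def __init__(self, val):
        self.csup.__init__(val)
        self.val += 1


mro_names = [cls.__name__ for cls in Incr.__mro__]
result = Incr(5).val

print(mro_names)
print(result)
```

Note that super(Mul5, self) inside Mul5.__init__ resolves to Add2 rather than Init, because the lookup follows the MRO of the instance's class, not the static parent of Mul5.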

6. Python Special Methods#

Description#

I wrote a class that implements the singleton pattern by overriding the __new__ method.

class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance:
            return cls._instance
        # object.__new__ itself accepts no extra arguments.
        cls._instance = cv = object.__new__(cls)
        return cv


sin1 = Singleton()
sin2 = Singleton()
print(sin1 is sin2)
# output: True

Now I have a bunch of classes that need to be implemented as singletons, so I plan to write a metaclass to reuse the code:

class SingleMeta(type):
    def __init__(cls, name, bases, dict):
        cls._instance = None
        __new__o = cls.__new__

        def __new__(cls, *args, **kwargs):
            if cls._instance:
                return cls._instance
            cls._instance = cv = __new__o(cls, *args, **kwargs)
            return cv

        cls.__new__ = __new__


class A(object):
    __metaclass__ = SingleMeta


a1 = A()  # Boom! raises TypeError

Oh no, why does this throw an error? I previously used the same technique to patch __getattribute__, and the following code captures every attribute access and prints the arguments.

class TraceAttribute(type):
    def __init__(cls, name, bases, dict):
        __getattribute__o = cls.__getattribute__

        def __getattribute__(self, *args, **kwargs):
            print('__getattribute__:', args, kwargs)
            return __getattribute__o(self, *args, **kwargs)

        cls.__getattribute__ = __getattribute__


class A(object):  # In Python 3, it is class A(object, metaclass=TraceAttribute):
    __metaclass__ = TraceAttribute
    a = 1
    b = 2


a = A()
a.a
# output: __getattribute__:('a',){}
a.b

Please explain why patching __getattribute__ works while patching __new__ fails. If I insist on using a metaclass to patch __new__ to implement the singleton pattern, how should I modify it?

Answer#

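This is actually the most frustrating point. __new__ is a static method: when it is defined in a class body, Python wraps it in staticmethod automatically, but that does not happen when a plain function is assigned to cls.__new__ afterwards. In Python 2, the assigned function becomes an unbound method, and calling it with the class as the first argument raises a TypeError. __getattribute__, by contrast, is an ordinary instance method, so patching it with a plain function works. The replacement for __new__ must therefore be wrapped in staticmethod explicitly. The answer is as follows: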

class SingleMeta(type):
    def __init__(cls, name, bases, dict):
        cls._instance = None
        __new__o = cls.__new__

        @staticmethod
        def __new__(cls, *args, **kwargs):
            if cls._instance:
                return cls._instance
            cls._instance = cv = __new__o(cls, *args, **kwargs)
            return cv

        cls.__new__ = __new__


class A(object):
    __metaclass__ = SingleMeta


print(A() is A())  # output: True
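
The code above uses the Python 2 `__metaclass__` attribute, which Python 3 ignores. For readers on Python 3, here is a minimal sketch of the same idea using the `metaclass=` keyword. The class name `Config` and the variable `original_new` are made up for illustration, and the `is None` check is a small stylistic change from the original.

```python
class SingleMeta(type):
    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        cls._instance = None
        original_new = cls.__new__

        # Wrap explicitly: a function assigned after class creation is not
        # converted to a static method automatically.
        @staticmethod
        def __new__(cls, *args, **kwargs):
            if cls._instance is None:
                cls._instance = original_new(cls, *args, **kwargs)
            return cls._instance

        cls.__new__ = __new__


class Config(metaclass=SingleMeta):
    pass


first = Config()
second = Config()
print(first is second)
```

A common alternative, not used in the original, is to override `__call__` on the metaclass so that instance creation is intercepted before `__new__` is ever reached.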

Conclusion#

Thanks to Master Chuan for a set of questions that opened the door to a new world for me. Well, I can't tag anyone on the blog, so I can only express my gratitude here. To be honest, Python's dynamic features make it easy to implement many "black magic" tricks, but they also demand a more rigorous mastery of the language's features and pitfalls. I hope all Pythoners will read the official documentation in their spare time and one day become truly impressive.