## Preface

Recently I had come to feel that Python was too "simple," so I boldly declared in front of Master Chuan: "Python is the simplest language in the world!" A hint of disdain flashed across Chuan's lips (inner monologue: how naive! As a Python developer I must give you some life experience, or you'll never know how high the sky is and how deep the earth). So Chuan handed me a set of questions worth a full 100 points, and this article records the pitfalls I hit while working through them.
## 1. List Comprehensions

### Description

The following code raises an error. Why?

```python
class A(object):
    x = 1
    gen = (x for _ in xrange(10))  # Python 3: gen = (x for _ in range(10))

if __name__ == "__main__":
    print(list(A.gen))
```
### Answer

The issue is one of variable scope. In `gen = (x for _ in xrange(10))`, `gen` is a generator expression, and a generator expression has its own function scope, isolated from the enclosing class scope. You therefore get `NameError: name 'x' is not defined`. So what is the solution? The answer: use a lambda.
```python
class A(object):
    x = 1
    gen = (lambda x: (x for _ in xrange(10)))(x)  # Python 3: use range(10)

if __name__ == "__main__":
    print(list(A.gen))
```
Or like this:

```python
class A(object):
    x = 1
    gen = (A.x for _ in xrange(10))  # A.x is looked up lazily, when the generator runs

if __name__ == "__main__":
    print(list(A.gen))
```
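Since `xrange` is Python 2 only, here is a minimal Python 3 sketch of both the failure and the lambda workaround (the class names `Fails` and `Works` are mine):

```python
class Fails(object):
    x = 1
    # The generator expression has its own function scope; it cannot
    # see the class-body name x, so iterating it raises NameError.
    gen = (x for _ in range(10))

try:
    list(Fails.gen)
except NameError as e:
    print(e)  # name 'x' is not defined

class Works(object):
    x = 1
    # The lambda creates a real function scope that captures x.
    gen = (lambda x: (x for _ in range(10)))(x)

print(list(Works.gen))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```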
### Supplement

Thanks to several commenters, here is the explanation from the official documentation:

> The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail:
```python
class A:
    a = 42
    b = list(a + i for i in range(10))
```
Reference links: Python 2 Execution Model, Python 3 Execution Model. It is said that this behavior dates back to PEP 227, which I will investigate further. Thanks again to commenters @没头脑很着急, @涂伟忠, and @Cholerae for their corrections.
## 2. Decorators

### Description

I want to write a class decorator to measure the running time of functions and methods.
```python
import time

class Timeit(object):
    def __init__(self, func):
        self._wrapped = func

    def __call__(self, *args, **kws):
        start_time = time.time()
        result = self._wrapped(*args, **kws)
        print("elapsed time is %s " % (time.time() - start_time))
        return result
```
This decorator works on regular functions:

```python
@Timeit
def func():
    time.sleep(1)
    return "invoking function func"

if __name__ == '__main__':
    func()  # output: elapsed time is 1.00044410133
```
But applying it to a method raises an error. Why?

```python
class A(object):
    @Timeit
    def func(self):
        time.sleep(1)
        return 'invoking method func'

if __name__ == '__main__':
    a = A()
    a.func()  # Boom!
```
If I insist on using a class decorator, how should I modify it?
### Answer

After decoration, `A.func` is a `Timeit` instance rather than a function. A function is itself a descriptor whose `__get__` produces a bound method, but a plain `Timeit` instance is not, so when `a.func()` runs, the instance `a` is never passed as the first argument and the wrapped method is effectively unbound, blowing up for lack of `self`. So what is the solution? The descriptor protocol to the rescue.
```python
class Timeit(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        start_time = time.time()
        result = self.func(*args, **kwargs)
        print("elapsed time is %s " % (time.time() - start_time))
        return result

    def __get__(self, instance, owner):
        # Binding happens here: return a callable that inserts the instance
        # as the first argument before delegating to __call__.
        return lambda *args, **kwargs: self(instance, *args, **kwargs)
```
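Putting the pieces together, a self-contained sketch (timing logic plus the `__get__` hook; the sample function names are mine) showing that both plain functions and methods now work:

```python
import time

class Timeit(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        start_time = time.time()
        result = self.func(*args, **kwargs)
        print("elapsed time is %s" % (time.time() - start_time))
        return result

    def __get__(self, instance, owner):
        # Attribute access on an instance lands here; insert the instance
        # so the wrapped method receives self.
        return lambda *args, **kwargs: self(instance, *args, **kwargs)

@Timeit
def func():
    return "invoking function func"

class A(object):
    @Timeit
    def method(self):
        return "invoking method func"

print(func())        # times the plain function
print(A().method())  # times the method -- no TypeError this time
```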
## 3. Python Calling Mechanism

### Description

We know that the `__call__` method can be used to overload the parentheses call operator. Good, is the problem really that simple? Naive!
```python
class A(object):
    def __call__(self):
        print("invoking __call__ from A!")

if __name__ == "__main__":
    a = A()
    a()  # output: invoking __call__ from A!
```
Now it seems that `a()` is equivalent to `a.__call__()`. Looks easy, right? Good, let me push my luck and write the following code:

```python
a.__call__ = lambda: "invoking __call__ from lambda"
a.__call__()
# output: invoking __call__ from lambda
a()
# output: invoking __call__ from A!
```

Can the experts explain why `a()` did not call `a.__call__()`? (This question was raised by USTC senior Wang Zibo.)
### Answer

The reason is that for new-style classes, implicit invocations of special methods bypass the instance's attribute dictionary. The official documentation describes exactly this situation:

> For new-style classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object's type, not in the object's instance dictionary. That behavior is the reason why the following code raises an exception (unlike the equivalent example with old-style classes):

The official documentation also provides an example:
```python
class C(object):
    pass

c = C()
c.__len__ = lambda: 5
len(c)
# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# TypeError: object of type 'C' has no len()
```
Returning to our example: when we execute `a.__call__ = lambda: "invoking __call__ from lambda"`, we do add an entry with the key `__call__` to `a.__dict__`. But when we execute `a()`, because this is an implicit invocation of a special method, the lookup skips `a.__dict__` and goes straight to `type(a).__dict__`. Hence the behavior above.
## 4. Descriptors

### Description

I want to write an `Exam` class whose attribute `math` must be an integer in the range [0, 100]; assigning a value outside this range should raise an exception. I decided to use a descriptor to implement this requirement.
```python
class Grade(object):
    def __init__(self):
        self._score = 0

    def __get__(self, instance, owner):
        return self._score

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._score = value
        else:
            raise ValueError('grade must be between 0 and 100')

class Exam(object):
    math = Grade()

    def __init__(self, math):
        self.math = math

if __name__ == '__main__':
    niche = Exam(math=90)
    print(niche.math)
    # output: 90
    snake = Exam(math=75)
    print(snake.math)
    # output: 75
    snake.math = 120
    # output: ValueError: grade must be between 0 and 100
```
Everything seems normal. However, there is a huge problem; can you try to explain what the problem is?
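As a hint, a minimal run (same classes as above, repeated so the snippet runs on its own) that exposes the problem:

```python
class Grade(object):
    def __init__(self):
        self._score = 0

    def __get__(self, instance, owner):
        return self._score

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._score = value
        else:
            raise ValueError('grade must be between 0 and 100')

class Exam(object):
    math = Grade()  # a single descriptor instance shared by every Exam

    def __init__(self, math):
        self.math = math

niche = Exam(math=90)
snake = Exam(math=75)
print(niche.math)  # 75, not 90: both instances share the descriptor's _score
```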
To solve this problem, I rewrote the Grade descriptor as follows:
```python
class Grad(object):
    def __init__(self):
        self._grade_pool = {}

    def __get__(self, instance, owner):
        return self._grade_pool.get(instance, None)

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._grade_pool[instance] = value
        else:
            raise ValueError('grade must be between 0 and 100')
```
However, this leads to a bigger problem. How can I solve this issue?
### Answer

1. The first problem is actually quite simple. If you run `print(niche.math)` again, you will find that it now outputs `75`. Why? This has to do with Python's attribute lookup. When we access an attribute, the lookup order is: first the instance's `__dict__`, and if it is not found there, the class dictionary, then the parent classes' dictionaries, until the lookup fails entirely. Back to our problem: in the class `Exam`, accessing `self.math` finds nothing in the instance's `__dict__`, moves on to the class `Exam`, finds the descriptor there, and uses it. In other words, every operation on `self.math`, from every instance, goes through the single class-level `math` descriptor and its one `_score` attribute, so the instances pollute each other's state. How should we solve this? Many would say: just write the value into the instance dictionary inside `__set__`.

   Is that possible? Clearly not, and the reason lies in Python's descriptor mechanism. A descriptor is a class that implements the descriptor protocol, which consists of `__get__`, `__set__`, `__delete__`, plus the `__set_name__` method added in Python 3.6. A class that implements `__set__` or `__delete__` is a data descriptor, while one that implements only `__get__` is a non-data descriptor. What is the difference? The lookup order described above does not take descriptors into account. With descriptors in the picture, the correct statement is: **if the attribute found in the class (or a parent class) dictionary is a data descriptor, the descriptor protocol is invoked unconditionally, regardless of whether the attribute also exists in the instance dictionary; if it is a non-data descriptor, the instance dictionary takes priority, and the descriptor protocol is triggered only when the attribute is absent from the instance dictionary.** Returning to the problem: even if `__set__` writes the value into the instance dictionary, the class dictionary still holds a data descriptor, so accessing `math` still triggers the descriptor protocol.

2. The improved version uses the uniqueness of `dict` keys to bind each value to its instance, but this introduces a memory leak. Why? First recall the characteristics of `dict`: any hashable object can be a key, and `dict` guarantees key uniqueness through hash values (strictly speaking hashes are not unique, but collisions are extremely unlikely and are handled, so keys behave as unique). At the same time (the important point), a `dict` holds strong references to its keys, which increase the reference count of the key objects; an `Exam` instance stored in `_grade_pool` can therefore never be garbage collected, even after every outside reference to it is gone. How can we solve this? There are two methods.
The first:
```python
import weakref

class Grad(object):
    def __init__(self):
        self._grade_pool = weakref.WeakKeyDictionary()

    def __get__(self, instance, owner):
        return self._grade_pool.get(instance, None)

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            self._grade_pool[instance] = value
        else:
            raise ValueError('grade must be between 0 and 100')
```
The `WeakKeyDictionary` from the `weakref` module holds its keys through weak references, which do not increase the reference count of the key objects, thus preventing the leak. Similarly, if we want to avoid strong references to the values, we can use `WeakValueDictionary`.
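A quick sketch of the difference (the `Exam` stand-in class and pool names are mine): a plain `dict` keeps the instance alive forever, while `WeakKeyDictionary` lets it go:

```python
import gc
import weakref

class Exam(object):
    pass

strong_pool = {}
weak_pool = weakref.WeakKeyDictionary()

e = Exam()
strong_pool[e] = 90  # strong reference: the pool keeps e alive
f = Exam()
weak_pool[f] = 90    # weak reference: the pool does not keep f alive

del e, f
gc.collect()

print(len(strong_pool))  # 1 -- the Exam instance leaked into the pool
print(len(weak_pool))    # 0 -- the entry disappeared with the instance
```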
The second: in Python 3.6, PEP 487 added a new protocol method for descriptors, `__set_name__`, which we can use to learn the attribute name and store values on the instance itself:
```python
class Grad(object):
    def __get__(self, instance, owner):
        return instance.__dict__[self.key]

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            instance.__dict__[self.key] = value
        else:
            raise ValueError('grade must be between 0 and 100')

    def __set_name__(self, owner, name):
        self.key = name
```
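A self-contained check of this version (Python 3.6+; the descriptor is repeated so the snippet runs on its own, and the instance names are mine):

```python
class Grad(object):
    def __get__(self, instance, owner):
        return instance.__dict__[self.key]

    def __set__(self, instance, value):
        if 0 <= value <= 100:
            instance.__dict__[self.key] = value
        else:
            raise ValueError('grade must be between 0 and 100')

    def __set_name__(self, owner, name):
        # Called once when Exam is created, with name == 'math'.
        self.key = name

class Exam(object):
    math = Grad()

a, b = Exam(), Exam()
a.math = 90
b.math = 75
print(a.math, b.math)  # 90 75 -- per-instance state, no extra pool to leak
```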
This question involves a lot of content. Here are some reference links: invoking-descriptors, Descriptor HowTo Guide, PEP 487, what's new in Python 3.6.
## 5. Python Inheritance Mechanism

### Description

What is the output of the following code?
```python
class Init(object):
    def __init__(self, value):
        self.val = value

class Add2(Init):
    def __init__(self, val):
        super(Add2, self).__init__(val)
        self.val += 2

class Mul5(Init):
    def __init__(self, val):
        super(Mul5, self).__init__(val)
        self.val *= 5

class Pro(Mul5, Add2):
    pass

class Incr(Pro):
    csup = super(Pro)

    def __init__(self, val):
        self.csup.__init__(val)
        self.val += 1

p = Incr(5)
print(p.val)
```
### Answer

The output is `36`. Following `Pro`'s MRO (`Pro` → `Mul5` → `Add2` → `Init`), the cooperative `super` calls run `Init.__init__` first (val = 5), then `Add2` (+2 → 7), then `Mul5` (×5 → 35), and finally `Incr` adds 1, giving 36. For more details, refer to New-style Classes, multiple-inheritance.
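A runnable trace (the classes are repeated so the snippet runs on its own): printing `Pro.__mro__` confirms the order in which the `__init__` chain executes.

```python
class Init(object):
    def __init__(self, value):
        self.val = value

class Add2(Init):
    def __init__(self, val):
        super(Add2, self).__init__(val)
        self.val += 2

class Mul5(Init):
    def __init__(self, val):
        super(Mul5, self).__init__(val)
        self.val *= 5

class Pro(Mul5, Add2):
    pass

class Incr(Pro):
    csup = super(Pro)

    def __init__(self, val):
        # self.csup triggers super.__get__, yielding super(Pro, self),
        # so the lookup for __init__ starts after Pro in the MRO.
        self.csup.__init__(val)
        self.val += 1

print([c.__name__ for c in Pro.__mro__])
# ['Pro', 'Mul5', 'Add2', 'Init', 'object']
# Init: val = 5; Add2: +2 -> 7; Mul5: *5 -> 35; Incr: +1 -> 36
print(Incr(5).val)  # 36
```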
## 6. Python Special Methods

### Description

I wrote a class that implements the singleton pattern by overriding the `__new__` method.
```python
class Singleton(object):
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance:
            return cls._instance
        cls._instance = cv = object.__new__(cls, *args, **kwargs)
        return cv

sin1 = Singleton()
sin2 = Singleton()
print(sin1 is sin2)
# output: True
```
Now I have a bunch of classes that need to be singletons, so I plan to write a metaclass to reuse the code:

```python
class SingleMeta(type):
    def __init__(cls, name, bases, dict):
        cls._instance = None
        __new__o = cls.__new__

        def __new__(cls, *args, **kwargs):
            if cls._instance:
                return cls._instance
            cls._instance = cv = __new__o(cls, *args, **kwargs)
            return cv

        cls.__new__ = __new__

class A(object):
    __metaclass__ = SingleMeta

a1 = A()  # Boom!
```
Oh no, why does this throw an error? I previously used the same trick to patch `__getattribute__`, and the following code captures every attribute access and prints the arguments:
```python
class TraceAttribute(type):
    def __init__(cls, name, bases, dict):
        __getattribute__o = cls.__getattribute__

        def __getattribute__(self, *args, **kwargs):
            print('__getattribute__:', args, kwargs)
            return __getattribute__o(self, *args, **kwargs)

        cls.__getattribute__ = __getattribute__

class A(object):  # In Python 3: class A(object, metaclass=TraceAttribute):
    __metaclass__ = TraceAttribute
    a = 1
    b = 2

a = A()
a.a
# output: __getattribute__: ('a',) {}
a.b
# output: __getattribute__: ('b',) {}
```
Please explain why patching `__getattribute__` works while patching `__new__` fails. If I insist on using a metaclass to patch `__new__` to implement the singleton pattern, how should I modify it?
### Answer

This is the really maddening part. `__new__` is implicitly a `staticmethod`, so when it is replaced from outside the class body, the replacement must be wrapped as a `staticmethod` explicitly. The answer is as follows:
```python
class SingleMeta(type):
    def __init__(cls, name, bases, dict):
        cls._instance = None
        __new__o = cls.__new__

        @staticmethod
        def __new__(cls, *args, **kwargs):
            if cls._instance:
                return cls._instance
            cls._instance = cv = __new__o(cls, *args, **kwargs)
            return cv

        cls.__new__ = __new__

class A(object):
    __metaclass__ = SingleMeta

print(A() is A())  # output: True
```
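In Python 3 the `__metaclass__` attribute is ignored; the metaclass goes in the class header instead. A sketch of the same idea for Python 3 (to my understanding, functions accessed through a class are no longer wrapped into unbound methods in Python 3, so the `staticmethod` wrapper is not strictly required there, but it does no harm and keeps the code cross-version safe):

```python
class SingleMeta(type):
    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        cls._instance = None
        new_o = cls.__new__

        @staticmethod
        def __new__(cls, *args, **kwargs):
            if cls._instance is None:
                # object.__new__ rejects extra arguments in Python 3
                cls._instance = new_o(cls)
            return cls._instance

        cls.__new__ = __new__

class A(metaclass=SingleMeta):
    pass

class B(metaclass=SingleMeta):
    pass

print(A() is A())  # True
print(A() is B())  # False -- each class gets its own singleton
```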
## Conclusion

Thanks to the Master for a set of questions that opened the door to a new world for me. Well, I can't @-mention anyone on this blog, so I can only convey my thanks here. To be honest, Python's dynamic features make it comfortable to pull off a lot of "black magic," but they also demand a much more careful grasp of the language's features and pitfalls. I hope all Pythonistas will read the official documentation in their spare time and reach the state where the language feels like an extension of the hand.