Understanding how the Python garbage collector works
As most of you probably know, Python is a dynamic programming language with several implementations. CPython, the reference implementation, manages memory with a combination of reference counting and a generational garbage collector. Other implementations, such as PyPy or IronPython, may use different strategies.
Did you know these strategies are used for memory management? Do you know whether it’s possible to disable the garbage collector?
Since version 2.0, Python has used two strategies for memory management: reference counting and generational garbage collection. Before that, the only strategy was reference counting.
Reference Counting
With this technique, the interpreter keeps a count of the references to each object. When a new reference is created, the counter is incremented by one; when a reference is removed, the counter is decremented by one.
Every object in CPython carries this counter, and the interpreter keeps it up to date. When the counter reaches 0, the object is deallocated.
Let’s create three references to the object “my object” and check the reference count of the object.
>>> import sys
>>> a = "my object"
>>> b = a
>>> c = a
>>> id(a)
4377801904
>>> id(b)
4377801904
>>> id(c)
4377801904
>>> sys.getrefcount(a)
4
id(…) returns a unique integer that identifies the object a name refers to (in CPython, its memory address). sys.getrefcount(a) returns the reference count of the object (“my object”). The count is generally one higher than you might expect, because it includes the temporary reference created by passing the object as an argument to getrefcount().
[Figure: the names a, b, and c all reference the same string object.]
If we remove one reference, the counter gets decremented by one.
>>> del c
>>> sys.getrefcount(a)
3
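The increments and decrements described above can be observed as differences between two getrefcount() readings, which sidesteps the question of what the absolute number should be. This is a minimal sketch, assuming CPython; the function name refcount_deltas is a hypothetical helper introduced only for this illustration.

```python
import sys


def refcount_deltas():
    """Return how the count of a fresh object changes as references come and go."""
    obj = object()                  # a brand-new object that nothing else refers to
    base = sys.getrefcount(obj)     # baseline reading (absolute value is not important here)
    holder = [obj, obj]             # a list that stores two more references to obj
    after_add = sys.getrefcount(obj) - base
    del holder                      # destroying the list releases both references
    after_del = sys.getrefcount(obj) - base
    return after_add, after_del


print(refcount_deltas())
```

Comparing readings rather than trusting a single number is a deliberate choice: the absolute count depends on the interpreter and on how the object is being held at the moment of the call.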
If we remove all the references, the counter drops to 0 and CPython deallocates the object immediately.
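One way to watch an object disappear once its last reference is gone is to observe it through a weak reference, which does not count as a reference itself. The sketch below assumes CPython; Payload and freed_immediately are hypothetical names made up for the example.

```python
import weakref


class Payload:
    """Hypothetical placeholder class; plain object() instances cannot be weakly referenced."""


def freed_immediately():
    obj = Payload()
    probe = weakref.ref(obj)            # observes the object without keeping it alive
    alias = obj                         # a second strong reference
    del obj                             # one reference removed, one still left
    alive_with_alias = probe() is not None
    del alias                           # the last strong reference is removed
    gone_without_refs = probe() is None
    return alive_with_alias, gone_without_refs


print(freed_immediately())
```

Other implementations of Python that do not use reference counting may keep the object around longer, so this behaviour should be read as a CPython detail.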
Something curious is that common values have a higher reference count than I expected, because other code already references them by the time the interpreter has started. For example, I created a reference to 1 and found a count of a few hundred for that object. My recommendation is to use an unusual number or string when experimenting; it makes the output of getrefcount(…) easier to understand.
>>> h = 1
>>> sys.getrefcount(h)
601
>>> h = 3.14151692
>>> sys.getrefcount(h)
2
Also, if you create two objects with the same value, they generally don’t get the same ID, because they are not the same object. (CPython caches small integers and some strings, so those can end up shared.) You can check their IDs and the reference count of each object.
>>> a = 1234
>>> b = 1234
>>> id(a)
4484904240
>>> id(b)
4484904080
>>> sys.getrefcount(b)
2
>>> sys.getrefcount(a)
2
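The difference between "same value" and "same object" is easier to see with mutable objects, which the interpreter never shares behind your back. This is a small sketch using lists; the variable names are arbitrary.

```python
first = [1, 2, 3]
second = [1, 2, 3]      # equal contents, but a separate list object
alias = first           # another name for the very same object

same_value = first == second        # compares contents
same_object = first is second       # compares identity
alias_is_same = alias is first

print(same_value, same_object, alias_is_same)
print(id(first) == id(alias), id(first) == id(second))
```

The is operator compares identity, which is what id(…) exposes; == compares values.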
A benefit of reference counting is that an object can be erased from memory as soon as it has no references.
It also has some drawbacks. Updating a counter on every reference change adds overhead, which becomes particularly costly in a naive multi-threaded implementation. It also cannot reclaim objects involved in circular references, because their counters never reach 0. For those cases, Python applies a second algorithm called generational garbage collection.
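The circular reference problem can be reproduced in a few lines. In this sketch, assuming CPython, two objects point at each other, so deleting the names is not enough to free them; an explicit gc.collect() is. Node and cycle_needs_gc are hypothetical names used only here, and automatic collection is paused during the experiment so that it cannot interfere with the result.

```python
import gc
import weakref


class Node:
    """Hypothetical class whose instances can point at each other."""

    def __init__(self):
        self.partner = None


def cycle_needs_gc():
    was_enabled = gc.isenabled()
    gc.disable()                        # keep an automatic collection from interfering
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a     # a -> b -> a: a reference cycle
        probe = weakref.ref(a)          # observes a without keeping it alive
        del a, b                        # the names are gone, but the cycle holds both objects
        alive_after_del = probe() is not None
        gc.collect()                    # the cycle detector finds and frees the pair
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()                 # restore the previous collector state


print(cycle_needs_gc())
```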
Generational Garbage Collection
This algorithm divides objects into generations based on how long they have been alive, and it can apply a different policy to each generation.
Python creates three generations when the interpreter starts. New objects go into the first generation. If they survive a collection, the algorithm moves them to the second generation. The same happens there: objects are either collected or moved to the third generation, where they stay until they are collected or the program ends.
Each generation has a threshold. When the number of allocations minus deallocations since the last collection exceeds the first generation's threshold, Python runs the garbage collection process. The thresholds of the older generations count how many collections of the younger generation take place before the older one is examined as well.
One drawback of this technique is that it is slow to remove long-lived garbage, because the oldest generation is examined least often, although it does a good job with the newest objects.
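The thresholds and the current counters can be inspected through the gc module. The sketch below reads them and temporarily raises the first generation's threshold; raise_gen0_threshold is a hypothetical helper, the value 1000 is arbitrary, and the numbers printed depend on the Python version, so no output is shown.

```python
import gc


def raise_gen0_threshold(new_value):
    """Set a new threshold for the youngest generation and return the old settings."""
    original = gc.get_threshold()               # a tuple with one threshold per generation
    gc.set_threshold(new_value, *original[1:])  # leave the other thresholds untouched
    return original


print("thresholds:", gc.get_threshold())
print("current counts:", gc.get_count())

previous = raise_gen0_threshold(1000)           # arbitrary value, for illustration only
print("after change:", gc.get_threshold())

gc.set_threshold(*previous)                     # put the original settings back
```

A higher first threshold means fewer, larger collections of young objects; whether that helps is something to measure on a real workload rather than assume.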
Is it possible to disable the garbage collector in Python?
It is possible to disable the second algorithm, the generational garbage collector, but it is not possible to disable reference counting.
Below are a few functions from the gc module that can help you.
>>> import gc
>>> gc.isenabled()
True
>>> gc.disable()
>>> gc.isenabled()
False
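If the collector only needs to be off for one section of a program, the same functions can be wrapped so that the previous state is always restored. This is one possible sketch; gc_paused is a hypothetical helper, not something the gc module provides.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Disable the generational collector inside a with block, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:         # only re-enable if it was on before we touched it
            gc.enable()


with gc_paused():
    print(gc.isenabled())       # False while inside the block

print(gc.isenabled())           # back to whatever it was before the block
```

Reference counting keeps working inside the block; only the cycle detection is paused.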
Disabling Python’s generational garbage collector will not necessarily reduce your application’s memory use, because Python generally doesn’t release memory back to the underlying operating system.
If you want to dive deeper into disabling the garbage collector, I recommend the post by the Instagram Engineering team. They ran some experiments with the garbage collector and discovered side effects of the disable() method with some third-party libraries.
Conclusion
Python uses two strategies for memory management: reference counting, and a generational garbage collector that handles circular references. The second one is optional and can be disabled. You can also inspect the reference count of an object, change the thresholds of the generations, and a few more things. I recommend taking a look at the gc module, the sys module, and the garbage collector design documentation.