Internals of Python Generators

Suman Shil
3 min read · Oct 28, 2021

I recently started working in Python after coding in Java for more than a decade. One feature I found interesting is the generator, which languages such as Go, Java, and Rust did not offer as a built-in, stable language feature at the time of writing. A generator is useful when iterating over a large collection that we don't want to load into memory all at once. Unlike a normal function, a generator can pause and resume execution, and it produces one element at a time, so only the element currently being returned needs to be in memory.
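To make the memory claim concrete, here is a minimal sketch that compares a fully built list with a generator producing the same values. The function name squares and the size n are illustrative assumptions, and sys.getsizeof reports only the size of the container object itself, so the numbers are a rough indication rather than a full memory profile.

```python
import sys

def squares(n):
    """Yield the squares 0, 1, 4, ... one at a time (hypothetical example)."""
    for i in range(n):
        yield i * i

n = 100_000
as_list = [i * i for i in range(n)]   # every element exists in memory at once
as_gen = squares(n)                   # nothing has been computed yet

list_size = sys.getsizeof(as_list)    # grows with n
gen_size = sys.getsizeof(as_gen)      # does not depend on n

total = sum(as_gen)                   # consumes the values one by one
print(list_size, gen_size, total)
```

The generator object stays small no matter how large n is, because each value is produced only when the consumer asks for it.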

How is a generator defined

When a function body contains a yield statement, Python treats it as a generator function. Consider the following example.
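The listing for this example is not reproduced in this copy of the article, so the following is a minimal sketch under stated assumptions: a function named gen (the name used in the rest of the text) whose body is assumed to contain two yield statements, the first yielding 1.

```python
def gen():
    # Assumed body: the original listing is not available.
    yield 1
    yield 2

g = gen()                 # calling gen() returns a generator object
values = list(gen())      # a fresh generator, drained into a list
print(g, values)
```

Here g is left untouched so that it is still in its initial, not-yet-started state.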

How is a generator different from a function

Earlier we mentioned that generators can pause and resume execution, which a regular function cannot do. When Python compiles the generator function gen() to bytecode and sees the yield statement, it knows that gen is a generator function and not a regular one. It sets a flag on the function's code object to remember this fact.
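The flag can be inspected directly. The sketch below assumes a trivial gen and a hypothetical regular function for comparison; it checks the CO_GENERATOR bit in the code object's co_flags, and also uses inspect.isgeneratorfunction, which performs an equivalent check.

```python
import inspect

def gen():
    yield 1

def regular():
    return 1

# CO_GENERATOR is the code-object flag set for generator functions.
gen_flag = bool(gen.__code__.co_flags & inspect.CO_GENERATOR)
regular_flag = bool(regular.__code__.co_flags & inspect.CO_GENERATOR)

# Higher-level helper that answers the same question.
is_gen = inspect.isgeneratorfunction(gen)
print(gen_flag, regular_flag, is_gen)
```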

Here are the attributes of a generator object:

>>> dir(g)

['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send', 'throw']

When we call a generator function, Python notices that the flag is set and creates a generator object instead of running the function body.

>>> type(g)

<class 'generator'>
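One way to see that the call only creates the object is to record when the body actually starts. This is a small sketch; the log list is a hypothetical helper added purely for illustration.

```python
import types

log = []  # hypothetical helper that records when the body runs

def gen():
    log.append("body started")
    yield 1

g = gen()
ran_on_call = list(log)          # snapshot right after the call
is_generator = isinstance(g, types.GeneratorType)

value = next(g)                  # the body runs only now, up to the first yield
ran_after_next = list(log)
print(ran_on_call, is_generator, value, ran_after_next)
```

The log is empty right after gen() is called and gets its entry only once next(g) is called.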

A Python generator encapsulates a stack frame plus a reference to some code, the body of gen.

All generators created by calling gen point to this same code, but each has its own stack frame. This stack frame is not on any actual stack; it sits in heap memory, waiting to be used.
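The shared code and the separate frames can both be observed through the gi_code and gi_frame attributes listed earlier. The sketch below assumes a two-yield gen; g1 and g2 are illustrative names.

```python
def gen():
    yield 1
    yield 2

g1 = gen()
g2 = gen()

# Both generators reference the single code object compiled for gen.
same_code = g1.gi_code is g2.gi_code is gen.__code__
code_name = g1.gi_code.co_name

# Each generator owns a distinct frame object.
separate_frames = g1.gi_frame is not g2.gi_frame
print(same_code, code_name, separate_frames)
```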

The frame has a “last instruction” pointer, the instruction it executed most recently. In the beginning, the last instruction pointer is -1, meaning the generator has not begun:

>>> g.gi_frame.f_lasti

-1

When we call send, the generator reaches its first yield, and pauses. The return value of send is 1, since that is what g passes to the yield expression:

>>> g.send(None)

1

The generator's last-instruction pointer is now at byte offset 2, part way through the 22 bytes of compiled bytecode:

>>> g.gi_frame.f_lasti

2
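The exact f_lasti values depend on the Python version and on the body of the function, so a more portable way to observe the same transition is inspect.getgeneratorstate. The sketch below assumes a two-yield gen and records both the state and the raw offset before and after the first send.

```python
import inspect

def gen():
    yield 1
    yield 2

g = gen()
state_before = inspect.getgeneratorstate(g)   # created, not yet started
lasti_before = g.gi_frame.f_lasti             # version-dependent value

first = g.send(None)                          # run to the first yield

state_after = inspect.getgeneratorstate(g)    # paused at a yield
lasti_after = g.gi_frame.f_lasti              # version-dependent value
print(state_before, lasti_before, first, state_after, lasti_after)
```

On the interpreter used in this article the two offsets are -1 and 2; other versions may report different numbers, while the state names stay the same.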

Other generators created from gen will have their own stack frames and local variables.
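As an illustration of that independence, the following sketch advances two generators created from the same function and then reads their local variables with inspect.getgeneratorlocals. The counter function and its start parameter are hypothetical, chosen only so that there is a local variable to look at.

```python
import inspect

def counter(start):
    """Yield start, start + 1, ... forever (hypothetical example)."""
    n = start
    while True:
        yield n
        n += 1

a = counter(0)
b = counter(100)

a_values = [next(a), next(a), next(a)]   # advance a three times
b_first = next(b)                        # advance b once

# Each generator's frame keeps its own copy of the local variable n.
a_locals = inspect.getgeneratorlocals(a)
b_locals = inspect.getgeneratorlocals(b)
print(a_values, b_first, a_locals["n"], b_locals["n"])
```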

Each further call to send resumes the generator from the yield where it paused. Once it has run past its last yield statement, it finishes by raising the special StopIteration exception:

>>> g.send("stop")

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

StopIteration

If the generator returns a value, that value is attached to the StopIteration exception (as its value attribute) and appears in the traceback.
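A short sketch of this behaviour, assuming a variant of gen with a single yield and a return statement (the article's own listing may differ): the value passed to send becomes the result of the yield expression, and the returned value is read from the exception.

```python
def gen():
    received = yield 1           # send() delivers its argument here
    return f"got {received}"     # becomes StopIteration.value

g = gen()
first = g.send(None)             # start the generator; it yields 1

try:
    g.send("stop")               # resume; the body returns, raising StopIteration
except StopIteration as exc:
    returned = exc.value

print(first, returned)
```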

Conclusion

Generators are a powerful feature in Python that enable efficient iteration over large collections. This is possible because a generator's stack frame is stored on the heap rather than on the call stack, so its state survives between calls. Generators are also the basis of other powerful features such as coroutines. In future articles we will explore how generators are used in coroutines and async processing.
