Monday, January 22, 2018

Understanding the self variable in Python classes

If you have defined a class in Python, you have probably encountered the self variable. Here is a simple example:
class Cat(object):
    def __init__(self, hungry=False):
        self.hungry = hungry

    def should_we_feed_the_cat(self):
        if self.hungry:
            print("Yes")
        else:
            print("No")
This is a simple class describing a cat, which has an attribute hungry. The __init__() method is called automatically whenever a new instance of the class is created, while the should_we_feed_the_cat() method belongs to the object and can be called at any time. This method tests the hungry attribute and prints out "Yes" or "No".

Let's initialize a cat
>>> felix = Cat(True)
>>> felix.should_we_feed_the_cat()
Yes
Here we initialized the object with one argument, even though the __init__() method is defined with two parameters. We also test whether the cat is hungry by calling should_we_feed_the_cat() without any argument, even though it is defined with one parameter. So why is Python not complaining about this?

First, we should clarify the difference between a function and a method. A method is a function which is associated with an object. All the functions we defined within our class above become methods when they are accessed through an object of that class. However, we can still access them as plain functions through the class itself:
>>> type(Cat.should_we_feed_the_cat)
<class 'function'>
>>> type(felix.should_we_feed_the_cat)
<class 'method'>
The first lookup, through the class, gives a plain function, while the second, through the object, gives a method bound to felix. A bound method automatically passes the object itself as the first argument. So
Cat.should_we_feed_the_cat(felix)
is the same as
felix.should_we_feed_the_cat()
In the first case we call a function and need to pass in the object manually, while in the second case we call the method and the object is passed automatically.
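To make the connection between the two forms visible, here is a small sketch that inspects the bound method. The __self__ and __func__ attributes are standard on Python 3 bound methods; the only liberty taken is that the method returns its answer instead of printing it, so the two call styles can be compared directly.

```python
class Cat(object):
    def __init__(self, hungry=False):
        self.hungry = hungry

    def should_we_feed_the_cat(self):
        # Assumption for this sketch: return the answer instead of printing it
        return "Yes" if self.hungry else "No"


felix = Cat(True)

# The bound method remembers the object it was looked up on ...
bound = felix.should_we_feed_the_cat
print(bound.__self__ is felix)

# ... and the plain function that lives on the class
print(bound.__func__ is Cat.should_we_feed_the_cat)

# So both call styles end up running the same function with the same object
print(Cat.should_we_feed_the_cat(felix) == felix.should_we_feed_the_cat())
```

All three lines print True: the method is just the class's function with felix already attached as the first argument.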

If we get this wrong and, for example, call the method while also passing in the object, we get an error message which you have probably seen before (I certainly have)
>>> felix.should_we_feed_the_cat(felix)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: should_we_feed_the_cat() takes 1 positional argument but 2 were given
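The count of 2 in the message makes sense once you remember that the object is passed implicitly. A minimal sketch that reproduces the error and catches it (the variable name caught is only used here for illustration; newer Python versions may prefix the function name with the class name):

```python
class Cat(object):
    def __init__(self, hungry=False):
        self.hungry = hungry

    def should_we_feed_the_cat(self):
        print("Yes" if self.hungry else "No")


felix = Cat(True)

try:
    # Equivalent to Cat.should_we_feed_the_cat(felix, felix):
    # one felix is passed automatically, the other one by us
    felix.should_we_feed_the_cat(felix)
except TypeError as error:
    caught = str(error)
    print(caught)
```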

If you really want to avoid the self variable, you can use the @staticmethod decorator, although such a method can no longer access the object's attributes such as hungry
class Cat(object):
    def __init__(self, hungry=False):
        self.hungry = hungry

    @staticmethod
    def other_function():
        print("This function has no self variable")
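As a quick check, a static method can be called on the class or on an object, and in neither case is anything passed in automatically. The sketch below reuses the class above, except that other_function() returns its message instead of printing it so the results can be compared.

```python
class Cat(object):
    def __init__(self, hungry=False):
        self.hungry = hungry

    @staticmethod
    def other_function():
        # No self here, so attributes such as hungry are out of reach
        return "This function has no self variable"


felix = Cat(True)

# Works on the class and on an object; no argument is added implicitly
print(Cat.other_function())
print(felix.other_function())

# Looked up on an object, a static method is still a plain function
print(type(felix.other_function))
```

The last line prints <class 'function'> rather than <class 'method'>, which is exactly why no object is passed in.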
Note that self is not a reserved keyword. You could write the Cat class using any name you want for the first parameter of its methods and it would still work
class Cat(object):
    def __init__(whatever, hungry=False):
        whatever.hungry = hungry

    def should_we_feed_the_cat(whatever):
        if whatever.hungry:
            print("Yes")
        else:
            print("No")
However, self is the accepted convention and you should stick to it.

Given how simple the concept behind the self variable is, you might already be thinking about ways to get rid of it and access the object's attributes some other way. If you have experience with other programming languages like Java, you might prefer to just have a pre-defined keyword. However, that does not seem to be an option in Python. If you are interested, here is a blog post by Guido van Rossum arguing for keeping the explicit self variable.

I hope that was useful. Let me know in the comments section below if you have any questions or comments.
cheers
Florian
