My statistics blog: Flask

Showing posts with label Flask. Show all posts

Monday, March 26, 2018

Context managers in Python

A context manager in Python is a way to manage resources. Typical uses of context managers include locking and unlocking resources like database connections and closing, opening files.

Context managers are used together with the with statement. You probably encountered the with statement at some point when coding in Python, the most often used case is file handling

with open('data.dat', 'r') as file: 
   contents = file.read()

However, files do not have to be opened like this, instead, you could also use

file = open('data.dat', 'r')
contents = file.read()
file.close()

The problem with this code is that the file.close() method might never be called if an exception happened after the opening of the file. If the file is kept open, you have a resource leak, and the system may slow down or crash since the number of available file handlers is finite.

A more robust version of dealing with a file could use the try... finally statement like this

file = open('data.dat', 'r')
try:
   contents = file.read()
finally:
   file.close()

So even if there is an exception while reading the file, the finally clause always ensures that the file is closed again. This is basically how the with statement is implemented.

The with statement has two minor advantages over the try... finally statement above: (1) it is a little bit shorter and (2) it does not require calling the close() method explicitly, since the with statement calls it automatically, whenever the program leaves the with statement (indented block below the with statement).

So the main purpose of the with statement is to guarantee the execution of startup and cleanup actions around a block of code, with the main application being resource management.

So how would we write a context manager? We can generate this functionality using a class

class MyOpen(object):
   def __init__(self, filename):
      self.filename = filename

   def __enter__(self):
      self.file = open(self.filename)
      return self.file

   def __exit__(self, ctx_type, ctx_value, ctx_traceback):
      self.file.close()
      return True

This class has two methods which are specifically used by the with statement, the __enter__() and __exit__() methods. The __enter__() method is called at the beginning of the with statement. The __exit__() method is called when leaving the with statement and could handle exceptions, even though this is ignored in the above example.

You might wonder what is the difference between __init__() and __enter__(). The __init__() method is called when the class object is created, which does not need to be the beginning of the with statement. This difference allows us to produce reusable context managers like this

file_handler = MyOpen('data.dat')
with file_handler as file:
    contents = file.read()

In this case, the __init__() method is called in line one, when the class object is created, the __enter__() is called in line two when the with statement begins and the __exit__() method is called in line three when the with statement ends.

So to produce a context manager you just have to create a class with two special methods: __enter__() and __exit__(). The __enter__() method returns the resource to be managed, like a file object in the case of open(). The __exit__() method does any cleanup work.

Hope that was useful and please let me know if you have any comments or questions in the comment section below.

cheers
Florian

Tuesday, February 20, 2018

Error handling within a flask application

Any website should somehow be able to deal with unexpected errors. In this blog post, I will describe how I handle errors within my Flask application, which follows this great tutorial.

In Flask, it is quite easy to register error handlers which re-direct to a custom error page like this

from flask import render_template

@app.errorhandler(404)
def page_not_found(e):
    return render_template('404.html'), 404

However, rather than going through the list of possible errors and creating a route and page for each, one can also create a general error handler which handles all errors. To do that you have to register a general error handler in your app like this

from werkzeug.exceptions import default_exceptions

def init_app(app):
    ''' Function to register error_handler in app '''
    for exception in default_exceptions:
        app.register_error_handler(exception, error_handler)

    app.register_error_handler(Exception, error_handler)

Calling the init_app(app) function will register the error_handler() function for all exceptions in the default_exceptions list of the werkzeug package, which should cover all errors except for some exotic ones. Now we just have to write the error_handler() function.

def error_handler(error):
    ''' Catches all errors in the default_exceptions list '''
    msg = "Request resulted in {}".format(error)
    if current_user.is_authenticated:
        current_app.logger.error(msg, exc_info=error)
    else:
        current_app.logger.warning(msg, exc_info=error)

    if isinstance(error, HTTPException):
        description = error.get_description(request.environ)
        code = error.code
        name = error.name
    else:
        description = ("We encountered an error "
                       "while trying to fulfill your request")
        code = 500
        name = 'Internal Server Error'

    templates_to_try = ['errors/error{}.html'.format(code), 'errors/generic_error.html']
    return render_template(templates_to_try,
                           code=code,
                           name=Markup(name),
                           description=Markup(description),
                           error=error), code

This function first checks whether the error has been caused by an authenticated user, and if so it writes an error message to the logfile. If the user was not authenticated, it only writes a warning message. The reason for this distinction is that robots cause all sorts of 404 errors in my app, which I don't care much about, but if a user causes an exception, I definitely want to know about it.

In the next step, the handler extracts all information it can get from the error message and then renders a template. The render_template function will go through the templates_to_try list until it finds an existing template, so this way you can register custom pages for certain errors but if a custom error does not exist, it will just render the general error page, which in my case looks like this

{% extends "master.html" %}

{% block title %}Error{% endblock %}

{% block description %}
<meta name="description" content="Error message.">
{% endblock %}

{% block body %}

    <div id="navbar_wrapper">
        <div id="site_content">
            <div class="container">
                <div class="col-xs-12 col-sm-9 col-md-10 col-lg-10">
                    <br>
                    <h1>{{ code }}:{{ name }}</h1>
                    <p>{{description}}</p>
                    <p>The administrator has been notified. Sorry for the inconvenience!</p>

                    <button class="btn btn-primary" onclick="history.back(-1)">Go Back</button>

                </div>
            </div>
        </div>
    </div>

{% endblock %}

The advantage of such a custom error page is that you can add a back button, to get the users back on your page if you haven't alienated them yet with the error.

I put the entire example on Github. Let me know if you have any comments or questions.
cheers
Florian

Wednesday, January 31, 2018

The N+1 problem in relational databases and eager loading options in SQLalchemy

In the first part of this post, I will explain the N+1 problem in relational databases. The second part will show how one can deal with this problem in Flask-SQLalchemy.

Assume you have a user table and each user has bought a number of products, listed in a sales table. Let's assume you want a list of all products bought by users in the last 2 days. You can do that by querying the user table to get all users

SELECT * FROM User;

and then for each user, you query the number of products they have bought

SELECT * FROM Sales WHERE user_id = ? AND date > two_days_ago

In other words, you have one Select statement for the user table followed by N additional Select statements to get the associated products, where N is the total number of users.

Each access to the database has a certain overhead which means that the procedure above will scale quite badly with the number of users.

Give that, for each user we intend to perform the same query, there should be a way to do this more efficiently rather than using N+1 calls to the database. Most object-relational mappers (ORMs) such as SQLalchemy, give you several tools to deal with this problem. The SQLalchemy manual discusses this issue here.

However, I certainly noticed that ORMs hide the N+1 problem. If you would write pure SQL, you would directly see the number of Select statements you submit, while in SQLalchemy, it is not obvious when the data is loaded and how many times the database is accessed. So it is quite important to understand the loading options in SQLalchemy and use them appropriately.

Let's build the example discussed above using Flask-SQLalchemy

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    ...

    products = db.relationship('product',
                               secondary=sales)

class Product(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    date = db.Column(db.DateTime)
    ...

sales = db.Table(
    'sales',
    db.Column('product_id', db.Integer, db.ForeignKey('product.id')),
    db.Column('user_id', db.Integer, db.ForeignKey('user.id'))
)

Here we have a User and a Product table and the helper table 'sales' establishes a many-to-many relationship between the two tables. Using the helper table I added a products attribute to each user object, which allows us to access all products this user bought.

We now have two main choices for how we can implement the products relationship (db.relationship above). The default setting of SQLalchemy is lazy loading, which means that the related rows in the sales table are not loaded together with the user object. This is a reasonable choice for a default setting, but not what we want here. Lazy loading means we will run into the N+1 problem since whenever we want to have the products for each user we have to run another N database calls.

To solve this problem we have to make use of the lazy keyword, which by default is set to 'select' like this

products = db.relationship('product',
                           secondary=sales,
                           lazy='select')

So let's go through the options we have for this keyword. One very useful option is lazy='dynamic' like this

products = db.relationship('product',
                           secondary=sales,
                           lazy='dynamic')

which means that user.products actually returns a query object rather than table rows. This gives you a lot of flexibility how the second database access might look. For example, you could add additional filters like this

user.products.filter_by(Products.date > two_days_ago).all()

This can be very useful e.g. if there are many rows in the products table, the additional filter might make the second database access much quicker. But this is of course not solving our problem since it definitely needs a second database access. To avoid the N+1 problem we need to load the rows in the sales table together with the users.

To do this we have to make use of eager loading and SQLalchemy provides three choices for that: joined loading (lazy='joined'), subquery loading (lazy='subquery') and select IN loading (lazy='selectin').

Let's start with the subquery loading since that one is often a good choice. Here two Select statements are run, one to retrieve all users and one to retrieve all related rows in the sales table. So rather N+1 we have 2 Select statements.

Alternatively, joined loading squeezes everything into one Select statement. So we save another Select statement compared to subquery loading, but the downside is that if the products table is large, this can become very slow since it makes use of an out left join.

Finally, the selectin loading is the newest option in SQLalchemy and the recommended one according to the docs. It only queries 500 rows per Select statement, so if you need to retrieve many products, you might end up with several Select statements, but in my experience, this option works very well.

Note that most of the time you do not want to add this loading option to the mapping, but instead set the loading style at runtime. You can do that with joinedload(), subqueryload(), selectinload() and lazyload(), which can be imported as

from sqlalchemy.orm import subqueryload, joinedload, selectinload

and used like

users = User.query.options(selectinload(User.products)).all()

I hope that was useful. Let me know if you have any question/comments below.
best
Florian