Monday, June 26, 2017

Building a 'search as you type' engine for a Python Flask application using Javascript

In this blog post, I will describe how to build a 'search as you type' feature for a Flask application together with a Javascript frontend. 'Search as you type' means that the search results are displayed while the user is typing into the search field. Google has now included this feature in their search engine.

In my case, this search feature is used to search job advertisements in academic research, and you can see it here in action.

We start with the HTML markup. All we are going to do here is to define a search field, where the user can type something, and a target field, where we are going to show the result:
<fieldset>
     <div class="form-group">
         <div class="col-xs-10 col-sm-8 col-md-6 col-lg-6">
             <input type="text" class="form-control" id="job_query" 
             placeholder="search term" value="" oninput="update_job_table()">
        </div>
    </div>
</fieldset>
<input id="timestamp" value="0" hidden>

<table class="table table-striped">
    <thead>
        <!-- whatever -->
    </thead>
    <tbody id="job_market_table">
    </tbody>
</table>
You can see that if the user types something into the search field, the Javascript function update_job_table() is called. In this function, we now have to take the current search string, send it to our Flask application, perform a quick search in the database and return the results for this search term.

To send the current search string to Flask we use Javascript. We are going to show a spinner symbol while the user waits for the results, so I include the spinner.js package. It is only about 4 kB when minified, so it does not add much overhead.

You can set up the spinner by defining the following options:
var opts = {
    lines: 9,    // The number of lines in the spinner
    length: 10,  // The length of each line
    width: 3,    // The line thickness
    radius: 6,   // The radius of the inner circle
    corners: 1,  // Corner roundness (0..1)
    rotate: 58,  // The rotation offset
    direction: 1, // 1: clockwise, -1: counterclockwise
    color: '#000000', // #rgb or #rrggbb or array of colors
    speed: 0.9,  // Rounds per second
    trail: 100,  // Afterglow percentage
    shadow: false, // Whether to render a shadow
    hwaccel: false, // Whether to use hardware acceleration
    className: 'spinner', // The CSS class to assign to the spinner
    zIndex: 99,  // the modals have zIndex 100
    top: 'auto', // Top position relative to parent
    left: '50%'  // Left position relative to parent
};
Now we can write the Javascript which sends the search term to the Flask app using Ajax:
function update_job_table() {
    var job_table = document.getElementById('job_market_table');
    job_table.innerHTML = "<tr><td colspan='5'>Updating table</td></tr>";
    var spinner = new Spinner(opts).spin(job_table);

    var search_term = document.getElementById('job_query').value;
    var doc = { 'search_term': search_term };
    var timestamp = +new Date();
    doc['timestamp'] = timestamp;
    $.ajax({
        url: $SCRIPT_ROOT + '/search_job_market',
        type: "POST",   
        async: true,
        cache: false,
        dataType: 'json',
        contentType: "application/json; charset=utf-8",
        data: JSON.stringify(doc, null, '\t'),
        success: function(data) {
            if(data.error){
                console.log("Error = ", data.error);
            }
            else{
                var results_timestamp = document.getElementById('timestamp');
                if(data.timestamp > parseInt(results_timestamp.value)){
                    spinner.stop();
                    results_timestamp.value = data.timestamp;
                    job_table.innerHTML = data.html;
                }
            }
        }
    });
}
So we just access the search term through its id and send it as a JSON object with a POST request to the Flask /search_job_market view. The $SCRIPT_ROOT variable holds the root path of the application; it can be set in the base template from Flask's request.script_root.

Now we have to handle this request in the /search_job_market view:
@app.route('/search_job_market', methods=['POST'])
def search_job_market():
    ''' 
    We get here if the user types into the job market search field
    '''
    if request.method == 'POST': 
        html = get_job_market_table_html(request.json)
        return json.dumps({'success': 'search successful', 'html': html, 
               'timestamp': request.json['timestamp']}),\
               200, {'Content-Type': 'application/json'}
The function get_job_market_table_html() retrieves the matching jobs from the database and renders the HTML. My setup uses an Elasticsearch database, but of course any database system can be plugged in here. You should make sure that your setup returns a meaningful result in a reasonably short time, since a search is triggered on every keystroke.
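For completeness, here is a minimal sketch of what such a function could look like with Elasticsearch; the es client, the index and field names, and the job_table_rows.html template are illustrative assumptions, not the actual benty-fields code:
from flask import render_template

def get_job_market_table_html(query_json):
    ''' Sketch: search the job index and render the table body rows '''
    result = es.search(index='jobs', body={
        'query': {
            'multi_match': {
                'query': query_json['search_term'],
                'fields': ['title', 'description', 'institution']
            }
        },
        'size': 20
    })
    # keep only the stored documents and hand them to a Jinja template
    jobs = [hit['_source'] for hit in result['hits']['hits']]
    return render_template('job_table_rows.html', jobs=jobs)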

You might have noticed that I pass a timestamp along with every request. The reason is to ensure that we only update the user view with newer results. Imagine the user types 'science' into the search field. This triggers searches for 's', 'sc', 'sci', 'scie', 'scien', 'scienc' and 'science'. It is possible that the search for 's' takes longer and its result gets sent back to the page after the result for 'sc' has already been displayed. Using the timestamp, our setup simply ignores the older result.

I posted the entire code on GitHub. Feel free to leave comments or questions below.
cheers
Florian

Wednesday, June 21, 2017

Lazy load images with Javascript depending on the user view window

In this blog post, I will describe how I added lazy load of images to my website. If you have a page with many images, it is very useful to not load all images at page load but to load the images on demand. This is especially useful for people with bad internet connections or people who visit your page on mobile devices.

The first step is to load a spinner symbol instead of the actual image and to write the real image path into the data-src attribute, so that the HTML looks something like this:
<img class="lazy_load_image" src="/static/spinner.gif" 
data-src="{{ path_to_image }}" alt="{{ image_name }}" 
style="width:300px;height:400px;">
With this markup, only the spinner image will be loaded on page load. You should, of course, make sure that the spinner is a small image; there are many lightweight examples you can find online.

Now, after page load, we want to load the images that are actually visible, and we do this with Javascript by adding an event listener:
// This code lazy loads the images in the viewport after the page load
window.addEventListener('load', function(){
    load_lazy();
}, false)
The load_lazy() function just writes the data-src attribute of the HTML tag above into the src attribute for all images which are in the viewport:
// This function loads all images in the viewport
function load_lazy(){
    // we expand the viewport by 200px
    var images = $('.lazy_load_image').withinviewport({top: -200, bottom: -200});
    for(var i = 0; i < images.length; i++){
        var image_src = images[i].getAttribute('data-src');
        if(image_src){
            images[i].setAttribute('src', image_src);
            images[i].classList.remove('lazy_load_image');
        }
    }
}
For this to work you have to download and include withinviewport.js (bower install within-viewport).
Here I extended the viewport by 200px at the top and bottom, so that images which are only partially scrolled into view get loaded as well and the user is not left staring at a spinner.

The last step is to make sure that images get loaded if the user scrolls down, which we can do by adding
// This code lazy loads the images in the viewport after scrolling
$(window).scroll(function(){
    load_lazy();
})
This function gets called whenever the user scrolls, and if a new image has entered the user view, it will be loaded by the load_lazy() function.

Hope that was useful. I posted the entire code on GitHub.
cheers
Florian

Friday, June 16, 2017

Setting up the Printful API communication with Python

I recently set up a little shop on my website (www.benty-fields.com/benty-shop) using Printful.com. Here I will explain how to do this using Python.

If you want to run an online business using printing services like Printful, by far the easiest approach is to use Shopify or any other online business solution. However, I just wanted to attach a small shop to my own website. For such a case Printful provides an API, which however requires a little bit of work.

I am running my app using Flask in combination with a Postgres database behind the SQLAlchemy ORM. First I selected the products I wanted to sell and stored them in my database. I will not discuss the details of my database setup here, but the easiest way for you would be to copy the Printful product database layout.

I then built an HTML/Javascript frontend for my store, which you are welcome to visit at www.benty-fields.com/benty-shop. Whenever a customer selects a product, I create an order object in my database, which requires the variant_id from the Printful API. A simple solution would be to download the entire product and variant list from Printful; then no API call would be needed to look up the variant_id, which means a quicker response to any customer action. I opted instead to write a small function which queries the Printful product listing and returns the variant id. This way I stay flexible against any changes in the Printful variant_id list.

Here is the function I use:
def get_printful_variant_id(order):
    '''
    Query the Printful API and return the variant_id
    matching the size and color of the given order
    '''
    # Printful's product endpoint returns the product together with its
    # list of variants
    url = (app.config['PRINTFUL_API_BASE'] +
           'products/%d' % order.product.printful_product_id)
    try:
        response = requests.get(url)
    except requests.exceptions.RequestException as e:
        print("ERROR: %s" % str(e))
        return False
    if response.status_code == 200:
        data = json.loads(response.text)
        # We count the number of matching variants we find,
        # just to be sure that our variant selection is unique
        count = 0
        # Find the variant specified by the color and size of the order
        for variant in data['result']['variants']:
            # First we select the size (if size is not null).
            # Size is null if the product comes in only one size
            if variant['size'] == 'null':
                # There is no case I know of where size and color are null
                if variant['color'] == order.color:
                    variant_id = variant['id']
                    count += 1
            elif variant['size'] == order.size:
                # Second we select the color (if color is not null).
                # Color is null if the product comes in only one color
                if variant['color'] == 'null':
                    variant_id = variant['id']
                    count += 1
                elif variant['color'] == order.color:
                    variant_id = variant['id']
                    count += 1
        # Make absolutely sure that size and color define a unique variant
        # for this product
        if count == 0:
            print("ERROR: No variant found with size %s and color %s "
                  "for product id %d" % (order.size, order.color,
                                         order.product.printful_product_id))
            return False
        elif count > 1:
            print("ERROR: Size and color are not enough to define the variant "
                  "for product id %d" % order.product.printful_product_id)
            return False
        else:
            return variant_id
    else:
        print("ERROR: Printful API call unsuccessful, code = %d, "
              "message = %s" % (response.status_code, response.text))
        return False
This code uses the size and color to find the variant id. It assumes that size and color together are enough to determine the variant_id; I have not come across any Printful product with a third dimension. The code nevertheless checks that exactly one variant_id matches the specification, and otherwise returns False.
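For orientation, the decoded response that the loop above walks through looks roughly like this; only the fields the code uses are shown and the values are made-up placeholders (for a product that comes in only one size or one color, the corresponding field is null):
data = {
    'result': {
        'variants': [
            {'id': 1001, 'size': 'S', 'color': 'White'},
            {'id': 1002, 'size': 'M', 'color': 'White'},
            {'id': 1003, 'size': 'M', 'color': 'Black'},
        ]
    }
}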

Having set the variant id, it is quite easy to submit a Printful order once the customer has provided the payment details. Here is my function for that:

def create_printful_order(order):
    ''' This function submits a printful order '''
    order_json = {
        "recipient": {
            "name": order.shipping_name,
            "address1": order.shipping_address,
            "city": order.shipping_city,
            "state_code": order.shipping_state_code,
            "country_code": order.shipping_country_code,
            "zip": order.shipping_zip_code
        },
        "items": [{}]
    }
    order_json['retail_costs'] = { "shipping": order.shipping_cost }
    items = []
    # Process each item in the order and attach them to the json object
    for order_item in order.items:
        item = {
            "variant_id": order_item.variant_id,
            "quantity": order_item.quantity,
            "name": order_item.product.title,
            "retail_price": order_item.product.price,
            "files": [{
                "id": order_item.print_file_id
            }]
        }
        items.append(item)
    order_json['items'] = items
    url = app.config['PRINTFUL_API_BASE'] + 'orders'
    b64Val = base64.b64encode(app.config['PRINTFUL_API_KEY'])
    headers = {'content-type': 'application/json',
               'Authorization': "Basic %s" % b64Val}
    try:
        response = requests.post(url, data=json.dumps(order_json),
                                 headers=headers)
        print("response = ", response.status_code, response.text)
        return True, response
    except requests.exceptions.RequestException as e:  
        print("ERROR: When submitting order with requests, "
              "error message: %s" % str(e))
        return False, e
Note that you have to base64-encode your API key, otherwise the Printful API will not understand your request. It should be quite easy to replace my order_item object with whatever database setup you are using.
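To give an idea of how the return value can be handled, here is a minimal usage sketch; the submitted_to_printful flag and the db.session handling are assumptions for illustration, not my actual models:
success, response = create_printful_order(order)
if success and response.status_code == 200:
    # the order was accepted by Printful, so mark it accordingly
    order.submitted_to_printful = True
    db.session.commit()
else:
    print("ERROR: Printful order submission failed for order %d" % order.id)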

I posted the code on GitHub. Let me know if you have any comments.
cheers
Florian

Sunday, June 11, 2017

Combine and minify CSS/JS files with the minify API to minimize page loading time

Page loading time is a major factor in customer satisfaction with a website, as many studies have shown, e.g. read this blog post. One step towards fast loading times is to minimize the amount of data that needs to be transferred before the page can be shown to the user.

In this blog post, I will talk about my approach to combining and minifying all CSS and Javascript files. Combining the files reduces the number of requests, and minifying reduces the number of bytes that have to be sent to the customer. There are build tools like Grunt and Gulp which allow you to do this in a more structured way, but here I will just discuss my own quick fix.

I wrote a small Python script for this task which makes use of the minify API to process the CSS and Javascript files. This is how it looks:
def compressor(api_url, output_name, filenames):
    '''
    Here we use the minify API 
    js files need to be sent to api_url = https://javascript-minifier.com/raw
    css files need to be sent to api_url = https://cssminifier.com/raw
    '''
    code = []
    total_cost = 0
    for fn in filenames:
        if fn.startswith('http://') or fn.startswith('https://'):
            response = requests.get(fn)
            if response.status_code == 200:
                code.append( response.text )
            else:
                print('ERROR: "%s" is not a valid url! Exit with '
                      'status code %d' % (fn, response.status_code))
                return False
        else:
            if not os.path.isfile(fn):
                print('ERROR: "%s" is not a valid file!' % fn)
                return False
            code.append( open(fn).read().decode('utf-8') )
        cost = len(code[-1]) / 1024.0
        total_cost += cost
        print("added %s with (%.2fK)" % (fn, cost))
    payload = {'input': u' '.join(code)}
    response = requests.post(api_url, payload)
    if response.status_code == 200:
        outfile = open(output_name, 'w')
        outfile.write(response.text.encode('utf-8'))
        outfile.close()

        print('-' * 50)
        print('>> output: %s (%.2fK) from (%.2fK)' %
              (output_name, len(response.text)/1024.0, total_cost))
        return True
    else:
        print('ERROR: minify API call to "%s" failed with '
              'status code %d' % (api_url, response.status_code))
        return False
This function takes the API URL, the output file name and a list of input files. It concatenates all input files and sends the code to the minify API, where it is processed and sent back. If you provide a URL instead of a file path, the function downloads the content for you, so you can build your js bundle entirely from CDN links.
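For example, a call that bundles the Javascript of a site could look like this (the file names are placeholders for your own assets):
compressor('https://javascript-minifier.com/raw',
           'static/js/bundle.min.js',
           ['static/js/search.js',
            'static/js/lazy_load.js',
            'https://code.jquery.com/jquery-3.2.1.min.js'])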

So what does the minify API do? Let's take an example. I have a function in my Javascript which toggles the display style of an HTML element like this:
// Every question mark on the site is handled through this function
function toggle_help(target_id) {
    var target_object = document.getElementById(target_id)
    if( target_object.style.display == 'none' ){
        target_object.style.display = 'block';
    }
    else{
        target_object.style.display = 'none';
    }
} 
After feeding it into the minify API it looks like this:
function toggle_help(e){
    var t = document.getElementById(e);
    "none"==t.style.display?t.style.display="block":t.style.display="none"
}

So minify removed all comments and unnecessary whitespace and renamed the variables to one-letter names. The purpose of this is of course just to minimize the number of bytes.

It is certainly not a good idea to work with the minified code directly; it is an unreadable mess (in the example above I reintroduced some whitespace and indentation to make it readable). So my approach is to write my code nicely separated into many files, organized by functionality, and to put them all together only right before deployment.

To process js files you have to call the function with
api_url = 'https://javascript-minifier.com/raw'
while for CSS files, you have to call the function with
api_url = 'https://cssminifier.com/raw'
I put this code on GitHub. Let me know if you have any questions/comments below.
cheers
Florian

Tuesday, June 6, 2017

Building a recommendation engine for arXiv papers using Python, sklearn and NLP

In this blog post, I will describe how I built a recommendation engine for arXiv papers for the web application www.benty-fields.com.

benty-fields is used by researchers in physics, mathematics and computer science to follow the latest publications and organize journal clubs. The website collects data on how users interact with publications, for example whether they vote for a paper or put it in their personal library. Here I will use these data to build a recommendation engine.

There are two different use cases for this recommendation engine:

(1) When a user visits the website to read the latest publications, I want the list of publications ordered so that the most interesting publications for this user are at the top.
(2) I want to present a list of publications to the user under the title 'papers you might have missed'. This list should include papers from the last 3 months which the user has not shown any interest in yet (not voted for or put in their library), but which might be of interest.

We will use exactly the same model for both cases. So let's get started.

First, we have to think about which features we want to use to train a model. The data we have available are the papers the user liked in the past. These papers have abstracts, titles, authors, publication dates and a lot more information we could use. Here we will restrict the analysis to the abstract, title and author list. I tested other features, but the most useful feature seems to be the abstract, with the title and author list adding only marginal information.

First we turn the list of abstracts, titles and authors into a feature vector:
from sklearn.feature_extraction.text import TfidfVectorizer

# We calculate the tf-idf for each feature
vectorizer = TfidfVectorizer(analyzer="word", tokenizer=None,
                             stop_words="english", max_features=5000)
# fit_transform() serves two functions: First, it fits the model
# and learns the vocabulary; second, it transforms our training data
# into feature vectors. The input to fit_transform should be a list 
# of strings.
train_data_features1 = vectorizer.fit_transform(list_of_abstracts)
train_data_features2 = vectorizer.fit_transform(list_of_titles)
train_data_features3 = vectorizer.fit_transform(list_of_authors)
I use the TfidfVectorizer() from the scikit-learn package to turn the text into a matrix of papers vs. term frequency-inverse document frequency (tf-idf) scores. tf-idf weighs the frequency of a term in a document against how often that term appears in the other documents. If a term appears in every document, its tf-idf score is very small, since the term does not have much distinctive power; if a term appears in only a few documents, it is very useful for separating papers into categories and therefore gets a large tf-idf score.
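As a quick illustration of this weighting, here is a toy example (the three mini 'documents' are made up and not benty-fields data):
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["dark matter halo model",
        "dark energy survey data",
        "galaxy halo occupation model"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)
# 'dark', 'halo' and 'model' each appear in two documents, so they receive
# a lower idf (and therefore a lower tf-idf weight) than terms like
# 'energy' or 'survey', which appear in only one document
print(dict(zip(vectorizer.get_feature_names(), X.toarray()[0].round(2))))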

Before I feed the abstract, title and author list into the vectorizer I pre-process the data:
import re

import nltk
from nltk.stem.porter import PorterStemmer
# nltk.word_tokenize requires the 'punkt' tokenizer data
# (available via nltk.download('punkt'))

stemmer = PorterStemmer()

def stem_tokens(tokens, stemmer):
    stemmed = []
    for item in tokens:
        stemmed.append(stemmer.stem(item))
    return stemmed

def tokenize(text):
    # remove numbers
    text_wo_numbers = re.sub(r'\d+', '', text)
    # remove single letters
    text_wo_single_letters = re.sub(r'\b\w\b', ' ', text_wo_numbers)
    # remove punctuation 
    text_wo_punctuation = ' '.join(re.findall('\w+',
                                              text_wo_single_letters))
    tokens = nltk.word_tokenize(text_wo_punctuation)
    stems = stem_tokens(tokens, stemmer)
    return stems

def pre_process_text(list_of_texts):
    output_list_of_texts = []
    for text in list_of_texts:
        output_list_of_texts.append(' '.join( tokenize(text) ) )
    return output_list_of_texts
This code uses the nltk library to remove numbers, punctuation and one-letter words from the abstract and title. It also reduces words to their stems (e.g. equations -> equat), which ensures that e.g. equation and equations map to the same token. I leave the removal of stop words to TfidfVectorizer(). The author list is not processed in this way, since it does not make much sense to turn names into stems.
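A quick sanity check of this pre-processing (the example sentence is mine):
# numbers are dropped and the remaining words are stemmed,
# e.g. 'equations' -> 'equat' and 'numerically' -> 'numer'
print(pre_process_text(["We solve 42 coupled equations numerically."]))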

Next, we can use these features to train a model. I am using a Random Forest model which I train for each user on that user's liked papers, together with an equally sized negative paper set (papers the user does not seem to be interested in). The negative paper set is picked from the same time period as the positive papers, and I make sure that none of those papers is present in the positive paper set.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Now we get the negative paper examples
negative_papers = get_negative_arxiv_papers(user, len(input_papers))

paper_sample = input_papers + negative_papers
y = [1]*len(input_papers) + [0]*len(negative_papers)
y = np.array(y)
        
X1, X2, X3 = prepare_data(paper_sample, user, write=True)
X_train = np.c_[X1, X2, X3]

forest = RandomForestClassifier(n_estimators=100)
# forest now stores the fit information used later for the prediction
forest.fit(X_train, y)
The function get_negative_arxiv_papers() is very specific to how I store the data, so I will not discuss its details here. If you are interested in the details, I posted the complete code on GitHub. The function prepare_data() contains the data cleaning and vectorization steps discussed above.
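The prediction function further down loads this model from disk, so after training the forest needs to be persisted. Here is a minimal sketch of how that can look, following the same path convention used there:
import pickle

model_file = ("%s/recommendation_model_%d.pickle" %
              (app.config['ML_FOLDER'], user.id))
with open(model_file, "wb") as f:
    pickle.dump(forest, f)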

However, it is important to pick the correct set of negative papers. In my case I could, for example, pick just any paper in the arXiv library. The problem is that users follow only certain arXiv categories, so the fact that a user who follows the category 'Statistics' has never shown any interest in papers from the category 'Optics' is not very surprising since the user has probably never seen these papers.

The model developed here is supposed to order the papers in the category the user is looking at, so it has to distinguish between papers in the user specific category. Therefore I pick the negative papers also from this category. It might be interesting to train a more general model and suggest papers outside the user's specified field of interest, but this is not the purpose of this model. Note that these choices impact the performance of the model significantly, so I did a lot of tests to make sure to pick a representative negative dataset. It is important to pick the training data according to the questions you are trying to answer.

Ok, now that we have a model, we can test it. One question we need to answer is when we want to use the model at all. For many users the available training data are very small, simply because many users are not very active. If the model has to be trained on such a small dataset, it might be better to put it aside and instead recommend the most popular papers to the user.

So when should we use the model to derive recommendations and when is it better to instead use the most popular papers? Here is a plot which shows the precision of the random forest model as a function of the size of the training dataset.

Figure 1: Precision reached for the Random Forest model as a function of the size of the
training dataset for each user. The precision is calculated as the average over 10 cross-validation steps.
Figure 1 clearly shows that the error bars on the precision grow as the datasets get smaller, and as soon as the error bar includes a precision of 0.5, the model basically behaves like a random selection. From this plot I determined that fewer than 50 training papers do not result in a predictive model. I use this threshold to distinguish active users, for whom the model provides the paper recommendations, from less active users, for whom I recommend the most popular papers instead.

So here is the function which uses the random forest model to rank papers by predicted user interest:
def get_tf_idf_ranking(papers, user):
    ''' This function is used to rank papers by user interest '''
    X1, X2, X3 = prepare_data(papers, user, read=True)
    X_pred = np.c_[X1, X2, X3]

    model_file = ("%s/recommendation_model_%d.pickle" %
                  (app.config['ML_FOLDER'], user.id))
    forest = pickle.load( open( model_file, "rb" ) )

    # Use the model for prediction
    y_proba = forest.predict_proba(X_pred)

    return y_proba[:,1]
I just load the random forest model and use its predict_proba() function. The preparation of the data in prepare_data() is very similar to what I did above, but now I read in the feature list instead of creating a new one.
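The returned probabilities can then be used to order the paper list, for example like this (a sketch, not the exact benty-fields code):
scores = get_tf_idf_ranking(papers, user)
# sort the papers by predicted probability of interest, highest first
ranked_papers = [paper for score, paper in
                 sorted(zip(scores, papers),
                        key=lambda pair: pair[0], reverse=True)]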

My code to test the recommendation model is as follows:
import math

from sklearn.model_selection import KFold
from sklearn.metrics import (auc, average_precision_score,
                             precision_recall_curve, roc_curve)

def get_auc_through_cross_validation(user):
    ''' 
    This function calculates the area under the curve (AUC)
    for the paper prediction model
    '''
    input_papers = paper_click_data(user)
    negative_papers = get_negative_arxiv_papers(user, len(input_papers))

    paper_sample = input_papers + negative_papers
    y = [1]*len(input_papers) + [0]*len(negative_papers)
    y = np.array(y)
    
    X1, X2, X3 = prepare_data(paper_sample, user, write=False)
    X = np.c_[X1, X2, X3]
    
    metrics = { 
        'num_cases': len(X),   
        'curves': [],
        'aucs': [],
        'pr': []
    }
    cross_validation_steps = 10
    kf = KFold(n_splits=cross_validation_steps, shuffle=True)
    forest = RandomForestClassifier(n_estimators=100)

    for train, test in kf.split(X):
        X_train, X_test = X[train], X[test]
        y_train, y_test = y[train], y[test]

        forest.fit(X_train, y_train)
        probabilities = forest.predict_proba(X_test)[:,1]
        precision, recall, thresholds = precision_recall_curve(y_test,
                                                               probabilities,
                                                               pos_label=1)
        thresholds = np.append(thresholds, np.array([1]))

        false_positive_rate, true_positive_rate, thresholds2 = roc_curve(
            y_test, probabilities, pos_label=1)
        roc_auc = auc(false_positive_rate, true_positive_rate)
        print(roc_auc)
        if not math.isnan(roc_auc):
            av_pr = average_precision_score(y_test, probabilities)

            case_rate = []
            for threshold in thresholds:
                case_rate.append(np.mean(probabilities >= threshold))

            curves = {
                'thresholds': thresholds,
                'precision': precision,
                'recall': recall,
                'case_rate': case_rate,
                'fpr': false_positive_rate,
                'tpr': true_positive_rate
            }
            metrics['curves'].append(curves)
            metrics['aucs'].append(roc_auc)
            metrics['pr'].append(av_pr)

    plot_cross_validation_result(user, metrics)
    return metrics
This code calculates a large set of evaluation metrics, including Receiver Operating Characteristic (ROC) and precision-recall curves for each model (shown in Figure 2).

Figure 2: Precision-Recall curve and ROC curve for the Random Forest model of 
the most active user on benty-fields.com.

The different lines correspond to the 10 cross-validation steps. The area under the curve (AUC) tells us the predictive power of the model, with 1 being perfect, and 0.5 representing a random selection. Our AUC of about 0.8 is pretty good and given that we have the cross-validation results we can also calculate an error for this number.

The code to produce these plots is
import matplotlib.pyplot as plt

def plot_cross_validation_result(user, results):
    # ROC Curve plot
    # average values for the legend
    auc_av = "{0:.2f}".format(np.mean(results['aucs']))
    auc_sd = "{0:.2f}".format(np.std(results['aucs']))

    plt.clf()
    plt.figure(2)
    # plot each individual ROC curve
    for chart in results['curves']:
        plt.plot(chart['fpr'], chart['tpr'], color='b', alpha=0.5)
    plt.plot([0,1],[0,1],'b--')
    plt.xlabel('False positive rate')
    plt.ylabel('True positive rate')
    plt.title('Cross-validated ROC for user %d (sample size %d)' % 
              (user.id, len(user.paper_click_objects)))
    plt.text(0.6, 0.1, r'AUC = {av} $\pm$ {sd}'.format(av=auc_av, sd=auc_sd))
    plt.savefig("%s/user_like_roc_curve2_%d.png" % 
                (app.config['STATISTICS_FOLDER'], user.id))
    # Precision-recall plot
    pr_av = "{0:.2f}".format(np.mean(results['pr']))
    pr_sd = "{0:.2f}".format(np.std(results['pr']))

    plt.clf()
    plt.figure(4)
    for chart in results['curves']:
        plt.plot(chart['recall'], chart['precision'], 
                 color='b', alpha=0.5)
    plt.xlabel('Recall')
    plt.ylabel('Precision')
    plt.title('Cross-validated prec/rec for user %d (sample size %d)' %
              (user.id, len(user.paper_click_objects)))
    plt.text(0.6, 0.9, r'AUC = {av} $\pm$ {sd}'.format(av=pr_av, sd=pr_sd))
    plt.savefig("%s/user_like_precision_recall_%d.png" % 
                (app.config['STATISTICS_FOLDER'], user.id))
    return 
Ok, that's it. A lot of code in this post, but let me know if you have any comments. I also posted the entire code, including some functions I skipped in this post, on GitHub.
cheers
Florian