Better Logging in Python
or as I like to call it: Kurt Loblog's Log Blog
TL;DR
There are countless log blog posts out there about the benefits of good logging, how to log well, and how much to log. Going through them all can be a real log blog slog. Wouldn't it be cool if you could log like this:
logger.info("Something happened!", job=job.id, user=user.id)
And get an easily searchable log line like:
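Something along these lines, where the timestamp, PID, logger name, and IDs are all invented for illustration:

2018-06-04 11:17:03,698 [8850] INFO     my_module: Something happened! [job: 1234] [user: 5678]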
What if I told you that you can, using Python?
Why This is Important
Good application logs can make a world of difference when you're tracking down bugs. This is especially true when you run into a bug that happens rarely enough that it's very difficult to reproduce. Almost every mature programming language has a robust logging system. Unfortunately, these logging systems can often be fairly complex to configure. Some log blog posts suggest logging in JSON. This can be very powerful if you are using a log management system, but it also makes your logs more voluminous, and if you aren't using a log management system it can make your logs look very ugly.
Two commandments from The 10 Commandments of Logging stand out to me as particularly important for making logs easy to search through:
6. Thou shalt log with context
7. Thou shalt log in machine parsable format
If you log with context, then you have something to search for so you can limit your search to lines that are relevant to the job, event, or user you're tracking down. If you log in machine parsable format, then you can use tools to do the searching for you. These two also stood out to me as things that could be made easier with the right tools. This blog post describes the techniques and tools that we use to make logging painless while keeping the resulting logs searchable.
Here’s How We Make Good Logging Easier
For the CCDL's refine.bio project, I wanted to make sure that all the logs we generated across our distributed system were consistent and searchable. However, we don't use a log management system for all of our logs, so I didn't want to log in JSON. I also wanted to be able to easily add the identifier of whatever entity I was tracking down into the context of each log message. After searching for Python libraries that would let me do this, I found daiquiri. Daiquiri's extras feature was nearly the perfect straw with which to stir my logging drink:
import logging

import daiquiri
import daiquiri.formatter

# The format string names the extra context keys it expects.
daiquiri.setup(level=logging.INFO, outputs=(
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             " [%(subsystem)s is %(mood)s]"))),
))

# Context can be bound to the logger once...
logger = daiquiri.getLogger(__name__, subsystem="example")
# ...or passed on each call as a keyword argument.
logger.info("It works and logs to stderr by default with color!",
            mood="happy")
Outputs:
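The exact timestamp, PID, and logger name will vary (and the line is colorized on a real terminal), but it looks roughly like:

2018-06-04 11:17:03,698 [8851] INFO     __main__: It works and logs to stderr by default with color! [example is happy]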
On every logging call you can pass additional context with a keyword parameter. Pretty cool! Unfortunately, as you can see, it requires you to specify up front which keyword parameters you want to accept when you configure your logger. In fact, you have to pass every parameter it expects on each call or else you'll end up with an exception, and an exception is the last thing you want when you're logging about an exception!
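Here's a minimal sketch of that failure mode, assuming the setup from the example above (the message text is invented; the error-reporting behavior comes from the standard logging machinery, which catches formatting errors rather than raising them into your code):

# Assumes the daiquiri.setup(...) call from the first example has run.
logger = daiquiri.getLogger(__name__, subsystem="example")

# The format string references %(mood)s, but no mood= is passed here, so
# formatting the record raises a KeyError. The logging machinery catches
# it and prints "--- Logging error ---" with a traceback to stderr
# instead of your message.
logger.info("Forgot the mood!")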
However, with a little bit of work I was able to extend this behavior by adding the ExtrasFormatter and ColorExtrasFormatter:
import logging

import daiquiri
import daiquiri.formatter

daiquiri.setup(level=logging.INFO, outputs=[
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorExtrasFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             " [%(subsystem)s is %(mood)s]" +
             "%(extras)s"),
        # Keys named here are consumed by the format string; any other
        # keyword passed to a logging call lands in %(extras)s.
        keywords=['mood', 'subsystem'],
    ))])

logger = daiquiri.getLogger(__name__, subsystem="example")
logger.info("It works and logs to stderr by default with color!",
            mood="happy",
            arbitrary_context="included")
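This outputs roughly the following (timestamp and PID will vary): mood and subsystem are consumed by the format string, while the arbitrary context is appended via daiquiri's default extras template:

2018-06-04 11:25:57,871 [9104] INFO     __main__: It works and logs to stderr by default with color! [example is happy] [arbitrary_context: included]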
This Also Supports JSON Logging
It's also possible to tweak some parameters to log your context as JSON*, which was an idea mentioned in the aforementioned 10 Commandments log blog post:
import logging

import daiquiri
import daiquiri.formatter

daiquiri.setup(level=logging.INFO, outputs=[
    daiquiri.output.Stream(formatter=daiquiri.formatter.ColorExtrasFormatter(
        fmt=(daiquiri.formatter.DEFAULT_FORMAT +
             "%(extras)s"),
        keywords=['mood', 'subsystem'],
        # These four parameters render the extras as a JSON object
        # instead of the default [key: value] pairs.
        extras_prefix=' {',
        extras_template='"{0}": "{1}"',
        extras_separator=', ',
        extras_suffix='}'
    ))])

logger = daiquiri.getLogger(__name__)
logger.info("It works and logs to stderr by default with color!",
            arbitrary_context="included",
            using="json")
Outputs:
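Again the timestamp and PID will vary, but it looks roughly like:

2018-06-04 11:30:32,887 [9246] INFO     __main__: It works and logs to stderr by default with color! {"arbitrary_context": "included", "using": "json"}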
* While it will be valid JSON, numbers will still be represented as strings, because the extras_template above quotes every value.
Using the Standard Python Library
Note that if you don't want the overhead of daiquiri, you can get close with the standard Python logging library:
import json
import logging

class StructuredMessage(object):
    def __init__(self, message, **kwargs):
        self.message = message
        self.kwargs = kwargs

    def __str__(self):
        # Render the message followed by the context as JSON.
        return '%s >>> %s' % (self.message, json.dumps(self.kwargs))

_ = StructuredMessage  # optional, to improve readability

logging.basicConfig(level=logging.INFO, format='%(message)s')
logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456))
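Running this prints something like the line below (on recent Python versions, where keyword-argument order is preserved, the keys appear in call order; older versions may order them differently):

message 1 >>> {"foo": "bar", "bar": "baz", "num": 123, "fnum": 123.456}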
As you can see, there's still some configuration required. You will have to construct a message object in every logging call, and you don't get the other benefits of using daiquiri (such as color support).
Other Log Blog Nuggets
If you're curious how we log in refine.bio, every log message uses the logger configured in our logging module. If you take a look you'll see we do a few other things in there too:
- We have a moderately complex log format that we stick some always-present context into.
- We unconfigure the root logger.
- We read the minimum log level that should be emitted from an environment variable (see the sketch after this list).
- We send our logs to sentry.io (which actually automatically parses out our context even though it's not JSON. Pretty cool.)
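As a small, hedged sketch of the environment-variable point above, here's roughly what that looks like, assuming a variable named LOG_LEVEL (the variable name and the INFO fallback are assumptions, and our actual module does more than this):

import logging
import os

import daiquiri

# LOG_LEVEL is a hypothetical variable name; fall back to INFO if it's
# unset or not a recognized level name.
log_level_name = os.environ.get("LOG_LEVEL", "INFO")
log_level = getattr(logging, log_level_name.upper(), logging.INFO)

daiquiri.setup(level=log_level)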
Conclusion
In short: write good logs! They can really make a difference, especially if you make them searchable by adding as much context as possible in a machine parsable format!
Kurt Loblog’s Log Blog Job Flog
By the way… we're hiring! Take a look at our careers page for a list of openings and how to apply for them.