Week 3: Stale Copy: Object Identity in Related Models and Reloading Out-of-Sync Instances

Kevin Turner

January 11, 2014

The Task model in our application has a tree structure where tasks may have sub-tasks as children. We do this with a recursive relationship like this:

class Task(Model):
    parent = ForeignKey('self', related_name='children', null=True)
    finished = BooleanField(default=False)

I was testing a function that modifies a task and all its descendants, so my test looked a little like this:

def test_finish_all():
    task_1 = Task.objects.create()
    task_2 = Task.objects.create(parent=task_1)

    finish_all(task_1)

    assert task_1.finished
    assert task_2.finished

The first assert passed and the second didn't. I did a little exploring in the shell:

>>> task_1 = Task.objects.create()
>>> task_2 = Task.objects.create(parent=task_1)

>>> task_1
<Task#1 finished:False>
>>> task_2
<Task#2 finished:False>

>>> finish_all(task_1)

>>> task_1
<Task#1 finished:True>
>>> task_2
<Task#2 finished:False>

>>> task_1.children.first()
<Task#2 finished:True>

>>> task_2.parent is task_1
True
>>> task_1.children.first() is task_2
False

Conclusions we can draw from this:

When asking a model for a related object, you may get back an existing object (as we do with task_2.parent), but this is often not the case.
The second task is marked finished, but the reference we hold to it in task_2 is stale.

Okay, so if we suspect task_2 is stale, is there a way to have the ORM refresh it with current values?

It turns out that's a question with some history. In late 2005, Ticket #901 proposed a reload() method to reload model attributes from the database. The ticket was marked wontfix and closed three months later, but it refuses to stay down; it's been a perennial topic of discussion ever since. It was reopened last May and there's a suggested implementation …

So in other words, no. At least not in any version of Django we'll be using before 2015. I've taken to doing

task_2 = Task.objects.get(id=task_2.id)

which gives you a new python object representing that task. So it's current, but anything holding a reference to the old object isn't updated and doesn't know it's stale.

The other thing to notice in all this is that even if we just wanted to do the finish_all operation and inspect the results in the local process, we had to go through the database to find out what the results were. This may be a contributing factor in our application's tendency to do far more database queries than is sensible for a single page load — but that's a topic for another time.