Week 1: South Migrations with Multiple Apps

Our Django application uses South to manage database migrations for model changes. South seems to be universally recommended and generally quite capable at its job. But when I went to do a fresh install of our project, ./manage.py migrate failed with messages about missing tables or no-such-relation. Where did things go wrong?

The answer, it turns out, is in how South handles multiple apps. The migrate command will, by default, walk through all INSTALLED_APPS with models and run every migration for one app before moving on to the next.

If the developer next to you has been working on the same installation of this project for the past year, that is almost certainly not the order their migrations ran in. Maybe Accounts got a migration in January, Inventory and Distributors got migrations in March, and Accounts got another migration in June. If that last Accounts migration depended on a field added in March's update to the Distributors table, it will not work when we try to run all the Accounts migrations before any of those for the Distributors.
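To make the ordering problem concrete, here is a small simulation in plain Python. It does not use South at all; the app names follow the example above, while the migration names, the "requires" and "provides" sets, and the helper functions are all made up for illustration.

```python
# Hypothetical migrations, listed in the order they were originally applied
# on a long-lived install: (app, name, requires, provides).
MIGRATIONS = [
    ("accounts", "0001_initial", set(), {"accounts_account"}),
    ("inventory", "0001_initial", set(), {"inventory_item"}),
    ("distributors", "0001_initial", set(), {"distributors_distributor"}),
    ("distributors", "0002_add_region",
     {"distributors_distributor"}, {"distributors_distributor.region"}),
    ("accounts", "0002_link_region",
     {"accounts_account", "distributors_distributor.region"}, set()),
]


def run(migrations):
    """Apply migrations in order; return the first (app, name) that fails, or None."""
    schema = set()
    for app, name, requires, provides in migrations:
        if not requires <= schema:
            return (app, name)  # something it needs does not exist yet
        schema |= provides
    return None


def app_by_app(migrations, app_order):
    """Reorder so that every migration of one app runs before the next app starts."""
    return [m for app in app_order for m in migrations if m[0] == app]


print(run(MIGRATIONS))  # None: the historical order works
print(run(app_by_app(MIGRATIONS, ["accounts", "distributors", "inventory"])))
# ('accounts', '0002_link_region'): fails on a fresh install
```

The second call fails for the same reason our fresh install did: the later Accounts migration runs before the Distributors migration it quietly relies on.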

That second Accounts migration can be made to work; South does have the ability to declare the dependencies of a migration. But those dependencies aren't added automatically by schemamigration --auto, so your historical migrations probably don't have them. (After all, they worked for the migration author at the time!)
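For reference, a declared dependency looks roughly like the sketch below. The depends_on attribute is South's; the app label and migration name in it are hypothetical, and the try/except stand-in exists only so the snippet can be read and imported without South installed.

```python
try:
    from south.v2 import SchemaMigration
except ImportError:
    # South is not installed here; a bare stand-in keeps the sketch importable.
    class SchemaMigration(object):
        pass


class Migration(SchemaMigration):
    # Hypothetical names: make sure the distributors app has reached
    # 0002_add_region before this accounts migration runs.
    depends_on = (
        ("distributors", "0002_add_region"),
    )

    def forwards(self, orm):
        pass  # the generated schema changes would go here

    def backwards(self, orm):
        pass  # and their reversal here
```

Each entry is an (app label, migration name) pair. With it in place, South should run the named Distributors migration first, even in the middle of working through Accounts.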


Time out for a second here. If these models are all inter-related, why are they in different apps? I know "what is an app?" and "how many apps should I have in my project?" were certainly questions I had when I came to Django, as they have been for many others.

If we turn to Two Scoops of Django for guidance here, they say:

"Each app should be tightly focused on its task. If an app can’t be explained in a single sentence of moderate length, or you need to say 'and' more than once, it probably means the app is too big and should be broken up."

A little later, in the chapter on model design, there's a bold section heading that reads "Break Up Apps With Too Many Models". So it's easy to see why project authors might trend towards making a lot of apps.

We flagged down a Django veteran from the neighboring office and asked them for their take on the subject. They said that the conclusion they've been coming to is that it can be better to have one central app where you put all the models that may be shared by various parts of your project. You can then make more apps, if you find that organization helpful, but have them use the models from the central app. That way there is only one set of migrations for South to deal with, and you don't have to worry about explicitly adding migration dependencies.

The advice in Two Scoops quotes James Bennett, Django core developer and author of presentations like Reusable Apps. But if you aren't focusing on reusable apps — if your apps all live together in a single project, and you don't intend to use them separately in your other projects or redistribute them for third-party use — then it's quite likely they will grow interdependent as they evolve alongside one another. At that point you may be doing yourself a disservice if you're operating as if your apps are independent (as tools like South will assume) when they're really not.

That doesn't mean the advice "break up apps with too many models" no longer applies; any unit of organization will get harder to manage if it gets too big. Managing that is a central responsibility of a software developer. But perhaps there are other ways of doing that, such as using multiple modules or sub-packages within a single app. Don't feel like you have to start your project with half a dozen new apps just because you can.


Anyway, back to the task at hand. We already have a project with models in a lot of apps. And we already have a lot of migrations. Migrations that aren't currently functional. What next?

We could, through some combination of trial-and-error and detective work, by looking at the individual migrations and the commit history, try to go back and put dependencies on all the old migrations that need them.

Or, if we don't actually have a set of data from the Beginning of Time that needs to be brought up through all these migrations, we could throw them away and start with fresh definitions.

Since our data was already in an up-to-date schema, that second option sounded a lot more attractive. So we set about doing that. It goes something like this:

  1. Throw away all your migrations/ directories.
  2. Remove all migrations for your apps from the migration history.
    • Take care not to remove migrations for any third-party apps you may have in INSTALLED_APPS.
    • Or, if you do zealously wipe the entire migration history, bring those back by running migrate --fake on them.
  3. Create new initial migrations describing the current state of each of your apps with schemamigration --initial.
    • If some of your apps depend on others, be sure to add depends_on attributes to those initial migrations.
  4. Run migrate --fake on each of those to get your local migration history back in sync.
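Spelled out, the steps above might look like the output of the little helper below. It only builds a list of commands to review by hand; it runs nothing. The helper name is made up, and it assumes each app lives in a directory named after its app label and that the history sits in South's south_migrationhistory table.

```python
def reset_plan(apps):
    """Return, in order, the shell and SQL steps for resetting migrations of `apps`."""
    steps = []
    # 1. Throw away the old migrations (assumes <app>/migrations at the project root).
    for app in apps:
        steps.append("rm -r %s/migrations" % app)
    # 2. Forget the history for our own apps only, leaving third-party apps alone.
    quoted = ", ".join("'%s'" % app for app in apps)
    steps.append(
        "DELETE FROM south_migrationhistory WHERE app_name IN (%s);" % quoted
    )
    # 3. Describe the current state of each app in a fresh initial migration.
    for app in apps:
        steps.append("./manage.py schemamigration %s --initial" % app)
    # 4. Record the new migrations as applied without touching the tables.
    for app in apps:
        steps.append("./manage.py migrate %s --fake" % app)
    return steps


for step in reset_plan(["accounts", "distributors", "inventory"]):
    print(step)
```

Adding depends_on attributes is still a manual edit, done between the schemamigration and migrate --fake steps.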

Looking around again now that I know what's going on, I find Ben Roberts on Stack Overflow figured out a good way to reset your migrations that looks rather less error-prone than what we did; using --delete-ghost-migrations is a better idea than other ways of revising your migration history. Do it that way.

Except we still had a problem.

We had circular dependencies between our apps — e.g. some model in Accounts referred to some model in Distributors and some model in Distributors referred to some model in Accounts — so schemamigration --initial couldn't create either app in a single step.

We ended up commenting out the model fields which created the circular dependencies, making the --initial migrations, un-commenting those fields, running a second round of schemamigration --auto on those apps, and adding depends_on attributes to that second set of migrations. Then we were done.
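In hindsight, the circular references could have been spotted before running anything. Below is a minimal sketch of a cycle check over a hand-written map of which apps refer to which; the function name and the reference map are assumptions for illustration, not anything South provides.

```python
def find_cycle(references):
    """Return a list of apps forming a cycle, or None if there is none.

    `references` maps an app label to the set of app labels its models refer to.
    """
    visiting = []  # the chain of apps currently being followed
    done = set()   # apps already known to be cycle-free

    def visit(app):
        if app in done:
            return None
        if app in visiting:
            # We came back to an app already on the chain: that is the cycle.
            return visiting[visiting.index(app):] + [app]
        visiting.append(app)
        for target in sorted(references.get(app, ())):
            cycle = visit(target)
            if cycle:
                return cycle
        visiting.pop()
        done.add(app)
        return None

    for app in sorted(references):
        cycle = visit(app)
        if cycle:
            return cycle
    return None


refs = {
    "accounts": {"distributors"},
    "distributors": {"accounts"},
    "inventory": {"distributors"},
}
print(find_cycle(refs))  # ['accounts', 'distributors', 'accounts']
```

Any app that shows up in a cycle is one whose cross-app fields need to be held back for a second migration, the way we did by commenting them out.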

I did a diff of pg_dump --schema-only output from the existing database and from the one produced by our new set of migrations, and it checked out. (Although that diff was pretty tedious to read, with occasionally reordered fields and different names for indices and constraints. Is there a better way to compare the schema of two databases?)
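One thing that might have made the diff less tedious is normalizing the dumps first. The sketch below is not a real schema-comparison tool: it assumes the usual pg_dump layout with one column per line inside CREATE TABLE ( ... ); it only removes the noise from reordered fields, and differently named indices and constraints will still show up.

```python
import difflib


def normalize_schema(sql):
    """Return the dump as a list of lines, with columns sorted inside each CREATE TABLE."""
    out = []
    columns = None  # collects column lines while inside a CREATE TABLE block
    for raw in sql.splitlines():
        line = raw.strip()
        if not line or line.startswith("--"):
            continue  # skip blank lines and SQL comments
        if columns is None:
            out.append(line)
            if line.upper().startswith("CREATE TABLE") and line.endswith("("):
                columns = []
        elif line.startswith(")"):
            out.extend(sorted(columns))
            out.append(line)
            columns = None
        else:
            columns.append(line.rstrip(","))  # trailing commas depend on position
    return out


def schema_diff(old_sql, new_sql):
    """Return unified diff lines between two normalized schema dumps."""
    return list(difflib.unified_diff(
        normalize_schema(old_sql), normalize_schema(new_sql), lineterm=""))


old = """
CREATE TABLE accounts_account (
    id integer NOT NULL,
    name character varying(100) NOT NULL
);
"""
new = """
CREATE TABLE accounts_account (
    name character varying(100) NOT NULL,
    id integer NOT NULL
);
"""
print(schema_diff(old, new))  # []
```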

Was that less work than figuring out the dependencies for the entire history? Probably. Still too much work? It felt like it. But I have working migrations now.

What could have saved us some trouble this time?

Well, when South creates a migration, it knows which models are the targets of new references. Determining if those models are not in the current app should not be hard. It should, at the very least, throw up a flag at that point and say "Hey, you should add some dependencies on this migration before you commit it!"

Ideally, it'd look at the migration history for the other app and add the dependency for you, depending on the most recently applied migration for the other app.
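The logic I have in mind is roughly the following. This is a sketch of the idea only, not South code: the function, its arguments, and the example migration names are all hypothetical.

```python
def suggest_depends_on(current_app, referenced_apps, latest_applied):
    """Build a depends_on tuple for a new migration in `current_app`.

    `referenced_apps` lists the apps whose models the new migration refers to;
    `latest_applied` maps app label -> name of its most recently applied migration.
    """
    deps = []
    for app in sorted(set(referenced_apps)):
        if app == current_app:
            continue  # references within the same app need no declaration
        if app not in latest_applied:
            raise LookupError("no applied migrations recorded for %r" % app)
        deps.append((app, latest_applied[app]))
    return tuple(deps)


print(suggest_depends_on(
    "accounts",
    ["distributors", "accounts"],
    {"distributors": "0002_add_region", "accounts": "0001_initial"},
))
# (('distributors', '0002_add_region'),)
```

Depending on the most recently applied migration may be stricter than necessary, but it matches the state the migration author actually tested against.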

Not surprisingly, I am not the first person to think of this: #509: add dependencies automatically.
Also relevant: #829: add how to reset migrations to documentation.

I guess that means if you see me at a Python sprint or project hack night, we have something to work on!

Footnotes:

all INSTALLED_APPS
The migrate command calls all_migrations, which gets all apps with models from the Django AppCache.
Two Scoops of Django: Best Practices for Django 1.5
This book by Daniel "pydanny" Greenfeld and Audrey Roy has generally been helpful for me, as someone who already knows a bit about Python and web servers and databases but isn't quite sure of the best way to go about things in Django. Even if I have been questioning some of the content in this article today, that's been outweighed by the many times I said "Yes! I was wondering how to do that!" when reading the book.

When I asked myself if I should write this post today, I thought "Surely it won't take as long as the last one!"

The last one was a little over 400 words. This one is over 1300.

Please don't expect me to keep this pace. But then, I'm sure many of you will appreciate it if my posts don't get to tl;dr length. I do get verbose at times. But I do have more topics queued up!