Weeks 5 – 250: Notes from Five Years of Python Web-Dev

Kevin Turner

March 8, 2019

Dear blog,

I had that job writing Django for a while, and then went right from that one into a job at a spin-off company. Different product, different industry, still working on Django code from the same predecessors, but with more coworkers.

I worked on a bunch of things! Here's a list of some of them. If any of them sound interesting, let me know and maybe I can write about it in more depth or you can invite me to give a presentation on it.

API and Django Things

We didn't use Django templating very heavily; the product was composed of a few single-page apps, so most of the UI building was done browser-side. We did make a couple template filters to help transform our data so it was available to JavaScript on the initial page request.
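For the curious, here is a minimal sketch of the kind of transformation such a filter might do. It is not our actual filter; `to_script_json` is a hypothetical name, and the assumption is that the data ends up inlined inside a `<script>` element, where a stray `</script>` in the data would otherwise end the block early. (Newer Django versions ship a built-in `json_script` filter for this job.)

```python
import json

# Characters that could let embedded data break out of a <script> element.
# They are replaced with JSON unicode escapes, which any JSON parser decodes
# back to the original characters.
_SCRIPT_ESCAPES = {
    ord("<"): "\\u003C",
    ord(">"): "\\u003E",
    ord("&"): "\\u0026",
}


def to_script_json(value):
    """Serialize value as JSON that is safe to inline in a <script> block.

    Hypothetical helper; in a Django project this would be registered as a
    template filter.
    """
    return json.dumps(value).translate(_SCRIPT_ESCAPES)


payload = to_script_json({"title": "</script><b>hi</b>", "count": 3})
```

The output is still valid JSON, so the browser side can hand it straight to `JSON.parse`.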
Most of our views were written with Django REST Framework. It served us well.
- It's modelled very closely on Django's Class-Based Views, which is nice for design consistency, and not always nice if you disagree with aspects of the CBV design.
- ...and the thing I did to get the output of a DRF view embedded into the JavaScript data of another view was super kludgy.
We had so many Celery tasks. Things were bad at first, where workers would just stop processing jobs and nobody knew why, but they did get better.
- Somewhere along the way there was a good chunk of custom code to parse and collate and cross-reference logs to figure out where the trouble spots were. I plotted some of it out with Bokeh. Data visualization is fun and satisfying, I'd like to do more of it.
Django has this File abstraction that tries to make working with files the same whether they're on a local filesystem or on, say, S3, the Amazon-hosted key-value store with an HTTP API. But that abstraction can be leaky. Double-check your assumptions to make sure that
- you don't accidentally have the whole file in memory when you didn't mean to,
- you're not repeatedly PUT-ing the entire file while your CSV export is streaming twenty thousand records out one at a time,
- or that you're not running some code that assumes checking attributes like .size is cheap and fast when that really invokes a remote API request every time.
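To make the second point concrete, here is a rough sketch of one way around it: spool the export to a local temporary file and upload it once at the end. `FakeRemoteStore` is a made-up stand-in that only counts uploads; it is not Django's storage API or the S3 API, and `export_rows` is a hypothetical helper name.

```python
import csv
import tempfile


class FakeRemoteStore:
    """Hypothetical stand-in for a remote backend; it only counts uploads."""

    def __init__(self):
        self.put_count = 0
        self.sizes = {}

    def put(self, key, fileobj, chunk_size=64 * 1024):
        # Read in chunks so the whole file is never held in memory at once.
        self.put_count += 1
        size = 0
        while True:
            chunk = fileobj.read(chunk_size)
            if not chunk:
                break
            size += len(chunk)
        self.sizes[key] = size  # length in characters (text-mode file)


def export_rows(store, key, rows):
    """Write every row to a local spool file, then upload exactly once."""
    with tempfile.TemporaryFile(mode="w+", newline="") as spool:
        writer = csv.writer(spool)
        for row in rows:
            writer.writerow(row)
        spool.seek(0)
        store.put(key, spool)


store = FakeRemoteStore()
export_rows(store, "export.csv", ((i, f"name-{i}") for i in range(20000)))
```

Twenty thousand rows go out, and `store.put_count` ends up at 1 instead of 20,000.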
I switched an API to use keyset-based pagination instead of offset pagination. It turned out to be a lot more contentious than I expected!
- And there's the thing with floating point numbers: the database driver writes them out in decimal form when it sends them over in SQL, so somewhere you lose a bit of precision and end up with a timestamp that doesn't match itself...
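If keyset pagination is new to you, here is a small self-contained sketch using SQLite from the standard library. The `event` table, its columns, and `fetch_page` are all invented for illustration and are not our real schema or API. Storing the timestamp as integer microseconds is an assumption made here to sidestep the floating point round-trip problem; it is one possible workaround, not necessarily what we did.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event (id INTEGER PRIMARY KEY, created_us INTEGER NOT NULL)"
)
# Several rows share a timestamp, so the id is needed as a tiebreaker.
conn.executemany(
    "INSERT INTO event (id, created_us) VALUES (?, ?)",
    [(i, 1_552_000_000_000_000 + (i // 3)) for i in range(1, 11)],
)


def fetch_page(conn, after=None, limit=4):
    """Return (rows, cursor) for the page that follows the `after` cursor.

    The cursor is the (created_us, id) pair of the last row already seen.
    """
    if after is None:
        rows = conn.execute(
            "SELECT id, created_us FROM event "
            "ORDER BY created_us, id LIMIT ?",
            (limit,),
        ).fetchall()
    else:
        last_created, last_id = after
        rows = conn.execute(
            "SELECT id, created_us FROM event "
            "WHERE created_us > ? OR (created_us = ? AND id > ?) "
            "ORDER BY created_us, id LIMIT ?",
            (last_created, last_created, last_id, limit),
        ).fetchall()
    cursor = (rows[-1][1], rows[-1][0]) if rows else None
    return rows, cursor


# Walk every page; each query seeks past the last row instead of using OFFSET.
seen = []
cursor = None
while True:
    rows, next_cursor = fetch_page(conn, after=cursor)
    if not rows:
        break
    seen.extend(row[0] for row in rows)
    cursor = next_cursor
```

The whole scheme depends on the cursor values comparing exactly against what is stored, which is why a timestamp that doesn't survive the round trip causes skipped or repeated rows.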
Inspired by our team book club reading of Practical Object Oriented Design, I decided to design a Python interface with a stronger separation than was standard for our Model classes between what the class is responsible for, how to interact with it, and how its data is stored.

I think I do want to expand on that in a more detailed post, but the biggest 💡 is that tab-completion beats everything.

Database Things

Parallelism is still hard. We probably didn't make enough use of database transactions. But also, transactions that take locks they don't need increase your odds of deadlocking all those parallel Celery workers, so don't use too many either.
Occasionally used Elasticsearch. It's one of those things where everyone expects you to have search, it barely even merits a check on your product's feature list, but it's an entire domain in itself.
Reporting reporting reporting reporting. columnar database. star schemas. data warehouse pipelines. I was only peripherally involved with those projects, so I can't speak to the ins and outs of it. I can only testify that it was a big concern and ate up a lot of attention.
Found the limits of Django's ORM.
- The example of this I thought I remembered is you can't join on multiple fields, but when looking for a reference for that now, I find that maybe you can join on multiple fields with this undocumented feature. Also, that's not one of the tickets I was following when I was researching this at the time. Such are the hazards of waiting years between blog posts.
Found the limits of SQL and moved some queries to Neo4j. The data was very tree-like in structure and our queries had a lot of logic about the attributes of the relationships. This one was especially interesting for me as a developer, and it ended up paying off well for the project.
- We were successful at querying this data with code that was much more readable than our SQL attempts and orders of magnitude more performant.
- Getting data into Neo4j is straightforward, but keeping cross-database integrity is not.
- Integrating with the Django test runner: successful, but some messy work. (Neo, you should probably hire me to write a reusable implementation of that.)
- Our different queries operating on related concepts had a bunch of repeated code. I look forward to finding out if newer Neo4j versions' support for user defined functions and procedures helped alleviate that, but I left the project before we got that far.

sometimes JavaScript

Sometimes they let me tinker on the JavaScript side of the project too.

There was that one time we started making a mobile app with Cordova? But we scrapped that one and had someone else make a fresh start after React Native came along.
There was the infamous star-rating widget, which we probably collectively put a hundred hours of design and engineering into. If only we had Unicode 11 with its ⯫ half-stars ⯪ built-in.
I wrote at least one of our gulpfiles, with accompanying Browserify + Babel configurations.
We wrote a Google Chrome extension, and targeting a single browser is way more fun because you hardly need a compilation step at all.

…but since JavaScript iterates at a rate of seven bajillionity per year, that's probably all ancient history by now.

Yet Other Things

Our development machines were Macs, but production servers were Linux. At some point we switched our development runtime from native OS X programs to using Docker for Mac.
- It significantly streamlined our new-developer-workstation set-up time, and removed some differences between development and production (like the way sort results came back in certain locales because something something darwin libc locales something?).
- It made other aspects of development performance and debugging much worse.
PyCon came to our city (twice!) and I got to see some of my favorite Python people, and work with some new people during the sprints. Miss you, Python people! :snake-with-heart-eyes-emoji: