Project Anvil is the name I gave to the biggest and most significant refactor I’ve ever undertaken. I don’t want to call it a rewrite - although it really feels like one - because the functionality and the language aren’t changing; mostly I’m just moving bits of code around and changing how the disparate parts connect and are deployed. Also, a rewrite sounds scary, and a refactor sounds grown-up.
Podiant is the project in question, and it can be considered a monolith, as it’s a single large codebase run on a single framework. The databases are cloud-hosted and there are some microservices for things like download tracking and the beginnings of an Alexa API, but the marketing site and the dashboard are thoroughly traditional Django apps with URL routes, views and templates rendered on the server.
Prior to this, the biggest refactor I’d done was converting the project to Python 3 and changing the way the app was hosted and deployed, moving from pets to cattle (a paradigm shift where you spin up new servers to replace old ones, instead of deploying new code and configuration to existing servers that you manually provision and maintain).
But the future is containerisation via Docker, which adds a layer of abstraction to your setup: you can have multiple machines each running multiple instances of your codebase, with intelligent load-balancing between them, so you can scale up and down (in theory) without actually having to provision new servers. (That last part isn’t strictly a Docker feature; it comes from the orchestration tooling built on top of it.)
Doing it wrong
The brains at Ember.js - the framework I eventually settled on - have spent years figuring out the best way to handle data from external sources and cache it locally, support different browsers, manage the DOM and events so the browser doesn’t run out of memory… I’m just far better off standing on their shoulders.
Doing it right
A few days ago I decided to go HAM, and completely blitz everything in my codebase and start again. I’m still ironing out the details, but the current plan is to go with a Docker app for the REST API, which will be the heart of the codebase, an Ember app for the marketing site, another for the dashboard and something yet-to-be-determined for the podcast pages (I’ll explain why that’s complicated in a bit).
The final piece concerns background tasks, and it’s something that only occurred to me last night. Having two images based off the same codebase feels like bad practice, but there are lots of important jobs that are done in the background and must report their progress and update the database with newly-discovered info. Here’s a use-case: When a user uploads a piece of audio, Podiant runs a number of jobs as part of what I call a workflow. For example:
- the “convert to 96kbps and remove all metadata” job always needs to be run
- if the podcast has no artwork, the “add artwork to MP3 file” job doesn’t need to be run
- the “add chapters to MP3” job needs to be run only if the user wants to add chapters
- the “create waveform image” job always needs to be run
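The conditional logic above could be sketched in plain Python something like this (the function name, the podcast dict, and the job labels are illustrative, not Podiant’s actual API):

```python
def build_workflow(podcast, wants_chapters):
    """Return the ordered list of jobs for a freshly uploaded episode."""
    jobs = ["convert to 96kbps and remove all metadata"]  # always runs

    if podcast.get("artwork"):
        # skipped entirely when the podcast has no artwork to embed
        jobs.append("add artwork to MP3 file")

    if wants_chapters:
        jobs.append("add chapters to MP3")

    jobs.append("create waveform image")  # always runs
    return jobs
```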
When each job in a workflow finishes, it reports its status in realtime (via Pusher), and when all the jobs are completed, the database is updated with the episode’s MP3 URL, duration, filesize and a graphical representation of the waveform. If a workflow like this were to run in isolation in a distributed system, it would need Django and a copy of the episode and podcast models so it could make those database changes. But that’s icky, so I’m looking to make workflows real database-backed models (instead of temporary objects stored in a cache) and allow them to store arbitrary data in JSON format, representing the updates that need to be made to the database.
When the workflow is completed, it POSTs to the workflow’s API endpoint (something like https://api.podiant.co/workflows/123), with the JSON data representing the changes that have been applied. The API then naively applies those changes to the database, and can handle the realtime reporting.
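A minimal sketch of what that “naive apply” step might look like on the API side - the field names are hypothetical, and a real Django view would wrap this with authentication and model lookup:

```python
def apply_changes(instance, changes):
    """Copy every key/value from a workflow's JSON payload onto the model
    instance, then save once. The payload is trusted entirely - hence
    'naive' - so the API doesn't need to know what each workflow does."""
    for field, value in changes.items():
        setattr(instance, field, value)
    instance.save()
    return instance

# e.g. apply_changes(episode, {"duration": 1843, "filesize": 22118400})
```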
This way, the workflows can be separate processes run in isolation that the Django API doesn’t technically need to be aware of (it just needs to know the fully-qualified name of the workflow, or even a URL to the workflow’s endpoint, AWS Lambda style). The workflows can be lighter, as they just need to be Python scripts that take an input and return a serialisable object.
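Such a worker script might look roughly like this - the return fields, the example values and the exact endpoint shape are assumptions for illustration, with only the base URL taken from the plan above:

```python
import json
import urllib.request


def run_workflow(audio_path):
    """Pretend to process the uploaded audio and return a serialisable
    dict of changes for the API to apply. Real work (transcoding,
    waveform generation) would happen where the comment is."""
    # ... transcode to 96kbps, strip metadata, render the waveform ...
    return {
        "audio_url": "https://cdn.example.com/episodes/123.mp3",
        "duration": 1843,       # seconds (hypothetical)
        "filesize": 22118400,   # bytes (hypothetical)
    }


def report(workflow_id, changes, base="https://api.podiant.co/workflows"):
    """POST the change-set back to the workflow's API endpoint."""
    req = urllib.request.Request(
        f"{base}/{workflow_id}",
        data=json.dumps(changes).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return urllib.request.urlopen(req)  # raises on a non-2xx response
```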
There are two problems on which I’m not yet sold: one is how to manage scheduled tasks (cron jobs), and the other is the podcast sites themselves. Cron jobs will probably end up being easier, but in an ideal world I’d love to be able to spin up a container that runs the job and then is destroyed, similar to how Heroku handles scheduled tasks. The harder problem is the frontend site for podcasts.
Ember and on-the-fly templates
The game plan
So, after a day of running Ember.js’s development server in Docker, I have a piece of advice for you:
Don’t run Ember.js’s development server in Docker.
— Mark Steadman (@iamsteadman) March 6, 2018
I wanted to start with something fairly inconsequential so I could tease out some of the harder problems before diving in deeper, so as of today I have an Ember app running the marketing pages (what I call the brochure site), and a Dockerised API server running Django REST Framework, which was absurdly easy to get going thanks to a cookiecutter template.
The Ember app talks to the API to get posts from the Podiant blog, and a list of podcasts for the directory pages. The API is based on the original codebase, but with everything stripped out except what’s needed to run the API, and no database modifications (that way I’ll be able to build the dashboard and beta test it with users on the production database).
The next thing
There’s a bit of tidying to do with the directory, but the next big thing will be user management (signup, login and logout). I’d like to refactor the signup process so that users can sign up and create a podcast in what feels like one step. I expect I’ll use Djoser for this, as it provides a RESTful backbone for authentication and signup.
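For the curious, wiring Djoser into a DRF project is mostly a matter of including its URLconfs - a minimal sketch, assuming token authentication (the exact setup depends on the auth scheme chosen):

```python
# urls.py (fragment)
from django.urls import include, path

urlpatterns = [
    path("auth/", include("djoser.urls")),            # signup, activation, password reset
    path("auth/", include("djoser.urls.authtoken")),  # token login/logout
]
```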
The challenge will probably be securing the API such that it can be used via the website but can’t be accessed directly. I have a very long road ahead of me, but I hope that by tackling some of the easier problems first - and trying to do every step right - I’ll be better set to tackle the harder ones in a few weeks. I just really, really hope it’s worth it, because while it’s fun to play with new toys and development patterns, it’s got to be for something, and I’m banking on it meaning that Podiant can go further and do more amazing things in the future, with a robust, stable but flexible infrastructure. Wish me luck, and if you have any thoughts or think there’s a better way to skin any of the above cats (ew), let me know.