So I listened to the first episode of Chris Morrell’s podcast Over Engineered. It was about data changes in Laravel.
I had some thoughts. Read on if you’d like to. 😊
Don’t Use Migrations!
Before I listened to the first episode of Over Engineered, I was leaning towards “just use migrations”. Now that I’ve listened to the whole thing once all the way through, and most bits of it two or three times, I think … yeah, just use migrations. 😉
As I was listening, different people seemed to say that it felt ‘wrong’ or ‘dirty’ to use migrations because ‘that’s not what they were intended for’, and instead we should use … something else. Something that’s exactly like migrations in pretty much every possible way, but … for some reason, not actually migrations.
My grug brain, he no understand. 😁
I mean, yes, migrations are traditionally used solely for adding new tables. Well – adding new tables, and adding columns to existing tables. Oh, and removing or changing columns. Or removing tables. Or … well, all kinds of database changes, really.
So … maybe it’s not such a stretch to also do some core ‘system’ data changes there too?
Which brings me to two main points.
The Perils of Time
The first one is this. If you’re just thinking about getting a change into Production, one time, straight after you build it when you still remember how it works … then yeah. You can do it pretty much any way you want. Just run a bunch of commands live in Prod if you feel like it. Whatevs. Doesn’t matter.
But if you’re thinking about the big picture – if you’re thinking long term – then you’re going to need to get new devs up and running with new environments from time to time. You might even need to spin up a new instance for Sales to do a demo to a prospective customer. And if it’s a long-term project that’s been running for years, then you just might have done something like:
- Add a column
- A few months later, add some data to that column
- Next year, maybe move the data to a whole other table
- A little while after that, you might delete the original column from the first table.
So now your new dev can’t just run all the data changes at the end, after all the column changes, because the column that step 2 fills in isn’t there any more. They need to run step 2 in the right sequence: after step 1, but before steps 3 & 4.
So now you’re setting up a new … something … maybe something like the Dragon package … but it needs to interleave its changes in between the migrations, in the right order. First 1 then 2 then 3 then 4. So … how do you do that?
I’ll tell you how. You add the data changes to the migrations.
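Laravel migrations are PHP classes run by `php artisan migrate`, which records each one it has run in a `migrations` table. Purely to illustrate the ordering argument, here is a minimal, framework-free sketch in Python using the standard `sqlite3` module. The `plans` and `plan_prices` tables, the migration names, and the tiny `migrate()` runner are all hypothetical; the four commented steps mirror the list above, with the step 2 data change sitting in the sequence between the schema changes it depends on.

```python
import sqlite3

# Hypothetical migrations, named with a timestamp prefix the way Laravel names
# its migration files. List order is run order.
MIGRATIONS = [
    ("2021_01_10_create_plans_table", [
        "CREATE TABLE plans (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        # Core data the system needs in order to run.
        "INSERT INTO plans (name) VALUES ('Free'), ('Pro')",
    ]),
    # Step 1: add a column.
    ("2021_03_02_add_monthly_price_to_plans", [
        "ALTER TABLE plans ADD COLUMN monthly_price INTEGER",
    ]),
    # Step 2: a few months later, add some data to that column.
    ("2021_06_15_set_plan_prices", [
        "UPDATE plans SET monthly_price = 0 WHERE name = 'Free'",
        "UPDATE plans SET monthly_price = 1500 WHERE name = 'Pro'",
    ]),
    # Step 3: next year, move the data to a whole other table.
    ("2022_02_01_create_plan_prices_table", [
        "CREATE TABLE plan_prices (plan_id INTEGER PRIMARY KEY, monthly_price INTEGER)",
        "INSERT INTO plan_prices (plan_id, monthly_price) SELECT id, monthly_price FROM plans",
    ]),
    # Step 4: later still, delete the original column. (Rebuilding the table is
    # the portable way to drop a column in SQLite.)
    ("2022_05_20_drop_monthly_price_from_plans", [
        "CREATE TABLE plans_new (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
        "INSERT INTO plans_new (id, name) SELECT id, name FROM plans",
        "DROP TABLE plans",
        "ALTER TABLE plans_new RENAME TO plans",
    ]),
]


def migrate(conn, migrations=MIGRATIONS):
    """Run every migration that hasn't run yet, in order; return their names."""
    # Remember what has already run, like Laravel's `migrations` table.
    conn.execute("CREATE TABLE IF NOT EXISTS migrations (name TEXT PRIMARY KEY)")
    already_run = {row[0] for row in conn.execute("SELECT name FROM migrations")}
    ran = []
    for name, statements in migrations:
        if name in already_run:
            continue
        with conn:  # commits when the block finishes without an error
            for sql in statements:
                conn.execute(sql)
            conn.execute("INSERT INTO migrations (name) VALUES (?)", (name,))
        ran.append(name)
    return ran


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a brand-new dev environment
    print(migrate(conn))  # all five migrations, in order
    print(conn.execute(
        "SELECT plan_id, monthly_price FROM plan_prices ORDER BY plan_id"
    ).fetchall())  # → [(1, 0), (2, 1500)]
    print(migrate(conn))  # → [] (nothing left to run)
```

If you instead move the `2021_06_15_set_plan_prices` entry to the end of the list (running the data changes after all the schema changes), a fresh database stops with `sqlite3.OperationalError: no such column: monthly_price`, because step 4 has already removed that column.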
But it’s Not Clean!
You don’t think it’s ‘clean’ to have data inserts in the migrations? Fine. Put them somewhere else. A separate Artisan command, or a seeder, or whatevs. But then – run that command or seeder from within a migration. That way, you get everything happening in the right order.
More importantly, the new dev who started six months after you left – you know, that new guy, who never even knew that the data was ever in a different table – that new guy will also get the data changes in the right order. Without having to bug some old-timer, who might not remember how to do it, or where to find the blog post that explains it all.
(Also, make sure you put the command / seeder in a clearly marked subdirectory, so you know it’s not a ‘regular’ command / seeder. But that’s just gravy.)
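Here is a minimal sketch of that split, again in Python with `sqlite3` rather than Laravel’s PHP API. The data change lives in its own clearly separated place (the hypothetical `OneOffDataChanges` class stands in for a marked subdirectory of commands or seeders), and the hypothetical `CreateRolesTable` migration calls it from `up()`, right after the schema change it depends on. In a real Laravel migration, the trigger would typically be an `Artisan::call(...)` to the command or seeder; the class, method, and table names here are made up for illustration.

```python
import sqlite3


class OneOffDataChanges:
    """Hypothetical, clearly separated home for one-off data changes.

    By convention nothing here is run directly; each method is triggered by
    one migration, so it always runs at the right point in the sequence.
    """

    @staticmethod
    def seed_default_roles(conn):
        conn.executemany(
            "INSERT INTO roles (name) VALUES (?)",
            [("admin",), ("editor",), ("viewer",)],
        )


class CreateRolesTable:
    """Hypothetical migration: the schema change plus the trigger for its data."""

    def up(self, conn):
        conn.execute(
            "CREATE TABLE roles (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
        )
        # The data logic lives elsewhere, but it's called from here, straight
        # after the table it needs has been created.
        OneOffDataChanges.seed_default_roles(conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    with conn:
        CreateRolesTable().up(conn)
    print(conn.execute("SELECT name FROM roles ORDER BY id").fetchall())
    # → [('admin',), ('editor',), ('viewer',)]
```

The migration stays short and the data code stays tidy, but whoever runs the migrations still gets both, in order.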
It Takes All Sorts
The second thing I wanted to say is this. There are different sorts of data changes.
- Core data that the system NEEDS in order to be able to run
- Sample data, for testing / demo purposes only, not for Prod
- Default data that new users get for free, but they can change or delete if they feel like it
- Huge changes to tens of millions of rows that take hours to complete
- Other sorts of changes that I’m not thinking of just now
And yes, these different sorts of changes might need to be handled differently. You might want to kick off a queue job to do the tens-of-millions-of-rows update in the background. Or you might want to force it to run right there in the migration, even if it takes seven hours, if it’s a change that the system MUST have before any users log in.
But if it needs to be in Production, then it can (and should) happen in a migration. Or at least be triggered during a migration, if you’ve put it in a command / seeder / queue job. Because that way you know that it will happen, even if it’s a year after you wrote the thing. Even if it’s six months after you left to go to another project and the other dev quit and now only Bill the tester remembers how it worked but Bill is in another team for the next two months and anyway he’s off sick today.
It won’t matter.
It won’t matter who remembers how to do it, or who remembers where the Google doc is (or was it an Office 365 Word doc? Or a Confluence wiki page? Which one were we using back then?). Heck, it won’t matter if nobody even remembers that it needs to be done.
Because the repo knows. The repo will never forget. The repo will make sure it happens, and happens right, and happens in the right order.
The repo is your friend. It is your guardian. Your hero.
Of course, if a change does NOT need to be in production – sample data for demos, etc – then it should NOT be in a migration. It should be in … I dunno, a seeder, or something. Probably a seeder.
But that’s a whole other episode. 😉
THE END