Managing ZIP-based file formats in git

A super-brilliant person named tante wrote a post on comparing zip files in git. Go read it.

If it’s no longer on the web, then the basic idea is this. In ~/.gitconfig add this:

[diff "zip"]
textconv = unzip -c -a

Then in REPOSITORY/.gitattributes add this:

*.pptx diff=zip

You can add more of these lines, one per file type, like this:

*.pptx diff=zip
*.xlsx diff=zip
*.docx diff=zip

et cetera.

Of course, this just unzips the files into a temporary location for the comparison. It doesn’t change the repo itself – you’re still committing zip files (or .docx etc) into the repo.

If you want to automagically unzip before committing and re-zip on checkout, so you’re actually committing the contents of the zip file and not the zip file itself, maybe try Zippey (cloned here).


Ideas for iOS / Mac apps

Something in the Mac menu bar that pings me when my watch is fully charged, so I put it back on my wrist and don’t leave it on the charger for hours and hours and miss a bunch of stand goals. (Like Juice Watch by John Ganotis, but for Mac.) (Maybe a Shortcut would do it.) (Or check out Coconut Battery or Batteries for Mac or Battery Health or Battery Widget from this post.)

My iPhone Geohashing app (in progress – see oHash)

A split-flap screen app for Apple TV (called flippity.app)

An iPhone version of Cards Against Humanity (using their cards via their Creative Commons license). Maybe call it Completely Unaffiliated App.

Feel free to add suggestions in the comments below. I’m always keen to hear ideas and if it piques my interest there’s a chance I might make it for you (but no guarantees).


Concurrency in Java

I don’t do a lot in Java, but there’s one project where it seems to me to be the best way to go. One thing I need to do is update a progress bar during a long processing job, which means I run into the problem described in this CodeRanch post.

For ages (like, many many years) I assumed the java.lang.Thread idea from that CodeRanch post was the only way to go. (Perhaps when I first found that page it was.) But I could never really wrap my head around exactly how to use the lock object, particularly if the worker thread was in a whole other class in a whole other .java file.

But recently I found SwingWorker, thanks to this post and this example. This looks MUCH simpler, because all the lock object logic is just handled for you.
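For what it’s worth, here’s a minimal sketch of the SwingWorker pattern as I understand it. The class name SumWorker and the add-up-the-numbers job are made up for illustration (they’re not from either of those posts); the real slow work would go where the loop is. The idea is that doInBackground() runs off the event thread, publish() hands progress values over, and process() receives them on the event dispatch thread, where it’s safe to touch the progress bar.

```java
import java.util.List;
import javax.swing.JProgressBar;
import javax.swing.SwingWorker;

// Hypothetical worker: sums 1..n in the background and reports progress.
class SumWorker extends SwingWorker<Long, Integer> {
    private final int n;
    private final JProgressBar bar; // may be null when there is no UI to update

    SumWorker(int n, JProgressBar bar) {
        this.n = n;
        this.bar = bar;
    }

    @Override
    protected Long doInBackground() {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;            // stand-in for the real long-running work
            publish(i * 100 / n);  // pass a percentage to the event thread
        }
        return total;
    }

    @Override
    protected void process(List<Integer> chunks) {
        // Called on the event dispatch thread. Values may arrive in batches,
        // so only the most recent one matters for the bar.
        if (bar != null) {
            bar.setValue(chunks.get(chunks.size() - 1));
        }
    }
}
```

Something like new SumWorker(1000, progressBar).execute() from a button handler should kick it off, and get() returns the total once the job has finished. No lock object in sight.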


Data Changes in Laravel

So I listened to the first episode of Chris Morrell’s podcast Over Engineered. It was on data changes in Laravel.

I had some thoughts. Read on if you’d like to. 😊

Don’t Use Migrations!

Before I listened to the first episode of Over Engineered, I was leaning towards “just use migrations”. Now that I’ve listened to the whole thing all the way through once, and most bits of it two or three times, I think … yeah, just use migrations. 😉

As I was listening, different people seemed to say that it felt ‘wrong’ or ‘dirty’ to use migrations because ‘that’s not what they were intended for’, and instead we should use … something else. Something that’s exactly like migrations in pretty much every possible way, but … for some reason, not actually migrations.

My grug brain, he no understand. 😁

I mean, yes, migrations are traditionally used solely for adding new tables. Well – adding new tables, and adding columns to existing tables. Oh, and removing or changing columns. Or removing tables. Or … well, all kinds of database changes, really.

So … maybe it’s not such a stretch to also do some core ‘system’ data changes there too?

Which brings me to two main points.

The Perils of Time

The first one is this. If you’re just thinking about getting a change into Production, one time, straight after you build it when you still remember how it works … then yeah. You can do it pretty much any way you want. Just run a bunch of commands live in Prod if you feel like it. Whatevs. Doesn’t matter.

But if you’re thinking about the big picture – if you’re thinking long term – then you’re going to need to get new devs up and running with new environments from time to time. You might even need to spin up a new instance for Sales to do a demo to a prospective customer. And if it’s a long-term project that’s been running for years, then you just might have done something like:

  1. Add a column
  2. A few months later, add some data to that column
  3. Next year, maybe move the data to a whole other table
  4. A little while after that, delete the original column from the first table

So now your new dev can’t run the data changes at the end, after all the column changes. That column isn’t there any more. They will need to add the data in step 2 in the right sequence, after step 1 but before steps 3 & 4.

So now you’re setting up a new … something … maybe something like the Dragon package … but it needs to interleave its changes in between the migrations, in the right order. First 1 then 2 then 3 then 4. So … how do you do that?

I’ll tell you how. You add the data changes to the migrations.

But it’s Not Clean!

You don’t think it’s ‘clean’ to have data inserts in the migrations? Fine. Put them somewhere else. A separate artisan command, or a seeder, or whatevs. But then – run that command or seeder from within a migration. That way, you get everything happening in the right order.

More importantly, the new dev who started six months after you left – you know, that new guy, who never even knew that the data was ever in a different table – that new guy will also get the data changes in the right order. Without having to bug some old-timer, who might not remember how to do it, or where to find the blog post that explains it all.

(Also, make sure you put the command / seeder in a clearly marked subdirectory, so you know it’s not a ‘regular’ command / seeder. But that’s just gravy.)

It Takes All Sorts

The second thing I wanted to say is this. There are different sorts of data changes. 

  • Core data that the system NEEDS in order to be able to run
  • Sample data, for testing / demo purposes only, not for Prod
  • Default data that new users get for free, but they can change or delete if they feel like it
  • Huge changes to tens of millions of rows, that take hours to complete
  • Other sorts of changes that I’m not thinking of just now

And yes, these different sorts of changes might need to be handled differently. You might want to kick off a queue job to do the tens-of-millions-of-rows job in the background. You might want to force it to happen in real-time during the migration, even if it takes seven hours to run, if it’s a change that the system MUST have before any users log in. 

But if it needs to be in Production, then it can (and should) happen in a migration. Or at least be triggered during a migration, if you’ve put it in a command / seeder / queue job. Because that way you know that it will happen, even if it’s a year after you wrote the thing. Even if it’s six months after you left to go to another project and the other dev quit and now only Bill the tester remembers how it worked but Bill is in another team for the next two months and anyway he’s off sick today.

It won’t matter.

It won’t matter who remembers how to do it, or who remembers where the Google doc is (or was it an Office 365 Word doc? Or a Confluence wiki page? Which one were we using back then?). Heck, it won’t matter if nobody even remembers that it needs to be done.

Because the repo knows. The repo will never forget. The repo will make sure it happens, and happens right, and happens in the right order.

The repo is your friend. It is your guardian. Your hero.

Of course, if a change does NOT need to be in production – sample data for demos, etc – then it should NOT be in a migration. It should be in … I dunno, a seeder, or something. Probably a seeder.

But that’s a whole other episode. 😉

THE END
