
Using AI to Improve Bad Business Writing

For developers who need to write business documents, Jon Udell explores the benefits (and quirks) of LLM-assisted copy editing.
Mar 26th, 2024 5:00am by Jon Udell
Photo by Sticker Mule on Unsplash.

I once taught expository writing and have long felt that software tools ought to be able to help people learn how to write more clearly and effectively. In a previous post I applied core principles — omit needless words, use active voice, avoid jargon and cliché, cite specific examples — to a typical badly written press release. That post narrates a step-by-step transformation which, in a companion post, I converted to a GitHub pull request that enables a learner to step through the changes and review each as a color-coded diff, with comments explaining the rationale for each change.

I’d love to empower teachers to create that kind of guided experience but, despite the post’s hopeful title — GitHub for English teachers — the reality is that GitHub isn’t for English teachers, and I don’t think the current generation of LLMs can bridge that gap. I do think they might help us attack the scourge of bad business writing, though, so to test that hypothesis I prompted ChatGPT, Claude, and Gemini with my step-by-step example and asked them to apply the principles it illustrates.

For an initial trial, I presented the three LLMs with my original example press release. Even though they “knew” the answer (the prompt included both the original and rewritten versions, and specified the latter as the goal), all the proposed rewrites were quite different from mine. The LLMs improved the original version, but not by much. On reflection that makes sense: their training sets must include many more bad examples than good ones. If you’re coming from a software background, it’s disconcerting to elicit such varying outputs from the same input, but here we are. There are powerful benefits to be gained, but if you require deterministic behavior, this isn’t the technology for you.

I thought I’d made my peace with LLM non-determinism, but it was still a bit surprising to see how much the results differed — for the same LLM — from one chat session to another. Rule 4 from Best Practices for Working with Large Language Models (Ask for choral explanations) applies within as well as across LLMs. These things contain multitudes; it’s worth trying the same thing several times, cherry-picking the results you like best, and combining elements from several results.
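If you’d rather not paste the same prompt into a chat window five times, this kind of choral sampling is easy to script. Here’s a minimal sketch, assuming the OpenAI Python client (openai >= 1.0); the model name and the prompt text are my own placeholders, and any chat-style API would work the same way.

```python
# Sample the same rewrite prompt several times and collect the
# candidates for manual cherry-picking. Assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Rewrite this press release. Omit needless words, use active voice:\n\n"
    "<press release text here>"
)

candidates = []
for _ in range(5):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; use whatever model you prefer
        messages=[{"role": "user", "content": PROMPT}],
        temperature=1.0,  # leave sampling variation on; that's the point
    )
    candidates.append(response.choices[0].message.content)

# Review the variants side by side, then combine the best elements by hand.
for i, text in enumerate(candidates, 1):
    print(f"--- candidate {i} ---\n{text}\n")
```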

In the past, I’ve felt that Claude was a better writer and editor than ChatGPT, but the models keep changing, and this time around neither of them emerged as a clear winner. Nor did Gemini. It seemed that the best improvements, and the worst failures, could occur anywhere, and that there were no obvious differences between intra-LLM and cross-LLM trials. The same held true when I repeated the trial using another sample press release for which I hadn’t done an expert rewrite.

Implicit vs. Explicit Flow of Control

The writing assistant I envision is a conversational partner that helps a writer work through a rewrite in a step-by-step manner. So the prompt instructs the LLMs to proceed one step at a time, show the before and after versions of an altered paragraph, cite the rationale for the change, and ask the writer whether to continue to the next chunk or linger to discuss alternate phrasing. All three LLMs did that pretty well, which is impressive if you’ve tried coding such interactions explicitly. That’s doable using their APIs (you can build programmatic loops with exit conditions), but it’s exponentially more work to achieve an effect that the LLMs can create naturally.
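For comparison, here’s roughly what the explicit, API-based version of that loop looks like. This is a minimal sketch, assuming the OpenAI Python client; the model name, system prompt, and function names are my own illustrations, not anything from the exercise.

```python
# Explicit flow of control: the program, not the model, decides when to
# advance to the next paragraph and when to stop. Assumes OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are a copy editor. Rewrite the paragraph you are given: "
    "omit needless words, use active voice, avoid jargon and cliche. "
    "Return only the rewritten paragraph."
)

def rewrite(paragraph: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": paragraph},
        ],
    )
    return response.choices[0].message.content

def step_through(paragraphs: list[str]) -> None:
    for before in paragraphs:
        after = rewrite(before)
        print(f"BEFORE:\n{before}\n\nAFTER:\n{after}\n")
        # The exit condition lives in the program, not in the prompt.
        if input("Continue to the next paragraph? [y/n] ").strip().lower() != "y":
            break
```

Note that even this toy version hard-codes an assumption: that a rewrite proceeds paragraph by paragraph. As we’ll see next, that assumption doesn’t survive contact with real editing.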

Of course, the conversational flow in this experiment was very basic. The LLMs “assumed” that a step-by-step rewrite meant revising the headline, and then revising each paragraph in turn. That isn’t how my worked example proceeds. It omits the first paragraph entirely (because it only repeats the headline), and remixes other elements to produce a final version that differs from the original both structurally and on a line-by-line basis. A local line edit can provoke global restructuring, and vice versa. I don’t know if it will be possible to weave LLM support into guided experiences that capture such subtleties of real-world editing, but if so, I suspect it will require the exponentially harder API-based approach. Or a comprehensive set of annotated before-and-after examples. Or both!

Regardless of the degree of control you impose on the turn-by-turn conversation, the ability to engage in open dialogue at any point is fundamental to interaction with LLMs. Khan Academy’s Khanmigo demonstrates that beautifully. If you haven’t tried it, I highly recommend signing up for a course and giving it a whirl. Pick something you haven’t studied in a long time (in my case it was AP Biology), work through some of the lectures and readings, take a few quizzes, and then activate Khanmigo. The guardrails are impressive: it knows your course context and will invite you to take some of the same quizzes, which it conducts via conversation rather than multiple choice. If you ask a random question (“How about those Red Sox?”) it tells you to stick to biology. But you can always pause to ask questions to deepen your understanding. At one point, in a section on water as a universal solvent, the LLM mentioned that water doesn’t dissolve non-polar substances like oil and fat. Because my high-school biology is very rusty, I asked: “Give me five examples of polar and non-polar substances.” It did, and gave more examples when I asked. A bit later I asked for and received examples of dehydration synthesis and hydrolysis. I wish I could travel back in time and redo high school and college with that kind of patient and helpful assistance.

It’s true that I’m a highly self-motivated learner, someone Audrey Watters would describe as a roaming autodidact. That style of learning hasn’t been the norm. Many learners need — and most teachers aim to provide — a more scaffolded experience. But Khan Academy strikes a nice balance. The scaffolding has always been there: syllabi that align with standard frameworks, repeatable assessments that enable students to achieve mastery at their own pace, and careful progress monitoring. At any point, though, learners can ask questions that reframe the material in ways that make the most sense to them. The instructor never grows impatient, and the rest of the class is never inconvenienced. It’s wonderful.

If you never bother to ask those questions, though, you’ll never have that kind of experience. As I worked through the rewrite exercise with my team of assistants, I prodded them to honor the principles that they were trying, and often failing, to demonstrate. That made the outcomes far better than they otherwise would have been. And it was engaging to challenge the assistants to do a better job of simplifying, clarifying, and activating the prose. Learners who wouldn’t do that in a classroom setting, in front of their teacher and their peers, will be much more likely to have the experience in private conversation with a patient and helpful learning partner. It will always be desirable to provide fully guided experiences, if possible. But as learners figure out why and how to challenge and interrogate LLMs, they’ll gain more control over how they learn. Providers of learning experiences may, in turn, be able to cover more ground with less of the kinds of scaffolding that will be expensive to create and maintain.

Showing Differences

If you start at the first change in that pull-request walkthrough and use the Next button to step through the changes, you’ll see the powerful effect created by diffing the before and after states of each granular revision. The writing assistant I envision would be able to do that. Although I’ve found LLMs to be very good at generating and transforming HTML, none got anywhere near that goal.

This is, admittedly, a daunting challenge. Again, I suspect it’s something best suited to a hybrid approach that calls LLM APIs to produce before-and-after pairs, then conventional code to compare them and emit color-coded diffs.
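The conventional half of that hybrid is the easy part. Here’s a minimal sketch using Python’s standard difflib to turn a before-and-after pair (however an LLM produced it) into a color-coded HTML diff; the sample strings are my own.

```python
# Emit a color-coded, side-by-side HTML diff of a before/after pair.
# Uses only the standard library.
import difflib

def html_diff(before: str, after: str) -> str:
    return difflib.HtmlDiff(wrapcolumn=72).make_file(
        before.splitlines(),
        after.splitlines(),
        fromdesc="before",
        todesc="after",
    )

before = (
    "At this point in time, the company is in the process of "
    "leveraging synergies across its solution portfolio."
)
after = "The company is combining the strengths of its products."

with open("diff.html", "w") as f:
    f.write(html_diff(before, after))
```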

Channeling Strunk, White, and Orwell

Although I made a GPT that incorporates the prompt I used in this exercise, I wouldn’t recommend it as readily as I would the Flowpipe test writer, which operates in a much narrower domain and obeys Rule 2 (Never trust, always verify). It’s quick and easy to verify that a test of a workflow pipeline does or doesn’t yield the expected results. It’s much harder to verify that a rewrite of a piece of business communication hits the mark. If that’s even possible, I think it will require a large corpus of before-and-after examples, in addition to the kinds of instructions I wrote in my prompt. It’s worth a try, though! Badly written business communication taxes everyone, and systematic improvement — even if only marginal — would be a major boon.

Meanwhile, I’ll reiterate a couple of prompts that have proven consistently useful. I now think of them as the Strunk and White transform and the George Orwell transform.

Strunk and White:
I’ll show you a marketing document. Please revise it according to Strunk and White: Omit needless words, prefer short Anglo-Saxon words to long latinate words, use active voice.

George Orwell:
Please evaluate this marketing document according to George Orwell’s writing guidelines as laid out in his essay Politics and the English Language. Which parts would make him cringe?

Given these prompts, my LLM assistants cut through the muddled jargon and sloppy phrasing of badly written press releases just about as well as they did with the more elaborate prompts I used in this exercise. I recommend them to writers of press releases, or of any other marketing copy, as a first step toward clearer and more effective business communication.
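And if you’d rather script the transforms than run them interactively, they wrap naturally into functions. A minimal sketch, again assuming the OpenAI Python client; the model name and file path are placeholders.

```python
# The Strunk and White and George Orwell transforms as reusable functions.
from openai import OpenAI

client = OpenAI()

STRUNK_AND_WHITE = (
    "I'll show you a marketing document. Please revise it according to "
    "Strunk and White: Omit needless words, prefer short Anglo-Saxon "
    "words to long latinate words, use active voice."
)

ORWELL = (
    "Please evaluate this marketing document according to George Orwell's "
    "writing guidelines as laid out in his essay Politics and the English "
    "Language. Which parts would make him cringe?"
)

def transform(instruction: str, document: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": f"{instruction}\n\n{document}"}],
    )
    return response.choices[0].message.content

press_release = open("press_release.txt").read()  # placeholder path
print(transform(STRUNK_AND_WHITE, press_release))
print(transform(ORWELL, press_release))
```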
