AI and the Future of Formal Testing

Learning reshapes the way that we process our world and evolve as humans, and yet the way that we learn hasn’t evolved much over time. Sure, the internet delivered a major shakeup in how we can access and share knowledge, and the pandemic made the concept of virtual learning more widely accepted. Fundamentally, though, the way that your kids learn probably looks a lot like how you – and even your parents – learned. That is about to change.

My work in both AI and education means I frequently have the privilege of holding very interesting discussions about both subjects with lots of people with varying perspectives. This includes teachers, who typically have too much on their plate to focus too hard on AI, or who think AI is just the next fad that will come and go the same way as Gangnam Style and Pokémon Go. It also includes education leaders such as Sir Anthony Seldon, whom I had the pleasure of interviewing for the AI and You podcast.

Sir Anthony is the director of Wellington College Education and also the founder of the UK’s AI in Education initiative, which made him a perfect authority to consult. I asked him how he might gauge the size of AI’s impact on education on a scale ranging from the invention of fire to the longevity of the Liz Truss administration. That latter analogy wasn’t entirely random: Sir Anthony is also a renowned biographer of prime ministers and had just published one about the risibly brief tenure of Ms. Truss, who was outlasted by a 60p head of lettuce from Tesco. That book (Truss At 10: How Not To Be Prime Minister) was serialized in The Times days before our interview, generating immense discussion and a flood of requests from reporters for his time, which made the interview even more of a coup.

His response was to compare AI to the invention of the printing press. That establishes a level of impact that’s not a once-in-a-generation event, not once-in-a-lifetime, but once-in-a-millennium. Far outside the experience of anyone alive or anyone that anyone alive could have talked with. And it’s useful to know that. After the interview, I reflected on this answer more and realized that it has another lesson for teachers. Because one can imagine, when Gutenberg came out with his invention, or when the Chinese invented movable type, that teachers said, “Well, that puts us out of work. Our job has always been to pass along oral histories, and now students can just read the book. What’s the point of having a teacher if you can just give someone a book? We’re done for.”

Clearly it turned out that the teaching profession had more life in it, and that teachers could use books in the pursuit of education, by, for instance, throwing them at pupils who had fallen asleep at the back of the room. But now they’re asking the same question about AI: Is it going to put them out of a job? People everywhere have that fear, even in jobs that are quite safe for the foreseeable future. It doesn’t help that the media has peppered us with apocalyptic visions of robots taking our jobs. That may be a step up from their usual Terminator-fueled scenarios of robots laying waste to the human race, but not by much.

One reason that AI has hit education so acutely is a quirk of how large language models are designed, which makes them insanely good at one thing in particular: passing tests. We can see this every time a new model comes out and someone makes it sit an examination and reports that, for instance, OpenAI’s o1 now qualifies for Mensa membership or answers questions at a PhD level. If you’re a teacher who took that at face value and compared it to human beings who score similarly – and why wouldn’t you, when the media does nothing to draw a distinction – then you’d likely conclude that your days at the blackboard were numbered.

But the quirk means that LLMs are far better at passing formal tests than at anything else. This simply emerges from the way they are trained. Examinations are designed first and foremost to be easy to grade. How well they measure a valuable skill is the second design criterion. This hasn’t mattered until now, because any human who can pass those tests does have the skill they’re testing for.

But consider as an example the game of chess. If I introduced you to someone and said they were a chess grandmaster, you would assume with justification that they were good at strategic planning, thinking under pressure, and seeing the big picture: qualities that you look for when hiring a CEO, for instance. Now, how do we know someone is a chess grandmaster? Largely by their Elo rating, a number derived from their results against other rated players. But the highest Elo ratings of all belong to… AIs. That human grandmaster has no chance against them. Yet those AIs cannot do strategic planning or any of the rest. All they can do is play chess. They couldn’t possibly run a company.
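For readers curious about what an Elo rating actually encodes, here is a minimal sketch of the standard Elo expected-score and update formulas. The ratings and the K-factor of 20 are assumptions chosen purely for illustration, not figures from any real player or engine. Notice that the only inputs are ratings and game results: nothing in the number measures planning, judgment, or anything beyond winning at chess.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def updated_rating(rating: float, expected: float, actual: float, k: float = 20.0) -> float:
    """Rating after one game; actual is 1 for a win, 0.5 for a draw, 0 for a loss.

    The K-factor (k) controls how fast ratings move; 20 is an assumed value.
    """
    return rating + k * (actual - expected)


if __name__ == "__main__":
    # Hypothetical ratings, for illustration only.
    human = 2800.0
    engine = 3500.0

    e_human = expected_score(human, engine)
    print(f"Expected score for the human: {e_human:.3f}")
    print(f"Human rating after a loss: {updated_rating(human, e_human, 0.0):.1f}")
```

With a gap that large, the human is expected to score close to zero, so losing barely changes their rating. The system already assumed the outcome.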

That’s the principle that’s at work with LLMs passing exams. Now, that does make them fit for many purposes, but not all the ones that human teachers fulfill. It’s just very difficult to describe the distinction. And of course, as soon as someone devises a formal test for making that distinction, well then, AIs will pass that as well, because they’re insanely good at that. They seem to have deftly found an Achilles’ heel of academic studies, which depend on constructing formal tests for gathering useful data. Hopefully we’ll come up with more subtle ways of testing that still scratch academics’ itch but don’t land in the sweet spot for AIs. What do you think?

By Peter Scott