Sunday, 26 January 2025

humanity’s last exam (12.186)

Like the Voight-Kampff test, this standardised benchmark, created by some of the most astute scholars across disciplines, is an escalating effort to stay ahead of AI and afford proctors a purchase on a vanishing Turing test, set against a backdrop of artificial intelligence making rapid advances on graduate-level, multidisciplinary questions and raising the prospect that the machines are quickly approaching the limits of humanity’s ability to gauge and compare their progress. The resulting quiz, whose sample items suggest not only a degree of teaching to the test but also of priming the answers that people want to hear, runs to some three thousand questions, vetted and juried by academic panels, and whilst not timed, is completed in seconds by the world’s most powerful models. The battery of questions has correct answers, though it might be more interesting to pose the unknown, or to drill into what the models get wrong in over-confident albeit novel ways, mindful that our own gullibility and susceptibility to misdirection are surely baked into the solutions. The exercise also underscores the problem of jaggedness: AI’s inconsistency in tackling basic questions, the flowchart of prompts that steers it towards better or worse outcomes, and the difference between acing an exam and being a practising professional doing maths, physics, medicine or governance.
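That jaggedness lends itself to a concrete measurement: pose the same question under several paraphrased prompts and watch how the score swings on wording alone. A minimal sketch of such a harness, in Python; the `ask_model` stub and the toy question are hypothetical stand-ins, not drawn from the benchmark itself, and a live version would call an actual model API in their place.

```python
import statistics

# Hypothetical stand-in for a real model call. Its sensitivity to
# phrasing is contrived to illustrate the effect being measured,
# not to mimic any particular model's behaviour.
def ask_model(prompt: str) -> str:
    return "4" if prompt.strip().endswith("?") else "unsure"

# Toy question: one accepted answer, several paraphrased prompts.
QUESTION = {
    "answer": "4",
    "variants": [
        "What is 2 + 2?",
        "Compute the sum of two and two.",
        "2 + 2 = ?",
        "State, without elaboration, the value of 2 + 2.",
    ],
}

def variant_scores(question: dict) -> list[float]:
    """Score each prompt variant as 1.0 (correct) or 0.0 (wrong)."""
    return [
        1.0 if ask_model(v).strip() == question["answer"] else 0.0
        for v in question["variants"]
    ]

scores = variant_scores(QUESTION)
# A wide spread across paraphrases of the *same* question is the
# jaggedness that a single headline accuracy number averages away.
print("per-variant scores:", scores)
print("mean:", statistics.mean(scores), "spread:", statistics.pstdev(scores))
```

Run against the stub, the script reports a mean of 0.5 with half the variants failing: exam-style aggregate scores conceal exactly this sort of prompt-dependent instability.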