Wednesday 24 April 2024

a frontier research problem (11.511)

Trained on “publicly-available” text scraped, with or without consent, from billions of human-authored, English-language websites in the hope of producing accurate (or at least confident) language models, the rather nascent AI boom may be facing a bust as it runs out of data to mine. Previously we’ve looked at the phenomenon of recursion as AI-generated content begins to saturate the internet, but conversely, as vast as the web seems, industry experts estimate that AI needs far more information than has thus far been produced in order to advance, presumably because models only get better at delivering right and desired responses with minimal intervention by brute iteration over countless right answers. Exuberance is nonetheless undeterred and growing, notwithstanding immense energy demands, threats to labour and intellectual property, a spotty record of actual adoption, and the dangers of citing less-than-authoritative sources. Indeed, the original sin of artificial intelligence, exhausting the sum of human knowledge, only really came to light not through complaints of plagiarism but through competitors trying to shield warehoused content from the clearinghouse, and our actions may be propping up something adversarial and degenerative. More from Ed Zitron at the link up top.