GIGOGBI

I’ve been thinking a little about what I actually want to be putting out on this channel. The Work Week posts just aren’t particularly satisfying; they crowd out other things I might be writing, and they’re not that helpful at keeping me accountable for work output. Maybe it’s better to write less regularly, less frequently, and more interestingly.

Anyway, Cory Doctorow has a good post on a phenomenon he calls “GIGOGBI”: garbage in, garbage out, garbage back in. Basically it’s when an algorithm is corrupted because its own algorithmically-generated predictions end up in its training set. In the LLM domain this has been dubbed “model collapse,” and AI haters await it like the Rapture. Doctorow links a good article by Kristian Lum and William Isaac showing the same phenomenon in predictive policing, where it happens because your crime-prediction algorithm ends up getting trained on data produced by police following its recommendations. (The post is three weeks old because I’m behind on my reading.)

Of course, what’s a little funny about this is that before predictive policing there were “hot spots,” which were the same idea; you put more police where there’s more crime… and having more police in that spot generates more arrests, which makes it seem like there’s more crime, so you send more police, &c. You don’t need AI to make this mistake, is what I mean. It’s just Goodhart’s Law: “Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” Which itself is just a special case of the drunk looking for his keys under the streetlight, I suppose.
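To make the loop concrete, here is a toy sketch in Python. It is not Lum and Isaac’s method, and every name and number in it is made up for illustration: two districts with identical true crime rates, a fixed pool of patrols, and a rule that whichever district has more recorded arrests gets the larger share of patrols. The only asymmetry is one extra arrest on the books at the start.

```python
def simulate_hot_spots(true_rate, recorded, rounds, hot_patrols=70, cold_patrols=30):
    """Toy hot-spot loop (all parameters are hypothetical).

    true_rate: arrests generated per patrol in each district
    recorded:  arrests already on the books in each district
    """
    recorded = list(recorded)
    for _ in range(rounds):
        # The "hot spot" is whichever district has more recorded arrests.
        hot = 0 if recorded[0] >= recorded[1] else 1
        patrols = [cold_patrols, cold_patrols]
        patrols[hot] = hot_patrols
        # Arrests are only recorded where police are looking.
        for i in range(2):
            recorded[i] += patrols[i] * true_rate[i]
    return recorded


start = [11, 10]  # district 0 begins with a single extra recorded arrest
final = simulate_hot_spots(true_rate=[0.1, 0.1], recorded=start, rounds=20)

start_share = start[0] / sum(start)
final_share = final[0] / sum(final)
print(f"district 0 share of recorded arrests: {start_share:.2f} -> {final_share:.2f}")
```

Both districts have the same underlying rate, but district 0 never stops being the hot spot, and its share of recorded arrests drifts toward its share of patrols. The data ends up describing where the police went, not where the crime was.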


Currently listening: THIS THING BETWEEN US, by Gus Moreno, read by Robb Moreira.


If you’re enjoying my writing, you can get some of my short fiction on your e-reader for the low, low cost of $0. Remembered Air is a collection of six poems and short stories not available anywhere else. Download it here.
