Automatic cleanup

Over the years, bloggers on Language Log and elsewhere have catalogued ways of avoiding taboo and other offensive vocabulary in print. These range from handcrafted strategies, like circumlocution and euphemism, through a variety of substitution techniques, to partially automated avoidance schemes (straightforward blocking of postings and messages containing the offending items, several types of asterisking schemes, and the like).

Here’s an automated substitution scheme reported by Martin R in a comment on my “bad bingo words” posting:

My son used to hang out in a chatroom where bad language was modified automatically. “Fuck” became “hug”, “fucking” became “hugging”.

To which PaddyK replied:

I like the “hug” filter concept! “If you don’t get your hugging donkey over here right now I’ll hugging kiss you!”

Aside from how silly-sounding the hug substitutes are, and the very real possibility that such substitution could simply invest hug with an obscene aura it didn’t have before, this simple example illustrates some of the (well-known) potential complexities in automated filtering (for some related complexities, see the Language Log postings on automated asterisking in iTunes — for instance, this one).

Here’s the problem: if the filtering routine just does substring replacement, then for fucked and fucking you’ll get huged and huging instead of hugged and hugging. So either the routine has to incorporate some spelling conventions of English, or the dictionary for replacement has to have separate entries for all the forms — a solution that’s probably necessary in any case, to avoid absurdities like replacing the turd of Saturday with something else (or using four asterisks, or blocking the message entirely).
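The contrast can be sketched in a few lines of Python. This is only an illustration, not the chatroom’s actual filter: the function names, the replacement table, and the made-up substitute for “turd” are all assumptions introduced here.

```python
import re

# Hypothetical replacement table with a separate entry for every form,
# so each substitute's spelling is stated outright rather than derived.
REPLACEMENTS = {
    "fuck": "hug",
    "fucks": "hugs",
    "fucked": "hugged",
    "fucking": "hugging",
    "turd": "poo",  # invented substitute, for illustration only
}


def naive_filter(text):
    """Plain substring replacement of the bare stems."""
    return text.replace("fuck", "hug").replace("turd", "poo")


# \b matches only at word boundaries, so "turd" inside "Saturday"
# is never a candidate for replacement.
WORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in REPLACEMENTS) + r")\b",
    re.IGNORECASE,
)


def whole_word_filter(text):
    """Replace only whole words that have their own table entry."""
    def substitute(match):
        # Look the matched word up in lowercase; the substitute is
        # always returned in lowercase, whatever the input's case.
        return REPLACEMENTS[match.group(0).lower()]

    return WORD_PATTERN.sub(substitute, text)
```

Given the input “fucked and fucking on Saturday”, the naive version produces “huged and huging on Sapooay”, while the whole-word version produces “hugged and hugging on Saturday”. The price is a longer table, and the sketch makes no attempt to carry the writer’s capitalization over to the substitute.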

3 Responses to “Automatic cleanup”

  1. Stan Says:

    This reminds me of “The Clbuttic Mistake”, of which the most infamous example might be a story about the sprinter Tyson Gay, published on the OneNewsNow website under the headline: “Homosexual eases into 100 final at Olympic trials”.

    On a related note, TV re-dubs of problematic language in films can be similarly daft but more creative, such as The Big Lebowski’s “This is what happens when you find a stranger in the Alps.” Harry Enfield and Paul Whitehouse parodied the phenomenon.

  2. ian Says:

    I remember seeing that a forum where swearing was commonplace had established a similar filtering scheme, only this one was intended to discourage excessive politeness. Many of the forum-goers were addressing each other as “sir”, as in “thank you, kind sir”, so the filter replaced “sir” with “fag”.

  3. Claire Says:

    The Something Awful fora have a systematic search/replace that takes into account different spellings and conjugations. “Fuck” becomes “gently caress” whereas “fucking” becomes “loving”. “Shit” becomes “poo poo”.

    It’s actually a pretty great system, from what I’ve seen. My only issue with it is that it doesn’t replace capitalized FUCK with capitalized GENTLY CARESS, so when you’re reading someone’s caps-locked rant, the sudden lowercase sort of breaks the ranty flow.
