October 8, 2018

Why swearing on the web is so creative and so difficult to keep up with

Language is not static; it changes continuously. New words, and new meanings for old words or phrases, spread fast. Web communities, Twitter and other social media all shape the language we use. Swearing, cursing and cussing evolve at least at the same pace as language in general, maybe because four-letter words are not allowed in normal, moderated web communities, or maybe because a clever-looking euphemism makes the writer look cool.

Garbling words and finding new meanings

Garbling or hiding the spelling is probably the most common method. Hiding the spelling, as in “cr*p” or “b*tt”, spares the reader’s eyes from seeing the word, although the brain reads it without trouble. Word-based filters catch that type of spelling easily. If the motivation is to hide from a spam filter, the garbling needs to be harder for a machine to find. One popular method is to add spaces between letters, as in “b a d w o r d”. Once that spelling is added to the filter’s word list, users switch to other words or garble the spelling a little differently. It is truly amazing how the human brain can see the word through all the camouflage.
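To make the cat-and-mouse game concrete, here is a minimal Python sketch of a word-based filter that handles the two tricks mentioned above: letters masked with an asterisk and letters separated by spaces. The word list and the function names (`BLOCKLIST`, `collapse_spaced_letters`, `matches_masked`, `find_blocked`) are assumptions made up for this illustration, not the workings of any particular product.

```python
import re

# Hypothetical word list for illustration; a real filter would use a curated list.
BLOCKLIST = {"crap", "butt", "badword"}

# A run of three or more single characters separated by single spaces,
# such as "b a d w o r d".
SPACED_RUN = re.compile(r"(?<!\S)(?:\S ){2,}\S(?!\S)")


def collapse_spaced_letters(text):
    """Join runs of space-separated single characters into one token."""
    return SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


def matches_masked(token, word):
    """True if token equals word, allowing '*' in place of any letter."""
    if len(token) != len(word) or not any(c.isalpha() for c in token):
        return False
    return all(t == "*" or t == w for t, w in zip(token, word))


def find_blocked(text):
    """Return the blocklisted words found in text, in order of appearance."""
    normalized = collapse_spaced_letters(text.lower())
    hits = []
    for raw in normalized.split():
        token = raw.strip(".,!?;:\"'()")  # drop punctuation around the token
        for word in sorted(BLOCKLIST):
            if matches_masked(token, word):
                hits.append(word)
    return hits


print(find_blocked("What a load of cr*p!"))  # masked letter is caught
print(find_blocked("b a d w o r d"))         # spaced letters are caught
print(find_blocked("b.a.d.w.o.r.d"))         # a new garbling slips through
```

The last call returns an empty list: separating the letters with dots instead of spaces is enough to get past these rules until someone adds another one. The sketch also has blind spots of its own. For example, a one-letter word such as “a” directly in front of a spaced-out word gets merged into it.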


Slang and new meanings for old, innocent words may totally change the meaning of a sentence, quite often in an obscene direction: “I saw a bunch of beavers at the gym this morning” is not really referring to a group of flat-tailed dam builders. Politics has given us a lot of new meanings for old words, as in the phrase “Birthers were teasing snowflakes in the meeting”. Depending on the context, “snowflake” is used for political bashing or for describing the weather.

Context is king

Swearing is often context-sensitive. Think of the difference between “This campaign is the sh*t” (positive), “That is really sh*t” (negative) and “Get your sh*t together” (neutral). The intended audience may also affect the moderation policy. If the community is used by children, the policy is certainly different from, and much stricter than, that of an adult-only group.

Word- or rule-based spam filters and bots have trouble understanding context and are easily duped by intentional misspelling or colloquial language. Teaching them new words or rules requires manual work; without it, their accuracy degrades over time. A modern AI moderator that uses machine learning and text analytics, by contrast, learns new meanings, colloquial language and misspelled words automatically, on the fly. It also understands context better and adapts to the moderation policy used in a particular web service.

Kari Kemppi
Project Director

Further reading:
Phys.org: New research reveals religious profanity and homophobic terminology among the most common swearwords
Quartz: How brand-new words are spreading across America
Washington Post: 24 words that mean totally different things now than they did pre-Internet
