Monday, December 15, 2008

Human Text Compression?

An article in The Globe and Mail's technology section describes a study done by a Tasmanian psychology prof regarding the speed at which people read "textese", e.g. "ur cool", "good 4 u", "i 8 my lunch". I couldn't find the paper in any online journal, so I couldn't get hold of the data, but the results are summarized in the G&M article as follows:

While students were significantly faster using textese, it took almost half the number of students twice as long to read these messages aloud than messages written in proper English.

I would be interested to see whether a model could be built for these data, or whether an existing model could be fitted to them. The model I have in mind has to do with data compression, human cognition, and error rates.

Basically, textese is a form of data compression. It takes fewer characters on average to send the same words to a recipient. The theory would be that the degree to which textese is used is related to the thought the recipient needs to parse it (represented in the study by how long it takes to read). Sort of like how the DivX codec takes more processing power than MPEG2 to decode the same video, but ends up being a smaller file to send/store.
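To make the compression analogy a bit more concrete, here is a minimal Python sketch. The word table is purely hypothetical (a handful of substitutions I made up for illustration, not anything from the study), and the helper names `encode`, `candidates`, and `compression_ratio` are my own. It shows the two halves of the idea: the encoded message is shorter, but a single token can stand for more than one word, which is where the reader's extra decoding work would come from.

```python
# Hypothetical word -> textese table; the entries are illustrative only.
TEXTESE = {
    "you": "u",
    "your": "ur",
    "you're": "ur",
    "for": "4",
    "ate": "8",
    "to": "2",
    "are": "r",
}


def encode(text):
    """Replace each known word with its textese form (the 'compression' step)."""
    return " ".join(TEXTESE.get(word, word) for word in text.lower().split())


def candidates(token):
    """List every plain word a textese token could stand for.

    More than one candidate means the reader has to disambiguate
    from context, which is the 'decoding cost' in the analogy.
    """
    matches = [word for word, short in TEXTESE.items() if short == token]
    return matches or [token]


def compression_ratio(text):
    """Encoded length divided by original length (smaller is more compressed)."""
    return len(encode(text)) / len(text)


if __name__ == "__main__":
    message = "good for you"
    print(encode(message))                       # good 4 u
    print(round(compression_ratio(message), 2))  # 0.67
    print(candidates("ur"))                      # ['your', "you're"]
```

A real model would need measured reading times rather than a count of candidate words, so treat the ambiguity count as a stand-in for decoding effort, not as a measurement of it.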

This theory could most likely be tied into more established theories on language and information. Information theory, which grew out of work on communication and cryptography in the 1940s, has been building models that describe what I've mentioned and more.

For anyone who actually reads the article, reads this post, and realizes there's also mention of participants in the study who experienced no slow-down at all... I figure that they've simply developed some 'hardware decoding' in their brains for reading textese, much like someone who can write shorthand has 'hardware encoding' capabilities. The poor chumps who still use 'software' experience the slowdown in human text decompression.
