Here’s a bleary-eyed, half-considered idea that I will never follow up on.
Take a website with a lot of comments from a regular community – forget OddThinking, it isn’t nearly big enough; find a blog in the top 100, or StackOverflow or the like.
Grab all the comments, and group them by author. Discard any author with fewer than a threshold number of words in their personal corpus.
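A minimal sketch of that grouping step, assuming the comments have already been scraped into `(author, article_id, text)` tuples. That tuple shape, the name `group_by_author` and the default threshold of 500 words are all made up for illustration.

```python
from collections import defaultdict

def group_by_author(comments, min_words=500):
    """Group (author, article_id, text) tuples by author, dropping sparse authors.

    The tuple shape and the 500-word default are illustrative assumptions.
    """
    by_author = defaultdict(list)
    for author, article_id, text in comments:
        by_author[author].append((article_id, text))
    # Keep only authors whose combined comments reach the word threshold.
    return {
        author: items
        for author, items in by_author.items()
        if sum(len(text.split()) for _, text in items) >= min_words
    }
```

Counting words with a plain whitespace split is crude, but it is probably good enough for deciding who has written enough to have a recognisable voice.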
[Magic happens here] Use your computational linguistic skills to evaluate some metrics about the voice of each commenter – the mood, the vocabulary, their sentence length and complexity, etc.
Now, rank each article by how far, on average, its comments deviate from the typical comments of the people who wrote them, according to your magical linguistic measures.
The result is a list of articles that are most likely to provoke the readers out of their own personal ruts.
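Here is a rough sketch of the ranking, with the magic replaced by three deliberately crude stand-in metrics: words per sentence, mean word length, and the share of distinct words. The function names, the `(author, article_id, text)` input shape and the choice of a mean absolute z-score as the "deviation" are all assumptions for illustration, not real computational linguistics.

```python
import re
from collections import defaultdict
from statistics import mean, pstdev

def voice_metrics(text):
    """Crude stand-ins for the 'magic': words per sentence, word length, variety."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        return (0.0, 0.0, 0.0)
    return (
        len(words) / max(len(sentences), 1),  # words per sentence
        mean(len(w) for w in words),          # mean word length
        len(set(words)) / len(words),         # share of distinct words
    )

def rank_articles(comments):
    """comments: list of (author, article_id, text) tuples (assumed shape).

    Returns article ids, most out-of-character first.
    """
    scored = [(author, article, voice_metrics(text))
              for author, article, text in comments]

    # Baseline per author: mean and spread of each metric over all their comments.
    per_author = defaultdict(list)
    for author, _, metrics in scored:
        per_author[author].append(metrics)
    baseline = {
        author: [(mean(col), pstdev(col)) for col in zip(*rows)]
        for author, rows in per_author.items()
    }

    # Deviation of one comment: mean absolute z-score across the metrics.
    per_article = defaultdict(list)
    for author, article, metrics in scored:
        zs = [abs(x - mu) / sd if sd > 0 else 0.0
              for x, (mu, sd) in zip(metrics, baseline[author])]
        per_article[article].append(mean(zs))

    # Article score: average deviation of its comments, highest first.
    return sorted(per_article, key=lambda a: mean(per_article[a]), reverse=True)
```

A real attempt would want mood and syntactic complexity in there too, and some care over authors with only a handful of comments, whose baseline is mostly noise.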
My hypothesis is that these will be particularly interesting articles.
Well, either that, or they will be articles saying things like “Happy New Year”, and getting a whole lot of “Happy New Year!” responses…
That’s as far as I got. I think I am going back to bed, now.
Comment by configurator on January 18, 2011
Happy new year!
Comment by Julian on January 18, 2011
Nice one, configurator. This post is sure to be flagged as interesting now!