If you want to know about how often a word…
... occurs in our language, you need to find some data: usually, the bigger, the better.
This week’s Challenge is all about comparing how often people write about certain ideas. How often do they mention LA vs. NYC vs. London? How often do they write about different beverages? And so on.
As I mentioned, once you have a great data set you can process it to find answers to questions like “is fly used more often as a verb, or a noun?”
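To make that kind of processing concrete, here's a minimal sketch of the noun-vs-verb tally. It assumes (and this is my assumption, not something the data set is guaranteed to provide) that each row carries a token tagged with its part of speech in a `word_TAG` form such as `fly_NOUN`, along with a year and a count. The function name `pos_totals` and the sample numbers are made up purely for illustration.

```python
from collections import Counter


def pos_totals(rows, word):
    """Sum counts per part-of-speech tag for one word.

    rows: iterable of (tagged_token, year, count), where tagged_token
    looks like "fly_NOUN" (an assumed format, used here for illustration).
    """
    totals = Counter()
    for tagged_token, _year, count in rows:
        # Split on the LAST underscore: "fly_NOUN" -> ("fly", "_", "NOUN")
        token, _, tag = tagged_token.rpartition("_")
        if token.lower() == word.lower() and tag:
            totals[tag] += count
    return totals


# Invented placeholder numbers, only to show the shape of the data.
sample_rows = [
    ("fly_NOUN", 1990, 120),
    ("fly_VERB", 1990, 300),
    ("fly_NOUN", 2000, 150),
    ("fly_VERB", 2000, 280),
    ("flyer_NOUN", 2000, 40),  # a different word, so it is ignored
]

totals = pos_totals(sample_rows, "fly")
more_common = max(totals, key=totals.get)  # tag with the largest total
print(totals, more_common)
```

With real data you'd swap `sample_rows` for rows read from the data set; the tally itself stays the same.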
1. When people write about world cities, which do they write about most often? Los Angeles, London, Berlin, or Beijing?
(I’m willing to bet that most references to “New York” are to the city rather than the state, but I don’t know how to automate that check. If I really wanted to know, I’d take a sample of the text snippets and use Mechanical Turk to split up the work of reading them and classifying each one as a “city reference” or a “state reference.” But that’s a different blog post.)
2. When people write about what they drink (as a beverage), what do they write about? (Water? Wine? Beer? Coffee? Root beer?) Which beverage is written about most often?
3. Is the word “fly” used more often as a noun, or as a verb?
4. Speaking of polysemous words (words with more than one meaning), can you find any words that USED to be used more frequently as nouns, that are now usually used as verbs? (Or vice-versa? Words that were once verbs, but are now thought of as nouns?)
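To pin down what a noun/verb "cross-over" means for a single word, here's a rough sketch. It assumes you already have per-year counts for the word as a noun and as a verb, stored as two `year -> count` dictionaries; the function name `find_crossovers` and the numbers are invented for illustration and don't describe any real word.

```python
def find_crossovers(noun_by_year, verb_by_year):
    """Return (year, old_leader, new_leader) each time the leader flips.

    Both arguments map year -> count. Years missing from either series,
    and years where the two counts tie, are skipped.
    """
    crossovers = []
    previous = None  # "NOUN" or "VERB" for the last year with a clear leader
    for year in sorted(set(noun_by_year) & set(verb_by_year)):
        n, v = noun_by_year[year], verb_by_year[year]
        if n == v:
            continue  # no clear leader this year
        leader = "NOUN" if n > v else "VERB"
        if previous is not None and leader != previous:
            crossovers.append((year, previous, leader))
        previous = leader
    return crossovers


# Invented counts for a hypothetical word.
noun = {1900: 50, 1950: 40, 2000: 20}
verb = {1900: 10, 1950: 30, 2000: 60}

flips = find_crossovers(noun, verb)
print(flips)
```

A word that drifted from noun to verb would show up as a flip from `"NOUN"` to `"VERB"`; running this over many words is where the real work lies.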
Unfortunately, there’s no way (that I know of) to find this out without writing a program to test LOTS of verb/noun pairs. Here’s one I found with a fascinating cross-over.