“The” is the most used word in the English language
About 6 percent of everything you say, read, and write is “the.” That is roughly one out of every 16 words we encounter on a daily basis. The top 20 most common English words, in order, are “the,” “of,” “and,” “to,” “a,” “in,” “is,” “I,” “that,” “it,” “for,” “you,” “was,” “with,” “on,” “as,” “have,” “but,” “be,” “they.”
You see, whether the most commonly used words are ranked across an entire language or in just one book or article, a bizarre pattern emerges almost every time. The second most used word appears about half as often as the most used. The third, one third as often. The fourth, one fourth as often. The fifth, one fifth as often. The sixth, one sixth as often, and so on all the way down. Seriously. For some reason, the number of times a word is used is just proportional to one over its rank. Plotted on a log-log graph, word frequency against rank follows a nice straight line: a power law.
This phenomenon is called Zipf’s Law, and it doesn’t only apply to English. It also applies to other languages, like, well, all of them. Even ancient languages we haven’t been able to translate yet.
And here’s the thing. We have no idea why. It’s surprising that something as complex as reality should be conveyed by something as creative as language in such a predictable way. How predictable?
Well, watch this. According to Wordcount (“Tracking the Way We Use Language”), which ranks words as found in the British National Corpus, “sauce” is the 5,555th most common English word.
Now, consider a list of how many times every word shows up on Wikipedia and in the entire Gutenberg corpus of tens of thousands of public-domain books. The most used word, “the,” shows up about 181 million times. Knowing these two things, we can estimate (181 million divided by 5,555) that the word “sauce” should appear about thirty thousand times on Wikipedia and Gutenberg combined. And it pretty much does.
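The “sauce” estimate can be reproduced in a few lines. This is only a minimal sketch of the one-over-rank rule described above; the function name zipf_estimate and the exponent parameter are illustrative assumptions, not something from the sources cited here. The two inputs are the rounded figures quoted in the text.

```python
def zipf_estimate(top_count: float, rank: int, exponent: float = 1.0) -> float:
    """Estimate how often the word at `rank` appears, given the count of the
    rank-1 word, assuming frequency is proportional to 1 / rank**exponent.

    `exponent=1.0` is the idealized Zipf case; it is an assumption here,
    not a value fitted to any corpus.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_count / rank ** exponent


# Rounded figures quoted in the text.
THE_COUNT = 181_000_000  # approximate count of "the" (Wikipedia + Gutenberg)
SAUCE_RANK = 5_555       # rank of "sauce" (British National Corpus)

estimate = zipf_estimate(THE_COUNT, SAUCE_RANK)
print(f"Estimated occurrences of 'sauce': {estimate:,.0f}")
# prints: Estimated occurrences of 'sauce': 32,583
```

Note that the rank comes from one corpus and the count from another, so the result is a rough estimate rather than a precise prediction, which is why “about thirty thousand” is the right level of precision.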
What gives? The world is chaotic. Things are distributed in a myriad of ways, not just power laws. And language is personal, intentional, idiosyncratic. What about the world and ourselves could cause such complex activities and behaviors to follow such a basic rule? We literally don’t know. More than a century of research has yet to close the case.
Moreover, Zipf’s law doesn’t just mysteriously describe word use. It’s also found in city populations, solar flare intensities, protein sequences and immune receptors, the amount of traffic websites get, earthquake magnitudes, the number of times academic papers are cited, last names, the firing patterns of neural networks, ingredients used in cookbooks, the number of phone calls people receive, the diameter of Moon craters, the number of people who die in wars, the popularity of opening chess moves, even the rate at which we forget.