Lexical Diversity Index Calculator
Measures the richness of vocabulary in a text by calculating the Type-Token Ratio (TTR). Use it when analyzing essays, transcripts, or language samples for linguistic research or readability assessment.
About this calculator
Lexical diversity quantifies how varied the vocabulary in a text is. The core metric is the Type-Token Ratio (TTR), calculated as: TTR = (uniqueWords / totalWords) × 100. Here, 'types' are distinct word forms (uniqueWords) and 'tokens' are all word occurrences, repetitions included (totalWords). A TTR of 100% means every word appears exactly once, which in practice happens only in very short texts. As a text gets longer, TTR typically falls because common words repeat. For texts of similar length, a higher TTR generally signals a richer, more varied vocabulary, which is useful for evaluating language learners, comparing authors' writing styles, or assessing the complexity of a speech sample.
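The formula above can be sketched in a few lines of Python. This is only an illustration: the function names and the tokenization rule (lowercased runs of letters and apostrophes) are assumptions, not something the calculator itself specifies, and a different tokenizer will give different counts.

```python
import re


def count_types_and_tokens(text):
    """Return (types, tokens) for a text.

    Assumption: a word is a run of letters or apostrophes, compared
    case-insensitively. Real analyses may tokenize differently.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)), len(tokens)


def ttr_percent(unique_words, total_words):
    """TTR = (uniqueWords / totalWords) x 100."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    # Multiplying before dividing gives the same value and avoids
    # small floating-point rounding surprises.
    return 100 * unique_words / total_words


types, tokens = count_types_and_tokens("The cat sat on the mat. The dog sat too.")
print(types, tokens, ttr_percent(types, tokens))
```

With this tokenizer the sample sentence has 10 tokens and 7 types, so the sketch reports a TTR of 70%.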
How to use
Suppose you analyze a 200-word paragraph and find 95 unique words. Enter 95 as Unique Words (Types) and 200 as Total Words (Tokens). The calculator computes: TTR = (95 / 200) × 100 = 47.5%. This means the number of distinct word forms equals 47.5% of the total word count. As a rough guide, a value above 40% for a passage of this length suggests reasonably varied vocabulary. You can compare this score across multiple texts of similar length to rank their lexical richness.
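Ranking several texts by TTR might look like the sketch below. Only the 95/200 pair comes from the example above; the other two samples and all names are hypothetical and exist purely to show the comparison.

```python
def ttr_percent(unique_words, total_words):
    """TTR = (uniqueWords / totalWords) x 100."""
    return 100 * unique_words / total_words


# (types, tokens) per text. "sample_a" is the worked example;
# "sample_b" and "sample_c" are made-up values for illustration.
samples = {
    "sample_a": (95, 200),
    "sample_b": (80, 200),
    "sample_c": (110, 200),
}

# Sort names from highest to lowest TTR.
ranked = sorted(samples, key=lambda name: ttr_percent(*samples[name]), reverse=True)

for name in ranked:
    print(f"{name}: {ttr_percent(*samples[name]):.1f}%")
```

All three samples have the same token count here, which is what makes the ranking meaningful.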
Frequently asked questions
What is a good lexical diversity score for written text?
There is no universal benchmark because TTR is sensitive to text length: shorter texts almost always score higher. As a rough rule of thumb for texts of similar length, a TTR above 70% is often treated as high diversity, while scores below 30% suggest heavy repetition. For longer texts, researchers often use length-corrected measures such as MATTR (Moving-Average Type-Token Ratio) or MTLD (Measure of Textual Lexical Diversity). For practical comparison, use plain TTR only when comparing texts of roughly equal word count.
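To make the idea of a length-corrected measure concrete, here is a minimal sketch of a moving-average TTR: compute TTR over every window of a fixed number of tokens and average the results. The function name, the default window of 50 tokens, and the fallback to plain TTR for short inputs are assumptions made for this illustration; published implementations may differ in those details.

```python
from collections import Counter


def mattr_percent(tokens, window=50):
    """Average the TTR of every sliding window of `window` tokens, as a percentage."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = len(tokens)
    if n == 0:
        raise ValueError("tokens must not be empty")
    if n <= window:
        # Assumption: fall back to plain TTR when the text fits in one window.
        return 100 * len(set(tokens)) / n

    counts = Counter(tokens[:window])  # word counts inside the current window
    total_types = len(counts)          # running sum of types over all windows
    for i in range(window, n):
        outgoing = tokens[i - window]  # token leaving the window
        counts[outgoing] -= 1
        if counts[outgoing] == 0:
            del counts[outgoing]
        counts[tokens[i]] += 1         # token entering the window
        total_types += len(counts)

    num_windows = n - window + 1
    return 100 * total_types / (window * num_windows)


print(mattr_percent(["a", "a", "b", "b"], window=2))
```

Because every window has the same size, the score no longer shrinks simply because the text is longer.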
How does lexical diversity differ from readability?
Lexical diversity measures vocabulary variety, while readability measures how easy a text is to understand. A highly diverse text (high TTR) can actually be harder to read because it uses many rare or unfamiliar words. Readability formulas like Flesch-Kincaid focus on sentence length and syllable count, not word uniqueness. Both metrics together give a fuller picture of text quality and complexity.
Why does type-token ratio decrease as text gets longer?
As a text grows, function words like 'the', 'a', and 'is' appear repeatedly, pulling the ratio of unique words to total words downward. Even content words start repeating as a topic is discussed in depth. Because of this length dependence, TTR scores are not directly comparable across texts of different lengths. Linguists compensate by computing TTR over standardized windows of a fixed word count and averaging the results, or by using alternative diversity indices.
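The effect is easy to observe by computing TTR on progressively longer prefixes of the same text. The sketch below uses a short made-up sentence and a hypothetical helper name; it only illustrates the trend and is not a claim about any particular corpus.

```python
def ttr_at_lengths(tokens, lengths):
    """Return {prefix_length: TTR percent} for each requested prefix length."""
    return {
        n: 100 * len(set(tokens[:n])) / n
        for n in lengths
        if 0 < n <= len(tokens)
    }


# Made-up sample text; note how 'the', 'on', and 'sat' keep recurring.
text = (
    "the cat sat on the mat and the dog sat on the rug "
    "and the cat saw the dog on the mat"
)
tokens = text.split()

curve = ttr_at_lengths(tokens, [5, 10, 20])
for length, ttr in curve.items():
    print(f"first {length} tokens: TTR = {ttr:.1f}%")
```

For this sample the TTR drops from 80% over the first 5 tokens to 70% over the first 10 and 45% over the first 20, even though the writing style never changes.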