Writing an Algorithm to Calculate Article Read Length

Daniel Pericich
6 min read · Mar 18, 2023


Photo by Christin Hume on Unsplash

You have probably noticed a read-time number underneath article titles while scrolling through your favorite news source. It is an important metric for readers as it allows them to determine how much time they are devoting to an article, and marks content in a way that allows for filtering and sorting. There is no arguing that this metric is useful, but how do websites generate a number to predict how long text will take to read?

Do websites simply count words and spit out a guess? Is the length of each of your page visits being tracked and used to calculate how long it takes you to read each article? Or is your webcam secretly on while you’re reading, tracking your eye movements to produce an exact measurement of how fast you read? Who knows? (Of course, no agency would ever do that.)

I’m not a proponent of hacking hardware, but I am interested in how article read time is generated, so I thought it would be fun to develop an algorithm. Let’s walk through some different approaches to calculating read length. Then we will write an algorithm that blogs and newspapers could use to improve the reader experience.

More Than Simple Counting

“Read Time” is a metric based on a single unit, time. How many minutes will it take me to read this piece of content? For such a simple output, shouldn’t our algorithm be just as simple? It could be, but there are a lot of variable inputs that could sink the accuracy of a simple algorithm.

Figure 1. Example of article title with estimated read length.

An article consists of a set number of characters. These characters are a mix of letters, numbers, and symbols. Surely their total count has to mean something? Not really. Inspecting text at the level of individual characters is too granular to be useful. We need to think in larger units to get an accurate read time.

Words seem like a more reasonable unit of measure, but something seems off with that too. Whether typing or reading, words per minute (WPM) has become a standard way to predict a task’s duration. If we calculate read time from words per minute and word count alone, we oversimplify: every word is weighted equally, even though a long, unfamiliar word takes more time to read than a short, common one. To estimate how long it will take to read something, we need to inspect the words themselves, considering both each word’s composition and the text’s composition.
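To make the limitation concrete, here is a minimal Python sketch of the word-count-only approach. The function name and the 200 WPM baseline are illustrative assumptions, not values from any particular website.

```python
import math

# Assumption: a round baseline reading speed chosen for illustration.
ASSUMED_WPM = 200


def naive_read_minutes(text: str, wpm: int = ASSUMED_WPM) -> int:
    """Estimate read time from word count alone, rounded up to whole minutes."""
    word_count = len(text.split())
    return max(1, math.ceil(word_count / wpm))


# Two texts with the same word count but very different words.
short_words = "the cat sat on the mat " * 100
long_words = "epistemological considerations notwithstanding unquestionably " * 150

# Both contain 600 words, so the naive estimate cannot tell them apart.
print(naive_read_minutes(short_words), naive_read_minutes(long_words))
```

Because only the number of words matters here, the dense text and the simple text receive the same estimate, which is exactly the unevenness described above.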

Easy Reading with the Flesch Reading Score

If you are familiar with SEO (search engine optimization), you have probably heard of the Flesch Reading Score, also known as the Flesch Reading Ease score. It rates how easy a text is to read on a 0–100 scale. A score near 0 is a difficult read, best approached by graduate students, while a score near 100 is an easy read that an 11-year-old could understand. Depending on your audience, you can target different score ranges, but a higher score is better for general accessibility. For a general audience, a score of 60–70 or higher is usually considered acceptable.

Figure 2. Flesch Reading Score Card (Source of this image is https://managementblog-emily.blogspot.com/)

The Flesch Reading Score consists of 2 criteria. The first is the length of the sentences in the text. Longer sentences take more focus for the reader and are thus harder to follow. Short sentences break up topics into manageable pieces and are easy for readers to understand.

Figure 3. Flesch Reading Score Graph (Source: https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests#/media/File:Flesch_Kincaid_readability_tests.svg)

The second criterion for scoring is the average number of syllables per word. Short words are generally better known. They are also quicker to read and understand in the context of their sentence.

We can calculate our text’s Flesch Reading Score using the following equation:

Figure 4. Flesch Reading Ease equation (Source: readable.com)
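The figure itself is not reproduced here. For reference, the standard Flesch Reading Ease formula, written in LaTeX, is:

```latex
\text{score} = 206.835
  - 1.015 \left( \frac{\text{total words}}{\text{total sentences}} \right)
  - 84.6 \left( \frac{\text{total syllables}}{\text{total words}} \right)
```

The first fraction is the average sentence length and the second is the average number of syllables per word, matching the two criteria described above.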

This equation is straightforward and helpful for determining whether our text is readable. Though it tells us how readable our text is, it does not tell us how long the text takes to read. It does give us some insight into how word count, syllable count, and sentence length factor into the reading experience.
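As a rough illustration, the score can be computed in a few lines of Python. This is a sketch under stated assumptions: `count_syllables` is a hypothetical helper that approximates syllables by counting groups of consecutive vowels, and sentences are split naively on `.`, `!`, and `?`. Real syllable counting needs a pronunciation dictionary, so treat the results as approximate.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Apply the standard Flesch Reading Ease formula to a plain string."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_length = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word


easy = "The cat sat on the mat. The dog ran to the park."
hard = ("Epistemological considerations notwithstanding, institutional "
        "methodologies necessitate comprehensive reevaluation.")

# The simple text should score higher (easier) than the dense one.
print(flesch_reading_ease(easy) > flesch_reading_ease(hard))
```

Note that very simple text can score above 100 and very dense text below 0, since the formula itself is not clamped to the 0–100 range.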

Algorithm for Read Lengths

We have talked about what affects readability. Now, we need to determine our reading time. From the Flesch Reading Score, we know that readability is affected by the number of words per sentence and the number of syllables per word. With this, we have three variables that we should be concerned with when it comes to reading time: total words, average sentence length (in words), and average word length (in syllables).

Before we continue with our algorithm, I will point out a simplifying assumption about our text input. Our text (the body of the article or blog post) will be a plain string. This is not how real frameworks typically store formatted text. With Jekyll we would get a Markdown file. With Next.js, depending on the content source, we might get an array of JSON objects whose keys describe the text’s styling alongside the text itself.

Our whole text will be a single string without carriage returns. This is a naive input, but we are not concerned with text transformations. Our algorithm consists of a main method and multiple helper methods. First, we will set some constants for assumptions of sentence length and syllable count for an 11th-grade reading audience:

Figure 5. Constants for assumptions in our read length algorithm
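The original figure is not reproduced here. As a stand-in, constants of this kind might look like the following Python sketch. Every name and value below is an illustrative assumption, not the article's original code.

```python
# All names and values are illustrative assumptions, not the original constants.

# Assumed baseline reading speed for the target audience.
BASE_WORDS_PER_MINUTE = 200

# Assumed typical sentence length, in words, for the target audience.
EXPECTED_WORDS_PER_SENTENCE = 20.0

# Assumed typical word length, in syllables, for the target audience.
EXPECTED_SYLLABLES_PER_WORD = 1.5
```

Keeping these as named constants makes it easy to retune the estimate for a different audience without touching the rest of the algorithm.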

Next, we have three helper methods that determine a reading difficulty multiplier. We have set constants for expected sentence length and syllable count and will now compare these to our actual sentence lengths and syllable count to be able to adjust our words per minute rate:

Figure 6. Helper methods for determining reading difficulty multiplier
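The original figure is not reproduced here. One way to sketch three such helpers in Python is shown below. The function names, expected values, and clamping bounds are all assumptions made for illustration. Each factor is the ratio of the actual average to the expected average, clamped so that one extreme value cannot dominate the estimate.

```python
# Illustrative assumptions: expected averages and clamping bounds.
EXPECTED_WORDS_PER_SENTENCE = 20.0
EXPECTED_SYLLABLES_PER_WORD = 1.5
MIN_FACTOR = 0.5
MAX_FACTOR = 2.0


def sentence_length_factor(avg_words_per_sentence: float) -> float:
    """Ratio of actual to expected sentence length, clamped to the bounds."""
    ratio = avg_words_per_sentence / EXPECTED_WORDS_PER_SENTENCE
    return max(MIN_FACTOR, min(MAX_FACTOR, ratio))


def syllable_factor(avg_syllables_per_word: float) -> float:
    """Ratio of actual to expected syllables per word, clamped to the bounds."""
    ratio = avg_syllables_per_word / EXPECTED_SYLLABLES_PER_WORD
    return max(MIN_FACTOR, min(MAX_FACTOR, ratio))


def difficulty_multiplier(avg_words_per_sentence: float,
                          avg_syllables_per_word: float) -> float:
    """Average the two factors; values above 1.0 mean harder-than-expected text."""
    return (sentence_length_factor(avg_words_per_sentence)
            + syllable_factor(avg_syllables_per_word)) / 2


# Text that matches the expectations exactly gets a neutral multiplier of 1.0.
print(difficulty_multiplier(20.0, 1.5))
```

Averaging the two factors is just one design choice. Multiplying them, or weighting syllables more heavily than sentence length, would be equally defensible and would need tuning against real reading data.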

Now we have our helper methods to calculate the multiplier. We are finally ready to build our main method to take in our text and output the expected read length:

Figure 7. Main method for determining the reading_speed of a given text.
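The original figure is not reproduced here. A self-contained Python sketch of such a main method follows. The names, constants, vowel-group syllable heuristic, and averaging scheme are all assumptions for illustration, so this should be read as one possible shape for the algorithm rather than the original implementation.

```python
import math
import re

# Illustrative assumptions, not the article's original values.
BASE_WORDS_PER_MINUTE = 200
EXPECTED_WORDS_PER_SENTENCE = 20.0
EXPECTED_SYLLABLES_PER_WORD = 1.5


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def difficulty_multiplier(avg_sentence_length: float, avg_syllables: float) -> float:
    """Average two clamped ratios of actual to expected values."""
    sentence_factor = max(0.5, min(2.0, avg_sentence_length / EXPECTED_WORDS_PER_SENTENCE))
    syllable_factor = max(0.5, min(2.0, avg_syllables / EXPECTED_SYLLABLES_PER_WORD))
    return (sentence_factor + syllable_factor) / 2


def estimate_read_minutes(text: str) -> int:
    """Estimate whole minutes needed to read a plain-string text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9']+", text)
    if not words:
        return 0

    avg_sentence_length = len(words) / max(1, len(sentences))
    avg_syllables = sum(count_syllables(w) for w in words) / len(words)

    # Harder text (multiplier above 1.0) lowers the effective reading speed.
    multiplier = difficulty_multiplier(avg_sentence_length, avg_syllables)
    effective_wpm = BASE_WORDS_PER_MINUTE / multiplier
    return max(1, math.ceil(len(words) / effective_wpm))


easy = "The cat sat on the mat. " * 100
hard = ("Epistemological considerations notwithstanding unquestionably "
        "complicate comprehension. ") * 100

# Both texts have 600 words, but the dense one gets a longer estimate.
print(estimate_read_minutes(easy), estimate_read_minutes(hard))
```

Unlike a plain word count, this version separates two texts of equal length by how demanding their words are.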

A few things to observe: we first process the text to get the total word count, the average sentence length, and the average syllable count per word. Then we use our helper methods to determine a reading difficulty multiplier. Finally, we combine this multiplier with the word count and expected WPM to return an estimated read length. Neat!

Conclusion

People read at different speeds, and even an individual’s reading speed changes between texts. Maybe they are distracted while reading something, or they are just not interested in the text. Algorithms like this one are always approximations, but for software engineers modeling human behavior, approximations are the best we can do. I hope this has given you a better understanding of how we read. Drop a comment below if you have a clever way to optimize this algorithm.

Notes

https://yoast.com/flesch-reading-ease-score/

https://scholarwithin.com/average-reading-speed


Written by Daniel Pericich

Former Big Beer Engineer turned Full Stack Software Engineer
