How to Create Article Excerpts from Markdown using TypeScript

Daniel Pericich
8 min read · Dec 28, 2022


Photo by Anete Lūsiņa on Unsplash

Great websites and platforms always have some sort of hook to draw users in. Our world is oversaturated with information, which means you can’t rely on sharing a simple hyperlink to get eyes on your project. To gain attention you’ll need a descriptive excerpt coupled with a catchy title to extend the reach of the content you create.

From a data perspective this combination is not as straightforward as it would seem. While the title will not change, an excerpt might be drastically different for different mediums. A Google search result, for example, may allow fewer characters than your website’s content preview cards.

An excerpt is usually the first X characters of your article’s body, so pulling and storing different lengths of it in your front matter for different metadata categories can be time consuming to set up and maintain. Luckily, there are good ways to avoid storing or maintaining the same body / excerpt text in multiple places. In this article we will talk about how to use markdown front matter and body content to create easily accessible and maintainable content excerpts.

What do I mean by Excerpts?

Excerpts are preview sections of text that allow users to get a better understanding of content without interacting with the full content item. The two types of excerpts I would like to discuss are blog article excerpts and search engine result excerpts.

Blog article excerpts usually take the form of article cards. These excerpts are usually part of an article’s body text that is meant to draw in readers or shoppers by answering questions about the content, or stoking further interest:

Figure 1. An article card with excerpt, title and other meta data.

Our excerpts don’t need to be special, and often are just sections of text drawn from the full content item. Because these items are built by reusing content, it is important to reference the original content directly and control how much of the content we are retrieving.

The other excerpt type I would like to discuss is the search engine result description:

Figure 2. A search engine result card with excerpt and title.

This excerpt is incredibly important for your content for a few reasons. The first is that a good description will draw in new users. The difference between drawing users to a piece of content on your website and drawing users to a piece of content from a search engine is that in the first case, the user has already decided to spend time with your brand or website. It is a lot harder, and more important, to be on your game when trying to get inbound traffic from search engine users to your site.

By getting users to click through your link, you can also improve your ranking for the searched term, which compounds as more users are organically sent to your site. A good SEO excerpt creates a reinforcing positive pressure on your page rankings and site visitors. This can lead to higher rankings, more click-throughs and potentially further reach in adjacent search terms.

What are the Parts of the Markdown Files?

Before we get too deep into excerpts, I would like to do a quick refresher on markdown. If you are reading this article then you probably create your content by writing markdown files. Markdown is great as it makes writing HTML quick and efficient, especially for non-technical users. Each file has two parts, front matter and content:

Figure 3. Example of markdown file’s sections: front matter and content.

Front matter is the section of markdown files that contains metadata including titles, article cover images, authors, dates, etc. I won’t go too deep into this section, but if you’d like to learn more about front matter and how to interact with it then check out this article.

The second section of a markdown file is the content section. This is where your actual headers, text, figures and images all reside. It’s also where the information for our excerpts lives. This section will be the source of truth for the text we use to populate our excerpts for both internal article cards and external SEO result descriptions.

Creating our Excerpts

Now that we’ve talked about what excerpts are and why they’re important as well as the parts of a markdown file, let’s discuss building our excerpts. Our solution will be a basic two step approach. First we will create fields in our front matter and assign them appropriate character counts. Then we will reference these fields’ values in order to slice the correct number of characters of our content to return a smaller string. This collection of characters will be what we display to our users.

There are some issues that we will need to address as we build this. First, character counts as numbers make sense to developers, but may not make as much sense to a non-technical content creator. We need to have a way to allow excerpt size selections in a more accessible way.

Our next issue is with the characters involved in the markdown content. Remember, markdown works because of its specific syntax. In order to present clean, readable excerpt text, we will need to remove certain elements and tags from our content strings.

Let’s address these issues and then write our excerpt builder method.

Making Excerpt Sizes More Accessible

Markdown’s beauty comes in its ease of use. If you want a header, start your line with “#”. If you want a header with less importance, add a few more “#” characters. To create an ordered list, start each list item with a number and a period, such as “1.”. If you want to write some body text then just start typing.

Markdown is extremely accessible so we should make any extensions of it accessible as well. To do this we will allow the users to specify the size of their excerpts with English words. In the front matter they can type ‘excerpt_size: “medium”’ which will reference a constant set in our excerpt formatter file to make the excerpt size a certain length.

While these lengths aren’t written in stone, note that Google’s search engine description max length runs about 150 characters:

Figure 4. A table showing the accessible language excerpt size designator and the associated character count value.

The Markdown Syntax Issues

We’ve talked about making excerpt length settings accessible for writers, but how can we ensure the excerpt content is accessible to readers? The content from our markdown file is raw markdown and therefore has all of the funky syntax that makes it run. We need to get rid of this.

The first markdown issue we need to solve is the title text being included in the content. Having the title in our excerpt will not help grab users’ attention, so we need a way to remove it. Titles are usually found in <h1> elements, so we will remove those tags and everything between them.

Along with removing those tags, we will want to remove any secondary headings <h2> through <h6> as well as any images, which markdown writes as ![alt text](link to the image asset).

To accomplish these goals we will be leveraging a number of regex patterns that I will detail more in the next section.
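As a rough illustration of the cleanup described above, here is a minimal sketch. The function name stripMarkdownSyntax and the exact patterns are assumptions made for this example, not the article's original code. It assumes headings should be dropped along with their text, whether they appear as rendered HTML tags or as raw "#" lines.

```typescript
// Hypothetical helper: name and patterns are illustrative assumptions.
function stripMarkdownSyntax(content: string): string {
  return content
    // Remove HTML headings <h1> through <h6> together with the text inside them.
    // The lazy [\s\S]*? also matches across line breaks; \1 pairs the closing level.
    .replace(/<h([1-6])[^>]*>[\s\S]*?<\/h\1>/gi, "")
    // Remove raw markdown heading lines such as "# Title" or "## Section".
    .replace(/^#{1,6}[ \t]+.*$/gm, "")
    // Remove markdown images: ![alt text](link to the image asset)
    .replace(/!\[[^\]]*\]\([^)]*\)/g, "")
    // Collapse the leftover whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

Passing in a body that starts with a title, a cover image and a paragraph should leave only the paragraph text, ready to be shortened.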

Building an Excerpt Generator Function

We have talked about excerpts, markdown and potential pitfalls to dealing with markdown for content and front matter. It’s now time to actually build this method. Here is the code that we would use for creating excerpts:

Figure 5. formExcerpt method for creating excerpts from content and front matter ‘excerpt_size’ and ‘seo_description_size’ fields.

There’s a lot going on here so let’s walk through how we are accomplishing our goals mentioned above. The first thing you will see is our constant declaration section. At the top of the file we declare constants to set our small, medium and large character counts. These are our hardcoded values that allow us to have accessible attribute setting in our front matter.

After our constants, we have our function head. Our function, formExcerpt, will accept two parameters. The first is a string called content which will be our content from the markdown file. The second parameter is a string, excerptSize. This has a default value of small, but should be passed in wherever it is called in our site. The return value of our function is also a string, our formatted excerpt text.

Within the function body we do three main things. The first thing we do is use a switch statement to turn our accessible English excerptSize into a computer friendly characterCount value. We will hold onto this for later.
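The constants and the switch statement might look something like the sketch below. The constant names, the resolveCharacterCount helper and the three numeric values are assumptions for illustration only; the article's size table is not reproduced here, so substitute your own counts.

```typescript
// Assumed character counts; replace these with the values from your own size table.
const SMALL_EXCERPT_LENGTH = 100;
const MEDIUM_EXCERPT_LENGTH = 150;
const LARGE_EXCERPT_LENGTH = 250;

// Hypothetical helper: turns the front matter word into a character count.
function resolveCharacterCount(excerptSize: string = "small"): number {
  // Normalize so that " Medium " in front matter still resolves.
  switch (excerptSize.trim().toLowerCase()) {
    case "large":
      return LARGE_EXCERPT_LENGTH;
    case "medium":
      return MEDIUM_EXCERPT_LENGTH;
    case "small":
    default:
      // Unknown values fall back to the smallest size.
      return SMALL_EXCERPT_LENGTH;
  }
}
```

Falling back to the small size on unrecognized input is a design choice in this sketch: a typo in front matter produces a shorter excerpt rather than a build error.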

The next section focuses on cleaning up the string and getting the correct number of characters. As discussed before, we would like to remove certain tags as well as the content between certain tags. One pattern I found especially useful when approaching this problem was “/<X>.*<\/X>/”, which selects everything between two tags, including the tags themselves (the slash in the closing tag has to be escaped inside a regex literal). With this selection I can use a simple “replace” call to remove the substring from our string.
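One detail worth knowing about this pattern is that “.*” is greedy: if the same tag appears twice on one line, the match runs from the first opening tag to the last closing tag and takes everything in between with it. The sketch below contrasts the greedy form with a lazy “.*?” variant. Both function names are hypothetical, and <h2> simply stands in for the X in the pattern.

```typescript
// Greedy: `.*` extends to the LAST </h2> on the line.
function removeHeadingsGreedy(input: string): string {
  return input.replace(/<h2>.*<\/h2>/g, "");
}

// Lazy: `.*?` stops at the FIRST </h2>, so each heading is removed on its own.
function removeHeadingsLazy(input: string): string {
  return input.replace(/<h2>.*?<\/h2>/g, "");
}

const sample = "<h2>First</h2> kept text <h2>Second</h2> tail";
console.log(removeHeadingsGreedy(sample)); // "kept text" is removed along with both headings
console.log(removeHeadingsLazy(sample));   // "kept text" survives
```

Note also that “.” does not match line breaks by default, so a tag whose content spans several lines needs either the “s” flag or a character class such as [\s\S].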

After the string is cleaned up, we grab the requested number of characters and then split the string by spaces into an array of strings. We then remove the last item. Why do we remove the last item if we already have the X characters we asked for? Sometimes when we use slice on a collection of words we get partial words. It is better to drop a partial word than to make the excerpt harder for a user to read.

Finally, we have our string. To embrace the excerpt theme we will append an ellipsis “…” to this string to tell the reader that this excerpt is just the beginning of the content. Hopefully this excerpt has them hooked and clicking through to our content and more!
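The slicing, word trimming and ellipsis steps can be sketched as follows. The helper name truncateToWords is an assumption, and the sketch makes two choices the text above does not specify: text that already fits is returned unchanged with no ellipsis, and a slice containing only one word is kept rather than dropped.

```typescript
// Hypothetical helper: shortens cleaned text to whole words and appends an ellipsis.
function truncateToWords(text: string, characterCount: number): string {
  // Assumption: content that already fits needs no trimming or ellipsis.
  if (text.length <= characterCount) {
    return text;
  }

  const words = text.slice(0, characterCount).split(" ");

  // The last item may be a partial word, so drop it.
  // Keep it only when it is the sole item, to avoid returning a bare ellipsis.
  if (words.length > 1) {
    words.pop();
  }

  return `${words.join(" ")}…`;
}

console.log(truncateToWords("The quick brown fox jumps", 12));
```

Because the last item is always dropped, a slice that happens to end exactly on a word boundary loses that whole word too. That keeps the logic simple at the cost of a slightly shorter excerpt.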

Conclusion

Giving more context and a true preview of a post is great for user experience. There’s no worse feeling for users than clicking on an article because of a clickbait header, only to find completely different content. I hope that this article helped you better understand how to approach excerpts from markdown files and how to add this functionality to your project.

Notes

https://github.com/jonschlinkert/gray-matter/blob/master/examples/excerpt.js
