RegEx for SEOs: ready-to-implement use cases

Author: Laura Vaduva

Last updated: 10/06/2025

As an SEO Specialist, one of my biggest struggles is analysing large sets of data to identify top performing pages or keywords while being on the lookout for growth opportunities.

Excel is (and always has been) an SEO’s best friend, but when it comes to identifying patterns, the tasks can become overwhelming and time-consuming.

In this case, RegEx formulas are the best alternative to consider – and the fastest! Let’s see what RegEx is and how I leverage it to make my life as an SEO much easier.

What is RegEx?

Barry Schwartz defines RegEx (or regular expressions) as “a sequence of characters that define a search pattern. Usually, such patterns are used by string searching algorithms for “find” or “find and replace” operations on strings, or for input validation.” In this Search Engine Land article about RegEx, Barry also states that “RegEx can be super powerful and fast in filtering data and even for replacing data, but it can also be tricky to get right”.

Dan Taylor explained in a RegEx for SEO SEJ article that “Regular expressions, or ‘RegEx’, are like an in-line programming language for text searches that allow you to include complex search strings, partial matches and wildcards, case-insensitive searches, and other advanced instructions.”

Why is RegEx useful for SEO?

RegEx can be used to identify any series of characters, e.g. a phone number, a search query, a URL or even a product reference number. As an SEO, I find RegEx to be one of the most useful tools I can use to match, manage, or filter data.

For example, this RegEx string (what|where|when|how|who) matches any one of the words "what", "where", "when", "how", or "who".

A List of Basic RegEx Operators:

  • . Represents any single character
  • .* Represents 0 or more characters
  • .+ Represents 1 or more characters
  • ? Makes the preceding character optional
  • ^ Represents the beginning of a line
  • $ Represents the end of a line
  • \ Escapes a special character so that it is matched literally
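
To make the operators concrete, here is a minimal sketch in Python (its built-in re module is used purely for illustration; the sample strings are made up and are not from any real dataset). Each row pairs a pattern with a text and the result I would expect from a partial match:

```python
import re

# Each entry: (pattern, text, whether re.search should find a match).
# The sample strings are hypothetical and only illustrate one operator each.
examples = [
    (r"s.o", "seo", True),                    # . stands for exactly one character
    (r"seo.*tips", "seo tips", True),         # .* allows zero or more characters in between
    (r"seo.+tips", "seotips", False),         # .+ needs at least one character in between
    (r"colou?r", "color", True),              # ? makes the preceding "u" optional
    (r"^seo", "technical seo", False),        # ^ anchors the match to the start
    (r"seo$", "technical seo", True),         # $ anchors the match to the end
    (r"example\.com", "exampleXcom", False),  # \. matches a literal dot only
]

results = [bool(re.search(pattern, text)) for pattern, text, _ in examples]

for (pattern, text, expected), found in zip(examples, results):
    print(f"{pattern!r:18} on {text!r:16} -> {found}")
```

Python's RegEx flavour is not identical to the one used inside Google's tools, so treat this as a way to build intuition rather than a guarantee of identical behaviour everywhere.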


There are numerous comprehensive RegEx guides available, such as: Regular Expression Language - Quick Reference by Microsoft, Regular expressions by Mozilla, Using Regular Expressions by Oracle.

However, to be completely honest with you, there are only a handful of formulas I use in my day-to-day SEO role. I’ve included them in the RegEx Cheat Sheet below, and I hope it’ll make your life just a bit easier 😀

RegEx Cheat sheet:

RegEx formulas for Search Queries:

Formula for Branded Terms: .*domain name.*|.*domain.*|.*name.*|.*dm.*

  • What it does: Matches queries that contain your brand name or other variations of your brand name such as typos or abbreviations
  • When to use it: When you want to look at branded terms only (matches RegEx) or non-branded terms only (doesn’t match RegEx)
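
As a rough sketch of the branded / non-branded split, the same idea can be tried on a handful of queries in Python. The brand variants and the sample queries below are hypothetical, and the word boundaries around the abbreviation are my own assumption (added so that "cn" does not match inside longer words):

```python
import re

# Hypothetical brand variants: full name, joined name, abbreviation, common typo.
# \b around "cn" is an assumption to avoid hits inside words such as "picnic".
BRANDED = re.compile(r"company name|companyname|\bcn\b|compan name", re.IGNORECASE)


def split_queries(queries):
    """Return (branded, non_branded), mirroring 'matches' / 'doesn't match'."""
    branded = [q for q in queries if BRANDED.search(q)]
    non_branded = [q for q in queries if not BRANDED.search(q)]
    return branded, non_branded


queries = [
    "company name login",
    "CN discount code",
    "best running shoes",
    "compan name reviews",
]
branded, non_branded = split_queries(queries)
print("Branded:", branded)
print("Non-branded:", non_branded)
```

Keeping each variant as its own alternative (separated by |) is what lets a query match when it contains any one of them.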

Formula for Informational Terms: who|what|when|why|how|can|tips|guide|instructions|list|explained|for beginners|meaning|definition|types|uses|best|steps|tutorial|example|benefits

  • What it does: Matches any of the words most commonly used with an informational intent
  • When to use it: When you want to extract only the informational terms, meaning keywords that contain any of the words included in the RegEx formula

Formula for Questions: what|where|when|how|who

  • What it does: Matches any one of the words "what", "where", "when", "how", or "who"
  • When to use it: When you want to extract only the queries that are questions, meaning keywords that contain any of the words "what", "where", "when", "how", or "who"

Formula for Location Specific Terms: \b(near\s+me|in\s+madrid|nearby|in\s+salamanca)\b

  • What it does: Matches any search containing the words "Near me", "in Madrid", "nearby" or "in Salamanca"
  • When to use it: When you want to extract only the queries specific to a location. For example: if you are located in the Salamanca neighbourhood of Madrid and you are searching for something close to you, then you are likely to use some of these words: "Near me", "in Madrid", "nearby" or "in Salamanca"
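
Here is a small Python sketch of how the \b and \s+ parts of the location formula behave. The queries are invented for the example, and the case-insensitive flag is an assumption on my side (some tools are case-sensitive unless told otherwise):

```python
import re

# Same formula as above; re.IGNORECASE is an assumption so "Madrid" also matches.
LOCATION = re.compile(
    r"\b(near\s+me|in\s+madrid|nearby|in\s+salamanca)\b", re.IGNORECASE
)

queries = [
    "tapas bar near me",
    "Hotels in  Madrid",   # two spaces: \s+ accepts one or more whitespace characters
    "nearby pharmacies",
    "madrid weather",      # no "in" before "madrid", so it is not matched
    "cabin madrid",        # "in madrid" is inside "cabin madrid", but \b blocks it
]

local = [q for q in queries if LOCATION.search(q)]
print(local)
```

The last query shows why the word boundary is worth having: without \b, the "in" at the end of "cabin" would be enough to trigger a match.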

Formula for LSI (Latent Semantic Indexing) keywords: \b(Apple|iOS|iPhone|MacBook|AirPods|iPad)\b

  • What it does: Matches keywords that are semantically related to the main keyword you are analysing
  • When to use it: When you want to extract or filter by keywords that fall into the same semantic cluster. For example: if I am searching for Apple related keywords, I might also include terms like “iOS”, “iPhone”, “MacBook”, “AirPods”, “iPad”

RegEx formulas for URLs:

Formula for Category Pages: ^https://companyname\.com/[^/]+/?$

  • What it does: Matches all URLs that have the Category URL structure, meaning only 1 term after “/”
  • When to use it: When you want to extract or filter by the Category page URLs, assuming the Category URL structure is https://companyname.com/category. For example: if you run a sports website with multiple categories for each sport you cover, this formula will include a URL such as https://sportswebsite.com/football
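
The choice between .* and [^/]+ decides which URLs count as category pages, because .* can run across slashes while [^/]+ stops at the next one. A minimal Python sketch of that difference, using made-up URLs on the placeholder domain from the example:

```python
import re

# LOOSE uses .* (which may span several path segments);
# STRICT allows exactly one segment, with an optional trailing slash.
LOOSE = re.compile(r"https://companyname\.com/.*/")
STRICT = re.compile(r"^https://companyname\.com/[^/]+/?$")

urls = [
    "https://companyname.com/football",
    "https://companyname.com/football/",
    "https://companyname.com/football/premier-league/",
    "https://companyname.com/",
]

loose_hits = [u for u in urls if LOOSE.search(u)]
strict_hits = [u for u in urls if STRICT.search(u)]

print("With .*   :", loose_hits)
print("With [^/]+:", strict_hits)
```

In this sketch the .* version misses the category URL without a trailing slash and also picks up the deeper subcategory URL, while the [^/]+ version keeps only the single-segment URLs.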

Formula for URLs containing one specific word: \/word\b

  • What it does: Matches all URLs that contain a specific word
  • When to use it: When you want to extract all URLs that contain a specific word. For example: \/car\b – This will match any URL where “car” appears directly after the slash / (for example, companyname.com/car or companyname.com/something/car)

Formula for URLs that include this or that: (keyword1|keyword2)

  • What it does: Matches all URLs that contain either of the specific terms mentioned in the RegEx formula
  • When to use it: When you want to extract all URLs that contain one of the specific words or patterns you have included in the RegEx formula. For example: (ankle-boots|loafers) – This pattern will match any URL that includes either ankle-boots or loafers

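Formula for URLs that exclude this or that: ^(?!.*\/(keyword1|keyword2)).*$

  • What it does: Matches all URLs that do not contain either of the specific terms mentioned in the RegEx formula
  • When to use it: When you want to extract all URLs that do not contain one of the specific words or patterns you have included in the RegEx formula. For example: ^(?!.*\/(ankle-boots|loafers)).*$ – This pattern will match any URL that does not include /ankle-boots or /loafers. The ^ at the start and the .*$ at the end matter: without them, the check can succeed at a later position in the URL and the pattern ends up matching everything. Lookaheads such as (?! ... ) are not supported by RE2, the RegEx syntax Google Search Console uses, so there the same result comes from using (keyword1|keyword2) with the “Doesn’t match regex” option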
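
A short Python sketch of two ways to exclude URLs, assuming a lookahead anchored with ^ and .*$ so that the check covers the whole URL. The shop domain and paths are hypothetical. The second approach simply inverts an ordinary "contains" match, which is what a "doesn't match" option does:

```python
import re

# Approach 1: anchored negative lookahead (works in Python; not every tool supports it).
EXCLUDE = re.compile(r"^(?!.*/(ankle-boots|loafers)).*$")
# Approach 2: a plain "contains" pattern whose result is inverted.
INCLUDE = re.compile(r"/(ankle-boots|loafers)")

urls = [
    "https://shop.example/women/ankle-boots",
    "https://shop.example/men/loafers/brown",
    "https://shop.example/women/sandals",
]

kept_with_lookahead = [u for u in urls if EXCLUDE.search(u)]
kept_with_inversion = [u for u in urls if not INCLUDE.search(u)]

print(kept_with_lookahead)
print(kept_with_inversion)
```

Both lists come out the same here, so where a tool offers a "doesn't match" filter, the simpler pattern is usually easier to read and maintain.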

Testing your RegEx Formulas

The risk of error can be high with RegEx formulas. So, before using any formula, I always test it first. Here is a list of free RegEx tools you can use to test your formulas:

I use Tracking Plan’s RegEx tester as I find it quicker and more intuitive than some of the other tools out there.

Here is how it works:

Two validation examples using Tracking Plan’s Online RegEx tester. Green indicates a valid pattern, red indicates an invalid pattern.

Source: Online Regex Tester: Validate and Test Your Regular Expressions Easily

RegEx Tips

Using RegEx can be tricky, especially when you’re just starting out. If you’re using RegEx in GA4, here are some tips from Google which I’ve found useful:

1. Keep your RegEx simple

This makes it easier for me (and anyone I collaborate with) to interpret, modify and repurpose.

2. Use the backslash (\) to escape RegEx metacharacters

I find this an extremely useful practice when I need those characters to be interpreted literally.

3. Use metacharacters to limit the match

The default RegEx behaviour Google uses is called “partial match”. This means that your RegEx pattern can be found anywhere within the data.

“Specific match” is different from “partial match”. For a “specific match”, I construct the RegEx accordingly. For example, if I only want to identify instances of “where”, the RegEx I use is ^where$. This way, my RegEx won’t include other words that contain “where”, such as “anywhere” or “wherever”.
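
The difference between the two kinds of match can be sketched in Python with a few invented queries. The third variant, using \b word boundaries, is my own addition for comparison: it finds “where” as a whole word inside a longer query.

```python
import re

queries = ["where", "anywhere", "wherever", "where to buy boots"]

# Partial match: the pattern may sit anywhere inside the query.
partial = [q for q in queries if re.search(r"where", q)]
# Specific match: ^ and $ force the whole query to be exactly "where".
specific = [q for q in queries if re.search(r"^where$", q)]
# Whole-word match (assumption, for comparison): "where" as a standalone word.
whole_word = [q for q in queries if re.search(r"\bwhere\b", q)]

print("partial:   ", partial)
print("specific:  ", specific)
print("whole word:", whole_word)
```

Which of the three is right depends on the question being asked of the data, which is another reason to test a formula on sample queries first.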

Find more tips and suggestions in Google’s GA4 documentation: [GA4] About Regular Expressions (RegEx)

RegEx Use Cases in SEO

Now let’s have a look at how to use RegEx formulas in our day-to-day work.

For this, I’ve included some use cases for creating segments in GA4, for data filtering in Google Search Console, and report building in Looker Studio. Let’s go!

Creating Segments in Google Analytics 4 (GA4)

I find RegEx useful in GA4, especially for creating segments & audiences.

At times, I prefer to have an overview of all organic traffic, regardless of the search engine the visitor used. For this reason, I create a segment that includes both Google & Bing.

Here is how I do it:

Step 1: Go to GA4 – Explore – Blank

A screenshot from GA4. In the Explore tab, “Blank - Create a new exploration” is selected.

Step 2: Segments – Create a new Segment

A screenshot from GA4. On a new ”Untitled exploration”, the variable “Segments” is selected.

A screenshot from GA4. On a new ”Untitled exploration”, under variable “Segments”, “Create a new segment” is selected.

Step 3: Build new segment – Session segment

A screenshot from GA4. On a new ”Untitled exploration”, a new session segment is created. 

There are three different types of segments in the GA4 Segment Builder: User segment, Session segment and Event segment.

For this example, I’ll be focusing on the Session segment, which captures all the sessions that originated from organic search.

Step 4: Name the segment and add the RegEx condition

A screenshot from GA4. A new segment for organic traffic in Google and Bing is created using RegEx

The first thing I do is name the Untitled Segment. The name I’ve chosen is Google and Bing Organic Search.

Then, I update each field according to my needs. Here, I want to include sessions when the Source/Medium matches my condition. My condition is matches regex, and the RegEx formula is google / organic|bing / organic.

As a result, I will see all organic traffic (sessions) coming from either Google or Bing.
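
To check the Source/Medium formula before building the segment, it can be run against a few sample values. This is only a sketch in Python: the session counts are invented, and re.search (which looks for the pattern anywhere in the value) is my assumption about the matching mode.

```python
import re

ORGANIC = re.compile(r"google / organic|bing / organic")

# Hypothetical (source / medium, sessions) rows, made up for this example.
sessions = [
    ("google / organic", 120),
    ("bing / organic", 30),
    ("google / cpc", 45),
    ("(direct) / (none)", 80),
    ("duckduckgo / organic", 10),
]

matched = {source_medium: count for source_medium, count in sessions
           if ORGANIC.search(source_medium)}
total = sum(matched.values())

print(matched)
print("Sessions in segment:", total)
```

Adding another search engine later is just a matter of appending one more alternative to the formula, separated by |.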

Filtering Data in Google Search Console

I find RegEx extremely useful when I need to filter keywords in Google Search Console. A common use case for me is filtering branded & non-branded queries.

With RegEx, I include multiple variations of branded search terms e.g.: brand name, partial brand name, abbreviations, and typos.

For example:

  • Brand is Company Name
  • RegEx Branded Terms formula is: .*company name.*|.*company.*|.*name.*|.*cn.*|.*compan name.*

Here is what I do to filter branded terms in Google Search Console:

Step 1: Go to Google Search Console – Performance – Add Filter – Query

A screenshot from Google Search Console. In the Performance tab, under “Add Filter”, “Query” is selected.

Step 2: Click on Query – Custom (regex)

A screenshot from Google Search Console. In the Performance tab, under the Query Filter, “Custom (regex)” is selected.

Step 3: I use Matches regex (for branded terms) & Doesn’t match regex (for non-branded terms)

A screenshot from Google Search Console. In the Performance tab, under the Custom (regex) Query Filter, “Matches regex” is selected.

You can also apply the same logic when filtering URL patterns in Google Search Console. If you’d like to do this, choose Page:

Then, click on Custom (regex) and select:

  • Matches regex – to include only URLs that match the pattern
  • Doesn’t match regex – to exclude URLs that match the pattern

A screenshot from Google Search Console. In the Performance tab, under the Custom (regex) Page Filter, “Matches regex” is selected.

Don’t forget to test your formula first!

Building Custom Reports in Looker Studio

Looker Studio (formerly known as Google Data Studio) is a tool which you can use to convert data from various sources (such as Google Search Console or Google Analytics) into interactive reports and dashboards. For me, it is especially useful for data visualisation, trends over time, period-versus-period comparisons and scorecards for my main target KPIs.

Regular expressions in Looker Studio are commonly used for creating custom filters and reports. There are 4 main formulas available: REGEXP_CONTAINS, REGEXP_EXTRACT, REGEXP_MATCH, REGEXP_REPLACE.

The main regular expressions formulas available in Looker Studio. A screenshot from the Google Cloud Documentation.

Source: Regular expressions in Looker Studio

In practice, the formulas I use most frequently are REGEXP_MATCH and REGEXP_CONTAINS.
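
For readers who think in code, here are rough Python analogues of the four functions. This is only a sketch of how I understand their behaviour (REGEXP_CONTAINS looks anywhere in the value, REGEXP_MATCH needs the whole value to match), not Looker Studio's actual implementation, and the URL is a made-up example:

```python
import re

url = "https://companyname.com/football/premier-league/"

# ~ REGEXP_CONTAINS: true if the pattern appears anywhere in the value.
contains = bool(re.search(r"football", url))

# ~ REGEXP_MATCH: true only if the entire value matches the pattern.
matches = bool(re.fullmatch(r"football", url))

# ~ REGEXP_EXTRACT: return the part captured by the parentheses.
found = re.search(r"companyname\.com/([^/]+)/", url)
extracted = found.group(1) if found else None

# ~ REGEXP_REPLACE: replace whatever the pattern matches with new text.
replaced = re.sub(r"^https://", "", url)

print(contains, matches, extracted, replaced)
```

The contains / match distinction is the one that most often causes surprises: a pattern that works with one function can return nothing with the other.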

Creating custom Looker Studio Reports with Google Search Console Data

I like to create an overview of SEO performance split out by the following page levels: ‘Category’, ‘Subcategory’, and ‘Article’.

Let’s say our website follows this URL Structure: https://companyname.com/category/subcategory/article

Here is the Looker Studio Custom Filter Formula using RegEx:

CASE
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/?') THEN 'Home Page'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/?') THEN 'Category'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/[^/]+/?') THEN 'Subcategory'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/[^/]+/[^/]+/?') THEN 'Article'
ELSE 'Other'
END
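
Before pasting a CASE formula like this into Looker Studio, the same page-level logic can be dry-run on a few URLs. The Python sketch below is one way to do that, assuming one [^/]+ per path segment and a whole-value match; the URLs are invented and the function name page_level is hypothetical:

```python
import re

BASE = r"https://companyname\.com"

# One [^/]+ per path segment; /? tolerates an optional trailing slash.
LEVELS = [
    ("Home Page", re.compile(BASE + r"/?")),
    ("Category", re.compile(BASE + r"/[^/]+/?")),
    ("Subcategory", re.compile(BASE + r"/[^/]+/[^/]+/?")),
    ("Article", re.compile(BASE + r"/[^/]+/[^/]+/[^/]+/?")),
]


def page_level(url):
    """Return the first level whose pattern matches the whole URL."""
    for label, pattern in LEVELS:
        if pattern.fullmatch(url):
            return label
    return "Other"


for sample in [
    "https://companyname.com/",
    "https://companyname.com/football",
    "https://companyname.com/football/premier-league/",
    "https://companyname.com/football/premier-league/match-report",
    "https://othersite.com/football",
]:
    print(f"{page_level(sample):12} {sample}")
```

Because each pattern describes exactly one depth, the order of the conditions no longer affects the result, which makes the formula easier to extend.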

To create a Custom Filter in Looker Studio:

Step 1: Add Data Source

In this case, I add Google Search Console.

A screenshot from Looker Studio. In the Add Data Source section, “Google Search Console” is selected.

A screenshot from Looker Studio. In the Add Data Source section, Google Search Console, the source domain, Tables (URL Impressions) and Search type (web) are selected.

Step 2: Add Field

A screenshot from Looker Studio. In the Data section for Google Search Console as a source, “Add a field” is selected.

Step 3: Add calculated field (this is the custom filter)

A screenshot from Looker Studio. In the Data section (source: Google Search Console), under “Add a field”, “Add calculated field” is selected.

Step 4: Name the custom field

Here I add the RegEx Formula previously created:

CASE
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/?') THEN 'Home Page'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/?') THEN 'Category'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/[^/]+/?') THEN 'Subcategory'
WHEN REGEXP_MATCH(Landing Page, 'https://companyname.com/[^/]+/[^/]+/[^/]+/?') THEN 'Article'
ELSE 'Other'
END

A screenshot from Looker Studio. In the Data section (source: Google Search Console) under “Add calculated field”, a new “Category” field is created.

Step 5: Save & finish!

Creating Custom Looker Studio Reports with Google Analytics 4 Data

In GA4, it is possible to identify page titles that contain a specific keyword (string pattern) using RegEx. I find this useful to categorize pages based on the keyword(s) used in the page title. Let’s say I want to identify all page titles that start with the string “SEO”. Here is the Looker Studio Formula I’d use:

REGEXP_CONTAINS(Page title, "(?i)^seo")
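
A quick way to sanity-check a "starts with" formula is to run it on a few titles. In this Python sketch the titles are invented, the case-insensitive flag is an assumption (so that both “SEO” and “seo” count), and the stricter variant with \b is my own addition:

```python
import re

titles = [
    "SEO Checklist for 2025",
    "seo basics",
    "Technical SEO Guide",
    "Seoul travel tips",
]

# ^ anchors the pattern to the start of the title; IGNORECASE covers SEO / seo.
starts_with_seo = [t for t in titles if re.search(r"^seo", t, re.IGNORECASE)]
# Stricter variant (assumption): \b requires "seo" to be a complete word.
seo_as_word = [t for t in titles if re.search(r"^seo\b", t, re.IGNORECASE)]

print(starts_with_seo)
print(seo_as_word)
```

The “Seoul” title is the kind of false positive that only shows up when a formula is tested against real page titles.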

To create a Custom Filter in Looker Studio for GA4 Data follow the steps below:

Step 1: Add Data Source

In this case, I add Google Analytics 4.

A screenshot from Looker Studio. In the Add Data Source section, “Google Analytics” is selected.

Step 2: Add Field

A screenshot from Looker Studio. In the Data section for Google Analytics as a source, “Add a field” is selected.

Step 3: Add calculated field (this is the custom filter)

Step 4: Name the custom field, for example SEO Title

A screenshot from Looker Studio. In the Data section (source: Google Analytics) under “Add calculated field”, a new “Page Title” field is created.

Step 5: Save & finish

Creating RegEx Formulas via ChatGPT Prompts:

Not sure how to create a formula? Don’t worry! ChatGPT can help. As before, don’t forget to test your formula first!

Here are some examples of ChatGPT prompts you can use:

Google Search Console query filtering formula:

Prompt: Create a RegEx formula for Google Search Console that can find all search queries which include the words: download, subscribe, enroll, buy, order

Result Formula: \b(download|subscribe|enroll|buy|order)\b

A screenshot from ChatGPT. Example of a prompt and results for a request to generate a RegEx formula for specific queries.

Google Search Console query length formula:

Prompt: Create a RegEx for Google Search Console to find all search queries between 1 and 4 words.

Result: ^(\w+(\s+\w+){0,3})$

A screenshot from ChatGPT. Example of a prompt and results for a request to generate a RegEx formula for a specific query length.

Google Search Console URL Formula:

Prompt: Create a regex formula for Google Search Console to find all Subcategory URLs. Here is the URL structure of my domain: https://companyname.com/category/subcategory/article

Result Formula: ^https:\/\/companyname\.com\/[^\/]+\/[^\/]+\/[^\/]+$

A screenshot from ChatGPT. Example of a prompt and results for a request to generate a RegEx formula for a specific page type (URL format).
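
Testing AI-generated formulas can be as simple as listing a few inputs alongside the result you expect. The Python sketch below does that for the three result formulas above; the sample queries, URLs, and expected values are my own assumptions:

```python
import re

INTENT = re.compile(r"\b(download|subscribe|enroll|buy|order)\b")
LENGTH = re.compile(r"^(\w+(\s+\w+){0,3})$")
URL_DEPTH = re.compile(r"^https:\/\/companyname\.com\/[^\/]+\/[^\/]+\/[^\/]+$")

# (pattern, sample text, expected result) - samples are hypothetical.
checks = [
    (INTENT, "buy running shoes online", True),
    (INTENT, "buyer personas", False),           # "buy" is not a whole word here
    (LENGTH, "best seo tools 2025", True),       # 4 words
    (LENGTH, "how to learn regex fast", False),  # 5 words
    (URL_DEPTH, "https://companyname.com/category/subcategory", False),
    (URL_DEPTH, "https://companyname.com/category/subcategory/article", True),
]

failures = [
    (pattern.pattern, text)
    for pattern, text, expected in checks
    if bool(pattern.search(text)) != expected
]
print("All checks passed" if not failures else failures)
```

The last two checks are worth a second look: the URL formula expects exactly three path segments, so in the example structure it matches URLs at the article depth rather than two-segment subcategory URLs. A check like this is how I would confirm that a generated formula matches the page type I actually asked for.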

Final Thoughts

From analysing large data sets and extracting patterns, to creating custom reports, RegEx can be crucial for us SEOs. Whether we’re filtering keywords in Google Search Console, creating new segments in GA4, or building cleaner dashboards in Looker Studio, the use of regular expressions can provide great insights.

It can still be quite overwhelming at first, but with patience, curiosity, and a bit of help from generative AI tools, you’ll master it in no time! 😊


Laura Vaduva - Senior SEO Executive

Laura is an in-house SEO Specialist and part-time freelancer. With 7+ years of experience in Digital Marketing, Laura is most passionate about Organic Search Growth, with expertise in both Search Engine Optimization (SEO) & App Store Optimization (ASO).
