Regular Expressions
Regular expressions, often abbreviated as regex, are a powerful tool used to search and manipulate text (string) data. They provide a concise way to define patterns within strings, allowing for efficient matching and extraction of specific information. While the underlying syntax might appear complex, understanding the core concepts empowers even non-technical users to leverage this valuable technique.
Regex basics
Regular expressions are too complex and varied to cover in full here, so this documentation focuses on the most fundamental regex tokens and their core functionality. This gives you a solid foundation for basic pattern matching and a starting point for exploring the rest of regex's capabilities.
If you encounter specific use cases beyond the covered tokens, we recommend referring to advanced online regex resources, e.g. https://regex101.com/. Since our regex implementation is Java-based, make sure to use the Java-based regex flavor to ensure desired results.
Regex searches are case-sensitive by default. Also, your search might match parts of other words. Bear this in mind and write your search exactly how you want it to match, covering all the needed scenarios. If you don't care about uppercase or lowercase, you can use the `(?i)` flag to make the search case-insensitive.
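As a rough illustration of the note above, the following Java sketch shows the default case-sensitive behavior, the `(?i)` flag, and a match that lands inside a longer word. The class name `CaseSensitivityDemo`, the helper `found`, and the sample strings are illustrative assumptions, and the sketch assumes the search behaves like `Matcher.find()`, that is, it looks for a match anywhere in the text.

```java
import java.util.regex.Pattern;

class CaseSensitivityDemo {
    // True if the regex matches anywhere inside the text (a partial match).
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Case-sensitive by default: lowercase "cat" does not occur in "Cat food".
        System.out.println(found("cat", "Cat food"));
        // The (?i) flag makes the same search ignore case.
        System.out.println(found("(?i)cat", "Cat food"));
        // A plain search also matches inside longer words.
        System.out.println(found("cat", "concatenate"));
    }
}
```

If matches inside longer words are not wanted, the pattern itself has to exclude them, for example by anchoring it with `^` and `$`.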
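Regex token | Purpose | Description | Example | Explanation |
---|---|---|---|---|
Literal text | Find exact match | Matches the exact form of the word. | | |
`?` | 0 or 1 | A question mark (`?`) makes the preceding character or group optional: it matches zero or one occurrence. | | |
`*` | 0 or more | An asterisk (`*`) matches zero or more occurrences of the preceding character or group. | | |
`+` | 1 or more | A plus sign (`+`) matches one or more occurrences of the preceding character or group. | | |
`.` | Wildcard dot | A dot (`.`) matches any single character (by default, any character except a line terminator). | | |
`\|` | Alternative | The pipe (`\|`) matches either the expression before it or the expression after it. | | |
`()` | Grouping | Parentheses (`()`) group part of a pattern so that a quantifier or an alternative applies to the whole group. | | |
`[]` | Character classes | Square brackets (`[]`) match any single character from the set listed inside them. Can also be used to exclude a group of characters when the set starts with a caret (`[^…]`). | `[0-9]`, `[a-z]`, `[A-Z]`, `[a-zA-Z]`, `[aeiou]`, `[^aeiou]` | |
`{}` | Repeating characters | Curly braces (`{}`) specify how many times the preceding character or group must repeat, for example `{2}` for exactly two times or `{2,4}` for two to four times. | | |
`\s` | Whitespace-related tokens | `\s` matches any whitespace character (space, tab, etc.). | | |
`\d` | Digit-related tokens | `\d` matches any single digit (0-9). | | |
`\w` | Word-related tokens | `\w` matches any word character (letters, numbers, and underscores). | | |
`^`, `$` | Start and end of strings | Caret (`^`) matches the beginning of the string; dollar sign (`$`) matches the end of the string. | | |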
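The basic tokens can be tried out with a short Java sketch such as the one below. The patterns and sample words are illustrative assumptions chosen for this sketch, not examples taken from the table, and the helper `whole` (a hypothetical name) uses `Pattern.matches`, which requires the whole text to match the pattern.

```java
import java.util.regex.Pattern;

class TokenDemo {
    // True only if the entire text matches the regex.
    static boolean whole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // ? : the "u" may appear zero or one time.
        System.out.println(whole("colou?r", "color") && whole("colou?r", "colour"));
        // * allows zero "o" characters, + requires at least one.
        System.out.println(whole("go*gle", "ggle") && !whole("go+gle", "ggle"));
        // . stands for exactly one character.
        System.out.println(whole("gr.y", "grey") && !whole("gr.y", "gry"));
        // | chooses between alternatives; parentheses limit its scope.
        System.out.println(whole("gr(a|e)y", "gray") && !whole("gra|ey", "gray"));
        // [^aeiou] matches one character that is not a lowercase vowel.
        System.out.println(whole("[^aeiou]", "b") && !whole("[^aeiou]", "a"));
        // {3} repeats the preceding token exactly three times.
        System.out.println(whole("\\d{3}", "123") && !whole("\\d{3}", "12"));
        // \w+ is a run of word characters, \s a single whitespace character.
        System.out.println(whole("\\w+\\s\\w+", "hello world"));
    }
}
```

Inside a Java string literal every backslash is doubled, so the regex `\d{3}` is written as `"\\d{3}"`.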
Regular expression usage examples
Create a basic formula that would find dates in the format `DD/MM/YYYY` (format matching `2 digits/2 digits/4 digits`).
The regex formula would be: `\d{2}/\d{2}/\d{4}`.

Explanation:
- `\d{2}`: This matches exactly two digits (`\d` represents any single digit, and `{2}` specifies it must occur twice consecutively).
- `/`: This matches a literal forward slash character ("/").
- `\d{2}`: Similar to the first part, this matches exactly two digits again.
- `/`: Another literal forward slash match.
- `\d{4}`: This matches exactly four digits. This captures the year (e.g., `2024`).
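A minimal Java sketch of this date search might look as follows. The class name `DateFinder`, the method `findDates`, and the sample sentence are assumptions made for illustration. Note that backslashes are doubled inside Java string literals, and that the pattern only checks the shape of the text, so an impossible date such as `99/99/9999` is found as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFinder {
    // Two digits, slash, two digits, slash, four digits.
    private static final Pattern DATE = Pattern.compile("\\d{2}/\\d{2}/\\d{4}");

    // Collects every substring of the text that has the DD/MM/YYYY shape.
    static List<String> findDates(String text) {
        List<String> dates = new ArrayList<>();
        Matcher matcher = DATE.matcher(text);
        while (matcher.find()) {
            dates.add(matcher.group());
        }
        return dates;
    }

    public static void main(String[] args) {
        System.out.println(findDates("Due 05/11/2024, paid 17/12/2024."));
        // Shape only: the values are not checked against a calendar.
        System.out.println(findDates("99/99/9999"));
    }
}
```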
Postal codes need to be in the format `#####` (basic 5-digit zip code) or `#####-####` (ZIP+4 code).
The regex formula would be: `^\d{5}(-\d{4})?$`.

Explanation:
- `^`: Matches the beginning of the string.
- `\d{5}`: Matches exactly five digits.
- `(-\d{4})?`: Matches an optional group that consists of a literal hyphen (`-`) followed by exactly four digits (0-9). The hyphen can therefore appear only directly after the first five digits. The question mark `?` makes the entire group optional, allowing the hyphen and four digits to be absent.
- `$`: Matches the end of the string.
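The postal code check can be sketched in Java as shown below. `ZipValidator` and `isValid` are hypothetical names used only for this illustration. The sketch uses `Matcher.matches()`, which requires the whole input to fit the pattern.

```java
import java.util.regex.Pattern;

class ZipValidator {
    // Five digits, optionally followed by a hyphen and four more digits.
    private static final Pattern ZIP = Pattern.compile("^\\d{5}(-\\d{4})?$");

    // True if the whole input is a 5-digit ZIP or a ZIP+4 code.
    static boolean isValid(String code) {
        return ZIP.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("12345"));       // basic 5-digit form
        System.out.println(isValid("12345-6789"));  // ZIP+4 form
        System.out.println(isValid("12345-678"));   // rejected: only three digits after the hyphen
    }
}
```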
Create a basic validation that would match the following format: `123 Main Street, Anytown, USA 12345`, i.e., it:
- Starts with a house number (one or more digits).
- Includes a street name containing letters and spaces.
- Has a city name containing letters (and optionally also spaces).
- Specifies the country as "USA" (case-sensitive).
- Ends with a five-digit zip code.
The regex formula would be: `^(\d+)\s+([A-Za-z\s]+),\s+([A-Za-z\s]+),\s+USA\s+(\d{5})$`

Explanation:
- `^`: Matches the beginning of the string.
- `(\d+)`: Captures one or more digits for the house number.
- `\s+`: Matches one or more whitespace characters (space, tab, etc.).
- `([A-Za-z\s]+)`: Captures the street name, allowing for one or more words separated by spaces.
- `,`: Matches a comma.
- `\s+`: Matches one or more whitespace characters again.
- `([A-Za-z\s]+)`: Captures the city name, allowing for one or more words separated by spaces.
- `,`: Matches another comma.
- `\s+`: Matches one or more whitespace characters again.
- `USA`: Matches the literal string "USA" (case-sensitive).
- `\s+`: Matches one or more whitespace characters again.
- `(\d{5})`: Captures exactly five digits for the ZIP code.
- `$`: Matches the end of the string.
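Because the pattern uses capturing groups, the individual parts of the address can be read back after a successful match. The Java sketch below illustrates this. The class `AddressParser`, the method `parse`, and the choice to return `null` for a non-matching input are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AddressParser {
    private static final Pattern ADDRESS = Pattern.compile(
            "^(\\d+)\\s+([A-Za-z\\s]+),\\s+([A-Za-z\\s]+),\\s+USA\\s+(\\d{5})$");

    // Returns {house number, street, city, ZIP code}, or null if the input does not match.
    static String[] parse(String address) {
        Matcher matcher = ADDRESS.matcher(address);
        if (!matcher.matches()) {
            return null;
        }
        // Groups are numbered by their opening parenthesis, starting at 1.
        return new String[] {
            matcher.group(1), matcher.group(2), matcher.group(3), matcher.group(4)
        };
    }

    public static void main(String[] args) {
        String[] parts = parse("123 Main Street, Anytown, USA 12345");
        System.out.println(String.join(" | ", parts));
        // Lowercase "usa" does not match because the search is case-sensitive.
        System.out.println(parse("123 Main Street, Anytown, usa 12345") == null);
    }
}
```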
Identify email addresses that adhere to the following rules:
- The local part can contain letters, numbers, underscores, hyphens, and dots.
- The domain name can have one or more subdomains.
- Subdomains can contain letters, numbers, and hyphens.
- The top-level domain (TLD) must be 2 to 4 characters long and can contain letters only.
The regex formula would be: `^[\w-\.]+@([a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*)\.[a-zA-Z]{2,4}$`

Explanation:
- `^`: Matches the beginning of the string (entire email address).
- `[\w-\.]+`: Matches one or more occurrences of the following characters in the local part:
  - `\w`: Word characters (letters, numbers, and underscores).
  - `-`: Hyphen.
  - `\.`: Dot (period).
- `@`: Matches the "@" symbol, separating the local part from the domain name.
- `([a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)*)`: Matches one or more repetitions of the subdomain pattern:
  - `[a-zA-Z0-9\-]+`: Matches one or more letters (a-z, A-Z), numbers (0-9), and hyphens (-) for a subdomain (excluding underscores).
  - `(?:\.[a-zA-Z0-9\-]+)*`: Matches zero or more repetitions of a literal dot (.) followed by another subdomain following the same pattern.
- `\.`: Matches a literal dot (.) separating the subdomains from the TLD.
- `[a-zA-Z]{2,4}`: Matches the top-level domain (TLD) containing:
  - `[a-zA-Z]`: Letters (a-z, A-Z).
  - `{2,4}`: Quantifier specifying a length of 2 to 4 characters.
- `$`: Matches the end of the string (entire email address).
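The email rules can be exercised with a small Java sketch like the one below. `EmailMatcher` and `isEmail` are hypothetical names, and the sample addresses are made up for illustration. The pattern implements only the rules listed in this exercise, so real-world addresses with longer TLDs are rejected by design.

```java
import java.util.regex.Pattern;

class EmailMatcher {
    private static final Pattern EMAIL = Pattern.compile(
            "^[\\w-\\.]+@([a-zA-Z0-9\\-]+(?:\\.[a-zA-Z0-9\\-]+)*)\\.[a-zA-Z]{2,4}$");

    // True if the whole input follows the rules of this exercise.
    static boolean isEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("john.doe@example.com"));
        // Several subdomains are allowed before the TLD.
        System.out.println(isEmail("user_name-1@mail.example.co.uk"));
        // Rejected: the TLD is longer than four letters.
        System.out.println(isEmail("user@example.museum"));
        // Rejected: underscores are not allowed in the domain.
        System.out.println(isEmail("user@exa_mple.com"));
    }
}
```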
Regex for advanced users
Since the implementation of regular expressions comes from the Java standard library, the syntax of expressions is the same as in Java: see http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html.
For a more detailed explanation of how to use regular expressions, see the Java documentation for `java.util.regex.Pattern`.
The meaning of regular expressions can be modified using embedded flag expressions. The expressions include the following:
- `(?i)` – `Pattern.CASE_INSENSITIVE`: Enables case-insensitive matching.
- `(?s)` – `Pattern.DOTALL`: In dotall mode, the dot `.` matches any character, including line terminators.
- `(?m)` – `Pattern.MULTILINE`: In multiline mode, you can use `^` and `$` to express the beginning and end of a line, respectively (this includes the beginning and end of the entire input).
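The effect of the three embedded flags can be observed with a short Java sketch. The class `FlagDemo`, its helper methods, and the two-line sample text are assumptions made for this illustration. The last call shows that the same behavior can be requested through the `Pattern` constants instead of embedding the flags in the expression.

```java
import java.util.regex.Pattern;

class FlagDemo {
    // True if the regex matches anywhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    // Same search, with flags passed as Pattern constants instead of embedded in the regex.
    static boolean foundWithFlags(String regex, int flags, String text) {
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "first line\nSecond line";

        // Without (?s) the dot does not match the line break between the two lines.
        System.out.println(found("line.Second", text));
        System.out.println(found("(?s)line.Second", text));

        // Without (?m) the caret matches only at the very start of the input.
        System.out.println(found("^Second", text));
        System.out.println(found("(?m)^Second", text));

        // Flags can be combined, either embedded or as constants.
        System.out.println(found("(?im)^second", text));
        System.out.println(foundWithFlags("^second",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE, text));
    }
}
```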
Further reading and description of other flags can be found at http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html.