What is a Regular Expression?
A regular expression (regex) is a sequence of characters that defines a search pattern. Think of it as a powerful "find" tool: instead of searching for one specific word, you search for any text that fits a pattern of characters.
For example, instead of searching for exactly "hello123", you can write a regex that matches any combination of letters followed by any combination of digits. This makes regex incredibly useful for validating input, extracting data, and transforming text.
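As a minimal sketch of the "letters followed by digits" idea, here is how it might look in Python using the standard re module. The pattern and sample strings are illustrative assumptions, and the bracket and \d syntax it uses is explained later in this guide.

```python
import re

# One or more ASCII letters, then one or more digits.
pattern = re.compile(r"[A-Za-z]+\d+")

samples = ["hello123", "abc7", "12345", "hello"]

# fullmatch() succeeds only if the whole string fits the pattern.
results = {s: pattern.fullmatch(s) is not None for s in samples}

for sample, matched in results.items():
    print(f"{sample!r}: {matched}")
# 'hello123': True
# 'abc7': True
# '12345': False
# 'hello': False
```

The same pattern accepts "hello123" and "abc7" without either string being spelled out in advance, which is the difference from a plain text search.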
Regex is supported in virtually every programming language: JavaScript, Python, Java, Go, Ruby, PHP, and more. The core syntax covered here carries over between them, although each language's regex engine differs in some details and advanced features.
Your First Regex Pattern
The simplest regex is a literal string. The pattern hello matches the exact text "hello" anywhere in a string. But the real power comes from special characters called metacharacters.
Here are the most important metacharacters to start with:
. - Matches any single character (except newline)
* - Matches the preceding element zero or more times
+ - Matches the preceding element one or more times
? - Matches the preceding element zero or one time (optional)
^ - Matches the start of a string
$ - Matches the end of a string
\ - Escapes a metacharacter to treat it literally
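The metacharacters above can be tried out one at a time. The following is a small sketch in Python's re module; the test strings are made up for illustration, and each entry records whether the pattern matched.

```python
import re

checks = {
    # A literal pattern matches anywhere in the string.
    "literal": re.search(r"hello", "say hello world") is not None,
    # . stands in for any single character ("a" here).
    "dot": re.search(r"h.t", "hat") is not None,
    # b* allows zero b's, so "a" alone is enough.
    "star": re.fullmatch(r"ab*", "a") is not None,
    # b+ needs at least one b, so "a" alone fails.
    "plus": re.fullmatch(r"ab+", "a") is not None,
    # u? makes the "u" optional.
    "optional": re.fullmatch(r"colou?r", "color") is not None,
    # ^ and $ pin the match to the whole string, so "concat" fails.
    "anchors": re.search(r"^cat$", "concat") is not None,
    # \. matches only a real dot, so "3x14" fails.
    "escape": re.search(r"3\.14", "3x14") is not None,
}

for name, matched in checks.items():
    print(f"{name}: {matched}")
```

Only "plus", "anchors", and "escape" come out False, each for the reason given in its comment.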
Character Classes
Character classes let you match one character from a set of possibilities using square brackets [].
[aeiou] - Matches any lowercase vowel
[a-z] - Matches any lowercase letter
[A-Z0-9] - Matches any uppercase letter or digit
[^abc] - Matches any character except a, b, or c
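To see what each bracket expression picks out, here is a short Python sketch using re.findall, which returns every match in a string. The sample text is an arbitrary assumption.

```python
import re

text = "Room B12, table 7"

vowels = re.findall(r"[aeiou]", text)
lowercase = re.findall(r"[a-z]", text)
upper_or_digit = re.findall(r"[A-Z0-9]", text)
# A leading ^ inside the brackets negates the set.
not_abc = re.findall(r"[^abc]", "abcde")

print(vowels)              # ['o', 'o', 'a', 'e']
print("".join(lowercase))  # oomtable
print(upper_or_digit)      # ['R', 'B', '1', '2', '7']
print(not_abc)             # ['d', 'e']
```

Note that a character class always matches exactly one character; it is the quantifier placed after it that decides how many.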
There are also built-in shorthand character classes:
\d - Any digit (0-9)
\w - Any word character (letters, digits, underscore)
\s - Any whitespace (space, tab, newline)
\D, \W, \S - The opposites of the above (any non-digit, non-word character, or non-whitespace character)
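The shorthand classes can be compared side by side on one string. This is a sketch in Python with an invented sample sentence; note how \w+ and splitting on \s+ treat the trailing "!" differently.

```python
import re

text = "Order 42 shipped\tto unit_7!"

digits = re.findall(r"\d", text)       # every single digit
words = re.findall(r"\w+", text)       # runs of word characters
parts = re.split(r"\s+", text)         # split on runs of whitespace
digits_only = re.sub(r"\D", "", text)  # delete everything that is not a digit

print(digits)       # ['4', '2', '7']
print(words)        # ['Order', '42', 'shipped', 'to', 'unit_7']
print(parts)        # ['Order', '42', 'shipped', 'to', 'unit_7!']
print(digits_only)  # 427
```

The underscore in "unit_7" counts as a word character, while "!" does not, which is why it appears only in the whitespace-split result.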
Quantifiers
Quantifiers control how many times a pattern element must match:
a{3} - Matches exactly 3 "a"s
a{2,4} - Matches 2 to 4 "a"s
a{2,} - Matches 2 or more "a"s
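The three quantifier forms can be checked against strings of different lengths. The sketch below is in Python; fully_matches is a hypothetical helper introduced only for this example.

```python
import re

def fully_matches(pattern: str, text: str) -> bool:
    """Return True if the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

results = {
    "a{3} vs aaa": fully_matches(r"a{3}", "aaa"),        # True
    "a{3} vs aa": fully_matches(r"a{3}", "aa"),          # False: too few
    "a{2,4} vs aaaa": fully_matches(r"a{2,4}", "aaaa"),  # True
    "a{2,4} vs aaaaa": fully_matches(r"a{2,4}", "aaaaa"),  # False: too many
    "a{2,} vs aaaaaa": fully_matches(r"a{2,}", "aaaaaa"),  # True
    "a{2,} vs a": fully_matches(r"a{2,}", "a"),          # False: too few
}

# Quantifiers are greedy by default: they take as many as allowed.
greedy = re.search(r"a{2,4}", "aaaaa").group()

print(results)
print(greedy)  # aaaa
```

When searching inside a longer string rather than matching all of it, a{2,4} stops after four "a"s and leaves the fifth unmatched.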
Your First Real-World Regex
Let's write a regex to validate a simple email address format: \w+@\w+\.\w+
\w+ - One or more word characters (the username)
@ - Literal @ symbol
\w+ - One or more word characters (the domain)
\. - Literal dot (escaped because . alone means "any character")
\w+ - One or more word characters (the TLD, like "com")
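This matches "user@example.com", but it is far from perfect: it cannot match the whole of an address such as "first.last@example.com", because \w does not cover dots or hyphens, and email validation in general is a complex topic. Try it in the Regex Tester and experiment!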
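Here is one way the pattern \w+@\w+\.\w+ might be wrapped for a quick check in Python. The function name looks_like_email and the sample addresses are assumptions for illustration, and this is a sketch of the simple pattern from this section, not a real email validator.

```python
import re

EMAIL = re.compile(r"\w+@\w+\.\w+")

def looks_like_email(text: str) -> bool:
    """Return True if the whole text fits the simple pattern."""
    return EMAIL.fullmatch(text) is not None

results = {
    "user@example.com": looks_like_email("user@example.com"),
    # \w does not match ".", so a dotted username fails.
    "first.last@example.com": looks_like_email("first.last@example.com"),
    # Only one dot is allowed after the @, so subdomains fail.
    "user@mail.example.com": looks_like_email("user@mail.example.com"),
    # No dot and TLD at all.
    "user@example": looks_like_email("user@example"),
}

for address, ok in results.items():
    print(f"{address}: {ok}")

# search() finds the pattern inside longer text instead.
found = EMAIL.search("contact: user@example.com!").group()
print(found)  # user@example.com
```

Only the first address passes, which shows concretely where the simple pattern falls short.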
Practice: Common Beginner Patterns
The best way to learn regex is to practice. Try these patterns in the Regex Tester:
\d{4}-\d{2}-\d{2} - Matches dates like "2024-01-15"
https?://\S+ - Matches HTTP and HTTPS URLs
^[A-Z] - Matches strings that start with a capital letter
\b\w{5}\b - Matches words of exactly 5 word characters (letters, digits, or underscores)
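If you would rather experiment in code, the four practice patterns can be run in Python as sketched below. The sample sentences and URLs are invented for the example.

```python
import re

text = "Released 2024-01-15. Docs: https://example.com/docs and http://test.org"

dates = re.findall(r"\d{4}-\d{2}-\d{2}", text)
urls = re.findall(r"https?://\S+", text)
starts_with_capital = re.search(r"^[A-Z]", text) is not None
five_letter_words = re.findall(r"\b\w{5}\b", "apple pie tastes great")

print(dates)                # ['2024-01-15']
print(urls)                 # ['https://example.com/docs', 'http://test.org']
print(starts_with_capital)  # True
print(five_letter_words)    # ['apple', 'great']
```

Keep in mind that \d{4}-\d{2}-\d{2} checks only the shape of a date, so "9999-99-99" would match too, and \S+ will swallow any punctuation that directly follows a URL.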