Metacharacters Overview
In regex, certain characters have special meanings. These are called metacharacters. To match them literally, you must escape them with a backslash: \. matches a literal period.
The metacharacters are: . ^ $ * + ? { } [ ] \ | ( )
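Here is a short sketch of escaping in practice. It assumes Python's standard `re` module; the tutorial itself is not tied to any language. It compares an unescaped and an escaped dot. It also shows `re.escape`, which backslash-escapes every metacharacter in a literal string for you.

```python
import re

# Unescaped, "." is a metacharacter and matches any character.
print(bool(re.fullmatch(r"3.14", "3x14")))   # True
# Escaped, "\." matches only a literal period.
print(bool(re.fullmatch(r"3\.14", "3x14")))  # False
print(bool(re.fullmatch(r"3\.14", "3.14")))  # True

# re.escape escapes the metacharacters in a literal string, which is safer
# than escaping by hand when the text comes from user input or a file.
literal = "1.5+2"
pattern = re.escape(literal)
print(pattern)                                # 1\.5\+2
print(bool(re.fullmatch(pattern, "1.5+2")))   # True
print(bool(re.fullmatch(pattern, "1x5+2")))   # False
```

The `r"..."` prefix makes these raw strings. That lets the backslash reach the regex engine instead of being read as a Python string escape.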
The Dot: Match Any Character
The dot . matches any single character except a newline, unless a "dot matches all" flag such as Python's re.DOTALL is set. The pattern c.t matches "cat", "cot", "cut" and "c!t", but not "ct" (no character between c and t) or "coat" (two characters between c and t).
To match an actual period, escape it: \.
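The following sketch checks the examples above with Python's `re.fullmatch`, which requires the whole string to match. That choice is an assumption made here to test whole words. `re.search` would also find c.t inside longer strings such as "cats".

```python
import re

words = ["cat", "cot", "cut", "c!t", "ct", "coat"]
# Keep only words that are exactly: c, any one character, t.
matches = [w for w in words if re.fullmatch(r"c.t", w)]
print(matches)  # ['cat', 'cot', 'cut', 'c!t']

# By default the dot does not match a newline...
print(re.fullmatch(r"c.t", "c\nt"))                      # None
# ...unless the DOTALL flag is set.
print(bool(re.fullmatch(r"c.t", "c\nt", re.DOTALL)))    # True
```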
Anchors: ^ and $
^ matches the start of a string. ^Hello matches "Hello World" but not "Say Hello".
$ matches the end of a string. end$ matches "the end" but not "ending".
Combined: ^\d+$ matches only strings made up entirely of one or more digits, which makes it useful for validating numeric input.
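The sketch below shows the anchor examples with Python's `re.search`. The helper name `is_all_digits` is hypothetical and was chosen for illustration.

```python
import re

def is_all_digits(s: str) -> bool:
    """Hypothetical helper: True if s is one or more digits, using ^ and $."""
    return re.search(r"^\d+$", s) is not None

print(bool(re.search(r"^Hello", "Hello World")))  # True
print(bool(re.search(r"^Hello", "Say Hello")))    # False
print(bool(re.search(r"end$", "the end")))        # True
print(bool(re.search(r"end$", "ending")))         # False

print(is_all_digits("12345"))  # True
print(is_all_digits("12a45"))  # False
print(is_all_digits(""))       # False: \d+ needs at least one digit
```

Python's $ has one quirk: it also matches just before a final newline. As a result, `is_all_digits("123\n")` returns True. If that matters for your validation, `re.fullmatch(r"\d+", s)` rejects the trailing newline.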
Word Boundaries: \b
\b matches the position between a word character and a non-word character. It also matches between a word character and the start or end of the string. The pattern \bcat\b finds a match in "cat" and "the cat sat" but NOT in "catch" or "concatenate".
Use \b when you want to match whole words only.
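Here is a brief sketch of whole-word matching, again assuming Python's `re` module. `re.search` reports whether \bcat\b occurs anywhere in each sample. `re.findall` collects every whole-word occurrence.

```python
import re

samples = ["cat", "the cat sat", "catch", "concatenate"]
results = {s: re.search(r"\bcat\b", s) is not None for s in samples}
print(results)
# {'cat': True, 'the cat sat': True, 'catch': False, 'concatenate': False}

# "catch" is skipped because "t" is followed by another word character.
print(re.findall(r"\bcat\b", "cat, catch, cat."))  # ['cat', 'cat']
```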
Alternation: |
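The pipe character | means "or". The pattern cat|dog matches either "cat" or "dog". Alternation has the lowest precedence of any regex operator, so cat|dogs means "cat" or "dogs". To limit its scope, wrap the alternatives in a group: (cat|dog)s matches "cats" or "dogs".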
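The following sketch, using Python's `re` module, contrasts grouped and ungrouped alternation.

```python
import re

text = "cats and dogs"
# finditer yields match objects; group(0) is the full matched text.
print([m.group(0) for m in re.finditer(r"(cat|dog)s", text)])  # ['cats', 'dogs']

# Without a group, | splits the whole pattern: "cat" OR "dogs".
print(bool(re.fullmatch(r"cat|dogs", "cat")))     # True
print(bool(re.fullmatch(r"cat|dogs", "cats")))    # False
print(bool(re.fullmatch(r"(cat|dog)s", "cats")))  # True
```

There is a Python-specific catch. `re.findall(r"(cat|dog)s", text)` returns `['cat', 'dog']`, because `findall` returns the group's contents whenever the pattern has a capturing group. The non-capturing form `(?:cat|dog)s` makes `findall` return `['cats', 'dogs']`.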
Escape Sequences
Special escape sequences represent non-printable or commonly used character groups:
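\n   Newline
\t   Tab
\r   Carriage return
\d   Digit, [0-9]
\D   Non-digit, [^0-9]
\w   Word character, [a-zA-Z0-9_]
\W   Non-word character, [^a-zA-Z0-9_]
\s   Whitespace, [ \t\n\r\f\v]
\S   Non-whitespace, [^ \t\n\r\f\v]
The bracketed equivalents describe ASCII behavior. Some engines, such as Python 3 with str patterns, also let \d, \w and \s match Unicode digits, letters and whitespace.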
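Below is a small sketch that applies several of these escapes with Python's `re` module. The sample strings are made up for illustration.

```python
import re

print(re.findall(r"\d", "a1b22"))             # ['1', '2', '2']
# Remove everything that is not a digit.
print(re.sub(r"\D", "", "(555) 123-4567"))    # 5551234567
print(re.findall(r"\w+", "hello, world_1!"))  # ['hello', 'world_1']
# Split on any run of spaces, tabs or newlines.
print(re.split(r"\s+", "a \t b\nc"))          # ['a', 'b', 'c']
print(re.findall(r"\S+", "  two  words "))    # ['two', 'words']
```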
Greedy vs. Lazy Matching
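By default, quantifiers are greedy: they match as much as possible. The pattern <.+> applied to <b>text</b> matches the entire string <b>text</b>, not just <b>. This happens because .+ first consumes everything to the end of the string, then backtracks only as far as the last >.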
Add ? after a quantifier to make it lazy (match as little as possible): <.+?> matches <b> then </b> separately.
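Here is a minimal sketch of the greedy/lazy difference using Python's `re.findall`.

```python
import re

html = "<b>text</b>"
# Greedy: .+ runs to the end, then backtracks to the last ">".
print(re.findall(r"<.+>", html))   # ['<b>text</b>']
# Lazy: .+? stops at the first ">" that completes a match.
print(re.findall(r"<.+?>", html))  # ['<b>', '</b>']
```

A common alternative, not mentioned in the text above, is a negated character class. `<[^>]+>` also returns `['<b>', '</b>']`, because `[^>]` can never run past a >.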