Regex for programmers, comprehensive guide
As a developer, I constantly switch between different search tools - VIM for editing, ripgrep for code search, fd for file finding, and various grep variants for text processing. Each tool has its own regex syntax and search patterns, which can be confusing when jumping between contexts.
This comprehensive guide explores the landscape of search tools and their regex implementations, with detailed breakdowns of flags, special characters, and escaping rules for each tool.
Tool Selection Decision Matrix
By Primary Use Case
Use Case | Best Choice | Alternative | Legacy/Fallback |
---|---|---|---|
File Finding | fd | find | find |
Code Search | ripgrep | git grep | grep -r |
Git Repos | git grep | ripgrep | grep -r |
Text Editing | VIM \v | VIM magic | sed |
Scripting | ripgrep | grep -E | grep |
Performance | ripgrep, fd | git grep | grep, find |
Portability | grep, find | git grep | POSIX tools |
Tool Recommendation: For modern development, the ripgrep + fd combination covers most everyday search needs with excellent performance and intuitive syntax.
Tool-by-Tool Reference
File Finding Tools
find (Traditional Unix)
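Regex Engine: Emacs-style regex by default with GNU find’s -regex flag; ERE (Extended Regular Expressions) with -regextype posix-extended (GNU) or -E (BSD/macOS)
Default Mode: Glob patterns with -name
Matching Scope: -regex must match the whole path, not just the file name, so patterns usually start with .*
Performance Warning: find can be slow on large directories. Consider using fd for better performance in modern development environments.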
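The difference between -name (a glob on the file name) and -regex (a pattern against the whole path) is easy to trip over. A minimal sketch, assuming GNU find (for -regextype) and a throwaway directory whose file names are made up for illustration:

```shell
# Throwaway demo tree (hypothetical file names)
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/src/app.js" "$demo/src/app.ts" "$demo/src/notes.txt"

# -name: glob, matched against the file name only
by_name=$(find "$demo" -type f -name '*.js' | sort)

# -regex: ERE, matched against the whole path, hence the leading .*
by_regex=$(find "$demo" -type f -regextype posix-extended -regex '.*\.(js|ts)' | sort)

printf 'glob matches:\n%s\n' "$by_name"
printf 'regex matches:\n%s\n' "$by_regex"
```

On BSD/macOS find, the rough equivalent would be find -E with the same pattern, since -regextype is a GNU option.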
fd (Modern Alternative)
Pattern Engine: Rust regex (default) / Glob patterns with the -g, --glob flag
Default Mode: Regex, matched against the file name (use -p, --full-path to match the whole path)
Key Flags:
-e, --extension EXT # Filter by extension
-t, --type TYPE # f(ile), d(irectory)
-H, --hidden # Include hidden files
-I, --no-ignore # Don't respect .gitignore
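-g, --glob # Treat the pattern as a glob instead of a regex
--regex # Treat the pattern as a regex (the default; overrides --glob)
-p, --full-path # Match the pattern against the full path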
-x, --exec CMD # Execute command on matches
Character Matching (Glob mode with --glob):
- Digits: [0-9] in character classes
- Letters: [a-zA-Z] in character classes
- Word chars: [a-zA-Z0-9_] in character classes
- Any chars: * (wildcard)
- Single char: ? (wildcard)
- Char classes: [abc], [a-z]
- Brace expansion: {js,ts,jsx}
- Recursive: ** (recursive directory matching)
Character Matching (Regex mode - default):
- Digits: \d or [0-9]
- Letters: [a-zA-Z] or [[:alpha:]]
- Word chars: \w or [a-zA-Z0-9_]
- Whitespace: \s or [ \t\n\r\f\v]
- Space only: ` ` (literal space)
- Tab: \t or literal tab
- Newline: \n or literal newline
- Word boundary: \b
Metacharacter Escaping:
# Glob mode (requires --glob):
fd --glob "*.test.*" # * is glob wildcard
fd --glob "test\?" # Escape ? for literal question mark
fd --glob "file\[1\]" # Escape [] for literal brackets
fd --glob "prefix\*suffix" # Escape * for literal asterisk
# Regex mode (default; --regex is optional):
fd --regex ".*\.js$" # . needs escape, $ works
fd --regex "test+" # + works for one-or-more
fd --regex "test\+" # Escape + for literal plus
fd --regex "literal\(" # Escape ( for literal parenthesis
fd --regex "\d{3}" # \d works, {} work for repetition
Examples:
# Extensions (multiple approaches)
fd -e js -e ts # Extension filter
fd --glob "*.js" # Glob pattern (needs --glob; the default is regex)
fd --regex "\.(js|ts)$" # Regex pattern
# Complex patterns
fd --regex "component.*\.spec\."
fd -t f --regex "test.*\.(js|ts)$"
fd -p --glob "**/*test*" # Patterns containing / need -p (full path)
fd -p --regex "src/.*component" # Search in full path
# Combining flags
fd -e js -H -I "test" # Include hidden, ignore .gitignore
fd -t f --regex "\d+.*\.log$" --exec cat {}
Modern Choice: fd is git-aware by default and respects .gitignore, making it ideal for code projects. It’s also significantly faster than traditional find.
Extension Tip: Use the -e flag for simple extension filtering - it’s more readable than regex patterns for common cases.
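Scripts that must run on machines without fd can wrap the extension filter in a small helper. A minimal sketch, in which the function name find_by_ext and the demo files are hypothetical, and which assumes the binary is installed under the name fd (some distributions ship it as fdfind):

```shell
# Hypothetical helper: list files by extension, preferring fd and
# falling back to find when fd is not installed.
find_by_ext() {
  ext=$1
  dir=${2:-.}
  if command -v fd >/dev/null 2>&1; then
    # "." is a match-all regex; -e filters by extension
    fd -t f -e "$ext" . "$dir"
  else
    find "$dir" -type f -name "*.$ext"
  fi
}

# Throwaway demo tree (file names are made up)
demo=$(mktemp -d)
mkdir -p "$demo/src"
touch "$demo/a.js" "$demo/src/b.js" "$demo/src/c.ts"

js_count=$(find_by_ext js "$demo" | wc -l)
echo "js files found: $js_count"
```

The two branches are not fully equivalent: fd skips hidden and ignored files by default, while find does not, so results can differ inside real repositories.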
Content Search Tools (grep family)
grep (Traditional)
Regex Engine: BRE (default) / ERE (with -E) / PCRE (with -P on some systems)
Default Mode: BRE (Basic Regular Expressions)
Legacy Limitation: In BRE and ERE modes, grep doesn’t support \d. GNU grep accepts \w and \s as extensions, but they are not portable to other implementations. Use POSIX character classes or explicit ranges instead.
Mode Recommendation: Use grep -E (ERE mode) by default for less escaping confusion. The syntax is closer to modern regex engines.
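To make the BRE/ERE split concrete, here is a small sketch that runs both modes over the same made-up sample text, using only POSIX classes and explicit ranges so it does not depend on \d:

```shell
# Hypothetical sample input
sample='build 42 passed
build failed
v2 released'

# BRE: interval braces need backslashes; [[:digit:]] is the portable digit class
bre=$(printf '%s\n' "$sample" | grep '[[:digit:]]\{2\}')

# ERE: ( ) | and { } are metacharacters without backslashes
ere=$(printf '%s\n' "$sample" | grep -E '(passed|failed)$')

# Count the lines that contain at least one digit
digit_lines=$(printf '%s\n' "$sample" | grep -c -E '[0-9]+')

printf 'BRE two digits: %s\n' "$bre"
printf 'ERE alternation:\n%s\n' "$ere"
printf 'lines with digits: %s\n' "$digit_lines"
```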
git grep
Regex Engine: BRE (default) / ERE (with -E) / PCRE (with -P, --perl-regexp)
Default Mode: BRE
Git Integration: git grep is well suited to repository searches: by default it searches only tracked files, so untracked and ignored files are skipped.
PCRE Power: Use git grep -P when you need modern regex features like lookahead/lookbehind that aren’t available in ERE mode. This requires a Git build with PCRE support.
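The tracked-files behaviour is easiest to see in a scratch repository. A minimal sketch, assuming git is installed; the repository and file names are made up:

```shell
# Scratch repository with one tracked and one untracked file
repo=$(mktemp -d)
git -C "$repo" init -q
printf 'TODO: tracked\n' > "$repo/tracked.txt"
printf 'TODO: untracked\n' > "$repo/scratch.txt"
git -C "$repo" add tracked.txt

# Default: only tracked files are searched
tracked_hits=$(git -C "$repo" grep -l -E 'TODO|FIXME')

# --untracked widens the search to untracked files as well
all_hits=$(git -C "$repo" grep -l --untracked -E 'TODO|FIXME' | sort)

printf 'tracked only: %s\n' "$tracked_hits"
printf 'with --untracked:\n%s\n' "$all_hits"
```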
ripgrep (rg)
Regex Engine: Rust regex (Perl-like syntax, but without lookaround or backreferences); PCRE2 with -P, --pcre2 when ripgrep is built with PCRE2 support
Default Mode: Rust regex
Key Flags:
-i, --ignore-case # Case insensitive
-w, --word-regexp # Match whole words
-n, --line-number # Show line numbers
-l, --files-with-matches # Show only filenames
-A/-B/-C NUM # Show context lines
-t, --type TYPE # Filter by file type (js, py, etc.)
-g, --glob PATTERN # Include/exclude by glob
--hidden # Search hidden files
-U, --multiline # Enable multiline matching
Character Matching (Rust regex - default):
- Digits: \d or [0-9]
- Letters: [a-zA-Z] or [[:alpha:]]
- Word chars: \w or [a-zA-Z0-9_]
- Whitespace: \s or [ \t\n\r\f\v]
- Space only: ` ` (literal space)
- Tab: \t or literal tab
- Newline: \n or literal newline
- Carriage return: \r
- Word boundary: \b
- Unicode categories: \p{L} (letters), \p{N} (numbers)
Metacharacter Escaping:
# Modern syntax works out of the box:
rg "function\s+\w+" # \s and \w work
rg "\d{2,4}" # \d and {} work
rg "(function|const)" # () and | work
# Literal matching:
rg "literal\(" # Escape ( for literal
rg 'literal\$' # Escape $ for literal (single quotes stop the shell from consuming the backslash)
rg "literal\." # Escape . for literal
rg --fixed-strings "literal(" # Or use fixed strings mode
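Shell quoting is a second escaping layer on top of the regex engine, and it matters most for $. Inside double quotes the shell turns \$ into a plain $ before the search tool ever sees the pattern. A small sketch using grep -E as a stand-in (the same quoting rules apply to rg and fd; the sample line is made up):

```shell
dq="literal\$"   # shell consumes the backslash: the tool receives literal$
sq='literal\$'   # single quotes pass it through: the tool receives literal\$
printf '%s\n' "$dq" "$sq"

line='cost: literal$ 5'

# Backslash intact: $ is matched literally
with_sq=$(printf '%s\n' "$line" | grep -c -E "$sq" || true)
# Backslash lost: $ becomes an end-of-line anchor and nothing matches
with_dq=$(printf '%s\n' "$line" | grep -c -E "$dq" || true)

printf 'single-quoted matches: %s\n' "$with_sq"
printf 'double-quoted matches: %s\n' "$with_dq"
```

Single quotes are the safer default for regex arguments; switch to double quotes only when the pattern needs shell variable expansion.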
Examples:
# Type filtering
rg --type js "function"
rg -t py -t js "class"
rg -T log "error" # Exclude log files
# Advanced patterns
rg "\w+@\w+\.\w+" -t md # Email in markdown files
rg "(?i)todo|fixme" --glob "*.js" # Case-insensitive TODO/FIXME
rg -U "class.*\{[\s\S]*?constructor" # Multiline class with constructor
# Replacement (rewrites the printed output only; files are not modified)
rg "console\.log" -r "logger.info" --type js
rg "\btodo\b" -r "TODO" -i # Case-insensitive replace
# Context and formatting
rg "error" -A 3 -B 3 # Show context
rg "function" -o # Only show matching parts
rg "import.*from" --json # JSON output
Performance Champion: ripgrep is often the fastest search tool available, with excellent defaults and modern regex support out of the box.
Smart Defaults: ripgrep respects .gitignore and skips hidden and binary files by default - minimal configuration needed for great results.
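For scripts that should prefer ripgrep but still work where only grep exists, the two can sit behind one function. A minimal sketch; the name list_todo_files and the demo files are hypothetical:

```shell
# Hypothetical helper: list files containing TODO or FIXME (any case),
# preferring rg and falling back to recursive grep.
list_todo_files() {
  dir=${1:-.}
  if command -v rg >/dev/null 2>&1; then
    rg -l -i 'todo|fixme' "$dir"
  else
    grep -rlEi 'todo|fixme' "$dir"
  fi
}

# Throwaway demo tree (file names are made up)
demo=$(mktemp -d)
printf '// TODO: refactor\n' > "$demo/a.js"
printf 'console.log("ok")\n' > "$demo/b.js"

todo_files=$(list_todo_files "$demo")
printf '%s\n' "$todo_files"
```

As with any fallback, the branches differ in scope: rg skips ignored, hidden, and binary files by default, whereas grep -r searches everything under the directory.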
Editor Search Tools
VIM/Neovim Search
Regex Engine: VIM’s Magic Mode system
Default Mode: Magic mode
Magic Modes:
\v " Very magic - PCRE-like syntax
\m " Magic (default) - traditional VIM
\M " No magic - most chars are literal
\V " Very no magic - almost all chars literal
Common Search Commands:
/pattern " Forward search
?pattern " Backward search
n " Next match
N " Previous match
* " Search word under cursor (forward)
# " Search word under cursor (backward)
gn " Select next match
gN " Select previous match
:noh " Clear highlighting
:%s/old/new/g " Substitute (replace)
:%s/old/new/gc " Substitute with confirmation
:g/pattern/ " Global command (show lines matching)
:v/pattern/ " Inverse global (show non-matching)
Character Matching (Magic mode - default):
- Digits: \d or [0-9] or [[:digit:]]
- Letters: [a-zA-Z] or [[:alpha:]]
- Word chars: \w or [a-zA-Z0-9_] or [[:alnum:]_]
- Whitespace: \s or [ \t] (space and tab only; use \_s to also match a line break)
- Space only: ` ` (literal space)
- Tab: \t or literal tab
- Newline: \n
- Word boundary: \< (start) and \> (end)
Magic Mode Tip: VIM’s default magic mode can be confusing. Use /\v (very magic) to get JavaScript-like regex behavior with less escaping.
Character Matching (Very Magic mode with \v):
- Digits: \d or [0-9]
- Letters: [a-zA-Z] or [[:alpha:]]
- Word chars: \w or [a-zA-Z0-9_]
- Whitespace: \s or [ \t] (space and tab only; use \_s to also match a line break)
- Space only: ` ` (literal space)
- Tab: \t or literal tab
- Newline: \n
- Word boundary: < (start) and > (end), written without backslashes (not \b)
Metacharacter Escaping:
" Magic mode (default):
/function\+ " Need escape for +
/\(function\) " Need escape for ()
/test\|spec " Need escape for |
/file\{2,3\} " Need escape for {}
" Very magic mode (\v):
/\vfunction+ " + works without escape
/\v(function|const) " () and | work
/\vfile{2,3} " {} work
" Very no magic mode (\V):
/\Vliteral+ " + is literal
/\Vliteral( " ( is literal
" Literal matching in magic mode:
/literal\$ " Escape $ for literal
/literal\. " Escape . for literal
/literal\* " Escape * for literal
Examples:
" Basic searches
/function " Find 'function'
/\vfunction\s+\w+ " Very magic: function followed by word
/\(function\|const\) " Magic: function or const
" Word boundaries
/\<word\> " Exact word match
/\v<word> " Very magic: < and > are word boundaries without backslashes
" Substitution
:%s/\v(function)\s+(\w+)/const \2 = () =>/g " Convert functions
:%s/console\.log/logger.info/g " Replace console.log
" Case sensitivity
/\cpattern " Case insensitive search
/\Cpattern " Case sensitive search
:set ignorecase " Default case insensitive
:set smartcase " Smart case sensitivity
VIM Complexity: VIM’s regex system is unique among editors. When in doubt, use /\v for very magic mode to reduce escaping confusion.
Word Boundaries: VIM uses \< and \> for word boundaries instead of \b. In very magic mode (\v) they are written < and > without the backslashes.
Comprehensive Comparison Tables
Character Matching Reference
Pattern Type | find (ERE) | fd (regex) | grep (ERE) | git grep (PCRE) | ripgrep | VIM (magic) | VIM (\v) |
---|---|---|---|---|---|---|---|
Digits | [0-9] | \d or [0-9] | [0-9] | \d or [0-9] | \d or [0-9] | \d or [0-9] | \d or [0-9] |
Letters | [a-zA-Z] | [a-zA-Z] | [a-zA-Z] | [a-zA-Z] | [a-zA-Z] | [a-zA-Z] | [a-zA-Z] |
Word chars | [a-zA-Z0-9_] | \w | [a-zA-Z0-9_] | \w | \w | \w | \w |
Whitespace | [[:space:]] | \s | [[:space:]] | \s | \s | \s | \s |
Space only | ` ` | ` ` | ` ` | ` ` | ` ` | ` ` | ` ` |
Tab | literal tab | \t | literal tab | \t | \t | \t | \t |
Newline | literal | \n | literal | \n | \n | \n | \n |
Word boundary | \< \> | \b | \< \> | \b | \b | \< \> | < > |
Modern vs Legacy: Modern tools support shorthand classes (\d, \w, \s). Legacy tools require explicit character classes or POSIX classes like [[:digit:]], [[:alpha:]], [[:space:]].
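When a pattern written for a modern engine has to run under grep -E, the shorthand classes can be rewritten mechanically. A deliberately naive sketch; the helper name to_posix is hypothetical, and it does not handle shorthands inside bracket expressions or escaped backslashes:

```shell
# Hypothetical helper: rewrite \d, \w and \s into POSIX bracket classes.
# Naive by design: a pattern such as [\d.] would be rewritten incorrectly.
to_posix() {
  printf '%s\n' "$1" | sed \
    -e 's/\\d/[[:digit:]]/g' \
    -e 's/\\w/[[:alnum:]_]/g' \
    -e 's/\\s/[[:space:]]/g'
}

converted=$(to_posix '\d{3}-\w+')
printf '%s\n' "$converted"

# The converted pattern works in plain ERE mode
matches=$(printf 'id 123-abc\n' | grep -c -E "$converted" || true)
printf 'matching lines: %s\n' "$matches"
```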
Metacharacter Escaping Rules
Pattern | find (ERE) | fd (regex) | grep (BRE) | grep (ERE) | git grep (PCRE) | ripgrep | VIM (magic) | VIM (\v) |
---|---|---|---|---|---|---|---|---|
Groups | (...) | (...) | \(...\) | (...) | (...) | (...) | \(...\) | (...) |
Literal ( | \( | \( | ( | \( | \( | \( | ( | \( |
One or more | + | + | \+ | + | + | + | \+ | + |
Literal + | \+ | \+ | + | \+ | \+ | \+ | + | \+ |
Zero or one | ? | ? | \? | ? | ? | ? | \? | ? |
Literal ? | \? | \? | ? | \? | \? | \? | ? | \? |
Alternation | `|` | `|` | `\|` | `|` | `|` | `|` | `\|` | `|` |
Literal pipe | `\|` | `\|` | `|` | `\|` | `\|` | `\|` | `|` | `\|` |
Repetition | {n,m} | {n,m} | \{n,m\} | {n,m} | {n,m} | {n,m} | \{n,m\} | {n,m} |
Escaping Trap: BRE mode (basic grep, git grep default) requires escaping +, ?, {}, (), | to use them as metacharacters. Of these, only \(\) and \{\} are standard POSIX BRE; \+, \? and \| are GNU extensions. ERE mode and modern tools work the opposite way.
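The trap is easiest to see by running one pattern in both modes against the same input. A minimal sketch with made-up sample lines:

```shell
# Two hypothetical input lines: one with a literal plus, one with repeated a's
text='a+b
aaab'

# BRE: an unescaped + is a literal plus sign
bre_plain=$(printf '%s\n' "$text" | grep 'a+b')

# ERE: + means one-or-more, so the same pattern now matches "aaab"
ere_plain=$(printf '%s\n' "$text" | grep -E 'a+b')

# ERE: a backslash turns + back into a literal
ere_escaped=$(printf '%s\n' "$text" | grep -E 'a\+b')

printf 'BRE  a+b  -> %s\n' "$bre_plain"
printf 'ERE  a+b  -> %s\n' "$ere_plain"
printf 'ERE  a\\+b -> %s\n' "$ere_escaped"
```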
Practical Workflow Examples
Modern Development Setup
# Install modern tools
brew install ripgrep fd
# Shell aliases for consistency
alias search='rg'
alias find-files='fd'
alias git-search='git grep -P'
# Common patterns as functions
function find-js() { fd --regex "\.(js|ts|jsx|tsx)$" "$@"; }
function search-todos() { rg "(?i)(todo|fixme|hack)" "$@"; }
function search-functions() { rg "function\s+\w+" --type js "$@"; }
Setup Tip: These aliases create a consistent interface across different search tools, reducing the cognitive load of remembering different command syntaxes.
VIM Integration
" Use ripgrep for :grep
if executable('rg')
set grepprg=rg\ --vimgrep\ --smart-case
set grepformat=%f:%l:%c:%m
endif
" Use fd for file finding
if executable('fd')
let $FZF_DEFAULT_COMMAND = 'fd --type f'
endif
" Better search highlighting
set hlsearch incsearch
" Use very magic by default (a trailing comment would become part of the mapping)
nnoremap <leader>/ /\v
Cross-Tool Pattern Examples
# Find email addresses across different tools
fd --regex "\w+@\w+\.\w+" # Find files with email in name
rg "\w+@\w+\.\w+" # Find email in file content
git grep -P "\w+@\w+\.\w+" # Git-aware email search
# VIM: /\v\w+@\w+\.\w+
# Find function definitions
fd --regex "function.*\.js$" # Files with "function" in name
rg "function\s+\w+" --type js # Function definitions in JS
git grep -P "function\s+\w+" -- "*.js" # Git-aware function search (-P because \s and \w are not portable in ERE)
# VIM: /\vfunction\s+\w+
# Find test files
fd --glob "*test*" # Glob pattern for test files
fd --regex ".*test.*\.(js|ts)$" # Regex for JS/TS test files
rg --files-with-matches "describe\(" --type js # Files with tests
# VIM: /\v.*test.*\.(js|ts)$
Cross-Tool Consistency: Notice how the same logical pattern requires different syntax across tools. Building a mental map of these differences is key to productivity.
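Where none of the modern engines is available, the email pattern above can be spelled with POSIX classes so it runs under plain grep -E. A small sketch, assuming a grep that supports -o (GNU and BSD grep both do); the sample line is made up:

```shell
# Portable spelling of \w+@\w+\.\w+ using POSIX classes
portable_email='[[:alnum:]_]+@[[:alnum:]_]+\.[[:alnum:]_]+'

# -o prints only the matching part of the line
found=$(printf 'contact: dev@example.com, or call\n' | grep -oE "$portable_email")
printf '%s\n' "$found"
```

Like the original \w-based pattern, this is a loose approximation that suits searching, not validating, email addresses.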
Summary and Best Practices
Key Recommendations
- Learn Modern Tools: Master ripgrep + fd for daily use
- Understand Escaping: Know when to escape metacharacters in each tool
- Use Very Magic in VIM: /\v makes VIM patterns JavaScript-like
- Leverage Tool Strengths: Use each tool for its optimal use case
- Build Muscle Memory: Create consistent aliases and shortcuts
Common Pitfalls to Avoid
- Mixing Regex Flavors: Don’t assume \d works everywhere
- Wrong Tool Choice: Don’t use find for content search
- Escaping Confusion: Test patterns incrementally
- Ignoring Performance: Use fd instead of find when possible
- Not Using Git Integration: Leverage git grep in repositories
Final Advice: Start with ripgrep and fd for modern development. Learn VIM’s /\v mode for editing. Keep this guide handy for regex syntax reference when switching between tools.
This comprehensive guide should serve as a reference for navigating the complex landscape of search tools and their regex implementations. The key is understanding each tool’s strengths and using them appropriately in your development workflow.