Regular expressions

Regular expressions, often shortened to regexes, are a powerful tool for searching and matching text patterns. Because the syntax is terse, the resulting expressions are hard to read and therefore hard to understand, which is why regular expressions are both loved and hated. Knowledge of the syntax and good tooling make the results better and more understandable.

This page tries to accomplish that by giving examples, explanation, references, tooling and websites. Not all implementations of Regular Expressions have the same functionality. Always verify that the given examples are supported by your RegEx implementation.

Keep in mind, however, that regular expressions are not the solution to everything. If you try to do too much with a single regular expression, you may fall into the well-known pitfall:

 Some people when confronted with a problem, think "I know, I'll use regular expressions."
 Now they have two problems.

Flavors

All modern regular expression flavors can trace their history back to the Perl programming language (Perl-style regular expressions).

Perl
Perl Compatible Regular Expressions (PCRE), a C library developed by Philip Hazel [1].
.NET
Java. The java.util.regex package was first released in Java 1.4.
JavaScript
Python
Ruby

General

To get started, you first need to know what regular expressions are. The following websites and references cover this. Regular-Expressions.info [2] gives a very good explanation of regular expressions. The site also offers a download of the very useful program RegexBuddy [3]. There is also a free tool, RegexPal, written in JavaScript by Steven Levithan [4].

Another good starting point is the description of the JavaScript implementation on W3Schools.com [5].

Looking for a regular expression but cannot find one? The library on RegExLib.com [6] offers a wide range of examples.

Want to know how to use regular expressions in a web environment? WebReference.com [7] offers an example of using them with JavaScript.

The Regular Expressions Cookbook [8], co-written by the author of RegexBuddy, gives even more explanation on this subject.

Elements

Regular expressions are built from a small set of elements, described in the tables below.

Anchors

Syntax	Example	Description
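^ (caret)	^. matches a in abc\ndef, and also d in multi-line mode.	Start of string or line. Matches at the start of the string the regex pattern is applied to and, in multi-line mode, after each line break. Matches a position rather than a character.
\A	\A. matches a in abc\ndef.	Same as the caret, but never matches after line breaks.
$ (dollar)	.$ matches f in abc\ndef, and also c in multi-line mode.	End of string or line. Matches at the end of the string the regex pattern is applied to and, in multi-line mode, before each line break. Matches a position rather than a character.
\Z	.\Z matches f in abc\ndef.	Same as the dollar, but never matches before line breaks in the middle of the string.
\b	.\b matches c in abc.	Word boundary. Matches the position between a word character (anything matched by \w) and a non-word character or the start or end of the string.
\B	.\B matches a and b in abc.	Not a word boundary. Matches every position where \b does not match, for example between two word characters.
\m	\m. matches a in abctest; .\m matches the space in 'test for'.	Start of word. Not supported by most flavors; Tcl is one that supports it.
\M	\M. matches the space and the dot in 'test for.'	End of word. Not supported by most flavors; Tcl is one that supports it.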
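
The anchors above can be tried out with a short script. This is a minimal sketch that assumes Python's re module; other flavors may behave differently, and \m and \M are left out because Python does not support them. The variable names are only illustrative.

```python
import re

text = "abc\ndef"

# Without flags, ^ only matches at the very start of the string.
start_default = re.findall(r"^.", text)
# With re.MULTILINE, ^ and $ also match after and before each line break.
start_multiline = re.findall(r"^.", text, flags=re.MULTILINE)
end_multiline = re.findall(r".$", text, flags=re.MULTILINE)

# \A and \Z ignore the multi-line flag.
start_of_string = re.findall(r"\A.", text, flags=re.MULTILINE)
end_of_string = re.findall(r".\Z", text, flags=re.MULTILINE)

# \b is a word boundary, \B is any position that is not a word boundary.
before_boundary = re.findall(r".\b", "abc")
before_non_boundary = re.findall(r".\B", "abc")

print(start_default, start_multiline, end_multiline)
print(start_of_string, end_of_string)
print(before_boundary, before_non_boundary)
```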

Assertions

Syntax	Example	Description
(?=regex)	t(?=s) matches the second t in 'streets'.	Zero-width positive lookahead. Matches at a position where the pattern inside the lookahead can be matched. Matches only the position; it does not consume any characters or expand the match. In a pattern like one(?=two)three, both two and three have to match at the position where the match of one ends. In the example: a t followed by an s.
(?!regex)	t(?!s) matches the first t in 'streets'.	Zero-width negative lookahead. Identical to positive lookahead, except that the overall match only succeeds if the regex inside the lookahead fails to match. In the example: a t not followed by an s.
(?<=text)	(?<=s)t matches the first t in 'streets'.	Zero-width positive look-behind. Matches at a position to the left of which text appears. Many regex flavors cannot apply an arbitrary regex backwards, so the test inside the look-behind is often limited to plain text or fixed-length patterns. Some regex flavors allow alternation of plain text options in the look-behind.
(?<!text)	(?<!s)t matches the second t in 'streets'.	Zero-width negative look-behind. Matches at a position if the text does not appear to the left of that position.
(?>regex)	(?>\d+) matches 5 and 00 in '$ 5.00'.	Atomic group, also known as a once-only subexpression. Once the group has matched, the engine does not backtrack into it. Closely related to possessive quantifiers.
(?(condition)then)		Conditional [if then].
(?(condition)then|else)		Conditional [if then else].
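
The four lookaround examples on 'streets' can be checked as follows. This is a small sketch assuming Python's re module; it records the start index of each match so the first t (index 1) and the second t (index 5) can be told apart.

```python
import re

word = "streets"  # s=0, t=1, r=2, e=3, e=4, t=5, s=6

# Start index of every "t" that satisfies the lookaround.
followed_by_s = [m.start() for m in re.finditer(r"t(?=s)", word)]
not_followed_by_s = [m.start() for m in re.finditer(r"t(?!s)", word)]
preceded_by_s = [m.start() for m in re.finditer(r"(?<=s)t", word)]
not_preceded_by_s = [m.start() for m in re.finditer(r"(?<!s)t", word)]

# Lookarounds are zero-width: the matched text is only the "t" itself.
match = re.search(r"t(?=s)", word)

print(followed_by_s, not_followed_by_s, preceded_by_s, not_preceded_by_s)
print(match.group(0), match.span())
```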

Characters

Syntax	Example	Description
\c	\ce matches te in testing.	Any XML name character (XML Schema and XPath flavors only). In most other flavors \cX denotes a control character.
\s	\sf matches the space and f in 'test for'.	Whitespace character.
\S	\St matches st in testing.	Non-whitespace character.
\d	\d matches each 9 in test99ing.	Digit.
\D	\D matches each character of test and ing in test99ing.	Not a digit.
\w	\w matches each letter of test and for in 'test for.'	Word character (letter, digit or underscore).
\W	\W matches the space and the dot in 'test for.'	Not a word character.
\xhh	\x20 matches the space in 'test for.'	Character with hexadecimal code hh.
\ooo	\040 matches the space in 'test for.'	Character with octal code ooo.
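
The character shorthands can be tested in the same way. The following is a sketch assuming Python's re module; \c is left out because Python does not support the XML Schema meaning. A + is added to \D and \w so that whole runs are returned instead of single characters.

```python
import re

digits = re.findall(r"\d", "test99ing")            # every single digit
non_digit_runs = re.findall(r"\D+", "test99ing")   # runs of non-digits
words = re.findall(r"\w+", "test for.")            # runs of word characters
non_words = re.findall(r"\W", "test for.")         # every non-word character
space_then_f = re.findall(r"\sf", "test for")      # whitespace followed by f
non_space_then_t = re.findall(r"\St", "testing")   # non-whitespace followed by t
hex_space = re.findall(r"\x20", "test for.")       # hexadecimal 20 is a space
octal_space = re.findall(r"\040", "test for.")     # octal 040 is a space

print(digits, non_digit_runs, words, non_words)
print(space_then_f, non_space_then_t, hex_space, octal_space)
```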

Iterators

Iteration quantifiers are metacharacters that are not regular expressions by themselves. Instead, they state how many iterations of the preceding expression there must be or can be in order to match. These metacharacters are *, + and ?, together with the brace forms {n} and {n,m}.

Syntax	Example	Description
*	test(\d)*ing matches testing, test9ing, test99ing in 'testing, test9ing, test99ing'.	Any number of occurrences, including none.
+	test(\d)+ing matches test9ing, test99ing in 'testing, test9ing, test99ing'.	One or more occurrences.
?	test(\d)?ing matches testing, test9ing in 'testing, test9ing, test99ing'.	Zero or one occurrence.
{n}	test(\d){1}ing matches test9ing in 'testing, test9ing, test99ing'.	Exactly n times.
{n,m}	test(\d){2,5}ing matches test99ing in 'testing, test9ing, test99ing'.	Between n and m times, inclusive.

Without a quantifier, the preceding expression has to match exactly once.
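
The quantifier examples can be reproduced with a few lines of code. This is a minimal sketch assuming Python's re module; the helper name matches is hypothetical and only exists for this illustration. The capturing group around \d is dropped because it is not needed to show the quantifier.

```python
import re

text = "testing, test9ing, test99ing"

def matches(pattern):
    """Return the full text of every match of pattern in text."""
    return [m.group(0) for m in re.finditer(pattern, text)]

zero_or_more = matches(r"test\d*ing")
one_or_more = matches(r"test\d+ing")
zero_or_one = matches(r"test\d?ing")
exactly_one = matches(r"test\d{1}ing")
two_to_five = matches(r"test\d{2,5}ing")

for name, found in [("*", zero_or_more), ("+", one_or_more),
                    ("?", zero_or_one), ("{1}", exactly_one),
                    ("{2,5}", two_to_five)]:
    print(name, found)
```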

Groups

Every time you create a group with (), you can reuse the text it matched, both later in the pattern and in the replacement. See the table below for examples.

Syntax	Example	Description
(regex)	(abc){3} matches abcabcabc. The first group matches abc.	Round brackets group the regex between them. They capture the text matched by the regex inside them so that it can be reused in a backreference, and they allow you to apply regex operators to the entire grouped regex.
(?:regex)	(?:abc){3} matches abcabcabc.	Non-capturing parentheses group the regex so you can apply regex operators, but do not capture anything and do not create backreferences.
\1 to \9	(abc|def)=\1 matches abc=abc or def=def, but not abc=def.	Substituted with the text matched between the first through ninth pair of capturing parentheses. Some regex flavors allow more than 9 backreferences.
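
Capturing groups, non-capturing groups and backreferences can be compared side by side. The sketch below assumes Python's re module; the key=value sample string is made up for the illustration.

```python
import re

# Capturing group: group 1 holds the text of its last repetition.
m = re.fullmatch(r"(abc){3}", "abcabcabc")
captured = m.group(1)

# Non-capturing group: same overall match, but no group is created.
n = re.fullmatch(r"(?:abc){3}", "abcabcabc")
group_count = n.re.groups

# Backreference: \1 must repeat exactly what group 1 matched.
same = re.fullmatch(r"(abc|def)=\1", "abc=abc") is not None
different = re.fullmatch(r"(abc|def)=\1", "abc=def") is not None

# Groups can also be reused in a replacement string.
swapped = re.sub(r"(\w+)=(\w+)", r"\2=\1", "key=value")

print(captured, group_count, same, different, swapped)
```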

Modifiers

Syntax	Example	Description
(?i)	te(?i)st matches teST but not TEST.	Turn on case insensitivity for the remainder of the regular expression. (?-i) turns case insensitivity off again.
(?s)		Turn on "dot matches newline" for the remainder of the regular expression. The s stands for 'single line' mode.
(?m)	(?m)^. matches a and d in abc\ndef.	Caret and dollar match after and before line breaks for the remainder of the regular expression.
(?x)	(?x)te st matches test.	Turn on free-spacing mode to ignore whitespace between regex tokens and allow # comments. So (?x)test # This will also match matches test as well.
(?i-sm:regex)	Combine options	Matches the regex inside the span with the option "i" turned on, and "s" and "m" turned off.
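
The modifiers can be tried as inline flags. This is a sketch assuming Python's re module. Python expects global inline flags such as (?i) at the start of the pattern, so the mid-pattern form te(?i)st from the table is written here with the scoped form te(?i:st).

```python
import re

# (?i) at the start: the whole pattern is case-insensitive.
whole = re.fullmatch(r"(?i)test", "TEST") is not None

# Scoped form: only "st" is matched case-insensitively.
scoped_ok = re.fullmatch(r"te(?i:st)", "teST") is not None
scoped_fail = re.fullmatch(r"te(?i:st)", "TEST") is not None

# (?s): the dot also matches a line break.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?m): ^ and $ match at every line break.
line_starts = re.findall(r"(?m)^.", "abc\ndef")

# (?x): whitespace and # comments inside the pattern are ignored.
free_spacing = re.fullmatch(r"(?x)te st  # comment", "test") is not None

print(whole, scoped_ok, scoped_fail, dot_default, dot_all)
print(line_starts, free_spacing)
```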

Quotation

Syntax	Example	Description
\	\- means the literal '-' character.	Matches nothing itself, but quotes the following character.
\Q	\Q....\E means the literal characters '....'. Same as \.\.\.\.	Matches nothing itself, but quotes all characters until \E.
\E		Matches nothing itself, but ends the quoting started by \Q.
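
Not every flavor supports \Q...\E. Python's re module, for example, does not; there the function re.escape produces the same kind of quoting. The sketch below assumes Python and uses made-up sample strings.

```python
import re

literal = "...."

# re.escape puts a backslash before each dot, like \Q....\E would.
escaped = re.escape(literal)

only_dots = re.findall(escaped, "a....b.c")    # four literal dots
unescaped = re.findall(literal, "abcd....")    # any four characters

# A single backslash quotes one metacharacter.
literal_plus = re.findall(r"1\+1", "1+1=2")

print(escaped, only_dots, unescaped, literal_plus)
```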

Ranges

Syntax	Example	Description
.	. matches a, b and c in 'abc'.	Any character except a line break (\n).
(a|b)	(a|b) matches a and b in 'abc'.	a or b.
(...)	(ab) matches ab in 'abc'.	Group. The characters have to appear in the same sequence.
(?:...)	(?:ab) matches ab in 'abc'.	Passive group. Does not create a group for backreferences.
[abc]	[abc] matches c and a in 'Duplicate test'.	Character class: a, b or c.
[^ste]	[^ste] matches r in 'streets'.	Negated character class: any character except s, t or e.
[a-q]	[a-q] matches both e characters in 'streets'.	Any letter from a through q.
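
The range examples from the table can be verified with the following sketch, which assumes Python's re module. Each call returns every single match in the sample string.

```python
import re

any_char = re.findall(r".", "abc")
a_or_b = re.findall(r"(a|b)", "abc")
sequence = re.findall(r"(ab)", "abc")
passive = re.findall(r"(?:ab)", "abc")
in_class = re.findall(r"[abc]", "Duplicate test")
negated = re.findall(r"[^ste]", "streets")
in_range = re.findall(r"[a-q]", "streets")

print(any_char, a_or_b, sequence, passive)
print(in_class, negated, in_range)
```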

CodeWright

Regular expression implementations differ between applications and programming languages, and can therefore be very different to use. Below are a few examples using the CodeWright Search & Replace options.

Find

Regex	Meaning	Matches
([ ]+[0-9]+)	1 or more Spaces, 1 or more digits	[ 123456 ]

Replace

Find Regex	Meaning	Replace Regex	Meaning
( )([0-9][0-9][0-9])( )	Space, 3 digits, space: ...XXXX 123 YYYY...	\10\2\3	The first group, an inserted zero (0), then the second and third groups: ...XXXX 0123 YYYY...

Note that in CodeWright the more compact ( )([\d]{3})( ) does not work, because CodeWright only has the iteration qualifiers *, + and ?.
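
For comparison, the same replacement can be sketched in another flavor. The example below assumes Python's re module, not CodeWright. In Python the replacement \10 would be read as a reference to group 10, so the first group is written as \g<1> before the inserted zero.

```python
import re

line = "...XXXX 123 YYYY..."

# Group 1: leading space, group 2: three digits, group 3: trailing space.
# \g<1>0 means "group 1 followed by a literal 0".
result = re.sub(r"( )([0-9]{3})( )", r"\g<1>0\2\3", line)

print(result)
```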

Examples

XML

The following example is from the Regular Expressions Cookbook [8]. It removes all XML-style tags except <em> and <strong> from an XML or HTML page.

(?xm)                 # x permits whitespace and comments; m makes ^ and $ match at line breaks
< /?                  # Permit closing tag
(?!                   # Negative lookahead
   (?: em | strong)   #   List of tags to avoid match
   \b                 #   Word boundary avoids partial word matches
)
[a-z]                 # Tag name initial character must be a-z
(?: [^>"']            #   Any char except >, " or '
  | "[^"]*"           #   Double quoted attribute values
  | '[^']*'           #   Single quoted attribute values
)*
>

Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
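
As an illustration of how the pattern might be applied, here is a sketch that assumes Python's re module. The inline (?xm) is replaced by the flags re.VERBOSE and re.IGNORECASE; case-insensitive matching is an assumption added here so that uppercase tag names are removed as well. The function name strip_tags and the sample markup are hypothetical.

```python
import re

TAG_EXCEPT_EM_STRONG = re.compile(
    r"""
    < /?                  # Permit closing tag
    (?!                   # Negative lookahead
       (?: em | strong)   #   List of tags to avoid match
       \b                 #   Word boundary avoids partial word matches
    )
    [a-z]                 # Tag name initial character must be a-z
    (?: [^>"']            #   Any char except >, " or '
      | "[^"]*"           #   Double quoted attribute values
      | '[^']*'           #   Single quoted attribute values
    )*
    >
    """,
    re.VERBOSE | re.IGNORECASE,
)

def strip_tags(markup: str) -> str:
    """Remove every tag except <em> and <strong>, keeping the text."""
    return TAG_EXCEPT_EM_STRONG.sub("", markup)

sample = '<p class="x">Keep <em>this</em> and <strong>that</strong>, drop <b>bold</b>.</p>'
print(strip_tags(sample))
```

A regular expression like this treats the markup as plain text, so it is best suited to simple clean-up tasks rather than full XML or HTML parsing.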

Reference

[1] PCRE, Philip Hazel. The PCRE library is free, even for building commercial software.
[2] Regular-Expressions.info, the premier website about regular expressions: tutorials, language examples, books, and references, made by Jan Goyvaerts.
[3] RegexBuddy, a software companion for working with regular expressions. See the live demos. It also shows the capabilities of the different regex implementations (Java, JavaScript, Perl and more).
[4] RegexPal, written by Steven Levithan in JavaScript. The only thing you need is a web browser.
[5] W3Schools, full web building tutorials, free web tutorials.
[6] RegExLib.com, a regular expressions library with descriptions of the expressions used. Also a regex tester.
[7] WebReference.com, one of the oldest (created in 1995) and most respected web development sites. WebReference.com is all about the Web and Webmastery, from browsing to authoring and from HTML to advanced site design.
[8] Regular Expressions Cookbook, Jan Goyvaerts and Steven Levithan, 510 pages, O'Reilly Media, ISBN-10 0596520689. Also available for the Kindle.

