In free-spacing mode, whitespace between regular expression tokens is ignored. Whitespace includes spaces, tabs and line breaks. Note that only whitespace between tokens is ignored. a b c is the same as abc in free-spacing mode. But \ d and \d are not the same. The former matches d, while the latter matches a digit. \d is a single regex token composed of a backslash and a "d". Breaking up the token with a space gives you an escaped space (which matches a space), and a literal "d".
Likewise, grouping modifiers cannot be broken up. (?>atomic) is the same as (?> ato mic ) and as ( ?>ato mic). They all match the same atomic group. They're not the same as (? >atomic). In fact, the latter will cause a syntax error. The ?> grouping modifier is a single element in the regex syntax, and must stay together. This is true for all such constructs, including lookaround, named groups, etc.
A character class is also treated as a single token. [abc] is not the same as [ a b c ]. The former matches one of three letters, while the latter matches those three letters or a space. In other words: free-spacing mode has no effect inside character classes. Spaces and line breaks inside character classes will be included in the character class. This means that in free-spacing mode, you can use \ or [ ] to match a single space. Use whichever you find more readable. The hexadecimal escape \x20 also works, of course.
Java, however, does not treat a character class as a single token in free-spacing mode. Java does ignore whitespace and comments inside character classes. So in Java's free-spacing mode, [abc] is identical to [ a b c ]. To add a space to a character class, you'll have to escape it with a backslash. But even in free-spacing mode, the negating caret must appear immediately after the opening bracket. [ ^ a b c ] matches any of the four characters ^, a, b or c just like [abc^] would. With the negating caret in the proper place, [^ a b c ] matches any character that is not a, b or c.
Another feature of free-spacing mode is that the # character starts a comment. The comment runs until the end of the line. Everything from the # until the next line break character is ignored.
XPath and Oracle do not support comments within the regular expression, even though they have a free-spacing mode. They always treat # as a literal character.
Java is the only flavor that treats # as the start of a comment inside character classes in free-spacing mode. The comment runs until the end of the line, so you can use a ] to close a comment. All other flavors treat # as a literal inside character classes.
Putting it all together, I could clarify the regex to match a valid date by writing it across multiple lines as:
# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d # year (group 1)
[- /.] # separator
(0[1-9]|1) # month (group 2)
[- /.] # separator
(0[1-9]|[0-9]|3) # day (group 3)
Many flavors also allow you to add comments to your regex without using free-spacing mode. The syntax is (?#comment) where "comment" can be whatever you want, as long as it does not contain a closing parenthesis. The regex engine ignores everything after the (?# until the first closing parenthesis.
Of the flavors discussed in this tutorial, all flavors that support comment in free-spacing mode, except Java and Tcl, also support (?#comment). The flavors that don't support comments in free-spacing mode or don't support free-spacing mode at all also don't support (?#comment).
Did this website just save you a trip to the bookstore? Please make a donation to support this site, and you'll get a lifetime of advertisement-free access to this site!
| Quick Start | Tutorial | Tools & Languages | Examples | Reference | Book Reviews |
| Introduction | Table of Contents | Special Characters | Non-Printable Characters | Regex Engine Internals | Character Classes | Character Class Subtraction | Character Class Intersection | Shorthand Character Classes | Dot | Anchors | Word Boundaries | Alternation | Optional Items | Repetition | Grouping & Capturing | Backreferences | Backreferences, part 2 | Named Groups | Relative Backreferences | Branch Reset Groups | Free-Spacing & Comments | Unicode | Mode Modifiers | Atomic Grouping | Possessive Quantifiers | Lookahead & Lookbehind | Lookaround, part 2 | Keep Text out of The Match | Conditionals | Balancing Groups | Recursion | Subroutines | Recursion & Capturing | Recursion & Backreferences | Recursion & Backtracking | POSIX Bracket Expressions | Zero-Length Matches | Continuing Matches |
Page URL: http://regular-expressions.mobi/freespacing.html
Page last updated: 03 June 2016
Site last updated: 06 March 2017
Copyright © 2003-2017 Jan Goyvaerts. All rights reserved.