C++ Regular Expressions with Boost

Boost is a free source code library for C++. After downloading and unzipping, you need to run the bootstrap batch file or script and then run b2 --with-regex to compile Boost's regex library. Then add the folder into which you unzipped Boost to the include path of your C++ compiler. Add the stage\lib subfolder of that folder to your linker's library path. Then you can add #include <boost/regex.hpp> to your C++ code to make use of Boost regular expressions.

If you use C++Builder, you should download the Boost libraries for your specific version of C++Builder from Embarcadero. The version of Boost you get depends on your version of C++Builder and whether you're targeting Win32 or Win64. The Win32 compiler in XE3 through XE8, and the classic Win32 compiler in C++Builder 10 Seattle through 10.1 Berlin, are all stuck on Boost 1.39. The Win64 compiler in XE3 through XE6 uses Boost 1.50. The Win64 compiler in XE7 through 10.1 Berlin uses Boost 1.55. The new C++11 Win32 compiler in C++Builder 10 and later uses the same version of Boost as the Win64 compiler.

This website covers Boost 1.38, 1.39, and 1.42 through the latest 1.62. Boost 1.40 introduced many new regex features borrowed from Perl 5.10. But it also introduced some serious bugs that weren't fixed until Boost 1.42. So we completely ignore Boost 1.40 and 1.41. We still cover Boost 1.38 and 1.39 (which have identical regex features) because the classic Win32 C++Builder compiler is stuck on Boost 1.39. If you're using another compiler, you should definitely use Boost 1.42 or later to avoid what are now old bugs. Preferably use Boost 1.47 or later. That version changed certain behaviors involving backreferences, so regexes written for an earlier version may behave differently if you later upgrade across 1.47.

In practice, you'll mostly use Boost's ECMAScript grammar. It's the default grammar and offers far more features than the other grammars. Whenever the tutorial on this website mentions Boost without mentioning any grammar, what is written applies to the ECMAScript grammar and may or may not apply to any of the other grammars. You'll really only use the other grammars if you want to reuse existing regular expressions from old POSIX code or UNIX scripts.
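As a rough illustration of choosing a grammar, the sketch below passes a grammar flag to the regex constructor. It is written against std::regex so that it builds without Boost installed. The alias `rx` and the helper `full_match` are names made up for this example, and switching the alias to `boost` assumes you also include <boost/regex.hpp> instead of <regex>.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for Boost here. For Boost, include
// <boost/regex.hpp> instead of <regex> and change this alias to
// `namespace rx = boost;`.
namespace rx = std;

// Hypothetical helper: true if `pattern`, compiled with the grammar selected
// by `grammar`, matches the whole of `text`. ECMAScript is the default
// grammar, so passing it explicitly is the same as passing no flag at all.
bool full_match(const std::string& text, const std::string& pattern,
                rx::regex::flag_type grammar = rx::regex::ECMAScript) {
    const rx::regex re(pattern, grammar);
    return rx::regex_match(text, re);
}
```

With the default grammar, `full_match("aa", "a+")` succeeds because `+` is a quantifier. With `rx::regex::basic` (POSIX basic regular expressions) the same pattern only matches the literal text `a+`, which is the kind of difference you run into when reusing old POSIX regexes.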

Boost And Regex Standards

The Boost documentation likes to talk about being compatible with Perl and JavaScript and how boost::regex was standardized as std::regex in C++11. When we compare the Dinkumware implementation of std::regex (included with Visual Studio and C++Builder) with boost::regex, we find that the class and function templates are almost the same. Your C++ compiler will just as happily compile code using boost::regex as it does compiling the same code using std::regex. So all the code examples given in the std::regex topic on this website work just fine with Boost if you replace std with boost.
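The namespace swap described above can be sketched with a namespace alias, so that the same source compiles against either library. `USE_BOOST_REGEX` is a hypothetical macro invented for this example (Boost does not define it), and `find_numbers` and `value_of` are illustrative helpers rather than part of either library.

```cpp
#ifdef USE_BOOST_REGEX   // hypothetical macro: define it yourself to build with Boost
#include <boost/regex.hpp>
namespace rx = boost;
#else
#include <regex>
namespace rx = std;
#endif

#include <string>
#include <vector>

// Collects every run of digits in `text`, using only class and function
// templates that both libraries provide under the same names.
std::vector<std::string> find_numbers(const std::string& text) {
    static const rx::regex digits("[0-9]+");
    std::vector<std::string> found;
    for (rx::sregex_iterator it(text.begin(), text.end(), digits), end;
         it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// Returns the value part (second capturing group) of the first `key=value`
// pair in `line`, or an empty string if there is no such pair.
std::string value_of(const std::string& line) {
    static const rx::regex pair("([A-Za-z_]+)=([^;]*)");
    rx::smatch m;
    return rx::regex_search(line, m, pair) ? m[2].str() : std::string();
}
```

The patterns here deliberately stick to syntax that both libraries are expected to treat the same way. Compiling identically does not guarantee matching identically once a regex uses features where the two flavors diverge.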

But when you run your C++ application, it can make a big difference whether Dinkumware or Boost is interpreting your regular expressions. Both offer the same six grammars, but the syntax and behavior of those grammars differ between the two libraries. Boost defines regex_constants::perl, which is not part of the C++11 standard. This is not actually an additional grammar but simply a synonym for ECMAScript and JavaScript. There are major differences in the regex flavors used by actual JavaScript and actual Perl. So it's obvious that a library treating these as one flavor or grammar can't be compatible with either. Boost's ECMAScript grammar is a cross between the actual JavaScript and Perl flavors, with a bunch of Boost-specific features and peculiarities thrown in. Dinkumware's ECMAScript grammar is closer to actual JavaScript, but still has significant behavioral differences. Dinkumware didn't borrow any features from Perl that JavaScript doesn't have.

The table below highlights the most important differences between the ECMAScript grammars in std::regex and Boost and actual JavaScript and Perl. Some are obvious differences in feature sets. But others are subtle differences in behavior that may bite you unexpectedly.

| Feature | std::regex | Boost | JavaScript | Perl |
|---|---|---|---|---|
| Dot matches line breaks | never | default | never | option |
| Anchors match at line breaks | always | default | option | option |
| Line break characters | CR, LF | CR, LF, FF, NEL, LS, PS | CR, LF, LS, PS | LF |
| Backreferences to non-participating groups | match empty string | fail since 1.47 | match empty string | fail |
| Empty character class | fails to match | not possible | fails to match | not possible |
| Free-spacing mode | no | YES | no | YES |
| Mode modifiers | no | YES | no | YES |
| Possessive quantifiers | no | YES | no | YES |
| Named capture | no | .NET syntax | no | .NET & Python syntax |
| Recursion | no | atomic | no | backtracking |
| Subroutines | no | backtracking | no | backtracking |
| Conditionals | no | YES | no | YES |
| Atomic groups | no | YES | no | YES |
| Atomic groups backtrack capturing groups | n/a | no | n/a | YES |
| Start and end of word boundaries | no | YES | no | no |
| Standard POSIX classes | YES | YES | no | YES |
| Single letter POSIX classes | no | YES | no | no |
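One way to check a row of this table against your own toolchain is a small probe. The sketch below tests the first row (whether the dot matches a line break) and is written against std::regex so it builds without Boost. The alias `rx` and both function names are made up for this example, and repeating the probe with Boost assumes you include <boost/regex.hpp> and point the alias at `boost`.

```cpp
#include <regex>

// Assumption: std::regex stands in for Boost here. For Boost, include
// <boost/regex.hpp> instead of <regex> and change this alias to
// `namespace rx = boost;`.
namespace rx = std;

// Probe: does an unmodified dot match a line feed in the default grammar?
bool dot_matches_line_feed() {
    return rx::regex_match("a\nb", rx::regex("a.b"));
}

// [\s\S] spells out "any character" explicitly, so the result does not
// depend on how the library treats the dot.
bool explicit_class_matches_line_feed() {
    return rx::regex_match("a\nb", rx::regex("a[\\s\\S]b"));
}
```

With std::regex the first probe returns false and the second returns true. According to the table, the first probe should return true under Boost's default settings, so a regex that relies on the dot stopping at line breaks is a candidate for `[\s\S]` or an explicit negated class when the code has to behave the same under both libraries.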
