The Just Great Software Regular Expression Engine

The Just Great Software (JGsoft) regular expression engine is designed and developed by Jan Goyvaerts, who is also the author of this regular expressions tutorial. This engine is the core of the Just Great Software products PowerGREP (version 3 and later), RegexBuddy (version 2 and later), and RegexMagic (all versions). It also powers the regular expression search functions in EditPad Pro (version 6 and later), EditPad Lite (version 7 and later), and AceText (version 2 and later).

This engine and its regular expression flavor were specifically developed for PowerGREP, RegexBuddy, and RegexMagic. The JGsoft flavor offers a blend of all the features found in the most popular regular expression flavors. It fully supports Unicode and a wide range of legacy code pages. The engine in PowerGREP (version 3 and later) and EditPad Pro (version 7 and later) can process files larger than 4GB without splitting the files into arbitrary chunks.

RegexBuddy and RegexMagic have a special version of the Just Great Software engine that can simulate the limitations and flavor-specific features of many other regular expression flavors. When you select a regex flavor in RegexBuddy or RegexMagic, all of their features use that flavor, even the built-in grep.

The Just Great Software engine is not available as a component that can be embedded into other applications. It was specifically designed to work within Just Great Software products, rather than as a general-purpose engine. Unless you're developing a tool like PowerGREP or RegexBuddy in which regular expressions are the core functionality, the regex flavor already provided by your development environment will be more than sufficient. Just do your users a favor and state very clearly which regex flavor your application uses. Then they can easily look up the specifics on this site.

JGsoft V2

PowerGREP 5 introduces a major new version of the JGsoft regex flavor, which we'll call JGsoft V2. This flavor is also available in RegexBuddy 4 and RegexMagic 2 when you select PowerGREP 5 as your application. JGsoft V2 brings some significant new features. It also brings a few breaking changes.

JGsoft V2 now supports balancing groups like the .NET regex flavor and branch reset groups like Perl and PCRE. Also new is character class intersection using the [class&&[intersect]] syntax like Java and Ruby. The nested pair of square brackets is required. JGsoft V2 does not support the [class&&intersect] syntax as this could lead people to write [class&&intersect&&again] which behaves unpredictably in Java and Ruby.
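To see what character class intersection selects, here is a minimal sketch in Python. Python's built-in re module has no && operator, so the sketch emulates [a-z&&[^aeiou]] with a lookahead instead. The helper name consonants and the sample text are made up for illustration.

```python
import re

# JGsoft V2, Java, and Ruby would write this as [a-z&&[^aeiou]].
# Python's re module has no intersection syntax, so the lookahead tests
# the first class and the consuming class tests the second one.
CONSONANT = re.compile(r"(?=[a-z])[^aeiou]")

def consonants(text):
    """Return every lowercase ASCII consonant in text (illustrative helper)."""
    return CONSONANT.findall(text)

print(consonants("Regex Buddy 4"))
```

A character is matched only if it belongs to both classes, which is what the nested-bracket intersection syntax expresses directly.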

In Perl and PCRE you can use \K to keep text out of the match to work around their restrictions on lookbehind. While \K is not really needed in JGsoft V2 with its unrestricted lookbehind, you can now use \K in JGsoft V2 like you would in Perl or PCRE if you are used to writing your regexes that way.
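The sketch below illustrates the kind of lookbehind restriction that \K works around, using Python's built-in re module as a stand-in. This is an assumption made for the sake of a runnable example: Python is not one of the flavors discussed here, its re module does not support \K, and it rejects variable-length lookbehind. A capturing group then plays the role that \K would play in price:\s*\K\d+. The function names and sample text are hypothetical.

```python
import re

def variable_lookbehind_supported():
    """Report whether this regex engine accepts a variable-length lookbehind."""
    try:
        re.compile(r"(?<=price:\s*)\d+")
        return True
    except re.error:
        return False

def amounts(text):
    """Emulate price:\\s*\\K\\d+ by consuming the prefix and keeping group 1."""
    return [m.group(1) for m in re.finditer(r"price:\s*(\d+)", text)]

print(variable_lookbehind_supported())
print(amounts("price: 42, price:7"))
```

With unrestricted lookbehind, as in JGsoft V2, the lookbehind version of the regex would work as written and neither workaround would be needed.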

Perl, PCRE, and Ruby all support regular expression recursion and subroutines. These three have largely copied each other's syntax, resulting in multiple ways to write recursion and subroutines. But these three have not copied each other's matching behavior, resulting in clear behavioral differences despite the similar syntax. JGsoft V2 provides three sets of syntax for recursion and subroutine calls. Each set of syntax follows the matching behavior of one of these three flavors.

Like in PCRE, (?P>name) does not capture, reverts capturing groups, and is atomic. You can remember this syntax by its similarity to that of atomic groups. Unlike PCRE, JGsoft V2 also supports (?P>1) and (?P>0) so you can specify this behavior for a numbered call and for recursion.

Like in Perl, (?R), (?1), and (?&name) do not capture, revert capturing groups, and allow backtracking. You can remember this syntax by the ampersand that is used in &subroutine(); calls in Perl code.

Finally, like in Ruby, \g<0>, \g<1>, and \g<name> capture the match of the subroutine call, do not revert capturing groups, and allow backtracking. You can remember this syntax by the fact that Ruby's regex flavor does not support any other syntax for recursion and subroutine calls.

\h is a new shorthand character class for horizontal whitespace. It includes spaces, tabs, and all Unicode whitespace except line and paragraph breaks. \v used to be an escape that matches the vertical tab. Now \v is a shorthand for vertical whitespace. This includes the vertical tab, line breaks, page breaks, and paragraph breaks. \v matches CR and LF separately. \H and \V are the negated versions of these two new shorthands.

\R is a new special escape that matches any line break, including Unicode line breaks. What makes it special is that it treats CRLF pairs as indivisible. It matches CR and LF on their own when they occur in the subject string on their own. But when the subject string contains CRLF as a sequence, \R matches the entire CRLF pair.
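The behavior of \R can be approximated in a flavor that lacks it. The following Python sketch assumes the set of line break characters that Perl and PCRE use for \R (LF, VT, FF, CR, NEL, U+2028, and U+2029); the text above does not list the exact set for JGsoft V2, so treat that list as an assumption. The names LINE_BREAK and find_breaks are made up for this example.

```python
import re

# Approximation of \R. The CRLF alternative comes first so that a CRLF
# pair is matched as a whole instead of as a separate CR and LF.
LINE_BREAK = re.compile(r"\r\n|[\n\v\f\r\x85\u2028\u2029]")

def find_breaks(text):
    """Return each line break found in text, keeping CRLF pairs together."""
    return LINE_BREAK.findall(text)

print(find_breaks("a\r\nb\rc\nd\u2028e"))
```

Putting the alternatives in the opposite order would match the CR of a CRLF pair on its own, which is how \v behaves.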

\l and \u are now shorthands for \p{Ll} and \p{Lu}. These match any Unicode lowercase or uppercase character. These tokens are always case sensitive.

POSIX classes using the notation [[:alpha:]] now match only ASCII characters. The \p{Alpha} notation still matches Unicode characters. [[:d:]], [[:s:]], [[:w:]], [[:l:]], and [[:u:]] are now shorthands for [[:digit:]], [[:space:]], [[:word:]], [[:lower:]], and [[:upper:]]. You can treat them as ASCII-only versions of \d, \s, \w, \l, and \u.
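The difference between an ASCII-only class and its Unicode counterpart is easy to demonstrate. Python's re module has no POSIX bracket expressions, so this sketch uses its re.ASCII flag as a rough analogue: \d without the flag behaves like a Unicode-aware \d, and \d with the flag behaves like an ASCII-only class such as [[:d:]]. The sample text is made up.

```python
import re

# ASCII digit seven followed by ARABIC-INDIC DIGIT THREE (U+0663).
text = "7 \u0663"

# Unicode-aware matching: both characters are decimal digits.
unicode_digits = re.findall(r"\d", text)

# ASCII-only matching: only 0-9 qualify.
ascii_digits = re.findall(r"\d", text, re.ASCII)

print(len(unicode_digits), len(ascii_digits))
```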

\i and \c are now XML shorthand character classes. \cA through \cZ are no longer supported as control character escapes.

Octal escapes must now be written as \o{377}. The octal number can range from \o{0} through \o{177777}. The old \0377 syntax is now an error. The JGsoft flavor has never supported \377 as that is too confusing with the syntax for backreferences. \0 too is now an error, instead of matching a literal zero. Use \x00 to match NULL bytes.

The replacement string syntax has been extended with replacement string conditionals. (?1matched:unmatched) inserts matched if the first capturing group participated in the match or unmatched if it did not. Just like conditionals in the regular expression, a capturing group that finds a zero-length match is considered to have participated. When using named capturing groups, you can use (?{name}matched:unmatched) to reference them in replacement string conditionals. You can use the full replacement string syntax inside a conditional, including nested conditionals. You can omit the parentheses if the conditional is at the end of the replacement string.

The syntax for replacement string conditionals is the same as that in the C++ Boost library, with two differences. Boost uses parentheses for grouping anywhere in the replacement string, while JGsoft V2 only recognizes them around conditionals. This way you don't need to escape literal parentheses. Conditionals that reference non-existing groups are an error in JGsoft V2. In Boost they insert the "unmatched" text.

As a consequence of adding this syntax, JGsoft V2 treats \?, \:, \(, and \) as escaped characters that insert one of these four punctuation characters literally. The original JGsoft flavor treated the backslash in these combinations as a literal, inserting both the backslash and the following punctuation character into the replacement.
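The logic of a replacement string conditional such as (?1matched:unmatched) can be sketched with a replacement callback. Python's re module has no conditional syntax in replacement strings, so the helper below (conditional, a hypothetical name) only emulates the decision: insert one text if the group participated in the match, and the other if it did not.

```python
import re

def conditional(group, matched, unmatched):
    """Build a re.sub callback that mimics (?{group}matched:unmatched)."""
    def repl(m):
        # A group that matched the empty string still participated, so
        # compare against None instead of testing for a non-empty string.
        return matched if m.group(group) is not None else unmatched
    return repl

# Group 1 participates only when the number has a minus sign.
signs = re.sub(r"(-)?\d+", conditional(1, "neg", "pos"), "5 -3 8")

# Group 1 matches zero characters here, which still counts as participating.
zero_length = re.sub(r"(x*)y", conditional(1, "M", "U"), "y")

print(signs, zero_length)
```

In this emulation, referencing a group that does not exist raises an IndexError when the callback runs, which loosely parallels JGsoft V2 treating such conditionals as an error.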
