
Replacement Text Case Conversion

Some applications can convert the text matched by the regex or by a capturing group to uppercase or lowercase as they insert it into the replacement. The Just Great Software applications let you prefix the matched text token \0 and the backreferences \1 through \99 with a letter that changes the case of the inserted text. U is for uppercase, L for lowercase, I for initial capitals (the first letter of each word is uppercase, the rest is lowercase), and F for first capital (the first letter of the inserted text is uppercase, the rest is lowercase). The letter only affects the case of the backreference that it is part of.

When the regex (?i)(Hello) (World) matches HeLlO WoRlD, the replacement text \U1 \L2 \I0 \F0 becomes HELLO world Hello World Hello world.
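Python's re.sub() has no such case prefixes, so the sketch below interprets a JGsoft-style template itself to show what each letter does. The helper names are hypothetical, and str.title() is only an approximation of "initial capitals" (its idea of a word boundary may differ from the JGsoft applications).

```python
import re

# Hypothetical emulation of the JGsoft tokens: \0-\99 with an optional U/L/I/F prefix.
CASE = {
    "U": str.upper,
    "L": str.lower,
    "I": str.title,       # initial capitals (approximation via str.title's word rules)
    "F": str.capitalize,  # first capital: first character upper, rest lower
    "": lambda s: s,      # no prefix: insert the text unchanged
}
TOKEN = re.compile(r"\\([ULIF]?)(\d{1,2})")

def sub_with_case(pattern: str, template: str, subject: str) -> str:
    def expand(m: re.Match) -> str:
        def token(t: re.Match) -> str:
            text = m.group(int(t.group(2))) or ""  # a group that did not participate -> ""
            return CASE[t.group(1)](text)          # the letter affects only this token
        return TOKEN.sub(token, template)
    return re.sub(pattern, expand, subject)

print(sub_with_case(r"(?i)(Hello) (World)", r"\U1 \L2 \I0 \F0", "HeLlO WoRlD"))
# HELLO world Hello World Hello world
```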

Perl String Features in Regular Expressions and Replacement Texts

The double-slashed (m//) and triple-slashed (s///) notations for regular expressions and replacement texts in Perl support all the features of double-quoted strings. The most obvious is variable interpolation. You can insert the text matched by the regex or by capturing groups simply by using the regex-related variables, such as $1, in your replacement text.

Perl's case conversion escapes also work in replacement texts. The most common use is to change the case of an interpolated variable. \U converts everything up to the next \L or \E to uppercase. \L converts everything up to the next \U or \E to lowercase. \u converts the next character to uppercase. \l converts the next character to lowercase. You can combine these into \l\U to make the first character lowercase and the remainder uppercase, or \u\L to make the first character uppercase and the remainder lowercase. \E turns off case conversion. You cannot use \u or \l after \U or \L unless you first stop the sequence with \E.

When the regex (?i)(hello) (world) matches HeLlO WoRlD, the replacement text \l\U$1\E \u\L$2 becomes hELLO World. Literal text is also affected: \U$1 Dear $2 becomes HELLO DEAR WORLD.
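To make the state involved concrete, here is a minimal Python sketch that interprets a Perl-like replacement template: \U and \L set a running mode, \u and \l convert only the next character, \E ends the mode, and both interpolated $n text and literal text pass through the same conversion. The function names are hypothetical, and the sketch does not reproduce Perl's own parsing rules (for example, it accepts the escapes in either order).

```python
import re

# Tokens: a case escape, a $n variable, or one literal character.
TOKEN = re.compile(r"\\([ULEul])|\$(\d+)|(.)", re.S)

def perl_like_replacement(template: str, m: re.Match) -> str:
    out = []
    mode = None     # "U" or "L" while a \U or \L span is active
    pending = None  # "u" or "l": one-shot conversion of the next character
    for esc, group, literal in (t.groups() for t in TOKEN.finditer(template)):
        if esc in ("U", "L"):
            mode = esc
        elif esc == "E":
            mode = None
        elif esc in ("u", "l"):
            pending = esc
        else:
            # Interpolated group text and literal text are converted alike.
            text = (m.group(int(group)) or "") if group is not None else literal
            for ch in text:
                case, pending = pending or mode, None
                if case in ("U", "u"):
                    ch = ch.upper()
                elif case in ("L", "l"):
                    ch = ch.lower()
                out.append(ch)
    return "".join(out)

def perl_like_sub(pattern: str, template: str, subject: str) -> str:
    return re.sub(pattern, lambda m: perl_like_replacement(template, m), subject)

print(perl_like_sub(r"(?i)(hello) (world)", r"\l\U$1\E \u\L$2", "HeLlO WoRlD"))  # hELLO World
print(perl_like_sub(r"(?i)(hello) (world)", r"\U$1 Dear $2", "HeLlO WoRlD"))     # HELLO DEAR WORLD
```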

Perl's case conversion works in regular expressions too, but not the way you might expect. Perl applies case conversion when it parses a string in your script and interpolates variables. That works great for backreferences in replacement texts, because those really are interpolated variables in Perl. But backreferences in the regular expression are regular expression tokens rather than variables. (?-i)(a)\U\1 matches aa but not aA. The characters \1 are converted to uppercase while the regex is parsed, not during the matching process. Since \1 does not contain any letters, this has no effect. In the regex \U\w, the characters \w are converted to uppercase while the regex is parsed. This means that \U\w is the same as \W, which matches any character that is not a word character.
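The effect can be mimicked in Python by upper-casing the pattern's source text before compiling it, which is the step the paragraph says Perl performs. The helper below is hypothetical and only models that one step; it is not how Perl is implemented.

```python
import re

def perl_U_on_source(before: str, after: str) -> str:
    """Hypothetical helper: mimic Perl's \\U inside a regex literal by
    upper-casing the pattern *source text* before the engine compiles it."""
    return before + after.upper()

# (a)\U\1 : the characters "\1" contain no letters, so nothing changes.
p1 = perl_U_on_source("(a)", r"\1")
print(p1)                                                          # (a)\1
print(bool(re.fullmatch(p1, "aa")), bool(re.fullmatch(p1, "aA")))  # True False

# \U\w : the token \w becomes \W, which matches any non-word character.
p2 = perl_U_on_source("", r"\w")
print(p2)                                                          # \W
print(bool(re.fullmatch(p2, "x")), bool(re.fullmatch(p2, "!")))    # False True
```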

Boost's Replacement String Case Conversion

Boost supports case conversion in replacement strings when using the default replacement format or the "all" replacement format. \U converts everything up to the next \L or \E to uppercase. \L converts everything up to the next \U or \E to lowercase. \u converts the next character to uppercase. \l converts the next character to lowercase. \E turns off case conversion. As in Perl, the case conversion affects both literal text in your replacement string and the text inserted by backreferences.

Where Boost differs from Perl is that combinations must be written in the opposite order. \U\l makes the first character lowercase and the remainder uppercase. \L\u makes the first character uppercase and the remainder lowercase. Boost also allows \l inside a \U sequence and \u inside a \L sequence. So when (?i)(hello) (world) matches HeLlO WoRlD, you can use \L\u\1 \u\2 to replace the match with Hello World.
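One way to see why the order matters is to model \u and \l as a one-shot that temporarily overrides the running \U or \L mode, with a later \U or \L discarding any pending one-shot. That last rule is an assumption made for this Python sketch, not a statement of Boost's documented behavior, and boost_like_format is a hypothetical name.

```python
import re

# Tokens: \U \L \E \u \l, a \n backreference, or one literal character.
TOKEN = re.compile(r"\\([ULEul])|\\(\d)|(.)", re.S)

def boost_like_format(fmt: str, m: re.Match) -> str:
    out, mode, pending = [], None, None
    for esc, group, literal in (t.groups() for t in TOKEN.finditer(fmt)):
        if esc in ("U", "L"):
            # ASSUMPTION: a new \U/\L replaces any pending \u/\l, which is why
            # the one-shot escape has to come second (\L\u, \U\l).
            mode, pending = esc, None
        elif esc == "E":
            mode, pending = None, None
        elif esc in ("u", "l"):
            pending = esc  # affects one character; the \U/\L mode then resumes
        else:
            text = (m.group(int(group)) or "") if group is not None else literal
            for ch in text:
                case, pending = pending or mode, None
                if case in ("U", "u"):
                    ch = ch.upper()
                elif case in ("L", "l"):
                    ch = ch.lower()
                out.append(ch)
    return "".join(out)

pattern = r"(?i)(hello) (world)"
print(re.sub(pattern, lambda m: boost_like_format(r"\L\u\1 \u\2", m), "HeLlO WoRlD"))  # Hello World
print(re.sub(pattern, lambda m: boost_like_format(r"\U\l\1", m), "HeLlO WoRlD"))       # hELLO
```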


PCRE2's Replacement String Case Conversion

PCRE2 supports case conversion in replacement strings when you pass the PCRE2_SUBSTITUTE_EXTENDED option to pcre2_substitute(). \U converts everything that follows to uppercase. \L converts everything that follows to lowercase. \u converts the next character to uppercase. \l converts the next character to lowercase. \E turns off case conversion. As in Perl, the case conversion affects both literal text in your replacement string and the text inserted by backreferences.

Unlike in Perl, in PCRE2 \U, \L, \u, and \l all stop any preceding case conversion. So you cannot combine \L and \u, for example, to make the first character uppercase and the remainder lowercase. \L\u makes the first character uppercase and leaves the rest unchanged, just like \u. \u\L makes all characters lowercase, just like \L.
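The PCRE2 rule fits a single state variable: each escape simply replaces the previous state, and a one-shot \u or \l falls back to no conversion rather than to an earlier \U or \L. This Python sketch (hypothetical function name, $n references only) models just that rule.

```python
import re

# Tokens: \U \L \E \u \l, a $n group reference, or one literal character.
TOKEN = re.compile(r"\\([ULEul])|\$(\d)|(.)", re.S)

def pcre2_like_substitute(template: str, m: re.Match) -> str:
    out, state = [], None  # None, "U", "L", or one-shot "u" / "l"
    for esc, group, literal in (t.groups() for t in TOKEN.finditer(template)):
        if esc:
            # Every escape cancels whatever case conversion came before it.
            state = None if esc == "E" else esc
            continue
        text = (m.group(int(group)) or "") if group is not None else literal
        for ch in text:
            if state in ("U", "u"):
                ch = ch.upper()
            elif state in ("L", "l"):
                ch = ch.lower()
            if state in ("u", "l"):
                state = None  # a one-shot reverts to no conversion, not to \U/\L
            out.append(ch)
    return "".join(out)

m = re.fullmatch(r"(\w+)", "HeLlO")
print(pcre2_like_substitute(r"\L\u$1", m))  # HeLlO  (same as \u$1)
print(pcre2_like_substitute(r"\u\L$1", m))  # hello  (same as \L$1)
```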

In PCRE2, case conversion runs through conditionals. Any case conversion in effect before the conditional also applies to the conditional. If the conditional contains its own case conversion escapes in the part of the conditional that is actually used, then those remain in effect after the conditional. So you could use ${1:+\U:\L}${2} to insert the text matched by the second capturing group in uppercase if the first group participated, and in lowercase if it didn't.
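Python's re.sub() has no conditional replacement syntax, so the usual substitute is a callback that checks whether a group participated. The sketch below produces the same result that ${1:+\U:\L}${2} describes; the pattern (!)?(\w+) and the function name are made up for illustration.

```python
import re

def upper_if_group1(m: re.Match) -> str:
    # Same effect as the PCRE2 replacement ${1:+\U:\L}${2}:
    # group 1 participated -> upper-case group 2, otherwise lower-case it.
    text = m.group(2)
    return text.upper() if m.group(1) is not None else text.lower()

print(re.sub(r"(!)?(\w+)", upper_if_group1, "!Loud Quiet"))  # LOUD quiet
```

As with the PCRE2 replacement, the "!" is part of the match and so does not appear in the result.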

R's Backreference Case Conversion

The sub() and gsub() functions in R support case conversion escapes inspired by Perl strings when you pass perl = TRUE. \U converts all backreferences up to the next \L or \E to uppercase. \L converts all backreferences up to the next \U or \E to lowercase. \E turns off case conversion.

When the regex (?i)(Hello) (World) matches HeLlO WoRlD, the replacement string \U\1 \L\2 becomes HELLO world. Literal text is not affected: \U\1 Dear \2 becomes HELLO Dear WORLD.
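The distinguishing point for R is that only backreferences are converted. This Python sketch mimics that behavior with a hypothetical r_like_gsub() helper; it is a model of the rule above, not R's implementation, and it supports only \U, \L, \E and \1 through \9.

```python
import re

# Tokens: \U \L \E, a \n backreference, or one literal character.
TOKEN = re.compile(r"\\([ULE])|\\(\d)|(.)", re.S)

def r_like_gsub(pattern: str, replacement: str, subject: str) -> str:
    def expand(m: re.Match) -> str:
        out, mode = [], None
        for esc, group, literal in (t.groups() for t in TOKEN.finditer(replacement)):
            if esc:
                mode = None if esc == "E" else esc
            elif group is not None:
                text = m.group(int(group)) or ""
                if mode == "U":
                    text = text.upper()
                elif mode == "L":
                    text = text.lower()
                out.append(text)     # only backreferences are converted
            else:
                out.append(literal)  # literal text is copied unchanged
        return "".join(out)
    return re.sub(pattern, expand, subject)

print(r_like_gsub(r"(?i)(Hello) (World)", r"\U\1 \L\2", "HeLlO WoRlD"))    # HELLO world
print(r_like_gsub(r"(?i)(Hello) (World)", r"\U\1 Dear \2", "HeLlO WoRlD"))  # HELLO Dear WORLD
```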
