Quick Start | Tutorial | Tools & Languages | Examples | Reference | Book Reviews |

Non-Printable Characters

Most applications and programming languages do not support any special syntax in the replacement text to make it easier to enter non-printable characters. If you are the end user of an application, that means you'll have to use an application such as the Windows Character Map to help you enter characters that you cannot type on your keyboard. If you are programming, you can specify the replacement text as a string constant in your source code. Then you can use the syntax for string constants in your programming language to specify non-printable characters.

The Just Great Software applications are an exception. They allow you to use special escape sequences to enter a few common control characters. Use \t to replace with a tab character (ASCII 0x09), \r for carriage return (0x0D), and \n for line feed (0x0A). Remember that Windows text files use \r\n to terminate lines, while UNIX text files use \n.

Python also supports the above escape sequences in replacement text, in addition to supporting them in string constants. Python and Boost also support these more exotic non-printables: \a (bell, 0x07), \f (form feed, 0x0C) and \v (vertical tab, 0x0B).
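As a minimal sketch of the behavior described above, the following passes the replacement text to Python's re.sub() as raw strings, so the backslashes reach the re module untouched and are interpreted there. The sample subject strings are made up for illustration.

```python
import re

# Raw strings (r"...") keep each backslash intact, so the re module,
# not the Python compiler, interprets the escapes in the replacement text.
tabbed = re.sub(r",", r"\t", "a,b,c")      # comma -> tab (0x09)
lines = re.sub(r";", r"\n", "x;y")         # semicolon -> line feed (0x0A)
exotic = re.sub(r"-", r"\a\f\v", "p-q")    # bell, form feed, vertical tab

print(repr(tabbed))
print(repr(lines))
print(repr(exotic))
```

Using repr() when printing makes the inserted control characters visible as escapes rather than as invisible output.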

The Just Great Software applications and Boost also support hexadecimal escapes. You can use \x{FFFF} to insert a Unicode character; the JGsoft applications also accept \uFFFF. The euro currency sign occupies Unicode code point U+20AC. If you cannot type it on your keyboard, you can insert it into the replacement text with \x{20AC}, or with \u20AC in the JGsoft applications. For the 128 ASCII characters, you can use \x00 through \x7F. If you are working with files using 8-bit code pages in EditPad or PowerGREP, or are using Boost with 8-bit character strings, you can also use \x80 through \xFF to insert characters from those 8-bit code pages.

Python does not support hexadecimal escapes in the replacement text syntax, even though it supports \xFF and \uFFFF in string constants.
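The sketch below illustrates this difference, assuming Python 3.7 or later, where an unknown escape made of a backslash and an ASCII letter in the replacement text raises re.error. The helper name try_replace and the sample text are hypothetical, chosen only for this example.

```python
import re

def try_replace(replacement):
    """Run the substitution; return None if re rejects the replacement text."""
    try:
        return re.sub(r"EUR", replacement, "10 EUR")
    except re.error:
        return None

# Raw strings: the re module sees the backslash escape and rejects it.
rejected_x = try_replace(r"\x{20AC}")
rejected_u = try_replace(r"\u20AC")

# Ordinary string constant: the Python compiler converts \u20ac to the
# euro sign before re.sub() ever sees the replacement text.
accepted = try_replace("\u20ac")

print(rejected_x, rejected_u, accepted)
```

In other words, the hexadecimal escape works only when Python's string syntax resolves it first; the replacement text syntax itself does not recognize it.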

Regex Syntax versus String Syntax

Many programming languages support escapes for non-printable characters in their syntax for literal strings in source code. The compiler translates such escapes into their actual characters before the string is passed to the search-and-replace function. If the search-and-replace function does not support the same escapes, this can cause an apparent difference in behavior when replacement text is specified as a literal string in source code compared with replacement text that is read from a file or received from user input. For example, JavaScript's string.replace() function does not support any of these escapes. But the JavaScript language does support escapes like \n, \x0A, and \u000A in string literals. So when developing an application in JavaScript, \n is only interpreted as a newline when you add the replacement text as a string literal to your source code. The JavaScript interpreter then translates \n and the string.replace() function sees an actual newline character. If your code reads the same replacement text from a file, then the string.replace() function sees \n, which it treats as a literal backslash and a literal n.
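The same distinction can be sketched in Python. This is an analogy rather than the JavaScript case itself: Python's str.replace() stands in for a replace function that does no escape processing of its own, and an in-memory io.StringIO object stands in for a file containing the two characters backslash and n.

```python
import io

# Replacement text written as a string literal in source code:
# the Python compiler turns \n into one line feed character.
from_literal = "\n"

# Replacement text read at run time (simulated file content):
# two characters, a backslash followed by the letter n.
from_file = io.StringIO("\\n").read()

with_literal = "a,b".replace(",", from_literal)
with_file = "a,b".replace(",", from_file)

print(repr(with_literal))  # contains an actual newline
print(repr(with_file))     # contains a literal backslash and n
```

The two calls use the same replace function and what looks like the same replacement text, yet only the first inserts a newline, because the translation happened in the compiler and not in the function.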
