| Unicode Characters | ||
|---|---|---|
| Character | Description | Example |
| \X | Matches a single Unicode grapheme, whether encoded as a single code point or multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a "character". | \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc. |
| \uFFFF where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. Can be used inside character classes. | \u00E0 matches à encoded as U+00E0 only. \u00A9 matches © |
| \x{FFFF} where FFFF are 1 to 4 hexadecimal digits (flavors that support code points beyond U+FFFF, such as Perl, accept up to 6) | Perl syntax to match a specific Unicode code point. Can be used inside character classes. | \x{E0} matches à encoded as U+00E0 only. \x{A9} matches © |
| Unicode Properties, Scripts and Blocks | ||
| Character | Description | Example |
| \p{L} or \p{Letter} | Matches a single Unicode code point in the general category "letter". See Unicode Character Properties in the tutorial for a complete list of categories. Each Unicode code point belongs to exactly one general category. Can be used inside character classes. | \p{L} matches à encoded as U+00E0; \p{S} matches © |
| \p{Arabic} | Matches a single Unicode code point that is part of the Unicode script "Arabic". See Unicode Scripts in the tutorial for a complete list of scripts. Each Unicode code point is part of exactly one script. Can be used inside character classes. | \p{Thai} matches one of 83 code points in Thai script, from ก through ๙ |
| \p{InBasicLatin} | Matches a single Unicode code point that is part of the Unicode block "Basic Latin" (the block name is written without spaces in the regex). See Unicode Blocks in the tutorial for a complete list of blocks. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. Can be used inside character classes. | \p{InLatinExtended-A} matches any of the code points in the block U+0100 through U+017F (Ā through ſ) |
| \P{L} or \P{Letter} | Matches a single Unicode code point that does not have the property "letter". You can also use \P to match a code point that is not part of a particular Unicode block or script. Can be used inside character classes. | \P{L} matches © |
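
The tokens in both tables can be tried out in any flavor that supports them. The following is a minimal sketch using Java's `java.util.regex`, chosen here only as one example flavor; the class name `UnicodeRegexDemo` and the helper `countMatches` are made up for illustration. Two flavor differences are assumed: Java spells scripts with an `Is` prefix (`\p{IsThai}` rather than `\p{Thai}`), and `\X` needs Java 9 or later. Non-ASCII characters are written as escapes in the string literals so the source file stays plain ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeRegexDemo {

    // Hypothetical helper: counts how many times the regex matches in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String precomposed = "\u00E0";  // a with grave accent as one code point
        String decomposed = "a\u0300";  // letter a followed by a combining grave accent
        String copyright = "\u00A9";    // copyright sign

        // Code point escapes match the precomposed form only.
        System.out.println(countMatches("\\u00E0", precomposed));
        System.out.println(countMatches("\\u00E0", decomposed));
        System.out.println(countMatches("\\x{E0}", precomposed));

        // General categories: letters, symbols, and the negated form.
        System.out.println(countMatches("\\p{L}", precomposed + copyright));
        System.out.println(countMatches("\\p{S}", copyright));
        System.out.println(countMatches("\\P{L}", copyright));

        // Scripts (Java needs the "Is" prefix) and blocks (the "In" prefix).
        System.out.println(countMatches("\\p{IsThai}", "\u0E01\u0E59"));
        System.out.println(countMatches("\\p{InLatinExtended-A}", "a\u0100\u017F"));

        // Graphemes: both encodings of the accented letter count as one match each.
        System.out.println(countMatches("\\X", decomposed + precomposed + copyright));
    }
}
```

Counting matches rather than printing the matched text keeps the output readable on consoles that cannot display combining marks.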
Page URL: http://regular-expressions.mobi/refunicode.html
Page last updated: 17 June 2009
Site last updated: 02 December 2010
Copyright © 2003-2012 Jan Goyvaerts. All rights reserved.