Regular Expression Unicode Syntax Reference

Unicode Characters
Character | Description | Example
\X | Matches a single Unicode grapheme, whether encoded as a single code point or as multiple code points using combining marks. A grapheme most closely resembles the everyday concept of a "character". | \X matches à encoded as U+0061 U+0300, à encoded as U+00E0, ©, etc.
\uFFFF, where FFFF are 4 hexadecimal digits | Matches a specific Unicode code point. Can be used inside character classes. | \u00E0 matches à encoded as U+00E0 only. \u00A9 matches ©.
\x{FFFF}, where FFFF are 1 to 4 hexadecimal digits | Perl syntax to match a specific Unicode code point. Can be used inside character classes. | \x{E0} matches à encoded as U+00E0 only. \x{A9} matches ©.
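The difference between matching a code point and matching a grapheme can be sketched in Python. This is only an illustration under stated assumptions: Python's standard re module accepts \uFFFF in patterns but, as far as we know, not \X or \x{FFFF}, so the helper simple_graphemes (a hypothetical name) approximates \X as "one non-mark code point followed by any combining marks". Full grapheme rules cover more cases than this.

```python
import re
import unicodedata

PRECOMPOSED = "\u00E0"   # à as the single code point U+00E0
DECOMPOSED = "a\u0300"   # à as U+0061 followed by U+0300 (combining grave accent)

# Raw string, so the regex engine itself interprets the \u00E0 escape.
A_GRAVE = re.compile(r"\u00E0")


def matches_a_grave(text):
    """True if the whole text is exactly the code point U+00E0."""
    return A_GRAVE.fullmatch(text) is not None


def simple_graphemes(text):
    """Approximate \\X: group each base code point with the marks after it."""
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


if __name__ == "__main__":
    print(matches_a_grave(PRECOMPOSED))   # the precomposed form matches
    print(matches_a_grave(DECOMPOSED))    # the decomposed form does not
    print(len(simple_graphemes(DECOMPOSED + PRECOMPOSED + "\u00A9")))
```

Normalizing the input first, for example with unicodedata.normalize("NFC", text), turns the decomposed à into U+00E0, so that a \u00E0 pattern matches both encodings.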
Unicode Properties, Scripts and Blocks
Character | Description | Example
\p{L} or \p{Letter} | Matches a single Unicode code point that has the property "letter". See Unicode Character Properties in the tutorial for a complete list of properties. Each Unicode code point has exactly one such property (its general category). Can be used inside character classes. | \p{L} matches à encoded as U+00E0; \p{S} matches ©.
\p{Arabic} | Matches a single Unicode code point that is part of the Unicode script "Arabic". See Unicode Scripts in the tutorial for a complete list of scripts. Each Unicode code point is part of exactly one script. Can be used inside character classes. | \p{Thai} matches any one of the 83 code points in the Thai script.
\p{InBasicLatin} | Matches a single Unicode code point that is part of the Unicode block "Basic Latin". See Unicode Blocks in the tutorial for a complete list of blocks. Each Unicode code point is part of exactly one block. Blocks may contain unassigned code points. Can be used inside character classes. | \p{InLatinExtended-A} matches any of the code points in the block U+0100 through U+017F (Ā through ſ).
\P{L} or \P{Letter} | Matches a single Unicode code point that does not have the property "letter". You can also use \P to match a code point that is not part of a particular Unicode block or script. Can be used inside character classes. | \P{L} matches ©.
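The property and block tests in the table can be imitated without regex support. The sketch below is not a regex implementation: it assumes Python's standard re module, which to our knowledge does not accept \p{...} or \P{...}, and substitutes lookups in the standard unicodedata module. The function names are hypothetical, and the block range is the one given in the table.

```python
import unicodedata


def in_category(ch, prefix):
    """Imitate \\p{...} for general categories, e.g. "L" covers Lu, Ll, Lt, Lm, Lo."""
    return unicodedata.category(ch).startswith(prefix)


def not_in_category(ch, prefix):
    """Imitate \\P{...}: the code point does NOT have the given category."""
    return not in_category(ch, prefix)


# A block is a contiguous range of code points; this one is
# Latin Extended-A, U+0100 through U+017F, as listed in the table.
LATIN_EXTENDED_A = range(0x0100, 0x0180)


def in_latin_extended_a(ch):
    """Imitate \\p{InLatinExtended-A} with a plain range check."""
    return ord(ch) in LATIN_EXTENDED_A


if __name__ == "__main__":
    print(unicodedata.category("\u00E0"))   # Ll, a lowercase letter
    print(unicodedata.category("\u00A9"))   # So, an "other" symbol
    print(in_category("\u00E0", "L"), not_in_category("\u00A9", "L"))
    print(in_latin_extended_a("\u0100"), in_latin_extended_a("a"))
```

Because a block is only a range, the block check also accepts unassigned code points inside it, which mirrors the note in the table. Script membership has no equivalent lookup in unicodedata, so it is left out of this sketch.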