mirror of
https://github.com/alrayyes/wiki.git
synced 2024-11-22 11:36:23 +00:00
85 lines
1.9 KiB
Markdown
85 lines
1.9 KiB
Markdown
---
|
||
date: 2020-11-06
|
||
id: 5441479c-12e5-48c1-ab17-46a9c3df8bc3
|
||
title: JavaScript RegExp Unicode Property Escapes
|
||
---
|
||
|
||
# Description
|
||
|
||
Characters can be matched by mentioning their Unicode character
|
||
properties inside of `\p{}`:
|
||
|
||
# Proposal
|
||
|
||
RegExp Unicode Property Escapes[^1]
|
||
|
||
# Syntax
|
||
|
||
``` javascript
|
||
console.log(/^\p{White_Space}+$/u.test("\t \n\r")); // true
|
||
console.log(/^\p{Script=Greek}+$/u.test("μετά")); // true
|
||
```
|
||
|
||
# Properties
|
||
|
||
## Examples
|
||
|
||
### Name
|
||
|
||
A unique name, composed of uppercase letters, digits, hyphens and
|
||
spaces.
|
||
|
||
- A: Name = LATIN CAPITAL LETTER A
|
||
- 🙂: Name = SLIGHTLY SMILING FACE
|
||
|
||
### General~Category~
|
||
|
||
categorizes characters
|
||
|
||
- x: General~Category~ = Lowercase~Letter~
|
||
- \$: General~Category~ = Currency~Symbol~
|
||
|
||
### White~Scpace~
|
||
|
||
Used for marking invisible spacing characters, such as spaces, tabs and
|
||
newlines.
|
||
|
||
- :͡ White~Space~ = True
|
||
- π: White~Space~ = False
|
||
|
||
### Age
|
||
|
||
Version of the Unicode Standard in which a character was introduced. For
|
||
example: The Euro sign € was added in version 2.1 of the Unicode
|
||
standard.
|
||
|
||
- €: Age = 2.1
|
||
|
||
### Block
|
||
|
||
A contiguous range of code points. Blocks don’t overlap and their names
|
||
are unique.
|
||
|
||
- S: Block = Basic~Latin~ (range U+0000..U+007F)
|
||
- 🙂: Block = Emoticons (range U+1F600..U+1F64F)
|
||
|
||
### Script
|
||
|
||
A collection of characters used by one or more writing systems.
|
||
|
||
- Some scripts support several writing systems. For example, the Latin
|
||
script supports the writing systems English, French, German, Latin,
|
||
etc.
|
||
- Some languages can be written in multiple alternate writing systems
|
||
that are supported by multiple scripts. For example, Turkish used
|
||
the Arabic script before it transitioned to the Latin script in the
|
||
early 20th century.
|
||
|
||
1. Examples
|
||
|
||
- α: Script = Greek
|
||
- Д: Script = Cyrillic
|
||
|
||
# Footnotes
|
||
|
||
[^1]: <https://github.com/tc39/proposal-regexp-unicode-property-escapes>
|