From e3cda45e7f6453436ec1761206026f6d545ba866 Mon Sep 17 00:00:00 2001
From: Eemeli Aro
Date: Wed, 5 Feb 2025 20:00:59 +0200
Subject: [PATCH 1/4] Select null locale (zxx) as proposed solution

---
 README.md | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index 67f458d..cb4916e 100644
--- a/README.md
+++ b/README.md
@@ -29,15 +29,7 @@ The `Intl` formatters do not currently support this well.
 For example, the top StackOverflow suggestion for how to format a date using ISO-8601 formatting
 is to [use Swedish as the locale](https://stackoverflow.com/a/58633686).
 
-## Possible Solutions
-
-It's entirely possible for a solution to this to be found in ECMA-262 outside `Intl`.
-Two possible approaches are presented:
-one that extends the `Intl` formatters
-to support non-internationalization usage for the desired formatting,
-and another that's a purely ECMA-262 solution.
-
-### Add a new "null" locale
+## Proposed Solution
 
 Define in ECMA-402 the behaviour of each of the formatters for the `zxx` null locale.
 This locale identifier (which stands for for "no linguistic content; not applicable")
@@ -61,6 +53,8 @@ new Intl.DateTimeFormat('zxx').format(new Date()) === '2023-09-01'
 (12345.67).toLocaleString(null) === '12345.67'
 ```
 
+## Alternatives
+
 ### Add options to ECMA-262 formatters
 
 Change
@@ -100,8 +94,6 @@ to replace the `radix` argument, determining behaviour based on that argument's
 
 This approach would not include any equivalent of the `Intl` formatters' `formatToParts` methods.
 
-## Alternatives
-
 ### Add an "undetermined" locale
 
 A prior version of this proposal used the "undetermined" `und` locale instead of `zxx`.
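
The README example quoted in the second hunk header above expects `new Intl.DateTimeFormat('zxx').format(new Date())` to yield a string such as `'2023-09-01'`. That would replace the Swedish-locale workaround cited in the first hunk's context. Until engines support `zxx`, the same date-only shape can be produced without borrowing a real locale.

The sketch below is illustrative only:
- `formatIsoDate` is a hypothetical helper, not part of any API.
- It assumes the runtime's local time zone (as the `Intl.DateTimeFormat` default does).
- It assumes a year between 0 and 9999.
- It ignores all other formatter options.

```js
// Hypothetical helper: format the calendar date of `date` as YYYY-MM-DD
// in the runtime's local time zone, using only ASCII digits.
// Assumes a year between 0 and 9999 (no expanded-year sign handling).
function formatIsoDate(date) {
  const year = String(date.getFullYear()).padStart(4, '0');
  const month = String(date.getMonth() + 1).padStart(2, '0'); // getMonth() is 0-based
  const day = String(date.getDate()).padStart(2, '0');
  return `${year}-${month}-${day}`;
}

console.log(formatIsoDate(new Date(2023, 8, 1))); // "2023-09-01"
```

Unlike the Swedish-locale approach, this output does not depend on locale data that a runtime may update over time.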
From b14023cbd95f57129718aa213861cb5944dfc60e Mon Sep 17 00:00:00 2001
From: Eemeli Aro
Date: Wed, 5 Feb 2025 19:59:45 +0200
Subject: [PATCH 2/4] Add Intl API specifics

---
 README.md | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 162 insertions(+)

diff --git a/README.md b/README.md
index cb4916e..fab6e32 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,168 @@ new Intl.DateTimeFormat('zxx').format(new Date()) === '2023-09-01'
 (12345.67).toLocaleString(null) === '12345.67'
 ```
 
+### Intl.Collator
+
+When the `zxx` locale is used, [CLDR root collation](https://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation)
+is used, with unified ideographs ordered either by block and then by code point, or by radical-stroke
+(See [issue #13](https://github.com/tc39/proposal-stable-formatting/issues/13)).
+
+### Intl.DateTimeFormat
+
+When the `zxx` locale is used, the formatted output matches that used by Temporal.
+To achieve that, the following default option values are applied:
+
+```js
+{
+  calendar: 'gregory',
+  numberingSystem: 'latn',
+  hour12: false,
+  hourCycle: 'h23'
+}
+```
+
+In the formatted output, an RFC 9557 serialization of the input value is used when appropriate,
+e.g. `2006-01-02`, `15:04:05`, `2006-01-02T15:04:05.999+01:00[Europe/Paris]`.
+Only numerical representations of time and date values are used, as in:
+
+```js
+const dtf = new Intl.DateTimeFormat(null, { month: "long" });
+dtf.format(new Date("2006-01-02")) === "1";
+```
+
+The `hour12` and `hourCycle` options are validated but ignored,
+and formatting is always done as if `hour12: false` was set.
+
+The `dayPeriod`, `weekday`, `era` options are validated but ignored.
+
+The `month` option values `'long'` and `'short'` are considered equivalent to its `'2-digit'` value,
+and `'narrow'` is considered equivalent to its `'numeric'` value.
+
+If the `timeZoneName` option has a valid value,
+the canonical IANA time zone identifier is used for the timezone, irrespective of the option value.
+
+If the `dateStyle` option has a valid value,
+the formatted output always starts with a date formatted like `2006-01-02`.
+
+If the `timeStyle` option has a valid value, the formatted output always ends with a time formatted as follows:
+
+- `'full'` or `'long'`: `15:04:05+01:00[Europe/Paris]`
+- `'medium'`: `15:04:05`
+- `'short'`: `15:04`
+
+If both `dateStyle` and `timeStyle` options are set, the formatted output consists of
+the formatted date, followed by `T` (U+0054), followed by the formatted time.
+
+### Intl.DisplayNames
+
+This API should not support the `zxx` locale.
+
+### Intl.DurationFormat
+
+When the `zxx` locale is used, the formatted duration should follow SI units as closely as possible.
+In the locale options, `'latn'` is used as default value for the `numberingSystem` option.
+
+For all the options that support it, the value `'long'` results in the same formatting as when using the value `'short'`.
+
+The output for parts formatted as `'short'` consist of
+the numerical value, followed by a space (U+0020), followed by the duration identifier.
+
+The output for parts formatted as `'narrow'` consist of
+the numerical value, followed by the duration identifier.
+
+The formatted parts are concatenated together with a separator determined by the `style` option:
+
+- `'long'` or `'short'`: a comma followed by a space `, ` (U+002C U+0020)
+- `'narrow'`: a space (U+0020)
+- `'digital'`: a colon `:` (U+003A)
+
+The duration identifiers are as follows:
+
+- years: TBD
+- months: TBD
+- weeks: TBD
+- days: `d` (U+0064)
+- hours: `h` (U+0068)
+- minutes: `min` (U+006D U+0069 U+006E)
+- seconds: `s` (U+0073)
+- milliseconds: `ms` (U+006D U+0073)
+- microseconds: `μs` (U+03BC U+0073)
+- nanoseconds: `ns` (U+006E U+0073)
+
+### Intl.ListFormat
+
+When the `zxx` locale is used, the `type` option value is validated but ignored,
+and the output is determined by the `style` option:
+
+- `'long'` or `'short'`: list items are separated by a comma followed by a space `, ` (U+002C U+0020)
+- `'narrow'`: list items are separated by a space (U+0020)
+
+### Intl.Locale
+
+TBD
+
+### Intl.NumberFormat
+
+When the `zxx` locale is used, the numerical part of the formatted output
+alwas satisfies the [_StrNumericLiteral_](https://tc39.es/ecma262/#prod-StrNumericLiteral) grammar symbol.
+In the locale options, `'latn'` is used as default value for the `numberingSystem` option.
+
+When used together with the `style: 'currency'` option,
+the output includes the numerical value, followed by a space (U+0020), followed by the ISO currency code.
+The `currencyDisplay` option value is validated but ignored.
+If the [intl-currency-display-choices](https://github.com/tc39/proposal-intl-currency-display-choices) proposal
+is accepted, using the `currencyDisplay: 'never'` option leaves out all but the numerical value from the output.
+
+When used together with the `style: 'percent'` option,
+the output includes the numerical value followed by the U+0025 Percent Sign character.
+
+When used together with the `style: 'unit'` option,
+the output is determined by the `unitDisplay` option:
+
+- `'short'`: numerical value, followed by a space (U+0020), followed by the short unit identifier
+- `'narrow'`: numerical value, followed by the short unit identifier
+- `'long'`: numerical value, followed by a space (U+0020), followed by the long unit identifier
+
+The "long unit identifier" is the `unit` option value.
+The "short unit identifier" is a locale-independent string derived from the `unit` option value
+which will need to be explicitly defined from SI units and otherwise,
+with e.g. `l` for `litre` and `TB` for `terabyte`.
+Compound units are formed by replacing the `-per-` with a solidus `/` (U+002F)
+and by mapping the unit parts separately to their short unit identifiers.
+
+When used together with the `notation: 'compact'` option,
+the output includes the numerical value followed by an appropriate SI prefix:
+
+- 10<sup>12</sup>: `T` (U+0054)
+- 10<sup>9</sup>: `G` (U+0047)
+- 10<sup>6</sup>: `M` (U+004D)
+- 10<sup>3</sup>: `k` (U+006B)
+- 10<sup>-3</sup>: `m` (U+006D)
+- 10<sup>-6</sup>: `μ` (U+03BC)
+- 10<sup>-9</sup>: `n` (U+006E)
+- 10<sup>-12</sup>: `p` (U+0070)
+
+The `compactDisplay` option is validated but ignored.
+
+The `useGrouping` option is validated but ignored,
+and grouping separators are never included in the output.
+
+### Intl.PluralRules
+
+When the `zxx` locale is used with otherwise valid options,
+all calls to `select(number)` and `selectRange(startRange, endRange)`
+will return `'other'`.
+
+### Intl.RelativeTimeFormat
+
+This API should not support the `zxx` locale.
+
+### Intl.Segmenter
+
+When the `zxx` locale is used, [UAX #29](https://unicode.org/reports/tr29/) segmentation
+with extended grapheme clusters is used, without tailorings
+(See [issue #13](https://github.com/tc39/proposal-stable-formatting/issues/13)).
+
 ## Alternatives
 
 ### Add options to ECMA-262 formatters

From 7e05797b63bd749389719812e1bd9eb9de9948c0 Mon Sep 17 00:00:00 2001
From: Eemeli Aro
Date: Fri, 7 Feb 2025 11:46:12 +0200
Subject: [PATCH 3/4] Apply updates recommended by TG2

---
 README.md | 63 +++++++++++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 35 deletions(-)

diff --git a/README.md b/README.md
index fab6e32..9fa50df 100644
--- a/README.md
+++ b/README.md
@@ -107,39 +107,17 @@ the formatted date, followed by `T` (U+0054), followed by the formatted time.
 
 ### Intl.DisplayNames
 
-This API should not support the `zxx` locale.
+When the `zxx` locale is used with valid formatting options,
+calling the `of(code)` method with structurally valid input
+will behave as if no matching display name is available,
+and return either the requested code or `undefined`,
+depending on the `fallback` option.
 
 ### Intl.DurationFormat
 
-When the `zxx` locale is used, the formatted duration should follow SI units as closely as possible.
-In the locale options, `'latn'` is used as default value for the `numberingSystem` option.
-
-For all the options that support it, the value `'long'` results in the same formatting as when using the value `'short'`.
-
-The output for parts formatted as `'short'` consist of
-the numerical value, followed by a space (U+0020), followed by the duration identifier.
-
-The output for parts formatted as `'narrow'` consist of
-the numerical value, followed by the duration identifier.
-
-The formatted parts are concatenated together with a separator determined by the `style` option:
-
-- `'long'` or `'short'`: a comma followed by a space `, ` (U+002C U+0020)
-- `'narrow'`: a space (U+0020)
-- `'digital'`: a colon `:` (U+003A)
-
-The duration identifiers are as follows:
-
-- years: TBD
-- months: TBD
-- weeks: TBD
-- days: `d` (U+0064)
-- hours: `h` (U+0068)
-- minutes: `min` (U+006D U+0069 U+006E)
-- seconds: `s` (U+0073)
-- milliseconds: `ms` (U+006D U+0073)
-- microseconds: `μs` (U+03BC U+0073)
-- nanoseconds: `ns` (U+006E U+0073)
+When the `zxx` locale is used with valid formatting options,
+the formatted duration is an ISO 8601-2 duration,
+such as `P2Y` (2 years), `PT2H30M` (2 hours and 30 minutes), or `P5DT0.001S` (5 days and 1 millisecond).
 
 ### Intl.ListFormat
 
@@ -156,7 +134,7 @@ TBD
 ### Intl.NumberFormat
 
 When the `zxx` locale is used, the numerical part of the formatted output
-alwas satisfies the [_StrNumericLiteral_](https://tc39.es/ecma262/#prod-StrNumericLiteral) grammar symbol.
+always satisfies the [_StrNumericLiteral_](https://tc39.es/ecma262/#prod-StrNumericLiteral) grammar symbol.
 In the locale options, `'latn'` is used as default value for the `numberingSystem` option.
 
 When used together with the `style: 'currency'` option,
@@ -201,13 +179,18 @@ and grouping separators are never included in the output.
 
 ### Intl.PluralRules
 
-When the `zxx` locale is used with otherwise valid options,
-all calls to `select(number)` and `selectRange(startRange, endRange)`
-will return `'other'`.
+When the `zxx` locale is used with valid formatting options,
+calling the `select(number)` and `selectRange(startRange, endRange)`
+methods with structurally valid inputs will always return `'other'`.
 
 ### Intl.RelativeTimeFormat
 
-This API should not support the `zxx` locale.
+When the `zxx` locale is used with valid formatting options,
+the formatted relative time is an ISO 8601-2 duration
+with either a Plus Sign `+` (U+002B) or a Hyphen-Minus `-` (U+002D) as its first character,
+such as `+P2Y` (in 2 years), `-P1D` (yesterday), or `+PT10S` (in 10 seconds).
+
+Quarters are expressed in months.
 
 ### Intl.Segmenter
 
@@ -215,6 +198,16 @@ When the `zxx` locale is used, [UAX #29](https://unicode.org/reports/tr29/) segm
 with extended grapheme clusters is used, without tailorings
 (See [issue #13](https://github.com/tc39/proposal-stable-formatting/issues/13)).
 
+### Array.prototype.toLocaleString
+
+When the `zxx` locale is used, array items are concatenated with a comma `,` (U+002C)
+as a separator.
+
+### String.prototype.toLocaleLowerCase & String.prototype.toLocaleUpperCase
+
+When the `zxx` locale is used, the string is converted to the appropriate case
+according to the CLDR root locale case mappings.
+
 ## Alternatives
 
 ### Add options to ECMA-262 formatters

From 43d77fa53a0cf85363be80ac556a30d35e2f4bca Mon Sep 17 00:00:00 2001
From: Eemeli Aro
Date: Fri, 7 Feb 2025 12:32:52 +0200
Subject: [PATCH 4/4] For case transforms, use the Unicode Default Case Conversion algorithm

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9fa50df..0373d68 100644
--- a/README.md
+++ b/README.md
@@ -206,7 +206,7 @@ as a separator.
 ### String.prototype.toLocaleLowerCase & String.prototype.toLocaleUpperCase
 
 When the `zxx` locale is used, the string is converted to the appropriate case
-according to the CLDR root locale case mappings.
+using the Unicode Default Case Conversion algorithm.
 
 ## Alternatives
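
Patch 3 specifies the `zxx` output of `Intl.DurationFormat` and `Intl.RelativeTimeFormat` only through examples: `P2Y`, `PT2H30M` and `P5DT0.001S` for durations, and `+P2Y`, `-P1D` and `+PT10S` for relative times. The helpers `formatZxxDuration` and `formatZxxRelativeTime` below are hypothetical. They sketch the string shapes those examples imply, under assumptions the proposal text does not fix:
- duration fields are non-negative integers;
- an empty duration renders as the Temporal-style `PT0S`;
- relative-time units may be given in singular or plural form, as `Intl.RelativeTimeFormat` accepts.

```js
// Hypothetical helpers sketching the ISO 8601-2 duration strings that the
// proposal describes for the `zxx` locale. Not part of any shipped API.

// Duration record -> ISO 8601-2 string, e.g. { hours: 2, minutes: 30 } -> "PT2H30M".
// Assumes every field is a non-negative integer; an all-zero duration is
// rendered as "PT0S", following Temporal.Duration's toString() convention.
function formatZxxDuration({
  years = 0, months = 0, weeks = 0, days = 0,
  hours = 0, minutes = 0, seconds = 0,
  milliseconds = 0, microseconds = 0, nanoseconds = 0,
} = {}) {
  let date = '';
  if (years) date += `${years}Y`;
  if (months) date += `${months}M`;
  if (weeks) date += `${weeks}W`;
  if (days) date += `${days}D`;

  // Fold sub-second units into decimal seconds via integer nanoseconds,
  // which avoids floating-point artefacts in the fractional part.
  const subsecondNs = milliseconds * 1e6 + microseconds * 1e3 + nanoseconds;
  const wholeSeconds = seconds + Math.floor(subsecondNs / 1e9);
  const fractionNs = subsecondNs % 1e9;

  let time = '';
  if (hours) time += `${hours}H`;
  if (minutes) time += `${minutes}M`;
  if (wholeSeconds || fractionNs) {
    // Nine fractional digits, with trailing zeros trimmed: 1e6 ns -> ".001"
    const fraction = fractionNs
      ? '.' + String(fractionNs).padStart(9, '0').replace(/0+$/, '')
      : '';
    time += `${wholeSeconds}${fraction}S`;
  }

  if (!date && !time) return 'PT0S';
  return `P${date}${time ? `T${time}` : ''}`;
}

// Relative-time unit -> [section prefix, designator, multiplier].
// Quarters are expressed in months, as the proposal text specifies.
const RELATIVE_UNITS = new Map([
  ['year', ['', 'Y', 1]],
  ['quarter', ['', 'M', 3]],
  ['month', ['', 'M', 1]],
  ['week', ['', 'W', 1]],
  ['day', ['', 'D', 1]],
  ['hour', ['T', 'H', 1]],
  ['minute', ['T', 'M', 1]],
  ['second', ['T', 'S', 1]],
]);

// (value, unit) -> signed ISO 8601-2 duration, e.g. (-1, 'day') -> "-P1D".
function formatZxxRelativeTime(value, unit) {
  // Accept plural unit names such as "days" as well as "day".
  const singular = unit.endsWith('s') ? unit.slice(0, -1) : unit;
  const entry = RELATIVE_UNITS.get(singular);
  if (!entry) throw new RangeError(`Invalid unit: ${unit}`);
  const [prefix, designator, multiplier] = entry;
  // Like Intl.RelativeTimeFormat, treat -0 as a past value.
  const sign = value < 0 || Object.is(value, -0) ? '-' : '+';
  return `${sign}P${prefix}${Math.abs(value) * multiplier}${designator}`;
}

console.log(formatZxxDuration({ hours: 2, minutes: 30 }));    // "PT2H30M"
console.log(formatZxxDuration({ days: 5, milliseconds: 1 })); // "P5DT0.001S"
console.log(formatZxxRelativeTime(-1, 'day'));                // "-P1D"
console.log(formatZxxRelativeTime(1, 'quarter'));             // "+P3M"
```

Sub-second units are carried as integer nanoseconds rather than summed as floating-point seconds. This keeps the fractional part exact, so values like `0.001` never pick up binary rounding noise.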