Select null locale as proposed solution & add Intl API specifics #18

Open: wants to merge 4 commits into `main`.

`README.md`: 169 changes (158 additions, 11 deletions)

The `Intl` formatters do not currently support this well.
For example, the top StackOverflow suggestion for how to format a date using ISO-8601 formatting
is to [use Swedish as the locale](https://stackoverflow.com/a/58633686).

## Proposed Solution

Define in ECMA-402 the behaviour of each of the formatters for the `zxx` null locale.
This locale identifier (which stands for "no linguistic content; not applicable") …

```js
new Intl.DateTimeFormat('zxx').format(new Date()) === '2023-09-01'
(12345.67).toLocaleString(null) === '12345.67'
```

### Intl.Collator

When the `zxx` locale is used, [CLDR root collation](https://www.unicode.org/reports/tr35/tr35-collation.html#Root_Collation)
is used, with unified ideographs ordered either by block and then by code point, or by radical-stroke
(see [issue #13](https://github.com/tc39/proposal-stable-formatting/issues/13)).

### Intl.DateTimeFormat

When the `zxx` locale is used, the formatted output matches the string format used by Temporal.
To achieve this, the following default option values are applied:

```js
{
  calendar: 'gregory',
  numberingSystem: 'latn',
  hour12: false,
  hourCycle: 'h23'
}
```

In the formatted output, an RFC 9557 serialization of the input value is used when appropriate,
e.g. `2006-01-02`, `15:04:05`, `2006-01-02T15:04:05.999+01:00[Europe/Paris]`.
Only numerical representations of time and date values are used, as in:

```js
const dtf = new Intl.DateTimeFormat(null, { month: "long" });
dtf.format(new Date("2006-01-02")) === "01";
```

The `hour12` and `hourCycle` options are validated but ignored,
and formatting is always done as if `hour12: false` were set.

The `dayPeriod`, `weekday`, and `era` options are validated but ignored.

The `month` option values `'long'` and `'short'` are considered equivalent to its `'2-digit'` value,
and `'narrow'` is considered equivalent to its `'numeric'` value.
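The `month` rules can be illustrated with a small userland sketch. `formatZxxMonth` is a hypothetical helper, not an existing or proposed API; it only applies the option mapping described above to a month number from 1 to 12.

```js
// Hypothetical helper illustrating the `zxx` handling of the `month` option.
// Not an existing or proposed API.
const ZXX_MONTH_OPTIONS = new Map([
  ['numeric', 'numeric'],
  ['2-digit', '2-digit'],
  ['long', '2-digit'],   // 'long' is treated like '2-digit'
  ['short', '2-digit'],  // 'short' is treated like '2-digit'
  ['narrow', 'numeric'], // 'narrow' is treated like 'numeric'
]);

function formatZxxMonth(month, option = 'numeric') {
  const effective = ZXX_MONTH_OPTIONS.get(option);
  if (effective === undefined) {
    // Invalid option values are still rejected, as for any other locale.
    throw new RangeError(`Invalid month option: ${option}`);
  }
  return effective === '2-digit' ? String(month).padStart(2, '0') : String(month);
}

formatZxxMonth(1, 'long');   // "01"
formatZxxMonth(1, 'narrow'); // "1"
```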

If the `timeZoneName` option has a valid value,
the time zone is always represented by its canonical IANA time zone identifier, irrespective of the option value.

If the `dateStyle` option has a valid value,
the formatted output always starts with a date formatted like `2006-01-02`.

If the `timeStyle` option has a valid value, the formatted output always ends with a time formatted as follows:

- `'full'` or `'long'`: `15:04:05+01:00[Europe/Paris]`
- `'medium'`: `15:04:05`
- `'short'`: `15:04`

If both `dateStyle` and `timeStyle` options are set, the formatted output consists of
the formatted date, followed by `T` (U+0054), followed by the formatted time.
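A hypothetical `formatZxxStyles` function can show how the `dateStyle` and `timeStyle` rules combine. It is only a sketch: it takes date and time fields that are assumed to be already resolved in the target time zone (an illustrative input shape, including a precomputed `offset` string) instead of a `Date`, so the time zone resolution a real implementation performs is left out.

```js
// Hypothetical sketch: builds `zxx` dateStyle/timeStyle output from fields
// assumed to be already resolved in the target time zone.
// Years outside 0000-9999 are not handled.
const STYLES = ['full', 'long', 'medium', 'short'];
const pad = (n, width = 2) => String(n).padStart(width, '0');

function formatZxxStyles(fields, { dateStyle, timeStyle } = {}) {
  for (const style of [dateStyle, timeStyle]) {
    if (style !== undefined && !STYLES.includes(style)) {
      throw new RangeError(`Invalid style: ${style}`);
    }
  }
  const { year, month, day, hour, minute, second, offset, timeZone } = fields;
  // Every valid dateStyle produces the same numeric date.
  const date = `${pad(year, 4)}-${pad(month)}-${pad(day)}`;
  let time;
  if (timeStyle === 'full' || timeStyle === 'long') {
    time = `${pad(hour)}:${pad(minute)}:${pad(second)}${offset}[${timeZone}]`;
  } else if (timeStyle === 'medium') {
    time = `${pad(hour)}:${pad(minute)}:${pad(second)}`;
  } else if (timeStyle === 'short') {
    time = `${pad(hour)}:${pad(minute)}`;
  }
  if (dateStyle && time) return `${date}T${time}`; // date, then "T", then time
  return dateStyle ? date : time;
}

const fields = {
  year: 2006, month: 1, day: 2, hour: 15, minute: 4, second: 5,
  offset: '+01:00', timeZone: 'Europe/Paris',
};
formatZxxStyles(fields, { dateStyle: 'short', timeStyle: 'full' });
// "2006-01-02T15:04:05+01:00[Europe/Paris]"
formatZxxStyles(fields, { timeStyle: 'short' }); // "15:04"
```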

### Intl.DisplayNames

When the `zxx` locale is used with valid formatting options,
calling the `of(code)` method with structurally valid input
will behave as if no matching display name is available,
and return either the requested code or `undefined`,
depending on the `fallback` option.

### Intl.DurationFormat

When the `zxx` locale is used with valid formatting options,
the formatted duration is an ISO 8601-2 duration,
such as `P2Y` (2 years), `PT2H30M` (2 hours and 30 minutes), or `P5DT0.001S` (5 days and 1 millisecond).
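The duration serialization can be sketched in userland. `toZxxDuration` is a hypothetical function that takes a record with the same field names as `Intl.DurationFormat` input; rendering an all-zero duration as `PT0S` is an assumption, since the text above does not cover that case.

```js
// Hypothetical sketch: serializes an Intl.DurationFormat-style record as an
// ISO 8601 duration. Fields are assumed to be integers; sub-second fields are
// folded into fractional seconds with BigInt to avoid floating-point error.
const FIELDS = [
  'years', 'months', 'weeks', 'days', 'hours', 'minutes',
  'seconds', 'milliseconds', 'microseconds', 'nanoseconds',
];

function toZxxDuration(duration) {
  const values = FIELDS.map((field) => duration[field] ?? 0);
  if (values.some((v) => v < 0) && values.some((v) => v > 0)) {
    throw new RangeError('Mixed-sign durations are invalid');
  }
  const negative = values.some((v) => v < 0);
  const [y, mo, w, d, h, mi, s, ms, us, ns] = values.map((v) => Math.abs(v));

  let date = '';
  if (y) date += `${y}Y`;
  if (mo) date += `${mo}M`;
  if (w) date += `${w}W`;
  if (d) date += `${d}D`;

  let time = '';
  if (h) time += `${h}H`;
  if (mi) time += `${mi}M`;
  const totalNs =
    BigInt(s) * 1_000_000_000n + BigInt(ms) * 1_000_000n + BigInt(us) * 1_000n + BigInt(ns);
  if (totalNs !== 0n) {
    const whole = totalNs / 1_000_000_000n;
    // Nine fractional digits, with trailing zeros removed.
    const fraction = (totalNs % 1_000_000_000n).toString().padStart(9, '0').replace(/0+$/, '');
    time += fraction ? `${whole}.${fraction}S` : `${whole}S`;
  }

  if (!date && !time) time = '0S'; // assumed form for an empty duration
  return `${negative ? '-' : ''}P${date}${time ? `T${time}` : ''}`;
}

toZxxDuration({ days: 5, milliseconds: 1 }); // "P5DT0.001S"
```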

### Intl.ListFormat

When the `zxx` locale is used, the `type` option value is validated but ignored,
and the output is determined by the `style` option:

- `'long'` or `'short'`: list items are separated by a comma followed by a space `, ` (U+002C U+0020)
- `'narrow'`: list items are separated by a space (U+0020)
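As a rough sketch of these list rules, the hypothetical `formatZxxList` below validates `type` without using it and picks the separator from `style` alone. Unlike `Intl.ListFormat`, it does not check that every item is a string.

```js
// Hypothetical sketch of Intl.ListFormat#format() under `zxx`:
// `type` is validated but ignored; only `style` selects the separator.
function formatZxxList(items, { type = 'conjunction', style = 'long' } = {}) {
  if (!['conjunction', 'disjunction', 'unit'].includes(type)) {
    throw new RangeError(`Invalid type: ${type}`);
  }
  switch (style) {
    case 'long':
    case 'short':
      return [...items].join(', '); // U+002C U+0020
    case 'narrow':
      return [...items].join(' ');  // U+0020
    default:
      throw new RangeError(`Invalid style: ${style}`);
  }
}

formatZxxList(['a', 'b', 'c'], { type: 'disjunction' }); // "a, b, c"
```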

### Intl.Locale

TBD

### Intl.NumberFormat

When the `zxx` locale is used, the numerical part of the formatted output
always satisfies the [_StrNumericLiteral_](https://tc39.es/ecma262/#prod-StrNumericLiteral) grammar symbol.
`'latn'` is used as the default value of the `numberingSystem` locale option.

When used together with the `style: 'currency'` option,
the output includes the numerical value, followed by a space (U+0020), followed by the ISO 4217 currency code.
The `currencyDisplay` option value is validated but ignored.
If the [intl-currency-display-choices](https://github.com/tc39/proposal-intl-currency-display-choices) proposal
is accepted, using the `currencyDisplay: 'never'` option leaves out all but the numerical value from the output.

When used together with the `style: 'percent'` option,
the output includes the numerical value followed by the U+0025 Percent Sign character.
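The currency and percent rules can be approximated by the hypothetical `formatZxxNumber` below. It uses `String(value)` as a stand-in for the numerical part, since for finite numbers that output matches the _StrNumericLiteral_ grammar and never contains grouping separators; rounding and digit options are not modelled, and scaling percentages by 100 is assumed to work as it does for other locales.

```js
// Hypothetical sketch of the `zxx` rules for `style: 'currency'` and
// `style: 'percent'`. String(value) stands in for the numerical part;
// rounding and digit options are not modelled.
function formatZxxNumber(value, { style = 'decimal', currency } = {}) {
  const number = String(value);
  switch (style) {
    case 'decimal':
      return number;
    case 'currency':
      if (currency === undefined) throw new TypeError('currency is required');
      if (!/^[A-Za-z]{3}$/.test(currency)) throw new RangeError(`Invalid currency: ${currency}`);
      // Numerical value, a space (U+0020), then the upper-cased ISO 4217 code.
      return `${number} ${currency.toUpperCase()}`;
    case 'percent':
      // Assumed: scaled by 100 as for other locales; floating-point artifacts
      // of the multiplication are not handled here.
      return `${String(value * 100)}%`;
    default:
      throw new RangeError(`Unsupported style: ${style}`);
  }
}

formatZxxNumber(12345.67, { style: 'currency', currency: 'EUR' }); // "12345.67 EUR"
formatZxxNumber(0.25, { style: 'percent' });                        // "25%"
```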

When used together with the `style: 'unit'` option,
the output is determined by the `unitDisplay` option:

- `'short'`: numerical value, followed by a space (U+0020), followed by the short unit identifier
- `'narrow'`: numerical value, followed by the short unit identifier
- `'long'`: numerical value, followed by a space (U+0020), followed by the long unit identifier

The "long unit identifier" is the `unit` option value.
The "short unit identifier" is a locale-independent string derived from the `unit` option value.
These identifiers will need to be explicitly defined, using SI unit symbols where they exist and other conventional symbols otherwise,
e.g. `l` for `liter` and `TB` for `terabyte`.
Compound units are formed by replacing `-per-` with a solidus `/` (U+002F)
and by mapping each unit part separately to its short unit identifier.
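A sketch of the unit rules follows. Because the short unit identifiers are not yet defined, the `SHORT_UNIT_IDS` table is hypothetical: `l` and `TB` come from the examples above, the other entries are illustrative symbols, and the keys use the ECMA-402 sanctioned unit identifiers (such as `liter`).

```js
// Hypothetical short unit identifier table; the real list is still to be defined.
const SHORT_UNIT_IDS = new Map([
  ['liter', 'l'],
  ['terabyte', 'TB'],
  ['meter', 'm'],      // illustrative
  ['kilometer', 'km'], // illustrative
  ['second', 's'],     // illustrative
  ['hour', 'h'],       // illustrative
]);

function shortUnitId(unit) {
  // Compound units: map each part separately and join the parts with "/".
  return unit
    .split('-per-')
    .map((part) => {
      const id = SHORT_UNIT_IDS.get(part);
      if (id === undefined) throw new RangeError(`No short identifier for: ${part}`);
      return id;
    })
    .join('/');
}

function formatZxxUnit(value, unit, unitDisplay = 'short') {
  const number = String(value);
  switch (unitDisplay) {
    case 'short':
      return `${number} ${shortUnitId(unit)}`;
    case 'narrow':
      return `${number}${shortUnitId(unit)}`;
    case 'long':
      return `${number} ${unit}`; // the long identifier is the `unit` value itself
    default:
      throw new RangeError(`Invalid unitDisplay: ${unitDisplay}`);
  }
}

formatZxxUnit(100, 'kilometer-per-hour'); // "100 km/h"
```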

When used together with the `notation: 'compact'` option,
the output includes the numerical value followed by an appropriate SI prefix:

- 10<sup>12</sup>: `T` (U+0054)
- 10<sup>9</sup>: `G` (U+0047)
- 10<sup>6</sup>: `M` (U+004D)
- 10<sup>3</sup>: `k` (U+006B)
- 10<sup>-3</sup>: `m` (U+006D)
- 10<sup>-6</sup>: `μ` (U+03BC)
- 10<sup>-9</sup>: `n` (U+006E)
- 10<sup>-12</sup>: `p` (U+0070)

The `compactDisplay` option is validated but ignored.
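The prefix selection can be sketched as follows. `formatZxxCompact` is hypothetical, and two of its choices are assumptions not stated above: values from 1 up to (but not including) 1000 get no prefix, and the scaled value is rounded to 3 significant digits.

```js
// Hypothetical sketch of `notation: 'compact'` under `zxx`.
// Assumed: 3 significant digits, and no prefix for magnitudes in [1, 1000).
const SI_PREFIXES = [
  [12, 'T'], [9, 'G'], [6, 'M'], [3, 'k'], [0, ''],
  [-3, 'm'], [-6, '\u03bc'], [-9, 'n'], [-12, 'p'],
];

function formatZxxCompact(value) {
  if (value === 0 || !Number.isFinite(value)) return String(value);
  const magnitude = Math.abs(value);
  // Largest listed power of ten not above the magnitude; tinier values use 'p'.
  const [exponent, prefix] =
    SI_PREFIXES.find(([e]) => magnitude >= 10 ** e) ?? SI_PREFIXES[SI_PREFIXES.length - 1];
  const scaled = Number((value / 10 ** exponent).toPrecision(3));
  return `${scaled}${prefix}`;
}

formatZxxCompact(2500000); // "2.5M"
formatZxxCompact(0.005);   // "5m"
```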

The `useGrouping` option is validated but ignored,
and grouping separators are never included in the output.

### Intl.PluralRules

When the `zxx` locale is used with valid formatting options,
calling the `select(number)` and `selectRange(startRange, endRange)`
methods with structurally valid inputs will always return `'other'`.

### Intl.RelativeTimeFormat

When the `zxx` locale is used with valid formatting options,
the formatted relative time is an ISO 8601-2 duration
with either a Plus Sign `+` (U+002B) or a Hyphen-Minus `-` (U+002D) as its first character,
such as `+P2Y` (in 2 years), `-P1D` (yesterday), or `+PT10S` (in 10 seconds).

Quarters are expressed in months.
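The relative time rules can be sketched with a hypothetical `formatZxxRelativeTime(value, unit)`, mirroring the argument order of `Intl.RelativeTimeFormat.prototype.format`. Treating `-0` as past time follows existing `Intl.RelativeTimeFormat` behaviour and is assumed to carry over.

```js
// Hypothetical sketch of Intl.RelativeTimeFormat#format(value, unit) under
// `zxx`: a signed ISO 8601 duration, with quarters converted to months.
const DATE_UNITS = new Map([['year', 'Y'], ['quarter', 'M'], ['month', 'M'], ['week', 'W'], ['day', 'D']]);
const TIME_UNITS = new Map([['hour', 'H'], ['minute', 'M'], ['second', 'S']]);

function formatZxxRelativeTime(value, unit) {
  // Plural unit names ('days', 'quarters', ...) are accepted, as in Intl.
  const singular = unit.endsWith('s') ? unit.slice(0, -1) : unit;
  // Negative values, including -0, refer to the past.
  const sign = value < 0 || Object.is(value, -0) ? '-' : '+';
  const amount = Math.abs(value) * (singular === 'quarter' ? 3 : 1);
  if (DATE_UNITS.has(singular)) return `${sign}P${amount}${DATE_UNITS.get(singular)}`;
  if (TIME_UNITS.has(singular)) return `${sign}PT${amount}${TIME_UNITS.get(singular)}`;
  throw new RangeError(`Invalid unit: ${unit}`);
}

formatZxxRelativeTime(-1, 'day');    // "-P1D"
formatZxxRelativeTime(1, 'quarter'); // "+P3M"
```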

### Intl.Segmenter

When the `zxx` locale is used, [UAX #29](https://unicode.org/reports/tr29/) segmentation
with extended grapheme clusters is used, without tailorings
(see [issue #13](https://github.com/tc39/proposal-stable-formatting/issues/13)).

> Review comment from a project member on this line:
>
> This should probably say not to have tailorings for 'grapheme' and 'sentence', but for 'word', saying that would turn off behaviors that are de facto on by default in a catch-all case.
>
> How about:
>
> The 'grapheme' mode shall use untailored UAX 29 extended grapheme cluster rules.
>
> The 'sentence' mode shall use untailored UAX 29 default sentence boundary rules.
>
> The 'word' mode shall use UAX 29 default word boundary rules with the tailorings that the implementation supports for scripts that do not use spaces between words. (Note: This is intended to enable word segmentation for e.g. Han, Thai, Lao, and Khmer scripts.) If the implementation supports more than one tailoring for a script that does not use spaces, the most broadly applicable one of the alternatives for a given script shall be used. (Note: The currently-known or expected implementations do not currently have multiple mutually-exclusive tailorings for scripts that don't use spaces.)
>
> CC @makotokato @aethanyc

### Array.prototype.toLocaleString

When the `zxx` locale is used, array items are concatenated with a comma `,` (U+002C)
as a separator.
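A minimal sketch of this rule, assuming a hypothetical `formatItem` callback that stands in for each element's own `zxx` `toLocaleString()` (it defaults to `String` purely for illustration). As in ECMA-262's `Array.prototype.toLocaleString`, `null` and `undefined` elements become empty strings.

```js
// Hypothetical sketch of Array.prototype.toLocaleString under `zxx`.
// `formatItem` stands in for each element's zxx toLocaleString().
function zxxArrayToLocaleString(array, formatItem = String) {
  return Array.from(array, (item) => (item == null ? '' : formatItem(item))).join(',');
}

zxxArrayToLocaleString([1, 'a', null, 2.5]); // "1,a,,2.5"
```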

### String.prototype.toLocaleLowerCase & String.prototype.toLocaleUpperCase

When the `zxx` locale is used, the string is converted to the appropriate case
using the Unicode Default Case Conversion algorithm.
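ECMA-262 already specifies the locale-insensitive `String.prototype.toLowerCase` and `String.prototype.toUpperCase` in terms of Unicode Default Case Conversion, so a sketch of the `zxx` behaviour can simply delegate to them. The wrapper names below are hypothetical.

```js
// Sketch: `zxx` case mapping assumed to equal the locale-insensitive
// Unicode Default Case Conversion used by toLowerCase/toUpperCase.
const zxxToLowerCase = (s) => s.toLowerCase();
const zxxToUpperCase = (s) => s.toUpperCase();

zxxToUpperCase('straße');  // "STRASSE": full (one-to-many) mapping
zxxToLowerCase('\u0130');  // "i\u0307": no Turkish tailoring applied
```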

## Alternatives

### Add options to ECMA-262 formatters

Change …

This approach would not include any equivalent of the `Intl` formatters'
`formatToParts` methods.

### Add an "undetermined" locale

A prior version of this proposal used the "undetermined" `und` locale instead of `zxx`.