Select null locale as proposed solution & add Intl API specifics #18

eemeli · 2025-02-05T18:35:14Z

Closes #2
Closes #7
Closes #8
Closes #17
Closes #19

I propose that we advance this proposal by choosing to define the behaviour of the 'zxx' or null locale, with support in most Intl APIs as defined here.
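
For illustration only, here is a small sketch of what passing `zxx` to the Intl APIs looks like today. `zxx` is the registered language subtag for "no linguistic content", so it is already a well-formed tag. The assumption labeled in the comments is that current engines give it no special treatment and resolve to some other locale instead; the behaviour proposed in this PR is not implemented anywhere as far as this sketch knows.

```javascript
// 'zxx' is a structurally valid language tag, so canonicalization keeps it as-is.
const canonical = Intl.getCanonicalLocales('zxx');
console.log(canonical);

// Assumption: engines without this proposal carry no 'zxx' data, so the
// resolved locale printed here is whatever the engine falls back to.
const resolved = new Intl.NumberFormat('zxx').resolvedOptions().locale;
console.log(resolved);
```

Under the proposal, the second call would be expected to resolve to the null locale itself rather than to a fallback.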

The solution of adding corresponding ECMA-262 (rather than ECMA-402) functionality is dismissed as infeasible, as that would require introducing wholly new functions for:

- duration serialization
- list serialization
- collation
- segmentation

Collation and segmentation have a significant data dependency that's already internalized in 402, but not in 262.

The behaviour for the null locale in Intl.Collator and Intl.Segmenter is as proposed by @hsivonen in #13.

At least the following are left to be filled out in later PRs, but there's indubitably more:

- A complete set of supported Intl.DateTimeFormat component options, and their detailed formatted output
- Intl.Locale behaviour
- Complete mapping of short unit identifiers for Intl.NumberFormat

Edit: I've prepared a presentation for the changes proposed here.
Edit 2: Updated following changes proposed by TG2

hsivonen · 2025-02-06T14:31:11Z

README.md

+### Intl.Segmenter
+
+When the `zxx` locale is used, [UAX #29](https://unicode.org/reports/tr29/) segmentation
+with extended grapheme clusters is used, without tailorings


This should probably say not to have tailorings for 'grapheme' and 'sentence', but for 'word', saying that would turn off behaviors that are de facto on by default in a catch-all case.

How about:

The 'grapheme' mode shall use untailored UAX 29 extended grapheme cluster rules.

The 'sentence' mode shall use untailored UAX 29 default sentence boundary rules.

The 'word' mode shall use UAX 29 default word boundary rules with the tailorings that the implementation supports for scripts that do not use spaces between words. (Note: This is intended to enable word segmentation for e.g. Han, Thai, Lao, and Khmer scripts.) If the implementation supports more than one tailoring for a script that does not use spaces, the most broadly applicable one of the alternatives for a given script shall be used. (Note: The currently known or expected implementations do not have multiple mutually exclusive tailorings for scripts that don't use spaces.)
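
To make the three modes concrete, here is a minimal sketch that requests each granularity with the `zxx` locale. The helper name `segmentWithNullLocale` is hypothetical. Note the assumption: engines without this proposal will fall back to another locale for `zxx`, so the sketch only shows the call shape, not the proposed untailored behaviour.

```javascript
// Hypothetical helper: segment text at the given granularity, requesting 'zxx'.
// Assumption: without this proposal, the engine falls back to another locale.
function segmentWithNullLocale(text, granularity) {
  const segmenter = new Intl.Segmenter('zxx', { granularity });
  // Each item yielded by segment() has a `segment` string property.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// 'e' followed by U+0301 COMBINING ACUTE ACCENT forms one extended grapheme cluster.
const graphemes = segmentWithNullLocale('e\u0301x', 'grapheme');
console.log(graphemes.length); // 2

const words = segmentWithNullLocale('Hello world', 'word');
console.log(words);

const sentences = segmentWithNullLocale('One. Two.', 'sentence');
console.log(sentences);
```

For text in scripts such as Thai or Han, the 'word' call is where the dictionary-based tailorings mentioned above would come into play.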

CC @makotokato @aethanyc

eemeli · 2025-02-07T10:02:15Z

I've updated the PR following yesterday's TG2 discussion:

- Behaviour for Intl.DisplayNames is defined, always experiencing fallback
- Intl.DurationFormat and Intl.RelativeTimeFormat both return an ISO 8601-2 duration string, the latter with a `+` or `-` prefix.
- Array.p.toLocaleString uses a comma (`,`) as separator
- String.p.toLocale{Lower,Upper}Case use the Unicode Default Case Conversion algorithm
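
As a rough illustration of the Intl.RelativeTimeFormat item, the sketch below builds a signed ISO 8601 duration string from a value and a unit. Everything in it is an assumption made for illustration: the helper name `nullLocaleRelativeTime`, the unit-to-designator table, and the treatment of negative zero are not taken from the PR, and the exact output is defined by the README changes rather than by this code.

```javascript
// Hypothetical mapping from relative-time units to ISO 8601 duration parts:
// [prefix, designator]. Time units need the 'T' separator after 'P'.
const DESIGNATORS = new Map([
  ['year', ['P', 'Y']],
  ['month', ['P', 'M']],
  ['week', ['P', 'W']],
  ['day', ['P', 'D']],
  ['hour', ['PT', 'H']],
  ['minute', ['PT', 'M']],
  ['second', ['PT', 'S']],
]);

// Hypothetical helper, not part of any Intl API.
function nullLocaleRelativeTime(value, unit) {
  const parts = DESIGNATORS.get(unit);
  if (!parts) throw new RangeError(`Unsupported unit: ${unit}`);
  // Assumption: negative zero is treated as negative.
  const sign = value < 0 || Object.is(value, -0) ? '-' : '+';
  const [prefix, designator] = parts;
  return `${sign}${prefix}${Math.abs(value)}${designator}`;
}

console.log(nullLocaleRelativeTime(-3, 'day')); // "-P3D"
console.log(nullLocaleRelativeTime(2, 'hour')); // "+PT2H"
```

For the case-conversion item, the locale-insensitive `String.prototype.toLowerCase` and `toUpperCase` in ECMA-262 already use the Unicode Default Case Conversion algorithm, so the null locale would presumably make the `toLocale*` variants match them.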

eemeli added 2 commits February 5, 2025 20:00

Select null locale (zxx) as proposed solution

e3cda45

Add Intl API specifics

b14023c

hsivonen reviewed Feb 6, 2025

Apply updates recommended by TG2

7e05797

For case tranforms, use the Unicode Default Case Conversion algorithm

43d77fa
