@@ -36,10 +36,15 @@ should be easy to parse into data structures in a wide variety of languages.
 
 ## Spec
 
+A TOML file must be a valid UTF-8 encoded Unicode document. Specifically this
+means that, should a file as a whole not form a
+[well-formed code-unit sequence](https://unicode.org/glossary/#well_formed_code_unit_sequence),
+the file must be rejected (preferably) or ill-formed byte sequences must be
+replaced with U+FFFD as per the Unicode spec.
+
 - TOML is case-sensitive.
-- A TOML file must be a valid UTF-8 encoded Unicode document.
-- Whitespace means tab (0x09) or space (0x20).
-- Newline means LF (0x0A) or CRLF (0x0D 0x0A).
+- Whitespace means tab (U+0009) or space (U+0020).
+- Newline means LF (U+000A) or CRLF (U+000D U+000A).
 
 ## Comment
 
@@ -265,7 +270,7 @@ The above TOML maps to the following JSON.
 ## String
 
 There are four ways to express strings: basic, multi-line basic, literal, and
-multi-line literal. All strings must contain only valid UTF-8 characters.
+multi-line literal. All strings must contain only Unicode characters.
 
 **Basic strings** are surrounded by quotation marks (`"`). Any Unicode character
 may be used except those that must be escaped: quotation mark, backslash, and
@@ -293,7 +298,7 @@ For convenience, some popular characters have a compact escape sequence.
 ```
 
 Any Unicode character may be escaped with the `\xHH`, `\uHHHH`, or `\UHHHHHHHH`
-forms. The escape codes must be valid Unicode
+forms. The escape codes must be Unicode
 [scalar values](https://unicode.org/glossary/#unicode_scalar_value).
 
 All other escape sequences not listed above are reserved; if they are used, TOML
@@ -417,8 +422,9 @@ str = ''''That,' she said, 'is still pointless.''''
 ```
 
 Control characters other than tab are not permitted in a literal string. Thus,
-for binary data, it is recommended that you use Base64 or another suitable ASCII
-or UTF-8 encoding. The handling of that encoding will be application-specific.
+for binary data, it is recommended that you use Base64 or another suitable
+binary-to-text encoding. The handling of that encoding will be
+application-specific.
 
 ## Integer
 
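As a rough illustration of what the reworded Spec paragraph and the escape-code sentence ask of a parser, here is a minimal Python sketch. It is not part of the patch or of any TOML library: the function names `decode_toml_source` and `is_unicode_scalar_value` and the `strict` flag are hypothetical and chosen only for this example. Strict decoding corresponds to the preferred "reject the file" behaviour. The lenient path relies on Python's built-in `errors="replace"` handler, which substitutes U+FFFD for ill-formed byte sequences.

```python
def decode_toml_source(data: bytes, strict: bool = True) -> str:
    """Decode the raw bytes of a TOML document as UTF-8.

    Hypothetical helper: `strict=True` models the preferred behaviour
    (reject the file), `strict=False` models the permitted fallback
    (replace ill-formed sequences with U+FFFD).
    """
    if strict:
        # Raises UnicodeDecodeError on any ill-formed byte sequence.
        return data.decode("utf-8")
    # Ill-formed byte sequences are replaced with U+FFFD.
    return data.decode("utf-8", errors="replace")


def is_unicode_scalar_value(code_point: int) -> bool:
    """Return True if `code_point` is a Unicode scalar value.

    Scalar values are all code points in U+0000..U+10FFFF except the
    surrogate range U+D800..U+DFFF.
    """
    in_range = 0 <= code_point <= 0x10FFFF
    is_surrogate = 0xD800 <= code_point <= 0xDFFF
    return in_range and not is_surrogate
```

A parser following this sketch would call `decode_toml_source` once on the whole file before tokenizing, and would check the numeric value of each `\xHH`, `\uHHHH`, or `\UHHHHHHHH` escape with `is_unicode_scalar_value` before adding the character to the string.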