
Use of IRIs

Kai Eckert edited this page Sep 30, 2020 · 1 revision

IRIs are essentially URIs that allow Unicode characters in the path and query parameters, as well as internationalized domain names (which are not relevant for us). See RFC 3987: https://tools.ietf.org/html/rfc3987

For HTTP, IRIs are URL encoded (percent encoding). Even though this should not strictly be necessary, as far as I can see, common Web browsers URL encode an IRI entered in the address field. Encoding and decoding are not trivial; see Section 3 of the RFC.

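quote and unquote in Python do not do the right thing. quote is not idempotent: applied to a string that is already encoded, it encodes the percent signs again: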

In [3]: from urllib.parse import quote

In [4]: one = quote('Dōsei')

In [5]: one
Out[5]: 'D%C5%8Dsei'

In [6]: two = quote(one)

In [7]: two
Out[7]: 'D%25C5%258Dsei'

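The double encoding in the session above happens because quote treats "%" like any other character to escape. The following is a minimal sketch of an encoder that leaves existing escapes alone. The helper name `iri_to_uri` and the example.org IRI are made up for illustration. It does not handle international domain names and cannot tell a literal "%" from the start of an escape, so it is not a full implementation of Section 3 of the RFC.

```python
from urllib.parse import quote, unquote

# Characters left untouched: the URI reserved characters plus "%", so that
# existing percent escapes are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"

def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 (hypothetical helper)."""
    return quote(iri, safe=SAFE)

once = iri_to_uri('http://example.org/Dōsei')
twice = iri_to_uri(once)   # second pass: nothing left to encode

print(once)
print(once == twice)       # the helper is idempotent on its own output
print(unquote(once))       # decoding once restores the IRI
```

Whether treating "%" as safe is acceptable depends on the data: an IRI that contains a literal percent sign would be passed through unescaped.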

If you copy and paste an IRI into the browser, the browser will URL encode it. It therefore makes sense to decode the request URL on the server to get the original IRI back. For this to work, the original data must not itself be URL encoded.
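A minimal sketch of that server-side decoding, assuming the resources are held in a dictionary keyed by IRI. The names `RESOURCES`, `ENCODED_STORE` and `lookup` and the example.org IRI are hypothetical. It also shows why the stored data must not be URL encoded: after decoding, an encoded key no longer matches.

```python
from urllib.parse import quote, unquote

# Hypothetical store, keyed by the unencoded IRI.
RESOURCES = {'http://example.org/Dōsei': 'record for Dōsei'}

def lookup(requested_url: str):
    """Decode the requested URL once and look up the resulting IRI."""
    return RESOURCES.get(unquote(requested_url))

# Roughly what a browser sends after the IRI is pasted into the address field.
sent = quote('http://example.org/Dōsei', safe=':/')
print(sent)
print(lookup(sent))

# If the stored key were itself percent-encoded, the decoded request
# would no longer match it.
ENCODED_STORE = {'http://example.org/D%C5%8Dsei': 'record for Dōsei'}
print(ENCODED_STORE.get(unquote(sent)))
```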

If you use the URL encoded version in the Fuseki SPARQL interface, Fuseki does not decode it. This means you might at least run into issues if you copy and paste IRIs from the browser into a SPARQL form (quite common, I think).
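One possible workaround is to decode a URL copied from the browser before it goes into a query. This is only a sketch: the helper name `sparql_iri` and the example.org URL are made up, and it assumes the stored IRIs are unencoded. Note that unquote also decodes escapes of reserved characters such as %2F, which can change the meaning of a URL, so this is not safe for arbitrary input.

```python
import re
from urllib.parse import unquote

# Characters that the SPARQL grammar does not allow between "<" and ">".
FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|^`\\]')

def sparql_iri(url_from_browser: str) -> str:
    """Decode a percent-encoded URL and wrap it as a SPARQL IRI term."""
    iri = unquote(url_from_browser)
    if FORBIDDEN.search(iri):
        raise ValueError(f'not usable as a SPARQL IRI: {iri!r}')
    return f'<{iri}>'

term = sparql_iri('http://example.org/D%C5%8Dsei')
query = 'SELECT ?p ?o WHERE { ' + term + ' ?p ?o }'
print(query)
```

The check rejects decoded results that would break out of the angle brackets, for example a URL containing %3E or %20.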

For Hebrew, it gets even trickier because of right-to-left writing; refer to bidirectional (BIDI) IRIs in the RFC (Section 4).

Conclusion: Do not use IRIs?
