Use of IRIs
IRIs are basically URIs that allow Unicode characters in the path and query parameters, as well as internationalized domain names (which are not relevant for us). See RFC 3987: https://tools.ietf.org/html/rfc3987
For HTTP, IRIs are URL-encoded (percent-encoded). Even though this should not be strictly necessary, as far as I can see, common web browsers URL-encode an IRI entered in the address bar. Encoding and decoding are not trivial; see Section 3 of the RFC.
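quote and unquote in Python do not do the right thing here, because quote is not idempotent: applying it to an already encoded string encodes the percent signs again:
In [3]: from urllib.parse import quote
In [4]: one = quote('Dōsei')
In [5]: one
Out[5]: 'D%C5%8Dsei'
In [6]: two = quote(one)
In [7]: two
Out[7]: 'D%25C5%258Dsei'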
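One possible way around the double encoding is to tell quote to leave existing percent signs and the reserved URI characters alone. The following is only a sketch: the helper name iri_to_uri, the SAFE constant and the example.org IRI are made up for illustration, and it assumes that any "%" in the input already starts a valid escape (a literal "%" in the data would be ambiguous).

```python
from urllib.parse import quote

# Characters left untouched: the RFC 3986 reserved characters plus "%",
# so that existing percent escapes are not encoded a second time.
SAFE = "%:/?#[]@!$&'()*+,;="


def iri_to_uri(iri: str) -> str:
    """Percent-encode an IRI without re-encoding existing escapes (hypothetical helper)."""
    return quote(iri, safe=SAFE)


once = iri_to_uri("http://example.org/data/Dōsei")
twice = iri_to_uri(once)

print(once)           # the non-ASCII character is encoded as UTF-8 bytes
print(twice == once)  # applying the helper again changes nothing
```

With this choice of safe characters the conversion is idempotent, which plain quote is not. It does not implement the full mapping described in Section 3 of the RFC (for example, it does nothing special for international domain names).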
If you copy and paste an IRI into the browser, the browser will URL-encode it. It therefore makes sense to decode it on the server to get the original IRI back. For this to work, the original data must not itself be URL-encoded.
If you use the URL-encoded version in the Fuseki SPARQL interface, Fuseki does not decode it. This means you may run into issues when you copy and paste IRIs from the browser into a SPARQL form (which I think is quite common).
For Hebrew, it gets even more interesting because of right-to-left writing; refer to the part on bidirectional (BIDI) IRIs in the RFC (Section 4).
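The server-side decoding step could look roughly like the sketch below. The function name request_path_to_iri is hypothetical, and the sketch assumes the server receives the raw, still encoded path. Depending on the web framework, the path may already arrive decoded, in which case a second unquote would be wrong.

```python
from urllib.parse import unquote


def request_path_to_iri(raw_path: str) -> str:
    """Decode a percent-encoded path segment exactly once (hypothetical helper)."""
    return unquote(raw_path)


# What the browser sends for an IRI that was stored unencoded.
decoded = request_path_to_iri("D%C5%8Dsei")
print(decoded)

# If the stored data had already been URL-encoded, the browser would encode
# it a second time, and decoding once would not recover the Unicode form.
still_encoded = request_path_to_iri("D%25C5%258Dsei")
print(still_encoded)
```

The second call illustrates why the original data should not be URL-encoded: one round of decoding only undoes the browser's encoding, not one that was already present in the data.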
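The mismatch can be illustrated without a running Fuseki instance, since IRIs are compared as plain strings. In the sketch below the example.org IRI, the variable names and the helper describe_query are assumptions for illustration only; it assumes the store holds the unencoded form.

```python
from urllib.parse import unquote

# IRI as it is assumed to be stored in the triple store (unencoded).
stored = "http://example.org/data/Dōsei"
# The same IRI after copying it out of a browser address bar.
pasted = "http://example.org/data/D%C5%8Dsei"


def describe_query(iri: str) -> str:
    """Build a simple SPARQL DESCRIBE query for an IRI (hypothetical helper)."""
    return f"DESCRIBE <{iri}>"


# The two strings differ, so a query with the pasted form finds nothing.
print(pasted == stored)
# Decoding the pasted form first yields the stored IRI again.
print(unquote(pasted) == stored)
print(describe_query(unquote(pasted)))
```

Note that unquote also decodes escaped reserved characters such as %2F or %3E, so a real tool would have to be more careful than this before placing the result inside angle brackets.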
Conclusion: Do not use IRIs?
Judaicalink - https://www.judaicalink.org
© JudaicaLink: Hochschule Mannheim, FID Judaica: UB Johann Christian Senckenberg, Frankfurt am Main.