`STRLEN` is currently documented as:

> The `strlen` function corresponds to the XPath `fn:string-length` function and returns an `xsd:integer` equal to the length in characters of the lexical form of the literal.
`fn:string-length` is defined as "[returning] the number of ·characters· in a string," with *character* being "an instance of the Char production" from XML, which is defined as:

> `[2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`
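To make the gap concrete, here is a small Python sketch that encodes production [2] and lists the Unicode scalar values it excludes. The helper names `is_xml_char` and `is_scalar_value` are hypothetical, chosen for illustration; they are not from any spec or library. Besides the control characters below U+0020 (other than tab, LF and CR), the production also excludes U+FFFE and U+FFFF, which are scalar values as well.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point `cp` matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_scalar_value(cp: int) -> bool:
    """True if `cp` is a Unicode scalar value (any code point except surrogates)."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


# Scalar values that can appear in an RDF string but are not XML Chars.
gap = [cp for cp in range(0x110000) if is_scalar_value(cp) and not is_xml_char(cp)]
print(len(gap))  # 31
print([f"U+{cp:04X}" for cp in gap])
```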
Since RDF strings are defined as sequences of Unicode scalar values, I think there's a gap here regarding the code points below U+0020 (other than the three explicitly enumerated in the production). This gap should be observable by considering this query:
```sparql
SELECT (STRLEN("\u0001\u0002") AS ?l) {}
```
I think a plain reading of XPath Functions and Operators (F&O) suggests that this would return zero, while the intuitive (IMO) answer is two.
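The two readings can be modelled side by side. This is a sketch under one assumption: that the "plain reading" simply does not count code points that fail the Char production (an implementation could just as well reject such a string). The function names are hypothetical.

```python
def is_xml_char(cp: int) -> bool:
    """True if code point `cp` matches XML 1.0 production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strlen_xml_chars(s: str) -> int:
    """Count only code points that are XML Chars (assumed literal F&O reading)."""
    return sum(1 for ch in s if is_xml_char(ord(ch)))


def strlen_code_points(s: str) -> int:
    """Count every code point (the intuitive reading)."""
    return len(s)


lexical_form = "\u0001\u0002"  # same lexical form as in the SPARQL query
print(strlen_xml_chars(lexical_form))    # 0
print(strlen_code_points(lexical_form))  # 2
```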
Perhaps we need to update the definition of `STRLEN` to use language that does not bottom out in XML characters, but instead talks directly about Unicode code points and/or scalar values.
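As a rough sketch of what a scalar-value-based definition would compute (the name `strlen_scalar_values` is hypothetical): a Python `str` is a sequence of code points and may hold lone surrogates, which are not scalar values, so the sketch rejects them explicitly. It also shows that such a definition counts scalar values rather than UTF-16 code units.

```python
def strlen_scalar_values(s: str) -> int:
    """Hypothetical STRLEN: the number of Unicode scalar values in `s`."""
    for ch in s:
        if 0xD800 <= ord(ch) <= 0xDFFF:
            # Python strings may contain lone surrogates; RDF strings cannot.
            raise ValueError(f"U+{ord(ch):04X} is a surrogate, not a scalar value")
    return len(s)


print(strlen_scalar_values("\u0001\u0002"))  # 2
emoji = "\U0001F600"
print(strlen_scalar_values(emoji))           # 1 scalar value
print(len(emoji.encode("utf-16-le")) // 2)   # 2 UTF-16 code units
```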