STRLEN definition based on F&O seems incomplete #271

@kasei

STRLEN is currently documented:

The strlen function corresponds to the XPath fn:string-length function and returns an xsd:integer equal to the length in characters of the lexical form of the literal.

fn:string-length is defined as "[returning] the number of ·characters· in a string," with character being "an instance of the Char production" from XML, which is defined as:

[2]  Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Since RDF strings are defined as sequences of Unicode scalar values, I think there's a gap here for the code points below U+0020 (other than the three explicitly enumerated in the production: U+0009, U+000A, and U+000D). This gap should be observable with this query:

SELECT (STRLEN("\u0001\u0002") AS ?l) {}

I think a plain reading of F&O suggests that this would return zero, while the intuitive (IMO) answer is two.

Perhaps we need to update the definition of STRLEN to use language that does not bottom out in XML characters, but instead talks directly about Unicode code points and/or scalar values.
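
To make the two readings concrete, here is a minimal Python sketch, not taken from any SPARQL implementation. The function names `strlen_xml_chars` and `strlen_code_points` are hypothetical, and the first function assumes the "plain reading" described above, under which code points outside the Char production are simply not counted.

```python
import re

# XML 1.0 Char production [2], transcribed as a regex character class.
XML_CHAR = re.compile(
    r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strlen_xml_chars(s: str) -> int:
    """Count only code points that match the XML Char production.

    Assumption: this models the "plain reading" of fn:string-length,
    where anything that is not an XML Char does not count at all.
    """
    return len(XML_CHAR.findall(s))


def strlen_code_points(s: str) -> int:
    """Count every code point in the string.

    Python's len() on a str counts code points; for strings without
    lone surrogates this equals the number of Unicode scalar values.
    """
    return len(s)


if __name__ == "__main__":
    sample = "\u0001\u0002"  # the string from the query above
    print(strlen_xml_chars(sample), strlen_code_points(sample))
```

For the string in the query, the first function yields zero and the second yields two; the two functions agree on any string made up only of XML characters.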
