Node Expressions #222

Open
afs opened this issue Jan 29, 2025 · 14 comments
Labels: Core (For SHACL 1.2 Core spec), Inferencing (For SHACL 1.2 Inferencing spec)

Comments

@afs
Contributor

afs commented Jan 29, 2025

This issue is for general discussion of Node Expressions.

Node Expressions in SHACL-AF (May 2017).

SHACL CG draft with a longer list of possible node expressions

Wiki doc: Proposal on Node Expressions for SHACL Core 1.2

Use cases:

Please add use cases, small or large, as issues and label them "Inferencing" and also "UCR". We can do further organisation later if the volume is high enough.

The full work on this area is phase2.

However, some of the ideas also apply to SHACL Core in phase1. We need to know what the vision for "node expressions" is so that the phase1 documents are aligned.

@tpluscode

I have been using Node Expressions extensively for generating SPARQL queries from shapes: https://shape-to-query.hypermedia.app/docs/

@afs
Contributor Author

afs commented Feb 3, 2025

@HolgerKnublauch

On the description page, I don't understand the second edg:Database-tableCount example. It uses sh:path on the property shape but it does not evaluate to a path. What would the new SHACL-core definition of sh:path look like?

edg:Database-tableCount
    a sh:PropertyShape ;
    sh:inferredProperty edg:tableCount ;
    sh:path [                       ## ????
        sh:count [
            sh:path [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." .

From its description, this seems to be a prototype/template for the first edg:Database-tableCount example: a "property shape" that does not have a sh:path (= reachability for value nodes). Instead it has sh:inferredProperty, with sh:values supplying the objects of sh:inferredProperty.

edg:Database-tableCount
    a sh:DerivedPropertyShape ;
    sh:inferredProperty edg:tableCount ;
    sh:values [
        sh:count [
            sh:path [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] ;
    sh:datatype xsd:integer ;
    sh:description "The number of tables in this database, automatically computed." .

related to:

    sh:rule [
        a sh:TripleRule ;
        sh:subject sh:this ;
        sh:predicate edg:tableCount ;
        sh:object [                 ## the sh:values node expression
            sh:count [
                sh:path [
                    sh:inversePath edg:tableOf ;
                ] ;
            ] ;
        ] ;
    ] ;

@HolgerKnublauch
Contributor

HolgerKnublauch commented Feb 3, 2025

The first definition of tableCount used a sh:values rule, with 1.1 syntax and the sh:path of that was the (potential) inferred property. The new design is more flexible and allows the use of node expressions directly at sh:path, with the inferred property optional in case anyone wants to reference them from other node expressions.

The values of sh:path at PropertyShapes in 1.2 would be more general than previously. While current SHACL only allows property path expressions, 1.2 would allow any node expression in the sh:path position. This should not be confused with the use of sh:path in the inner block (sh:inversePath), which is a node expression that must point to an "old" property path. It may be best to use a fresh property for this use case instead of sh:path. Also, since property expressions such as sh:inversePath will also be node expressions, it could just be dropped in your example:

    sh:rule [
        a sh:TripleRule ;
        sh:subject sh:this ;
        sh:predicate edg:tableCount ;
        sh:object [
            sh:count [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] ;

Not sure if this is any clearer; please ask again if it isn't.
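To make the intended result of the simplified rule concrete, here is a minimal Python sketch of what it computes per focus node. It assumes the data graph is modeled as a plain set of (subject, predicate, object) tuples rather than a real RDF store; the function names and example resources are hypothetical, and only edg:tableOf and edg:tableCount come from the example above.

```python
# Hypothetical sketch: a data graph modeled as a set of (subject, predicate, object) tuples.
Triple = tuple[str, str, str]

def inverse_path(graph: set[Triple], focus: str, predicate: str) -> list[str]:
    """Values of [ sh:inversePath predicate ] at focus: every s with (s, predicate, focus)."""
    return sorted(s for (s, p, o) in graph if p == predicate and o == focus)

def table_count_rule(graph: set[Triple], focus: str) -> tuple[str, str, int]:
    """The triple the sh:TripleRule would infer for one focus node (sh:subject sh:this)."""
    count = len(inverse_path(graph, focus, "edg:tableOf"))  # sh:count over the inverse path
    return (focus, "edg:tableCount", count)

graph = {
    ("ex:customers", "edg:tableOf", "ex:salesDb"),
    ("ex:orders", "edg:tableOf", "ex:salesDb"),
    ("ex:employees", "edg:tableOf", "ex:hrDb"),
}
print(table_count_rule(graph, "ex:salesDb"))  # ('ex:salesDb', 'edg:tableCount', 2)
```

A real engine would emit an xsd:integer literal as the object; a Python int stands in for it here.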

@afs
Contributor Author

afs commented Feb 4, 2025

> Also, since property expressions such as sh:inversePath will also be node expressions, it could just be dropped in your example:

It was copied from your example :-)

> This should not be confused with the use of sh:path in the inner block (sh:inversePath), which is a node expression that must point to an "old" property path. It may be best to use a fresh property instead for this use case, instead of sh:path

It's early days, but, yes, I think a new name would be better. sh:path is central to SHACL and defines sh:PropertyShape.

Maybe extend the definition of property shape to be "must have exactly one of sh:path or sh:inferredProperty".

@philharveyonline

The general direction proposed in Proposal on Node Expressions for SHACL Core 1.2 looks really useful for our requirements.

I recorded a related use case that we've got in #227.

@afs
Contributor Author

afs commented Feb 7, 2025

Picking up from #234 (comment)

> The use of functions as node expressions would become a node expression in SHACL-SPARQL if we elect to limit them to SPARQL functions, otherwise no clue. For example, we also support JS-based custom functions.

Being able to give a name to a node expression so it can be used in several places seems a likely need. That doesn't need the full function definition (it does not need to declare parameters, or return type necessarily).

my:nodeExpr_numTables
    rdf:type sh:nodeExpression ;
    sh:datatype xsd:integer ;     ## Optional
    sh:description "Calculate the number of tables in this database" ;
    sh:values [
        sh:count [
            sh:path [
                sh:inversePath edg:tableOf ;
            ] ;
        ] ;
    ] .

It will need a call syntax so that the node expression call site is a blank node (otherwise it is a constant term expression).

    [ sh:call my:nodeExpr_numTables ]
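The point about the call site can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a proposed implementation: a node expression is modeled as a function from a focus node to a list of values, a blank node `[ sh:call iri ]` is modeled as a dict, and the registry `NAMED` and the `TABLES_OF` data are hypothetical stand-ins for the graph.

```python
from typing import Callable, Union

# Hypothetical model: a node expression maps a focus node to a list of value nodes.
NodeExpr = Callable[[str], list]

# Named node expressions, keyed by IRI (e.g. my:nodeExpr_numTables).
NAMED: dict[str, NodeExpr] = {}

# Toy data standing in for the graph: which tables point at which database via edg:tableOf.
TABLES_OF = {"ex:salesDb": ["ex:customers", "ex:orders"]}

NAMED["my:nodeExpr_numTables"] = lambda focus: [len(TABLES_OF.get(focus, []))]

def evaluate(expr: Union[str, dict], focus: str) -> list:
    """A bare IRI is a constant term expression; a blank node [ sh:call iri ] invokes a named one."""
    if isinstance(expr, dict) and "sh:call" in expr:
        return NAMED[expr["sh:call"]](focus)
    return [expr]

print(evaluate({"sh:call": "my:nodeExpr_numTables"}, "ex:salesDb"))  # [2]
print(evaluate("my:nodeExpr_numTables", "ex:salesDb"))  # ['my:nodeExpr_numTables']
```

The second call shows why the wrapper is needed: without it, the IRI of the named expression simply evaluates to itself.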

HolgerKnublauch added the Core (For SHACL 1.2 Core spec) and Inferencing (For SHACL 1.2 Inferencing spec) labels Feb 8, 2025
@hmottestad

hmottestad commented Feb 8, 2025

I was also confused by the use of expressions within sh:path. My first thought was also that the expression within a path was a way to dynamically calculate the path.

ex:person a foaf:Person;
   ex:agePath ex:useThisAsThePredicateToRetrieveTheAge;
   ex:useThisAsThePredicateToRetrieveTheAge 74.

So we could have a shape that checks that the age of a person is within a valid range, without knowing the specific predicate that the subject uses to define the age.
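A small Python sketch of that "dynamic path" reading, assuming the graph is a set of tuples and that ex:agePath holds the predicate to follow. The helper names are hypothetical, and the 0 to 150 age range is an assumed example constraint, not something from the thread.

```python
# Hypothetical sketch of the "dynamic path" reading: ex:agePath names the predicate to follow.
graph = {
    ("ex:person", "rdf:type", "foaf:Person"),
    ("ex:person", "ex:agePath", "ex:useThisAsThePredicateToRetrieveTheAge"),
    ("ex:person", "ex:useThisAsThePredicateToRetrieveTheAge", 74),
}

def objects(graph: set, subject: str, predicate: str) -> list:
    return [o for (s, p, o) in graph if s == subject and p == predicate]

def dynamic_path_values(graph: set, focus: str, pointer: str) -> list:
    """First read the predicate(s) stored under `pointer`, then follow them from the focus node."""
    values = []
    for predicate in objects(graph, focus, pointer):
        values.extend(objects(graph, focus, predicate))
    return values

ages = dynamic_path_values(graph, "ex:person", "ex:agePath")
print(ages)  # [74]
print(all(0 <= age <= 150 for age in ages))  # True: the range check a shape might apply
```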

@afs
Contributor Author

afs commented Feb 11, 2025

Here is some exploration of node expressions.

Examples:

  1. Calculate and test a substring
  2. Is the focus node, as a date, within 30 days of today? (e.g. shop returns policy)
  3. A rules-related case: generating new IRIs, such as taking an identifier ":employeeId 123" and making the IRI http://.../employee/123

"F&O" is "XPath and XQuery Functions and Operators"
Many SPARQL functions are defined using this large collection of functions.

PREFIX fn:      <http://www.w3.org/2005/xpath-functions#>
PREFIX op:      

op: is the operator prefix for F&O. It does not have a URI, but it can be the same as the one for fn:.

Example 1

## Calculate and test a substring 
[ sh:equals ( [ fn:substring (sh:this 5 4) ] "ABCD" ) ]
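F&O string positions are 1-based, which is easy to get wrong. The following Python sketch is a simplified fn:substring for integer arguments only (F&O also defines rounding for non-integer start and length), checked against examples from the F&O specification; the focus value "XYZWABCD" is a made-up input.

```python
def fn_substring(source: str, start: int, length: int) -> str:
    """Simplified F&O fn:substring for integer arguments: characters at 1-based
    positions p with start <= p < start + length."""
    begin = max(start, 1)
    end = start + length  # exclusive, 1-based
    if end <= begin:
        return ""
    return source[begin - 1:end - 1]

focus = "XYZWABCD"
print(fn_substring(focus, 5, 4))  # ABCD
print(fn_substring(focus, 5, 4) == "ABCD")  # True
```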

Example 2

## Date within 30 days of today
## SPARQL with support for XSD Duration
##    ( now() - $this < "P30D"^^xsd:dayTimeDuration )

[ op:dayTimeDuration-less-than (
    [ op:subtract-dateTimes (
        [ sparql:now () ]
        [ xsd:dateTime sh:this ]    ## the focus node, cast from xsd:date to xsd:dateTime
        )]
    "P30D"^^xsd:dayTimeDuration
  )]
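The arithmetic behind this example can be checked with a Python sketch. It assumes the focus date is taken as midnight UTC when converted to a dateTime (F&O's implicit-timezone rules are more involved) and uses a fixed instant in place of now(); the function name is hypothetical.

```python
from datetime import date, datetime, timedelta, timezone

def within_days(focus: date, days: int, now: datetime) -> bool:
    """now() - $this < "P<days>D"^^xsd:dayTimeDuration, with the date taken as midnight UTC."""
    focus_start = datetime(focus.year, focus.month, focus.day, tzinfo=timezone.utc)
    return now - focus_start < timedelta(days=days)

now = datetime(2025, 2, 11, 12, 0, tzinfo=timezone.utc)  # fixed stand-in for now()
print(within_days(date(2025, 1, 20), 30, now))  # True  (22 days and 12 hours)
print(within_days(date(2024, 12, 1), 30, now))  # False (72 days and 12 hours)
```

As with the SPARQL form, a date in the future gives a negative duration and therefore also passes; a returns-policy check might want a lower bound as well.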

Example 3

## Generate employee IRI

[ sparql:IRI
   [ fn:concat (
      "http://example/employee/"
      [ fn:replace ( sh:this "^.*/" "" ) ]   ## The last component of the IRI
   )]
]
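A Python sketch of the string manipulation, using the regex `^.*/` (greedy, so it strips everything up to the last slash). Note that a pattern such as `^[^/]*` would only strip the text before the first slash, i.e. the scheme. The base IRI and function name are illustrative assumptions, and Python's `re` stands in for F&O regular expressions.

```python
import re

def employee_iri(focus: str, base: str = "http://example/employee/") -> str:
    """CONCAT(base, REPLACE(STR($this), "^.*/", "")): keep only the text after the last '/'."""
    return base + re.sub(r"^.*/", "", focus)

print(employee_iri("http://example/hr/staff/123"))  # http://example/employee/123
print(employee_iri("123"))  # http://example/employee/123  (e.g. the :employeeId literal itself)
```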

Evaluation and function calling

This is my understanding of the node expressions proposal.

IRIs and literals are constants (sh:this is special)

Anything [ ] is a function call (sh:if is special)

[ ns:name () ]          ## No argument - must be RDF property-object
[ ns:name oneArg ]      ## Shorthand
[ ns:name ( args ) ]    ## General

sh:path accesses the graph starting from the focus node.

The SHACL-AF CG extended list of node expressions has set-based functions and aggregators. They take list arguments; the examples above take single values for each argument.
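To check that these evaluation rules are self-consistent, here is a toy evaluator in Python. It is a sketch under explicit assumptions, not a spec algorithm: constants are Python strings or numbers, a blank node `[ ns:name args ]` is a one-key dict, an RDF list is a Python list (the empty list being the no-argument form), sh:if is not modeled, and the `FUNCTIONS` registry is illustrative. Following the note about set-based functions, sh:path returns all values as a list.

```python
from typing import Any, Callable

# Hypothetical model of the rules above (not a spec algorithm):
#  - str/int values are constant IRIs or literals, except "sh:this" (the focus node);
#  - a one-key dict {name: args} is a blank node [ ns:name args ], i.e. a function call;
#  - args is a list for ( ... ) (empty list = no argument), else a single shorthand argument;
#  - "sh:path" reads the graph from the focus node and yields all values as a list.
Graph = set[tuple[str, str, Any]]

FUNCTIONS: dict[str, Callable[..., Any]] = {  # illustrative registry
    "fn:concat": lambda *xs: "".join(str(x) for x in xs),
    "fn:upper-case": lambda s: s.upper(),
    "sparql:now": lambda: "2025-02-11T00:00:00Z",  # fixed value so the sketch is deterministic
}

def evaluate(expr: Any, focus: str, graph: Graph) -> Any:
    if not isinstance(expr, dict):
        return focus if expr == "sh:this" else expr  # constants; sh:this is special
    [(name, args)] = list(expr.items())
    if name == "sh:path":
        return [o for (s, p, o) in graph if s == focus and p == args]
    if isinstance(args, list):  # [ ns:name () ] and [ ns:name ( args ) ]
        call_args = [evaluate(a, focus, graph) for a in args]
    else:  # [ ns:name oneArg ]
        call_args = [evaluate(args, focus, graph)]
    return FUNCTIONS[name](*call_args)

g = {("ex:alice", "ex:name", "Alice")}
print(evaluate({"fn:concat": ["id-", {"fn:upper-case": "sh:this"}]}, "ex:alice", g))  # id-EX:ALICE
```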

@simonstey
Contributor

> This is my understanding of the node expressions proposal.
>
> IRIs and literals are constants (sh:this is special)
>
> Anything [ ] is a function call (sh:if is special)
>
> [ ns:name () ]          ## No argument - must be RDF property-object
> [ ns:name oneArg ]      ## Shorthand
> [ ns:name ( args ) ]    ## General
>
> sh:path accesses the graph starting from the focus node.
>
> The SHACL-AF CG extended list of node expressions has set-based functions and aggregators. They take list arguments; the examples above take single values for each argument.

Those NExps would still need to be used wherever we allow NExps, right (i.e. within shapes)? They can't just exist on their own (otherwise, what would sh:this point to?).

@simonstey
Contributor

> Picking up from #234 (comment)
>
> > The use of functions as node expressions would become a node expression in SHACL-SPARQL if we elect to limit them to SPARQL functions, otherwise no clue. For example, we also support JS-based custom functions.
>
> Being able to give a name to a node expression so it can be used in several places seems a likely need. That doesn't need the full function definition (it does not need to declare parameters, or return type necessarily).
>
> my:nodeExpr_numTables
>     rdf:type sh:nodeExpression ;
>     sh:datatype xsd:integer ;     ## Optional
>     sh:description "Calculate the number of tables in this database" ;
>     sh:values [
>         sh:count [
>             sh:path [
>                 sh:inversePath edg:tableOf ;
>             ] ;
>         ] ;
>     ] .
>
> It will need a call syntax so that the node expression call site is a blank node (otherwise it is a constant term expression)
>
>     [ sh:call my:nodeExpr_numTables ]

Why not adopt an approach similar to the existing one for defining SPARQL-based constraint components?

sh:PatternConstraintComponent
	a sh:ConstraintComponent ;
	sh:parameter [
		sh:path sh:pattern ;
	] ;
	sh:parameter [
		sh:path sh:flags ;
		sh:optional true ;
	] ;
	sh:validator shimpl:hasPattern .

shimpl:hasPattern
	a sh:SPARQLAskValidator ;
	sh:message "Value does not match pattern {$pattern}" ;
	sh:ask """
		ASK { 
			FILTER (!isBlank($value) && 
				IF(bound($flags), regex(str($value), $pattern, $flags), regex(str($value), $pattern)))
		}""" .
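For readers less used to SPARQL, this Python sketch spells out what the ASK query checks for a single value node. Value nodes are modeled as hypothetical (kind, lexical form) pairs, only the i, s and m flags are mapped, and Python's `re` is used in place of XPath regular expressions, whose syntax differs in some details.

```python
import re
from typing import Optional

# Sketch of the check performed by the shimpl:hasPattern ASK query for one value node.
# Value nodes are modeled as (kind, lexical form) pairs, kind in {"iri", "literal", "bnode"}.
FLAG_MAP = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE}  # other flags ignored here

def has_pattern(value: tuple[str, str], pattern: str, flags: Optional[str] = None) -> bool:
    kind, lexical = value
    if kind == "bnode":  # !isBlank($value)
        return False
    re_flags = 0
    for flag in flags or "":  # IF(bound($flags), regex(..., $flags), regex(...))
        re_flags |= FLAG_MAP.get(flag, 0)
    return re.search(pattern, lexical, re_flags) is not None  # regex(str($value), $pattern)

print(has_pattern(("literal", "abd"), "^AB(C|D)$"))       # False
print(has_pattern(("literal", "abd"), "^AB(C|D)$", "i"))  # True
```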

@tpluscode

> Evaluation and function calling
>
> This is my understanding of the node expressions proposal.
>
> IRIs and literals are constants (sh:this is special)
>
> Anything [ ] is a function call (sh:if is special)

I think it's worth referencing the latest draft of SHACL-AF: https://w3c.github.io/shacl/shacl-af/

There, every expression is a blank node and its properties determine the functionality. The spec defines a set of expressions which have special meaning (sh:orderBy, sh:distinct, etc.).

Everything else is treated as a function call (Function Expression).

Notably, a vocabulary exists under the https://datashapes.org/sparql namespace that maps SPARQL functions: dash-sparql:concat, dash-sparql:langMatches and so on. These descriptions also include the parameters, which makes some validation possible.

Do we also want to adopt that in some fashion? @HolgerKnublauch

> [ ns:name () ]          ## No argument - must be RDF property-object
> [ ns:name oneArg ]      ## Shorthand
> [ ns:name ( args ) ]    ## General
>
> sh:path accesses the graph starting from the focus node.
>
> The SHACL-AF CG extended list of node expressions has set-based functions and aggregators. They take list arguments; the examples above take single values for each argument.

Other than the shorthand, the above matches the current usage of Function Expressions 👍

On that note, complex expressions quickly result in deeply nested structures, which are not exactly pretty. For example, I used something like this to generate SPARQL that creates a URI from the resource's kebab-cased schema:name plus some random characters at the end.

prefix sh: <http://www.w3.org/ns/shacl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix sparql: <http://datashapes.org/sparql#>
PREFIX schema: <http://schema.org/>

[
  a sh:NodeShape ;
  sh:property [
    sh:path schema:value ;
    sh:values [
      sparql:iri (
      [
        sparql:concat
          (
            "/prefix/"
            [
              sparql:encode_for_uri
                (
                  [
                    sparql:replace
                      (
                        [
                          sparql:lcase
                            (
                              [
                                sh:path schema:name
                              ]
                            )
                        ] " " "-" "g"
                      )
                  ]
                )
            ]
            "-"
            [
              sparql:substr ( [ sparql:struuid () ] 1 8 )
            ]
          )
      ]
    ) 
      ]
  ] ;
] .

The SPARQL:

PREFIX schema: <http://schema.org/>
CONSTRUCT { ?resource1 schema:value ?resource2. }
WHERE {
  ?resource1 schema:name ?resource3.
  BIND(IRI(CONCAT("/prefix/", ENCODE_FOR_URI(REPLACE(LCASE(?resource3), " ", "-", "g")), "-", SUBSTR(STRUUID(), 1 , 8 ))) AS ?resource2)
}
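For comparison, the same computation as that BIND expression, written as a Python sketch. It returns the relative string that IRI() would then resolve against the query's base IRI. `urllib.parse.quote` with `safe=""` is used as an approximation of ENCODE_FOR_URI (both leave only unreserved characters unescaped), and the function name is hypothetical.

```python
import re
import uuid
from urllib.parse import quote

def slug_iri(name: str) -> str:
    """Mirror of the BIND expression above, returning the string before IRI() resolution."""
    kebab = re.sub(" ", "-", name.lower())  # REPLACE(LCASE(?resource3), " ", "-", "g")
    encoded = quote(kebab, safe="")          # ENCODE_FOR_URI(...): keeps only unreserved chars
    suffix = str(uuid.uuid4())[:8]           # SUBSTR(STRUUID(), 1, 8)
    return "/prefix/" + encoded + "-" + suffix   # CONCAT(...)

print(slug_iri("Hello World"))  # /prefix/hello-world- followed by 8 random hex characters
```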

The generated patterns are arguably terser and easier to read, which made me wonder every time whether a different SHACL representation could be possible.

@afs
Contributor Author

afs commented Feb 11, 2025

> I think it's worth referencing the latest draft of SHACL-AF: https://w3c.github.io/shacl/shacl-af/

(reference added)

> Notably, a vocabulary exists under https://datashapes.org/sparql namespace which maps SPARQL functions.

The thing about SPARQL functions is the dispatch. For example, < (sparql:lt) dispatches to a numeric, string, or dateTime comparison, and each comparison is the corresponding F&O function. F&O is essential for the details of each function.
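A Python sketch of that dispatch, to show why the same operator needs several underlying functions. Literals are modeled as hypothetical (lexical form, datatype) pairs; the helper names only mirror the F&O operator names, only three datatype families are covered, and all numerics are compared as `Decimal` instead of applying F&O type promotion.

```python
from datetime import datetime
from decimal import Decimal

# Hypothetical sketch of dispatch: SPARQL "<" (sparql:lt) chooses an F&O comparison by datatype.
# Literals are modeled as (lexical form, datatype) pairs.
NUMERIC = {"xsd:integer", "xsd:decimal", "xsd:float", "xsd:double"}

def op_numeric_less_than(a: Decimal, b: Decimal) -> bool:
    return a < b

def op_string_less_than(a: str, b: str) -> bool:
    return a < b  # like fn:compare($a, $b) lt 0 with the codepoint collation

def op_dateTime_less_than(a: datetime, b: datetime) -> bool:
    return a < b

def sparql_lt(left: tuple[str, str], right: tuple[str, str]) -> bool:
    (lex_l, dt_l), (lex_r, dt_r) = left, right
    if dt_l in NUMERIC and dt_r in NUMERIC:
        # Simplification: F&O promotes to a common numeric type; Decimal is used for all here.
        return op_numeric_less_than(Decimal(lex_l), Decimal(lex_r))
    if dt_l == dt_r == "xsd:string":
        return op_string_less_than(lex_l, lex_r)
    if dt_l == dt_r == "xsd:dateTime":
        return op_dateTime_less_than(datetime.fromisoformat(lex_l), datetime.fromisoformat(lex_r))
    raise TypeError(f"no comparison for {dt_l} < {dt_r}")  # a SPARQL type error

print(sparql_lt(("2", "xsd:integer"), ("10", "xsd:integer")))  # True  (numeric)
print(sparql_lt(("2", "xsd:string"), ("10", "xsd:string")))    # False (codepoint order)
```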

> The generated patterns are arguably terser and easier to read which made me wonder every time if a different SHACL representation could be possible.

Something to share with SHACL-CS?

It would be possible to reuse the expression syntax of SPARQL and translate it to node expressions in RDF form. Then the expressions aren't opaque to introspection, which has been a concern mentioned, while being easier to write and maintain.

@HolgerKnublauch
Contributor

@tpluscode SPARQL strings could be a Compact Syntax for NEs. That's what we did in SPIN/TopBraid's user interface, allowing users to enter the SPARQL string while storing the triples. But yeah, nested bnode trees have their downsides.

And yes, something like the dash-sparql library could just become another spec or part of the main NE document, depending on time. As @simonstey indicated, potentially the SHACL-SPARQL spec could be extended to allow self-descriptive purely declarative node expression types that are backed by SELECT queries that are repeatedly executed over all input bindings. The same mechanism already exists for user-defined constraint components and functions. And the built-in SPARQL functions could then be lifted into SHACL simply by making them sh:SPARQLFunctions with the actual built-in functions like CONCAT in their sh:select. As a result, anyone can plug in such libraries, producing "living" standards like an open source project.

@afs
Contributor Author

afs commented Feb 11, 2025

> why not adopt a similar approach like the already existing one for defining SPARQL-based Constraint components?:

Yes, and declarations with details can be used for checking calls.

Named parameters everywhere look clunky when not used via SPARQL.

fn: and sparql: functions take a list of arguments, and reusing them makes the NE spec smaller (!)

e.g.
[ fn:regex [ sh:flags "i" ; sh:pattern "^AB(C|D)$" ] ]
[ fn:regex [ sh:pattern "^AB(C|D)$" ; sh:flags "i" ] ]
[ fn:regex ( "^AB(C|D)$" "i" ) ]

[] a sh:NEComponent ;
    # List or named arguments call forms.
    sh:parameters (
        [ sh:path sh:pattern ]
        [ sh:path sh:flags ; sh:optional true ]
    ) ;
    sh:handler fn:regex ;
    # ...
    .
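One way the declaration could serve both call forms, sketched in Python under stated assumptions: parameters are declared in order as hypothetical (path, optional?) pairs, the list form is a Python list, and the named form is a dict keyed by parameter path. Both forms then normalize to the same positional argument list, which also gives a natural place to check calls.

```python
from typing import Any, Optional

# Hypothetical sketch: a declaration lists parameters in order (as sh:parameters does above),
# so both call forms normalize to the same positional argument list before dispatch.
Param = tuple[str, bool]  # (parameter path, optional?)
REGEX_PARAMS: list[Param] = [("sh:pattern", False), ("sh:flags", True)]

def normalize_args(params: list[Param], args: Any) -> list[Optional[Any]]:
    paths = [path for path, _ in params]
    if isinstance(args, list):  # list form: [ fn:regex ( "^AB(C|D)$" "i" ) ]
        if len(args) > len(params):
            raise ValueError("too many arguments")
        positional = args + [None] * (len(params) - len(args))
    elif isinstance(args, dict):  # named form: [ fn:regex [ sh:pattern ... ; sh:flags ... ] ]
        unknown = set(args) - set(paths)
        if unknown:
            raise ValueError(f"unknown parameters: {sorted(unknown)}")
        positional = [args.get(path) for path in paths]
    else:
        raise TypeError("arguments must be a list or a dict")
    for (path, optional), value in zip(params, positional):
        if value is None and not optional:
            raise ValueError(f"missing required parameter {path}")
    return positional

pattern = "^AB(C|D)$"
print(normalize_args(REGEX_PARAMS, [pattern, "i"]))                          # ['^AB(C|D)$', 'i']
print(normalize_args(REGEX_PARAMS, {"sh:flags": "i", "sh:pattern": pattern}))  # ['^AB(C|D)$', 'i']
```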

Before we get too deep, some use cases from across the WG would be good.


6 participants