by default do not owl:import in the Data Graph #15

Open
VladimirAlexiev opened this issue Dec 10, 2024 · 8 comments
Labels
enhancement, question

Comments

@VladimirAlexiev

VladimirAlexiev commented Dec 10, 2024

(Was: "provide useful sh:focusNode in ValidationResults")

In the case described in #14, the sh:focusNode is blank:

  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  [] ;
                 sh:resultMessage              "Using the values of sh:prefixes as defined by 5.2.1 Prefix Declarations for SPARQL Queries, the values of sh:select must be valid SPARQL 1.1 SELECT queries with a single result variable this."@en;
                 sh:resultPath                 sh:select;
                 sh:resultSeverity             sh:Violation;
                 sh:sourceConstraintComponent  sh:PatternConstraintComponent;
                 sh:sourceShape                [] ;
                 sh:value                      "\n                SELECT DISTINCT ?this\n                WHERE {\n                  ?this rdf:type cim:PowerTransformer   .\n                  FILTER NOT EXISTS {?this ^cim:PowerTransformerEnd.PowerTransformer/cim:TransformerEnd.endNumber 3}.\n                }\n                "
               ];

Many other problems reported for the same shape file also don't point to a specific shape in that file.
E.g. this should blame a specific sh:PropertyShape (all such nodes are named), but it blames something in sh: ?!?

  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  sh:PatternConstraintComponent-flags;
                 sh:resultMessage              "Each property SHALL have a sh:name (in the context of the target where it appears) to provide human-oriented labels. This is the preferred alternative to overwriting rdfs:labels coming from external foreign vocabularies to fit the context better."@en;
                 sh:resultPath                 sh:name;
                 sh:resultSeverity             sh:Warning;
                 sh:sourceConstraintComponent  sh:MinCountConstraintComponent;
                 sh:sourceShape                bpsh:CountNameProperty
               ];

Only sh:NodeShapes are reported correctly, e.g.:

  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  equ:DCGround;
                 sh:resultMessage              "Every NodeShape SHOULD contain rdfs:label and rdfs:comment, and they SHOULD only have one value per language tag."@en;
                 sh:resultSeverity             sh:Warning;
                 sh:sourceConstraintComponent  sh:AndConstraintComponent;
                 sh:sourceShape                bpsh:NodeShapeShape;
                 sh:value                      equ:DCGround
               ];
@costas80
Contributor

Hi @VladimirAlexiev. I'll investigate why the focusNode gets reported as such in these cases and see how we can improve it.

@costas80 added the enhancement label Dec 10, 2024
@VladimirAlexiev
Author

If you look at https://transparency.ontotext.com/app/validations, you will see validation result counts that are very well organized:
by applicability, group, and each row is a NodeShape.
When you click on a count, you see the list of ValidationResults for that NodeShape and applicability (eg country, zone, etc).

We did face the problem that sh:sourceShape often pointed to blank nodes (the list of an sh:and/sh:or, or a blank PropertyShape). We used this query to redirect it to the respective sh:NodeShape (since all our shape metadata is attached there).
"appliesTo" is a custom extension to SHACL, so you should ignore it:

BASE         <https://transparency.ontotext.com/resource/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX tr: <https://transparency.ontotext.com/resource/tr/>
PREFIX rdf4j: <http://rdf4j.org/schema/rdf4j#>
DROP GRAPH <graph/validationResult> ;
INSERT {
    GRAPH <graph/validationResult> {
        ?result a sh:ValidationResult ;
                sh:sourceShape ?finalShape ;
                sh:focusNode ?focus ;
                sh:value ?value ;
                sh:resultMessage ?message ;
                tr:displayArea ?displayArea ;
                tr:countryCode ?cc ;
                tr:appliesTo ?appliesTo ;
                sh:resultSeverity ?severity .
    }
} WHERE {
    ?s a sh:ValidationResult ;
       sh:sourceShape ?shape ;
       sh:focusNode ?focus ;
       sh:resultSeverity ?severity .
    OPTIONAL {
        ?s sh:value ?value # Some things, for example, sh:minCount have no value.
    }
    OPTIONAL {
        ?shape a sh:PropertyShape .
        ?parent sh:property ?shape .
    }
    BIND(COALESCE(?parent, ?shape) as ?finalShape)
    BIND(IRI(CONCAT("validationResult/", replace(str(?finalShape),".*/",""), replace(str(?focus),"https://transparency.ontotext.com/resource/","/"))) as ?result) .
    ?finalShape tr:appliesTo ?areaProp ;
                sh:order ?ord .
    optional {
        filter (?areaProp=tr:countryCode)
        ?focus tr:countryCode ?cc
        bind(if(exists{
                    [] tr:iso2 ?cc
                },?cc,"other") as ?country)
    }
    optional {
        ?focus ?areaProp ?ar.
        ?ar tr:notation ?area
        BIND((?areaProp) as ?specificApplies)
    }
    # We sometimes have two appliesTo at the root shape. In that case, pick only the one which is actually used.
    {
        SELECT (COUNT(distinct ?subApplies) as ?cnt) ?finalShape ?ord WHERE {
            ?finalShape tr:appliesTo ?subApplies ;
                        sh:order ?ord .
            # For some reason only doing this by finalShape does not work. Ord is also consistent and unique.
        }  GROUP BY ?finalShape ?ord
    }
    # If specific applies to is not set, use areaProp.
    BIND(IF(?cnt > 1, ?specificApplies, ?areaProp) as ?appliesTo)
    {
        FILTER(BOUND(?appliesTo)) # For multivalue root ?appliesTo, this will be unset for the value which does not apply to that particular violation.
        bind(coalesce(?country,?area,"none") as ?displayArea)
    }
    OPTIONAL {
        ?s sh:resultMessage ?message
        FILTER EXISTS {
            ?finalShape (sh:message|(sh:sparql/sh:message)) []
        }
    }
}

I can't remember whether we had to recompute focusNode in some cases...
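
The redirection step in the query above, `COALESCE(?parent, ?shape)`, can be sketched outside SPARQL. This is only an illustration, assuming the shapes graph is held as a plain set of (subject, predicate, object) tuples; `final_shape` and the `ex:` names are hypothetical, not part of the original query.

```python
SH = "http://www.w3.org/ns/shacl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def final_shape(shape, triples):
    """Return the parent of a property shape, or the shape itself.

    Mirrors COALESCE(?parent, ?shape): the redirect only happens when the
    shape is explicitly typed sh:PropertyShape and some node points to it
    via sh:property.
    """
    if (shape, RDF_TYPE, SH + "PropertyShape") not in triples:
        return shape
    # The SPARQL query would yield one row per parent; this sketch simply
    # picks the first parent in sorted order to stay deterministic.
    parents = sorted(s for s, p, o in triples
                     if p == SH + "property" and o == shape)
    return parents[0] if parents else shape


# Hypothetical toy data: one node shape with one blank property shape.
shapes = {
    ("ex:PersonShape", RDF_TYPE, SH + "NodeShape"),
    ("ex:PersonShape", SH + "property", "_:b0"),
    ("_:b0", RDF_TYPE, SH + "PropertyShape"),
}

print(final_shape("_:b0", shapes))
print(final_shape("ex:PersonShape", shapes))
```

A result whose sh:sourceShape is the blank node `_:b0` is thus attributed to `ex:PersonShape`, while a result that already points to a node shape is left unchanged.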

@costas80
Contributor

Looking into this further I believe the oddly reported focus node is due to your shapes, not the validator's configuration. The problem comes from importing http://www.w3.org/ns/shacl# (i.e. owl:imports sh: ;):

...
@prefix sh:    <http://www.w3.org/ns/shacl#> .
...
cim:    
...
    owl:imports sh: ;
...

In doing so you are pulling the shapes from the SHACL namespace into the shape graph to be validated. The validator makes no distinction on the origin of shapes and proceeds to validate everything (producing the warnings you see for things in sh:).

If I remove the import, and tweak one of your own shapes from the sample you shared to produce the same kind of issue (I removed the name from equ:BoundaryPoint.toEndNameTso-stringLength), the focus node is reported as you would expect:

  sh:result    [ rdf:type                      sh:ValidationResult;
                 sh:focusNode                  equ:BoundaryPoint.toEndNameTso-stringLength;
                 sh:resultMessage              "Each property SHALL have a sh:name (in the context of the target where it appears) to provide human-oriented labels. This is the preferred alternative to overwriting rdfs:labels coming from external foreign vocabularies to fit the context better."@en;
                 sh:resultPath                 sh:name;
                 sh:resultSeverity             sh:Warning;
                 sh:sourceConstraintComponent  sh:MinCountConstraintComponent;
                 sh:sourceShape                bpsh:CountNameProperty
               ];

In my opinion, the only reason why you would want to do an owl:imports sh: ; is if the graphs you are planning on validating will themselves include SHACL shapes (which I doubt). With owl:imports sh: ; included, the validator is correctly reporting the issues it finds for the shapes under the SHACL namespace.
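
The effect described here can be reproduced with a toy model. This is a rough sketch, not the validator's implementation: graphs are plain sets of tuples, `registry` stands in for dereferencing an imported IRI, and the "property shape without sh:name" check is a simplified stand-in for bpsh:CountNameProperty. All function names and the `ex:` data are hypothetical, and the imported node is typed directly as sh:PropertyShape for brevity.

```python
SH = "http://www.w3.org/ns/shacl#"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SH_NAME = SH + "name"


def with_imports(graph, registry, load_imports):
    """Return the graph, merged with its transitive owl:imports closure
    when load_imports is true."""
    merged = set(graph)
    if not load_imports:
        return merged
    seen = set()
    pending = [o for s, p, o in graph if p == OWL_IMPORTS]
    while pending:
        iri = pending.pop()
        if iri in seen:
            continue
        seen.add(iri)
        imported = registry.get(iri, set())
        merged |= imported
        pending.extend(o for s, p, o in imported if p == OWL_IMPORTS)
    return merged


def property_shapes_without_name(graph):
    """Focus nodes that a 'every property shape needs sh:name' rule would flag."""
    typed = {s for s, p, o in graph
             if p == RDF_TYPE and o == SH + "PropertyShape"}
    named = {s for s, p, o in graph if p == SH_NAME}
    return sorted(typed - named)


# Hypothetical shapes file that imports the SHACL namespace.
my_shapes = {
    ("ex:shapes", OWL_IMPORTS, SH),
    ("ex:Thing-label", RDF_TYPE, SH + "PropertyShape"),
    ("ex:Thing-label", SH_NAME, "label"),
}
# Simplified stand-in for what the SHACL namespace document contributes.
registry = {
    SH: {(SH + "PatternConstraintComponent-flags", RDF_TYPE, SH + "PropertyShape")},
}

print(property_shapes_without_name(with_imports(my_shapes, registry, False)))
print(property_shapes_without_name(with_imports(my_shapes, registry, True)))
```

With imports off, only the user's own nodes are candidates for focus nodes. With imports on, nodes from the sh: namespace become part of the validated graph and are reported just like any other.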

From my point of view there is nothing to correct in the validator regarding this. @VladimirAlexiev, would you please consider what I described above and confirm my assessment?

@costas80 added the invalid label and removed the enhancement label Jan 29, 2025
@costas80
Contributor

Hi @VladimirAlexiev - can we close this issue (check my previous comment)?

@VladimirAlexiev
Author

VladimirAlexiev commented Feb 13, 2025

Your diagnosis about sh:focusNode sh:PatternConstraintComponent-flags is correct.
But how about the second example, and have you checked the other problems reported for the same shape file that also don't point to a specific shape in that file?

> In my opinion, the only reason why you would want to do an owl:imports sh: ..

@costas80
Contributor

Thanks for the follow-up @VladimirAlexiev.

> But how about the second example, and have you checked the other problems reported for the same shape file that also don't point to a specific shape in that file?

Checking again the example you had provided I confirm that all reported findings point to specific shapes from the input. As discussed, most of these are due to the owl:imports sh: and refer to shapes defined there, but the shapes are correctly referenced. Do you have a specific example that shows a reported issue with no reference to an existing shape?

> But I also think the ITB validator should not execute owl:import in the data graph?
> If someone wants some ontologies to be validated, they should pass them to the processor, not rely on import links.
> Posted w3c/data-shapes#241

I'm not sure I agree with this. If you import external shapes in your own shapes then you would want to apply quality control to the full aggregated set. Consider this from the perspective of the end user of your own shapes. If some shape is misconfigured (e.g. it misses a customised message), resulting in unclear messages, the user will not (and should not) care where the relevant shape was defined.

Our RDF validator software foresees configuration properties determining whether imports are loaded (by default not), and whether the validator's user can choose to override this or not (see our configuration guide if you're interested). In the case of the SHACL shape validator we are talking about, the loading of imports has been explicitly set to true based on feedback we had from other users (matching the reasoning I shared in the previous paragraph). However, I would argue that flexibility is best, so I just updated the validator's configuration to let users choose whether to load data imports or not (the default being yes):

...
# Load imports by default.
validator.loadImports = true
# Allow the user to choose whether to load imports from the input.
validator.input.loadImports = optional
...
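
How the two properties above might combine can be sketched as follows. This is a guess at the semantics, not the validator's code: only the value `optional` appears in this thread, so the `none` and `required` modes, the function name, and the parameter names are assumptions made for illustration.

```python
def resolve_load_imports(default, input_mode, user_choice=None):
    """Decide whether imports are loaded for one validation run.

    default     -- value of validator.loadImports
    input_mode  -- value of validator.input.loadImports
                   ("optional" is from the thread; "none" and "required"
                   are assumed here)
    user_choice -- True/False if the user made a choice, else None
    """
    if input_mode == "none":
        # User cannot override: the configured default always applies.
        return default
    if input_mode == "optional":
        # User may override; fall back to the default otherwise.
        return default if user_choice is None else user_choice
    if input_mode == "required":
        if user_choice is None:
            raise ValueError("a load-imports choice must be provided")
        return user_choice
    raise ValueError(f"unknown input mode: {input_mode!r}")


# Configuration from the snippet above: default on, user may switch it off.
print(resolve_load_imports(True, "optional"))
print(resolve_load_imports(True, "optional", user_choice=False))
```

Under these assumptions, the configuration shown keeps the previous behaviour unless the user explicitly opts out.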

If you check the validator again you will now see that you can choose whether data graph imports will be processed or not:

Image

Note that you can still load additional ontologies by providing them explicitly as additional "RDF resources":

Image

I trust that this update better suits your needs, no?

@costas80 added the enhancement and question labels and removed the invalid label Feb 13, 2025
@VladimirAlexiev changed the title from "provide useful sh:focusNode in ValidationResults" to "by default do not owl:import in the Data Graph" Feb 16, 2025
@VladimirAlexiev
Author

> If you import external shapes in your own shapes then you would want to apply quality control to the full aggregated set.

Hi @costas80 !

> Consider this from the perspective of the end user of your own shapes. If some shape is mis-configured ...

Unlike a compiler error in a programming language, a SHACL file may have lots of warnings or even errors, and still work perfectly well.
An example is the bug w3c/data-shapes#208: you report it as an error, but it doesn't cause any bad effect on validation.

Conversely, when treating SHACL as data: I want to focus on problems in my shapes, not in someone else's shapes.

Thanks for being so patient with minor nitpickings! Cheers!

@costas80
Contributor

Hi @VladimirAlexiev.

> Would you agree to uncheck it by default for the SHACL-SHACL checker?

The validator's configuration is now updated to not load imports by default. Note that besides the UI, the same applies when using the validator via REST and SOAP APIs. I trust other users of the validator will not have an issue with this change, given that the option to include the imports remains clearly available.

> Thanks for being so patient with minor nitpickings!

No problem at all! It's good to have such discussions to see how to best tweak the validator so that it is as useful as possible.

I trust that with the latest configuration update we can consider this issue closed. Please have a look @VladimirAlexiev, and if you agree you can mark the issue as closed.
