# Validation
A schema can be provided in two formats: ShExC, the language described in the ShEx primer, or ShExJ, a JSON-based format for shape expressions. Each format has a dedicated module with a `decode` function that parses a ShEx schema from a string in the respective language.
```elixir
{:ok, schema} =
  ShEx.ShExC.decode("""
  PREFIX ex: <http://ex.example/#>
  PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
  PREFIX school: <http://school.example/#>
  PREFIX foaf: <http://xmlns.com/foaf/0.1/>

  school:enrolleeAge xsd:integer MinInclusive 13 MaxInclusive 20

  school:Enrollee {
    foaf:age @school:enrolleeAge ;
    ex:hasGuardian IRI {1,2}
  }
  """)
```
Both modules also provide a bang variant, `decode!`, which returns the schema directly (not wrapped in an ok tuple) and raises in error cases.
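For comparison, the same schema could be written in ShExJ and decoded with the ShExJ module. The following is a minimal sketch: the JSON is a hand-written rendering of the ShExC schema above, the `SchoolSchema` module is hypothetical, and it assumes that `ShEx.ShExJ.decode` takes the JSON string as its only required argument and reports failures as an `{:error, reason}` tuple.

```elixir
defmodule SchoolSchema do
  # Hand-written ShExJ rendering of the school:Enrollee schema (illustrative).
  @shexj """
  {
    "type": "Schema",
    "shapes": [
      {
        "id": "http://school.example/#enrolleeAge",
        "type": "NodeConstraint",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
        "mininclusive": 13,
        "maxinclusive": 20
      },
      {
        "id": "http://school.example/#Enrollee",
        "type": "Shape",
        "expression": {
          "type": "EachOf",
          "expressions": [
            {
              "type": "TripleConstraint",
              "predicate": "http://xmlns.com/foaf/0.1/age",
              "valueExpr": "http://school.example/#enrolleeAge"
            },
            {
              "type": "TripleConstraint",
              "predicate": "http://ex.example/#hasGuardian",
              "valueExpr": {"type": "NodeConstraint", "nodeKind": "iri"},
              "min": 1,
              "max": 2
            }
          ]
        }
      }
    ]
  }
  """

  # Returns {:ok, schema} or an {:error, message} tuple with a readable message.
  def load do
    case ShEx.ShExJ.decode(@shexj) do
      {:ok, schema} ->
        {:ok, schema}

      # Assumption: decoding failures come back as an error tuple.
      {:error, reason} ->
        {:error, "invalid ShExJ schema: #{inspect(reason)}"}
    end
  end
end
```

Matching on the tuple keeps a malformed schema from crashing the caller, whereas `decode!` is more convenient when the schema is a trusted constant.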
RDF data can now be validated against such a schema by passing the data, the schema and a ShapeMap to the `ShEx.validate/3` function. Instead of a ShapeMap, you can also pass any data structure from which a ShapeMap can be constructed.
```elixir
# The ~I sigil for IRIs comes from RDF.Sigils
import RDF.Sigils

result_shape_map =
  ShEx.validate(
    # the RDF data to validate
    RDF.Turtle.read_string!("""
    PREFIX ex: <http://ex.example/#>
    PREFIX inst: <http://example.com/users/>
    PREFIX school: <http://school.example/#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    inst:Alice foaf:age 13 ;
      ex:hasGuardian inst:Person2, inst:Person3 .

    inst:Bob foaf:age 15 ;
      ex:hasGuardian inst:Person4 .

    inst:Claire foaf:age 12 ;
      ex:hasGuardian inst:Person5 .

    inst:Don foaf:age 14 .
    """),
    # the schema
    ShEx.ShExC.decode!("""
    PREFIX ex: <http://ex.example/#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX school: <http://school.example/#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>

    school:enrolleeAge xsd:integer MinInclusive 13 MaxInclusive 20

    school:Enrollee {
      foaf:age @school:enrolleeAge ;
      ex:hasGuardian IRI {1,2}
    }
    """),
    # the ShapeMap: which node should conform to which shape
    %{
      ~I<http://example.com/users/Alice> => ~I<http://school.example/#Enrollee>,
      ~I<http://example.com/users/Bob> => ~I<http://school.example/#Enrollee>,
      ~I<http://example.com/users/Claire> => ~I<http://school.example/#Enrollee>,
      ~I<http://example.com/users/Don> => ~I<http://school.example/#Enrollee>
    }
  )
```
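Since a plain map of node IRIs to shape IRIs is accepted in place of a ShapeMap, such a map can also be built programmatically. This is a small sketch under assumptions: the `EnrolleeShapeMap` module and its `build/1` function are hypothetical helpers, and the base IRIs are the ones from the example above.

```elixir
defmodule EnrolleeShapeMap do
  # Hypothetical helper, not part of ShEx.ex.
  @users "http://example.com/users/"
  @shape "http://school.example/#Enrollee"

  # Maps every user name to the school:Enrollee shape.
  # RDF.iri/1 builds the same RDF.IRI structs as the ~I sigil.
  def build(names) do
    Map.new(names, fn name -> {RDF.iri(@users <> name), RDF.iri(@shape)} end)
  end
end
```

The resulting map can then be passed as the third argument, for example `ShEx.validate(data, schema, EnrolleeShapeMap.build(~w[Alice Bob Claire Don]))`.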
The result of the validation is a result ShapeMap whose associations now have the value `:conformant` or `:nonconformant` in their `status` field. For example:
```elixir
for association <- result_shape_map do
  IO.puts("#{inspect association.node} is #{association.status}")
end
```
will output
```
~I<http://example.com/users/Alice> is conformant
~I<http://example.com/users/Bob> is conformant
~I<http://example.com/users/Claire> is nonconformant
~I<http://example.com/users/Don> is nonconformant
```
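Because the result can be enumerated and every association carries a `node` and a `status`, the standard `Enum` functions are enough to summarise it. A minimal sketch, with `ValidationSummary` being a hypothetical helper module:

```elixir
defmodule ValidationSummary do
  # Hypothetical helper, not part of ShEx.ex.
  # Groups the nodes of the given associations by their validation status.
  def nodes_by_status(associations) do
    Enum.group_by(associations, & &1.status, & &1.node)
  end
end
```

Called as `ValidationSummary.nodes_by_status(result_shape_map)`, this would return a map with the keys `:conformant` and `:nonconformant`, each holding the list of nodes with that status.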
The `reason` field of an association contains a list of `ShEx.Violation` structures with details on why the node is nonconformant. The fields of these structures depend on the type of violation. You can get a string representation of any type of violation with the `ShEx.Violation.message/1` function.
If you want to output the failures of a result, you don't have to filter for the nonconformant associations yourself: a result ShapeMap partitions its associations into the fields `conformant` and `nonconformant`, which you can access directly.
```elixir
for association <- result_shape_map.nonconformant do
  IO.puts """
  #{inspect association.node} is not valid because: #{
    association.reason
    |> Enum.map(&ShEx.Violation.message/1)
    |> Enum.join("\n")
  }
  """
end
```
This will output:
```
~I<http://example.com/users/Claire> is not valid because:
- matched none of at least 1 ~I<http://xmlns.com/foaf/0.1/age> triples
- %RDF.Literal{value: 12, datatype: ~I<http://www.w3.org/2001/XMLSchema#integer>} is less than 13.0
~I<http://example.com/users/Don> is not valid because:
- matched none of at least 1 ~I<http://ex.example/#hasGuardian> triples
```
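When the failures should be handled by code rather than printed, the same fields can be folded into a return value. The sketch below is one possible shape for this: `ValidationReport` and `check/1` are hypothetical names, and it assumes that `nonconformant` holds a list of associations (wrapped with `List.wrap/1` in case it is `nil` when nothing failed).

```elixir
defmodule ValidationReport do
  # Hypothetical helper, not part of ShEx.ex.
  # Returns :ok when every node conforms, otherwise {:error, failures}, where
  # failures maps each nonconformant node to its list of violation messages.
  def check(result_shape_map) do
    case List.wrap(result_shape_map.nonconformant) do
      [] ->
        :ok

      associations ->
        failures =
          Map.new(associations, fn association ->
            {association.node, Enum.map(association.reason, &ShEx.Violation.message/1)}
          end)

        {:error, failures}
    end
  end
end
```

A caller can then branch on the result with an ordinary `case ValidationReport.check(result_shape_map) do ... end`.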
# Parallelization
The nodes in a ShapeMap can also be validated in parallel by passing the option `parallel: true`.
```elixir
result = ShEx.validate(data, schema, shape_map, parallel: true)
```
Under the hood, the work of processing the nodes is distributed in batches over your CPUs with the Flow library. Since Flow adds a little overhead when only a few nodes are validated, the `parallel` flag, if you don't provide the option explicitly, is set to `true` only for query ShapeMaps (as these usually produce more ShapeMap associations) and for fixed ShapeMaps with more than 10 ShapeMap associations.
You can, however, turn off this automatic setting of the `parallel` flag with the `parallel` application configuration field:
```elixir
config :shex,
  parallel: true
```
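The decision rule described above can be modelled in a few lines. This is only an illustration of the documented behaviour, not the library's actual code: the `ParallelHeuristic` module, the `:query` and `:fixed` atoms and the argument order are all made up for the sketch, and it assumes that an explicit option takes precedence over the application configuration.

```elixir
defmodule ParallelHeuristic do
  # Illustrative model only, not part of ShEx.ex.
  @auto_threshold 10

  # shape_map_type:    :query or :fixed (hypothetical atoms)
  # association_count: number of associations in the ShapeMap
  # opts:              options as given to the validate call
  # app_config:        value of the `parallel` config field, nil when unset
  def parallel?(shape_map_type, association_count, opts \\ [], app_config \\ nil) do
    cond do
      # an explicitly passed option always wins
      Keyword.has_key?(opts, :parallel) -> Keyword.fetch!(opts, :parallel)
      # a configured value turns off the automatic choice
      is_boolean(app_config) -> app_config
      # query ShapeMaps usually produce many associations
      shape_map_type == :query -> true
      # fixed ShapeMaps only pay off with more than 10 associations
      true -> association_count > @auto_threshold
    end
  end
end
```

For example, `ParallelHeuristic.parallel?(:fixed, 10)` is `false`, while `ParallelHeuristic.parallel?(:fixed, 11)` is `true`.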
ShEx.ex automatically uses sane defaults for the Flow configuration, but you can still try to tweak them yourself. The options given to `ShEx.validate/4` are passed through to `Flow.from_enumerable/2`. You can also configure them globally:
```elixir
config :shex,
  flow_opts: [
    max_demand: 20,
    min_demand: 10,
    stages: 8
  ]
```
These default options are used whenever `parallel` is set to `true` and no Flow option is provided on a `ShEx.validate/4` call.
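To tune a single call instead of the global defaults, the Flow options can be given alongside `parallel: true`. The sketch below assumes that they are passed flat in the option list, as the pass-through to `Flow.from_enumerable/2` suggests; `TunedValidation` is a hypothetical wrapper and the values are placeholders, not recommendations.

```elixir
defmodule TunedValidation do
  # Hypothetical wrapper, not part of ShEx.ex.
  # Validates in parallel with per-call Flow options instead of the
  # globally configured flow_opts.
  def validate(data, schema, shape_map) do
    ShEx.validate(data, schema, shape_map,
      parallel: true,
      max_demand: 20,
      min_demand: 10,
      # one stage per available scheduler
      stages: System.schedulers_online()
    )
  end
end
```

Measuring with your own data before and after such a change is the only reliable way to tell whether a different setting helps.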
You're invited to share your experience or thoughts on the forum, in a GitHub issue or in a PR.