I think we should use TOML as the base format: it's (almost) the JSON data model we all know and love, but it also allows comments. It'd be good to browse the TOML homepage before reading on.
```toml
# I'm a schema!
# And these are comments!

# We definitely should version the schema format.
# Not sure what a good name for this is that won't get
# confused with meaning "the version of *this* schema"
formatVersion = 1

# Schemas have a top-level .import field, which is an
# array of { url, version } objects.
# In TOML, you can list them like this (note the double brackets):
[[import]]
url = "http://r1.underlay.org/schemas/baylor/snap"
version = "4.2.0"

[[import]]
url = "http://r1.underlay.org/schemas/baylor/crackle"
version = "0.3.1"

[[import]]
url = "http://r1.underlay.org/schemas/emerson/pop"
version = "45.0.0"
```
Versions in this TOML format are just semver strings. Every time a version of a collection gets published, its schema imports get resolved and compiled into one big flat schema in an unreadable RDF format, and that’s what gets hashed, similar to package-lock.json. This deserves its own discussion somewhere else, but the point is that the TOML format is purely human-readable and human-editable.
All that importing does is let you reference the imported types as the values of properties, which we’ll see later. You can’t “extend” types.
If a type in a schema is defined with the same label as an imported type, the imported one is just ignored. Similarly, the imports themselves overwrite each other in order if there are conflicts (it's important that `.import` is an array). But since everything will be namespaced, collisions should never really happen. Speaking of which:
```toml
# Schemas also have a required top-level namespace string.
# This has to be a URI that ends in "/" or "#"
namespace = "http://foo.com/bar/"
```
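As a sketch of the import rules from above - later imports overwrite earlier ones on label conflicts, and locally defined types shadow imported ones. This is purely illustrative; the function and variable names are all hypothetical:

```python
# Hypothetical sketch of import resolution: imports overwrite each
# other in .import array order, and local type definitions win.

def merge_types(imported_schemas, local_types):
    """imported_schemas: a list of {label: definition} dicts, in
    .import order. local_types: the schema's own .types table."""
    merged = {}
    for schema in imported_schemas:
        merged.update(schema)   # later imports overwrite earlier ones
    merged.update(local_types)  # local definitions shadow imports
    return merged

snap = {"Person": "snap's Person", "City": "snap's City"}
crackle = {"Person": "crackle's Person"}
local = {"City": "our own City"}
print(merge_types([snap, crackle], local))
# {'Person': "crackle's Person", 'City': 'our own City'}
```

Because everything is namespaced, this overwrite path should rarely trigger in practice.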
Great. Now on to types, which live in a top-level `.types` object:
```toml
# Types have zero or more properties.
[types.Skyscraper]
# This one has zero.
```
There are two kinds of properties that types can have: literal properties and reference properties (still thinking about names for these, lmk wyt).
Literals are one of `string`, `integer`, `double`, `boolean`, `dateTime`, and `date` (ie the xsd datatypes that I think are the most common).
References point to another type.
Every property has an associated cardinality, which is either `required`, `optional`, or `any`. `any` means that there can be any number of values (zero or more). Values are not ordered. `required` is the default cardinality if not specified.
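A quick sketch of what these cardinality rules mean for a property's set of values (a hypothetical helper, not part of the spec):

```python
# Hypothetical validity check for one property's values under each
# cardinality. Values are modeled as an (unordered) list.
def check_cardinality(cardinality, values):
    if cardinality == "required":  # exactly one value
        return len(values) == 1
    if cardinality == "optional":  # zero or one value
        return len(values) <= 1
    if cardinality == "any":       # zero or more values
        return True
    raise ValueError(f"unknown cardinality: {cardinality!r}")
```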
As a special shortcut, you can define literal properties by just saying:
```toml
[types.Person]
name = "string"
age = "integer"
```
But you can only do this for literals, and the implied cardinality is `required`. In general, properties are defined like this:
```toml
[types.Person]

[types.Person.name]
type = "string"
cardinality = "any"

[types.Person.age]
type = "integer"
cardinality = "optional"

[types.Person.knows]
reference = "Person"
cardinality = "any"

# This is NOT VALID!
# properties have to either be literals or references
[types.Person.baz]
type = "integer"
reference = "Person"
```
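One way to think about the shorthand is as sugar that normalizes to the full table form, with the default `required` cardinality filled in. A sketch (this normalization step is my own illustration, not spec'd anywhere):

```python
# Hypothetical normalization of a property definition: a bare string
# is the literal shorthand; a table gets the default cardinality.
LITERALS = {"string", "integer", "double", "boolean", "dateTime", "date"}

def expand_property(value):
    if isinstance(value, str):  # shorthand form, literals only
        if value not in LITERALS:
            raise ValueError(f"shorthand only works for literals: {value}")
        return {"type": value, "cardinality": "required"}
    return {**value, "cardinality": value.get("cardinality", "required")}

print(expand_property("string"))
# {'type': 'string', 'cardinality': 'required'}
```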
If we wanted to reference an imported type, we’d have to use its full URI, like this:
```toml
[types.Person.hometown]
reference = "http://r1.underlay.org/schemas/common/City"
cardinality = "optional"
```
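Resolving a `reference` value could then be as simple as: anything that looks like a full URI is used as-is, and a bare label joins onto the schema's own namespace. Again, a sketch - the "looks like a URI" test here is just a stand-in:

```python
# Hypothetical reference resolution: full URIs pass through, bare
# labels resolve against the schema's namespace.
def resolve_reference(ref, namespace):
    if "://" in ref:  # crude "is this already a full URI?" check
        return ref
    return namespace + ref
```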
Some of these things could use better names - in particular, I don't feel great about `reference =`, and I don't feel great about `types`.
Also, we could easily add "JSON" as a datatype, and maybe we should, but there are some potential downsides. It'd be a good escape hatch, but it wouldn't be good if people just used it for things that could be properly typed.
But wait!? What about provenance!?
Instead of trying to define two separate data- and prov-level schemas etc etc, a simpler approach would just be this: collection.toml has a `.schemas` array and a `.provenance` key. Here's what I mean:

- Collections specify an array of schemas (ie implicitly importing them all). They do this with the exact same `{url: string; version: string}[]` format.
- Collections also have a top-level `.provenance: string` property (or some other name like `.meta` or `.graph`). The value of that property is a URI that has to be one of the labels imported in one of the schemas.
- Assertions in a collection validate when:
  - The contents of all the named graphs validate against the imported schemas
  - The named graph labels appear in the default graph as instances of the type indicated by the `.provenance` key. There could be other things in the default graph as necessary.
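Those two validation conditions could be sketched like this, with graphs modeled as plain dicts (everything here is a hypothetical stand-in for real RDF structures):

```python
# Hypothetical check of the two conditions above:
#  1. every named graph validates against the imported schemas
#  2. every graph label is typed in the default graph as the
#     collection's .provenance type
def assertion_is_valid(named_graphs, default_graph_types,
                       provenance_type, validates):
    """named_graphs: {label: graph}.
    default_graph_types: {label: rdf_type} from the default graph.
    validates(graph): checks a graph against the imported schemas."""
    for label, graph in named_graphs.items():
        if not validates(graph):
            return False
        if default_graph_types.get(label) != provenance_type:
            return False
    return True
```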
So to make all this concrete, suppose we have this schema:

```toml
namespace = "http://r1.underlay.org/common/"

[types.Person]
name = "string"

[types.Person.knows]
reference = "Person"
cardinality = "any"
```
and somewhere else we have this schema:

```toml
namespace = "http://www.w3.org/ns/prov#"

[types.Entity]

[types.Entity.name]
type = "string"
cardinality = "optional"

[types.Derivation]

[types.Derivation.subject]
reference = "Entity"

[types.Derivation.entity]
reference = "Entity"

[types.Derivation.comment]
type = "string"
cardinality = "optional"
```
which depicts a simple PROV model where entities, which have names, are derived (with comments) from other entities.
Then, a collection.toml would start with something like this:
```toml
# NB: in TOML, top-level keys have to come before any [[schema]]
# tables, otherwise the key would attach to the last table.
provenance = "http://www.w3.org/ns/prov#Entity"

[[schema]]
url = "http://r1.underlay.org/common"
version = "1.4.3"

[[schema]]
url = "http://r1.underlay.org/prov"
version = "1.0.0"
```
(Note that the "import URL" doesn't necessarily correspond at all to the URI labels defined in the schema that you end up importing. It's just a directive telling the compiler where to look.)
Okay, so what does an assertion in this collection look like? Well, it has some named graphs with data in them.
```
PREFIX common = "http://r1.underlay.org/common/"

_:b0 rdf:type common:Person _:g1 .
_:b0 common:Person/name "Joel" _:g1 .
_:b1 rdf:type common:Person _:g1 .
_:b1 common:Person/name "Travis" _:g1 .
_:b2 rdf:type common:Person/knows _:g1 .
_:b2 ul:source _:b0 _:g1 .
_:b2 ul:target _:b1 _:g1 .
```
(This is all in the named graph `_:g1`. Also note the slash in `common:Person/name` - the "dots" in `Person.name` in the schema are path elements in the implied URI.)
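So constructing a predicate URI from a schema is mechanical - the namespace, then the property path with dots turned into slashes. A sketch (hypothetical helper):

```python
# Hypothetical construction of the implied predicate URI: the "dots"
# in types.Person.name become path segments after the namespace.
def property_uri(namespace, type_label, prop):
    return f"{namespace}{type_label}/{prop}"

print(property_uri("http://r1.underlay.org/common/", "Person", "name"))
# http://r1.underlay.org/common/Person/name
```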
Notice that - woah! - cardinality-`any` properties like `knows` get reified with their own blank node, with `source` and `target` predicates. Under the hood, cardinality-`any` properties are really just a shorthand way of defining another type with required `source` and `target` properties. This is a very good thing to do and sets us up well for extending the data model (e.g. with edge properties) in the future.
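The compile-down for `knows` could be sketched like this - a cardinality-`any` property becomes its own type whose instances carry required `source` and `target` references (the dict shapes here are my own, purely illustrative):

```python
# Hypothetical expansion of a cardinality-"any" property into its own
# type with required source and target properties.
def reify_any_property(type_label, prop, definition):
    return {
        f"{type_label}/{prop}": {
            "source": {"reference": type_label,
                       "cardinality": "required"},
            "target": {"reference": definition["reference"],
                       "cardinality": "required"},
        }
    }
```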
Anyway, what do we put in the default graph to make this a valid assertion?
… Well, we have to make `_:g1` a valid instance of `prov:Entity`.
This means adding an `rdf:type` for it, and a value for at least all of its properties.
```
_:g1 rdf:type prov:Entity .
_:g1 prov:comment _:b3 .
_:b3 ul:none _:b4 .
```
… hmmm, what's going on here? Well, cardinality-`optional` properties "compile" down to cardinality-`required` properties under the hood, just like cardinality-`any` properties did. In this case, the value that every entity is required to have is "either a comment, or nothing" - every entity has to have one of those values! The value of an "or" type like that is a single blank node with one outgoing predicate. Which predicate it is tells you what type to expect at the other end (where the "nothing" at the other end is represented as a dangling blank node `_:b4`). These are known as "discriminated unions" or "tagged nulls" and all sorts of other names.
So if we actually did have a comment for this entity, we’d write something like:
```
_:g1 rdf:type prov:Entity .
_:g1 prov:comment _:b3 .
_:b3 ul:some "This is a graph that I found on the street" .
```
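In other words, an optional value always compiles to exactly one tagged blank node - either a `some` edge to the value or a `none` edge to a dangling node. A sketch of the round trip, modeling the tagged node as a plain dict (names hypothetical):

```python
# Hypothetical tagged-null round trip: an optional value compiles to a
# single node with exactly one outgoing predicate, "some" or "none".
def encode_optional(value):
    return {"none": None} if value is None else {"some": value}

def decode_optional(tagged):
    (tag, payload), = tagged.items()  # exactly one outgoing predicate
    return payload if tag == "some" else None
```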
Again, this sets us up *really* well for a more expressive data model in the future. And I think and hope that very few people (basically just us, aka Underlay developers) will ever have to actually touch the RDF representation like this.
We could tell a bigger story in the default graph too, if we wanted! I was going to write out a bigger example using a `Derivation` from the prov schema, but I feel like I've gotten the point across, and it may be out of scope for this post.
The gist w/r/t provenance is that we make collections declare what type their named graphs are going to be, which could be dead-simple (no properties) or complicated (entities/prov/etc). I expect that most prov won’t be that complicated, and that the 90% case will be to use a type like the example I gave that just has an optional string comment field.
Lastly, here’s a quick JSON-LD representation of the example assertion:
```json
{
  "@context": {
    "ul": "http://underlay.org/ns/",
    "common": "http://r1.underlay.org/schemas/common/",
    "prov": "http://www.w3.org/ns/prov#"
  },
  "@type": "prov:Entity",
  "prov:comment": {
    "ul:some": "This is a graph that I found on the street"
  },
  "@graph": [
    {
      "@id": "_:joel",
      "@type": "common:Person",
      "common:Person/name": "Joel"
    },
    {
      "@id": "_:travis",
      "@type": "common:Person",
      "common:Person/name": "Travis"
    },
    {
      "@type": "common:Person/knows",
      "ul:source": { "@id": "_:joel" },
      "ul:target": { "@id": "_:travis" }
    }
  ]
}
```
The only predicates that we need to reserve in the `ul:` namespace for this preliminary data model are `source`, `target`, `some`, and `none`. The two pairs have the same number of characters, which is a sign from God that we're on the right track.