Tasl feedback

joeltg · December 15, 2020, 7:23pm

Before reading this, please go through the schema documentation again from start to finish - some existing pages have been partially or totally re-written and many new ones have been added.

As the name implies, tasl is intended to be a minimal representation of algebraic data types, using RDF literals and URIs as primitives. On top of that, in places where it can be done consistently and unobtrusively, tasl has some syntactic sugar for common structural patterns.

I can expand on rationale if people are interested, but my general feeling is that I’m committed to these parts:

the overall algebraic data model
using URIs for naming classes, components, and options
using {} for products and [] for coproducts
calling coproducts “coproducts” and not “unions” or anything else. “union” in particular is misleading since they’re technically discriminated unions (which are different in very significant ways!)
requiring that every URI used in the schema is from a namespace that is declared at the top. in other words, no “raw” or inline URIs, and you have to declare prefixes for the namespaces you use.
no option for setting a “default” or empty-prefix namespace
URIs aren’t quoted; you just write e.g. ex:Person directly. this is a major, forcing design decision w/r/t other syntax elements
using <> for the URI type and <ex:jdkflsjk> for literals. this matches the informal design language i’ve been developing for graph visualizations, where URIs are diamonds and literals are corner-cut rectangles
having a concept of a “type variable” that associates types with local (non-class, non-exported) identifiers that you can re-use in later type expressions
defining global type variables for xsd datatypes like string, integer, etc. This feels like the appropriate degree of defaultishness
the optional operator ? (it comes before the type, not after)
using * for references
using # for comments
not supporting multi-line comments
the name “tasl”
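
To illustrate the point above about coproducts versus unions, here is a minimal Python sketch (not tasl, and not anything from the implementation). The `Option` class and the `ex:a` / `ex:b` keys are made up for the example. A plain union of two identical sets collapses them, while a coproduct keeps every value tagged with the option it came from.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Option:
    """One value of a coproduct: an option key (the tag) plus its payload."""
    key: str
    value: Any


# Two "sides" that happen to contain the same integers.
left = {1, 2, 3}
right = {1, 2, 3}

# Plain set union: the overlapping values merge, so the origin is lost.
plain_union = left | right

# Coproduct (discriminated union): each value remembers its option key,
# so the two sides stay distinct even though the payloads are equal.
coproduct = {Option("ex:a", n) for n in left} | {Option("ex:b", n) for n in right}

print(len(plain_union))  # 3
print(len(coproduct))    # 6
```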

here are some medium-strongly-held opinions:

not capitalizing tasl, ever
having the global variables unit and uri

and here are the things that I’m not really satisfied with or still questioning:

1. unit type syntax

Should the unit type have its own symbol (like !), or should we write it as the empty product object {}? Typically the unit type feels like its own, different kind of type. But mathematically speaking they’re identical. It’s weird to have two different syntactic ways of writing the same thing. It might not matter that much if people just end up using a unit global variable all the time. I’m inclined to switch to {}.
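
A quick way to see the "mathematically identical" point, sketched in Python under the assumption that we model a product type as the set of tuples over its components' (finite) value sets. The `inhabitants` helper is hypothetical. A product with zero components has exactly one value, the empty tuple, which is what a unit type is.

```python
from itertools import product


def inhabitants(component_value_sets):
    """List every value of a product type, given each component's possible values."""
    return list(product(*component_value_sets))


booleans = [False, True]

# A product of two booleans has 2 * 2 = 4 values.
print(inhabitants([booleans, booleans]))

# The empty product has exactly one value: the empty tuple.
print(inhabitants([]))  # [()]
```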

2. syntax for components and options

>- is pretty weird. I like the way it looks with the ligature font but I still haven’t gotten used to typing it.

I had originally wanted to use a reverse arrow <- for options, but that doesn’t play well with auto-matching brackets, which most IDEs do by default for unrecognized file types and which I’d like to enable anyway for the sake of URIs and literals (every time you type < for an option you’d accumulate an extra > that you’re not going to use).

We can’t use colons because the URIs will all have colons in them

type foo {
  ex:bar: integer
}

I still think using -> for both products and coproducts doesn’t do enough to visually distinguish them:

type foo {
  ex:bar -> integer;
  ex:jfklfjskl -> string;
  ex:jfklfjskl -> string;
  ex:jfklfjskl -> [
    ex:ajklsa -> integer;
    ex:ajklsa -> integer;
    ex:ajklsa -> integer;
    ex:ajklsa -> integer;
    ex:ajklsa -> integer;
  ];
  ex:jfklfjskl -> string;
  ex:jfklfjskl -> string;
}

… here the arrows attract all the attention, and the crucial info is hidden at the top and bottom. This isn’t a design problem that most languages have because most languages just have one kind of syntactic map (and they usually get to use colons for it). e.g. in JavaScript the context that “this big block is an object” is almost always implicit and/or you can tell just by noticing the colons; there’s nothing you have to distinguish it from.

no spacer?

One totally different way to go is to not have a spacing token at all:

type foo {
  ex:bar integer;
  ex:jfklfjskl string;
  ex:jfklfjskl string;
  ex:jfklfjskl [
    ex:ajklsa integer;
    ex:ajklsa integer;
    ex:ajklsa integer;
    ex:ajklsa integer;
    ex:ajklsa integer;
  ];
  ex:jfklfjskl string;
  ex:jfklfjskl string;
}

this doesn’t cause any technical syntactic problems with parsing, but it doesn’t really help either. plus, property names can vary in length a lot

type foo {
  ex:bar dateTime;
  ex:someLongerPropertyName ? boolean;
}

… here it feels like the long property name makes it harder to see what’s going on with the one above it. Go doesn’t use colons in any of its struct or type declaration syntax, but it only works well and looks nice since they have a canonical formatter for every major IDE that auto-inserts spaces:

type foo struct {
  bar                    byte
  someLongerPropertyName bool
}

without that, I do feel like we should have some kind of spacer token, and arrows were the most natural candidate.

type foo {
  ex:bar -> dateTime;
  ex:someLongerPropertyName -> ? boolean;
}
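
On the earlier claim that dropping the spacer causes no parsing problems: a rough recursive-descent sketch in Python, assuming a simplified grammar where every entry is `key type` terminated by `;`, and where keys and type variables contain no whitespace. It ignores comments, literals, and options without a type. The names `tokenize` and `parse_type` are made up for the example and are not from any tasl implementation.

```python
import re

# Punctuation tokens, or any run of characters that is neither whitespace nor punctuation.
TOKEN = re.compile(r"[{}\[\];?*]|[^\s{}\[\];?*]+")


def tokenize(source):
    """Split a spacer-less type expression into tokens."""
    return TOKEN.findall(source)


def parse_type(tokens, pos):
    """Parse one type starting at tokens[pos]; return (tree, next position)."""
    tok = tokens[pos]
    if tok == "?":  # optional operator comes before the type
        inner, pos = parse_type(tokens, pos + 1)
        return ("optional", inner), pos
    if tok == "*":  # reference to a class
        return ("reference", tokens[pos + 1]), pos + 2
    if tok in ("{", "["):
        close, kind = ("}", "product") if tok == "{" else ("]", "coproduct")
        entries = []
        pos += 1
        while tokens[pos] != close:
            key = tokens[pos]  # the key is simply the first token of the entry
            value, pos = parse_type(tokens, pos + 1)
            entries.append((key, value))
            if tokens[pos] == ";":
                pos += 1
        return (kind, entries), pos + 1
    return ("variable", tok), pos + 1


source = """
{
  ex:bar dateTime;
  ex:someLongerPropertyName ? boolean;
  ex:choice [
    ex:a integer;
    ex:b string;
  ];
}
"""
tree, end = parse_type(tokenize(source), 0)
print(tree)
```

Since a key is always exactly one token, the parser never needs an arrow to know where the key ends. That supports the point that the spacer is there for readers rather than for the grammar.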

a different delimiter?

I was reading about dhall and saw that they write coproducts like this: <foo integer | bar integer | baz string> (foo, bar, baz are the option keys). Right now we use the same delimiter ; for products and coproducts… maybe one way to go is to use a different delimiter for coproducts?

This looks really good when you write it on one line (like the dhall example), but it’s not obvious what the right thing to do for multi-line blocks is. If we use the pipe |, which is the most familiar union-ish symbol, we’d have to at least add a space:

type foo {
  ex:bar integer;
  ex:jfklfjskl string;
  ex:jfklfjskl string;
  ex:jfklfjskl [
    ex:ajklsa integer |
    ex:ajklsa integer |
    ex:ajklsa integer |
    ex:ajklsa integer |
    ex:ajklsa integer |
  ];
  ex:jfklfjskl string;
  ex:jfklfjskl string;
}

… which is weird. If we move the pipe inside…

type foo {
  ex:bar integer;
  ex:jfklfjskl string;
  ex:jfklfjskl string;
  ex:jfklfjskl [
    | ex:ajklsa integer
    | ex:ajklsa integer
    | ex:ajklsa integer
    | ex:ajklsa integer
    | ex:ajklsa integer
  ];
  ex:jfklfjskl string;
  ex:jfklfjskl string;
}

…then suddenly products and coproducts are very different! more different than we asked for!

I’m open to any and all suggestions here.

3. declaring things

It feels like tasl doesn’t really have a consistent approach to declaring things.

Right now, we have four kinds of declarations:

namespace declarations: namespace ex http://example.com/
type declarations: type foo ? { ex:bar -> integer }
class declarations: class ex:Person { ex:name -> foo }
edge declarations: edge ex:foo ==/ ex:bar /=> ex:baz

3.1. edges

The last of these - edge declarations - aren’t covered in the documentation yet. But the gist is that

edge ex:foo ==/ ex:bar /=> ex:baz

expands to

class ex:bar {
  ul:source -> * ex:foo;
  ul:target -> * ex:baz;
}

… note that the class that’s being declared is the middle URI of the edge declaration syntax. the ascii art is supposed to communicate that the middle URI is the “label” of the big arrow, which goes from source to target.
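
The expansion described above can be sketched as a small text-to-text rewrite. This is an illustrative Python sketch only: `expand_edge` and the regular expression are hypothetical, and it assumes the current `==/ ... /=>` spelling with whitespace-separated URIs.

```python
import re

# edge <source> ==/ <label> /=> <target>
EDGE_PATTERN = re.compile(r"^edge\s+(\S+)\s+==/\s+(\S+)\s+/=>\s+(\S+)$")


def expand_edge(declaration: str) -> str:
    """Rewrite an edge declaration into the class declaration it stands for."""
    match = EDGE_PATTERN.match(declaration.strip())
    if match is None:
        raise ValueError(f"not an edge declaration: {declaration!r}")
    source, label, target = match.groups()
    # The middle URI (the arrow's label) becomes the name of the new class.
    return (
        f"class {label} {{\n"
        f"  ul:source -> * {source};\n"
        f"  ul:target -> * {target};\n"
        f"}}"
    )


print(expand_edge("edge ex:foo ==/ ex:bar /=> ex:baz"))
```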

edge is a different kind of syntactic sugar than the optional operator ?. The optional operator works on types - it takes one type in and produces another type. The edge declaration needs to create a whole new class, which means it needs its own URI label and can’t be nested inside other type declarations (this is related to why there can’t be syntactic sugar for multi-valued properties).

One example of another shorthand syntax that we might want to add is list - letting people create classes for linked lists of things. This is the same “kind” as edge in the sense that it needs to be its own class and the user would have to give it its own URI name. One sketch looks like this:

list ex:IntegerList :: integer

which would expand to

class ex:IntegerList ? {
  ul:head -> integer;
  ul:tail -> * ex:IntegerList;
}

It feels like it’d be smart to anticipate these (regardless of whether list is a good idea or not) and try to preemptively unify the declaration syntax a bit. Maybe :: (or similar) could be the “this is shorthand syntax that expands to a larger class declaration” token, and we could write both edges and lists like this:

# notice how ex:bar comes first now!
edge ex:bar :: ex:foo ==> ex:baz

# and edges and lists look similar!
list ex:IntegerList :: integer

and other shorthand classes we find could all follow the label :: ...weirdstuff pattern. I’m medium-strongly in favor of making this change. I like :: but am open to other suggestions.
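
To make the label :: ... pattern concrete, here is a hedged Python sketch of a single expander that handles both proposed shorthands. It assumes the proposed (not yet adopted) syntax from this post, and `expand_shorthand` is a made-up name. The list expansion mirrors the one written out above.

```python
def expand_shorthand(declaration: str) -> str:
    """Expand a `keyword label :: body` shorthand into a class declaration."""
    head, separator, body = declaration.partition("::")
    if not separator:
        raise ValueError(f"missing '::' in {declaration!r}")
    parts = head.split()
    if len(parts) != 2:
        raise ValueError(f"expected 'keyword label' before '::' in {declaration!r}")
    keyword, label = parts
    body = body.strip()

    if keyword == "edge":
        source, arrow, target = (part.strip() for part in body.partition("==>"))
        if not (arrow and source and target):
            raise ValueError(f"expected 'source ==> target' in {declaration!r}")
        modifier = ""
        components = [("ul:source", f"* {source}"), ("ul:target", f"* {target}")]
    elif keyword == "list":
        # A list class is optional (the empty list) and refers back to itself.
        modifier = "? "
        components = [("ul:head", body), ("ul:tail", f"* {label}")]
    else:
        raise ValueError(f"unknown shorthand keyword: {keyword!r}")

    lines = [f"  {key} -> {value};" for key, value in components]
    return f"class {label} {modifier}{{\n" + "\n".join(lines) + "\n}"


print(expand_shorthand("edge ex:bar :: ex:foo ==> ex:baz"))
print(expand_shorthand("list ex:IntegerList :: integer"))
```

Both shorthands share the same front half (keyword, label, ::), so only the body needs keyword-specific handling.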

3.2 namespaces, types, classes

Right now we don’t use = or any kind of assignment token, we just declare things with keywords. This is the simplest thing to do, but I wonder if we’re missing an opportunity to mark the distinction between namespaces and types, which are local to the schema, and classes (and edges and whatever else), which are exported to the world.

I’m particularly scared that the difference between types and classes is going to be confusing. Just declaring a type

type foo {
  ex:name -> string
}

doesn’t do anything. Only classes actually matter.

type foo {
  ex:name -> string
}

class ex:Thing foo

The thing that namespaces and types have in common is that they define local alphanumeric (ie not URI) identifiers (prefixes and type variables). I keep debating whether it’s worth it to try to visually distinguish them somehow

namespace ex = http://example.com/
type foo = { ex:name -> string }
class ex:Thing foo

… or maybe by adding another keyword, like export

namespace ex http://example.com/
type foo { ex:name -> string }

export class ex:Thing foo
export edge ex:Thing2Thing :: ex:Thing ==> ex:Thing

… where just declaring class or edge without export in front of it is invalid syntax. Generally I like the pattern of using keywords for class-level things and tokens for type-level things (that’s e.g. why I don’t want to add type keywords like product { ... }) and this fits with that ethos.

I’m also open to more radical reworkings of the declaration syntax if people have any ideas they like.

4. the name “class”

I’m not particularly attached to it but I also don’t really like the alternatives that much.

Let me know if you have thoughts on any of these! I wrote this pretty fast so some of it might not make sense

trich · December 16, 2020, 3:43pm

Huh - I didn’t notice this when reading through the docs and examples - but ya, looks like nothing is quoted in the entire schema. Is that right? If so, I really like that - makes for a much cleaner reading/writing experience if we can get away with it technically.

I think overcoming the desire from others to capitalize tasl when it’s at the start of a sentence will be an unwinnable battle. If we’re okay with people breaking that convention (and us trying to stick with it) - then I think we agree to never capitalize tasl - but if we think we’d be constantly trying to correct people to start sentences with all-lowercase tasl, I think we should drop the case and just be okay with using Tasl ourselves.

When reading through examples, I actually preferred the variable names over <> and !. It felt more consistent and readable when paired with other lines like ex:name -> string;

The one advantage I can see of ! over {} is that it is more explicit that you meant unit type, whereas {} might occur when someone intended to add component types, but forgot or skipped over it accidentally. If we encouraged people to use the unit variable as best practice though, that may remove the usefulness of !.

One suggestion I also sent in slack is to use ~> for options.

I’m also really partial to the Go/Prisma approach of just using spaces and then having auto-formatting that aligns everything. However, since a lot of this will be done in a browser with codemirror to begin, those shortcut-heavy, IDE-friendly workflows might not be well suited. Not to mention the lack of differentiation between components and options.
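
For a sense of how small the alignment step itself is, here is a rough Python sketch of a gofmt-style column aligner for spacer-less entries. It is hypothetical (`align_entries` is not an existing tool) and only handles a flat list of (key, type) pairs.

```python
def align_entries(entries):
    """Pad every key to the width of the longest one so the types line up."""
    width = max((len(key) for key, _ in entries), default=0)
    return [f"{key.ljust(width)} {type_expr};" for key, type_expr in entries]


lines = align_entries([
    ("ex:bar", "dateTime"),
    ("ex:someLongerPropertyName", "? boolean"),
])
print("\n".join(lines))
```

The hard part is more likely wiring this into the editor (e.g. running it on save in codemirror) than the formatting logic.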

Preceding pipes feel too distracting and different, in my opinion.

I wonder if there’s an opportunity to group components, optional components, and options in a single structure. Something like:

ex:thing -> string;
ex:thing -? string;
ex:thing -| string;

Or even without the hyphen:

ex:thing > string; (or maybe keeping ex:thing -> string;)
ex:thing ? string;
ex:thing | string;

I have a similar hesitation that the difference between classes and types will get a bit lost. I don’t think exporting is necessarily the right approach, as it’s not clear that we’d have an import word anywhere.

I like the double colon syntax, fwiw. Don’t know exactly where best to use it, but I think it’s much easier to type than ==/ ex:foo /=> (though not nearly as fun to read).

I’ve typed out a couple ideas, but none of them really feel elegant. Happy to think through this more or have a brainstorming call to play with different syntax to unify declarations.

sj · December 16, 2020, 6:05pm

I like this division of decisions/beliefs by strength of conviction + satisfaction.

unit and uri : the strings are clear. ! is confusing in the middle of a block of text, <> makes me think “address of the current doc” or “empty instance of something”…

global type vars can be redefined : feels like it could lead to confusion, and seems to conflict w/ the goal of making namespace choices more visible and explicit.

Options : these are clearest to me with a different separator, inline or multi-line. The relation b/t ex:thing and string is similar when it appears as a component and as an option, the difference is in the optionality + not that relationship.

A pipe that can be at either the end or start of the line feels fine; the latter doesn’t feel “too different” to me, and formats cleanly (you can see at a glance where options appear) in a doc.

| ex:alphalpha integer
| ex:omegdala integer
| ex:tinyblobulin base85blob

local vs global, sense of exporting to the world, use of “class” vs “type”:
This set of choices and guidelines is the least clear to me at present.

What does it mean for a class to be exported / exportable / global; what are the differential uses of
| type foo { ex:tf --> bool }
| clas ex:foo { ex:tf --> bool }
| edge ex:foo :: ex:tf ==> bool ?

agnescameron · December 16, 2020, 6:51pm

++ to {} for the unit type? i feel like in the following case:

given {} is still valid syntax that mistake would still occur, and at least if {} is documented as being the way to write unit types, it might avoid mysterious bugs? maybe I’m not reading this bit right though. I like unit a lot + definitely more than !.

there’s something nice about >-, though maybe ~ could also be a candidate?

Without looking at the documentation (e.g. just from the example given in R1 box), the distinction between types and classes is pretty hard to parse, though once you’ve read the documentation the distinction is pretty clear. It does make it a little harder to skim a schema though; not sure what I think of export but i think highlighting/bolding class declarations is helpful.

would entity work instead of class? indicating the difference btwn an abstract shape of something and an instantiation of that thing as an object that can be linked to and engaged with
