Now that we’ve had a chance to write a few schemas, and I’ve had to write a CodeMirror lexer and a TextMate grammar, I think we’re in a good position to make some (informed) breaking changes to the tasl schema language. I want to do this now because it feels like we might be near the end of the just-us-using-this window, and a couple of issues came up while writing the grammars that should be addressed.
double-colons for class declarations
previously:
namespace ex http://example.com/
class ex:Person { }
now:
namespace ex http://example.com/
class ex:Person :: { }
reasoning: I want there to be more visual consistency among “statements that define a class” (ie edge and class declarations) that differentiate them from statements that only do things locally (ie namespace and type variable declarations). Pretend we added a list keyword similar to class that declares a linked list. Looking at this schema…
class ex:A <>
type Foo string
list ex:Bar dateTime
type bar {}
edge ex:Bar ==/ ex:Z /=> ex:A
… you can’t really scan this quickly and tell “what classes are declared by this schema?” But by adding a little more consistency to distinguish them…
class ex:A :: <>
type Foo string
list ex:Bar :: dateTime
type bar {}
edge ex:Z :: ex:Bar => ex:A
… now we know what to look for. Statements that declare a class will always follow the [keyword] [class uri key] :: [... special syntax] format.
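To make the “what classes are declared by this schema?” scan concrete, here is a rough Python sketch that picks out every line of the form [keyword] [key] :: [...]. It is only an illustration, not part of any tasl tooling: the regex, the function name declared_classes, and the assumption that each declaration starts on its own line are all mine, and the list keyword is the pretend one from the example above.

```python
import re

# Assumption: a class-declaring statement starts a line as "<keyword> <key> :: ...",
# where <key> is any run of non-whitespace characters.
DECLARATION = re.compile(r"^\s*(\w+)\s+(\S+)\s+::(?:\s|$)")


def declared_classes(schema: str) -> list[tuple[str, str]]:
    """Return (keyword, key) for every line that declares a class."""
    found = []
    for line in schema.splitlines():
        match = DECLARATION.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


schema = """\
class ex:A :: <>
type Foo string
list ex:Bar :: dateTime
type bar {}
edge ex:Z :: ex:Bar => ex:A
"""

# Only the class, list and edge lines are reported; the type lines are skipped.
print(declared_classes(schema))
```

The same single pattern covers class, edge, and any future class-declaring keyword, which is the point of putting :: in a fixed position.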
I picked the double-colon because to me it has “export this to a global scope” connotations. I’m open to other tokens.
potential objections:
- makes tasl even more colon-dense than it already was
- ???
new edge declaration syntax
the point of changing the class statement syntax is to make all the class declarations (including edge statements) more consistent, so here we are.
previously:
namespace ex http://example.com/
class ex:Person { }
edge ex:Person ==/ ex:Friendship /=> ex:Person
now:
namespace ex http://example.com/
class ex:Person :: { }
edge ex:Friendship :: ex:Person => ex:Person
Personally I think this is a huge improvement; I regret trying to put the edge label in the middle between the source and target. As always, this expands to
class ex:Friendship :: {
ul:source -> * ex:Person
ul:target -> * ex:Person
}
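As a small illustration of that expansion, the Python sketch below renders the class declaration that an unannotated edge stands for. The helper name expand_edge and the two-space indentation are assumptions of mine, not something the tasl libraries define.

```python
def expand_edge(label: str, source: str, target: str) -> str:
    """Render the class declaration that an unannotated edge statement stands for."""
    return "\n".join([
        f"class {label} :: {{",
        f"  ul:source -> * {source}",
        f"  ul:target -> * {target}",
        "}",
    ])


# edge ex:Friendship :: ex:Person => ex:Person
print(expand_edge("ex:Friendship", "ex:Person", "ex:Person"))
```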
potential objections:
- should it be a double-length arrow ==>?
new edge metadata syntax
previously (I don’t know if this was ever documented) you could also annotate edges with a type like this:
edge ex:Person ==/ ex:Friendship someType /=> ex:Person
edge ex:Person ==/ ex:Rivalry <> /=> ex:Person
which would add an additional ul:value component to the class:
class ex:Friendship :: {
ul:source -> * ex:Person
ul:target -> * ex:Person
ul:value -> someType
}
class ex:Rivalry :: {
ul:source -> * ex:Person
ul:target -> * ex:Person
ul:value -> <>
}
I like this feature a lot and think we should definitely keep it. For the new syntax, that looks like this:
edge ex:Friendship :: ex:Person =/ someType /=> ex:Person
edge ex:Rivalry :: ex:Person =/ <> /=> ex:Person
Notice that now I’m using just =/ for the first pipe segment instead of ==/ - this is something that I’d like feedback on… ==/ someType /=> looks a little more “balanced” because it has three characters in each segment, but it’s just more stuff to type and =/ someType /=> gets the same fundamental “message” across. I guess this is related to whether the unannotated edge statement should use => or ==>. Barring any compelling reason to use the longer ones I’m planning on just defaulting to the shorter => and =/. We could also potentially do something other than / but I’m pretty happy with it.
Just to clarify - this means there are two versions of the edge declaration syntax, one with a value annotation, and one without.
# valid; doesn't have a ul:value component
edge ex:Friendship :: ex:Person => ex:Person
# also valid; has a ul:value component
edge ex:Friendship :: ex:Person =/ someType /=> ex:Person
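One way to picture the two forms is as a single record with an optional value. The Python sketch below is a hedged illustration only: the Edge tuple, the two regexes and parse_edge are hypothetical names, and it assumes the whole statement sits on one line with whitespace-free keys. It is not how the real lexer or parser is written.

```python
import re
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    label: str
    source: str
    target: str
    value: Optional[str]  # None when there is no ul:value component


# Assumption: keys are whitespace-free tokens and the statement is a single line.
PLAIN = re.compile(r"^edge\s+(\S+)\s+::\s+(\S+)\s+=>\s+(\S+)\s*$")
ANNOTATED = re.compile(r"^edge\s+(\S+)\s+::\s+(\S+)\s+=/\s+(.+?)\s+/=>\s+(\S+)\s*$")


def parse_edge(statement: str) -> Edge:
    """Split a new-style edge statement into label, source, target and optional value."""
    plain = PLAIN.match(statement)
    if plain:
        label, source, target = plain.groups()
        return Edge(label, source, target, None)
    annotated = ANNOTATED.match(statement)
    if annotated:
        label, source, value, target = annotated.groups()
        return Edge(label, source, target, value)
    raise ValueError(f"not an edge declaration: {statement!r}")


print(parse_edge("edge ex:Friendship :: ex:Person => ex:Person"))
print(parse_edge("edge ex:Friendship :: ex:Person =/ someType /=> ex:Person"))
```

The old middle-label form (edge ex:Person ==/ ex:Friendship /=> ex:Person) matches neither pattern, so this sketch rejects it with a ValueError.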
Mandatory newlines; no more semicolons
This is the most significant change I’m proposing. Previously we used semicolons to delimit product components and coproduct options…
type foo { ex:a -> string; ex:b -> string; ... }
type bar [ ex:i >- string; ex:j >- string; ... ]
… and all type expressions were whitespace-insensitive; ie you could always write any arbitrarily complex type on one run-on line if you wanted. This was done roughly by analogy to JSON and to the TypeScript type = declaration syntax.
Unfortunately there are some problems with this related to parsing URIs, which I could have seen coming but didn’t notice until I started writing a TextMate grammar for syntax highlighting. Reference expressions like * ex:Person end in a URI, which means that often you’ll end up writing something like { ex:friend -> * ex:Person; ... }. But the semicolon is a valid character that can appear unescaped in URI path segments or fragments (it’s in the sub-delims group in RFC3986 - “reserved” but still valid in most places), which means ex:Person; needs to get parsed as a single token (a URI that happens to end in a semicolon).
I don’t think saying “you can only use X lexical subset of URIs in tasl” is on the table; if we’re using URIs we have to support URIs (I’m making a point of tokenizing the octets in an IPv6 hostname to demonstrate our commitment to this). “Any absolute URI is a valid token” is a fundamental organizing rule that I don’t want to break… which also rules out commas and just about every other reasonable delimiter. I don’t want to require whitespace before the delimiter (which is the style that most of the examples are written in right now) because it’s a really nonstandard requirement and doesn’t look very nice.
(and this isn’t just a problem with reference types; it also shows up with the compact coproduct-of-units enum syntax [ex:a; ex:b; ex:c])
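The ambiguity is easy to reproduce outside of tasl. The Python sketch below builds the pchar character set from RFC 3986 (unreserved, pct-encoded, sub-delims, plus ":" and "@") and checks whether a string is a legal path segment. The names PCHAR, SEGMENT and is_path_segment are mine, and this is only a stand-in for the idea, not the actual TextMate or CodeMirror grammar.

```python
import re

# pchar from RFC 3986: unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|%[0-9A-Fa-f]{{2}})"
SEGMENT = re.compile(rf"{PCHAR}*")


def is_path_segment(text: str) -> bool:
    """True if the whole string is a valid RFC 3986 path segment."""
    return SEGMENT.fullmatch(text) is not None


# Both of these are legal segments, so a lexer cannot tell where the URI ends.
print(is_path_segment("Person"), is_path_segment("Person;"))

# Commas are sub-delims too, so they have the same problem.
print(is_path_segment("Person,"))

# Whitespace never matches, which is why it is the one safe separator.
print(is_path_segment("Person "))

# Splitting on whitespace keeps the trailing semicolon glued to the URI.
print("{ ex:friend -> * ex:Person; }".split())
```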
After playing around with lots of options I came to the conclusion that the simplest thing to do is just remove the delimiter altogether and require newlines for every product component and coproduct option; this makes tasl into a more blocky language similar to the way types are declared in Go, C, Rust, etc. This is, by a huge margin, the “safest” way of handling URIs, since whitespace is the only common character class that is guaranteed to not appear in them.
previously:
type hello [ex:a; ex:b; ex:c >- ? { ex:i -> integer; ex:j -> * ex:Person }]
now:
type hello [
ex:a
ex:b
ex:c <- ? {
ex:i -> integer
ex:j -> * ex:Person
}
]
Only empty products {} (and empty coproducts [], which are valid but essentially useless) can appear on one line; otherwise you need a newline after the opening bracket and before the closing bracket. You can still have arbitrary non-newline whitespace (ie spaces and tabs) anywhere. The optional operator stays on the same line, as in the example above.
# these are all valid --------------------------------------
class ex:A :: { }
class ex:B :: {
}
class ex:C :: [
ex:option1
ex:option2
]
# these are all INVALID ------------------------------------
class ex:X :: { ex:foo -> string }
class ex:Y :: { ex:component1 -> string
}
class ex:Z :: {
ex:component1 -> string }
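The bracket rule in those examples can be approximated with a line-by-line check. The Python sketch below is a rough approximation under loud assumptions: brackets are written as standalone whitespace-separated tokens (or as the fused empty forms {} and []), and it does not check that brackets are balanced or that each component sits on its own line. The name follows_newline_rules is hypothetical.

```python
OPEN = {"{": "}", "[": "]"}
CLOSE = {"}", "]"}


def follows_newline_rules(source: str) -> bool:
    """Check that non-empty products/coproducts open and close on their own line ends."""
    for line in source.splitlines():
        tokens = line.split()  # spaces and tabs are free; only newlines matter
        for i, token in enumerate(tokens):
            if token in OPEN:
                is_last = i == len(tokens) - 1
                closes_empty = i + 1 < len(tokens) and tokens[i + 1] == OPEN[token]
                if not (is_last or closes_empty):
                    return False  # something follows the opening bracket
            elif token in CLOSE:
                is_first = i == 0
                closes_empty = i > 0 and OPEN.get(tokens[i - 1]) == token
                if not (is_first or closes_empty):
                    return False  # something precedes the closing bracket
    return True


print(follows_newline_rules("class ex:A :: { }"))
print(follows_newline_rules("class ex:C :: [\n  ex:option1\n  ex:option2\n]"))
print(follows_newline_rules("class ex:X :: { ex:foo -> string }"))
```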
potential objections:
- enums are a little more verbose because you can’t list all the options on one line
- dramatically changes the feel of the language
- ???
use regular left arrows for coproducts
I want to reverse my previous decision about <- vs >-.
previously:
type hello [ ex:a >- string; ex:b >- integer ]
now:
type hello [
ex:a <- string
ex:b <- integer
]
The only strong reason for >- over <- is that in some situations editors will auto-match <> brackets, and typing <- would “accumulate” a matching > character to the right of the cursor. This only happens by default when the language isn’t known (whenever we write a language extension for any IDE we’re able to tell it explicitly which sets of brackets to auto-close).
After thinking about it more, it doesn’t seem like this is a big enough concern to outweigh the simple fact that “<-” is just easier to remember. Products point right; coproducts point left. That’s it.
Feedback on any of this is welcome. Over the next few weeks I’ll be focusing on writing copy for tasl.io, moving the contents of the underlay/apg repo into one master underlay/tasl repo, and releasing new minor versions of all the libraries, so there’s no hard deadline but I would like to finalize these fairly soon.