Schema Construction

Modeling Best Practices

This is a living document for tracking schema design, examples, and interactions, and for recording questions and issues. It is not the schema specification, but a place to name issues encountered or anticipated in schema construction.

Example Schema: Recipe

Take as an example a schema for recipes, with a data model intended for use with a voice assistant. In an interaction, someone might ask for a recipe itself, then branch into metadata about the ingredients in the recipe, similar recipes, potential adaptations and substitutions, or the processes involved in the cooking steps.

In our model, a recipe has a name and a description, and perhaps some other contextual metadata like linked cuisines and occasions. It also contains ingredients (an unordered list) and directions (an ordered list), which themselves carry metadata such as descriptions, estimated difficulty, required equipment, etc.

A draft of this schema in TOML format (considering just recipes and ingredients) might look like this:

namespace = "http://example.com/"

[shapes.Recipe]
name = "string"
[shapes.Recipe.url]
kind = "uri"
[shapes.Recipe.ingredients]
kind = "reference"
label = "Ingredient"
cardinality = "any"

[shapes.Ingredient]
[shapes.Ingredient.id]
kind = "uri"
[shapes.Ingredient.description]
kind = "literal"
datatype = "string"
cardinality = "optional"
[shapes.Ingredient.name]
kind = "literal"
datatype = "string"

Namespaces

Each shape in the schema is defined within a particular namespace, which gives the shape a unique URI that is used to reference it when the schema is imported elsewhere. In our example schema, everything currently sits within the namespace ‘http://example.com/’, so the full URI for the first shape would be ‘http://example.com/Recipe’, and so on. We could also specify different namespaces for different parts of the schema if we liked.
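Building a shape's full URI amounts to appending the shape name to its namespace. A minimal sketch, assuming plain concatenation onto a namespace that ends in ‘/’ or ‘#’; the function shape_uri and the second namespace are hypothetical, for illustration only:

```python
def shape_uri(namespace: str, shape: str) -> str:
    """Build a shape's full URI by appending its name to the namespace."""
    # Assumption: namespaces end in "/" or "#", so concatenation is enough.
    if not namespace.endswith(("/", "#")):
        raise ValueError(f"namespace should end in '/' or '#': {namespace!r}")
    return namespace + shape

print(shape_uri("http://example.com/", "Recipe"))  # http://example.com/Recipe
# A different part of the schema could sit under a different (hypothetical) namespace:
print(shape_uri("http://example.org/food/", "Ingredient"))  # http://example.org/food/Ingredient
```

Rejecting a namespace without a trailing separator avoids silently producing URIs like ‘http://example.comRecipe’.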

What if we wanted to use an existing namespace for our schema? Let’s look at how ‘recipes’ and ‘ingredients’ are modeled as objects in two different ontologies:

  • Schema.org (commonly used for SEO via structured web metadata) defines a Recipe as having multiple recipeIngredient values (plus instructions, cuisine, etc.). However, a recipeIngredient is just text and has no properties of its own.

We can actually see schema.org’s recipe schemas in action by looking at the page source of big recipe websites (like Bon Appétit). Here’s an example, taken from this recipe:


      {
        "@context": "http://schema.org",
        "@type": "Recipe",
        "name": "Bucatini all\u0027Amatriciana",
        "image": "https://assets.bonappetit.com/photos/57afff221b33404414976058/16:9/w_1000,c_limit/bucatini-all-amatriciana.jpg",
        "author": {
          "@type": "Person",
          "name": "Sarah Tenaglia"
        },
        "publisher": {
          "@type": "Organization",
          "name": "Bon Appétit",
          "logo": {"@type": "ImageObject", "url": "https://www.bonappetit.com/images/logo-foodculture-tablet@1x.png", "width": 322, "height": 56}
        },
        "datePublished": "2010-04-11T20:00:00.000-04:00",
        "dateCreated": "2020-10-14T09:23:00.000-04:00",
        "description": "This classic sauce takes its spiciness from black pepper and dried chiles and its depth of flavor from guanciale, Italian salt-cured pork jowl.",
        "aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.29", "reviewCount": "171"},
        "recipeYield": "4  Servings",
        "recipeIngredient": [
          "2 Tbsp. extra-virgin olive oil",
          "4 oz. thinly sliced guanciale, pancetta, or chopped unsmoked bacon",
          "1/2 tsp. crushed red pepper flakes",
          "1/2 tsp. freshly ground black pepper",
          "3/4 cup minced onion",
          "2 cloves garlic, minced",
          "1 28-oz. can peeled tomatoes with juices, crushed by hand t",
          "Kosher salt",
          "12 oz. dried bucatini or spaghetti",
          "1/4 cup finely grated Pecorino (about 1 oz.)"
        ],
        "recipeInstructions": [
          {"@type": "HowToStep", "text": "Heat oil in a large heavy skillet over medium heat. Add guanciale and sauté until crisp and golden, about 4 minutes. Add pepper flakes and black pepper; stir for 10 seconds. Add onion and garlic; cook, stirring often, until soft, about 8 minutes. Add tomatoes, reduce heat to low, and cook, stirring occasionally, until sauce thickens, 15-20 minutes."},
          {"@type": "HowToStep", "text": "Meanwhile, bring a large pot of water to a boil. Season with salt; add the pasta and cook, stirring occasionally, until 2 minutes before al dente. Drain, reserving 1 cup of pasta cooking water."},
          {"@type": "HowToStep", "text": "Add drained pasta to sauce in skillet and toss vigorously with tongs to coat. Add 1/2 cup of the reserved pasta water and cook until sauce coats pasta and pasta is al dente, about 2 minutes. (Add a little pasta water if sauce is too dry.) Stir in cheese and transfer pasta to warmed bowls."}
        ],
        "nutrition": "One serving contains:   Calories (Kcal)          524.6  %Calories From Fat   25.2  Fat (G)            14.7  Saturated Fat (G)     4.0  Cholesterol (Mg)      14.7  Carbohydrates (G)   75.9  Dietary Fiber (G)       6.0  Total Sugars (G)       7.4  Net Carbs (G)            69.9  Protein (G)     19.4  Sodium (Mg) 757.8"
      }
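To make the limitation concrete, parsing an abbreviated copy of the JSON-LD above with Python's standard json module shows that each recipeIngredient is a bare string, while each recipeInstructions entry is at least a typed HowToStep object. The data is trimmed from the example; nothing here is schema.org tooling.

```python
import json

# Abbreviated copy of the JSON-LD above: most fields and list items are dropped.
doc = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Bucatini all'Amatriciana",
  "recipeIngredient": [
    "2 Tbsp. extra-virgin olive oil",
    "4 oz. thinly sliced guanciale, pancetta, or chopped unsmoked bacon",
    "3/4 cup minced onion"
  ],
  "recipeInstructions": [
    {"@type": "HowToStep", "text": "Heat oil in a large heavy skillet over medium heat."}
  ]
}
""")

ingredients = doc["recipeIngredient"]
# Every ingredient is a plain string: there is nowhere to hang a description,
# a substitution, or a link to an ingredient defined elsewhere.
print(all(isinstance(item, str) for item in ingredients))  # True

# So "can I use pancetta instead?" can only be answered by matching text.
print([item for item in ingredients if "pancetta" in item])
# ['4 oz. thinly sliced guanciale, pancetta, or chopped unsmoked bacon']

# Instructions, by contrast, are typed objects that could carry more properties.
print([step["@type"] for step in doc["recipeInstructions"]])  # ['HowToStep']
```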

This schema is pretty neat if the end step in a search task is a recipe (“how do I cook Bucatini all’Amatriciana?” / “what should I do with all this guanciale?” / “does Bucatini all’Amatriciana include onions?” (FYI, lots of angry Italians in the comments say no†)), but less so if your questions follow on from that, e.g. “can I use pancetta instead?” or “what is bucatini?”.

  • FoodON, the food ontology, does not define a recipe object at all; instead it takes a much more scientific approach to defining food, in terms of products and transformations. You can see from the screenshots below that the concept of ‘food in popular culture’ hasn’t really made its way into the ontology; the closest thing to a recipe in this sense is probably ‘home food preparation process’.

[Screenshot of the FoodON ontology, 2020-10-20]

However, FoodON does have a really thorough catalog of ingredients (‘food products’), all of which have their own properties (e.g. descriptions), though some might be somewhat anatomical.


[Screenshot of a FoodON food product entry, 2020-10-20]

Tactics

As pointed out earlier, this will be a very common problem (my shape doesn’t look like any existing shape), and we should have a well-defined solution that the interface can nudge people towards. I think the way the interface searches for and suggests existing schemas will strongly shape how the broader structure gets built once people start using the platform without our help (far off, but not so far off that we shouldn’t think about it).

As I understand it, we want as many people as possible to use the same schemas (or at least schemas with some understood mapping between them, such as standard components), provided those schemas are right for their purposes.

approach 1: it’s like schema.org, but I added some bits (foodON)

One tactic might be to take the schema.org schema (which gives us a fairly useful shape) and extend it so that ingredients have descriptions attached to them as well.

This is technically an incorrect use of the schema.org schema, but correct usage could be supported by a ‘fork this schema’ feature that shows people a few examples, encourages them to adapt, and keeps track of those adaptations. The referenced namespace would then be the forked schema, rather than schema.org, but it could reference the original in some way. (do schemas have prov? eek.)
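A ‘fork this schema’ feature might store the new namespace together with a pointer back to the original. The sketch below is purely illustrative: schemas are plain Python dicts, the fork namespace is hypothetical, and the provenance key borrows the name of the W3C PROV-O property prov:wasDerivedFrom without implying that the schema format supports it.

```python
import copy

def fork_schema(original: dict, new_namespace: str) -> dict:
    """Copy a schema under a new namespace, keeping a pointer back to the original."""
    fork = copy.deepcopy(original)  # edits to the fork must not touch the original
    fork["namespace"] = new_namespace
    # Hypothetical provenance field, named after the PROV-O property.
    fork["prov:wasDerivedFrom"] = original["namespace"]
    return fork

schema_org_recipe = {
    "namespace": "https://schema.org/",
    "shapes": {"Recipe": {"name": "string", "recipeIngredient": "string"}},
}

fork = fork_schema(schema_org_recipe, "http://example.com/forks/recipe/")
# The approach 1 adaptation: ingredients become shapes that carry descriptions.
fork["shapes"]["Recipe"]["recipeIngredient"] = {
    "kind": "reference", "label": "Ingredient", "cardinality": "any"}
fork["shapes"]["Ingredient"] = {"name": "string", "description": "string"}

print(fork["prov:wasDerivedFrom"])  # https://schema.org/
print(schema_org_recipe["shapes"]["Recipe"]["recipeIngredient"])  # string
```

The deep copy matters: the original stays untouched, so both versions can coexist and be compared later.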

approach 2: it’s my own schema, but I include foodON ‘food products’ as ingredients

Another tactic might be to let people structure schemas however they like, but to put existing objects into the schema wherever they fit the shape (the smaller the object, the more likely it is that something fits). So, you might have a schema that references the FoodON namespace for ingredients and the schema.org namespace for cuisines. Correspondence between different people’s schemas could then be understood in terms of these more standard objects.
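Approach 2 could be sketched by letting each reference carry its own namespace, then comparing two people’s schemas by the full URIs of the objects they share. Everything here is assumed for illustration: the per-reference ‘namespace’ key, the two example schemas, and the FoodON namespace, which is a placeholder rather than FoodON’s real identifier scheme.

```python
FOODON = "http://example.org/foodon/"  # placeholder standing in for the FoodON namespace

def referenced_uris(schema: dict) -> set[str]:
    """Full URIs of all referenced shapes, using a reference's own namespace if it has one."""
    uris = set()
    for props in schema["shapes"].values():
        for spec in props.values():
            if isinstance(spec, dict) and spec.get("kind") == "reference":
                ns = spec.get("namespace", schema["namespace"])
                uris.add(ns + spec["label"])
    return uris

alice = {
    "namespace": "http://example.com/alice/",
    "shapes": {"Recipe": {
        "ingredients": {"kind": "reference", "namespace": FOODON, "label": "FoodProduct"},
        "steps": {"kind": "reference", "label": "Step"},
    }},
}
bob = {
    "namespace": "http://example.com/bob/",
    "shapes": {"Dish": {
        "components": {"kind": "reference", "namespace": FOODON, "label": "FoodProduct"},
        "method": {"kind": "reference", "label": "Method"},
    }},
}

# Different shape and property names, but one standard object in common.
print(referenced_uris(alice) & referenced_uris(bob))  # {'http://example.org/foodon/FoodProduct'}
```

The shared URI is the hook for correspondence: neither author had to agree on ‘Recipe’ versus ‘Dish’.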

approach 3: it’s my own schema and it references itself.

We could just get people to make schemas from scratch each time, and find another tactic to map schema namespaces onto one another. Or, perhaps, we don’t care so much about doing that? Not sure about this one; good to discuss.

Extended edition:

1: ordered lists

What if we wanted to add directions to the recipe? In the simplest case, we could add them directly as a text field on the Recipe object, much as schema.org allows. This would probably be fine for a lot of processes, but it would also be interesting to consider the value of being able to interpret a recipe in terms of transformations and processes, particularly in the context of a conversation. “I can’t find my whisk, can I still make meringue?” is a perfectly reasonable question, but answering it requires an understanding of equivalent processes and equipment. Similarly, this could be a good way to gauge the complexity and difficulty of a particular recipe.

In the current format, ordered lists are added to schemas as linked lists. If we wanted to add an ordered list of directions to the schema, it would look like this (omitting the rest of the earlier schema for clarity):

[shapes.Recipe]
...
[shapes.Recipe.directions]
kind = "reference"
label = "DirectionList"
cardinality = "required"

[shapes.DirectionList]
[shapes.DirectionList.head]
kind = "reference"
label = "Direction"
[shapes.DirectionList.tail]
kind = "reference"
label = "DirectionList"
cardinality = "optional"

[shapes.Direction]
[shapes.Direction.id]
kind = "uri"
[shapes.Direction.description]
kind = "literal"
datatype = "string"
cardinality = "optional"
[shapes.Direction.process]
kind = "literal"
datatype = "string"
cardinality = "optional"
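The head/tail encoding can be illustrated by converting a Python list of directions into nested DirectionList-style nodes and walking them back. The dict-based nodes, the helper names, and the ‘process’ values are hypothetical; the point is only that order is carried by the chain of tail links rather than by positions.

```python
from typing import Optional

def to_linked_list(items: list) -> Optional[dict]:
    """Encode items as nested {head, tail} nodes; the last node has no tail."""
    node = None
    for item in reversed(items):
        node = {"head": item} if node is None else {"head": item, "tail": node}
    return node

def from_linked_list(node: Optional[dict]) -> list:
    """Recover the ordered items by following tail links from the first node."""
    items = []
    while node is not None:
        items.append(node["head"])
        node = node.get("tail")  # tail has cardinality "optional"
    return items

directions = [
    {"description": "Heat oil in a large heavy skillet.", "process": "sauté"},
    {"description": "Bring a large pot of water to a boil.", "process": "boil"},
    {"description": "Add drained pasta to the sauce.", "process": "toss"},
]
chain = to_linked_list(directions)
print(chain["tail"]["tail"]["head"]["process"])  # toss
print(from_linked_list(chain) == directions)  # True
```

An empty list encodes to None, which the schema above would reject, since directions has cardinality "required".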

† This is a discussion for another day, but worth noting here too: if the source that’s most adept at filling out the schema.org information becomes the authority on whether this dish contains onions, rather than the people who’ve been making this dish for hundreds of years (insert other knowledge problem here), that’s not a good system. One thing that feels really essential with this work is finding tactics for collecting + structuring information that don’t reproduce the ‘Google’ idea of legibility. In theory, that is one of the things the Underlay is about, but it’s interesting to think about how these supposedly open and accessible systems get gamed in practice.



quick note: I think you’re missing the IngredientList definition in the example schema

Something like:

[shapes.IngredientList]
[shapes.IngredientList.head]
kind = "reference"
label = "Ingredient"
[shapes.IngredientList.tail]
kind = "reference"
label = "IngredientList"
cardinality = "optional"

(edit: and that’s only if you want them to be ordered; you could just reference Ingredient directly with cardinality = "any" if you want them to be unordered)

:100:

++ yes, my mistake. I’ll include the DirectionList to give an example of an ordered list.

The referenced namespace would then be the forked schema, rather than schema.org, but it could reference the original in some way. (do schemas have prov? eek.)

Sure enough :slight_smile:

Ordered lists: an argument for arrays as a base datatype.
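One way to put numbers on that argument: with the head/tail encoding, attaching n ordered items takes n extra list nodes and 2n edges (one edge from the recipe, n head edges, n - 1 tail edges), and reaching item k means walking k - 1 tails. The comparison below assumes a hypothetical array datatype that needs one edge per item and no list nodes; the helper names are illustrative only.

```python
def linked_list_cost(n: int) -> dict:
    """Extra nodes and edges to attach n >= 1 ordered items via head/tail list nodes."""
    # One edge from the recipe to the first list node, n head edges, n - 1 tail edges.
    return {"list_nodes": n, "edges": 1 + n + (n - 1)}

def array_cost(n: int) -> dict:
    """The same, assuming a hypothetical array datatype: one edge per item, no list nodes."""
    return {"list_nodes": 0, "edges": n}

def hops_to_item(k: int) -> int:
    """Edges followed from the recipe to reach item k (1-based) in the linked list."""
    # The directions edge, then k - 1 tail edges, then one head edge.
    return 1 + (k - 1) + 1

print(linked_list_cost(3))  # {'list_nodes': 3, 'edges': 6}
print(array_cost(3))        # {'list_nodes': 0, 'edges': 3}
print(hops_to_item(3))      # 4
```

For a voice query like “what’s the third step?”, the linked list makes position something you compute by traversal rather than something you look up.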

if the source that’s most adept at filling out the schema.org information becomes the authority on whether this dish contains onions, rather than the people who’ve been making this dish for hundreds of years (insert other knowledge problem here), that’s not a good system. One thing that feels really essential with this work is finding tactics for collecting + structuring information that don’t reproduce the ‘Google’ idea of legibility.

Yes. Google controls legibility through a few opaque algorithms, but at least has a low barrier to being indexed: being linked to by other things on the Web. At the moment we run the additional risk of not even being able to see anything that hasn’t been properly formatted.

Reduced schema format, c/o SJ: only make the things that need to be shapes into shapes?

namespace = "https://schema.org/"

[shapes.Recipe]
name = "string"
url = "string"
[shapes.Recipe.ingredient]
kind = "reference"
label = "Ingredient"
cardinality = "any"

[shapes.Ingredient]
name = "string"
[shapes.Ingredient.id]
kind = "uri"
[shapes.Ingredient.description]
kind = "literal"
datatype = "string"
cardinality = "optional"