Modeling Best Practices
This is a living document for tracking schema design, examples and interactions, and for recording questions and issues. It is not the schema specification, but a place to name issues encountered or anticipated in schema construction.
Example Schema: Recipe
Take as an example a schema for recipes, with a data model intended for use with a voice assistant. In an interaction, someone might ask for a recipe itself, then branch into metadata about ingredients in the recipe, similar recipes, potential adaptations and substitutions, or processes involved in the cooking steps.
In our model, a recipe has a name and description, and perhaps some other contextual metadata like linked cuisines + occasions. It also contains ingredients (an unordered list) and directions (an ordered list), which themselves carry metadata such as descriptions, estimated difficulty, required equipment, etc.
A draft of this schema in .toml format (just considering recipes and ingredients) might look like:
namespace = "http://example.com/"
[shapes.Recipe]
name = "string"
[shapes.Recipe.url]
kind = "uri"
[shapes.Recipe.ingredients]
kind = "reference"
label = "Ingredient"
cardinality = "any"
[shapes.Ingredient]
[shapes.Ingredient.id]
kind = "uri"
[shapes.Ingredient.description]
kind = "literal"
datatype = "string"
cardinality = "optional"
[shapes.Ingredient.name]
kind = "literal"
datatype = "string"
Namespaces
Each shape in the schema is defined within a particular namespace, which gives the shape a unique URI that is referenced whenever the schema is imported elsewhere. In our example schema, everything currently sits within the namespace "http://example.com/", so the full URI for the first shape would be "http://example.com/Recipe", and so on. We could also specify different namespaces for different parts of the schema, if we liked.
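As a rough illustration of the namespace rule, here is a minimal Python sketch that builds full URIs for the two shapes in the draft. The helper name shape_uri is hypothetical, and it assumes a URI is formed by simply joining the namespace and the shape label; the actual schema tooling may resolve names differently.

```python
# Hypothetical helper: assumes a shape URI is just namespace + label.
NAMESPACE = "http://example.com/"  # taken from the draft schema

SHAPE_LABELS = ["Recipe", "Ingredient"]  # the shapes defined in the draft


def shape_uri(label: str, namespace: str = NAMESPACE) -> str:
    """Join a namespace and a shape label with exactly one slash between them."""
    return namespace.rstrip("/") + "/" + label


for label in SHAPE_LABELS:
    print(label, "->", shape_uri(label))
```

Passing a different namespace as the second argument is one way to picture "different namespaces for different parts of the schema".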
What if we wanted to use an existing namespace for our schema? Let's look at how "recipes" and "ingredients" get understood as objects in two different ontologies:
- Schema.org (commonly used for SEO through structured web-metadata) defines a Recipe as having multiple recipeIngredients (plus instructions, cuisine, etc). However, a recipeIngredient is just text, and has no properties in its own right.
We can actually see schema.org's recipe schemas in action by looking at the source of big recipe websites (like Bon Appétit). Here's an example, taken from one such recipe page:
{
"@context": "http://schema.org"
,"@type": "Recipe"
,"name": "Bucatini all\u0027Amatriciana"
,"image": "https://assets.bonappetit.com/photos/57afff221b33404414976058/16:9/w_1000,c_limit/bucatini-all-amatriciana.jpg"
,"author": {
"@type": "Person"
,"name": "Sarah Tenaglia"
}
,"publisher": {"@type":"Organization","name":"Bon Appétit","logo":{"@type":"ImageObject","url":"https://www.bonappetit.com/images/logo-foodculture-tablet@1x.png","width":322,"height":56}}
,"datePublished": "2010-04-11T20:00:00.000-04:00"
,"dateCreated": "2020-10-14T09:23:00.000-04:00"
,"description": "This classic sauce takes its spiciness from black pepper and dried chiles and its depth of flavor from guanciale, Italian salt-cured pork jowl."
,"aggregateRating": {"@type": "AggregateRating", "ratingValue": "4.29", "reviewCount": "171"}
,"recipeYield": "4 Servings"
,"recipeIngredient": [
"2 Tbsp. extra-virgin olive oil","4 oz. thinly sliced guanciale, pancetta, or chopped unsmoked bacon","1/2 tsp. crushed red pepper flakes","1/2 tsp. freshly ground black pepper","3/4 cup minced onion","2 cloves garlic, minced","1 28-oz. can peeled tomatoes with juices, crushed by hand t","Kosher salt","12 oz. dried bucatini or spaghetti","1/4 cup finely grated Pecorino (about 1 oz.)"
]
,"recipeInstructions": [
{"@type":"HowToStep","text":"Heat oil in a large heavy skillet over medium heat. Add guanciale and sauté until crisp and golden, about 4 minutes. Add pepper flakes and black pepper; stir for 10 seconds. Add onion and garlic; cook, stirring often, until soft, about 8 minutes. Add tomatoes, reduce heat to low, and cook, stirring occasionally, until sauce thickens, 15-20 minutes."},{"@type":"HowToStep","text":"Meanwhile, bring a large pot of water to a boil. Season with salt; add the pasta and cook, stirring occasionally, until 2 minutes before al dente. Drain, reserving 1 cup of pasta cooking water."},{"@type":"HowToStep","text":"Add drained pasta to sauce in skillet and toss vigorously with tongs to coat. Add 1/2 cup of the reserved pasta water and cook until sauce coats pasta and pasta is al dente, about 2 minutes. (Add a little pasta water if sauce is too dry.) Stir in cheese and transfer pasta to warmed bowls."}
]
,"nutrition": "One serving contains: Calories (Kcal) 524.6 %Calories From Fat 25.2 Fat (G) 14.7 Saturated Fat (G) 4.0 Cholesterol (Mg) 14.7 Carbohydrates (G) 75.9 Dietary Fiber (G) 6.0 Total Sugars (G) 7.4 Net Carbs (G) 69.9 Protein (G) 19.4 Sodium (Mg) 757.8"
}
This schema is pretty neat if the end step in your search task is a recipe ("how do I cook Bucatini all'Amatriciana", "what should I do with all this guanciale", "does Bucatini all'Amatriciana include onions" (fyi: lots of angry Italians in the comments say no†)), but not so much if your questioning follows on from that (e.g. "can I use pancetta instead?", "what is bucatini?").
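To make that limitation concrete, here is a small Python sketch over a trimmed copy of the JSON-LD above. The function name mentions is made up for illustration, and the substring check is deliberately naive. It can answer "does it include onions" by text matching, but because every recipeIngredient is a bare string, there is no ingredient object to ask "what is bucatini?" of.

```python
import json

# Trimmed copy of the JSON-LD example; only the fields used here are kept.
doc = json.loads("""
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Bucatini all'Amatriciana",
  "recipeIngredient": [
    "2 Tbsp. extra-virgin olive oil",
    "4 oz. thinly sliced guanciale, pancetta, or chopped unsmoked bacon",
    "3/4 cup minced onion",
    "12 oz. dried bucatini or spaghetti"
  ]
}
""")


def mentions(recipe: dict, word: str) -> bool:
    """Naive check: does any ingredient line contain the word?"""
    return any(word.lower() in line.lower() for line in recipe["recipeIngredient"])


print(mentions(doc, "onion"))  # True
print(mentions(doc, "whisk"))  # False

# Every ingredient is plain text, so there are no properties to follow up on.
print(all(isinstance(item, str) for item in doc["recipeIngredient"]))  # True
```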
- FoodON (the food ontology) does not define a recipe object at all, instead following a much more scientific approach to defining food, in terms of products and transformations. You can see from the screenshots below that the concept of "food in popular culture" hasn't really made its way into the ontology, where the closest thing to a recipe in this sense is probably "home food preparation process".
However, FoodON does have a really thorough catalog of ingredients ("food products"), all of which have their own properties (e.g. descriptions), though some might be somewhat anatomical...
Tactics
As pointed out earlier, this will be a very common problem ("my shape doesn't look like existing shapes"), and we should have a well-defined solution to it that the interface can nudge people towards. I think that the way search + suggestion of different schemas works will really shape how the broader structure gets built when people start using the platform without our help (so, far off, but not so far that we shouldn't think about it).
As I understand it, what we want is for as many people as possible to be using the same schemas (or at least, schemas with some understood mapping between them, such as standard components), provided those schemas are correct for their purposes.
approach 1: it's like schema.org, but I added some bits (FoodON)
One tactic might be to take the schema.org schema (which gives us a fairly useful shape), but add ingredients that have descriptions attached to them as well.
This is technically an incorrect use of the schema.org schema, but a correct usage could be supported by a "fork this schema" feature that shows people a few examples, encourages them to adapt, and keeps track of those adaptations. The referenced namespace would then be the forked schema, rather than schema.org, but it could reference the original in some way. (do schemas have prov? eek.)
approach 2: it's my own schema, but I include FoodON "food products" as ingredients
Another tactic might be to have people structure schemas as they like, but let them slot in existing objects that fit the shape (the smaller the object, the more likely they are to find something that fits). So, you might have a schema that references the FoodON namespace when talking about ingredients, and the schema.org namespace when talking about cuisines. Correspondence between different people's schemas could then be understood in terms of these more standard objects.
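One way to picture that correspondence is the Python sketch below. Everything in it is hypothetical: the two schemas, their property names, and the FoodON namespace URI (a placeholder, not the ontology's real IRI). It assumes each property records the namespace of the object it references, and pairs up properties from two schemas that point at the same namespace.

```python
# Hypothetical data: property name -> namespace of the object it references.
FOODON = "http://example.org/foodon/"  # placeholder, not the real FoodON IRI
SCHEMA_ORG = "http://schema.org/"

my_recipe = {
    "ingredients": FOODON,
    "cuisine": SCHEMA_ORG,
    "directions": "http://example.com/",
}
their_dish = {
    "components": FOODON,
    "origin": "http://example.net/",
}


def aligned_properties(a: dict, b: dict) -> list:
    """Pairs of properties (one per schema) that reference the same namespace."""
    return sorted(
        (prop_a, prop_b)
        for prop_a, ns_a in a.items()
        for prop_b, ns_b in b.items()
        if ns_a == ns_b
    )


print(aligned_properties(my_recipe, their_dish))
# [('ingredients', 'components')]
```

Matching on namespace alone is coarse; a real mapping would presumably compare the referenced shapes themselves.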
approach 3: it's my own schema and it references itself.
We could just get people to make schemas from scratch each time, and find another tactic to map schema namespaces onto one another. Or, perhaps, we don't care so much about doing that? Not sure about this one; good to discuss.
Extended edition:
1: ordered lists
What if we wanted to add directions to the recipe? In a simple sense, we could add them straight to the Recipe object as a text field, in the same way that schema.org does it. This would probably be fine for a lot of purposes, but it would also be interesting to consider the utility of being able to interpret a recipe in terms of transformations and processes, particularly in the context of a conversation. "I can't find my whisk, can I still make meringue?" is a perfectly reasonable question, but requires an understanding of equivalent processes and equipment. Similarly, it might be a good way to understand the complexity and difficulty of a particular recipe.
In the current format, ordered lists are added to schemas in the form of linked lists. If we wanted to add an ordered list of directions to the schema, it would look like this (omitting the rest of the earlier schema for clarity):
[shapes.Recipe]
...
[shapes.Recipe.directions]
kind = "reference"
label = "DirectionList"
cardinality = "required"
[shapes.DirectionList]
[shapes.DirectionList.head]
kind = "reference"
label = "Direction"
[shapes.DirectionList.tail]
kind = "reference"
label = "DirectionList"
cardinality = "optional"
[shapes.Direction]
[shapes.Direction.id]
kind = "uri"
[shapes.Direction.description]
kind = "literal"
datatype = "string"
cardinality = "optional"
[shapes.Direction.process]
kind = "literal"
datatype = "string"
cardinality = "optional"
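To show how the head/tail structure encodes order, here is a minimal Python sketch that mirrors the DirectionList and Direction shapes as dataclasses. The helpers from_steps and to_steps and the example IDs are hypothetical, and the sketch assumes that a property with no stated cardinality is required.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Direction:
    id: str                            # kind = "uri"
    description: Optional[str] = None  # cardinality = "optional"
    process: Optional[str] = None      # cardinality = "optional"


@dataclass
class DirectionList:
    head: Direction                          # assumed required
    tail: Optional["DirectionList"] = None   # cardinality = "optional"


def from_steps(steps: List[Direction]) -> Optional[DirectionList]:
    """Build the linked list back to front so each node points at the next step."""
    node = None
    for step in reversed(steps):
        node = DirectionList(head=step, tail=node)
    return node  # None for an empty list, which the shape itself cannot express


def to_steps(node: Optional[DirectionList]) -> List[Direction]:
    """Walk head/tail links to recover the ordered steps."""
    steps = []
    while node is not None:
        steps.append(node.head)
        node = node.tail
    return steps


steps = [
    Direction("http://example.com/direction/1", "Heat oil in a large heavy skillet"),
    Direction("http://example.com/direction/2", "Bring a large pot of water to a boil"),
    Direction("http://example.com/direction/3", "Add drained pasta to sauce"),
]
chain = from_steps(steps)
print([d.description for d in to_steps(chain)])
```

Note that reading step n means following n tail links, which is the usual trade-off of a linked list against an indexed array.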
† this is a discussion for another day, but worth noting here too: if the source that's most adept at filling out the schema.org information becomes the authority on whether this dish contains onions, rather than the people who've been making this dish for hundreds of years (insert other knowledge problem here), that's not a good system. One thing that feels really essential with this work is finding tactics for collecting + structuring information that don't reproduce the "Google" idea of legibility. Which, in theory, is one of the things that the Underlay is about, but it's interesting to think about how these supposedly open and accessible systems get gamed in practice...