[fanfic] Docmaps and Context

We have a growing number of use cases that feel like they should be able to fit into PubPub, but we’ve yet to find a clear articulation of how.

  • Discussing a pub from another community, or a set of pubs from a set of communities (e.g. Frankenbook classroom case)
  • Reviewing a pre-print from another site (e.g. Rapid Reviews, overlay journals)
  • Discussing a publication from another site (e.g. Overlay-journalish, but with focus on private journal-club-like conversation)
  • Serving as a ‘preprint registry’ based on PDF submissions (e.g. biorxiv)
  • Integrating external reviews with pubpub-based reviews
  • Integrating external tweets/discussions with pubpub-based conversations
  • Taking historical grey-literature and hosting it in a space for discussions, while leaving open the door to ‘upgrading’ it to an HTML version

The core of the challenge here seems to be difficulty in identifying specific ‘objects’. What is a review? What is a pub? Considering docmaps and the recent clarity on context (i.e. that the strength of a digital publication is content + context), I found an articulation for docmaps and pubpub that makes it easier for me to see how the applications above might be met.

I’ll start by suggesting a docmaps schema, then point out its key attributes, and finally share examples for PubPub objects and the applications above. However, even after writing this, a few key points still live only in Figma or my notes because I couldn’t find the right way to articulate them. It may be worth working through some of those nuances on a call.

DOCMAP
{
	title: <string>,
	label: <string>,
	identifiers: [{ type: <string>, id: <string> }],
	date: <datetime>,
	abstract: <string>,
	attributions: [<attribution>],
	journal: <string>,
	publisher: <string>,
	content: [<docmap>] | { type: <string>, link: <string>, exports: [{ type: <string>, link: <string> }] },
	context: {
		discussions: [<discussion>],
		reviews: [<review>],
		parents: [<connection>],
		siblings: [<connection>],
		children: [<connection>],
		forks: [<fork>],
		catalysts: [<catalyst>],
		files: [<file>],
		analytics: [<analytic>],
	}
}

Key points:

  • There are really three core components to the docmap description above: root-level metadata, a content field, and a context field.
  • In many cases, context items (e.g. discussions, reviews, catalysts) could be a simple URL, allowing external and pubpub-native objects (e.g. discussions) to sit side-by-side. This also splits the problem up so that each context type can have its own set of standards and norms in the spec.
  • Note the identifiers field, which takes an array of identifiers (e.g. DOI, ISBN, PubPub id, etc.).
  • Missing from this is the explicit concept of a PubPub draft. While I can imagine a way to capture every step of a document with the docmap format above, I don’t think that’s the use case most people have.
  • Note the recursive nature of the content field. Its value can be either a single value with a type and link, or an array of docmaps. This allows us to represent a book (i.e. a PubPub collection) as a docmap whose content is a list of docmaps (pubs). One unexpected result of this line of thinking was that perhaps Pubs ought to also have an array of docmaps as their content, where each child is a specific release.
  • Each recursive docmap has its own context field. This means discussions, reviews, translations, etc. can be associated as specifically (e.g. with a single release) or as generally (e.g. at the collection level) as needed.
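To make the recursion concrete, here is a minimal TypeScript sketch of the schema above. It is not a spec: treating every root field except title as optional, typing dates as plain strings, and letting a context item be either a bare URL or a typed object are my assumptions, and isLeafContent is a hypothetical helper.

```typescript
// Sketch of the docmap schema as TypeScript types. Optionality, string dates,
// and the ContextItem union are assumptions, not part of the proposal.

type Identifier = { type: string; id: string }; // e.g. DOI, ISBN, PubPub id

// Leaf content: one typed link, plus optional exports.
type ContentLink = {
  type: string;
  link: string;
  exports?: { type: string; link: string }[];
};

// A context item can be a bare URL or a richer typed object.
type ContextItem = string | { type: string; [field: string]: unknown };

type ContextField =
  | "discussions" | "reviews" | "parents" | "siblings" | "children"
  | "forks" | "catalysts" | "files" | "analytics";

interface Docmap {
  title: string;
  label?: string; // e.g. "Book", "Pub", "Release"
  identifiers?: Identifier[];
  date?: string;
  abstract?: string;
  attributions?: unknown[];
  journal?: string;
  publisher?: string;
  // Recursive: either child docmaps or a single leaf link.
  content?: Docmap[] | ContentLink;
  context?: Partial<Record<ContextField, ContextItem[]>>;
}

// Tell the two shapes of the content field apart.
function isLeafContent(content: Docmap["content"]): content is ContentLink {
  return content !== undefined && !Array.isArray(content);
}

// Usage: a pub whose content is a list of releases (from the book example).
const chapter2: Docmap = {
  title: "Chapter 2",
  label: "Pub",
  identifiers: [{ type: "doi", id: "10.271/21441.2452" }],
  content: [
    {
      title: "Chapter 2",
      label: "Release",
      date: "April 2, 2018",
      content: { type: "HTML", link: "assets.pubpub.org/frankenchapter2_v1.html" },
      context: {},
    },
  ],
  context: {},
};
```

A pub's content is an array (so isLeafContent returns false for it), while each release's content is a single link.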

Considering the notes above, a PubPub book may be represented as follows:

// Book Example
{
	title: "Frankenbook",
	label: "Book",
	identifiers: [ { type: "doi", id: "10.271/21441"}],
	content: [
		{
			title: "Chapter 1",
			label: "Pub",
			identifiers: [ { type: "doi", id: "10.271/21441.2451"}],
			content: [
				{
					title: "Chapter 1",
					label: "Release",
					date: "July 5, 2019",
					content: {
						type: "HTML",
						link: "assets.pubpub.org/frankenchapter1_v2.html",
					},
					context: {},
				},
				{
					title: "Chapter One",
					label: "Release",
					date: "April 2, 2018",
					content: {
						type: "HTML",
						link: "assets.pubpub.org/frankenchapter1_v1.html",
					},
					context: {},
				},
			],
			context: {},
		},
		{
			title: "Chapter 2",
			label: "Pub",
			identifiers: [ { type: "doi", id: "10.271/21441.2452"}],
			content: [
				{
					title: "Chapter 2",
					label: "Release",
					date: "April 2, 2018",
					content: {
						type: "HTML",
						link: "assets.pubpub.org/frankenchapter2_v1.html",
					},
					context: {},
				},
			],
			context: {},
		}
	],
	context: {},
}
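Because content is recursive, one small function can reach every release no matter how deep it sits. A minimal sketch, assuming only the fields needed here (collectLeaves, the release helper, and the trimmed-down Docmap type are hypothetical), that walks the book above depth-first and returns each leaf's title path and link:

```typescript
// Only the fields this walk needs.
type Leaf = { type: string; link: string };
type Docmap = { title: string; label?: string; date?: string; content?: Docmap[] | Leaf };
type Found = { path: string[]; link: string };

// Depth-first walk. A docmap whose content is a single link is a leaf (a release
// in the book example); an array of docmaps is recursed into; no content at all
// (e.g. a pub that is still a draft) contributes nothing.
function collectLeaves(docmap: Docmap, path: string[] = []): Found[] {
  const here = [...path, docmap.title];
  const content = docmap.content;
  if (content === undefined) return [];
  if (!Array.isArray(content)) return [{ path: here, link: content.link }];
  const found: Found[] = [];
  for (const child of content) found.push(...collectLeaves(child, here));
  return found;
}

// The book example, trimmed to titles, labels, dates, and links.
const release = (title: string, date: string, link: string): Docmap => ({
  title, label: "Release", date, content: { type: "HTML", link },
});

const book: Docmap = {
  title: "Frankenbook",
  label: "Book",
  content: [
    {
      title: "Chapter 1",
      label: "Pub",
      content: [
        release("Chapter 1", "July 5, 2019", "assets.pubpub.org/frankenchapter1_v2.html"),
        release("Chapter One", "April 2, 2018", "assets.pubpub.org/frankenchapter1_v1.html"),
      ],
    },
    {
      title: "Chapter 2",
      label: "Pub",
      content: [release("Chapter 2", "April 2, 2018", "assets.pubpub.org/frankenchapter2_v1.html")],
    },
  ],
};

for (const { path, link } of collectLeaves(book)) console.log(path.join(" > "), link);
```

The same walk works unchanged if a collection nests another collection, since nothing in it depends on the label.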

It also becomes straightforward to imagine an overlay journal (either on internal PubPub content or an external PDF):

{
	title: "Measuring Bacteria Stuff",
	label: "Pub",
	identifiers: [ { type: "doi", id: "10.271/21441.2451"}],
	content: [
		{
			title: "Measuring Bacteria Stuff",
			label: "Release",
			date: "July 5, 2019",
			content: {
				type: "HTML", // or "PDF"
				link: "assets.pubpub.org/bacteria2.html", // or "biorxiv.org/pdf/whatever"
			},
			context: {
				reviews: [
					{
						type: "url",
						link: "nytimes.com/review-of-that-paper"
					},
					{
						type: "pbpbReview",
						title: <string>,
						identifiers: [{ type: <string>, id: <string> }],
						target: <docmap> | <string: link>,
						reviewers: <attribution>,
						managers: <attribution>,
						reviewMessage: <string>,
						reviewDoc: <docmap> | <string: link>,
						reviewResponses: [<question>]
					}
				],
				discussions: [
					{
						type: "url",
						link: "twitter.com/tweet12"
					},
					{
						type: "pbpbDiscussion",
						...
					}
				]
			},
		}
	],
	context: {}
}
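Since external and PubPub-native reviews sit in the same array, anything that renders them only has to switch on type. A minimal sketch, assuming just the two item types from the example (url and pbpbReview, the latter reduced to a title); describeReview and the sample review title are hypothetical:

```typescript
// The two review item shapes from the example, reduced to what is displayed here.
type UrlReview = { type: "url"; link: string };
type PbpbReview = { type: "pbpbReview"; title: string };
type ReviewItem = UrlReview | PbpbReview;

// One display line per review, wherever the review lives.
function describeReview(item: ReviewItem): string {
  switch (item.type) {
    case "url":
      return `External review: ${item.link}`;
    case "pbpbReview":
      return `PubPub review: ${item.title}`;
    default: {
      // Unreachable while ReviewItem has two members; adding a new review type
      // to the union without handling it becomes a compile error here.
      const unhandled: never = item;
      return unhandled;
    }
  }
}

const reviews: ReviewItem[] = [
  { type: "url", link: "nytimes.com/review-of-that-paper" },
  { type: "pbpbReview", title: "Review of Measuring Bacteria Stuff" }, // hypothetical title
];
reviews.map(describeReview).forEach((line) => console.log(line));
```

Keeping the type tag on every item is what lets each context type grow its own fields without breaking the side-by-side listing.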

Happy to craft more examples for specific use cases to flesh out the idea. Also useful in my notes were a few explicit doubts and questions:

Is PubPub more aligned towards publishing original works (i.e. things for which you are the publisher) compared to overlay works?

This question kept gnawing at me until I realized I was still stuck thinking that a Pub is really about the content. If that’s the case, it feels awkward to publish work that you aren’t the publisher/author of. But breaking that mental model and seeing a Pub as content+context makes it obvious that a legitimate form of publishing is context-heavy while content-light (i.e. context around someone else’s published article). So an overlay journal is publishing something new: the context! That holds even though the overlay may not ‘own’ the content.

Should all context around a single work (e.g. Frankenbook) be captured in a single community? Do we pull multiple communities’ content to create the Frankenbook docmap?

A docmap describes a ‘work’, but a ‘work’ is not necessarily captured in its entirety by a docmap. A docmap captures content + context, but certainly not exhaustively (e.g. there will always be non-digital context that goes uncaptured). By extension, a docmap is not a singular ground truth for a work, and it is completely fine for there to be multiple docmaps that describe the content + context of a single work, each suited to a different audience, privacy level, or perspective. That said, the docmap maintained by the original author/publisher is likely to be seen as canonical in most cases.
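If several docmaps can describe one work, something has to tie them together, and the identifiers array is the natural hook. A minimal sketch, assuming two docmaps describe the same work when they share at least one identifier; describeSameWork, keyOf, and the journal-club docmap are hypothetical. Matching here only lower-cases the identifier type; real matching would likely need per-scheme normalization (DOIs, for example, are case-insensitive).

```typescript
type Identifier = { type: string; id: string };
type Described = { title: string; identifiers?: Identifier[] };

// Key an identifier by scheme and value. Lower-casing the scheme is the only
// normalization done here.
const keyOf = (i: Identifier): string => `${i.type.toLowerCase()}:${i.id}`;

// Two docmaps are treated as describing the same work if they share any
// identifier. A docmap with no identifiers never matches anything.
function describeSameWork(a: Described, b: Described): boolean {
  const keysA = (a.identifiers ?? []).map(keyOf);
  return (b.identifiers ?? []).some((i) => keysA.indexOf(keyOf(i)) !== -1);
}

// The author's (likely canonical) docmap and a private journal-club docmap
// about the same work in another community.
const canonical: Described = {
  title: "Measuring Bacteria Stuff",
  identifiers: [{ type: "doi", id: "10.271/21441.2451" }],
};
const journalClub: Described = {
  title: "Journal Club: Measuring Bacteria Stuff",
  identifiers: [
    { type: "pubpub", id: "journal-club-pub" }, // hypothetical PubPub id
    { type: "DOI", id: "10.271/21441.2451" },
  ],
};

console.log(describeSameWork(canonical, journalClub)); // true
```

Note that the titles differ and it doesn't matter: the match rests entirely on the shared DOI, which fits the idea that a pub's name can differ from its content's.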

PubPub Ramifications

  • A pub isn’t published or unpublished; it either has published content (releases) or it doesn’t.
    • Calling the work (i.e. the abstract concept of a ‘thing’ which has versions, edits, final copies, etc.) “published” is a misnomer. The work can have published versions (i.e. releases), but the work is just the work. A docmap tries to fully capture the ‘work’, though it will always be an imperfect/incomplete effort.
  • Pubs may need to have a pub-level public/private switch.
    • A pub may have an empty docmap content field (e.g. while still in draft) but still have context, and you may not want that context visible.
    • If you fork a pub (i.e. clone the docmap details into another community), you may be carrying over a public release which you want to keep private for a private discussion.
  • A pub’s name is the name of the content+context shell. Often the content will share that name, but it’s okay to have cases where the content is titled ‘Study on Bacteria’ and the pub is titled ‘Reviews of Study on Bacteria’.
  • We may need the concept of a Pub without a draft. If the content is specified to be a PDF from an external preprint server or a grey-literature/historical upload, we want the docmap to reflect that PDF in the content field, not an HTML document that is saying, “This pub is about this external PDF”.
  • There isn’t necessarily anything keeping a PubPub Discussion from being linked as a Review context item for the same (or different) pub. PubPub gives you a discussion-friendly UI to make discussions, and a review-friendly UI to make reviews, but if they get categorized as something else (or categorized in multiple ways) based on their real-world usage, that’s fine!
  • In the same way we’ve discussed a ‘Presentation View’ for Pubs in addition to the single-column read view, we may want a ‘Context View’ that gives a dashboard-like experience for browsing all discussions, reviews, etc. while reading the doc. Authors may even want to make this view primary in cases where the content is external (e.g. an overlay journal whose content is an external PDF, so the primary focus is the associated review context). See the quick block-mockup below for an idea of what I mean:

    A multi-column layout like this may also be useful for creating reviews from the perspective of an invited reviewer.
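The first two ramifications can be sketched together: “published” becomes a derived property of a pub’s content rather than a stored flag, and a separate pub-level switch governs who sees the context. A minimal sketch; the isPublic switch, the viewerIsMember check, the discussion id, and both function names are hypothetical, and a release is assumed to be any content child labeled "Release".

```typescript
// Hypothetical pub shape: content children carry labels such as "Release";
// isPublic is the proposed pub-level switch, not an existing field.
type Child = { title: string; label?: string };
type Pub = {
  title: string;
  isPublic: boolean;
  content: Child[];
  context: { [field: string]: string[] };
};

// "Published" is derived, not stored: a pub has published content iff at least
// one of its content children is a release.
function hasPublishedContent(pub: Pub): boolean {
  return pub.content.some((child) => child.label === "Release");
}

// Context is only exposed to non-members when the pub-level switch allows it,
// independent of whether any release exists.
function visibleContext(pub: Pub, viewerIsMember: boolean): Pub["context"] {
  return pub.isPublic || viewerIsMember ? pub.context : {};
}

// A draft with no releases yet, but with a private discussion already attached.
const draft: Pub = {
  title: "Study on Bacteria",
  isPublic: false,
  content: [],
  context: { discussions: ["pubpub-discussion-1"] }, // hypothetical discussion id
};

console.log(hasPublishedContent(draft)); // false
console.log(visibleContext(draft, false)); // {}
```

Keeping the two checks separate is the point: a forked pub can carry a public release while its context stays private.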

To ground an overarching idea that has been clarifying in writing this up: perhaps we ought to see PubPub as a creation/permissions/social UI on top of docmaps. PubPub is about making it easy to read, create, and publish docmap items. Permissions may be simplified by keeping a model like that in mind (e.g. Who has access to create review context items at the Pub level? To suggest review context items? To create discussion items? To see content items?).
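Framed that way, the permission questions reduce to a table of roles against actions on context items. A minimal sketch of that framing only; the roles, the actions, the can helper, and especially which role gets which grant are placeholders, not a proposal:

```typescript
// Placeholder roles and actions; the grants below are illustrative, not decided.
type Role = "manager" | "reviewer" | "member" | "public";
type Action = "createReview" | "suggestReview" | "createDiscussion" | "viewContent";

const policy: Record<Action, Role[]> = {
  createReview: ["manager"],
  suggestReview: ["manager", "reviewer"],
  createDiscussion: ["manager", "reviewer", "member"],
  viewContent: ["manager", "reviewer", "member", "public"],
};

// Each permission check becomes a lookup keyed by the kind of docmap item involved.
function can(role: Role, action: Action): boolean {
  return policy[action].indexOf(role) !== -1;
}

console.log(can("reviewer", "suggestReview")); // true
console.log(can("member", "createReview")); // false
```

Adding a new context type then means adding rows to the table rather than new permission logic.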

Appendix

As a final note, I want to expand on the catalysts context field for a moment. I’m imagining a space like this to be used to capture:

  1. technical transformations that caused the current docmap to be (e.g. prosemirror steps)
  2. social non-digital influences that caused the current docmap to be (e.g. conversations)
  3. digital influences that caused the current docmap to be (e.g. pubpub discussion, file imports)

As such, we may have something like:

context: {
	catalysts: [
		// The transformations that caused this to be
		{
			type: 'pbpbStep',
			value: {
				// Prosemirror step details
			},
			source: null // id of the previous step, if any.
		},
		{
			type: 'import',
			source: {
				type: 'doi',
				id: "10.2712/12341"
			}
		},
		{
			type: 'discussion',
			source: {
				type: 'pbpbId',
				id: "1bf45a43-129d-4f7e-87b4-4c43667b8798"
			}
		}
	]
},
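Typing the three catalyst shapes from this example as a discriminated union shows how the non-step catalysts reuse the same { type, id } shape as the root-level identifiers field. A minimal sketch; influencingWorks is hypothetical, and value stays opaque because the ProseMirror step details are left unspecified above:

```typescript
type Identifier = { type: string; id: string };

// The three catalyst shapes from the example above.
type Catalyst =
  | { type: "pbpbStep"; value: unknown; source: string | null } // previous step id, if any
  | { type: "import"; source: Identifier }
  | { type: "discussion"; source: Identifier };

// List the works that influenced this docmap, skipping low-level editing steps.
function influencingWorks(catalysts: Catalyst[]): string[] {
  const works: string[] = [];
  for (const c of catalysts) {
    // Outside the pbpbStep branch, source is always an Identifier.
    if (c.type !== "pbpbStep") works.push(`${c.source.type}:${c.source.id}`);
  }
  return works;
}

const catalysts: Catalyst[] = [
  { type: "pbpbStep", value: {}, source: null },
  { type: "import", source: { type: "doi", id: "10.2712/12341" } },
  { type: "discussion", source: { type: "pbpbId", id: "1bf45a43-129d-4f7e-87b4-4c43667b8798" } },
];

console.log(influencingWorks(catalysts));
```

Social, non-digital catalysts (the second category above) have no shape in the example yet, so they are left out of the union.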

Can we…just publish this publicly and see if we can get some feedback on it? I think it’s a really rich idea that would be worthy of some more public attention in the vein of where we’re going with PubPub. At minimum, it would be nice to show this during our upcoming user summit.

Might be too galaxy brain of me, but given your later comment about discussions being used as content, I’m wondering if all contexts (perhaps excluding analytics) are simply relationships to DocMaps with a type attached. E.g. is there a reason a review shouldn’t be a docmap, particularly if it can be sourced externally, or as a curatorial act, or contains a bunch of discussions?

Yes. And I would add that new thing is also itself content.

Yeah, this feels nicely Underlayish. The key here is to be able to match identifiers at some level. And it’s super nice that this spec includes the ability to have multiple identifiers to match against.

This is nicely analogous to Crossref’s pre-publication states where you want to say something on this topic is coming soon but not fully available yet. Can see many applications for that outside the traditional academic publishing world, as well.

Yeah, I’m increasingly on this wavelength as well. We don’t currently have a very good way to say that the object of this Pub is really a PDF. The nice thing about DocMaps in this context is that they make it easy to add the other metadata you need about the work when there’s no HTML version.

This is great – and also, something that can be replicated in other places by pulling in context from a future Underlay registry.

Happy to just set this thread public - though it does feel like a more contextualized (ha) narrative could be written for a non-KFG audience.

Doh - totally! I think that’s right, and really lets us be non-prescriptive about what is allowed to be a ‘review’ or a ‘discussion’. A work’s status as an article, discussion, review etc depends on how it’s contextualized - not solely on a publisher’s intent! Part of the work then seems to be describing the docmap types that PubPub uses internally for Pub, Release, Book, Issue, Discussion, etc.
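If every context entry is just a typed relationship to another docmap (or to a URL standing in for one), then whether something “is” a review falls out of how it is linked. A minimal sketch of that idea; the Edge shape, the relation names, relationsOf, and the discussion id are all hypothetical:

```typescript
type Identifier = { type: string; id: string };
// A target is either a URL or a (minimal) docmap.
type Target = string | { title: string; identifiers: Identifier[] };
type Relation = "discussion" | "review" | "parent" | "child" | "fork" | "catalyst";
type Edge = { relation: Relation; target: Target };

// All relations under which a given URL or identifier value appears.
function relationsOf(edges: Edge[], key: string): Relation[] {
  return edges
    .filter((e) =>
      typeof e.target === "string"
        ? e.target === key
        : e.target.identifiers.some((i) => i.id === key),
    )
    .map((e) => e.relation);
}

// One PubPub discussion, linked both as a discussion and as a review.
const thread = {
  title: "Discussion of Measuring Bacteria Stuff",
  identifiers: [{ type: "pbpbId", id: "discussion-42" }], // hypothetical id
};
const edges: Edge[] = [
  { relation: "discussion", target: thread },
  { relation: "review", target: thread },
  { relation: "review", target: "nytimes.com/review-of-that-paper" },
];

console.log(relationsOf(edges, "discussion-42"));
```

Here the same thread comes back as both a discussion and a review, which is the non-prescriptive categorization described above.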