Download a book from PubPub programmatically

Roberts:

We have a project on PubPub that is a book in chapters. Downloading pubs now works (great!), but still with some difficulty. What I want is to set up a system where I can download chapters (i.e. pubs) programmatically, preferably as Markdown, and preferably with images and references (citations) in some structured, standard format (e.g. BibTeX). I could then process the downloads in R/bookdown to automatically compile a good-looking PDF or ePub when needed.

I was wondering whether there is some system or API in place to allow for this, or whether I could hack it up somehow. I have fair bash, HTML/CSS, and R skills and some Perl, but perhaps there is a better way.


Thanks for moving this over, Roberts.

We don’t have an API or a way to do bulk exports right now, but bulk exporting is on our short-term roadmap – as is improving our exports to include proper footnotes and citations, images, etc.

In the meantime, if there aren’t too many chapters, I’d recommend downloading them individually and using Pandoc to process them in bulk from the command line. If you’re interested in that approach, I’ve been playing with a Pandoc script that takes PubPub’s generated HTML downloads and properly places footnotes and citations. It wouldn’t take much to get it generating Markdown in the flavor you’re looking for (although images are another matter; I don’t yet have a great solution for that).
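As a rough illustration of the "download individually, then batch through Pandoc" idea, here is a minimal Python sketch. It is not the script mentioned above and does nothing special for footnotes or citations; it only calls stock Pandoc with its standard `--from`, `--to`, and `--output` options. The directory names and the function names `pandoc_command` and `convert_all` are placeholders made up for this example, and it assumes `pandoc` is installed and on your PATH.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_file: Path, out_dir: Path) -> list[str]:
    """Build the pandoc call that converts one downloaded chapter to Markdown."""
    md_file = out_dir / (html_file.stem + ".md")
    return [
        "pandoc",
        str(html_file),
        "--from", "html",
        "--to", "markdown",
        "--output", str(md_file),
    ]


def convert_all(book_dir: Path, out_dir: Path) -> list[Path]:
    """Convert every .html file in book_dir, returning the Markdown paths written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for html_file in sorted(book_dir.glob("*.html")):
        cmd = pandoc_command(html_file, out_dir)
        # check=True raises CalledProcessError if pandoc fails on a chapter
        subprocess.run(cmd, check=True)
        written.append(Path(cmd[-1]))
    return written


if __name__ == "__main__":
    # Hypothetical layout: downloaded chapters in ./Book, output in ./Book/markdown
    for path in convert_all(Path("Book"), Path("Book/markdown")):
        print("wrote", path)
```

The resulting `.md` files could then be fed to bookdown. Images stay as remote links in this sketch; Pandoc's `--extract-media` option may be worth trying if local copies are needed, though I haven't verified how it behaves with PubPub's HTML.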


Hi Gabriel, thanks for this! It is great. The script is in JavaScript, a language that I have not had time to play with. I installed Node.js and npm on my system. Would you mind giving a very brief outline of how you would use your script? Say I downloaded 4 chapters as .html into a directory called Book. I can see that the images referenced are hosted on PubPub, which is fine. What would I do next to use your script?