Ideas for community engagement

Hi everyone. I’ve been speaking a bit with @sj, and he encouraged me to write up some thoughts I had. I should say two things first. 1) Considering the current barrier to entry with Underlay (one of the things I would like to discuss), I can mostly only speculate about what may be useful to the project. 2) I’m in a professional transition, but for the time being I work full-time at a consultancy, so my engagement with Underlay is less than I would like it to be. With that said, I humbly offer my ideas for your consideration. Here we go.

Let’s start with H4Q, my project for the Hack for Sweden 2015 Open Data hackathon (references in Swedish). The name is a portmanteau of “hack 4 sweden”, my employer HiQ and, obviously, “hack”. As a 24-hour project it didn’t amount to much, but I implemented my best idea of what the Open Data community actually needed (in contrast to the other projects, which all made great use of the open data provided). The reason: the number of agencies contributing data was an order of magnitude higher than the number of participating teams. Most teams also clustered on a small fraction of the most accessible datasets, and there were indications that many of the datasets were hard to consume and to extract any value from.

My project was thus simply to assemble a powerful data science platform, tutorials and examples of Open Data use cases, together with a sandbox enabling exploration and, eventually, contributions back to the community. Crucially, it would work in a distributed fashion with a minimal barrier to entry. I say simply because, thanks to existing open technologies, the gist of it is these few lines of a Dockerfile (example updated for Jupyter):

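```dockerfile
FROM jupyter/scipy-notebook
# The image runs as the unprivileged user jovyan, who cannot write to /opt,
# so clone the project's sandbox branch into the user's home directory instead
RUN git clone --branch sandbox https://github.com/H4Q/therepo.git /home/jovyan/repo
```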

If you’re not already familiar with Docker: in an Infrastructure as Code fashion, it provides “containerized”, extremely lightweight images, akin to virtual machine disk images but sharing the host’s kernel instead of emulating hardware. These images can either be fetched from public registries or rebuilt predictably by each end user. Docker images are usually less than 1 GB in size (this one happens to be 1.2 GB), thanks to community optimization and to being assembled as layers in a layered file system, so images built on the same base share those base layers.
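To make the layering point concrete, here is a minimal Python sketch, not Docker’s actual implementation. It assumes each image is an ordered list of layers, each layer identified by a SHA-256 digest of its contents, so identical layers are stored only once. The layer names and sizes are hypothetical numbers chosen for illustration, not measurements of the real images.

```python
import hashlib

# A layer is modelled as (content digest, size in MB); an image is a list of layers.
Layer = tuple[str, int]


def digest(content: str) -> str:
    """Content-address a layer by hashing its (stand-in) contents."""
    return "sha256:" + hashlib.sha256(content.encode()).hexdigest()


def image(*layers: tuple[str, int]) -> list[Layer]:
    """Build an image from hypothetical (layer name, size_mb) pairs."""
    return [(digest(name), size) for name, size in layers]


def naive_size(images: list[list[Layer]]) -> int:
    """Disk use if every image kept a private copy of all its layers."""
    return sum(size for img in images for _, size in img)


def shared_size(images: list[list[Layer]]) -> int:
    """Disk use when layers with the same digest are stored only once."""
    unique = {d: size for img in images for d, size in img}
    return sum(unique.values())


# Hypothetical base stack shared by every image built FROM the same parent.
BASE = [("ubuntu-base", 80), ("conda-python", 400), ("scipy-stack", 700)]

scipy_notebook = image(*BASE)
h4q = image(*BASE, ("clone-h4q-sandbox", 20))
other_project = image(*BASE, ("clone-other-repo", 15))

images = [scipy_notebook, h4q, other_project]
print("stored separately:", naive_size(images), "MB")   # 3575 MB
print("with shared layers:", shared_size(images), "MB")  # 1215 MB
```

Under this model, a user who already has the base image only needs to fetch the small top layer of each project image, which is what makes distributing one image per project cheap.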

Thus the Dockerfile above defines a runnable image of Ubuntu 20.04, the Jupyter Notebook Scientific Python Stack and the sandbox branch of the H4Q repository, which also defines the project. With Docker installed and the H4Q Git repository cloned, running this system is (should be) a matter of executing `docker build -t h4q .` followed by `docker run -d -p 443:8888 -e "PASSWORD=foobar" h4q`. It is easy to see how Docker has become the bread and butter of deploying any kind of system more complex than a single application.

There were a bunch more things in there, such as (over-)using Git branching to let you make the project your own, and a peer review / idea enrichment workflow, though in today’s context the latter would probably be better served by an integration with PubPub. This was also before the repo2docker and Binder projects came into the picture.

Had I developed the project further (I didn’t; it was a fun but short-lived experiment), the next item would surely have been a great README, including contribution guidelines.

Much more recently, after I had been aware of it for some time, Spotify finally published an article on their Golden Paths concept. Besides seeming like an excellent platform for providing guidelines, introductions, documentation and collaborative development of shared solutions, it crucially combines these under a single concept. The documentation is the functioning solution, and the functioning solution is also the opinionated guideline for how to work in the domain. Note the “all-in-one” property it has in common with my hack above.

Finally, these thoughts are inspired by my experiences with Secure Scuttlebutt. Ironically, this is partly due to how difficult it is to get up to speed with Scuttlebutt without prior familiarity with the NodeJS ecosystem, but in this context that is secondary. I heard how the project boldly applies “value-based architecture”, arguably leading to a drastically improved ability to deliver value aligned with those values and goals. Product management is in no way my expertise, but judging from Underlay’s development so far, you are investing heavily in the systems and protocol innovation that seems necessary to enable the values you wish to deliver. Still, if only to direct engagement efforts, I am very curious about the values and goals of the project on a short-, mid- and long-term basis. Making these explicit ought to also simplify priorities and decision-making at all levels.

Summing up, that is my current line of inquiry with regard to Underlay. What people would you want to engage with the project, how can you anticipate their needs, and how can you best spend effort to drive engagement? If we can attempt answers to those questions, I have provided some ideas for how to structure efforts to engage those rare but hopefully still plentiful minds in furthering Underlay’s goals.

Do let me know what you think and feel about this topic. Also, if I were to phrase these ideas for Commonplace, any hints regarding style and format are appreciated.


Hello CJ and welcome to the discourse! I removed the default link limit for new accounts, thanks for catching it. I appreciate the Spotify writeup - a path worth replicating. It is a nice coincidence that your hackathon was around the time that the first golden paths were developed.

Your question is timely, as we are currently discussing who we most want to engage around a collection registry, precisely so we can anticipate and respond to their needs. I’d like to read more about your ideas for H4Q, and how you identified your audience(s) and got feedback on the platform, tutorials, and invitations to contribute. Also, did the organizations that contributed data get any feedback on how easy or difficult their datasets were to consume?

As for an essay for a wider audience, a review of the golden paths approach + article, with comparisons to other self-documenting solutions, seems a fine place to start. I’d be happy to give editorial feedback on that.


Cool, glad to be here!

I didn’t do any more work on Open Data back then, but my impression has remained, also regarding Linked Open Data, that it suffers from “a lack of eyeballs”, so to speak, and that its low utilization is a troublesome Catch-22. I can still dream of Open Data catalyzing grassroots data science capabilities. In some ways the stage for that is set differently today (whereas for Underlay one might still wish to build one’s own), and in some ways it is developing rapidly as a consequence of the COVID-19 pandemic (that’s a mighty fine introduction to data science in R right there).

Yes, a closer look at Golden Paths would be an exciting article to write! Not necessarily for Underlay, but for the general information technology domain :smiley: I’ve had the privilege of being acquainted with Spotify for a long time: I was a classmate of their prodigious lead developer, my cousin is one of their senior network engineers, and several friends and colleagues work there. Spotify strikes me as one of the few companies born Open; besides Golden Paths, they have contributed, for instance, the following:

You won’t believe how deep the “golden paths” hole goes… anyway, I’ve researched it a bit, want to do much more, and am sketching an article with the preliminary title “Golden Paths and Knowledge Management from the trenches”. Have a look and let me know if you have any thoughts!