Imagining a future in research

This was my prepared fictional story for a session at the Abraham Kuyper Summer Seminar on Research Integrity.


Fifteen years from today, Wednesday August 27th, 2036. The academic year starts next week.

It is a year the northern hemisphere burned in summer and the southern hemisphere burned in winter. People had learned to act after the decadent inaction at the start of the millennium, but still, this was the cost.

Learning to act meant that nobody had any patience any longer for anything that was superfluous. Academic grandparents recalled the days they spent formatting, submitting, and correcting proofs for journals.

Academic grandchildren said: "We don't have the time."

The time to waste, that is.

Even if they did - Elsevier, Springer Nature - the big five publishers all went bankrupt within four years of the financial Meltdown of 2028. "Overleveraged in a cash-poor market that got hit by overdue regulation," the analysts analysed.

For a moment, everybody relied on preprint servers. After the pandemic of 2020-21, preprints had become common knowledge, but this was the point in time they became household knowledge.

It showed that the oiled machine of publishing needed much less oil. Researchers faced the reality that, in a digital age, they were already publishing experts and were simply being upsold things they didn't need.

Proofs typeset by underpaid workers took more time to correct than it took researchers to typeset the work themselves, as they did for preprint servers. All faults remained their own.

Communities even started forming around preprints - reading and providing feedback among researchers, easy as that.

"We could've done that a decade ago" was the common sentiment.

But it took the time that it took - and this change brought confidence to researchers that they could do a lot more than they gave themselves credit for. They even started making demands.

In 2034, a change started bubbling. The newfound confidence gave rise to a culture of realisations, potentially overdue.

"Why is our primary work treated as second class?" some wondered as they wrote data papers for the record instead of putting the datasets themselves on the record.

"Is this the best we can do?" some rightly wondered as new media had popped up left and right for two decades but publishing remained digital paper.

Preprint servers - arXiv, PsyArXiv, and countless more - became the established interests after the publishing vacuum of 2032. They weren't prepared to concede the power it had taken them four decades to win, and they basked in the self-found legitimacy of "having won", which in their eyes made them the bearers of the future.

What had happened was that power had shifted, but it hadn't changed. Publishing house or not, preprint servers did a lot of the same work albeit in a different manner.

Digital paper, printed on demand.

Digital paper, read by pinching on phones and tablets.

Digital paper, about work done a substantial time before - depending on the writing speed, maybe a few months.

Digital paper, with recycled text across preprints - leading to wasteful pages and distracting elements. Even though text recycling was no longer considered plagiarism after 2025, it was still a nuisance to have to figure out whether the authors had tweaked parameters in one methods section or another for a replication, or had done a verbatim copy-paste.

Out of this newfound dissatisfaction with digital paper, module servers arose.

Inspired by work dug up from the archives of Elsevier's research department, module servers were designed to be an elegant way to share the variety of outputs that make up research projects. Free-to-read, free-to-publish --- with purely researcher-generated content.

These module servers were nothing more than glorified repositories, and yet they remained substantively different for one reason: instead of relegating developments such as preregistrations and datasets to add-ons to digital paper, they put them front and center in the record.

It made whatever output you created, as a researcher, the output you would publish. At the start, we shared the sections of the paper as modules, but soon we started filling the gaps we previously couldn't in text.

Videos of lab protocols.

3D models of the hardware we custom designed and 3D printed.

Transcripts of the interviews. Datasets collected. Processing pipelines. Revisions of the writing.

On its own, that would not be enough. It would be a mess. Modules all over the place without context.

That's how it started - one big heaping mess. One module server had introduced a phone app, leading to people publishing modules with pictures of birds, mobile pH meters, and other fieldwork, including stool sample documentation.

At least, everybody got to author their work in their language, regardless of title, status, or type of output --- even if it was still difficult to curate.

But these module servers lacked context - papers did all the lifting in one place. They set the stage, reviewed previous work, and held your hand through the new work. Module servers ended up as a hodgepodge of disparate pieces of research - much like the repositories of the early days (figshare, Zenodo).

It was when one module server introduced connections between the modules that things started falling into place. Not only was the complete record of modules there for everybody to see, so was the order of events for any given module.

Breadcrumbs, so to speak.

Hodgepodge made way for organisation - breadcrumbs helped find the context needed.

If there was no breadcrumb at all, it was clear - take this module with many grains of salt. It has no context.

If it was deeply interconnected, that gave confidence and ways to inspect the provenance. Where exactly do these data come from? Where does this result come from?

The connections and lines between them started looking like the maps you see at underground stations - various lines to be traced back to various destinations, with some stations being hubs in the network. It got so complicated that people started making actual maps. Some even considered them for their CV, as a snapshot of their work.

But there was an unexpected benefit - deduplication. There was no time to waste, and duplicated words or resources were the worst.

A theory needed to be published once, and all the subsequent hypotheses linked to that same theory. No more duplicate publishing of the same information, if nothing changed.

Replications became incentivized and cost-effective - breadcrumbs meant that replications made the original methods more heavily connected and that the methods didn't need to be restated. All with proper lines of provenance and contributorship, potentially between people who never realised they contributed to the same line of research.

In the summer of 2036, when the trees burned across the world, the module servers had their pandemic moment. They became well known, although not household knowledge, when the fires got out of control in so many places that air quality became a daily factor in moving around and doing mundane things like grocery shopping.

Citizen scientists used their phones to create the largest air quality dataset ever constructed. A simple attachment for the phone, 3D printed based on previously validated designs published as a module just a few weeks earlier, measured air quality every minute and published each data point. Thousands of people around the world participated in this citizen sensing, going outdoors like in the peak days of Pokémon Go.

Automated data analysis pipelines helped pinpoint areas where it would be unsafe to go out in a few hours - a meteorological service of air quality, all based on research done just a few minutes before.

People built on these and many other research modules, with whatever value people saw, because there's a world to win, and no time to waste.