Should repository managers ever ‘unpublish’?

As a result of the recent kerfuffle about left-pad being unpublished from the NPM repository, and the subsequent “internet breakage”, I had a Twitter discussion with Charles Miller about whether unpublishing should even be an option.

First, let me make something clear: having provided an unpublish option, NPM was ethically, if not legally [1], obliged to respect Azer Koçulu’s decision to unpublish. The fact that this caused downstream problems doesn’t change that. But what this post is about is:

Should the unpublish option have been available?

And yes, I think it should be.

Self-Service Repository Management

Sites like npmjs.com [2] have a self-service model. Module authors are responsible for packaging and publishing their code and providing it to the site for hosting. The repository manager, in essence, is acting as an agent of the author in distributing the module. These sites take advantage of that to waive liability (e.g. the “safe harbour” provision of the DMCA). Under that model, the repository manager site does not have the authority to choose to continue publishing a module if the author “unpublishes”.

This holds true for open-source licenses as well – even the ultra-permissive WTFPL used for left-pad. This is because the repository manager hasn’t made the choice to host and distribute the module.

The fact that a self-service repository management site acts as an agent of the module author also answers another question – should they be allowed to unpublish? And the answer is “yes” – module authors should have the right to sever a relationship with a repository management site, without too many complications or restrictions. Even if there are downstream consequences.

But what about the users?

It is an unavoidable fact that if a popular module is unpublished, downstream users will be impacted. This is a fact of life in the open-source community.

Publishing an open-source module doesn’t require you to keep the source code available forever. An author can delete their GitHub repository, or take down their web site. Similarly, they can unpublish their work from a self-service repository management site that they do not want to be associated with. That’s their choice.

Now, if a popular open-source module such as left-pad is unpublished, the repository manager obviously has the right to fork the code, create their own module, and re-publish – even if that involves violating their own rules about things such as not re-using version numbers. This is obviously dependent on the license – there are open-source licenses that would require you to change the name if you fork the code base, and there are potential trademark issues. Another solution would be to implement a re-direct to a version of the forked module [3]. All perfectly acceptable solutions – not one of which is to “un-unpublish” a module.

As a user of that code, you have several options – the most obvious of which is to make your own copy. Open-source licenses give you that right – it’s what makes them open source. They may impose restrictions (like the GPL does), but those restrictions tend to apply when you use a dependency management tool to fetch the code for you anyway. That’s your ultimate protection – making your own copy.

As a slightly less extreme option, mirroring a repository isn’t particularly complicated, and it’s easy to mirror in such a way that you only ever receive new versions and don’t notice deletions. As a user of the code, that’s your right as well. Mirroring also has the positive effect of insulating your system from problems with the repository manager.
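To make the “only ever receive new versions and don’t notice deletions” idea concrete, here is a minimal sketch of an append-only sync. It is an illustration rather than how any real mirroring tool works: the function name `sync_mirror` and the use of plain dictionaries keyed by (module name, version) are assumptions made for this example.

```python
def sync_mirror(mirror, upstream):
    """Append-only sync: copy anything new from upstream, never delete.

    Both arguments are hypothetical stand-ins for real storage:
    dicts mapping (module_name, version) -> package contents.
    Returns the sorted list of keys that were added to the mirror.
    """
    added = []
    for key, payload in upstream.items():
        if key not in mirror:
            # New module or new version: take a local copy.
            mirror[key] = payload
            added.append(key)
    # Entries that have vanished upstream are deliberately left alone,
    # so an upstream "unpublish" never reaches the mirror.
    return sorted(added)


# Usage: the upstream drops left-pad, but the mirror keeps its copy.
mirror = {("left-pad", "0.0.3"): b"original tarball"}
upstream = {("other-module", "1.0.0"): b"new tarball"}
sync_mirror(mirror, upstream)
```

The design choice is simply that the sync never looks at what is missing upstream; it only ever adds.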

Finally, users can always seek replacements to any unpublished module.

None of these options are without cost or impact on users of the modules, but Open Source is free-as-in-speech, not free-as-in-beer; there are costs and risks associated with using open-source code.

Are there alternatives?

Yes, in fact, there are. Self-service repository management sites should allow unpublishing because they act as an agent of the author. But a curated repository management site – where the site publishers decide who is allowed in, or go and seek content of their own – would have the right to not allow unpublishing. Curation (which might be as simple as a review process) results in a conscious choice by the site. It makes them the agent of distribution – and also removes the “safe harbour” safety net.

The open-source community is full of examples of curated repository management. Every Linux distribution is an example of curated repositories. In many cases, the packages hosted in a Linux distribution are customised for that distribution, with individuals taking ownership and responsibility. To my knowledge, this mechanism hasn’t been applied to any software dependency management systems such as Maven, NPM, CPAN, or RubyGems – but in theory it could. (Some development houses do this internally for the things they depend on – I know I tried to do it at Wotif, and we were far from a big team)

Curated repositories aren’t a perfect solution. Packages often get removed from curated repositories – for example, when a security hole is found in a library, the vulnerable version is usually removed. In the case of Linux distributions, the repository is often culled for a release – older versions of libraries simply removed from that repository. Again, the rule here is clear – if you liked it, you should have put a ring on it.

What could be done about it?

A simple curation system such as a review process for new modules and new versions would be of great benefit to the community at large – if you look at Maven Central, for example, a lot of the modules there are simply crap, with cyclic dependencies between modules [4], modules brought in at the wrong scope, unmanaged clashes between versions of secondary dependencies, and so on.
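As an illustration of the kind of check a review process could automate, here is a rough sketch of detecting a cyclic dependency between modules. It assumes the dependency data is already available as a plain mapping from module name to the names it depends on; the function name `find_cycle` and that data shape are made up for this example and do not reflect how Maven or any other tool represents dependencies.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of module names, or None.

    `deps` is a hypothetical mapping: module name -> list of the
    module names it depends on. Uses a depth-first search and reports
    a cycle when it reaches a module that is still on the current path.
    """
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = IN_PROGRESS
        path.append(node)
        for dep in deps.get(node, ()):
            dep_state = state.get(dep, UNSEEN)
            if dep_state == IN_PROGRESS:
                # Found a back edge: slice out the loop and close it.
                return path[path.index(dep):] + [dep]
            if dep_state == UNSEEN:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        state[node] = DONE
        return None

    for node in list(deps):
        if state.get(node, UNSEEN) == UNSEEN:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# Usage: two modules that depend on each other (versions ignored).
find_cycle({"ProjectA": ["ProjectB"], "ProjectB": ["ProjectA"]})
```

This works at the module level and ignores version numbers, which is enough to flag cases where two libraries depend on each other across releases.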

So why doesn’t a repository hosting company do something like this? Cost. There are thousands of libraries in a large repository like this, with dozens to hundreds of updates coming in each day. Reviewing each submission and making a decision (not merely rubber-stamping entries) would take a small team of people. That’s overhead a startup can’t afford. It’s also a legal problem – losing the safe harbour provision exposes the hosting company to all sorts of liabilities. So for now, we’re stuck with the self-service model and the subsequent potential problems.

TL;DR version

Authors should have the right to delete their work, and to stop distribution of it through sites they don’t want to use anymore. This definitely includes self-service repository management sites.

If authors have used an open-source license, they can’t complain if someone forks their code.

Users have the right to expect disruptions to be rare – but not to expect them never to occur.

  1. Usual disclaimer: I am not a lawyer. 
  2. This goes for Maven Central also, and probably most other repository management sites.
  3. As an example, Maven solved this using module relocation.
  4. I remember coming across a major XML library that had a dependency on another common XML library, which depended on a previous version of the first; ProjectA.n -> ProjectB.m -> ProjectA.(n-1) -> ProjectB.(m-1) -> … back about 10 versions and nearly as many years of active development. 

Author: Robert Watkins

My name is Robert Watkins. I am a software developer and have been for over 18 years now. I currently work for people, but my opinions here are in no way endorsed by them (which is cool; their opinions aren’t endorsed by me either). My main professional interests are in Java development, using Agile methods, with a historical focus on building web based applications. I’m also a Mac-fan and love my iPhone, which I’m currently learning how to code for. I live and work in Brisbane, Australia, but I grew up in the Northern Territory, and still find Brisbane too cold (after 16 years here). I’m married, with two children and one cat. My politics are socialist in tendency, my religious affiliation is atheist (aka “none of the above”), my attitude is condescending and my moral standing is lying down.

1 thought on “Should repository managers ever ‘unpublish’?”

  1. Personally, I think we should talk about staging a red flu: everyone with an open-source repository marks it as private on some arbitrary day in the near future in solidarity with Azer, say, April 1st, 2016. Let it break in a big way. When the world sees that we’re together on this, the corporations can start to see that they can’t anger us or there will be hell to pay.
