June 25, 2012

Data Commons Licensing?

At the recent Open Internet of Things Assembly, one topic came up repeatedly: how we handle the amount of data being generated, particularly when it's often personal and near-impossible to anonymize.

The issues were less to do with the technical difficulties in handling all that data, and more to do with the privacy and ethical concerns, and the commercialization of the resulting findings.

This article in Nature magazine covers scientific data in general, rather than the Internet of Things, but does a good job of explaining some of the problems and challenges.

Reading through that, and sitting in some of the sessions at OpenIoT, made me wonder whether there's an opportunity here for something individual- or citizen-focused to flourish and start to address some of the problems.

Commercialization of data and science isn't necessarily a problem, but naturally any legal document drawn up by the companies looking to profit will tend to favour and protect the rights of the company as a first priority, as well as allowing the company to maximize its profits.

When the actions of those using the data diverge from what the people "generating" the data feel is acceptable, resolving the disagreement is likely to be slow and adversarial. Creating new laws is (should be?) necessarily slow and thought through.

["generating" is in quotes because it isn't easy even to define where the data comes from. At the Open IoT Assembly we ended up with the term data subjects to refer to those to whom the data refers - which may be distinct from those using the data, or even those who gathered or generated the data (installed the sensors, etc.)]

What if we were more proactive about setting the terms for using and sharing all of this data being generated? That could allow us to have a range of approaches which would reflect the range of attitudes to how this data should and shouldn't be exploited.

Something similar to Creative Commons licensing, but for data rather than creative works, would allow individuals to donate their data to the scientific commons under terms with which they're comfortable. CC licensing allows you to choose whether or not the people using your work can alter it to make derivative works; whether they can use it commercially; and whether they need to credit you as the creator of the work.

What attributes would a Data Commons licence cover?

Anonymity
This would determine whether or not the licensee could use the data to identify individuals
Commercialization
Whether the data can be used commercially, or not
Derived-patents
This is maybe an intermediate step within the commercialization attribute - where licensees are allowed to use the data commercially, but not allowed to use it to generate patents

The last is probably the most contentious attribute, but I'd hope that over time having a (much?) bigger dataset that could be used commercially but not for patents would improve the commercial viability of funding companies that weren't driven by exploiting patents.
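To make the idea a bit more concrete, here's a rough sketch of how those three attributes might be represented in code, in the same way a CC licence boils down to a handful of yes/no choices. Everything here is hypothetical: the class name, the field names and the rule that patents count as a kind of commercial use are my own assumptions for illustration, not part of any existing licence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCommonsLicence:
    """Hypothetical licence made up of the three attributes listed above."""

    allow_identification: bool   # Anonymity: may licensees identify individuals?
    allow_commercial_use: bool   # Commercialization: may the data be used commercially?
    allow_derived_patents: bool  # Derived-patents: may the data be used to generate patents?

    def __post_init__(self):
        # Assumption: patents are a sub-case of commercial use, so a licence
        # that allows patents but forbids commercial use is inconsistent.
        if self.allow_derived_patents and not self.allow_commercial_use:
            raise ValueError("allowing derived patents implies allowing commercial use")

    def permits(self, *, identifies: bool, commercial: bool, patent: bool) -> bool:
        """Return True if an intended use stays within the licence terms."""
        return (
            (self.allow_identification or not identifies)
            and (self.allow_commercial_use or not commercial)
            and (self.allow_derived_patents or not patent)
        )


# Example: anonymous, commercial use allowed, but no patents.
licence = DataCommonsLicence(
    allow_identification=False,
    allow_commercial_use=True,
    allow_derived_patents=False,
)
print(licence.permits(identifies=False, commercial=True, patent=False))  # True
print(licence.permits(identifies=False, commercial=True, patent=True))   # False
```

With three yes/no attributes and that one consistency rule there are only six valid combinations, which is in keeping with the idea of a small family of licences rather than endless variations.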

Maybe there are other issues that should be explored, such as its use in creating weapons, although as noted by Matt Biddulph (if memory serves) in one of the OpenIoT panels: one of the strengths of the CC licences is that they settled on a small family of licences - enough to give some flexibility and choice without creating too much fragmentation.

So, who's going to define the set of Data Commons licences I can start using to share my data?

Posted by Adrian at 01:22 PM