Perspectives on Open Data: Workshop on the Re-use of Government-held Non-personal Data

On Wednesday 18 February 2009 the State Services Commission hosted a discussion on aspects of data re-use. Sponsored by GOVIS, with five expert panellists (Nat Torkington, Fiona Romeo, Toby Segaran, Adrian Holovaty and David Recordon) supplied by Webstock, this was a wide-ranging discussion covering policy questions and technical issues. Staff of the Strategy and Innovation team, GCIO made short presentations, the panel commented, and the audience was invited to ask questions. The session was recorded and is summarised below.

Welcome

Vikram Kumar is Manager of the Strategy and Innovation Team, GCIO, State Services Commission.

Photo by David Recordon

Information and data re-use is one of the more important areas for government going forward. “The degree to which government makes data available freely is the extent of progress in e-government going forward”.

Take the recent example of the Victorian bush fires. Information was available from the private sector, but hard to get from state government. There’s 40% of GDP in the public sector. Closed data means major barriers.

Discussions like this help us exchange views on what’s required. Government works under constraints: we want to open up, but must be realistic and practical. We should identify high-priority steps, not just express good intentions, and move the discussion from a higher view down to practical steps.


Introduction to the Panel

Nat Torkington

Runs KiwiFoo camp, an “un-conference” mixing up artists, business people, programmers, people from the political scene etc. Spent 10 years in the US having run New Zealand’s first web server. Consultant on open source and the web.

Fiona Romeo

Head of Digital Media at National Maritime Museum in Greenwich, UK. Looking at data mining, data visualisation, syndicating content more widely (platforms like Commons on Flickr), crowd-sourcing science information (“citizen science”).

Toby Segaran

Author of books on collective intelligence, data mining and the practical use of semantic web technologies. Works for Metaweb on their Freebase product, taking and integrating datasets from US agencies such as the Securities and Exchange Commission and the Food and Drug Administration.

Adrian Holovaty

Journalist, computer programmer based in Chicago. Works with news organisations, bringing government data online, creating works of journalism around it. Runs everyblock.com: local news at the address level (city municipal level) tapping into “a ton of public records”.

David Recordon

Employed by Six Apart working on technologies, products and policy to enable people to control their data: use existing accounts more widely, take profile information, relationships etc. with them.

1. Listen to Vikram Kumar welcome and introduce the panel.

Keitha Booth manages the work programme in the GCIO, State Services Commission looking at stimulating re-use of government data for economic and social gain. The project spans strategic and technical areas.

Vast repositories of non-personal information are stored across the public sector, e.g. water, scientific, climate, health and education information. What of that may be opened up for use?

Information is created for legitimate purposes and funded by the taxpayer. It is generally managed to high standards. There are opportunities to liberate this data and to develop policies and strategies to support this across government. This is already happening in a limited, uncoordinated way. We intend to improve coordination between suppliers and users, working together to understand how to open up.

Data re-use is a new area. Overseas experience helps to inform us. A 2006/7 Australian study indicated potential gains of $6 to 12 million through the opening up of geo-spatial information. Areas the GCIO are looking at include:

  • Clarity about copyright and crown copyright. Clearer copyright statements on websites.
  • Clarity about licensing arrangements.
  • Economic modelling.
  • Updating 1997 best practice guide, making it more broadly applicable.

2. Listen to Keitha Booth introduce policy issues related to data re-use.

What policy related re-use issues do you face?

Government is not the place where innovation happens.

Nat Torkington

NT: Big fan of open govdata site. What constitutes “open”? The qualities are: complete, primary, timely, accessible, machine-processable, non-proprietary. [see Wikipedia]

FR: We often hold content we are not the copyright holder for. We hold the physical item but sometimes there can be restrictions on making copies even for preservation. The British Government is lobbying for custodians to make copies for preservation. No progress yet on adoption. There are various licensing agencies to engage with. Each needs to feel rewarded. Some material is copyright of museum, some is crown copyright, in other cases the museum has been granted permission for use. Sites like Flickr have “baked in” good methods for sifting material licensed under Creative Commons from other material. Multiple licenses are a usability issue.

TS: Crown copyright is “a little confusing”. In the US, all government data is public domain. However, the Library of Congress is a “semi-government agency”. Figuring out which parts of government are real agencies and which are semi-private with specific licensing is difficult. If something is funded by the government then it should be public domain. It would be great if there were standard ways of expressing how things can be used, but these are all disparate groups. The Securities and Exchange Commission had an exclusive deal with reseller LexisNexis. Carl Malamud forced this data to be made generally available online.

AH: As a data consumer the issues preventing re-use are often political: government officials may not want to release certain data because it makes them look bad; there may be jealousy of non-government sites; privacy issues; accuracy concerns.

DR: Set management expectations. How do you deal with someone making use of data in a new and unexpected manner?

NT: What does it mean to be a government department producing data? You shouldn’t expect to be the originator of every awesome application that uses that data. Government is not the place where innovation happens. Manage expectations: innovation on the “outside” can only be possible due to the work opening up data from the “inside”.

3. Listen to the panel discuss policy issues related to data re-use.

The moment you release data there will be uses of that data that you wouldn’t particularly respect or choose to publish yourself.

Fiona Romeo

Audience: Often there is tension between the party commissioning a report (e.g. an agency or Crown Entity), and the party carrying out the actual research (e.g. a university), over who owns the data.

NT: Sometimes this is solved by a six month abeyance before raw data may be published. However, funding agencies (e.g. Bill and Melinda Gates Foundation) are sometimes unhappy with this as it slows down progress of research.

Audience: Where data is collected using a variety of differing methodologies re-use of that data may result in “wildly strange conclusions”. How is this mitigated?

NT: Here the data is highly dependent on methodology, which is itself complex metadata not expressible in a standard format. Just mandating that the data itself must be open breaks down the biggest barrier. It makes it harder for the research party to reject requests for information about methodology.

FR: Where you release data others may use that data in ways you may not agree with. Be clear where you are the publisher of the raw data, publish your own reports. Even raw data can be misunderstood. You have to accept what happens when data is released. As uncomfortable as that makes us all, that’s the truth of it.

NT: Research is already misused!

TS: No one has ever managed to come up with a metadata format that accurately described a procedure that was machine-readable. You just have to trust that people will read the description of how the experiment was done and use it appropriately.

NT: We’re not setting back science by releasing data.

4. Listen to the panel and audience discuss policy issues related to data re-use.

Are Creative Commons licenses the most obvious candidates for all-of-government adoption?

If you choose a Creative Commons license you’re involving a lawyer: it’s a license.

Nat Torkington

NT: No copyright at all is the easiest, most straightforward way to go. If you choose a Creative Commons license you’re involving a lawyer: it’s a license. Government is both the provider of the first resort, and the provider of the last resort. Popular information will be added to and improved. If someone wants the original, pure data they will return to Government. Creative Commons licenses, like all licenses, attempt to specify what people may do with data.

FR: Creative Commons licenses are increasingly well understood, widely used and expressed clearly for the layman. However, they do place restrictions on what people may do with your data. Government must decide if controlling the end-use is important.

TS: In the United States we will never trust processed data without checking it against the original source. If the government releases data into the public domain it gives up control of how the data is used, but it doesn’t give up its authority.

AH: Government data must be placed into public domain. Otherwise government is admitting that it is not comfortable making the data open. Think about your motivations for making the data open.

DR: Make it clear what people may do with data. Public domain makes it easy. The worst thing you could do is have five different licenses, or, write your own copyright license. Don’t make people think.

NT: The lesson from open source software is that it’s almost impossible to use two licenses together in the same project. Lawyers spend a lot of time attempting to make the licenses play nicely with each other.

5. Listen to Keitha Booth and the panel talk about the use of Creative Commons licenses.

If someone made a chart that proves that Aucklanders are less intelligent would you just trust that without seeing where the data came from?

Toby Segaran

Audience: If your license doesn’t require people to credit the source, how do people know where the authoritative source is?

FR: Even if you don’t require attribution in the license, you can still encourage best practice. You can handle this through guidelines. Services like YouTube and Flickr make it easy to embed their content with a wrapper to say where the data has come from. Make it easy to do the right thing.

TS: Anyone re-using data has to say where their data comes from to ensure the credibility of the results.

NT: Distinguish between the naïve users of a final product (e.g. a chart in a newspaper) and those who want to re-use data (who want guaranteed, known-good data).

Audience: There are six Creative Commons license flavours. Mixing them up might be a “bit of a nightmare”.

Audience: You need a granular licensing structure: data is not black or white. With Privacy Commons a person may grant an organisation limited rights to personal information and perhaps later revoke those rights. [NT observes that here we are talking about non-personal data]. Material is sometimes donated on the condition that it may only be used for research or non-commercial purposes.

NT: Most data doesn’t fall under donor rights. Public domain is the best baseline.

DR: Make it easy for people to embed data and link to its source.

6. Listen to a general discussion about the use of Creative Commons licenses.

Can you point us to economic modelling or other evidence to show the benefit of opening up non-personal government data for re-use?

The plural of anecdote is data!

Nat Torkington

NT: No. However, there is anecdotal evidence. The US Census Bureau released its Tiger database into the public domain. Tiger maps street names to geographical coordinates. NAVTEQ sells its improvements to the Tiger database. The original Tiger database is sufficient for many GIS applications and has itself spawned an open source equivalent to NAVTEQ.

AH: Government ought to be providing services, not visualisations of data. Government should focus on things that only government can do. It is relatively cheap to make the data available. Once data is available, anyone in the world can create an interface.

DR: All San Francisco buses carry tracking devices. The San Francisco government hired a company to show buses’ locations on some bus routes in real time. A third party obtained access to the data and made a Google Maps mashup for the remaining routes. The San Francisco government then shut down access to the data.

FR: The UK Freedom of Information Act has driven data re-use in the UK. Rather than respond individually to requests, agencies have realised it is more efficient to make data sets freely available.

AH: Where agencies in the US have made data sets available the number of Freedom of Information requests has dropped.

TS: In the US, thousands of people are employed processing data from the Census Bureau, the Securities and Exchange Commission, and the Food and Drug Administration.

FR: The UK government published a paper called Creative Economy.

Audience: With a population of just four million are we going to see any significant economic change as a result of opening up government information and data in New Zealand?

NT: We are in early days. There are no whole-of-government case studies looking at cost/benefit yet.

Audience: Do we need to see economic benefit, or is it something we should just be doing anyway?

7. Listen to Keitha Booth, the panel and the audience talk about the economic benefits of data re-use.

Mark Leicester is a Senior Technical Analyst in the GCIO, State Services Commission looking at opening up job vacancies for re-use.

Broadly speaking the GCIO is looking at opening a range of information types for re-use, including calendaring information, news stories and alerts. The New Zealand Defence Force and Government Jobs Online asked GCIO to simplify the process of publishing a job vacancy to the agency’s own site and to Government Jobs Online. To find quality candidates the NZDF must key a job vacancy into its own site, re-key it into Government Jobs Online, and again re-key it into commercial third parties such as Seek or TradeMe. Ideally the NZDF would be able to publish a job vacancy once such that it would be published everywhere.

  • Do you adopt, adapt or create your semantic model? Who should own and maintain the schema?
  • How do you manage translation between schemas? How do you handle translations between controlled value lists?
  • How do you make the right choice of representational format? Do you emphasise human readable or machine readable?
  • What sort of latency is acceptable? How do you make processes fault tolerant? How do you manage deletes; instruct consumers to delete information?
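
The question about translating between controlled value lists can be pictured as a simple lookup table. The sketch below is illustrative only: the category codes, the field names and the `translate_vacancy` helper are all hypothetical and are not taken from Government Jobs Online or any agency schema.

```python
# Hypothetical mapping from one agency's job categories to an aggregator's list.
AGENCY_TO_AGGREGATOR = {
    "ENG": "engineering",
    "ICT": "information-technology",
    "MED": "health",
}


def translate_vacancy(vacancy, mapping=AGENCY_TO_AGGREGATOR, fallback="other"):
    """Return a copy of the vacancy with its category translated.

    Unmapped values are not dropped: they fall back to a catch-all value and
    the original code is kept so that a person can review it later.
    """
    translated = dict(vacancy)
    source_code = vacancy.get("category")
    translated["category"] = mapping.get(source_code, fallback)
    translated["source_category"] = source_code
    return translated


# Made-up records, for illustration only.
example = translate_vacancy({"id": "v-001", "title": "Systems Analyst", "category": "ICT"})
unmapped = translate_vacancy({"id": "v-002", "title": "Chaplain", "category": "REL"})
```

Keeping the source value alongside the translated one is a design choice, not a requirement: it means a gap in the mapping shows up as data to be fixed rather than as a silently lost record.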

8. Listen to Mark Leicester introduce the work that GCIO is doing to make government job vacancies available for re-use.

Photo of Mark Leicester presenting, taken by David Recordon

DR: First you try to adopt a schema, you may need to adapt and failing that you create. It’s always better to find a few other people who are going to make use of your schema. Write code and create applications that will make use of your schema. The end result will be so much better.

TS: Don’t worry too much about on-the-wire formats. Others can build endpoints as necessary. Write guidelines and code to demonstrate a commitment to stability.

AH: You have the incentive to get the data in front of as many eyeballs as possible, but in most cases a consumer will take anything they can get. Consistency matters. A consumer will do the work.

TS: More important are the naming of things, reconciliation of entities.

NT: Locations are straightforward enough, but job titles are more complicated. Don’t worry about achieving a perfect representation. Release coarse-grained and you will see others make refinements.

FR: Start rough. Iterate in response to demand. The Maritime Museum began crudely, but later was able to join Europeana at a week’s notice.

NT: Create version one. Say what it does. Improve it. Offer version two. Retire version one over time. Iterate. People will complain less if you don’t attempt to create the “one true feed”.

9. Listen to the panel talk about practical issues of data re-use.

Flawed but out there is so much better than perfect but unavailable.

Nat Torkington

Audience: In government you get funding, deliver and then start again in five years’ time. Iteration may not be possible.

NT: Don’t build prototypes, they may not be taken seriously. Find a compromise between building a prototype that does very little, and taking five years to get it right.

FR: A government opening up data for re-use will need to have an operational team.

Audience: Data quality is always a problem. Is it better to make sure a dataset is cleaned up before it is released?

Audience: Data can be corrected by consumers. It’s not always clear where feedback should be directed but an open dataset will be subjected to more scrutiny.

FR: MySociety made video content publicly available and encouraged the public to help describe it.

NT: With job vacancies government is publishing information, but in general you have a choice: is it a publication, or a project? Perhaps you may get suggestions and advice from other people. Like open source software projects you need to vet the contributions and figure out who you trust. How engaged does the government want to get?

NT: Think about the differences between a publication, a feed, and a database. With a publication you get a snapshot of a dataset. With a feed you get the updates to the dataset over time, typically just the “new stuff”. With a database you are replicating the current version of the database in as near to realtime as possible.
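
The three modes NT describes might be sketched as follows. This is a toy in-memory model, not a description of any real government system: the `Dataset` class, its method names and the sample records are all assumptions made for illustration.

```python
from datetime import datetime


class Dataset:
    """A toy in-memory dataset keyed by record identifier (hypothetical)."""

    def __init__(self):
        self._records = {}  # record id -> (last modified, data)

    def upsert(self, record_id, data, modified):
        self._records[record_id] = (modified, dict(data))

    def publication(self):
        """Publication: a snapshot of every record at the moment of the call."""
        return {rid: dict(data) for rid, (_, data) in self._records.items()}

    def feed(self, since):
        """Feed: only the records modified after `since`, oldest first."""
        changed = [
            (modified, rid, dict(data))
            for rid, (modified, data) in self._records.items()
            if modified > since
        ]
        changed.sort(key=lambda item: item[0])
        return [(rid, data) for _, rid, data in changed]


def replicate(replica, dataset, last_sync):
    """Database-style use: apply the feed to a local copy to keep it current."""
    for rid, data in dataset.feed(last_sync):
        replica[rid] = data
    return replica


ds = Dataset()
ds.upsert("a", {"title": "Analyst"}, datetime(2009, 2, 1))
ds.upsert("b", {"title": "Engineer"}, datetime(2009, 2, 10))

snapshot = ds.publication()                    # everything, as of now
updates = ds.feed(since=datetime(2009, 2, 5))  # just the "new stuff"
replica = replicate({"a": {"title": "Analyst"}}, ds, datetime(2009, 2, 5))
```

In this sketch the consumer that synced before 5 February only needs record "b" to catch up, whereas a consumer of the publication downloads both records every time.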

AH: In some cases – restaurant inspections for example – everyblock.com requests that government agencies supply only the records that have changed. This requires greater technical sophistication.
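
Supplying only the records that have changed could, in the simplest case, mean comparing the previous snapshot with the current one. The sketch below assumes each record has a stable identifier; the `diff_snapshots` function and the inspection records are made up for illustration and do not reflect how everyblock.com or any agency actually does this.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by record id).

    Returns the records that were added, changed or removed, so that a
    publisher can send only the differences rather than the whole dataset.
    """
    added = {rid: rec for rid, rec in current.items() if rid not in previous}
    changed = {
        rid: rec
        for rid, rec in current.items()
        if rid in previous and previous[rid] != rec
    }
    removed = sorted(rid for rid in previous if rid not in current)
    return {"added": added, "changed": changed, "removed": removed}


# Hypothetical restaurant inspection records.
previous = {
    "r0": {"name": "Cafe Z", "result": "pass"},
    "r1": {"name": "Cafe A", "result": "pass"},
    "r2": {"name": "Cafe B", "result": "pass"},
}
current = {
    "r1": {"name": "Cafe A", "result": "pass"},
    "r2": {"name": "Cafe B", "result": "fail"},
    "r3": {"name": "Cafe C", "result": "pass"},
}

delta = diff_snapshots(previous, current)
```

The "removed" list is one possible way of instructing consumers to delete information: without it, a consumer has no way to tell a deleted record from one that simply has not changed.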

NT: Think about how data might be re-used as the data is collected.

Audience: There’s no single answer for data quality. Recruitment information may not need to be as accurate as geographical information.

DR: In some circumstances it might be appropriate for an agency to offer a “bleeding edge” dataset as well as a “cleaned up” dataset.

10. Listen to the panel and audience talk about practical issues of data re-use.

I’m sceptical always of national ontology style projects because it smacks of ‘we need a Royal Navy too if they’ve got one!’

Nat Torkington

AH: Human readable is always derivable from machine readable. Perhaps there are circumstances where you might offer both a machine readable and human readable feed.
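
AH’s point that the human-readable form can be derived from the machine-readable one can be shown in a few lines. The record layout below is hypothetical (the field names `title`, `agency`, `location` and `closes` are assumptions, not a published schema), and the vacancy is made up.

```python
import json

# A made-up vacancy in a hypothetical JSON layout.
RECORD_JSON = """
{
  "title": "Policy Analyst",
  "agency": "Example Agency",
  "location": "Wellington",
  "closes": "2009-03-31"
}
"""


def render_vacancy(record_json):
    """Derive a plain-text listing from a machine-readable JSON record."""
    record = json.loads(record_json)
    lines = [
        record["title"],
        "Agency: " + record["agency"],
        "Location: " + record["location"],
        "Closes: " + record["closes"],
    ]
    return "\n".join(lines)


listing = render_vacancy(RECORD_JSON)
```

Going the other way, from a formatted listing back to reliable fields, generally requires guesswork, which is why the machine-readable form is the safer one to publish first.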

Audience: Perhaps different audiences might prefer different degrees of interaction, from RESTful APIs to downloading of raw data sets.

AH: Everyblock.com focuses on data with a date and a geographical location.

NT: A hybrid compromises both the human- and machine-readable forms.

AH: PDF does not equal machine-readable!

TS: JSON is ideal.

AH: XML where the schema is well defined is ideal.

Audience: Common Alerting Protocol is an Atom based format with extensions. It is human readable where Atom extensions contain machine readable data.

DR: It depends on the definition of “human readable”.

NT: That it doesn’t double as a consumer format.

NT: No one mentioned RDF. Why?

AH: We don’t get any.

TS: RDF gives you standardised namespaces like Dublin Core, but there aren’t enough of those yet. I don’t like mixing machine- and human-readable content as in RDFa. Just separate the files.

DR: With some sites it may be easier to add machine-readable markup to existing human-readable pages. If you’re starting from the beginning it’s better to create content optimised for machines.

Audience: Finland has a project for a national ontology. How can we use ontologies to make data machine readable?

NT: Look at what is already out there. Build your own only if you really have to.

TS: The best ontologies are built from existing data.

DR: It’s less important what you call something, and more that the format is consistent.

AH: Governments should also publish the details of controlled value lists to explain data.

Audience: We have nationally significant databases, metadata, solid methodologies etc., but we also have “grey data”: data that may be incomplete and lacking documentation.

11. Listen to the panel and audience talk more about practical issues of data re-use.

The 3am factor. Make it as easy as possible for someone in his or her pyjamas, sitting at [his or her] computer, working at 3am, wanting to work on something, to get started immediately.

Adrian Holovaty

NT: Release a piece of code that uses the data. Publish the e-mail address of someone that a consumer might talk with, especially if documentation is absent.

DR: Create a mailing list. Consumers can help each other.

Audience: How far should government go in encouraging re-use?

TS: The World Bank opened up an API and didn’t know how to get people to use it. Working code will get people started much more quickly than any amount of documentation or a mailing list. People learn from examples.

AH: Remove all barriers, such as human-approved signups to APIs, that prevent developers from engaging with your data quickly and easily.

Audience: How much do contests help drive the opening up of government data?

AH: Contests are good.

FR: Show Us a Better Way attempted to discover which datasets people wanted. The BBC introduced an annual “hack weekend”. Events have also worked well for the museum and may be a good approach for government.

NT: Holding a contest will not guarantee that people will use your APIs or data.

Audience: All of the examples have come from the US or UK. Are things different in a nation of four million people?

Audience: Is consolidation important?

NT: I’d rather have open APIs than worry about standardisation.

TS: It’s easy to underestimate what people are willing to do for free.

Audience: To what extent do the various mashup platforms, toolsets etc. enable programming without being a programmer? Might we see rich, easy-to-use environments where programming knowledge is not so important?

TS: Tools such as IBM’s Many Eyes enable mixing and matching and visualisation of datasets.

12. Listen to the panel and audience talk more about practical issues of data re-use.

Vikram Kumar is manager of the Strategy and Innovation Team, State Services Commission.

Government data is taxpayer funded. People expect data to be made available free of cost. Agencies aren’t funded to make data available. What is your view?

NT: I think it’s reasonable to charge fees for the copying of information (but not the preparation). Open source software projects may be created using existing infrastructure at little or no cost. Ideally this will be possible with open data too such that it is more efficient to release data.

FR: I work in an institution with a large collection. The business model relies on exposure. For museums, libraries and archives the biggest risk is obscurity. Charging might work for small, local uses, but perhaps not for large scale services. Whatever model you choose shouldn’t limit usage.

NT: Charge for a snapshot of a dataset, but not for ongoing access. That would punish popular applications.

TS: Don’t get paralysed by the cost. Instead, do the cheapest [NT: useful] thing you can do right now.

AH: Everyblock.com pays for data where it’s reasonable. The American Freedom of Information Act permits charging for data. Make it easy to pay for it.

DR: If you are going to release data across government, try to share infrastructure and code across government.

13. Listen to Vikram Kumar and the panel discuss how much data should cost.

Matthew Ross is a Senior Technical Analyst in the GCIO, State Services Commission looking at government information that is not necessarily found in a formal data set.

Agencies autonomously and independently publish information (such as job vacancies) that ‘ought’ to exist in a single cohesive dataset. What can government do to improve users’ all-of-government experience and make disaggregated information more accessible?

Portals such as Government Jobs Online attempt to create aggregated datasets from disparate sources. Should government aggregate disparate data sources? Should government create mash-ups? Or, should government publish to the cloud and leave mash-ups to third parties?

14. Listen to Matthew Ross introduce questions about all-of-government data re-use.

I’m nervous about pan-government portals.

Nat Torkington

FR: Government should deliver transactional sites, e.g. to allow users to claim benefits or pay tax. Government should release data, and not spend time implementing visualisations.

NT: It’s very difficult to create a single point of entry for all government sites. Instead spend time on Search Engine Optimisation.

NT: Matapihi, the precursor to Digital NZ, attempted to index metadata from all New Zealand images. It’s hard to retrofit order on to chaos. However, it’s harder to mandate uniformity.

Audience: There have been efforts to create a “whole-of-border” portal for Customs, Bio-security, Food Safety Authority.

NT: A lack of common naming conventions can be a barrier to “post hoc” mashups. Agencies must have previously agreed on common identifiers. However, avoid creating a “New Zealand” ontology.

DR: The Smithsonian consists of over forty different museums. There has been tension between the central brand, and innovation from the individual museums. Our advice was to stop worrying about the branding problem, but also to create portals across museums.

15. Listen to the panel discuss questions about all-of-government data re-use.

NT: Government could try to control the process at the schema level. Define the schema and require others to justify departures.

FR: Alternatively, agree on a minimum standard, e.g. 2-3 common fields.

Audience: Please use common identifiers.

Audience: There may be economic value for aggregation, but whose job is it to aggregate?

NT: Is aggregation useful? Yes, it sounds like there should be a single feed for job vacancies. It seems like there’s value in aggregation. Is the challenge that no one is funding it?

FR: There were about 20 quantitative measures of value from the UK’s Department of Culture, Media and Sport. However, measurement is moving to a more qualitative model. People want comprehensive views of a dataset, but it depends on the use case.

NT: Please open up consultations.

Audience: Agencies determine their own way of reaching their audience.

FR: MySociety runs a planning application service where you can enter your postcode and see every planning application that affects your area.

16. Listen to the panel and audience discuss questions about all-of-government data re-use.

Once you go down the path of distrusting people with information it doesn’t end prettily.

Nat Torkington

NT: Users want to know that a dataset exists. Users want to understand the schema.

Audience: You need common identifiers such as geocode or postcode.
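
To see why common identifiers matter, consider two datasets published independently that happen to share a postcode field. The sketch below joins them on that field; the `join_on` helper, the field names and the sample rows are all hypothetical.

```python
def join_on(key, left_rows, right_rows):
    """Join two lists of dicts on a shared identifier field.

    Rows on the left with no match on the right are dropped (an inner join).
    """
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            merged = dict(match)
            merged.update(row)
            joined.append(merged)
    return joined


# Made-up rows from two unrelated publishers.
schools = [
    {"postcode": "6011", "school": "School A"},
    {"postcode": "6021", "school": "School B"},
]
bus_stops = [
    {"postcode": "6011", "stop": "Stop 12"},
    {"postcode": "6011", "stop": "Stop 14"},
]

near_schools = join_on("postcode", schools, bus_stops)
```

If one publisher had used postcodes and the other suburb names, this join would first need a reconciliation step between the two, which is the extra work that agreeing on identifiers in advance avoids.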

Audience: It’s enough to know that every agency is exposing their data in some form.

Audience: I want to know which datasets are available.

NT: Government should decide best practices and minimum standards and then advertise the existence of the datasets in some way.

Audience: There can be unintended consequences both positive and negative. For example, the third-party publication of league tables is discouraged, so should government not publish school performance data?

TS: If government doesn’t publish school performance data people will make league tables anyway, but the data will be unverifiable.

FR: The solution is to work with under-performing schools.

Audience: There are many examples of poor use and bad interpretation of statistics whether datasets are released or not.

NT: The important thing is to describe how the data was collected so that government can show data doesn’t support a faulty conclusion.

17. Listen to the panel and audience discuss further questions about all-of-government data re-use.

Summary

Derek Rayner is a Senior Technical Analyst in the GCIO, State Services Commission.

What should government do to open up a particular data type (of your choice) for re-use?

  • How would you find it?
  • What kind of formats and standards should be used?
  • Funding and pricing?
  • Licensing?
  • Goodness?

AH: I’d like to see street layers of the entire country opened up: street coordinates, boundaries of parks, hydrology etc. Encourage people to contribute back to the dataset. Integrate with the OpenStreetMap project. There should be a two-way conversation. Citizens have a lot of valuable things to add to the collective.

TS: I’d like to see financial information made open, where the banking system sits, where companies have too much debt on the books.

FR: Open up transport data for re-use: timetabling, realtime information.

DR: Open up traffic data.

NT: Open up consultations. Open up property information. In the US sites like Zillow and Trulia allow you to find out past sales, current value, rates assessments etc. In New Zealand users must pay QV for this information.

Audience: Open up NIWA data such as average temperatures, rainfall etc.

Audience: Local government has a cost recovery policy. Central government information is largely free.

NT: Central government makes its money by making the economy bigger. Local government makes its money by raising rates.

18. Listen to the panel talk about how government might open up their favourite data types for re-use.

Thanks to the panel, the audience, GOVIS and Webstock.

Download each part individually:

  1. Vikram Kumar welcomes and introduces the panel. (mpg) (ogg)
  2. Keitha Booth introduces policy issues related to data re-use. (mpg) (ogg)
  3. The panel discusses policy issues related to data re-use. (mpg) (ogg)
  4. The panel and audience discuss policy issues related to data re-use. (mpg) (ogg)
  5. Keitha Booth and the panel talk about the use of Creative Commons licenses. (mpg) (ogg)
  6. A general discussion about the use of Creative Commons licenses. (mpg) (ogg)
  7. Keitha Booth, the panel and the audience talk about the economic benefits of data re-use. (mpg) (ogg)
  8. Mark Leicester introduces the work that GCIO is doing to make government job vacancies available for re-use. (mpg) (ogg)
  9. The panel talk about practical issues of data re-use. (mpg) (ogg)
  10. The panel and audience talk about practical issues of data re-use. (mpg) (ogg)
  11. The panel and audience talk more about practical issues of data re-use. (mpg) (ogg)
  12. The panel and audience talk more about practical issues of data re-use. (mpg) (ogg)
  13. Vikram Kumar and the panel discuss how much data should cost. (mpg) (ogg)
  14. Matthew Ross introduces questions about all-of-government data re-use. (mpg) (ogg)
  15. The panel discusses questions about all-of-government data re-use. (mpg) (ogg)
  16. The panel and audience discuss questions about all-of-government data re-use. (mpg) (ogg)
  17. The panel and audience discuss further questions about all-of-government data re-use. (mpg) (ogg)
  18. The panel talks about how government might open up their favourite data types for re-use. (mpg) (ogg)

9 Comments

  1. Wow, this is a really wonderful write-up. Thanks so much for putting it together and making the audio available! NZ’s looking very progressive indeed.

    Posted March 9, 2009 at 5:03 pm | Permalink
  2. Unfortunately I couldn’t attend the workshop so it is good to see this material up and available.

    I reiterate NT’s last quote: Central Government (CG) benefits directly from economic activity via taxes on income and expenditure, whereas local government (Regional Councils, Cities and Districts) (LG) income comes mainly from property taxes. (A consequence is that those with property wealth rather than discretionary wealth, i.e. retired or unemployed people with property (who also tend to put relatively more of their time into the community), are relatively more sensitive to LG expenditure.)

    These are the people who will likely experience the most pain when Councils give up their data to the taxpayer for free. These are also the people least likely to be able to benefit directly from the greater opportunity offered by the free access to that data. They also tend to be the most vocal rate-payers.

    Having said the above - I am strongly in favour of having non-personal Council data available. To me there is much potential in having Councils not only supply visualisations or interpretations of the data but in also supplying access to copies of the underlying databases. I know of at least two Councils well on the way to doing this. Napier City Council already supplies planning data via WMS, WFS and KML. There is also a Regional Council that is well on the way to supplying similar data services. I know there has also been substantial work around opening up access to data associated with the consultation process both for plans and consents. (A challenge here is the data about the submitter.)

    I would really like to see Central Government departments providing similar online access to their data, both to the public, other government departments and to Councils.

    On free access to data - when you pay for a service you have a right to dictate / influence the attributes of the service. When you pay for data you expect a say on the attributes of the data. Targeted money is also an effective mechanism for helping providers, including Councils, decide on priorities. If there is a need for an extra data type or a higher quality from an established Council process, then money is a more effective mechanism for achieving this than, say, legislation.

    Another area that has occupied me is the concern that people have about access to data about their property or their community. I am wondering whether the concern is not so much about seeing the data but more about who is seeing the data and what they will do with this information. Is the data being gathered by a tourist, a trans-national, a terrorist or a tyrant? In the past we could see the car regularly parked in front of the house, the plane flying and circling overhead or the person frequently searching the public files in the Council foyer. These days we can’t.

    Why can’t we provide people access to the second level of data? Let them know who is taking an above normal interest in data about their property or community. We can get this information about our own websites - in fact it makes good business sense to do so. But as a citizen, member of the community or property owner I am unable to see who is taking an online interest in my land.

    Jim McLeod
    Regional Data Advocate
    Environment Waikato

    Posted March 11, 2009 at 12:45 pm | Permalink
  3. Every time you make the decision to charge for data, you remove all of the following potential developers from the people who can make that data valuable:

    1. Anyone who cannot pay you (no credit card, no bank account in your country)
    2. Anyone who cannot charge their users (no ability to get a merchant account, no ability to create the necessary security restrictions to maintain subscriptions etc)
    3. Anyone who wishes to provide a service that necessarily cannot be charged for (RSS conversions, Google Maps embedded services, almost any mashup that cross-purposes existing tools with non-commercial licenses)
    4. Anyone who wishes to make use of open source software in the production of a client-side interface in a fashion that compels them to release the source
    5. Anyone who wishes to simply create a “hack”, the usefulness of which they are unable to predict, and therefore are willing to risk time, but not money, on.

    The answer is simple. Do not charge for your data. If you are a government department concerned with how to make a business case for the money you’re spending making this data open, the rationale is simple: in return for putting the data out there, you will get better tools to use it in short order - tools you could not possibly have afforded to develop yourself. Presumably you want this, because that’s why you were gathering the data in the first place; a huge file full of numbers is not inherently useful for decision making in and of itself.

    As a side bonus, you will get a better informed populace and the kind of correlations with information from other sources that would have been impossible any other way. Charging for data access drastically reduces the number of people who can effectively build tools for it. Don’t make that mistake: free and public domain is the only way. (I am not ethically attached to this position; I firmly believe in charging money for things where that makes sense. Government data is not one of them.)

    Richard Clark
    Posted March 13, 2009 at 3:49 pm | Permalink
  4. Thanks Jim. Your comments offer a useful additional perspective on Nat’s rather pithy summing up of the difference in the funding models between national and local government. I’m intrigued by your idea of opening up ‘second level’ data (such as analytics data measuring people’s usage of data). I imagine that this ought to be possible so long as the same privacy issues inherent in releasing ‘first level’ data were addressed.

    Posted March 13, 2009 at 10:27 pm | Permalink
  5. Is there a forum for discussion around the pros and cons of free data?

    To pick up some of Richard’s suggestions and other connected ideas on “free” data. To get “free” data:
    - Would you be prepared to register with the provider and use a unique ID every time you or your app accessed the data?
    - Would you accept that I might change the format, availability, accuracy or quality on a whim?
    - Would you accept that I would provide no support for the data, no access to knowledge about the data, no documentation about the data, and no help if for some reason the data feed ceased?
    - Would you accept that I wouldn’t in any way guarantee the data or accept any liability for the data?
    - Would you accept that you couldn’t mention my name or my organisation as the source of the data?

    If you are prepared to accept the above then the data might be “free”.

    Yes, you are right: organisations do gather data for a reason and, funnily enough, not just for the purpose of collecting data. At least once a year we review why a particular data point is being collected. If there is a better way to satisfy the business requirement then data feeds change. Associated with this, databases and processes are continually being modified, reindexed and optimised to better meet business objectives.

    Yes, ultimately we do want informed decisions, transparency, thriving communities, engaged citizens and effort not being wasted. And yes we do need to understand and apply innovation to the value streams derived from NZ. However, I am not sure “free” data is the answer.

    Jim

    Posted March 16, 2009 at 6:44 am | Permalink
  6. Well, one answer to the rhetorical questions I ask above is to establish a “sand pit” where registered people can access data from many sources in an experimental environment and we can collectively explore how to resolve the above dilemmas. As we gain confidence with the solutions and with each other we can tentatively look at releasing the products of our creativity and innovation into the “real” world.

    Would the potential developers that Richard describes be interested in registering to use the “sand pit” for free? It may be that we work through the Universities to do this.

    I am presently discussing with industry how to set up such a “sand pit”.

    Jim

    Posted March 17, 2009 at 4:56 pm | Permalink
  7. Another day, another comment.

    Let me stress that a prime reason for paying for data is that in doing so you efficiently communicate to the data source your level of interest in the data. The source is then made aware of your demand or requirements. Without this mechanism it is very hard for the source to identify what they should be offering. It is very hard to determine whether the message in the data is being received and acted upon. This is even more difficult if the data is flowing along a supply chain and passing through multiple organisations and processes.

    I am interested in hearing suggestions on ways to get data consumer requirements transmitted effectively and efficiently to the data source, especially back up a data supply chain.

    Thanks
    Jim

    Posted March 18, 2009 at 11:42 am | Permalink
  8. In response to Jim’s comment:
    “These are the people who will likely experience the most pain when Councils give up their data to the taxpayer for free. These are also the people least likely to be able to benefit directly from the greater opportunity offered by the free access to that data” -
    I believe that these people Jim is referring to stand to potentially benefit hugely from councils freeing up their data.

    For example, the services provided by http://www.everyblock.com and by http://www.neighbo.com provide significant benefits to neighbourhoods and local communities. These services would not be possible without free access to council data.

    Posted March 19, 2009 at 9:31 am | Permalink
  9. I think Janet is moving beyond the concept of “free” data.

    The sites referenced rely not just on “free” data but also on agreed, consistent, supported and continuing feeds of data from Councils to these portals. Democracy http://dowire.org/ is another variant that leverages Online Groups based in Christchurch and has many CG and LG clients.

    Councils have looked, and continue to look, at similar business models. One explored about four years ago had LG data being used as a “honey-pot” to attract customers who were led through a “mall” of related services. These services purchased advertising slots which in turn paid for the “mall”. The business case was strong, but the concept required multi-Council collaboration.

    In another spin on the model QV and “What’s On” currently get and on-sell LG data. It is rather ironic that since 1998 QV has been an effective monopoly, an SOE owned by CG!

    The message here is that these “portals” are possible but because they require significant Council expense beyond just the provision of “free” data, they also require Councils to go through a business case involving a cost/benefit analysis. And they work best when there is a clearly identified group in the local community wanting the data, and able to clearly identify their needs and make these known to the Council via the Annual Plan process.

    Jim

    Posted March 23, 2009 at 5:10 pm | Permalink

