As many readers of this blog will know, SSC has been pursuing various work-streams under the rubric of the Open Government Information and Data Re-use work programme. It is continuing to do so in collaboration with, among others, the ICT Group at the Department of Internal Affairs (DIA).
One of the many issues we’ve encountered along the way is the perception that wider government has not released many useful datasets, databases and other information resources. While we appreciate that much work still needs to be done (and is being done) in this space, we thought it might be helpful to provide the public with a list of just some of the datasets, databases and other information resources that are already available online, usually on the websites of their source agencies.
We’ve collated a list of links to a number of datasets, databases and other information resources from a subset of departments (and other agencies) and are releasing that list by way of an Excel spreadsheet, a .csv file, and an Atom feed (which, as many will know, is one of a number of different types of structured, machine-readable feeds and the type that SSC recommends for use by government).
This list has been collated from details provided by government agencies working with the SSC and DIA on the Open Government Information and Data Re-use work programme. It is not a complete list and does not include data from agencies such as the Ministry for the Environment, NIWA and Landcare Research, which are leaders in opening up their data. We see the list as illustrating the vast continuum of government data that is already available online.

We are publishing this spreadsheet and feed as a small first step towards opening up non-personal New Zealand government data in new ways. We also see this approach as a proof of concept in SSC and DIA’s work in developing Government’s approach to opening up non-personal New Zealand government data.
As we recognise that this information has been published in a range of presentation formats and with differing use rights, the spreadsheet and feed include an agency contact who will assist with any queries from users.
They also list the dataset name, agency, online address, licence arrangements if known, and usage details if known. Please note that each entry links to its agency’s website, which will have a copyright ownership statement. It will be necessary to contact the agency to agree licensing arrangements if there is no clear statement on the website.
This work is a precursor to a formal release later this year of a New Zealand Government Open Access and Licensing Framework (NZGOALF) which will provide guidance for agencies and the public on the use of the Creative Commons suite of New Zealand Law licences across the New Zealand State Services. We will be at the NZ Open Govt Data Barcamp/Hackfest to discuss this work.
We are also working to incorporate the Open Government Data principles into our work, to the extent that they are appropriate for the New Zealand environment, but within the context of New Zealand’s copyright law.
We see the release of government-held data as a necessary but not sufficient step for both the community and government to reap the benefits of re-using it to create social and economic growth for New Zealanders and the economy. Therefore our work will also continue to look at making it easier to find and reuse government-held data and information.
We expect that there will be strong interest in this release, particularly from the laudable community which has set up the Open Data Catalogue. We encourage you to use this feed and also want your comment on our approach and on this data. Is this a helpful way to release/expose open government data? Please tell us if there is a better way.

13 Comments
Good news, thanks Keitha! One question I have is what work is being done within Government to co-ordinate these efforts with those of other agencies, such as the NZ Geospatial Office, which is currently undertaking its own review of Govt data (in this case focusing on geospatial) to provide a similar directory of geospatial data? How can we get multiple Govt agencies all working to capture this information in one place, and not duplicate some of the gathering and collection?
Cheers Gav
Great work, Keitha! It’s heartening to know that the government is moving in this direction, and thanks for the pointer to our work.
As you know, we’ve been building the Open Data Catalogue for a month or two now. There are two instructive differences between our project and your proof-of-concept:
First, we crowdsource our data collection and invite comments. The trouble with asking government departments for a list of their data is that results are highly variable, as you’ve discovered. The responses to your request included: software (e.g., the metadata extractor from National Library), blogs (e.g., NZ Poet Laureate blog), umbrella websites (e.g., Statisphere), disclaimers in the licensing terms (e.g., School enrolment zones), and a lot of “Unknown”s in fields. I don’t consider software, blogs, or umbrella sites to be “data sets”. More importantly, though, you’re at the mercy of the speed and accuracy of one person for each dataset. Everyone who uses the catalogue can improve it.
Second, we publish as a directory-style website and pay attention to SEO. We draw a distinction between bulk-download and browsing/searching, and built the site initially for browsing and searching. We want people to be able to find the data source they’re looking for without needing a spreadsheet or feed reader. We have high Google-juice, with our pages quite often ranking higher than the corresponding government dataset sites, and we welcome the situation where “Google is our home page”.
I hope that, as SSC and DIA continue work on cataloguing government data, you’ll take note of these two important distinctions. Glen and I would be happy to work with you, and the Open New Zealand group has begun to document the things that we feel any eventual data.govt.nz effort should have.
Gavin,
Government does a poor job of maintaining portals and, more generally, after an initial period of enthusiasm, portals stagnate. Even at best, the information on portals is decoupled from the authoritative source - for example, the dataset may be updated but that is not notified to the portal (not to mention multiple portals).
I strongly believe that the best way to get agencies to work together is to get them to publish descriptions of their data in a machine-readable format on their own websites. That reflects the distributed system that government is.
The Atom proposal is work towards achieving that. http://research.elabs.govt.nz/new-zealand-government-feed-standard-2009/
This decouples the notification and presentation of information. Government can then focus on what only government can do (the publication & notification) while enabling other parties (which may include government) to innovate around presentation (gathering, collection, catalogues, directories).
The feed we have published is a starting point. We expect agencies to take on responsibility for exposing their own datasets.
Cheers
Matthew
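The decoupling described in the comment above (agencies publish a feed, other parties build the presentation) can be sketched from the consumer's side. This is a minimal illustration in Python, not the SSC feed standard itself: the sample feed, the `agency.example` URL and the `read_entries` helper are all hypothetical, and only the core Atom elements (`title`, `link`, `updated`) are read.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample feed; it does not reproduce the real SSC feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example agency datasets</title>
  <entry>
    <title>Example dataset</title>
    <link href="http://agency.example/data/example.csv"/>
    <updated>2009-08-01T00:00:00Z</updated>
  </entry>
</feed>"""


def read_entries(feed_xml):
    """Return one dict per Atom entry with its title, link and update time."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        link = entry.find("atom:link", ATOM_NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "url": link.get("href") if link is not None else None,
            "updated": entry.findtext("atom:updated", default=None, namespaces=ATOM_NS),
        })
    return entries


if __name__ == "__main__":
    for item in read_entries(SAMPLE_FEED):
        print(item["updated"], item["title"], item["url"])
```

A catalogue or directory could run something like this against each agency's feed on a schedule, which is one way the "updated" notification problem mentioned above might be handled.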
Nat,
As my comment above explains, our feed initiative is exploring a level deeper than the Open Data Catalogue. It is about how government can facilitate and enable such projects by providing source information.
Glen has previously expressed interest in consuming such a feed for the Open Data Catalogue and I’d be pleased to see that happen. See http://research.elabs.govt.nz/new-zealand-government-feed-standard-2009/#comment-271
The fact that a feed may be human consumable is secondary.
Over time, agencies will need to notify their datasets themselves.
Crowdsourcing is certainly a powerful way to enrich metadata and it’s great to see that happening. But a robust and timely solution for discovery/notification of datasets will be better delivered where the publisher provides machine-processable notification.
As a first cut, we made the decision to include various resources. While some are more questionable than others, an end-point such as Open Data Catalogue could choose which to consume. Some certainly highlight the continuity of the spectrum: for example, http://www1.maf.govt.nz/uor/searchframe.htm looks like a limited web interface database but is only a couple of clicks (or a hacked POST request) away from being a downloadable data file.
I applaud the other attributes of ODC that you refer to.
The situation where “Google is our home page” may be closer to hand than most realise. For example Google Squared illustrates the dramatic improvements in presentation of structured resources that may be just around the corner. See http://www.google.com/squared/search?q=new+zealand+government+dataset
Cheers
Matthew
Nat
We collated this list to test our suspicion that there are more non-personal government datasets and databases already online than was generally known. This work has allowed us to publicise those details more widely, get a feel for what is being defined as ‘data’ and also, as Matt has already described, to decouple the notification and presentation of this information.
We recognise that there are a lot of ‘Unknowns’ in the fields. Listing a contact name was, therefore, very necessary at this early stage, though not ideal, as you note.
We expect our release later this year of the NZ Government Open Access and Licensing Framework and Guidelines will pave the way for owners of public sector copyright works to license these works on liberal licensing terms using Creative Commons NZ law licences. We will also encourage them to publish descriptions of their data in a machine-readable format on their own websites.
So this is a small first step to help us understand the issues better and to have these conversations about the best way to proceed.
Cheers
Keitha
I’ll echo Nat’s comments above and just add one other key point. The idea of what is a dataset is important. Many of the datasets listed can only be accessed via a website, and on the surface it seems that there is no access to the underlying data in any meaningful format. It will be interesting to go through each of these entries and then work out how we can open up access to the base data in a usable format.
So I think this is a great start and we can definitely work with this and try and dig deeper to get the actual datasets underlying the websites opened up and documented.
Glen
Thanks for the clarification, @Keitha and @Matthew. A word of caution about the wiki URL that I posted: the Internet is currently being plagued by spam for a game called Evony, and Wikia is playing whackamole with Evony ads. The ads feature scantily-clad women, so until they eradicate the plague completely, it might not be a good idea to check out the wiki page where we’re developing the requirements! AdBlocker had hidden the ads from me, so I didn’t realise there was a problem. I’m installing a wiki on our server, moving the content across, and will let you know when there’s a safe version to see. My apologies to anyone who was unpleasantly surprised by what they found!
Nat
As someone involved on both the legal and practical sides of all this, I’d like to say that we very much appreciate the comments coming in from Gavin, Nat, Glen and others. Please keep them coming. The wiki which Nat and Glen have set up to collate suggested requirements for a data.govt.nz (or whatever) site, for example, is very helpful. It’s great to see this proactive initiative in providing us with some of the information we’d inevitably ask for in due course.
I agree with Matt’s comments about the importance of encouraging agencies to publish descriptions of their data in a machine-readable format on their own websites. To me the trick in all this will be to ensure that point-of-delivery notifications contain sufficient metadata to enable subsequent presentation layers/sites (such as the Open Data Catalogue and/or a government equivalent) to collate (ideally automatically) the incoming data sources (whether they be feeds or something else) in a consistent and structured manner. I don’t profess to be an expert but I am aware that the standard RSS/Atom fields do not (without modification or addition of modules) allow for the full range of metadata fields which, for example, the Open Data Catalogue collects. It seems to me that you guys are relying on WordPress custom fields for the collection of some of the additional data. Is that right?
Perhaps one option would be to rely heavily on category and tagging attributes within the source feeds and to group some of the more descriptive elements together in the body of the feed item. That could easily be done, of course, whether manually or through a CMS.
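The category-based option suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `build_entry` and `to_xml` helpers, the `agency.example` URLs and the scheme names for licence and format are all hypothetical, not part of any government feed standard. The one thing taken from Atom itself is that its `category` element carries a `term` and an optional `scheme` attribute.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Serialise Atom elements without an "ns0:" prefix.
ET.register_namespace("", ATOM)


def build_entry(title, url, updated, categories, summary):
    """Build an Atom entry; categories is a list of (scheme, term) pairs."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    # Assumption: the dataset URL doubles as the entry's unique id.
    ET.SubElement(entry, f"{{{ATOM}}}id").text = url
    ET.SubElement(entry, f"{{{ATOM}}}link", href=url)
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    for scheme, term in categories:
        ET.SubElement(entry, f"{{{ATOM}}}category", scheme=scheme, term=term)
    # Descriptive details grouped together in the body of the item.
    ET.SubElement(entry, f"{{{ATOM}}}summary").text = summary
    return entry


def to_xml(entry):
    """Serialise an entry element to a text string."""
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    example = build_entry(
        "Example dataset",
        "http://agency.example/data/example.csv",
        "2009-08-01T00:00:00Z",
        [
            ("http://agency.example/schemes/licence", "CC-BY"),  # hypothetical scheme
            ("http://agency.example/schemes/format", "csv"),     # hypothetical scheme
        ],
        "Agency contact and usage details would go here.",
    )
    print(to_xml(example))
```

A collating site could then filter or group entries by scheme and term without needing custom feed modules, at the cost of agreeing on the scheme names in advance.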
For those who are not familiar with it, the draft paper by Sir Tim Berners-Lee outlines an architecture for publication of government data. Some say it is over-engineered, but we want the future to be built of brick, not straw.
http://www.w3.org/DesignIssues/GovData.html
Hi Laurence. Hope all is well with you. Thanks for alerting us to Sir Tim Berners-Lee’s paper. I’d heard about this but had not read it. Obviously there’s some great insight in it.
The passage that jumps out at me is this:
“A top-level mandate is extremely valuable, but grass-roots action is essential. Put the data up where it is: join it together later.”
I was also interested to see what he says on the use of RDF, given that RDF does, I believe, lie at the heart of the metadata that Creative Commons produces when one goes through the licence selection process.
Best regards
Richard
I find it interesting that there appear to be over 20 broken links in the RSS feed - all URLs seem to have an extra trailing slash - but some links are just plain broken.
In the CSV there appear to be bad characters - maybe as a result of macrons - and there are initial lines that don’t appear to be in CSV format, e.g. “NZ Public sector datasetsor databases made publicly available online,,,,,,,,,,,,,,,,,,,”
I would have expected the first row to be the column headings, and I am not even a machine.
It would also be nice to separate the actual datasets from the websites that provide access to data…
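For anyone consuming the CSV in the meantime, the issues raised above (title lines before the column headings, trailing slashes on URLs) can be worked around on the reading side. The following Python sketch rests on assumptions: the `load_rows` helper and the `URL` column name are hypothetical, the "fewer than two non-empty cells" test is only a heuristic for spotting title and blank rows, and stripping the trailing slash assumes the slash really is spurious.

```python
import csv
import io


def load_rows(csv_text, url_column="URL"):
    """Parse CSV text, skipping title/blank rows that precede the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = None
    records = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        # Heuristic: a title line or blank row has fewer than two filled cells.
        # Note this also skips any data row with only one filled cell.
        if sum(1 for cell in cells if cell) < 2:
            continue
        if header is None:
            header = cells  # first "real" row is treated as the column headings
            continue
        # Drop columns with an empty heading (e.g. from trailing commas).
        record = {name: cell for name, cell in zip(header, cells) if name}
        if record.get(url_column):
            # Assumption: the trailing slash is an export artefact.
            record[url_column] = record[url_column].rstrip("/")
        records.append(record)
    return records


if __name__ == "__main__":
    sample = (
        "Example title line,,,\n"
        ",,,\n"
        "Dataset,Agency,URL,\n"
        "Example,Example Agency,http://agency.example/data.csv/,\n"
    )
    print(load_rows(sample))
```

When reading the real file from disk, opening it with an explicit encoding may also help with the macron characters, though which encoding the export used is not stated here.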
It’s good to see an attempt to make datasets more widely available, but some of the stuff in the Excel spreadsheet compiled by SSC doesn’t really look to be of much use. Eg: The CabGuide is listed - what ‘data’ is in that?
Also as Matthew points out - government is very bad at sustaining initiatives. Case in point - the Population and Sustainable Development website which is not even listed on the spreadsheet!
The description on that site says:
“The Population and Sustainable Development website provides a single point of access to a comprehensive collection of New Zealand population statistics provided by a wide range of government departments and agencies. It provides authoritative information about New Zealand’s current and emerging population issues. There are also tools and resources for using and understanding data and related population issues in policy and planning. The website aims to make it easier for analysts and managers to find and understand demographic data and metadata, and to encourage its careful use. This website will help you:
* find what demographic information is available
* understand how to access it
* become familiar with some of the key frameworks used to analyse population issues
* understand some key population terms
* get expert advice on how to use population information
* find sources of further information and expertise.”
If government is going to make datasets available more widely, then wouldn’t the population website be the place to do that instead of creating YET ANOTHER site for people to try and find and keep up with?
Whatever new initiatives are taken need to be integrated into existing portals and sites instead of creating separate spaces.
Thanks for your comment, Grant.
Your points are valid. That list was collated from details provided by a small group of government agencies working with the SSC and Department of Internal Affairs (DIA) this year on the Open Government Information and Data Re-use work programme. It is not a complete list.
The post was promoting new ways of notifying non-personal information at source. By way of example, the information was notified via an Excel spreadsheet, a .csv file, and an Atom feed. It was also illustrating the vast continuum of government information and data that is already available online – we were very aware that the list included information resources as well as datasets and databases.
The list has not been updated since then, primarily because the Department of Internal Affairs commenced its Government Data Catalogue project shortly after that post.
http://www.data.govt.nz, which was launched by Internal Affairs Minister Nathan Guy on 4 November, is doing exactly what you are promoting - creating visibility and access to government data that has been released by agencies.
This site incorporates actual datasets listed in our initial list and goes much further. It currently lists approximately 150 datasets. It also invites users to note which unreleased government datasets they would most like to see made available.
We will pass on to them the details you have provided about the Population and Sustainable Development website. You may also wish to post a comment and recommendation on their discussion forum.
Thanks again
Keitha