The emergent ecosystem of civic data
Recently I have been working on housing land supply data as part of ‘OurLand’. OurLand is a research and development project supported by the Future Cities Catapult to innovate access to development land data. The background research involved a detailed review of housing land supply data in five UK local authorities. It was easy to notice that each organisation captures its information in a different way, which in turn raises the question of whether a common standard is required.
Clearly, there is a need for local authorities to devise their own data schemas for internal use. No place is the same. However, when it comes to sharing, open standards make it easier to compare land use data across administrative boundaries. Greater Manchester showed the way: the Combined Authority was formed from 11 local authorities. Several ‘data synchronisation programmes’ for service lines contributed to the Manchester Infrastructure Map. The map provides a consistent overview of development land across all 11 authorities in one place. It made it easy to compare available land supply across those organisations. Third parties could ultimately make use of that data for services offered to the public.
From working on the OurLand pilot, I know the challenges facing open data formats, which depend on many contingencies. This article highlights the merit of data standards for ‘civic applications’, that is, applications that have citizen-facing uses and are easily accessible, with few paywalls or technological barriers. The article responds to some of the questions encountered along the way, such as:
- What is the case for and against having standards?
- What other vehicles might encourage data transparency and accessibility?
- What standards for ‘civic technologies’ have been developed thus far?
The role of standards
Before going much further, I would like to differentiate between technical standards and open data standards. They are produced in very different ways, and by very different bodies.
Internationally, standards exist for all sorts of technical applications (for example, Wi-Fi and most other transmission standards). Here standardisation bodies, such as the British Standards Institution (BSI), work with industry to define common technical requirements. This increasingly happens under the label of the Smart City, a term driven by commercial organisations and large capital programmes (see the report on engagement in Manchester’s CityVerve project for further detail).
The BSI recognises the need to facilitate information exchanges across operators and technical systems: “It is likely that over the next few years, cities will have to install communications infrastructure (owned and managed by multiple vendors) that will allow information to be gathered in real time and in intervals. There will need to be strategies for optimised data collection and assimilation and documented good practice in this area would help in the creation of these strategies” (BSI, 2014, p. 14).
The interpretation of standards for ‘civic applications’ (Handler & Ferrer Conill, 2016), however, is quite distinct from that of the BSI, I would suggest. It is less driven by hardware, engineering processes, or competitive concerns. Instead, it is more focused on the needs of individual organisations and stakeholders, and it tends to be developed in a more bottom-up, collaborative fashion. As a sample of civic data standards (discussed later) indicates, they arise from pilot projects, public-private partnerships, and even voluntary initiatives within a particular sector. Here an industry-wide initiative may become so widely used that it is considered a de facto standard.
The role of quality and license labels
Besides such bottom-up standards, other strategies to improve data quality, consistency, and accessibility are found in rating systems, certificates, and distribution licenses.
Quality rating systems, for instance, help to gauge whether a dataset meets defined quality characteristics. The five-star open data scheme (http://5stardata.info/en/) is a well-known example. A dataset’s rating depends on different factors. Those include the popularity and currency of the data format (for example, whether it is a format in wide use or not). The rating also considers the ownership regime of a file format. Formats are rated higher if users do not require specialised (paid-for) applications to work with the data. At the upper end of the spectrum, the rating scale places data formats that are machine readable and uniquely identifiable on the web. This may include formats such as JSON, a lightweight text format for structured information, combined with a unique, stable web address, e.g. a persistent identifier such as a DOI (Digital Object Identifier). The JSON format is popular in web-based applications and IoT products (for example, sensor feeds).
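To make the idea of ‘machine readable and uniquely identifiable’ a little more concrete, here is a minimal sketch in Python. The record, its field names, and the example.org address are all assumptions made up for illustration; they are not taken from any real dataset or from the five-star scheme itself.

```python
import json

# Hypothetical land supply record. The "id" is a stable web address that
# other datasets could link to; example.org is a placeholder domain.
record = {
    "id": "https://example.org/sites/GM-001",
    "authority": "Example Council",
    "area_hectares": 1.5,
    "land_type": "brownfield",
}


def to_json(rec: dict) -> str:
    """Serialise a record to JSON text that any application can read."""
    return json.dumps(rec, indent=2, sort_keys=True)


def from_json(text: str) -> dict:
    """Parse JSON text back into a Python dictionary."""
    return json.loads(text)
```

The point is simply that the same record can be written and read back by different software without a proprietary application in between, and that the identifier gives third parties something stable to reference.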
The star rating serves as a ‘label’ of quality. In the world of government data, such quality ratings can encourage public bodies to strive towards higher rated data formats. However, national governments can also mandate certain quality levels to be reached in a specified timeframe.
Open accessibility is an important aspect of civic data. The Open Knowledge Foundation defines content as ‘open’ if it can be “freely used, modified, and shared by anyone for any purpose” (OKF, 2017). Licenses are a way to control how published content is distributed. Novel licensing schemes based on copyright facilitate the release of such data.
In the world of the open web and open data, a range of licenses exists, offering nuanced definitions of how content can be produced, reused, and reproduced. One of the best-known families of licenses is Creative Commons (https://creativecommons.org). This organisation offers a free-of-charge license label based on a set of questions (e.g. whether to allow adaptation of the content, whether to allow commercial use of the content…). It generates a set of logos similar to the washing icons found on clothes. It counters the idea of ‘copyright’ and instead offers a ‘copyleft’ that encourages redistribution. Copyright still provides the foundation for Creative Commons licenses, as content producers naturally hold the intellectual property right over their content.
The above relates to the way data is published and made available for re-use. Apart from the idea of a quality standard in the five-star rating system, we haven’t yet talked about the definition of what a data feed should comprise. This is where the idea of a standard for a dataset comes in. Data standards are thus different from file formats or distribution licenses. In the context of governmental data, what I am talking about is, for example, the definition of the core elements of the information schema embedded in a data feed. What this aims at are the specific structures in which a data feed is served. In simple terms, this means agreeing on a set of specific table columns, the data types that are acceptable in each column, and perhaps a set of values that can be entered.
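The idea of agreeing on columns, data types, and permitted values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, types, and allowed values below are hypothetical and do not come from any local authority's schema or from OurLand.

```python
from typing import Any

# Hypothetical schema for a housing site record: each column has an expected
# type and, optionally, a closed set of allowed values.
SITE_SCHEMA: dict[str, dict[str, Any]] = {
    "site_id": {"type": str},
    "authority": {"type": str},
    "area_hectares": {"type": float},
    "land_type": {"type": str, "allowed": {"brownfield", "greenfield"}},
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for column, rule in SITE_SCHEMA.items():
        if column not in record:
            errors.append(f"missing column: {column}")
            continue
        value = record[column]
        # Strict type check: an integer such as 2 is rejected for a float column.
        if not isinstance(value, rule["type"]):
            errors.append(
                f"{column}: expected {rule['type'].__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        allowed = rule.get("allowed")
        if allowed is not None and value not in allowed:
            errors.append(f"{column}: value {value!r} is not an allowed value")
    # Columns that the schema does not know about are also reported.
    for column in record:
        if column not in SITE_SCHEMA:
            errors.append(f"unexpected column: {column}")
    return errors
```

A record such as `{"site_id": "GM-002", "authority": "Example Council", "area_hectares": "1.5", "land_type": "car park"}` would be reported with two problems: the area is text rather than a number, and ‘car park’ is not among the agreed land types.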
Emergent ‘standards’ for civic data
In the research, I came across the idea of civic data standards, developed in the US by Andrew Nicklin. He coins the term “civic data standard”, by which he refers to “an open, collaboratively developed set of schematics or semantics that facilitates interoperability between multiple providers and consumers for the public good.” This aligns with what I have been getting at in some recent work, which partly resulted in a schema for a ‘planning API’. In the UK, for example, some local authorities now join up, as in the Greater Manchester Combined Authority. The Combined Authority worked through several ‘data synchronisation programmes’ to align data from different councils into similar data schemas for a given task, such as the administration of housing sites. Some of the underlying datasets now drive MappingGM, a map application that consolidates, for example, all brownfield sites across Greater Manchester.
Here are a few standards for potential ‘civic applications’ with different degrees of success:
- Open311 standard (http://www.open311.org): Open311 is a web platform and standard. At a later stage it became associated with the Code for America movement, which saw coders join up on the open-source production of platforms for government organisations, especially at the intersection of citizens and local government. The Open311 platform and standard were initially used to supplement local government contact centres; the standard serves to consolidate citizen requests and follow-up.
- General Transit Feed Specification (GTFS): This is an example of a very successful industry-authority collaboration. The standard, now used across the world, was originally developed by Google in partnership with TriMet (the Tri-County Metropolitan Transportation District of Oregon) to publish transport schedules to the Google Maps platform. This is picked up in a blog post by Grohsgal (2014); also see McHugh (2013) for an inspiring account of the development of GTFS.
- Open Contracting Data Standard: The UK government is committed to introducing this data standard to publicise the contracting activities of national government agencies (UK Government, 2016). This is a rather new activity and there is not yet much published on it.
- Building & Land Development Specification (http://permitdata.org): This is another US effort, which aims to introduce a common standard for the publication of permit data for construction projects.
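To give a sense of what such a standard looks like to someone building on it, here is a minimal sketch that reads a GTFS-style `stops.txt` file in Python. GTFS feeds are plain comma-separated text files with agreed column names; `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` are columns from the specification, while the two sample rows and the helper name `parse_stops` are invented for illustration.

```python
import csv
import io

# Made-up sample in the shape of a GTFS stops.txt file.
SAMPLE_STOPS_TXT = """stop_id,stop_name,stop_lat,stop_lon
S1,Example Square,53.4794,-2.2453
S2,Sample Street,53.4831,-2.2004
"""


def parse_stops(text: str) -> list[dict]:
    """Read stops.txt content and convert coordinates to numbers."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {
            "stop_id": row["stop_id"],
            "stop_name": row["stop_name"],
            "stop_lat": float(row["stop_lat"]),
            "stop_lon": float(row["stop_lon"]),
        }
        for row in reader
    ]
```

Because every publisher uses the same column names, the same few lines work for any transit agency's feed, which is arguably a large part of why the standard spread.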
Towards a standard for land use data
For my work on Future of Planning (link to project post), agreement on common information schemas could be quite important. MappingGM, developed with funds from the Department for Communities and Local Government (DCLG), offers some insights into achieving a common standard. But there is more work to be done at a national level too, for example by making Land Registry data more accessible via a standardised API, with its base layer of polygons available free of charge. There is a role to play for all tiers of government and their agencies.
At a national level, there is merit in the agencies involved clarifying referencing and ‘what things should be named’, so that references to, e.g., specific plots or a ‘housing unit’ are clear. For example, the Land Registry has an opportunity to produce an open data standard for ownership information and the reconciliation of historic transactions. If this data were available more readily and cheaply, it could become the basis for third-party applications and thus serve as a foundational data source to which other datasets can be linked.
The UK government’s recent housing white paper called for a standardised method for assessing housing land supply, but here things get tricky and contested. Most local authorities have conflicting schemas, which partly stem from different topographical conditions but sometimes also from political priorities. A good framework will require flexibility in a ‘data standard’; it will require appropriate metadata, so that differences become explainable. The latter should come in a format that can be processed by a software application building upon it. This is the focus of further investigations.
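As a rough illustration of metadata that software can process, the sketch below describes how two authorities assess their land supply and reports where their methods differ. Everything in it is hypothetical: the field names, the example values, and the helper `explain_differences` are assumptions for the sake of the example, not part of any existing or proposed standard.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SupplyMetadata:
    """Hypothetical metadata describing how an authority assesses land supply."""

    authority: str
    assessment_method: str
    min_site_size_hectares: float
    includes_windfall_sites: bool


def explain_differences(a: SupplyMetadata, b: SupplyMetadata) -> dict[str, tuple]:
    """Map each differing field to the pair of values (a's value, b's value)."""
    fields_a, fields_b = asdict(a), asdict(b)
    return {
        name: (fields_a[name], fields_b[name])
        for name in fields_a
        # The authority name always differs, so it is left out of the report.
        if name != "authority" and fields_a[name] != fields_b[name]
    }
```

With metadata of this kind attached to each dataset, an application comparing two councils could flag, for instance, that one counts sites from 0.25 hectares and the other only from 0.5 hectares, rather than presenting the raw totals as if they were directly comparable.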
What’s your view on civic data standards? Do you believe we need to agree on data schemas for things like development land? Feel free to leave your thoughts in the comment box below or email me at firstname.lastname@example.org.
Here are some of the sources used
- Handler, Reinhard A., & Ferrer Conill, Raul. (2016). Open Data, Crowdsourcing and Game Mechanics. A case study on civic participation in the digital age. Computer Supported Cooperative Work (CSCW), vol. 25, no. 2-3, 153–166.
- McHugh, B. (2013). Pioneering open-data standards: The GTFS story. In Beyond Transparency.
- BSI (2014). The role of standards in smart cities.
- Grohsgal (2014). Building Data Standards.
- UK Government (2016). UK Open Government National Action Plan 2016-18. Policy paper.