Last week, I had the opportunity to give a talk at the first-ever State of Open Con in London. The event was well-attended, with hundreds of participants engaging in eight tracks that explored various aspects of open technology.
I had the pleasure of participating in the Open Data track, which delved into data sharing practices and programs. During my presentation, I shared insights on the types of data that should be made open for optimal impact.
Click below to watch, or continue reading for a full summary of the presentation.
Why Publish Open Data In the First Place?
Organizations publish open data for many reasons, including:
- fulfilling a mandate to promote transparency,
- encouraging community engagement, and
- enabling reuse by developers.
However, it's essential to keep in mind the FAIR Principles when making data available. These principles emphasize that data should be findable, accessible, interoperable, and reusable.
Publish The Source Data Behind Your Report
Data collection is rarely an end in itself. It usually serves a larger purpose, such as reporting.
Reports offer real benefits, such as executive summaries, but they can also limit others' ability to gain deeper insights from the data or to validate the report's findings.
To address this, publish the raw structured data that informed the report. To safeguard sensitive information, anonymize and aggregate the data appropriately before release.
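As a minimal sketch of what aggregation before publishing might look like, the example below counts records per group and hides counts for very small groups, since those can make individuals identifiable. The threshold of 5, the field names, and the sample records are all illustrative assumptions, not a recommended standard.

```python
from collections import Counter

# Hypothetical threshold: counts for groups smaller than this are suppressed.
MIN_GROUP_SIZE = 5


def aggregate_with_suppression(records, group_key, min_group_size=MIN_GROUP_SIZE):
    """Count records per group, replacing small counts with None."""
    counts = Counter(record[group_key] for record in records)
    return {
        group: (count if count >= min_group_size else None)
        for group, count in counts.items()
    }


# Hypothetical raw responses. Only `region` is read, so names never
# reach the published output.
responses = (
    [{"name": f"north-{i}", "region": "North"} for i in range(7)]
    + [{"name": f"south-{i}", "region": "South"} for i in range(2)]
)

published = aggregate_with_suppression(responses, "region")
print(published)
```

Real anonymization usually needs more than this (for example, checking combinations of fields), so treat the sketch as a starting point only.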
Fortunately, various repositories operated by government entities, charitable organizations, and community groups are available to help distribute data. Publishing your data in one of them makes it more accessible and more useful to others.
Publish Open APIs To Maximize Reuse
To make it easier for software developers to explore and build upon your data, it's important to go beyond just providing static files.
One effective solution is to offer an API, which enables developers to interact with data using code.
APIs often consist of applications that sit on a server with a database, allowing developers to query the database using predefined options. In some cases, APIs may even allow users to create new data in a database, such as via citizen reporting or community initiatives.
In the past, creating a functional API from a dataset was laborious and time-consuming. With tools such as Directus, the process can now be completed in a matter of minutes.
To get started, simply create a new Directus project and set up your data model to match the column headers in your CSV. Then, upload your CSV and adjust the permissions to allow public access to your collection. This instantly makes a REST and GraphQL API available to the public.
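Once a collection is public, a developer could query it over REST roughly as sketched below. The project URL, the `air_quality` collection, and the `city` field are hypothetical. The sketch assumes the Directus convention of serving items at `/items/<collection>`, accepting `filter[field][_eq]` and `limit` query parameters, and wrapping results in a `data` key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_items_url(base_url, collection, filters=None, limit=None):
    """Build a URL for reading items from a public collection."""
    params = {}
    for field, value in (filters or {}).items():
        # Equality filter on a single field.
        params[f"filter[{field}][_eq]"] = value
    if limit is not None:
        params["limit"] = limit
    url = f"{base_url.rstrip('/')}/items/{collection}"
    query = urlencode(params)
    return f"{url}?{query}" if query else url


def fetch_items(url):
    """Fetch the URL and return the list stored under the `data` key."""
    with urlopen(url) as response:
        return json.load(response)["data"]


# Hypothetical project URL, collection, and field names.
url = build_items_url(
    "https://example.directus.app", "air_quality", {"city": "London"}, limit=5
)
print(url)
# rows = fetch_items(url)  # uncomment against a real, public project
```

Building the URL separately from fetching it keeps the query easy to inspect and share, which is useful when documenting example requests for other developers.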
At State of Open Con, I heard the term "shared data" used by both the Open Data Institute and Icebreaker One as something specifically different from "open data". Shared data is accessible, but not necessarily free of charge, which can make the practice more sustainable and justify ongoing collection.
Directus provides granular roles and permissions which, when combined with external systems that handle payments or subscriptions, allow you to generate revenue from your data.
The Importance Of Metadata
Metadata is information that describes a dataset, and it is essential when publishing open data as it helps people to find, evaluate, and use the data effectively. Accurate metadata makes it easier to discover data, understand what it contains, and assess its quality.
Without strong metadata, your “open data” is really only open to those who already understand it.
Going back to the FAIR Principles, metadata allows your dataset to:
- Be easily found by projects that aggregate open data sources, like Open Data Scotland, who also presented at State of Open Con.
- Be more accessible by being clear about how data can be accessed.
- Be interoperable with other datasets, by using standardized terms and units, and by linking to other relevant data.
- Be reused by others, by including pathways and licenses to build upon the dataset.
In the world of APIs, this also includes providing robust API documentation that explains what endpoints exist, what queries can be made, as well as any access details and limitations.
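As a minimal sketch of the metadata points above, the record below describes a dataset in a machine-readable form and checks that nothing essential is missing. The field names, URLs, and dataset are hypothetical and do not follow any particular metadata standard. In practice you would likely adopt an established schema used by the repository you publish to.

```python
import json

# Hypothetical minimum set of fields, chosen to mirror the FAIR points above.
REQUIRED_FIELDS = ("title", "description", "license", "access_url", "columns")

# Hypothetical dataset description.
metadata = {
    "title": "City air quality readings",
    "description": "Hourly nitrogen dioxide readings from roadside sensors.",
    "keywords": ["air quality", "environment"],  # helps aggregators find it
    "access_url": "https://example.org/data/air-quality.csv",  # how to get it
    "license": "CC-BY-4.0",  # what reuse is permitted
    "columns": [  # standardized names and units aid interoperability
        {"name": "recorded_at", "type": "datetime", "unit": "UTC"},
        {"name": "no2", "type": "number", "unit": "µg/m³"},
    ],
}


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in required if not record.get(field)]


print(json.dumps(metadata, indent=2, ensure_ascii=False))
print("Missing:", missing_fields(metadata))
```

A check like this can run before every release so that a dataset is never published without its license or access details.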
Providing an API to a fully open and free-to-access dataset is not an alternative to providing the raw data, but a welcome addition. If you are OK with developers having the full dataset, make their lives easy and provide it.
Building Apps Upon Open Data
Your data can be used by developers to build different types of applications, typically falling into one of two categories.
Curated Insight Apps, which I also call "journalistic apps," offer a narrative approach to sharing insights obtained from the data.
Data Explorers, on the other hand, are more neutral and give users the freedom to derive their own insights, at the cost of a steeper learning curve.
Regardless of the application type, it is crucial to provide clarity regarding its purpose and ensure it is accessible to all, including individuals with disabilities. User testing remains the gold standard, and involving disabled users throughout the project's life cycle, from conception to launch and ongoing development, is highly recommended.
Screen reader accessibility, keyboard navigability, color contrast, and animation usage are some areas to consider when assessing accessibility. By prioritizing accessibility, you can create a more inclusive and meaningful experience for all users of your applications.
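Of these areas, color contrast is the easiest to check in code. The sketch below computes the contrast ratio between two colors using the relative luminance formula from WCAG 2.x, where level AA asks for at least 4.5:1 for normal-size text. The function names and sample colors are illustrative, and an automated check like this complements user testing rather than replacing it.

```python
def _linearize(channel):
    """Convert an 8-bit sRGB channel to its linear-light value."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, as defined by WCAG 2.x."""
    r, g, b = (_linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) up to 21."""
    l1 = relative_luminance(foreground)
    l2 = relative_luminance(background)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


# Illustrative chart colors: mid-grey text on a white background.
ratio = contrast_ratio((119, 119, 119), (255, 255, 255))
print(f"{ratio:.2f}:1", "passes AA" if ratio >= 4.5 else "fails AA")
```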
Lower The Barrier to Accessing Your Data, Reap the Benefits
Your data is only as good as the artefacts that support it.
Lowering the barrier to understanding and working with your data provides the highest chance of successful community engagement and reuse.
If you're interested in establishing an API for your dataset, Directus Cloud provides a simple solution. With a Directus Cloud project, you can quickly spin up an API and make your data more accessible and user-friendly.