Harnessing public data value by focusing on users’ needs
The IDI has been described internationally as a success for New Zealand and an exemplar for other countries of how to get the most from public sector data and facilitate evidence-based policy making.
New Zealand’s Integrated Data Infrastructure (IDI) is a large research database holding anonymised data from across the public sector about citizens, linked to data about life events such as education, income, migration, justice and health. The IDI is longitudinal, meaning that it tracks anonymised individuals and households throughout their lives, and as such is exceptionally useful for answering questions about groups of people or businesses with similar characteristics over time.
The authority for Statistics New Zealand (Stats NZ) to integrate data was granted by the NZ Cabinet in 1997. After some data integration projects for specific purposes, the Cabinet agreed in 2011 to consolidate the previously separate integration projects into the IDI prototype. Further Cabinet agreement in 2013 led to the expansion of the IDI to create a cross-government data integration service. Since then, the IDI has continued to be updated on a quarterly basis, and both the number of datasets available in the IDI and demand for the service continue to increase.
Why is the IDI valuable? It satisfies users’ needs
The IDI successfully satisfies at least three user needs:
- In order to make evidence-based decisions, policy makers need to be able to measure the impact of interventions and to understand what would have happened if the intervention had not been made. This usually requires both longitudinal and joined data that combines information about individual citizens across a number of datasets, often provided by a number of different agencies.
- Public bodies also have a responsibility to produce timely statistics for publication, to be consumed by other public bodies, businesses, and individuals. These statistics often involve combining data collected by several public bodies and, as with the first need, the data may need to be historical.
- In order to produce accurate research relating to public issues in New Zealand, researchers need to have access to the most up-to-date data in a way that protects the privacy of citizens and is subject to the necessary scrutiny.
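The first need above, comparing what happened with what would have happened without an intervention, can be illustrated with a small sketch. It is not how the IDI or Stats NZ performs analysis: the records, the field names (`uid`, `treated`, `period`, `income`) and the function are all hypothetical, and difference-in-differences is just one common way to approximate the counterfactual once data from two agencies has been joined on a shared de-identified key.

```python
from statistics import mean

# Hypothetical de-identified records from two agencies, linked by "uid".
enrolments = [  # agency A: who received the intervention
    {"uid": "a1", "treated": True},
    {"uid": "a2", "treated": True},
    {"uid": "b1", "treated": False},
    {"uid": "b2", "treated": False},
]
incomes = [  # agency B: longitudinal income, one row per person per period
    {"uid": "a1", "period": "before", "income": 30000},
    {"uid": "a1", "period": "after", "income": 36000},
    {"uid": "a2", "period": "before", "income": 32000},
    {"uid": "a2", "period": "after", "income": 38000},
    {"uid": "b1", "period": "before", "income": 31000},
    {"uid": "b1", "period": "after", "income": 33000},
    {"uid": "b2", "period": "before", "income": 29000},
    {"uid": "b2", "period": "after", "income": 31000},
]


def difference_in_differences(enrolments, incomes):
    """Join the two datasets on uid and compare average change by group."""
    treated_by_uid = {row["uid"]: row["treated"] for row in enrolments}

    # Group incomes by (treated?, period), keeping only people found in both sets.
    groups = {}
    for row in incomes:
        uid = row["uid"]
        if uid not in treated_by_uid:
            continue
        key = (treated_by_uid[uid], row["period"])
        groups.setdefault(key, []).append(row["income"])

    avg = {key: mean(values) for key, values in groups.items()}
    treated_change = avg[(True, "after")] - avg[(True, "before")]
    comparison_change = avg[(False, "after")] - avg[(False, "before")]

    # The comparison group's change stands in for the unobserved counterfactual.
    return treated_change - comparison_change


estimated_effect = difference_in_differences(enrolments, incomes)
```

The estimate is only as good as the assumption that both groups would have followed the same trend without the intervention, which is why longitudinal data covering several periods is so useful.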
With the IDI, individual citizens can be tracked anonymously through around 550 datasets from 14 organisations, which include data from as early as 1840. The database is made available to all government departments and researchers who meet the access criteria.
Keys for success
The service started small, was tested regularly, and iterated to meet users’ needs
The IDI grew out of one-off data integration projects, started as a prototype in 2011, and was iterated from there into a full data integration service. Part of Stats NZ’s success in opening up data is down to working with users to understand their data needs and connecting them with relevant government contacts.
Good security practices are essential when managing personal data
Data in the IDI is de-identified, with information like names, dates of birth, and addresses removed. Numbers that can be used to identify people, like tax references, are encrypted. Stats NZ also checks research results before they are released to make sure individuals can’t be identified.
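As a rough illustration of these two steps, here is a minimal sketch. It is not Stats NZ's implementation: the field names, the `deidentify` and `suppress_small_counts` functions, and the threshold of 5 are all assumptions made for the example. It also swaps in a keyed hash (HMAC-SHA256) where the text says "encrypted", because a keyed hash is a simple way to show an identifier being replaced by a stable value that still allows records to be linked.

```python
import hashlib
import hmac

# Hypothetical field names; a real schema would differ.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address"}
KEYED_FIELDS = {"tax_reference"}


def deidentify(record, secret_key):
    """Drop direct identifiers and replace identifying numbers with a keyed hash."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # removed entirely
        if field in KEYED_FIELDS:
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()  # same input and key give same token
        else:
            cleaned[field] = value
    return cleaned


def suppress_small_counts(counts, threshold=5):
    """Illustrative output check: hide counts small enough to single people out.

    The threshold is an arbitrary example value, not an official rule.
    """
    return {key: (n if n >= threshold else None) for key, n in counts.items()}


record = {
    "name": "Jane Example",
    "date_of_birth": "1980-01-01",
    "address": "1 Example Street",
    "tax_reference": "123-456-789",
    "region": "Wellington",
}
safe_record = deidentify(record, secret_key=b"example-key-kept-outside-the-dataset")
checked_output = suppress_small_counts({"Wellington": 12, "Gisborne": 3})
```

Because the same key always maps a given tax reference to the same token, records from different datasets can still be joined without exposing the original number. The key itself has to be stored separately from the data.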
After integrated data has had identifying information removed, only vetted and approved researchers can access selected datasets for their specific project. Research must be for the public good and data can only be accessed in Stats NZ’s secure research data facilities. Before any new data is added to any Stats NZ service, a privacy impact assessment is carried out to consider any risks.
Standards for public trust and transparency
Stats NZ applies ethical, statistical, and security best-practice standards to the data being collected, and people who use the data must apply the same standards. Stats NZ staff and researchers who use the data have to sign a statutory declaration of secrecy, which is a lifetime agreement.
Cloud and onshore environments to ensure sustainability
All research services are operated out of cloud environments, with two onshore data centres and one in Sydney. Given the volatile nature of New Zealand’s physical environment, this is essential for resilience. To keep the services available at all times, production and disaster recovery servers operate out of geographically separated Infrastructure as a Service (IaaS) data centres.