Secondary Use of Health Data for Medical Research and Public Health

Published on: 18/11/2011

Introduction

The processing and linking of a broad array of personal data in research, especially in the health sector, is generating considerable public concern. Researchers focus primarily on aggregate trends, so when studying health services they rarely have a direct interest in knowing the specific identities of the people they study. Nevertheless, for various reasons elaborated in this article, there is a growing need for the secondary use of personal data in health research. The secondary use of health data, that is, its use for purposes other than those for which it was originally collected, therefore requires striking a balance between the public interest and that of the individual data subject.

Sociologists have already discussed incentives for sharing and analysed the reasons why individuals might not want to share data. Data sharing is coupled with secondary use, in the sense that for the data to be re-used by others, it must be accessible in the first place. The term 'secondary use' can be defined as the use of data collected for one purpose to study a new problem. According to Robert F. Boruch (Boruch, 1985), sharing occurs when the original holder grants to another the partial use, enjoyment, or possession of something. Thus, he described the process as "the voluntary provision of information from one individual or institution to another for purposes of legitimate scientific research" (Boruch, 1985: 89).

Meanwhile, the secondary use of data and its protection have become a contentious issue, largely dividing scientific and research communities on the one hand and individuals on the other. Increasingly, teams of researchers are developing data repositories (registries) and biobanks that serve as platforms for a wide variety of future research, which challenges existing Fair Information Principles [1] around limits on data collection, use, and retention (Willison, 2009). Currently, there is no systematic documentation of the existence of these registries and biobanks.

In the meantime, reported health data breaches are multiplying (OECD, 2011). To give an idea of the magnitude of the issue, the US Department of Health and Human Services reported 269 health data breaches, involving the data of some 10 million individuals, between September 2009 and March 2011. In addition, Medicare Australia dealt with 234 serious data privacy breaches by employees in 2007-08. Finally, London NHS trusts reported a total of 899 breaches between 2008 and 2011.

Choices reach an impasse

Given the above, various questions arise concerning if, when and how secondary use of data should be undertaken and how the data can be protected. However, this discussion often reaches an impasse: at times it is imperative to resort to secondary use of data, yet we are not fully capable of protecting the data, which leads parts of society and of the scientific community to reject such use. Consequently, the protection of data in secondary use raises questions such as: Why do health researchers need to make secondary use of data? Why is it sometimes impracticable to obtain consent? What security safeguards do health researchers currently use? What review and oversight mechanisms are currently in place?

The basic problem with secondary use of data is that most health research requires it. Even though, for the majority of research, researchers have no interest in the actual identities of individuals, individual-level linkable data are required for several reasons:

1. For analytic purposes. When studying the relationship between some exposure (a drug, a procedure, a policy) and a health outcome, individual-level data allows the researcher to obtain a more precise estimate of the relationship between exposure and outcome by "controlling" for other factors that have an impact.

2. To look at spill-over consequences of policies. Attempts to limit expenditures in one budget portfolio (e.g. pharmaceuticals) may result in an unanticipated increase in utilisation in other portfolios (e.g. physician visits, emergency department visits). Therefore, any evaluation of policies should examine spill-over effects on other budget envelopes. To address this, it is usually necessary to combine files from disparate data sources, including data on non-medical determinants of health.

3. For cohort studies that follow particular patient groups prospectively, it is necessary to update files from time to time.
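The first two reasons above can be illustrated with a minimal sketch. It assumes two invented files from hypothetical sources (a drug file and a physician-visit file) that share a person code; the names `drug_file`, `visit_file` and `mean_visits_by_stratum` are illustrative only. The files are linked per person, and visits are then compared between exposed and unexposed people within the same age group, which is one simple way of "controlling" for age.

```python
from collections import defaultdict

# Invented files from two hypothetical sources, keyed by a person code.
drug_file = {
    "a1": {"age_group": "65+", "exposed": True},
    "a2": {"age_group": "65+", "exposed": False},
    "a3": {"age_group": "<65", "exposed": True},
    "a4": {"age_group": "<65", "exposed": False},
}
visit_file = {"a1": 6, "a2": 4, "a3": 3, "a4": 1}  # physician visits per person


def mean_visits_by_stratum(drugs, visits):
    """Link the two files per person, then average visits within each
    (age group, exposure) cell so exposure is compared within age groups."""
    cells = defaultdict(list)
    for person, info in drugs.items():
        if person in visits:  # keep only people present in both files
            cells[(info["age_group"], info["exposed"])].append(visits[person])
    return {cell: sum(v) / len(v) for cell, v in cells.items()}


result = mean_visits_by_stratum(drug_file, visit_file)
for cell, mean in sorted(result.items()):
    print(cell, mean)
```

Neither step is possible with aggregate counts alone: the link requires a per-person key, and the stratified comparison requires each person's age group and exposure status.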

Besides that, and more importantly, researchers need to make secondary use of existing personal information in order to:

· study patterns of diseases in the population;

· identify causes of disease and their impact;

· develop and evaluate preventive and therapeutic strategies, health services, programmes and policies;

· assess data quality;

· assemble potential research participants.

When secondary use of data is in order, almost all of the data (with minor exceptions) is originally acquired during the clinical care of the patient and not for administrative or planning purposes. If data is to be used for any secondary purpose, clear definitions of the circumstances in which this is allowed must be developed. Some of the current uses of health data are:

· Identifying the causes of disease, the prevalence of risk factors and identifying populations at risk.

· Protecting public safety, especially with regard to infectious diseases, but also in relation to environmental hazards.

· Needs assessment, monitoring and evaluation of services, with a view to providing an optimum quality of health care.

· Education of the public and health professionals in all of the areas above.

The threat of misconduct

Despite the clear need for the secondary use of data, there are concerns over the potential for harm. Four broad categories of harm from the misuse of data can be identified (National Committee on Vital and Health Statistics, 2007):

· Erosion of trust in the healthcare system may occur when there is a divergence between the uses individuals reasonably expect of their health data and the uses actually made of it without their knowledge and permission.

· Compromises to health care may result when individuals fail to seek treatment or choose to withhold information that could affect decisions about their treatment, because they do not understand how their data may be used or do not trust that their identity will be protected, particularly if they consider their information to be sensitive.

· Risk of discrimination and personal embarrassment arises because personal health information is, at times, used to make decisions that adversely affect an individual, such as in employment, benefits coverage, or acceptance for loans or mortgages.

· Potential for group-based harm may arise when data is aggregated and the results are misused.

Thus, there is an urgent need to protect personal data in secondary use. When discussing the issue, two broad subjects arise. The first concerns the methods used to protect the data. The second concerns the consent given by the data subjects.

Common practices for protection of personal data

Turning to the first issue, the literature on the protection of personal data in secondary use is dominated by three common practices: anonymisation, pseudonymisation, and confidentiality. Before elaborating on them, we should first clarify the term 'personal data', which according to EU Directive 95/46/EC means any information relating to an identified or identifiable natural person. An identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.

Anonymisation (National Committee on Vital and Health Statistics, 2007), as a data protection method, is the process that removes the association between the identifying data set and the data subject. This can be done in two different ways: first, by removing or transforming characteristics in the associated data set so that the association is no longer unique and relates to more than one data subject; second, by increasing the population in the set of data subjects so that the association between the data set and the data subject is no longer unique. This is the most common and widely accepted method of data protection in health research and secondary use of data.
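The first of these two approaches, transforming characteristics until no combination points to a single person, can be sketched as follows. This is only an illustration under stated assumptions: the records are invented, the choice of age and postcode as the identifying characteristics is hypothetical, and the helper names `generalise` and `smallest_group` do not come from any standard or library.

```python
from collections import Counter

# Hypothetical records: (age, postcode, diagnosis). All values are invented.
records = [
    (34, "D04 A1B2", "asthma"),
    (36, "D04 C3D4", "diabetes"),
    (38, "D04 E5F6", "asthma"),
    (61, "T12 G7H8", "diabetes"),
    (67, "T12 J9K0", "hypertension"),
]


def generalise(record):
    """Coarsen the identifying characteristics: 10-year age band, postcode area only."""
    age, postcode, diagnosis = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", postcode.split()[0], diagnosis)


def smallest_group(rows):
    """Size of the smallest group of rows sharing the same (age, postcode) values."""
    counts = Counter((age, postcode) for age, postcode, _ in rows)
    return min(counts.values())


generalised = [generalise(r) for r in records]
print(smallest_group(records))      # 1: every raw combination is unique
print(smallest_group(generalised))  # 2: every combination now covers at least two people
```

A smallest group of 1 means at least one record can still be tied to a single person by these characteristics; after generalisation each combination relates to more than one data subject. Real data would also need checks on the remaining fields, which this sketch does not attempt.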

Pseudonymisation (National Committee on Vital and Health Statistics, 2007) is a particular type of anonymisation that, after removal of the association with a data subject, adds an association between a particular set of characteristics relating to a data subject and one or more pseudonyms. The pseudonym may be unique in a domain. Under this process, the information in the holder's possession cannot reasonably be used by the holder to identify an individual. However, it differs from anonymisation in that the original provider of the information may retain a means of identifying individuals. This will often be achieved by attaching codes or other unique references to information so that the data will only be identifiable to those who have access to the key or index. Pseudonymisation allows information about the same individual to be linked in a way that true anonymisation does not.
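A minimal sketch of this idea, assuming a keyed hash (HMAC-SHA256) as the way codes are generated; the article does not prescribe any particular technique, and the key, identifiers and helper name `pseudonym` are invented for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the original data provider.
SECRET_KEY = b"example-key-held-by-the-provider"


def pseudonym(patient_id: str, key: bytes = SECRET_KEY) -> str:
    """Derive a stable code from an identifier using a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Invented source records: (patient identifier, clinical fact).
source = [("P-1001", "asthma"), ("P-1002", "diabetes"), ("P-1001", "eczema")]

# What the researcher receives: pseudonyms instead of identifiers.
released = [(pseudonym(pid), fact) for pid, fact in source]

# What the provider retains: the index that maps pseudonyms back to identifiers.
index = {pseudonym(pid): pid for pid, _ in source}

# Records of the same person carry the same code, so they can still be linked.
print(released[0][0] == released[2][0])
```

The researcher can link the first and third records because they share a code, but only a party holding the key or the index can map a code back to a patient identifier.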

In fact, pseudonymisation can come in various forms. For instance, we can distinguish among: overlapping data sources (one-time secondary use), one-time secondary use with the need for re-identification, a pseudonymous research data pool, and a central clinical database (many secondary uses). In general, pseudonymisation has gained ground lately. For example, as of April 2011, all NHS organisations in the UK are required to have implemented pseudonymisation processes and 'secondary use' Safe Havens, which restrict access to identifiable data and enable the data to be de-identified before it is put to secondary uses.

The concept of confidentiality

Confidentiality arises when one person discloses information to another (e.g. patient to clinician) in circumstances where it is reasonable to expect that the information will be held in confidence. The concept is therefore a legal obligation, a requirement established within professional codes of conduct, and usually a specific requirement linked to disciplinary procedures.

Many current uses of confidential patient information are not confined to contributing to or supporting the healthcare that a patient receives. Very often, alternative uses are extremely important and provide benefits to society, e.g. in medical research, protecting the health of the public, health service management and financial audit. However, the secondary use of these data for such purposes is not directly associated with the healthcare that patients receive, and thus it cannot simply be presumed that patients who seek healthcare are content for their information to be used in these ways. Therefore, and given the valuable secondary use of these data, the principle of confidentiality in the social care sector should cover the following areas:

· Fair obtaining of data, including consent.

· Accuracy of data.

· Secure storage of data with time limits.

· Access by staff on a need to know basis.

· Access for patients and their representatives.

· Secondary use of data.

· Sharing information with other organisations.

· Trained, or at least well-informed staff.

When discussing secondary use of data and its protection, the fundamental principle is consent to disclosure, which can be explicit or implied. However, according to the EU's Directive on the protection of individuals with regard to the processing of personal data, within the EU consent for processing health data should be explicit. In general, the Directive states that Member States shall prohibit the processing of special categories of personal data, including data concerning health, but Article 8 introduces specific exceptions, among which is the explicit consent of the data subject. The consent may refer to disclosure to a particular person or body for a particular purpose, or it may be consent to general future disclosure for particular purposes. In either case consent should be informed. There are a number of standard purposes for which the personal data of all patients entering a hospital will be processed.

Patient consent is acceptable as long as patients are aware of the potential disclosure and of the option to opt out. Explicit or express consent is achieved when a patient actively agrees, either orally or in writing, to a particular use or disclosure of information, or explicitly consents to a range of future uses which have been discussed with the patient. Explicit consent is ideal as there is no doubt as to what has been agreed. Patient agreement can also be implied, signalled by the behaviour of an informed patient. Implied consent is not a lesser form of consent, but it is only valid if the patient genuinely knows what is proposed and that he or she has a choice about participating. If not, there is no consent and disclosure would require a different sort of justification.

International initiatives

To improve data protection, international organisations and forums have established some broad guidelines to be followed. In reality, there is no clear and unified legislation for the protection of personal data. However, the existing patchwork of data protection laws is generally modelled on the internationally accepted Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, adopted on 23 September 1980 by the Organisation for Economic Cooperation and Development (OECD). This document continues to represent the international consensus on general guidance concerning the collection and management of personal information.

Furthermore, the EU's Data Protection Directive of 1995, on the protection of individuals with regard to the processing of personal data and on the free movement of such data, promotes transparency in the following manner. The data subject has the right to be informed when his or her personal data is being processed. The controller must provide his/her name and address, the purpose of processing, the recipients of the data and all other information required to ensure the processing is fair (Art. 10 and Art. 11). The Directive also rules that data may be processed only under the following circumstances (Art. 7):

· the data subject has unambiguously given his/her consent;

· processing is necessary for the performance of a contract to which the data subject is party or in order to take steps at the request of the data subject prior to entering into a contract;

· processing is necessary for compliance with a legal obligation to which the controller is subject;

· processing is necessary in order to protect the vital interests of the data subject;

· processing is necessary for the performance of a task carried out in the public interest or in the exercise of official authority vested in the controller or in a third party to whom the data are disclosed;

· processing is necessary for the purposes of the legitimate interests pursued by the controller or by the third party or parties to whom the data are disclosed, except where such interests are overridden by the interests for fundamental rights and freedoms of the data subject which require protection under Article 1 (1).

Besides this general system, the Directive (Art. 8) introduces certain exceptions for the processing of health data (and other sensitive data). Special reference should also be made to the UK Data Protection Act 1998 (DPA), which came into force in March 2000. Its purpose is to protect the right of the individual to privacy with respect to the processing of personal data. It governs when and in what circumstances personal data may be shared with others, but even when data sharing is justifiable, the Act only permits and does not require the release of information. The DPA requires organisations to process fairly and lawfully any information that might enable a patient to be identified.

Article Sources:

Boruch R. F., "Definitions, Products and Distinctions in Data Sharing", in: Fienberg S., Martin M. and Straf M., Sharing Research Data, Natl Academy Pr., 1985.

Canadian Institutes for Health Research, Secondary Use of Personal Information in Health Research: Case Studies, November 2002.

Comber H., "Secondary Use of Data: Striking a Balance", National Cancer Registry. http://www.cosantasonrai.ie/documents/conferences/SecondaryUseDataHarryComber.doc

Data Protection Act 1998, Protecting people from the wrongful use of their personal information by others. http://www.legislation.gov.uk/ukpga/1998/29/contents

Department of Health (UK), Confidentiality: NHS Code of Practice, November 2003.

Directive 95/46/EC (Directive on protection of individuals with regard to the processing of personal data and on the free movement of such data) http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=CELEX:31995L0046:E...

Ethics Department, Guidance on secondary uses of patient information, British Medical Association, April 2007.

Jentzsch N., A Welfare Analysis of Secondary Use of Personal Data, DIW Berlin, May, 2010.

Lawlor D. A., "Public health and data protection: an inevitable collision or potential for a meeting of minds?", International Journal of Epidemiology, Volume 30, Issue 6, 2001.

Lowrance W., Learning from experience: privacy and the secondary use of data in health research, Journal of Health Services Research and Policy, No. 8, 2003.

National Committee on Vital and Health Statistics, Enhanced Protections for Uses of Health Data, Report to the Secretary of the U.S. Department of Health and Human Services, December 2007.

OECD, OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data, 1980. http://www.oecd.org/document/18/0,3343,en_2649_34255_1815186_1_1_1_1,00....

OECD, Enabling the secondary use of personal health data: Proposal for work on privacy and confidentiality challenges, 2011.

Parliamentary Office of Science and Technology, Data Protection and Medical Research, January 2005. www.parliament.uk/post/home.htm

Pommerening K. and Reng M., Secondary Use of the EHR via Pseudonymisation. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.71.968&rep=rep1&type=pdf

Zimmerman S.A., Data Sharing and Secondary Use of Scientific Data: Experiences of Ecologists, A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Information and Library Studies) in The University of Michigan 2003.

Willison D. J., Use of Data from the Electronic Health Record for Health Research current governance challenges and potential approaches, March 2009. http://www.priv.gc.ca/information/pub/ehr_200903_e.cfm

[1] The Fair Information Principles are a set of procedural guidelines for the collection, management, processing, and safeguarding of personal information. In 1980, the member countries of the Organisation for Economic Cooperation and Development (OECD) agreed to these principles, which are now at the core of the majority of the privacy legislation in the Western world.

Nature of documentation: Article
