Introduction

More and more people have biological samples and health information stored with a range of public and private entities, including direct-to-consumer health and ancestry genetic testing companies, clinical laboratories, cohort initiatives and large-scale biobanks. Personal health information includes many types of information, ranging from qualitative or demographic information to genomic data and even biobanked tissue itself (1). And with the rise of Big Data research initiatives, personal information from a range of sources is being compiled, shared and analyzed in ever more complex ways (2). Often, individuals are asked to provide consent for the storage and use of their information for research and other permitted purposes. In other circumstances, policies allow research to be conducted without consent.

The obligation to maintain the privacy of the research participant is foundational to biomedical research. It is mentioned in virtually every research ethics guideline, including well-established international statements (3,4), national policies (5,6), and professional ethics codes (7,8). Privacy is expected by the general public and research participants, and it is a key component of public trust in the research enterprise (9). But there is growing concern about the challenges of keeping participant information private and confidential (10,11). Growth in sophisticated information technologies that can facilitate data breaches along with increasing collection and sharing of digitized health information may make it more difficult for researchers, public research institutions and private companies to maintain this obligation (12).

When consent is required for research involving health information and biological samples, the relevant consent process often includes information about data protection, the entities and individuals that will have access, and why confidentiality cannot always be guaranteed. But given the shifting information technology landscape, to what degree does the consent process need to evolve, if at all, to reflect emerging privacy and data protection concerns? Have privacy risks – and the public concerns and perceptions about those risks – changed enough to warrant re-consenting for samples that were collected with data protection guarantees that are no longer realistic? What privacy risks ought to be disclosed to participants and when? And are the promises of anonymity that are so often made to research participants and research ethics boards still tenable?

In this article we explore these questions through the lens of Canadian health law and research ethics policies. The goal is to map the nature of the emerging consent challenges. As research involving health information and biological samples becomes increasingly common, essential and complex, the issues associated with privacy will intensify. Here, we seek to highlight several areas that warrant immediate attention.

The Emerging Privacy Challenge

A number of recent studies have highlighted how emerging computational strategies can be used to identify individuals in health data repositories managed by public or private institutions (13). And this is true even if the information has been anonymized and scrubbed of all identifiers (14). A study by Na et al., for example, found that an algorithm could be used to re-identify 85.6% of adults and 69.8% of children in a physical activity cohort study, “despite data aggregation and removal of protected health information” (15). A 2018 study concluded that data collected by ancestry companies could be used to identify approximately 60% of Americans of European ancestry and that, in the near future, the percentage is likely to increase substantially (16). Such concerns have led at least one company to offer “anonymous” genome sequencing (17). Furthermore, a 2019 study successfully used a “linkage attack framework” (that is, an algorithm aimed at re-identifying anonymous health information) that can link online health data to real-world people and thus, as suggested by the authors, clearly demonstrates “the vulnerability of existing online health data” (18). And these are just a few examples of the developing approaches that have raised questions about the security of health information framed as being confidential. Indeed, it has been suggested that today’s “techniques of re-identification effectively nullify scrubbing and compromise privacy” (19).

In addition, data breaches involving health information are on the rise. A study from the US found that the rate of data breaches increased by 70% between 2010 and 2017 (20,21). Sensitive demographic and financial information is commonly compromised (22). In Canada, there have been a number of high-profile breaches involving publicly held health information (23,24). In British Columbia, for example, a 2016 incident led to a province-wide freeze of biomedical research involving health information (25). Data breaches in the private sector are also increasing, with most being caused by malicious or criminal attacks (26,27). In addition, there are examples of inappropriate sharing of data for research purposes, as exemplified by the potential class action lawsuit in the United States that accuses the University of Chicago of sharing identifiable patient data with Google (28).

There have also been highly publicized instances of genetic repositories being used by law enforcement agencies for the purpose of criminal investigations. Probably the most famous case involved the use of genetic information from a direct-to-consumer genealogy company to identify and apprehend the Golden State Killer (29,30). Since then, there have been numerous other examples of repositories of genetic samples being used in similar situations (31). While the use of genetic databases in this context does not necessarily implicate health research biobanks and cohort studies, it once again emphasizes how information that was collected under a presumption of confidentiality may be used in controversial and unexpected ways. These cases have also made the privacy issues very public – as highlighted by this New York Times headline: “Sooner or Later Your Cousin’s DNA Is Going to Solve a Murder … The price may be everyone’s genetic privacy” (32). This coverage may impact public perceptions and concerns about privacy issues and, perhaps, expectations regarding what is disclosed during the consent process.

The emergence of powerful technologies and re-identification strategies coupled with a rising number of privacy controversies has prompted some commentators to go so far as to suggest that the entire concept of privacy and anonymity is dead (27,33,34). Indeed, it has been suggested that we now live in the era of privacy nihilism – a time when it is becoming near impossible to maintain privacy and to control what others can learn about you (35). Of course, not all data repositories are the same and the risk of a data breach likely differs significantly depending on many factors. Still, these privacy controversies and technology trends highlight that we may need to reconceptualise how we think about and frame privacy for the purposes of consent. This seems particularly so given that much of consent law is based on what a research participant may want to know about risks and not necessarily merely those that are most significant.

Privacy and Public Perceptions

The public, and patients in particular, are mostly supportive of the idea of sharing their health information and biological material for research purposes (36,37). However, that support is often contingent on the promise of privacy and the de-identification of information (38), a strategy that, as noted, may not be effective at protecting privacy. Although it has become nearly impossible to guarantee privacy, people still care about it, particularly in the context of biological samples (39) and health information (37). A 2017 study from the US found that “[n]inety percent of participants agreed health information privacy was important to them; 64% agreed that they worried about the privacy of their health information” (37). A 2019 study from Canada found that while most people support the contribution of personal data for research purposes, “respondents placed high importance on deidentification of data” and only “58% were confident about the privacy and security procedures in place” (40).

This work highlights the degree to which support for research is linked to assurances of privacy (38). These concerns may be heightened in the context of genetic information. While the way in which individuals think about privacy in the context of health information can vary considerably (41), genetic information is generally viewed, rightly or not (42), as being especially sensitive. Studies have consistently found that, if asked, people will say they are concerned about both genetic privacy (43,44) and data breaches in relation to online data (45).

We need to take care not to oversimplify privacy concerns. Individual circumstances will, for example, change how people rate privacy as a concern in the context of research. A patient or an individual with a sick family member may view the privacy concerns of health information differently than a person who is not directly or indirectly involved in a research initiative (46), and there is also variation within these groups (47). Likewise, whether a research participant is paid or unpaid for their involvement may also change the calculus (44). People balance risks differently for many reasons. Still, the body of available research suggests people are concerned about privacy and the potential for data breaches (38).

Studies have also found that the public is concerned about data custodians sharing personal information without consent. A 2018 survey, for example, found that 85% of Americans are concerned that DTC genetic testing companies will share genetic data without permission and 71% are worried medical researchers will do the same (48). This concern about privacy can impact willingness to use online services (49) and to participate in research that involves the collection of health information and genetic material, such as biobanking (50). These data again demonstrate public attention to privacy in this context and the need to be sensitive to these issues during the consent process.

Privacy issues are also getting more and more media coverage (32,51,52), which may then increase concern for privacy by making people more aware of the relevant issues. Research has shown that media coverage of a risk can make that risk seem more likely. This is due to the “availability bias”, a well-known cognitive bias that affects our perceptions (53). And there are indications that an increasing percentage of the public wish to retain significant control over their health information. Indeed, we have seen the rise of the concept of “biorights” (54) – that is, the desire for research participants to control and profit from biological material donated for research purposes. This movement has been stirred, at least in part, by both the perception that biological samples are worth a significant amount of money and controversies associated with the mishandling of biological samples (55), such as the much publicized case of Henrietta Lacks (39).

These kinds of developments may heighten the public’s interest in and concern about privacy issues, which may, in turn, trigger interest in heightened disclosure in the context of consent. Indeed, a 2018 survey from the US found that “data privacy” was ranked as the single biggest concern in relation to the private sector, above job creation, access to healthcare and education (56). A 2016 study by the Office of the Privacy Commissioner of Canada found that the public is becoming increasingly concerned about protection of personal privacy, with 92% saying they are at least somewhat concerned (57). Thirty-seven percent say they are extremely concerned, which is up from 25% in 2012 (57).

Consent, Re-consent and Reporting

The collection and use of biological samples and digitized health information for research purposes have long generated legal and research ethics issues (55,58). It seems likely that the privacy issues outlined above will further complicate these challenges. Here we focus more narrowly on two specific and practical questions: what privacy risks need to be disclosed, and when recontact and reconsent are required. Again, our aim is to map these challenges to inform future conceptual and empirical work.

Required Disclosure

In the clinical setting, all material information must be disclosed as part of the consent process. The courts have generally treated disclosure expansively to include anything that a reasonable person in the patient’s position would want to know (59). And this obligation is even more onerous in the context of medical research (60,61). Generally, informed consent for medical research requires “full and frank disclosure” of all relevant facts, probabilities and opinions a reasonable person might be expected to consider before giving consent, even if minor disclosures might cause unnecessary worry (60). Canadian consent statutes similarly specify categories that suggest a full and comprehensive disclosure of risks (62).

While the technical risk of a harmful data breach may remain low, the risk is real and, given what we know about how people view privacy concerns in this context, information about this risk may be material. Indeed, what is deemed to be material information about risk in the eyes of the law does not necessarily have to correspond to a scientifically or statistically substantial risk (63). Rather, the question is more whether a reasonable person, who would likely be aware of dominant social and media discourses about health information privacy concerns, would want related information about privacy risks disclosed. Professional guidelines support the idea that even information about “statistically remote” risks must be disclosed if they are “of a serious nature” (64). As such, the changing nature of the privacy threats seems likely to warrant a more robust delineation of privacy risk during informed consent.

The Tri-Council Policy Statement: Ethical Conduct for Research Involving Humans [TCPS2] remains the most important research ethics policy in Canada, as all federally funded research must adhere to it via research ethics board (REB) oversight (5). For informed consent, the TCPS2 requires patients be provided with “a plain language description of all reasonably foreseeable risks and potential benefits” (5), as well as:

an indication of what information will be collected about participants and for what purposes; an indication of who will have access to information collected about the identity of participants; a description of how confidentiality will be protected; a description of the anticipated uses of data; and information indicating who may have a duty to disclose information collected, and to whom such disclosures could be made (5).

Other sections of the TCPS2 expand on disclosure requirements related to privacy and confidentiality. Notably, researchers must “describe measures for meeting confidentiality obligations and explain any reasonably foreseeable disclosure requirements” both in application materials submitted to research ethics boards and “during the consent process with prospective participants” (4). REBs, in assessing proposed measures to achieve data security, must consider risks to participants “should the security of the data be breached, including risks of re-identification of individuals” (5). These provisions, taken together, suggest a requirement to disclose known and potential privacy risks, including risks to data security. Disclosure of potential risks, in our view, should encompass what we presently know about how participant data can be compromised, such as studies that show that re-identification of anonymized data is possible (14,15,18,19).

One issue arising from these standards is that a more robust disclosure of privacy risks may cause individuals to be less likely to agree to participate in biobank and cohort studies. Research has found that people generally rate specific privacy concerns as seeming more severe than abstract concerns (65). In other words, the more detailed the disclosure, the more potential participants view participation as problematic. Researchers may thus be concerned about scaring patients away from participation (66). Yet, from a legal perspective, this concern is not a valid justification for nondisclosure. Indeed, if the disclosure of a risk impacts willingness to participate, it is exactly the kind of information that must generally be disclosed. In addition, international research ethics norms stress that the rights of the research participants are paramount. As stated in the Declaration of Helsinki: “While the primary purpose of medical research is to generate new knowledge, this goal can never take precedence over the rights and interests of individual research subjects.” (67) Besides, negative reactions to full disclosure may have more to do with a lack of understanding of the technicalities surrounding data security than with the need to be fully informed and, as such, may be countered or addressed by a robust disclosure process that educates participants about data security.

Ongoing Consent and Reconsent

Research consent in Canada and internationally often involves the participant agreeing to secondary use of de-identified information and/or biological materials for future research that is undetermined at the time of consent (5,68,69). In a system using this type of research consent, when are recontact and reconsent required?

The TCPS2 requires that privacy measures be maintained for the entire life cycle of health information, including “collection, use, dissemination, retention and/or disposal.” (5) In general, any change or development to relevant risks that is material to the participant’s decision to participate or continue to have his/her information stored will trigger a legal obligation to recontact (59). This is in keeping with the previously noted law concerning disclosure for informed consent (59,60,61). The risk need not be material in an evidentiary sense, but merely in a subjective sense, in that the participant would find it relevant to ongoing participation (63,70). Given evidence that participants care about privacy (46,49), any material change in privacy and confidentiality risk would likely warrant recontact. This raises the issue of whether and when technological developments in re-identification strategies that reduce the effectiveness of existing privacy safeguards could trigger a need for recontact and reconsent. Again, given existing law and public perception data, a compelling argument could be made that they would if they put the relevant database at an increased risk of a breach.

In the context of research ethics, a longstanding principle of international and Canadian policies is the right to withdraw from participation in research at any time (3,4,5). While there are a few exceptions to this right – such as quarantining in the context of some infectious disease research (71) – this is a near universally accepted research ethics norm that aligns with the conceptualization of informed consent as an ongoing process (5). In order for ongoing consent to continue to be informed, the TCPS2 requires participants be “given, in a timely manner throughout the course of the research project, information that is relevant to their decision to continue or withdraw from participation” (5). As noted, it is possible under the TCPS2 to provide a broad consent for future secondary use of identifiable information (5). But this does not vitiate the right to withdraw at any time or the requirement to provide information that may be material to a decision to continue participation.

Given that non-identifiability may no longer be a reality for tissues and some types of information, perhaps the biggest challenge lies with the concept of “non-identifiable” information and its application in the TCPS2. Article 5.5B states that researchers are not required to seek participant consent for research that “relies exclusively on the secondary use of non-identifiable information” (5). Moreover, Article 2.4 of the TCPS2 currently allows for secondary research use of “anonymous” information or biological materials without REB review, as long as “the process of data linkage or recording or dissemination of results does not generate identifiable information” (5). This policy may be increasingly controversial as re-identification techniques improve and spread (14,15). Indeed, the evolution of re-identification technologies and strategies, while still far from representing a broadly applicable threat, may compel a reconsideration of these kinds of exceptions to consent and ethics review.

People care deeply about privacy, including not only actual participants but also and especially the parents of minor participants (48,50). It seems likely that re-consenting could lead to withdrawals, which may make research difficult and affect the integrity of data (72). However, research ethics policies are designed to protect participants. More importantly, the law of disclosure does not change in the face of researchers’ competing interests (3,4,5). Privacy-related information may cause some participants to withdraw from research. But, rightly or not, there are no legal or ethical norms that would suggest disclosure practices can be modified for the purpose of avoiding withdrawals or refusals to consent.

Finally, there seems little doubt that data breaches and any unauthorized access to or disclosure of identifiable or re-identifiable participant information must be disclosed. Questions remain as to how we can define the moving target of “re-identifiability” and its relationship to risk of participant harm, but erring on the side of always notifying participants of a breach would be prudent. There is a clear duty pursuant to legislation in most Canadian jurisdictions to inform participants affected by privacy breaches (73). This duty requires, on the one hand, a strengthening of existing research ethics policies, such as by clearly emphasizing participants’ rights to be re-contacted and re-consented where a material threat to data privacy emerges, and, on the other hand, a reconsideration of ethical requirements, such as less emphasis on data anonymization and de-identification as mitigation for data security risks.

Conclusion

In this age of Big Data research, it seems likely that there will be an increasing need to collect biological samples and digital health information. At the same time, as computational and information technologies progress, the risks to privacy will expand. The same technologies that are making health information more clinically and scientifically valuable – such as inexpensive sequencing, online databases and AI – are the tools that can also be leveraged to compromise privacy.

The promise of anonymity is becoming ever more tenuous. Yet, it remains a foundational component of the research ethics policies that underlie and enable health research. The potential inability to ensure anonymity could have significant ramifications. The public values privacy and, as a result, the inability to ensure it could re-frame the consent process and how participants think about participation in research initiatives. It would be valuable to generate more data on the public’s and research participants’ tolerance for the risk of privacy breaches and to engage in research to help determine how best to communicate those risks in a balanced manner.

The fact that privacy is highly valued affirms and even heightens the legal obligation to disclose privacy-related risks. Material information about risks, including risks associated with privacy, must be disclosed. If there is a material change in risk, this information needs to be disclosed and may trigger an obligation to reconsent. Given the rapid rate of development in AI and other domains relevant to data protection, important questions arise as to what kind of advances in re-identification technologies could constitute a material change in risk. Such considerations will require ongoing monitoring by the research ethics community and seem likely, at the very least, to complicate the way we think about the protection of privacy.