Challenges to AI Medical Research Under Privacy Laws

The Israeli Ministry of Health, backed by government decision no. 3709 of March 25, 2018, announced a national program to promote digital health. It is a highly ambitious plan that relies heavily on patients' health records accumulated in Israel over the last 30 years and recorded in medical databases from various sources: clinical data, genetic data, medical devices and sensors.
This unique wealth of personal information will be used to position Israel's health system as a worldwide leader in implementing digital health solutions, to promote its digital health industry as a national growth engine and a global innovation hub, and to promote clinical and academic research.

The national program is meant to advance the use of big data analysis and artificial intelligence (AI) in medical research. These techniques consume vast amounts of data from various sources to find patterns and links that would be invisible to human researchers, or that would take extensive resources and time to observe.

Immense Benefits

The social benefits of such research are immense: if applied properly, these techniques can enhance methods of treatment, support personalized medicine, identify risks to personal and public health, improve quality of life and help save and prolong lives.

IBM's Watson supercomputer and Google's DeepMind are already being used in healthcare research. They support, for example, the analysis of X-ray and MRI images and the testing of an alert, diagnosis and detection system for acute kidney injury.

But can such research use medical records that were collected only to treat a patient, and not for any further use? These patients never agreed to the use of their data for anything but their own treatment. What should be the balance between the constitutional right to privacy and the potential benefits to society from AI, machine learning and big data medical research?

Inept Concepts

The government's decision adopting the national program for digital health makes its implementation conditional upon protecting the privacy of every person whose personal data is used. To this end, the program is built around two different concepts: consent and de-identification.
Alas, both concepts seem inept when it comes to big data analysis and artificial intelligence.

Consent in the realm of privacy should be informed and freely given, and a request for it should be "distinguishable from other matters, in an intelligible and easily accessible form, using clear and plain language". This is the wording of the now famous European GDPR. It does not differ much, if at all, from the requirements under the Israeli Privacy Protection Law.

So how can one consent to the processing of his own personally identifiable data when AI systems often function as black boxes, meaning that no one knows how they reach their conclusions? What data will be integrated with one's own? What sort of conclusions can he expect? Will he personally benefit from this, or will it be used only for the greater good of society and the advancement of the Israeli economy (which means that someone else will profit financially from his very own data)?

Highly Inconceivable

What will happen if a patient refuses to grant his consent? Will he be denied treatment? Will he be treated differently? Will he upset the physician to whom he entrusts his life?

How often will he be asked for his consent: every time he gives a blood sample or is questioned about his health? Once every now and then? Once in a lifetime?

Under privacy laws, one may always withdraw his consent. If he does, the researcher must delete every bit of personal data relating to him. Where AI is concerned, this is almost inconceivable: if many people ask to be forgotten, erasing such volumes of information may harm the accuracy of the research. Would you trust that research to advise on treating humans?

Informed consent is achieved when we are provided with all the information required to make a knowledgeable decision. Patients today are already asked to sign lengthy and frightening documents full of legalese confirming their consent. These documents may triple in size, because they will need to cover not only the medical consequences of a decision to undergo a certain treatment or procedure, but also the privacy implications of using personal data (which one may not reasonably foresee when it comes to big data analysis and machine learning). People will sign without reading. This is not a sound basis for informed consent; we lawyers have grown to accept it only for lack of better solutions.

A Logistical Nightmare

And, of course, if one wants to build his research on the vast amounts of health data Israel has recorded and stored in computers since the 80s and 90s, he cannot go back and ask for patients' consent. This is a logistical nightmare. It is therefore not surprising that the British Information Commissioner's Office (ICO) found last year that the Royal Free NHS Foundation Trust had provided Google's DeepMind with personal data of around 1.6 million patients without adequately informing them that their data would be used for testing purposes. And so, when the Israeli national program provides that personally identifiable information will be used only subject to consent, it can refer only to a limited set of data or to data accumulated from now on. Such data will probably not be sufficient to train AI systems.

But researchers can take records and anonymize them, can't they? The Israeli national program calls for de-identification of data as an alternative.

There is a clear difference between anonymization and de-identification.

Anonymizing data means that it cannot be traced back to an individual. This is extremely hard to achieve.

De-identification is different. It is a process that reduces the risk that a person can be identified. For example, the use of SIDs is prevalent in medical research: research companies do not receive the names of patients, only their random IDs. But under the EU's GDPR, such information is still identifiable, because the site involved in the research holds both the IDs and the names of the patients to whom they are attributed. It is therefore treated as personal data, just as if the patient's own name were used.
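To illustrate why random IDs alone do not make data anonymous, here is a minimal Python sketch. The record fields and the function names `pseudonymize` and `reidentify` are hypothetical, chosen for illustration only; the point is simply that whoever keeps the key table linking IDs to names can restore every identity.

```python
import secrets


def pseudonymize(records):
    """Replace each patient's name with a random ID (hypothetical SID scheme).

    Returns the dataset handed to the research company (no names) and the
    key table that the research site keeps, mapping each ID back to a name.
    """
    key_table = {}
    research_data = []
    for rec in records:
        sid = secrets.token_hex(8)  # random, not derived from the name
        key_table[sid] = rec["name"]
        research_data.append({"sid": sid, "diagnosis": rec["diagnosis"]})
    return research_data, key_table


def reidentify(research_data, key_table):
    """Anyone holding the key table can put the names back on every record."""
    return [{"name": key_table[r["sid"]], **r} for r in research_data]


# Hypothetical patients, for illustration only.
patients = [
    {"name": "Patient A", "diagnosis": "acute kidney injury"},
    {"name": "Patient B", "diagnosis": "type 2 diabetes"},
]
shared, key = pseudonymize(patients)
restored = reidentify(shared, key)
print([r["name"] for r in restored])  # ['Patient A', 'Patient B']
```

The research company never sees a name, yet the data set as a whole remains identifiable, which is why the GDPR treats it as personal data.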

Anonymizing DNA?

In addition, there are things that cannot be entirely anonymized. DNA will always be unique to one human being, hence identifiable. DNA exists in every tissue or sample taken from a patient. If it cannot be anonymized, can it still be used under the modern Israeli digital health program? 

What's worse, experience shows that distributing data that supposedly cannot be traced back to identified individuals is far from a foolproof solution. Back in 2006, researchers were able to trace the identity of Netflix subscribers from viewer rating records the company had released to 50,000 programmers, challenging them to improve its movie recommendation algorithm. Within weeks, two researchers identified several Netflix users by comparing their Netflix data to reviews they had posted on the Internet Movie Database website. This included identifying their political views and sexual orientation.

Cross-referencing so-called anonymized information with publicly accessible identified information has proven effective with medical records as well.

The State of Washington, US, sells patient-level health data. This publicly available dataset records virtually all hospitalizations occurring in the State in a given year, including patient demographics, diagnoses, procedures, attending physician, hospital and a summary of charges. It does not contain patient names or addresses (only ZIP codes). Researchers were able to identify 43% of the records they checked simply by matching the data with newspaper stories that described the circumstances leading to hospitalization.
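The matching technique described above (a "linkage attack") can be sketched in a few lines of Python. All records, field names and the function `link` below are hypothetical; the fields stand in for the kind of quasi-identifiers the Washington dataset retains (ZIP code, demographics, hospital). A news story that pins down the same attributes, together with a name, re-identifies the record whenever exactly one record matches.

```python
# Hypothetical "de-identified" hospital records: no names or addresses,
# but quasi-identifiers (ZIP code, age, sex, hospital) remain.
hospital_records = [
    {"zip": "98101", "age": 34, "sex": "M", "hospital": "H1", "diagnosis": "fractured pelvis"},
    {"zip": "98101", "age": 61, "sex": "F", "hospital": "H1", "diagnosis": "heart failure"},
    {"zip": "99201", "age": 34, "sex": "M", "hospital": "H2", "diagnosis": "burns"},
]

# Hypothetical facts gleaned from a named newspaper story about an accident.
news_reports = [
    {"name": "John Doe", "zip": "98101", "age": 34, "sex": "M", "hospital": "H1"},
]

QUASI_IDENTIFIERS = ("zip", "age", "sex", "hospital")


def link(records, reports, keys=QUASI_IDENTIFIERS):
    """Return (name, record) pairs where a report matches exactly one record."""
    matches = []
    for report in reports:
        candidates = [r for r in records if all(r[k] == report[k] for k in keys)]
        if len(candidates) == 1:  # a unique match re-identifies the patient
            matches.append((report["name"], candidates[0]))
    return matches


for name, record in link(hospital_records, news_reports):
    print(name, "->", record["diagnosis"])  # John Doe -> fractured pelvis
```

No name ever appears in the hospital data, yet a handful of ordinary attributes is enough to single out one record and attach a diagnosis to a named person.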

A New Balance

It seems that the Israeli Ministry of Health is aware of the weakness of de-identification. Therefore, it will offer individuals the opportunity to opt out of such use of their health data. But opting out makes a mockery of privacy. Privacy is all about opting in: consent should always be obtained in advance.

AI research is challenging even under the modern GDPR. The Israeli law protecting privacy dates back to 1981, when big data analysis and machine learning were the stuff of science fiction, if that.

A new balance must be struck between the right to privacy and modern research. The Israeli Privacy Protection Law needs to change to reflect technological realities. Moreover, the change must reflect privacy concerns, not only the drive for innovation, growth, social good and economic benefits. Until it changes, we are left relying on either consent or de-identification for the use of medical data in AI research, and both are fundamentally flawed.