Identities in Safe Hands with the Data Fiduciary: This Is How Trust Centres Work
published on 29.04.2021
Health data from everyday medical practice has the potential to give research an immense boost. However, exchanging such data is subject to strict regulations. Data fiduciaries can overcome this hurdle in their function as a trust agency. The key to this is pseudonymisation of identity data.
Data Protection Thanks to Data Fiduciaries
The search for a corona vaccine not only proceeded at record speed; it also drove interest in clinical drug development to unprecedented heights. With almost scientific precision, laypeople explained to sceptical relatives why a Phase III trial delivers reliable results. Even failed vaccine projects could be seen as something positive: they showed that the process works. The research community never doubted this anyway, since clinical development had already proven itself in other breakthroughs. And yet its disadvantages are well known. Clinical trials literally take place under laboratory conditions and exclude certain risk groups, for example.
Data from everyday medical practice can fill this gap. That is why the Patient Data Protection Act (PDSG) will also allow data from the electronic patient file to be provided for research purposes starting in 2023. One thing is clear, however: health data is highly sensitive. Its exchange, for example between a clinic and a research institution, is therefore subject to strict conditions. Most importantly, there needs to be an independent intermediary between those who give data and those who use it. That is precisely the role of data fiduciaries. They act strictly on behalf of the donor, do not pursue any commercial interests and ensure that only authorised persons are able to access the information.
The Work of the Trust Centre
In medical research, data fiduciaries are better known as trust centres (TC). In this role, they take on a central data protection task: the pseudonymisation of personal data. Data users should work only with medically relevant information and should not be able to draw any conclusions about patients’ identities. For this reason, the trust centre converts information such as first and last name, place of residence or date of birth into character strings called hash values. This work is part of the daily routine at clinical cancer registries, from which researchers can retrieve oncological patient data.
The CenTrust Trust Centre
With the CenTrust data fiduciary platform, Bundesdruckerei uses a proven, two-stage pseudonymisation procedure. This is a good way of showing how trust centres pass on data in a GDPR-compliant manner. Consider the following scenario: A pharmaceutical company requests data on patients suffering from a certain rheumatic disease from the trust centre. One of these patients is Jane Doe-Mayer, who has been treated at several medical practices and hospitals. Medical data and information on her identity are stored at each facility. Jane Doe-Mayer has agreed to make these data available to research centres and pharmaceutical companies via a trust centre. From here on, the TC takes over handling the process.
Pseudonymisation: Hash Values and a Pinch of Salt
As its first step, the trust centre provides the practices and clinics with the procedure for the initial pseudonymisation of the identity data. This procedure has two stages:
Stage One: Standardisation
At this stage, the first step is to standardise the identity data. For example, Jane Doe-Mayer’s surname would be converted into capital letters and split into two parts: DOE and MAYER.
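The standardisation stage can be sketched roughly as follows. This is a minimal illustration, not the CenTrust procedure: the function name `standardise_name` is hypothetical, and stripping accents is an added assumption that the article does not mention.

```python
import re
import unicodedata


def standardise_name(name: str) -> list[str]:
    """Upper-case a name and split it into its components."""
    # Assumption: accents are stripped so that "é" and "e" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Split on hyphens and whitespace, dropping empty parts.
    parts = re.split(r"[-\s]+", plain.upper().strip())
    return [part for part in parts if part]


print(standardise_name("Doe-Mayer"))  # ['DOE', 'MAYER']
```

Standardising before hashing matters because a hash function treats "Doe-Mayer" and "DOE MAYER" as entirely different inputs.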
Stage Two: First Pseudonymisation Step
Standardisation is followed by the first pseudonymisation step, which uses a hash procedure. This is a one-way function: it breaks up the identifying data and generates strings from which the original values cannot be reconstructed.
Strictly speaking, however, attackers could crack hash values with a certain amount of effort, either through lengthy trial and error or with the help of what are called rainbow tables, which criminals normally use to crack hashed passwords. These tables contain countless terms along with their matching hash values. For this reason, the hash procedure that the trust centre requires the practices and clinics to use also includes “salt”: a random sequence of characters that cannot be found in any table. The salt is added to the identity data before the hashing process.
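A salted hash of standardised identity data might look like the sketch below. The choice of SHA-256, the function name `salted_hash`, the field names and the birth date are all illustrative assumptions; the article does not name the algorithm or the fields actually used. The sketch also assumes that every data source receives the same secret salt from the trust centre, so that identical identity data still produces identical hash values.

```python
import hashlib
import secrets


def salted_hash(value: str, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + value."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Assumption: one secret salt, shared by all data sources of a trust centre.
salt = secrets.token_bytes(16)

# Made-up, already standardised identity record.
identity = {
    "first_name": "JANE",
    "surname_1": "DOE",
    "surname_2": "MAYER",
    "birth_date": "1970-01-01",
}

# One hash value per identifying field.
hash_tuple = tuple(salted_hash(value, salt) for value in identity.values())
print(hash_tuple[0][:16], "...")
```

Because the salt is secret and random, a precomputed table of unsalted hashes for common names does not match any of these values.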
Record Linkage and Pseudonym Swapping
The pseudonym produced at the end of the first pseudonymisation step contains the hash values for all identifying data of a person. Together these form the hash tuple, which is now sent to the trust centre. This is where the record linkage takes place: The TC links the data records from the different sources for one person.
In our example, Jane Doe-Mayer was treated at several clinics and practices. Consequently, several data sets exist for her. The TC assigns them to a single pseudonym.
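In the simplest case, where the tuples match exactly, record linkage amounts to grouping incoming records by hash tuple. The sketch below assumes exactly that; the function name `link_records`, the source labels and the shortened placeholder hash strings are hypothetical.

```python
import secrets
from collections import defaultdict


def link_records(records):
    """Group (source, hash_tuple) pairs that share an identical hash tuple."""
    groups = defaultdict(list)
    for source, hash_tuple in records:
        groups[hash_tuple].append(source)
    # One freshly generated pseudonym per linked person.
    return {secrets.token_hex(8): sources for sources in groups.values()}


# Placeholder hash tuples; real ones would be full hash values.
records = [
    ("gp", ("a1", "b2")),
    ("hospital", ("a1", "b2")),
    ("clinic", ("c3", "d4")),
]

linked = link_records(records)
for pseudonym, sources in linked.items():
    print(pseudonym, sources)
```

Here the records from the GP and the hospital end up under one pseudonym, while the clinic record is treated as a different person.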
Phonetics Provides Final Clarity
Normally, this should be easy, because when pseudonymisation is done with a hash function, identical source data creates identical hash values. The hash tuple for the identity record from the GP is most likely identical to the one from the local hospital. But what if Jane Doe-Mayer is registered as Jane Doe-Meier at a doctor’s practice and as Jane Doe-Meyer in the third of four clinics visited? On the one hand, the other hashed identity data still match in these cases. On the other, phonetic codes of the name components are formed alongside the hash values during the first pseudonymisation step, and these codes are more robust against typing errors.
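To see why phonetic codes tolerate spelling variants, consider the sketch below. It uses American Soundex purely as a stand-in; the article does not say which phonetic algorithm the trust centre uses.

```python
# Soundex digit for each consonant; vowels, H, W and Y have no digit.
CODES = {}
for digit, letters in {
    "1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R",
}.items():
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code of a name."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets the run
    return (result + "000")[:4]


for variant in ("MAYER", "MEIER", "MEYER"):
    print(variant, soundex(variant))  # each variant yields M600
```

All three spellings collapse to the same code, so hashing the phonetic code instead of the raw spelling would give the trust centre a matching value despite the typing differences.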
The Final Step: the Pseudonym Swap
After linking the records, the trust centre forms a new pseudonym and transmits it to the data sources. The practices and clinics link it with the corresponding medical data and in turn send the packet in encrypted form to the TC, which then performs a pseudonym swap so that the researcher does not use the same pseudonym that the data source has. It is vital that the trust centre itself cannot decrypt the medical data. In the last step, the research institution receives the medical data with the swapped pseudonym. The same pseudonym is also used for other medical data packages for the same person, so the researcher will have no problems bringing everything together.
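The pseudonym swap can be pictured as a lookup table that only the trust centre holds. The class below is a hypothetical sketch under that assumption; the names `TrustCentre` and `swap` are illustrative, and the encrypted medical data is treated as opaque bytes that the trust centre forwards without being able to read them.

```python
import secrets


class TrustCentre:
    """Hypothetical sketch: swaps pseudonyms without reading the payload."""

    def __init__(self):
        # Maps a source pseudonym to the pseudonym the researcher sees.
        self._swap_table = {}

    def swap(self, source_pseudonym: str, encrypted_payload: bytes):
        # Reuse the research pseudonym for later packages of the same person.
        if source_pseudonym not in self._swap_table:
            self._swap_table[source_pseudonym] = secrets.token_hex(8)
        # The payload is passed through untouched; the TC holds no key for it.
        return self._swap_table[source_pseudonym], encrypted_payload


tc = TrustCentre()
first, payload = tc.swap("source-pseudonym-1", b"<ciphertext A>")
second, _ = tc.swap("source-pseudonym-1", b"<ciphertext B>")
print(first == second)  # True
```

Because the research pseudonym stays stable per person, the researcher can combine several packages, while neither side can map its pseudonym to the other's without the trust centre's table.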
Jane Doe-Mayer can then become one of many heroines of medical progress. Even better: an everyday heroine who remains unknown for good reason.