Presentation Session by Gokul Alex for Tamil Nadu Science Foundation on the Collection of Cryptographic Techniques for COVID-19 Contact Tracing in the framework of Privacy Preserving Proximity Protocols. This is a research report compiled in collaboration with EPIC Knowledge Society, RedTeam Hacker Academy, Beyond Identity, Semiot Protocols, Cyanaura Maps.
3. Cryptography and Context of Trust
Cryptography builds systems that mediate and rearrange trust
Cryptography reduces the need for trust
Cryptography can also leverage trust in smaller contexts
4. Categories of Trust in Contact Tracing
Trust in access to and use of location data
Trust in functional capacity of health authorities
Trust in report integrity and availability
5. Trust in the access to and the use of location data
Location data is highly sensitive
Who is allowed to access it, and under what circumstances?
Is the data centralised in one place?
Is it subject to legal risks?
6. Current Contact Tracing Apps - TraceTogether
Uses Bluetooth Low Energy to broadcast ephemeral encryptions of user identifiers, and keeps logs of all observed broadcasts.
If a user becomes infected, they can be tested by the Ministry of Health (MoH), which can request, with legal authority, permission to decrypt the user's log entries.
This system is better than location tracking, since only contacts are recorded rather than exact locations, and because data is held locally and is obtained by the MoH only on infection.
It is limited by the functional capacity of the MoH to perform testing and contact notification.
7. Current Contact Tracing Apps - PrivateKit/SafePaths
Performs geolocation tracking to produce a complete log of the user's movements
Users who become infected can upload their complete location history to a health authority
A health official decides what information is personally identifiable and uses a custom web app to
redact and blur the history before publishing it
Other users can download the trail and compare it with their history
The system does not provide location privacy
It is reliant on the participation of the health authority
It also adds a burdensome redaction process to the health authority's workload
8. Current Contact Tracing Apps - SafeTrace
A system using Intel SGX, processing user data inside a secure enclave and relying on
Intel hardware attestations to verify that the version of the code running on the
enclave is one that does not compromise user privacy
This protects location data, as long as SGX is completely reliable
However, it is unclear how justified this trust is, as SGX security has been repeatedly
broken, with the most recent attack only two weeks ago
9. CovidWatch and Community Epidemiology (CoEpi)
In this protocol, matching occurs on the client, so it does not require a centralised
party to match the contacts. This means it has even lower trust requirements for the
first trust category.
The protocol is shared between CovidWatch and CoEpi, so it can benefit from strong
network effects.
CovidWatch trusts health authorities to report infections, while CoEpi allows self-reporting.
This choice has implications for reporting.
Trusting health authorities makes report integrity easier, while allowing self-reporting
helps availability if there is a testing shortage.
10. Anonymous Retrospective Broadcasts
A messaging protocol where users create tracks through space and time, and can
perform anonymous retrospective broadcasts to users whose tracks were spatially
nearby to theirs in a particular time range.
This messaging system should be privacy preserving in the following sense:
- Server Privacy - an honest-but-curious server should not learn any information about
users' space-time tracks
- Locality Integrity - a user should not be able to broadcast messages to users who
were not nearby
11. End User Privacy in Anonymous Retrospective Broadcasts
A passive adversary cannot learn any information about a user's space-time track
outside of the segments they have broadcast messages to. This means that users
who do not broadcast reveal no information about their movements.
Users who broadcast messages to segments of their space-time track reveal only the
existence and content of a message to users whose tracks were not adjacent to theirs.
This means that the messages themselves are public; what's private is the addressing /
matching.
12. End User Privacy in Anonymous Retrospective Broadcasts
Users who receive messages learn only the fact that their track was nearby to the
broadcasting user's track at some time. This means that passive adversaries must cover
space and time, or else recruit users to collude, to reconstruct the track of a user who
broadcast a message.
Users who receive messages reveal no information to the users who broadcast them.
13. Reporter Privacy in Anonymous Retrospective Broadcasts
Users who send reports reveal no information to users they did not come
in contact with, and reveal only the time of contact to users they did come in contact
with. In practice, the timing alone will be sufficient for a contact to learn the reporter's
identity if that contact was near only one person at that time.
14. A Model for the Anonymous Retrospective Broadcast
Registration Phase - users register with the server and perform some setup
Gossip Phase - users broadcast 26-byte data packets over BLE
Broadcast Phase - a user uploads a packet of data to the server to broadcast a message to
a particular segment of their track
Scan Phase - users monitor data published by the server to learn whether a message
has been broadcast to them
Fetch Phase - users who learn that a message is addressed to them can download it
15. Decentralised Contact Tracing using ARB Model
The idea is to allow users who test positive to anonymously broadcast a message to
inform their past contacts of their test result.
Users who receive a message can make an informed judgement based on its contents.
Separating the messaging problem from the contact tracing problem, and allowing
users to make their own decisions, is significantly more flexible:
- Users could publish a photo of a test result with their name redacted
- Users could reveal their identity by linking to a social media post
- Users could post a link to some institutional verification mechanism
16. Comparing DP-3T and CEN Protocols
Both protocols provide decentralised contact tracing using Bluetooth broadcasts from
users' mobile devices.
Users generate and broadcast short-lived pseudorandom values over Bluetooth.
These values are recorded by nearby devices, but because they are pseudorandom, they
reveal no information about a user's location or location history.
Later, users who develop symptoms or test positive can send a report to any potential
contacts by uploading a packet of data to a server.
17. DP-3T: Key Generation Phase
Each user applies a hash function H to generate a sequence of secret day keys from an
initial random secret:
$SK_i \leftarrow H(SK_{i-1})$
Each day key expands to n ephemeral identities. These are the 16-byte pseudorandom
values broadcast by the application.
The ephemeral identities for each day are generated all at once as 16-byte chunks of the
output of a stream cipher keyed by a PRF of SK_i.
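The key schedule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DP-3T reference implementation: SHA-256 stands in for H, HMAC-SHA256 stands in for the PRF with a fixed label b"broadcast key", and, because the standard library has no AES, HMAC-SHA256 in counter mode is swapped in for the stream cipher. The choice n = 96 per day is hypothetical.

```python
import hashlib
import hmac
import secrets

EPHID_LEN = 16  # bytes per ephemeral identity, as on the slide


def next_day_key(sk_prev: bytes) -> bytes:
    """SK_i <- H(SK_{i-1}), with SHA-256 assumed as H."""
    return hashlib.sha256(sk_prev).digest()


def prf(key: bytes, label: bytes) -> bytes:
    """PRF assumed to be HMAC-SHA256."""
    return hmac.new(key, label, hashlib.sha256).digest()


def keystream(seed: bytes, nbytes: int) -> bytes:
    """Stand-in stream cipher: HMAC-SHA256 over a counter (substitute for AES-CTR)."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hmac.new(seed, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:nbytes])


def ephemeral_ids(sk: bytes, n: int) -> list:
    """Cut the keystream keyed by PRF(SK_i) into n 16-byte ephemeral IDs."""
    stream = keystream(prf(sk, b"broadcast key"), n * EPHID_LEN)
    return [stream[k * EPHID_LEN:(k + 1) * EPHID_LEN] for k in range(n)]


# Usage: three consecutive days of ephemeral IDs from one random initial secret.
sk = secrets.token_bytes(32)
days = []
for _ in range(3):
    days.append(ephemeral_ids(sk, n=96))  # n = 96 is a hypothetical choice
    sk = next_day_key(sk)
```

Because the chain only runs forward, revealing one day key lets others recompute that day and every later day, but not earlier ones; this is why the user picks a fresh key after reporting.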
18. DP-3T: Broadcast and Reporting Phase
Users broadcast each day's ephemeral IDs in a randomised order, while observing
other users' broadcasts. Devices are intended to record only coarse timestamps.
In the report phase, users send a health authority the first SK_i for the period they wish
to report. For instance, to report the previous 14-day period, a user would submit the
SK_i from 14 days ago. The health authority acts as a trusted party to authenticate the
reports. After reporting, the user chooses a new day key to break the link with the reported
period.
In the scan phase, users download collections of revealed day keys, use them to recompute
the ephemeral IDs, and compare them to the stored ephemeral IDs.
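A sketch of the scan phase, under the same assumptions as a minimal DP-3T illustration (SHA-256 for H, HMAC-SHA256 for the PRF, HMAC-SHA256 in counter mode substituted for the stream cipher; the 14-day window and n = 96 are hypothetical): starting from the reported day key, the client walks the hash chain forward, regenerates each day's ephemeral IDs and intersects them with what it has stored. Coarse timestamps are not modelled.

```python
import hashlib
import hmac
import secrets

EPHID_LEN = 16


def next_day_key(sk: bytes) -> bytes:
    # SK_i <- H(SK_{i-1}), SHA-256 assumed as H
    return hashlib.sha256(sk).digest()


def ephemeral_ids(sk: bytes, n: int) -> list:
    # Assumed PRF (HMAC-SHA256) and HMAC counter-mode keystream in place of AES
    seed = hmac.new(sk, b"broadcast key", hashlib.sha256).digest()
    blocks = (n * EPHID_LEN + 31) // 32
    stream = b"".join(
        hmac.new(seed, c.to_bytes(8, "big"), hashlib.sha256).digest() for c in range(blocks)
    )
    return [stream[k * EPHID_LEN:(k + 1) * EPHID_LEN] for k in range(n)]


def scan(reported_sk: bytes, days: int, n: int, observed: set) -> set:
    """Walk the chain from the reported key; intersect regenerated IDs with local records."""
    hits = set()
    sk = reported_sk
    for _ in range(days):
        hits |= set(ephemeral_ids(sk, n)) & observed
        sk = next_day_key(sk)
    return hits


# Usage: Bob recorded one of the reporter's IDs from the third day, plus an unrelated value.
sk0 = secrets.token_bytes(32)
day2_key = next_day_key(next_day_key(sk0))
observed = {ephemeral_ids(day2_key, 96)[5], secrets.token_bytes(16)}
hits = scan(sk0, days=14, n=96, observed=observed)
```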
19. CEN Protocol
Rather than ephemeral IDs, the pseudorandom values are called contact event
numbers (CENs). Contact notifications are scoped to reports: each report notifies a
collection of CENs, and users can submit multiple reports.
The rotation interval for reports is set by the application developers. Users create a
report authorisation key (RAK) and a report verification key (RVK), which serve as the
signing and verification keys of a signature scheme.
20. CEN Protocol - Contact Event Keys
The RAK represents the capability to submit a report relating to the CENs derived from it.
Users then compute a sequence of contact event keys (CEKs) using the CEK ratchet:
$CEK_i \leftarrow H_{CEK}(RVK \,\|\, CEK_{i-1})$
where $H_{CEK}$ is a domain-separated hash function.
The initial CEK is derived from the RAK as
$CEK_0 \leftarrow H_{CEK}(RAK)$
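The ratchet can be sketched in Python as below. This is a hedged illustration, not the CovidWatch or CoEpi code: SHA-256 over an assumed tag b"H_CEK" prepended to the input stands in for the domain-separated hash, and RAK/RVK are random 32-byte placeholders, whereas in the protocol they form a signature key pair.

```python
import hashlib
import secrets


def h_cek(data: bytes) -> bytes:
    """Domain-separated hash: SHA-256 over an assumed tag followed by the input."""
    return hashlib.sha256(b"H_CEK" + data).digest()


def cek_chain(rak: bytes, rvk: bytes, count: int) -> list:
    """Return [CEK_0, CEK_1, ..., CEK_count]."""
    ceks = [h_cek(rak)]                     # CEK_0 <- H_CEK(RAK)
    for _ in range(count):
        ceks.append(h_cek(rvk + ceks[-1]))  # CEK_i <- H_CEK(RVK || CEK_{i-1})
    return ceks


# Placeholder key material; in the protocol RAK and RVK are a signature key pair.
rak = secrets.token_bytes(32)
rvk = secrets.token_bytes(32)
ceks = cek_chain(rak, rvk, count=5)
```

Because each step is a one-way hash, anyone holding a CEK can ratchet forward to later keys, but cannot go back to earlier keys or to the RAK.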
21. CEN Protocol
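Contact event numbers are derived from the contact event keys as
$CEN_i \leftarrow H_{CEN}(\mathrm{LE\_U16}(i) \,\|\, CEK_i)$
where $H_{CEN}$ is a domain-separated hash function and LE_U16 is the little-endian unsigned 16-bit encoding of the index.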
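CEN derivation is a single hash per index. In this sketch the domain tag b"H_CEN" and the truncation to 16 bytes are assumptions; struct.pack("<H", i) implements LE_U16 and raises struct.error for indices outside 0..65535.

```python
import hashlib
import struct

CEN_LEN = 16  # assumed CEN length in bytes


def le_u16(i: int) -> bytes:
    """LE_U16: little-endian unsigned 16-bit encoding (struct.error if out of range)."""
    return struct.pack("<H", i)


def h_cen(data: bytes) -> bytes:
    """Domain-separated hash: SHA-256 over an assumed tag || data, truncated to CEN_LEN."""
    return hashlib.sha256(b"H_CEN" + data).digest()[:CEN_LEN]


def cen(i: int, cek_i: bytes) -> bytes:
    """CEN_i <- H_CEN(LE_U16(i) || CEK_i)."""
    return h_cen(le_u16(i) + cek_i)


cek = bytes(32)          # placeholder CEK for illustration
print(le_u16(1))         # b'\x01\x00'
print(len(cen(1, cek)))  # 16
```

Binding the index into the hash means the same CEK used at two different indices would still yield two unrelated-looking CENs.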
22. CEN Protocol - Broadcasting Phase
Users broadcast contact event numbers, using the CEK ratchet to generate new CENs
The ratchet mechanism is decoupled from time based rotation so that it can be aligned
with the rotation of the Bluetooth MAC addresses
This alignment is crucial to prevent linkability attacks, in which an observer sees two
different values broadcast from the same MAC address and links them to one user
23. CEN Protocol - Reporting Phase
In the reporting phase, users wishing to notify contacts they encountered during the
period covered by CEN indices $j_1$ to $j_2$ prepare the report as:
$\mathit{report} \leftarrow RVK \,\|\, CEK_{j_1-1} \,\|\, \mathrm{LE\_U16}(j_1) \,\|\, \mathrm{LE\_U16}(j_2) \,\|\, \mathit{memo}$
They then use the RAK to produce SIG, a signature over the report, and upload the signed
report $\mathit{report} \,\|\, SIG$. The memo field provides a compact space for free-form
messages.
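Assembling the unsigned report is plain concatenation. The sketch assumes 32-byte RVK and CEK values (hypothetical sizes) and leaves the memo as raw bytes; the signing step with the RAK is omitted because it needs a public-key signature library outside the Python standard library.

```python
import struct

KEY_LEN = 32  # assumed length in bytes of both RVK and CEK


def build_report(rvk: bytes, cek_prev: bytes, j1: int, j2: int, memo: bytes = b"") -> bytes:
    """report = RVK || CEK_{j1-1} || LE_U16(j1) || LE_U16(j2) || memo (unsigned)."""
    if len(rvk) != KEY_LEN or len(cek_prev) != KEY_LEN:
        raise ValueError("RVK and CEK must be KEY_LEN bytes")
    if not 1 <= j1 <= j2 <= 0xFFFF:
        raise ValueError("need 1 <= j1 <= j2 <= 65535")
    return rvk + cek_prev + struct.pack("<HH", j1, j2) + memo


def parse_report(report: bytes) -> tuple:
    """Inverse of build_report: fixed-length prefix, the memo is whatever remains."""
    rvk = report[:KEY_LEN]
    cek_prev = report[KEY_LEN:2 * KEY_LEN]
    j1, j2 = struct.unpack("<HH", report[2 * KEY_LEN:2 * KEY_LEN + 4])
    return rvk, cek_prev, j1, j2, report[2 * KEY_LEN + 4:]


report = build_report(bytes(32), bytes(32), 3, 6, b"self-reported symptoms")
```

The signed upload would then be this byte string followed by SIG.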
24. CEN Protocol - Scan Phase
In the scan phase, users download signed reports from the server, then verify the
signature using the included RVK.
They can then use $CEK_{j_1-1}$ to recompute the CEKs and CENs for indices $j_1$ to $j_2$.
Users can optionally delegate trust to the server by relying on it to validate the
signatures.
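The scan phase amounts to ratcheting forward from the revealed key and intersecting the regenerated CENs with locally recorded ones. This sketch assumes SHA-256 with tags b"H_CEK" and b"H_CEN" and 16-byte CENs, uses random bytes in place of a real RAK/RVK key pair, and skips signature verification, which a real client (or the server it delegates to) would perform first.

```python
import hashlib
import secrets
import struct


def h_cek(data: bytes) -> bytes:
    return hashlib.sha256(b"H_CEK" + data).digest()       # assumed domain tag


def h_cen(data: bytes) -> bytes:
    return hashlib.sha256(b"H_CEN" + data).digest()[:16]  # assumed tag and length


def cens_from_report(rvk: bytes, cek_prev: bytes, j1: int, j2: int) -> list:
    """Ratchet forward from CEK_{j1-1} and return CEN_j1 .. CEN_j2."""
    cek, out = cek_prev, []
    for i in range(j1, j2 + 1):
        cek = h_cek(rvk + cek)                         # CEK_i from CEK_{i-1}
        out.append(h_cen(struct.pack("<H", i) + cek))  # CEN_i
    return out


def matches(rvk: bytes, cek_prev: bytes, j1: int, j2: int, observed: set) -> set:
    """CENs from the report that this device recorded locally."""
    return set(cens_from_report(rvk, cek_prev, j1, j2)) & observed


# Reporter side (placeholder keys): CEK_0..CEK_10 and the CENs it broadcast.
rak, rvk = secrets.token_bytes(32), secrets.token_bytes(32)
ceks = [h_cek(rak)]
for _ in range(10):
    ceks.append(h_cek(rvk + ceks[-1]))
broadcast = {i: h_cen(struct.pack("<H", i) + ceks[i]) for i in range(1, 11)}

# Receiver heard CEN_4 and an unrelated value; the report covers j1 = 3 .. j2 = 6.
observed = {broadcast[4], secrets.token_bytes(16)}
hits = matches(rvk, ceks[2], 3, 6, observed)
```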
25. Comparison between DP-3T and CEN Protocols
Both achieve comparable server privacy, since the server sees key material but has no
information on when and where the local broadcasts happened.
Neither achieves broadcast integrity, since a malicious party can rebroadcast observed
values to spoof other users' broadcasts.
Both prevent passive tracking, because users' broadcasts are pseudorandomly
generated.
Both achieve receiver privacy, because users scan reports and do not reveal which
values they observed.
26. Comparison between DP-3T and CEN Protocols
First, they differ on reporter privacy. In both cases, a passive adversary monitoring
Bluetooth can reconstruct parts of a user's history after they submit the report data.
In the DP-3T protocol, this adversary can learn a user's EPHEMERAL_ID history over the
entire reporting period, while in the CEN protocol, the adversary can learn a user's CEN
history only over a single report, and the report duration is an application parameter, not
hard-coded into the protocol.
This passive adversary is fairly realistic, as Bluetooth tracking is widespread. The DP-3T
protocol randomises the order in which the EPHEMERAL_IDs are broadcast, but this does
nothing to prevent location tracking, because any party in a position to record the
EPHEMERAL_IDs is also in a position to record the time they were broadcast.
27. Comparison between DP-3T and CEN Protocols
Second, the CEN protocol provides source integrity, while DP-3T does not. Because all
the recomputed CEKs are bound to the report verification key, and the signature serves
as a proof of knowledge of the corresponding report authorisation key, the receiver of
the report is convinced that it was submitted by the user who generated the CENs.
Most importantly, this means that the underlying protocol does not encode a trusted
authority that must validate reports.
It also prevents users from re-submitting data that other users previously revealed in their
reports.
28. Comparison between DP-3T and CEN Protocols
To prevent passive tracking, it is not enough just to rotate the pseudorandom
EPHEMERAL_IDs or CENs broadcast by a user's device. This rotation must be
precisely aligned with the rotation of the device's Bluetooth MAC address.
Otherwise, a passive adversary can analyse the overlap to link all past broadcasts.
For this reason, the CEN protocol decouples the CEK ratchet from any notion of time,
allowing it to be precisely aligned with the underlying hardware.
This alignment is also possible with DP-3T, but it may be more difficult in practice,
since the number of EPHEMERAL_IDs per day is fixed, while the number required may
be unknown.
30. TCN - Temporary Contact Numbers
The TCN protocol effort is designed in line with the Contact Tracing Bill of Rights.
No personally identifiable information is required by the protocol, and although it is
compatible with a trusted health authority, it does not require one.
Users' devices send short-range broadcasts over Bluetooth to nearby devices. Later, a
user who develops symptoms or tests positive can report their status to their contacts
with minimal loss of privacy. Users who do not send reports reveal no information.
Different applications using the TCN protocol can interoperate, and the protocol can
be used either with verified test results or with self-reported symptoms via an extensible
report memo field.
31. TCN Workflow
All mobile devices running the app periodically generate a random TCN, store the
TCN, and broadcast it using Bluetooth.
At the same time, the app also listens for and records the TCNs generated by other
devices. To send a report, the user (or a health authority acting on their behalf)
uploads the TCNs she generated to a server, together with a memo field containing
application-specific report data.
All users' apps periodically download the list of reported TCNs, then compare it with
the list of TCNs they observed and recorded locally. The intersection of these two lists
is the set of positive contacts.
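The workflow reduces to set operations. This toy model is an assumption-laden sketch (16-byte random TCNs, an in-memory set standing in for the server, a hypothetical Device class); it shows only that intersecting the reported TCNs with the locally observed ones identifies positive contacts.

```python
import secrets


def new_tcn() -> bytes:
    """A purely random TCN; 16 bytes is an assumed length."""
    return secrets.token_bytes(16)


class Device:
    """Hypothetical model of one phone running the app."""

    def __init__(self):
        self.sent = []      # TCNs this device generated and broadcast
        self.heard = set()  # TCNs recorded from other devices

    def broadcast(self) -> bytes:
        tcn = new_tcn()
        self.sent.append(tcn)
        return tcn

    def observe(self, tcn: bytes) -> None:
        self.heard.add(tcn)

    def positive_contacts(self, reported: set) -> set:
        """Intersection of the downloaded reported TCNs with locally recorded TCNs."""
        return self.heard & reported


alice, bob, carol = Device(), Device(), Device()
bob.observe(alice.broadcast())  # Alice was near Bob
carol.observe(bob.broadcast())  # Bob was near Carol
reported = set(alice.sent)      # Alice reports by uploading every TCN she generated
```

Carol gets no match even though she was near Bob, because only Alice reported.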
32. TCN Scalability
Uploading every TCN makes a report grow with the length of the reporting period. To address
this scalability issue, we change from purely random TCNs to TCNs
deterministically generated from some seed data. This reduces the size of the report,
because it need only contain the compact seed data rather than the entire list of TCNs. This
change gains scalability at the cost of some reporter privacy, because TCNs derived from the
same report are linkable to each other.
However, this linkage is only possible by parties that have observed multiple TCNs from the
same report, not by all users. Distinct reports are not linkable, so users can submit multiple
partial reports rather than a single report for their entire history. The report rotation
frequency adjusts the tradeoff between reporter privacy and scalability.
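A rough size comparison makes the tradeoff concrete. All parameters are hypothetical: 16-byte TCNs rotated every 15 minutes, a 14-day reporting period, and a seed-based report in the CEN layout RVK || CEK || LE_U16(j1) || LE_U16(j2) || memo with 32-byte keys and an empty memo, followed by a 64-byte signature (the size of an Ed25519 signature).

```python
TCN_LEN = 16                # bytes per TCN (assumed)
ROTATIONS_PER_DAY = 24 * 4  # hypothetical 15-minute rotation
DAYS = 14                   # reporting period in days

# RVK || CEK || LE_U16(j1) || LE_U16(j2) || memo, then a signature (sizes assumed)
SEED_REPORT_LEN = 32 + 32 + 2 + 2 + 0 + 64


def full_list_size(days: int) -> int:
    """Bytes needed to upload every random TCN generated over `days` days."""
    return days * ROTATIONS_PER_DAY * TCN_LEN


def seed_reports_size(days: int, days_per_report: int) -> int:
    """Bytes needed when the period is split into seed-based reports."""
    reports = -(-days // days_per_report)  # ceiling division
    return reports * SEED_REPORT_LEN


print(full_list_size(DAYS))         # 21504
print(seed_reports_size(DAYS, 14))  # 132
print(seed_reports_size(DAYS, 1))   # 1848
```

Splitting the period into daily reports costs 14 times the bytes of a single seed report, but still far less than the full list, while TCNs remain linkable only within each day's report.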
33. TCN and Traceability
In the setting where TCNs are continuously broadcast, we must also choose the rate at
which we change from one TCN to another. Again, the longer a TCN lasts, the
greater the risk of tracking. In particular, in many settings, when one TCN disappears at the
same moment another appears, it will be easy to infer that they belong to the same device. This
inference won't be perfect, but if TCNs change infrequently, it need not be perfect to recover a
pretty good trace of a user's location history.
34. TCN and Limitations of Bluetooth
Finally, Bluetooth itself exposes a number of tracking opportunities due to the
handling of MAC addresses and other identifiers. Unfortunately, the degree to which
these are properly randomized varies considerably across devices, with many devices
not implementing strong privacy protections.
To avoid making the situation worse, ideally every MAC address change should be
accompanied by a simultaneous change of the TCN. If this is not done, then anyone
observing (MAC A, TCN 1), then (MAC B, TCN 1), then (MAC B, TCN 2) can
conclude they are all the same device, because the identifiers do not all change at the same
time. This makes devices running the TCN protocol more easily trackable in a
confined area by anyone who can continuously observe their Bluetooth signals.
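The inference in the example above can be mechanised with a union-find over identifiers: any two sightings that share a MAC address or a TCN are merged into one group. This is an illustrative sketch of overlap-based linking only; the function name is hypothetical, and it ignores the timing-based linking discussed next.

```python
def link_observations(observations):
    """Merge (mac, tcn) sightings that share any identifier; return the groups."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for mac, tcn in observations:
        union(("mac", mac), ("tcn", tcn))

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


misaligned = [("A", 1), ("B", 1), ("B", 2)]  # MAC and TCN change at different times
aligned = [("A", 1), ("B", 2)]               # both change together
print(len(link_observations(misaligned)))   # 1
print(len(link_observations(aligned)))      # 2
```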
35. TCN and Limitations of Bluetooth
The extent to which such synchronization is possible is limited by the Bluetooth APIs
exposed by operating systems. On iOS we know of no way to be notified of, or to influence, the
rotation of the Bluetooth MAC address. On Android, experiments show that restarting
Bluetooth advertising causes a new random MAC address to be chosen by the
operating system, so instead of reacting to MAC address changes we can cause them to
happen at the same time as TCN changes.
Note that even if TCN changes happen simultaneously with MAC address changes,
unless rotation of MAC addresses and TCNs is globally synchronized among all
devices, an adversary who has Bluetooth observations with very fine time resolution
may still be able to link distinct MAC addresses simply because appearance of a new
MAC address for a device will closely follow disappearance of the old one.
37. Linkage Attack
A linkage attack is the matching of anonymized records with non-anonymized records
in a diļ¬erent dataset.
An example for our usecase would be: A user is only close to one other person in a
given timeframe. If they get notiļ¬ed of a revealed contact, they know who it was.
Generally: If the timeframe of a contact is revealed, and users do out of band
correlation, like taking notes/pictures, they can narrow down the possible real
identities of their contacts, which revealed.
As long as the users know which TCNs are in the intersection, this cannot be
prevented.
38. Replay Attack
An attacker collects TCNs of others and rebroadcasts them, to impersonate another
user during the gossip phase.
If not mitigated, the attacker can produce at most as many false positives as they could with an
illegitimate report (i.e. reporting without being infected).
Since, in the proposal above, the validity period of a TCN becomes known once it is revealed,
this attack can only be executed within a short timeframe.
39. Address Carryover Attack
An address-carryover is possible when the rotation periods of Bluetooth MAC address
and TCN are not aligned. The attacker could then use the overlap to link multiple
identifiers to the same source.
To mitigate this attack, TCN rotation needs to be aligned with the platform-specific
rotation of lower-level identifiers. TCN rotation frequency can be higher than that of
other identifiers, but any overlap has to be avoided.
40. Shard Carryover Attack
If a space-time based sharding scheme is used, an attack similar to the address
carryover attack needs to be mitigated. When switching shards, a new keypair should
be generated. Otherwise, multiple shards could be linked to a single source upon
reveal. Simply rotating TCNs is not sufficient here.
42. HEAAN Method
HEAAN (Homomorphic Encryption for Arithmetic of Approximate Numbers) is a homomorphic
encryption (HE) library implementing the scheme proposed by Cheon, Kim, Kim and Song (CKKS).
CKKS supports approximate arithmetic over complex numbers: we can take inputs a and b,
encrypt them, subtract the encrypted values, and decrypt the result to obtain
(approximately) a - b.
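Python's standard library has no CKKS implementation, so the sketch below swaps in a different scheme, Paillier, which is additively homomorphic over integers, to show the same pattern of encrypting a and b, subtracting under encryption, and decrypting a - b. It is not CKKS, has no approximate arithmetic, is not HEAAN's API, and the toy primes are far too small to be secure; the timestamp values are hypothetical.

```python
import math
import secrets

# Toy parameters: far too small for real security, chosen only for readability.
P, Q = 7919, 104729  # both prime
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # Carmichael lambda(N)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def encrypt(m: int) -> int:
    """Paillier encryption E(m) = g^m * r^N mod N^2, with g = N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m % N, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m, mapping the upper half of Z_N to negative numbers."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def sub_encrypted(ca: int, cb: int) -> int:
    """E(a) * E(b)^-1 mod N^2 decrypts to a - b (additive homomorphism)."""
    return ca * pow(cb, -1, N2) % N2


# Example: encrypted timestamps in seconds (hypothetical values).
t_alice, t_bob = 1000, 1001
diff = decrypt(sub_encrypted(encrypt(t_alice), encrypt(t_bob)))  # -1
```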
43. HEAAN Method - Timestamping Technique
First, Alice records her data as timestamps, and stores homomorphically encrypted
timestamps together with her location:
E(TIME1)a E(Location1)a
E(TIME2)a E(Location2)a
E(TIME3)a E(Location2)a
where E(TIME1)a is the homomorphically encrypted timestamp value, and
E(Location1)a is the homomorphically encrypted location information.
44. HEAAN Method - Timestamping Technique
Now Alice and Bob upload their homomorphically encrypted timestamp and location
information to the HA (Health Authority), which stores these values:
- E(TIME1)a E(Location1)a
- E(TIME2)a E(Location2)a
- E(TIME3)a E(Location2)a
- E(TIME1)b E(Location1)b
- E(TIME2)b E(Location2)b
- E(TIME3)b E(Location2)b
The HA cannot read either the timestamps or the location information. Alice is now identified as
having COVID-19, so the server can identify her encrypted values and compute homomorphic
differences between her timestamps and locations and those of other users:
45. HEAAN Method - Time Location Difference
Alice                      Bob                        Time diff (s)   Location diff (m)
E(TIME1)a E(Location1)a -> E(TIME1)b E(Location1)b    +100            +5
E(TIME2)a E(Location2)a -> E(TIME2)b E(Location2)b    +20             +3
E(TIME3)a E(Location2)a -> E(TIME3)b E(Location2)b    -1              +1
Here the HA cannot tell where Bob and Alice were or at what time, but it can tell
that in the third row their timestamps differ by one second and their locations are one
metre apart. In this way, Bob could be informed of a possible infection.
Other information can also be stored and used in the matching process, such
as the device type, the SSID of the wireless access point that the device connects to, and the
RSSI (Received Signal Strength Indication).