Ensuring Privacy, One Data Point at a Time

August 9, 2021

As a computing researcher, I have spent the past two decades developing software and devices that allow humans to use technology to interact with their physical world and the people around them. This work has ranged from digital games that use real-time data to encourage healthy behavior changes, such as meeting physical activity goals, to algorithms that let people search for nearby resources like restaurant reviews.

My research focuses on networks in smart environments, and many of the devices I work with, including smartphones, are deeply connected with individuals and their everyday activities. Because of this, I find it important to understand the ways in which digital applications collect and use information about humans.

The information apps collect is extremely powerful because it enables technology to help make our lives better, safer, healthier, and — especially in times like we’ve experienced over the past year — more socially connected. For instance, apps have helped contact tracers with COVID-19 response. Before the pandemic, they were used to monitor a person’s compliance with an exercise or medicine regimen and offer feedback when that person fell off track. The data, however, are also potentially dangerous because they can be intensely private and personal and, if made public, could be embarrassing or personally devastating.

These concerns are heightened by the fact that devices and the data they collect about us are so ingrained in our environments that it’s easy to forget they exist. Privacy implications (and the privacy policies or terms of service that attempt to explain them) are often so opaque that a computer science or law degree is required to comprehend them. And because accepting such a privacy policy — like one provided by your healthcare provider or children’s school — is often a precondition for using a needed service, we tend to accept them without fully reading or understanding them.

Protecting our research participants

This fall, researchers from across the Forty Acres will kick off a 5-year community cohort study known as Whole Communities–Whole Health, which seeks to understand the various factors that affect the overall wellbeing of families living in marginalized communities. Cohort studies follow participants over the span of many years in order to build a better understanding of risk factors and health outcomes.

In our study, we will work directly with residents in eastern Travis County — where many people have greater environmental exposure and less access to resources — looking at things like air quality, physical activity, stress levels, and sleep to discover how they correlate with health outcomes. With this knowledge, we hope to help these communities advocate for improvements.

As part of the study, we will be collecting data about people and their environments, including measures such as stress levels and sleep. To accomplish this, we will use wrist-based fitness trackers, smartphones, and environmental-sensing beacons placed in homes. The goal is to generate a detailed understanding of the environment in which our study participants live and the interactions they have, both with their environment and with others in it.

The data we collect during the cohort study will be shared with participants, community groups, and advocacy organizations, with the hope that this knowledge will empower them to work together to improve their lives and neighborhoods. We will do this through a mobile application we are creating, where people can see their health information in real time and compare it to how the rest of the community is doing as a whole.


The app is similar to one that some of us on the Whole Communities–Whole Health team designed last year as part of UT’s Protect Texas Together campaign, which is helping to support the safe return of students, faculty, and staff to campus amid the pandemic. It allows users to track their symptoms and access information about COVID-19 testing and campus resources. And ensuring privacy of the data in the Protect Texas Together app has been essential.

Creating a Whole Communities–Whole Health app that cohort participants will find useful will require us to collect a lot of data, and these data are often intensely private. As with the Protect Texas Together app, ensuring participants’ privacy is essential.

After all, a community cohort study is successful only if the data that underpin it are reliable. Community members provide the data themselves, so they must trust that the platform and devices they’re using are safe and that their personal information is secure. The cornerstone of this foundation of trust must be an understanding of the privacy implications of the data and information that are shared, followed by an understanding of the privacy protections built into the technical platforms.

If all of this sounds scary, it should. It can be. We all should pay more attention to what data we unwittingly release when we use online services or apps on our smartphones. So what is the Whole Communities–Whole Health grand challenge team doing to address these concerns?

Security, security, security. Privacy and security are two different things, but they often work in concert. First, we ensure that the data we collect are always encrypted. In other words, the information is converted into a sort of secret code that hides its true meaning. This happens both at rest (that is, when data and information are stored on a smartphone, another data collection platform, or within a centralized database) and in transit (when data and information are sent from one device to another, such as from a smartphone to the centralized database). Rather than writing all of our code from scratch, we also rely on standardized libraries whenever possible, even for ordinary functions like notifications. These reusable libraries are much more widely tested and are therefore more likely to have had any security flaws already identified and addressed.
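To make the at-rest case concrete, here is a minimal sketch of symmetric encryption using the widely tested cryptography library for Python (its Fernet recipe). It illustrates the general approach rather than our project's actual code; in practice, in-transit protection typically comes from TLS (HTTPS) rather than application code.

```python
# A minimal sketch: encrypting data at rest with a widely tested
# library instead of hand-rolled cryptography.
from cryptography.fernet import Fernet

# In a real deployment the key would live in a secure key store,
# never alongside the data it protects.
key = Fernet.generate_key()
cipher = Fernet(key)

reading = b'{"participant": "p-042", "heart_rate": 72}'  # illustrative record

# Encrypt before writing to local storage or a database ("at rest").
token = cipher.encrypt(reading)

# Only a holder of the key can recover the original bytes.
assert cipher.decrypt(token) == reading
```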

But ensuring privacy goes beyond security.

Consider the location of data storage and processing. In Whole Communities–Whole Health, we have opted to locate some of the processing and storage of the most sensitive data on users’ own devices rather than collecting the data ourselves. Why? Two reasons: One, this approach gives the user more direct control over the protection of their personal data — for example, detailed location traces are stored only on the device, under the physical control of the user. And two, it allows the research team to protect user privacy by writing code that processes data on users’ own devices before sending it to a central server. For instance, this could mean sending location trace statistics to the server rather than the raw traces themselves, so that people aren’t giving away specific private locations like their home address. Sensitive data stay with the users, and only the processed data are sent, which can then be further aggregated and anonymized across multiple users.
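As a rough illustration of this kind of on-device processing, the sketch below reduces a raw trace of GPS fixes to coarse statistics before anything leaves the phone. The summarize_trace function and the statistics it computes are hypothetical stand-ins for whatever summaries a study actually needs.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def summarize_trace(points):
    """Reduce a raw trace of (lat, lon) fixes to coarse statistics.

    Only this summary would leave the device; the raw coordinates,
    which reveal places like a home address, never do.
    """
    total_km = sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))
    return {"num_fixes": len(points), "distance_km": round(total_km, 2)}

# Raw trace stays on the phone; only the summary is uploaded.
trace = [(30.2849, -97.7341), (30.2867, -97.7394), (30.2920, -97.7380)]
payload = summarize_trace(trace)  # e.g. {'num_fixes': 3, 'distance_km': 1.15}
```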

Anonymize data when possible. In a cohort study, sometimes what is required is community-level data, such as the average energy consumption of homes in a neighborhood or the average degree of social connectivity among new parents. These data don’t necessarily need to be tied to an individual, so when we can collect data anonymously or in aggregate, we do. This is not as easy as it may first appear; anonymized data can sometimes be re-identified (that is, relabeled with the identity of the person the data is about) by connecting it with other data. For instance, location traces may appear to be anonymized once a user’s name is removed from them, but users can be re-identified by correlating the traces with other information, like home or work addresses. When anonymizing data in Whole Communities–Whole Health, we therefore consider whether storing or sharing data in particular combinations might inadvertently re-identify an anonymized participant.
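One simple guard, sketched below with made-up values and a threshold chosen purely for illustration, is to release a neighborhood aggregate only when enough households contribute to it, since very small groups are often easy to link back to individuals.

```python
from collections import defaultdict

K_MIN = 5  # suppress any group smaller than this; threshold is illustrative

def neighborhood_averages(records, k_min=K_MIN):
    """Aggregate per-home readings into neighborhood averages.

    Each record is a (neighborhood, value) pair. Neighborhoods with
    fewer than k_min contributing homes are withheld entirely, because
    tiny groups can often be traced back to individual households.
    """
    groups = defaultdict(list)
    for neighborhood, value in records:
        groups[neighborhood].append(value)
    return {
        n: sum(vals) / len(vals)
        for n, vals in groups.items()
        if len(vals) >= k_min
    }

readings = [("eastside", 11.2), ("eastside", 9.8), ("riverside", 10.1)]
# With k_min=2 for illustration, "riverside" (a single home) is suppressed.
print(neighborhood_averages(readings, k_min=2))  # {'eastside': 10.5}
```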

Readable privacy policies. It is not sufficient simply to implement the right technical solutions. Building social solutions also requires clear and open lines of communication with community members. We aim to make our privacy policies readable and understandable, so that study participants know what data we collect, where we store the data and for how long, and who has access to the data and for what purposes. We are working with the community to develop the mobile app and its policies to ensure that communication about its features and functions is understandable and acceptable to all. These privacy policies (as well as the technical decisions underpinning them) are also informed by community conversations, taking into account the data sensitivities that are particular to our participants. As an example, from conversations with community partners we know that collecting audio is a particular privacy concern, so we have limited audio collection until we can design solutions that work for everyone.

Data release agreements. The last line of defense is controlling who can access what data. After all, a major goal of the project is to make information collected via the app and other streams available to researchers and the community, to be used to answer research questions about various facets of health and to advocate for change when needed to improve community health outcomes. For instance, a cohort participant may want to view their home’s air quality alongside that of other homes in the community, which requires releasing to that community member both their own individual information and aggregated information for the community. In contrast, a researcher may want to correlate environmental measures like air quality with health outcomes, which requires only aggregate — but correlated — data for the community.
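A sketch of how such role-dependent release might look in code is below. The data, roles, and release_view function are all hypothetical; the point is only that each requester receives the narrowest view of the data that answers their question.

```python
from statistics import mean

# Hypothetical per-home air-quality readings; values are illustrative.
AIR_QUALITY = {"home-17": 42.0, "home-18": 38.5, "home-19": 47.2}

def release_view(requester_role, requester_home=None):
    """Return only the slice of data a requester is entitled to see.

    A participant sees their own home plus the community aggregate;
    a researcher sees the aggregate alone, never individual homes.
    """
    community_avg = mean(AIR_QUALITY.values())
    if requester_role == "participant" and requester_home in AIR_QUALITY:
        return {"own_home": AIR_QUALITY[requester_home],
                "community_average": community_avg}
    if requester_role == "researcher":
        return {"community_average": community_avg}
    raise PermissionError("no data release permitted for this request")

print(release_view("participant", "home-17"))  # own home + community average
print(release_view("researcher"))              # community average only
```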

We have developed and will continue to refine a process for allowing community members and researchers to access the Whole Communities–Whole Health data. This process takes into account all of the above concerns, including which communication channels — like public webpages, emails, and text messages — are appropriate for releasing data to the community, the implications of releasing those data, and the possibility of unintended consequences, such as piecing together disparate pieces of data to re-identify participants.

Sharing data is important. Understanding a community and its individual members can yield extremely important information, and it can give that community the ability to coalesce, cooperate, and advocate for its needs — backed by concrete, validated data. For instance, by having and using data about how smells and sounds affect people’s stress levels or how children perform at school, a community can better advocate for a reduction in sound pollution or other environmental disturbances to improve their health. In this context, ensuring privacy is a win for everyone involved — it improves the quality of the research, fortifies the relationships between the researchers and the community, and enables community growth.

The entire Whole Communities–Whole Health research team recognizes that data privacy is not only an expectation but a responsibility, and this is a responsibility that no member takes lightly. Without privacy and security in the app and the collected data, there is no trust in the system or the researchers. Without trust, there is a justifiable reticence among community members to collaborate. And without this partnership, it is impossible to achieve the goals of the Whole Communities–Whole Health project.

About Christine Julien

Christine Julien, Ph.D., is a professor and holds the Annis & Jack Bowen Professorship in Engineering in the Department of Electrical and Computer Engineering at The University of Texas at Austin. She is also the Associate Dean for the Cockrell School of Engineering and director of the Mobile and Pervasive Computing Group, where her research focuses on the intersection of software engineering and dynamic, unpredictable networked environments.