
What can you do about data ethics in your organization?

The presentation that inspired the blog post below is available here.

I hope you have a cup of coffee next to you; this is a long discussion of some of the conversations you should be having within your organization if you care about data ethics. In it, we’ll explore both published and unpublished works.

So what can you do? Where’s a good place to start? Conversations. Are the right ones being had at the right times? And when is the right time? As soon as possible. The conversations can start at the interview, both for the candidate and for the company (the questions below are my own).

From the interviewer's perspective:

  • What's your idea of an ethical organization?
  • Tell me about a time you had an ethical dilemma.
  • Have you ever wanted to file an ethics complaint?
  • What do you believe compromises an ethical workplace?
  • Would you lie to support an initiative you believe in?
  • How do you feel about using the technologies you develop?
  • Have you ever suffered in your career for doing what was right? Do you have any regrets?

And from the interviewee's perspective:

  • How does ethics fit into the culture here?
  • In what ways are customers included as stakeholders for business process reengineering or product/service design?
  • What kind of opportunities exist to contribute to corporate citizenship or employee resource group initiatives?
  • For Developers and Data Scientists: Does the team practice ethical design reviews as part of code/data design reviews? If not, would they be open to starting?
  • For Developers and Data Scientists: How do you handle ethics concerns that arise during development/analysis?

The questions to ask during the interview process foreshadow the rest of this post.

When you land that new role or make that new hire to bring a little ethics spice into your life, where does it all begin? Well, let’s not rock the boat too much. Chances are the internal processes follow, however roughly, the Project Management Institute’s Project Lifecycle or ITIL’s Service Delivery Model. So let’s use that to our advantage. As it turns out, there’s a paper published about this (which I led and contributed to, and which was published under CC-BY 4.0)! But for the sake of having all of this in one place, I’ll copy the questions here.

In the paper, we aligned the questions to a broader framework, but for simplicity, we’ll just distinguish ‘internal’ vs. ‘external’ stakeholders here. I’ll also drop the overarching framework from the way we talk about each lifecycle stage.

The Project Management Institute’s lifecycle stages, according to the Project Management Body of Knowledge (PMBOK guide), are: Initiation, Planning, Executing, Monitoring, and Closing.

The ITIL service lifecycle stages are: Strategy, Design, Launch, Operation, and Improvement.

Each of the stages from both of the lifecycle methodologies has some type of pre/post-mortem or review session included – that’s where these questions get discussed.
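
If your review sessions are template-driven, one low-friction way to make sure these questions actually get raised is to keep them as data and render them into each stage’s review agenda. Here’s a minimal sketch in Python; the stage names follow the PMI/ITIL pairings used below, and the questions shown are just placeholders for the full lists that follow.

```python
# A minimal sketch: keep the ethics questions per lifecycle stage as data,
# and render them into the agenda for that stage's review/pre-mortem session.
# Stage names follow the PMI/ITIL pairing used in this post; the questions
# shown here are placeholders -- substitute the full lists below.

ETHICS_QUESTIONS = {
    "Initiation/Strategy": [
        "Have data collection or discovery methods gone through an ethical review?",
        "Is data being stored without an intended use?",
    ],
    "Planning/Design": [
        "What biases might be present in the training data?",
    ],
    "Executing/Launch": [
        "Are the uses of the data consistent with the intentions of the discloser?",
    ],
    "Monitoring/Operation": [
        "Are data disclosers aware their data is being shared or sold?",
    ],
    "Closing/Improvement": [
        "Are data disclosers given the ability to delete their data?",
    ],
}


def review_agenda(stage: str) -> str:
    """Render the ethics section of a stage-gate review agenda as Markdown."""
    lines = [f"## Ethics review: {stage}"]
    lines += [f"- [ ] {q}" for q in ETHICS_QUESTIONS.get(stage, [])]
    return "\n".join(lines)


if __name__ == "__main__":
    print(review_agenda("Initiation/Strategy"))
```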

The strategic objective is then two-fold: (1) how do you get outside representation (users, customers, downstream human subjects) at the table? And (2) what kinds of questions should you be prepared to address once they’re at the table? If you can get to this place, you’re doing really well, but think of it more as a goal: if you get this right, everything else will have to be in place already. It will be a long and winding journey, so pace yourself. Let’s do a little ethics integration into these lifecycle frameworks:

Integrating Ethics into Existing Design/Development Processes

PMI: Initiation / ITIL: Strategy - internal

  • Have data collection or discovery methods gone through an ethical review process?
  • Given the different types of data being collected, what potential harm could come from using that data?
  • Is data being stored without an intended use? Is there a limit to how long data is stored without an intended use?
  • What codes or principles of data ethics do data providers/disclosers follow?
  • Is there a way to minimize the volume or variety of data being collected?

PMI: Initiation / ITIL: Strategy - external

  • Are data disclosers aware that data has been acquired, stored, or shared?
  • Are secondary data subjects represented in the captured data?
  • Will data disclosers be able to inspect the data they have disclosed?
  • Are disclosers aware of how they disclosed this data (e.g. directly, tracking, derived)?
  • Are data disclosers able to opt-out?
  • Are data disclosers able to opt-in (to collection or to specific uses)?
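
Opt-in and opt-out questions like these are much easier to answer honestly when consent is recorded as structured data rather than inferred from a terms-of-service acceptance. A minimal sketch of what such a record might look like, with illustrative field names (none of this comes from the paper):

```python
# A minimal sketch of a per-discloser consent record. Field names are
# illustrative; the point is that opt-in/opt-out status is explicit,
# per purpose, and timestamped, so the questions above have checkable answers.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    discloser_id: str
    purpose: str                      # e.g. "product-analytics", "marketing"
    opted_in: bool                    # explicit opt-in, not assumed
    collection_method: str            # "direct", "tracking", "derived"
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def may_use(records: list[ConsentRecord], discloser_id: str, purpose: str) -> bool:
    """True only if the most recent record for this discloser and purpose is an opt-in."""
    relevant = [r for r in records if r.discloser_id == discloser_id and r.purpose == purpose]
    if not relevant:
        return False                  # no record means no consent
    latest = max(relevant, key=lambda r: r.recorded_at)
    return latest.opted_in


records = [ConsentRecord("user-42", "product-analytics", opted_in=True, collection_method="direct")]
print(may_use(records, "user-42", "marketing"))   # False: no consent recorded for this purpose
```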

PMI: Planning / ITIL: Design - internal

  • What would the reputational impact be on the organization if the data was misused?
  • Would research methodologies receive a favorable reaction if they were widely shared?
  • What biases have been introduced during manipulation? What biases might be present in the training data (when machine learning is used)? Was an ethics review performed?
  • How frequently should the ethical review board revisit these analyses for alignment with project/product/service goals and the organization’s code of ethics?

PMI: Planning / ITIL: Design - external

  • What are the classes of harm that a bad actor or group of actors could cause if they had access to the entire set of aggregated data sources or any related analysis?
  • Would data disclosers be surprised about the breadth of data sources being aggregated? If so, are there actions that can be taken to gain informed consent (or at least inform the data subjects)?
  • What negative consequences for the data discloser could result from the proposed analysis? What steps are being taken to mitigate these risks?

PMI: Executing / ITIL: Launch - internal

  • Are the uses of the data consistent with the intentions of the discloser? What are the potential risks to the organization if a watchdog group knew the data was used in this way?
  • What are the regulatory controls concerning the use of this data? What actions need to be taken to ensure compliance?
  • Are there mechanisms in place for controlling access to the data and logging when internal and external parties have used the data? (A minimal sketch of such a mechanism follows this list.)
  • What measures are taken to account for risk and/or harm that could come from misusing the data? What measures are taken to ensure the data manipulators are aware of the risks associated with misapplying or misusing the data?
  • If alternative applications for the data are discovered when using the data, what steps need to be taken to approve further use and document the alternative application? What value is realized from informing data disclosers of these new uses? Does the new application bring any direct value to the data discloser? What additional risks are introduced?
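
As promised above, here’s a minimal sketch of the access-control-plus-logging mechanism the third question asks about. The names, the in-memory policy, and the toy dataset are all illustrative; a real implementation would sit on top of your identity and logging infrastructure.

```python
# A minimal sketch: wrap data access in a function that both checks an
# allow-list and writes an audit log entry. The in-memory policy and
# "datasets" store are illustrative stand-ins for real infrastructure.
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
audit_log = logging.getLogger("data-access-audit")

ACCESS_POLICY = {                       # dataset -> parties allowed to read it
    "customer_events": {"analytics-team", "fraud-team"},
}
DATASETS = {"customer_events": [{"user": "user-42", "event": "login"}]}


def read_dataset(dataset: str, party: str, purpose: str):
    """Return the dataset only if `party` is allowed, and record who/when/why."""
    allowed = party in ACCESS_POLICY.get(dataset, set())
    audit_log.info(
        "dataset=%s party=%s purpose=%s allowed=%s at=%s",
        dataset, party, purpose, allowed, datetime.now(timezone.utc).isoformat(),
    )
    if not allowed:
        raise PermissionError(f"{party} is not permitted to read {dataset}")
    return DATASETS[dataset]


rows = read_dataset("customer_events", "analytics-team", purpose="churn analysis")
```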

PMI: Executing / ITIL: Launch - external

  • Did the data discloser provide consent to this specific data use? Was that consent informed?
  • Do any consent agreements make it clear that data could be used in this way?
  • Are there mechanisms in place to alert data disclosers that their data is being used?
  • Can a data discloser discover where data they have disclosed has been used and for what purpose?
  • What are the methods for recourse by a data consumer who finds issues with the insights being used?

PMI: Monitoring / ITIL: Operation - internal

  • Does the act of sharing or selling data enhance the experience for the data discloser (not including the data seller’s own ability to operate)?
  • Is there another way to share or sell this data that would increase transparency?
  • What parties are designated stewards of data once data is shared or sold?
  • If data being shared encroaches on cultural norms around privacy or regulatory standards, should data manipulators/consumers demonstrate the value or benefit of data sharing?

PMI: Monitoring / ITIL: Operation - external

  • Do data disclosers expect control, ownership, remuneration, or transparency over the data they have disclosed if it is being shared or sold? Did they provide informed consent for this action?
  • Is it in the data discloser's best interest to have their data shared among third parties?
  • Do data disclosers have any say in whether or not their data is shared or sold?
  • Are data disclosers aware their data is being shared or sold?

PMI: Closing / ITIL: Improvement - internal

  • Should the original discloser be notified?
  • Is metadata being retained? Account for the ways metadata could be used to re-identify data subjects.
  • Are there any disaster recovery archives that have copies of the data?
  • Is there a way to give data disclosers greater control over the retention and deletion of data they disclose or is subsequently derived from these disclosures?
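
That last question, about giving disclosers control over retention and deletion, implies machinery most data platforms don’t have by default: a retention deadline attached to every record and a deletion path that also reaches data derived from the original disclosure. A minimal sketch, with illustrative names:

```python
# A minimal sketch of retention metadata that travels with each stored record.
# Names are illustrative. The key ideas: every record carries a retention
# deadline, deletion requests cascade to records derived from the original,
# and sweeps are routine rather than exceptional.
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredRecord:
    record_id: str
    discloser_id: str
    derived_from: str | None          # id of the source record, if any
    retain_until: datetime


def expired(records: list[StoredRecord], now: datetime) -> list[StoredRecord]:
    """Records past their retention deadline and due for deletion."""
    return [r for r in records if r.retain_until <= now]


def deletion_set(records: list[StoredRecord], discloser_id: str) -> set[str]:
    """A discloser's deletion request should cover derived records too."""
    direct = {r.record_id for r in records if r.discloser_id == discloser_id}
    derived = {r.record_id for r in records if r.derived_from in direct}
    return direct | derived


now = datetime.now(timezone.utc)
store = [
    StoredRecord("raw-1", "user-42", None, now + timedelta(days=90)),
    StoredRecord("feature-7", "aggregate", "raw-1", now + timedelta(days=365)),
]
print(deletion_set(store, "user-42"))              # {'raw-1', 'feature-7'}: derived data is covered too
print([r.record_id for r in expired(store, now)])  # []: nothing past its deadline yet
```

The important design choice is that deletion requests follow lineage (`derived_from`), so derived features and aggregates don’t quietly outlive the data they came from.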

PMI: Closing / ITIL: Improvement - external

  • Are stakeholders aware of the time frame that their data will be retained? Would they be surprised to learn it still exists?
  • Are data disclosers given the ability to delete their data?
  • Are data disclosers given the right to restrict future use of their data?
  • Are data disclosers notified of how long their data will be retained?
  • Are data disclosers notified when their data is destroyed?

Whew. Okay. That was a lot. But we’re just getting warmed up.

Next up, if you don’t like to follow rules and frameworks, these next two sections are for you.

Clearly, we’re not the only ones to realize that we need to ask a new set of questions during our development processes. The former – and so far only – Chief Data Scientist of the United States, DJ Patil, in collaboration with the venerable Hilary Mason and Mike Loukides, has published a free book focused on new behaviors that data practitioners should adopt. Included is a great set of questions to ask during the design and development process (after the list, I’ll sketch one of the more testable checks, disparate error rates across user groups):

  • Have we listed how this technology can be attacked or abused?
  • Have we tested our training data to ensure it is fair and representative?
  • Have we studied and understood possible sources of bias in our data?
  • Does our team reflect diversity of opinions, backgrounds, and kinds of thought?
  • What kind of user consent do we need to collect to use the data?
  • Do we have a mechanism for gathering consent from users?
  • Have we explained clearly what users are consenting to?
  • Do we have a mechanism for redress if people are harmed by the results?
  • Can we shut down this software in production if it is behaving badly?
  • Have we tested for fairness with respect to different user groups?
  • Have we tested for disparate error rates among different user groups?
  • Do we test and monitor for model drift to ensure our software remains fair over time?
  • Do we have a plan to protect and secure user data?
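
Several of these have directly testable answers. Here’s the promised sketch of a per-group error-rate check; the toy labels, predictions, and group attribute stand in for whatever your evaluation pipeline produces, and the 0.3 tolerance is arbitrary.

```python
# A minimal sketch: compare error rates across user groups on a holdout set.
# The data here is illustrative; in practice y_true/y_pred come from your
# evaluation pipeline and `groups` is a protected or user-segment attribute.
from collections import defaultdict

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]


def error_rates_by_group(y_true, y_pred, groups):
    """Fraction of misclassified examples within each group."""
    errors, totals = defaultdict(int), defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        errors[group] += int(truth != pred)
    return {g: errors[g] / totals[g] for g in totals}


rates = error_rates_by_group(y_true, y_pred, groups)
print(rates)                                   # {'a': 0.25, 'b': 0.5}
gap = max(rates.values()) - min(rates.values())
assert gap <= 0.3, f"disparate error rates across groups: {rates}"
```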

And the second part, for the renegades among us, is content that hasn’t been published yet; a future blog post will discuss each “spectrum” below in a bit more detail. This list was created through a “Digital Ethics” workshop in late April 2018, organized by Peter Temes of Northeastern University Seattle and the ILO Institute and sponsored by Avanade. Present at the “design table” were Heath Dorn (T-Mobile), Dwight Barry and a colleague (Seattle Children’s Hospital), and Michael Koenig (formerly AI @ Microsoft); Florin Rotar (Avanade) and I (Accenture) co-led the session.

The idea here was to model something for digital ethics on the Agile Manifesto (Heath is an amazing agile sherpa). So what we’ve done here is set up a number of “spectrums”: the right side of each is essentially the status quo, and the left side is a higher ethical bar that should lead toward more trusted outcomes and relationships (read each ‘>’ as ‘over’). The important thing to take away from this exercise is that the discussion (and who’s part of it) matters much more than where you end up on the spectrum. After the list, I’ll sketch what acting on one of these spectrums (re-training dynamic models rather than leaving them static) might look like. These are the critical conversations to have:

Digital Ethics 'Manifesto':

  • Minimize harm > maximize value
  • Model an aggregate population > an individual
  • Collect relevant data > any/everything possible
  • Equity + 'values transparency' > enforcing equality
  • Be trustworthy > transparent
  • Data expiration > digital perpetuity
  • Re-train [dynamic] models > static models
  • Value stays with data subject/discloser > data collector/aggregator/user
  • Informed consensual use of data > exploratory use
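
And here’s the sketch promised above for the re-training spectrum: a drift trigger that compares a model’s recent error rate against the error rate measured at launch and flags when the gap grows too large. The numbers and the tolerance are illustrative; a production check would also watch input distributions, not just errors.

```python
# A minimal sketch of a drift trigger for the "re-train dynamic models over
# static models" spectrum: compare recent error against the error measured at
# launch and flag when the gap exceeds a tolerance. Data and the tolerance
# are illustrative only.

def error_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def needs_retraining(baseline_error: float, recent_error: float, tolerance: float = 0.05) -> bool:
    """Flag the model for re-training when recent error drifts past the baseline by more than `tolerance`."""
    return (recent_error - baseline_error) > tolerance


baseline = error_rate([1, 0, 1, 0], [1, 0, 1, 0])             # 0.0 at launch (toy data)
recent = error_rate([1, 0, 1, 0, 1, 1], [1, 0, 0, 0, 0, 1])   # ~0.33 on recent traffic
print(needs_retraining(baseline, recent))                     # True -> schedule re-training
```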

If all this talk about ‘agile’ and ‘manifesto’ has your palms sweating for a way to integrate this into your development stack, there’s a promising tool for that as well.

Are you feeling like an ethics guru yet? If not, maybe the Institute for the Future’s (IFTF) Ethical OS is for you (published under CC-BY-NC-SA 4.0). This is a great piece of work (funded by Omidyar) that takes a fairly comprehensive approach to managing risk areas – they call them “risk zones.” You should check out their website, slide deck, and checklist. I’ve taken the liberty of copying the questions from their checklist here:

Ethical OS Checklist

Risk Zone 1: Truth, Disinformation, and Propaganda

  • What type of data do users expect you to accurately share, measure or collect?
  • How could bad actors use your tech to subvert or attack the truth? What could potentially become the equivalent of fake news, bots or deepfake videos on your platform?
  • How could someone use this technology to undermine trust in established social institutions, like media, medicine, democracy, science? Could your tech be used to generate or spread misinformation to create political distrust or social unrest?
  • Imagine the form such misinformation might take on your platform. Even if your tech is meant to be apolitical in nature, how could it be co-opted to destabilize a government?

Risk Zone 2: Addiction & the Dopamine Economy

  • Does the business model behind your chosen technology benefit from maximizing user attention and engagement—i.e., the more, the better? If so, is that good for the mental, physical or social health of the people who use it? What might not be good about it?
  • What does “extreme” use, addiction or unhealthy engagement with your tech look like? What does “moderate” use or healthy engagement look like?
  • How could you design a system that encourages moderate use? Can you imagine a business model where moderate use is more sustainable or profitable than always seeking to increase or maximize engagement?
  • If there is potential for toxic materials like conspiracy theories and propaganda to drive high levels of engagement, what steps are being taken to reduce the prevalence of that content? Is it enough?

Risk Zone 3: Economic & Asset Inequalities

  • Who will have access to this technology and who won't? Will people or communities who don't have access to this technology suffer a setback compared to those who do? What does that setback look like? What new differences will there be between the "haves" and "have-nots" of this technology?
  • What asset does your technology create, collect, or disseminate? (example: health data, gigs, a virtual currency, deep AI) Who has access to this asset? Who has the ability to monetize it? Is the asset (or profits from it) fairly shared or distributed with other parties who help create or collect it?
  • Are you using machine learning and robots to create wealth, rather than human labor? If you are reducing human employment, how might that impact overall economic well-being and social stability? Are there other ways your company or product can contribute to our collective economic security, if not through employment of people?

Risk Zone 4: Machine Ethics & Algorithmic Biases

  • Does this technology make use of deep data sets and machine learning?
  • If so, are there gaps or historical biases in the data that might bias the technology?
  • Have you seen instances of personal or individual bias enter into your product's algorithms? How could these have been prevented or mitigated?
  • Is the technology reinforcing or amplifying existing bias?
  • Who is responsible for developing the algorithm?
  • Is there a lack of diversity in the people responsible for the design of the technology?
  • How will you push back against a blind preference for automation (the assumption that AI-based systems and decisions are correct, and don't need to be verified or audited)?
  • Are your algorithms transparent to the people impacted by them?
  • Is there any recourse for people who feel they have been incorrectly or unfairly assessed?

Risk Zone 5: Surveillance State

  • How might a government or military body utilize this technology to increase its capacity to surveil or otherwise infringe upon the rights of its citizens?
  • What could governments do with the data you're collecting about users if they were granted access to it, or if they legally required or subpoenaed access to it?
  • Who, besides government or military, might use the tools and data you're creating to increase surveillance of targeted individuals? Whom would they track, why, and do you want your tech to be used in this way?
  • Are you creating data that could follow users throughout their lifetimes, affect their reputations, and impact their future opportunities? Will the data your tech is generating have long-term consequences for the freedoms and reputation of individuals?
  • Whom would you not want to use your data to surveil and make decisions about individuals, and why not? What can you do to proactively protect this data from being accessible to them?

Risk Zone 6: Data Control & Monetization

  • Do your users have the right and ability to access the data you have collected about them? How can you support users in easily and transparently knowing what you know about them?
  • If you profit from the use or sale of user data, do your users share in that profit?
  • What options would you consider for giving users the right to share profits on their own data?
  • Could you build ways to give users the right to share and monetize their own data independently?
  • What could bad actors do with this data if they had access to it?
  • What is the worst thing someone could do with this data if it were stolen or leaked?
  • Do you have a policy in place for what happens to customer data if your company is bought, sold or shut down?

Risk Zone 7: Implicit Trust & User Understanding

  • Does your technology do anything your users don't know about, or would probably be surprised to find out about? If so, why are you not sharing this information explicitly, and what kind of backlash might you face if users found out?
  • If users object to the idea of their actions being monetized, or of their data being sold to specific types of groups or organizations, but still want to use the platform, what options do they have?
  • Is it possible to create alternative models that build trust and allow users to opt in or opt out of different aspects of your business model moving forward?
  • Are all users treated equally? If not, and your algorithms and predictive technologies prioritize certain information or set prices or access differently for different users, how would you handle consumer demands or government regulations that require all users to be treated equally, or at least transparently unequally?

Risk Zone 8: Hateful & Criminal Actors

  • How could someone use your technology to bully, stalk, or harass other people?
  • What new kinds of ransomware, theft, financial crimes, fraud, or other illegal activity could potentially arise in or around your tech?
  • Do technology makers have an ethical responsibility to make it harder for bad actors to act?
  • How could organized hate groups use your technology to spread hate, recruit, or discriminate against others? What does organized hate look like on your platform, in your community, or among your users?
  • What are the risks of your technology being weaponized? What responsibility do you have to prevent this? How do you work to create regulations or international treaties to prevent the weaponizing of technology?

Next up, we have a whole slew of resources from the Markkula Center for Applied Ethics at Santa Clara University. Full disclosure: I have collaborated with Irina Raicu (Internet Ethics @ Markkula) and Shannon Vallor (Regis and Dianne McKenna Professor in the Department of Philosophy) in the past, including sponsoring the development of free ethics modules for college courses (cybersecurity ethics and data ethics). However, I was not involved in their “Ethics in Technology Practice” project, sponsored by Omidyar Network’s Tech and Society Solutions Lab, which is the source of all the content copied below (shared under CC BY-NC-ND 3.0).

This body of work is known as their “Ethics in Technology Practice.” The homepage for this work links to all of this content, but I’ve extracted the prominent discussion questions here. You can also grab a PDF Overview of Ethics in Tech Practice.

One of the interesting aspects of this work is that it builds on an ethical decision-making framework that a group from SCU published in 2009. Prescient, right? That framework is first up, below. Then things get a bit more specific and academic, with questions practitioners should ask, aligned to conceptual frameworks in philosophy that ethicists have used for millennia to guide their thinking and exploration. And finally, I’ve pulled the types of questions you should ask during pre- and post-mortems from their Ethical Toolkit for Engineering/Design Practice. Not covered in this blog post, but still valuable, are Best Ethical Practices in Technology, case studies, and slides that walk you through all of this (e.g., for a workshop).

A Framework for Ethical Decision Making

Recognize an Ethical Issue

  1. Could this decision or situation be damaging to someone or to some group? Does this decision involve a choice between a good and bad alternative, or perhaps between two "goods" or between two "bads"?
  2. Is this issue about more than what is legal or what is most efficient? If so, how?

Get the Facts

  1. What are the relevant facts of the case? What facts are not known? Can I learn more about the situation? Do I know enough to make a decision?
  2. What individuals and groups have an important stake in the outcome? Are some concerns more important? Why?
  3. What are the options for acting? Have all the relevant persons and groups been consulted? Have I identified creative options?

Evaluate Alternative Actions

  1. Evaluate the options by asking the following questions:
    • Which option will produce the most good and do the least harm? (The Utilitarian Approach)
    • Which option best respects the rights of all who have a stake? (The Rights Approach)
    • Which option treats people equally or proportionately? (The Justice Approach)
    • Which option best serves the community as a whole, not just some members? (The Common Good Approach)
    • Which option leads me to act as the sort of person I want to be? (The Virtue Approach)

Make a Decision and Test It

  1. Considering all these approaches, which option best addresses the situation?
  2. If I told someone I respect, or told a television audience, which option I have chosen, what would they say?

Act and Reflect on the Outcome

  1. How can my decision be implemented with the greatest care and attention to the concerns of all stakeholders?
  2. How did my decision turn out and what have I learned from this specific situation?

Ethical Lenses to Look Through

Deontological Questions for Technologists that Illuminate the Ethical Landscape:

  • What rights of others & duties to others must we respect in this context?
  • How might the dignity & autonomy of each stakeholder be impacted by this project?
  • What considerations of trust and of justice are relevant to this design/project?
  • Does our project treat people in ways that are transparent & to which they would consent?
  • Are our choices/conduct of the sort that I/we could find universally acceptable?
  • Does this project involve any conflicting moral duties to others, or conflicting stakeholder rights? How can we prioritize these?
  • Which moral rights/duties involved in this project may be justifiably overridden by higher ethical duties, or more fundamental rights?

Consequentialist Questions for Technologists that Illuminate the Ethical Landscape:

  • Who will be directly affected by this project? How? Who will be indirectly affected?
  • Will the effects in aggregate likely create more good than harm, and what types of good and harm? What are we counting as well-being, and what are we counting as harm/suffering? Is our view of these concepts too narrow, or are we thinking about all relevant types of harm/benefit (psychological, political, environmental, moral, cognitive, emotional, institutional, cultural, etc.)?
  • How might future generations be affected by this project?
  • What are the most morally significant harms and benefits that this project involves?
  • Does this project benefit many individuals, but only at the expense of the common good? Does it do the opposite, by sacrificing the welfare or key interests of individuals for the common good? Have we considered these tradeoffs, and which are ethically justifiable?
  • Do the risks of harm from this project fall disproportionately on the least well-off or least powerful in society? Will the benefits of this project go disproportionately to those who already enjoy more than their share of social advantages and privileges?
  • Have we adequately considered ‘dual-use’ and downstream effects other than those we intend?
  • Have we fallen victim to any false dilemmas or imagined constraints? Or have we considered the full range of actions/resources/opportunities available to us that might boost this project’s potential benefits and minimize its risks?
  • Are we settling too easily for an ethically ‘acceptable’ design or goal (‘do no harm’), or are there missed opportunities to set a higher ethical standard and generate even greater benefits?

Virtue-Driven Questions for Technologists that Illuminate the Ethical Landscape:

  • What design habits are we regularly embodying, and are they the habits of excellent designers?
  • Would we want future generations of designers to use our practice as the example to follow?
  • What habits of character will this design/project foster in users and other affected stakeholders?
  • Will this design/project weaken or disincentivize any important human habits, skills, or virtues that are central to human excellence (moral, political, or intellectual)? Will it strengthen any?
  • Will this design/project incentivize any vicious habits or traits in users or other stakeholders?
  • Are our choices and practices generally embodying the appropriate ‘mean’ of design conduct (relative to the context)? Or are they extreme (excessive or deficient) in some ways?
  • What are the relevant social contexts that this project/design will enter into/impact? Has our thinking about its impact been too focused on one context, to the exclusion of other contexts where its impact may be very different in morally significant ways?
  • Is there anything unusual about the context of this project that requires us to reconsider or modify the normal ‘script’ of good design practice? Are we qualified and in a position to safely and ethically make such modifications to normal design practice, and if not, who is?
  • What will this design/project say about us as people in the eyes of those who receive it? How confident are we that we will individually, and as a team/organization, be proud to have our names associated with this project one day?
  • Has our anticipated pride in this work (which is a good thing) blinded us to, or caused us to heavily discount, any ethically significant risks we are taking? Or are we seeing clearly?

Questions for Technologists that Illuminate the Global Ethical Landscape:

  • Have we invited and considered the ethical perspectives of users and communities other than our own, including those quite culturally or physically remote from us? Or have we fallen into the trap of “designing for ourselves”?
  • How might the impacts and perceptions of this design/project differ for users and communities with very different value-systems and social norms than those local or familiar to us? If we don’t know, how can we learn the answer?
  • The vision of the ‘good life’ dominant in tech-centric cultures of the West is far from universal. Have we considered the global reach of technology and the fact that ethical traditions beyond the West often emphasize values such as social harmony and care, hierarchical respect, honor, personal sacrifice, or social ritual far more than we might?
  • In what cases should we refuse, for compelling ethical reasons, to honor the social norms of another tradition, and in what cases should we incorporate and uphold others’ norms? How will we decide, and by what standard or process?

Pre- and Post-Mortems

Team Pre-Mortems Should ASK:

  • How Could This Project Fail for Ethical Reasons?
  • What Would be the Most Likely Combined Causes of Our Ethical Failure/Disaster?
  • What Blind Spots Would Lead Us Into It?
  • Why Would We Fail to Act?
  • Why/How Would We Choose the Wrong Action?
  • What Systems/Processes/Checks/Failsafes Can We Put in Place to Reduce Failure Risk?

Team Post-Mortems Should ASK:

  • Why Was This Project an Ethical Failure?
  • What Combination or Cascade of Causes Led to the Ethical Failure?
  • What Can We Learn from This Ethical Failure that We Didn’t Already Know?
  • What Team Dynamics or Protocols Could Have Prevented This Ethical Failure?
  • What Must We Change if We Are to Do Better Next Time?

And rounding things out for our friends in the public sector, there’s a great toolkit for open source data (published under CC-BY 4.0) that was created as a collaborative effort among the Center for Government Excellence (GovEx) at Johns Hopkins University; DataSF, part of the City and County of San Francisco; the Civic Analytics Network at Harvard University; and Data Community DC. The goal of this work is to maximize fairness and minimize harm when automated decisions are used in the public sector.

I include this more for thoroughness than because there are a lot of questions to extract; it’s more of a process to go through. The framework, however, is fantastic for evaluating the range of potential risks in data, and it provides a process for evaluating whether to make data about residents open source. In that sense, it’s a risk mitigation framework similar in spirit to the Ethical OS.

I should also address my own bias here: the first article I published about data ethics (in 2014) framed my work in data ethics as addressing risks that businesses aren’t paying attention to and should. I continue to believe that risk mitigation is the primary imperative for companies to take action in the data ethics space; most companies and governments, however, are just waking up to the relationship between data practices and risk.

Does your executive team need an alarm clock?

UPDATED Jan 7, 2019: Added reference to deon.drivendata.org.

Steven Tiell

Steven is the founder of this blog. He’s a technology and business strategist who started exploring data ethics in 2013 and has been publishing thought leadership in the space since 2014. In addition to his day job (where he’s spent 100%+ of his time on data ethics and responsible innovation since the Cambridge Analytica scandal in 2018), Steven serves on the board of a grassroots community organization and advises larger organizations through personal and professional commitments.
