Commit 5b775b5 (parent d90d685)
Andrew Bell committed Sep 14, 2023
Showing 1 changed file with 31 additions and 63 deletions.
modules/all-about-transparency/_posts/2000-01-04-barriers.md
@@ -1,91 +1,59 @@
---
- title: Becoming a Transparency Influencer
+ title: Transparency Tools
---

- ## Becoming a Transparency Influencer
- _Suggested time: 15 minutes_
+ ## Transparency Tools
+ _Suggested time: 10 minutes_

- Why don't organizations make algorithmic transparency a priority?
+ Researchers and practitioners have made significant advances in developing tools for better algorithmic transparency. In this course, we survey five of the most popular categories --- however, many tools exist beyond what we cover here (and many more will be developed in the coming years).

- There are several key reasons why organizations avoid or neglect algorithmic transparency. In this section, we discuss these reasons and, importantly, offer a rebuttal as to why each is inadequate justification for not pursuing transparency. These rebuttals can help you become a **transparency influencer** within your organization and create meaningful change towards a more transparent, ethical approach to algorithms.
+ <!--Once you have inventoried your list of stakeholders and their needs, you are ready to begin thinking about and designing transparency features for your algorithmic systems. As a reminder, transparency features are those artifacts that you create to increase the ability of an algorithm to be understood by humans. **Importantly, whenever possible, this should be a collaborative design process between _technical and non-technical_ persons within your organization.** Technical experts like practitioners, data scientists, data engineers, programmers, and analysts also have their own set of knowledge on how to implement transparency features for algorithms (this is further detailed later in the course).-->

- <br>
+ <!---Importantly, there are two levels of transparency you need to consider, called the **scope of transparency.** The first is **local** transparency, which provides understanding about a single decision made by an algorithm (e.g., a single loan applicant), and the second is **global** transparency, which explains how an algorithm works overall. Global transparency can give a "bird's-eye view" of an algorithmic decision-making system, whereas local transparency focuses in on a particular bird or set of birds.--->

- ### Claim: Transparency Means More Costs
+ # Types of transparency tools

- **This is not necessarily true.**
+ - **Transparency labels and model cards** for algorithmic tools are similar to nutritional labels found in the food industry. Nutritional labels are designed to be transparent and understandable, and their contents are perceived as a highly credible source of information by consumers, who use them to guide decision-making.

- While there may be some costs to implementing algorithmic transparency, **these costs are often grossly overstated --- especially when compared to the potential costs of _not_ implementing transparency.** Later in this course, we will detail the exact process and types of conversations needed to implement transparency.
+ There are several examples of transparency labels for algorithms designed by researchers, and like nutritional labels, they often contain the "ingredients" that make up an algorithm. For example, a label may include descriptive information about the factors considered by the algorithm (the ingredients), how those factors rank in importance to the decision-making (ingredient ranking), and attributes related to fairness, which could be useful for meeting stakeholder goals related to validity, trust, privacy, and fairness.

- **Notably, what is often _understated_ is the cost savings from algorithmic transparency.** Transparency can be used to avoid performance risks like algorithmic errors, security risks, control risks like rogue outcomes and unintended consequences, economic risks, ethical risks, and societal risks like unfair outcomes for underprivileged or marginalized communities. All of these risks can have costly consequences, like poor algorithms leading to worse business decisions or public relations crises, which can be greatly mitigated by transparency.
+ > **When is this useful?** To provide a global, high-altitude view of the algorithmic system: for aspects like (1) what data is being used by the system, (2) how the system weighs that data, and (3) metrics on the performance or fairness of the system overall.
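
As a concrete illustration of what such a label might contain, here is a minimal sketch in Python. The schema and every field value are hypothetical assumptions for a loan-screening example, not an established model-card standard.

```python
# A minimal, illustrative transparency label for a hypothetical loan-screening
# model. Every field name and value below is an invented example, not a standard.
from dataclasses import dataclass

@dataclass
class TransparencyLabel:
    model_name: str
    intended_use: str
    ingredients: list[str]                # the attributes the algorithm considers
    ingredient_ranking: dict[str, float]  # relative weight of each attribute
    fairness_notes: str

label = TransparencyLabel(
    model_name="loan-screening-model",
    intended_use="Pre-screening consumer loan applications",
    ingredients=["income", "credit_history", "education_level"],
    ingredient_ranking={"income": 0.40, "credit_history": 0.40,
                        "education_level": 0.20},
    fairness_notes="Approval rates audited across demographic groups quarterly.",
)
print(label)
```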
- > **Case Study: The Cost of Not Being Transparent** <br><br> In 2019, Meta (then Facebook) received a record-breaking FTC fine of $5 billion for violations related to privacy, accountability, and transparency. The fine was substantial, but the true penalty is that Meta is now required to hire compliance officers who actively participate in and oversee business decisions at _all levels of the company_ --- a cost even more impactful than the $5 billion fine. This case study illustrates a key point: *ignoring transparency may save costs in the short run, but leaves an organization vulnerable to catastrophic risks in the future.* You can read more about this case study <a href="https://www.ftc.gov/news-events/news/press-releases/2019/07/ftc-imposes-5-billion-penalty-sweeping-new-privacy-restrictions-facebook" target="_blank">here</a>.
+ - **Data visualizations** can be used to show information about the data used to create an algorithmic decision system, or facts about the system itself, like how many decisions an algorithm makes per day and how many people are affected by those decisions.

- <br>
+ Visualizations have proved useful for informing users and making complex information more accessible and digestible, and have even been found to have a powerful persuasive effect. Visualizations are often an advisable transparency tool because they can convey a large amount of information in a simple manner, and organizations commonly staff analysts who specialize in them.

- ### Claim: Transparency Means Less Accurate Algorithms
+ It's also important that visualizations are designed thoughtfully, as they can be abused to misrepresent a message through techniques like exaggeration or understatement.

- **This is not true.**
+ > **When is this useful?** For presenting complex information in a digestible way, particularly for non-technical users. This could include both internal and external stakeholders: humans-in-the-loop for the former and affected individuals for the latter.
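
As one hedged example, a simple bar chart of the (hypothetical) attribute weights from the loan-screening example above is often enough to communicate what a system considers; a minimal matplotlib sketch:

```python
# A bar chart of illustrative attribute weights for a hypothetical
# loan-screening model. The attributes and weights are invented examples.
import matplotlib.pyplot as plt

attributes = ["income", "credit_history", "education_level"]
weights = [0.40, 0.40, 0.20]

fig, ax = plt.subplots()
ax.barh(attributes, weights)
ax.set_xlabel("Relative weight in the model's decisions")
ax.set_title("What the loan-screening algorithm considers (illustrative)")
plt.tight_layout()
plt.show()
```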
- Many managers are concerned that implementing transparency means reducing the sophistication of their algorithmic systems, thereby decreasing their efficiency and accuracy. For example, in the context of algorithmic hiring, some managers erroneously believe that making a resume screener more transparent means it will perform worse.
+ - Some (but not all) algorithms have built-in **intrinsic transparency mechanisms** _(also called intrinsic explainability mechanisms)_ that simply need to be surfaced to offer transparency into how they work.

- However, recent research has challenged this idea <a href="https://dl.acm.org/doi/pdf/10.1145/3531146.3533090" target="_blank">(Bell et al. 2022)</a>. First, as described previously, there are a number of transparency tools that can be used to "open up" even the most complex, sophisticated black-box systems. Second, there is a growing number of case studies showing that under many conditions, simpler, more transparent algorithmic systems can perform the same as (or even better than) complex systems. **Overall, implementing transparency does not necessarily mean sacrificing efficiency --- it's not that simple!**
+ For example, two common algorithm types are **decision trees** and **rules-lists.** For the former, it is possible to print out and display the tree diagram for the user; for the latter, one can list out all the rules used to make a decision. Another commonly used algorithm type is the **linear model**, which can produce a formula that explains its decision-making. These formulas are sometimes very easy to understand.
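
A minimal scikit-learn sketch of surfacing these intrinsic mechanisms, assuming a toy dataset as a stand-in for real decision data:

```python
# A sketch of surfacing intrinsic transparency mechanisms with scikit-learn.
# The breast-cancer dataset is only a convenient stand-in for real decision data.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Decision tree: the fitted tree itself is the explanation, and can be printed.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Linear model: the coefficients form a human-readable decision formula.
linear = LogisticRegression(max_iter=5000).fit(X, y)
for name, coef in zip(X.columns, linear.coef_[0]):
    print(f"{name}: {coef:+.3f}")
```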

- <br>
+ Unfortunately, many highly sophisticated algorithms like **random forests** and **neural networks** do **not** have intrinsic transparency mechanisms. Importantly, the practitioners who designed the algorithm will know whether intrinsic transparency mechanisms are available.

- ### Claim: Transparency Means Open-Sourcing Algorithms
+ > **When is this useful?** To answer the question "how does the system work, to the extent that given a new input to the algorithm, I could anticipate the output with a high degree of accuracy?" Generally for providing a deeper understanding of how the underlying algorithm in the system functions.
- **This is not true.**
+ > _Decision trees^, rules-lists^, linear models^, random forests, and neural networks_ are all examples of types of AI algorithms. If you are non-technical, it is *not* important that you understand what they do or how they work -- your technical team will! For now, just know that the algorithms marked with the caret _^_ are _more transparent_ than others.
- Algorithmic transparency is **not** the same as "open-sourcing" technologies. While providing the source code for an algorithm does offer some transparency into how it works, *it is not necessary for transparency.*
+ - The **attribute importance** _(also called feature importance or factor importance)_ of an algorithm is a list that shows all the different attributes (sometimes called features or factors) considered by an algorithm, and their relative weights. It offers **global transparency** for an algorithm.

- In fact, in many cases, open-sourcing is an insufficient or misguided attempt at transparency for two reasons: first, the source code is not useful to laypeople or other non-technical stakeholders of the algorithm in helping them understand how it works. Second, the source code for an algorithm is only one component of a much larger technical ecosystem. Without the data used by the algorithm or the technical infrastructure that supports it, the source code may be completely useless.
+ For example, consider an algorithm that makes predictions on whether or not an individual should receive a loan. The attribute importance could be made up of three attributes: an individual's income, their credit history, and their education level. The weights for these attributes in the algorithm's decision-making might be 40% income, 40% credit history, and 20% education level.

- Note that in some situations open-sourcing _can_ be a component of transparency, but it is by no means required.
+ There are three benefits of attribute importance:
+ - Attribute importance can be created for any algorithm, no matter how complicated it is.
+ - There are many interesting ways to display attribute importance to a human user through data visualizations.
+ - From a technical perspective, it is easy to extract the attribute importance from an algorithm. This makes it _low cost_.

- <br>
+ > **When is this useful?** Provides a global understanding of how an algorithmic system is processing data, at a slightly deeper level than what is often found in transparency labels. Useful for __learning and support__ and, to some extent, __recourse__. Useful to practitioners for checking the __validity__ of an algorithmic system.
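
As a sketch of how low-cost this extraction can be, scikit-learn's permutation importance computes a global attribute importance for any fitted model; the dataset here is a toy stand-in:

```python
# A sketch of global attribute importance. Permutation importance works for any
# fitted model, no matter how complex; the dataset is a toy stand-in.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
ranked = sorted(zip(X.columns, result.importances_mean),
                key=lambda pair: pair[1], reverse=True)
for name, score in ranked[:5]:  # the five most important attributes
    print(f"{name}: {score:.3f}")
```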
- ### Claim: Transparency Means Losing Intellectual Property (IP)
+ - The **attribute influence** _(also called feature influence or SHAP factors)_ of an algorithm is similar to the attribute importance, except that it shows how the attributes of a single instance or individual impacted the algorithm's output. In contrast to attribute importance, the influence shows the **local transparency** for a particular case. Like attribute importance, the attribute influence can be created for _any_ algorithm.

- **Losing IP is not a guaranteed part of transparency.**
+ For example, consider again an algorithm that makes predictions on whether or not an individual should receive a loan. If an individual is rejected for a loan, the attribute influence could tell them: _your high income and education level influenced the algorithm's loan decision positively, but ultimately your low credit score caused the algorithm to reject your application._
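
A minimal sketch of computing a local attribute influence: for a linear model, each attribute's SHAP-style contribution reduces to its coefficient times how far the individual's value sits from the average, which we compute directly here to avoid extra dependencies (the full SHAP method, implemented in the `shap` library, generalizes this to arbitrary models). The dataset and model are toy stand-ins:

```python
# A sketch of local attribute influence for one individual. For a linear model,
# each attribute's SHAP-style contribution on the model's log-odds output is
# coefficient * (individual's value - average value), computed directly here.
# The dataset and model are toy stand-ins for a real loan-decision system.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_std = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=5000).fit(X_std, y)

i = 0  # one individual, e.g. a single loan applicant
influence = model.coef_[0] * (X_std[i] - X_std.mean(axis=0))

# Attributes pushing this individual's outcome down vs. up.
ranked = sorted(zip(X.columns, influence), key=lambda pair: pair[1])
print("Most negative influences:", ranked[:3])
print("Most positive influences:", ranked[-3:])
```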

- Protecting IP is often a major concern for small startups or companies whose main competitive advantage comes from the IP of their algorithms.
+ > **Levels of transparency** <br><br> **Local transparency** refers to understanding how an algorithmic system makes a decision about a single case or instance, and **global transparency** refers to understanding how a system works overall (a bird's-eye view). For example, for an algorithmic system that predicts whether or not an applicant is accepted for a loan, global transparency would describe how the entire system works, and local transparency would describe the system's prediction for a single loan applicant.
- It would be untruthful to claim that transparency is not, at least in some ways, in conflict with protecting IP. However, we forward the claim that _it is possible_ to implement elements of transparency without significantly jeopardizing an organization's IP. In some ways, there is a balancing act to perform between taking advantage of the benefits of algorithmic transparency and protecting IP. We offer the following ideas for transparency when IP protection is also a consideration:
+ <!---Generally, when attribute influence is implemented as a transparency measure for an algorithm, individuals are shown the top 3 to 5 attributes that are influencing the algorithm's output. Importantly, since attribute influence offers local transparency, it is extremely useful in offering **recourse** to affected persons of an algorithm. It is also very useful for human-in-the-loop users who need transparency for the purposes of decision support.--->

- - *Creating transparency for different pieces or elements of your algorithm, where the sum of those pieces is insufficient to fully reconstruct your IP.* It is not uncommon for an algorithm to be made up of multiple layers of decision-making, or to use tens (or hundreds) of attributes, factors, and inputs to make decisions. Perhaps it is possible to implement transparency surrounding specific layers of the algorithm, or focusing on just 3-5 factors (or broad categories of factors).
- - *Using differential privacy.* Differential privacy is a system for publicly sharing information about a dataset by describing the patterns of groups within the dataset, while withholding the true information contained in the dataset.

- > **Differential privacy** is a large and sometimes complex topic that is beyond the scope of this course, but a primer can be found <a href="https://privacytools.seas.harvard.edu/differential-privacy" target="_blank">here</a>.
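
A minimal sketch of the core idea, assuming hypothetical data: the Laplace mechanism, a textbook building block of differential privacy, releases an aggregate statistic with noise calibrated so that no single person's record is revealed:

```python
# A sketch of the Laplace mechanism: publish an aggregate statistic with
# calibrated noise instead of the underlying sensitive records.
# The income data here is entirely hypothetical.
import numpy as np

rng = np.random.default_rng(0)
incomes = rng.normal(55_000, 12_000, size=1_000)  # hypothetical sensitive data

def dp_mean(values, lower, upper, epsilon):
    """Release the mean with epsilon-differential privacy via Laplace noise."""
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(values)  # max effect of any one person
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

print(dp_mean(incomes, lower=0, upper=200_000, epsilon=1.0))
```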
- <br>

- ### Claim: Transparency Means Sacrificing Privacy

- **This is rarely true.**

- Privacy in the context of algorithmic systems is generally concerned with protecting sensitive data that the organization has collected. We want to make it clear that **it is a complete myth** that one has to sacrifice the privacy of their data to offer transparency. It is _never_ necessary to expose sensitive or proprietary data to ensure algorithmic transparency. In fact, this is exactly the objective of *differential privacy.*

- <br>

- ### Claim: Transparency Means Strategic Manipulation

- Unfortunately, even without algorithmic transparency, strategic manipulation of algorithms by users is widespread. For example, in the hiring space, some candidates add invisible keywords in white text to the bottom of their resumes to trick algorithms into scoring their application higher. Another, more prevalent example is Search Engine Optimization (SEO), the practice of trying to "game" a webpage to the top of the search results.

- In light of this, transparency may actually be an end-run around strategic manipulation. **There is substantial data showing that algorithmic transparency increases the trust of users.** It's not unreasonable to believe that if users trust a system, a good-faith dialogue can be opened about preventing strategic manipulation and abuse of those systems <a href="https://www.sciencedirect.com/science/article/abs/pii/S1071581920301531?casa_token=YmDerYJyXY8AAAAA:fpTQ9LnMnxvY9-iC3O8QE31KzRj1zgtSTBXJ-W7fta7AL1oc8d-3ulkl4Nlw-1cnxQ27Nmg7" target="_blank">(Shin 2021)</a>.

- <br>

- ## When algorithmic tools are or will be procured

- Many organizations, especially governmental or intergovernmental organizations, choose to procure their algorithmic tools instead of building them in-house. This poses a unique challenge, because it may be beyond the agency or control of your organization to implement transparency for these tools. **For this reason, _agency to implement transparency_ is an important consideration to weigh when choosing to procure algorithmic tools.**

- In light of this, we have drafted this list of probing questions that can be asked of organizations providing algorithmic tools, to help open the conversation around transparency:

- - What are your values around algorithmic transparency?
- - What considerations to transparency have you implemented in the tool being considered?
- - If we require additional transparency considerations, are you able to implement them?
- - What transparency is available to the organization _selling_ the tool that is not available to the _procuring_ organization (and ultimately those who are impacted by the tool)?
- - How much transparency can we pass downstream to those impacted by the algorithmic tool?

- > **Case Study: Model Cards for AI Transparency** <br><br> _Salesforce_ is a Fortune 500 company that is well known for selling software tools to businesses and non-profits. Notably, in 2020, they began producing <a href="https://blog.salesforceairesearch.com/model-cards-for-ai-model-transparency/" target="_blank">_model cards_</a> for their algorithmic tools. Model cards provide high-level details about an underlying algorithm like _when and where it was created, its intended primary and secondary uses, what factors the algorithm considers, and against which metrics the performance of the tool was evaluated_.
+ > **When is this useful?** To provide local transparency about a single instance, generally an affected individual. The attribute influence is one of the best ways to answer the question, "why did the algorithmic system have this output for _this specific person_?" Extremely useful for __recourse__ for affected individuals. Very useful for __learning and support__ for humans-in-the-loop.