From 9373600359905a55a7f3cfd72d8a07c4d7457301 Mon Sep 17 00:00:00 2001 From: harvard-edge Date: Tue, 21 May 2024 01:22:00 +0000 Subject: [PATCH] Push dev branch build --- docs/contents/contributors.html | 4 +-- docs/contents/frameworks/frameworks.html | 8 +++--- docs/contents/robust_ai/robust_ai.html | 32 ++++++++++++------------ docs/search.json | 8 +++--- 4 files changed, 26 insertions(+), 26 deletions(-) diff --git a/docs/contents/contributors.html b/docs/contents/contributors.html index 71bb24a7..995d6895 100644 --- a/docs/contents/contributors.html +++ b/docs/contents/contributors.html @@ -575,10 +575,10 @@

Contributors

-Matthew Stewart
Matthew Stewart

+Jared Ping
Jared Ping

-Jared Ping
Jared Ping

+Matthew Stewart
Matthew Stewart

eliasab16
eliasab16

diff --git a/docs/contents/frameworks/frameworks.html b/docs/contents/frameworks/frameworks.html index a50c1196..0afcd51e 100644 --- a/docs/contents/frameworks/frameworks.html +++ b/docs/contents/frameworks/frameworks.html @@ -1430,7 +1430,7 @@

6.9 Choosing the Right Framework

Choosing the right machine learning framework for a given application requires carefully evaluating model, hardware, and software considerations. By analyzing these three aspects, ML engineers can select the optimal framework and customize it as needed for efficient and performant on-device ML applications. The goal is to balance model complexity, hardware limitations, and software integration to design a tailored ML pipeline for embedded and edge devices.

-
+
@@ -1446,7 +1446,7 @@

6.9.2 Software

-
+
@@ -1460,7 +1460,7 @@

6.9.3 Hardware

-
+
@@ -1507,7 +1507,7 @@

6.10.1 Decomposition

Currently, the ML system stack consists of four abstractions as shown in Figure 6.10, namely (1) computational graphs, (2) tensor programs, (3) libraries and runtimes, and (4) hardware primitives.
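To make the top of this stack more tangible, the short sketch below traces a toy PyTorch module into its computational-graph form. This is only an illustrative aside: torch.fx is used here as one convenient way to expose the graph abstraction, and TinyModel is a made-up placeholder module rather than anything from the chapter.

```python
import torch
import torch.nn as nn
import torch.fx as fx

class TinyModel(nn.Module):
    """A placeholder two-op model: a linear layer followed by ReLU."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

# Tracing lowers the Python-level model into a computational graph of nodes
# (placeholders, module calls, function calls) that compilers and runtimes
# can then map onto tensor programs, libraries, and hardware primitives.
traced = fx.symbolic_trace(TinyModel())
print(traced.graph)
```

Lower layers of the stack would then map each graph node onto tensor programs, vendor libraries or runtimes, and finally hardware primitives.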

-
+
diff --git a/docs/contents/robust_ai/robust_ai.html b/docs/contents/robust_ai/robust_ai.html index eff8b884..f1471acd 100644 --- a/docs/contents/robust_ai/robust_ai.html +++ b/docs/contents/robust_ai/robust_ai.html @@ -1317,7 +1317,7 @@

Impact on ML Systems

-Figure 17.21: Graffiti on a stop sign tricked a self-driving car into thinking it was a 45 mph speed limit sign (Source: Eykholt) +Figure 17.21: Graffiti on a stop sign tricked a self-driving car into thinking it was a 45 mph speed limit sign (Credit: Eykholt)
@@ -1364,7 +1364,7 @@

Definition and Characteristics

-Figure 17.22: NightShade’s poisoning effects on Stable Diffusion (Source: TOMÉ) +Figure 17.22: NightShade’s poisoning effects on Stable Diffusion (Credit: TOMÉ)
@@ -1409,7 +1409,7 @@

Mechanisms of

-Figure 17.23: Garbage In – Garbage Out (Source: Information Matters) +Figure 17.23: Garbage In – Garbage Out (Credit: Information Matters)

@@ -1424,7 +1424,7 @@

Mechanisms of

-Figure 17.24: Data Poisoning Attack (Source: Sikandar) +Figure 17.24: Data Poisoning Attack (Credit: Sikandar)

@@ -1459,7 +1459,7 @@
Case Study 2
-Figure 17.25: Samples of dirty-label poison data regarding mismatched text/image pairs (Source: Shan) +Figure 17.25: Samples of dirty-label poison data regarding mismatched text/image pairs (Credit: Shan)
@@ -1474,7 +1474,7 @@
Case Study 2
-Figure 17.26: Data poisoning (Source: Shan et al. (2023)) +Figure 17.26: Data poisoning (Credit: Shan et al. (2023))
Shan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y Zhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828. @@ -1506,7 +1506,7 @@

Definition and Characteristics

-Figure 17.27: The curly brackets enclose the distribution shift between the environments. Here, z stands for the spurious feature, and y stands for label class (Source: Xin) +Figure 17.27: The curly brackets enclose the distribution shift between the environments. Here, z stands for the spurious feature, and y stands for label class (Credit: Xin)
@@ -1521,7 +1521,7 @@

Definition and Characteristics
-Figure 17.28: Concept drift refers to a change in data patterns and relationships over time (Source: Evidently AI) +Figure 17.28: Concept drift refers to a change in data patterns and relationships over time (Credit: Evidently AI)
@@ -1540,7 +1540,7 @@

Mechanisms of Distribution Shifts
-Figure 17.29: Temporal evolution (Source: Białek) +Figure 17.29: Temporal evolution (Credit: Białek)
@@ -1615,7 +1615,7 @@

-Figure 17.30: Malicious data injection (Source: Li) +Figure 17.30: Malicious data injection (Credit: Li)
@@ -1627,7 +1627,7 @@
-Figure 17.31: Autoencoder (Source: Dertat) +Figure 17.31: Autoencoder (Credit: Dertat)
@@ -1685,7 +1685,7 @@
-Figure 17.33: Transfer learning (Source: Bhavsar) +Figure 17.33: Transfer learning (Credit: Bhavsar)
@@ -1767,7 +1767,7 @@

Detection and Mitigation
-Figure 17.35: Automated regression testing (Source: UTOR) +Figure 17.35: Automated regression testing (Credit: UTOR)
@@ -1785,7 +1785,7 @@

Detection and Mitigation
-Figure 17.36: Continuous Integration/Continuous Deployment (CI/CD) procedure (Source: geeksforgeeks) +Figure 17.36: Continuous Integration/Continuous Deployment (CI/CD) procedure (Credit: geeksforgeeks)
@@ -1875,7 +1875,7 @@

Methods

-Figure 17.39: Radiation test setup for semiconductor components (Lee et al. 2022) (Source: JD Instrument) +Figure 17.39: Radiation test setup for semiconductor components (Lee et al. 2022) (Credit: JD Instrument)
Lee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang, and Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed Signal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear Explosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970. @@ -1912,7 +1912,7 @@
Limitations
-Figure 17.40: Comparison of techniques at layers of abstraction (Source: MAVFI) +Figure 17.40: Comparison of techniques at layers of abstraction (Credit: MAVFI)
diff --git a/docs/search.json b/docs/search.json index a7b2f1b6..e2030683 100644 --- a/docs/search.json +++ b/docs/search.json @@ -37,7 +37,7 @@ "href": "contents/contributors.html", "title": "Contributors", "section": "", - "text": "We extend our sincere thanks to the diverse group of individuals who have generously contributed their expertise, insights, and time to enhance both the content and codebase of this project. Below you will find a list of all contributors. If you would like to contribute to this project, please see our GitHub page.\n\n\n\n\n\n\n\n\nVijay Janapa Reddi\n\n\nIkechukwu Uchendu\n\n\nnaeemkh\n\n\nDouwe den Blanken\n\n\nShanzeh Batool\n\n\n\n\nMatthew Stewart\n\n\nJared Ping\n\n\neliasab16\n\n\nishapira\n\n\nMaximilian Lam\n\n\n\n\nMarcelo Rovai\n\n\nJayson Lin\n\n\nSophia Cho\n\n\nJeffrey Ma\n\n\nKorneel Van den Berghe\n\n\n\n\nZishen\n\n\nAlex Rodriguez\n\n\nSrivatsan Krishnan\n\n\nAndrea Murillo\n\n\nKleinbard\n\n\n\n\nAbdulrahman Mahmoud\n\n\narnaumarin\n\n\nAghyad Deeb\n\n\nAghyad Deeb\n\n\nDivya\n\n\n\n\nEmil Njor\n\n\nELSuitorHarvard\n\n\nJared Ni\n\n\nMichael Schnebly\n\n\noishib\n\n\n\n\nHenry Bae\n\n\nMark Mazumder\n\n\nJae-Won Chung\n\n\nYU SHUN, HSIAO\n\n\nPong Trairatvorakul\n\n\n\n\nShvetank Prakash\n\n\neurashin\n\n\nColby Banbury\n\n\nAditi Raju\n\n\nMarco Zennaro\n\n\n\n\nAndrew Bass\n\n\nJennifer Zhou\n\n\nBruno Scaglione\n\n\nAlex Oesterling\n\n\nGauri Jain\n\n\n\n\nEric D\n\n\nabigailswallow\n\n\nSonia Murthy\n\n\nThe Random DIY\n\n\nVijay Edupuganti\n\n\n\n\nBatur Arslan\n\n\nCostin-Andrei Oncescu\n\n\nYu-Shun Hsiao\n\n\nyanjingl\n\n\nShreya Johri\n\n\n\n\nJessica Quaye\n\n\nCurren Iyer\n\n\nAnnie Laurie Cook\n\n\nhappyappledog\n\n\nJason Yik\n\n\n\n\nEmeka Ezike", + "text": "We extend our sincere thanks to the diverse group of individuals who have generously contributed their expertise, insights, and time to enhance both the content and codebase of this project. Below you will find a list of all contributors. 
If you would like to contribute to this project, please see our GitHub page.\n\n\n\n\n\n\n\n\nVijay Janapa Reddi\n\n\nIkechukwu Uchendu\n\n\nnaeemkh\n\n\nDouwe den Blanken\n\n\nShanzeh Batool\n\n\n\n\nJared Ping\n\n\nMatthew Stewart\n\n\neliasab16\n\n\nishapira\n\n\nMaximilian Lam\n\n\n\n\nMarcelo Rovai\n\n\nJayson Lin\n\n\nSophia Cho\n\n\nJeffrey Ma\n\n\nKorneel Van den Berghe\n\n\n\n\nZishen\n\n\nAlex Rodriguez\n\n\nSrivatsan Krishnan\n\n\nAndrea Murillo\n\n\nKleinbard\n\n\n\n\nAbdulrahman Mahmoud\n\n\narnaumarin\n\n\nAghyad Deeb\n\n\nAghyad Deeb\n\n\nDivya\n\n\n\n\nEmil Njor\n\n\nELSuitorHarvard\n\n\nJared Ni\n\n\nMichael Schnebly\n\n\noishib\n\n\n\n\nHenry Bae\n\n\nMark Mazumder\n\n\nJae-Won Chung\n\n\nYU SHUN, HSIAO\n\n\nPong Trairatvorakul\n\n\n\n\nShvetank Prakash\n\n\neurashin\n\n\nColby Banbury\n\n\nAditi Raju\n\n\nMarco Zennaro\n\n\n\n\nAndrew Bass\n\n\nJennifer Zhou\n\n\nBruno Scaglione\n\n\nAlex Oesterling\n\n\nGauri Jain\n\n\n\n\nEric D\n\n\nabigailswallow\n\n\nSonia Murthy\n\n\nThe Random DIY\n\n\nVijay Edupuganti\n\n\n\n\nBatur Arslan\n\n\nCostin-Andrei Oncescu\n\n\nYu-Shun Hsiao\n\n\nyanjingl\n\n\nShreya Johri\n\n\n\n\nJessica Quaye\n\n\nCurren Iyer\n\n\nAnnie Laurie Cook\n\n\nhappyappledog\n\n\nJason Yik\n\n\n\n\nEmeka Ezike", "crumbs": [ "FRONT MATTER", "Contributors" @@ -1912,7 +1912,7 @@ "href": "contents/robust_ai/robust_ai.html#ml-model-robustness", "title": "17  Robust AI", "section": "17.4 ML Model Robustness", - "text": "17.4 ML Model Robustness\n\n17.4.1 Adversarial Attacks\n\nDefinition and Characteristics\nAdversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) (Parrish et al. 2023). By adding slight perturbations to input data, adversaries can \"hack\" a model’s pattern recognition and deceive it. These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction, as shown in Figure 17.18.\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\n\n\n\n\n\nFigure 17.18: A small adversarial noise added to the original image can make the neural network classify the image as a Guacamole instead of an Egyptian cat (Credit: Sutanto)\n\n\n\nOne can generate prompts that lead to unsafe images in text-to-image models like DALLE (Ramesh et al. 2021) or Stable Diffusion (Rombach et al. 2022). For example, by altering the pixel values of an image, attackers can deceive a facial recognition system into identifying a face as a different person.\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot Text-to-Image Generation.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent Diffusion Models.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. 
https://doi.org/10.1109/cvpr52688.2022.01042.\nAdversarial attacks exploit the way ML models learn and make decisions during inference. These models work on the principle of recognizing patterns in data. An adversary crafts special inputs with perturbations to mislead the model’s pattern recognition---essentially ‘hacking’ the model’s perceptions.\nAdversarial attacks fall under different scenarios:\n\nWhitebox Attacks: The attacker fully knows the target model’s internal workings, including the training data, parameters, and architecture (Ye and Hamidi 2021). This comprehensive access creates favorable conditions for attackers to exploit the model’s vulnerabilities. The attacker can use specific and subtle weaknesses to craft effective adversarial examples.\nBlackbox Attacks: In contrast to white-box attacks, black-box attacks involve the attacker having little to no knowledge of the target model (Guo et al. 2019). To carry out the attack, the adversarial actor must carefully observe the model’s output behavior.\nGreybox Attacks: These fall between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model’s internal design (Xu et al. 2021). For example, the attacker could have knowledge about training data but not the architecture or parameters. In the real world, practical attacks fall under black black-box box grey-boxes.\n\n\nYe, Linfeng, and Shayan Mohajer Hamidi. 2021. “Thundernna: A White Box Adversarial Attack.” arXiv Preprint arXiv:2111.12305.\n\nGuo, Chuan, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. 2019. “Simple Black-Box Adversarial Attacks.” In International Conference on Machine Learning, 2484–93. PMLR.\n\nXu, Ying, Xu Zhong, Antonio Jimeno Yepes, and Jey Han Lau. 2021. “Grey-Box Adversarial Attack and Defence for Sentiment Classification.” arXiv Preprint arXiv:2103.11576.\nThe landscape of machine learning models is complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models. Consequently, various adversarial attack methods have emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models:\n\nGenerative Adversarial Networks (GANs) are deep learning models that consist of two networks competing against each other: a generator and a discriminator (Goodfellow et al. 2020). The generator tries to synthesize realistic data while the discriminator evaluates whether they are real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that the target model misclassifies. These GAN-generated images can then attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape.\nTransfer Learning Adversarial Attacks exploit the knowledge transferred from a pre-trained model to a target model, creating adversarial examples that can deceive both models. 
These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer responsible for making the final classifications). Referred to as \"headless attacks,\" these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while being oblivious to the label space or training data. The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used (Abdelkader et al. 2020).\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020. “Headless Horseman: Adversarial Attacks on Transfer Learning Models.” In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\n\nMechanisms of Adversarial Attacks\n\n\n\n\n\n\nFigure 17.19: Gradient-Based Attacks (Credit: Ivezic)\n\n\n\nGradient-based Attacks\nOne prominent category of adversarial attacks is gradient-based attacks. These attacks leverage the gradients of the ML model’s loss function to craft adversarial examples. The Fast Gradient Sign Method (FGSM) is a well-known technique in this category. FGSM perturbs the input data by adding small noise in the gradient direction, aiming to maximize the model’s prediction error. FGSM can quickly generate adversarial examples, as shown in Figure 17.19, by taking a single step in the gradient direction.\nAnother variant, the Projected Gradient Descent (PGD) attack, extends FGSM by iteratively applying the gradient update step, allowing for more refined and powerful adversarial examples. The Jacobian-based Saliency Map Attack (JSMA) is another gradient-based approach that identifies the most influential input features and perturbs them to create adversarial examples.\nOptimization-based Attacks\nThese attacks formulate the generation of adversarial examples as an optimization problem. The Carlini and Wagner (C&W) attack is a prominent example in this category. It aims to find the smallest perturbation that can cause misclassification while maintaining the perceptual similarity to the original input. The C&W attack employs an iterative optimization process to minimize the perturbation while maximizing the model’s prediction error.\nAnother optimization-based approach is the Elastic Net Attack to DNNs (EAD), which incorporates elastic net regularization to generate adversarial examples with sparse perturbations.\nTransfer-based Attacks\nTransfer-based attacks exploit the transferability property of adversarial examples. Transferability refers to the phenomenon where adversarial examples crafted for one ML model can often fool other models, even if they have different architectures or were trained on different datasets. This enables attackers to generate adversarial examples using a surrogate model and then transfer them to the target model without requiring direct access to its parameters or gradients. 
Transfer-based attacks highlight the generalization of adversarial vulnerabilities across different models and the potential for black-box attacks.\nPhysical-world Attacks\nPhysical-world attacks bring adversarial examples into the realm of real-world scenarios. These attacks involve creating physical objects or manipulations that can deceive ML models when captured by sensors or cameras. Adversarial patches, for example, are small, carefully designed patches that can be placed on objects to fool object detection or classification models. When attached to real-world objects, these patches can cause models to misclassify or fail to detect the objects accurately. Adversarial objects, such as 3D-printed sculptures or modified road signs, can also be crafted to deceive ML systems in physical environments.\nSummary\nTable 17.2 a concise overview of the different categories of adversarial attacks, including gradient-based attacks (FGSM, PGD, JSMA), optimization-based attacks (C&W, EAD), transfer-based attacks, and physical-world attacks (adversarial patches and objects). Each attack is briefly described, highlighting its key characteristics and mechanisms.\n\n\n\nTable 17.2: Different attack types on ML models.\n\n\n\n\n\n\n\n\n\n\nAttack Category\nAttack Name\nDescription\n\n\n\n\nGradient-based\nFast Gradient Sign Method (FGSM)\nPerturbs input data by adding small noise in the gradient direction to maximize prediction error.\n\n\n\nProjected Gradient Descent (PGD)\nExtends FGSM by iteratively applying the gradient update step for more refined adversarial examples.\n\n\n\nJacobian-based Saliency Map Attack (JSMA)\nIdentifies influential input features and perturbs them to create adversarial examples.\n\n\nOptimization-based\nCarlini and Wagner (C&W) Attack\nFinds the smallest perturbation that causes misclassification while maintaining perceptual similarity.\n\n\n\nElastic Net Attack to DNNs (EAD)\nIncorporates elastic net regularization to generate adversarial examples with sparse perturbations.\n\n\nTransfer-based\nTransferability-based Attacks\nExploits the transferability of adversarial examples across different models, enabling black-box attacks.\n\n\nPhysical-world\nAdversarial Patches\nSmall, carefully designed patches placed on objects to fool object detection or classification models.\n\n\n\nAdversarial Objects\nPhysical objects (e.g., 3D-printed sculptures, modified road signs) crafted to deceive ML systems in real-world scenarios.\n\n\n\n\n\n\nThe mechanisms of adversarial attacks reveal the intricate interplay between the ML model’s decision boundaries, the input data, and the attacker’s objectives. By carefully manipulating the input data, attackers can exploit the model’s sensitivities and blind spots, leading to incorrect predictions. The success of adversarial attacks highlights the need for a deeper understanding of ML models’ robustness and generalization properties.\nDefending against adversarial attacks requires a multifaceted approach. Adversarial training is one common defense strategy in which models are trained on adversarial examples to improve robustness. Exposing the model to adversarial examples during training teaches it to classify them correctly and become more resilient to attacks. Defensive distillation, input preprocessing, and ensemble methods are other techniques that can help mitigate the impact of adversarial attacks.\nAs adversarial machine learning evolves, researchers explore new attack mechanisms and develop more sophisticated defenses. 
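To ground the gradient-based mechanism and the adversarial-training defense described above, the following is a minimal sketch assuming a PyTorch classifier whose inputs are scaled to [0, 1]. The epsilon value and the equal weighting of clean and adversarial losses are illustrative choices rather than recommended settings.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Craft an FGSM adversarial example: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Step in the direction that increases the loss, then clamp to the valid input range.
    x_adv = (x_adv + epsilon * x_adv.grad.sign()).clamp(0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a 50/50 mix of clean and FGSM-perturbed inputs (a common defense)."""
    x_adv = fgsm_attack(model, x, y, epsilon)
    optimizer.zero_grad()  # clear gradients accumulated while crafting x_adv
    loss = 0.5 * F.cross_entropy(model(x), y) + 0.5 * F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A PGD-style attack follows the same pattern, but repeats the signed-gradient step several times and projects the result back into the allowed perturbation range after each step.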
The arms race between attackers and defenders drives the need for constant innovation and vigilance in securing ML systems against adversarial threats. Understanding the mechanisms of adversarial attacks is crucial for developing robust and reliable ML models that can withstand the ever-evolving landscape of adversarial examples.\n\n\nImpact on ML Systems\nAdversarial attacks on machine learning systems have emerged as a significant concern in recent years, highlighting the potential vulnerabilities and risks associated with the widespread adoption of ML technologies. These attacks involve carefully crafted perturbations to input data that can deceive or mislead ML models, leading to incorrect predictions or misclassifications, as shown in Figure 17.20. The impact of adversarial attacks on ML systems is far-reaching and can have serious consequences in various domains.\nOne striking example of the impact of adversarial attacks was demonstrated by researchers in 2017. They experimented with small black and white stickers on stop signs (Eykholt et al. 2017). To the human eye, these stickers did not obscure the sign or prevent its interpretability. However, when images of the sticker-modified stop signs were fed into standard traffic sign classification ML models, a shocking result emerged. The models misclassified the stop signs as speed limit signs over 85% of the time.\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017. “Robust Physical-World Attacks on Deep Learning Models.” ArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\nThis demonstration shed light on the alarming potential of simple adversarial stickers to trick ML systems into misreading critical road signs. The implications of such attacks in the real world are significant, particularly in the context of autonomous vehicles. If deployed on actual roads, these adversarial stickers could cause self-driving cars to misinterpret stop signs as speed limits, leading to dangerous situations, as shown in Figure 17.21. Researchers warned that this could result in rolling stops or unintended acceleration into intersections, endangering public safety.\n\n\n\n\n\n\nFigure 17.20: Adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet (Credit: Goodfellow)\n\n\n\n\n\n\n\n\n\nFigure 17.21: Graffiti on a stop sign tricked a self-driving car into thinking it was a 45 mph speed limit sign (Source: Eykholt)\n\n\n\nThe case study of the adversarial stickers on stop signs provides a concrete illustration of how adversarial examples exploit how ML models recognize patterns. By subtly manipulating the input data in ways that are invisible to humans, attackers can induce incorrect predictions and create serious risks, especially in safety-critical applications like autonomous vehicles. The attack’s simplicity highlights the vulnerability of ML models to even minor changes in the input, emphasizing the need for robust defenses against such threats.\nThe impact of adversarial attacks extends beyond the degradation of model performance. These attacks raise significant security and safety concerns, particularly in domains where ML models are relied upon for critical decision-making. In healthcare applications, adversarial attacks on medical imaging models could lead to misdiagnosis or incorrect treatment recommendations, jeopardizing patient well-being (M.-J. Tsai, Lin, and Lee 2023). 
In financial systems, adversarial attacks could enable fraud or manipulation of trading algorithms, resulting in substantial economic losses.\n\nTsai, Min-Jen, Ping-Yi Lin, and Ming-En Lee. 2023. “Adversarial Attacks on Medical Image Classification.” Cancers 15 (17): 4228. https://doi.org/10.3390/cancers15174228.\n\nFursov, Ivan, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, and Evgeny Burnaev. 2021. “Adversarial Attacks on Deep Models for Financial Transaction Records.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &Amp; Data Mining, 2868–78. ACM. https://doi.org/10.1145/3447548.3467145.\nMoreover, adversarial vulnerabilities undermine the trustworthiness and interpretability of ML models. If carefully crafted perturbations can easily fool models, confidence in their predictions and decisions erodes. Adversarial examples expose the models’ reliance on superficial patterns and the inability to capture the true underlying concepts, challenging the reliability of ML systems (Fursov et al. 2021).\nDefending against adversarial attacks often requires additional computational resources and can impact the overall system performance. Techniques like adversarial training, where models are trained on adversarial examples to improve robustness, can significantly increase training time and computational requirements (Bai et al. 2021). Runtime detection and mitigation mechanisms, such as input preprocessing (Addepalli et al. 2020) or prediction consistency checks, introduce latency and affect the real-time performance of ML systems.\n\nBai, Tao, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. 2021. “Recent Advances in Adversarial Training for Adversarial Robustness.” arXiv Preprint arXiv:2102.01356.\n\nAddepalli, Sravanti, B. S. Vivek, Arya Baburaj, Gaurang Sriramanan, and R. Venkatesh Babu. 2020. “Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1020–29. IEEE. https://doi.org/10.1109/cvpr42600.2020.00110.\nThe presence of adversarial vulnerabilities also complicates the deployment and maintenance of ML systems. System designers and operators must consider the potential for adversarial attacks and incorporate appropriate defenses and monitoring mechanisms. Regular updates and retraining of models become necessary to adapt to new adversarial techniques and maintain system security and performance over time.\nThe impact of adversarial attacks on ML systems is significant and multifaceted. These attacks expose ML models’ vulnerabilities, from degrading model performance and raising security and safety concerns to challenging model trustworthiness and interpretability. Developers and researchers must prioritize the development of robust defenses and countermeasures to mitigate the risks posed by adversarial attacks. By addressing these challenges, we can build more secure, reliable, and trustworthy ML systems that can withstand the ever-evolving landscape of adversarial threats.\n\nExercise 17.2 (Adversarial Attacks)  \n\n\n\n\n\nGet ready to become an AI adversary! In this Colab, you’ll become a white-box hacker, learning to craft attacks that deceive image classification models. We’ll focus on the Fast Gradient Sign Method (FGSM), where you’ll weaponize a model’s gradients against it! 
You’ll deliberately distort images with tiny perturbations, observing how they increasingly fool the AI more intensely. This hands-on exercise highlights the importance of building secure AI – a critical skill as AI integrates into cars and healthcare. The Colab directly ties into the Robust AI chapter of your book, moving adversarial attacks from theory into your own hands-on experience.\n\nThink you can outsmart an AI? In this Colab, learn how to trick image classification models with adversarial attacks. We’ll use methods like FGSM to change images and subtly fool the AI. Discover how to design deceptive image patches and witness the surprising vulnerability of these powerful models. This is crucial knowledge for building truly robust AI systems!\n\n\n\n\n\n\n\n17.4.2 Data Poisoning\n\nDefinition and Characteristics\nData poisoning is an attack where the training data is tampered with, leading to a compromised model (Biggio, Nelson, and Laskov 2012), as shown in Figure 17.22. Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in such a way as to skew the model’s learned behavior. This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning tests, and validation data can allow adversaries to boost reported model performance artificially.\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012. “Poisoning Attacks Against Support Vector Machines.” In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\n\n\n\n\n\n\nFigure 17.22: NightShade’s poisoning effects on Stable Diffusion (Source: TOMÉ)\n\n\n\nThe process usually involves the following steps:\n\nInjection: The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal to cursory inspection but have been carefully crafted to disrupt the learning process.\nTraining: The ML model trains on this manipulated dataset and develops skewed understandings of the data patterns.\nDeployment: Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit.\n\nThe impact of data poisoning extends beyond classification errors or accuracy drops. In critical applications like healthcare, such alterations can lead to significant trust and safety issues (Marulli, Marrone, and Verde 2022). Later, we will discuss a few case studies of these issues.\n\nMarulli, Fiammetta, Stefano Marrone, and Laura Verde. 2022. “Sensitivity of Machine Learning Approaches to Fake and Untrusted Data in Healthcare Domain.” Journal of Sensor and Actuator Networks 11 (2): 21. https://doi.org/10.3390/jsan11020021.\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022. “Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy?” Computer 55 (11): 94–99. https://doi.org/10.1109/mc.2022.3190787.\nThere are six main categories of data poisoning (Oprea, Singhal, and Vassilev 2022):\n\nAvailability Attacks: These attacks aim to compromise the overall functionality of a model. They cause it to misclassify most testing samples, rendering the model unusable for practical applications. 
An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one.\nTargeted Attacks: In contrast to availability attacks, targeted attacks aim to compromise a small number of the testing samples. So, the effect is localized to a limited number of classes, while the model maintains the same original level of accuracy for the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model’s classes, making detecting these attacks more challenging.\nBackdoor Attacks: In these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor (a malicious, hidden trigger or pattern) into the training data, such as manipulating certain features in structured data or manipulating a pattern of pixels at a fixed position. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain a malicious pattern, it makes false predictions.\nSubpopulation Attacks: Attackers selectively choose to compromise a subset of the testing samples while maintaining accuracy on the rest of the samples. You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences:\nScope: While targeted attacks target a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a ‘speed bump’ warning sign (with carefully crafted perturbations or patterns), which causes an autonomous car to fail to recognize such a sign and slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model would misclassify a British person’s speech is an example of a subpopulation attack.\nKnowledge: While targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge to be effective.\n\nThe characteristics of data poisoning include:\nSubtle and hard-to-detect manipulations of training data: Data poisoning often involves subtle manipulations of the training data that are carefully crafted to be difficult to detect through casual inspection. Attackers employ sophisticated techniques to ensure that the poisoned samples blend seamlessly with the legitimate data, making them easier to identify with thorough analysis. These manipulations can target specific features or attributes of the data, such as altering numerical values, modifying categorical labels, or introducing carefully designed patterns. The goal is to influence the model’s learning process while evading detection, allowing the poisoned data to subtly corrupt the model’s behavior.\nCan be performed by insiders or external attackers: Data poisoning attacks can be carried out by various actors, including malicious insiders with access to the training data and external attackers who find ways to influence the data collection or preprocessing pipeline. Insiders pose a significant threat because they often have privileged access and knowledge of the system, enabling them to introduce poisoned data without raising suspicions. 
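As a concrete, toy-scale illustration of the availability attacks described above, the sketch below flips a fraction of binary training labels and measures the resulting accuracy drop. It uses synthetic scikit-learn data, and the flip rates are arbitrary; a real attack would be far more selective about which labels to corrupt.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

def flip_labels(labels, rate, rng):
    """Simulate an availability attack: flip a fraction of binary training labels."""
    poisoned = labels.copy()
    idx = rng.choice(len(labels), size=int(rate * len(labels)), replace=False)
    poisoned[idx] = 1 - poisoned[idx]
    return poisoned

for rate in [0.0, 0.1, 0.3]:
    model = LogisticRegression(max_iter=1000).fit(X_tr, flip_labels(y_tr, rate, rng))
    print(f"flip rate {rate:.0%}: test accuracy {model.score(X_te, y_te):.3f}")
```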
On the other hand, external attackers may exploit vulnerabilities in data sourcing, crowdsourcing platforms, or data aggregation processes to inject poisoned samples into the training dataset. This highlights the importance of implementing strong access controls, data governance policies, and monitoring mechanisms to mitigate the risk of insider threats and external attacks.\nExploits vulnerabilities in data collection and preprocessing: Data poisoning attacks often exploit vulnerabilities in the machine learning pipeline’s data collection and preprocessing stages. Attackers carefully design poisoned samples to evade common data validation techniques, ensuring that the manipulated data still falls within acceptable ranges, follows expected distributions, or maintains consistency with other features. This allows the poisoned data to pass through data preprocessing steps without detection. Furthermore, poisoning attacks can take advantage of weaknesses in data preprocessing, such as inadequate data cleaning, insufficient outlier detection, or lack of integrity checks. Attackers may also exploit the lack of robust data provenance and lineage tracking mechanisms to introduce poisoned data without leaving a traceable trail. Addressing these vulnerabilities requires rigorous data validation, anomaly detection, and data provenance tracking techniques to ensure the integrity and trustworthiness of the training data.\nDisrupts the learning process and skews model behavior: Data poisoning attacks are designed to disrupt the learning process of machine learning models and skew their behavior towards the attacker’s objectives. The poisoned data is typically manipulated with specific goals, such as skewing the model’s behavior towards certain classes, introducing backdoors, or degrading overall performance. These manipulations are not random but targeted to achieve the attacker’s desired outcomes. By introducing label inconsistencies, where the manipulated samples have labels that do not align with their true nature, poisoning attacks can confuse the model during training and lead to biased or incorrect predictions. The disruption caused by poisoned data can have far-reaching consequences, as the compromised model may make flawed decisions or exhibit unintended behavior when deployed in real-world applications.\nImpacts model performance, fairness, and trustworthiness: Poisoned data in the training dataset can have severe implications for machine learning models’ performance, fairness, and trustworthiness. Poisoned data can degrade the accuracy and performance of the trained model, leading to increased misclassifications or errors in predictions. This can have significant consequences, especially in critical applications where the model’s outputs inform important decisions. Moreover, poisoning attacks can introduce biases and fairness issues, causing the model to make discriminatory or unfair decisions for certain subgroups or classes. This undermines machine learning systems’ ethical and social responsibilities and can perpetuate or amplify existing biases. Furthermore, poisoned data erodes the trustworthiness and reliability of the entire ML system. The model’s outputs become questionable and potentially harmful, leading to a loss of confidence in the system’s integrity. The impact of poisoned data can propagate throughout the entire ML pipeline, affecting downstream components and decisions that rely on the compromised model. 
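Because these manipulations are designed to be subtle, even a naive screening pass can be a useful first line of defense. The sketch below flags training samples whose labels disagree with a confident out-of-fold prediction; it assumes integer-coded class labels and a feature matrix like the synthetic example above, and it is only a starting point rather than a complete poisoning defense.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

def flag_suspicious_labels(X, y, threshold=0.9):
    """Flag samples whose label disagrees with a confident out-of-fold prediction.

    Assumes integer class labels 0..k-1 so that argmax indices line up with labels.
    """
    proba = cross_val_predict(
        RandomForestClassifier(n_estimators=100, random_state=0),
        X, y, cv=5, method="predict_proba",
    )
    predicted = proba.argmax(axis=1)
    confidence = proba.max(axis=1)
    return np.where((predicted != y) & (confidence > threshold))[0]

# suspect_idx = flag_suspicious_labels(X_tr, y_tr_possibly_poisoned)
# A human reviewer (or a stricter pipeline rule) would then inspect these indices.
```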
Addressing these concerns requires robust data governance, regular model auditing, and ongoing monitoring to detect and mitigate the effects of data poisoning attacks.\n\n\nMechanisms of Data Poisoning\nData poisoning attacks can be carried out through various mechanisms, exploiting different ML pipeline vulnerabilities. These mechanisms allow attackers to manipulate the training data and introduce malicious samples that can compromise the model’s performance, fairness, or integrity. Understanding these mechanisms is crucial for developing effective defenses against data poisoning and ensuring the robustness of ML systems. Data poisoning mechanisms can be broadly categorized based on the attacker’s approach and the stage of the ML pipeline they target. Some common mechanisms include modifying training data labels, altering feature values, injecting carefully crafted malicious samples, exploiting data collection and preprocessing vulnerabilities, manipulating data at the source, poisoning data in online learning scenarios, and collaborating with insiders to manipulate data.\nEach of these mechanisms presents unique challenges and requires different mitigation strategies. For example, detecting label manipulation may involve analyzing the distribution of labels and identifying anomalies (Zhou et al. 2018), while preventing feature manipulation may require secure data preprocessing and anomaly detection techniques (Carta et al. 2020). Defending against insider threats may involve strict access control policies and monitoring of data access patterns. Moreover, the effectiveness of data poisoning attacks often depends on the attacker’s knowledge of the ML system, including the model architecture, training algorithms, and data distribution. Attackers may use adversarial machine learning or data synthesis techniques to craft samples that are more likely to bypass detection and achieve their malicious objectives.\n\nZhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. 2018. “Learning Rich Features for Image Manipulation Detection.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1053–61. IEEE. https://doi.org/10.1109/cvpr.2018.00116.\n\nCarta, Salvatore, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Roberto Saia. 2020. “A Local Feature Engineering Strategy to Improve Network Anomaly Detection.” Future Internet 12 (10): 177. https://doi.org/10.3390/fi12100177.\n\n\n\n\n\n\nFigure 17.23: Garbage In – Garbage Out (Source: Information Matters)\n\n\n\nModifying training data labels: One of the most straightforward mechanisms of data poisoning is modifying the training data labels. In this approach, the attacker selectively changes the labels of a subset of the training samples to mislead the model’s learning process as shown in Figure 17.23. For example, in a binary classification task, the attacker might flip the labels of some positive samples to negative, or vice versa. By introducing such label noise, the attacker aims to degrade the model’s performance or cause it to make incorrect predictions for specific target instances.\nAltering feature values in training data: Another mechanism of data poisoning involves altering the feature values of the training samples without modifying the labels. The attacker carefully crafts the feature values to introduce specific biases or vulnerabilities into the model. 
For instance, in an image classification task, the attacker might add imperceptible perturbations to a subset of images, causing the model to learn a particular pattern or association. This type of poisoning can create backdoors or trojans in the trained model, which specific input patterns can trigger.\nInjecting carefully crafted malicious samples: In this mechanism, the attacker creates malicious samples designed to poison the model. These samples are crafted to have a specific impact on the model’s behavior while blending in with the legitimate training data. The attacker might use techniques such as adversarial perturbations or data synthesis to generate poisoned samples that are difficult to detect. The attacker aims to manipulate the model’s decision boundaries by injecting these malicious samples into the training data or introducing targeted misclassifications.\nExploiting data collection and preprocessing vulnerabilities: Data poisoning attacks can also exploit the data collection and preprocessing pipeline vulnerabilities. If the data collection process is not secure or there are weaknesses in the data preprocessing steps, an attacker can manipulate the data before it reaches the training phase. For example, if data is collected from untrusted sources or issues in data cleaning or aggregation, an attacker can introduce poisoned samples or manipulate the data to their advantage.\nManipulating data at the source (e.g., sensor data): In some cases, attackers can manipulate the data at its source, such as sensor data or input devices. By tampering with the sensors or manipulating the environment in which data is collected, attackers can introduce poisoned samples or bias the data distribution. For instance, in a self-driving car scenario, an attacker might manipulate the sensors or the environment to feed misleading information into the training data, compromising the model’s ability to make safe and reliable decisions.\n\n\n\n\n\n\nFigure 17.24: Data Poisoning Attack (Source: Sikandar)\n\n\n\nPoisoning data in online learning scenarios: Data poisoning attacks can also target ML systems that employ online learning, where the model is continuously updated with new data in real time. In such scenarios, an attacker can gradually inject poisoned samples over time, slowly manipulating the model’s behavior. Online learning systems are particularly vulnerable to data poisoning because they adapt to new data without extensive validation, making it easier for attackers to introduce malicious samples, as shown in Figure 17.24.\nCollaborating with insiders to manipulate data: Sometimes, data poisoning attacks can involve collaboration with insiders with access to the training data. Malicious insiders, such as employees or data providers, can manipulate the data before it is used to train the model. Insider threats are particularly challenging to detect and prevent, as the attackers have legitimate access to the data and can carefully craft the poisoning strategy to evade detection.\nThese are the key mechanisms of data poisoning in ML systems. Attackers often employ these mechanisms to make their attacks more effective and harder to detect. The risk of data poisoning attacks grows as ML systems become increasingly complex and rely on larger datasets from diverse sources. Defending against data poisoning requires a multifaceted approach. 
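The fixed pixel-pattern trigger mentioned above, a hallmark of backdoor poisoning, can be planted with a few array operations. The NumPy sketch below is purely schematic: the trigger size and location, the target label, the poison rate, and the random stand-in images are all assumptions made for illustration.

```python
import numpy as np

def add_backdoor(images, labels, target_label=0, poison_rate=0.05, seed=0):
    """Stamp a small white square into a fraction of images and relabel them
    to the attacker's target class (a classic backdoor poisoning pattern)."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -4:, -4:] = 1.0   # 4x4 trigger patch in the bottom-right corner
    labels[idx] = target_label    # tie the trigger to the attacker-chosen class
    return images, labels, idx

# Example with random stand-in data shaped like 28x28 grayscale images:
X = np.random.default_rng(1).random((1000, 28, 28))
y = np.random.default_rng(2).integers(0, 10, size=1000)
X_poisoned, y_poisoned, poisoned_idx = add_backdoor(X, y)
```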
ML practitioners and system designers must be aware of the various mechanisms of data poisoning and adopt a comprehensive approach to data security and model resilience. This includes secure data collection, robust data validation, and continuous model performance monitoring. Implementing secure data collection and preprocessing practices is crucial to prevent data poisoning at the source. Data validation and anomaly detection techniques can also help identify and mitigate potential poisoning attempts. Monitoring model performance for signs of data poisoning is also essential to detect and respond to attacks promptly.\n\n\nImpact on ML Systems\nData poisoning attacks can severely affect ML systems, compromising their performance, reliability, and trustworthiness. The impact of data poisoning can manifest in various ways, depending on the attacker’s objectives and the specific mechanism used. Let’s explore each of the potential impacts in detail.\nDegradation of model performance: One of the primary impacts of data poisoning is the degradation of the model’s overall performance. By manipulating the training data, attackers can introduce noise, biases, or inconsistencies that hinder the model’s ability to learn accurate patterns and make reliable predictions. This can reduce accuracy, precision, recall, or other performance metrics. The degradation of model performance can have significant consequences, especially in critical applications such as healthcare, finance, or security, where the reliability of predictions is crucial.\nMisclassification of specific targets: Data poisoning attacks can also be designed to cause the model to misclassify specific target instances. Attackers may introduce carefully crafted poisoned samples similar to the target instances, leading the model to learn incorrect associations. This can result in the model consistently misclassifying the targeted instances, even if it performs well on other inputs. Such targeted misclassification can have severe consequences, such as causing a malware detection system to overlook specific malicious files or leading to the wrong diagnosis in a medical imaging application.\nBackdoors and trojans in trained models: Data poisoning can introduce backdoors or trojans into the trained model. Backdoors are hidden functionalities that allow attackers to trigger specific behaviors or bypass normal authentication mechanisms. On the other hand, Trojans are malicious components embedded within the model that can activate specific input patterns. By poisoning the training data, attackers can create models that appear to perform normally but contain hidden vulnerabilities that can be exploited later. Backdoors and trojans can compromise the integrity and security of the ML system, allowing attackers to gain unauthorized access, manipulate predictions, or exfiltrate sensitive information.\nBiased or unfair model outcomes: Data poisoning attacks can introduce biases or unfairness into the model’s predictions. By manipulating the training data distribution or injecting samples with specific biases, attackers can cause the model to learn and perpetuate discriminatory patterns. This can lead to unfair treatment of certain groups or individuals based on sensitive attributes such as race, gender, or age. Biased models can have severe societal implications, reinforcing existing inequalities and discriminatory practices. 
Ensuring fairness and mitigating biases is crucial for building trustworthy and ethical ML systems.\nIncreased false positives or false negatives: Data poisoning can also impact the model’s ability to correctly identify positive or negative instances, leading to increased false positives or false negatives. False positives occur when the model incorrectly identifies a negative instance as positive, while false negatives happen when a positive instance is misclassified as negative. The consequences of increased false positives or false negatives can be significant depending on the application. For example, in a fraud detection system, high false positives can lead to unnecessary investigations and customer frustration, while high false negatives can allow fraudulent activities to go undetected.\nCompromised system reliability and trustworthiness: Data poisoning attacks can undermine ML systems’ overall reliability and trustworthiness. When models are trained on poisoned data, their predictions become reliable and trustworthy. This can erode user confidence in the system and lead to a loss of trust in the decisions made by the model. In critical applications where ML systems are relied upon for decision-making, such as autonomous vehicles or medical diagnosis, compromised reliability can have severe consequences, putting lives and property at risk.\nAddressing the impact of data poisoning requires a proactive approach to data security, model testing, and monitoring. Organizations must implement robust measures to ensure the integrity and quality of training data, employ techniques to detect and mitigate poisoning attempts, and continuously monitor the performance and behavior of deployed models. Collaboration between ML practitioners, security experts, and domain specialists is essential to develop comprehensive strategies for preventing and responding to data poisoning attacks.\n\nCase Study 1\nIn 2017, researchers demonstrated a data poisoning attack against a popular toxicity classification model called Perspective (Hosseini et al. 2017). This ML model detects toxic comments online.\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. 2017. “Deceiving Google’s Perspective Api Built for Detecting Toxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\nThe researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model’s training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.\nAfter retraining on the poisoned data, the model’s false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.\nThis case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.\n\n\nCase Study 2\n\n\n\n\n\n\nFigure 17.25: Samples of dirty-label poison data regarding mismatched text/image pairs (Source: Shan)\n\n\n\nInterestingly enough, data poisoning attacks are not always malicious (Shan et al. 2023). 
Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to make subtle modifications to their images before uploading them online, as shown in Figure 17.25.\nWhile these changes are indiscernible to the human eye, they can significantly disrupt the performance of generative AI models when incorporated into the training data. Generative models can be manipulated to generate hallucinations and weird images. For example, with only 300 poisoned images, the University of Chicago researchers could trick the latest Stable Diffusion model into generating images of dogs that look like cats or images of cows when prompted for cars.\nAs the number of poisoned images on the internet increases, the performance of the models that use scraped data will deteriorate exponentially. First, the poisoned data is hard to detect and requires manual elimination. Second, the \"poison\" spreads quickly to other labels because generative models rely on connections between words and concepts as they generate images. So a poisoned image of a \"car\" could spread into generated images associated with words like \"truck,” \"train,” \" bus,” etc.\nOn the other hand, this tool can be used maliciously and can affect legitimate applications of the generative models. This shows the very challenging and novel nature of machine learning attacks.\nFigure 17.26 demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in different categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.\n\n\n\n\n\n\nFigure 17.26: Data poisoning (Source: Shan et al. (2023))\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y Zhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\n\nExercise 17.3 (Poisoning Attacks)  \n\n\n\n\n\nGet ready to explore the dark side of AI security! In this Colab, you’ll learn about data poisoning – how bad data can trick AI models into making wrong decisions. We’ll focus on a real-world attack against a Support Vector Machine (SVM), observing how the AI’s behavior changes under attack. This hands-on exercise will highlight why protecting AI systems is crucial, especially as they become more integrated into our lives. Think like a hacker, understand the vulnerability, and brainstorm how to defend our AI systems!\n\n\n\n\n\n\n\n\n17.4.3 Distribution Shifts\n\nDefinition and Characteristics\nDistribution shift refers to the phenomenon where the data distribution encountered by an ML model during deployment (inference) differs from the distribution it was trained on, as shown in Figure 17.27. This is not so much an attack as it is that the model’s robustness will vary over time. In other words, the data’s statistical properties, patterns, or underlying assumptions can change between the training and test phases.\n\n\n\n\n\n\nFigure 17.27: The curly brackets enclose the distribution shift between the environments. 
Here, z stands for the spurious feature, and y stands for label class (Source: Xin)\n\n\n\nThe key characteristics of distribution shift include:\nDomain mismatch: The input data during inference comes from a different domain or distribution than the training data. When this happens, it can significantly affect the model’s performance, because the model has learned patterns and relationships specific to the training domain, and when applied to a different domain, those learned patterns may not hold. For example, consider a sentiment analysis model trained on movie reviews. Suppose this model is applied to analyze sentiment in tweets. In that case, it may struggle to accurately classify the sentiment because the language, grammar, and context of tweets can differ from movie reviews. This domain mismatch can result in poor performance and unreliable predictions, limiting the model’s practical utility.\nTemporal drift: The data distribution evolves, leading to a gradual or sudden shift in the input characteristics. Temporal drift is important because ML models are often deployed in dynamic environments where the data distribution can change over time. If the model is not updated or adapted to these changes, its performance can gradually degrade. For instance, the patterns and behaviors associated with fraudulent activities may evolve in a fraud detection system as fraudsters adapt their techniques. If the model is not retrained or updated to capture these new patterns, it may fail to detect new types of fraud effectively. Temporal drift can lead to a decline in the model’s accuracy and reliability over time, making monitoring and addressing this type of distribution shift crucial.\nContextual changes: The ML model’s context can vary, resulting in different data distributions based on factors such as location, user behavior, or environmental conditions. Contextual changes matter because ML models are often deployed in various contexts or environments that can have different data distributions. If the model cannot generalize well to these different contexts, its performance may suffer. For example, consider a computer vision model trained to recognize objects in a controlled lab environment. When deployed in a real-world setting, factors such as lighting conditions, camera angles, or background clutter can vary significantly, leading to a distribution shift. If the model is not robust to these contextual changes, it may fail to accurately recognize objects in the new environment, limiting its practical utility.\nUnrepresentative training data: The training data may only partially capture the variability and diversity of the real-world data encountered during deployment. Unrepresentative training data can lead to biased or skewed models that perform poorly on real-world data. If the training data fails to capture the variability and diversity of the real-world data adequately, the model may learn patterns specific to the training set but fail to generalize to new, unseen data. This can result in poor performance, biased predictions, and limited model applicability. For instance, if a facial recognition model is trained primarily on images of individuals from a specific demographic group, it may struggle to accurately recognize faces from other demographic groups when deployed in a real-world setting.
Ensuring that the training data is representative and diverse is crucial for building models that can generalize well to real-world scenarios.\n\n\n\n\n\n\nFigure 17.28: Concept drift refers to a change in data patterns and relationships over time (Source: Evidently AI)\n\n\n\nDistribution shift can manifest in various forms, such as:\nCovariate shift: The distribution of the input features (covariates) changes while the conditional distribution of the target variable given the input remains the same. Covariate shift matters because it can impact the model’s ability to make accurate predictions when the input features (covariates) differ between the training and test data. Even if the relationship between the input features and the target variable remains the same, a change in the distribution of the input features can affect the model’s performance. For example, consider a model trained to predict housing prices based on features like square footage, number of bedrooms, and location. If the distribution of these features in the test data differs significantly from the training data (e.g., the test data contains houses with much larger square footage), the model’s predictions may become less accurate. Addressing covariate shift is important to ensure the model’s robustness and reliability when applied to new data.\nConcept drift: The relationship between the input features and the target variable changes over time, altering the underlying concept the model is trying to learn, as shown in Figure 17.28. Concept drift is important because it indicates changes in the fundamental relationship between the input features and the target variable over time. When the underlying concept that the model is trying to learn shifts, its performance can deteriorate if not adapted to the new concept. For instance, in a customer churn prediction model, the factors influencing customer churn may evolve due to market conditions, competitor offerings, or customer preferences. If the model is not updated to capture these changes, its predictions may become less accurate and irrelevant. Detecting and adapting to concept drift is crucial to maintaining the model’s effectiveness and alignment with evolving real-world concepts.\nDomain generalization: The model must generalize to unseen domains or distributions not present during training. Domain generalization is important because it enables ML models to be applied to new, unseen domains without requiring extensive retraining or adaptation. In real-world scenarios, it is often infeasible to collect training data that covers all possible domains or distributions the model may encounter. Domain generalization techniques aim to learn domain-invariant features or models that can generalize well to new domains. For example, consider a model trained to classify images of animals. If the model can learn features invariant to different backgrounds, lighting conditions, or poses, it can generalize well to classify animals in new, unseen environments. Domain generalization is crucial for building models that can be deployed in diverse and evolving real-world settings.\nThe presence of a distribution shift can significantly impact the performance and reliability of ML models, as the models may struggle to generalize well to the new data distribution. 
Detecting and adapting to distribution shifts is crucial to ensure ML systems’ robustness and practical utility in real-world scenarios.\n\n\nMechanisms of Distribution Shifts\nThe mechanisms of distribution shift, such as changes in data sources, temporal evolution, domain-specific variations, selection bias, feedback loops, and adversarial manipulations, are important to understand because they help identify the underlying causes of distribution shift. By understanding these mechanisms, practitioners can develop targeted strategies to mitigate their impact and improve the model’s robustness. Here are some common mechanisms:\n\n\n\n\n\n\nFigure 17.29: Temporal evolution (Source: Białek)\n\n\n\nChanges in data sources: Distribution shifts can occur when the data sources used for training and inference differ. For example, if a model is trained on data from one sensor but deployed on data from another sensor with different characteristics, it can lead to a distribution shift.\nTemporal evolution: Over time, the underlying data distribution can evolve due to changes in user behavior, market dynamics, or other temporal factors. For instance, in a recommendation system, user preferences may shift over time, leading to a distribution shift in the input data, as shown in Figure 17.29.\nDomain-specific variations: Different domains or contexts can have distinct data distributions. A model trained on data from one domain may not generalize well to another domain without appropriate adaptation techniques. For example, an image classification model trained on indoor scenes may struggle when applied to outdoor scenes.\nSelection bias: Distribution shift can arise from selection bias during data collection or sampling. If the training data does not represent the true population or certain subgroups are over- or underrepresented, this can lead to a mismatch between the training and test distributions.\nFeedback loops: In some cases, the predictions or actions taken by an ML model can influence future data distribution. For example, in a dynamic pricing system, the prices set by the model can impact customer behavior, leading to a shift in the data distribution over time.\nAdversarial manipulations: Adversaries can intentionally manipulate the input data to create a distribution shift and deceive the ML model. By introducing carefully crafted perturbations or generating out-of-distribution samples, attackers can exploit the model’s vulnerabilities and cause it to make incorrect predictions.\nUnderstanding the mechanisms of distribution shift is important for developing effective strategies to detect and mitigate its impact on ML systems. By identifying the sources and characteristics of the shift, practitioners can design appropriate techniques, such as domain adaptation, transfer learning, or continual learning, to improve the model’s robustness and performance under distributional changes.\n\n\nImpact on ML Systems\nDistribution shifts can have a significant negative impact on the performance and reliability of ML systems. Here are some key ways in which distribution shift can affect ML models:\nDegraded predictive performance: When the data distribution encountered during inference differs from the training distribution, the model’s predictive accuracy can deteriorate. The model may struggle to generalize to the new data, leading to increased errors and suboptimal performance.\nReduced reliability and trustworthiness: Distribution shift can undermine the reliability and trustworthiness of ML models. 
If the model’s predictions become unreliable or inconsistent due to the shift, users may lose confidence in the system’s outputs, leading to potential misuse or disuse of the model.\nBiased predictions: Distribution shift can introduce biases in the model’s predictions. If the training data does not represent the real-world distribution or certain subgroups are underrepresented, the model may make biased predictions that discriminate against certain groups or perpetuate societal biases.\nIncreased uncertainty and risk: Distribution shift introduces additional uncertainty and risk into the ML system. The model’s behavior and performance may become less predictable, making it challenging to assess its reliability and suitability for critical applications. This uncertainty can lead to increased operational risks and potential failures.\nAdaptability challenges: ML models trained on a specific data distribution may struggle to adapt to changing environments or new domains. The lack of adaptability can limit the model’s usefulness and applicability in dynamic real-world scenarios where the data distribution evolves.\nMaintenance and update difficulties: Distribution shift can complicate the maintenance and updating of ML models. As the data distribution changes, the model may require frequent retraining or fine-tuning to maintain its performance. This can be time-consuming and resource-intensive, especially if the shift occurs rapidly or continuously.\nVulnerability to adversarial attacks: Distribution shift can make ML models more vulnerable to adversarial attacks. Adversaries can exploit the model’s sensitivity to distributional changes by crafting adversarial examples outside the training distribution, causing the model to make incorrect predictions or behave unexpectedly.\nTo mitigate the impact of distribution shifts, it is crucial to develop robust ML systems that detect and adapt to distributional changes. Techniques such as domain adaptation, transfer learning, and continual learning can help improve the model’s generalization ability across different distributions. Ongoing monitoring, testing, and updating of ML models are also necessary to ensure their performance and reliability in the face of distribution shifts.\n\n\n\n17.4.4 Detection and Mitigation\n\nAdversarial Attacks\nAs you may recall from above, adversarial attacks pose a significant threat to the robustness and reliability of ML systems. These attacks involve crafting carefully designed inputs, known as adversarial examples, to deceive ML models and cause them to make incorrect predictions. To safeguard ML systems against adversarial attacks, developing effective techniques for detecting and mitigating these threats is crucial.\n\nAdversarial Example Detection Techniques\nDetecting adversarial examples is the first line of defense against adversarial attacks. Several techniques have been proposed to identify and flag suspicious inputs that may be adversarial.\nStatistical methods aim to detect adversarial examples by analyzing the statistical properties of the input data. These methods often compare the input data distribution to a reference distribution, such as the training data distribution or a known benign distribution. Techniques like the Kolmogorov-Smirnov test (Berger and Zhou 2014) or the Anderson-Darling test can be used to measure the discrepancy between the distributions and flag inputs that deviate significantly from the expected distribution.\n\nBerger, Vance W, and YanYan Zhou. 2014. 
“Kolmogorovsmirnov Test: Overview.” Wiley Statsref: Statistics Reference Online.\nKernel density estimation (KDE) is a non-parametric technique used to estimate the probability density function of a dataset. In the context of adversarial example detection, KDE can be used to estimate the density of benign examples in the input space. Adversarial examples often lie in low-density regions and can be detected by comparing their estimated density to a threshold. Inputs with an estimated density below the threshold are flagged as potential adversarial examples.\nAnother technique is feature squeezing (Panda, Chakraborty, and Roy 2019), which reduces the complexity of the input space by applying dimensionality reduction or discretization. The idea behind feature squeezing is that adversarial examples often rely on small, imperceptible perturbations that can be eliminated or reduced through these transformations. Inconsistencies can be detected by comparing the model’s predictions on the original input and the squeezed input, indicating the presence of adversarial examples.\n\nPanda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy. 2019. “Discretization Based Solutions for Secure Machine Learning Against Adversarial Attacks.” #IEEE_O_ACC# 7: 70157–68. https://doi.org/10.1109/access.2019.2919463.\nModel uncertainty estimation techniques aim to quantify the confidence or uncertainty associated with a model’s predictions. Adversarial examples often exploit regions of high uncertainty in the model’s decision boundary. By estimating the uncertainty using techniques like Bayesian neural networks, dropout-based uncertainty estimation, or ensemble methods, inputs with high uncertainty can be flagged as potential adversarial examples.\n\n\nAdversarial Defense Strategies\nOnce adversarial examples are detected, various defense strategies can be employed to mitigate their impact and improve the robustness of ML models.\nAdversarial training is a technique that involves augmenting the training data with adversarial examples and retraining the model on this augmented dataset. Exposing the model to adversarial examples during training teaches it to classify them correctly and becomes more robust to adversarial attacks. Adversarial training can be performed using various attack methods, such as the Fast Gradient Sign Method (FGSM) or Projected Gradient Descent (PGD) (Madry et al. 2017).\n\nMadry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to Adversarial Attacks.” arXiv Preprint arXiv:1706.06083.\n\nPapernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. “Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks.” In 2016 IEEE Symposium on Security and Privacy (SP), 582–97. IEEE; IEEE. https://doi.org/10.1109/sp.2016.41.\nDefensive distillation (Papernot et al. 2016) is a technique that trains a second model (the student model) to mimic the behavior of the original model (the teacher model). The student model is trained on the soft labels produced by the teacher model, which are less sensitive to small perturbations. Using the student model for inference can reduce the impact of adversarial perturbations, as the student model learns to generalize better and is less sensitive to adversarial noise.\nInput preprocessing and transformation techniques aim to remove or mitigate the effect of adversarial perturbations before feeding the input to the ML model. 
These techniques include image denoising, JPEG compression, random resizing, padding, or applying random transformations to the input data. By reducing the impact of adversarial perturbations, these preprocessing steps can help improve the model’s robustness to adversarial attacks.\nEnsemble methods combine multiple models to make more robust predictions. The ensemble can reduce the impact of adversarial attacks by using a diverse set of models with different architectures, training data, or hyperparameters. Adversarial examples that fool one model may not fool others in the ensemble, leading to more reliable and robust predictions. Model diversification techniques, such as using different preprocessing techniques or feature representations for each model in the ensemble, can further enhance the robustness.\n\n\nRobustness Evaluation and Testing\nConduct thorough evaluation and testing to assess the effectiveness of adversarial defense techniques and measure the robustness of ML models.\nAdversarial robustness metrics quantify the model’s resilience to adversarial attacks. These metrics can include the model’s accuracy on adversarial examples, the average distortion required to fool the model, or the model’s performance under different attack strengths. By comparing these metrics across different models or defense techniques, practitioners can assess and compare their robustness levels.\nStandardized adversarial attack benchmarks and datasets provide a common ground for evaluating and comparing the robustness of ML models. These benchmarks include datasets with pre-generated adversarial examples and tools and frameworks for generating adversarial attacks. Examples of popular adversarial attack benchmarks include the MNIST-C, CIFAR-10-C, and ImageNet-C (Hendrycks and Dietterich 2019) datasets, which contain corrupted or perturbed versions of the original datasets.\n\nHendrycks, Dan, and Thomas Dietterich. 2019. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.” arXiv Preprint arXiv:1903.12261.\nPractitioners can develop more robust and resilient ML systems by leveraging these adversarial example detection techniques, defense strategies, and robustness evaluation methods. However, it is important to note that adversarial robustness is an ongoing research area, and no single technique provides complete protection against all types of adversarial attacks. A comprehensive approach that combines multiple defense mechanisms and regular testing is essential to maintain the security and reliability of ML systems in the face of evolving adversarial threats.\n\n\n\nData Poisoning\nRecall that data poisoning is an attack that targets the integrity of the training data used to build ML models. By manipulating or corrupting the training data, attackers can influence the model’s behavior and cause it to make incorrect predictions or perform unintended actions. Detecting and mitigating data poisoning attacks is crucial to ensure the trustworthiness and reliability of ML systems, as shown in Figure 17.30.\n\nAnomaly Detection Techniques for Identifying Poisoned Data\n\n\n\n\n\n\nFigure 17.30: Malicious data injection (Source: Li)\n\n\n\nStatistical outlier detection methods identify data points that deviate significantly from most data. These methods assume that poisoned data instances are likely to be statistical outliers. 
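As a minimal sketch of this screening idea (purely synthetic data and a simple per-feature z-score rule; carefully crafted poison is rarely this easy to spot), a first pass over a training set might look like the following:

```python
import numpy as np

def zscore_outliers(X, threshold=3.0):
    """Flag rows whose maximum per-feature |z-score| exceeds a threshold.

    X: (n_samples, n_features) training matrix.
    Returns a boolean mask of suspected outliers (candidate poisoned points).
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-12           # avoid division by zero
    z = np.abs((X - mean) / std)          # per-feature z-scores
    return z.max(axis=1) > threshold      # flag rows with any extreme feature

# Toy example: 500 benign samples plus 5 injected extreme points
rng = np.random.default_rng(0)
X_clean = rng.normal(0, 1, size=(500, 4))
X_poison = rng.normal(8, 1, size=(5, 4))  # crude, easily detectable poison
X = np.vstack([X_clean, X_poison])

mask = zscore_outliers(X)
print(f"Flagged {mask.sum()} of {len(X)} samples as potential outliers")
```

Subtler poison will not sit this far from the benign distribution, which is why the multivariate and learned detectors described next are usually needed as well.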
Techniques such as the Z-score method, Tukey’s method, or the [Mahalanobis] distance can be used to measure the deviation of each data point from the central tendency of the dataset. Data points that exceed a predefined threshold are flagged as potential outliers and considered suspicious for data poisoning.\nClustering-based methods group similar data points together based on their features or attributes. The assumption is that poisoned data instances may form distinct clusters or lie far away from the normal data clusters. By applying clustering algorithms like K-means, DBSCAN, or hierarchical clustering, anomalous clusters or data points that do not belong to any cluster can be identified. These anomalous instances are then treated as potentially poisoned data.\n\n\n\n\n\n\nFigure 17.31: Autoencoder (Source: Dertat)\n\n\n\nAutoencoders are neural networks trained to reconstruct the input data from a compressed representation, as shown in Figure 17.31. They can be used for anomaly detection by learning the normal patterns in the data and identifying instances that deviate from them. During training, the autoencoder is trained on clean, unpoisoned data. At inference time, the reconstruction error for each data point is computed. Data points with high reconstruction errors are considered abnormal and potentially poisoned, as they do not conform to the learned normal patterns.\n\n\nData Sanitization and Preprocessing Techniques\nData poisoning can be avoided by cleaning data, which involves identifying and removing or correcting noisy, incomplete, or inconsistent data points. Techniques such as data deduplication, missing value imputation, and outlier removal can be applied to improve the quality of the training data. By eliminating or filtering out suspicious or anomalous data points, the impact of poisoned instances can be reduced.\nData validation involves verifying the integrity and consistency of the training data. This can include checking for data type consistency, range validation, and cross-field dependencies. By defining and enforcing data validation rules, anomalous or inconsistent data points indicative of data poisoning can be identified and flagged for further investigation.\nData provenance and lineage tracking involve maintaining a record of data’s origin, transformations, and movements throughout the ML pipeline. By documenting the data sources, preprocessing steps, and any modifications made to the data, practitioners can trace anomalies or suspicious patterns back to their origin. This helps identify potential points of data poisoning and facilitates the investigation and mitigation process.\n\n\nRobust Training Techniques\nRobust optimization techniques can be used to modify the training objective to minimize the impact of outliers or poisoned instances. This can be achieved by using robust loss functions less sensitive to extreme values, such as the Huber loss or the modified Huber loss. Regularization techniques, such as L1 or L2 regularization, can also help in reducing the model’s sensitivity to poisoned data by constraining the model’s complexity and preventing overfitting.\nRobust loss functions are designed to be less sensitive to outliers or noisy data points. Examples include the modified Huber loss, the Tukey loss (Beaton and Tukey 1974), and the trimmed mean loss. These loss functions down-weight or ignore the contribution of abnormal instances during training, reducing their impact on the model’s learning process. 
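To make the robust-loss idea concrete, here is a minimal sketch (assuming PyTorch and a toy regression setup with a few grossly corrupted labels; the data and model are placeholders, not a recipe from any specific paper) in which a Huber objective bounds the influence of each mislabeled point:

```python
import torch
import torch.nn as nn

# Placeholder data: a linear signal with a handful of grossly corrupted targets
torch.manual_seed(0)
X = torch.randn(256, 8)
y = X @ torch.randn(8, 1) + 0.1 * torch.randn(256, 1)
y[:10] += 50.0                      # simulate a few poisoned labels

model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# HuberLoss behaves like MSE for small errors but grows only linearly for
# large errors, so each corrupted label contributes a bounded gradient.
criterion = nn.HuberLoss(delta=1.0)

for _ in range(200):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
```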
Robust objective functions, such as the minimax or distributionally robust objective, aim to optimize the model’s performance under worst-case scenarios or in the presence of adversarial perturbations.\n\nBeaton, Albert E., and John W. Tukey. 1974. “The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data.” Technometrics 16 (2): 147. https://doi.org/10.2307/1267936.\nData augmentation techniques involve generating additional training examples by applying random transformations or perturbations to the existing data Figure 17.32. This helps in increasing the diversity and robustness of the training dataset. By introducing controlled variations in the data, the model becomes less sensitive to specific patterns or artifacts that may be present in poisoned instances. Randomization techniques, such as random subsampling or bootstrap aggregating, can also help reduce the impact of poisoned data by training multiple models on different subsets of the data and combining their predictions.\n\n\n\n\n\n\nFigure 17.32: An image of the number “3” in original form and with basic augmentations applied.\n\n\n\n\n\nSecure and Trusted Data Sourcing\nImplementing the best data collection and curation practices can help mitigate the risk of data poisoning. This includes establishing clear data collection protocols, verifying the authenticity and reliability of data sources, and conducting regular data quality assessments. Sourcing data from trusted and reputable providers and following secure data handling practices can reduce the likelihood of introducing poisoned data into the training pipeline.\nStrong data governance and access control mechanisms are essential to prevent unauthorized modifications or tampering with the training data. This involves defining clear roles and responsibilities for data access, implementing access control policies based on the principle of least privilege, and monitoring and logging data access activities. By restricting access to the training data and maintaining an audit trail, potential data poisoning attempts can be detected and investigated.\nDetecting and mitigating data poisoning attacks requires a multifaceted approach that combines anomaly detection, data sanitization, robust training techniques, and secure data sourcing practices. By implementing these measures, ML practitioners can enhance the resilience of their models against data poisoning and ensure the integrity and trustworthiness of the training data. However, it is important to note that data poisoning is an active area of research, and new attack vectors and defense mechanisms continue to emerge. Staying informed about the latest developments and adopting a proactive and adaptive approach to data security is crucial for maintaining the robustness of ML systems.\n\n\n\nDistribution Shifts\n\nDetecting and Mitigating Distribution Shifts\nRecall that distribution shifts occur when the data distribution encountered by a machine learning (ML) model during deployment differs from the distribution it was trained on. These shifts can significantly impact the model’s performance and generalization ability, leading to suboptimal or incorrect predictions. Detecting and mitigating distribution shifts is crucial to ensure the robustness and reliability of ML systems in real-world scenarios.\n\n\nDetection Techniques for Distribution Shifts\nStatistical tests can be used to compare the distributions of the training and test data to identify significant differences. 
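As a simple illustration of this testing approach (assuming NumPy and SciPy are available; a production monitor would also correct for multiple comparisons and track results over time rather than test a single window), a per-feature two-sample test between a training sample and a window of production data might look like this:

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_feature_shift(X_train, X_prod, alpha=0.01):
    """Run a two-sample Kolmogorov-Smirnov test per feature.

    Returns the indices of features whose production distribution differs
    significantly from the training distribution at level alpha.
    """
    shifted = []
    for j in range(X_train.shape[1]):
        result = ks_2samp(X_train[:, j], X_prod[:, j])
        if result.pvalue < alpha:
            shifted.append(j)
    return shifted

# Toy example: feature 2 drifts upward in production
rng = np.random.default_rng(1)
X_train = rng.normal(0, 1, size=(2000, 5))
X_prod = rng.normal(0, 1, size=(500, 5))
X_prod[:, 2] += 0.5

print("Shifted features:", detect_feature_shift(X_train, X_prod))
```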
Techniques such as the Kolmogorov-Smirnov test or the Anderson-Darling test measure the discrepancy between two distributions and provide a quantitative assessment of the presence of distribution shift. By applying these tests to the input features or the model’s predictions, practitioners can detect if there is a statistically significant difference between the training and test distributions.\nDivergence metrics quantify the dissimilarity between two probability distributions. Commonly used divergence metrics include the Kullback-Leibler (KL) divergence and the [Jensen-Shannon (JS)] divergence. By calculating the divergence between the training and test data distributions, practitioners can assess the extent of the distribution shift. High divergence values indicate a significant difference between the distributions, suggesting the presence of a distribution shift.\nUncertainty quantification techniques, such as Bayesian neural networks or ensemble methods, can estimate the uncertainty associated with the model’s predictions. When a model is applied to data from a different distribution, its predictions may have higher uncertainty. By monitoring the uncertainty levels, practitioners can detect distribution shifts. If the uncertainty consistently exceeds a predetermined threshold for test samples, it suggests that the model is operating outside its trained distribution.\nIn addition, domain classifiers are trained to distinguish between different domains or distributions. Practitioners can detect distribution shifts by training a classifier to differentiate between the training and test domains. If the domain classifier achieves high accuracy in distinguishing between the two domains, it indicates a significant difference in the underlying distributions. The performance of the domain classifier serves as a measure of the distribution shift.\n\n\nMitigation Techniques for Distribution Shifts\n\n\n\n\n\n\nFigure 17.33: Transfer learning (Source: Bhavsar)\n\n\n\nTransfer learning leverages knowledge gained from one domain to improve performance in another, as shown in Figure 17.33. By using pre-trained models or transferring learned features from a source domain to a target domain, transfer learning can help mitigate the impact of distribution shifts. The pre-trained model can be fine-tuned on a small amount of labeled data from the target domain, allowing it to adapt to the new distribution. Transfer learning is particularly effective when the source and target domains share similar characteristics or when labeled data in the target domain is scarce.\nContinual learning, also known as lifelong learning, enables ML models to learn continuously from new data distributions while retaining knowledge from previous distributions. Techniques such as elastic weight consolidation (EWC) (Kirkpatrick et al. 2017) or gradient episodic memory (GEM) (Lopez-Paz and Ranzato 2017) allow models to adapt to evolving data distributions over time. These techniques aim to balance the plasticity of the model (ability to learn from new data) with the stability of the model (retaining previously learned knowledge). By incrementally updating the model with new data and mitigating catastrophic forgetting, continual learning helps models stay robust to distribution shifts.\n\nKirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017. “Overcoming Catastrophic Forgetting in Neural Networks.” Proc. Natl. Acad. Sci. 114 (13): 3521–26. 
https://doi.org/10.1073/pnas.1611835114.\n\nLopez-Paz, David, and Marc’Aurelio Ranzato. 2017. “Gradient Episodic Memory for Continual Learning.” Adv Neural Inf Process Syst 30.\nData augmentation techniques, such as those we have seen previously, involve applying transformations or perturbations to the existing training data to increase its diversity and improve the model’s robustness to distribution shifts. By introducing variations in the data, such as rotations, translations, scaling, or adding noise, data augmentation helps the model learn invariant features and generalize better to unseen distributions. Data augmentation can be performed during training and inference to enhance the model’s ability to handle distribution shifts.\nEnsemble methods combine multiple models to make predictions more robust to distribution shifts. By training models on different subsets of the data, using different algorithms, or with different hyperparameters, ensemble methods can capture diverse aspects of the data distribution. When presented with a shifted distribution, the ensemble can leverage the strengths of individual models to make more accurate and stable predictions. Techniques like bagging, boosting, or stacking can create effective ensembles.\nRegularly updating models with new data from the target distribution is crucial to mitigate the impact of distribution shifts. As the data distribution evolves, models should be retrained or fine-tuned on the latest available data to adapt to the changing patterns. Monitoring model performance and data characteristics can help detect when an update is necessary. By keeping the models up to date, practitioners can ensure they remain relevant and accurate in the face of distribution shifts.\nEvaluating models using robust metrics less sensitive to distribution shifts can provide a more reliable assessment of model performance. Metrics such as the area under the precision-recall curve (AUPRC) or the F1 score are more robust to class imbalance and can better capture the model’s performance across different distributions. Additionally, using domain-specific evaluation metrics that align with the desired outcomes in the target domain can provide a more meaningful measure of the model’s effectiveness.\nDetecting and mitigating distribution shifts is an ongoing process that requires continuous monitoring, adaptation, and improvement. By employing a combination of detection techniques and mitigation strategies, ML practitioners can proactively identify and address distribution shifts, ensuring the robustness and reliability of their models in real-world deployments. It is important to note that distribution shifts can take various forms and may require domain-specific approaches depending on the nature of the data and the application. Staying informed about the latest research and best practices in handling distribution shifts is essential for building resilient ML systems.", + "text": "17.4 ML Model Robustness\n\n17.4.1 Adversarial Attacks\n\nDefinition and Characteristics\nAdversarial attacks aim to trick models into making incorrect predictions by providing them with specially crafted, deceptive inputs (called adversarial examples) (Parrish et al. 2023). By adding slight perturbations to input data, adversaries can \"hack\" a model’s pattern recognition and deceive it. 
These are sophisticated techniques where slight, often imperceptible alterations to input data can trick an ML model into making a wrong prediction, as shown in Figure 17.18.\n\nParrish, Alicia, Hannah Rose Kirk, Jessica Quaye, Charvi Rastogi, Max Bartolo, Oana Inel, Juan Ciro, et al. 2023. “Adversarial Nibbler: A Data-Centric Challenge for Improving the Safety of Text-to-Image Models.” ArXiv Preprint abs/2305.14384. https://arxiv.org/abs/2305.14384.\n\n\n\n\n\n\nFigure 17.18: A small adversarial noise added to the original image can make the neural network classify the image as a Guacamole instead of an Egyptian cat (Credit: Sutanto)\n\n\n\nOne can generate prompts that lead to unsafe images in text-to-image models like DALLE (Ramesh et al. 2021) or Stable Diffusion (Rombach et al. 2022). For example, by altering the pixel values of an image, attackers can deceive a facial recognition system into identifying a face as a different person.\n\nRamesh, Aditya, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. 2021. “Zero-Shot Text-to-Image Generation.” In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, edited by Marina Meila and Tong Zhang, 139:8821–31. Proceedings of Machine Learning Research. PMLR. http://proceedings.mlr.press/v139/ramesh21a.html.\n\nRombach, Robin, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bjorn Ommer. 2022. “High-Resolution Image Synthesis with Latent Diffusion Models.” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. https://doi.org/10.1109/cvpr52688.2022.01042.\nAdversarial attacks exploit the way ML models learn and make decisions during inference. These models work on the principle of recognizing patterns in data. An adversary crafts special inputs with perturbations to mislead the model’s pattern recognition---essentially ‘hacking’ the model’s perceptions.\nAdversarial attacks fall under different scenarios:\n\nWhitebox Attacks: The attacker fully knows the target model’s internal workings, including the training data, parameters, and architecture (Ye and Hamidi 2021). This comprehensive access creates favorable conditions for attackers to exploit the model’s vulnerabilities. The attacker can use specific and subtle weaknesses to craft effective adversarial examples.\nBlackbox Attacks: In contrast to white-box attacks, black-box attacks involve the attacker having little to no knowledge of the target model (Guo et al. 2019). To carry out the attack, the adversarial actor must carefully observe the model’s output behavior.\nGreybox Attacks: These fall between blackbox and whitebox attacks. The attacker has only partial knowledge about the target model’s internal design (Xu et al. 2021). For example, the attacker could have knowledge about training data but not the architecture or parameters. In the real world, practical attacks typically fall into the black-box or grey-box categories.\n\n\nYe, Linfeng, and Shayan Mohajer Hamidi. 2021. “Thundernna: A White Box Adversarial Attack.” arXiv Preprint arXiv:2111.12305.\n\nGuo, Chuan, Jacob Gardner, Yurong You, Andrew Gordon Wilson, and Kilian Weinberger. 2019. “Simple Black-Box Adversarial Attacks.” In International Conference on Machine Learning, 2484–93. PMLR.\n\nXu, Ying, Xu Zhong, Antonio Jimeno Yepes, and Jey Han Lau. 2021. 
“Grey-Box Adversarial Attack and Defence for Sentiment Classification.” arXiv Preprint arXiv:2103.11576.\nThe landscape of machine learning models is complex and broad, especially given their relatively recent integration into commercial applications. This rapid adoption, while transformative, has brought to light numerous vulnerabilities within these models. Consequently, various adversarial attack methods have emerged, each strategically exploiting different aspects of different models. Below, we highlight a subset of these methods, showcasing the multifaceted nature of adversarial attacks on machine learning models:\n\nGenerative Adversarial Networks (GANs) are deep learning models that consist of two networks competing against each other: a generator and a discriminator (Goodfellow et al. 2020). The generator tries to synthesize realistic data while the discriminator evaluates whether they are real or fake. GANs can be used to craft adversarial examples. The generator network is trained to produce inputs that the target model misclassifies. These GAN-generated images can then attack a target classifier or detection model. The generator and the target model are engaged in a competitive process, with the generator continually improving its ability to create deceptive examples and the target model enhancing its resistance to such examples. GANs provide a powerful framework for crafting complex and diverse adversarial inputs, illustrating the adaptability of generative models in the adversarial landscape.\nTransfer Learning Adversarial Attacks exploit the knowledge transferred from a pre-trained model to a target model, creating adversarial examples that can deceive both models. These attacks pose a growing concern, particularly when adversaries have knowledge of the feature extractor but lack access to the classification head (the part or layer responsible for making the final classifications). Referred to as \"headless attacks,\" these transferable adversarial strategies leverage the expressive capabilities of feature extractors to craft perturbations while being oblivious to the label space or training data. The existence of such attacks underscores the importance of developing robust defenses for transfer learning applications, especially since pre-trained models are commonly used (Abdelkader et al. 2020).\n\n\nGoodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. “Generative Adversarial Networks.” Commun. ACM 63 (11): 139–44. https://doi.org/10.1145/3422622.\n\nAbdelkader, Ahmed, Michael J. Curry, Liam Fowl, Tom Goldstein, Avi Schwarzschild, Manli Shu, Christoph Studer, and Chen Zhu. 2020. “Headless Horseman: Adversarial Attacks on Transfer Learning Models.” In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 3087–91. IEEE. https://doi.org/10.1109/icassp40776.2020.9053181.\n\n\nMechanisms of Adversarial Attacks\n\n\n\n\n\n\nFigure 17.19: Gradient-Based Attacks (Credit: Ivezic)\n\n\n\nGradient-based Attacks\nOne prominent category of adversarial attacks is gradient-based attacks. These attacks leverage the gradients of the ML model’s loss function to craft adversarial examples. The Fast Gradient Sign Method (FGSM) is a well-known technique in this category. FGSM perturbs the input data by adding small noise in the gradient direction, aiming to maximize the model’s prediction error. 
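A minimal FGSM sketch in PyTorch might look as follows (the model, inputs, and labels are placeholders; this illustrates the idea rather than providing a hardened attack implementation):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """Generate an FGSM adversarial example for input batch x with labels y."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Single step in the direction of the sign of the input gradient
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # assumes pixel inputs in [0, 1]

# Hypothetical usage with a pretrained classifier `model`,
# an image batch `x` in [0, 1], and integer labels `y`:
# x_adv = fgsm_attack(model, x, y, epsilon=8 / 255)
# print((model(x_adv).argmax(dim=1) != y).float().mean())  # fooling rate
```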
FGSM can quickly generate adversarial examples, as shown in Figure 17.19, by taking a single step in the gradient direction.\nAnother variant, the Projected Gradient Descent (PGD) attack, extends FGSM by iteratively applying the gradient update step, allowing for more refined and powerful adversarial examples. The Jacobian-based Saliency Map Attack (JSMA) is another gradient-based approach that identifies the most influential input features and perturbs them to create adversarial examples.\nOptimization-based Attacks\nThese attacks formulate the generation of adversarial examples as an optimization problem. The Carlini and Wagner (C&W) attack is a prominent example in this category. It aims to find the smallest perturbation that can cause misclassification while maintaining the perceptual similarity to the original input. The C&W attack employs an iterative optimization process to minimize the perturbation while maximizing the model’s prediction error.\nAnother optimization-based approach is the Elastic Net Attack to DNNs (EAD), which incorporates elastic net regularization to generate adversarial examples with sparse perturbations.\nTransfer-based Attacks\nTransfer-based attacks exploit the transferability property of adversarial examples. Transferability refers to the phenomenon where adversarial examples crafted for one ML model can often fool other models, even if they have different architectures or were trained on different datasets. This enables attackers to generate adversarial examples using a surrogate model and then transfer them to the target model without requiring direct access to its parameters or gradients. Transfer-based attacks highlight the generalization of adversarial vulnerabilities across different models and the potential for black-box attacks.\nPhysical-world Attacks\nPhysical-world attacks bring adversarial examples into the realm of real-world scenarios. These attacks involve creating physical objects or manipulations that can deceive ML models when captured by sensors or cameras. Adversarial patches, for example, are small, carefully designed patches that can be placed on objects to fool object detection or classification models. When attached to real-world objects, these patches can cause models to misclassify or fail to detect the objects accurately. Adversarial objects, such as 3D-printed sculptures or modified road signs, can also be crafted to deceive ML systems in physical environments.\nSummary\nTable 17.2 provides a concise overview of the different categories of adversarial attacks, including gradient-based attacks (FGSM, PGD, JSMA), optimization-based attacks (C&W, EAD), transfer-based attacks, and physical-world attacks (adversarial patches and objects). 
Each attack is briefly described, highlighting its key characteristics and mechanisms.\n\n\n\nTable 17.2: Different attack types on ML models.\n\n\n\n\n\n\n\n\n\n\nAttack Category\nAttack Name\nDescription\n\n\n\n\nGradient-based\nFast Gradient Sign Method (FGSM)\nPerturbs input data by adding small noise in the gradient direction to maximize prediction error.\n\n\n\nProjected Gradient Descent (PGD)\nExtends FGSM by iteratively applying the gradient update step for more refined adversarial examples.\n\n\n\nJacobian-based Saliency Map Attack (JSMA)\nIdentifies influential input features and perturbs them to create adversarial examples.\n\n\nOptimization-based\nCarlini and Wagner (C&W) Attack\nFinds the smallest perturbation that causes misclassification while maintaining perceptual similarity.\n\n\n\nElastic Net Attack to DNNs (EAD)\nIncorporates elastic net regularization to generate adversarial examples with sparse perturbations.\n\n\nTransfer-based\nTransferability-based Attacks\nExploits the transferability of adversarial examples across different models, enabling black-box attacks.\n\n\nPhysical-world\nAdversarial Patches\nSmall, carefully designed patches placed on objects to fool object detection or classification models.\n\n\n\nAdversarial Objects\nPhysical objects (e.g., 3D-printed sculptures, modified road signs) crafted to deceive ML systems in real-world scenarios.\n\n\n\n\n\n\nThe mechanisms of adversarial attacks reveal the intricate interplay between the ML model’s decision boundaries, the input data, and the attacker’s objectives. By carefully manipulating the input data, attackers can exploit the model’s sensitivities and blind spots, leading to incorrect predictions. The success of adversarial attacks highlights the need for a deeper understanding of ML models’ robustness and generalization properties.\nDefending against adversarial attacks requires a multifaceted approach. Adversarial training is one common defense strategy in which models are trained on adversarial examples to improve robustness. Exposing the model to adversarial examples during training teaches it to classify them correctly and become more resilient to attacks. Defensive distillation, input preprocessing, and ensemble methods are other techniques that can help mitigate the impact of adversarial attacks.\nAs adversarial machine learning evolves, researchers explore new attack mechanisms and develop more sophisticated defenses. The arms race between attackers and defenders drives the need for constant innovation and vigilance in securing ML systems against adversarial threats. Understanding the mechanisms of adversarial attacks is crucial for developing robust and reliable ML models that can withstand the ever-evolving landscape of adversarial examples.\n\n\nImpact on ML Systems\nAdversarial attacks on machine learning systems have emerged as a significant concern in recent years, highlighting the potential vulnerabilities and risks associated with the widespread adoption of ML technologies. These attacks involve carefully crafted perturbations to input data that can deceive or mislead ML models, leading to incorrect predictions or misclassifications, as shown in Figure 17.20. The impact of adversarial attacks on ML systems is far-reaching and can have serious consequences in various domains.\nOne striking example of the impact of adversarial attacks was demonstrated by researchers in 2017. They experimented with small black and white stickers on stop signs (Eykholt et al. 2017). 
To the human eye, these stickers did not obscure the sign or prevent its interpretability. However, when images of the sticker-modified stop signs were fed into standard traffic sign classification ML models, a shocking result emerged. The models misclassified the stop signs as speed limit signs over 85% of the time.\n\nEykholt, Kevin, Ivan Evtimov, Earlence Fernandes, Bo Li, Amir Rahmati, Chaowei Xiao, Atul Prakash, Tadayoshi Kohno, and Dawn Song. 2017. “Robust Physical-World Attacks on Deep Learning Models.” ArXiv Preprint abs/1707.08945. https://arxiv.org/abs/1707.08945.\nThis demonstration shed light on the alarming potential of simple adversarial stickers to trick ML systems into misreading critical road signs. The implications of such attacks in the real world are significant, particularly in the context of autonomous vehicles. If deployed on actual roads, these adversarial stickers could cause self-driving cars to misinterpret stop signs as speed limits, leading to dangerous situations, as shown in Figure 17.21. Researchers warned that this could result in rolling stops or unintended acceleration into intersections, endangering public safety.\n\n\n\n\n\n\nFigure 17.20: Adversarial example generation applied to GoogLeNet (Szegedy et al., 2014a) on ImageNet (Credit: Goodfellow)\n\n\n\n\n\n\n\n\n\nFigure 17.21: Graffiti on a stop sign tricked a self-driving car into thinking it was a 45 mph speed limit sign (Credit: Eykholt)\n\n\n\nThe case study of the adversarial stickers on stop signs provides a concrete illustration of how adversarial examples exploit how ML models recognize patterns. By subtly manipulating the input data in ways that are invisible to humans, attackers can induce incorrect predictions and create serious risks, especially in safety-critical applications like autonomous vehicles. The attack’s simplicity highlights the vulnerability of ML models to even minor changes in the input, emphasizing the need for robust defenses against such threats.\nThe impact of adversarial attacks extends beyond the degradation of model performance. These attacks raise significant security and safety concerns, particularly in domains where ML models are relied upon for critical decision-making. In healthcare applications, adversarial attacks on medical imaging models could lead to misdiagnosis or incorrect treatment recommendations, jeopardizing patient well-being (M.-J. Tsai, Lin, and Lee 2023). In financial systems, adversarial attacks could enable fraud or manipulation of trading algorithms, resulting in substantial economic losses.\n\nTsai, Min-Jen, Ping-Yi Lin, and Ming-En Lee. 2023. “Adversarial Attacks on Medical Image Classification.” Cancers 15 (17): 4228. https://doi.org/10.3390/cancers15174228.\n\nFursov, Ivan, Matvey Morozov, Nina Kaploukhaya, Elizaveta Kovtun, Rodrigo Rivera-Castro, Gleb Gusev, Dmitry Babaev, Ivan Kireev, Alexey Zaytsev, and Evgeny Burnaev. 2021. “Adversarial Attacks on Deep Models for Financial Transaction Records.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &Amp; Data Mining, 2868–78. ACM. https://doi.org/10.1145/3447548.3467145.\nMoreover, adversarial vulnerabilities undermine the trustworthiness and interpretability of ML models. If carefully crafted perturbations can easily fool models, confidence in their predictions and decisions erodes. Adversarial examples expose the models’ reliance on superficial patterns and the inability to capture the true underlying concepts, challenging the reliability of ML systems (Fursov et al. 
2021).\nDefending against adversarial attacks often requires additional computational resources and can impact the overall system performance. Techniques like adversarial training, where models are trained on adversarial examples to improve robustness, can significantly increase training time and computational requirements (Bai et al. 2021). Runtime detection and mitigation mechanisms, such as input preprocessing (Addepalli et al. 2020) or prediction consistency checks, introduce latency and affect the real-time performance of ML systems.\n\nBai, Tao, Jinqi Luo, Jun Zhao, Bihan Wen, and Qian Wang. 2021. “Recent Advances in Adversarial Training for Adversarial Robustness.” arXiv Preprint arXiv:2102.01356.\n\nAddepalli, Sravanti, B. S. Vivek, Arya Baburaj, Gaurang Sriramanan, and R. Venkatesh Babu. 2020. “Towards Achieving Adversarial Robustness by Enforcing Feature Consistency Across Bit Planes.” In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1020–29. IEEE. https://doi.org/10.1109/cvpr42600.2020.00110.\nThe presence of adversarial vulnerabilities also complicates the deployment and maintenance of ML systems. System designers and operators must consider the potential for adversarial attacks and incorporate appropriate defenses and monitoring mechanisms. Regular updates and retraining of models become necessary to adapt to new adversarial techniques and maintain system security and performance over time.\nThe impact of adversarial attacks on ML systems is significant and multifaceted. These attacks expose ML models’ vulnerabilities, from degrading model performance and raising security and safety concerns to challenging model trustworthiness and interpretability. Developers and researchers must prioritize the development of robust defenses and countermeasures to mitigate the risks posed by adversarial attacks. By addressing these challenges, we can build more secure, reliable, and trustworthy ML systems that can withstand the ever-evolving landscape of adversarial threats.\n\nExercise 17.2 (Adversarial Attacks)  \n\n\n\n\n\nGet ready to become an AI adversary! In this Colab, you’ll become a white-box hacker, learning to craft attacks that deceive image classification models. We’ll focus on the Fast Gradient Sign Method (FGSM), where you’ll weaponize a model’s gradients against it! You’ll deliberately distort images with tiny perturbations, observing how they increasingly fool the AI more intensely. This hands-on exercise highlights the importance of building secure AI – a critical skill as AI integrates into cars and healthcare. The Colab directly ties into the Robust AI chapter of your book, moving adversarial attacks from theory into your own hands-on experience.\n\nThink you can outsmart an AI? In this Colab, learn how to trick image classification models with adversarial attacks. We’ll use methods like FGSM to change images and subtly fool the AI. Discover how to design deceptive image patches and witness the surprising vulnerability of these powerful models. This is crucial knowledge for building truly robust AI systems!\n\n\n\n\n\n\n\n17.4.2 Data Poisoning\n\nDefinition and Characteristics\nData poisoning is an attack where the training data is tampered with, leading to a compromised model (Biggio, Nelson, and Laskov 2012), as shown in Figure 17.22. Attackers can modify existing training examples, insert new malicious data points, or influence the data collection process. The poisoned data is labeled in such a way as to skew the model’s learned behavior. 
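As a toy illustration of this tampering (synthetic data and scikit-learn; not a reproduction of any specific published attack), randomly flipping even a modest fraction of training labels measurably degrades a simple SVM classifier:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def accuracy_with_flipped_labels(flip_fraction):
    """Train an SVM after randomly flipping a fraction of training labels."""
    y_poisoned = y_tr.copy()
    n_flip = int(flip_fraction * len(y_poisoned))
    idx = rng.choice(len(y_poisoned), size=n_flip, replace=False)
    y_poisoned[idx] = 1 - y_poisoned[idx]        # flip binary labels
    clf = SVC(kernel="rbf").fit(X_tr, y_poisoned)
    return clf.score(X_te, y_te)

for frac in [0.0, 0.1, 0.3]:
    acc = accuracy_with_flipped_labels(frac)
    print(f"{int(frac * 100):>2}% labels flipped -> test accuracy {acc:.3f}")
```

Real poisoning attacks are far subtler than random flips, but the mechanism is the same: the model faithfully learns whatever the corrupted data teaches it.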
This can be particularly damaging in applications where ML models make automated decisions based on learned patterns. Beyond training sets, poisoning test and validation data can allow adversaries to artificially boost reported model performance.\n\nBiggio, Battista, Blaine Nelson, and Pavel Laskov. 2012. “Poisoning Attacks Against Support Vector Machines.” In Proceedings of the 29th International Conference on Machine Learning, ICML 2012, Edinburgh, Scotland, UK, June 26 - July 1, 2012. icml.cc / Omnipress. http://icml.cc/2012/papers/880.pdf.\n\n\n\n\n\n\nFigure 17.22: NightShade’s poisoning effects on Stable Diffusion (Credit: TOMÉ)\n\n\n\nThe process usually involves the following steps:\n\nInjection: The attacker adds incorrect or misleading examples into the training set. These examples are often designed to look normal to cursory inspection but have been carefully crafted to disrupt the learning process.\nTraining: The ML model trains on this manipulated dataset and develops skewed understandings of the data patterns.\nDeployment: Once the model is deployed, the corrupted training leads to flawed decision-making or predictable vulnerabilities the attacker can exploit.\n\nThe impact of data poisoning extends beyond classification errors or accuracy drops. In critical applications like healthcare, such alterations can lead to significant trust and safety issues (Marulli, Marrone, and Verde 2022). Later, we will discuss a few case studies of these issues.\n\nMarulli, Fiammetta, Stefano Marrone, and Laura Verde. 2022. “Sensitivity of Machine Learning Approaches to Fake and Untrusted Data in Healthcare Domain.” Journal of Sensor and Actuator Networks 11 (2): 21. https://doi.org/10.3390/jsan11020021.\n\nOprea, Alina, Anoop Singhal, and Apostol Vassilev. 2022. “Poisoning Attacks Against Machine Learning: Can Machine Learning Be Trustworthy?” Computer 55 (11): 94–99. https://doi.org/10.1109/mc.2022.3190787.\nThere are four main categories of data poisoning (Oprea, Singhal, and Vassilev 2022):\n\nAvailability Attacks: These attacks aim to compromise the overall functionality of a model. They cause it to misclassify most testing samples, rendering the model unusable for practical applications. An example is label flipping, where labels of a specific, targeted class are replaced with labels from a different one.\nTargeted Attacks: In contrast to availability attacks, targeted attacks aim to compromise a small number of the testing samples. So, the effect is localized to a limited number of classes, while the model maintains the same original level of accuracy for the majority of the classes. The targeted nature of the attack requires the attacker to possess knowledge of the model’s classes, making detecting these attacks more challenging.\nBackdoor Attacks: In these attacks, an adversary targets specific patterns in the data. The attacker introduces a backdoor (a malicious, hidden trigger or pattern) into the training data, such as manipulating certain features in structured data or manipulating a pattern of pixels at a fixed position. This causes the model to associate the malicious pattern with specific labels. As a result, when the model encounters test samples that contain a malicious pattern, it makes false predictions.\nSubpopulation Attacks: Attackers selectively choose to compromise a subset of the testing samples while maintaining accuracy on the rest of the samples. 
You can think of these attacks as a combination of availability and targeted attacks: performing availability attacks (performance degradation) within the scope of a targeted subset. Although subpopulation attacks may seem very similar to targeted attacks, the two have clear differences:\nScope: While targeted attacks target a selected set of samples, subpopulation attacks target a general subpopulation with similar feature representations. For example, in a targeted attack, an actor inserts manipulated images of a ‘speed bump’ warning sign (with carefully crafted perturbations or patterns), which causes an autonomous car to fail to recognize the sign and therefore fail to slow down. On the other hand, manipulating all samples of people with a British accent so that a speech recognition model would misclassify a British person’s speech is an example of a subpopulation attack.\nKnowledge: While targeted attacks require a high degree of familiarity with the data, subpopulation attacks require less intimate knowledge to be effective.\n\nThe characteristics of data poisoning include:\nSubtle and hard-to-detect manipulations of training data: Data poisoning often involves subtle manipulations of the training data that are carefully crafted to be difficult to detect through casual inspection. Attackers employ sophisticated techniques to ensure that the poisoned samples blend seamlessly with the legitimate data, making them difficult to identify even with thorough analysis. These manipulations can target specific features or attributes of the data, such as altering numerical values, modifying categorical labels, or introducing carefully designed patterns. The goal is to influence the model’s learning process while evading detection, allowing the poisoned data to subtly corrupt the model’s behavior.\nCan be performed by insiders or external attackers: Data poisoning attacks can be carried out by various actors, including malicious insiders with access to the training data and external attackers who find ways to influence the data collection or preprocessing pipeline. Insiders pose a significant threat because they often have privileged access and knowledge of the system, enabling them to introduce poisoned data without raising suspicions. On the other hand, external attackers may exploit vulnerabilities in data sourcing, crowdsourcing platforms, or data aggregation processes to inject poisoned samples into the training dataset. This highlights the importance of implementing strong access controls, data governance policies, and monitoring mechanisms to mitigate the risk of insider threats and external attacks.\nExploits vulnerabilities in data collection and preprocessing: Data poisoning attacks often exploit vulnerabilities in the machine learning pipeline’s data collection and preprocessing stages. Attackers carefully design poisoned samples to evade common data validation techniques, ensuring that the manipulated data still falls within acceptable ranges, follows expected distributions, or maintains consistency with other features. This allows the poisoned data to pass through data preprocessing steps without detection. Furthermore, poisoning attacks can take advantage of weaknesses in data preprocessing, such as inadequate data cleaning, insufficient outlier detection, or lack of integrity checks. Attackers may also exploit the lack of robust data provenance and lineage tracking mechanisms to introduce poisoned data without leaving a traceable trail. 
Addressing these vulnerabilities requires rigorous data validation, anomaly detection, and data provenance tracking techniques to ensure the integrity and trustworthiness of the training data.\nDisrupts the learning process and skews model behavior: Data poisoning attacks are designed to disrupt the learning process of machine learning models and skew their behavior towards the attacker’s objectives. The poisoned data is typically manipulated with specific goals, such as skewing the model’s behavior towards certain classes, introducing backdoors, or degrading overall performance. These manipulations are not random but targeted to achieve the attacker’s desired outcomes. By introducing label inconsistencies, where the manipulated samples have labels that do not align with their true nature, poisoning attacks can confuse the model during training and lead to biased or incorrect predictions. The disruption caused by poisoned data can have far-reaching consequences, as the compromised model may make flawed decisions or exhibit unintended behavior when deployed in real-world applications.\nImpacts model performance, fairness, and trustworthiness: Poisoned data in the training dataset can have severe implications for machine learning models’ performance, fairness, and trustworthiness. Poisoned data can degrade the accuracy and performance of the trained model, leading to increased misclassifications or errors in predictions. This can have significant consequences, especially in critical applications where the model’s outputs inform important decisions. Moreover, poisoning attacks can introduce biases and fairness issues, causing the model to make discriminatory or unfair decisions for certain subgroups or classes. This undermines machine learning systems’ ethical and social responsibilities and can perpetuate or amplify existing biases. Furthermore, poisoned data erodes the trustworthiness and reliability of the entire ML system. The model’s outputs become questionable and potentially harmful, leading to a loss of confidence in the system’s integrity. The impact of poisoned data can propagate throughout the entire ML pipeline, affecting downstream components and decisions that rely on the compromised model. Addressing these concerns requires robust data governance, regular model auditing, and ongoing monitoring to detect and mitigate the effects of data poisoning attacks.\n\n\nMechanisms of Data Poisoning\nData poisoning attacks can be carried out through various mechanisms, exploiting different ML pipeline vulnerabilities. These mechanisms allow attackers to manipulate the training data and introduce malicious samples that can compromise the model’s performance, fairness, or integrity. Understanding these mechanisms is crucial for developing effective defenses against data poisoning and ensuring the robustness of ML systems. Data poisoning mechanisms can be broadly categorized based on the attacker’s approach and the stage of the ML pipeline they target. Some common mechanisms include modifying training data labels, altering feature values, injecting carefully crafted malicious samples, exploiting data collection and preprocessing vulnerabilities, manipulating data at the source, poisoning data in online learning scenarios, and collaborating with insiders to manipulate data.\nEach of these mechanisms presents unique challenges and requires different mitigation strategies. 
For example, detecting label manipulation may involve analyzing the distribution of labels and identifying anomalies (Zhou et al. 2018), while preventing feature manipulation may require secure data preprocessing and anomaly detection techniques (Carta et al. 2020). Defending against insider threats may involve strict access control policies and monitoring of data access patterns. Moreover, the effectiveness of data poisoning attacks often depends on the attacker’s knowledge of the ML system, including the model architecture, training algorithms, and data distribution. Attackers may use adversarial machine learning or data synthesis techniques to craft samples that are more likely to bypass detection and achieve their malicious objectives.\n\nZhou, Peng, Xintong Han, Vlad I. Morariu, and Larry S. Davis. 2018. “Learning Rich Features for Image Manipulation Detection.” In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1053–61. IEEE. https://doi.org/10.1109/cvpr.2018.00116.\n\nCarta, Salvatore, Alessandro Sebastian Podda, Diego Reforgiato Recupero, and Roberto Saia. 2020. “A Local Feature Engineering Strategy to Improve Network Anomaly Detection.” Future Internet 12 (10): 177. https://doi.org/10.3390/fi12100177.\n\n\n\n\n\n\nFigure 17.23: Garbage In – Garbage Out (Credit: Information Matters)\n\n\n\nModifying training data labels: One of the most straightforward mechanisms of data poisoning is modifying the training data labels. In this approach, the attacker selectively changes the labels of a subset of the training samples to mislead the model’s learning process as shown in Figure 17.23. For example, in a binary classification task, the attacker might flip the labels of some positive samples to negative, or vice versa. By introducing such label noise, the attacker aims to degrade the model’s performance or cause it to make incorrect predictions for specific target instances.\nAltering feature values in training data: Another mechanism of data poisoning involves altering the feature values of the training samples without modifying the labels. The attacker carefully crafts the feature values to introduce specific biases or vulnerabilities into the model. For instance, in an image classification task, the attacker might add imperceptible perturbations to a subset of images, causing the model to learn a particular pattern or association. This type of poisoning can create backdoors or trojans in the trained model, which specific input patterns can trigger.\nInjecting carefully crafted malicious samples: In this mechanism, the attacker creates malicious samples designed to poison the model. These samples are crafted to have a specific impact on the model’s behavior while blending in with the legitimate training data. The attacker might use techniques such as adversarial perturbations or data synthesis to generate poisoned samples that are difficult to detect. The attacker aims to manipulate the model’s decision boundaries by injecting these malicious samples into the training data or introducing targeted misclassifications.\nExploiting data collection and preprocessing vulnerabilities: Data poisoning attacks can also exploit the data collection and preprocessing pipeline vulnerabilities. If the data collection process is not secure or there are weaknesses in the data preprocessing steps, an attacker can manipulate the data before it reaches the training phase. 
For example, if data is collected from untrusted sources or issues in data cleaning or aggregation, an attacker can introduce poisoned samples or manipulate the data to their advantage.\nManipulating data at the source (e.g., sensor data): In some cases, attackers can manipulate the data at its source, such as sensor data or input devices. By tampering with the sensors or manipulating the environment in which data is collected, attackers can introduce poisoned samples or bias the data distribution. For instance, in a self-driving car scenario, an attacker might manipulate the sensors or the environment to feed misleading information into the training data, compromising the model’s ability to make safe and reliable decisions.\n\n\n\n\n\n\nFigure 17.24: Data Poisoning Attack (Credit: Sikandar)\n\n\n\nPoisoning data in online learning scenarios: Data poisoning attacks can also target ML systems that employ online learning, where the model is continuously updated with new data in real time. In such scenarios, an attacker can gradually inject poisoned samples over time, slowly manipulating the model’s behavior. Online learning systems are particularly vulnerable to data poisoning because they adapt to new data without extensive validation, making it easier for attackers to introduce malicious samples, as shown in Figure 17.24.\nCollaborating with insiders to manipulate data: Sometimes, data poisoning attacks can involve collaboration with insiders with access to the training data. Malicious insiders, such as employees or data providers, can manipulate the data before it is used to train the model. Insider threats are particularly challenging to detect and prevent, as the attackers have legitimate access to the data and can carefully craft the poisoning strategy to evade detection.\nThese are the key mechanisms of data poisoning in ML systems. Attackers often employ these mechanisms to make their attacks more effective and harder to detect. The risk of data poisoning attacks grows as ML systems become increasingly complex and rely on larger datasets from diverse sources. Defending against data poisoning requires a multifaceted approach. ML practitioners and system designers must be aware of the various mechanisms of data poisoning and adopt a comprehensive approach to data security and model resilience. This includes secure data collection, robust data validation, and continuous model performance monitoring. Implementing secure data collection and preprocessing practices is crucial to prevent data poisoning at the source. Data validation and anomaly detection techniques can also help identify and mitigate potential poisoning attempts. Monitoring model performance for signs of data poisoning is also essential to detect and respond to attacks promptly.\n\n\nImpact on ML Systems\nData poisoning attacks can severely affect ML systems, compromising their performance, reliability, and trustworthiness. The impact of data poisoning can manifest in various ways, depending on the attacker’s objectives and the specific mechanism used. Let’s explore each of the potential impacts in detail.\nDegradation of model performance: One of the primary impacts of data poisoning is the degradation of the model’s overall performance. By manipulating the training data, attackers can introduce noise, biases, or inconsistencies that hinder the model’s ability to learn accurate patterns and make reliable predictions. This can reduce accuracy, precision, recall, or other performance metrics. 
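The size of this degradation is easy to demonstrate even with the crudest mechanism, random label flipping. The sketch below is a toy illustration on synthetic data; the dataset, the logistic regression model, and the 20% flip rate are arbitrary assumptions rather than a reproduction of any real attack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline: train on clean labels.
clean_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

# Poisoned: flip the labels of a random 20% of the training samples.
y_poisoned = y_tr.copy()
flip_idx = rng.choice(len(y_tr), size=int(0.2 * len(y_tr)), replace=False)
y_poisoned[flip_idx] = 1 - y_poisoned[flip_idx]
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned).score(X_te, y_te)

print(f"clean accuracy:    {clean_acc:.3f}")
print(f"poisoned accuracy: {poisoned_acc:.3f}")  # typically noticeably lower
```

Real availability attacks choose which labels to flip far more carefully than this, but even a naive random flip usually produces a measurable drop in test accuracy.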
The degradation of model performance can have significant consequences, especially in critical applications such as healthcare, finance, or security, where the reliability of predictions is crucial.\nMisclassification of specific targets: Data poisoning attacks can also be designed to cause the model to misclassify specific target instances. Attackers may introduce carefully crafted poisoned samples similar to the target instances, leading the model to learn incorrect associations. This can result in the model consistently misclassifying the targeted instances, even if it performs well on other inputs. Such targeted misclassification can have severe consequences, such as causing a malware detection system to overlook specific malicious files or leading to the wrong diagnosis in a medical imaging application.\nBackdoors and trojans in trained models: Data poisoning can introduce backdoors or trojans into the trained model. Backdoors are hidden functionalities that allow attackers to trigger specific behaviors or bypass normal authentication mechanisms. Trojans, on the other hand, are malicious components embedded within the model that specific input patterns can activate. By poisoning the training data, attackers can create models that appear to perform normally but contain hidden vulnerabilities that can be exploited later. Backdoors and trojans can compromise the integrity and security of the ML system, allowing attackers to gain unauthorized access, manipulate predictions, or exfiltrate sensitive information.\nBiased or unfair model outcomes: Data poisoning attacks can introduce biases or unfairness into the model’s predictions. By manipulating the training data distribution or injecting samples with specific biases, attackers can cause the model to learn and perpetuate discriminatory patterns. This can lead to unfair treatment of certain groups or individuals based on sensitive attributes such as race, gender, or age. Biased models can have severe societal implications, reinforcing existing inequalities and discriminatory practices. Ensuring fairness and mitigating biases is crucial for building trustworthy and ethical ML systems.\nIncreased false positives or false negatives: Data poisoning can also impact the model’s ability to correctly identify positive or negative instances, leading to increased false positives or false negatives. False positives occur when the model incorrectly identifies a negative instance as positive, while false negatives happen when a positive instance is misclassified as negative. The consequences of increased false positives or false negatives can be significant depending on the application. For example, in a fraud detection system, high false positives can lead to unnecessary investigations and customer frustration, while high false negatives can allow fraudulent activities to go undetected, as the brief example below illustrates.\nCompromised system reliability and trustworthiness: Data poisoning attacks can undermine ML systems’ overall reliability and trustworthiness. When models are trained on poisoned data, their predictions become unreliable and untrustworthy. This can erode user confidence in the system and lead to a loss of trust in the decisions made by the model. 
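As a quick refresher on how those two rates are computed, the snippet below derives them from a confusion matrix; the ten labels are made up purely for illustration.

```python
from sklearn.metrics import confusion_matrix

# 1 = fraud, 0 = legitimate (hypothetical labels for ten transactions)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"false positive rate: {fp / (fp + tn):.2f}")  # legitimate flagged as fraud
print(f"false negative rate: {fn / (fn + tp):.2f}")  # fraud that slipped through
```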
In critical applications where ML systems are relied upon for decision-making, such as autonomous vehicles or medical diagnosis, compromised reliability can have severe consequences, putting lives and property at risk.\nAddressing the impact of data poisoning requires a proactive approach to data security, model testing, and monitoring. Organizations must implement robust measures to ensure the integrity and quality of training data, employ techniques to detect and mitigate poisoning attempts, and continuously monitor the performance and behavior of deployed models. Collaboration between ML practitioners, security experts, and domain specialists is essential to develop comprehensive strategies for preventing and responding to data poisoning attacks.\n\nCase Study 1\nIn 2017, researchers demonstrated a data poisoning attack against a popular toxicity classification model called Perspective (Hosseini et al. 2017). This ML model detects toxic comments online.\n\nHosseini, Hossein, Sreeram Kannan, Baosen Zhang, and Radha Poovendran. 2017. “Deceiving Google’s Perspective Api Built for Detecting Toxic Comments.” ArXiv Preprint abs/1702.08138. https://arxiv.org/abs/1702.08138.\nThe researchers added synthetically generated toxic comments with slight misspellings and grammatical errors to the model’s training data. This slowly corrupted the model, causing it to misclassify increasing numbers of severely toxic inputs as non-toxic over time.\nAfter retraining on the poisoned data, the model’s false negative rate increased from 1.4% to 27% - allowing extremely toxic comments to bypass detection. The researchers warned this stealthy data poisoning could enable the spread of hate speech, harassment, and abuse if deployed against real moderation systems.\nThis case highlights how data poisoning can degrade model accuracy and reliability. For social media platforms, a poisoning attack that impairs toxicity detection could lead to the proliferation of harmful content and distrust of ML moderation systems. The example demonstrates why securing training data integrity and monitoring for poisoning is critical across application domains.\n\n\nCase Study 2\n\n\n\n\n\n\nFigure 17.25: Samples of dirty-label poison data regarding mismatched text/image pairs (Credit: Shan)\n\n\n\nInterestingly enough, data poisoning attacks are not always malicious (Shan et al. 2023). Nightshade, a tool developed by a team led by Professor Ben Zhao at the University of Chicago, utilizes data poisoning to help artists protect their art against scraping and copyright violations by generative AI models. Artists can use the tool to make subtle modifications to their images before uploading them online, as shown in Figure 17.25.\nWhile these changes are indiscernible to the human eye, they can significantly disrupt the performance of generative AI models when incorporated into the training data. Generative models can be manipulated to generate hallucinations and weird images. For example, with only 300 poisoned images, the University of Chicago researchers could trick the latest Stable Diffusion model into generating images of dogs that look like cats or images of cows when prompted for cars.\nAs the number of poisoned images on the internet increases, the performance of the models that use scraped data will deteriorate exponentially. First, the poisoned data is hard to detect and requires manual elimination. 
Second, the \"poison\" spreads quickly to other labels because generative models rely on connections between words and concepts as they generate images. So a poisoned image of a \"car\" could spread into generated images associated with words like \"truck,” \"train,” \" bus,” etc.\nOn the other hand, this tool can be used maliciously and can affect legitimate applications of the generative models. This shows the very challenging and novel nature of machine learning attacks.\nFigure 17.26 demonstrates the effects of different levels of data poisoning (50 samples, 100 samples, and 300 samples of poisoned images) on generating images in different categories. Notice how the images start deforming and deviating from the desired category. For example, after 300 poison samples, a car prompt generates a cow.\n\n\n\n\n\n\nFigure 17.26: Data poisoning (Credit: Shan et al. (2023))\n\n\nShan, Shawn, Wenxin Ding, Josephine Passananti, Haitao Zheng, and Ben Y Zhao. 2023. “Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models.” ArXiv Preprint abs/2310.13828. https://arxiv.org/abs/2310.13828.\n\n\n\nExercise 17.3 (Poisoning Attacks)  \n\n\n\n\n\nGet ready to explore the dark side of AI security! In this Colab, you’ll learn about data poisoning – how bad data can trick AI models into making wrong decisions. We’ll focus on a real-world attack against a Support Vector Machine (SVM), observing how the AI’s behavior changes under attack. This hands-on exercise will highlight why protecting AI systems is crucial, especially as they become more integrated into our lives. Think like a hacker, understand the vulnerability, and brainstorm how to defend our AI systems!\n\n\n\n\n\n\n\n\n17.4.3 Distribution Shifts\n\nDefinition and Characteristics\nDistribution shift refers to the phenomenon where the data distribution encountered by an ML model during deployment (inference) differs from the distribution it was trained on, as shown in Figure 17.27. This is not so much an attack as it is that the model’s robustness will vary over time. In other words, the data’s statistical properties, patterns, or underlying assumptions can change between the training and test phases.\n\n\n\n\n\n\nFigure 17.27: The curly brackets enclose the distribution shift between the environments. Here, z stands for the spurious feature, and y stands for label class (Credit: Xin)\n\n\n\nThe key characteristics of distribution shift include:\nDomain mismatch: The input data during inference comes from a different domain or distribution than the training data. When the input data during inference comes from a domain or distribution different from the training data, it can significantly affect the model’s performance. This is because the model has learned patterns and relationships specific to the training domain, and when applied to a different domain, those learned patterns may not hold. For example, consider a sentiment analysis model trained on movie reviews. Suppose this model is applied to analyze sentiment in tweets. In that case, it may need help to accurately classify the sentiment because the language, grammar, and context of tweets can differ from movie reviews. This domain mismatch can result in poor performance and unreliable predictions, limiting the model’s practical utility.\nTemporal drift: The data distribution evolves, leading to a gradual or sudden shift in the input characteristics. 
Temporal drift is important because ML models are often deployed in dynamic environments where the data distribution can change over time. If the model is not updated or adapted to these changes, its performance can gradually degrade. For instance, the patterns and behaviors associated with fraudulent activities may evolve in a fraud detection system as fraudsters adapt their techniques. If the model is not retrained or updated to capture these new patterns, it may fail to detect new types of fraud effectively. Temporal drift can lead to a decline in the model’s accuracy and reliability over time, making monitoring and addressing this type of distribution shift crucial.\nContextual changes: The ML model’s context can vary, resulting in different data distributions based on factors such as location, user behavior, or environmental conditions. Contextual changes matter because ML models are often deployed in various contexts or environments that can have different data distributions. If the model cannot generalize well to these different contexts, its performance may suffer. For example, consider a computer vision model trained to recognize objects in a controlled lab environment. When deployed in a real-world setting, factors such as lighting conditions, camera angles, or background clutter can vary significantly, leading to a distribution shift. If the model is not robust to these contextual changes, it may fail to accurately recognize objects in the new environment, limiting its practical utility.\nUnrepresentative training data: The training data may only partially capture the variability and diversity of the real-world data encountered during deployment. Unrepresentative training data can lead to biased or skewed models that perform poorly on real-world data. If the training data fails to capture the variability and diversity of the real-world data adequately, the model may learn patterns specific to the training set but fail to generalize to new, unseen data. This can result in poor performance, biased predictions, and limited model applicability. For instance, if a facial recognition model is trained primarily on images of individuals from a specific demographic group, it may struggle to accurately recognize faces from other demographic groups when deployed in a real-world setting. Ensuring that the training data is representative and diverse is crucial for building models that can generalize well to real-world scenarios.\n\n\n\n\n\n\nFigure 17.28: Concept drift refers to a change in data patterns and relationships over time (Credit: Evidently AI)\n\n\n\nDistribution shift can manifest in various forms, such as:\nCovariate shift: The distribution of the input features (covariates) changes while the conditional distribution of the target variable given the input remains the same. Covariate shift matters because it can impact the model’s ability to make accurate predictions when the input features (covariates) differ between the training and test data. Even if the relationship between the input features and the target variable remains the same, a change in the distribution of the input features can affect the model’s performance. For example, consider a model trained to predict housing prices based on features like square footage, number of bedrooms, and location. Suppose the distribution of these features in the test data significantly differs from the training data (e.g., the test data contains houses with much larger square footage). 
In that case, the model’s predictions may become less accurate. Addressing covariate shifts is important to ensure the model’s robustness and reliability when applied to new data.\nConcept drift: The relationship between the input features and the target variable changes over time, altering the underlying concept the model is trying to learn, as shown in Figure 17.28. Concept drift is important because it indicates changes in the fundamental relationship between the input features and the target variable over time. When the underlying concept that the model is trying to learn shifts, its performance can deteriorate if not adapted to the new concept. For instance, in a customer churn prediction model, the factors influencing customer churn may evolve due to market conditions, competitor offerings, or customer preferences. If the model is not updated to capture these changes, its predictions may become less accurate and irrelevant. Detecting and adapting to concept drift is crucial to maintaining the model’s effectiveness and alignment with evolving real-world concepts.\nDomain generalization: The model must generalize to unseen domains or distributions not present during training. Domain generalization is important because it enables ML models to be applied to new, unseen domains without requiring extensive retraining or adaptation. In real-world scenarios, training data that covers all possible domains or distributions that the model may encounter is often infeasible. Domain generalization techniques aim to learn domain-invariant features or models that can generalize well to new domains. For example, consider a model trained to classify images of animals. If the model can learn features invariant to different backgrounds, lighting conditions, or poses, it can generalize well to classify animals in new, unseen environments. Domain generalization is crucial for building models that can be deployed in diverse and evolving real-world settings.\nThe presence of a distribution shift can significantly impact the performance and reliability of ML models, as the models may need help generalizing well to the new data distribution. Detecting and adapting to distribution shifts is crucial to ensure ML systems’ robustness and practical utility in real-world scenarios.\n\n\nMechanisms of Distribution Shifts\nThe mechanisms of distribution shift, such as changes in data sources, temporal evolution, domain-specific variations, selection bias, feedback loops, and adversarial manipulations, are important to understand because they help identify the underlying causes of distribution shift. By understanding these mechanisms, practitioners can develop targeted strategies to mitigate their impact and improve the model’s robustness. Here are some common mechanisms:\n\n\n\n\n\n\nFigure 17.29: Temporal evolution (Credit: Białek)\n\n\n\nChanges in data sources: Distribution shifts can occur when the data sources used for training and inference differ. For example, if a model is trained on data from one sensor but deployed on data from another sensor with different characteristics, it can lead to a distribution shift.\nTemporal evolution: Over time, the underlying data distribution can evolve due to changes in user behavior, market dynamics, or other temporal factors. 
For instance, in a recommendation system, user preferences may shift over time, leading to a distribution shift in the input data, as shown in Figure 17.29.\nDomain-specific variations: Different domains or contexts can have distinct data distributions. A model trained on data from one domain may not generalize well to another domain without appropriate adaptation techniques. For example, an image classification model trained on indoor scenes may struggle when applied to outdoor scenes.\nSelection bias: Distribution shift can arise from selection bias during data collection or sampling. If the training data does not represent the true population or certain subgroups are over- or underrepresented, this can lead to a mismatch between the training and test distributions.\nFeedback loops: In some cases, the predictions or actions taken by an ML model can influence the future data distribution. For example, in a dynamic pricing system, the prices set by the model can impact customer behavior, leading to a shift in the data distribution over time.\nAdversarial manipulations: Adversaries can intentionally manipulate the input data to create a distribution shift and deceive the ML model. By introducing carefully crafted perturbations or generating out-of-distribution samples, attackers can exploit the model’s vulnerabilities and cause it to make incorrect predictions.\nUnderstanding the mechanisms of distribution shift is important for developing effective strategies to detect and mitigate its impact on ML systems. By identifying the sources and characteristics of the shift, practitioners can design appropriate techniques, such as domain adaptation, transfer learning, or continual learning, to improve the model’s robustness and performance under distributional changes.\n\n\nImpact on ML Systems\nDistribution shifts can significantly degrade the performance and reliability of ML systems. Here are some key ways in which distribution shift can affect ML models:\nDegraded predictive performance: When the data distribution encountered during inference differs from the training distribution, the model’s predictive accuracy can deteriorate. The model may struggle to generalize well to the new data, leading to increased errors and suboptimal performance.\nReduced reliability and trustworthiness: Distribution shift can undermine the reliability and trustworthiness of ML models. If the model’s predictions become unreliable or inconsistent due to the shift, users may lose confidence in the system’s outputs, leading to potential misuse or disuse of the model.\nBiased predictions: Distribution shift can introduce biases in the model’s predictions. If the training data does not represent the real-world distribution or certain subgroups are underrepresented, the model may make biased predictions that discriminate against certain groups or perpetuate societal biases.\nIncreased uncertainty and risk: Distribution shift introduces additional uncertainty and risk into the ML system. The model’s behavior and performance may become less predictable, making it challenging to assess its reliability and suitability for critical applications. This uncertainty can lead to increased operational risks and potential failures.\nAdaptability challenges: ML models trained on a specific data distribution may struggle to adapt to changing environments or new domains. 
The lack of adaptability can limit the model’s usefulness and applicability in dynamic real-world scenarios where the data distribution evolves.\nMaintenance and update difficulties: Distribution shift can complicate the maintenance and updating of ML models. As the data distribution changes, the model may require frequent retraining or fine-tuning to maintain its performance. This can be time-consuming and resource-intensive, especially if the shift occurs rapidly or continuously.\nVulnerability to adversarial attacks: Distribution shift can make ML models more vulnerable to adversarial attacks. Adversaries can exploit the model’s sensitivity to distributional changes by crafting adversarial examples outside the training distribution, causing the model to make incorrect predictions or behave unexpectedly.\nTo mitigate the impact of distribution shifts, it is crucial to develop robust ML systems that detect and adapt to distributional changes. Techniques such as domain adaptation, transfer learning, and continual learning can help improve the model’s generalization ability across different distributions. ML model monitoring, testing, and updating are also necessary to ensure their performance and reliability during distribution shifts.\n\n\n\n17.4.4 Detection and Mitigation\n\nAdversarial Attacks\nAs you may recall from above, adversarial attacks pose a significant threat to the robustness and reliability of ML systems. These attacks involve crafting carefully designed inputs, known as adversarial examples, to deceive ML models and cause them to make incorrect predictions. To safeguard ML systems against adversarial attacks, developing effective techniques for detecting and mitigating these threats is crucial.\n\nAdversarial Example Detection Techniques\nDetecting adversarial examples is the first line of defense against adversarial attacks. Several techniques have been proposed to identify and flag suspicious inputs that may be adversarial.\nStatistical methods aim to detect adversarial examples by analyzing the statistical properties of the input data. These methods often compare the input data distribution to a reference distribution, such as the training data distribution or a known benign distribution. Techniques like the Kolmogorov-Smirnov (Berger and Zhou 2014) test or the Anderson-Darling test can be used to measure the discrepancy between the distributions and flag inputs that deviate significantly from the expected distribution.\n\nBerger, Vance W, and YanYan Zhou. 2014. “Kolmogorovsmirnov Test: Overview.” Wiley Statsref: Statistics Reference Online.\nKernel density estimation (KDE) is a non-parametric technique used to estimate the probability density function of a dataset. In the context of adversarial example detection, KDE can be used to estimate the density of benign examples in the input space. Adversarial examples often lie in low-density regions and can be detected by comparing their estimated density to a threshold. Inputs with an estimated density below the threshold are flagged as potential adversarial examples.\nAnother technique is feature squeezing (Panda, Chakraborty, and Roy 2019), which reduces the complexity of the input space by applying dimensionality reduction or discretization. The idea behind feature squeezing is that adversarial examples often rely on small, imperceptible perturbations that can be eliminated or reduced through these transformations. 
Inconsistencies can be detected by comparing the model’s predictions on the original input and the squeezed input, indicating the presence of adversarial examples.\n\nPanda, Priyadarshini, Indranil Chakraborty, and Kaushik Roy. 2019. “Discretization Based Solutions for Secure Machine Learning Against Adversarial Attacks.” IEEE Access 7: 70157–68. https://doi.org/10.1109/access.2019.2919463.\nModel uncertainty estimation techniques aim to quantify the confidence or uncertainty associated with a model’s predictions. Adversarial examples often exploit regions of high uncertainty in the model’s decision boundary. By estimating the uncertainty using techniques like Bayesian neural networks, dropout-based uncertainty estimation, or ensemble methods, inputs with high uncertainty can be flagged as potential adversarial examples.\n\n\nAdversarial Defense Strategies\nOnce adversarial examples are detected, various defense strategies can be employed to mitigate their impact and improve the robustness of ML models.\nAdversarial training is a technique that involves augmenting the training data with adversarial examples and retraining the model on this augmented dataset. Exposing the model to adversarial examples during training teaches it to classify them correctly and makes it more robust to adversarial attacks. Adversarial training can be performed using various attack methods, such as the Fast Gradient Sign Method (FGSM) or Projected Gradient Descent (PGD) (Madry et al. 2017); a compact FGSM-based sketch appears below.\n\nMadry, Aleksander, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. 2017. “Towards Deep Learning Models Resistant to Adversarial Attacks.” arXiv Preprint arXiv:1706.06083.\n\nPapernot, Nicolas, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami. 2016. “Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks.” In 2016 IEEE Symposium on Security and Privacy (SP), 582–97. IEEE. https://doi.org/10.1109/sp.2016.41.\nDefensive distillation (Papernot et al. 2016) is a technique that trains a second model (the student model) to mimic the behavior of the original model (the teacher model). The student model is trained on the soft labels produced by the teacher model, which are less sensitive to small perturbations. Using the student model for inference can reduce the impact of adversarial perturbations, as the student model learns to generalize better and is less sensitive to adversarial noise.\nInput preprocessing and transformation techniques aim to remove or mitigate the effect of adversarial perturbations before feeding the input to the ML model. These techniques include image denoising, JPEG compression, random resizing, padding, or applying random transformations to the input data. By reducing the impact of adversarial perturbations, these preprocessing steps can help improve the model’s robustness to adversarial attacks.\nEnsemble methods combine multiple models to make more robust predictions. The ensemble can reduce the impact of adversarial attacks by using a diverse set of models with different architectures, training data, or hyperparameters. Adversarial examples that fool one model may not fool others in the ensemble, leading to more reliable and robust predictions. 
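The core of the adversarial-training idea fits in a few lines of PyTorch. In the sketch below, the tiny model, the random batch, the 0.1 epsilon, and the 50/50 clean/adversarial loss weighting are all placeholder assumptions chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))  # toy classifier
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def fgsm(x, y, epsilon=0.1):
    """Return an FGSM-perturbed copy of x: x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss_fn(model(x_adv), y).backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# One adversarial-training step on a random batch (a stand-in for real data).
x = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))
x_adv = fgsm(x, y)

optimizer.zero_grad()  # clear gradients left over from crafting x_adv
loss = 0.5 * loss_fn(model(x), y) + 0.5 * loss_fn(model(x_adv), y)
loss.backward()
optimizer.step()
print(f"combined clean/adversarial loss: {loss.item():.3f}")
```

In a real training loop the adversarial batch would be regenerated for every minibatch, usually with a stronger multi-step attack such as PGD, but the structure of the update stays the same.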
Within such an ensemble, model diversification techniques, such as using different preprocessing techniques or feature representations for each model, can further enhance the robustness.\n\n\nRobustness Evaluation and Testing\nThorough evaluation and testing are needed to assess the effectiveness of adversarial defense techniques and to measure the robustness of ML models.\nAdversarial robustness metrics quantify the model’s resilience to adversarial attacks. These metrics can include the model’s accuracy on adversarial examples, the average distortion required to fool the model, or the model’s performance under different attack strengths. By comparing these metrics across different models or defense techniques, practitioners can assess and compare their robustness levels.\nStandardized benchmarks and datasets provide a common ground for evaluating and comparing the robustness of ML models. These benchmarks include datasets with pre-generated adversarial examples as well as tools and frameworks for generating adversarial attacks. Examples of popular robustness benchmarks include the MNIST-C, CIFAR-10-C, and ImageNet-C (Hendrycks and Dietterich 2019) datasets, which contain corrupted or perturbed versions of the original datasets.\n\nHendrycks, Dan, and Thomas Dietterich. 2019. “Benchmarking Neural Network Robustness to Common Corruptions and Perturbations.” arXiv Preprint arXiv:1903.12261.\nPractitioners can develop more robust and resilient ML systems by leveraging these adversarial example detection techniques, defense strategies, and robustness evaluation methods. However, it is important to note that adversarial robustness is an ongoing research area, and no single technique provides complete protection against all types of adversarial attacks. A comprehensive approach that combines multiple defense mechanisms and regular testing is essential to maintain the security and reliability of ML systems in the face of evolving adversarial threats.\n\n\n\nData Poisoning\nRecall that data poisoning is an attack that targets the integrity of the training data used to build ML models. By manipulating or corrupting the training data, attackers can influence the model’s behavior and cause it to make incorrect predictions or perform unintended actions. Detecting and mitigating data poisoning attacks is crucial to ensure the trustworthiness and reliability of ML systems, as shown in Figure 17.30.\n\nAnomaly Detection Techniques for Identifying Poisoned Data\n\n\n\n\n\n\nFigure 17.30: Malicious data injection (Credit: Li)\n\n\n\nStatistical outlier detection methods identify data points that deviate significantly from the majority of the data. These methods assume that poisoned data instances are likely to be statistical outliers. Techniques such as the Z-score method, Tukey’s method, or the Mahalanobis distance can be used to measure the deviation of each data point from the central tendency of the dataset. Data points that exceed a predefined threshold are flagged as potential outliers and considered suspicious for data poisoning.\nClustering-based methods group similar data points together based on their features or attributes. The assumption is that poisoned data instances may form distinct clusters or lie far away from the normal data clusters. By applying clustering algorithms like K-means, DBSCAN, or hierarchical clustering, anomalous clusters or data points that do not belong to any cluster can be identified, as in the brief sketch below. 
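A minimal version of that clustering check, using DBSCAN’s built-in notion of noise points, might look as follows. The two-dimensional synthetic data and the eps/min_samples settings are arbitrary assumptions; real feature spaces would need tuning and likely dimensionality reduction first.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=(500, 2))      # benign samples form a dense cloud
poisoned = rng.uniform(low=5.0, high=15.0, size=(10, 2))   # scattered injected points
X = np.vstack([clean, poisoned])

labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(X)
suspicious = np.where(labels == -1)[0]                     # DBSCAN labels noise points as -1
print(f"{len(suspicious)} samples flagged for manual review")
```

In practice the flagged set will also contain some legitimate tail samples, which is why detectors like this are usually used to prioritize human review rather than to delete data automatically.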
These anomalous instances are then treated as potentially poisoned data.\n\n\n\n\n\n\nFigure 17.31: Autoencoder (Credit: Dertat)\n\n\n\nAutoencoders are neural networks trained to reconstruct the input data from a compressed representation, as shown in Figure 17.31. They can be used for anomaly detection by learning the normal patterns in the data and identifying instances that deviate from them. During training, the autoencoder is trained on clean, unpoisoned data. At inference time, the reconstruction error for each data point is computed. Data points with high reconstruction errors are considered abnormal and potentially poisoned, as they do not conform to the learned normal patterns.\n\n\nData Sanitization and Preprocessing Techniques\nData poisoning can be avoided by cleaning data, which involves identifying and removing or correcting noisy, incomplete, or inconsistent data points. Techniques such as data deduplication, missing value imputation, and outlier removal can be applied to improve the quality of the training data. By eliminating or filtering out suspicious or anomalous data points, the impact of poisoned instances can be reduced.\nData validation involves verifying the integrity and consistency of the training data. This can include checking for data type consistency, range validation, and cross-field dependencies. By defining and enforcing data validation rules, anomalous or inconsistent data points indicative of data poisoning can be identified and flagged for further investigation.\nData provenance and lineage tracking involve maintaining a record of data’s origin, transformations, and movements throughout the ML pipeline. By documenting the data sources, preprocessing steps, and any modifications made to the data, practitioners can trace anomalies or suspicious patterns back to their origin. This helps identify potential points of data poisoning and facilitates the investigation and mitigation process.\n\n\nRobust Training Techniques\nRobust optimization techniques can be used to modify the training objective to minimize the impact of outliers or poisoned instances. This can be achieved by using robust loss functions less sensitive to extreme values, such as the Huber loss or the modified Huber loss. Regularization techniques, such as L1 or L2 regularization, can also help in reducing the model’s sensitivity to poisoned data by constraining the model’s complexity and preventing overfitting.\nRobust loss functions are designed to be less sensitive to outliers or noisy data points. Examples include the modified Huber loss, the Tukey loss (Beaton and Tukey 1974), and the trimmed mean loss. These loss functions down-weight or ignore the contribution of abnormal instances during training, reducing their impact on the model’s learning process. Robust objective functions, such as the minimax or distributionally robust objective, aim to optimize the model’s performance under worst-case scenarios or in the presence of adversarial perturbations.\n\nBeaton, Albert E., and John W. Tukey. 1974. “The Fitting of Power Series, Meaning Polynomials, Illustrated on Band-Spectroscopic Data.” Technometrics 16 (2): 147. https://doi.org/10.2307/1267936.\nData augmentation techniques involve generating additional training examples by applying random transformations or perturbations to the existing data Figure 17.32. This helps in increasing the diversity and robustness of the training dataset. 
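Returning briefly to the robust loss functions mentioned above, the sketch below contrasts an ordinary least-squares fit with a Huber-loss fit on data whose targets contain a few corrupted, high-leverage points; the synthetic data and the corruption pattern are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=200)   # true slope is 3

idx = np.argsort(X.ravel())[-10:]   # corrupt the targets of the 10 largest inputs
y[idx] -= 50.0

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)
print(f"OLS slope:   {ols.coef_[0]:.2f}")    # typically dragged well below 3 by the outliers
print(f"Huber slope: {huber.coef_[0]:.2f}")  # typically stays close to 3
```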
By introducing controlled variations in the data, the model becomes less sensitive to specific patterns or artifacts that may be present in poisoned instances. Randomization techniques, such as random subsampling or bootstrap aggregating, can also help reduce the impact of poisoned data by training multiple models on different subsets of the data and combining their predictions.\n\n\n\n\n\n\nFigure 17.32: An image of the number “3” in original form and with basic augmentations applied.\n\n\n\n\n\nSecure and Trusted Data Sourcing\nImplementing the best data collection and curation practices can help mitigate the risk of data poisoning. This includes establishing clear data collection protocols, verifying the authenticity and reliability of data sources, and conducting regular data quality assessments. Sourcing data from trusted and reputable providers and following secure data handling practices can reduce the likelihood of introducing poisoned data into the training pipeline.\nStrong data governance and access control mechanisms are essential to prevent unauthorized modifications or tampering with the training data. This involves defining clear roles and responsibilities for data access, implementing access control policies based on the principle of least privilege, and monitoring and logging data access activities. By restricting access to the training data and maintaining an audit trail, potential data poisoning attempts can be detected and investigated.\nDetecting and mitigating data poisoning attacks requires a multifaceted approach that combines anomaly detection, data sanitization, robust training techniques, and secure data sourcing practices. By implementing these measures, ML practitioners can enhance the resilience of their models against data poisoning and ensure the integrity and trustworthiness of the training data. However, it is important to note that data poisoning is an active area of research, and new attack vectors and defense mechanisms continue to emerge. Staying informed about the latest developments and adopting a proactive and adaptive approach to data security is crucial for maintaining the robustness of ML systems.\n\n\n\nDistribution Shifts\n\nDetecting and Mitigating Distribution Shifts\nRecall that distribution shifts occur when the data distribution encountered by a machine learning (ML) model during deployment differs from the distribution it was trained on. These shifts can significantly impact the model’s performance and generalization ability, leading to suboptimal or incorrect predictions. Detecting and mitigating distribution shifts is crucial to ensure the robustness and reliability of ML systems in real-world scenarios.\n\n\nDetection Techniques for Distribution Shifts\nStatistical tests can be used to compare the distributions of the training and test data to identify significant differences. Techniques such as the Kolmogorov-Smirnov test or the Anderson-Darling test measure the discrepancy between two distributions and provide a quantitative assessment of the presence of distribution shift. By applying these tests to the input features or the model’s predictions, practitioners can detect if there is a statistically significant difference between the training and test distributions.\nDivergence metrics quantify the dissimilarity between two probability distributions. Commonly used divergence metrics include the Kullback-Leibler (KL) divergence and the [Jensen-Shannon (JS)] divergence. 
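For instance, a per-feature shift check can combine a two-sample Kolmogorov-Smirnov test with a binned Jensen-Shannon distance, as in the sketch below; the synthetic feature values, the bin count, and the alert thresholds are arbitrary assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # distribution seen in training
live_feature = rng.normal(loc=0.5, scale=1.2, size=5000)   # shifted distribution at inference

# Two-sample KS test: a small p-value suggests the distributions differ.
stat, p_value = ks_2samp(train_feature, live_feature)

# Jensen-Shannon distance between binned histograms (0 = identical, 1 = disjoint with base 2).
bins = np.histogram_bin_edges(np.concatenate([train_feature, live_feature]), bins=50)
p_hist, _ = np.histogram(train_feature, bins=bins, density=True)
q_hist, _ = np.histogram(live_feature, bins=bins, density=True)
js = jensenshannon(p_hist, q_hist, base=2)

print(f"KS statistic={stat:.3f}, p-value={p_value:.1e}, JS distance={js:.3f}")
if p_value < 0.01 or js > 0.1:
    print("Possible distribution shift: flag this feature for review and retraining.")
```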
By calculating the divergence between the training and test data distributions, practitioners can assess the extent of the distribution shift. High divergence values indicate a significant difference between the distributions, suggesting the presence of a distribution shift.\nUncertainty quantification techniques, such as Bayesian neural networks or ensemble methods, can estimate the uncertainty associated with the model’s predictions. When a model is applied to data from a different distribution, its predictions may have higher uncertainty. By monitoring the uncertainty levels, practitioners can detect distribution shifts. If the uncertainty consistently exceeds a predetermined threshold for test samples, it suggests that the model is operating outside its trained distribution.\nIn addition, domain classifiers are trained to distinguish between different domains or distributions. Practitioners can detect distribution shifts by training a classifier to differentiate between the training and test domains. If the domain classifier achieves high accuracy in distinguishing between the two domains, it indicates a significant difference in the underlying distributions. The performance of the domain classifier serves as a measure of the distribution shift.\n\n\nMitigation Techniques for Distribution Shifts\n\n\n\n\n\n\nFigure 17.33: Transfer learning (Credit: Bhavsar)\n\n\n\nTransfer learning leverages knowledge gained from one domain to improve performance in another, as shown in Figure 17.33. By using pre-trained models or transferring learned features from a source domain to a target domain, transfer learning can help mitigate the impact of distribution shifts. The pre-trained model can be fine-tuned on a small amount of labeled data from the target domain, allowing it to adapt to the new distribution. Transfer learning is particularly effective when the source and target domains share similar characteristics or when labeled data in the target domain is scarce.\nContinual learning, also known as lifelong learning, enables ML models to learn continuously from new data distributions while retaining knowledge from previous distributions. Techniques such as elastic weight consolidation (EWC) (Kirkpatrick et al. 2017) or gradient episodic memory (GEM) (Lopez-Paz and Ranzato 2017) allow models to adapt to evolving data distributions over time. These techniques aim to balance the plasticity of the model (ability to learn from new data) with the stability of the model (retaining previously learned knowledge). By incrementally updating the model with new data and mitigating catastrophic forgetting, continual learning helps models stay robust to distribution shifts.\n\nKirkpatrick, James, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, et al. 2017. “Overcoming Catastrophic Forgetting in Neural Networks.” Proc. Natl. Acad. Sci. 114 (13): 3521–26. https://doi.org/10.1073/pnas.1611835114.\n\nLopez-Paz, David, and Marc’Aurelio Ranzato. 2017. “Gradient Episodic Memory for Continual Learning.” Adv Neural Inf Process Syst 30.\nData augmentation techniques, such as those we have seen previously, involve applying transformations or perturbations to the existing training data to increase its diversity and improve the model’s robustness to distribution shifts. By introducing variations in the data, such as rotations, translations, scaling, or adding noise, data augmentation helps the model learn invariant features and generalize better to unseen distributions. 
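Looping back to the detection side for a moment, the domain-classifier idea described earlier also takes only a few lines: train a classifier to tell training rows from deployment rows and treat high discrimination ability as evidence of shift. The synthetic data and the 0.6 AUC threshold below are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
train_data = rng.normal(loc=0.0, size=(2000, 10))    # features seen during training
deploy_data = rng.normal(loc=0.3, size=(2000, 10))   # slightly shifted deployment features

# Label each row by its origin: 0 = training domain, 1 = deployment domain.
X = np.vstack([train_data, deploy_data])
d = np.concatenate([np.zeros(len(train_data)), np.ones(len(deploy_data))])
X_tr, X_te, d_tr, d_te = train_test_split(X, d, test_size=0.3, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, d_tr)
auc = roc_auc_score(d_te, clf.predict_proba(X_te)[:, 1])
print(f"domain classifier AUC: {auc:.3f}")  # ~0.5 would mean the domains are indistinguishable
if auc > 0.6:
    print("Domains are distinguishable: distribution shift is likely.")
```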
Data augmentation can be performed during training and inference to enhance the model’s ability to handle distribution shifts.\nEnsemble methods combine multiple models to make predictions more robust to distribution shifts. By training models on different subsets of the data, using different algorithms, or with different hyperparameters, ensemble methods can capture diverse aspects of the data distribution. When presented with a shifted distribution, the ensemble can leverage the strengths of individual models to make more accurate and stable predictions. Techniques like bagging, boosting, or stacking can create effective ensembles.\nRegularly updating models with new data from the target distribution is crucial to mitigate the impact of distribution shifts. As the data distribution evolves, models should be retrained or fine-tuned on the latest available data to adapt to the changing patterns. Monitoring model performance and data characteristics can help detect when an update is necessary. By keeping the models up to date, practitioners can ensure they remain relevant and accurate in the face of distribution shifts.\nEvaluating models using robust metrics less sensitive to distribution shifts can provide a more reliable assessment of model performance. Metrics such as the area under the precision-recall curve (AUPRC) or the F1 score are more robust to class imbalance and can better capture the model’s performance across different distributions. Additionally, using domain-specific evaluation metrics that align with the desired outcomes in the target domain can provide a more meaningful measure of the model’s effectiveness.\nDetecting and mitigating distribution shifts is an ongoing process that requires continuous monitoring, adaptation, and improvement. By employing a combination of detection techniques and mitigation strategies, ML practitioners can proactively identify and address distribution shifts, ensuring the robustness and reliability of their models in real-world deployments. It is important to note that distribution shifts can take various forms and may require domain-specific approaches depending on the nature of the data and the application. Staying informed about the latest research and best practices in handling distribution shifts is essential for building resilient ML systems.", "crumbs": [ "MAIN", "17  Robust AI" @@ -1923,7 +1923,7 @@ "href": "contents/robust_ai/robust_ai.html#software-faults", "title": "17  Robust AI", "section": "17.5 Software Faults", - "text": "17.5 Software Faults\n\nDefinition and Characteristics\nSoftware faults refer to defects, errors, or bugs in the runtime software frameworks and components that support the execution and deployment of ML models (Myllyaho et al. 2022). These faults can arise from various sources, such as programming mistakes, design flaws, or compatibility issues (H. Zhang 2008), and can have significant implications for ML systems’ performance, reliability, and security. Software faults in ML frameworks exhibit several key characteristics:\n\nMyllyaho, Lalli, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, and Tommi Mikkonen. 2022. “On Misbehaviour and Fault Tolerance in Machine Learning Systems.” J. Syst. Software 183 (January): 111096. https://doi.org/10.1016/j.jss.2021.111096.\n\nZhang, Hongyu. 2008. “On the Distribution of Software Faults.” IEEE Trans. Software Eng. 34 (2): 301–2. 
https://doi.org/10.1109/tse.2007.70771.\n\nDiversity: Software faults can manifest in different forms, ranging from simple logic and syntax mistakes to more complex issues like memory leaks, race conditions, and integration problems. The variety of fault types adds to the challenge of detecting and mitigating them effectively.\nPropagation: In ML systems, software faults can propagate through the various layers and components of the framework. A fault in one module can trigger a cascade of errors or unexpected behavior in other parts of the system, making it difficult to pinpoint the root cause and assess the full impact of the fault.\nIntermittency: Some software faults may exhibit intermittent behavior, occurring sporadically or under specific conditions. These faults can be particularly challenging to reproduce and debug, as they may manifest inconsistently during testing or normal operation.\nInteraction with ML models: Software faults in ML frameworks can interact with the trained models in subtle ways. For example, a fault in the data preprocessing pipeline may introduce noise or bias into the model’s inputs, leading to degraded performance or incorrect predictions. Similarly, faults in the model serving component may cause inconsistencies between the training and inference environments.\nImpact on system properties: Software faults can compromise various desirable properties of ML systems, such as performance, scalability, reliability, and security. Faults may lead to slowdowns, crashes, incorrect outputs, or vulnerabilities that attackers can exploit.\nDependency on external factors: The occurrence and impact of software faults in ML frameworks often depend on external factors, such as the choice of hardware, operating system, libraries, and configurations. Compatibility issues and version mismatches can introduce faults that are difficult to anticipate and mitigate.\n\nUnderstanding the characteristics of software faults in ML frameworks is crucial for developing effective fault prevention, detection, and mitigation strategies. By recognizing the diversity, propagation, intermittency, and impact of software faults, ML practitioners can design more robust and reliable systems resilient to these issues.\n\n\nMechanisms of Software Faults in ML Frameworks\nMachine learning frameworks, such as TensorFlow, PyTorch, and sci-kit-learn, provide powerful tools and abstractions for building and deploying ML models. However, these frameworks are not immune to software faults that can impact ML systems’ performance, reliability, and correctness. Let’s explore some of the common software faults that can occur in ML frameworks:\nMemory Leaks and Resource Management Issues: Improper memory management, such as failing to release memory or close file handles, can lead to memory leaks and resource exhaustion over time. This issue is compounded by inefficient memory usage, where creating unnecessary copies of large tensors or not leveraging memory-efficient data structures can cause excessive memory consumption and degrade system performance. 
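A classic, easy-to-miss instance of the memory-management faults just described is accumulating loss tensors that still carry their autograd graphs. The PyTorch sketch below contrasts the leaky pattern with the fix; the toy model and random data are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(100, 1)
batches = [torch.randn(64, 100) for _ in range(1000)]

# Buggy pattern: each stored loss tensor keeps its computation graph (and the
# activations behind it) alive, so memory grows with every iteration.
losses_leaky = []
for x in batches:
    loss = model(x).mean()
    losses_leaky.append(loss)        # retains the graph

# Fixed pattern: convert to a plain Python float so the graph is freed immediately.
losses_ok = []
for x in batches:
    loss = model(x).mean()
    losses_ok.append(loss.item())    # stores only the scalar value

print(f"mean loss: {sum(losses_ok) / len(losses_ok):.4f}")
```

The same pattern shows up with metrics, logging, and history buffers; detaching (or calling .item()) before storing a value is the usual remedy.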
Additionally, failing to manage GPU memory properly can result in out-of-memory errors or suboptimal utilization of GPU resources, further exacerbating the problem as shown in Figure 17.34.\n\n\n\n\n\n\nFigure 17.34: Example of GPU out-of-the-memory and suboptimal utilization issues\n\n\n\nSynchronization and Concurrency Problems: Incorrect synchronization between threads or processes can lead to race conditions, deadlocks, or inconsistent behavior in multi-threaded or distributed ML systems. This issue is often tied to improper handling of asynchronous operations, such as non-blocking I/O or parallel data loading, which can cause synchronization issues and impact the correctness of the ML pipeline. Moreover, proper coordination and communication between distributed nodes in a cluster can result in consistency or stale data during training or inference, compromising the reliability of the ML system.\nCompatibility Issues: Mismatches between the versions of ML frameworks, libraries, or dependencies can introduce compatibility problems and runtime errors. Upgrading or changing the versions of underlying libraries without thoroughly testing the impact on the ML system can lead to unexpected behavior or breakages. Furthermore, inconsistencies between the training and deployment environments, such as differences in hardware, operating systems, or package versions, can cause compatibility issues and affect the reproducibility of ML models, making it challenging to ensure consistent performance across different platforms.\nNumerical Instability and Precision Errors: Inadequate handling of numerical instabilities, such as division by zero, underflow, or overflow, can lead to incorrect calculations or convergence issues during training. This problem is compounded by insufficient precision or rounding errors, which can accumulate over time and impact the accuracy of the ML models, especially in deep learning architectures with many layers. Moreover, improper scaling or normalization of input data can cause numerical instabilities and affect the convergence and performance of optimization algorithms, resulting in suboptimal or unreliable model performance.\nInadequate Error Handling and Exception Management: Proper error handling and exception management can prevent ML systems from crashing or behaving unexpectedly when encountering exceptional conditions or invalid inputs. Failing to catch and handle specific exceptions or relying on generic exception handling can make it difficult to diagnose and recover from errors gracefully, leading to system instability and reduced reliability. Furthermore, incomplete or misleading error messages can hinder the ability to effectively debug and resolve software faults in ML frameworks, prolonging the time required to identify and fix issues.\n\n\nImpact on ML Systems\nSoftware faults in machine learning frameworks can have significant and far-reaching impacts on ML systems’ performance, reliability, and security. Let’s explore the various ways in which software faults can affect ML systems:\nPerformance Degradation and System Slowdowns: Memory leaks and inefficient resource management can lead to gradual performance degradation over time as the system becomes increasingly memory-constrained and spends more time on garbage collection or memory swapping (Maas et al. 2024). 
This issue is compounded by synchronization issues and concurrency bugs, which can cause delays, reduced throughput, and suboptimal utilization of computational resources, especially in multi-threaded or distributed ML systems. Furthermore, compatibility problems or inefficient code paths can introduce additional overhead and slowdowns, affecting the overall performance of the ML system.\n\nMaas, Martin, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2024. “Combining Machine Learning and Lifetime-Based Resource Management for Memory Allocation and Beyond.” Commun. ACM 67 (4): 87–96. https://doi.org/10.1145/3611018.\nIncorrect Predictions or Outputs: Software faults in data preprocessing, feature engineering, or model evaluation can introduce biases, noise, or errors propagating through the ML pipeline and resulting in incorrect predictions or outputs. Over time, numerical instabilities, precision errors, or rounding issues can accumulate and lead to degraded accuracy or convergence problems in the trained models. Moreover, faults in the model serving or inference components can cause inconsistencies between the expected and actual outputs, leading to incorrect or unreliable predictions in production.\nReliability and Stability Issues: Software faults can cause Unparalleled exceptions, crashes, or sudden terminations that can compromise the reliability and stability of ML systems, especially in production environments. Intermittent or sporadic faults can be difficult to reproduce and diagnose, leading to unpredictable behavior and reduced confidence in the ML system’s outputs. Additionally, faults in checkpointing, model serialization, or state management can cause data loss or inconsistencies, affecting the reliability and recoverability of the ML system.\nSecurity Vulnerabilities: Software faults, such as buffer overflows, injection vulnerabilities, or improper access control, can introduce security risks and expose the ML system to potential attacks or unauthorized access. Adversaries may exploit faults in the preprocessing or feature extraction stages to manipulate the input data and deceive the ML models, leading to incorrect or malicious behavior. Furthermore, inadequate protection of sensitive data, such as user information or confidential model parameters, can lead to data breaches or privacy violations (Q. Li et al. 2023).\n\nLi, Qinbin, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, and Bingsheng He. 2023. “A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection.” IEEE Trans. Knowl. Data Eng. 35 (4): 3347–66. https://doi.org/10.1109/tkde.2021.3124599.\nDifficulty in Reproducing and Debugging: Software faults can make it challenging to reproduce and debug issues in ML systems, especially when the faults are intermittent or dependent on specific runtime conditions. Incomplete or ambiguous error messages, coupled with the complexity of ML frameworks and models, can prolong the debugging process and hinder the ability to identify and fix the underlying faults. Moreover, inconsistencies between development, testing, and production environments can make reproducing and diagnosing faults in specific contexts difficult.\nIncreased Development and Maintenance Costs Software faults can lead to increased development and maintenance costs, as teams spend more time and resources debugging, fixing, and validating the ML system. 
The need for extensive testing, monitoring, and fault-tolerant mechanisms to mitigate the impact of software faults can add complexity and overhead to the ML development process. Frequent patches, updates, and bug fixes to address software faults can disrupt the development workflow and require additional effort to ensure the stability and compatibility of the ML system.\nUnderstanding the potential impact of software faults on ML systems is crucial for prioritizing testing efforts, implementing fault-tolerant designs, and establishing effective monitoring and debugging practices. By proactively addressing software faults and their consequences, ML practitioners can build more robust, reliable, and secure ML systems that deliver accurate and trustworthy results.\n\n\nDetection and Mitigation\nDetecting and mitigating software faults in machine learning frameworks is essential to ensure ML systems’ reliability, performance, and security. Let’s explore various techniques and approaches that can be employed to identify and address software faults effectively:\nThorough Testing and Validation: Comprehensive unit testing of individual components and modules can verify their correctness and identify potential faults early in development. Integration testing validates the interaction and compatibility between different components of the ML framework, ensuring seamless integration. Systematic testing of edge cases, boundary conditions, and exceptional scenarios helps uncover hidden faults and vulnerabilities. Continuous testing and regression testing as shown in Figure 17.35 detect faults introduced by code changes or updates to the ML framework.\n\n\n\n\n\n\nFigure 17.35: Automated regression testing (Source: UTOR)\n\n\n\nStatic Code Analysis and Linting: Utilizing static code analysis tools automatically identifies potential coding issues, such as syntax errors, undefined variables, or security vulnerabilities. Enforcing coding standards and best practices through linting tools maintains code quality and reduces the likelihood of common programming mistakes. Conducting regular code reviews allows manual inspection of the codebase, identification of potential faults, and ensures adherence to coding guidelines and design principles.\nRuntime Monitoring and Logging: Implementing comprehensive logging mechanisms captures relevant information during runtime, such as input data, model parameters, and system events. Monitoring key performance metrics, resource utilization, and error rates helps detect anomalies, performance bottlenecks, or unexpected behavior. Employing runtime assertion checks and invariants validates assumptions and detects violations of expected conditions during program execution. Utilizing profiling tools identifies performance bottlenecks, memory leaks, or inefficient code paths that may indicate the presence of software faults.\nFault-Tolerant Design Patterns: Implementing error handling and exception management mechanisms enables graceful handling and recovery from exceptional conditions or runtime errors. Employing redundancy and failover mechanisms, such as backup systems or redundant computations, ensures the availability and reliability of the ML system in the presence of faults. Designing modular and loosely coupled architectures minimizes the propagation and impact of faults across different components of the ML system. Utilizing checkpointing and recovery mechanisms (Eisenman et al. 
2022) allows the system to resume from a known stable state in case of failures or interruptions.\n\nEisenman, Assaf, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and Murali Annavaram. 2022. “Check-n-Run: A Checkpointing System for Training Deep Learning Recommendation Models.” In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), 929–43.\nRegular Updates and Patches: Staying up to date with the latest versions and patches of the ML frameworks, libraries, and dependencies provides benefits from bug fixes, security updates, and performance improvements. Monitoring release notes, security advisories, and community forums inform practitioners about known issues, vulnerabilities, or compatibility problems in the ML framework. Establishing a systematic process for testing and validating updates and patches before applying them to production systems ensures stability and compatibility.\nContainerization and Isolation: Leveraging containerization technologies, such as Docker or Kubernetes, encapsulates ML components and their dependencies in isolated environments. Utilizing containerization ensures consistent and reproducible runtime environments across development, testing, and production stages, reducing the likelihood of compatibility issues or environment-specific faults. Employing isolation techniques, such as virtual environments or sandboxing, prevents faults or vulnerabilities in one component from affecting other parts of the ML system.\nAutomated Testing and Continuous Integration/Continuous Deployment (CI/CD): Implement automated testing frameworks and scripts, execute comprehensive test suites, and catch faults early in development. Integrating automated testing into the CI/CD pipeline, as shown in Figure 17.36, ensures that code changes are thoroughly tested before being merged or deployed to production. Utilizing continuous monitoring and automated alerting systems detects and notifies developers and operators about potential faults or anomalies in real-time.\n\n\n\n\n\n\nFigure 17.36: Continuous Integration/Continuous Deployment (CI/CD) procedure (Source: geeksforgeeks)\n\n\n\nAdopting a proactive and systematic approach to fault detection and mitigation can significantly improve ML systems’ robustness, reliability, and maintainability. By investing in comprehensive testing, monitoring, and fault-tolerant design practices, organizations can minimize the impact of software faults and ensure their ML systems’ smooth operation in production environments.\n\nExercise 17.4 (Fault Tolerance)  \n\n\n\n\n\nGet ready to become an AI fault-fighting superhero! Software glitches can derail machine learning systems, but in this Colab, you’ll learn how to make them resilient. We’ll simulate software faults to see how AI can break, then explore techniques to save your ML model’s progress, like checkpoints in a game. You’ll see how to train your AI to bounce back after a crash, ensuring it stays on track. This is crucial for building reliable, trustworthy AI, especially in critical applications. 
So gear up because this Colab directly connects with the Robust AI chapter – you’ll move from theory to hands-on troubleshooting and build AI systems that can handle the unexpected!", + "text": "17.5 Software Faults\n\nDefinition and Characteristics\nSoftware faults refer to defects, errors, or bugs in the runtime software frameworks and components that support the execution and deployment of ML models (Myllyaho et al. 2022). These faults can arise from various sources, such as programming mistakes, design flaws, or compatibility issues (H. Zhang 2008), and can have significant implications for ML systems’ performance, reliability, and security. Software faults in ML frameworks exhibit several key characteristics:\n\nMyllyaho, Lalli, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, and Tommi Mikkonen. 2022. “On Misbehaviour and Fault Tolerance in Machine Learning Systems.” J. Syst. Software 183 (January): 111096. https://doi.org/10.1016/j.jss.2021.111096.\n\nZhang, Hongyu. 2008. “On the Distribution of Software Faults.” IEEE Trans. Software Eng. 34 (2): 301–2. https://doi.org/10.1109/tse.2007.70771.\n\nDiversity: Software faults can manifest in different forms, ranging from simple logic and syntax mistakes to more complex issues like memory leaks, race conditions, and integration problems. The variety of fault types adds to the challenge of detecting and mitigating them effectively.\nPropagation: In ML systems, software faults can propagate through the various layers and components of the framework. A fault in one module can trigger a cascade of errors or unexpected behavior in other parts of the system, making it difficult to pinpoint the root cause and assess the full impact of the fault.\nIntermittency: Some software faults may exhibit intermittent behavior, occurring sporadically or under specific conditions. These faults can be particularly challenging to reproduce and debug, as they may manifest inconsistently during testing or normal operation.\nInteraction with ML models: Software faults in ML frameworks can interact with the trained models in subtle ways. For example, a fault in the data preprocessing pipeline may introduce noise or bias into the model’s inputs, leading to degraded performance or incorrect predictions. Similarly, faults in the model serving component may cause inconsistencies between the training and inference environments.\nImpact on system properties: Software faults can compromise various desirable properties of ML systems, such as performance, scalability, reliability, and security. Faults may lead to slowdowns, crashes, incorrect outputs, or vulnerabilities that attackers can exploit.\nDependency on external factors: The occurrence and impact of software faults in ML frameworks often depend on external factors, such as the choice of hardware, operating system, libraries, and configurations. Compatibility issues and version mismatches can introduce faults that are difficult to anticipate and mitigate.\n\nUnderstanding the characteristics of software faults in ML frameworks is crucial for developing effective fault prevention, detection, and mitigation strategies. By recognizing the diversity, propagation, intermittency, and impact of software faults, ML practitioners can design more robust and reliable systems resilient to these issues.\n\n\nMechanisms of Software Faults in ML Frameworks\nMachine learning frameworks, such as TensorFlow, PyTorch, and sci-kit-learn, provide powerful tools and abstractions for building and deploying ML models. 
However, these frameworks are not immune to software faults that can impact ML systems’ performance, reliability, and correctness. Let’s explore some of the common software faults that can occur in ML frameworks:\nMemory Leaks and Resource Management Issues: Improper memory management, such as failing to release memory or close file handles, can lead to memory leaks and resource exhaustion over time. This issue is compounded by inefficient memory usage, where creating unnecessary copies of large tensors or not leveraging memory-efficient data structures can cause excessive memory consumption and degrade system performance. Additionally, failing to manage GPU memory properly can result in out-of-memory errors or suboptimal utilization of GPU resources, further exacerbating the problem as shown in Figure 17.34.\n\n\n\n\n\n\nFigure 17.34: Example of GPU out-of-memory and suboptimal utilization issues\n\n\n\nSynchronization and Concurrency Problems: Incorrect synchronization between threads or processes can lead to race conditions, deadlocks, or inconsistent behavior in multi-threaded or distributed ML systems. This issue is often tied to improper handling of asynchronous operations, such as non-blocking I/O or parallel data loading, which can cause synchronization issues and impact the correctness of the ML pipeline. Moreover, a lack of proper coordination and communication between distributed nodes in a cluster can result in inconsistent or stale data during training or inference, compromising the reliability of the ML system.\nCompatibility Issues: Mismatches between the versions of ML frameworks, libraries, or dependencies can introduce compatibility problems and runtime errors. Upgrading or changing the versions of underlying libraries without thoroughly testing the impact on the ML system can lead to unexpected behavior or breakages. Furthermore, inconsistencies between the training and deployment environments, such as differences in hardware, operating systems, or package versions, can cause compatibility issues and affect the reproducibility of ML models, making it challenging to ensure consistent performance across different platforms.\nNumerical Instability and Precision Errors: Inadequate handling of numerical instabilities, such as division by zero, underflow, or overflow, can lead to incorrect calculations or convergence issues during training. This problem is compounded by insufficient precision or rounding errors, which can accumulate over time and impact the accuracy of the ML models, especially in deep learning architectures with many layers. Moreover, improper scaling or normalization of input data can cause numerical instabilities and affect the convergence and performance of optimization algorithms, resulting in suboptimal or unreliable model performance.\nInadequate Error Handling and Exception Management: Without proper error handling and exception management, ML systems can crash or behave unexpectedly when encountering exceptional conditions or invalid inputs. Failing to catch and handle specific exceptions or relying on generic exception handling can make it difficult to diagnose and recover from errors gracefully, leading to system instability and reduced reliability.
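To make the last two points concrete, here is a minimal sketch in PyTorch (the `predict` helper and fallback policy are assumptions for illustration, not part of any framework; recent PyTorch versions expose `torch.cuda.OutOfMemoryError`, while older releases surface the same condition as a generic `RuntimeError`). It catches the specific out-of-memory exception, frees cached GPU memory, and falls back to CPU inference instead of swallowing every error with a bare `except`:

```python
import logging

import torch

logger = logging.getLogger("inference")

def predict(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Run inference, handling GPU out-of-memory explicitly instead of using a bare `except`."""
    try:
        with torch.no_grad():
            return model(batch.to("cuda"))        # assumes the model is already on the GPU
    except torch.cuda.OutOfMemoryError:           # narrow, actionable failure mode
        logger.warning("GPU OOM on batch %s; retrying on CPU", tuple(batch.shape))
        torch.cuda.empty_cache()                  # release cached blocks before falling back
        with torch.no_grad():
            return model.cpu()(batch.cpu())       # degraded but correct fallback path
```

The point of the sketch is the structure rather than the specific policy: catching a narrow, actionable exception preserves diagnostic information that a blanket handler would discard.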
Furthermore, incomplete or misleading error messages can hinder the ability to effectively debug and resolve software faults in ML frameworks, prolonging the time required to identify and fix issues.\n\n\nImpact on ML Systems\nSoftware faults in machine learning frameworks can have significant and far-reaching impacts on ML systems’ performance, reliability, and security. Let’s explore the various ways in which software faults can affect ML systems:\nPerformance Degradation and System Slowdowns: Memory leaks and inefficient resource management can lead to gradual performance degradation over time as the system becomes increasingly memory-constrained and spends more time on garbage collection or memory swapping (Maas et al. 2024). This issue is compounded by synchronization issues and concurrency bugs, which can cause delays, reduced throughput, and suboptimal utilization of computational resources, especially in multi-threaded or distributed ML systems. Furthermore, compatibility problems or inefficient code paths can introduce additional overhead and slowdowns, affecting the overall performance of the ML system.\n\nMaas, Martin, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2024. “Combining Machine Learning and Lifetime-Based Resource Management for Memory Allocation and Beyond.” Commun. ACM 67 (4): 87–96. https://doi.org/10.1145/3611018.\nIncorrect Predictions or Outputs: Software faults in data preprocessing, feature engineering, or model evaluation can introduce biases, noise, or errors that propagate through the ML pipeline and result in incorrect predictions or outputs. Over time, numerical instabilities, precision errors, or rounding issues can accumulate and lead to degraded accuracy or convergence problems in the trained models. Moreover, faults in the model serving or inference components can cause inconsistencies between the expected and actual outputs, leading to incorrect or unreliable predictions in production.\nReliability and Stability Issues: Software faults can cause unhandled exceptions, crashes, or sudden terminations that can compromise the reliability and stability of ML systems, especially in production environments. Intermittent or sporadic faults can be difficult to reproduce and diagnose, leading to unpredictable behavior and reduced confidence in the ML system’s outputs. Additionally, faults in checkpointing, model serialization, or state management can cause data loss or inconsistencies, affecting the reliability and recoverability of the ML system.\nSecurity Vulnerabilities: Software faults, such as buffer overflows, injection vulnerabilities, or improper access control, can introduce security risks and expose the ML system to potential attacks or unauthorized access. Adversaries may exploit faults in the preprocessing or feature extraction stages to manipulate the input data and deceive the ML models, leading to incorrect or malicious behavior. Furthermore, inadequate protection of sensitive data, such as user information or confidential model parameters, can lead to data breaches or privacy violations (Q. Li et al. 2023).\n\nLi, Qinbin, Zeyi Wen, Zhaomin Wu, Sixu Hu, Naibo Wang, Yuan Li, Xu Liu, and Bingsheng He. 2023. “A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection.” IEEE Trans. Knowl. Data Eng. 35 (4): 3347–66.
https://doi.org/10.1109/tkde.2021.3124599.\nDifficulty in Reproducing and Debugging: Software faults can make it challenging to reproduce and debug issues in ML systems, especially when the faults are intermittent or dependent on specific runtime conditions. Incomplete or ambiguous error messages, coupled with the complexity of ML frameworks and models, can prolong the debugging process and hinder the ability to identify and fix the underlying faults. Moreover, inconsistencies between development, testing, and production environments can make reproducing and diagnosing faults in specific contexts difficult.\nIncreased Development and Maintenance Costs Software faults can lead to increased development and maintenance costs, as teams spend more time and resources debugging, fixing, and validating the ML system. The need for extensive testing, monitoring, and fault-tolerant mechanisms to mitigate the impact of software faults can add complexity and overhead to the ML development process. Frequent patches, updates, and bug fixes to address software faults can disrupt the development workflow and require additional effort to ensure the stability and compatibility of the ML system.\nUnderstanding the potential impact of software faults on ML systems is crucial for prioritizing testing efforts, implementing fault-tolerant designs, and establishing effective monitoring and debugging practices. By proactively addressing software faults and their consequences, ML practitioners can build more robust, reliable, and secure ML systems that deliver accurate and trustworthy results.\n\n\nDetection and Mitigation\nDetecting and mitigating software faults in machine learning frameworks is essential to ensure ML systems’ reliability, performance, and security. Let’s explore various techniques and approaches that can be employed to identify and address software faults effectively:\nThorough Testing and Validation: Comprehensive unit testing of individual components and modules can verify their correctness and identify potential faults early in development. Integration testing validates the interaction and compatibility between different components of the ML framework, ensuring seamless integration. Systematic testing of edge cases, boundary conditions, and exceptional scenarios helps uncover hidden faults and vulnerabilities. Continuous testing and regression testing as shown in Figure 17.35 detect faults introduced by code changes or updates to the ML framework.\n\n\n\n\n\n\nFigure 17.35: Automated regression testing (Credit: UTOR)\n\n\n\nStatic Code Analysis and Linting: Utilizing static code analysis tools automatically identifies potential coding issues, such as syntax errors, undefined variables, or security vulnerabilities. Enforcing coding standards and best practices through linting tools maintains code quality and reduces the likelihood of common programming mistakes. Conducting regular code reviews allows manual inspection of the codebase, identification of potential faults, and ensures adherence to coding guidelines and design principles.\nRuntime Monitoring and Logging: Implementing comprehensive logging mechanisms captures relevant information during runtime, such as input data, model parameters, and system events. Monitoring key performance metrics, resource utilization, and error rates helps detect anomalies, performance bottlenecks, or unexpected behavior. 
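As a minimal illustration of this kind of instrumentation (the function and logger names are assumptions for the sketch, not part of any framework), the following wrapper logs per-batch latency and applies the runtime assertions discussed next, catching corrupted outputs close to where they appear:

```python
import logging
import time

import torch

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("serving")

def monitored_inference(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    """Wrap inference with latency logging and cheap runtime invariants."""
    start = time.perf_counter()
    with torch.no_grad():
        output = model(batch)
    latency_ms = (time.perf_counter() - start) * 1000.0

    # Runtime assertions catch corrupted outputs close to where they appear,
    # rather than letting NaNs or shape mismatches propagate downstream.
    assert torch.isfinite(output).all(), "non-finite values in model output"
    assert output.shape[0] == batch.shape[0], "batch dimension mismatch"

    logger.info("batch_size=%d latency_ms=%.2f", batch.shape[0], latency_ms)
    return output
```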
Employing runtime assertion checks and invariants validates assumptions and detects violations of expected conditions during program execution. Utilizing profiling tools identifies performance bottlenecks, memory leaks, or inefficient code paths that may indicate the presence of software faults.\nFault-Tolerant Design Patterns: Implementing error handling and exception management mechanisms enables graceful handling and recovery from exceptional conditions or runtime errors. Employing redundancy and failover mechanisms, such as backup systems or redundant computations, ensures the availability and reliability of the ML system in the presence of faults. Designing modular and loosely coupled architectures minimizes the propagation and impact of faults across different components of the ML system. Utilizing checkpointing and recovery mechanisms (Eisenman et al. 2022) allows the system to resume from a known stable state in case of failures or interruptions.\n\nEisenman, Assaf, Kiran Kumar Matam, Steven Ingram, Dheevatsa Mudigere, Raghuraman Krishnamoorthi, Krishnakumar Nair, Misha Smelyanskiy, and Murali Annavaram. 2022. “Check-n-Run: A Checkpointing System for Training Deep Learning Recommendation Models.” In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI 22), 929–43.\nRegular Updates and Patches: Staying up to date with the latest versions and patches of the ML frameworks, libraries, and dependencies provides benefits from bug fixes, security updates, and performance improvements. Monitoring release notes, security advisories, and community forums inform practitioners about known issues, vulnerabilities, or compatibility problems in the ML framework. Establishing a systematic process for testing and validating updates and patches before applying them to production systems ensures stability and compatibility.\nContainerization and Isolation: Leveraging containerization technologies, such as Docker or Kubernetes, encapsulates ML components and their dependencies in isolated environments. Utilizing containerization ensures consistent and reproducible runtime environments across development, testing, and production stages, reducing the likelihood of compatibility issues or environment-specific faults. Employing isolation techniques, such as virtual environments or sandboxing, prevents faults or vulnerabilities in one component from affecting other parts of the ML system.\nAutomated Testing and Continuous Integration/Continuous Deployment (CI/CD): Implement automated testing frameworks and scripts, execute comprehensive test suites, and catch faults early in development. Integrating automated testing into the CI/CD pipeline, as shown in Figure 17.36, ensures that code changes are thoroughly tested before being merged or deployed to production. Utilizing continuous monitoring and automated alerting systems detects and notifies developers and operators about potential faults or anomalies in real-time.\n\n\n\n\n\n\nFigure 17.36: Continuous Integration/Continuous Deployment (CI/CD) procedure (Credit: geeksforgeeks)\n\n\n\nAdopting a proactive and systematic approach to fault detection and mitigation can significantly improve ML systems’ robustness, reliability, and maintainability. 
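For example, a checkpointing routine along the lines sketched below (file layout and function names are assumptions for illustration; production systems such as Check-n-Run are far more elaborate) writes checkpoints atomically and lets training resume from the last known-good epoch after a crash:

```python
import os

import torch

CKPT_PATH = "checkpoint.pt"  # assumed location for this sketch

def save_checkpoint(model, optimizer, epoch):
    # Write to a temporary file and rename atomically, so a crash mid-write
    # never corrupts the last known-good checkpoint.
    tmp_path = CKPT_PATH + ".tmp"
    torch.save({"epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, tmp_path)
    os.replace(tmp_path, CKPT_PATH)

def load_checkpoint(model, optimizer):
    """Restore the last known-good state; return the epoch to resume from."""
    if not os.path.exists(CKPT_PATH):
        return 0
    state = torch.load(CKPT_PATH, map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["epoch"] + 1
```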
By investing in comprehensive testing, monitoring, and fault-tolerant design practices, organizations can minimize the impact of software faults and ensure their ML systems’ smooth operation in production environments.\n\nExercise 17.4 (Fault Tolerance)  \n\n\n\n\n\nGet ready to become an AI fault-fighting superhero! Software glitches can derail machine learning systems, but in this Colab, you’ll learn how to make them resilient. We’ll simulate software faults to see how AI can break, then explore techniques to save your ML model’s progress, like checkpoints in a game. You’ll see how to train your AI to bounce back after a crash, ensuring it stays on track. This is crucial for building reliable, trustworthy AI, especially in critical applications. So gear up because this Colab directly connects with the Robust AI chapter – you’ll move from theory to hands-on troubleshooting and build AI systems that can handle the unexpected!", "crumbs": [ "MAIN", "17  Robust AI" @@ -1934,7 +1934,7 @@ "href": "contents/robust_ai/robust_ai.html#tools-and-frameworks", "title": "17  Robust AI", "section": "17.6 Tools and Frameworks", - "text": "17.6 Tools and Frameworks\nGiven the significance or importance of developing robust AI systems, in recent years, researchers and practitioners have developed a wide range of tools and frameworks to understand how hardware faults manifest and propagate to impact ML systems. These tools and frameworks play a crucial role in evaluating the resilience of ML systems to hardware faults by simulating various fault scenarios and analyzing their impact on the system’s performance. This enables designers to identify potential vulnerabilities and develop effective mitigation strategies, ultimately creating more robust and reliable ML systems that can operate safely despite hardware faults. This section provides an overview of widely used fault models in the literature and the tools and frameworks developed to evaluate the impact of such faults on ML systems.\n\n17.6.1 Fault Models and Error Models\nAs discussed previously, hardware faults can manifest in various ways, including transient, permanent, and intermittent faults. In addition to the type of fault under study, how the fault manifests is also important. For example, does the fault happen in a memory cell or during the computation of a functional unit? Is the impact on a single bit, or does it impact multiple bits? Does the fault propagate all the way and impact the application (causing an error), or does it get masked quickly and is considered benign? All these details impact what is known as the fault model, which plays a major role in simulating and measuring what happens to a system when a fault occurs.\nTo effectively study and understand the impact of hardware faults on ML systems, it is essential to understand the concepts of fault models and error models. A fault model describes how a hardware fault manifests itself in the system, while an error model represents how the fault propagates and affects the system’s behavior.\nFault models can be categorized based on various characteristics:\n\nDuration: Transient faults occur briefly and then disappear, while permanent faults persist indefinitely. 
Intermittent faults occur sporadically and may be difficult to diagnose.\nLocation: Faults can occur in hardware parts, such as memory cells, functional units, or interconnects.\nGranularity: Faults can affect a single bit (e.g., bitflip) or multiple bits (e.g., burst errors) within a hardware component.\n\nOn the other hand, error models describe how a fault propagates through the system and manifests as an error. An error may cause the system to deviate from its expected behavior, leading to incorrect results or even system failures. Error models can be defined at different levels of abstraction, from the hardware level (e.g., register-level bitflips) to the software level (e.g., corrupted weights or activations in an ML model).\nThe fault model (or error model, typically the more applicable terminology in understanding the robustness of an ML system) plays a major role in simulating and measuring what happens to a system when a fault occurs. The chosen model informs the assumptions made about the system being studied. For example, a system focusing on single-bit transient errors (Sangchoolie, Pattabiraman, and Karlsson 2017) would not be well-suited to understand the impact of permanent, multi-bit flip errors (Wilkening et al. 2014), as it is designed assuming a different model altogether.\n\nWilkening, Mark, Vilas Sridharan, Si Li, Fritz Previlon, Sudhanva Gurumurthi, and David R. Kaeli. 2014. “Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults.” In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 293–305. IEEE; IEEE. https://doi.org/10.1109/micro.2014.15.\nFurthermore, implementing an error model is also an important consideration, particularly regarding where an error is said to occur in the compute stack. For instance, a single-bit flip model at the architectural register level differs from a single-bit flip in the weight of a model at the PyTorch level. Although both target a similar error model, the former would usually be modeled in an architecturally accurate simulator (like gem5 [binkert2011gem5]), which captures error propagation compared to the latter, focusing on value propagation through a model.\nRecent research has shown that certain characteristics of error models may exhibit similar behaviors across different levels of abstraction (Sangchoolie, Pattabiraman, and Karlsson 2017) (Papadimitriou and Gizopoulos 2021). For example, single-bit errors are generally more problematic than multi-bit errors, regardless of whether they are modeled at the hardware or software level. However, other characteristics, such as error masking (Mohanram and Touba 2003) as shown in Figure 17.37, may not always be accurately captured by software-level models, as they can hide underlying system effects.\n\nSangchoolie, Behrooz, Karthik Pattabiraman, and Johan Karlsson. 2017. “One Bit Is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors.” In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 97–108. IEEE; IEEE. https://doi.org/10.1109/dsn.2017.30.\n\nPapadimitriou, George, and Dimitris Gizopoulos. 2021. “Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers.” In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 902–15. IEEE; IEEE. https://doi.org/10.1109/isca52012.2021.00075.\n\nMohanram, K., and N. A. Touba. 2003. 
“Partial Error Masking to Reduce Soft Error Failure Rate in Logic Circuits.” In Proceedings. 16th IEEE Symposium on Computer Arithmetic, 433–40. IEEE; IEEE Comput. Soc. https://doi.org/10.1109/dftvs.2003.1250141.\n\n\n\n\n\n\nFigure 17.37: Example of error masking in microarchitectural components (Ko 2021)\n\n\nKo, Yohan. 2021. “Characterizing System-Level Masking Effects Against Soft Errors.” Electronics 10 (18): 2286. https://doi.org/10.3390/electronics10182286.\n\n\nSome tools, such as Fidelity (He, Balaprakash, and Li 2020), aim to bridge the gap between hardware-level and software-level error models by mapping patterns between the two levels of abstraction (Cheng et al. 2016). This allows for more accurate modeling of hardware faults in software-based tools, essential for developing robust and reliable ML systems. Lower-level tools typically capture error propagation more accurately but are much slower when simulating many errors due to the complex nature of hardware system designs. On the other hand, higher-level tools, such as those implemented in ML frameworks like PyTorch or TensorFlow, which we will discuss in later sections, are often faster and more efficient for evaluating the robustness of ML systems.\n\nCheng, Eric, Shahrzad Mirkhani, Lukasz G. Szafaryn, Chen-Yong Cher, Hyungmin Cho, Kevin Skadron, Mircea R. Stan, et al. 2016. “CLEAR: Cross-Layer Exploration for Architecting Resilience - Combining Hardware and Software Techniques to Tolerate Soft Errors in Processor Cores.” In Proceedings of the 53rd Annual Design Automation Conference, 1–6. ACM. https://doi.org/10.1145/2897937.2897996.\nIn the following subsections, we will discuss various hardware-based and software-based fault injection methods and tools, highlighting their capabilities, limitations, and the fault and error models they support.\n\n\n17.6.2 Hardware-based Fault Injection\nAn error injection tool is a tool that allows the user to implement a particular error model, such as a transient single-bit flip during inference, as illustrated in Figure 17.38. Most error injection tools are software-based, as software-level tools are faster for ML robustness studies. However, hardware-based fault injection methods are still important for grounding the higher-level error models, as they are considered the most accurate way to study the impact of faults on ML systems by directly manipulating the hardware to introduce faults. These methods allow researchers to observe the system’s behavior under real-world fault conditions. Both software-based and hardware-based error injection tools are described in this section in more detail.\n\n\n\n\n\n\nFigure 17.38: Hardware errors can occur due to a variety of reasons and at different times and/or locations in a system, which can be explored when studying the impact of hardware-based errors on systems (Ahmadilivani et al. 2024)\n\n\nAhmadilivani, Mohammad Hasan, Mahdi Taheri, Jaan Raik, Masoud Daneshtalab, and Maksim Jenihhin. 2024. “A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks.” ACM Comput. Surv. 56 (6): 1–39.
https://doi.org/10.1145/3638242.\n\nMethods\nTwo of the most common hardware-based fault injection methods are FPGA-based fault injection and radiation or beam testing.\nFPGA-based Fault Injection: Field-Programmable Gate Arrays (FPGAs) are reconfigurable integrated circuits that can be programmed to implement various hardware designs. In the context of fault injection, FPGAs offer high precision and accuracy, as researchers can target specific bits or sets of bits within the hardware. By modifying the FPGA configuration, faults can be introduced at specific locations and times during the execution of an ML model. FPGA-based fault injection allows for fine-grained control over the fault model, enabling researchers to study the impact of different types of faults, such as single-bit flips or multi-bit errors. This level of control makes FPGA-based fault injection a valuable tool for understanding the resilience of ML systems to hardware faults.\nRadiation or Beam Testing: Radiation or beam testing (Velazco, Foucard, and Peronnard 2010) involves exposing the hardware running an ML model to high-energy particles, such as protons or neutrons, as illustrated in Figure 17.39. These particles can cause bitflips or other types of faults in the hardware, mimicking the effects of real-world radiation-induced faults. Beam testing is widely regarded as a highly accurate method for measuring the error rate induced by particle strikes on a running application. It provides a realistic representation of the faults in real-world environments, particularly in applications exposed to high radiation levels, such as space systems or particle physics experiments. However, unlike FPGA-based fault injection, beam testing is less precise in targeting specific bits or components within the hardware, as it is difficult to aim the beam of particles at a particular bit. Despite being quite expensive from a research standpoint, beam testing is a well-regarded industry practice for reliability.\n\nVelazco, Raoul, Gilles Foucard, and Paul Peronnard. 2010. “Combining Results of Accelerated Radiation Tests and Fault Injections to Predict the Error Rate of an Application Implemented in SRAM-Based FPGAs.” IEEE Trans. Nucl. Sci. 57 (6): 3500–3505. https://doi.org/10.1109/tns.2010.2087355.\n\n\n\n\n\n\n\nFigure 17.39: Radiation test setup for semiconductor components (Lee et al. 2022) (Source: JD Instrument)\n\n\nLee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang, and Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed Signal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear Explosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970.\n\n\n\n\nLimitations\nDespite their high accuracy, hardware-based fault injection methods have several limitations that can hinder their widespread adoption:\nCost: FPGA-based fault injection and beam testing require specialized hardware and facilities, which can be expensive to set up and maintain. The cost of these methods can be a significant barrier for researchers and organizations with limited resources.\nScalability: Hardware-based methods are generally slower and less scalable than software-based methods. Injecting faults and collecting data on hardware can take time, limiting the number of experiments performed within a given timeframe.
This can be particularly challenging when studying the resilience of large-scale ML systems or conducting statistical analyses that require many fault injection experiments.\nFlexibility: Hardware-based methods may not be as flexible as software-based methods in terms of the range of fault models and error models they can support. Modifying the hardware configuration or the experimental setup to accommodate different fault models can be more challenging and time-consuming than with software-based methods.\nDespite these limitations, hardware-based fault injection methods remain essential tools for validating the accuracy of software-based methods and for studying the impact of faults on ML systems in realistic settings. By combining hardware-based and software-based methods, researchers can gain a more comprehensive understanding of ML systems’ resilience to hardware faults and develop effective mitigation strategies.\n\n\n\n17.6.3 Software-based Fault Injection Tools\nWith the rapid development of ML frameworks in recent years, software-based fault injection tools have gained popularity in studying the resilience of ML systems to hardware faults. These tools simulate the effects of hardware faults by modifying the software representation of the ML model or the underlying computational graph. The rise of ML frameworks such as TensorFlow, PyTorch, and Keras has facilitated the development of fault injection tools that are tightly integrated with these frameworks, making it easier for researchers to conduct fault injection experiments and analyze the results.\n\nAdvantages and Trade-offs\nSoftware-based fault injection tools offer several advantages over hardware-based methods:\nSpeed: Software-based tools are generally faster than hardware-based methods, as they do not require the modification of physical hardware or the setup of specialized equipment. This allows researchers to conduct more fault injection experiments in a shorter time, enabling more comprehensive analyses of the resilience of ML systems.\nFlexibility: Software-based tools are more flexible than hardware-based methods in terms of the range of fault and error models they can support. Researchers can easily modify the fault injection tool’s software implementation to accommodate different fault models or to target specific components of the ML system.\nAccessibility: Software-based tools are more accessible than hardware-based methods, as they do not require specialized hardware or facilities. This makes it easier for researchers and practitioners to conduct fault injection experiments and study the resilience of ML systems, even with limited resources.\n\n\nLimitations\nSoftware-based fault injection tools also have some limitations compared to hardware-based methods:\nAccuracy: Software-based tools may not always capture the full range of effects that hardware faults can have on the system. As these tools operate at a higher level of abstraction, they may miss some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nFidelity: Software-based tools may provide a lower level of fidelity than hardware-based methods in representing real-world fault conditions.
The accuracy of the results obtained from software-based fault injection experiments may depend on how closely the software model approximates the actual hardware behavior.\n\n\n\n\n\n\nFigure 17.40: Comparison of techniques at layers of abstraction (Source: MAVFI)\n\n\n\n\n\nTypes of Fault Injection Tools\nSoftware-based fault injection tools can be categorized based on their target frameworks or use cases. Here, we will discuss some of the most popular tools in each category:\nAres (Reagen et al. 2018), a fault injection tool initially developed for the Keras framework in 2018, emerged as one of the first tools to study the impact of hardware faults on deep neural networks (DNNs) in the context of the rising popularity of ML frameworks in the mid-to-late 2010s. The tool was validated against a DNN accelerator implemented in silicon, demonstrating its effectiveness in modeling hardware faults. Ares provides a comprehensive study on the impact of hardware faults in both weights and activation values, characterizing the effects of single-bit flips and bit-error rates (BER) on hardware structures. Later, the Ares framework was extended to support the PyTorch ecosystem, enabling researchers to investigate hardware faults in a more modern setting and further extending its utility in the field.\n\nReagen, Brandon, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu Lee, Niamh Mulholland, David Brooks, and Gu-Yeon Wei. 2018. “Ares: A Framework for Quantifying the Resilience of Deep Neural Networks.” In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465834.\n\n\n\n\n\n\nFigure 17.41: Hardware bitflips in ML workloads can cause phantom objects and misclassifications, which can erroneously be used downstream by larger systems, such as in autonomous driving. Shown above is a correct and faulty version of the same image using the PyTorchFI injection framework.\n\n\n\nPyTorchFI (Mahmoud et al. 2020), a fault injection tool specifically designed for the PyTorch framework, was developed in 2020 in collaboration with Nvidia Research. It enables the injection of faults into the weights, activations, and gradients of PyTorch models, supporting a wide range of fault models. By leveraging the GPU acceleration capabilities of PyTorch, PyTorchFI provides a fast and efficient implementation for conducting fault injection experiments on large-scale ML systems, as shown in Figure 17.41. The tool’s speed and ease of use have led to widespread adoption in the community, resulting in multiple developer-led projects, such as PyTorchALFI by Intel Labs, which focuses on safety in automotive environments. Follow-up PyTorch-centric tools for fault injection include Dr. DNA by Meta (Ma et al. 2024) (which further facilitates the Pythonic programming model for ease of use), and the GoldenEye framework (Mahmoud et al. 2022), which incorporates novel numerical datatypes (such as AdaptivFloat (Tambe et al. 2020) and BlockFloat in the context of hardware bit flips.\n\nMahmoud, Abdulrahman, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez Vicarte, Sarita V. Adve, Christopher W. Fletcher, Iuri Frosio, and Siva Kumar Sastry Hari. 2020. “PyTorchFI: A Runtime Perturbation Tool for DNNs.” In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-w), 25–31. IEEE; IEEE. https://doi.org/10.1109/dsn-w50199.2020.00014.\n\nMa, Dongning, Fred Lin, Alban Desmaison, Joel Coburn, Daniel Moore, Sriram Sankar, and Xun Jiao. 2024. “Dr. 
DNA: Combating Silent Data Corruptions in Deep Learning Using Distribution of Neuron Activations.” In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 239–52. ACM. https://doi.org/10.1145/3620666.3651349.\n\nMahmoud, Abdulrahman, Thierry Tambe, Tarek Aloui, David Brooks, and Gu-Yeon Wei. 2022. “GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators.” In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 206–14. IEEE. https://doi.org/10.1109/dsn53405.2022.00031.\n\nTambe, Thierry, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. “Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.” In 2020 57th ACM/IEEE Design Automation Conference (DAC), 1–6. IEEE; IEEE. https://doi.org/10.1109/dac18072.2020.9218516.\n\nChen, Zitao, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2020. “TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications.” In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 426–35. IEEE; IEEE. https://doi.org/10.1109/issre5003.2020.00047.\n\nChen, Zitao, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2019. “BinFI: An Efficient Fault Injector for Safety-Critical Machine Learning Systems.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC ’19. New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356177.\nTensorFI (Chen et al. 2020), or the TensorFlow Fault Injector, is a fault injection tool developed specifically for the TensorFlow framework. Analogous to Ares and PyTorchFI, TensorFI is considered the state-of-the-art tool for ML robustness studies in the TensorFlow ecosystem. It allows researchers to inject faults into the computational graph of TensorFlow models and study their impact on the model’s performance, supporting a wide range of fault models. One of the key benefits of TensorFI is its ability to evaluate the resilience of various ML models, not just DNNs. Further advancements, such as BinFI (Chen et al. 2019), provide a mechanism to speed up error injection experiments by focusing on the \"important\" bits in the system, accelerating the process of ML robustness analysis and prioritizing the critical components of a model.\nNVBitFI (T. Tsai et al. 2021), a general-purpose fault injection tool developed by Nvidia for their GPU platforms, operates at a lower level compared to framework-specific tools like Ares, PyTorchFI, and TensorFI. While those tools focus on specific deep learning frameworks for robustness analysis, NVBitFI targets the underlying hardware assembly code for fault injection. This allows researchers to inject faults into any application running on Nvidia GPUs, making it a versatile tool for studying the resilience of ML systems and other GPU-accelerated applications. By enabling users to inject errors at the architectural level, NVBitFI provides a more general-purpose fault model that is not restricted to just ML models.
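To illustrate what a framework-level error model boils down to, the sketch below flips a single bit in one fp32 weight of a PyTorch model, in the spirit of injectors like Ares and PyTorchFI but deliberately not using any tool's actual API. It assumes a PyTorch version that supports `Tensor.view(dtype)` and, for simplicity, skips the sign bit:

```python
import random

import torch
import torch.nn as nn

def flip_random_weight_bit(model: nn.Module) -> None:
    """Transient single-bit-flip error model: corrupt one fp32 weight in place."""
    with torch.no_grad():
        params = [p for p in model.parameters() if p.dtype == torch.float32]
        target = random.choice(params).view(-1)    # flat view of one parameter tensor
        idx = random.randrange(target.numel())
        bit = random.randrange(31)                 # bits 0-30; sign bit skipped in this sketch
        int_view = target.view(torch.int32)        # reinterpret the fp32 bit patterns
        int_view[idx] ^= 1 << bit                  # flip a single bit in place

# Inject a fault into a toy model and measure the output deviation.
torch.manual_seed(0)
model = nn.Linear(16, 4)
x = torch.randn(8, 16)
clean = model(x).detach()
flip_random_weight_bit(model)
print("max output deviation:", (clean - model(x)).abs().max().item())
```

Running such an injection many times over different weights and bit positions is, in essence, the statistical campaign that the tools above automate and accelerate.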
As Nvidia’s GPU systems are commonly used in many ML-based systems, NVBitFI is a valuable tool for comprehensive fault injection analysis across various applications.\n\nTsai, Timothy, Siva Kumar Sastry Hari, Michael Sullivan, Oreste Villa, and Stephen W. Keckler. 2021. “NVBitFI: Dynamic Fault Injection for GPUs.” In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 284–91. IEEE; IEEE. https://doi.org/10.1109/dsn48987.2021.00041.\n\nDomain-specific Examples\nDomain-specific fault injection tools have been developed to address various ML application domains’ unique challenges and requirements, such as autonomous vehicles and robotics. This section highlights three domain-specific fault injection tools: DriveFI and PyTorchALFI for autonomous vehicles and MAVFI for uncrewed aerial vehicles (UAVs). These tools enable researchers to inject hardware faults into these complex systems’ perception, control, and other subsystems, allowing them to study the impact of faults on system performance and safety. The development of these software-based fault injection tools has greatly expanded the capabilities of the ML community to develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\nDriveFI (Jha et al. 2019) is a fault injection tool designed for autonomous vehicles. It enables the injection of hardware faults into the perception and control pipelines of autonomous vehicle systems, allowing researchers to study the impact of these faults on the system’s performance and safety. DriveFI has been integrated with industry-standard autonomous driving platforms, such as Nvidia DriveAV and Baidu Apollo, making it a valuable tool for evaluating the resilience of autonomous vehicle systems.\n\nJha, Saurabh, Subho Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B. Sullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, and Ravishankar K. Iyer. 2019. “ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection.” In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 112–24. IEEE; IEEE. https://doi.org/10.1109/dsn.2019.00025.\n\nGräfe, Ralf, Qutub Syed Sha, Florian Geissler, and Michael Paulitsch. 2023. “Large-Scale Application of Fault Injection into PyTorch Models -an Extension to PyTorchFI for Validation Efficiency.” In 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-s), 56–62. IEEE; IEEE. https://doi.org/10.1109/dsn-s58398.2023.00025.\nPyTorchALFI (Gräfe et al. 2023) is an extension of PyTorchFI developed by Intel Labs for the autonomous vehicle domain. It builds upon PyTorchFI’s fault injection capabilities. It adds features specifically tailored for evaluating the resilience of autonomous vehicle systems, such as the ability to inject faults into the camera and LiDAR sensor data.\nMAVFI (Hsiao et al. 2023) is a fault injection tool designed for the robotics domain, specifically for uncrewed aerial vehicles (UAVs). MAVFI is built on top of the Robot Operating System (ROS) framework and allows researchers to inject faults into the various components of a UAV system, such as sensors, actuators, and control algorithms. 
By evaluating the impact of these faults on the UAV’s performance and stability, researchers can develop more resilient and fault-tolerant UAV systems.\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay Janapa Reddi. 2023. “MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.” In 2023 Design, Automation &Amp; Test in Europe Conference &Amp; Exhibition (DATE), 1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\nThe development of software-based fault injection tools has greatly expanded the capabilities of researchers and practitioners to study the resilience of ML systems to hardware faults. By leveraging the speed, flexibility, and accessibility of these tools, the ML community can develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\n\n\n\n\n17.6.4 Bridging the Gap between Hardware and Software Error Models\nWhile software-based fault injection tools offer many advantages in speed, flexibility, and accessibility, they may not always accurately capture the full range of effects that hardware faults can have on the system. This is because software-based tools operate at a higher level of abstraction than hardware-based methods and may miss some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nAs Bolchini et al. (2023) illustrates in their work, hardware errors can manifest in complex spatial distribution patterns that are challenging to fully replicate with software-based fault injection alone. They identify four distinct patterns: (a) single point, where the fault corrupts a single value in a feature map; (b) same row, where the fault corrupts a partial or entire row in a single feature map; (c) bullet wake, where the fault corrupts the same location across multiple feature maps; and (d) shatter glass, which combines the effects of same row and bullet wake patterns, as shown in Figure 17.42. These intricate error propagation mechanisms highlight the need for hardware-aware fault injection techniques to accurately assess the resilience of ML systems.\n\n\n\n\n\n\nFigure 17.42: Hardware errors may manifest themselves in different ways at the software level, as classified by Bolchini et al. (Bolchini et al. 2023)\n\n\nBolchini, Cristiana, Luca Cassano, Antonio Miele, and Alessandro Toschi. 2023. “Fast and Accurate Error Simulation for CNNs Against Soft Errors.” IEEE Trans. Comput. 72 (4): 984–97. https://doi.org/10.1109/tc.2022.3184274.\n\n\nResearchers have developed tools to address this issue by bridging the gap between low-level hardware error models and higher-level software error models. One such tool is Fidelity, designed to map patterns between hardware-level faults and their software-level manifestations.\n\nFidelity: Bridging the Gap\nFidelity (He, Balaprakash, and Li 2020) is a tool for accurately modeling hardware faults in software-based fault injection experiments. It achieves this by carefully studying the relationship between hardware-level faults and their impact on the software representation of the ML system.\n\nHe, Yi, Prasanna Balaprakash, and Yanjing Li. 2020. “FIdelity: Efficient Resilience Analysis Framework for Deep Learning Accelerators.” In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 270–81. IEEE; IEEE. 
https://doi.org/10.1109/micro50266.2020.00033.\nThe key insights behind Fidelity are:\n\nFault Propagation: Fidelity models how faults propagate through the hardware and manifest as errors in the software-visible state of the system. By understanding these propagation patterns, Fidelity can more accurately simulate the effects of hardware faults in software-based experiments.\nFault Equivalence: Fidelity identifies equivalent classes of hardware faults that produce similar software-level errors. This allows researchers to design software-based fault models that are representative of the underlying hardware faults without the need to model every possible hardware fault individually.\nLayered Approach: Fidelity employs a layered approach to fault modeling, where the effects of hardware faults are propagated through multiple levels of abstraction, from the hardware to the software level. This approach ensures that the software-based fault models are grounded in the actual behavior of the hardware.\n\nBy incorporating these insights, Fidelity enables software-based fault injection tools to capture the effects of hardware faults on ML systems accurately. This is particularly important for safety-critical applications, where the system’s resilience to hardware faults is paramount.\n\n\nImportance of Capturing True Hardware Behavior\nCapturing true hardware behavior in software-based fault injection tools is crucial for several reasons:\n\nAccuracy: By accurately modeling the effects of hardware faults, software-based tools can provide more reliable insights into the resilience of ML systems. This is essential for designing and validating fault-tolerant systems that can operate safely and effectively in the presence of hardware faults.\nReproducibility: When software-based tools accurately capture hardware behavior, fault injection experiments become more reproducible across different platforms and environments. This is important for the scientific study of ML system resilience, as it allows researchers to compare and validate results across different studies and implementations.\nEfficiency: Software-based tools that capture true hardware behavior can be more efficient in their fault injection experiments by focusing on the most representative and impactful fault models. This allows researchers to cover a wider range of fault scenarios and system configurations with limited computational resources.\nMitigation Strategies: Understanding how hardware faults manifest at the software level is crucial for developing effective mitigation strategies. By accurately capturing hardware behavior, software-based fault injection tools can help researchers identify the most vulnerable components of the ML system and design targeted hardening techniques to improve resilience.\n\nTools like Fidelity are vital in advancing the state-of-the-art in ML system resilience research. These tools enable researchers to conduct more accurate, reproducible, and efficient fault injection experiments by bridging the gap between hardware and software error models. As the complexity and criticality of ML systems continue to grow, the importance of capturing true hardware behavior in software-based fault injection tools will only become more apparent.\nOngoing research in this area aims to refine the mapping between hardware and software error models and develop new techniques for efficiently simulating hardware faults in software-based experiments. 
+    "text": "17.6 Tools and Frameworks\nGiven the importance of developing robust AI systems, researchers and practitioners have in recent years developed a wide range of tools and frameworks to understand how hardware faults manifest and propagate to impact ML systems. These tools and frameworks play a crucial role in evaluating the resilience of ML systems to hardware faults by simulating various fault scenarios and analyzing their impact on the system’s performance. This enables designers to identify potential vulnerabilities and develop effective mitigation strategies, ultimately creating more robust and reliable ML systems that can operate safely despite hardware faults. This section provides an overview of widely used fault models in the literature and of the tools and frameworks developed to evaluate the impact of such faults on ML systems.\n\n17.6.1 Fault Models and Error Models\nAs discussed previously, hardware faults can manifest in various ways, including transient, permanent, and intermittent faults. In addition to the type of fault under study, how the fault manifests is also important. For example, does the fault happen in a memory cell or during the computation of a functional unit? Is the impact on a single bit, or does it affect multiple bits? Does the fault propagate all the way up the stack and impact the application (causing an error), or is it masked quickly and considered benign? All these details shape what is known as the fault model, which plays a major role in simulating and measuring what happens to a system when a fault occurs.\nTo effectively study and understand the impact of hardware faults on ML systems, it is essential to understand the concepts of fault models and error models. A fault model describes how a hardware fault manifests itself in the system, while an error model represents how the fault propagates and affects the system’s behavior.\nFault models can be categorized based on various characteristics:\n\nDuration: Transient faults occur briefly and then disappear, while permanent faults persist indefinitely. Intermittent faults occur sporadically and may be difficult to diagnose.\nLocation: Faults can occur in different hardware components, such as memory cells, functional units, or interconnects.\nGranularity: Faults can affect a single bit (e.g., a bitflip) or multiple bits (e.g., burst errors) within a hardware component.\n\nError models, on the other hand, describe how a fault propagates through the system and manifests as an error. An error may cause the system to deviate from its expected behavior, leading to incorrect results or even system failures. Error models can be defined at different levels of abstraction, from the hardware level (e.g., register-level bitflips) to the software level (e.g., corrupted weights or activations in an ML model).\nThe chosen fault model (or error model, typically the more applicable term when studying the robustness of an ML system) informs the assumptions made about the system under study. For example, a tool focusing on single-bit transient errors (Sangchoolie, Pattabiraman, and Karlsson 2017) would not be well suited to studying the impact of permanent, multi-bit flip errors (Wilkening et al. 
2014), as it is designed assuming a different model altogether.\n\nWilkening, Mark, Vilas Sridharan, Si Li, Fritz Previlon, Sudhanva Gurumurthi, and David R. Kaeli. 2014. “Calculating Architectural Vulnerability Factors for Spatial Multi-Bit Transient Faults.” In 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture, 293–305. IEEE; IEEE. https://doi.org/10.1109/micro.2014.15.\nFurthermore, how an error model is implemented is also an important consideration, particularly regarding where in the compute stack an error is said to occur. For instance, a single-bit flip model at the architectural register level differs from a single-bit flip in the weight of a model at the PyTorch level. Although both target a similar error model, the former would usually be modeled in an architecturally accurate simulator (like gem5 (Binkert et al. 2011)), which captures error propagation, whereas the latter focuses on value propagation through the model.\nRecent research has shown that certain characteristics of error models may exhibit similar behaviors across different levels of abstraction (Sangchoolie, Pattabiraman, and Karlsson 2017; Papadimitriou and Gizopoulos 2021). For example, single-bit errors are generally more problematic than multi-bit errors, regardless of whether they are modeled at the hardware or software level. However, other characteristics, such as error masking (Mohanram and Touba 2003), as shown in Figure 17.37, may not always be accurately captured by software-level models, as they can hide underlying system effects.\n\nSangchoolie, Behrooz, Karthik Pattabiraman, and Johan Karlsson. 2017. “One Bit Is (Not) Enough: An Empirical Study of the Impact of Single and Multiple Bit-Flip Errors.” In 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 97–108. IEEE; IEEE. https://doi.org/10.1109/dsn.2017.30.\n\nPapadimitriou, George, and Dimitris Gizopoulos. 2021. “Demystifying the System Vulnerability Stack: Transient Fault Effects Across the Layers.” In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA), 902–15. IEEE; IEEE. https://doi.org/10.1109/isca52012.2021.00075.\n\nMohanram, K., and N. A. Touba. 2003. “Partial Error Masking to Reduce Soft Error Failure Rate in Logic Circuits.” In Proceedings. 16th IEEE Symposium on Computer Arithmetic, 433–40. IEEE; IEEE Comput. Soc. https://doi.org/10.1109/dftvs.2003.1250141.\n\nFigure 17.37: Example of error masking in microarchitectural components (Ko 2021)\n\nKo, Yohan. 2021. “Characterizing System-Level Masking Effects Against Soft Errors.” Electronics 10 (18): 2286. https://doi.org/10.3390/electronics10182286.\n\nSome tools, such as Fidelity (He, Balaprakash, and Li 2020), aim to bridge the gap between hardware-level and software-level error models by mapping patterns between the two levels of abstraction (Cheng et al. 2016). This allows for more accurate modeling of hardware faults in software-based tools, which is essential for developing robust and reliable ML systems. Lower-level tools typically capture error propagation more accurately but are much slower at simulating large numbers of errors due to the complexity of hardware system designs. Higher-level tools, on the other hand, such as those implemented in ML frameworks like PyTorch or TensorFlow, which we discuss in later sections, are often faster and more efficient for evaluating the robustness of ML systems.
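To make the software-level end of this spectrum concrete, the following minimal sketch (a hypothetical example that does not correspond to any particular tool’s API) flips a single bit of a 32-bit floating-point value, the kind of error model a framework-level injector applies to a weight or activation. The bit position matters: a flip in the exponent changes the value by orders of magnitude, while a flip in a low-order mantissa bit is often effectively masked.

```python
import numpy as np

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32 value by reinterpreting its bits as a uint32."""
    arr = np.array([value], dtype=np.float32)
    bits = arr.view(np.uint32)          # same memory, integer view
    bits[0] ^= np.uint32(1 << bit)      # toggle the chosen bit
    return float(arr[0])

w = 0.5
print(flip_bit(w, 30))   # exponent bit flipped: ~1.7e38, a catastrophic corruption
print(flip_bit(w, 0))    # mantissa LSB flipped: ~0.50000006, likely benign/masked
```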
Cheng, Eric, Shahrzad Mirkhani, Lukasz G. Szafaryn, Chen-Yong Cher, Hyungmin Cho, Kevin Skadron, Mircea R. Stan, et al. 2016. “CLEAR: Cross-Layer Exploration for Architecting Resilience - Combining Hardware and Software Techniques to Tolerate Soft Errors in Processor Cores.” In Proceedings of the 53rd Annual Design Automation Conference, 1–6. ACM. https://doi.org/10.1145/2897937.2897996.\nIn the following subsections, we will discuss various hardware-based and software-based fault injection methods and tools, highlighting their capabilities, limitations, and the fault and error models they support.\n\n\n17.6.2 Hardware-based Fault Injection\nAn error injection tool allows the user to implement a particular error model, such as a transient single-bit flip during inference, as shown in Figure 17.38. Most error injection tools are software-based, as software-level tools are faster for ML robustness studies. However, hardware-based fault injection methods remain important for grounding these higher-level error models: because they directly manipulate the hardware to introduce faults, they are considered the most accurate way to study the impact of faults on ML systems. These methods allow researchers to observe the system’s behavior under real-world fault conditions. Both hardware-based and software-based error injection tools are described in more detail in this section.\n\nFigure 17.38: Hardware errors can occur due to a variety of reasons and at different times and/or locations in a system, which can be explored when studying the impact of hardware-based errors on systems (Ahmadilivani et al. 2024)\n\nAhmadilivani, Mohammad Hasan, Mahdi Taheri, Jaan Raik, Masoud Daneshtalab, and Maksim Jenihhin. 2024. “A Systematic Literature Review on Hardware Reliability Assessment Methods for Deep Neural Networks.” ACM Comput. Surv. 56 (6): 1–39. https://doi.org/10.1145/3638242.\n\nMethods\nTwo of the most common hardware-based fault injection methods are FPGA-based fault injection and radiation or beam testing.\nFPGA-based Fault Injection: Field-Programmable Gate Arrays (FPGAs) are reconfigurable integrated circuits that can be programmed to implement various hardware designs. In the context of fault injection, FPGAs offer high precision and accuracy, as researchers can target specific bits or sets of bits within the hardware. By modifying the FPGA configuration, faults can be introduced at specific locations and times during the execution of an ML model. FPGA-based fault injection allows for fine-grained control over the fault model, enabling researchers to study the impact of different types of faults, such as single-bit flips or multi-bit errors. This level of control makes FPGA-based fault injection a valuable tool for understanding the resilience of ML systems to hardware faults.\nRadiation or Beam Testing: Radiation or beam testing (Velazco, Foucard, and Peronnard 2010) involves exposing the hardware running an ML model to high-energy particles, such as protons or neutrons, as illustrated in Figure 17.39. These particles can cause bitflips or other types of faults in the hardware, mimicking the effects of real-world radiation-induced faults. Beam testing is widely regarded as a highly accurate method for measuring the error rate induced by particle strikes on a running application. 
It provides a realistic representation of the faults encountered in real-world environments, particularly in applications exposed to high radiation levels, such as space systems or particle physics experiments. However, unlike FPGA-based fault injection, beam testing is less precise in targeting specific bits or components within the hardware, as it is difficult to aim the beam of particles at a particular bit. Despite being quite expensive from a research standpoint, beam testing is a well-regarded industry practice for reliability assessment.\n\nVelazco, Raoul, Gilles Foucard, and Paul Peronnard. 2010. “Combining Results of Accelerated Radiation Tests and Fault Injections to Predict the Error Rate of an Application Implemented in SRAM-Based FPGAs.” IEEE Trans. Nucl. Sci. 57 (6): 3500–3505. https://doi.org/10.1109/tns.2010.2087355.\n\nFigure 17.39: Radiation test setup for semiconductor components (Lee et al. 2022) (Credit: JD Instrument)\n\nLee, Minwoong, Namho Lee, Huijeong Gwon, Jongyeol Kim, Younggwan Hwang, and Seongik Cho. 2022. “Design of Radiation-Tolerant High-Speed Signal Processing Circuit for Detecting Prompt Gamma Rays by Nuclear Explosion.” Electronics 11 (18): 2970. https://doi.org/10.3390/electronics11182970.\n\nLimitations\nDespite their high accuracy, hardware-based fault injection methods have several limitations that can hinder their widespread adoption:\nCost: FPGA-based fault injection and beam testing require specialized hardware and facilities, which can be expensive to set up and maintain. The cost of these methods can be a significant barrier for researchers and organizations with limited resources.\nScalability: Hardware-based methods are generally slower and less scalable than software-based methods. Injecting faults and collecting data on hardware can be time-consuming, limiting the number of experiments that can be performed within a given timeframe. This can be particularly challenging when studying the resilience of large-scale ML systems or conducting statistical analyses that require many fault injection experiments.\nFlexibility: Hardware-based methods may not be as flexible as software-based methods in terms of the range of fault models and error models they can support. Modifying the hardware configuration or the experimental setup to accommodate different fault models can be more challenging and time-consuming than with software-based methods.\nDespite these limitations, hardware-based fault injection methods remain essential tools for validating the accuracy of software-based methods and for studying the impact of faults on ML systems in realistic settings. By combining hardware-based and software-based methods, researchers can gain a more comprehensive understanding of ML systems’ resilience to hardware faults and develop effective mitigation strategies.\n\n17.6.3 Software-based Fault Injection Tools\nWith the rapid development of ML frameworks in recent years, software-based fault injection tools have gained popularity in studying the resilience of ML systems to hardware faults. These tools simulate the effects of hardware faults by modifying the software representation of the ML model or the underlying computational graph. The rise of ML frameworks such as TensorFlow, PyTorch, and Keras has facilitated the development of fault injection tools that are tightly integrated with these frameworks, making it easier for researchers to conduct fault injection experiments and analyze the results.
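As a simple illustration of how such tools hook into a framework, the sketch below (a hypothetical example, not the API of any specific tool) uses a standard PyTorch forward hook to corrupt a single activation value at inference time, mimicking a transient fault in an intermediate feature map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()

def corrupt_one_activation(module, inputs, output):
    # Mimic a transient hardware fault: overwrite one randomly chosen value
    # in the layer's output tensor; returning a tensor replaces the output.
    out = output.clone()
    flat = out.view(-1)
    flat[torch.randint(flat.numel(), (1,)).item()] = 1e30
    return out

handle = model[0].register_forward_hook(corrupt_one_activation)  # inject on layer 0
x = torch.randn(1, 16)
faulty = model(x)
handle.remove()                                                  # stop injecting
clean = model(x)
print(torch.equal(faulty, clean))  # False: the corrupted activation propagated
```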
Advantages and Trade-offs\nSoftware-based fault injection tools offer several advantages over hardware-based methods:\nSpeed: Software-based tools are generally faster than hardware-based methods, as they do not require the modification of physical hardware or the setup of specialized equipment. This allows researchers to conduct more fault injection experiments in a shorter time, enabling more comprehensive analyses of the resilience of ML systems.\nFlexibility: Software-based tools are more flexible than hardware-based methods in terms of the range of fault and error models they can support. Researchers can easily modify the fault injection tool’s software implementation to accommodate different fault models or to target specific components of the ML system.\nAccessibility: Software-based tools are more accessible than hardware-based methods, as they do not require specialized hardware or facilities. This makes it easier for researchers and practitioners to conduct fault injection experiments and study the resilience of ML systems, even with limited resources.\n\nLimitations\nSoftware-based fault injection tools also have some limitations compared to hardware-based methods:\nAccuracy: Software-based tools may not always capture the full range of effects that hardware faults can have on the system. Because these tools operate at a higher level of abstraction, they may miss some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nFidelity: Software-based tools may not provide the same level of fidelity as hardware-based methods in representing real-world fault conditions. The accuracy of the results obtained from software-based fault injection experiments depends on how closely the software model approximates the actual hardware behavior.\n\nFigure 17.40: Comparison of techniques at layers of abstraction (Credit: MAVFI)\n\nTypes of Fault Injection Tools\nSoftware-based fault injection tools can be categorized based on their target frameworks or use cases. Here, we discuss some of the most popular tools in each category:\nAres (Reagen et al. 2018), a fault injection tool initially developed for the Keras framework in 2018, emerged as one of the first tools to study the impact of hardware faults on deep neural networks (DNNs) amid the rising popularity of ML frameworks in the mid-to-late 2010s. The tool was validated against a DNN accelerator implemented in silicon, demonstrating its effectiveness in modeling hardware faults. Ares provides a comprehensive study of the impact of hardware faults on both weights and activation values, characterizing the effects of single-bit flips and bit-error rates (BER) on hardware structures. Later, the Ares framework was extended to support the PyTorch ecosystem, enabling researchers to investigate hardware faults in a more modern setting and further extending its utility in the field.\n\nReagen, Brandon, Udit Gupta, Lillian Pentecost, Paul Whatmough, Sae Kyu Lee, Niamh Mulholland, David Brooks, and Gu-Yeon Wei. 2018. “Ares: A Framework for Quantifying the Resilience of Deep Neural Networks.” In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC), 1–6. IEEE. https://doi.org/10.1109/dac.2018.8465834.
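The basic loop behind such weight- and activation-level studies can be sketched in a few lines of PyTorch. The example below is a hypothetical, tool-agnostic sketch (it does not use Ares’ or PyTorchFI’s actual APIs): it repeatedly flips one random bit in one random weight, runs inference on a fixed evaluation batch, and records how often any prediction changes, a rough proxy for the rate of silent data corruption.

```python
import random
import struct

import torch
import torch.nn as nn

torch.manual_seed(0)
random.seed(0)

# Small stand-in model and evaluation batch; any trained model and data would do.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
model.requires_grad_(False)      # inference only; also permits in-place weight edits
x = torch.randn(64, 16)
baseline = model(x).argmax(dim=1)

def flip_random_weight_bit(m: nn.Module):
    """Flip one random bit of one random float32 parameter; return info to undo it."""
    p = random.choice(list(m.parameters()))
    idx = random.randrange(p.numel())
    bit = random.randrange(32)
    original = p.view(-1)[idx].item()
    as_int = struct.unpack("<I", struct.pack("<f", original))[0]
    corrupted = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))[0]
    p.view(-1)[idx] = corrupted
    return p, idx, original

trials, mismatches = 1000, 0
for _ in range(trials):
    p, idx, original = flip_random_weight_bit(model)
    mismatches += int((model(x).argmax(dim=1) != baseline).any())
    p.view(-1)[idx] = original   # restore the weight before the next injection

print(f"{mismatches / trials:.1%} of single-bit weight flips changed a prediction")
```

Real tools add the pieces this sketch omits, such as fault models beyond uniform single-bit flips, GPU-side injection for speed, and systematic bookkeeping of which layers and bit positions were hit.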
Figure 17.41: Hardware bitflips in ML workloads can cause phantom objects and misclassifications, which can erroneously be used downstream by larger systems, such as in autonomous driving. Shown above are correct and faulty versions of the same image using the PyTorchFI injection framework.\n\nPyTorchFI (Mahmoud et al. 2020), a fault injection tool specifically designed for the PyTorch framework, was developed in 2020 in collaboration with Nvidia Research. It enables the injection of faults into the weights, activations, and gradients of PyTorch models, supporting a wide range of fault models. By leveraging the GPU acceleration capabilities of PyTorch, PyTorchFI provides a fast and efficient implementation for conducting fault injection experiments on large-scale ML systems, as shown in Figure 17.41. The tool’s speed and ease of use have led to widespread adoption in the community, resulting in multiple developer-led projects, such as PyTorchALFI by Intel Labs, which focuses on safety in automotive environments. Follow-up PyTorch-centric tools for fault injection include Dr. DNA from Meta (Ma et al. 2024), which further facilitates the Pythonic programming model for ease of use, and the GoldenEye framework (Mahmoud et al. 2022), which incorporates novel numerical datatypes (such as AdaptivFloat (Tambe et al. 2020) and BlockFloat) in the context of hardware bit flips.\n\nMahmoud, Abdulrahman, Neeraj Aggarwal, Alex Nobbe, Jose Rodrigo Sanchez Vicarte, Sarita V. Adve, Christopher W. Fletcher, Iuri Frosio, and Siva Kumar Sastry Hari. 2020. “PyTorchFI: A Runtime Perturbation Tool for DNNs.” In 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-w), 25–31. IEEE; IEEE. https://doi.org/10.1109/dsn-w50199.2020.00014.\n\nMa, Dongning, Fred Lin, Alban Desmaison, Joel Coburn, Daniel Moore, Sriram Sankar, and Xun Jiao. 2024. “Dr. DNA: Combating Silent Data Corruptions in Deep Learning Using Distribution of Neuron Activations.” In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 239–52. ACM. https://doi.org/10.1145/3620666.3651349.\n\nMahmoud, Abdulrahman, Thierry Tambe, Tarek Aloui, David Brooks, and Gu-Yeon Wei. 2022. “GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators.” In 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 206–14. IEEE. https://doi.org/10.1109/dsn53405.2022.00031.\n\nTambe, Thierry, En-Yu Yang, Zishen Wan, Yuntian Deng, Vijay Janapa Reddi, Alexander Rush, David Brooks, and Gu-Yeon Wei. 2020. “Algorithm-Hardware Co-Design of Adaptive Floating-Point Encodings for Resilient Deep Learning Inference.” In 2020 57th ACM/IEEE Design Automation Conference (DAC), 1–6. IEEE; IEEE. https://doi.org/10.1109/dac18072.2020.9218516.\n\nChen, Zitao, Niranjhana Narayanan, Bo Fang, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2020. “TensorFI: A Flexible Fault Injection Framework for TensorFlow Applications.” In 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), 426–35. IEEE; IEEE. https://doi.org/10.1109/issre5003.2020.00047.\n\nChen, Zitao, Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. 2019. 
“BinFI: An Efficient Fault Injector for Safety-Critical Machine Learning Systems.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. SC ’19. New York, NY, USA: ACM. https://doi.org/10.1145/3295500.3356177.\nTensorFI (Chen et al. 2020), or the TensorFlow Fault Injector, is a fault injection tool developed specifically for the TensorFlow framework. Analogous to Ares and PyTorchFI, TensorFI is considered the state-of-the-art tool for ML robustness studies in the TensorFlow ecosystem. It allows researchers to inject faults into the computational graph of TensorFlow models and study their impact on the model’s performance, supporting a wide range of fault models. One of the key benefits of TensorFI is its ability to evaluate the resilience of various ML models, not just DNNs. Further advancements, such as BinFI (Chen et al. 2019), provide a mechanism to speed up error injection experiments by focusing on the \"important\" bits in the system, accelerating the process of ML robustness analysis and prioritizing the critical components of a model.\nNVBitFI (T. Tsai et al. 2021), a general-purpose fault injection tool developed by Nvidia for their GPU platforms, operates at a lower level than framework-specific tools like Ares, PyTorchFI, and TensorFI. While those tools implement robustness analysis within particular deep learning frameworks, NVBitFI targets the underlying GPU assembly code for fault injection. This allows researchers to inject faults into any application running on Nvidia GPUs, making it a versatile tool for studying the resilience of ML systems and other GPU-accelerated applications. By enabling users to inject errors at the architectural level, NVBitFI provides a more general-purpose fault model that is not restricted to ML models. As Nvidia’s GPU systems are commonly used in many ML-based systems, NVBitFI is a valuable tool for comprehensive fault injection analysis across various applications.\n\nTsai, Timothy, Siva Kumar Sastry Hari, Michael Sullivan, Oreste Villa, and Stephen W. Keckler. 2021. “NVBitFI: Dynamic Fault Injection for GPUs.” In 2021 51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 284–91. IEEE; IEEE. https://doi.org/10.1109/dsn48987.2021.00041.\n\nDomain-specific Examples\nDomain-specific fault injection tools have been developed to address the unique challenges and requirements of particular ML application domains, such as autonomous vehicles and robotics. This section highlights three domain-specific fault injection tools: DriveFI and PyTorchALFI for autonomous vehicles and MAVFI for uncrewed aerial vehicles (UAVs). These tools enable researchers to inject hardware faults into the perception, control, and other subsystems of these complex systems, allowing them to study the impact of faults on system performance and safety. The development of these software-based fault injection tools has greatly expanded the capabilities of the ML community to develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\nDriveFI (Jha et al. 2019) is a fault injection tool designed for autonomous vehicles. It enables the injection of hardware faults into the perception and control pipelines of autonomous vehicle systems, allowing researchers to study the impact of these faults on the system’s performance and safety. DriveFI has been integrated with industry-standard autonomous driving platforms, such as Nvidia DriveAV and Baidu Apollo, making it a valuable tool for evaluating the resilience of autonomous vehicle systems.
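Conceptually, these domain-specific tools extend the same injection machinery from model internals to whole subsystems, including sensor inputs. The sketch below is a hypothetical illustration (not DriveFI’s or PyTorchALFI’s actual API) of corrupting camera data before it reaches a perception model, in the spirit of the sensor-level fault injection such tools provide.

```python
import torch

def inject_dead_pixels(frame: torch.Tensor, fraction: float = 0.05) -> torch.Tensor:
    """Simulate a faulty camera sensor by zeroing a random fraction of pixels.

    frame: image tensor shaped (channels, height, width) with values in [0, 1].
    """
    corrupted = frame.clone()
    fault_mask = torch.rand(frame.shape[1:]) < fraction   # per-pixel fault locations
    corrupted[:, fault_mask] = 0.0                        # dead pixels in every channel
    return corrupted

frame = torch.rand(3, 224, 224)                # stand-in camera frame
faulty_frame = inject_dead_pixels(frame)
# A perception model (hypothetical here) would then be run on both frames and its
# detections compared to quantify the downstream impact of the sensor fault.
print(f"{(faulty_frame == 0).float().mean().item():.1%} of pixel values zeroed")
```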
Jha, Saurabh, Subho Banerjee, Timothy Tsai, Siva K. S. Hari, Michael B. Sullivan, Zbigniew T. Kalbarczyk, Stephen W. Keckler, and Ravishankar K. Iyer. 2019. “ML-Based Fault Injection for Autonomous Vehicles: A Case for Bayesian Fault Injection.” In 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), 112–24. IEEE; IEEE. https://doi.org/10.1109/dsn.2019.00025.\n\nGräfe, Ralf, Qutub Syed Sha, Florian Geissler, and Michael Paulitsch. 2023. “Large-Scale Application of Fault Injection into PyTorch Models - an Extension to PyTorchFI for Validation Efficiency.” In 2023 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks - Supplemental Volume (DSN-S), 56–62. IEEE; IEEE. https://doi.org/10.1109/dsn-s58398.2023.00025.\nPyTorchALFI (Gräfe et al. 2023) is an extension of PyTorchFI developed by Intel Labs for the autonomous vehicle domain. It builds upon PyTorchFI’s fault injection capabilities and adds features specifically tailored to evaluating the resilience of autonomous vehicle systems, such as the ability to inject faults into camera and LiDAR sensor data.\nMAVFI (Hsiao et al. 2023) is a fault injection tool designed for the robotics domain, specifically for uncrewed aerial vehicles (UAVs). MAVFI is built on top of the Robot Operating System (ROS) framework and allows researchers to inject faults into the various components of a UAV system, such as sensors, actuators, and control algorithms. By evaluating the impact of these faults on the UAV’s performance and stability, researchers can develop more resilient and fault-tolerant UAV systems.\n\nHsiao, Yu-Shun, Zishen Wan, Tianyu Jia, Radhika Ghosal, Abdulrahman Mahmoud, Arijit Raychowdhury, David Brooks, Gu-Yeon Wei, and Vijay Janapa Reddi. 2023. “MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles.” In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. IEEE; IEEE. https://doi.org/10.23919/date56975.2023.10137246.\nThe development of software-based fault injection tools has greatly expanded the capabilities of researchers and practitioners to study the resilience of ML systems to hardware faults. By leveraging the speed, flexibility, and accessibility of these tools, the ML community can develop more robust and reliable systems that can operate safely and effectively in the presence of hardware faults.\n\n17.6.4 Bridging the Gap between Hardware and Software Error Models\nWhile software-based fault injection tools offer many advantages in speed, flexibility, and accessibility, they may not always accurately capture the full range of effects that hardware faults can have on the system. This is because software-based tools operate at a higher level of abstraction than hardware-based methods and may miss some of the low-level hardware interactions and error propagation mechanisms that can impact the behavior of the ML system.\nAs Bolchini et al. (2023) illustrate in their work, hardware errors can manifest in complex spatial distribution patterns that are challenging to fully replicate with software-based fault injection alone. They identify four distinct patterns: (a) single point, where the fault corrupts a single value in a feature map; (b) same row, where the fault corrupts a partial or entire row in a single feature map; (c) bullet wake, where the fault corrupts the same location across multiple feature maps; and (d) shatter glass, which combines the effects of the same row and bullet wake patterns, as shown in Figure 17.42. These intricate error propagation mechanisms highlight the need for hardware-aware fault injection techniques to accurately assess the resilience of ML systems.
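To make one of these patterns concrete, the following sketch (a hypothetical example based on the pattern descriptions above, not Bolchini et al.’s tool) injects a bullet wake style error into a convolutional activation tensor by corrupting the same spatial location in every channel, something a naive single-value error model would not reproduce.

```python
import torch

def inject_bullet_wake(fmaps: torch.Tensor, value: float = 1e30) -> torch.Tensor:
    """Corrupt the same (row, col) location in every channel of an activation tensor.

    fmaps: activations shaped (batch, channels, height, width).
    """
    corrupted = fmaps.clone()
    b = torch.randint(fmaps.shape[0], (1,)).item()
    row = torch.randint(fmaps.shape[2], (1,)).item()
    col = torch.randint(fmaps.shape[3], (1,)).item()
    corrupted[b, :, row, col] = value      # one strike, visible across all feature maps
    return corrupted

activations = torch.randn(1, 64, 28, 28)   # e.g. the output of a convolutional layer
faulty = inject_bullet_wake(activations)
print((faulty != activations).sum().item(), "values corrupted across",
      activations.shape[1], "channels")    # 64 values, one per channel
```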
Figure 17.42: Hardware errors may manifest themselves in different ways at the software level, as classified by Bolchini et al. (Bolchini et al. 2023)\n\nBolchini, Cristiana, Luca Cassano, Antonio Miele, and Alessandro Toschi. 2023. “Fast and Accurate Error Simulation for CNNs Against Soft Errors.” IEEE Trans. Comput. 72 (4): 984–97. https://doi.org/10.1109/tc.2022.3184274.\n\nResearchers have developed tools to address this issue by bridging the gap between low-level hardware error models and higher-level software error models. One such tool is Fidelity, designed to map patterns between hardware-level faults and their software-level manifestations.\n\nFidelity: Bridging the Gap\nFidelity (He, Balaprakash, and Li 2020) is a tool for accurately modeling hardware faults in software-based fault injection experiments. It achieves this by carefully studying the relationship between hardware-level faults and their impact on the software representation of the ML system.\n\nHe, Yi, Prasanna Balaprakash, and Yanjing Li. 2020. “FIdelity: Efficient Resilience Analysis Framework for Deep Learning Accelerators.” In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 270–81. IEEE; IEEE. https://doi.org/10.1109/micro50266.2020.00033.\nThe key insights behind Fidelity are:\n\nFault Propagation: Fidelity models how faults propagate through the hardware and manifest as errors in the software-visible state of the system. By understanding these propagation patterns, Fidelity can more accurately simulate the effects of hardware faults in software-based experiments.\nFault Equivalence: Fidelity identifies equivalence classes of hardware faults that produce similar software-level errors. This allows researchers to design software-based fault models that are representative of the underlying hardware faults without the need to model every possible hardware fault individually.\nLayered Approach: Fidelity employs a layered approach to fault modeling, where the effects of hardware faults are propagated through multiple levels of abstraction, from the hardware to the software level. This approach ensures that the software-based fault models are grounded in the actual behavior of the hardware.\n\nBy incorporating these insights, Fidelity enables software-based fault injection tools to capture the effects of hardware faults on ML systems accurately. This is particularly important for safety-critical applications, where the system’s resilience to hardware faults is paramount.\n\nImportance of Capturing True Hardware Behavior\nCapturing true hardware behavior in software-based fault injection tools is crucial for several reasons:\n\nAccuracy: By accurately modeling the effects of hardware faults, software-based tools can provide more reliable insights into the resilience of ML systems. 
This is essential for designing and validating fault-tolerant systems that can operate safely and effectively in the presence of hardware faults.\nReproducibility: When software-based tools accurately capture hardware behavior, fault injection experiments become more reproducible across different platforms and environments. This is important for the scientific study of ML system resilience, as it allows researchers to compare and validate results across different studies and implementations.\nEfficiency: Software-based tools that capture true hardware behavior can be more efficient in their fault injection experiments by focusing on the most representative and impactful fault models. This allows researchers to cover a wider range of fault scenarios and system configurations with limited computational resources.\nMitigation Strategies: Understanding how hardware faults manifest at the software level is crucial for developing effective mitigation strategies. By accurately capturing hardware behavior, software-based fault injection tools can help researchers identify the most vulnerable components of the ML system and design targeted hardening techniques to improve resilience.\n\nTools like Fidelity are vital in advancing the state-of-the-art in ML system resilience research. These tools enable researchers to conduct more accurate, reproducible, and efficient fault injection experiments by bridging the gap between hardware and software error models. As the complexity and criticality of ML systems continue to grow, the importance of capturing true hardware behavior in software-based fault injection tools will only become more apparent.\nOngoing research in this area aims to refine the mapping between hardware and software error models and develop new techniques for efficiently simulating hardware faults in software-based experiments. As these tools mature, they will provide the ML community with increasingly powerful and accessible means to study and improve the resilience of ML systems to hardware faults.", "crumbs": [ "MAIN", "17  Robust AI"