diff --git a/episodes/chap1_introduction_FL.md b/episodes/chap1_introduction_FL.md
index 9c44c1b6..e21f15f8 100644
--- a/episodes/chap1_introduction_FL.md
+++ b/episodes/chap1_introduction_FL.md
@@ -38,7 +38,7 @@ In classic data analysis, all data is copied over into a single place. This make
 use conventional data analysis software and tools to gain insights.
 
 
-![In classic analysis all the data is brought together](fig/classic%20analysis.jpg){alt='Two tables
+![In classic analysis, all the data is brought together](fig/chapter1/classic_analysis.jpg){alt='Two tables
 with data are moved to a central location'}
 
 Unfortunately this way of working does not respect the privacy of the people contained within the
@@ -74,7 +74,7 @@ will in this case be removed so that individuals stay anonymous. Data _pseudonim
 similar process, but in this case, the records will be assigned an id that will make it possible
 to link individuals across datasets.
 
-![Data anonymization](fig%2Fdata_anonymization.jpg){alt='Names are censored before the datasets
+![Data anonymization](fig/chapter1/data_anonymization.jpg){alt='Names are censored before the datasets
 are sent to a central place'}
 
 While data anonymization and pseudonymization are often a good first step, there is no guarantee
@@ -93,11 +93,14 @@ over it. In this case, it is not the data that travels, but the analysis itself.
 a query or instruction to the data and only the results will get back to the user.
 
 The results are often akin to a form of _aggregation_ of the data. This can be in the shape of traditional
-statistics like the mean, or it can be more intricate like a machine learning model.
+statistics like the mean, or it can be a combination of aggregations that forms a more complicated
+analysis.
 
 Aggregating the data does not ensure complete protection of person-level information, but it
 certainly makes it less likely that this will happen.
 
+![In federated analysis, an aggregation of the local data is sent to a central point, where the local aggregations are combined. This can also be a small step in a larger analysis.](fig/chapter1/federated_sum.jpg)
+
 TODO: Example of data leakage in simple aggregated case
 
 ## Federated learning
@@ -109,11 +112,13 @@ traditional federated learning, the clients train machine learning models, and o
 the models are sent back to the central server. The central server combines the updates from all
 the individual clients into one final machine learning model.
 
+![In federated learning, only the gradients of the models are shared](fig/chapter1/federated_learning.jpg)
+
 There are caveats to using this type of data analysis though. Although the data transmitted from
 the clients to the server are an aggregation of the raw data, researchers have found a way to use
 this data to reconstruct the original data. This vulnerability is called _gradient leakage_.
 
-![An example of gradient leakage](fig/gradient_leakage.jpg)
+![An example of gradient leakage. The order might not be correct, but the reconstructed images are still very close to the originals.](fig/chapter1/gradient_leakage.jpg)
 
 ## Secure Multiparty Computation
 
@@ -133,6 +138,8 @@ retrieved.
 When combining multiple values using secret sharing, this will result in the parties owning new
 puzzle pieces that when put together will reveal the result of the computation.
 
+![In secure multiparty computation, parties collaboratively perform an analysis while holding only encrypted pieces of the data](fig/chapter1/mpc.jpg)
+
 ::: callout
 ### Secret sharing, an example
 
@@ -173,6 +180,9 @@ selection of inputs with random noise. A single individual will then always be a
 their data has contributed to the final result. An individual has _plausible deniability_ with
 regards to whether it was part of the dataset.
 
+![Differential privacy replaces a subset of the data with random values](fig/chapter1/differential_privacy.jpg)
+
+
 ## Blocks upon blocks
 
 The previously mentioned techniques are not used in isolation, but are usually stacked on top of
@@ -204,6 +214,8 @@ of privacy enhancing algorithms than horizontal partitioning.
 Data can even be horizontally and vertically partitioned at the same time. In these cases, it may
 be necessary to combine multiple techniques.
 
+![Horizontal and vertical partitioning refers to how data is separated](fig/chapter1/horizontal_vertical_partitioning.jpg)
+
 ## Technology doesn't solve everything
 
 You have now learned about various technologies for analyzing data while preserving privacy of
diff --git a/episodes/fig/chapter1/classic_analysis.jpg b/episodes/fig/chapter1/classic_analysis.jpg
new file mode 100644
index 00000000..0b3d9d36
Binary files /dev/null and b/episodes/fig/chapter1/classic_analysis.jpg differ
diff --git a/episodes/fig/chapter1/data_anonymization.jpg b/episodes/fig/chapter1/data_anonymization.jpg
new file mode 100644
index 00000000..dc856877
Binary files /dev/null and b/episodes/fig/chapter1/data_anonymization.jpg differ
diff --git a/episodes/fig/chapter1/differential_privacy.jpg b/episodes/fig/chapter1/differential_privacy.jpg
new file mode 100644
index 00000000..4db3d764
Binary files /dev/null and b/episodes/fig/chapter1/differential_privacy.jpg differ
diff --git a/episodes/fig/chapter1/federated_learning.jpg b/episodes/fig/chapter1/federated_learning.jpg
new file mode 100644
index 00000000..1e779459
Binary files /dev/null and b/episodes/fig/chapter1/federated_learning.jpg differ
diff --git a/episodes/fig/chapter1/federated_sum.jpg b/episodes/fig/chapter1/federated_sum.jpg
new file mode 100644
index 00000000..489023aa
Binary files /dev/null and b/episodes/fig/chapter1/federated_sum.jpg differ
diff --git a/episodes/fig/gradient_leakage.jpg b/episodes/fig/chapter1/gradient_leakage.jpg
similarity index 100%
rename from episodes/fig/gradient_leakage.jpg
rename to episodes/fig/chapter1/gradient_leakage.jpg
diff --git a/episodes/fig/chapter1/horizontal_vertical_partitioning.jpg b/episodes/fig/chapter1/horizontal_vertical_partitioning.jpg
new file mode 100644
index 00000000..80e70c8b
Binary files /dev/null and b/episodes/fig/chapter1/horizontal_vertical_partitioning.jpg differ
diff --git a/episodes/fig/chapter1/mpc.jpg b/episodes/fig/chapter1/mpc.jpg
new file mode 100644
index 00000000..40110378
Binary files /dev/null and b/episodes/fig/chapter1/mpc.jpg differ
diff --git a/episodes/fig/data_anonymization.jpg b/episodes/fig/data_anonymization.jpg
deleted file mode 100644
index 1ce3fb3c..00000000
Binary files a/episodes/fig/data_anonymization.jpg and /dev/null differ