Merge branch 'main' into public_main
haase committed Oct 29, 2024
2 parents ca49092 + 0c2aa65 commit 68a271c
Showing 156 changed files with 17,989 additions and 50 deletions.
50 changes: 48 additions & 2 deletions README.md
@@ -1,2 +1,48 @@
# ddhcb-release
Public Release of DDHC-B
This repository contains source code for the SafeTab-H disclosure
avoidance application. SafeTab-H was used by the Census Bureau for the
protection of individual 2020 Census responses in the tabulation and
publication of the Detailed Demographic and Housing Characteristics
File B (DDHC-B). The Census Bureau previously released the source
code for SafeTab-P, the application used to protect the Detailed
Demographic and Housing Characteristics File A (DDHC-A).

Using the mathematical principles of formal privacy, SafeTab-H infused
noise into Census survey results to create *privacy-protected
microdata* which were used by Bureau subject matter experts to
tabulate the 2020 DDHC-B product. SafeTab-H was built on Tumult's
"Analytics" and "Core" platforms. Both SafeTab-H and the underlying
platforms are implemented in Python. The latest version of the
platforms can be found at <https://tmlt.dev/>.
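
To make the idea of noise infusion concrete, the following is a minimal sketch that adds two-sided geometric (discrete Laplace) noise to a single count. It is a generic textbook mechanism chosen for illustration and is not SafeTab-H's actual algorithm. The function name `noisy_count` and the `epsilon` parameter are assumptions made for this example; the production code in this repository relies on the Tumult platforms instead.

```python
import math
import random


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> int:
    """Return true_count plus two-sided geometric (discrete Laplace) noise.

    Illustrative only: a hypothetical helper, not part of SafeTab-H.
    Smaller epsilon means more noise.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")

    def geometric() -> int:
        # Inverse-CDF sample of a geometric variable on {0, 1, 2, ...}
        # with success probability 1 - exp(-epsilon).
        u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is defined
        return int(math.floor(-math.log(u) / epsilon))

    # The difference of two independent geometric draws is symmetric around 0.
    return true_count + geometric() - geometric()
```

A call such as `noisy_count(120, epsilon=0.5, rng=random.Random())` returns an integer near 120, and repeated calls return different values.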

In the interests of both transparency and scientific advancement, the
Census Bureau committed to releasing any source code used in creation
of products protected by formal privacy guarantees. In the case of
the Detailed Demographic & Housing Characteristics publications, this
includes code developed under contract by Tumult Software (tmlt.io)
and the MITRE Corporation. Tumult's underlying platform is evolving, and
the code in the repository is a snapshot of the code used for the
production of the DDHC-B product.

The Bureau has already separately released the internally developed
software for the Top Down Algorithm (TDA) used in production of the
2020 Redistricting and the 2020 Demographic & Housing Characteristics
products.

The software in this repository is divided across multiple
sub-directories, including:
* `configs` contains the specific configuration files used for the
production DDHC-B runs, including privacy loss budget (PLB) allocations
and the rules for adaptive table generation. These configurations reflect
decisions by the Bureau's DSEP (Data Stewardship Executive Policy) committee
based on experiments conducted by Census Bureau staff.
* `safetab-h/safetab_h` contains the source code for the application itself as used
to generate the protected microdata used in production.
* `safetab-h/safetab_utils` contains utilities common among the SafeTab products
developed by Tumult for the Census Bureau.
* `mitre/cef_readers` contains code by MITRE to read the Census input
files used by the SafeTab applications.
* `tumult` contains the Tumult Analytics platform. This is divided
into `common`, `analytics`, and `core` directories. The `core` directory
also includes a pre-packaged Python *wheel* for the core library.
* `ctools` contains Python utility libraries developed by the Census
Bureau's DAS team and used by the MITRE CEF readers.
106 changes: 68 additions & 38 deletions mitre/README.md
@@ -1,32 +1,14 @@
# das-phsafe-cef-reader
Safetab CEF Reader

## Installation

This repo is published as the Python package `das-phsafe-cef-reader`.

It is hosted by DAS in
Nexus at https://repo.rm.census.gov/repository/DAS_Python/.

To install the latest version on DAS systems:

```
pip3 install --upgrade das-phsafe-cef-reader
```

To install on non-DAS systems:

```
pip3 install --upgrade --extra-index-url http://repo.rm.census.gov/repository/DAS_Python/ das-phsafe-cef-reader
```

## Development
This package supports an editable install for easy development and debugging.

To install locally, run the following command. It is recommended to
do this inside a virtual environment to isolate any
changes/dependencies. Once this command is run, you can edit the code
in src/phsafe_safetab_reader and it will be immediately available for
use in python as if installed

[//]: # (via a package.)

@@ -69,7 +51,11 @@ das-phsafe-cef-reader versioning uses a 4 digit semantic versioning convention:
```
<major>.<minor>.<patch>.<build>
```
A change to the major version number indicates a significant change in functionality that is not backwards compatible
with previous versions.

A change to the minor version number indicates a backwards compatible change to functionality.
@@ -83,42 +69,78 @@ The version of the package is managed in __init__.py.

Update the MAJOR, MINOR and PATCH versions as appropriate and commit the change.

Note: Whenever a higher level value is updated, reset the lower to 0. If you increase the MAJOR version, set MINOR,
PATCH and BUILD to 0.
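
The reset rule above can be sketched as a small helper. This is only an illustration: `bump_version` and `LEVELS` are hypothetical names, not part of this package, and the sketch does not cover how the BUILD number is assigned in practice.

```python
LEVELS = ("major", "minor", "patch", "build")


def bump_version(version: str, level: str) -> str:
    """Increment one component of '<major>.<minor>.<patch>.<build>'.

    Every component below the incremented one is reset to 0.
    """
    parts = [int(p) for p in version.split(".")]
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected 4 components, got {version!r}")
    index = LEVELS.index(level)  # raises ValueError for an unknown level
    parts[index] += 1
    for lower in range(index + 1, len(parts)):
        parts[lower] = 0
    return ".".join(str(p) for p in parts)
```

For example, bumping the minor level of `1.2.3.4` resets both PATCH and BUILD, giving `1.3.0.0`.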

## Package Information

This repo was cloned from the "phsafe_safetab_reader" directory of das_decennial. It also includes a file from
the "programs/reader/cef2020" directory of the same repo.

This repo has a merge of files from both the PHSafe and Safetab branches,
prefixed with `cef_` and `safetab_cef_` respectively.

The phsafe_safetab_reader folder contains all the safetab-p and safetab-h cef readers.

This is used for the SafeTab and PHSafe CEF Readers, which convert the CEF microdata files into Spark Dataframes that
SafeTab and PHSafe use as inputs.

SafeTab Documentation for the format of its inputs is here:
https://github.t26.it.census.gov/DAS/tumult-decennial-census-2022-products/blob/main/tumult/safetab_p/SafeTab_P_Documentation.pdf

### Important Files:

**safetab_cef_config.ini** - Contains the locations of the 2020 CEF microdata files on S3, which are the inputs for the
CEF Reader.

**safetab_cef_config_2010.ini** - Used for the SafeTab-P CEF Reader (safetab_cef_reader). It contains the locations of the
2010 CEF microdata files on S3, which are the inputs for the CEF Reader.

**safetab_h_cef_config_2010.ini** - Used for safetab_h_cef_reader. It contains the locations of the 2010 CEF microdata
files on S3, which are the inputs for the CEF Reader.

**safetab_cef_reader.py** - Reads the fixed-width CEF files and uses the CEF Validator to turn them into dataframes.
Applies additional modifications to match SafeTab's expected input format.

**safetab_h_cef_reader.py** - Reads fixed-width CEF files and the output T1 files from
Safetab-H, and uses the CEF Validator to turn them into dataframes. Applies additional modifications to match
SafeTab's expected input format. The CEF Validator is located here for reference:
https://github.t26.it.census.gov/DAS/das_decennial/blob/mitre_safetab_frame/programs/reader/cef_2020/cef_validator_classes.py

**safetab_p_cef_reader.py** - Reads fixed-width CEF files and the output T1 files from
@@ -132,10 +154,18 @@ https://github.t26.it.census.gov/DAS/das_decennial/blob/mitre_safetab_frame/prog

**cef_runner.sh** - Used to run the PHSafe CEF reader

**cef_reader.py** - Reads the fixed-width CEF files in and uses the CEF Validator to turn them into dataframes.
Does additional modifications to get dataframe to PHSafe's expected input.

**cef_config.ini** - Contains the input file locations used by the PHSafe CEF reader

**cef_validator_classes.py** A dependency for some of the python files, originally from the reader in das_decennial's
programs/reader directory.
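
As a rough illustration of what reading a fixed-width file involves, the sketch below slices one record into named fields. The layout, field names, and sample records are hypothetical and exist only for this example; the real CEF layout is defined by the validator classes referenced above and is not reproduced here. The actual readers produce Spark dataframes, whereas this sketch stops at plain Python dictionaries.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, start offset, width).
# This is NOT the real CEF record layout.
EXAMPLE_LAYOUT: List[Tuple[str, int, int]] = [
    ("state", 0, 2),
    ("county", 2, 3),
    ("age", 5, 3),
]


def parse_fixed_width(line: str, layout: List[Tuple[str, int, int]]) -> Dict[str, str]:
    """Slice one fixed-width record into a dict of stripped string fields."""
    record_width = max(start + width for _, start, width in layout)
    if len(line) < record_width:
        raise ValueError(f"record too short: expected at least {record_width} characters")
    return {name: line[start:start + width].strip() for name, start, width in layout}


# Two made-up records that follow the hypothetical layout above.
rows = [parse_fixed_width(line, EXAMPLE_LAYOUT) for line in ["44007 34", "06037102"]]
```

Keeping the layout as data rather than hard-coding slice positions makes it easier to swap in a different record definition, such as a 2010 layout versus a 2020 one.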
13 changes: 3 additions & 10 deletions safetab-h/README.md
@@ -17,16 +17,6 @@ Copyright 2024 Tumult Labs

This repository contains SafeTab-H and its supporting Tumult-developed libraries. For instructions on running SafeTab-H, see its [README](safetab_h/README.md).

### Access to the Deliverable

The source code and documentation for this deliverable can be accessed by executing the following command at the command line (or entering the URL into the clone window of a client, e.g., Github Desktop):

```
git clone https://decennial-census:[email protected]/tumult-labs/safetab-h-release.git
```

In the URL above, `AaY8XLQ8_zanZhSiKtJf` is a Gitlab deploy token associated with the username `decennial-census`. This grants read access to this repository.

### Contents

In the repository there are six folders, each of which contains a component of the release:
@@ -42,6 +32,7 @@ SafeTab-H also requires a CEF reader module for reading data from Census' file f

For details, consult each library's `README` within its respective subfolder. To see which new features have been added since the previous versions, consult their respective `CHANGELOG`s.

### Synthetic Data

This release also comes with a set of synthetic data files that can be used to test SafeTab-H. The ZIP file containing the sample files is hosted on Amazon Simple Storage Service (Amazon S3). Please note that the download link will be valid until 2024-04-09 at 12:00 pm Eastern.
@@ -80,3 +71,5 @@ The download file is `safetab-h-full-size-synthetic-data.zip` will contain the f
- `pop-group-totals.txt`: The T1 output file from a SafeTab-P run on a 300 million record synthetic dataset.

See [SafeTab-H Spec Doc](safetab_h/SafeTab_H_Documentation.pdf) for a description of each file. See the [SafeTab-H Library `README`](safetab_h/README.md) for more input directory setup notes.
11 changes: 11 additions & 0 deletions uscb/ctools/.gitattributes
@@ -0,0 +1,11 @@
*.xlsx binary
*.docx binary
*.py text eol=auto
*.ini text eol=auto
*.md text eol=auto
*.tex text eol=auto
*.txt text eol=auto
*.bat text eol=auto
*.log text eol=auto
.gitattributes text eol=auto
.gitignore text eol=auto
101 changes: 101 additions & 0 deletions uscb/ctools/.gitignore
@@ -0,0 +1,101 @@

# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
# Byte-compiled / optimized / DLL files
# C extensions
# Distribution / packaging
# Django stuff:
# Environments
# Flask stuff:
# Installer logs
# Jupyter Notebook
# PyBuilder
# PyInstaller
# Rope project settings
# SageMath parsed files
# Scrapy stuff:
# Sphinx documentation
# Spyder project settings
# Translations
# Unit test / coverage reports
# celery beat schedule file
# mkdocs documentation
# mypy
# pyenv
*$py.class
*.cover
*.egg
*.egg-info/
*.log
*.manifest
*.mo
*.pot
*.py[cod]
*.sage.py
*.so
*.spec
*~
.DS_Store
.Python
.cache
.coverage
.coverage.*
.eggs/
.env
.hypothesis/
.installed.cfg
.ipynb_checkpoints
.mypy_cache/
.pytest_cache/
.python-version
.ropeproject
.scrapy
.spyderproject
.spyproject
.tox/
.venv
.webassets-cache
/site
ENV/
MANIFEST
__pycache__/
build/
celerybeat-schedule
coverage.xml
db.sqlite3
develop-eggs/
dist/
docs/_build/
downloads/
eggs/
env.bak/
env/
htmlcov/
instance/
lib/
lib64/
local_settings.py
nosetests.xml
parts/
pip-delete-this-directory.txt
pip-log.txt
sdist/
target/
var/
venv.bak/
venv/
wheels/
output.*
*.aux
*.pdf
*.tex
demo?.html
demo?.md
demo*.png
*.png
test.gz
.idea
.idea/

TAGS
tydoc_awsome_demo.html