MongoDB Connection Documentation (#153)
Co-authored-by: Kaleb <[email protected]>
BeniDage and SassafrasAU authored Sep 23, 2024
1 parent c375f4e commit b82a001
Showing 3 changed files with 159 additions and 38 deletions.
80 changes: 42 additions & 38 deletions docs/data-warehousing/Data Anonymization/dataanonymization.md
@@ -3,69 +3,73 @@ sidebar_position: 3
---

# Anonymization and Masking in Healthcare Data

Implementation and Rationale

:::info
**Effective Date:** 30 Apr 2024. **Last Edited:** 10 May 2024. **Author:** Meghana Kaveti
**Document Reference:** Data Anonymization. **Expiry Date:** 29 April 2025. **Version:** 1.0.
:::


## Introduction:

In the realm of healthcare data management, preserving patient privacy and confidentiality is
of utmost importance. Anonymization and masking techniques serve as essential tools in
safeguarding sensitive information while allowing for meaningful analysis and research. This
document elucidates the implementation of anonymization and masking in a heart attack
prediction dataset and provides insights into the rationale behind their application.

## Code Implementation:

The provided code uses the Python libraries Pandas, Faker, and hashlib to anonymize
and mask sensitive columns within the heart attack prediction dataset. The implementation details follow.

### Reading the Dataset:

The original dataset is read into a Pandas DataFrame, facilitating data
manipulation and transformation.

### Initializing Faker:

An instance of the Faker library is initialized to generate fake data for non-sensitive columns.

### Anonymization and Masking:

- Patient ID: Hashed with SHA-256, a one-way transformation that hides the original
identifier while keeping each value unique.
- Age: Age values are generalized into ranges to conceal precise age information, enhancing
privacy.
- Binary Attributes: Columns representing binary attributes such as sex, diabetes, smoking,
etc., are masked as 'Yes' or 'No' to obscure specific health conditions or behaviors.
- Heart Attack Risk: Masked as 'High' or 'Low' to conceal exact risk prediction outcomes.
- Numeric Attributes: Numeric values such as cholesterol, blood pressure, etc., are replaced
with random values within a specified range. This prevents re-identification from exact
measurements and keeps values in a realistic range, but it does not preserve the original
distribution or the relationships between columns.
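
The rules above can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual code: the column names, the sample record, the 10-year age bands, and the cholesterol range of 120 to 400 are all assumptions made for the example, and it uses only the standard library rather than Pandas.

```python
import hashlib
import random


def hash_patient_id(patient_id: str) -> str:
    """Replace an identifier with its SHA-256 hex digest."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()


def age_to_range(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '40-49'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def mask_binary(value: int) -> str:
    """Map a 0/1 flag to 'No'/'Yes'."""
    return "Yes" if value == 1 else "No"


def mask_risk(value: int) -> str:
    """Map a 0/1 risk label to 'Low'/'High'."""
    return "High" if value == 1 else "Low"


def anonymize_row(row: dict, rng: random.Random) -> dict:
    """Apply every rule to one record (column names are hypothetical)."""
    return {
        "Patient ID": hash_patient_id(row["Patient ID"]),
        "Age": age_to_range(row["Age"]),
        "Diabetes": mask_binary(row["Diabetes"]),
        "Heart Attack Risk": mask_risk(row["Heart Attack Risk"]),
        # Assumed range; the real range is not stated in this document.
        "Cholesterol": rng.randint(120, 400),
    }


# Hypothetical sample record, for illustration only.
sample = {
    "Patient ID": "P-0001",
    "Age": 67,
    "Diabetes": 0,
    "Heart Attack Risk": 1,
    "Cholesterol": 208,
}
masked = anonymize_row(sample, random.Random(0))
```

One design point worth weighing: an unsalted hash of a short or guessable identifier can be reversed by hashing every candidate ID and comparing digests. A keyed hash (for example `hmac` with a secret key) is a common way to reduce that risk.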

### Saving the Anonymized Dataset:

The anonymized dataset is saved to a CSV file for further analysis and research purposes.
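
A minimal sketch of the read, transform, and save round trip. The document's code uses Pandas; this version swaps in the standard-library `csv` module and in-memory buffers so that it runs without extra dependencies. The column names, sample rows, and the `anonymize_csv` helper are hypothetical.

```python
import csv
import hashlib
import io


def anonymize_csv(source, target) -> int:
    """Copy rows from source to target, hashing the 'Patient ID' column.

    Returns the number of data rows written.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    count = 0
    for row in reader:
        digest = hashlib.sha256(row["Patient ID"].encode("utf-8")).hexdigest()
        row["Patient ID"] = digest
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-ins for the input and output CSV files.
raw = io.StringIO("Patient ID,Age\nP-0001,67\nP-0002,45\n")
out = io.StringIO()
rows_written = anonymize_csv(raw, out)
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends, and write the output to a separate path so the original data is not overwritten.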

### Rationale for Anonymization and Masking:

- Privacy Preservation: Anonymizing sensitive attributes such as patient IDs and masking
identifiable information mitigate the risk of unauthorized access and identity disclosure, thus
preserving patient privacy.
- Regulatory Compliance: Regulations such as HIPAA and GDPR require patient data to be
protected. Anonymization and masking support compliance and reduce the risk of legal
ramifications.
- Facilitating Research: Anonymized datasets enable researchers and analysts to conduct
studies and derive insights without compromising patient privacy, fostering collaboration and
innovation in healthcare research.
- Building Trust: Demonstrating a commitment to protecting patient privacy through
anonymization and masking fosters trust among patients, healthcare providers, and regulatory
bodies, bolstering the integrity of healthcare data management practices.

## Conclusion:

The implementation of anonymization and masking techniques in healthcare data
management is indispensable for preserving patient privacy, complying with regulations,
facilitating research, and building trust within the healthcare ecosystem. By anonymizing
sensitive attributes and masking identifiable information, organizations uphold ethical
standards while harnessing the power of data-driven insights to improve patient outcomes and
healthcare delivery.
9 changes: 9 additions & 0 deletions docs/data-warehousing/MongoDb Connection/_category_.json
@@ -0,0 +1,9 @@
{
"label": "MongoDB Connection",
"position": 3,
"link": {
"type": "generated-index",
"description": "Documentation for MongoDB Connection "
}
}

108 changes: 108 additions & 0 deletions docs/data-warehousing/MongoDb Connection/mongodbconnection.md
@@ -0,0 +1,108 @@
---
sidebar_position: 1
---

# MongoDB Connection Server

:::info
**Effective Date:** 15 August 2024. **Last Edited:** 20 September 2024. **Author:** Ben Dang (Redback Operations).
**Document Reference:** MongoDB Connection. **Expiry Date:** 15 August 2025. **Version:** 1.0.
:::

This project is a web server application that connects to a MongoDB database. The setup uses Docker Compose to manage the services.

## Prerequisites

- Docker
- Docker Compose

## Setup

### 1. Clone the Repository

```sh
git clone https://github.com/Redback-Operations/redback-data-warehouse.git

# Move into the project folder inside the cloned repository
cd "redback-data-warehouse/MongoDB Connection/Project1"
```

### 2. Create a `.env` file in the project root directory

Add the following variables, replacing the placeholders with your own values:

- `MONGO_URI="mongodb://your_username:your_password@your_host:your_port/?authSource=your_authSource"`
- `DB_NAME="your_database_name"`
- `COLLECTION_NAME="your_collection_name"`
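
The sketch below shows one way application code could read and validate these three variables at startup. It is an illustration only: the project's actual `config.py` is not shown here, and `load_settings`, `REQUIRED_KEYS`, and the example values are hypothetical.

```python
import os
from urllib.parse import urlsplit

REQUIRED_KEYS = ("MONGO_URI", "DB_NAME", "COLLECTION_NAME")


def load_settings(env=None) -> dict:
    """Return the three settings, raising if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {key: env[key] for key in REQUIRED_KEYS}
    # Basic sanity check on the connection string's scheme.
    if urlsplit(settings["MONGO_URI"]).scheme not in ("mongodb", "mongodb+srv"):
        raise RuntimeError("MONGO_URI must start with mongodb:// or mongodb+srv://")
    return settings


# Example values only; a real deployment reads them from the environment.
example_env = {
    "MONGO_URI": "mongodb://user:secret@localhost:27017/?authSource=admin",
    "DB_NAME": "example_db",
    "COLLECTION_NAME": "example_collection",
}
settings = load_settings(example_env)
```

Failing fast on a missing variable gives a clearer error than a connection timeout later on.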

### 3. Run Docker Compose to build the images and run the services:

```bash
docker-compose up --build
```

### 4. View the Application

- Open your browser and navigate to http://localhost:5003/

## Configuring MongoDB and Monitoring Logs

### Changing MongoDB Documents and Collections as needed

- config.py contains the MongoDB connection string.
- document_model.py contains the MongoDB collection name.

### Check Application Logs

- All logs are written to `app.log` in the `logs` folder at the root of the project.
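
For illustration, a file logger that writes to `logs/app.log` could be configured as below. This is a sketch under assumptions, not the project's logging setup: the logger name, the format string, and the `configure_logging` helper are hypothetical, and the demo writes into a temporary directory rather than the project root.

```python
import logging
import os
import tempfile


def configure_logging(log_dir: str, filename: str = "app.log") -> logging.Logger:
    """Create log_dir if needed and attach a file handler for filename."""
    os.makedirs(log_dir, exist_ok=True)
    logger = logging.getLogger("mongodb_connection")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(os.path.join(log_dir, filename), encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


# Demo: write one entry to <temporary directory>/logs/app.log.
demo_dir = os.path.join(tempfile.mkdtemp(), "logs")
logger = configure_logging(demo_dir)
logger.info("Connected to MongoDB")
```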

## API Endpoints

### 1. Get All Documents

- **Endpoint**: `/documents`
- **Method**: `GET`
- **Description**: Retrieves all documents from the database.
- **Response**:
  - `200 OK`: Returns a JSON array of documents.

### 2. Get Document by ID

- **Endpoint**: `/documents/<document_id>`
- **Method**: `GET`
- **Description**: Retrieves a document by its ID.
- **Parameters**:
  - `document_id` (path): The ID of the document to retrieve.
- **Response**:
  - `200 OK`: Returns the document as a JSON object.
  - `404 Not Found`: If the document is not found.

### 3. Insert Document

- **Endpoint**: `/documents`
- **Method**: `POST`
- **Description**: Inserts a new document into the database.
- **Request Body**: JSON object representing the document to insert.
- **Response**:
  - `201 Created`: Returns a success message and the ID of the inserted document.

### 4. Update Document

- **Endpoint**: `/documents/<document_id>`
- **Method**: `PUT`
- **Description**: Updates an existing document by its ID.
- **Parameters**:
  - `document_id` (path): The ID of the document to update.
- **Request Body**: JSON object representing the updated document data.
- **Response**:
  - `200 OK`: Returns a success message if the document was updated.
  - `404 Not Found`: If the document is not found or no changes were made.
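
The `404` rule for updates covers two distinct cases. The sketch below spells out that decision, assuming the server inspects the `matched_count` and `modified_count` values that PyMongo reports on an `UpdateResult`. Whether the project does exactly this is an assumption, and `update_status` is a hypothetical helper.

```python
def update_status(matched_count: int, modified_count: int) -> int:
    """Map an update outcome to the HTTP status described above."""
    if matched_count == 0:
        return 404  # no document has this ID
    if modified_count == 0:
        return 404  # document exists, but the new data changed nothing
    return 200  # document found and modified
```

A client that receives `404` from this endpoint therefore cannot tell a missing document from a no-op update without a follow-up `GET`.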

### 5. Delete Document

- **Endpoint**: `/documents/<document_id>`
- **Method**: `DELETE`
- **Description**: Deletes a document by its ID.
- **Parameters**:
  - `document_id` (path): The ID of the document to delete.
- **Response**:
  - `200 OK`: Returns a success message if the document was deleted.
  - `404 Not Found`: If the document is not found.
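
The five endpoints above can be exercised from Python with the standard library. The sketch below only builds the requests, so it runs without the service. The base URL comes from the setup section, while `build_request`, the placeholder ID, and the JSON bodies are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5003"


def build_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (but do not send) a request for one of the endpoints."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


doc_id = "example-id"  # placeholder; use an ID returned by the API
requests_to_send = [
    build_request("GET", "/documents"),
    build_request("GET", f"/documents/{doc_id}"),
    build_request("POST", "/documents", {"name": "example"}),
    build_request("PUT", f"/documents/{doc_id}", {"name": "updated"}),
    build_request("DELETE", f"/documents/{doc_id}"),
]

# To send one while the service is running:
# with urllib.request.urlopen(requests_to_send[0]) as response:
#     print(response.status, json.loads(response.read()))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for a `404 Not Found` response, so callers need to catch that exception to handle missing documents.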
