From b82a0018f08e4a6e59be3c16420029f020ec86e1 Mon Sep 17 00:00:00 2001 From: BeniDage <112965704+BeniDage@users.noreply.github.com> Date: Tue, 24 Sep 2024 09:29:29 +1000 Subject: [PATCH] MongoDB Connection Documentation (#153) Co-authored-by: Kaleb <82347290+SassafrasAU@users.noreply.github.com> --- .../Data Anonymization/dataanonymization.md | 80 +++++++------ .../MongoDb Connection/_category_.json | 9 ++ .../MongoDb Connection/mongodbconnection.md | 108 ++++++++++++++++++ 3 files changed, 159 insertions(+), 38 deletions(-) create mode 100644 docs/data-warehousing/MongoDb Connection/_category_.json create mode 100644 docs/data-warehousing/MongoDb Connection/mongodbconnection.md diff --git a/docs/data-warehousing/Data Anonymization/dataanonymization.md b/docs/data-warehousing/Data Anonymization/dataanonymization.md index deb19bf0f..00f3d13ae 100644 --- a/docs/data-warehousing/Data Anonymization/dataanonymization.md +++ b/docs/data-warehousing/Data Anonymization/dataanonymization.md @@ -3,6 +3,7 @@ sidebar_position: 3 --- # Anonymization and Masking in Healthcare Data + Implementation and Rationale :::info @@ -10,62 +11,65 @@ Implementation and Rationale **Document Reference:** Data Anonymization. **Expiry Date:** 29 April 2025. **Version:** 1.0. ::: - ## Introduction: -In the realm of healthcare data management, preserving patient privacy and confidentiality is -of utmost importance. Anonymization and masking techniques serve as essential tools in -safeguarding sensitive information while allowing for meaningful analysis and research. This -document elucidates the implementation of anonymization and masking in a heart attack +In the realm of healthcare data management, preserving patient privacy and confidentiality is +of utmost importance. Anonymization and masking techniques serve as essential tools in +safeguarding sensitive information while allowing for meaningful analysis and research. 
This +document elucidates the implementation of anonymization and masking in a heart attack prediction dataset and provides insights into the rationale behind their application. ## Code Implementation: -The provided code utilizes Python libraries such as Pandas, Faker, and hashlib to anonymize + +The provided code utilizes Python libraries such as Pandas, Faker, and hashlib to anonymize and mask sensitive columns within the heart attack prediction dataset. Let's delve into the implementation details ### Reading the Dataset: -The original dataset is read into a Pandas DataFrame, facilitating data +The original dataset is read into a Pandas DataFrame, facilitating data manipulation and transformation. -### Initializing Faker: +### Initializing Faker: An instance of the Faker library is initialized to generate fake data for non-sensitive columns. ### Anonymization and Masking: - - Patient ID: Hashing using SHA-256 ensures irreversible transformation, preserving -anonymity while retaining uniqueness. - - Age: Age values are generalized into ranges to conceal precise age information, enhancing -privacy. - - Binary Attributes: Columns representing binary attributes such as sex, diabetes, smoking, -etc., are masked as 'Yes' or 'No' to obscure specific health conditions or behaviors. - - Heart Attack Risk: Masked as 'High' or 'Low' to conceal exact risk prediction outcomes. - - Numeric Attributes: Numeric values such as cholesterol, blood pressure, etc., are replaced -with random values within a specified range, preventing re-identification while preserving -statistical properties. - -### Saving the Anonymized Dataset: +- Patient ID: Hashing using SHA-256 ensures irreversible transformation, preserving + anonymity while retaining uniqueness. +- Age: Age values are generalized into ranges to conceal precise age information, enhancing + privacy. 
+- Binary Attributes: Columns representing binary attributes such as sex, diabetes, smoking, + etc., are masked as 'Yes' or 'No' to obscure specific health conditions or behaviors. +- Heart Attack Risk: Masked as 'High' or 'Low' to conceal exact risk prediction outcomes. +- Numeric Attributes: Numeric values such as cholesterol, blood pressure, etc., are replaced + with random values within a specified range, preventing re-identification while preserving + statistical properties. + +### Saving the Anonymized Dataset: + The anonymized dataset is saved to a CSV file for further analysis and research purposes. ### Rationale for Anonymization and Masking: - - Privacy Preservation: Anonymizing sensitive attributes such as patient IDs and masking -identifiable information mitigate the risk of unauthorized access and identity disclosure, thus -preserving patient privacy. - - Regulatory Compliance: Adherence to regulations such as HIPAA and GDPR mandates the -protection of patient data through anonymization and masking, ensuring compliance and -avoiding legal ramifications. - - Facilitating Research: Anonymized datasets enable researchers and analysts to conduct -studies and derive insights without compromising patient privacy, fostering collaboration and -innovation in healthcare research. - - Building Trust: Demonstrating a commitment to protecting patient privacy through -anonymization and masking fosters trust among patients, healthcare providers, and regulatory -bodies, bolstering the integrity of healthcare data management practices. + +- Privacy Preservation: Anonymizing sensitive attributes such as patient IDs and masking + identifiable information mitigate the risk of unauthorized access and identity disclosure, thus + preserving patient privacy. +- Regulatory Compliance: Adherence to regulations such as HIPAA and GDPR mandates the + protection of patient data through anonymization and masking, ensuring compliance and + avoiding legal ramifications. 
+- Facilitating Research: Anonymized datasets enable researchers and analysts to conduct + studies and derive insights without compromising patient privacy, fostering collaboration and + innovation in healthcare research. +- Building Trust: Demonstrating a commitment to protecting patient privacy through + anonymization and masking fosters trust among patients, healthcare providers, and regulatory + bodies, bolstering the integrity of healthcare data management practices. ## Conclusion: -The implementation of anonymization and masking techniques in healthcare data -management is indispensable for preserving patient privacy, complying with regulations, -facilitating research, and building trust within the healthcare ecosystem. By anonymizing -sensitive attributes and masking identifiable information, organizations uphold ethical -standards while harnessing the power of data-driven insights to improve patient outcomes and -healthcare delivery \ No newline at end of file + +The implementation of anonymization and masking techniques in healthcare data +management is indispensable for preserving patient privacy, complying with regulations, +facilitating research, and building trust within the healthcare ecosystem. 
By anonymizing +sensitive attributes and masking identifiable information, organizations uphold ethical +standards while harnessing the power of data-driven insights to improve patient outcomes and +healthcare delivery diff --git a/docs/data-warehousing/MongoDb Connection/_category_.json b/docs/data-warehousing/MongoDb Connection/_category_.json new file mode 100644 index 000000000..6677d54a7 --- /dev/null +++ b/docs/data-warehousing/MongoDb Connection/_category_.json @@ -0,0 +1,9 @@ +{ + "label": "MongoDB Connection", + "position": 3, + "link": { + "type": "generated-index", + "description": "Documentation for MongoDB Connection " + } + } + \ No newline at end of file diff --git a/docs/data-warehousing/MongoDb Connection/mongodbconnection.md b/docs/data-warehousing/MongoDb Connection/mongodbconnection.md new file mode 100644 index 000000000..85a78e67a --- /dev/null +++ b/docs/data-warehousing/MongoDb Connection/mongodbconnection.md @@ -0,0 +1,108 @@ +--- +sidebar_position: 1 +--- + +# MongoDB Connection Server + +:::info +**Effective Date:** 15 August 2024. **Last Edited:** 20 September 2024. **Author:** Ben Dang (Redback Operations). +**Document Reference:** MongoDB Connection. **Expiry Date:** 15 August 2025. **Version:** 1.0. +::: + +This project is a web server application that connects to a MongoDB database. The setup uses Docker Compose to manage the services. + +## Prerequisites + +- Docker +- Docker Compose + +## Setup + +### 1. Clone the Repository + +```sh +git clone https://github.com/Redback-Operations/redback-data-warehouse.git + +cd "redback-data-warehouse/MongoDB Connection/Project1" + +``` + +### 2. Create a .env file in the project root directory + +- MONGO_URI="mongodb://your_username:your_password@your_host:your_port/?authSource=your_authSource" +- DB_NAME="your_database_name" +- COLLECTION_NAME="your_collection_name" + +### 3. Run Docker Compose to build the images and run the services: + +```bash +docker-compose up --build +``` + +### 4.
View the Application + +- Open your browser and navigate to http://localhost:5003/ + +## Configuring MongoDB and Monitoring Logs + +### Changing MongoDB Documents and Collections as needed + +- config.py contains the MongoDB connection string. +- document_model.py contains the MongoDB collection name. + +### Checking Application Logs + +- All logs are written to app.log in the logs folder at the root of the project. + +## API Endpoints + +### 1. Get All Documents + +- **Endpoint**: `/documents` +- **Method**: `GET` +- **Description**: Retrieves all documents from the database. +- **Response**: + - `200 OK`: Returns a JSON array of documents. + +### 2. Get Document by ID + +- **Endpoint**: `/documents/` +- **Method**: `GET` +- **Description**: Retrieves a document by its ID. +- **Parameters**: + - `document_id` (path): The ID of the document to retrieve. +- **Response**: + - `200 OK`: Returns the document as a JSON object. + - `404 Not Found`: If the document is not found. + +### 3. Insert Document + +- **Endpoint**: `/documents` +- **Method**: `POST` +- **Description**: Inserts a new document into the database. +- **Request Body**: JSON object representing the document to insert. +- **Response**: + - `201 Created`: Returns a success message and the ID of the inserted document. + +### 4. Update Document + +- **Endpoint**: `/documents/` +- **Method**: `PUT` +- **Description**: Updates an existing document by its ID. +- **Parameters**: + - `document_id` (path): The ID of the document to update. +- **Request Body**: JSON object representing the updated document data. +- **Response**: + - `200 OK`: Returns a success message if the document was updated. + - `404 Not Found`: If the document is not found or no changes were made. + +### 5. Delete Document + +- **Endpoint**: `/documents/` +- **Method**: `DELETE` +- **Description**: Deletes a document by its ID. +- **Parameters**: + - `document_id` (path): The ID of the document to delete.
+- **Response**: + - `200 OK`: Returns a success message if the document was deleted. + - `404 Not Found`: If the document is not found.
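The endpoints listed above could be exercised from a small client along the following lines. This is only a sketch and not part of the project: `build_request` and `send` are hypothetical helper names, the base URL reuses the `http://localhost:5003/` address from the setup steps, and it assumes the document ID is appended as the final path segment after `/documents/`, which the endpoint list implies but does not spell out.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import quote

# Address taken from the "View the Application" step; adjust if the port differs.
BASE_URL = "http://localhost:5003"


def build_request(method, document_id=None, body=None):
    """Build (without sending) a request for one of the /documents endpoints."""
    url = f"{BASE_URL}/documents"
    if document_id is not None:
        # Assumption: the ID is passed as the last path segment.
        url += "/" + quote(str(document_id), safe="")
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        # POST and PUT carry the document as a JSON request body.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a request and return (status_code, parsed JSON body or None)."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, so a 404 Not Found arrives here.
        return err.code, None
```

With the services running, `send(build_request("GET"))` would list all documents, and `send(build_request("POST", body={"name": "example"}))` would insert one; the field name `name` is purely illustrative, since the document schema is not described here.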