Core Infrastructure #86

Merged
merged 6 commits into from
Dec 6, 2024

Jesse-Rees
Contributor

Added a folder that contains the majority of the DW infrastructure so it can all be run at once with a single docker-compose up. It now also includes a data provenance pipeline that tracks historical information such as origin, transformations, and access for all data entering and leaving the main infrastructure.

github-actions bot commented Dec 4, 2024

🔒 Security Scan Results

Bandit Scan Results:
-------------------
Run started:2024-12-04 06:36:54.667113

Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:9:0
8	import datetime
9	import subprocess
10	import pandas as pd

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:115:19
114	    try:
115	        response = requests.post("http://dp-logstash:5044", json=log_data)
116	        if response.status_code != 200:

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:169:17
168	    try:
169	        result = subprocess.run(
170	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
171	            check=True, 
172	            stdout=subprocess.PIPE, 
173	            stderr=subprocess.PIPE, 
174	            text=True
175	        )
176	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:169:17
168	    try:
169	        result = subprocess.run(
170	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
171	            check=True, 
172	            stdout=subprocess.PIPE, 
173	            stderr=subprocess.PIPE, 
174	            text=True
175	        )
176	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:186:19
185	        api_url = f"http://{api_url_base}/list-files?bucket={bucket}"
186	        response = requests.get(api_url)
187	        if response.status_code == 200:

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:200:19
199	        params = {"bucket": bucket, "project": project, "filename": filename}  # Avoid re-adding the project folder
200	        response = requests.get(api_url, params=params)
201	        st.write(f"API URL: {api_url}, Params: {params}, Status Code: {response.status_code}")  # added logs

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:21:20
20	def get_dremio_token():
21	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': dremio_username, 'password': dremio_password})
22	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:32:15
31	    }
32	    response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': sql})
33	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:46:19
45	    while True:
46	        response = requests.get(f'{dremio_url}/api/v3/job/{job_id}', headers=headers)
47	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:55:15
54	    # Fetch the query results
55	    response = requests.get(f'{dremio_url}/api/v3/job/{job_id}/results', headers=headers)
56	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:66:15
65	    }
66	    response = requests.get(f'{dremio_url}/api/v3/catalog', headers=headers)
67	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./File Upload Service/app/streamlitdw_fe.py:9:0
8	import datetime
9	import subprocess
10	import pandas as pd

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:58:17
57	    try:
58	        result = subprocess.run(
59	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
60	            check=True, 
61	            stdout=subprocess.PIPE, 
62	            stderr=subprocess.PIPE, 
63	            text=True
64	        )
65	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:58:17
57	    try:
58	        result = subprocess.run(
59	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
60	            check=True, 
61	            stdout=subprocess.PIPE, 
62	            stderr=subprocess.PIPE, 
63	            text=True
64	        )
65	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:75:19
74	        api_url = f"http://10.137.0.149:5000/list-files?bucket={bucket}"  # Updated
75	        response = requests.get(api_url)
76	        if response.status_code == 200:

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:91:19
90	        params = {"bucket": bucket, "project": project, "filename": filename}  # Avoid re-adding the project folder
91	        response = requests.get(api_url, params=params)
92	        st.write(f"API URL: {api_url}, Params: {params}, Status Code: {response.status_code}")  # added logs

--------------------------------------------------
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:8:0
7	import datetime
8	import subprocess  # For triggering ETL pipeline
9	import pandas as pd  # Added for dataframe functionality

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:58:8
57	        # Run the ETL script as a subprocess
58	        subprocess.run(["python", "etl_pipeline.py", preprocessing_option], check=True)
59	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:58:8
57	        # Run the ETL script as a subprocess
58	        subprocess.run(["python", "etl_pipeline.py", preprocessing_option], check=True)
59	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./File Upload Service/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./MongoDB Connection/Project1/main.py:12:35
11	    debug_mode = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
12	    app.run(debug=debug_mode, host='0.0.0.0')

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:21:20
20	def get_dremio_token():
21	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': dremio_username, 'password': dremio_password})
22	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:32:15
31	    }
32	    response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': sql})
33	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:46:19
45	    while True:
46	        response = requests.get(f'{dremio_url}/api/v3/job/{job_id}', headers=headers)
47	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:55:15
54	    # Fetch the query results
55	    response = requests.get(f'{dremio_url}/api/v3/job/{job_id}/results', headers=headers)
56	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:66:15
65	    }
66	    response = requests.get(f'{dremio_url}/api/v3/catalog', headers=headers)
67	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:63:20
62	try:
63	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': username, 'password': password})
64	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:115:23
114	    try:
115	        sql_response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': command})
116	        sql_response.raise_for_status()

--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   CWE: CWE-89 (https://cwe.mitre.org/data/definitions/89.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b608_hardcoded_sql_expressions.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:168:12
167	    placeholders = ', '.join(['?' for _ in data[0]])
168	    query = f"INSERT INTO {table_name} VALUES ({placeholders})"
169	    cursor = conn.cursor()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:182:19
181	    try:
182	        response = requests.get(url)
183	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B108:hardcoded_tmp_directory] Probable insecure usage of temp file/directory.
   Severity: Medium   Confidence: Medium
   CWE: CWE-377 (https://cwe.mitre.org/data/definitions/377.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b108_hardcoded_tmp_directory.html
   Location: ./pre-processing/pre-processing.py:177:29
176	
177	            temp_file_path = f'/tmp/{obj.object_name}'
178	

--------------------------------------------------

Code scanned:
	Total lines of code: 1930
	Total lines skipped (#nosec): 0
	Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

Run metrics:
	Total issues (by severity):
		Undefined: 0
		Low: 9
		Medium: 25
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 19
		Medium: 6
		High: 9
Files skipped (0):
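
Of the 25 Medium-severity findings above, 18 are B113, raised for `requests` calls made without a `timeout`. The most direct fix is to pass `timeout=` at each flagged call site. As a minimal sketch, assuming a project-wide default is acceptable, a small wrapper can also supply one whenever the caller omits it. The names `with_default_timeout` and `DEFAULT_TIMEOUT` and the timeout values are hypothetical and not part of this PR; `fake_get` only stands in for `requests.get` so the sketch runs without a network.

```python
import functools

# Assumed defaults: (connect timeout, read timeout) in seconds.
DEFAULT_TIMEOUT = (3.05, 30)


def with_default_timeout(request_func, timeout=DEFAULT_TIMEOUT):
    """Wrap a requests-style callable so it always receives a timeout."""

    @functools.wraps(request_func)
    def wrapper(*args, **kwargs):
        # Only fill in the timeout when the caller did not set one.
        kwargs.setdefault("timeout", timeout)
        return request_func(*args, **kwargs)

    return wrapper


def fake_get(url, **kwargs):
    """Stand-in for requests.get; returns the keyword arguments it received."""
    return kwargs


safe_get = with_default_timeout(fake_get)
default_kwargs = safe_get("http://example.invalid/list-files")
explicit_kwargs = safe_get("http://example.invalid/list-files", timeout=5)
```

With the real library this would be used as `safe_get = with_default_timeout(requests.get)`. A polling loop such as the `while True:` flagged in api.py may also want an overall deadline, since a per-request timeout bounds each call but not the loop itself.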

Dependency Check Results:
-----------------------

No critical security issues detected.

The code has passed all critical security checks.
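
The B607 and B108 findings in the scan above could be approached as sketched below, under stated assumptions: the ETL script is launched with the running interpreter's path (`sys.executable`) instead of a bare `python`, and downloaded objects are written into a private directory from `tempfile.mkdtemp()` with the object name reduced to its base name, rather than being joined onto `/tmp`. The helper names `build_etl_command` and `safe_temp_path` are hypothetical and do not appear in the PR.

```python
import os
import sys
import tempfile


def build_etl_command(file_name, preprocessing_option, script="etl_pipeline.py"):
    """Build the argument list for subprocess.run without relying on PATH."""
    # sys.executable is normally the absolute path of the interpreter running
    # this code, so the child process does not need a PATH lookup for "python".
    return [sys.executable, script, file_name, preprocessing_option]


def safe_temp_path(object_name, temp_dir=None):
    """Return a path for object_name inside a private temporary directory."""
    # mkdtemp creates a directory accessible only by the creating user.
    if temp_dir is None:
        temp_dir = tempfile.mkdtemp(prefix="preprocessing-")
    # Keep only the final path component so names such as "../../etc/passwd"
    # or "project/file.csv" cannot point outside temp_dir.
    base_name = os.path.basename(object_name.replace("\\", "/"))
    if base_name in ("", ".", ".."):
        raise ValueError(f"unusable object name: {object_name!r}")
    return os.path.join(temp_dir, base_name)


command = build_etl_command("data.csv", "clean")
temp_path = safe_temp_path("project/../../etc/passwd")
```

The list-form call without `shell=True`, which the PR already uses, avoids shell interpretation of the arguments; validating `file_name` and `preprocessing_option` before they reach the command remains the caller's responsibility.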

…allow video and image files for File Upload Service

github-actions bot commented Dec 5, 2024

🔒 Security Scan Results

Bandit Scan Results:
-------------------
Run started:2024-12-05 10:12:45.709496

Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:9:0
8	import datetime
9	import subprocess
10	import pandas as pd

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:115:19
114	    try:
115	        response = requests.post("http://dp-logstash:5044", json=log_data)
116	        if response.status_code != 200:

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:169:17
168	    try:
169	        result = subprocess.run(
170	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
171	            check=True, 
172	            stdout=subprocess.PIPE, 
173	            stderr=subprocess.PIPE, 
174	            text=True
175	        )
176	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:169:17
168	    try:
169	        result = subprocess.run(
170	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
171	            check=True, 
172	            stdout=subprocess.PIPE, 
173	            stderr=subprocess.PIPE, 
174	            text=True
175	        )
176	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:186:19
185	        api_url = f"http://{api_url_base}/list-files?bucket={bucket}"
186	        response = requests.get(api_url)
187	        if response.status_code == 200:

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/app/streamlitdw_fe.py:200:19
199	        params = {"bucket": bucket, "project": project, "filename": filename}  # Avoid re-adding the project folder
200	        response = requests.get(api_url, params=params)
201	        st.write(f"API URL: {api_url}, Params: {params}, Status Code: {response.status_code}")  # added logs

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:21:20
20	def get_dremio_token():
21	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': dremio_username, 'password': dremio_password})
22	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:32:15
31	    }
32	    response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': sql})
33	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:46:19
45	    while True:
46	        response = requests.get(f'{dremio_url}/api/v3/job/{job_id}', headers=headers)
47	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:55:15
54	    # Fetch the query results
55	    response = requests.get(f'{dremio_url}/api/v3/job/{job_id}/results', headers=headers)
56	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:66:15
65	    }
66	    response = requests.get(f'{dremio_url}/api/v3/catalog', headers=headers)
67	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/dremio-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Core DW Infrastructure/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./File Upload Service/app/streamlitdw_fe.py:9:0
8	import datetime
9	import subprocess
10	import pandas as pd

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:58:17
57	    try:
58	        result = subprocess.run(
59	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
60	            check=True, 
61	            stdout=subprocess.PIPE, 
62	            stderr=subprocess.PIPE, 
63	            text=True
64	        )
65	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:58:17
57	    try:
58	        result = subprocess.run(
59	            ["python", "etl_pipeline.py", file_name, preprocessing_option], 
60	            check=True, 
61	            stdout=subprocess.PIPE, 
62	            stderr=subprocess.PIPE, 
63	            text=True
64	        )
65	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:75:19
74	        api_url = f"http://10.137.0.149:5000/list-files?bucket={bucket}"  # Updated
75	        response = requests.get(api_url)
76	        if response.status_code == 200:

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./File Upload Service/app/streamlitdw_fe.py:91:19
90	        params = {"bucket": bucket, "project": project, "filename": filename}  # Avoid re-adding the project folder
91	        response = requests.get(api_url, params=params)
92	        st.write(f"API URL: {api_url}, Params: {params}, Status Code: {response.status_code}")  # added logs

--------------------------------------------------
>> Issue: [B404:blacklist] Consider possible security implications associated with the subprocess module.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/blacklists/blacklist_imports.html#b404-import-subprocess
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:8:0
7	import datetime
8	import subprocess  # For triggering ETL pipeline
9	import pandas as pd  # Added for dataframe functionality

--------------------------------------------------
>> Issue: [B607:start_process_with_partial_path] Starting a process with a partial executable path
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b607_start_process_with_partial_path.html
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:58:8
57	        # Run the ETL script as a subprocess
58	        subprocess.run(["python", "etl_pipeline.py", preprocessing_option], check=True)
59	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B603:subprocess_without_shell_equals_true] subprocess call - check for execution of untrusted input.
   Severity: Low   Confidence: High
   CWE: CWE-78 (https://cwe.mitre.org/data/definitions/78.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b603_subprocess_without_shell_equals_true.html
   Location: ./File Upload Service/app/streamlitdw_fe_mt.py:58:8
57	        # Run the ETL script as a subprocess
58	        subprocess.run(["python", "etl_pipeline.py", preprocessing_option], check=True)
59	        st.success("ETL pipeline executed successfully.")

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./File Upload Service/flask/flaskapi_dw.py:86:17
85	if __name__ == '__main__':
86	    app.run(host='0.0.0.0', port=5000)  # Running on port 5000 IMPORTANT

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./MongoDB Connection/Project1/main.py:12:35
11	    debug_mode = os.environ.get('FLASK_DEBUG', 'False').lower() == 'true'
12	    app.run(debug=debug_mode, host='0.0.0.0')

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:21:20
20	def get_dremio_token():
21	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': dremio_username, 'password': dremio_password})
22	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:32:15
31	    }
32	    response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': sql})
33	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:46:19
45	    while True:
46	        response = requests.get(f'{dremio_url}/api/v3/job/{job_id}', headers=headers)
47	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:55:15
54	    # Fetch the query results
55	    response = requests.get(f'{dremio_url}/api/v3/job/{job_id}/results', headers=headers)
56	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:66:15
65	    }
66	    response = requests.get(f'{dremio_url}/api/v3/catalog', headers=headers)
67	    response.raise_for_status()

--------------------------------------------------
>> Issue: [B104:hardcoded_bind_all_interfaces] Possible binding to all interfaces.
   Severity: Medium   Confidence: Medium
   CWE: CWE-605 (https://cwe.mitre.org/data/definitions/605.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b104_hardcoded_bind_all_interfaces.html
   Location: ./Structured Dremio Solution/Flask-api/api.py:100:17
99	    port = int(os.getenv('FLASK_RUN_PORT', 5000))
100	    app.run(host='0.0.0.0', port=port)

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:63:20
62	try:
63	    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': username, 'password': password})
64	    auth_response.raise_for_status()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:115:23
114	    try:
115	        sql_response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': command})
116	        sql_response.raise_for_status()

--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low
   CWE: CWE-89 (https://cwe.mitre.org/data/definitions/89.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b608_hardcoded_sql_expressions.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:168:12
167	    placeholders = ', '.join(['?' for _ in data[0]])
168	    query = f"INSERT INTO {table_name} VALUES ({placeholders})"
169	    cursor = conn.cursor()

--------------------------------------------------
>> Issue: [B113:request_without_timeout] Call to requests without timeout
   Severity: Medium   Confidence: Low
   CWE: CWE-400 (https://cwe.mitre.org/data/definitions/400.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b113_request_without_timeout.html
   Location: ./Structured Dremio Solution/Script/pipeline.py:182:19
181	    try:
182	        response = requests.get(url)
183	        response.raise_for_status()

--------------------------------------------------
>> Issue: [B108:hardcoded_tmp_directory] Probable insecure usage of temp file/directory.
   Severity: Medium   Confidence: Medium
   CWE: CWE-377 (https://cwe.mitre.org/data/definitions/377.html)
   More Info: https://bandit.readthedocs.io/en/1.8.0/plugins/b108_hardcoded_tmp_directory.html
   Location: ./pre-processing/pre-processing.py:177:29
176	
177	            temp_file_path = f'/tmp/{obj.object_name}'
178	

--------------------------------------------------

Code scanned:
	Total lines of code: 1939
	Total lines skipped (#nosec): 0
	Total potential issues skipped due to specifically being disabled (e.g., #nosec BXXX): 0

Run metrics:
	Total issues (by severity):
		Undefined: 0
		Low: 9
		Medium: 25
		High: 0
	Total issues (by confidence):
		Undefined: 0
		Low: 19
		Medium: 6
		High: 9
Files skipped (0):

Dependency Check Results:
-----------------------

No critical security issues detected.

The code has passed all critical security checks.

@AmirZandiehprojects
Collaborator

The code infrastructure looks well-organized and implements good practices including containerization, environment variable usage, and proper error handling. While there are some medium/low security notices from Bandit (mostly around request timeouts and interface binding), none are critical blockers for this infrastructure PR. The basic security measures are in place and the code structure is clean. LGTM 👍
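
For the interface-binding notices (B104), one possible follow-up is to read the bind address from the environment and default to loopback, so that `0.0.0.0` is an explicit choice made in the container config rather than a hardcoded value. This is only a sketch: `resolve_bind_address` is a hypothetical helper, and using `FLASK_RUN_HOST` alongside the `FLASK_RUN_PORT` variable already read in `api.py` is an assumption about how the services would be configured.

```python
import os


def resolve_bind_address(default_host="127.0.0.1", default_port=5000):
    """Return (host, port) for app.run(), taken from the environment.

    Hypothetical helper: defaults to loopback so that binding to all
    interfaces has to be requested explicitly (e.g. in docker-compose).
    """
    host = os.environ.get("FLASK_RUN_HOST", default_host)
    port = int(os.environ.get("FLASK_RUN_PORT", default_port))
    return host, port


# Possible usage inside a Flask entry point:
# host, port = resolve_bind_address()
# app.run(host=host, port=port)
```

Inside Docker the services would still need `FLASK_RUN_HOST=0.0.0.0` set in the compose file to be reachable from other containers, so this moves the decision into config rather than removing it.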

Consider adding request timeouts in a future update to improve robustness.
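
A minimal sketch of what that could look like, assuming the existing `requests.get` / `requests.post` calls stay as they are and are simply wrapped. `with_timeout` and `DEFAULT_TIMEOUT` are hypothetical names, and the timeout values are placeholders to be tuned per service.

```python
import functools

# Assumed defaults: (connect timeout, read timeout) in seconds.
DEFAULT_TIMEOUT = (3.05, 10)


def with_timeout(request_func, timeout=DEFAULT_TIMEOUT):
    """Wrap a requests-style callable so each call carries a timeout.

    A timeout passed explicitly by the caller is left untouched.
    """
    @functools.wraps(request_func)
    def wrapper(*args, **kwargs):
        kwargs.setdefault("timeout", timeout)
        return request_func(*args, **kwargs)

    return wrapper


# Possible usage (requires the requests package):
# import requests
# get = with_timeout(requests.get)
# response = get(api_url, params=params)
```

Passing `timeout=` directly at each call site would also address the B113 findings; the wrapper just keeps the default in one place. Callers would then need to handle `requests.exceptions.Timeout` where a hung request previously blocked indefinitely.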

@AmirZandiehprojects AmirZandiehprojects merged commit e2a8b36 into Redback-Operations:main Dec 6, 2024
1 check passed