Added folder for structured solution with script. #15

Merged
23 changes: 23 additions & 0 deletions Structured Dremio Solution/Flask-api/Dockerfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Use the official Python image from the Docker Hub
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /

# Copy the requirements file into the container
COPY requirements.txt .

# Install the dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application code into the container
COPY . .

# Copy the api.env file into the container (already covered by `COPY . .` above; kept explicit)
COPY api.env .

# Expose the port the app runs on
EXPOSE 5000

# Command to run the api
CMD ["python", "api.py"]
3 changes: 3 additions & 0 deletions Structured Dremio Solution/Flask-api/README.md
@@ -0,0 +1,3 @@
Structured Dremio Solution - Flask API

This folder contains the application and Docker files for the structured solution API. The API lets users connected to Deakin's network through the AnyConnect VPN run SQL queries to fetch their data from Dremio.
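
As an illustration of how a client on the VPN might call this API, here is a minimal sketch using only the Python standard library. The `/dremio_query` route and the `sql` JSON field come from `api.py`; the base URL and the helper names `build_query_request` and `run_query` are assumptions made for this example, not part of the PR.

```python
import json
import urllib.request

# Assumption: the API is reachable at this address; the host is hypothetical.
BASE_URL = "http://localhost:5000"


def build_query_request(sql, base_url=BASE_URL):
    """Build a POST request for the /dremio_query route with a JSON body."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/dremio_query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, base_url=BASE_URL, timeout=60):
    """Send the query and return the decoded JSON response."""
    req = build_query_request(sql, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A call such as `run_query('SELECT * FROM some_table LIMIT 10')` (table name hypothetical) would return the job results as a dictionary. Note that `urlopen` raises `urllib.error.HTTPError` for the 400 and 500 responses the API returns on rejected or failed queries.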
100 changes: 100 additions & 0 deletions Structured Dremio Solution/Flask-api/api.py
@@ -0,0 +1,100 @@
from flask import Flask, jsonify, request
import pandas as pd
import io
from dotenv import load_dotenv
import os
import requests
import re
import time

# Load environment variables from the api.env file
load_dotenv('api.env')

app = Flask(__name__)

# Dremio configuration
dremio_url = os.getenv('DREMIO_URL')
dremio_username = os.getenv('DREMIO_USERNAME')
dremio_password = os.getenv('DREMIO_PASSWORD')

# Authenticate and get token
def get_dremio_token():
    auth_response = requests.post(f'{dremio_url}/apiv2/login', json={'userName': dremio_username, 'password': dremio_password})
    auth_response.raise_for_status()
    return auth_response.json().get('token')

# Function to execute SQL query on Dremio
def execute_dremio_query(sql):
    token = get_dremio_token()
    headers = {
        'Authorization': f'_dremio{token}',
        'Content-Type': 'application/json'
    }
    response = requests.post(f'{dremio_url}/api/v3/sql', headers=headers, json={'sql': sql})
    response.raise_for_status()
    job_id = response.json().get('id')
    return job_id

# Function to get query results from Dremio
def get_dremio_query_results(job_id):
    token = get_dremio_token()
    headers = {
        'Authorization': f'_dremio{token}',
        'Content-Type': 'application/json'
    }
    # Poll the job status endpoint until the job is complete
    while True:
        response = requests.get(f'{dremio_url}/api/v3/job/{job_id}', headers=headers)
        response.raise_for_status()
        job_status = response.json().get('jobState')
        if job_status == 'COMPLETED':
            break
        elif job_status in ('FAILED', 'CANCELED'):
            raise Exception(f'Query failed with status: {job_status}')
        # Pause between polls so the loop does not flood Dremio with requests
        time.sleep(1)

    # Fetch the query results
    response = requests.get(f'{dremio_url}/api/v3/job/{job_id}/results', headers=headers)
    response.raise_for_status()
    return response.json()

# Function to list catalog items from Dremio
def list_dremio_catalog():
    token = get_dremio_token()
    headers = {
        'Authorization': f'_dremio{token}',
        'Content-Type': 'application/json'
    }
    response = requests.get(f'{dremio_url}/api/v3/catalog', headers=headers)
    response.raise_for_status()
    return response.json()

@app.route('/dremio_query', methods=['POST'])
def dremio_query():
    # Tolerate a missing or non-JSON body instead of failing on None
    payload = request.get_json(silent=True) or {}
    sql = payload.get('sql')
    if not sql:
        return jsonify({'error': 'SQL query is required'}), 400

    # Validate that the query is a SELECT query and does not contain harmful commands
    harmful_commands = r'\b(DROP|DELETE|INSERT|UPDATE|ALTER|CREATE|TRUNCATE|REPLACE|MERGE|EXEC|EXECUTE|GRANT|REVOKE|SET|USE|CALL|LOCK|UNLOCK|RENAME|COMMENT|COMMIT|ROLLBACK|SAVEPOINT|RELEASE)\b'
    if not re.match(r'^\s*SELECT\b', sql.strip(), re.IGNORECASE) or re.search(harmful_commands, sql, re.IGNORECASE):
        return jsonify({'error': 'Only SELECT queries are allowed and no harmful commands are permitted'}), 400

    try:
        job_id = execute_dremio_query(sql)
        result = get_dremio_query_results(job_id)
        return jsonify(result)
    except requests.exceptions.RequestException as e:
        return jsonify({'error': str(e)}), 500
    except Exception as e:
        return jsonify({'error': str(e)}), 500

@app.route('/dremio_catalog', methods=['GET'])
def dremio_catalog():
    try:
        catalog = list_dremio_catalog()
        return jsonify(catalog)
    except requests.exceptions.RequestException as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    port = int(os.getenv('FLASK_RUN_PORT', 5000))
    app.run(host='0.0.0.0', port=port)
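
The SELECT-only check in `dremio_query` can be tried in isolation to see what it accepts and rejects. The following is a minimal sketch that reuses the two regular expressions from the route; the helper name `is_allowed` and the sample queries are assumptions made for illustration. It suggests that a keyword blacklist also rejects some harmless queries, such as a column named `comment` or a call to `REPLACE()`.

```python
import re

# Same keyword list as the /dremio_query route, split across lines for readability.
HARMFUL = re.compile(
    r"\b(DROP|DELETE|INSERT|UPDATE|ALTER|CREATE|TRUNCATE|REPLACE|MERGE|EXEC|"
    r"EXECUTE|GRANT|REVOKE|SET|USE|CALL|LOCK|UNLOCK|RENAME|COMMENT|COMMIT|"
    r"ROLLBACK|SAVEPOINT|RELEASE)\b",
    re.IGNORECASE,
)
STARTS_WITH_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)


def is_allowed(sql):
    """Return True if sql starts with SELECT and contains no blocked keyword."""
    return bool(STARTS_WITH_SELECT.match(sql)) and not HARMFUL.search(sql)


print(is_allowed("SELECT id FROM sales"))                   # True
print(is_allowed("DROP TABLE sales"))                       # False
print(is_allowed("SELECT 1; DELETE FROM sales"))            # False
print(is_allowed("SELECT comment FROM tickets"))            # False (false positive)
print(is_allowed("SELECT REPLACE(name, 'a', 'b') FROM t"))  # False (false positive)
```

Because text filtering of this kind is easy to get wrong in both directions, running the API under a Dremio account that only has read privileges would be a sensible complement to the check.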
18 changes: 18 additions & 0 deletions Structured Dremio Solution/Flask-api/docker-compose-flask.yml
@@ -0,0 +1,18 @@
version: '3.8'

services:
  flaskapp:
    build:
      context: .
      dockerfile: Dockerfile
    ports:
      - "5000:5000"
    env_file:
      - api.env
    container_name: structured-solution-api
    networks:
      - iceberg_env

networks:
  iceberg_env:
    external: true
6 changes: 6 additions & 0 deletions Structured Dremio Solution/Flask-api/requirements.txt
@@ -0,0 +1,6 @@
Flask==1.1.4
Jinja2==2.11.3
MarkupSafe==1.1.1
requests
pandas
python-dotenv
1 change: 1 addition & 0 deletions Structured Dremio Solution/README.md
@@ -0,0 +1 @@
This folder contains code for the structured Dremio solution running on the VM.
3 changes: 3 additions & 0 deletions Structured Dremio Solution/Script/README.md
@@ -0,0 +1,3 @@
Structured Dremio Solution - Script

This script is a working version of a pipeline that pulls CSV files from GitHub, converts them into pandas DataFrames, and loads them into SQLite to generate the SQL commands that create a table from the data. These commands are then sent in chunks to the specified Dremio URL to create a structured SQL table.
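
The script itself is not part of this diff, so the following is only a rough sketch of the flow described above, under stated assumptions. To stay self-contained it uses the standard-library `csv` module in place of pandas, stores every column as `TEXT`, and uses `sqlite3`'s `iterdump()` as one possible way to get SQL commands out of SQLite. The function names, the sample data, and the chunk size are hypothetical, and the step that sends each chunk to Dremio is left out.

```python
import csv
import io
import sqlite3


def csv_to_sql_statements(csv_text, table):
    """Load CSV text into in-memory SQLite and dump it back as SQL statements."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    try:
        # Assumption: table and column names are trusted, so quoting is enough here.
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
        conn.commit()
        # iterdump() yields the SQL text that would recreate the database;
        # keep only the table definition and the row inserts.
        return [
            stmt for stmt in conn.iterdump()
            if stmt.startswith(("CREATE TABLE", "INSERT INTO"))
        ]
    finally:
        conn.close()


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


sample = "name,age\nAda,36\nGrace,45\nLinus,28\n"
statements = csv_to_sql_statements(sample, "people")
batches = list(chunked(statements, 2))
print(len(statements), len(batches))  # 4 2
```

SQLite's dialect is not identical to Dremio's, so statements produced this way may need adjusting (for example, column types or identifier quoting) before Dremio accepts them.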