This repository has been archived by the owner on Dec 9, 2024. It is now read-only.

Commit

Merge branch 'main' into feature/preprocessing-and-timestamps
jaydenstokes authored Jun 18, 2024
2 parents 58ac55d + 59e1ef1 commit 568177f
Showing 23 changed files with 459 additions and 75 deletions.
CODE_OF_CONDUCT.md (60 changes: 48 additions & 12 deletions)
@@ -1,6 +1,9 @@
 # Code of Conduct

-## v1 - Drafted by Izaac Saleh & Sukhwan Ko
+## Versions
+
+- v2 - Drafted by Sukhwan Ko
+- <s>v1 - Drafted by Izaac Saleh & Sukhwan Ko</s>

 ## Our Pledge

@@ -55,22 +58,29 @@ any moderations they deem appropriate.

 ---

-- Our online platform (website, forum, chatroom etc.)
-- Code repository (including comments, pull requests, issue reports etc.)
-- Project-related communication channels such as:
-  - Email
-  - Slack
-  - Microsoft Teams
-  - Discord
-  - Asana
-  - etc.
-- In-person events(Conferences, meetups etc.)
+**Our online platform**
+
+- **Website**: The Code of Conduct applies to all comments, posts, and messages left on the website, including those in forums, discussion boards, or comment sections.
+- **Forum**: The same rules apply to all discussions, threads, and topics within the forum, including those that are public or private.
+- **Chatroom**: The Code of Conduct covers all conversations, messages, and files shared in chatroom, whether they're public or invited-only.
+
+**Code repository**
+
+- **Comments**: The Code applies to all comments left on code snippets, commits, or pull requests, as well as any replies or discussions that might arise.
+- **Pull requests:** The same rules apply to all pull requests, including those for new features, bug fixes, or refactoring code.
+- **Issue reports**: The Code of conduct covers all issue reports, whether they're related to bugs, feature requests, or documentation updates.
+
+**Project-related communication channels**
+
+- **Email**: The Code applies to all emails sent to the project's email list, including these that are public or private.
+- **Microsoft Teams:** The Code of Conduct covers all discussions, meetings, and file shared within the project's Microsoft Teams space.
+- **Discord(main):** The same rules apply to all voice or text chats, channels, and servers used by the project on Discord.

 ## Enforcement

 ---

-Community leaders will take action to address violations of the Code of Conduct. The specific consequences will depend on the severity of the violation.
+Our Community Leaders are responsible for clarifying and enforcing the standards of acceptable behaviour and are expected to lead by example. If a breach of the Code of Conduct is found to have occurred, Community Leaders are charged with taking appropriate but fair corrective action. Community Leaders have the responsibility, and maintain the right, to remove, edit, or reject comments, commits, code, issues, and other contributions that are not in alignment with this Code of Conduct. They will communicate the reasons for any moderation they deem appropriate.

 ## Enforcement Guidelines

@@ -100,6 +110,32 @@ Community leaders will take action to address violations of the Code of Conduct.

 - **Consequence:** A permanent ban from any sort of public interaction within the community.

+5. Additional Guidelines
+
+- Initial Response: Upon receiving a report of potential Code of Conduct violation, our Community Leaders will acknowledge receipt and promptly investigate.
+- Investigation Process:
+  - Gather information from all parties involved.
+  - Review relevant records and communications.
+  - Investigate in a fair and impartial manner.
+- Communication During Investigation:
+  - Maintain open communication with all parties involved.
+  - Provide updates on the progress and outcome of the investigation.
+- Decision-Making:
+  - Community Leaders will make decisions based on the findings and evidence gathered during the investigation.
+- Consequences for Repeat Offenders:
+  - For repeat offenders, consequences may be escalated or combined (e.g., temporary ban and loss of privileges).
+- Appeals Process:
+  - Establish a clear appeals process for individuals affected by enforcement decisions.
+  - Ensure that appeals are heard in a fair and impartial manner.
+
+6. Resolving Disputes
+
+- Dispute Resolution Committee: Establish a committee composed of community leaders to resolve disputes related to Code of Conduct violations.
+- Dispute Resolution Process:
+  - Identify the nature of the dispute.
+  - Gather information from all parties involved.
+  - Make a fair and impartial decision.
+
 ## Attribution

 This Code of Conduct is adapted from the GITHUB documents template for "code of conduct" found when adding a file of the
app/app.py (5 changes: 5 additions & 0 deletions)
@@ -141,8 +141,10 @@ def send_to_ide():
     code = request.get_json().get("code_snippet")
     unescaped_code = html.unescape(code)
     if utils.send_code_snippet_to_ide(filename, unescaped_code):
+        utils.playsound_notification("success.mp3")
         return "success"
     else:
+        utils.playsound_notification("error.mp3")
         return "fail"


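The diff calls `utils.playsound_notification(...)`, but the helper itself is not part of this commit. A minimal sketch of what such a helper could look like, assuming the clips live in `app/static/audio` (where this commit adds `capture.mp3`, `error.mp3` and `success.mp3`) and that playback should not delay the Flask response. The name `play_notification`, the `player` parameter and the thread-based design are hypothetical; in practice `player` might be the third-party `playsound` package's `playsound()` function.

```python
import threading
from pathlib import Path
from typing import Callable, Optional

# Assumed clip location, relative to the repository root (hypothetical default).
AUDIO_DIR = Path("app") / "static" / "audio"


def play_notification(
    filename: str,
    player: Callable[[str], None],
    audio_dir: Path = AUDIO_DIR,
) -> Optional[threading.Thread]:
    """Start playing a notification clip without blocking the caller.

    `player` is any callable that plays a file path. Returns the started
    thread, or None if the clip is missing, so a missing sound never turns
    a successful request into an error.
    """
    path = audio_dir / filename
    if not path.is_file():
        return None
    # daemon=True: a clip that is still playing won't keep the process alive at exit
    thread = threading.Thread(target=player, args=(str(path),), daemon=True)
    thread.start()
    return thread
```

Returning `None` for a missing file is relevant here: `extract_text.py` asks for `capture_fail_tone.wav`, which does not appear among the audio files listed in this diff.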
@@ -257,6 +259,7 @@ def update_settings():
 def reset_settings():
     print("Current working directory:", os.getcwd())
     # Delete the existing config.ini file
+    os.chdir(str(utils.APP_DIR))
     if os.path.exists('config.ini'):
         os.remove('config.ini')
     shutil.copy('config.example.ini', 'config.ini')
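The added `os.chdir(str(utils.APP_DIR))` makes the relative `config.ini` paths resolve against the app directory, but `chdir` changes the working directory of the whole process, including any other request the Flask server is handling at the time. A sketch of the same reset using absolute paths instead, assuming `APP_DIR` is the directory that holds `config.example.ini`; the function name `reset_config` is hypothetical:

```python
import shutil
from pathlib import Path


def reset_config(app_dir: Path) -> Path:
    """Replace config.ini in app_dir with a fresh copy of config.example.ini.

    Both paths are built from app_dir, so the process-wide working
    directory is never changed.
    """
    config = app_dir / "config.ini"
    example = app_dir / "config.example.ini"
    # missing_ok=True replaces the separate exists() check (Python 3.8+)
    config.unlink(missing_ok=True)
    shutil.copy(example, config)
    return config
```

In the route this would become something like `reset_config(Path(utils.APP_DIR))`, with no `os.chdir` call.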
@@ -302,6 +305,8 @@ def update_tesseract_path():
 if __name__ == "__main__":
     host = "localhost"
     port = 5000
+    if os.name == 'posix':
+        port = 5001
     logging.basicConfig(filename="app.log", filemode="w", level=logging.DEBUG, format="%(levelname)s - %(message)s")
     print("[*] Starting OcrRoo Server")
     print(f"[*] OcrRoo Server running on http://{host}:{port}/")
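The commit switches to port 5001 whenever `os.name == 'posix'`, a check that is true on Linux as well as on macOS. The diff does not give a reason; one plausible motive is that recent macOS releases run an AirPlay Receiver on port 5000. A sketch that probes the preferred port and falls back only when it is actually taken; `port_is_free` and `pick_port` are hypothetical helpers, and a bind test is only a best-effort check:

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # EADDRINUSE (or a permission error): treat the port as unavailable
            return False
    return True


def pick_port(host: str = "localhost", preferred: int = 5000, fallback: int = 5001) -> int:
    """Use the preferred port when it is free, otherwise the fallback."""
    return preferred if port_is_free(host, preferred) else fallback
```

`port = pick_port(host)` would take the place of the `os.name` test. The probe and Flask's own bind are not atomic, so another process can still claim the port in between; the check only lowers the odds of a startup failure.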
app/extract_all_code.py (45 changes: 33 additions & 12 deletions)
@@ -6,6 +6,7 @@
 import openai
 import time
 import ast
+import os

 # Set OpenAI API key
 openai.api_key = 'YOUR_API_KEY_HERE'
@@ -15,24 +16,23 @@

 # Specify project headers
 project_headers = {
-    "Authorization": "Bearer " + openai.api_key,
+    "Authorization" : "Bearer " + openai.api_key,
     # "OpenAI-Project" : ocrroo_project_id
 }


 # ChatGPT
-# python multiprocessing program to extract only programming code from video using opencv and tesseract ocr with
-# limited memory saving frames into text file
+# python multiprocessing program to extract only programming code from video using opencv and tesseract ocr with limited memory saving frames into text file

 # Set up pytesseract path (if required)
 # For example, on Windows:
 pytesseract.pytesseract.tesseract_cmd = r'C:\Users\user\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'


-# Function to check if the text is likely to be programming code # using common keywords and symbols
+# Function to check if the text is likely to be programming code
 def is_code(text):
     code_pattern = re.compile(r"""
-        (\b(if|else|while|for|return|int|float|double|char|void|import|from|class|def|print|include|main)\b|
+        (\b(if|else|while|for|return|int|float|double|char|void|import|from|class|def|print|include|main)\b| # common keywords
         [\{\}\[\]\(\)<>;:=]| # common symbols
         \b\d+\b| # numbers
         [\w]+\.\w+| # object properties or functions
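`is_code` classifies each OCR line with a single verbose regular expression, and this commit moves the explanatory text into per-alternative `#` comments, which `re.VERBOSE` permits. The diff shows only the first alternatives (keywords, symbols, numbers, dotted names); the rest of the project's pattern is cut off. A reduced sketch built from just those visible alternatives, so it is not the project's full pattern:

```python
import re

# Only the alternatives visible in the diff; the project's real pattern has more.
CODE_PATTERN = re.compile(
    r"""
    \b(if|else|while|for|return|int|float|double|char|void|
       import|from|class|def|print|include|main)\b   # common keywords
    | [{}\[\]()<>;:=]                               # common symbols
    | \b\d+\b                                       # numbers
    | \w+\.\w+                                      # object properties or functions
    """,
    re.VERBOSE,  # whitespace is ignored and '#' starts a comment, outside classes
)


def is_code(text: str) -> bool:
    """Return True if the line contains anything that looks like code."""
    return CODE_PATTERN.search(text) is not None
```

Because any single match is enough, ordinary narration such as "Thanks for watching" is classified as code (it contains the keyword `for`); a stricter filter would need a score or threshold rather than one hit.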
Expand Down Expand Up @@ -109,7 +109,6 @@ def extract_code_from_frame(frame):
code_lines = [line.strip() for line in text.split('\n') if is_code(line)]
return '\n'.join(code_lines)


# Function to save frames containing code as images
def save_frames(frames, output_dir, output_file):
unique_code = set()
@@ -128,7 +127,6 @@ def save_frames(frames, output_dir, output_file):
         for code in unique_code:
             f.write(code + '\n')

-
 def is_valid_python_code(code_line):
     """
     Validate if a line of Python code is syntactically correct.
@@ -139,7 +137,6 @@ def is_valid_python_code(code_line):
     except SyntaxError:
         return False

-
 def process_code_file(input_filename, output_filename):
     """
     Read a code file, trim off lines starting with '>>>', validate remaining lines,
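The diff shows only the tail of `is_valid_python_code` (`except SyntaxError: return False`) next to the `ast` import, so it presumably tries `ast.parse` on each line. A self-contained sketch of that check under this assumption, plus a hypothetical `keep_valid_lines` filter. `IndentationError` is a subclass of `SyntaxError`, so one `except` covers both; catching `ValueError` as well is an addition for source containing null bytes on older Python versions.

```python
import ast
from typing import Iterable, List


def is_valid_python_code(code_line: str) -> bool:
    """Return True if code_line parses as Python on its own."""
    try:
        ast.parse(code_line)
    except (SyntaxError, ValueError):
        # SyntaxError also covers IndentationError; Python versions before
        # 3.12 raise ValueError for source containing null bytes
        return False
    return True


def keep_valid_lines(lines: Iterable[str]) -> List[str]:
    """Strip each line and keep the non-empty ones that parse in isolation."""
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and is_valid_python_code(line)]
```

Parsing lines in isolation is cheap but rejects legitimate code that only parses as part of a block: a header such as `for i in range(3):` fails because `ast.parse` expects an indented body after it, so such lines from a frame are dropped.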
@@ -162,6 +159,28 @@ def process_code_file(input_filename, output_filename):
             outfile.write(valid_line + '\n')


+def remove_duplicate_lines(input_file, output_file):
+    try:
+        with open(input_file, 'r') as file:
+            lines = file.readlines()
+
+        # Remove duplicate lines while preserving the order
+        seen = set()
+        unique_lines = []
+        for line in lines:
+            if line not in seen:
+                unique_lines.append(line)
+                seen.add(line)
+
+        # Write the unique lines to the output file
+        with open(output_file, 'w') as file:
+            file.writelines(unique_lines)
+
+    except FileNotFoundError:
+        print(f"The file {input_file} does not exist.")
+    except Exception as e:
+        print(f"An error occurred: {e}")
+
 def process_text_file(input_file, output_file):
     with open(input_file, 'r') as file:
         text = file.read()
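`remove_duplicate_lines` keeps the first occurrence of each line by pairing a `seen` set with a result list. Since dictionaries preserve insertion order (guaranteed from Python 3.7), `dict.fromkeys` gives the same order-preserving deduplication in one expression. A sketch with two hypothetical helpers, `dedupe_lines` and `write_unique_lines`; unlike the committed loop, it compares lines without their trailing newline:

```python
from typing import List


def dedupe_lines(lines: List[str]) -> List[str]:
    """Drop repeated lines, keeping the first occurrence of each, in order.

    Lines are compared without their trailing newline, so a last line that
    lacks one still matches an earlier copy (the committed loop compares the
    raw lines and would keep both).
    """
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    unique = dict.fromkeys(line.rstrip("\n") for line in lines)
    return [line + "\n" for line in unique]


def write_unique_lines(input_file: str, output_file: str) -> None:
    """File-to-file wrapper; I/O errors propagate to the caller."""
    with open(input_file, "r") as src:
        lines = src.readlines()
    with open(output_file, "w") as dst:
        dst.writelines(dedupe_lines(lines))
```

Letting errors propagate is a deliberate difference. With the committed version, a missing `valid_code.txt` is only printed, and the script then calls `process_text_file(clean_code_file, ...)` anyway, which either fails on a missing `clean_code.txt` or silently reads one left over from an earlier run.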
@@ -178,7 +197,7 @@ def process_text_file(input_file, output_file):
             # {"role": "system",
             # "content": f"You are a coding assistant. You reply only in programming code "
             # "that is correct and formatted. Do NOT reply with any explanation, "
-            # "only code. If you are given something that is not programming code, "
+            # f"only code. If you are given something that is not programming code, "
             # "you must NOT include it in your response. If nothing is present, "
             # "simply return 'ERROR' and nothing else. Do NOT return leading or "
             # "trailing"
@@ -215,6 +234,7 @@ def process_text_file(input_file, output_file):
 output_dir = 'frames_with_code'
 raw_code_file = 'extracted_code.txt'
 valid_code_file = 'valid_code.txt'
+clean_code_file = 'clean_code.txt'
 gpt_output_file = "gpt_output.txt"

 frames_with_code = process_video(video_path)
@@ -223,7 +243,8 @@ def process_text_file(input_file, output_file):

 process_code_file(raw_code_file, valid_code_file)

-process_text_file(valid_code_file, gpt_output_file)
+remove_duplicate_lines(valid_code_file, clean_code_file)
+
+process_text_file(clean_code_file, gpt_output_file)

-print(f"Code-containing frames extraction complete. Check '{output_dir}' for the output images. "
-      f"Check '{gpt_output_file}' for the output file.")
+print(f"Code-containing frames extraction complete. Check '{output_dir}' for the output images. Check '{gpt_output_file}' for the output file.")
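After this merge the script chains three file-to-file stages: `process_code_file` writes `valid_code.txt`, the new `remove_duplicate_lines` reads that and writes `clean_code.txt`, and the cleaned file goes to `process_text_file`, whose output lands in `gpt_output.txt`. Inserting the new stage meant editing the path arguments of two calls. A sketch that expresses the same wiring as data, so adding a stage changes one list entry; `run_stages` is a hypothetical helper, not part of the project:

```python
from typing import Callable, Sequence, Tuple

# A stage reads one file and writes another, like process_code_file(in, out).
Stage = Callable[[str, str], None]


def run_stages(first_input: str, stages: Sequence[Tuple[Stage, str]]) -> str:
    """Run each (stage, output_path) in order, feeding every output to the next stage.

    Returns the path written by the last stage (or first_input if there are none).
    """
    current = first_input
    for stage, output_path in stages:
        stage(current, output_path)
        current = output_path
    return current


# Hypothetical wiring for extract_all_code.py after this merge:
# run_stages(raw_code_file, [
#     (process_code_file, valid_code_file),
#     (remove_duplicate_lines, clean_code_file),
#     (process_text_file, gpt_output_file),
# ])
```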
app/extract_text.py (3 changes: 3 additions & 0 deletions)
@@ -26,12 +26,14 @@ def extract_code_at_timestamp(filename: str, timestamp: float) -> str:
         :param timestamp: Time stamp of the frame to extract
         :return: Formatted code as a string
         """
+        utils.playsound_notification("capture.mp3")
         frame = ExtractText.extract_frame_at_timestamp(filename, timestamp)
         if frame is not None:
             extracted_text = pytesseract.image_to_string(frame)
             logging.info(f"Successfully extracted code from frame @ {timestamp}s in file {filename}")
             return ExtractText.format_raw_ocr_string(extracted_text)
         else:
+            utils.playsound_notification("capture_fail_tone.wav")
             logging.error(f"Unable to extract code from frame @ {timestamp}s in file {filename}")
             return "ERROR"

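`extract_code_at_timestamp` relies on `ExtractText.extract_frame_at_timestamp`, which is not shown in this diff, and treats a `None` frame as failure. One common way to grab a frame at a given time with OpenCV is to seek the capture with the `CAP_PROP_POS_MSEC` property and then call `read()`, which returns an `(ok, frame)` pair. A sketch under that assumption; `frame_at` is hypothetical, and the capture is passed in (a `cv2.VideoCapture` in practice) so the seeking logic runs without OpenCV or a video file:

```python
from typing import Any, Optional

# Same value as cv2.CAP_PROP_POS_MSEC; defined here so the sketch runs without OpenCV.
CAP_PROP_POS_MSEC = 0


def frame_at(capture: Any, timestamp: float) -> Optional[Any]:
    """Seek `capture` to `timestamp` seconds and return that frame, or None.

    `capture` is expected to behave like cv2.VideoCapture:
    set(prop, value) and read() -> (ok, frame).
    """
    if timestamp < 0:
        return None
    capture.set(CAP_PROP_POS_MSEC, timestamp * 1000.0)  # OpenCV seeks in milliseconds
    ok, frame = capture.read()
    return frame if ok else None
```

With OpenCV installed, usage would be `cap = cv2.VideoCapture(filename)`, then `frame_at(cap, timestamp)`, then `cap.release()`. Depending on the backend and codec, seeking by milliseconds may land on a nearby frame rather than the exact one.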
@@ -50,6 +52,7 @@ def format_raw_ocr_string(extracted_text: str) -> str:
         formatted_text = formatted_text.replace("```", "")
         if config("Formatting", "remove_language_name"):
             formatted_text = formatted_text.replace(language, "", 1)
+        utils.playsound_notification("success.mp3")
         return formatted_text

     @staticmethod
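`format_raw_ocr_string` deletes every triple-backtick marker with `replace`, then, when `remove_language_name` is set, deletes the first occurrence of `language` with `replace(language, "", 1)`. That second call is not anchored to the fence: if the text carries no language tag, or a different one, the first match inside the code itself is removed instead (for example `java` inside `javascript`). A sketch of an anchored alternative that removes the fence and its tag together; `strip_code_fence` is a hypothetical name, and the set of characters accepted in a tag is an assumption:

```python
import re

# Opening fence with an optional language tag, the body, then the closing fence.
# `{3} means three backticks.
_FENCED = re.compile(r"^\s*`{3}[\w+#.-]*[ \t]*\n(.*?)\n?`{3}\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence and its language tag, if present.

    Only the fence line is touched, so a language name that appears inside
    the code (for example in a string literal) is left alone. Text without
    a surrounding fence is returned unchanged.
    """
    match = _FENCED.match(text)
    return match.group(1) if match else text
```

Because the tag goes with the fence, keeping the tag when `remove_language_name` is off would need a second pattern; the sketch covers only the removal case.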
Binary file added app/static/audio/capture.mp3
Binary file not shown.
Binary file added app/static/audio/error.mp3
Binary file not shown.
Binary file added app/static/audio/success.mp3
Binary file not shown.