
⚡️ Speed up method AssemblyAILeMUR.run_lemur by 7% in src/backend/base/langflow/components/assemblyai/assemblyai_lemur.py #104

Open · wants to merge 1 commit into base: main
Conversation

@codeflash-ai codeflash-ai bot commented Dec 12, 2024

📄 AssemblyAILeMUR.run_lemur in src/backend/base/langflow/components/assemblyai/assemblyai_lemur.py

✨ Performance Summary:

  • Speed Increase: 📈 7% (1.07x speedup)
  • Runtime Reduction: ⏱️ From 17.9 milliseconds down to 16.8 milliseconds (best of 22 runs)

📝 Explanation and details

Here is the optimized version of your Python program.

Key Optimizations:

  1. Simplified and efficient loops in validate_data: used list operations and setdefault to streamline dictionary manipulations.
  2. Removed redundant checks in serialize_model: eliminated unnecessary hasattr checks within dictionary comprehensions.
  3. Efficient merging in the __add__ method: combined dictionaries with setdefault rather than rebuilding them.
  4. Optimized the to_lc_message method: streamlined conditional and loop constructs to minimize redundant checks and unroll repeated operations.
  5. Efficient iteration: replaced copy() operations with direct in-place updates to eliminate redundant dictionary work.
  6. Refactored repeated code: consolidated similar conditional branches in run_lemur and reused methods to reduce redundancy (a minimal sketch of the patterns in items 3, 5, and 6 follows this list).
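
For illustration, the minimal sketch below shows the setdefault-based merge (items 3 and 5) and a consolidated endpoint dispatch in the spirit of item 6. The helper names merge_dicts and perform_lemur_action, the endpoint strings, and the question call are assumptions made for this example, not the actual Langflow source; only lemur.task(...), lemur.summarize(), and .dict() are confirmed by the generated tests further down.

# Illustrative sketch only; approximates the patterns described above,
# not the actual Langflow implementation.
import assemblyai as aai


def merge_dicts(base: dict, other: dict) -> dict:
    """Merge `other` into `base` in place using setdefault, avoiding an
    intermediate copy() of either dictionary (items 3 and 5)."""
    for key, value in other.items():
        base.setdefault(key, value)
    return base


def perform_lemur_action(transcript_group: aai.TranscriptGroup, endpoint: str,
                         prompt: str = "", questions: list | None = None) -> dict:
    """Single dispatch point for the LeMUR endpoints, consolidating the
    near-identical branches described in item 6 (hypothetical helper)."""
    if endpoint == "task":
        result = transcript_group.lemur.task(prompt)
    elif endpoint == "summary":
        result = transcript_group.lemur.summarize()
    elif endpoint == "question-answer":
        # The `question` call is assumed here to mirror the other SDK methods.
        result = transcript_group.lemur.question(questions or [])
    else:
        msg = f"Unsupported LeMUR endpoint: {endpoint}"
        raise ValueError(msg)
    return result.dict()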

Correctness verification

The new optimized code was tested for correctness. The results are listed below:

| Test | Status | Details |
| --- | --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found | |
| 🌀 Generated Regression Tests | 20 Passed | See below |
| ⏪ Replay Tests | 🔘 None Found | |
| 🔎 Concolic Coverage Tests | 🔘 None Found | |
| 📊 Coverage | 82.2% | |

🌀 Generated Regression Tests Details

import copy
from typing import cast
from unittest.mock import MagicMock, patch

# function to test
import assemblyai as aai
# imports
import pytest  # used for our unit tests
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langflow.components.assemblyai.assemblyai_lemur import AssemblyAILeMUR
from langflow.custom import Component
from langflow.schema import Data
from langflow.utils.constants import MESSAGE_SENDER_AI, MESSAGE_SENDER_USER
from langflow.utils.image import create_data_url
from loguru import logger
from pydantic import BaseModel, model_serializer, model_validator


# unit tests
@pytest.fixture
def lemur_instance():
    instance = AssemblyAILeMUR()
    instance.api_key = "test_api_key"
    instance.transcription_result = None
    instance.transcript_ids = None
    instance.endpoint = "task"
    instance.prompt = "What is the weather today?"
    instance.questions = "What is your name?, How old are you?"
    return instance

def test_valid_transcription_result_and_prompt(lemur_instance):
    # Mock valid transcription result
    lemur_instance.transcription_result = Data(data={"id": "12345"})
    lemur_instance.transcript_ids = None

    # Mock TranscriptGroup and perform_lemur_action
    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
        mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
        lemur_instance.perform_lemur_action = MagicMock(return_value={"result": "success"})

        codeflash_output = lemur_instance.run_lemur()

def test_valid_transcript_ids_and_questions(lemur_instance):
    # Mock valid transcript IDs
    lemur_instance.transcription_result = None
    lemur_instance.transcript_ids = "12345,67890"
    lemur_instance.endpoint = "question-answer"

    # Mock TranscriptGroup and perform_lemur_action
    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
        mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
        lemur_instance.perform_lemur_action = MagicMock(return_value={"result": "success"})

        codeflash_output = lemur_instance.run_lemur()

def test_no_transcription_result_or_transcript_ids(lemur_instance):
    lemur_instance.transcription_result = None
    lemur_instance.transcript_ids = None

    codeflash_output = lemur_instance.run_lemur()

def test_empty_transcription_result_with_error(lemur_instance):
    lemur_instance.transcription_result = Data(data={"error": "Invalid transcription"})

    codeflash_output = lemur_instance.run_lemur()

def test_no_prompt_for_task_endpoint(lemur_instance):
    lemur_instance.endpoint = "task"
    lemur_instance.prompt = None

    codeflash_output = lemur_instance.run_lemur()

def test_no_questions_for_question_answer_endpoint(lemur_instance):
    lemur_instance.endpoint = "question-answer"
    lemur_instance.questions = None

    codeflash_output = lemur_instance.run_lemur()


def test_transcript_group_failure(lemur_instance):
    lemur_instance.transcription_result = Data(data={"id": "12345"})

    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, ["Failure message"])

        codeflash_output = lemur_instance.run_lemur()

def test_transcript_status_error(lemur_instance):
    lemur_instance.transcription_result = Data(data={"id": "12345"})

    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
        mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.error, error="Transcript error")]

        codeflash_output = lemur_instance.run_lemur()

def test_multiple_valid_transcript_ids(lemur_instance):
    lemur_instance.transcription_result = None
    lemur_instance.transcript_ids = "12345,67890,11223"

    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
        mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
        lemur_instance.perform_lemur_action = MagicMock(return_value={"result": "success"})

        codeflash_output = lemur_instance.run_lemur()

def test_mixed_valid_and_invalid_transcript_ids(lemur_instance):
    lemur_instance.transcription_result = None
    lemur_instance.transcript_ids = "12345,invalid_id,67890"

    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        mock_transcript_group = MockTranscriptGroup.return_value
        mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
        mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
        lemur_instance.perform_lemur_action = MagicMock(return_value={"result": "success"})

        codeflash_output = lemur_instance.run_lemur()


import copy
from typing import cast
from unittest.mock import MagicMock, patch

# function to test
import assemblyai as aai
# imports
import pytest  # used for our unit tests
from langchain_core.documents import Document
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langflow.components.assemblyai.assemblyai_lemur import AssemblyAILeMUR
from langflow.custom import Component
from langflow.schema import Data
from langflow.utils.constants import MESSAGE_SENDER_AI, MESSAGE_SENDER_USER
from langflow.utils.image import create_data_url
from loguru import logger
from pydantic import BaseModel, model_serializer, model_validator

# unit tests

# Mocking the dependencies
@pytest.fixture
def mock_assemblyai():
    with patch('assemblyai.TranscriptGroup') as MockTranscriptGroup:
        yield MockTranscriptGroup

@pytest.fixture
def mock_component():
    component = AssemblyAILeMUR()
    component.api_key = "test_api_key"
    component.transcription_result = None
    component.transcript_ids = None
    component.endpoint = "task"
    component.prompt = "test prompt"
    component.questions = "test question"
    component.final_model = "test_model"
    component.temperature = 0.7
    component.max_output_size = 100
    yield component

# 1. Valid Input Scenarios
def test_valid_transcription_result_task_endpoint(mock_component, mock_assemblyai):
    # Mock valid transcription result
    mock_component.transcription_result = Data(data={"id": "test_id"})
    mock_component.endpoint = "task"
    mock_component.prompt = "test prompt"
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
    mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
    mock_transcript_group.lemur.task.return_value.dict.return_value = {"result": "task result"}
    
    codeflash_output = mock_component.run_lemur()

def test_valid_transcript_ids_summary_endpoint(mock_component, mock_assemblyai):
    # Mock valid transcript IDs
    mock_component.transcript_ids = "test_id1,test_id2"
    mock_component.endpoint = "summary"
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
    mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
    mock_transcript_group.lemur.summarize.return_value.dict.return_value = {"result": "summary result"}
    
    codeflash_output = mock_component.run_lemur()

# 2. Missing or Invalid Input Scenarios
def test_missing_transcription_result_and_transcript_ids(mock_component):
    codeflash_output = mock_component.run_lemur()

def test_invalid_transcription_result(mock_component):
    mock_component.transcription_result = Data(data={"error": "test error"})
    codeflash_output = mock_component.run_lemur()

# 3. Endpoint-Specific Edge Cases
def test_task_endpoint_without_prompt(mock_component):
    mock_component.endpoint = "task"
    mock_component.prompt = ""
    codeflash_output = mock_component.run_lemur()

def test_question_answer_endpoint_without_questions(mock_component):
    mock_component.endpoint = "question-answer"
    mock_component.questions = ""
    codeflash_output = mock_component.run_lemur()

# 4. Transcript Group Errors
def test_transcript_group_failures(mock_component, mock_assemblyai):
    mock_component.transcript_ids = "test_id"
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, ["failure"])
    
    codeflash_output = mock_component.run_lemur()

# 5. Exception Handling
def test_exceptions_during_lemur_action(mock_component, mock_assemblyai):
    mock_component.transcript_ids = "test_id"
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
    mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
    mock_transcript_group.lemur.task.side_effect = Exception("test exception")
    
    codeflash_output = mock_component.run_lemur()

# 6. Large Scale Test Cases
def test_performance_with_large_data_samples(mock_component, mock_assemblyai):
    mock_component.transcript_ids = ",".join([f"test_id_{i}" for i in range(1000)])
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
    mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed) for _ in range(1000)]
    mock_transcript_group.lemur.task.return_value.dict.return_value = {"result": "task result"}
    
    codeflash_output = mock_component.run_lemur()

# 7. Side Effects Verification
def test_logging_and_status_updates(mock_component, mock_assemblyai, caplog):
    mock_component.transcript_ids = "test_id"
    
    # Mock the transcript group and its methods
    mock_transcript_group = mock_assemblyai.return_value
    mock_transcript_group.wait_for_completion.return_value = (mock_transcript_group, [])
    mock_transcript_group.transcripts = [MagicMock(status=aai.TranscriptStatus.completed)]
    mock_transcript_group.lemur.task.return_value.dict.return_value = {"result": "task result"}
    
    with caplog.at_level('INFO'):
        codeflash_output = mock_component.run_lemur()
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

📣 Feedback

If you have any feedback or need assistance, feel free to join our Discord community:

Discord

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 12, 2024
@codeflash-ai codeflash-ai bot requested a review from misrasaurabh1 December 12, 2024 21:23