
fix(helpers): cost of prompt and completion tokens for OpenAI models #618

Merged
merged 2 commits into Sinaptik-AI:main on Oct 5, 2023

Conversation

mspronesti
Contributor

@mspronesti mspronesti commented Oct 4, 2023

Hi @gventuri,
this PR aims to separate the cost evaluation of prompt tokens and completion/chat completion tokens, which are charged at different rates (a brief illustrative sketch follows the summary below).

Summary by CodeRabbit

  • New Feature: Added support for GPT-4 and GPT-3.5 models in the OpenAI helper.
  • Refactor: Updated get_openai_token_cost_for_model function to distinguish between prompt tokens and completion tokens, improving cost calculation accuracy.
  • New Feature: Enhanced the __call__ method to separately calculate costs for prompt and completion tokens, providing more granular control over token usage.
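
To make the billing model concrete, here is a minimal, self-contained sketch of the idea. The per-1K rates and names below are illustrative assumptions, not values taken from this PR:

# Hypothetical per-1K rates for a single model; prompt (input) and
# completion (output) tokens are billed at different rates.
PROMPT_RATE_PER_1K = 0.03
COMPLETION_RATE_PER_1K = 0.06

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Charge each token type at its own per-1K rate and sum the two."""
    prompt_cost = (prompt_tokens / 1000) * PROMPT_RATE_PER_1K
    completion_cost = (completion_tokens / 1000) * COMPLETION_RATE_PER_1K
    return prompt_cost + completion_cost

print(estimate_cost(500, 200))  # 0.015 + 0.012 = 0.027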

@coderabbitai
Contributor

coderabbitai bot commented Oct 4, 2023

Walkthrough

The changes introduce new model names for GPT-4 and GPT-3.5 in the openai_info.py file, with a focus on distinguishing between prompt tokens and completion tokens. The cost calculation for each token type has been separated, enhancing the granularity of the process.

Changes

File Path: .../helpers/openai_info.py
Summary: Introduced GPT-4 and GPT-3.5 models, distinguished between prompt and completion tokens, and modified cost calculation methods to handle these tokens separately.

🐇💻
In the land of code where the shadows lie,
New models emerge under the AI sky.
Tokens prompt, and tokens complete,
Each with a cost, none obsolete.
Celebrate the change, for it's no small feat,
In the world of AI, we never admit defeat! 🎉🥕



Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits: files that changed from the base of the PR, between a19cfe6 and ebb91e3.
Files selected for processing (1)
  • pandasai/helpers/openai_info.py (2 hunks)
Additional comments (Suppressed): 3
pandasai/helpers/openai_info.py (3)
  • 10-44: The MODEL_COST_PER_1K_TOKENS dictionary has been updated to include new model names for GPT-4, GPT-3.5, and Azure GPT-35, along with their respective costs per 1000 tokens. The cost values are now differentiated between input and output (completion) tokens for each model. Ensure that these cost values are accurate and up-to-date.

  • 47-75: The function get_openai_token_cost_for_model() has been modified to handle completion tokens separately. A new argument is_completion has been added to the function signature. If is_completion is True and the model name starts with "gpt-4", "gpt-3.5", or "gpt-35", the function appends "-completion" to the model name before looking up the cost in the MODEL_COST_PER_1K_TOKENS dictionary. This change allows the function to calculate the cost of prompt tokens and completion tokens independently, enhancing the accuracy of token cost calculation for different models and operations. Make sure all calls to this function throughout the codebase have been updated to match the new signature.

  • 99-112: In the __call__ method of the OpenAICallbackHandler class, the total cost calculation has been updated to separately account for the cost of prompt tokens and completion tokens. The get_openai_token_cost_for_model() function is called twice, once for prompt tokens and once for completion tokens, with the is_completion argument set to True for the latter. The costs are then added together to get the total cost. This change improves the precision of the cost tracking mechanism by treating prompt tokens and completion tokens as distinct entities. Ensure that this modification does not introduce any unexpected side effects in the overall cost calculation process.
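
For readers without the diff in front of them, the keying convention described above might look roughly like this. The entries and prices are illustrative assumptions, not copied from pandasai/helpers/openai_info.py; check OpenAI's pricing page for current numbers:

# Illustrative subset of a per-1K pricing table: prompt (input) rates under
# the plain model name, completion (output) rates under a "-completion" suffix.
MODEL_COST_PER_1K_TOKENS = {
    "gpt-4": 0.03,
    "gpt-4-completion": 0.06,
    "gpt-3.5-turbo": 0.0015,
    "gpt-3.5-turbo-completion": 0.002,
}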

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits: files that changed from the base of the PR, between ebb91e3 and 065ba14.
Files selected for processing (1)
  • pandasai/helpers/openai_info.py (2 hunks)
Additional comments (Suppressed): 3
pandasai/helpers/openai_info.py (3)
  • 10-44: The MODEL_COST_PER_1K_TOKENS dictionary has been updated to include new model names for GPT-4, GPT-3.5, and Azure GPT-35. The cost per 1000 tokens is now differentiated between input (prompt) and output (completion) tokens. Ensure that these costs are accurate and up-to-date with OpenAI's pricing.

  • 47-75: The function get_openai_token_cost_for_model() has been modified to handle prompt and completion tokens separately. An additional parameter is_completion has been introduced to indicate whether the tokens are for completion or not. This change allows for more accurate cost calculation based on token type. However, ensure that all calls to this function throughout the codebase have been updated to match the new signature.

- def get_openai_token_cost_for_model(
-    model_name: str,
-    num_tokens: int,
- ) -> float:
+ def get_openai_token_cost_for_model(
+        model_name: str,
+        num_tokens: int,
+        is_completion: bool = False,
+ ) -> float:
  • 99-114: The __call__ method in the OpenAICallbackHandler class has been updated to calculate the cost of prompt tokens and completion tokens independently. This separation of cost calculation enhances the accuracy of the system's token cost evaluation. Ensure that the response.usage object contains both prompt_tokens and completion_tokens fields.
- total_cost = get_openai_token_cost_for_model(model_name, usage.total_tokens)
- self.total_cost += total_cost
+ prompt_cost = get_openai_token_cost_for_model(
+     model_name, usage.prompt_tokens
+ )
+ completion_cost = get_openai_token_cost_for_model(
+     model_name, usage.completion_tokens, is_completion=True
+ )
+ self.total_cost += prompt_cost + completion_cost
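
As a sanity check on the reviewer's note about response.usage, here is a small standalone sketch of the same accumulation using a stand-in usage object. The field names mirror the OpenAI usage payload; the rates are assumed for illustration:

from types import SimpleNamespace

# Stand-in for response.usage; both prompt_tokens and completion_tokens
# must be present for the split cost calculation to work.
usage = SimpleNamespace(prompt_tokens=500, completion_tokens=200, total_tokens=700)

prompt_rate_per_1k = 0.03       # assumed input rate for the model in use
completion_rate_per_1k = 0.06   # assumed output rate for the same model

total_cost = 0.0
total_cost += (usage.prompt_tokens / 1000) * prompt_rate_per_1k
total_cost += (usage.completion_tokens / 1000) * completion_rate_per_1k
print(total_cost)  # 0.027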

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Commits: files that changed from the base of the PR, between ebb91e3 and f1bf92d.
Files selected for processing (1)
  • pandasai/helpers/openai_info.py (2 hunks)
Additional comments (Suppressed): 3
pandasai/helpers/openai_info.py (3)
  • 10-44: The MODEL_COST_PER_1K_TOKENS dictionary has been updated to include new model names for GPT-4, GPT-3.5, and Azure GPT-35. The cost per 1000 tokens is now differentiated between input and output (completion) for each model. Ensure that these costs are accurate and up-to-date with OpenAI's pricing.

  • 47-75: The function get_openai_token_cost_for_model() has been modified to include an additional parameter is_completion. This parameter is used to differentiate between prompt and completion tokens when calculating the cost. If is_completion is True, "-completion" is appended to the model name to fetch the correct cost from the MODEL_COST_PER_1K_TOKENS dictionary. This change allows for more precise cost calculation by treating prompt and completion tokens separately.

- def get_openai_token_cost_for_model(
-    model_name: str,
-    num_tokens: int,
- ) -> float:
+ def get_openai_token_cost_for_model(
+        model_name: str,
+        num_tokens: int,
+        is_completion: bool = False,
+ ) -> float:
  • 99-114: The __call__() method of OpenAICallbackHandler has been updated to calculate the cost of prompt tokens and completion tokens separately. It uses the get_openai_token_cost_for_model() function with the is_completion parameter set accordingly. This change aligns with the modification in the get_openai_token_cost_for_model() function and ensures a more accurate total cost calculation.
- total_cost = get_openai_token_cost_for_model(model_name, usage.total_tokens)
- self.total_cost += total_cost
+ prompt_cost = get_openai_token_cost_for_model(
+     model_name, usage.prompt_tokens
+ )
+ completion_cost = get_openai_token_cost_for_model(
+     model_name, usage.completion_tokens, is_completion=True
+ )
+ self.total_cost += prompt_cost + completion_cost
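
Putting the two review comments together, a self-contained sketch of the lookup helper could look like the following. The pricing entries, the handling of unknown models, and the exact prefix check are assumptions reconstructed from the comments above, not the PR's actual code:

# Illustrative subset of the pricing table (see the earlier sketch).
MODEL_COST_PER_1K_TOKENS = {
    "gpt-4": 0.03,
    "gpt-4-completion": 0.06,
    "gpt-3.5-turbo": 0.0015,
    "gpt-3.5-turbo-completion": 0.002,
}

def get_openai_token_cost_for_model(
    model_name: str,
    num_tokens: int,
    is_completion: bool = False,
) -> float:
    # Completion (output) tokens are priced under the "-completion" key.
    if is_completion and model_name.startswith(("gpt-4", "gpt-3.5", "gpt-35")):
        model_name = model_name + "-completion"
    if model_name not in MODEL_COST_PER_1K_TOKENS:
        # How unknown models are handled is an assumption here.
        raise ValueError(f"Unknown model: {model_name}")
    return MODEL_COST_PER_1K_TOKENS[model_name] * (num_tokens / 1000)

Called twice per response, once with is_completion=True, this reproduces the prompt_cost + completion_cost split shown in the diff above.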

@codecov-commenter

Codecov Report

Merging #618 (f1bf92d) into main (a19cfe6) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #618      +/-   ##
==========================================
+ Coverage   83.53%   83.54%   +0.01%     
==========================================
  Files          55       55              
  Lines        2696     2699       +3     
==========================================
+ Hits         2252     2255       +3     
  Misses        444      444              
Files Coverage Δ
pandasai/helpers/openai_info.py 80.00% <100.00%> (+1.62%) ⬆️

@mspronesti
Contributor Author

@gventuri Could you re-run the failing workflow? Judging from the log files, it seems like the specific runner on GitHub's side failed to install the dependencies. Re-running the jobs should fix it.

@mspronesti mspronesti changed the title fix(helpers): separate cost of prompt and completion fix(helpers): separate cost of prompt and completion for OpenAI Oct 5, 2023
@mspronesti mspronesti changed the title fix(helpers): separate cost of prompt and completion for OpenAI fix(helpers): separate cost of prompt and completion for OpenAI models Oct 5, 2023
@mspronesti mspronesti changed the title fix(helpers): separate cost of prompt and completion for OpenAI models fix(helpers): cost of prompt and completion tokens for OpenAI models Oct 5, 2023
@gventuri
Collaborator

gventuri commented Oct 5, 2023

@mspronesti done, merging :)

@gventuri gventuri merged commit 297b36e into Sinaptik-AI:main Oct 5, 2023
9 checks passed