
Prompt and cleaning updates #211

Merged: 7 commits merged into main from rishabh/prompt-and-cleaning-updates on Aug 15, 2024


wendy-aw (Contributor)

Just realized this hasn't been pushed to main.

  • Added post-generation query cleaning to fix float divisions and stray spaces in the >= and <= operators
  • Separated join_hints into a new field that can be added to or removed from the prompt template
  • Added join_hints to prompt_cot_postgres
  • Added a pruned_join_hints field, which drops obvious join hints between columns with the same name. This is currently only enabled when columns_to_keep == 0.
  • Took the chance to modify one SQL query to follow its instructions, which request a percentage value instead of a rate.
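The operator-spacing part of the cleaning step could look roughly like the sketch below. It is an illustration, not the PR's code: `fix_comparison_spacing` is a hypothetical name, and it assumes the stray whitespace sits between the `<` or `>` and the `=`. The float-division fix is not sketched because the PR does not describe how it is done.

```python
import re

# Hypothetical pattern: a '<' or '>' followed by whitespace and then '='
_SPLIT_OP = re.compile(r"([<>])\s+=")


def fix_comparison_spacing(query: str) -> str:
    """Return `query` with stray whitespace removed from >= and <=."""
    # \1 keeps the captured '<' or '>' and re-attaches the '='
    return _SPLIT_OP.sub(r"\1=", query)


print(fix_comparison_spacing("SELECT * FROM t WHERE a > = 5 AND b < = 10"))
# SELECT * FROM t WHERE a >= 5 AND b <= 10
```

A bare regex like this would also rewrite text inside string literals (for example `'a > = b'`), so a stricter cleaner might skip quoted spans.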

rishsriv (Member) left a comment


Thanks for the PR and changes! And sorry for forgetting to include this earlier 😅

wendy-aw (Contributor, Author)

Yea for a while I was actually doubting myself about whether we had these changes made 😅

wendy-aw merged commit 868752b into main on Aug 15, 2024
2 checks passed
wendy-aw deleted the rishabh/prompt-and-cleaning-updates branch on Aug 15, 2024 at 02:39
@@ -4,7 +4,7 @@ TAC = Total Active Customers who have recently joined
MoMC = Month-over-month change in average closing price for each ticker.
ACP = Average Closing Price of tickers over a recent period"
broker,bigquery,instructions_cte_join,"WITH popular_stocks AS (SELECT t.sbTickerSymbol, COUNT(*) AS tx_count FROM broker.sbTransaction AS tx JOIN broker.sbTicker AS t ON tx.sbTxTickerId = t.sbTickerId WHERE tx.sbTxType = 'buy' AND tx.sbTxDateTime >= CURRENT_DATE - INTERVAL '10' DAY GROUP BY t.sbTickerSymbol) SELECT sbTickerSymbol, tx_count FROM popular_stocks ORDER BY tx_count DESC NULLS FIRST LIMIT 2;",What are the 2 most frequently bought stock ticker symbols in the past 10 days? Return the ticker symbol and number of buy transactions.,"To find the most popular stocks in the past 10 days, join the transaction and ticker tables, filter for buy transactions in the last 10 days, group by ticker and count transactions.","MoMC = month-over-month change in average closing price Weekend days refer to Saturday and Sunday; adjust dates to weeks for aggregation. To find the most popular stocks in the past 10 days, join the transaction and ticker tables, filter for buy transactions in the last 10 days, group by ticker and count transactions. CR = customer rank by total transaction volume, where rank 1 belongs to the customer with the highest volume"
-broker,bigquery,instructions_cte_join,"WITH cust_tx_stats AS (SELECT c.sbCustId, c.sbCustName, COUNT(t.sbTxId) AS total_tx, SUM(CASE WHEN t.sbTxStatus = 'success' THEN 1 ELSE 0 END) AS success_tx FROM broker.sbCustomer AS c JOIN broker.sbTransaction AS t ON c.sbCustId = t.sbTxCustId GROUP BY c.sbCustId, c.sbCustName) SELECT sbCustName, CAST(success_tx AS FLOAT64) / total_tx AS success_rate FROM cust_tx_stats WHERE total_tx >= 5 ORDER BY success_rate NULLS LAST;","For customers with at least 5 total transactions, what is their transaction success rate? Return the customer name and success rate, ordered from lowest to highest success rate.","To get the success rate of transactions per customer, join customer and transaction tables, group by customer, and calculate the percentage of successful transactions.","CR = customer rank by total transaction amount, with different rankings based on transaction amounts MoMC = month-over-month change in average closing price based on previous month's averages for each ticker each month To get the success rate of transactions per customer, join customer and transaction tables, group by customer, and calculate the percentage of successful transactions. Always join transactions with customers before using the transactions table. TAC = Total Active Customers who joined after a certain date"
+broker,bigquery,instructions_cte_join,"WITH cust_tx_stats AS (SELECT c.sbCustId, c.sbCustName, COUNT(t.sbTxId) AS total_tx, SUM(CASE WHEN t.sbTxStatus = 'success' THEN 1 ELSE 0 END) AS success_tx FROM broker.sbCustomer AS c JOIN broker.sbTransaction AS t ON c.sbCustId = t.sbTxCustId GROUP BY c.sbCustId, c.sbCustName) SELECT sbCustName, CAST(success_tx AS FLOAT64) / total_tx * 100 AS success_rate FROM cust_tx_stats WHERE total_tx >= 5 ORDER BY success_rate NULLS LAST;","For customers with at least 5 total transactions, what is their transaction success rate? Return the customer name and success rate, ordered from lowest to highest success rate.","To get the success rate of transactions per customer, join customer and transaction tables, group by customer, and calculate the percentage of successful transactions.","CR = customer rank by total transaction amount, with different rankings based on transaction amounts MoMC = month-over-month change in average closing price based on previous month's averages for each ticker each month To get the success rate of transactions per customer, join customer and transaction tables, group by customer, and calculate the percentage of successful transactions. Always join transactions with customers before using the transactions table. TAC = Total Active Customers who joined after a certain date"
Collaborator


nit: should we include both variants (with and without *100) since the question doesn't specify the exact format? same for all dialects

Contributor Author


ohh i thought like a whole number (with *100) would be more consistent with the instructions that say "calculate the percentage of successful transactions"
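To make the trade-off concrete: the two variants of the gold query differ only by a factor of 100. A minimal Python sketch, using a hypothetical `success_rate` helper that mirrors the `CAST(success_tx AS FLOAT64) / total_tx` expression, for an illustrative customer with 6 successful transactions out of 8:

```python
def success_rate(success_tx: int, total_tx: int, as_percentage: bool = False) -> float:
    """Mirror the query's success_rate column (hypothetical helper)."""
    # Python 3's `/` is true division, like the FLOAT64 cast in the SQL
    rate = success_tx / total_tx
    # The updated gold query multiplies by 100 to report a percentage
    return rate * 100 if as_percentage else rate


print(success_rate(6, 8))                      # 0.75 (variant without * 100)
print(success_rate(6, 8, as_percentage=True))  # 75.0 (variant with * 100)
```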

join_str = f"{col_1} can be joined with {col_2}"
if join_str not in join_list:
    join_list.append(join_str)
# add to pruned_join_list if column names are not equal
Collaborator


nice heuristic here for pruning join hints!
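The pruning heuristic can be sketched end to end as follows. This is an illustrative reconstruction, not the merged code: `build_join_hints` is a hypothetical name, and it assumes each column arrives as a `table.column` string, so "same name" means the part after the last dot matches. The `orders`/`customers` pair is made up; the `sb*` columns come from the broker schema in the diff above.

```python
from typing import Iterable, List, Tuple


def build_join_hints(pairs: Iterable[Tuple[str, str]]) -> Tuple[str, str]:
    """Return (join_hints, pruned_join_hints) as newline-separated strings."""
    join_list: List[str] = []
    pruned_join_list: List[str] = []
    for col_1, col_2 in pairs:
        join_str = f"{col_1} can be joined with {col_2}"
        if join_str in join_list:
            continue  # skip duplicate hints
        join_list.append(join_str)
        # Keep the hint in the pruned list only if the bare column names differ;
        # a join such as orders.customer_id = customers.customer_id is "obvious".
        if col_1.split(".")[-1] != col_2.split(".")[-1]:
            pruned_join_list.append(join_str)
    return "\n".join(join_list), "\n".join(pruned_join_list)


join_hints, pruned_join_hints = build_join_hints([
    ("sbTransaction.sbTxCustId", "sbCustomer.sbCustId"),
    ("orders.customer_id", "customers.customer_id"),
])
print(join_hints)         # both hints
print(pruned_join_hints)  # only the sbTxCustId / sbCustId hint
```

Comparing bare column names rather than full `table.column` strings is what makes the same-name case detectable. Per the PR description, the pruned field is currently only used when `columns_to_keep == 0`.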

Comment on lines +250 to +251
join_hints=join_str,
pruned_join_hints=pruned_join_str,
Collaborator


nice thanks for adding these as separate sections!
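Passing `join_hints` and `pruned_join_hints` as separate keyword arguments fits Python's `str.format`, which ignores keyword arguments that a template never references. A template can therefore include `{join_hints}`, `{pruned_join_hints}`, or neither, and the same call still renders. A minimal sketch with hypothetical templates and field names (`schema`, `question`), not the repository's actual prompt files:

```python
def render(template: str, **fields: str) -> str:
    # Extra keyword arguments are ignored by str.format, so callers can always
    # pass both hint fields and let each template decide which one to use.
    return template.format(**fields)


# Hypothetical templates; the repository's real prompt files may differ.
WITH_HINTS = "Schema:\n{schema}\n\nJoin hints:\n{join_hints}\n\nQuestion: {question}"
WITH_PRUNED_HINTS = "Schema:\n{schema}\n\nJoin hints:\n{pruned_join_hints}\n\nQuestion: {question}"
NO_HINTS = "Schema:\n{schema}\n\nQuestion: {question}"

fields = {
    "schema": "CREATE TABLE a (x INT); CREATE TABLE b (y INT);",
    "question": "How many rows in a match b?",
    "join_hints": "a.x can be joined with b.y",
    "pruned_join_hints": "a.x can be joined with b.y",
}

for template in (WITH_HINTS, WITH_PRUNED_HINTS, NO_HINTS):
    print(render(template, **fields))
    print("---")
```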


3 participants