Updated questions and golden queries to accept multiple correct answers and reduce ambiguity #37

Merged
rishsriv merged 3 commits into main from rishabh/fixes on Oct 19, 2023

rishsriv
Member

  • Updated 6 questions in data/questions_gen.csv
  • Added a prompt for fairly evaluating OpenAI models

@rishsriv rishsriv requested a review from wongjingping October 19, 2023 10:00
Collaborator

@wongjingping wongjingping left a comment

Fix and ship!

@@ -70,7 +70,7 @@ What is the total cost of round-trip fares for each airline code?,"SELECT fare.f
"How many meals are served in each compartment, sorted by the number of meals in descending order?","SELECT food_service.compartment, COUNT(food_service.meal_number) AS number_of_meals FROM food_service GROUP BY food_service.compartment ORDER BY number_of_meals DESC NULLS LAST;",atis,group_by
"How many flights depart from each airport code, excluding stopovers?","SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport LEFT JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;",atis,group_by
"Which flight ids to Chicago (ORD) have the longest duration from departure to arrival, sorted in ascending order?","SELECT flight.flight_id, (flight.arrival_time - flight.departure_time) AS duration FROM flight WHERE to_airport = 'ORD' ORDER BY duration ASC NULLS LAST;",atis,order_by
"Which airport(s) have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST LIMIT 1;",atis,order_by
"Which airports have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST;",atis,order_by
Collaborator

If we want the airports with the shortest minimum_connect_time, then we would need a subquery that gets the shortest minimum_connect_time and then finds the airports that have that value, right?

SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport WHERE airport.minimum_connect_time = (SELECT MIN(airport.minimum_connect_time) FROM airport) ORDER BY airport.minimum_connect_time ASC NULLS LAST;
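
To make the two readings concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table is a toy stand-in for the atis `airport` table and the rows are invented for illustration; `NULLS LAST` and the `{...}` column alternatives are left out to keep the snippet portable.

```python
import sqlite3

# Toy stand-in for the atis `airport` table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airport (airport_code TEXT, minimum_connect_time INTEGER)"
)
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    [("AAA", 20), ("BBB", 20), ("CCC", 45), ("DDD", 30)],
)

# Reading 1: every airport, ordered by connect time (the updated golden query).
all_sorted = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC"
).fetchall()

# Reading 2: only the airports tied for the smallest connect time (subquery form).
only_min = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "WHERE minimum_connect_time = "
    "(SELECT MIN(minimum_connect_time) FROM airport) "
    "ORDER BY airport_code ASC"
).fetchall()

# Previous golden query: LIMIT 1 keeps a single row even when there is a tie.
limit_one = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC LIMIT 1"
).fetchall()

print(all_sorted)  # all four airports, smallest connect time first
print(only_min)    # both airports tied at the minimum
print(limit_one)   # one airport only, despite the tie
```

With tied minimum values, the three queries return four rows, two rows, and one row respectively, so which reading the question intends changes the expected result.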

Member Author

Ah, I read it as "give me the connect time in ascending order". I think the current one is fine?

Member Author

Just texted a couple of folks to get more opinions, and "give me the connect time in ascending order" seems to be the consensus reading. Sticking with this for now, but we can change it later :)

This was a very instructive lesson in phrasing, though. Understanding context is non-trivial, and it is one of the things our upcoming instruction-fine-tuned models will eventually get better at.

Collaborator

Thanks for explaining - agreed about the ascending order. I was focusing more on the 'shortest' bit, and it seemed to imply that we only want the one with the smallest value. Do you think it'd be ok if we remove the word 'shortest' here?

@@ -146,7 +146,7 @@ Give me the total number of papers published in the first 12 months of 2019.,SEL
"On average, how many papers per month were published in the whole of 2020?",SELECT cast(count(*) AS float)/ 12 AS average_papers_per_month FROM paper WHERE YEAR = 2020;,scholar,date_functions
What is the total number of papers published per year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year NULLS LAST;",scholar,group_by
What is the total number of papers published in each year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year;",scholar,group_by
What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;",scholar,group_by
What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;SELECT dataset.datasetname, COUNT(paperdataset.paperid) AS total_papers FROM paperdataset JOIN dataset ON paperdataset.datasetid = dataset.datasetid GROUP BY dataset.datasetname ORDER BY total_papers DESC NULLS LAST;SELECT p.title, COUNT(DISTINCT a.authorid) AS num_authors FROM paper p JOIN writes w ON p.paperid = w.paperid JOIN author a ON w.authorid = a.authorid GROUP BY p.title ORDER BY num_authors DESC;",scholar,group_by
Collaborator

The 2nd query looks good to me but I think the 3rd one might not be related? It seems to be returning the paper title instead of the dataset name/id:

SELECT p.title, COUNT(DISTINCT a.authorid) AS num_authors FROM paper p JOIN writes w ON p.paperid = w.paperid JOIN author a ON w.authorid = a.authorid GROUP BY p.title ORDER BY num_authors DESC;
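
A rough sketch of why a stray alternative matters, assuming the golden-query cell is treated as a semicolon-separated list and a generated query passes if its rows match any alternative. The helpers `split_golden_queries` and `matches_any`, the toy tables, and the simplified stray query are all hypothetical; this is not the repository's actual comparison logic.

```python
import sqlite3

def split_golden_queries(cell: str) -> list[str]:
    """Naive split on ';' (assumes no semicolons inside string literals)."""
    return [q.strip() for q in cell.split(";") if q.strip()]

def matches_any(conn, generated_sql: str, golden_cell: str) -> bool:
    """True if the generated query's rows equal those of any golden alternative."""
    generated = sorted(conn.execute(generated_sql).fetchall())
    return any(
        sorted(conn.execute(q).fetchall()) == generated
        for q in split_golden_queries(golden_cell)
    )

# Toy stand-ins for the scholar tables; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (datasetid INTEGER, datasetname TEXT);
    CREATE TABLE paperdataset (paperid INTEGER, datasetid INTEGER);
    CREATE TABLE paper (paperid INTEGER, title TEXT);
    INSERT INTO dataset VALUES (1, 'ImageNet'), (2, 'MNIST');
    INSERT INTO paperdataset VALUES (10, 1), (11, 1), (12, 2);
    INSERT INTO paper VALUES (10, 'A'), (11, 'B'), (12, 'C');
    """
)

by_id = (
    "SELECT datasetid, COUNT(DISTINCT paperid) FROM paperdataset "
    "GROUP BY datasetid"
)
by_name = (
    "SELECT d.datasetname, COUNT(pd.paperid) FROM paperdataset pd "
    "JOIN dataset d ON pd.datasetid = d.datasetid GROUP BY d.datasetname"
)
# Simplified stand-in for an unrelated query that groups by paper title.
stray = "SELECT title, COUNT(*) FROM paper GROUP BY title"

golden_fixed = by_id + ";" + by_name + ";"
golden_with_stray = golden_fixed + stray + ";"

name_ok = matches_any(conn, by_name, golden_fixed)
stray_rejected = not matches_any(conn, stray, golden_fixed)
stray_accepted = matches_any(conn, stray, golden_with_stray)
print(name_ok, stray_rejected, stray_accepted)
```

Under these assumptions, leaving an unrelated alternative in the cell would let an unrelated generated query count as correct, which is the risk with the 3rd query here.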

Member Author

Ah my bad – pushing a fix!

Member Author

Fixed!

@rishsriv rishsriv merged commit ca1ee12 into main Oct 19, 2023
2 checks passed
@rishsriv rishsriv deleted the rishabh/fixes branch October 19, 2023 10:35