Updated questions and golden queries to accept multiple correct answers and reduce ambiguity #37
Fix and ship!
@@ -70,7 +70,7 @@ What is the total cost of round-trip fares for each airline code?,"SELECT fare.f | |||
"How many meals are served in each compartment, sorted by the number of meals in descending order?","SELECT food_service.compartment, COUNT(food_service.meal_number) AS number_of_meals FROM food_service GROUP BY food_service.compartment ORDER BY number_of_meals DESC NULLS LAST;",atis,group_by | |||
"How many flights depart from each airport code, excluding stopovers?","SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport LEFT JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;",atis,group_by | |||
"Which flight ids to Chicago (ORD) have the longest duration from departure to arrival, sorted in ascending order?","SELECT flight.flight_id, (flight.arrival_time - flight.departure_time) AS duration FROM flight WHERE to_airport = 'ORD' ORDER BY duration ASC NULLS LAST;",atis,order_by | |||
"Which airport(s) have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST LIMIT 1;",atis,order_by | |||
"Which airports have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST;",atis,order_by |
If we want the airports with the shortest minimum_connect_time, then we would need a subquery that gets the shortest minimum_connect_time and then finds the airports that have that particular value, right?
SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport WHERE airport.minimum_connect_time = (SELECT MIN(airport.minimum_connect_time) FROM airport) ORDER BY airport.minimum_connect_time ASC NULLS LAST;
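To make the two readings concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made-up toy rows, not the real atis data, and `NULLS LAST` is left out so the sketch also runs on older SQLite builds. It compares "all airports sorted ascending", "only the airports tied for the minimum" (the subquery above), and the `LIMIT 1` variant.

```python
import sqlite3

# Hypothetical toy data; the real atis airport table has more columns and rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE airport (airport_code TEXT, minimum_connect_time INTEGER)"
)
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    [("ORD", 30), ("JFK", 20), ("BOS", 20), ("LAX", 45)],
)

# Reading 1: every airport, ordered by connect time ascending.
all_sorted = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC"
).fetchall()

# Reading 2: only the airports tied for the smallest connect time.
only_shortest = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "WHERE minimum_connect_time = "
    "(SELECT MIN(minimum_connect_time) FROM airport) "
    "ORDER BY airport_code ASC"
).fetchall()

# LIMIT 1 keeps a single row even when several airports tie for the minimum.
limit_one = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC LIMIT 1"
).fetchall()

print(all_sorted)     # four rows
print(only_shortest)  # the two tied airports
print(limit_one)      # one row, dropping the other tied airport
```

With ties in the data, the three queries return four, two, and one row respectively, which is why the wording of the question matters for the golden query.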
Ah, I read it as "give me the connect time in ascending order". I think the current one is fine?
Just texted a couple of folks to get more opinions, and it seems like "give me the connect time in ascending order" is the consensus understanding of this. Sticking with this for now, but we can change it later :)
This was a very instructive lesson in phrasing, though. Context understanding is so non-trivial, and it is one of the things our upcoming instruction-fine-tuned models will eventually get better at.
Thanks for explaining - agreed about the ascending order. I was focusing more on the 'shortest' bit, and it seemed to imply that we only want the one with the smallest value. Do you think it'd be ok if we remove the word 'shortest' here?
data/questions_gen.csv
Outdated
@@ -146,7 +146,7 @@ Give me the total number of papers published in the first 12 months of 2019.,SEL | |||
"On average, how many papers per month were published in the whole of 2020?",SELECT cast(count(*) AS float)/ 12 AS average_papers_per_month FROM paper WHERE YEAR = 2020;,scholar,date_functions | |||
What is the total number of papers published per year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year NULLS LAST;",scholar,group_by | |||
What is the total number of papers published in each year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year;",scholar,group_by | |||
What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;",scholar,group_by | |||
What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;SELECT dataset.datasetname, COUNT(paperdataset.paperid) AS total_papers FROM paperdataset JOIN dataset ON paperdataset.datasetid = dataset.datasetid GROUP BY dataset.datasetname ORDER BY total_papers DESC NULLS LAST;SELECT p.title, COUNT(DISTINCT a.authorid) AS num_authors FROM paper p JOIN writes w ON p.paperid = w.paperid JOIN author a ON w.authorid = a.authorid GROUP BY p.title ORDER BY num_authors DESC;",scholar,group_by |
The 2nd query looks good to me but I think the 3rd one might not be related? It seems to be returning the paper title instead of the dataset name/id:
SELECT p.title, COUNT(DISTINCT a.authorid) AS num_authors FROM paper p JOIN writes w ON p.paperid = w.paperid JOIN author a ON w.authorid = a.authorid GROUP BY p.title ORDER BY num_authors DESC;
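One rough way to catch this kind of stray alternative is sketched below, assuming the semicolon-separated entries in the golden-query column are meant as interchangeable answers to the same question. The toy schema and rows are invented for illustration, `NULLS LAST` is dropped so the sketch runs on older SQLite builds, and comparing only the count column is a hypothetical sanity check, not how the project's evaluation works.

```python
import sqlite3

# Hypothetical miniature of the scholar schema; only the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper (paperid INTEGER, title TEXT);
CREATE TABLE author (authorid INTEGER);
CREATE TABLE writes (paperid INTEGER, authorid INTEGER);
CREATE TABLE dataset (datasetid INTEGER, datasetname TEXT);
CREATE TABLE paperdataset (paperid INTEGER, datasetid INTEGER);
INSERT INTO paper VALUES (1, 'Paper A'), (2, 'Paper B');
INSERT INTO author VALUES (100), (101), (102);
INSERT INTO writes VALUES (1, 100), (1, 101), (1, 102),
                          (2, 100), (2, 101), (2, 102);
INSERT INTO dataset VALUES (10, 'mnist'), (11, 'cifar');
INSERT INTO paperdataset VALUES (1, 10), (2, 10), (2, 11);
""")

# The three alternatives from the row under review, joined by semicolons.
golden = (
    "SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) "
    "AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;"
    "SELECT dataset.datasetname, COUNT(paperdataset.paperid) AS total_papers "
    "FROM paperdataset JOIN dataset "
    "ON paperdataset.datasetid = dataset.datasetid "
    "GROUP BY dataset.datasetname ORDER BY total_papers DESC;"
    "SELECT p.title, COUNT(DISTINCT a.authorid) AS num_authors FROM paper p "
    "JOIN writes w ON p.paperid = w.paperid "
    "JOIN author a ON w.authorid = a.authorid "
    "GROUP BY p.title ORDER BY num_authors DESC;"
)

# Naive split: fine here, but it would break on ';' inside string literals.
alternatives = [q.strip() for q in golden.split(";") if q.strip()]


def counts(query):
    """Return the sorted values of the last (count) column of a query."""
    return sorted(row[-1] for row in conn.execute(query))


# Alternatives that answer the same question should agree on the counts.
reference = counts(alternatives[0])
agrees = [counts(q) == reference for q in alternatives]
print(agrees)
```

On this toy data the first two alternatives agree on the per-dataset counts while the third does not, which matches the observation that it answers a different question (authors per paper).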
Ah my bad – pushing a fix!
Fixed!
data/questions_gen.csv