Updated questions and golden queries to accept multiple correct answers and reduce ambiguity #37

Merged: 3 commits merged on Oct 19, 2023
10 changes: 5 additions & 5 deletions data/questions_gen.csv
@@ -60,8 +60,8 @@ What are the names of all the courses offered by the department of Computer Scie
"What are the easiness scores for courses in the ""Computer Science"" department?","SELECT {course.name, course.course_id, course.number}, course.easiness_score FROM course WHERE course.department ilike '%Computer Science%';",advising,where
How many students have taken a course in-person or online?,SELECT count(DISTINCT sr.student_id) AS num_students FROM student_record sr JOIN student s ON sr.student_id = s.student_id WHERE sr.how ilike '%in-person%' OR sr.how ilike '%online%';,advising,where
Which flight has the shortest duration between departure and arrival times? Convert to minutes.,"SELECT {flight.flight_number, flight.flight_id}, (arrival_time - departure_time) / 60 AS duration_minutes FROM flight ORDER BY duration_minutes LIMIT 1;",atis,date_functions
-What's the average duration between departure and arrival times minus 34 minutes? Convert from UNIX to regular datetime.,SELECT avg(to_timestamp(arrival_time) - to_timestamp(departure_time) - interval '34 minutes') AS average_duration FROM flight;,atis,date_functions
-Count the number of flight departures for each month?,"SELECT month.month_name, count(*) AS departure_count FROM flight JOIN MONTH ON extract(MONTH FROM to_timestamp(flight.departure_time)) = month.month_number GROUP BY month.month_name, month.month_number ORDER BY month.month_number;",atis,date_functions
+What's the average duration between departure and arrival times minus 34 minutes? Convert from UNIX to regular datetime.,"SELECT avg(to_timestamp(arrival_time) - to_timestamp(departure_time) - interval '34 minutes') AS average_duration FROM flight;SELECT AVG(arrival_time - departure_time)/60 - 34 AS average_duration FROM flight;",atis,date_functions
+Count the number of flight departures for each month?,"SELECT month.month_name, count(*) AS departure_count FROM flight JOIN MONTH ON extract(MONTH FROM to_timestamp(flight.departure_time)) = month.month_number GROUP BY month.month_name, month.month_number ORDER BY month.month_number;SELECT date_trunc('month', to_timestamp(flight.departure_time)) AS MONTH, COUNT(*) AS num_departures FROM flight GROUP BY MONTH ORDER BY MONTH;",atis,date_functions
What's the earliest flight departure time in the day in HH:MM?,"SELECT to_char(to_timestamp(departure_time)::TIME, 'HH24:MI') AS earliest_departure_time FROM flight ORDER BY earliest_departure_time LIMIT 1",atis,date_functions
What's the difference in time in days between today and the earliest flight departure?,"SELECT date_part('day', CURRENT_DATE - to_timestamp(departure_time)) AS difference_in_days FROM flight ORDER BY departure_time LIMIT 1;",atis,date_functions
What is the total cost of round-trip fares for each airline code?,"SELECT fare.fare_airline, SUM(fare.round_trip_cost) AS total_round_trip_cost FROM fare GROUP BY fare.fare_airline ORDER BY total_round_trip_cost DESC;",atis,group_by
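The two rewritten rows above each pack two accepted queries into a single CSV field, joined by `;`. Below is a minimal sketch of how such a field could be split back into individual statements. The helper name `split_golden_queries` is hypothetical, and the sketch assumes that no string literal inside a query contains a `;`; it is not necessarily how this repository's evaluator does it.

```python
from typing import List


def split_golden_queries(field: str) -> List[str]:
    """Split a golden-query field into individual SQL statements.

    Assumption: statements are separated by ';' and no string literal
    inside a statement contains a ';' (true for the rows in this diff).
    """
    # Drop empty pieces such as the one after the trailing ';'.
    return [part.strip() for part in field.split(";") if part.strip()]


# Field value copied from the updated "average duration" row.
field = (
    "SELECT avg(to_timestamp(arrival_time) - to_timestamp(departure_time) "
    "- interval '34 minutes') AS average_duration FROM flight;"
    "SELECT AVG(arrival_time - departure_time)/60 - 34 AS average_duration FROM flight;"
)

queries = split_golden_queries(field)
print(len(queries))  # 2
for query in queries:
    print(query)
```

A generated query would then count as correct if its result matches the result of any one of the returned statements.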
@@ -70,7 +70,7 @@ What is the total cost of round-trip fares for each airline code?,"SELECT fare.f
"How many meals are served in each compartment, sorted by the number of meals in descending order?","SELECT food_service.compartment, COUNT(food_service.meal_number) AS number_of_meals FROM food_service GROUP BY food_service.compartment ORDER BY number_of_meals DESC NULLS LAST;",atis,group_by
"How many flights depart from each airport code, excluding stopovers?","SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport LEFT JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;SELECT airport.airport_code, COUNT(flight.from_airport) AS num_departures FROM airport JOIN flight ON airport.airport_code = flight.from_airport GROUP BY airport.airport_code;",atis,group_by
"Which flight ids to Chicago (ORD) have the longest duration from departure to arrival, sorted in ascending order?","SELECT flight.flight_id, (flight.arrival_time - flight.departure_time) AS duration FROM flight WHERE to_airport = 'ORD' ORDER BY duration ASC NULLS LAST;",atis,order_by
-"Which airport(s) have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST LIMIT 1;",atis,order_by
+"Which airports have the shortest minimum connect time, sorted in ascending order? Show the minimum connect time.","SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport ORDER BY airport.minimum_connect_time ASC NULLS LAST;",atis,order_by
Collaborator

If we want the airports with the shortest minimum_connect_time, then we would need a subquery that gets the shortest minimum_connect_time and then finds the airports that have that value, right?

SELECT {airport.airport_name, airport.airport_code}, airport.minimum_connect_time FROM airport WHERE airport.minimum_connect_time = (SELECT MIN(airport.minimum_connect_time) FROM airport) ORDER BY airport.minimum_connect_time ASC NULLS LAST;

Member Author

Ah, I read it as "give me the connect time in ascending order". I think the current one is fine?

Member Author

Just texted a couple of folks to get more opinions, and "give me the connect time in ascending order" seems to be the consensus reading. Sticking with this for now, but we can change it later :)

This was a very instructive lesson in phrasing, though. Understanding context is so non-trivial, and it is one of the things our upcoming instruction-fine-tuned models will eventually get better at.

Collaborator

Thanks for explaining - agreed about the ascending order. I was focusing more on the 'shortest' bit, and it seemed to imply that we only want the one with the smallest value. Do you think it'd be ok if we remove the word 'shortest' here?
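
To make the readings discussed in this thread concrete, here is a small sketch that runs each variant against made-up data. It uses an in-memory SQLite database rather than Postgres, reduces the `airport` table to two columns, and omits the `{...}` column placeholder and `NULLS LAST`; the three rows are invented so that two airports tie for the smallest connect time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (airport_code TEXT, minimum_connect_time INTEGER)")
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    [("AAA", 20), ("BBB", 20), ("CCC", 45)],  # made-up rows; AAA and BBB tie
)

# Reading 1: every airport, sorted by connect time in ascending order.
all_sorted = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC"
).fetchall()

# Reading 2: only the airports that share the smallest connect time.
only_min = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "WHERE minimum_connect_time = (SELECT MIN(minimum_connect_time) FROM airport) "
    "ORDER BY airport_code ASC"
).fetchall()

# The LIMIT 1 variant keeps a single row even when there is a tie.
limit_one = conn.execute(
    "SELECT airport_code, minimum_connect_time FROM airport "
    "ORDER BY minimum_connect_time ASC, airport_code ASC LIMIT 1"
).fetchall()

print(all_sorted)  # [('AAA', 20), ('BBB', 20), ('CCC', 45)]
print(only_min)    # [('AAA', 20), ('BBB', 20)]
print(limit_one)   # [('AAA', 20)]
```

The three result sets differ, so which golden query is right depends entirely on how the word "shortest" in the question is read.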

Which aircraft code can carry the highest weight of cargo that any aircraft can carry?,SELECT aircraft.aircraft_code FROM aircraft ORDER BY pay_load DESC NULLS LAST LIMIT 1;,atis,order_by
What are the top 2 airlines with the most flights?,"SELECT {airline.airline_name, airline.airline_code}, COUNT(flight.flight_id) AS number_of_flights FROM flight JOIN airline ON flight.airline_code = airline.airline_code GROUP BY {} ORDER BY number_of_flights DESC NULLS LAST LIMIT 2;",atis,order_by
What are the aircraft codes for all aircraft with a cruising speed of over 200 mph? sort the aircraft codes in ascending order.,SELECT aircraft.aircraft_code FROM aircraft WHERE aircraft.cruising_speed > 200 ORDER BY aircraft.aircraft_code ASC NULLS LAST;,atis,order_by
@@ -82,7 +82,7 @@ How does the average ratio of the cruising speed to the payload of an aircraft v
Which flights serve meals in first class? Give me the flight id and meal description.,"SELECT flight.flight_id, food_service.meal_description FROM flight JOIN food_service ON flight.meal_code = food_service.meal_code WHERE LOWER(food_service.compartment) LIKE '%first class%';",atis,table_join
Which airlines offer flights with a stopover in Dallas?,"SELECT DISTINCT {airline.airline_name, airline.airline_code} FROM flight_stop JOIN airport ON flight_stop.stop_airport = airport.airport_code JOIN flight ON flight_stop.flight_id = flight.flight_id JOIN airline ON flight.airline_code = airline.airline_code WHERE airport.airport_location ILIKE '%Dallas%';",atis,table_join
Which airlines offer flights from LAX to ORD?,"SELECT DISTINCT {airline.airline_name, airline.airline_code} FROM flight JOIN airline ON flight.airline_code = airline.airline_code WHERE flight.from_airport = 'LAX' AND flight.to_airport = 'ORD';",atis,table_join
-"Which airlines offer flights from Chicago (ORD) to New York (JFK), and how many stops do they have, sorted by number of stops?","SELECT {airline.airline_name, airline.airline_code}, flight.stops FROM flight JOIN airline ON flight.airline_code = airline.airline_code WHERE flight.from_airport = 'ORD' AND flight.to_airport = 'JFK' GROUP BY {}, flight.stops ORDER BY flight.stops NULLS LAST;",atis,table_join
+"Which airlines offer flights from Chicago (ORD) to New York (JFK), and how many stops do they have, sorted by number of stops in ascending order?","SELECT {airline.airline_name, airline.airline_code}, flight.stops FROM flight JOIN airline ON flight.airline_code = airline.airline_code WHERE flight.from_airport = 'ORD' AND flight.to_airport = 'JFK' GROUP BY {}, flight.stops ORDER BY flight.stops NULLS LAST;",atis,table_join
"Which airlines do not have flights that depart or arrive at JFK, excluding stopovers?","SELECT DISTINCT {a.airline_name, a.airline_code} FROM public.airline a LEFT JOIN public.flight f ON a.airline_code = f.airline_code AND (f.to_airport = 'JFK' OR f.from_airport = 'JFK') GROUP BY {} HAVING COUNT(f.flight_id) = 0;",atis,table_join
Which state code is Orlando International Airport in?,SELECT state_code FROM airport WHERE airport_name ILIKE '%Orlando International Airport%';,atis,where
Which flights operate on Mondays and Wednesdays? Give me the relevant flight numbers,"SELECT {flight.flight_number, flight.flight_id} FROM flight WHERE LOWER(flight.flight_days) LIKE '%mon%' AND LOWER(flight.flight_days) LIKE '%wed%';",atis,where
@@ -146,7 +146,7 @@ Give me the total number of papers published in the first 12 months of 2019.,SEL
"On average, how many papers per month were published in the whole of 2020?",SELECT cast(count(*) AS float)/ 12 AS average_papers_per_month FROM paper WHERE YEAR = 2020;,scholar,date_functions
What is the total number of papers published per year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year NULLS LAST;",scholar,group_by
What is the total number of papers published in each year?,"SELECT paper.year, COUNT(paper.paperid) AS total_papers FROM paper GROUP BY paper.year ORDER BY paper.year;",scholar,group_by
-What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;",scholar,group_by
+What is the total number of papers associated with each dataset?,"SELECT paperdataset.datasetid, COUNT(DISTINCT paperdataset.paperid) AS total_papers FROM paperdataset GROUP BY paperdataset.datasetid;SELECT dataset.datasetname, COUNT(paperdataset.paperid) AS total_papers FROM paperdataset JOIN dataset ON paperdataset.datasetid = dataset.datasetid GROUP BY dataset.datasetname ORDER BY total_papers DESC NULLS LAST;",scholar,group_by
How many keyphrases are associated with each paper?,"SELECT paperkeyphrase.paperid, COUNT(paperkeyphrase.keyphraseid) AS keyphrase_count FROM paperkeyphrase GROUP BY paperkeyphrase.paperid ORDER BY keyphrase_count DESC NULLS LAST;",scholar,group_by
How many authors have published more than 2 papers?,SELECT COUNT(*) AS number_of_authors FROM (SELECT writes.authorid FROM writes GROUP BY writes.authorid HAVING COUNT(writes.paperid) > 2) AS subquery;,scholar,group_by
"Which papers have the highest number of authors, ordered by the number of authors in descending order?","SELECT writes.paperid, COUNT(writes.authorid) AS num_authors FROM writes GROUP BY writes.paperid ORDER BY num_authors DESC NULLS LAST;",scholar,order_by
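The next file in this change, `prompts/prompt_openai.md`, is a template with two placeholders, `{user_question}` and `{table_metadata_string}`. Here is a minimal sketch of filling it with Python's `str.format`. The template text is copied from the diff; the helper name `build_prompt`, the example question's pairing with this schema, and the schema string itself are assumptions for illustration, not the repository's actual loading code.

```python
# Built at runtime so this example does not contain a literal code fence.
FENCE = "`" * 3

# Template text reproduced from prompts/prompt_openai.md.
PROMPT_TEMPLATE = "\n".join([
    "### Instructions:",
    "Your task is to convert a text question to a SQL query that runs on Postgres, given a database schema.",
    "",
    "### Input:",
    "Generate a SQL query that answers the question `{user_question}`.",
    "",
    "This query will run on a database whose schema is represented in this string:",
    "{table_metadata_string}",
    "",
    "### Response:",
    "Given the database schema, here is the SQL query that answers `{user_question}`:",
    FENCE + "sql",
    "",
])


def build_prompt(user_question: str, table_metadata_string: str) -> str:
    """Fill both placeholders; the question is substituted in two places."""
    return PROMPT_TEMPLATE.format(
        user_question=user_question,
        table_metadata_string=table_metadata_string,
    )


# Hypothetical inputs: the schema string is invented for this sketch.
question = "Which state code is Orlando International Airport in?"
schema = "CREATE TABLE airport (airport_code text, airport_name text, state_code text);"

prompt = build_prompt(question, schema)
print(prompt)
```

The prompt ends on an opening SQL fence, so the model's completion is expected to start directly with the query.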
12 changes: 12 additions & 0 deletions prompts/prompt_openai.md
@@ -0,0 +1,12 @@
### Instructions:
Your task is to convert a text question to a SQL query that runs on Postgres, given a database schema.

### Input:
Generate a SQL query that answers the question `{user_question}`.

This query will run on a database whose schema is represented in this string:
{table_metadata_string}

### Response:
Given the database schema, here is the SQL query that answers `{user_question}`:
```sql