The Front-End: Our front-end consists of six HTML templates and a static style.css file. layout.html serves as the base template for every page of the website and lists the links to the other pages inline at the top of the site, like a navigation bar. Each child HTML file has one division that displays the title of the page and a second division that holds the actual contents of that page (the registration form, login form, preferences form, etc.).
When no user is logged in, the only links shown are the home page, register, and login, since we wanted the preferences form, course resources links, and logout option to be visible only once a user has logged in. We implemented this with a Jinja if statement in the body of layout.html, which checks that session["user_id"] exists before showing the logout, preferences, and course resources links. The title of each child file is also inserted into the page using Jinja, and under the main tag in layout.html we used Jinja to mark where the child templates that extend the layout insert their content.
We chose to make a very simple home page in main.html so that the focus of the website is clear to the user. The top of the page shows the name of the site, Study.group, with a short “about” description underneath. This is the only place on the site where the “header” class is used, because we wanted the title of the main page to stand out and be recognizable to users.
register.html is a child of layout.html. We built the registration form with the HTML form tag and the post method so that app.py can access the submitted information.
We created input fields for the first name, last name, email address, and password of the account being created, as well as a field to confirm the new password. We also put a label tag above each field so that the user can see what information each field needs. At the bottom of the form is a paragraph tag containing a Jinja variable, “error”: if any field is left empty or the password confirmation does not match the password, we display the message for that specific error (such as a missing email or mismatched passwords) at this location. These input fields supply the data for the users table in users.db on the back-end.
Like register.html, login.html extends layout.html and consists mostly of a form that can be submitted. This form contains only two fields: email and password. The placeholder in each input field tells the user what to enter; we chose placeholders rather than labels to keep the login page simple and easy to look at. Again, if a field is left empty or the given email and password do not correspond to a user in the users table, the error message appears at the bottom of the form through the Jinja variable.
The form that feeds the back-end matching algorithm is in prefs.html. We created this form to collect, for each course the user wants a study group for, their preferred times on each day of the week, their preferred location, and their preferred group size. We restricted the course id input to 6 characters, since each course id on my.harvard is a six-digit number. For the preferred times, we chose to display the time choices for each day in a table, almost like a weekly calendar.
The first row of the table uses a Jinja for loop to display the days of the week, and the second row uses a Jinja for loop to create a select menu for each day. We tied the name and id of each time select menu to its day of the week so that, when the form is submitted, we get one entry in the prefs table per user, per course, per day, holding the times selected in that day’s menu. The location preference also uses a Jinja for loop to create a select menu with the options widener, lamont, cabot, dorm, and smith center, from which the user selects one preferred location (since the same user should prefer the same location no matter the day of the week). Because there are only three group sizes, we did not use a for loop for the group size select menu and instead wrote out the three options. The form ends with a button of type submit so that the user can submit all of their preferences.
The course resources page is courses.html, which also extends layout.html. A Jinja for loop goes through each course that the user is registered for and lists the resource links stored for that course. Jinja if statements make sure that the links exist before they are shown. Each course has a form to add a new resource link with a title. To give the input fields and button for each course a different id that the JavaScript at the end of the file can refer to (since JavaScript cannot include Jinja), we used Jinja variables to assign each one a different numbered id. This way, the JavaScript is specific to each course’s add-resource button, and new resources are added with the correct course id.
The Back-End: Our code features a list of the days of the week as well as dictionaries that convert between the times we receive through the preferences form and the indices that the algorithm works with.
We ended up using three tables in our users.db. The first is our users table, which holds emails, first names, last names, and hashed passwords. We used this table primarily to verify logins (and to generate the session id from the user’s email). The second is the prefs table, which holds email addresses, courses, preferred days, preferred times, preferred locations, and preferred group sizes. It stores what users fill out in the preferences form so that the matching algorithm can use it. A key decision we made here was to create a new entry not only for every course a person submits, but also for every preferred day the person submits for each course. This made it easier to tell which of the user’s selected times correspond to which days (even though it leads to more entries in the table). The final table is the courses table, whose columns are courses, urls, and titles. This table stores input from the users (any resources they found helpful for specific classes), which we then query and display on the same page where users submit those resources.
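To make the three-table layout concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names and types are assumptions chosen for illustration (the write-up lists what each table holds, not its exact schema), and storing a day's times as one space-separated string is also an assumption.

```python
import sqlite3

# Hypothetical schema: column names and types are assumptions, not the project's.
SCHEMA = """
CREATE TABLE users (
    email      TEXT PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name  TEXT NOT NULL,
    hash       TEXT NOT NULL   -- hashed password, never the plain text
);
CREATE TABLE prefs (
    email    TEXT NOT NULL,
    course   TEXT NOT NULL,
    day      TEXT NOT NULL,    -- one row per user, per course, per day
    times    TEXT NOT NULL,    -- assumed: space-separated time labels
    location TEXT NOT NULL,
    size     TEXT NOT NULL
);
CREATE TABLE courses (
    course TEXT NOT NULL,
    url    TEXT NOT NULL,
    title  TEXT NOT NULL
);
"""

def make_db():
    """Create an in-memory database with the three tables."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

db = make_db()
# One user, one course, two preferred days -> two rows in prefs.
db.executemany(
    "INSERT INTO prefs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("a@college.edu", "123456", "Monday", "10:00 am 10:30 am", "lamont", "small"),
        ("a@college.edu", "123456", "Wednesday", "2:00 pm", "lamont", "small"),
    ],
)
row_count = db.execute("SELECT COUNT(*) FROM prefs").fetchone()[0]
```

Keeping one row per day means a query such as `SELECT times FROM prefs WHERE course = ? AND day = ?` returns exactly the times relevant to one course and day.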
Our registration, login, and logout procedures are very similar to how they were implemented in the finance problem set, including hashing passwords for security. What makes our procedure different is that we store the user’s email as the session id while they are logged in. This lets us record their email when they fill out the preference form for a course, even though that form doesn’t explicitly ask for it, which in turn lets us email the right people when running the matching algorithm.
For prefs.html, the back-end both sends information to the front-end and receives information back that is stored in tables. The back-end sends daysoftheweek, the list of locations that we determined students would be likely to want to study at, and the list of times (which we pulled from the time_to_index dictionary), so that the preferences form is generated from these lists rather than typed out by hand. In return, the back-end receives the results of the preferences form and creates a timedict, which associates each day the user prefers to meet with the times they prefer on that day. This timedict, in conjunction with a for loop, allows us to insert the times into the prefs table while still knowing which times correspond to which days.
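The timedict step can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the shape of the submitted form (a plain dict mapping each day to a list of selected times), and the seven-day list are all assumptions.

```python
# Assumed contents of the days list; the project's list may differ.
DAYS_OF_THE_WEEK = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

def build_timedict(form):
    """Map each day to the times selected for it, skipping days with no selection.

    `form` is a hypothetical stand-in for the submitted form data:
    a dict from day name to a list of time labels.
    """
    timedict = {}
    for day in DAYS_OF_THE_WEEK:
        times = form.get(day, [])
        if times:
            timedict[day] = list(times)
    return timedict

def prefs_rows(email, course, location, size, timedict):
    """Produce one prefs row per preferred day, as described above."""
    return [
        (email, course, day, " ".join(times), location, size)
        for day, times in timedict.items()
    ]

form = {"Monday": ["10:00 am", "10:30 am"], "Wednesday": ["2:00 pm"]}
rows = prefs_rows("a@college.edu", "123456", "lamont", "small", build_timedict(form))
```

Each tuple in `rows` could then be passed to a parameterized INSERT, one per day.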
For courses.html, similar to prefs.html, the back-end is both sending information to the front-end and receiving information back that is stored in tables. The courses.html page shows the current courses the user has submitted preferences for, allows them to see resources others have recommended, and then allows them to submit other resources they would like to suggest for the course. When submitting a get or post request, render_template makes sure that courses.html always shows the current courses associated with that user in our prefs table and in turn the resources associated with that course in our courses table. When the user submits a post request, we in turn use SQL to store their response in our courses table, adding the resource link and resource title that they submitted to our database. The next time they load the page, they should see the resource that they submitted, allowing each course to have a continually updating list of resources crowdsourced by class members!
The most important part of the back-end, aside from the matching algorithm, is our matchemails function. The matchemails function is passed a list of people who are grouped together, a list of all the locations entered by the group of people, the course which the people are being grouped together for, as well as a bit telling whether the people are matched together or not. (If they are matched together, the matchemails function also gets what day and timeblock the people are matched for.)
The matchemails function, however, does not use the list of locations directly; we wanted to report how many people preferred each location instead. We therefore convert locations from a list to a dictionary, where each key is a location and each value is the number of people who preferred it. From this dictionary we build a formatted locationtext stating how many people preferred each location.
If the people are not matched together, the matchemails function finds all the days and times the people in the group were available to meet for that specific course. Each unmatched person is then sent a list of all the times and locations preferred by the other unmatched people in the same course who were grouped with them. We did this so that even if people aren’t matched, they are still put in contact with other people in the course who were unmatched, allowing them to potentially figure out a solution that our algorithm was unable to find.
If the people are matched, we convert the timeblock they are matched for into a string of the format “STARTTIME-ENDTIME” (so instead of sending “10:00 am 10:30 am 11:00 am”, we send “10:00 AM-11:00 AM”) to make the email cleaner.
Whether they are matched or not matched, the most important thing matchemails does is send an email. At the top of our code we declared the information about the email account we set up for our project (including the password for our email - please don’t take advantage of this!) and imported the functionality to send mail through Flask. For each email, we use one of two templates (depending on whether the group is matched) and use Python’s replace method to customize the email for the group being matched (or not matched). We declare an app context (since otherwise the email won’t send), and within the app context we send our templated emails using the information that grouper passes to matchemails. Thus, all the work of taking the data from the preferences form and running the matching algorithm comes to fruition when we finally send everyone their titular study groups!
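The list-to-dictionary conversion can be sketched with collections.Counter from the standard library. The function name and the exact wording of the output text are assumptions for illustration; the project may build its dictionary by hand and format locationtext differently.

```python
from collections import Counter

def location_text(locations):
    """Summarize how many people preferred each location.

    `locations` holds one entry per group member, e.g. ["lamont", "dorm"].
    """
    counts = Counter(locations)  # location -> number of people
    parts = []
    for location, n in counts.most_common():
        noun = "person" if n == 1 else "people"
        parts.append(f"{location}: {n} {noun}")
    return ", ".join(parts)

summary = location_text(["lamont", "widener", "lamont", "dorm"])
# summary == "lamont: 2 people, widener: 1 person, dorm: 1 person"
```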
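As a small sketch of that conversion, assuming the timeblock arrives as a chronologically ordered list of time labels (the function name and input shape are assumptions):

```python
def format_timeblock(times):
    """Turn an ordered list of time labels into 'STARTTIME-ENDTIME'."""
    if not times:
        raise ValueError("a timeblock needs at least one time")
    start = times[0].upper()   # first slot in the block
    end = times[-1].upper()    # last slot in the block
    return f"{start}-{end}"

label = format_timeblock(["10:00 am", "10:30 am", "11:00 am"])
# label == "10:00 AM-11:00 AM"
```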
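The template customization can be illustrated with plain str.replace. The template wording, the bracketed placeholder names, and the helper function are all hypothetical; this sketch only builds the message body and does not send anything, whereas the project sends it through its Flask mail setup inside an app context.

```python
# Hypothetical template; the project's wording and placeholders may differ.
MATCH_TEMPLATE = (
    "You have been matched for [COURSE] on [DAY] at [TIME].\n"
    "Your group: [MEMBERS]\n"
    "Location preferences: [LOCATIONS]\n"
)

def fill_template(template, fields):
    """Replace each placeholder in the template with its value."""
    body = template
    for placeholder, value in fields.items():
        body = body.replace(placeholder, value)
    return body

email_body = fill_template(
    MATCH_TEMPLATE,
    {
        "[COURSE]": "123456",
        "[DAY]": "Monday",
        "[TIME]": "10:00 AM-11:00 AM",
        "[MEMBERS]": "a@college.edu, b@college.edu",
        "[LOCATIONS]": "lamont: 2 people",
    },
)
```

Bracketed placeholders are used here so that a replacement value is unlikely to be mistaken for another placeholder.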
The Algorithm: The heart of the back-end, or really of the project itself, is our matching algorithm. The matching algorithm uses the preferences submitted by users and attempts to match everyone possible to a study group that fits as closely to their preferences as possible.
From every user we have their email (acting as their user id), their courses, the days and times they wish to meet for each course, their preferred group sizes, and their preferred location.
The match algorithm is invoked by calling the match() function in app.py. In match(), we go course by course to match people in the same course. We first use a SQL query to get all distinct courses in the prefs table. Then, for each of these courses and each day of the week, we call the timematch() function (calling it per day lets us distinguish the time preferences by day). timematch() takes in one course and one day and creates a 2D list called stackedTimelines. Each inner list has the form [number of people, time], where number of people is the number of people who want a study group at that time for the given course and day. We then sort stackedTimelines by number of people to find the most popular times, and with them the time blocks where the most people can be matched into study groups.
We then iterate through stackedTimelines, which is now sorted from the time with the greatest number of people to the time with the smallest number. Because of this ordering, every time after the one we are currently looking at has the same number of people or fewer, so we only need to look at later entries in stackedTimelines to find consecutive times for a study group. Starting from the current time, we check whether each following time in stackedTimelines is preferred by the same people; if so, it extends the timeblock for that group. (For example, if the same 5 people prefer 10:00 am, 10:30 am, and 11:00 am, the timeblock for those 5 people extends from 10:00 am to 11:00 am.) We initially store the times that work in two lists, one for “am” times and one for “pm” times, so that we can use Python’s sort function on each list before combining them into one matched time list, which we call timeblock.
After we find timeblocks and are ready to send the people in the same timeblock to be grouped, we want to disregard any further groups containing them (so they don’t get matched in multiple groups on the same day). To do this, we keep a list called duplicates containing every email that has already been sent to be grouped. When we go through stackedTimelines again, we ignore any group that contains people in the duplicates list, since they have already been sent to our grouper function.
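The construction and sorting of stackedTimelines can be sketched as below. The input shape (a dict from email to a list of time indices for one course and day) and the function name are assumptions; in the project the data comes from a SQL query on the prefs table. Ties in popularity are broken by the earlier time index here, which is also an assumption.

```python
def build_stacked_timelines(times_by_person):
    """Return [[number_of_people, time], ...] sorted by popularity.

    `times_by_person` maps each email to the time indices that person
    selected for one course and one day (hypothetical input shape).
    """
    counts = {}
    for times in times_by_person.values():
        for t in set(times):  # count each person at most once per time
            counts[t] = counts.get(t, 0) + 1
    stacked = [[n, t] for t, n in counts.items()]
    # Most popular first; among equally popular times, earlier index first.
    stacked.sort(key=lambda pair: (-pair[0], pair[1]))
    return stacked

stacked = build_stacked_timelines(
    {"a@college.edu": [20, 21, 22], "b@college.edu": [20, 21], "c@college.edu": [21, 30]}
)
# stacked == [[3, 21], [2, 20], [1, 22], [1, 30]]
```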
Thus by the end of the function we have a series of timeblocks which represents the intersection of time preferences for people in the same course. We then send those timeblocks and the people who those times work for to be grouped into their study groups.
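The timeblock search with the duplicates check might look roughly like the following. This is a simplified sketch under several assumptions: times are integer indices (so a single sorted() call stands in for the separate am/pm lists), the input maps each time to the set of people who chose it, and a time chosen by only one person is skipped. None of these details are confirmed by the description above.

```python
def find_timeblocks(people_by_time):
    """Group times shared by exactly the same people into timeblocks.

    `people_by_time` maps a time index to the set of emails that chose it
    (hypothetical input shape). Returns a list of (times, emails) pairs.
    """
    # Most popular times first, mirroring the sorted stackedTimelines.
    stacked = sorted(people_by_time.items(), key=lambda item: (-len(item[1]), item[0]))
    duplicates = set()  # everyone already sent on to be grouped
    blocks = []
    for i, (time, people) in enumerate(stacked):
        # Assumption: a lone person is not a group; also skip anyone already grouped.
        if len(people) < 2 or people & duplicates:
            continue
        timeblock = [time]
        # Only later entries can hold the same people (same count or fewer).
        for later_time, later_people in stacked[i + 1:]:
            if later_people == people:
                timeblock.append(later_time)
        blocks.append((sorted(timeblock), sorted(people)))
        duplicates |= people
    return blocks

blocks = find_timeblocks({
    20: {"a", "b", "c"},
    21: {"a", "b", "c"},
    22: {"a", "b"},
    30: {"d", "e"},
    31: {"d", "e"},
})
# blocks == [([20, 21], ["a", "b", "c"]), ([30, 31], ["d", "e"])]
```

In this example the 22 slot is ignored because both of its people were already placed in the first block.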
Now that we know the timeblocks of people who have overlapping times for the same course on the same day, we pass them to our grouper function, which finally matches people into their study groups! If 3 or fewer people are signed up for the same course (where course is the current course we are matching for), the largest group possible is small. Because of this, we check whether the times match for all of these people. If they do, they are a match, and we send them the email telling them their group. If the times don’t match, they still receive a group email, since these are the only people with this course, but it is a no-match email listing the times at which the other group members are available.
If there are only 4 to 6 people who are signed up with the same course (where course is the current course we are matching for), the largest group possible is medium. Because of this, we check the size preferences for these people. If none of these people want a small group, they will all automatically be grouped together because if no one wants small, that means the smallest group size preferred by everyone is medium, which is the largest possible group size with the given number of people. Thus, everyone’s group preferences will be answered as much as possible by grouping them all together rather than splitting them up into small groups. However, if anyone puts small as a size preference, we then check if the majority of these people want a small group. If this is true, we then split the 4 to 6 people into small size groups because this will satisfy the greatest number of people. If the small group preference is not the majority, we will put everyone in a group together because medium size should satisfy the greatest number of people (or get closest to the greatest number of preferences).
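The decision for 4 to 6 people can be sketched as a single function. The function name and input shape are hypothetical, "majority" is taken to mean strictly more than half, and the way the people are divided into two small groups (by list order, in near-equal halves) is an assumption, since the text does not say how the split is made.

```python
def group_four_to_six(size_prefs):
    """Decide the groups for 4 to 6 people in one course.

    `size_prefs` maps each email to "small", "medium", or "large".
    Returns a list of groups, each a list of emails.
    """
    people = list(size_prefs)
    if not 4 <= len(people) <= 6:
        raise ValueError("expected 4 to 6 people")
    small_votes = sum(1 for size in size_prefs.values() if size == "small")
    # Assumption: "majority" means strictly more than half want a small group.
    if small_votes * 2 > len(people):
        half = (len(people) + 1) // 2  # 4 -> 2+2, 5 -> 3+2, 6 -> 3+3
        return [people[:half], people[half:]]
    # No one wants small, or small is not the majority: one medium group.
    return [people]

split = group_four_to_six(
    {"a": "small", "b": "small", "c": "small", "d": "medium", "e": "large"}
)
together = group_four_to_six(
    {"a": "small", "b": "medium", "c": "medium", "d": "large"}
)
```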
If there are 7 or more people who are signed up with the same course (where course is the current course we are matching for), all group sizes are possible. Thus, we look at the time preferences of these people.
If 7 or more people have a matching timeblock, we use the following procedure (we call the people who have a matching timeblock the “Blockers” for the purpose of this explanation):
If only one of the Blockers prefers a small group, we ignore that preference (one person cannot form a small group alone) and instead treat them as wanting a medium group.
Otherwise, we put as many of the people who prefer a small group into small groups as possible, while making sure at least 4 ungrouped people are left among the Blockers so that there are enough people to form a medium group.
We split the people selected for small groups into as many groups of 3 as possible; if their number is not divisible by 3, we form one group of 2 or two groups of 2, with the rest in groups of 3. The groups that are formed are considered “matched” and sent to matchemails.
All the people who preferred a small group who weren’t matched into a small group are now considered as preferring a medium group (they will now be matched into a medium group rather than a small group)
Next, we look at the number of Blockers who prefer a large group. If fewer than 7 Blockers prefer a large group, there are not enough of them to form a large group by themselves, so we treat them as preferring a medium group (there will be no large group in this case).
If 7 or more Blockers prefer a large group, we can put all of them in large groups of size 7-13. If 8 Blockers prefer a large group, that is one group of 8. If 19 Blockers prefer a large group, that is a group of 7 and a group of 12. The number of large groups we form is (n - n%7)/7, where n is the number of Blockers who prefer a large group. (If 30 Blockers prefer a large group, we form 4 groups of sizes 7, 7, 7, and 9.)
Now, every Blocker who has not been matched is treated as preferring a medium group. This includes all the people who initially preferred a medium group, the one person who preferred a small group if there weren’t enough people to form a small group, the people who preferred a small group but weren’t placed in one so that enough people remained for a medium group, and the people who preferred a large group when there weren’t enough people to form a large group.
We find the remainder for the number of medium people divided by 6. We note that we know there are at least 4 medium people left by the constraints we set earlier. If the remainder is 0, then we match all the medium people into groups of 6. If the remainder is 1, then we have one exception group with a size of 7 (the only case where we have a “medium” group whose size is outside the 4-6 people boundaries for a medium group) with the rest matched into groups of 6. If the remainder is 2, then we have two medium groups of size 4, and the rest matched into groups of 6. If the remainder is 3, we have one medium group of 4, one medium group of 5, and the rest matched into groups of 6. If the remainder is 4, we have one medium group of 4 and the rest matched into groups of 6. If the remainder is 5, we have one medium group of 5 and the rest matched into groups of 6.
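The three size rules in this procedure (small, large, and medium) can be summarized as functions that return the group sizes for a given head count. The function names are hypothetical, and the sketch covers only the arithmetic, not which person goes into which group. The order of sizes within each list is also an arbitrary choice.

```python
def small_group_sizes(n):
    """Sizes for n >= 2 people placed in small groups: 3s, with one or two 2s."""
    if n < 2:
        raise ValueError("need at least 2 people for a small group")
    if n % 3 == 0:
        return [3] * (n // 3)
    if n % 3 == 2:
        return [3] * (n // 3) + [2]
    return [3] * ((n - 4) // 3) + [2, 2]  # n % 3 == 1, so n >= 4

def large_group_sizes(n):
    """Sizes for n people who prefer large groups; empty if fewer than 7."""
    if n < 7:
        return []  # these people fall back to medium groups
    count = n // 7  # same value as (n - n % 7) / 7
    return [7] * (count - 1) + [7 + n % 7]  # the last group absorbs the remainder

def medium_group_sizes(n):
    """Sizes for n >= 4 people placed in medium groups, by remainder mod 6."""
    if n < 4:
        raise ValueError("need at least 4 people for a medium group")
    sixes, remainder = divmod(n, 6)
    if remainder == 0:
        return [6] * sixes
    if remainder == 1:
        return [6] * (sixes - 1) + [7]      # the one exception group of 7
    if remainder == 2:
        return [6] * (sixes - 1) + [4, 4]
    if remainder == 3:
        return [6] * (sixes - 1) + [4, 5]
    return [6] * sixes + [remainder]        # remainder 4 or 5

large_19 = large_group_sizes(19)    # [7, 12]
large_30 = large_group_sizes(30)    # [7, 7, 7, 9]
medium_13 = medium_group_sizes(13)  # [6, 7]
```

For remainders 1, 2, and 3 one group of 6 is broken up and combined with the remainder, which is why those cases use `sixes - 1` full groups.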
Thus, at the conclusion of this algorithm we should be able to match everyone with an overlapping timeblock with someone else. The grouper function sends all the matching groups to matchemails, which sends the proper email letting people know that they have been matched for a certain course at a certain time on a certain day with certain people preferring certain locations.
And that was study.group!