-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
05 participants #26
base: main
Are you sure you want to change the base?
05 participants #26
Conversation
…ter a commit message to explain why this merge is necessary, # especially if it merges an updated upstream into a topic branch. # # Lines starting with '#' will be ignored, and an empty message aborts # the commit. Updating branch to match latest state of project.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left some feedback, and i assume you'll be adding to & cleaning up the testing code a bit more, but overall this looks great, ty!
tasks/05_participants/task.py
Outdated
AND c.participants_raw <> '' | ||
AND e.participants_parse_error IS NULL | ||
OR e.participants_parse_error = true | ||
limit 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this limit in place for testing? if so, you'll want to remove it before merging
tasks/05_participants/task.py
Outdated
c = cnx.cursor() | ||
c.execute("select count(*) from pages;") | ||
t = time.time() - t1 | ||
part_rate = round((n - c.rowcount) / t, 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sqlite3 connection objects don't have a rowcount attribute, so you'll need to check the db type here, similar to line 33 above
def html_raw_participants(html_str: str) -> list: | ||
""" | ||
Reads in an html string from the `raw_text` column in the `pages` table, | ||
finds the participants table, collects the rows in the table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor suggestion, but you may want to distinguish between database and HTML tables in comments for clarity, e.g. "Reads in an html string from the raw_text
column in the pages
database table, finds the participants HTML table in the string, collects the rows in the HTML table...."
def add_participant_row(case_id: int, r: list): | ||
# insert relevant info to participants table in the db | ||
try: | ||
if db_config.db_type == "sqlite": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this if/elif
block that sets the query can be moved above and outside of the try/except
block, since it's unlikely to cause an exception
tasks/05_participants/task.py
Outdated
|
||
def main(): | ||
participants_query = """ | ||
SELECT c.id as case_id, c.case_number, c.participants_raw, e.participants_parse_error, p.raw_text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
putting a note in after our conversation today: we probably only need the raw_text
column from the pages
column, so references to the cases
table and participants_raw
column can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up changing this query. Instead of referencing the cases
table, it now selects only from the pages
table where the html in the raw_text
column contains a participants table element.
I've made some revisions to most aspects of the task, especially in the testing process. I ran this on the full
Here's a subsample of the |
A draft for the task in #4 . Three main places need attention:
task.py
is not currently using the error table implemented in Move error logging out of cases table #22 .