Skip to content

Commit

Permalink
Merge pull request #107 from codelion/fix-readurls
Browse files Browse the repository at this point in the history
Update readurls plugin to read tables in web pages
  • Loading branch information
codelion authored Nov 28, 2024
2 parents 7ac44fc + 45d255e commit d5b468c
Show file tree
Hide file tree
Showing 9 changed files with 8,197 additions and 9 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -307,8 +307,8 @@ Authorization: Bearer your_secret_api_key
### readurls&memory-gpt-4o-mini on Google FRAMES Benchmark (Oct 2024)
| Model | Accuracy |
| ----- | -------- |
| readurls&memory-gpt-4o-mini | 65.66 |
| gpt-4o-mini | 50.0 |
| readurls&memory-gpt-4o-mini | 61.29 |
| gpt-4o-mini | 50.61 |
| readurls&memory-Gemma2-9b | 30.1 |
| Gemma2-9b | 5.1 |
| Gemma2-27b | 30.8 |
Expand Down
39 changes: 33 additions & 6 deletions optillm/plugins/readurls_plugin.py
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ def extract_urls(text: str) -> List[str]:
def fetch_webpage_content(url: str, max_length: int = 100000) -> str:
try:
headers = {
'User-Agent': 'optillm/0.0.1 (hhttps://github.com/codelion/optillm)'
'User-Agent': 'optillm/0.0.1 (https://github.com/codelion/optillm)'
}

response = requests.get(url, headers=headers, timeout=10)
Expand All @@ -45,15 +45,42 @@ def fetch_webpage_content(url: str, max_length: int = 100000) -> str:
for tag in ['article', 'main', 'div[role="main"]', '.main-content']:
content = soup.select_one(tag)
if content:
text_elements.extend(content.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']))
text_elements.extend(content.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table']))
break

# If no main content found, fall back to all headers and paragraphs
# If no main content found, fall back to all headers, paragraphs, and tables
if not text_elements:
text_elements = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p'])
text_elements = soup.find_all(['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p', 'table'])

# Extract text from elements
text = ' '.join(element.get_text(strip=True) for element in text_elements)
# Process all elements including tables
content_parts = []

for element in text_elements:
if element.name == 'table':
# Process table
table_content = []

# Get headers
headers = element.find_all('th')
if headers:
header_text = ' | '.join(header.get_text(strip=True) for header in headers)
table_content.append(header_text)

# Get rows
for row in element.find_all('tr'):
cells = row.find_all(['td', 'th'])
if cells:
row_text = ' | '.join(cell.get_text(strip=True) for cell in cells)
table_content.append(row_text)

# Add table content with proper spacing
content_parts.append('\n' + '\n'.join(table_content) + '\n')
else:
# Process regular text elements
content_parts.append(element.get_text(strip=False))

# Join all content
text = ' '.join(content_parts)

# Remove extra whitespace
text = re.sub(r'\s+', ' ', text).strip()
Expand Down
1 change: 1 addition & 0 deletions scripts/eval_frames_benchmark.py
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ def get_llm_response(prompt: str, model: str) -> str:
n=1,
stop=None,
temperature=0.7,
extra_body={"optillm_approach": "readurls&memory"}
)
return response.choices[0].message.content.strip()

Expand Down
7,418 changes: 7,418 additions & 0 deletions scripts/evaluation_results_gpt-4o-mini.json

Large diffs are not rendered by default.

242 changes: 242 additions & 0 deletions scripts/evaluation_results_hf.co_codelion_MathCoT:F16.json

Large diffs are not rendered by default.

242 changes: 242 additions & 0 deletions scripts/evaluation_results_llama3.1:8b-instruct-fp16.json

Large diffs are not rendered by default.

242 changes: 242 additions & 0 deletions scripts/evaluation_results_mathcot.json

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions scripts/evaluation_results_qwq_pass_at_1.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
[
{
"index": 0,
"problem": "Every morning Aya goes for a $9$-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of $s$ kilometers per hour, the walk takes her 4 hours, including $t$ minutes spent in the coffee shop. When she walks $s+2$ kilometers per hour, the walk takes her 2 hours and 24 minutes, including $t$ minutes spent in the coffee shop. Suppose Aya walks at $s+\\frac{1}{2}$ kilometers per hour. Find the number of minutes the walk takes her, including the $t$ minutes spent in the coffee shop.",
"attempts": [
{
"attempt_number": 1,
"response": "So I'm trying to solve this AIME problem about Aya's walking speeds and times. Let me read it again carefully.\n\n\"Every morning Aya goes for a 9-kilometer-long walk and stops at a coffee shop afterwards. When she walks at a constant speed of s kilometers per hour, the walk takes her 4 hours, including t minutes spent in the coffee shop. When she walks s+2 kilometers per hour, the walk takes her 2 hours and 24 minutes, including t minutes spent in the coffee shop. Suppose Aya walks at s + 1/2 kilometers per hour. Find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.\"\n\nAlright, so we have two scenarios with different walking speeds and total times, which include both walking time and time spent in the coffee shop. The distance is constant at 9 kilometers.\n\nLet me define some variables:\n\n- Let s be the walking speed in kilometers per hour.\n\n- Let t be the time spent in the coffee shop in minutes.\n\nIn the first scenario:\n\n- Speed: s km/h\n\n- Total time including coffee break: 4 hours, which is 240 minutes.\n\n- Time spent walking: time = distance / speed = 9 / s hours.\n\n- But the total time is walking time plus t minutes.\n\nWait, but walking time is in hours, and t is in minutes. I need to make sure all units are consistent. Let's convert everything to minutes.\n\n1 hour = 60 minutes, so walking time in minutes is (9 / s) * 60.\n\nTotal time: walking time + t = (9 / s) * 60 + t = 240 minutes.\n\nSimilarly, in the second scenario:\n\n- Speed: s + 2 km/h\n\n- Total time: 2 hours and 24 minutes, which is 144 minutes.\n\n- Walking time: (9 / (s + 2)) * 60 minutes.\n\n- Total time: walking time + t = (9 / (s + 2)) * 60 + t = 144 minutes.\n\nSo now I have two equations:\n\n1) (9 / s)*60 + t = 240\n\n2) (9 / (s + 2))*60 + t = 144\n\nI need to solve for s and t.\n\nLet me write the equations more clearly:\n\nEquation A: (9 / s)*60 + t = 240\n\nEquation B: (9 / (s + 2))*60 + t = 144\n\nI can subtract equation B from equation A to eliminate t:\n\n[(9 / s)*60 + t] - [(9 / (s + 2))*60 + t] = 240 - 144\n\nSimplify:\n\n(9 / s)*60 - (9 / (s + 2))*60 = 96\n\nFactor out 60:\n\n60*(9/s - 9/(s + 2)) = 96\n\nDivide both sides by 12 to simplify:\n\n5*(9/s - 9/(s + 2)) = 8\n\nNow divide both sides by 5:\n\n9/s - 9/(s + 2) = 8/5\n\nHmm, this is getting a bit messy. Maybe I should try substitution or another approach.\n\nAlternatively, I can express t from equation A and plug into equation B.\n\nFrom equation A:\n\nt = 240 - (9 / s)*60\n\nPlug this into equation B:\n\n(9 / (s + 2))*60 + [240 - (9 / s)*60] = 144\n\nSimplify:\n\n(9 / (s + 2))*60 + 240 - (9 / s)*60 = 144\n\nCombine like terms:\n\n240 - 144 + (9 / (s + 2))*60 - (9 / s)*60 = 0\n\n96 + 60*(9/(s + 2) - 9/s) = 0\n\nWait, that seems similar to what I did earlier. Maybe I need to find a common denominator or simplify the fraction.\n\nLet me try to combine the fractions:\n\n9/(s + 2) - 9/s = 9*s - 9*(s + 2))/(s(s + 2)) = (9s - 9s - 18)/(s(s + 2)) = (-18)/(s(s + 2))\n\nSo,\n\n60*(-18)/(s(s + 2)) ) + 96 = 0\n\nMultiply both sides by s(s + 2):\n\n60*(-18) + 96*s*(s + 2) = 0\n\nSimplify:\n\n-1080 + 96*s^2 + 192*s = 0\n\nDivide the entire equation by 24 to simplify:\n\n-45 + 4*s^2 + 8*s = 0\n\nRearrange:\n\n4*s^2 + 8*s - 45 = 0\n\nNow, I can use the quadratic formula to solve for s:\n\ns = [-b \u00b1 sqrt(b^2 - 4ac)] / (2a)\n\nHere, a = 4, b = 8, c = -45\n\nDiscriminant D = b^2 - 4*a*c = 64 - 4*4*(-45) = 64 + 720 = 784\n\nsqrt(D) = sqrt(784) = 28\n\nSo,\n\ns = [-8 \u00b1 28]/8\n\nPossible solutions:\n\ns = (-8 + 28)/8 = 20/8 = 2.5 km/h\n\ns = (-8 - 28)/8 = -36/8 = -4.5 km/h\n\nSince speed can't be negative, s = 2.5 km/h\n\nNow, find t from equation A:\n\nt = 240 - (9 / 2.5)*60\n\nFirst, 9 / 2.5 = 3.6 hours\n\nThen, 3.6 * 60 = 216 minutes\n\nSo, t = 240 - 216 = 24 minutes\n\nNow, the question asks for the total time when she walks at s + 1/2 km/h.\n\ns + 1/2 = 2.5 + 0.5 = 3 km/h\n\nFind the walking time at 3 km/h:\n\ntime = distance / speed = 9 / 3 = 3 hours\n\nConvert to minutes: 3 * 60 = 180 minutes\n\nTotal time including coffee break: 180 + t = 180 + 24 = 204 minutes\n\nSo, the answer should be 204 minutes.\n\nBut wait, the problem says \"find the number of minutes the walk takes her, including the t minutes spent in the coffee shop.\"\n\nHmm, is the walking time already including the coffee time? Wait, no. The total time includes both walking and coffee time.\n\nWait, but in the scenario where she walks at s + 1/2 km/h, which is 3 km/h, the walking time is 3 hours, and then she spends t minutes in the coffee shop.\n\nBut according to earlier calculation, t = 24 minutes.\n\nSo total time is walking time plus t.\n\nBut walking time is 180 minutes, plus 24 minutes, totaling 204 minutes.\n\nBut let's double-check if this makes sense.\n\nFrom earlier:\n\nAt s = 2.5 km/h, total time is 240 minutes, which includes walking time of 216 minutes and t = 24 minutes.\n\nAt s + 2 = 4.5 km/h, total time is 144 minutes.\n\nWalking time at 4.5 km/h: 9 / 4.5 = 2 hours, which is 120 minutes, plus t = 24 minutes, totaling 144 minutes, which matches.\n\nSo, at s + 1/2 = 3 km/h, walking time is 3 hours = 180 minutes, plus t = 24 minutes, totaling 204 minutes.\n\nTherefore, the answer should be 204 minutes.\n\nBut the problem might expect just the walking time, not including the coffee break. Let's check.\n\nWait, no, it specifically says \"the walk takes her,\" which includes the walking time and the coffee time.\n\nWait, but \"the walk takes her\" probably refers only to the walking time, and then separately, she spends t minutes in the coffee shop.\n\nBut the problem says \"the walk takes her [time], including the t minutes spent in the coffee shop.\"\n\nWait, that's ambiguous. Let's look back at the problem statement.\n\n\"The walk takes her [time], including the t minutes spent in the coffee shop.\"\n\nSo, the total time is walking time plus t.\n\nBut in our earlier calculation, t is already added to the walking time.\n\nWait, but in the scenario at s = 2.5 km/h, total time is 240 minutes, which includes walking time of 216 minutes and t = 24 minutes.\n\nSimilarly, at s + 1/2 = 3 km/h, walking time is 180 minutes, plus t = 24 minutes, totaling 204 minutes.\n\nSo, the answer should be 204 minutes.\n\nBut perhaps I need to express it in hours or another format, but the problem asks for minutes.\n\nTherefore, the answer is 204 minutes.\n\n**Final Answer**\n\n\\[ \\boxed{204} \\]",
"predicted_answer": 204
}
],
"correct_answer": 204,
"is_correct": true,
"first_correct_attempt": 1
}
]
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

setup(
name="optillm",
version="0.0.14",
version="0.0.15",
packages=find_packages(),
py_modules=['optillm'],
package_data={
Expand Down

0 comments on commit d5b468c

Please sign in to comment.