feat: add URL input support to streamlit demo #80

Merged (4 commits) on Jan 6, 2025

@fivestarspicy fivestarspicy commented Jan 3, 2025

What's changing

Adding URL input support to allow users to fetch and clean content directly from websites. Changes include:

  1. UI Changes (see app.py):

    • Added URL text input field
    • Added "Clean URL Content" button
    • Maintains visual hierarchy with file upload
  2. Backend Implementation (see app.py):

    • Added URL content fetching using requests
    • Integrated with existing HTML cleaning pipeline
    • Added specific error handling for network and parsing issues
  3. Testing (see test_url_input.py):

    • Added URL integration tests in test_url_input.py
    • Verified existing file upload tests in test_data_load_and_clean.py
    • Added error handling tests
    • Added content quality verification
    • Added size limit checks
  4. Documentation Updates:

    • Updated step-by-step guide (see step-by-step-guide.md)
    • Updated README (see README.md)
  5. Fixes and Improvements:

    • Restored file upload functionality that was accidentally removed
    • Fixed formatting issues in test files
    • Added proper error handling for URL processing
    • Improved state management with st.session_state['clean_text']
    • Fixed linting issues and code formatting
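
For reviewers who want a picture of the fetch-then-clean flow described in items 2 and 5, here is a minimal sketch, not the code in app.py. It assumes the page is downloaded with `requests.get` and that failures are turned into a user-facing message. `FetchError`, `fetch_html`, `strip_tags`, and `clean_url_content` are hypothetical names introduced for illustration; `strip_tags` is only a stand-in for the project's `DATA_CLEANERS[".html"]`, whose real behavior may differ.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Tuple


class FetchError(Exception):
    """Hypothetical wrapper for any network or HTTP failure."""


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page with requests; raise FetchError on any failure."""
    import requests  # imported lazily so the rest of the sketch runs without it

    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # treat 4xx/5xx responses as errors
    except requests.RequestException as exc:
        raise FetchError(str(exc)) from exc
    return response.text


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> bodies."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def strip_tags(html: str) -> str:
    """Hypothetical stand-in for DATA_CLEANERS[".html"]."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def clean_url_content(
    url: str,
    fetch: Callable[[str], str] = fetch_html,
    clean: Callable[[str], str] = strip_tags,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (clean_text, error_message); exactly one of the two is None."""
    if not url.lower().startswith(("http://", "https://")):
        return None, "Please enter a URL starting with http:// or https://"
    try:
        html = fetch(url)
    except FetchError as exc:
        return None, f"Error fetching URL: {exc}"
    return clean(html), None
```

Passing `fetch` and `clean` as parameters is a choice made for this sketch only: it lets the error path be exercised without network access.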

Closes #32

How to test it

  1. Run the demo:

     python -m streamlit run demo/app.py

  2. Test URL input:

  3. Test error handling:

    • Try an invalid URL
    • Verify that an error message is displayed
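
The manual error-handling check above could also be automated. The following is a rough sketch in the style of a pytest module, not the contents of test_url_input.py. `process_url`, `failing_fetch`, and `MAX_CHARS` are hypothetical: the PR does not state the real handler name or the actual size limit, and rejecting oversized content is only one possible reading of "size limit checks".

```python
from typing import Callable, Optional, Tuple

# Hypothetical limit; the PR does not state the real value.
MAX_CHARS = 1_000_000


def process_url(
    url: str, fetch: Callable[[str], str]
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in for the demo's URL handler: returns (text, error)."""
    try:
        text = fetch(url)
    except OSError as exc:  # ConnectionError and TimeoutError are OSError subclasses
        return None, f"Error fetching URL: {exc}"
    if len(text) > MAX_CHARS:
        return None, "Content exceeds size limit"
    return text, None


def failing_fetch(url: str) -> str:
    """Simulates an unreachable host without touching the network."""
    raise ConnectionError("unreachable host")


def test_invalid_url_reports_error():
    text, error = process_url("http://invalid.invalid", failing_fetch)
    assert text is None
    assert "unreachable host" in error


def test_oversized_content_is_rejected():
    text, error = process_url("http://example.com", lambda u: "x" * (MAX_CHARS + 1))
    assert text is None
    assert error == "Content exceeds size limit"


def test_valid_content_passes_through():
    text, error = process_url("http://example.com", lambda u: "hello")
    assert text == "hello"
    assert error is None
```

Saved as a file, the three test functions would be collected by pytest as written; none of them needs network access because the fetcher is injected.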

Additional notes for reviewers

  • Uses existing DATA_CLEANERS[".html"] for consistency with file upload
  • Error handling matches Mozilla's error handling patterns
  • Integration tests verify same cleaning quality as file upload
  • Documentation updated to maintain consistency
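
To illustrate the first note, here is a simplified sketch of how the file-upload path and the URL path might both route through one cleaner registry and write to the same `clean_text` state key. It is an assumption-laden illustration, not the project's code: `clean_html` is a deliberately naive regex cleaner, the registry contents are invented, and `clean_uploaded_file` and `clean_fetched_page` are hypothetical helper names. In the Streamlit app, `state` would be `st.session_state`; a plain dict works the same way here.

```python
import os
import re
from typing import Callable, Dict, MutableMapping


def clean_html(raw: str) -> str:
    """Hypothetical simplified cleaner: drop tags, then collapse whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", raw)
    return re.sub(r"\s+", " ", without_tags).strip()


# Hypothetical registry mirroring the DATA_CLEANERS name used in the PR;
# the real mapping and its cleaners may differ.
DATA_CLEANERS: Dict[str, Callable[[str], str]] = {
    ".html": clean_html,
    ".txt": str.strip,
}


def clean_uploaded_file(state: MutableMapping, filename: str, raw: str) -> None:
    """File-upload path: choose the cleaner from the file extension."""
    extension = os.path.splitext(filename)[1].lower()
    state["clean_text"] = DATA_CLEANERS[extension](raw)


def clean_fetched_page(state: MutableMapping, html: str) -> None:
    """URL path: fetched pages always go through the .html cleaner."""
    state["clean_text"] = DATA_CLEANERS[".html"](html)
```

Because both paths call the same registry entry, identical HTML yields identical `clean_text` whether it arrives as an upload or from a URL, which is the consistency property the integration tests are described as checking.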

I already...

  • Tested in local environment and verified functionality
  • Added integration tests in tests/integration/test_url_input.py
  • Updated documentation in README and step-by-step guide
  • Fixed linting issues and restored proper code formatting
  • Verified both URL and file upload functionality work together
  • Ran and passed all integration tests for both features

fivestarspicy added 4 commits January 3, 2025 11:50
- Add URL text input and clean button
- Implement URL content fetching and cleaning
- Add integration tests
- Update documentation
- Maintain same cleaning quality as file upload

Closes mozilla-ai#32
Tests:
- Add content quality test
- Add size limit test
- Remove invalid content test
- Follow patterns from test_data_load_and_clean.py

Docs:
- Update README to include URL input option
- Clarify document preprocessing steps

@stefanfrench stefanfrench left a comment

@fivestarspicy - Thanks very much for the contribution!

I have tested the URL input end-to-end and it worked with no issues. I also tested with an invalid URL, which worked as expected. The existing upload method also still works.

Looks good to me - approved.

@stefanfrench stefanfrench merged commit 09faf73 into mozilla-ai:main Jan 6, 2025
2 checks passed
Development

Successfully merging this pull request may close these issues.

Add Website URL input demo app