Commit

feat: ⛏️ enhanced contribution and precommit added
PeriniM committed Jan 6, 2025
1 parent 21147c4 commit fcbfe78
Showing 129 changed files with 3,174 additions and 1,671 deletions.
150 changes: 150 additions & 0 deletions .gitignore
@@ -42,3 +42,153 @@ lib/
# extras
cache/
run_smart_scraper.py

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
.ruff_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
Pipfile.lock

# poetry
poetry.lock

# pdm
pdm.lock
.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
.idea/

# VS Code
.vscode/

# macOS
.DS_Store

dev.ipynb
23 changes: 23 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,23 @@
repos:
- repo: https://github.com/psf/black
rev: 24.8.0
hooks:
- id: black

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.6.9
hooks:
- id: ruff

- repo: https://github.com/pycqa/isort
rev: 5.13.2
hooks:
- id: isort

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
exclude: mkdocs.yml
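
With a config like the one above in the repository root, the hooks would typically be exercised as follows (commands are illustrative and assume `pre-commit` is available through `uv`, as this commit's CONTRIBUTING guide suggests):

```shell
# Install the git hook scripts defined in .pre-commit-config.yaml
uv run pre-commit install

# Run every hook against all files once (useful right after adding the config)
uv run pre-commit run --all-files

# Bump the pinned "rev:" fields to the latest tagged releases
uv run pre-commit autoupdate
```

After `pre-commit install`, the black, ruff, and isort hooks run automatically on each `git commit` and block the commit if a check fails.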
127 changes: 44 additions & 83 deletions CONTRIBUTING.md
@@ -1,83 +1,44 @@
# Contributing to ScrapeGraphAI

Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome contributions from the community to help improve and grow the project. This document outlines the guidelines and steps for contributing.

## Table of Contents

- [Getting Started](#getting-started)
- [Contributing Guidelines](#contributing-guidelines)
- [Code Style](#code-style)
- [Submitting a Pull Request](#submitting-a-pull-request)
- [Reporting Issues](#reporting-issues)
- [License](#license)

## Getting Started

To get started with contributing, follow these steps:

1. Fork the repository on GitHub **(FROM pre/beta branch)**.
2. Clone your forked repository to your local machine.
3. Install the necessary dependencies from requirements.txt or via pyproject.toml, as you prefer :).
4. Make your changes or additions.
5. Test your changes thoroughly.
6. Commit your changes with descriptive commit messages.
7. Push your changes to your forked repository.
8. Submit a pull request to the pre/beta branch.

N.B. All pull requests to the main branch will be rejected!

## Contributing Guidelines

Please adhere to the following guidelines when contributing to ScrapeGraphAI:

- Follow the code style and formatting guidelines specified in the [Code Style](#code-style) section.
- Make sure your changes are well-documented and include any necessary updates to the project's documentation and requirements if needed.
- Write clear and concise commit messages that describe the purpose of your changes. The last commit before the pull request must follow this format:
- `feat: Add new feature`
- `fix: Correct issue with existing feature`
- `docs: Update documentation`
- `style: Improve formatting and style`
- `refactor: Restructure code`
- `test: Add or update tests`
- `perf: Improve performance`
- Be respectful and considerate towards other contributors and maintainers.

## Code Style

Please make sure to format your code accordingly before submitting a pull request.

### Python

- [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
- [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
- [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/style/)
- [Pylint style of code for the documentation](https://pylint.pycqa.org/en/1.6.0/tutorial.html)

## Submitting a Pull Request

To submit your changes for review, please follow these steps:

1. Ensure that your changes are pushed to your forked repository.
2. Go to the main repository on GitHub and navigate to the "Pull Requests" tab.
3. Click on the "New Pull Request" button.
4. Select your forked repository and the branch containing your changes.
5. Provide a descriptive title and detailed description for your pull request.
6. Reviewers will provide feedback and discuss any necessary changes.
7. Once your pull request is approved, it will be merged into the pre/beta branch.

## Reporting Issues

If you encounter any issues or have suggestions for improvements, please open an issue on the GitHub repository. Provide a clear and detailed description of the problem or suggestion, along with any relevant information or steps to reproduce the issue.

## License

ScrapeGraphAI is licensed under the **MIT License**. See the [LICENSE](LICENSE) file for more information.
By contributing to this project, you agree to license your contributions under the same license.

ScrapeGraphAI uses code from the Langchain
frameworks. You find their original licenses below.

LANGCHAIN LICENSE
https://github.com/langchain-ai/langchain/blob/master/LICENSE

Can't wait to see your contributions! :smile:
# Contributing to ScrapeGraphAI 🚀

Hey there! Thanks for checking out **ScrapeGraphAI**! We're excited to have you here! 🎉

## Quick Start Guide 🏃‍♂️

1. Fork the repository from the **pre/beta branch** 🍴
2. Clone your fork locally 💻
3. Install uv (if you haven't):
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
4. Run `uv sync` (creates virtual env & installs dependencies) ⚡
5. Run `uv run pre-commit install` 🔧
6. Make your awesome changes ✨
7. Test thoroughly 🧪
8. Push & open a PR to the pre/beta branch 🎯

## Contribution Guidelines 📝

Keep it clean and simple:
- Follow our code style (PEP 8 & Google Python Style) 🎨
- Document your changes clearly 📚
- Use these commit prefixes for your final PR commit:
```
feat: ✨ New feature
fix: 🐛 Bug fix
docs: 📚 Documentation
style: 💅 Code style
refactor: ♻️ Code changes
test: 🧪 Testing
perf: ⚡ Performance
```
- Be nice to others! 💝
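
The prefix convention above is easy to check mechanically. Below is a minimal sketch of such a checker — a hypothetical helper, not part of the ScrapeGraphAI codebase:

```python
import re

# Allowed commit-message prefixes, mirroring the list in the guidelines above.
PREFIXES = ("feat", "fix", "docs", "style", "refactor", "test", "perf")
PATTERN = re.compile(rf"^({'|'.join(PREFIXES)}): .+")

def is_valid_commit_message(message: str) -> bool:
    """Return True if the first line of the message uses an allowed prefix."""
    first_line = message.splitlines()[0] if message else ""
    return bool(PATTERN.match(first_line))

print(is_valid_commit_message("feat: ✨ New feature"))  # True
print(is_valid_commit_message("update stuff"))          # False
```

A script like this could be wired into a local `commit-msg` hook if stricter enforcement is ever wanted.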

## Need Help? 🤔

Found a bug or have a cool idea? Open an issue and let's chat! 💬

## License 📜

MIT Licensed. See [LICENSE](LICENSE) file for details.

Let's build something amazing together! 🌟
49 changes: 49 additions & 0 deletions Makefile
@@ -0,0 +1,49 @@
# Makefile for Project Automation

.PHONY: install lint type-check test pre-commit build all clean

# Variables
PACKAGE_NAME = scrapegraphai
TEST_DIR = tests

# Default target
all: lint type-check test

# Install project dependencies
install:
uv sync
uv run pre-commit install

# Linting and Formatting Checks
lint:
uv run ruff check $(PACKAGE_NAME) $(TEST_DIR)
uv run black --check $(PACKAGE_NAME) $(TEST_DIR)
uv run isort --check-only $(PACKAGE_NAME) $(TEST_DIR)

# Type Checking with MyPy
type-check:
uv run mypy $(PACKAGE_NAME) $(TEST_DIR)

# Run Tests with Coverage
test:
uv run pytest --cov=$(PACKAGE_NAME) --cov-report=xml $(TEST_DIR)/

# Run Pre-Commit Hooks
pre-commit:
uv run pre-commit run --all-files

# Clean Up Generated Files
clean:
rm -rf dist/
rm -rf build/
rm -rf *.egg-info
rm -rf htmlcov/
rm -rf .mypy_cache/
rm -rf .pytest_cache/
rm -rf .ruff_cache/
rm -rf .uv/
rm -rf .venv/

# Build the Package
build:
uv build --no-sources
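
With this Makefile in place, the everyday developer loop would look roughly like the following (illustrative — each target simply wraps the `uv run` commands shown above):

```shell
make install     # uv sync + install the pre-commit hooks
make lint        # ruff check, black --check, isort --check-only
make type-check  # mypy over the package and tests
make test        # pytest with XML coverage report
make all         # default target: lint + type-check + test
make clean       # remove build artifacts, caches, and the virtual env
```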
11 changes: 7 additions & 4 deletions examples/openai/smart_scraper_openai.py
@@ -1,9 +1,12 @@
"""
"""
Basic example of scraping pipeline using SmartScraper
"""
import os

import json
import os

from dotenv import load_dotenv

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

@@ -17,7 +20,7 @@
graph_config = {
"llm": {
"api_key": os.getenv("OPENAI_API_KEY"),
"model": "openai/gpt-4o",
"model": "openai/gpt-4o-mini",
},
"verbose": True,
"headless": False,
@@ -30,7 +33,7 @@
smart_scraper_graph = SmartScraperGraph(
prompt="Extract me the first article",
source="https://www.wired.com",
config=graph_config
config=graph_config,
)

result = smart_scraper_graph.run()
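
For reference, the `graph_config` dictionary passed to `SmartScraperGraph` in the example above has this shape (a sketch only — the model name shown here is an assumption for illustration, and the real run requires a valid `OPENAI_API_KEY`):

```python
import os

# Sketch of the configuration dictionary used by the example.
# "llm" selects the provider/model; "verbose" and "headless" control
# logging output and browser visibility during scraping.
graph_config = {
    "llm": {
        "api_key": os.getenv("OPENAI_API_KEY", "sk-placeholder"),
        "model": "openai/gpt-4o-mini",  # assumed model name for illustration
    },
    "verbose": True,
    "headless": False,
}

assert set(graph_config) == {"llm", "verbose", "headless"}
```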
