deploy: 110f0b7
arxyzan committed Nov 22, 2023
0 parents commit e3f0985
Showing 566 changed files with 168,464 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 46dac3e782f238b726e25024d3fa6112
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added .doctrees/contributing.doctree
Binary file not shown.
Binary file added .doctrees/environment.pickle
Binary file not shown.
Binary file added .doctrees/get_started/index.doctree
Binary file not shown.
Binary file added .doctrees/get_started/installation.doctree
Binary file not shown.
Binary file added .doctrees/get_started/overview.doctree
Binary file not shown.
Binary file added .doctrees/get_started/quick_tour.doctree
Binary file not shown.
Binary file added .doctrees/guide/advanced_training.doctree
Binary file not shown.
Binary file added .doctrees/guide/hezar_architecture.doctree
Binary file not shown.
Binary file added .doctrees/guide/index.doctree
Binary file not shown.
Binary file added .doctrees/guide/models_advanced.doctree
Binary file not shown.
Binary file added .doctrees/index.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.builders.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.configs.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.constants.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.data.datasets.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.data.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.embeddings.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.accuracy.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.bleu.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.cer.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.f1.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.metric.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.precision.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.recall.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.rouge.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.seqeval.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.wer.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.models.backbone.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.models.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.models.image2text.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.models.model.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.preprocessors.doctree
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.registry.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.trainer.doctree
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.trainer.trainer.doctree
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.audio_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.common_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.data_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.file_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.hub_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.image_utils.doctree
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.logging.doctree
Binary file not shown.
Binary file not shown.
Binary file added .doctrees/source/index.doctree
Binary file not shown.
Binary file added .doctrees/source/modules.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/datasets.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/embeddings.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/index.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/models.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/preprocessors.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/training.doctree
Binary file not shown.
Empty file added .nojekyll
Empty file.
662 changes: 662 additions & 0 deletions _modules/hezar/builders.html

Large diffs are not rendered by default.

911 changes: 911 additions & 0 deletions _modules/hezar/configs.html

Large diffs are not rendered by default.

656 changes: 656 additions & 0 deletions _modules/hezar/constants.html

Large diffs are not rendered by default.

786 changes: 786 additions & 0 deletions _modules/hezar/data/data_collators.html

Large diffs are not rendered by default.

591 changes: 591 additions & 0 deletions _modules/hezar/data/datasets/dataset.html

Large diffs are not rendered by default.

709 changes: 709 additions & 0 deletions _modules/hezar/data/datasets/ocr_dataset.html

Large diffs are not rendered by default.

672 changes: 672 additions & 0 deletions _modules/hezar/data/datasets/sequence_labeling_dataset.html

Large diffs are not rendered by default.

638 changes: 638 additions & 0 deletions _modules/hezar/data/datasets/text_classification_dataset.html

Large diffs are not rendered by default.

645 changes: 645 additions & 0 deletions _modules/hezar/data/datasets/text_summarization_dataset.html

Large diffs are not rendered by default.

913 changes: 913 additions & 0 deletions _modules/hezar/embeddings/embedding.html

Large diffs are not rendered by default.

751 changes: 751 additions & 0 deletions _modules/hezar/embeddings/fasttext.html

Large diffs are not rendered by default.

770 changes: 770 additions & 0 deletions _modules/hezar/embeddings/word2vec.html

Large diffs are not rendered by default.

574 changes: 574 additions & 0 deletions _modules/hezar/metrics/accuracy.html

Large diffs are not rendered by default.

572 changes: 572 additions & 0 deletions _modules/hezar/metrics/bleu.html

Large diffs are not rendered by default.

608 changes: 608 additions & 0 deletions _modules/hezar/metrics/cer.html

Large diffs are not rendered by default.

597 changes: 597 additions & 0 deletions _modules/hezar/metrics/f1.html

Large diffs are not rendered by default.

528 changes: 528 additions & 0 deletions _modules/hezar/metrics/metric.html

Large diffs are not rendered by default.

600 changes: 600 additions & 0 deletions _modules/hezar/metrics/precision.html

Large diffs are not rendered by default.

600 changes: 600 additions & 0 deletions _modules/hezar/metrics/recall.html

Large diffs are not rendered by default.

592 changes: 592 additions & 0 deletions _modules/hezar/metrics/rouge.html

Large diffs are not rendered by default.

611 changes: 611 additions & 0 deletions _modules/hezar/metrics/seqeval.html

Large diffs are not rendered by default.

586 changes: 586 additions & 0 deletions _modules/hezar/metrics/wer.html

Large diffs are not rendered by default.

581 changes: 581 additions & 0 deletions _modules/hezar/models/backbone/bert/bert.html

Large diffs are not rendered by default.

515 changes: 515 additions & 0 deletions _modules/hezar/models/backbone/bert/bert_config.html

Large diffs are not rendered by default.

568 changes: 568 additions & 0 deletions _modules/hezar/models/backbone/distilbert/distilbert.html

Large diffs are not rendered by default.

513 changes: 513 additions & 0 deletions _modules/hezar/models/backbone/distilbert/distilbert_config.html

Large diffs are not rendered by default.

580 changes: 580 additions & 0 deletions _modules/hezar/models/backbone/roberta/roberta.html

Large diffs are not rendered by default.

517 changes: 517 additions & 0 deletions _modules/hezar/models/backbone/roberta/roberta_config.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

529 changes: 529 additions & 0 deletions _modules/hezar/models/image2text/crnn/crnn_decode_utils.html

Large diffs are not rendered by default.

614 changes: 614 additions & 0 deletions _modules/hezar/models/image2text/crnn/crnn_image2text.html

Large diffs are not rendered by default.

507 changes: 507 additions & 0 deletions _modules/hezar/models/image2text/crnn/crnn_image2text_config.html

Large diffs are not rendered by default.

601 changes: 601 additions & 0 deletions _modules/hezar/models/image2text/trocr/trocr_image2text.html

Large diffs are not rendered by default.

565 changes: 565 additions & 0 deletions _modules/hezar/models/image2text/trocr/trocr_image2text_config.html

Large diffs are not rendered by default.

601 changes: 601 additions & 0 deletions _modules/hezar/models/image2text/vit_gpt2/vit_gpt2_image2text.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

621 changes: 621 additions & 0 deletions _modules/hezar/models/language_modeling/bert/bert_lm.html

Large diffs are not rendered by default.

515 changes: 515 additions & 0 deletions _modules/hezar/models/language_modeling/bert/bert_lm_config.html

Large diffs are not rendered by default.

613 changes: 613 additions & 0 deletions _modules/hezar/models/language_modeling/distilbert/distilbert_lm.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

620 changes: 620 additions & 0 deletions _modules/hezar/models/language_modeling/roberta/roberta_lm.html

Large diffs are not rendered by default.

517 changes: 517 additions & 0 deletions _modules/hezar/models/language_modeling/roberta/roberta_lm_config.html

Large diffs are not rendered by default.

1,042 changes: 1,042 additions & 0 deletions _modules/hezar/models/model.html

Large diffs are not rendered by default.

604 changes: 604 additions & 0 deletions _modules/hezar/models/model_outputs.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

1,231 changes: 1,231 additions & 0 deletions _modules/hezar/models/speech_recognition/whisper/whisper_tokenizer.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

Large diffs are not rendered by default.

597 changes: 597 additions & 0 deletions _modules/hezar/models/text_generation/gpt2/gpt2_text_generation.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

616 changes: 616 additions & 0 deletions _modules/hezar/models/text_generation/t5/t5_text_generation.html

Large diffs are not rendered by default.

Large diffs are not rendered by default.

833 changes: 833 additions & 0 deletions _modules/hezar/preprocessors/audio_feature_extractor.html

Large diffs are not rendered by default.

749 changes: 749 additions & 0 deletions _modules/hezar/preprocessors/image_processor.html

Large diffs are not rendered by default.

650 changes: 650 additions & 0 deletions _modules/hezar/preprocessors/preprocessor.html

Large diffs are not rendered by default.

646 changes: 646 additions & 0 deletions _modules/hezar/preprocessors/text_normalizer.html

Large diffs are not rendered by default.

611 changes: 611 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/bpe.html

Large diffs are not rendered by default.

616 changes: 616 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/sentencepiece_bpe.html

Large diffs are not rendered by default.

614 changes: 614 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/sentencepiece_unigram.html

Large diffs are not rendered by default.

1,368 changes: 1,368 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/tokenizer.html

Large diffs are not rendered by default.

599 changes: 599 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/wordpiece.html

Large diffs are not rendered by default.

731 changes: 731 additions & 0 deletions _modules/hezar/registry.html

Large diffs are not rendered by default.

693 changes: 693 additions & 0 deletions _modules/hezar/trainer/metrics_handlers.html

Large diffs are not rendered by default.

1,069 changes: 1,069 additions & 0 deletions _modules/hezar/trainer/trainer.html

Large diffs are not rendered by default.

602 changes: 602 additions & 0 deletions _modules/hezar/trainer/trainer_utils.html

Large diffs are not rendered by default.

1,058 changes: 1,058 additions & 0 deletions _modules/hezar/utils/audio_utils.html

Large diffs are not rendered by default.

569 changes: 569 additions & 0 deletions _modules/hezar/utils/common_utils.html

Large diffs are not rendered by default.

597 changes: 597 additions & 0 deletions _modules/hezar/utils/data_utils.html

Large diffs are not rendered by default.

546 changes: 546 additions & 0 deletions _modules/hezar/utils/file_utils.html

Large diffs are not rendered by default.

632 changes: 632 additions & 0 deletions _modules/hezar/utils/hub_utils.html

Large diffs are not rendered by default.

744 changes: 744 additions & 0 deletions _modules/hezar/utils/image_utils.html

Large diffs are not rendered by default.

549 changes: 549 additions & 0 deletions _modules/hezar/utils/integration_utils.html

Large diffs are not rendered by default.

513 changes: 513 additions & 0 deletions _modules/hezar/utils/logging.html

Large diffs are not rendered by default.

674 changes: 674 additions & 0 deletions _modules/hezar/utils/registry_utils.html

Large diffs are not rendered by default.

575 changes: 575 additions & 0 deletions _modules/index.html

Large diffs are not rendered by default.

96 changes: 96 additions & 0 deletions _sources/contributing.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,96 @@
# Contribution Guidelines

Thank you for considering contributing to Hezar. We appreciate your interest in making Hezar even more powerful and
versatile.
Before you start contributing, please take a moment to review the following guidelines.

## Code of Conduct

This project and its community adhere to
the [Contributor Code of Conduct](https://github.com/hezarai/hezar/blob/main/CODE_OF_CONDUCT.md).

## How to Contribute

### Reporting Bugs

If you come across a bug or unexpected behavior, please help us by reporting it.
Use the [GitHub Issue Tracker](https://github.com/hezarai/hezar/issues) to create a detailed bug report.
Include information such as:

- A clear and descriptive title.
- Steps to reproduce the bug.
- Expected behavior.
- Actual behavior.
- Your operating system and Python version.

### Adding features

Have a great idea for a new feature or improvement? We'd love to hear it. Open an issue that describes your suggestion
clearly, along with any thoughts on how it could be implemented. If you can already implement it yourself,
follow the instructions under "Sending a PR" below.

### Adding/Improving documents

Have a suggestion to enhance our documentation or want to contribute entirely new sections? We welcome your input!<br>
Here's how you can get involved:<br>
The docs website is deployed at [https://hezarai.github.io/hezar](https://hezarai.github.io/hezar), and the source for the
docs is located in the [docs](https://github.com/hezarai/hezar/tree/main/docs) folder in the root of the repo. Feel
free to apply your changes or add new docs there. Note that the docs are written in Markdown. If you add
new files, you must include them in the `index.md` file in the same folder. For example, if you've
added the file `new_doc.md` to the `get_started` folder, you have to modify `get_started/index.md` and add your file
name there.

### Commit guidelines

#### Functional best practices

- Ensure only one "logical change" per commit for efficient review and flaw identification.
- Smaller code changes facilitate quicker reviews and easier troubleshooting using Git's bisect capability.
- Avoid mixing whitespace changes with functional code changes.
- Avoid mixing two unrelated functional changes.
- Refrain from sending large new features in a single giant commit.

#### Styling best practices
- Use imperative mood in the subject (e.g., "Add support for ..." not "Adding support for ..." or "Added support for ...").
- Keep the subject line short and concise, preferably fewer than 50 characters.
- Capitalize the subject line and do not end it with a period.
- Wrap body lines at 72 characters.
- Use the body to explain what and why a change was made.
- Do not explain the "how" in the commit message; reserve it for documentation or code.
- For commits referencing an issue or pull request, write the proper commit subject followed by the reference in parentheses (e.g., "Add NFKC normalizer (#9999)").
- Reference codes & paths in back quotes (e.g., `variable`, `method()`, `Class()`, `file.py`).
- Preferably use the following [gitmoji](https://gitmoji.dev/)-compatible codes at the beginning of your commit message:
  - `:bug:`: Fix a bug or issue
  - `:sparkles:`: Add feature or improvements
  - `:recycle:`: Refactor code (backward compatible refactor)
  - `:memo:`: Add or change docs
  - `:pencil2:`: Minor change or improvement
  - `:fire:`: Remove code or file
  - `:boom:`: Introduce breaking or backward-incompatible changes
  - `:bookmark:`: Version release
  - `:adhesive_bandage:`: Non-critical fix
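
Some of the styling rules above can be checked mechanically. The following is a minimal sketch, not part of Hezar's tooling: the helper name `check_commit_subject` is hypothetical, it covers only the length, capitalization, and trailing-period rules, and it assumes that a leading gitmoji code does not count toward the 50-character guideline.

```python
import re

# Hypothetical helper, not part of Hezar: checks a commit subject against
# three of the styling rules listed above.
GITMOJI_PREFIX = re.compile(r"^:[a-z0-9_]+:\s*")


def check_commit_subject(subject):
    """Return a list of rule violations (an empty list means the subject passes)."""
    problems = []
    # Assumption: an optional gitmoji code such as ":bug:" is stripped before checking.
    text = GITMOJI_PREFIX.sub("", subject)
    if len(text) >= 50:
        problems.append("subject should be shorter than 50 characters")
    if text and not text[0].isupper():
        problems.append("subject should start with a capital letter")
    if text.endswith("."):
        problems.append("subject should not end with a period")
    return problems


print(check_commit_subject(":bug: Fix tokenizer padding side"))
print(check_commit_subject("added support for NFKC normalizer."))
```

The first subject passes all three checks, while the second is flagged for its lowercase first letter and trailing period. Rules such as imperative mood still need a human reviewer.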

## Sending a PR

To apply any change to the repo, follow these steps:

1. Fork the Hezar repository.
2. Create a new branch for your feature, bug fix, etc.
3. Make your changes.
4. Update the documentation to reflect your changes.
5. Ensure your code adheres to the [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
6. Format the code using `ruff` (`ruff check --fix .`).
7. Write tests to verify the functionality if needed.
8. Run the tests and make sure all of them pass. (Skip this step if your changes do not involve code.)
9. Open a pull request from your fork and the PR template will be automatically loaded to help you do the rest.
10. Be responsive to feedback and comments during the review process.
11. Thanks for contributing to the Hezar project.😉❤️

## License

By contributing to Hezar, you agree that your contributions will be licensed under
the [MIT License](https://github.com/hezarai/hezar/blob/main/LICENSE).

We look forward to your contributions and appreciate your efforts in making Hezar a powerful AI tool for the Persian
community!
8 changes: 8 additions & 0 deletions _sources/get_started/index.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# Get Started
```{toctree}
:maxdepth: 1

overview.md
installation.md
quick_tour.md
```
41 changes: 41 additions & 0 deletions _sources/get_started/installation.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
# Installation

## Install from PyPI
Installing Hezar is as easy as any other Python library! Most of the requirements are cross-platform and installing
them on any machine is a piece of cake!

```
pip install hezar
```
### Installation variations
Hezar is packed with a lot of tools that depend on other packages. Most of the
time you might not want everything to be installed, so Hezar provides multiple installation
variations that keep the installation light and fast for general use.

You can install the optional dependencies for each variation like so:
```
pip install hezar[nlp] # For natural language processing
pip install hezar[vision] # For computer vision and image processing
pip install hezar[audio] # For audio and speech processing
pip install hezar[embeddings] # For word embeddings
```
Or you can also install everything using:
```
pip install hezar[all]
```
## Install from source
You can also install the development version of the library directly from source:
```
pip install git+https://github.com/hezarai/hezar.git
```

## Test installation
From a Python console or the command line, import `hezar` and print its version:
```python
import hezar

print(hezar.__version__)
```
```
0.23.1
```
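
If you want to check the installation without importing the library itself, the standard library can report the installed version. This is a minimal sketch that uses only `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of Hezar.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


hezar_version = installed_version("hezar")
if hezar_version is None:
    print("hezar is not installed in this environment")
else:
    print(f"hezar {hezar_version} is installed")
```

This only confirms that the package metadata is present. Importing `hezar` as shown above remains the more thorough test.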
20 changes: 20 additions & 0 deletions _sources/get_started/overview.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Overview

Welcome to Hezar! Hezar is a library that makes state-of-the-art machine learning as easy as possible for the Persian
language, built by the Persian community!

In Hezar, the primary goal is to provide plug-and-play AI/ML utilities so that you don't need to know much about what's
going on under the hood. Hezar is not just a model library; it's packed with everything you need for an
ML pipeline, such as datasets, trainers, preprocessors, feature extractors, etc.

Hezar is a library that:
- brings together all the best works in AI for Persian
- makes using AI models as easy as a couple of lines of code
- seamlessly integrates with Hugging Face Hub for all of its models
- has a highly developer-friendly interface
- has a task-based model interface which is more convenient for general users
- is packed with additional tools like word embeddings, tokenizers, feature extractors, etc.
- comes with a lot of supplementary ML tools for deployment, benchmarking, optimization, etc.
- and more!

To find out more, just take the [quick tour](quick_tour.md)!
205 changes: 205 additions & 0 deletions _sources/get_started/quick_tour.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,205 @@
# Quick Tour
Let's take a quick tour of some of the most important features of Hezar!

### Models
There are a number of ready-to-use trained models for different tasks on the Hub!

**🤗Hugging Face Hub Page**: [https://huggingface.co/hezarai](https://huggingface.co/hezarai)

Let's walk you through some examples!

- **Text Classification (sentiment analysis, categorization, etc.)**
```python
from hezar.models import Model

example = ["هزار، کتابخانه‌ای کامل برای به کارگیری آسان هوش مصنوعی"]
model = Model.load("hezarai/bert-fa-sentiment-dksf")
outputs = model.predict(example)
print(outputs)
```
```
[[{'label': 'positive', 'score': 0.812910258769989}]]
```
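
Judging only from the printed output above, the result appears to be a list with one entry per input, where each entry is a list of scored labels. Under that assumption, a small helper (hypothetical, not part of Hezar) can pull out the best label for each input:

```python
# Output copied from the text classification example above.
outputs = [[{'label': 'positive', 'score': 0.812910258769989}]]


def top_labels(batch_outputs):
    """Pick the highest-scoring label for every input in the batch."""
    return [
        max(candidates, key=lambda item: item["score"])["label"]
        for candidates in batch_outputs
    ]


print(top_labels(outputs))
```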
- **Sequence Labeling (POS, NER, etc.)**
```python
from hezar.models import Model

pos_model = Model.load("hezarai/bert-fa-pos-lscp-500k") # Part-of-speech
ner_model = Model.load("hezarai/bert-fa-ner-arman") # Named entity recognition
inputs = ["شرکت هوش مصنوعی هزار"]
pos_outputs = pos_model.predict(inputs)
ner_outputs = ner_model.predict(inputs)
print(f"POS: {pos_outputs}")
print(f"NER: {ner_outputs}")
```
```
POS: [[{'token': 'شرکت', 'label': 'Ne'}, {'token': 'هوش', 'label': 'Ne'}, {'token': 'مصنوعی', 'label': 'AJe'}, {'token': 'هزار', 'label': 'NUM'}]]
NER: [[{'token': 'شرکت', 'label': 'B-org'}, {'token': 'هوش', 'label': 'I-org'}, {'token': 'مصنوعی', 'label': 'I-org'}, {'token': 'هزار', 'label': 'I-org'}]]
```
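
The NER labels above look like BIO-style tags (`B-` opens an entity, `I-` continues it). Assuming that convention, token-level predictions can be merged into whole entities. The sketch below is illustrative only; `group_entities` is a hypothetical helper, not a Hezar function.

```python
# Output copied from the NER example above.
ner_outputs = [[
    {'token': 'شرکت', 'label': 'B-org'},
    {'token': 'هوش', 'label': 'I-org'},
    {'token': 'مصنوعی', 'label': 'I-org'},
    {'token': 'هزار', 'label': 'I-org'},
]]


def group_entities(tagged_tokens):
    """Merge B-/I- tagged tokens into (text, entity_type) spans."""
    entities = []
    current_tokens, current_type = [], None
    for item in tagged_tokens:
        prefix, _, entity_type = item["label"].partition("-")
        if prefix == "B" or (prefix == "I" and entity_type != current_type):
            # A new entity starts: flush the previous one, if any.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [item["token"]], entity_type
        elif prefix == "I":
            # Same entity type as the open span: extend it.
            current_tokens.append(item["token"])
        else:
            # Any other tag (e.g. "O") closes the current entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


print(group_entities(ner_outputs[0]))
```

For the example sentence, all four tokens collapse into a single `org` entity.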
- **Language Modeling (Mask Filling)**
```python
from hezar.models import Model

roberta_mlm = Model.load("hezarai/roberta-fa-mlm")
inputs = ["سلام بچه ها حالتون <mask>"]
outputs = roberta_mlm.predict(inputs, top_k=1)
print(outputs)
```
```
[[{'token': 'چطوره', 'sequence': 'سلام بچه ها حالتون چطوره', 'token_id': 34505, 'score': 0.2230483442544937}]]
```
- **Speech Recognition**
```python
from hezar.models import Model

whisper = Model.load("hezarai/whisper-small-fa")
transcripts = whisper.predict("examples/assets/speech_example.mp3")
print(transcripts)
```
```
[{'text': 'و این تنها محدود به محیط کار نیست'}]
```
- **Image to Text (OCR)**
```python
from hezar.models import Model
# OCR with TrOCR
model = Model.load("hezarai/trocr-base-fa-v2")
texts = model.predict(["examples/assets/ocr_example.jpg"])
print(f"TrOCR Output: {texts}")

# OCR with CRNN
model = Model.load("hezarai/crnn-base-fa-64x256")
texts = model.predict("examples/assets/ocr_example.jpg")
print(f"CRNN Output: {texts}")
```
```
TrOCR Output: [{'text': 'چه میشه کرد، باید صبر کنیم'}]
CRNN Output: [{'text': 'چه میشه کرد، باید صبر کنیم'}]
```
![](https://raw.githubusercontent.com/hezarai/hezar/main/examples/assets/ocr_example.jpg)

- **Image to Text (License Plate Recognition)**
```python
from hezar.models import Model

model = Model.load("hezarai/crnn-fa-64x256-license-plate-recognition")
plate_text = model.predict("examples/assets/license_plate_ocr_example.jpg")
print(plate_text) # Persian text of mixed numbers and characters might not show correctly in the console
```
```
[{'text': '۵۷س۷۷۹۷۷'}]
```
![](https://raw.githubusercontent.com/hezarai/hezar/main/examples/assets/license_plate_ocr_example.jpg)

- **Image to Text (Image Captioning)**
```python
from hezar.models import Model

model = Model.load("hezarai/vit-roberta-fa-image-captioning-flickr30k")
texts = model.predict("examples/assets/image_captioning_example.jpg")
print(texts)
```
```
[{'text': 'سگی با توپ تنیس در دهانش می دود.'}]
```
![](https://raw.githubusercontent.com/hezarai/hezar/main/examples/assets/image_captioning_example.jpg)

We are constantly adding and training new models, so this section will hopefully expand over time ;)
### Word Embeddings
- **FastText**
```python
from hezar.embeddings import Embedding

fasttext = Embedding.load("hezarai/fasttext-fa-300")
most_similar = fasttext.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7579, 'word': 'میلیون'},
{'score': 0.6943, 'word': '21هزار'},
{'score': 0.6861, 'word': 'میلیارد'},
{'score': 0.6825, 'word': '26هزار'},
{'score': 0.6803, 'word': '٣هزار'}]
```
- **Word2Vec (Skip-gram)**
```python
from hezar.embeddings import Embedding

word2vec = Embedding.load("hezarai/word2vec-skipgram-fa-wikipedia")
most_similar = word2vec.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7885, 'word': 'چهارهزار'},
{'score': 0.7788, 'word': '۱۰هزار'},
{'score': 0.7727, 'word': 'دویست'},
{'score': 0.7679, 'word': 'میلیون'},
{'score': 0.7602, 'word': 'پانصد'}]
```
- **Word2Vec (CBOW)**
```python
from hezar.embeddings import Embedding

word2vec = Embedding.load("hezarai/word2vec-cbow-fa-wikipedia")
most_similar = word2vec.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7407, 'word': 'دویست'},
{'score': 0.7400, 'word': 'میلیون'},
{'score': 0.7326, 'word': 'صد'},
{'score': 0.7276, 'word': 'پانصد'},
{'score': 0.7011, 'word': 'سیصد'}]
```
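
The text above does not say how the similarity scores are computed. Cosine similarity is a common choice for word embeddings, so the sketch below assumes it. The vectors are toy 3-dimensional values invented for illustration, and the `most_similar` function here is a stand-in written for this example, not Hezar's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, invented for illustration only.
vectors = {
    "thousand": [0.9, 0.1, 0.0],
    "million": [0.8, 0.3, 0.1],
    "cat": [0.0, 0.2, 0.9],
}


def most_similar(word, top_n=2):
    """Rank every other word by cosine similarity to `word`."""
    scores = [
        {"score": round(cosine_similarity(vectors[word], vec), 4), "word": other}
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda item: item["score"], reverse=True)[:top_n]


print(most_similar("thousand"))
```

With these toy vectors, "million" ranks above "cat" because its vector points in nearly the same direction as the one for "thousand".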
### Datasets
You can load any of the datasets on the [Hub](https://huggingface.co/hezarai) like below:
```python
from hezar.data import Dataset

sentiment_dataset = Dataset.load("hezarai/sentiment-dksf") # A TextClassificationDataset instance
lscp_dataset = Dataset.load("hezarai/lscp-pos-500k") # A SequenceLabelingDataset instance
xlsum_dataset = Dataset.load("hezarai/xlsum-fa") # A TextSummarizationDataset instance
...
```

### Training
Hezar makes it super easy to train models using out-of-the-box models and datasets provided in the library.
```python
from hezar.models import BertSequenceLabeling, BertSequenceLabelingConfig
from hezar.data import Dataset
from hezar.trainer import Trainer, TrainerConfig
from hezar.preprocessors import Preprocessor

base_model_path = "hezarai/bert-base-fa"
dataset_path = "hezarai/lscp-pos-500k"

train_dataset = Dataset.load(dataset_path, split="train", tokenizer_path=base_model_path)
eval_dataset = Dataset.load(dataset_path, split="test", tokenizer_path=base_model_path)

model = BertSequenceLabeling(BertSequenceLabelingConfig(id2label=train_dataset.config.id2label))
preprocessor = Preprocessor.load(base_model_path)

train_config = TrainerConfig(
    task="sequence_labeling",
    device="cuda",
    init_weights_from=base_model_path,
    batch_size=8,
    num_epochs=5,
    checkpoints_dir="checkpoints/",
    metrics=["seqeval"],
)

trainer = Trainer(
    config=train_config,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=train_dataset.data_collator,
    preprocessor=preprocessor,
)
trainer.train()

trainer.push_to_hub("bert-fa-pos-lscp-500k") # push model, config, preprocessor, trainer files and configs
```

Want to go deeper? Check out the [guides](../guide/index.md).
2 changes: 2 additions & 0 deletions _sources/guide/advanced_training.md.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
# Advanced Training
Docs coming soon, stay tuned!