Sorting post-HITL Documents by language

This project sorts Document.json files that have been through Human-in-the-Loop (HITL) review into separate Cloud Storage buckets, one per language, using the languages detected by Document AI. If Document AI detects more than one language in a document, the file is sorted by the most frequent one.
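To illustrate the "most frequent language" rule, here is a minimal sketch that works on a Document.json file already loaded as a Python dict. It assumes the languages are read from the `pages[].detectedLanguages[].languageCode` fields of the Document AI JSON output and that "most frequent" means the code that appears most often across pages. The function name `most_frequent_language` is hypothetical, and the sample's own logic in `main.py` may count languages differently.

```python
from collections import Counter
from typing import Optional


def most_frequent_language(document: dict) -> Optional[str]:
    """Return the language code that appears most often across all pages.

    Assumption: languages are taken from pages[].detectedLanguages[].languageCode.
    Returns None when no language was detected.
    """
    counts = Counter()
    for page in document.get("pages", []):
        for lang in page.get("detectedLanguages", []):
            code = lang.get("languageCode")
            if code:
                counts[code] += 1
    if not counts:
        return None
    # most_common(1) returns a list with one (code, count) pair
    return counts.most_common(1)[0][0]


# Hypothetical two-page document: "en" appears twice, "fr" once
sample = {
    "pages": [
        {"detectedLanguages": [{"languageCode": "en", "confidence": 0.9}]},
        {
            "detectedLanguages": [
                {"languageCode": "fr", "confidence": 0.6},
                {"languageCode": "en", "confidence": 0.4},
            ]
        },
    ]
}
print(most_frequent_language(sample))  # prints "en"
```

Counting occurrences ignores the `confidence` values. Weighting by confidence would be a reasonable alternative if page-level detections are noisy.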

Running the sample

Install the prerequisites: pip install -r requirements.txt

Update the following values with information from your project:

PROJECT_ID = "YOUR PROJECT ID"

# Output Files from Human-in-the-loop
GCS_HITL_BUCKET = "input-bucket"
GCS_HITL_PREFIX = "input-directory"

# Output Bucket names will be in the format of GCS_OUTPUT_BUCKET_PREFIX + language
GCS_OUTPUT_BUCKET_PREFIX = "output-bucket-"

Run the sample: python main.py
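As a rough illustration of the output bucket naming described in the configuration comments, the destination bucket for a document is the prefix followed by its language code. The helper `output_bucket_name` below is hypothetical and not part of the sample. Lowercasing the code is an added assumption, made because Cloud Storage bucket names cannot contain uppercase letters while some language codes (for example `zh-Hant`) do.

```python
GCS_OUTPUT_BUCKET_PREFIX = "output-bucket-"


def output_bucket_name(language_code: str, prefix: str = GCS_OUTPUT_BUCKET_PREFIX) -> str:
    """Build the destination bucket name as prefix + language code.

    Assumption: the code is lowercased so the result is a valid bucket name.
    """
    return prefix + language_code.lower()


print(output_bucket_name("en"))       # prints "output-bucket-en"
print(output_bucket_name("zh-Hant"))  # prints "output-bucket-zh-hant"
```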
