Skip to content

Latest commit

 

History

History
36 lines (23 loc) · 1.57 KB

readme.md

File metadata and controls

36 lines (23 loc) · 1.57 KB

The requested images should be placed in the "./images" directory, and the results will be stored in the "./outputs" directory.

Follow https://github.com/salesforce/LAVIS, https://github.com/facebookresearch/segment-anything, https://github.com/JialianW/GRiT and https://github.com/PaddlePaddle/PaddleOCR.git to prepare the enverionment.

Download GRiT(Dense Captioning on VG Dataset) and place it under ./grit/model_weight.

Download SAM and place it under ./model_weight.

Generation Steps:

  1. Generate global description for each image. python blip2.py

  2. Use the Grit model to generate dense captions for each image. python grit_generate.py

  3. Generate segmentation maps for each image using the SAM model, and save the segmentation maps in the "./masks" directory. python amg.py --checkpoint ./model_weight/<pth name> --model-type <model_type> --input ./images --output ./masks --convert-to-rle

  4. Generate corresponding descriptions for the segmentation maps. python sam_blip.py

  5. Compute the similarity score. python image_text_matching.py --ann_path ./outputs/sam_blip2.json --output_path ./outputs/sam_blip2_score.json

  6. Compute the similarity score. python image_text_matching.py --ann_path ./outputs/grit.json --output_path ./outputs/grit_score.json

  7. Use ppocr to detect text in images.
    python ocr_ppocr.py

  8. Integrate the generated annotations into ann_all.json.
    python add_all_json.py

  9. Use ChatGPT API to generate the final detailed description and save it in ./outputs/ann_all.json.
    python chatgpt.py