Segment crashes #94

Closed
rue-a opened this issue Feb 28, 2023 · 3 comments

@rue-a
rue-a commented Feb 28, 2023

ocrd_cis/ocropy/common.py:643: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  sepslices = np.array(sepslices)
15:55:07.138 INFO processor.OcropySegment - Found 170 text lines for page "SBB-CROP_Ansiedlung_Korotschin_UZS_Sign_22a_0003"
15:56:49.378 INFO processor.OcropySegment - Found 84 text regions for page "SBB-CROP_Ansiedlung_Korotschin_UZS_Sign_22a_0003"
15:56:55.435 WARNING processor.OcropySegment - Label 1 contour 1 is too small (157/4808) in region "SBB-CROP_Ansiedlung_Korotschin_UZS_Sign_22a_0003"
Traceback (most recent call last):
  File "/data/ocr-d/ocrd_all/venv/bin/ocrd-cis-ocropy-segment", line 8, in <module>
    sys.exit(ocrd_cis_ocropy_segment())
  File "click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "ocrd_cis/ocropy/cli.py", line 53, in ocrd_cis_ocropy_segment
    return ocrd_cli_wrap_processor(OcropySegment, *args, **kwargs)
  File "ocrd/decorators/__init__.py", line 117, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "ocrd/processor/helpers.py", line 107, in run_processor
    processor.process()
  File "ocrd_cis/ocropy/segment.py", line 406, in process
    input_file.pageId, zoom, rogroup=rogroup)
  File "ocrd_cis/ocropy/segment.py", line 680, in _process_element
    min_area=640/zoom/zoom)
  File "ocrd_cis/ocropy/segment.py", line 232, in masks2polygons
    for baseline in baselines], name)
  File "ocrd_cis/ocropy/segment.py", line 232, in <listcomp>
    for baseline in baselines], name)
  File "shapely/geometry/base.py", line 582, in intersection
    return shapely.intersection(self, other, grid_size=grid_size)
  File "shapely/decorators.py", line 77, in wrapped
    return func(*args, **kwargs)
  File "shapely/set_operations.py", line 133, in intersection
    return lib.intersection(a, b, **kwargs)
FloatingPointError: invalid value encountered in intersection
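
The `VisibleDeprecationWarning` at the top of this log is separate from the crash, but NumPy 1.24 and later turn ragged array creation into a `ValueError` unless `dtype=object` is requested. A minimal sketch of packing variable-length items into a 1-D object array follows; the helper name `to_object_array` and the sample data are hypothetical and not taken from `ocrd_cis`:

```python
import numpy as np

def to_object_array(items):
    """Pack arbitrary Python objects into a 1-D object array, one item per element."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        # Integer indexing on an object array stores the item itself,
        # so tuples of differing lengths are kept as-is.
        arr[i] = item
    return arr

# Hypothetical stand-in for `sepslices`: tuples of slices with differing lengths.
sepslices = [(slice(0, 5), slice(2, 8)), (slice(3, 9),)]
packed = to_object_array(sepslices)
print(packed.shape, packed.dtype)
```

Filling a preallocated array keeps the result one-dimensional even when all items happen to have the same length, which `np.array(items, dtype=object)` does not guarantee.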
@MehmedGIT
MehmedGIT commented Sep 26, 2023

I am facing a similar issue:

WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropyResegment:baseline part crosses existing x in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:852: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  baseline.type in ['Point', 'MultiPoint']):
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:859: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if (baseline.type == 'GeometryCollection' or
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:860: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  baseline.type.startswith('Multi')):
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:852: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  baseline.type in ['Point', 'MultiPoint']):
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:859: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  if (baseline.type == 'GeometryCollection' or
/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py:860: ShapelyDeprecationWarning: The 'type' attribute is deprecated, and will be removed in the future. You can use the 'geom_type' attribute instead.
  baseline.type.startswith('Multi')):
WARNING:processor.OcropySegment:Label 204 contour 10 is too small (133/2097) in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
WARNING:processor.OcropySegment:Label 204 contour 9 is too small (193/2097) in region "FILE_0025_OCR-D-BIN-DENOISE-DESKEW"
12:03:54.743 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-cis-ocropy-segment'
Traceback (most recent call last):
  File "/home/mm/Desktop/core/ocrd/ocrd/processor/helpers.py", line 129, in run_processor
    processor.process()
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py", line 322, in process
    input_file.pageId, zoom, rogroup=rogroup)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py", line 596, in _process_element
    min_area=640/zoom/zoom)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py", line 148, in masks2polygons
    for baseline in baselines], name)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/ocrd_cis/ocropy/segment.py", line 148, in <listcomp>
    for baseline in baselines], name)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/shapely/geometry/base.py", line 582, in intersection
    return shapely.intersection(self, other, grid_size=grid_size)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/shapely/decorators.py", line 77, in wrapped
    return func(*args, **kwargs)
  File "/home/mm/venv37-ocrd/lib/python3.7/site-packages/shapely/set_operations.py", line 133, in intersection
    return lib.intersection(a, b, **kwargs)
shapely.errors.GEOSException: TopologyException: Input geom 1 is invalid: Ring Self-intersection at or near point 657 659 at 657 659

for the following image (FILE_0025_DEFAULT.jpg of mets):
[image: FILE_0025_DEFAULT]

in a workflow having the following steps:

cis-ocropy-binarize -I DEFAULT -O OCR-D-BIN
anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP
skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li
skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page
tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page
cis-ocropy-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P level-of-operation page
cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-LINE-RESEG-DEWARP
calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0
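
The `TopologyException` in the traceback above says that one input polygon has a self-intersecting ring. One general way to guard an intersection against such input is to repair the polygon first with Shapely's `make_valid`. This is only a sketch under that assumption, not necessarily what the `ocrd_cis` maintainers did; `safe_intersection` and the bowtie example are hypothetical:

```python
from shapely.geometry import LineString, Polygon
from shapely.validation import explain_validity, make_valid

def safe_intersection(polygon, baseline):
    """Intersect a baseline with a polygon, repairing the polygon if it is invalid."""
    if not polygon.is_valid:
        # make_valid may return a MultiPolygon or a GeometryCollection,
        # so callers should check geom_type on the result.
        polygon = make_valid(polygon)
    return polygon.intersection(baseline)

# A "bowtie" ring crosses itself at (1, 1), similar in kind to the reported error.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
baseline = LineString([(-1, 0.5), (3, 0.5)])

print(explain_validity(bowtie))
result = safe_intersection(bowtie, baseline)
# geom_type replaces the deprecated `type` attribute from the warnings above.
print(result.geom_type, result.length)
```

Because the repaired geometry can be a multi-part or mixed collection, any code that consumes the result still has to handle more than a single `LineString`.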

@stweil
Contributor

stweil commented Jan 13, 2024

cis_ocropy_segment also crashes in the QuiVer benchmark tests, see OCR-D/quiver-benchmarks#22:

[...]
Launching `/app/workflows/workspaces/ballenstedt_delatio_1777_selected_pages_ocr/data/ballenstedt_delatio_1777/selected_pages_ocr.txt.nf` [stoic_turing] DSL2 - revision: 8ad3dbf42c
[...]
executor >  local (6)
[88/c15647] process > ocrd_cis_ocropy_binarize_0 [100%] 1 of 1 ✔
[a7/023237] process > ocrd_tesserocr_crop_1      [100%] 1 of 1 ✔
[e4/720726] process > ocrd_skimage_binarize_2    [100%] 1 of 1 ✔
[86/34c9af] process > ocrd_skimage_denoise_3     [100%] 1 of 1 ✔
[05/44d14c] process > ocrd_tesserocr_deskew_4    [100%] 1 of 1 ✔
[92/b8e041] process > ocrd_cis_ocropy_segment_5  [  0%] 0 of 1
[-        ] process > ocrd_cis_ocropy_dewarp_6   -
[-        ] process > ocrd_calamari_recognize_7  -
ERROR ~ Error executing process > 'ocrd_cis_ocropy_segment_5'

Caused by:
  Process `ocrd_cis_ocropy_segment_5` terminated with an error exit status (1)

Command executed:

  ocrd-cis-ocropy-segment -m mets.xml -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -p '{"level-of-operation": "page"}'

Command exit status:
  1

Command output:
  (empty)

Command error:
  21:31:06.567 INFO processor.OcropySegment - Found 5 separators for page "OCR-D-BIN-DENOISE-DESKEW_00005"
  21:31:06.674 ERROR ocrd.processor.helpers.run_processor - Failure in processor 'ocrd-cis-ocropy-segment'
  Traceback (most recent call last):
    File "/build/core/ocrd/ocrd/processor/helpers.py", line 128, in run_processor
      processor.process()
    File "/usr/local/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 404, in process
      self._process_element(page, ignore, page_image, page_coords,
    File "/usr/local/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 750, in _process_element
      sep_polygons, _ = masks2polygons(seplines, None, element_bin,
    File "/usr/local/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 139, in masks2polygons
      hole_idx = np.argmin([cv2.pointPolygonTest(contour, tuple(pt[0]), True)
    File "/usr/local/lib/python3.8/site-packages/ocrd_cis/ocropy/segment.py", line 139, in <listcomp>
      hole_idx = np.argmin([cv2.pointPolygonTest(contour, tuple(pt[0]), True)
  cv2.error: OpenCV(4.7.0) :-1: error: (-5:Bad argument) in function 'pointPolygonTest'
  > Overload resolution failed:
  >  - Can't parse 'pt'. Sequence item with index 0 has a wrong type
  >  - Can't parse 'pt'. Sequence item with index 0 has a wrong type
[...]

@bertsky
Collaborator

bertsky commented Jan 18, 2024

I'm pretty sure the OP's problem happened on an outdated version (so the original problem has been fixed).

Regarding @MehmedGIT's description, thanks for the detailed report. This likewise does not look like the version we have been using in ocrd_all (from the fix-alpha-shape branch, last changed in August). Also, in my case the workflow runs through. Here's the result for that page (OCR-D-OCR):

[image: page0025-segmentation]

– pretty bad indeed, but not crashing. (Ocropy cannot cope with empty pages, because it relies on connected-component statistics, which in this case will be just noise from the binarization, no actual glyphs.)

@stweil your version is definitely outdated, I remember having fixed that long ago.

bertsky closed this as completed Jan 18, 2024