Update Thu Oct 10 15:24:47 CEST 2024
kba committed Oct 10, 2024
1 parent ba08569 commit af3bfdf
Showing 3 changed files with 53 additions and 59 deletions.
108 changes: 51 additions & 57 deletions en/workflows.html
@@ -588,16 +588,6 @@ <h2>
<li><a href="#example-with-ocrd-process">Example with ocrd-process</a></li>
</ul>
</li>
<li><a href="#best-results-for-selected-pages">Best results for selected pages</a>
<ul>
<li><a href="#example-with-ocrd-process-1">Example with ocrd-process</a></li>
</ul>
</li>
<li><a href="#good-results-for-slower-processors">Good results for slower processors</a>
<ul>
<li><a href="#example-with-ocrd-process-2">Example with ocrd-process</a></li>
</ul>
</li>
</ul>
</li>
</ul>
@@ -2293,10 +2283,10 @@ <h3 id="example-with-ocrd-process">Example with ocrd-process</h3>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ocrd process <span class="s2">"tesserocr-recognize -P segmentation_level region -P textequiv_level word -P find_tables true -P model GT4HistOCR_50000000.997_191951"</span>
</code></pre></div></div>

<h2 id="best-results-for-selected-pages">Best results for selected pages</h2>
<!-- ## Best results for selected pages
<p>The following workflow has produced best results for simple pages (e.g. <a href="https://ocr-d-repo.scc.kit.edu/api/v1/dataresources/dda89351-7596-46eb-9736-593a5e9593d3/data/bagit/data/OCR-D-IMG/OCR-D-IMG_0004.tif">this
page</a>) (CER ~1%).</p>
The following workflow has produced best results for 'simple' pages (e.g. [this
page](https://ocr-d-repo.scc.kit.edu/api/v1/dataresources/dda89351-7596-46eb-9736-593a5e9593d3/data/bagit/data/OCR-D-IMG/OCR-D-IMG_0004.tif)) (CER ~1%).
<table class="processor-table">
<thead>
@@ -2350,32 +2340,34 @@ <h2 id="best-results-for-selected-pages">Best results for selected pages</h2>
</tbody>
</table>
<h3 id="example-with-ocrd-process-1">Example with ocrd-process</h3>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ocrd process <span class="se">\</span>
<span class="s2">"cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN"</span> <span class="se">\</span>
<span class="s2">"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"</span> <span class="se">\</span>
<span class="s2">"skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li"</span> <span class="se">\</span>
<span class="s2">"skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page"</span> <span class="se">\</span>
<span class="s2">"tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page"</span> <span class="se">\</span>
<span class="s2">"cis-ocropy-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P level-of-operation page"</span> <span class="se">\</span>
<span class="s2">"cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-LINE-RESEG-DEWARP"</span> <span class="se">\</span>
<span class="s2">"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0"</span>
</code></pre></div></div>

<p><strong>Note:</strong>
(1) This workflow expects your images to be stored in a folder called <code class="language-plaintext highlighter-rouge">OCR-D-IMG</code>. If your images are saved in a different folder,
you need to adjust <code class="language-plaintext highlighter-rouge">-I OCR-D-IMG</code> in the second line of the call above with the name of your folder, e.g. <code class="language-plaintext highlighter-rouge">-I MAX</code>
(2) For the last processor in this workflow, <code class="language-plaintext highlighter-rouge">ocrd-calamari-recognize</code>, you need to specify the model which is to be used.
If you didn’t download it via the <a href="https://ocr-d.de/en/models">OCR-D resource manager</a>, you have to use the <code class="language-plaintext highlighter-rouge">checkpoint</code> parameter
and pass your local path to the model on your hard drive as parameter value! In this case, the last line of the <code class="language-plaintext highlighter-rouge">ocrd-process</code> call above could e.g. look like this:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="s2">"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /test/data/calamari_models/</span><span class="se">\*</span><span class="s2">.ckpt.json"</span>
</code></pre></div></div>
<p>All the other lines can just be copied and pasted.</p>

<h2 id="good-results-for-slower-processors">Good results for slower processors</h2>

<p>If your computer is not that powerful you may try this workflow. It works fine for simple pages and produces also good results in shorter time.</p>
### Example with ocrd-process
```sh
ocrd process \
"cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
"skimage-binarize -I OCR-D-CROP -O OCR-D-BIN2 -P method li" \
"skimage-denoise -I OCR-D-BIN2 -O OCR-D-BIN-DENOISE -P level-of-operation page" \
"tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
"cis-ocropy-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P level-of-operation page" \
"cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-LINE-RESEG-DEWARP" \
"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0"
```
**Note:**
(1) This workflow expects your images to be stored in a folder called `OCR-D-IMG`. If your images are saved in a different folder,
you need to adjust `-I OCR-D-IMG` in the second line of the call above with the name of your folder, e.g. `-I MAX`
(2) For the last processor in this workflow, `ocrd-calamari-recognize`, you need to specify the model which is to be used.
If you didn't download it via the [OCR-D resource manager](https://ocr-d.de/en/models), you have to use the `checkpoint` parameter
and pass your local path to the model on your hard drive as parameter value! In this case, the last line of the `ocrd-process` call above could e.g. look like this:
```sh
"calamari-recognize -I OCR-D-SEG-LINE-RESEG-DEWARP -O OCR-D-OCR -P checkpoint /test/data/calamari_models/\*.ckpt.json"
```
All the other lines can just be copied and pasted.
## Good results for slower processors
If your computer is not that powerful you may try this workflow. It works fine for simple pages and produces also good results in shorter time.
<table class="processor-table">
<thead>
@@ -2424,26 +2416,28 @@ <h2 id="good-results-for-slower-processors">Good results for slower processors</
</tbody>
</table>
<h3 id="example-with-ocrd-process-2">Example with ocrd-process</h3>

<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ocrd process <span class="se">\</span>
<span class="s2">"cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN"</span> <span class="se">\</span>
<span class="s2">"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP"</span> <span class="se">\</span>
<span class="s2">"skimage-denoise -I OCR-D-CROP -O OCR-D-BIN-DENOISE -P level-of-operation page"</span> <span class="se">\</span>
<span class="s2">"tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page"</span> <span class="se">\</span>
<span class="s2">"tesserocr-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P shrink_polygons true"</span> <span class="se">\</span>
<span class="s2">"cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-DEWARP"</span> <span class="se">\</span>
<span class="s2">"tesserocr-recognize -I OCR-D-SEG-DEWARP -O OCR-D-OCR -P textequiv_level glyph -P overwrite_segments true -P model GT4HistOCR_50000000.997_191951"</span>
</code></pre></div></div>

<p><strong>Note:</strong>
(1) This workflow expects your images to be stored in a folder called <code class="language-plaintext highlighter-rouge">OCR-D-IMG</code>. If your images are saved in a different folder,
you need to adjust <code class="language-plaintext highlighter-rouge">-I OCR-D-IMG</code> in the second line of the call above with the name of your folder, e.g. <code class="language-plaintext highlighter-rouge">-I my_images</code>
(2) For the last processor in this workflow, <code class="language-plaintext highlighter-rouge">ocrd-tesserocr-recognize</code>, the environment variable TESSDATA_PREFIX has to be
### Example with ocrd-process
```sh
ocrd process \
"cis-ocropy-binarize -I OCR-D-IMG -O OCR-D-BIN" \
"anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
"skimage-denoise -I OCR-D-CROP -O OCR-D-BIN-DENOISE -P level-of-operation page" \
"tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
"tesserocr-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P shrink_polygons true" \
"cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-DEWARP" \
"tesserocr-recognize -I OCR-D-SEG-DEWARP -O OCR-D-OCR -P textequiv_level glyph -P overwrite_segments true -P model GT4HistOCR_50000000.997_191951"
```
**Note:**
(1) This workflow expects your images to be stored in a folder called `OCR-D-IMG`. If your images are saved in a different folder,
you need to adjust `-I OCR-D-IMG` in the second line of the call above with the name of your folder, e.g. `-I my_images`
(2) For the last processor in this workflow, `ocrd-tesserocr-recognize`, the environment variable TESSDATA_PREFIX has to be
set to point to the directory where the used models are stored if they are not in the default location. If you downloaded your models
with the <a href="https://ocr-d.de/en/models">OCR-D resource manager</a>, this is already taken care of.</p>
with the [OCR-D resource manager](https://ocr-d.de/en/models), this is already taken care of.
<!-- END-INCLUDE -->
<-- END-INCLUDE -->
<p>–&gt;</p>

<script src="/js/workflows.js"></script>

2 changes: 1 addition & 1 deletion feed.xml
@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://ocr-d.de/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ocr-d.de/" rel="alternate" type="text/html" /><updated>2024-10-07T14:38:57+02:00</updated><id>https://ocr-d.de/feed.xml</id><title type="html">OCR-D</title><subtitle>Write an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description.</subtitle><entry xml:lang="de"><title type="html">OCR-D Phase III gestartet</title><link href="https://ocr-d.de/de/2021/08/06/kick-off-phase3.html" rel="alternate" type="text/html" title="OCR-D Phase III gestartet" /><published>2021-08-06T00:00:00+02:00</published><updated>2021-08-06T00:00:00+02:00</updated><id>https://ocr-d.de/de/2021/08/06/kick-off-phase3</id><content type="html" xml:base="https://ocr-d.de/de/2021/08/06/kick-off-phase3.html"><![CDATA[<p>Am 30. Juli fand unser Kick-off-Workshop statt, der die Phase III von OCR-D einläutete.</p>
<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.2.2">Jekyll</generator><link href="https://ocr-d.de/feed.xml" rel="self" type="application/atom+xml" /><link href="https://ocr-d.de/" rel="alternate" type="text/html" /><updated>2024-10-10T15:24:39+02:00</updated><id>https://ocr-d.de/feed.xml</id><title type="html">OCR-D</title><subtitle>Write an awesome description for your new site here. You can edit this line in _config.yml. It will appear in your document head meta (for Google search results) and in your feed.xml site description.</subtitle><entry xml:lang="de"><title type="html">OCR-D Phase III gestartet</title><link href="https://ocr-d.de/de/2021/08/06/kick-off-phase3.html" rel="alternate" type="text/html" title="OCR-D Phase III gestartet" /><published>2021-08-06T00:00:00+02:00</published><updated>2021-08-06T00:00:00+02:00</updated><id>https://ocr-d.de/de/2021/08/06/kick-off-phase3</id><content type="html" xml:base="https://ocr-d.de/de/2021/08/06/kick-off-phase3.html"><![CDATA[<p>Am 30. Juli fand unser Kick-off-Workshop statt, der die Phase III von OCR-D einläutete.</p>

<p>Das Team gab eine Einführung in die <a href="/assets/kick-off/phase3.pdf">Ziele und öffentlichen Kommunikationskanäle von OCR-D in Phase III</a>, in <a href="/assets/kick-off/spec_core_ocrd_all.pdf">Status und Pläne der OCR-Software</a> und der <a href="/assets/kick-off/web-api.pdf">Web-API</a> und in den Umgang mit <a href="/assets/kick-off/gt.pdf">Ground Truth Daten in OCR-D</a>. Zudem gab das Koordinierungsprojekt einen Einblick in die bisherige Praxis der <a href="/assets/kick-off/software-development.pdf">Softwareentwicklung in OCR-D</a> mit Möglichkeiten, mitzuwirken.</p>

2 changes: 1 addition & 1 deletion search-index.json

Large diffs are not rendered by default.
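
The notes in the `en/workflows.html` diff above (adjust `-I OCR-D-IMG` when the images live in another folder, and set `TESSDATA_PREFIX` when the models are not in the default location) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `build_workflow` is a hypothetical function name, `my_images` is the example folder name from the note, and the `TESSDATA_PREFIX` path is a placeholder. The processor steps are copied from the "Good results for slower processors" example in the diff.

```sh
# Hypothetical helper (not part of OCR-D): print the workflow steps quoted in
# the diff above, with the input file group passed as the first argument.
build_workflow() {
  input_grp="${1:-OCR-D-IMG}"   # defaults to the folder name the note assumes
  printf '%s\n' \
    "cis-ocropy-binarize -I ${input_grp} -O OCR-D-BIN" \
    "anybaseocr-crop -I OCR-D-BIN -O OCR-D-CROP" \
    "skimage-denoise -I OCR-D-CROP -O OCR-D-BIN-DENOISE -P level-of-operation page" \
    "tesserocr-deskew -I OCR-D-BIN-DENOISE -O OCR-D-BIN-DENOISE-DESKEW -P operation_level page" \
    "tesserocr-segment -I OCR-D-BIN-DENOISE-DESKEW -O OCR-D-SEG -P shrink_polygons true" \
    "cis-ocropy-dewarp -I OCR-D-SEG -O OCR-D-SEG-DEWARP" \
    "tesserocr-recognize -I OCR-D-SEG-DEWARP -O OCR-D-OCR -P textequiv_level glyph -P overwrite_segments true -P model GT4HistOCR_50000000.997_191951"
}

# Only needed if the Tesseract models are not in the default location;
# the path below is a placeholder, not a real default.
# export TESSDATA_PREFIX=/path/to/your/tessdata

# Dry run: list the steps for images stored in a folder called my_images.
build_workflow my_images

# To execute for real (bash, requires an OCR-D installation and a workspace):
# mapfile -t steps < <(build_workflow my_images)
# ocrd process "${steps[@]}"
```

Printing the steps first makes it easy to check that only the first processor's input group changed before anything is run.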
