ocrd network client processing processor: is non-blocking #1265

bertsky · 2024-07-31T22:11:16Z

When we first discussed the possible client CLI, we kind of agreed to make the typical behaviour of long-running commands blocking (as in the non-network analogue), encapsulating all internal polling or callback mechanics.

But for now the processor command is non-blocking – it just prints the job ID.

Do we really want that? So far we have no status command, and the user would still need to write their own polling loop.
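For illustration, the polling loop a user would currently have to write might look like the following minimal sketch. Everything here is an assumption: `get_state` is a hypothetical callable standing in for whatever status lookup the user has (the thread notes that no status command exists yet), and the set of terminal state names is a guess based on state names that appear in status output later in this thread.

```python
import time

# Assumption: these state names mark a finished job; they are not confirmed
# as the complete set of terminal states of the Processing Server.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def poll_until_done(get_state, interval=1.0, timeout=None):
    """Call get_state() repeatedly until it returns a terminal state.

    get_state is a hypothetical callable supplied by the user that returns
    the current job state as a string. Raises TimeoutError if `timeout`
    seconds pass without reaching a terminal state.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout}s")
        time.sleep(interval)
```

A blocking CLI would hide a loop like this (or a callback) behind the command, so that the user's shell script only sees the final exit status.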


MehmedGIT · 2024-08-01T14:27:44Z

> But for now the processor command is non-blocking – it just prints the job ID.

That is ideal for non-interactive clients. However, for interactive clients we could also watch the log file of that specific job till the job fails or succeeds. The same goes for the workflow endpoint. Currently, it is possible to poll the workflow status from the processing server. Here is an example output:

{"ocrd-olena-binarize":{"SUCCESS":25},
"ocrd-anybaseocr-crop":{"SUCCESS":24,"FAILED":1},
"ocrd-cis-ocropy-denoise":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-deskew":{"SUCCESS":24,"CANCELLED":1},
"ocrd-tesserocr-segment-region":{"SUCCESS":24,"CANCELLED":1},
"ocrd-segment-repair":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-clip":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-segment":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-dewarp":{"SUCCESS":24,"CANCELLED":1},
"ocrd-tesserocr-recognize":{"SUCCESS":24,"CANCELLED":1},
"failed-processor-tasks":{"ocrd-anybaseocr-crop":[{"job_id":"64c73535-bbb9-4595-bfc8-7b84b3dd94e7","page_id":"PHYS_0005"}]}}
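As a rough sketch of how a client could consume a status response of this shape, the following Python snippet totals the per-state counts and collects the failed jobs. It assumes only the structure visible in the example above (per-processor state counts plus a `failed-processor-tasks` entry); the function name `summarize_workflow_status` is made up for illustration, and the example input is shortened to three processors.

```python
import json


def summarize_workflow_status(status):
    """Aggregate state counts across processors and list failed jobs."""
    totals = {}
    for processor, counts in status.items():
        # This entry holds job details rather than state counts.
        if processor == "failed-processor-tasks":
            continue
        for state, count in counts.items():
            totals[state] = totals.get(state, 0) + count
    failed_jobs = [
        (processor, task["job_id"], task["page_id"])
        for processor, tasks in status.get("failed-processor-tasks", {}).items()
        for task in tasks
    ]
    return {"totals": totals, "failed_jobs": failed_jobs}


# Shortened copy of the example output above.
example = json.loads("""
{"ocrd-olena-binarize": {"SUCCESS": 25},
 "ocrd-anybaseocr-crop": {"SUCCESS": 24, "FAILED": 1},
 "ocrd-cis-ocropy-denoise": {"SUCCESS": 24, "CANCELLED": 1},
 "failed-processor-tasks": {"ocrd-anybaseocr-crop": [
     {"job_id": "64c73535-bbb9-4595-bfc8-7b84b3dd94e7", "page_id": "PHYS_0005"}]}}
""")

summary = summarize_workflow_status(example)
print(summary["totals"])
print(summary["failed_jobs"])
```

A summary like this would let a client decide on an exit status, for example non-zero whenever `failed_jobs` is non-empty.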

There were many missing features of the processing server when the client was first implemented. Most of the processing server functionalities are tested by using the endpoints directly. Please have a look at the DHd2024 demo repo. You can find all the bash scripts there for submitting jobs, checking job log files, and checking the workflow status. The ocrd_network client is very outdated and needs more attention to implement all these calls in Python. Coming soon.

bertsky · 2024-08-01T17:30:24Z

> That is ideal for non-interactive clients

Really? IMO some ocrd network client processing processor ... && other actions on the workspace ... is the most useful scenario for the CLI client. After all, who wants to submit jobs without ever checking their results? Also, if we do wrap this as a shell script as agreed earlier for the slim containerised ocrd_all, then only blocking behaviour is a true mimic of the old non-network CLI.

> for interactive clients we could also watch the log file of that specific job till the job fails or succeeds

That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

> There were many missing features of the processing server when the client was first implemented. Most of the processing server functionalities are tested by using the endpoints directly. Please have a look at the DHd2024 demo repo. You can find all the bash scripts there for submitting jobs, checking job log files, and checking the workflow status. The ocrd_network client is very outdated and needs more attention to implement all these calls in Python. Coming soon.

Thanks!

MehmedGIT · 2024-08-01T18:45:36Z

> Really? IMO some ocrd network client processing processor ... && other actions on the workspace ... is the most useful scenario for the CLI client.

So how do you imagine the ocrd network client processing processor command to work?

> Also, if we do wrap this as a shell script as in OCR-D/ocrd_all#69 (comment), then only blocking behaviour is a true mimic of the old non-network CLI.

Well, we could provide both a blocking and non-blocking client?

> That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

I am getting confused. Do you ask for a blocking or non-blocking CLI? Of course, you do not need to poll till the job finishes... There is already an implemented user callback mechanism for that, check here and here. In a similar fashion there is an internal callback from the worker to the processing server instead of polling. My idea was rather to get all the logs from the processing job interactively mimicking the old non-network CLI, not because we need that to find out if a job has finished or failed. That said, anything could be implemented on the CLI client side as long as the requirements are clear.

bertsky · 2024-08-01T19:56:19Z

> Well, we could provide both a blocking and non-blocking client?

Yes, we could. I guess I'm saying that blocking is more important and should be default.

> So how do you imagine the ocrd network client processing processor command to work?

> That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

> Do you ask for a blocking or non-blocking CLI? Of course, you do not need to poll till the job finishes...

Well, how does the user get to know that then, if not through the client CLI?

> There is already an implemented user callback mechanism for that, check here and here. In a similar fashion there is an internal callback from the worker to the processing server instead of polling.

What I tried to convey is that the client CLI should run an internal server in the background, whose address it passes as the callback URL to the Processing Server, so that the incoming callback unblocks the CLI. Perhaps with some adjustable timeout.
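A minimal sketch of that idea, using only the Python standard library: start a small HTTP server in a background thread, hand its URL to the job submission as the callback URL, and block until a POST arrives or a timeout expires. This is not the ocrd_network client implementation. `wait_for_callback` and the `submit_job` hook are hypothetical names, and the assumption that the callback is a POST with a JSON body is not confirmed by this thread.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def wait_for_callback(submit_job, host="127.0.0.1", port=0, timeout=None):
    """Block until a callback POST arrives; return its JSON body.

    submit_job(callback_url) is a hypothetical hook that submits the job
    to the Processing Server, passing callback_url along with the request.
    Returns None if no callback arrives within `timeout` seconds.
    """
    done = threading.Event()
    result = {}

    class CallbackHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length)
            # Assumption: the callback body, if any, is JSON.
            result["payload"] = json.loads(body) if body else {}
            self.send_response(200)
            self.end_headers()
            done.set()  # unblock the waiting CLI

        def log_message(self, format, *args):
            pass  # suppress the default per-request logging

    # port=0 lets the operating system pick a free port.
    server = HTTPServer((host, port), CallbackHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        callback_url = f"http://{host}:{server.server_address[1]}/"
        submit_job(callback_url)
        return result.get("payload") if done.wait(timeout) else None
    finally:
        server.shutdown()
        server.server_close()
```

One caveat with this design: the callback URL must be reachable from the Processing Server, so the `127.0.0.1` default only works when client and server share a host. A real client would need a routable address, and probably a polling fallback for setups where inbound connections are blocked.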

> My idea was rather to get all the logs from the processing job interactively mimicking the old non-network CLI, not because we need that to find out if a job has finished or failed.

Retrieving and reproducing the logs before exiting is of course an important point.

MehmedGIT mentioned this issue Aug 6, 2024

Extend the network client #1269

Merged

MehmedGIT closed this as completed Oct 1, 2024
