ocrd network client processing processor: is non-blocking #1265

Closed
bertsky opened this issue Jul 31, 2024 · 4 comments


@bertsky
Collaborator

bertsky commented Jul 31, 2024

When we first discussed the possible client CLI, we kind of agreed to make the typical behaviour of long-running commands blocking (as in the non-network analogue), encapsulating all internal polling or callback mechanics.

But for now the processor command is non-blocking – it just prints the job ID.

Do we really want that? So far we have no status command, and the user would still need to write their own polling loop.
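For illustration, the polling loop a user currently has to write themselves could look roughly like the sketch below. It is not part of the ocrd_network client: `wait_for_job` and the `fetch_state` callable are hypothetical names, and how the state is actually retrieved from the Processing Server is deliberately left to the caller. The terminal states are the ones that appear in the status output quoted in this thread.

```python
import time
from typing import Callable

# Terminal states as they appear in the status output quoted in this thread.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job(fetch_state: Callable[[], str],
                 interval: float = 5.0,
                 timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Block until fetch_state() reports a terminal state, then return it.

    fetch_state is a hypothetical callable supplied by the caller; it should
    ask the Processing Server for the current state of one job.
    Raises TimeoutError if no terminal state is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = fetch_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout}s")
        sleep(interval)
```

A blocking CLI command could wrap such a loop and turn the returned state into its exit code, which is what makes chaining with `&&` possible.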

@MehmedGIT
Contributor

But for now the processor command is non-blocking – it just prints the job ID.

That is ideal for non-interactive clients. However, for interactive clients we could also watch the log file of that specific job till the job fails or succeeds. The same goes for the workflow endpoint. Currently, it is possible to poll workflow status from the processing server. Check an example output:

{"ocrd-olena-binarize":{"SUCCESS":25},
"ocrd-anybaseocr-crop":{"SUCCESS":24,"FAILED":1},
"ocrd-cis-ocropy-denoise":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-deskew":{"SUCCESS":24,"CANCELLED":1},
"ocrd-tesserocr-segment-region":{"SUCCESS":24,"CANCELLED":1},
"ocrd-segment-repair":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-clip":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-segment":{"SUCCESS":24,"CANCELLED":1},
"ocrd-cis-ocropy-dewarp":{"SUCCESS":24,"CANCELLED":1},
"ocrd-tesserocr-recognize":{"SUCCESS":24,"CANCELLED":1},
"failed-processor-tasks":{"ocrd-anybaseocr-crop":[{"job_id":"64c73535-bbb9-4595-bfc8-7b84b3dd94e7","page_id":"PHYS_0005"}]}}
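As a rough sketch of how a client could turn such a status object into a single verdict: the function below sums the per-processor counters and collects the failed tasks. The name `summarize_workflow_status` is hypothetical, and treating the workflow as successful only when every counted page task is `SUCCESS` is an assumption, not documented server behaviour. The sample is abbreviated from the output above.

```python
def summarize_workflow_status(status: dict) -> dict:
    """Aggregate per-processor state counters of a workflow status object."""
    failed_tasks = status.get("failed-processor-tasks", {})
    totals: dict = {}
    for processor, counts in status.items():
        if processor == "failed-processor-tasks":
            continue  # not a counter entry, handled separately above
        for state, count in counts.items():
            totals[state] = totals.get(state, 0) + count
    return {
        "totals": totals,
        # Assumption: success means no state other than SUCCESS was counted.
        "ok": set(totals) <= {"SUCCESS"},
        "failed_tasks": failed_tasks,
    }


# Abbreviated from the example output in this thread.
sample = {
    "ocrd-olena-binarize": {"SUCCESS": 25},
    "ocrd-anybaseocr-crop": {"SUCCESS": 24, "FAILED": 1},
    "failed-processor-tasks": {
        "ocrd-anybaseocr-crop": [
            {"job_id": "64c73535-bbb9-4595-bfc8-7b84b3dd94e7",
             "page_id": "PHYS_0005"}
        ]
    },
}
summary = summarize_workflow_status(sample)
```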

There were many missing features of the processing server when the client was first implemented. Most of the processing server functionalities are tested by using the endpoints directly. Please have a look at the DHd2024 demo repo. You can find all the bash scripts there for submitting jobs, checking job log files, and checking the workflow status. The ocrd_network client is very outdated and needs more attention to implement all these calls in Python. Coming soon.

@bertsky
Collaborator Author

bertsky commented Aug 1, 2024

That is ideal for non-interactive clients

Really? IMO some ocrd network client processing processor ... && other actions on the workspace ... is the most useful scenario for the CLI client. After all, who wants to submit jobs without ever checking their results? Also, if we do wrap this as shell script as agreed earlier for the slim containerised ocrd_all, then only blocking behaviour is a true mimic of the old non-network CLI.

for interactive clients we could also watch the log file of that specific job till the job fails or succeeds

That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

There were many missing features of the processing server when the client was first implemented. Most of the processing server functionalities are tested by using the endpoints directly. Please have a look at the DHd2024 demo repo. You can find all the bash scripts there for submitting jobs, checking job log files, and checking the workflow status. The ocrd_network client is very outdated and needs more attention to implement all these calls in Python. Coming soon.

Thanks!

@MehmedGIT
Contributor

MehmedGIT commented Aug 1, 2024

Really? IMO some ocrd network client processing processor ... && other actions on the workspace ... is the most useful scenario for the CLI client.

So how do you imagine the ocrd network client processing processor command to work?

Also, if we do wrap this as shell script as OCR-D/ocrd_all#69 (comment), then only blocking behaviour is a true mimic of the old non-network CLI.

Well, we could provide both a blocking and non-blocking client?

That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

I am getting confused. Do you ask for a blocking or non-blocking CLI? Of course, you do not need to poll till the job finishes... There is already an implemented user callback mechanism for that, check here and here. In a similar fashion there is an internal callback from the worker to the processing server instead of polling. My idea was rather to get all the logs from the processing job interactively mimicking the old non-network CLI, not because we need that to find out if a job has finished or failed. This said, anything could be implemented on the CLI client side as long as the requirements are clear.

@bertsky
Collaborator Author

bertsky commented Aug 1, 2024

Well, we could provide both a blocking and non-blocking client?

Yes, we could. I guess I'm saying that blocking is more important and should be default.

So how do you imagine the ocrd network client processing processor command to work?

That's effectively a polling paradigm on secondary data. Why not use the callback mechanism directly?

Do you ask for a blocking or non-blocking CLI? Of course, you do not need to poll till the job finishes...

Well, how does the user get to know that then, if not through the client CLI?

There is already an implemented user callback mechanism for that, check here and here. In a similar fashion there is an internal callback from the worker to the processing server instead of polling.

What I tried to convey is that the client CLI should run an internal server in the background and pass its address as the callback URL to the Processing Server, so that the callback unblocks the CLI. Or some adjustable timeout perhaps.
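To make the idea concrete, here is a minimal sketch using only the Python standard library. It assumes the callback arrives as an HTTP POST and that the client machine is reachable from the Processing Server. `start_callback_listener`, `run_blocking` and the `submit` callable are hypothetical names, not existing ocrd_network API; `submit` stands for whatever code sends the job request with the given callback URL.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_callback_listener(host: str = "127.0.0.1", port: int = 0):
    """Start a background HTTP server that waits for a single POST callback."""
    done = threading.Event()
    received = {}

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            received["body"] = self.rfile.read(length)
            done.set()  # unblock the waiting CLI
            self.send_response(200)
            self.end_headers()

        def log_message(self, format, *args):
            pass  # keep the CLI output free of access logs

    # port=0 lets the OS pick a free port; read it back from the server.
    server = HTTPServer((host, port), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://{host}:{server.server_port}/"
    return server, url, done, received


def run_blocking(submit, timeout: float = 3600.0) -> bytes:
    """Call submit(callback_url), then block until the callback or a timeout."""
    server, url, done, received = start_callback_listener()
    try:
        submit(url)
        if not done.wait(timeout):
            raise TimeoutError(f"no callback received within {timeout}s")
        return received["body"]
    finally:
        server.shutdown()
        server.server_close()
```

The timeout covers the case where the callback never arrives, for example because the client is not reachable from the server; in that case a polling fallback would still be needed.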

My idea was rather to get all the logs from the processing job interactively mimicking the old non-network CLI, not because we need that to find out if a job has finished or failed.

Retrieving and reproducing the logs before exiting is of course an important point.


2 participants