Content recognition (NVDA+R): let users choose from a list of available content recognition engines/strategies #17406
Comments
Great proposals. I propose a list that appears when pressing NVDA+R and is populated with all available recognition services. When pressing Enter on a service, recognition starts according to the settings for that specific service. |
@Adriani90 I personally would prefer that you select the recognition engine in settings instead. That way, you can perform OCR on inaccessible objects that disappear when they lose focus. An alternative would be to make the selection menu virtual as well. |
I see your point. Alternatively, maybe you could press a keystroke to cycle between available services before pressing NVDA+R.
|
I have also had the feeling that displaying a menu could have side effects due to focus changes, but I have not found concrete real-world examples. @Emil-18 could you describe a concrete example of inaccessible objects that disappear when they lose focus? Also, I wonder if the following would not be better:
|
@CyrilleB79 would it not be better to have a cycling gesture, and to choose from a list in the content recognition settings which services should be included in the cycling command? Similar to speech modes? |
In my proposal, having many OCR engines in the future would not be a problem: with NVDA+R, I will run my preferred OCR engine. I guess that the majority of people will have one preferred OCR engine and only use that one. If a minority of people use two different OCR engines, e.g. Windows OCR for general images on the web and Tesseract for specific images (e.g. those containing tables), they can still use profiles. As for recognizer types, I do not expect their number to grow much. I have only two recognizer types in mind today: OCR and image description. This makes only two gestures. |
@CyrilleB79 I see your point. In general it is OK to have these gestures, but why assign multiple gestures when a single one can provide the same functionality by cycling?
I still think we should try to make processes as easy as possible while minimizing the number of keystrokes people have to keep in mind. Note that the following OCR services all provide APIs: to name only a few of the services available apart from those you mentioned already. And if we start with two services, the community will definitely try out and propose new services in the future. |
@Adriani90 it seems that you have not fully understood my point. There will not be multiple gestures to assign. You seem to be mixing up recognition types with OCR services.
OCR is one recognizer type and image description is a second one. I suggest keeping NVDA+R for OCR and, for example, NVDA+I for image description. This makes only 2 gestures, not multiple ones.
In the settings dialog, you will have two combo-boxes:
- The OCR engine/service combo-box: to select the OCR service you want to use when pressing NVDA+R. In this list you will have, for example: Windows OCR, Tesseract, ocr.space, Baidu OCR, etc. (depending on the add-ons you have installed, of course).
- The image description engine/service combo-box: to select the image description service that will be launched upon an image description request with NVDA+I. In this list you will have, for example: AIContentDescriber (add-on), XPoseImage Captioner (add-on), Google vision, etc.
Depending on the selection in the combo-boxes, you may have other controls to set the settings of the selected recognizers. |
Got it now. Thanks for the clarification of the UX.
|
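The design @CyrilleB79 describes (one gesture per recognizer type, with the engine for each type chosen in settings) can be sketched roughly as follows. This is only an illustration under stated assumptions: `RecognizerType`, `RecognitionSettings`, `GESTURES`, and all method names are hypothetical and do not correspond to NVDA's actual content recognition API; the engine names are simply the examples mentioned in this thread.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RecognizerType(Enum):
    """The two recognizer types discussed in this thread (hypothetical names)."""
    OCR = "ocr"
    IMAGE_DESCRIPTION = "imageDescription"


# Hypothetical gesture mapping: one gesture per recognizer type, not per engine.
GESTURES: Dict[str, RecognizerType] = {
    "NVDA+R": RecognizerType.OCR,
    "NVDA+I": RecognizerType.IMAGE_DESCRIPTION,
}


@dataclass
class RecognitionSettings:
    """Models the two combo-boxes: available engines and the selected one, per type."""
    available: Dict[RecognizerType, List[str]] = field(default_factory=dict)
    selected: Dict[RecognizerType, str] = field(default_factory=dict)

    def register(self, rtype: RecognizerType, engine: str) -> None:
        """Add an engine (built-in or from an add-on) to the list for its type."""
        engines = self.available.setdefault(rtype, [])
        if engine not in engines:
            engines.append(engine)
        # The first engine registered for a type becomes the default selection.
        self.selected.setdefault(rtype, engine)

    def select(self, rtype: RecognizerType, engine: str) -> None:
        """Equivalent of choosing an entry in the combo-box for this type."""
        if engine not in self.available.get(rtype, []):
            raise ValueError(f"{engine!r} is not registered for {rtype.name}")
        self.selected[rtype] = engine

    def engine_for_gesture(self, gesture: str) -> str:
        """Return the engine that would run when the given gesture is pressed."""
        return self.selected[GESTURES[gesture]]


settings = RecognitionSettings()
settings.register(RecognizerType.OCR, "Windows OCR")
settings.register(RecognizerType.OCR, "Tesseract")
settings.register(RecognizerType.IMAGE_DESCRIPTION, "AIContentDescriber")
settings.select(RecognizerType.OCR, "Tesseract")
```

With this shape, installing another OCR add-on only lengthens one combo-box; the number of gestures stays at two.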
We could then cycle between available engines with NVDA+Shift+R for OCR or NVDA+Shift+I for image description. |
I agree that creating a cycling command for OCR engines and another one for image description engines could be useful for some. However, I'd keep them unassigned, because my assumption is that the people who use two engines of the same type (e.g. Windows OCR and Tesseract OCR) are a minority. |
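For illustration, a cycling command of the kind discussed above could be as small as the following sketch. The function name `cycle_engine` is hypothetical, nothing here is bound to a gesture, and it is not taken from NVDA's code; it only shows the wrap-around behaviour, restricted to whatever list of engines the user has chosen to include in the cycle.

```python
from typing import List


def cycle_engine(engines: List[str], current: str) -> str:
    """Return the engine that follows `current`, wrapping to the first one.

    `engines` is the list of engines included in the cycle for one
    recognizer type (e.g. only OCR engines).
    """
    if not engines:
        raise ValueError("no engines available to cycle through")
    if current not in engines:
        # The current engine was removed or excluded: restart at the beginning.
        return engines[0]
    return engines[(engines.index(current) + 1) % len(engines)]


ocr_engines = ["Windows OCR", "Tesseract"]
next_engine = cycle_engine(ocr_engines, "Windows OCR")
wrapped_engine = cycle_engine(ocr_engines, "Tesseract")
```

With a single engine in the list the command is a no-op, which fits the assumption that most users would never need it.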
Just a comment... |
Hi @CyrilleB79 |
Add-ons relevant to this discussion: |
In my proposal, the prompt is not a third type of recognition service. It's just a parameter (setting) of one of them. It's worth noting that each service should be able to provide its specific parameters, e.g.:
I wonder if the auto-update could remain a global parameter or not. Maybe not if one wants the OCR to auto-update, but the image description service not to do so. |
Hi,
Follow-up to #17405:
Background:
In addition to Windows OCR, add-ons have been developed to offer alternative content recognition strategies, such as the use of online AI/language model services. However, in order to use third-party strategies, users must use the interface and commands offered by those add-ons.
Is your feature request related to a problem? Please describe.
At the moment NVDA+R (content recognition) relies on Windows OCR to perform text recognition.
Describe the solution you'd like
Allow users to use different strategies when performing content recognition, thereby allowing scenarios such as pressing NVDA+R to recognize content via third-party services.
Describe alternatives you've considered
Leave the experience as is.
Additional context
There might be cases where Windows OCR is not optimal despite being a built-in Windows feature (exposed through the Universal Windows Platform (UWP) API). Therefore, let users choose the recognition strategy that best suits their needs and context.
Thanks.