Specify an order for preferred interpretations as low-key disambiguation #151

Open
EdwardChamberlain opened this issue Jun 21, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@EdwardChamberlain
Describe the bug
The shorthand notation for inch (") is detected but parsed as second of arc. While that reading is technically valid, the more common use of " is to mean inch.

To Reproduce
Steps to reproduce the behaviour:

>>> from quantulum3 import parser
>>> s = 'Its about 24" long'
>>> quants = parser.parse(s)
>>> print(s)
Its about 24" long
>>> print(quants)
[Quantity(24, "Unit(name="second of arc", entity=Entity("angle"), uri=Minute_and_second_of_arc)")]

Expected behavior
I would expect " to default to inches rather than seconds of arc.

Screenshots
N/A

Additional information:

  • Python Version: 3.7
  • Classifier activated / sklearn installed: yes
  • OS: linux
  • Version: 0.7.3

Additional context
Is there any way to force an override for this?
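
One caller-side workaround, independent of the library, might be to spell out the symbol before parsing. This is only a minimal sketch: `spell_out_inches` is a hypothetical helper, not part of quantulum3, and it assumes that a double quote directly after a number always means inch in your input.

```python
import re

# A number immediately followed by a straight quote ("), a double prime (U+2033)
# or a right curly quote (U+201D). Assumption: in the caller's text this always means inch.
_INCH_MARK = re.compile(r'(\d+(?:\.\d+)?)["\u2033\u201d]')


def spell_out_inches(text: str) -> str:
    """Rewrite e.g. 24" as 24 inch so the symbol is no longer ambiguous."""
    return _INCH_MARK.sub(r"\1 inch", text)


print(spell_out_inches('Its about 24" long'))
```

The rewritten string could then be passed to `parser.parse` as in the steps above. Whether the spelled-out word is always resolved to inch is not verified here.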

@nielstron
Owner

the more common use of " is to mean inch.
Do you have some source for this claim? Since this tool should be as general as possible, I would prefer that the choice between ambiguous units stay arbitrary when the disambiguation system is not in use.

Relatedly, for disambiguation there is a pretrained classifier included in the system. However, without any context, it is not really likely that it will correctly determine the appropriate unit.

Note taken: A way to pass in an ordering for preferred/less preferred interpretations of some symbols could be included.
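
To make the idea concrete, such an ordering might look roughly like the sketch below. Everything in it is hypothetical: the `CANDIDATES` table, the `resolve` function and its `preferred` parameter are illustrative names, not the existing quantulum3 API.

```python
from typing import Dict, List, Sequence

# Hypothetical lookup table: surface symbol -> every unit it may denote.
CANDIDATES: Dict[str, List[str]] = {
    '"': ["second of arc", "inch"],
    "'": ["minute of arc", "foot"],
}


def resolve(symbol: str, preferred: Sequence[str] = ()) -> str:
    """Pick the candidate ranked earliest in `preferred`.

    Candidates that are not listed keep their table order and rank
    after every listed one.
    """
    options = CANDIDATES[symbol]
    rank = {name: i for i, name in enumerate(preferred)}
    # min() returns the first minimal element, so ties fall back to table order.
    return min(options, key=lambda name: rank.get(name, len(preferred)))


print(resolve('"'))            # no preference given
print(resolve('"', ["inch"]))  # caller prefers inch
```

With an empty ordering the result depends only on table order, which mirrors the current behaviour of having no preference. A caller-supplied list changes the outcome without involving the classifier.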

@nielstron nielstron added the enhancement New feature or request label Jun 21, 2020
@nielstron nielstron changed the title from "Inches are detected as seconds" to "Specify an order for preferred interpretations as low-key disambiguation" Jun 21, 2020
@EdwardChamberlain
Author

Do you have some source for this claim?

Sure:

The inch (abbreviation: in or ″)

Source: https://en.wikipedia.org/wiki/Inch (first line)

It sometimes seems to pick up inch correctly when " is used, but I'm not sure how to reproduce that yet.

@nielstron
Owner

Yes of course " is an abbreviation for inch, but I rather wanted to know whether there is a source for the claim that " more frequently means "inch" than "second" :)

The tool knows that " is an abbreviation for inch just as it knows that " is an abbreviation for second; however, there is no order of preference for which interpretation to choose. If it picks up " as inch, that might be related to the context of the sentence, but it might also be due to (something very close to) pure luck, especially if disambiguation is not enabled.
