Semantic unit conversion app using Sentier.dev platform #1

Open
9 tasks
cmutel opened this issue Sep 22, 2024 · 7 comments

Comments

@cmutel
Contributor

cmutel commented Sep 22, 2024

Overview

We have a database with units, code for a specific type of unit conversion, and the need for a general API for converting units in the future.

User stories

Let's build a webapp that can do the following:

  • Given a unit string, tell me what the preferred term and URI for that unit concept are in other systems
  • Given a unit concept URI, give me all known units in the same quantity type for a given unit system including conversion factors
  • Given a unit string which doesn't match a preferred term exactly, give me search results for that term which includes descriptions and alternative terms from all systems

And finally, tie all this together so you can start typing a unit, pick the right one from a dropdown, and then get tables of conversion factors for each system we include.

Unit systems

The unit systems we already have:

  • SimaPro
  • ecoinvent (only URIs, need to include other data for search unless we want to query their API as well)
  • QUDT

Unit systems I would like to have:

  • FedEFL / lcacommons.gov
  • OpenLCA

Tasks

  • Pick architecture for frontend and backend
  • Build an API backend which can support the user stories and draws data from our Fuseki database
  • Deploy the API
  • Design the frontend
  • Implement the frontend
  • Add missing unit systems
  • Fix search on skosmos and expose API
  • Implement varnish cache on skosmos deployment
  • Have correct skosmos deployed on our cloud architecture

Stretch goals:

  • Build a client library on top of the API

Skosmos search

This is possibly hard. We have a search index via Skosmos (which should also have an API), but it only searches on prefLabel (see search result for btu versus british), and maybe on altLabel. We are currently using notation ("Notations are symbols which are not normally recognizable as words or sequences of words in any natural language and are thus usable independently of natural-language contexts"), but we could change these to altLabel, or add altLabel in addition to notation (these are strings, even if they have custom data types, so they should be fine as instances of RDF plain literals).
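To make the prefLabel-only limitation concrete, here is a minimal in-memory sketch, not Skosmos code. The function names `build_index` and `suggest` are hypothetical, and the records only reuse labels and notations that appear in the JSON responses later in this thread. It shows how a term like "ct" finds nothing when only prefLabel is indexed, and finds the carat once notation is indexed as well.

```python
from collections import defaultdict

# Illustrative records; labels and notations are copied from the JSON in this thread.
UNITS = [
    {"iri": "https://vocab.sentier.dev/qudt/unit/KiloGM",
     "prefLabel": ["kilogram"], "notation": ["kg", "KGM"]},
    {"iri": "https://vocab.sentier.dev/qudt/unit/CARAT",
     "prefLabel": ["Carat"], "notation": ["ct", "CTM"]},
    {"iri": "https://vocab.sentier.dev/qudt/unit/AMU",
     "prefLabel": ["Atomic mass unit"], "notation": ["amu", "u", "D43"]},
]


def build_index(units, fields):
    """Map each case-folded term found in `fields` to the IRIs that carry it."""
    index = defaultdict(set)
    for unit in units:
        for field in fields:
            for term in unit.get(field, []):
                index[term.casefold()].add(unit["iri"])
    return index


def suggest(index, text):
    """Return the sorted IRIs whose indexed terms start with `text`."""
    needle = text.casefold()
    hits = set()
    for term, iris in index.items():
        if term.startswith(needle):
            hits |= iris
    return sorted(hits)


labels_only = build_index(UNITS, ["prefLabel"])
labels_and_notation = build_index(UNITS, ["prefLabel", "notation"])
```

With `labels_only`, `suggest(labels_only, "ct")` is empty; with `labels_and_notation` the same query returns the carat IRI. A real search index would add ranking and fuzzy matching on top of this.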

@cmutel
Contributor Author

cmutel commented Sep 23, 2024

The preliminary plan is to develop a new UI and API using React and FastAPI, and to have our own search index using something like Elasticsearch. We chose not to build on Skosmos for two reasons: we can move more quickly by building a more targeted user experience with specific and complicated SPARQL queries, and we want people to think about building apps on top of our data products (this is a good example).

We have three API endpoints in mind:

  • suggest: Uses search index
  • search: Uses search index
  • unit: Uses Fuseki database. Includes all information useful for understanding units.

@cmutel
Contributor Author

cmutel commented Sep 23, 2024

Hackathon team:

@cmutel
Contributor Author

cmutel commented Sep 24, 2024

Quick update from my side: We have an initial unit endpoint available (PR), and this pulls all data for all units of the same quantity kind as the input unit.

The output is a JSON object whose keys are unit IRIs and whose values are lists of (attribute, value) pairs. The value needs to be a list because the same attribute can be present more than once. Here is an example:

{
    "https://vocab.sentier.dev/qudt/unit/M-SEC": [
        [
            "type",
            "Concept"
        ],
        [
            "prefLabel",
            "Metre second"
        ],
        [
            "prefLabel",
            "Meter second"
        ],
        [
            "notation",
            "ms"
        ],
        [
            "notation",
            "m.s"
        ],
        [
            "inScheme",
            "https://vocab.sentier.dev/qudt/"
        ],
        [
            "broader",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ],
        [
            "narrower",
            "https://vocab.sentier.dev/qudt/unit/M-YR"
        ],
        [
            "definition",
            "Meter over one second"
        ],
        [
            "broaderTransitive",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ],
        [
            "narrowerTransitive",
            "https://vocab.sentier.dev/qudt/unit/M-YR"
        ],
        [
            "hasDimensionVector",
            "http://qudt.org/vocab/dimensionvector/A0E0L1I0M0H0T1D0"
        ],
        [
            "applicableSystem",
            "http://qudt.org/vocab/sou/SI"
        ],
        [
            "applicableSystem",
            "http://qudt.org/vocab/sou/CGS"
        ],
        [
            "conversionMultiplier",
            "1.0"
        ],
        [
            "conversionMultiplierSN",
            "1.0e0"
        ],
        [
            "hasQuantityKind",
            "https://vocab.sentier.dev/qudt/quantity-kind/LengthTime"
        ]
    ]
}
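For a client that prefers attribute lookups over scanning pairs, one possible way to collapse such a pair list is sketched below. The function name `collapse` is hypothetical and this is not the endpoint's own code. It keeps a single value as a scalar and turns repeated attributes into a list, on the assumption that every value is a plain string as in the example above.

```python
def collapse(pairs):
    """Fold (attribute, value) pairs into a dict.

    An attribute seen once keeps its scalar value; an attribute seen several
    times gets a list of values in their original order.
    Assumes every value is a string, as in the example response.
    """
    result = {}
    for attribute, value in pairs:
        if attribute not in result:
            result[attribute] = value
        elif isinstance(result[attribute], list):
            result[attribute].append(value)
        else:
            result[attribute] = [result[attribute], value]
    return result


example = [
    ["type", "Concept"],
    ["prefLabel", "Metre second"],
    ["prefLabel", "Meter second"],
    ["notation", "ms"],
    ["notation", "m.s"],
    ["conversionMultiplier", "1.0"],
]
collapsed = collapse(example)
```

The trade-off is that consumers then have to handle both scalars and lists for the same attribute, which is why always returning lists may be simpler for the UI.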

We could imagine having two tables, one with real units (anything not IMPERIAL, PLANCK, USCS), and the other one with the weird stuff.
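A rough sketch of that split, working directly on the pair lists shown above. The rule used here is an assumption, since the comment does not say what happens to a unit listed in several systems: a unit goes to the second table only when every `applicableSystem` it lists is one of IMPERIAL, PLANCK or USCS, and units with no `applicableSystem` stay in the first table. The names `split_units` and `system_codes` are hypothetical.

```python
# System codes that send a unit to the second table (taken from the comment above).
SECONDARY_SYSTEMS = {"IMPERIAL", "PLANCK", "USCS"}


def system_codes(pairs):
    """Return the last path segment of every applicableSystem IRI in a pair list."""
    return {
        value.rsplit("/", 1)[-1]
        for attribute, value in pairs
        if attribute == "applicableSystem"
    }


def split_units(units):
    """Partition {iri: pairs} into (main, secondary) dicts.

    Assumption: a unit is secondary only if it lists at least one system and
    all of its systems are in SECONDARY_SYSTEMS.
    """
    main, secondary = {}, {}
    for iri, pairs in units.items():
        codes = system_codes(pairs)
        if codes and codes <= SECONDARY_SYSTEMS:
            secondary[iri] = pairs
        else:
            main[iri] = pairs
    return main, secondary


# Hypothetical keys "a", "b", "c" stand in for unit IRIs.
sample = {
    "a": [["applicableSystem", "http://qudt.org/vocab/sou/SI"],
          ["applicableSystem", "http://qudt.org/vocab/sou/CGS"]],
    "b": [["applicableSystem", "http://qudt.org/vocab/sou/IMPERIAL"],
          ["applicableSystem", "http://qudt.org/vocab/sou/USCS"]],
    "c": [["type", "Concept"]],
}
main_table, secondary_table = split_units(sample)
```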

We can place restrictions on the languages of the string literals returned, see the API docs. I think we need to do this as we can have more than one prefLabel for each concept, and then the UI doesn't know which one to display. In the above case, one has the language string en_GB and the other en_US (this was me being a bit pedantic 😛, but also trying to improve the search data).
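If the restriction were applied on the client side instead, label selection could look like the sketch below. It assumes each label arrives as a (text, language tag) pair, which the JSON example above does not show, and the assignment of `en_GB` and `en_US` to the two spellings is inferred from the spelling rather than stated. The function name `pick_label` is hypothetical.

```python
def pick_label(labels, preferred=("en-GB", "en")):
    """Pick one display label from (text, language_tag) pairs.

    Tags are normalised so that "en_GB" and "en-GB" compare equal. A preferred
    tag also matches more specific tags, so "en" matches "en-US".
    Falls back to the first label, or None when there are no labels.
    """
    normalised = [(text, tag.replace("_", "-").lower()) for text, tag in labels]
    for wanted in preferred:
        wanted = wanted.lower()
        for text, tag in normalised:
            if tag == wanted or tag.startswith(wanted + "-"):
                return text
    return normalised[0][0] if normalised else None


# Assumed tagging of the two prefLabels from the example response.
labels = [("Meter second", "en_US"), ("Metre second", "en_GB")]
```

`pick_label(labels)` returns "Metre second", and `pick_label(labels, ("en-US",))` returns "Meter second". Filtering in the query keeps responses smaller, while filtering in the client lets the user switch language without another request.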

My initial idea was that we would display different tables with the conversion factors, with a separate table for each alternative system, such as SimaPro, ecoinvent, etc. I now think that that is a bad idea. We want to support interoperability but also encourage harmonisation to a common standard. So instead I think we should only have a single table, like:

| Label | Synonyms | IRI (click to copy) | SimaPro | Ecoinvent | LCA Commons |
|---|---|---|---|---|---|
| Kilogram | kg, KGM | https://vocab.sentier.dev/qudt/unit/KiloGM | kg | kg | kg |

This has implications for the database. I came to this conclusion because I was starting with ecoinvent data, and didn't want to create a separate system for them the way we did for SimaPro.
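One way the single table could be fed, sketched under assumptions: each QUDT concept carries `exactMatch` links to the other systems (as the kilogram record later in this thread does), and the system of a link can be recognised by its IRI prefix. The prefixes below are taken from that record; the function name `build_row` is hypothetical. The cells hold the matched IRI, because the label used by the other system is not part of the unit response.

```python
# IRI prefixes observed in the exactMatch values of the kilogram record.
SYSTEM_PREFIXES = {
    "SimaPro": "https://vocab.sentier.dev/simapro/",
    "Ecoinvent": "https://glossary.ecoinvent.org/",
}


def as_list(value):
    """Normalise a scalar-or-list attribute to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def build_row(iri, record):
    """Build one table row (a dict) for a unit record."""
    labels = as_list(record.get("prefLabel"))
    matches = as_list(record.get("exactMatch"))
    row = {
        "Label": labels[0] if labels else None,
        "Synonyms": ", ".join(as_list(record.get("notation"))),
        "IRI": iri,
    }
    for system, prefix in SYSTEM_PREFIXES.items():
        # First exactMatch IRI under this system's prefix, or None if absent.
        row[system] = next((m for m in matches if m.startswith(prefix)), None)
    return row


kilogram = {
    "prefLabel": "Kilogram",
    "notation": ["kg", "KGM"],
    "exactMatch": [
        "http://qudt.org/vocab/unit/KiloGM",
        "https://vocab.sentier.dev/simapro/unit/kg",
        "https://glossary.ecoinvent.org/ids/487df68b-4994-4027-8fdc-a4dc298257b7",
    ],
}
row = build_row("https://vocab.sentier.dev/qudt/unit/KiloGM", kilogram)
```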

Please provide feedback so I am not yelling into the void 📢

@janfeitkenhauer

janfeitkenhauer commented Sep 24, 2024

Good thing you added the JSON response. I will work with that until the server is up and running and the endpoint can be accessed from the frontend.

Also, I like the idea of displaying only one table. It is clean and easy for the user to grasp, regardless of which system they are using. If we find that information is missing, we can easily adapt.
We should definitely add language restrictions!

@janfeitkenhauer

So, on my way home I thought about the data structure of the JSON response.
The first positions should contain units of the metric system aka International System of Units, always starting with the reference unit. (Kudos to those who are not using the metric system for the 7 dimensions included. You exceed my level of skill and therefore are very able to look further down for your unit. 🤓)

| Symbol | Name | Quantity |
|---|---|---|
| s | second | time |
| m | metre | length |
| kg | kilogram | mass |
| A | ampere | electric current |
| K | kelvin | thermodynamic temperature |
| mol | mole | amount of substance |
| cd | candela | luminous intensity |

All prefixed multiples of the base units (mega, kilo, milli, etc.) should be displayed below the reference unit, unordered. Below them comes everything else, also unordered. Additional units that are not base units (e.g. for velocity) should follow the same principle. Please ask for clarification if necessary.
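The ordering described here could be sketched as a three-tier sort. Two assumptions that are not stated in the thread: the reference unit is recognised as the SI unit whose `conversionMultiplier` equals 1, and "metric" is read as "lists SI in `applicableSystem`". The names `tier` and `ordered` are hypothetical, and the gram record is illustrative rather than copied from the API.

```python
from decimal import Decimal

SI = "http://qudt.org/vocab/sou/SI"


def tier(record):
    """0 = reference unit, 1 = other SI units, 2 = everything else."""
    systems = record.get("applicableSystem", [])
    if isinstance(systems, str):
        systems = [systems]
    if SI not in systems:
        return 2
    multiplier = Decimal(record.get("conversionMultiplier", "0"))
    return 0 if multiplier == 1 else 1


def ordered(units):
    """Return unit IRIs sorted by tier; sorted() is stable, so the order
    inside each tier stays whatever the input order was."""
    return sorted(units, key=lambda iri: tier(units[iri]))


units = {
    "https://vocab.sentier.dev/qudt/unit/CARAT": {
        "applicableSystem": "http://qudt.org/vocab/sou/CGS",
        "conversionMultiplier": "0.0002",
    },
    # Illustrative gram record; its values are assumed, not taken from the API.
    "https://vocab.sentier.dev/qudt/unit/GM": {
        "applicableSystem": [SI],
        "conversionMultiplier": "0.001",
    },
    "https://vocab.sentier.dev/qudt/unit/KiloGM": {
        "applicableSystem": [SI, "http://qudt.org/vocab/sou/CGS"],
        "conversionMultiplier": "1.0",
    },
}
display_order = ordered(units)
```

For this sample the result is kilogram, then gram, then carat. Whether the sort happens in the backend or the frontend is a separate decision.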

To answer the question of how many unit pages we need, we agreed on a separate endpoint that provides an array of all base units to be considered, or something similar. With that response the frontend should be able to render the unit pages dynamically.

That's it from me for today. In the upcoming days I will refine the frontend and also commit the code on GitHub. For the initial commit I'd like some support as to where to put the client data, as you guys have already made the commits for the backend.

Cheers!

@janfeitkenhauer

janfeitkenhauer commented Sep 25, 2024

Updated JSON response.

{
    "https://vocab.sentier.dev/qudt/unit/AMU": {
        "type": "Concept",
        "prefLabel": "Atomic mass unit",
        "notation": [
            "amu",
            "u",
            "D43"
        ],
        "inScheme": "https://vocab.sentier.dev/qudt/",
        "broader": "https://vocab.sentier.dev/qudt/unit/KiloGM",
        "definition": "The $\\textit{Unified Atomic Mass Unit}$ (symbol: $\\mu$) or $\\textit{dalton}$ (symbol: Da) is a unit that is used for indicating mass on an atomic or molecular scale. It is defined as one twelfth of the rest mass of an unbound atom of carbon-12 in its nuclear and electronic ground state, and has a value of $1.660538782(83) \\times 10^{-27} kg$.  One $Da$ is approximately equal to the mass of one proton or one neutron. The CIPM have categorised it as a $\\textit{\"non-SI unit whose values in SI units must be obtained experimentally\"}$.",
        "broaderTransitive": [
            "https://vocab.sentier.dev/qudt/unit/KiloGM",
            "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
        ],
        "exactMatch": "http://qudt.org/vocab/unit/AMU",
        "hasDimensionVector": "http://qudt.org/vocab/dimensionvector/A0E0L0I0M1H0T0D0",
        "informativeReference": "http://en.wikipedia.org/wiki/Atomic_mass_unit",
        "conversionMultiplier": "0.00000000000000000000000000166053878283",
        "conversionMultiplierSN": "1.660539E-27",
        "hasQuantityKind": "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
    },
    "https://vocab.sentier.dev/qudt/unit/KiloGM": {
        "type": "Concept",
        "prefLabel": [
            "كيلوغرام",
            "килограм",
            "kilogram",
            "Kilogramm",
            "χιλιόγραμμο",
            "kilogram",
            "kilogramo",
            "کیلوگرم",
            "kilogramme",
            "קילוגרם",
            "किलोग्राम",
            "kilogramm*",
            "chilogrammo",
            "キログラム",
            "chiliogramma",
            "kilogram",
            "kilogram",
            "quilograma",
            "kilogram",
            "килограмм",
            "kilogram",
            "kilogram",
            "公斤"
        ],
        "notation": [
            "0112/2///62720#UAA594",
            "0112/2///62720#UAD720",
            "kg",
            "KGM"
        ],
        "inScheme": "https://vocab.sentier.dev/qudt/",
        "broader": "https://vocab.sentier.dev/qudt/quantity-kind/Mass",
        "related": "http://dbpedia.org/resource/Kilogram",
        "narrower": [
            "https://vocab.sentier.dev/qudt/unit/AMU",
            "https://vocab.sentier.dev/qudt/unit/CARAT",
            "https://vocab.sentier.dev/qudt/unit/CWT_LONG",
            "https://vocab.sentier.dev/qudt/unit/CWT_SHORT",
            "https://vocab.sentier.dev/qudt/unit/CentiGM",
            "https://vocab.sentier.dev/qudt/unit/DRAM_UK",
            "https://vocab.sentier.dev/qudt/unit/DRAM_US",
            "https://vocab.sentier.dev/qudt/unit/DWT",
            "https://vocab.sentier.dev/qudt/unit/DecaGM",
            "https://vocab.sentier.dev/qudt/unit/DeciGM",
            "https://vocab.sentier.dev/qudt/unit/DeciTONNE",
            "https://vocab.sentier.dev/qudt/unit/DeciTON_Metric",
            "https://vocab.sentier.dev/qudt/unit/EarthMass",
            "https://vocab.sentier.dev/qudt/unit/FemtoGM",
            "https://vocab.sentier.dev/qudt/unit/GM",
            "https://vocab.sentier.dev/qudt/unit/GRAIN",
            "https://vocab.sentier.dev/qudt/unit/HectoGM",
            "https://vocab.sentier.dev/qudt/unit/Hundredweight_UK",
            "https://vocab.sentier.dev/qudt/unit/Hundredweight_US",
            "https://vocab.sentier.dev/qudt/unit/KiloTONNE",
            "https://vocab.sentier.dev/qudt/unit/KiloTON_Metric",
            "https://vocab.sentier.dev/qudt/unit/LB",
            "https://vocab.sentier.dev/qudt/unit/LB_M",
            "https://vocab.sentier.dev/qudt/unit/LB_T",
            "https://vocab.sentier.dev/qudt/unit/LunarMass",
            "https://vocab.sentier.dev/qudt/unit/MOMME_Pearl",
            "https://vocab.sentier.dev/qudt/unit/MOMME_Textile",
            "https://vocab.sentier.dev/qudt/unit/MegaGM",
            "https://vocab.sentier.dev/qudt/unit/MegaTON",
            "https://vocab.sentier.dev/qudt/unit/MegaTONNE",
            "https://vocab.sentier.dev/qudt/unit/MicroGM",
            "https://vocab.sentier.dev/qudt/unit/MilliGM",
            "https://vocab.sentier.dev/qudt/unit/NanoGM",
            "https://vocab.sentier.dev/qudt/unit/OZ",
            "https://vocab.sentier.dev/qudt/unit/OZ_M",
            "https://vocab.sentier.dev/qudt/unit/OZ_TROY",
            "https://vocab.sentier.dev/qudt/unit/PFUND",
            "https://vocab.sentier.dev/qudt/unit/Pennyweight",
            "https://vocab.sentier.dev/qudt/unit/PicoGM",
            "https://vocab.sentier.dev/qudt/unit/PlanckMass",
            "https://vocab.sentier.dev/qudt/unit/Quarter_UK",
            "https://vocab.sentier.dev/qudt/unit/SLUG",
            "https://vocab.sentier.dev/qudt/unit/SolarMass",
            "https://vocab.sentier.dev/qudt/unit/Stone_UK",
            "https://vocab.sentier.dev/qudt/unit/TON",
            "https://vocab.sentier.dev/qudt/unit/TONNE",
            "https://vocab.sentier.dev/qudt/unit/TON_Assay",
            "https://vocab.sentier.dev/qudt/unit/TON_LONG",
            "https://vocab.sentier.dev/qudt/unit/TON_Metric",
            "https://vocab.sentier.dev/qudt/unit/TON_SHORT",
            "https://vocab.sentier.dev/qudt/unit/TON_UK",
            "https://vocab.sentier.dev/qudt/unit/TON_US",
            "https://vocab.sentier.dev/qudt/unit/U"
        ],
        "definition": "The kilogram or kilogramme (SI symbol: kg), also known as the kilo, is the base unit of mass in the International System of Units and is defined as being equal to the mass of the International Prototype Kilogram (IPK), which is almost exactly equal to the mass of one liter of water. The avoirdupois (or international) pound, used in both the Imperial system and U.S. customary units, is defined as exactly 0.45359237 kg, making one kilogram approximately equal to 2.2046 avoirdupois pounds.",
        "note": "The kilogram or kilogramme (SI symbol: kg), also known as the kilo, is the base unit of mass in the International System of Units and is defined as being equal to the mass of the International Prototype Kilogram (IPK), which is almost exactly equal to the mass of one liter of water. The avoirdupois (or international) pound, used in both the Imperial system and U.S. customary units, is defined as exactly 0.45359237 kg, making one kilogram approximately equal to 2.2046 avoirdupois pounds.",
        "broaderTransitive": [
            "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
        ],
        "narrowerTransitive": [
            "https://vocab.sentier.dev/qudt/unit/AMU",
            "https://vocab.sentier.dev/qudt/unit/CARAT",
            "https://vocab.sentier.dev/qudt/unit/CWT_LONG",
            "https://vocab.sentier.dev/qudt/unit/CWT_SHORT",
            "https://vocab.sentier.dev/qudt/unit/CentiGM",
            "https://vocab.sentier.dev/qudt/unit/DRAM_UK",
            "https://vocab.sentier.dev/qudt/unit/DRAM_US",
            "https://vocab.sentier.dev/qudt/unit/DWT",
            "https://vocab.sentier.dev/qudt/unit/DecaGM",
            "https://vocab.sentier.dev/qudt/unit/DeciGM",
            "https://vocab.sentier.dev/qudt/unit/DeciTONNE",
            "https://vocab.sentier.dev/qudt/unit/DeciTON_Metric",
            "https://vocab.sentier.dev/qudt/unit/EarthMass",
            "https://vocab.sentier.dev/qudt/unit/FemtoGM",
            "https://vocab.sentier.dev/qudt/unit/GM",
            "https://vocab.sentier.dev/qudt/unit/GRAIN",
            "https://vocab.sentier.dev/qudt/unit/HectoGM",
            "https://vocab.sentier.dev/qudt/unit/Hundredweight_UK",
            "https://vocab.sentier.dev/qudt/unit/Hundredweight_US",
            "https://vocab.sentier.dev/qudt/unit/KiloTONNE",
            "https://vocab.sentier.dev/qudt/unit/KiloTON_Metric",
            "https://vocab.sentier.dev/qudt/unit/LB",
            "https://vocab.sentier.dev/qudt/unit/LB_M",
            "https://vocab.sentier.dev/qudt/unit/LB_T",
            "https://vocab.sentier.dev/qudt/unit/LunarMass",
            "https://vocab.sentier.dev/qudt/unit/MOMME_Pearl",
            "https://vocab.sentier.dev/qudt/unit/MOMME_Textile",
            "https://vocab.sentier.dev/qudt/unit/MegaGM",
            "https://vocab.sentier.dev/qudt/unit/MegaTON",
            "https://vocab.sentier.dev/qudt/unit/MegaTONNE",
            "https://vocab.sentier.dev/qudt/unit/MicroGM",
            "https://vocab.sentier.dev/qudt/unit/MilliGM",
            "https://vocab.sentier.dev/qudt/unit/NanoGM",
            "https://vocab.sentier.dev/qudt/unit/OZ",
            "https://vocab.sentier.dev/qudt/unit/OZ_M",
            "https://vocab.sentier.dev/qudt/unit/OZ_TROY",
            "https://vocab.sentier.dev/qudt/unit/PFUND",
            "https://vocab.sentier.dev/qudt/unit/Pennyweight",
            "https://vocab.sentier.dev/qudt/unit/PicoGM",
            "https://vocab.sentier.dev/qudt/unit/PlanckMass",
            "https://vocab.sentier.dev/qudt/unit/Quarter_UK",
            "https://vocab.sentier.dev/qudt/unit/SLUG",
            "https://vocab.sentier.dev/qudt/unit/SolarMass",
            "https://vocab.sentier.dev/qudt/unit/Stone_UK",
            "https://vocab.sentier.dev/qudt/unit/TON",
            "https://vocab.sentier.dev/qudt/unit/TONNE",
            "https://vocab.sentier.dev/qudt/unit/TON_Assay",
            "https://vocab.sentier.dev/qudt/unit/TON_LONG",
            "https://vocab.sentier.dev/qudt/unit/TON_Metric",
            "https://vocab.sentier.dev/qudt/unit/TON_SHORT",
            "https://vocab.sentier.dev/qudt/unit/TON_UK",
            "https://vocab.sentier.dev/qudt/unit/TON_US",
            "https://vocab.sentier.dev/qudt/unit/U"
        ],
        "exactMatch": [
            "http://qudt.org/vocab/unit/KiloGM",
            "https://si-digital-framework.org/SI/units/kilogram",
            "https://vocab.sentier.dev/simapro/unit/kg",
            "https://glossary.ecoinvent.org/ids/487df68b-4994-4027-8fdc-a4dc298257b7"
        ],
        "hasDimensionVector": "http://qudt.org/vocab/dimensionvector/A0E0L0I0M1H0T0D0",
        "informativeReference": "http://en.wikipedia.org/wiki/Kilogram?oldid=493633626",
        "applicableSystem": [
            "http://qudt.org/vocab/sou/SI",
            "http://qudt.org/vocab/sou/CGS",
            "http://qudt.org/vocab/sou/CGS-EMU",
            "http://qudt.org/vocab/sou/CGS-GAUSS"
        ],
        "conversionMultiplier": "1.0",
        "conversionMultiplierSN": "1.0e0",
        "hasQuantityKind": "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
    },
    "https://vocab.sentier.dev/qudt/unit/CARAT": {
        "type": "Concept",
        "prefLabel": "Carat",
        "notation": [
            "0112/2///62720#UAB166",
            "ct",
            "[car_m]",
            "CTM"
        ],
        "inScheme": "https://vocab.sentier.dev/qudt/",
        "broader": "https://vocab.sentier.dev/qudt/unit/KiloGM",
        "related": "http://dbpedia.org/resource/Carat",
        "definition": "The carat is a unit of mass equal to 200 mg and is used for measuring gemstones and pearls. The current definition, sometimes known as the metric carat, was adopted in 1907 at the Fourth General Conference on Weights and Measures, and soon afterward in many countries around the world. The carat is divisible into one hundred points of two milligrams each. Other subdivisions, and slightly different mass values, have been used in the past in different locations. In terms of diamonds, a paragon is a flawless stone of at least 100 carats (20 g). The ANSI X.12 EDI standard abbreviation for the carat is $CD$.",
        "broaderTransitive": [
            "https://vocab.sentier.dev/qudt/unit/KiloGM",
            "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
        ],
        "exactMatch": "http://qudt.org/vocab/unit/CARAT",
        "hasDimensionVector": "http://qudt.org/vocab/dimensionvector/A0E0L0I0M1H0T0D0",
        "informativeReference": "http://en.wikipedia.org/wiki/Carat?oldid=477129057",
        "applicableSystem": "http://qudt.org/vocab/sou/CGS",
        "conversionMultiplier": "0.0002",
        "conversionMultiplierSN": "2.0E-4",
        "hasQuantityKind": "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
    }
}
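With records in this shape, a conversion factor between two units of the same quantity kind can be derived from their `conversionMultiplier` values. This is a sketch, assuming the multiplier expresses one unit in terms of the quantity kind's reference unit (kilogram for mass here) and ignoring any offset that temperature scales would need. The function name `conversion_factor` is hypothetical. `Decimal` is used because the multipliers arrive as strings with more digits than a float keeps.

```python
from decimal import Decimal


def conversion_factor(units, source_iri, target_iri):
    """Return the factor f such that value_in_target = f * value_in_source."""
    source, target = units[source_iri], units[target_iri]
    if source["hasQuantityKind"] != target["hasQuantityKind"]:
        raise ValueError("units belong to different quantity kinds")
    return Decimal(source["conversionMultiplier"]) / Decimal(target["conversionMultiplier"])


MASS = "https://vocab.sentier.dev/qudt/quantity-kind/Mass"
KILOGRAM = "https://vocab.sentier.dev/qudt/unit/KiloGM"
CARAT = "https://vocab.sentier.dev/qudt/unit/CARAT"

# Only the fields the function needs, copied from the response above.
units = {
    KILOGRAM: {"conversionMultiplier": "1.0", "hasQuantityKind": MASS},
    CARAT: {"conversionMultiplier": "0.0002", "hasQuantityKind": MASS},
}

carat_to_kg = conversion_factor(units, CARAT, KILOGRAM)
kg_to_carat = conversion_factor(units, KILOGRAM, CARAT)
```

One carat comes out as 0.0002 kg and one kilogram as 5000 carats, which matches the 200 mg in the carat definition.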

@jsvgoncalves
Member
