-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathen.search-data.min.e873ac24eb19fe8263f0a0062da3b9c27cbbb03feffc2ec0c88e05e74227370e.json
1 lines (1 loc) · 20.4 KB
/
en.search-data.min.e873ac24eb19fe8263f0a0062da3b9c27cbbb03feffc2ec0c88e05e74227370e.json
1
[{"id":0,"href":"/docs/guide-development/","title":"Guide Development","section":"Docs","content":"Workflow development Guide Workflows are all things that can be computed, broadly speaking.\nFor reproducibility, we want our workflows to be repeatable: producing the same output every time they are computed. This is easy enough to do in first approximation, but might be harder to achieve than it seems when the workflow relies on external resources. But we track every execution so it is not necessary to be overly concerned about these delicate details at every moment.\nGenerally, we want the workflows to be parametrized. Non-parametrized but strickly repeatable notebooks are equivalent to their output data.\nOne way to create them in ODA is to build jupyter notebook.\nSimple ODA Jupyter Notebook to a workflow Write a working repeatable workflow First you need to make sure your notebook runs in a cloud environment. It needs to be repeatable - i.e. you can run it many times. If it depends on external services - try to make sure the requests are also repeatable - you might need to specify sufficient details. If the notebook does not produce the exactly the same result every time - it\u0026rsquo;s unfortunate, but do not worry too much, it might still be reproducible (see motivation on the difference between reproducibility and repeatability)\n write your notebook, and make sure it runs from top to bottom make a requirements.txt will the modules you need for this notebook You can use a mock lightcurve notebook as an example.\nParametetrize the notebook create a cell with the following tag \u0026ldquo;parameters\u0026rdquo; (see papermill manual): Annotate the notebook outputs define the notebook output, similarly creating cell with tag \u0026ldquo;outputs\u0026rdquo;. outputs may be strings, floats, lists outputs may be also strings which contain filenames for valid files. If they do, the whole file will be considered output. if you want to give more detailed description of the notebook input and output, use terms from the ontology. (optional) Try a test service (optional) Add some verification test cases we will explain later how to do this.\nPublish your workflow as a test service publish the workflow to RenkuLab in the dedicated group: https://renkulab.io/gitlab/astronomy/mmoda/ once some bots do their job, the workflow will be automatically installed in MMODA! Try to access your new service Assuming lightcurve-example from above was used, and the notebook name was random, you can run this:\noda-api get -i lightcurve-example -p random -a T1_isot=30000 Maintaining semantic coherence in workflow development progression: from jupyter notebooks to python modules, packages, API\u0026rsquo;s At some point, it may be advisable to move part of code in functions of a python module (e.g. my_functions.py), stored in the same repository. The functions can be called from the workflow notebook as from my_functions import my_nice_function; my_nice_function(argument).\nIf some functions are often re-used, they can be stored in external packages, and even published on pypi (to allow pip install my_function_package).\nSometimes, the function may be in fact called remotely, though API. From the point of view of workflow (e.g. notebook) where the function is called there such a remotely executed function may look very similar to local function from a module, giving similar advantages and posing similar challenges.\nOn should be wary that extracting the functions somewhat obscure content of the workflow, by introducing structure which is not generally automatically traced by workflow execution provenance tracking.\nSo when reusable part of the workflow matures, it may be extracted and treated as another workflow, providing inputs to the current workflow under development.\nIt is not feasible to always design workflow to use other workflows by consuming some pre-computed inputs. As described above in this section, workflow development progression often separates some function from within the workflow, or uses. SmarkSky project and in a way in general renku plugins essentially acknowledges this feature of the workflows: they use external functions from within the code at random locations, possibly calling them multiple times.\nThis additional information about functions called by the workflow can be introduced to the workflow metadata with special annotations (see more about workflow annotation in ODA Workflow Publishing and Discovery Guide), such as oda:requestsAstroqueryService. These annotations should be also include information about parameters used to annotate the workflow. This additional structure associated with workflows will be ingested in the KG. While it can not be directly interpretted as workflow provenance graph, it is possible to produce additional similar-looking graph with inferred provenance, which is different but analogous to strict renku-derivde provenance.\n"},{"id":1,"href":"/docs/guide-discovery/","title":"Guide Discovery","section":"Docs","content":"Workflow Publishing and Discovery with KG: Astronomer Guide Latest-Version https://github.com/oda-hub/workflow-discovery/, also deployed as https://odahub.io/ Purpose of this note We want to demostrate on concrete and scientifically-useful working examples how an astronomer, who might indeed have relatively little interest to look in the code, can leverage ODA Knowledge Base and Knowledge Graphs together with other valuable resources (especially Renku):\n collaborate on workflows discover and use ODA-built services discovery and use our record of globally available web-based data analysis services easily contribute your own analysis as web-servces annotate your work in the ways ready for consumption by the synthetic astronomer robots, making some of the reasonable reasoning for scienstics, and most of all, support scientists with discovery space empowering their irreplacible scientific capacities. It\u0026rsquo;s clear that much of this functionality is available in other frameworks, usually custom for purpose. We make use of many the re-usable open-source technologies, to support an ecosystem of tools and other developments, which can be re-used between some of our projects.\nDeveloping the workflow The simplest way to build a workflow is to write a jupyter notebook. We will not go in every details here, see dedicated guide for step-by-step instructions.\nInstead, this document here focuses on workflow annotation, publishing, and discovery. These features are powered by an RDF Knowledge Graph. What exactly is stored in the Knowledge Graph, is described by the ontologies (which are themselves stored in some KG).\nOntology We will describe here the simplest elements of the ontology, which are necessary for workflow annotation. We will not go into details about how to define various constrains and relations on/between things here.\nOntology describes relations between some things, terms (represented as RDF URIs). URI can look like a URL, e.g. https://odahub.io/ontology/sources/Mrk421 (the URL may or may not be leading to a real location, although it generally should). The URI can be also shortened, assuming a namespace prefix:\nPREFIX odaSources: \u0026lt;https://odahub.io/ontology/sources#\u0026gt; (see some default list of prefixes here)\nThis way, https://odahub.io/ontology/sources/Mrk421 becomes odaSources:Mrk421.\nIt is necessary to annotate the workflow with these terms. Specifically, to make relations between the workflow and these terms. Relations have a form of simply propositions, expressed as subject-predicate-object triples.\nFor example\noda:sdssWorkflow oda:isImportantIn oda:radioAstronomy . or\noda:sdssWorkflow astroquery:uses astroquery:sdssArchive . oda:sdssWorkflow oda:isAbout odaSources:Mrk421 . Consider that it is benefitial to use terms already used by other people, described in existing ontologies. This way we speak in the same language as other people, and will be able to more easily combine our resources. However, it can be quite an effort to understand what other people meant, which is necessary to use their terms correctly. This effort should be made conciously when possible. It is advisable to also discuss unclear points withing our group, and come to a common solution.\nParticular attention should be paid to International Virtual Observatory (IVOA) vocabulaires. See their rdf vocabularies here: https://www.ivoa.net/rdf/index.html and references therein. Developments used in variety of tables managed by CDS-Strasbourg, where much of the needed terms for astrophysical entities can be found (one can start here).\nIt is however, often important to adopt project-specific narrowed-down scope. For example, our understanding of what an AGN is, may differ from that of CDS-Strasbourg. Which is why, in unclear cases, we should not hesitate to use custom terms, such as odaSources:AGN. Then, we can also model and encode equivalence between our own understanding of the AGN with that of CDS. For example, as so:\nodaSources:Mrk421 oda:isSubclassOf odaSources:AGN . odaSources:AGN oda:equivalentTo cds:AGN . Later, these equivalences can be reduced under specific assumptions: for example some agent may assume that oda:equivalentTo implies literal substitution in all contexts.\nWorkflow inputs For our purposes, the most important workflow properties are set by their inputs and outputs.\nWe will use nb2workflow (commands below will need pip install nb2workflow) to add addition details and instrospection on the workflow notebooks.\nname_input = \u0026#34;Mrk 421\u0026#34; # name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName radius_input = 3.0 # arcmin They can be see for example with\n$ nbinspect final.ipynb ... \u0026#34;name_input\u0026#34;: { \u0026#34;comment\u0026#34;: \u0026#34; name of the object; if empty coordinates are used http://odahub.io/ontology/sourceName\u0026#34;, \u0026#34;default_value\u0026#34;: \u0026#34;Mrk 421\u0026#34;, \u0026#34;name\u0026#34;: \u0026#34;name_input\u0026#34;, \u0026#34;owl_type\u0026#34;: \u0026#34;http://odahub.io/ontology/sourceName\u0026#34;, \u0026#34;python_type\u0026#34;: \u0026#34;\u0026lt;class \u0026#39;str\u0026#39;\u0026gt;\u0026#34;, \u0026#34;value\u0026#34;: \u0026#34;Mrk 421\u0026#34; }, ... \u0026#34;radius_input\u0026#34;: { \u0026#34;comment\u0026#34;: \u0026#34; arcmin\u0026#34;, \u0026#34;default_value\u0026#34;: 3.0, \u0026#34;name\u0026#34;: \u0026#34;radius_input\u0026#34;, \u0026#34;owl_type\u0026#34;: \u0026#34;http://www.w3.org/2001/XMLSchema#float\u0026#34;, \u0026#34;python_type\u0026#34;: \u0026#34;\u0026lt;class \u0026#39;float\u0026#39;\u0026gt;\u0026#34;, \u0026#34;value\u0026#34;: 3.0 } ... Notice how in the first case, owl_type (OWL being the ontology definition language) is derived from the comment. And in the second case, it is derived from the variable type.\nTo generate a graph from the notebook:\n$ nb2rdf final-an.ipynb final.rdf This graph can be viewed, for example, with WebVOWL. It can also be published in a common location (see nb2service --help).\nTODO: show command to publish\nWorkflow properties from capturing workflow behavior Renku (with a renku plugin) is currently able to deduce that workflow uses some algorithms, providing basis for useful automatic annotation. Current plugin is dedicated to ML algorithms.\nWe expect in the future to make an ODA-specific (or, more generally, astrquery-specific) renku plugin.\nOther Domain-specific knowledge It is possible to assign any other characteristics to the workflow. It should be seem in case what makes sense. We used oda:importantIn predicate to assign relevance to some domains, e.g. domain:transients. In many cases these predicates can be assigned based on reasoning rules.\nFrom Workflow to Web-Based Data Analysis We made a simple tool to present an HTTP service executing given notebook on demand. See https://github.com/oda-hub/nb2workflow\nReasoning workflows Reasoning is transformation of knowledge.\nSince our knowledge base is the knowledge graph, our reasoning is transformation of the knowledge graph.\nI know, it may sound ambitious and unreasonable to claim capacity of out platform to reason. However, this terminology has been accepted in the community. This restricted form of general reasoning.\nIngesting data into the graph also transforms the graph. Moreoever, workflows ingesting data may be guided by the present, graph content. When possible, we separate reasoning from external source ingestion by ingesting first, and reasoning later: this allows to preserve. But it is not always feasible.\nReasoning is performed by executing these reasoning workflows: in response to external triggers, or just regularly.\nWorkflow exection Curiusly, it is very convenient to see worklow execution as reasoning. See more details.\nOther reasoning rules Various standard reasoning rules can be applied.\nLiterature Literature parsing Simple workflows to read astronomical and arxiv publications and produce some RDF.\nhttps://github.com/oda-hub/literature-to-facts\nLiterature building Integrating data into paper. Adding another compile step deriving data from various sources (a lot of the time - workflow executions) and producing macroses for the latex.\nMade use-case first, for the easist possible latex work.\nhttps://github.com/oda-hub/linked-data-latex\nHuman interventions into the KG Human agents are first-class citizens in the ODA KB/KB, on paar with the automated workflows. Humans are not very reproducible, but provide unique intuitively-guided inputs, owing to their own built-in very large but a bit vague Knowledge \u0026ldquo;Graphs\u0026rdquo;. Key aspect of our development here is to allow data and workflow interoperability. It is only natural that we are concerned with human-ODA unteroperability. Technically, we implement human interactions are implemented in the same way workflow executions.\nMost Humans experience the KB through various frontends. These multiple light-weight frontends allow making pre-defined actions, leading to workflow excutions.\nSome pre-built frontends for develoment needs are presented here:\nhttps://in.odahub.io/odatests/\nViewing computed workflows As described in the details on reasoning engine computed workflows are fully curryied workflows are equivalent to simple data-fetching workflows.\nAdding a workflow it should be as simple as pushing a button. They could be synchronized from Renku. If Renku will provide simple a limited public graph, we could directly use it, without reproducing it part of it in ODA KG.\nExample of adding new workflow which reacts on astro transients TODO\n"},{"id":2,"href":"/docs/guide-ontology/","title":"Guide Ontology","section":"Docs","content":"Ontology Purpose Ontology defines terms in which we describe what we do: data, workflows, publications, etc.\nWhile creating and discovering workflows it is useful to learn to speak in these terms: find and assign suitable annotations. In an increasingly large number of cases we identify and assign annotations automatically.\nThe tools we develop commit to implement the common understanding of the terms.\nDiscovering terms to use The terms look like URLs, e.g. https://odahub.io/ontology/#AstrophysicalObject . These URLs can be directly pasted in the browser, leading to some description:\nWe advise to look into public ontologies like https://www.ivoa.net/rdf/object-type/2020-10-06/object-type.rdf and https://odahub.io/ontology/ for the available terms. If it looks like there is nothing suitable there - it may be necessary to introduce new terms.\nAdvanced: It is also possible to look into an interactive graph explorer http://graphdb.obsuks1.unige.ch/ and https://share.streamlit.io/oda-hub/streamlite-graph/javascript-lib-interaction/main/main.py .\nAdding new terms to the ontology Sometimes, it is necessary to add a new term. In principle, Workflow Developer may add a new term at will - it is their own understanding of what is being labeled. But the term will not be fully used until it is related to other terms in the Ontology, which is done either automatically or by the Ontology Developers.\nOntology Developers can improve the common ontology with http://webprotege.obsuks1.unige.ch .\nThere is also an experimental edit interface here, the edited result should be stored and uploaded manually.\n"},{"id":3,"href":"/docs/issues/","title":"Issues","section":"Docs","content":"What if a user experiences a problem? Purpose Explain to users how issues are handled, and what can be expected.\nProcess user receives a kind message, \u0026ldquo;treatment redirected to humans, follow-up promised\u0026rdquo;. This may be delivered in the interactive session, and/or in the email. issue is addressed by the support, and request can be submitted again. It will be generally pre-computed by the time of the new request. user is informed by the platform that all is good, but it is clear to the user that the result is not satisfactory \u0026ldquo;feedback\u0026rdquo; button anything is unclear please feel free to contact [email protected] Please also consider consulting http://status.odahub.io/ to check for any current problems.\n"},{"id":4,"href":"/docs/reasoning-engine/","title":"Reasoning Engine","section":"Docs","content":"Details about the reasoning engines Workflows entities in the KG can undergo various transformations. One key transformation is currying, understood in the same way as function currying - since workflow, for our purposes, is very similar to a function. Currying transforms workflow with parameters with workflow with less parameters (arguments), possibly without any parameters. We assume that only workflow without parameters can be computed (executed).\nWorkflow execution is\nThis approach separates:\n workflow composition, which becomes one of the workflow transformation operations. workflow execution (computing) Reasoning engine is itself a process (workflow) which takes as an input some KG state, and produces new triples (which can be inserted back in the KG).\nCurrying worker Execution workflows have a property which describes what can execute them. Two forms used now are\nExecuting, computing the workflow is also a reasoning action, deriving equivalence between the given workflow and a trivial worklow which implements to request to data store.\n"},{"id":5,"href":"/docs/workflow-development-progression/","title":"Workflow Development Progression","section":"Docs","content":"Maintaining semantic coherence in workflow development progression: from jupyter notebooks to python modules, packages, API\u0026rsquo;s At some point, it may be advisable to move part of code in functions of a python module (e.g. my_functions.py), stored in the same repository. The functions can be called from the workflow notebook as from my_functions import my_nice_function; my_nice_function(argument).\nIf some functions are often re-used, they can be stored in external packages, and even published on pypi (to allow pip install my_function_package).\nSometimes, the function may be in fact called remotely, though API. From the point of view of workflow (e.g. notebook) where the function is called there such a remotely executed function may look very similar to local function from a module, giving similar advantages and posing similar challenges.\nOn should be wary that extracting the functions somewhat obscure content of the workflow, by introducing structure which is not generally automatically traced by workflow execution provenance tracking.\nSo when reusable part of the workflow matures, it may be extracted and treated as another workflow, providing inputs to the current workflow under development.\nIt is not feasible to always design workflow to use other workflows by consuming some pre-computed inputs. As described above in this section, workflow development progression often separates some function from within the workflow, or uses. SmarkSky project and in a way in general renku plugins essentially acknowledges this feature of the workflows: they use external functions from within the code at random locations, possibly calling them multiple times.\nThis additional information about functions called by the workflow can be introduced to the workflow metadata with special annotations (see more about workflow annotation in ODA Workflow Publishing and Discovery Guide), such as oda:requestsAstroqueryService. These annotations should be also include information about parameters used to annotate the workflow. This additional structure associated with workflows will be ingested in the KG. While it can not be directly interpretted as workflow provenance graph, it is possible to produce additional similar-looking graph with inferred provenance, which is different but analogous to strict renku-derivde provenance.\n"}]