Automatically generate text definitions from logical definitions #2349

Merged
merged 8 commits into from
May 1, 2024

Conversation

gouttegd
Collaborator

This PR exploits the ODK preprocessing step to automatically inject text definitions for terms that are lacking one, using the rewrite-def command of FlyBase’s ROBOT plugin. That command, as used here, finds terms without a text definition (or terms with a definition consisting only of a single dot) and, if they have a logical definition, automatically translates the logical definition into a human-readable definition.
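
As a rough illustration of the translation step described here, the Python sketch below renders a genus-and-differentia logical definition in the "Any X that R some Y" shape seen in the generated definitions. It is not the plugin's implementation: `LogicalDefinition` and `render_definition` are hypothetical names, and the sketch assumes only simple existential restrictions joined by "and".

```python
from dataclasses import dataclass


@dataclass
class LogicalDefinition:
    """Hypothetical, simplified logical definition: a genus plus (relation, filler) pairs."""
    genus: str
    differentia: list[tuple[str, str]]


def render_definition(logical: LogicalDefinition) -> str:
    """Render the definition as 'Any <genus> that <relation> some <filler> and ...'."""
    clauses = [f"{relation} some {filler}" for relation, filler in logical.differentia]
    return f"Any {logical.genus} that " + " and ".join(clauses) + "."


# Illustrative input modelled on the photoreceptor example from this PR (labels only, no IDs).
example = LogicalDefinition("photoreceptor cell", [("is part of", "eye")])
print(render_definition(example))
```

A term with several differentia, such as the "male germ cell" example, would simply yield several clauses joined by "and", which is also why a very long logical definition produces a very long sentence.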

Related to #2342

This commit uses the FlyBase ROBOT plugin to automatically generate text
definitions for terms that do not have any (or that have a "."
definition). The generated definition is derived from the term's logical
definition.
@gouttegd gouttegd self-assigned this Apr 22, 2024
@gouttegd
Collaborator Author

Here are the definitions that would be generated (given the current state of the `-edit` file) by this PR:
Term Generated definition
CL:0000006 Any sensory receptor cell that is a(n) neuron and is capable of some detection of stimulus involved in sensory perception (GO:0050906).
CL:0000029 Any neuron that develops from some migratory neural crest cell.
CL:0000154 Any secretory cell that is capable of some protein secretion (GO:0009306).
CL:0000167 Any secretory cell that is capable of some peptide hormone secretion (GO:0030072).
CL:0000168 Any secretory cell that is capable of some insulin secretion (GO:0030073).
CL:0000172 Any secretory cell that is capable of some somatostatin secretion (GO:0070253).
CL:0000174 Any secretory cell that is capable of some steroid hormone secretion (GO:0035929).
CL:0000176 Any secretory cell that is capable of some ecdysteroid secretion (GO:0045457).
CL:0000177 Any secretory cell that is capable of some testosterone secretion (GO:0035936).
CL:0000179 Any secretory cell that is capable of some progesterone secretion (GO:0042701).
CL:0000200 Any neuron that is capable of some detection of mechanical stimulus involved in sensory perception of touch (GO:0050976).
CL:0000203 Any neuronal receptor cell that is capable of some detection of mechanical stimulus involved in sensory perception of gravity (GO:0070999).
CL:0000207 Any neuron that is capable of some detection of chemical stimulus involved in sensory perception of smell (GO:0050911).
CL:0000227 Any cell that has characteristic some binucleate (PATO:0001406).
CL:0000255 Any cell that only exists in Eukaryota (NCBITaxon:2759).
CL:0000257 Any cell that only exists in Eumycetozoa (NCBITaxon:142796).
CL:0000287 Any photoreceptor cell that is part of some eye (UBERON:0000970).
CL:0000319 Any secretory cell that is capable of some mucus secretion (GO:0070254).
CL:0000329 Any cell that is capable of some oxygen transport (GO:0015671).
CL:0000349 Any cell that is part of some extraembryonic structure (UBERON:0000478).
CL:0000350 Any extraembryonic cell that is part of some amnioserosa (UBERON:0010302).
CL:0000397 Any interneuron that has its soma located in some ganglion (UBERON:0000045).
CL:0000408 Any male germ cell that has characteristic some haploid (PATO:0001375) and is capable of some fertilization (GO:0009566).
CL:0000443 Any secretory cell that is capable of some calcitonin secretion (GO:0036161).
CL:0000456 Any secretory cell that is capable of some mineralocorticoid secretion (GO:0035931).
CL:0000460 Any secretory cell that is capable of some glucocorticoid secretion (GO:0035933).
CL:0000521 Any cell that only exists in Fungi (NCBITaxon:4751).
CL:0000611 Any granulocytopoietic cell that has part some transcription factor PU.1 (PR:000001944) and has part some CCAAT/enhancer-binding protein alpha (PR:000005307) and has part some erythroid transcription factor (PR:000007857) and lacks_plasma_membrane_part some CD19 molecule (PR:000001002) and lacks_plasma_membrane_part some CD4 molecule (PR:000001004) and lacks_plasma_membrane_part some integrin alpha-M (PR:000001012) and lacks_plasma_membrane_part some CD3 epsilon (PR:000001020) and lacks_plasma_membrane_part some neural cell adhesion molecule 1 (PR:000001024) and lacks_plasma_membrane_part some CD2 molecule (PR:000001083) and lacks_plasma_membrane_part some T-cell surface glycoprotein CD8 alpha chain (PR:000001084) and lacks_plasma_membrane_part some membrane-spanning 4-domains subfamily A member 1 (PR:000001289) and lacks_plasma_membrane_part some T-cell surface glycoprotein CD5 (PR:000001839) and lacks_plasma_membrane_part some CD14 molecule (PR:000001889) and lacks_plasma_membrane_part some lymphocyte antigen 6G (PR:000002978) and lacks_plasma_membrane_part some lymphocyte antigen 76 (mouse) (PR:000002981) and has plasma membrane part some CD34 molecule (PR:000001003) and has plasma membrane part some ADP-ribosyl cyclase/cyclic ADP-ribose hydrolase 1 (PR:000001408) and has plasma membrane part some interleukin-3 receptor class 2 alpha chain (PR:000001865) and has plasma membrane part some interleukin-5 receptor subunit alpha (PR:000001867) and has plasma membrane part some mast/stem cell growth factor receptor (PR:000002065) and is capable of some eosinophil differentiation (GO:0030222).
CL:0000691 Any interneuron that has characteristic some stellate morphology (PATO:0070010).
CL:0000712 Any epidermal cell that is part of some stratum granulosum of epidermis (UBERON:0002069).
CL:0000725 Any cell that is capable of some nitrogen fixation (GO:0009399).
CL:0002551 Any skin fibroblast that is part of some dermis (UBERON:0002067).
CL:0002552 Any fibroblast that is part of some gingiva (UBERON:0001828).
CL:0002598 Any smooth muscle cell that is part of some bronchus (UBERON:0002185).
CL:0002602 Any connective tissue cell that is part of some annulus fibrosus disci intervertebralis (UBERON:0004715).
CL:0002621 Any epithelial cell that is part of some gingival epithelium (UBERON:0001949).
CL:0002631 Any respiratory epithelial cell that is part of some upper respiratory tract (UBERON:0001557).
CL:0002632 Any respiratory epithelial cell that is part of some lower respiratory tract (UBERON:0001558).
CL:0008021 Any neuron that has its soma located in some anterior lateral line ganglion (UBERON:2001391).
CL:0008028 Any neuron that is capable of part of some visual perception (GO:0007601).
CL:0008035 Any vascular associated smooth muscle cell that is part of some microcirculatory vessel (UBERON:0010523).
CL:0010006 Any blood vessel endothelial cell that is part of some heart (UBERON:0000948).
CL:0010007 Any cell that is part of some His-Purkinje system (UBERON:0004146).
CL:0010008 Any endothelial cell that is part of some heart (UBERON:0000948).
CL:0010009 Any photoreceptor cell that is part of some camera-type eye (UBERON:0000019).
CL:0010010 Any stellate neuron that has its soma located in some cerebellum (UBERON:0002037).
CL:0010020 Any glial cell that is part of some heart (UBERON:0000948).
CL:0010021 Any myoblast that develops into some cardiac muscle cell.
CL:0013000 Any radial glial cell that is part of some forebrain (UBERON:0001890).
CL:0017509 Any nucleus (GO:0005634) that has characteristic some alobate (PATO:0002506).
CL:1000001 Any neuron that has its soma located in some retrotrapezoid nucleus (UBERON:0009918).
CL:1000022 Any epithelial cell that is part of some mesonephric nephron tubule (UBERON:0005329).
CL:1000042 Any neuroblast (sensu Vertebrata) that is part of some forebrain (UBERON:0001890).
CL:1000050 Any glial cell that is part of some lateral line nerve (UBERON:0008906).
CL:1000073 Any radial glial cell that is part of some spinal cord (UBERON:0002240).
CL:1000090 Any epithelial cell that is part of some pronephric nephron tubule (UBERON:0005310).
CL:1000123 Any epithelial cell that is part of some metanephric nephron tubule (UBERON:0005146).
CL:1000143 Any goblet cell that is part of some lung (UBERON:0002048).
CL:1000182 Any tip cell that is part of some Malpighian tubule (UBERON:0001054).
CL:1000217 Any chondrocyte that is part of some growth plate cartilage (UBERON:0004129).
CL:1000222 Any neuroendocrine cell that is part of some stomach (UBERON:0000945).
CL:1000236 Any glial cell that is part of some posterior lateral line nerve (UBERON:2000175).
CL:1000239 Any glial cell that is part of some anterior lateral line nerve (UBERON:2000425).
CL:1000245 Any neuron that has its soma located in some posterior lateral line ganglion (UBERON:2001314).
CL:1000271 Any ciliated cell that is part of some lung (UBERON:0002048).
CL:1000272 Any secretory cell that is part of some lung (UBERON:0002048).
CL:1000510 Any kidney epithelial cell that is part of some glomerular epithelium (UBERON:0004188).
CL:1000550 Any kidney cell that is part of some papillary duct (UBERON:0005167).
CL:1000596 Any kidney cell that is part of some juxtamedullary cortex (UBERON:0005271).
CL:1000600 Any cell that is part of some lower urinary tract (UBERON:0001556).
CL:1000601 Any cell that is part of some ureter (UBERON:0000056).
CL:1000606 Any neuron that has its soma located in some kidney (UBERON:0002113).
CL:1000612 Any renal cortical epithelial cell that is part of some renal corpuscle (UBERON:0001229).
CL:1000615 Any kidney tubule cell that is part of some renal cortex tubule (UBERON:0006853).
CL:1000616 Any kidney medulla cell that is part of some outer medulla of kidney (UBERON:0001293).
CL:1000617 Any kidney medulla cell that is part of some inner medulla of kidney (UBERON:0001294).
CL:1000618 Any kidney cortical cell that is part of some juxtaglomerular apparatus (UBERON:0002303).
CL:1000702 Any smooth muscle cell that is part of some kidney pelvis smooth muscle (UBERON:0004227).
CL:1000703 Any kidney epithelial cell that is part of some kidney pelvis urothelium (UBERON:0004788).
CL:1000706 Any urothelial cell that is part of some urothelium of ureter (UBERON:0001254).
CL:1000708 Any ureteral cell that is part of some adventitia of ureter (UBERON:0001252).
CL:1000714 Any renal principal cell that is part of some cortical collecting duct (UBERON:0004203).
CL:1000715 Any renal intercalated cell that is part of some cortical collecting duct (UBERON:0004203).
CL:1000718 Any renal principal cell that is part of some inner medullary collecting duct (UBERON:0004205).
CL:1000719 Any renal intercalated cell that is part of some inner medullary collecting duct (UBERON:0004205).
CL:1000720 Any renal intercalated cell that is part of some papillary duct (UBERON:0005167).
CL:1000721 Any renal principal cell that is part of some papillary duct (UBERON:0005167).
CL:1000746 Any kidney corpuscule cell that is part of some renal glomerulus (UBERON:0000074).
CL:1000768 Any nephron tubule epithelial cell that is part of some renal connecting tubule (UBERON:0005097).
CL:1000838 Any epithelial cell of proximal tubule that is part of some proximal convoluted tubule (UBERON:0001287).
CL:1000839 Any epithelial cell of proximal tubule that is part of some proximal straight tubule (UBERON:0001290).
CL:1000849 Any epithelial cell of distal tubule that is part of some distal convoluted tubule (UBERON:0001292).
CL:1000850 Any epithelial cell of distal tubule that is part of some macula densa (UBERON:0002335).
CL:1000891 Any kidney blood vessel cell that is part of some kidney arterial blood vessel (UBERON:0003644).
CL:1000892 Any kidney blood vessel cell that is part of some kidney capillary (UBERON:0003527).
CL:1000893 Any kidney blood vessel cell that is part of some renal vein (UBERON:0001140).
CL:1000909 Any nephron tubule epithelial cell that is part of some loop of Henle (UBERON:0001288).
CL:1000979 Any smooth muscle cell that is part of some muscular coat of ureter (UBERON:0006855).
CL:1001005 Any kidney capillary endothelial cell that is part of some glomerular capillary endothelium (UBERON:0004294).
CL:1001006 Any kidney arterial blood vessel cell that is part of some renal afferent arteriole (UBERON:0004639).
CL:1001009 Any kidney arterial blood vessel cell that is part of some renal efferent arteriole (UBERON:0004640).
CL:1001016 Any kidney loop of Henle epithelial cell that is part of some ascending limb of loop of Henle (UBERON:0005164).
CL:1001021 Any kidney loop of Henle epithelial cell that is part of some descending limb of loop of Henle (UBERON:0001289).
CL:1001045 Any kidney arterial blood vessel cell that is part of some renal cortex artery (UBERON:0005268).
CL:1001052 Any kidney venous blood vessel cell that is part of some renal cortex vein (UBERON:0005269).
CL:1001096 Any endothelial cell that is part of some renal afferent arteriole (UBERON:0004639).
CL:1001097 Any smooth muscle cell that is part of some renal afferent arteriole (UBERON:0004639).
CL:1001099 Any endothelial cell that is part of some renal efferent arteriole (UBERON:0004640).
CL:1001100 Any smooth muscle cell that is part of some renal efferent arteriole (UBERON:0004640).
CL:1001123 Any peritubular capillary endothelial cell that is part of some outer renal medulla peritubular capillary (UBERON:0006341).
CL:1001124 Any peritubular capillary endothelial cell that is part of some renal cortex peritubular capillary (UBERON:0006851).
CL:1001126 Any vasa recta cell that is part of some inner renal medulla vasa recta (UBERON:0004776).
CL:1001127 Any vasa recta cell that is part of some outer renal medulla vasa recta (UBERON:0004775).
CL:1001135 Any kidney cortex artery cell that is part of some kidney arcuate artery (UBERON:0001552).
CL:1001138 Any kidney cortex artery cell that is part of some interlobular artery (UBERON:0004723).
CL:1001142 Any kidney cortex vein cell that is part of some kidney arcuate vein (UBERON:0004719).
CL:1001145 Any kidney cortex vein cell that is part of some renal interlobular vein (UBERON:0005168).
CL:1001209 Any vasa recta ascending limb cell that is part of some inner medulla vasa recta ascending limb (UBERON:0009092).
CL:1001210 Any vasa recta ascending limb cell that is part of some outer medulla vasa recta ascending limb (UBERON:0009093).
CL:1001213 Any endothelial cell that is part of some kidney arcuate artery (UBERON:0001552).
CL:1001214 Any smooth muscle cell that is part of some kidney arcuate artery (UBERON:0001552).
CL:1001216 Any endothelial cell that is part of some interlobular artery (UBERON:0004723).
CL:1001217 Any smooth muscle cell that is part of some interlobular artery (UBERON:0004723).
CL:1001220 Any endothelial cell that is part of some kidney arcuate vein (UBERON:0004719).
CL:1001221 Any smooth muscle cell that is part of some kidney arcuate vein (UBERON:0004719).
CL:1001223 Any endothelial cell that is part of some renal interlobular vein (UBERON:0005168).
CL:1001224 Any smooth muscle cell that is part of some renal interlobular vein (UBERON:0005168).
CL:1001286 Any vasa recta descending limb cell that is part of some inner medulla vasa recta descending limb (UBERON:0009089).
CL:1001287 Any vasa recta descending limb cell that is part of some outer medulla vasa recta descending limb (UBERON:0009090).
CL:1001319 Any cell that is part of some urinary bladder (UBERON:0001255).
CL:1001320 Any cell that is part of some urethra (UBERON:0000057).
CL:1001430 Any urothelial cell that is part of some urethra urothelium (UBERON:0004787).
CL:1001431 Any renal principal cell that is part of some collecting duct of renal tubule (UBERON:0001232).
CL:1001432 Any renal intercalated cell that is part of some collecting duct of renal tubule (UBERON:0001232).
CL:1001567 Any endothelial cell of vascular tree that is part of some lung (UBERON:0002048).
CL:1001568 Any endothelial cell of vascular tree that is part of some pulmonary artery (UBERON:0002012).
CL:4023039 Any neuron that has its soma located in some amygdala (UBERON:0001876) and is capable of some glutamate secretion, neurotransmission (GO:0061535).
CL:4023057 Any GABAergic interneuron that has its soma located in some cerebellar cortex (UBERON:0002129).

@gouttegd
Collaborator Author

The generated definition for CL:0000611 is particularly ugly, but that’s only because the logical definition of that term is itself ugly (and of dubious usefulness in my opinion).

@dosumis
Contributor

dosumis commented Apr 22, 2024

@aleixpuigb @JABelfiore @AvolaAmg @Caroline-99 - I'd like your comments on the autodefs in Damien's table here: #2349 (comment) Do you think they are an acceptable way to get better definition coverage? - perhaps as an interim step before review?

@addiehl
Contributor

addiehl commented Apr 22, 2024

I would suggest simply defining CL:0000611 eosinophil progenitor cell in an analogous way to CL:0000834 neutrophil progenitor cell: "A progenitor cell of the neutrophil lineage."

Thus, "A progenitor cell of the eosinophil lineage."

The logical definition captures a lot of marker detail used to identify this cell type uniquely in flow cytometry by excluding so-called lineage markers, to exclude other leukocyte subsets. I agree it is perhaps a bit ornate, but I would prefer to leave it until someone has a look at all the granulocytes in their immature and mature forms.

@gouttegd
Collaborator Author

> I would suggest simply defining CL:0000611 eosinophil progenitor cell in an analogous way to CL:0000834 neutrophil progenitor cell: "A progenitor cell of the neutrophil lineage."

The rewrite-def command, as used here, will only ever generate definitions for terms that don’t have one, so if an editor wants to “override” the generated definition for any given term, all they have to do is to create a “manual” definition the usual way. This will automatically prevent the generation of a definition derived from logical axioms.
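
The override behaviour described above comes down to a simple selection rule, sketched below in Python. The function name is hypothetical, and treating an empty string like a missing definition is an assumption of this sketch; the thread itself only mentions absent definitions and definitions consisting of a single dot.

```python
from typing import Optional


def needs_generated_definition(text_definition: Optional[str], has_logical_definition: bool) -> bool:
    """Return True only when a term lacks a usable text definition and has a logical one."""
    # A definition of "." counts as missing, as stated in the PR description.
    # Treating "" as missing too is an assumption made for this sketch.
    missing = text_definition is None or text_definition.strip() in ("", ".")
    return missing and has_logical_definition


# A manually written definition always wins over generation.
print(needs_generated_definition("A progenitor cell of the eosinophil lineage.", True))
print(needs_generated_definition(".", True))
```

Under this rule, adding a manual definition to a term is enough to stop a generated one from being injected for it.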

@aleixpuigb
Collaborator

Aside from CL:0000611, it is a nice way to have a temporary definition until a more detailed one is added. Is there an easy way to not include the differentia ID? I don't think it is a big problem, but if they can be removed easily, it looks better in my opinion.

@gouttegd
Collaborator Author

> Is there an easy way to not include the differentia ID?

It requires some changes in the code of the rewrite-def command, but nothing complicated.

This commit updates the FlyBase ROBOT plugin to its latest version so
that we can use the newly introduced `--no-ids` option, to prevent the
insertion of term IDs within auto-generated definitions.
@gouttegd
Collaborator Author

> > Is there an easy way to not include the differentia ID?
>
> It requires some changes in the code of the rewrite-def command, but nothing complicated.

It’s done.
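
To make the difference between the two output styles concrete, here is a small Python sketch that removes the parenthesised term IDs from a definition in the ID-bearing style of the table above. This is only an after-the-fact illustration: `strip_term_ids` is a hypothetical helper, and the plugin presumably omits the IDs when generating the text rather than stripping them afterwards.

```python
import re

# Matches a space followed by a parenthesised CURIE such as "(GO:0009306)" or "(NCBITaxon:2759)".
# Parentheses without a PREFIX:digits shape, e.g. "(mouse)", are left untouched.
TERM_ID = re.compile(r"\s\([A-Za-z][A-Za-z0-9_]*:\d+\)")


def strip_term_ids(definition: str) -> str:
    """Return the definition with every ' (PREFIX:digits)' fragment removed."""
    return TERM_ID.sub("", definition)


with_ids = "Any secretory cell that is capable of some protein secretion (GO:0009306)."
print(strip_term_ids(with_ids))
```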

@AvolaAmg
Collaborator

Apart from the differentia ID, is there a way to flag the fact that these are automatically generated definitions?

@gouttegd
Collaborator Author

@AvolaAmg We just lack a standardised way of doing so.

In FlyBase, we annotate automatically generated definitions with the pseudo-CURIE FBC:Autogenerated in a cross-reference (where FBC stands for “FlyBase Curator”). Obviously that would not be suitable here.

Ideally we would have a dedicated annotation property (not oboInOwl:hasDbXref) specifically intended to flag auto-generated content. AFAIK such a property does not exist (yet).

@anitacaron
Contributor

@gouttegd
Collaborator Author

> I found this term in Uberon with an automated definition

Not really keen on embedding the “auto-generated definition” bit directly within the definition itself. Could do as a stop-gap measure until we come up with a proper, dedicated annotation (I’ve asked for one here), though…

@aleixpuigb
Collaborator

IIRC we are also not using any label for auto-generated definitions in DOSDP.

@gouttegd
Collaborator Author

We also add a FBC:Autogenerated annotation for DOSDP-generated definitions in FBbt, but indeed it doesn’t seem we do that in CL.

So if we already have auto-generated definitions that are currently not flagged as such, I’d be inclined to do nothing on that front for now (that is, not try to flag the definitions derived from logical axioms as being auto-generated). If/when the OBO community agrees on a standard way to flag such auto-generated content, we can then update both our DOSDP patterns and the mechanism used in this PR to add the proper annotation.

@dosumis
Contributor

dosumis commented Apr 24, 2024

I'd really like them flagged. I don't want to let arguments about semantics get in the way of doing that.

@matentzn
Contributor

I like the FBC:Autogenerated for now until a more general solution can be found!

@gouttegd
Collaborator Author

OK, I’ll add an option to the plugin to allow specifying an arbitrary annotation to add to the newly generated definitions: --add-annotation "oboInOwl:hasDbXref FBC:Autogenerated".

(I am concerned this will turn out to be one of those “temporary solutions” that will stay around for years without ever being replaced by a proper annotation, but I’ve made my point about wanting a standardised solution, no need to elaborate.)
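
For readers unfamiliar with how such a cross-reference appears on a definition, the Python sketch below formats a definition and its xrefs as an OBO flat-file `def:` line, where definition cross-references are listed in square brackets after the quoted text. The `obo_def_line` helper is hypothetical and only illustrates the serialised result; it says nothing about how the plugin attaches the annotation internally.

```python
def obo_def_line(text: str, xrefs: list[str]) -> str:
    """Format an OBO-format 'def:' tag with its cross-references in brackets."""
    # Escape backslashes first, then double quotes, so the quoted string stays well-formed.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'def: "{escaped}" [{", ".join(xrefs)}]'


print(obo_def_line("Any photoreceptor cell that is part of some eye.", ["FBC:Autogenerated"]))
```

A flag of this kind can be searched for later, which makes it straightforward to find generated definitions that still await manual review.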

Update the preprocessing step to annotate all automatically generated
definitions with a cross-reference with the special value
"FBC:Autogenerated".
@gouttegd
Collaborator Author

gouttegd commented May 1, 2024

Any other objection or thing that should be changed before we merge here?

The updated PR injects definitions that do not include the differentia ID and that are flagged with an FBC:Autogenerated cross-reference annotation (that I personally think is ridiculous in a CL context, but (1) I seem to be the only one who thinks so and (2) that can be changed at any time merely by changing one line in the Makefile).

Anything else?

Contributor

@matentzn matentzn left a comment


I personally don't really think we need to include any IDs in the generated definitions, but this solution here is a 10-fold improvement over the status quo, so APPROVE, and iterate!

@gouttegd
Collaborator Author

gouttegd commented May 1, 2024

> I personally don't really think we need to include any IDs in the generated definitions

Good, because we do not do that. :) The initial version of the PR did (that’s the default behaviour of the rewrite-def command, because that’s what we do in FBbt), until someone asked for ID-less definitions. The behaviour is controlled by the --no-ids option, so if someone later decides that actually including IDs is desirable, all that will be needed is to remove that option. (I personally have no opinion on that at all.)

@matentzn
Contributor

matentzn commented May 1, 2024

I tried to find an updated version of #2349 (comment), I thought that was the latest state! Sorry. No worries. All is good!

@gouttegd gouttegd merged commit 44e4c8d into master May 1, 2024
1 check passed
@gouttegd gouttegd deleted the automatic-text-definitions branch May 1, 2024 11:03
7 participants