Feature: Add index all data step #136

Merged
merged 3 commits into from
Apr 21, 2024

syphax-bouazzouni commented Apr 19, 2024

Require

Changes

  • Move the submission_all_data concern to a submission process service (80a2d3f)
  • Add index_all step to submission parsing steps (5e70088)
  • Add index all data submission status (87b87ed)
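
The three changes listed above can be pictured roughly as follows. This is only a sketch under assumptions: the class name `IndexAllDataStep`, the status label `INDEXED_ALL_DATA`, the `ERROR_` prefix, and the in-memory `FakeSubmission` and index are hypothetical stand-ins for illustration, not the code merged in this PR.

```ruby
# Hypothetical stand-in for an ontology submission; it only holds triples and statuses.
class FakeSubmission
  attr_reader :statuses, :triples

  def initialize(triples)
    @triples = triples
    @statuses = []
  end

  def add_status(code)
    @statuses << code unless @statuses.include?(code)
  end

  def remove_status(code)
    @statuses.delete(code)
  end
end

# Hypothetical process service: indexes every triple of a submission and
# records the outcome as a submission status.
class IndexAllDataStep
  STATUS = 'INDEXED_ALL_DATA' # assumed label, not confirmed by the PR text

  def initialize(submission, index)
    @submission = submission
    @index = index # any object responding to <<, standing in for a search index
  end

  def process
    # Clear any status left over from a previous run before re-indexing.
    @submission.remove_status(STATUS)
    @submission.remove_status("ERROR_#{STATUS}")
    @submission.triples.each { |triple| @index << triple }
    @submission.add_status(STATUS)
  rescue StandardError
    # Any failure while indexing is recorded as an error status instead.
    @submission.add_status("ERROR_#{STATUS}")
  end
end

submission = FakeSubmission.new([%w[s p o], %w[s p2 o2]])
index = []
IndexAllDataStep.new(submission, index).process
puts submission.statuses.inspect
```

Keeping the status handling inside the step means callers only need to inspect the submission's statuses to know whether the full-data index is usable.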

syphax-bouazzouni force-pushed the feature/add-index-all-data-step branch 2 times, most recently from 87b87ed to 038883b (April 20, 2024 22:38)
syphax-bouazzouni force-pushed the feature/add-index-all-data-step branch from 038883b to 2c3b3a2 (April 20, 2024 22:47)
syphax-bouazzouni merged commit 135b0df into development on Apr 21, 2024
24 checks passed
syphax-bouazzouni added a commit that referenced this pull request May 22, 2024
…ndexing ontologies data and metadata, URI data fetching (#135)

* Feature: Optimize tests run time by 50% (#107)

* update bubastis.jar v1.4.0 and fix missing import exception

* optimize mappings tests

* optimize provisional relation tests

* optimize notes tests

* fix mappings tests

* optimize instances tests

* add generate_missing_labels and extract_metadata process options

* don't index and extract by default in submission process in tests

* optimize ontology submission tests run time

* Feature: Add Virtuoso, AllegroGraph and GraphDB integration to OLD (#106)

* setup multi-store unit-tests environment

* fix unit tests

* add vo parsing optimization

* update RDF version and replace RDF::SKOS with RDF::Vocab::SKOS (#131)

* Fix: an issue after updating the RDF gem to 3.0 that froze request params (#133)

* fix an issue after updating the RDF gem to 3.0 that froze request params

* handle the case when the SPARQL endpoint default value is empty

* Feature: Migrate SOLR configuration files to use SOLR Schema API (#126)

* use standard SOLR in docker compose without the old ontoportal configs

* migrate ontology properties SOLR configuration to use Schema API

* migrate ontology classes SOLR configuration to use Schema API

* migrate provisional classes indexation to use Schema API and model hooks

* update tests to handle the new indexation API

* simplify the ontology properties index schema

* update class and properties schema to use the existing dynamic names

* Feature: Index Ontologies metadata and content & Agents  (#130)

* use standard SOLR in docker compose without the old ontoportal configs

* migrate ontology properties SOLR configuration to use Schema API

* migrate ontology classes SOLR configuration to use Schema API

* migrate provisional classes indexation to use Schema API and model hooks

* update tests to handle the new indexation API

* simplify the ontology properties index schema

* update class and properties schema to use the existing dynamic names

* index submission and ontologies metadata on save

* index agents metadata

* add ontology and agent metadata indexation tests

* make agent, name, acronym, email and identifiers searchable

* unindex ontology submission when archived

* make ontology acronym and name searchable

* update embedded ontology to all the fields and update submission in save

* fix embed docs search tests

* rename ontology unindex to unindex_all_data to prevent conflicts

* implement index all ontology content

* fix unescaping indexed properties naming

* fix an issue after updating the RDF gem to 3.0 that froze request params

* add parallel processing to the index_all_data step

* clear indexed data after ontology delete

* optimize index all data in Virtuoso and GraphDB by pre-fetching all ids

- Before optimization
    - fs ⇒ 15.224490000051446s
    - ag ⇒ 19.238805999979377s
    - vo ⇒ 42.95274499990046s
    - gb ⇒ 33.52821200003382s
- After optimization
    - fs ⇒ 15.369778999942355s
    - ag ⇒ 17.367580000078306s
    - vo ⇒ 16.564614000031725s
    - gb ⇒ 15.431716999970376s
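
To make the effect of the timings above easier to read, the speedup per backend can be computed as time before divided by time after. The snippet below is a small illustrative calculation, not part of the PR; the values are the figures above rounded to two decimals, and the helper name `speedups` is made up for this example.

```ruby
# Timings in seconds, copied from the figures above and rounded to two decimals.
BEFORE = { fs: 15.22, ag: 19.24, vo: 42.95, gb: 33.53 }.freeze
AFTER  = { fs: 15.37, ag: 17.37, vo: 16.56, gb: 15.43 }.freeze

# Speedup factor per backend: time before divided by time after.
def speedups(before, after)
  before.to_h do |backend, seconds|
    [backend, (seconds / after.fetch(backend)).round(2)]
  end
end

speedups(BEFORE, AFTER).each do |backend, factor|
  puts "#{backend}: #{factor}x"
end
```

On these numbers the `vo` and `gb` runs (which the commit title suggests are Virtuoso and GraphDB) come out roughly 2.6x and 2.2x faster, `ag` improves by about 10%, and `fs` is essentially unchanged.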

* Feature: Add URI fetching related triples and serialization in different formats  (#125)

* Add raptor library to parse ntriples data

* Add resource model to fetch id related triples and serialize it

* Add and enhance xml, ntriples, turtle and json serializers

* Updating rdf version in goo project

* updating resource model

* Adding tests for resource model and serializers

* update the resource test to use more complete test data (array, bnodes, typed values)

* re-implement xml serializer using RDF/XML parser instead of Raptor

* implement array handling of resource to_object

* Enhance and refactor serializers ntriples, turtle and xml

* Enhance and refactor serializers ntriples, turtle and xml

* Handle blank nodes and reverse triples
- handle blank nodes
- fetch reverse triples
- generate random name for models in to_object, because when two models are created at the same time, one overrides the other
- call the new serializer JSONLD and RDF_XML

* Implement new serializers jsonld and rdf_xml

- implement jsonld serializer that uses json-ld library
- revert changes in xml.rb file to the original implementation, and put the new implementation in rdf_xml.rb file
- Add the media types :jsonld and :rdf_xml

* Add json-ld gem

* Enhance the test resource

- Add some cases to the data tests
- refactor the test of the serializers formats

* Fix test for fetch-related triples and json

* clean and refactor the resource serializer code

* Removed unused methods
* Extracted duplicated code in methods
* Removed skip from the tests

---------

Co-authored-by: Syphax bouazzouni <[email protected]>

* Feature: Add submission metrics to the indexed data

* Feature: isolate ontology submission process steps (#132)

* add an abstraction for submission process steps

* extract submission generate_rdf step to a file

* extract submission generate missing labels steps into a file

* extract the submission archiving step into a file

* add abstraction to diff tool & extract the submission step to a file

* extract the submission metrics generation step to a file

* extract the submission properties indexation step into a file

* extract the submission terms indexation step into a file

* move the extract metadata concern to submission process step file

* extract the submission generate obsolete classes step from generate rdf

* add the global submission process that calls the sub-steps
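
The commits in #132 describe one abstraction per processing step plus a global process that calls the sub-steps. A minimal sketch of that shape is below, under stated assumptions: the class names (`ProcessStep`, `GenerateRdfStep`, `IndexTermsStep`, `GlobalProcess`), the option keys, and the plain-hash submission are all hypothetical and chosen only for illustration; the real step classes and options may differ.

```ruby
# Hypothetical base class: each step wraps the submission and implements #process.
class ProcessStep
  def initialize(submission)
    @submission = submission
  end

  def process(_options = {})
    raise NotImplementedError, "#{self.class} must implement #process"
  end
end

# Two toy steps that only record that they ran.
class GenerateRdfStep < ProcessStep
  def process(_options = {})
    @submission[:log] << :generate_rdf
  end
end

class IndexTermsStep < ProcessStep
  def process(_options = {})
    @submission[:log] << :index_terms
  end
end

# Hypothetical global process: runs the sub-steps in a fixed order.
class GlobalProcess
  # Ordered [option key, step class] pairs; a step runs unless its option is false.
  STEPS = [
    [:process_rdf, GenerateRdfStep],
    [:index_search, IndexTermsStep]
  ].freeze

  def initialize(submission)
    @submission = submission
  end

  def process(options = {})
    STEPS.each do |key, step_class|
      next if options[key] == false

      step_class.new(@submission).process(options)
    end
    @submission
  end
end

submission = GlobalProcess.new({ log: [] }).process(index_search: false)
puts submission[:log].inspect
```

With each step in its own class, a new step only needs a new entry in the ordered list, and tests can switch individual steps off through the options hash.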

* Feature: Add index all data step (#136)

* move the submission_all_data concern to a submission process service

* add index_all step to submission parsing steps

* add index all data submission status

* also send note creation notifications to the admins (#137)

* change sparql client branch to use development

* fix index all data being removed after the index terms step

---------

Co-authored-by: Imad Bourouche <[email protected]>