Open External tool to alternative sparql endpoint #21

Open
hugolpz opened this issue Apr 24, 2024 · 3 comments
@hugolpz
Member

hugolpz commented Apr 24, 2024

Qichwa services

Lingualibre JS to edit

Approach

Turn ExternalTools.prototype.WikidataQueryService into a more general function that can also query other SPARQL endpoints.

Context

Lingualibre properties on Lingualibre items:

  • P12: Wikidata ID, for concepts such as toponyms or people on Wikidata.
  • P21: Lexeme ID, for words on Wikidata.
  • P19: Wikipedia title.
  • P20: Wiktionary entry.

Test SPARQL

PREFIX qwb: <https://qichwa.wikibase.cloud/entity/>
PREFIX qdp: <https://qichwa.wikibase.cloud/prop/direct/>
PREFIX qp: <https://qichwa.wikibase.cloud/prop/>
PREFIX qps: <https://qichwa.wikibase.cloud/prop/statement/>
PREFIX qpq: <https://qichwa.wikibase.cloud/prop/qualifier/>
PREFIX qpr: <https://qichwa.wikibase.cloud/prop/reference/>
PREFIX qno: <https://qichwa.wikibase.cloud/prop/novalue/>
select ?entry ?id ?idLabel ?posLabel
where {
?entry a ontolex:LexicalEntry; 
       wikibase:lemma ?idLabel;
       wikibase:lexicalCategory [rdfs:label ?posLabel] . filter(lang(?posLabel)="en")
       # OPTIONAL { 
         ?entry qdp:P1 ?wikidata.
          BIND (iri(concat("http://www.wikidata.org/entity/",?wikidata)) as ?id)
       # }
}

Test url

Run the query > Link > SPARQL endpoint: right click > copy link
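
The same kind of URL can also be built by hand. Below is a minimal sketch, assuming the endpoint accepts the query as a URL-encoded `query` parameter and returns JSON when `format=json` is appended (this is what the ExternalTools code further down relies on). The helper name `buildSparqlUrl` is hypothetical and not part of RecordWizard.

```javascript
// Hypothetical helper: builds a SPARQL endpoint URL from an endpoint and a query.
function buildSparqlUrl( endpoint, query ) {
	// The query must be URL-encoded, otherwise "?", "{" or spaces break the URL.
	return endpoint + '?query=' + encodeURIComponent( query ) + '&format=json';
}

var testUrl = buildSparqlUrl(
	'https://qichwa.wikibase.cloud/query/sparql',
	'SELECT ?entry WHERE { ?entry a ontolex:LexicalEntry } LIMIT 5'
);
console.log( testUrl );
```

The resulting string is what would be pasted into the ExternalTools URL field.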

@hugolpz hugolpz self-assigned this Apr 24, 2024
@hugolpz
Member Author

hugolpz commented Apr 24, 2024

The Lingualibre "External tool" query to external endpoints is ideal when we want to keep a link to Wikidata items or lexemes.
It makes later contributions back to Wikidata easier, such as reinjecting Lingualibre's audio recordings into the correct Wikidata item or lexeme pages.

I tested this query on your project :

PREFIX qwb: <https://qichwa.wikibase.cloud/entity/>
PREFIX qdp: <https://qichwa.wikibase.cloud/prop/direct/>
PREFIX qp: <https://qichwa.wikibase.cloud/prop/>
PREFIX qps: <https://qichwa.wikibase.cloud/prop/statement/>
PREFIX qpq: <https://qichwa.wikibase.cloud/prop/qualifier/>
PREFIX qpr: <https://qichwa.wikibase.cloud/prop/reference/>
PREFIX qno: <https://qichwa.wikibase.cloud/prop/novalue/>
select ?entry ?id ?idLabel ?posLabel
where {
?entry a ontolex:LexicalEntry; 
       wikibase:lemma ?idLabel;
       wikibase:lexicalCategory [rdfs:label ?posLabel] . filter(lang(?posLabel)="en")
       OPTIONAL { 
         ?entry qdp:P1 ?wikidata.
          BIND (iri(concat("http://www.wikidata.org/entity/",?wikidata)) as ?id)
       }
}

Your project's data very rarely has a Wikidata ID (P1), so there is currently no point in using the external tool.
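
To put a number on "very rarely", the JSON response of the query above can be inspected. This is a rough sketch: it assumes the standard SPARQL JSON results layout (`results.bindings`), in which a variable left unbound by `OPTIONAL` is simply absent from the binding. The function name and the sample data are made up for illustration.

```javascript
// Hypothetical helper: counts how many result rows carry an ?id binding.
function countLinkedEntries( data ) {
	var bindings = data.results.bindings,
		linked = 0,
		i;
	for ( i = 0; i < bindings.length; i++ ) {
		// Unbound OPTIONAL variables are missing from the binding object.
		if ( bindings[ i ].id !== undefined ) {
			linked++;
		}
	}
	return { total: bindings.length, linked: linked };
}

// Made-up sample response, for illustration only.
var sample = { results: { bindings: [
	{ idLabel: { value: 'yaku' }, id: { value: 'http://www.wikidata.org/entity/L2' } },
	{ idLabel: { value: 'wasi' } }
] } };

console.log( countLinkedEntries( sample ) ); // { total: 2, linked: 1 }
```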

Solution 1 : low strategy

You can therefore just as well create a wiki page list with no link to Wikidata (Telegram discussion > solution 2):

Open https://lingualibre.org/wiki/List:Que/Elwin , where `Que` is your language's ISO 639-3 code.
Add your 6,000 words by hand, one word per line, such as:
# word1
# word2
# word3
Save.
Message me, I will make some edits.

Then, open the Lingualibre.org recording studio.
Step 2: select Quechua
Step 3: select "Local list" > search: List:Que/Elwin
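
Typing 6,000 lines by hand is error-prone, so the list body could be generated from an existing word array and pasted into the page. A minimal sketch, assuming the words are already available as an array of strings; the function name and the sample words are illustrative only.

```javascript
// Hypothetical helper: turns an array of words into "# word" lines for a List: page.
function toListWikitext( words ) {
	return words
		.map( function ( word ) { return word.trim(); } )
		// Skip blank entries so the list has no empty "# " lines.
		.filter( function ( word ) { return word !== ''; } )
		.map( function ( word ) { return '# ' + word; } )
		.join( '\n' );
}

// Sample input, for illustration only.
console.log( toListWikitext( [ 'yaku', ' wasi ', '' ] ) );
```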

Solution 2 : medium strategy

1. Create 2 new properties on Lingualibre
   - Property `Lexicographic external base` : `qichwa.wikibase.cloud`
   - Property `Lexicographic external base ID` : `L2` (for https://qichwa.wikibase.cloud/wiki/Lexeme:L2 )
2. Finish fixing externaltool.js so that it
   2.1 pulls from Qichwa:
      - ?id = L2
      - ?idLabel = yaku
   2.2 uploads the .wav file to Commons
   2.3 records on the Lingualibre item:
      - Wikimedia Commons recording pointer URL ( https://commons.wikimedia.org/wiki/File:*.wav ). See example: https://lingualibre.org/wiki/Q191178#P3
      - Lexicographic external base : qichwa.wikibase.cloud
      - Lexicographic external base ID : L2
3. A Lingualibre <=> Qichwa link now exists:
   3.1 On qichwa.wikibase.cloud, create a property 'recording url pointer' modelled on https://lingualibre.org/wiki/Property:P3
   3.2 Use a bot to read the Lingualibre Qichwa items, then read
      - ?id = P? `Lexicographic external base ID` value
      - ?url = P3 `recording url pointer`
   3.3 Use the bot to update qichwa.wikibase.cloud/wiki/Lexeme:{id}#{url}
but you have one year to do so.
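
Step 2 above could look roughly like the sketch below, which maps one SPARQL result row to a list element carrying the two new properties. The property IDs do not exist yet, so `P_EXTERNAL_BASE` and `P_EXTERNAL_BASE_ID` are placeholders, and `toExternalElement` is a hypothetical name; the sample binding is made up.

```javascript
// Placeholder property IDs: the real ones are only known once the
// two properties have been created on Lingualibre.
var EXTERNAL_BASE_PROPERTY = 'P_EXTERNAL_BASE',
	EXTERNAL_BASE_ID_PROPERTY = 'P_EXTERNAL_BASE_ID';

// Hypothetical helper: converts one binding (?id, ?idLabel) into a list element.
function toExternalElement( binding, base ) {
	var uri = binding.id.value,
		element = { text: binding.idLabel.value };
	element[ EXTERNAL_BASE_PROPERTY ] = base;
	// Keep only what follows the last "/" of the entity URI, e.g. "L2".
	element[ EXTERNAL_BASE_ID_PROPERTY ] = uri.substring( uri.lastIndexOf( '/' ) + 1 );
	return element;
}

// Made-up binding, for illustration only.
var exampleBinding = {
	id: { value: 'https://qichwa.wikibase.cloud/entity/L2' },
	idLabel: { value: 'yaku' }
};
console.log( toExternalElement( exampleBinding, 'qichwa.wikibase.cloud' ) );
```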

Solution 3: high strategy

  1. On Wikidata, request the creation of a Qichwa_wikibase_identifier property. The proposal can be modelled on https://wikidata.org/wiki/Wikidata:Property_proposal/Lingua_Libre_ID
  2. Mass export the relevant Qichwa lexical data to Wikidata, linked via Qichwa_wikibase_identifier.
  3. Use the unedited ExternalTool.js to query Wikidata lexemes in Qichwa.

Sum up

| Title | Pro | Con |
| --- | --- | --- |
| 👉🏼 Hand-made Lingualibre lists with no wikibase joint. | Fastest. | Weakest joint. |
| 👉🏼 externaltool.js can be made compatible to pull ?id and ?idLabel from Qichwa, to generate lists and jointed Lingualibre items. I'm available to do so if needed. Delay: 2~4 weeks to get into prod. | Good joint. | Temporary solution, will need a bot to finish it up. |
| 👉🏼 Wikidata property creation for Qichwa_wikibase_id. | Good joint. | Slowest. |

@hugolpz
Member Author

hugolpz commented Apr 24, 2024

This nearly solves the issue. The indexOfId switch still needs to be clarified.

'use strict';

		var PETSCAN_URL = 'petscan.wmflabs.org/',
			WDQS_URL    = 'query.wikidata.org/',
			QICHWA_URL  = 'qichwa.wikibase.cloud/query/sparql',
			rw = mw.recordWizard;

		var ExternalTools = function ( config ) {
			rw.store.generator.generic.call( this, config );
		};

		OO.inheritClass( ExternalTools, rw.store.generator.generic );

		// This line defines an internal name for the generator
		ExternalTools.static.name = 'externaltools';

		// And this one defines the name for the generator which will be displayed in the UI
		ExternalTools.static.title = 'ExternalTools';

		ExternalTools.prototype.initialize = function () {
			// The two text fields
			this.urlField = new OO.ui.TextInputWidget();
			this.limitField = new OO.ui.NumberInputWidget( { min: 1, max: 2000, value: 500, step: 10, pageStep: 100, isInteger: true } );

			// The custom layout
			this.layout = new OO.ui.Widget( {
				classes: [ 'mwe-recwiz-externaltools' ],
				content: [
					new OO.ui.FieldLayout( this.urlField, {
						align: 'top',
						label: 'ExternalTools URL (PetScan, Wikidata query service):'
					} ),
					new OO.ui.FieldLayout(
						this.limitField, {
							align: 'top',
							label: mw.message( 'mwe-recwiz-nearby-limit' ).text()
						}
					)
				]
			} );

			// To be displayed, all the fields/widgets/... should be appended to "this.content.$element"
			this.content.$element.append( this.layout.$element );

			// Do not remove this line, it will initialize the popup itself
			rw.store.generator.generic.prototype.initialize.call( this );
		};

		ExternalTools.prototype.fetch = function () {
			// Get the values of our text fields
			var generator = this,
				url = this.urlField.getValue();
				
			this.limit = parseInt( this.limitField.getValue(), 10 );

			/*
			 * TODO:
			 * - list of turnkey urls
			 */

			// Initialize a new promise
			this.deferred = $.Deferred();

			// Initialize our word list
			this.list = [];

			// Check if the given URL refers to an allowed external tool
			var isPetscan = url.lastIndexOf( 'http://' + PETSCAN_URL, 0 ) === 0 || url.lastIndexOf( 'https://' + PETSCAN_URL, 0 ) === 0,
				isWDQS = url.lastIndexOf( 'https://' + WDQS_URL, 0 ) === 0,
				isQICHWA = url.lastIndexOf( 'https://' + QICHWA_URL, 0 ) === 0 ;
			if ( isPetscan ) {
				// We will do an AJAX request to petscan's API
				$.get( url + '&output_compatability=quick-intersection&format=json&doit=' ).then( this.PetScan.bind( this ), function ( error ) { generator.deferred.reject( new OO.ui.Error( error ) ); } );
			}
			else if ( isWDQS ) {
				// We will do an AJAX request to Wikidata Query Service
				url = url.replace( 'https://query.wikidata.org/#', 'https://query.wikidata.org/sparql?query=' ) + '&format=json';
				$.get( url ).then( this.WikidataQueryService.bind( this ), function ( error ) { generator.deferred.reject( new OO.ui.Error( error ) ); } );
			}
			else if ( isQICHWA ) {
				// We will do an AJAX request to the provided Query Service
				// (isWDQS is already handled by the branch above).
				// A query UI link of the form https://host/#... is rewritten to the
				// SPARQL endpoint; a URL that already points at .../sparql is left unchanged.
				url = url.replace( /(https:\/\/\w+\.\w+\.\w+)\/#/, '$1/sparql?query=' ) + '&format=json';
				$.get( url ).then( this.WikidataQueryService.bind( this ), function ( error ) { generator.deferred.reject( new OO.ui.Error( error ) ); } );
			}
			else {
				this.deferred.reject( new OO.ui.Error( 'This is not an allowed URL... It should link to PetScan, Wikidata Query or the Qichwa query service.' ) );
				return this.deferred.promise();
			}

			this.lockUI();

			// At this point we're not done yet, make the dialog closing process
			// to wait the promise to be resolved or rejected
			this.deferred.then( this.unlockUI.bind( this ), this.unlockUI.bind( this ) );
			return this.deferred.promise();
		};
		
		ExternalTools.prototype.PetScan = function ( data ) {
			var i, page, ns, element, property,
				prefix = '',
				project = mw.util.getParamValue( 'project', data.query ),
				language = mw.util.getParamValue( 'language', data.query );

			// Check whether the response looks fine or not
			if ( data.status !== 'OK' ) {
				this.deferred.reject( new OO.ui.Error( 'Petscan outputs something weird with this URL, check it and come back afterwards.' ) );
				// Stop here: data.pages cannot be trusted when the status is not OK
				return;
			}

			// For projects that have a custom property, select it
			switch ( project ) {
				case 'wikipedia':
					property = 'P19';
					prefix = language + ':';
					break;
				case 'wiktionary':
					property = 'P20';
					prefix = language + ':';
					break;
			}

			// Parse the complete response (or at least until the limit is reached)
			for ( i = 0; i < data.pages.length && i < this.limit; i++ ) {
				page = data.pages[ i ];

				element = { text: page.page_title.replace( /_/g, ' ' ) };
				if ( property !== undefined ) {
					ns = ( page.page_namespace !== 0 ? data.namespaces[ page.page_namespace ] : '' );
					element[ property ] = prefix + ns + page.page_title;
				}

				this.list.push( element );
			}

			this.deferred.resolve();
		};
		
		ExternalTools.prototype.WikidataQueryService = function ( data ) {
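			// indexOfId must be declared: assigning to an undeclared variable throws under 'use strict'
			var i, item, id, label, property, element, indexOfId;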

			// Check whether the response looks fine or not
			if ( data.results === undefined ) {
				this.deferred.reject( new OO.ui.Error( 'SPARQL Query Service outputs something weird with this URL, check it and come back afterwards.' ) );
				return;
			}
			if ( data.results.bindings.length === 0 ) {
				this.deferred.reject( new OO.ui.Error( 'No results in the request.' ) );
				return;
			}
			if ( data.results.bindings[ 0 ].id === undefined || data.results.bindings[ 0 ].label === undefined ) {
				this.deferred.reject( new OO.ui.Error( 'Result must contain both "id" and "label" field.' ) );
				// Stop here, otherwise the loop below reads the missing fields
				return;
			}

			for( i=0; i < data.results.bindings.length; i++ ) {
				item = data.results.bindings[ i ];

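				// Position at which the ID starts, i.e. the length of the entity URI prefix.
				// http://www.wikidata.org/entity/L2       : the prefix is 31 characters long.
				// https://qichwa.wikibase.cloud/entity/L2 : the prefix is 37 characters long.
				// TODO: switch to the exact position of the ID depending on the endpoint.
				indexOfId = 31;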
				id = item.id.value.substring(indexOfId);
				label = item.label.value;
				switch( id[ 0 ] ) {
					case 'L':
						property = 'P21';
						break;
					default:
						property = 'P12';
						break;
				}
				element = { "text": label };
				element[ property ] = id;
                
				this.list.push( element );
			}

			this.deferred.resolve();
		};

		ExternalTools.prototype.lockUI = function () {
			this.urlField.setDisabled( true );
			this.limitField.setDisabled( true );
		};

		ExternalTools.prototype.unlockUI = function () {
			this.urlField.setDisabled( false );
			this.limitField.setDisabled( false );

			this.getActions().get( { actions: 'save' } )[ 0 ].setDisabled( false );
		};

		rw.store.generator.register( 'externaltools', ExternalTools.static.title, 'll-externaltools', new ExternalTools() );
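
One possible way to avoid the per-endpoint `indexOfId` switch is to cut the ID at the last slash of the entity URI instead of at a fixed position. This is only a sketch, assuming every endpoint returns entity URIs ending in `/entity/<ID>`; `extractEntityId` is a hypothetical helper, not an existing RecordWizard function.

```javascript
// Hypothetical helper: returns whatever follows the last "/" of an entity URI.
function extractEntityId( uri ) {
	return uri.substring( uri.lastIndexOf( '/' ) + 1 );
}

console.log( extractEntityId( 'http://www.wikidata.org/entity/L2' ) ); // L2
console.log( extractEntityId( 'https://qichwa.wikibase.cloud/entity/L2' ) ); // L2
```

With such a helper, `id = item.id.value.substring(indexOfId);` could become `id = extractEntityId( item.id.value );`, and the `switch( id[ 0 ] )` that follows would keep working unchanged.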

@ElwinHuaman

Hi @hugolpz,
thanks for your support during this process, I really appreciate it!

I think you clarified all the questions about which approaches to follow (GitHub). Now, I would like to propose that we continue with the Solution 2 : medium strategy you proposed (which I understand is temporary):

Roadmap:

  1. Create 2 new properties on Lingualibre
  2. Finish fixing externaltool.js (SPARQL Query Service)
  3. Read the LinguaLibre Wikimedia Commons URL (P3) and update qichwa.wikibase.cloud with a similar property.
    -- Lingualibre JS to edit: rw.generator.ExternalTools.js
    -- This nearly solves the issue. indexOfId: GitHub/LinguaLibre/RecordWizard#21

  • Questions aside:
    -- Is it possible to add users on RecordWizard with age ranges?
    --- Temp. solution: add their age to the “Name to Display” value: “Ninfa_64”.(?)

  • Action items:
    -- I am planning to organize workshops on how to record voices for the different variants on Qichwabase, so they can use LinguaLibre directly and pull lexemes/forms from Qichwabase.
