Some more ...
Querela committed Mar 15, 2024
1 parent 0660a3d commit 44d70bc
Showing 3 changed files with 272 additions and 0 deletions.
58 changes: 58 additions & 0 deletions fcs-endpoint-dev-slides/query-translation.adoc
@@ -11,3 +11,61 @@

[.small]
== Query Languages

[.position-absolute.right--20.zindex--1]
image::cql-js-screenshot.png[CQL-JS Demo]


[.text-left]
== FCS-QL – Visualization

[.position-absolute.right--20.width-50.zindex--1]
image::fcsql-parse-tree-java.png[FCS-QL parse tree]

* Installation
+
[.code-width-full,bash]
----
pip install antlr4-tools
git clone https://github.com/clarin-eric/fcs-ql.git
cd fcs-ql/src/main/antlr4/eu/clarin/sru/fcs/qlparser
----

[.mt-5]
* Visualization as described in https://github.com/antlr/antlr4/blob/master/doc/getting-started.md[ANTLR4 > Getting Started]
+
[.code-width-full,bash]
----
antlr4-parse FCSParser.g4 FCSLexer.g4 query -gui
[ word = "her.*" ] [ lemma = "Arznei" ] [ pos = "VERB" ]
^D
----


[.text-left.small]
== FCS-QL Query Nodes


[.text-left.small]
== FCS-QL Query Nodes – Aggregator

[.position-absolute.width-50.right--20.opacity-50.zindex--1]
image::fcsql-querybuilder-complex.png[FCS-QL Query Builder]


== FCS-QL – Remarks


[.small]
== Query-Mapping


ifdef::backend-revealjs[]
[.small]
== Query-Mapping (2)
endif::[]

* ElasticSearch

* Solr
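
A minimal sketch of how a parsed BASIC search query could be mapped onto the two backends listed above. The tuple-based AST, the index field name `text` and the helper names `to_elasticsearch` and `to_solr` are hypothetical illustrations, not part of any FCS library; the output uses the Elasticsearch query DSL (`bool`, `match_phrase`) and the Solr standard query parser syntax.

[source,python]
----
from typing import Any, Dict, Tuple

# Hypothetical AST for BASIC search queries (not taken from any FCS library):
#   ("term", "grumpy cat")   single word or quoted phrase
#   ("and", left, right)     both sides must match
#   ("or", left, right)      at least one side must match
Node = Tuple[Any, ...]

# Assumed name of the index field that holds the textual layer.
FIELD = "text"


def to_elasticsearch(node: Node) -> Dict[str, Any]:
    """Translate the AST into an Elasticsearch query DSL dict."""
    kind = node[0]
    if kind == "term":
        # match_phrase handles single words and multi-word phrases alike
        return {"match_phrase": {FIELD: node[1]}}
    if kind in ("and", "or"):
        left, right = to_elasticsearch(node[1]), to_elasticsearch(node[2])
        if kind == "and":
            return {"bool": {"must": [left, right]}}
        return {"bool": {"should": [left, right], "minimum_should_match": 1}}
    raise ValueError(f"unknown node type: {kind!r}")


def to_solr(node: Node) -> str:
    """Translate the AST into Solr standard query parser syntax."""
    kind = node[0]
    if kind == "term":
        # escape backslashes and double quotes inside the phrase
        escaped = node[1].replace("\\", "\\\\").replace('"', '\\"')
        return f'{FIELD}:"{escaped}"'
    if kind in ("and", "or"):
        return f"({to_solr(node[1])} {kind.upper()} {to_solr(node[2])})"
    raise ValueError(f"unknown node type: {kind!r}")
----

For `cat AND (mouse OR "lazy dog")`, `to_solr` yields `(text:"cat" AND (text:"mouse" OR text:"lazy dog"))`. Every term is sent as a phrase so that single words and quoted phrases need no separate code path.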

84 changes: 84 additions & 0 deletions fcs-endpoint-dev-slides/reference-implementations.adoc
@@ -0,0 +1,84 @@
[background-image="fcs-render-uk.png",background-opacity="0.5"]
= Reference Implementations

[.notes]
--
* Java and Python, focus on FCS endpoints
* Java class hierarchies, organization & structure, processes & lifecycles, configuration
--


[.small]
== CLARIN Reference Libraries (Java)

* Development started ~2012
* Modularized: Client/Server, SRU/FCS, Parser
* Written in Java 1.8+ (https://endoflife.date/oracle-jdk[_EOL: end of 2030_])
* Extensive documentation, some tests (_proven by being in use for a long time_)
* Artifacts in https://nexus.clarin.eu[CLARIN Nexus], code on https://github.com/clarin-eric/?q=fcs[GitHub]
* Server/endpoint: external dependencies on

** Logging: `slf4j`
** HTTP: `javax.servlet:servlet-api`
** Parser: `antlr4` (FCS-QL), `cql-java` (CQL)

* Build: Maven
* Deployment: Jetty, Tomcat, …


[.small]
== CLARIN Reference Libraries (Python)

* ~2022: Java reference libraries translated to Python
* Closely modeled on the Java reference libraries
+
→ almost identical interfaces, class and function names
* But: slight optimizations for Python, not a 1:1 copy
* Focus on (new) FCS endpoints → no clients!
* Typed, documented; published on PyPI
* Synchronous, minimal WSGI app; can be embedded in existing applications
* Python 3.8+
* Dependencies on

** XML parsing: `lxml`
** HTTP/WSGI: `werkzeug`
** Query Parser: `PLY` (CQL), `ANTLR4` (FCS-QL)
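
Since the Python server is a plain synchronous WSGI application, embedding it can be as simple as mounting it under a path prefix. A minimal sketch using only the standard library: a dispatcher moves the prefix from `PATH_INFO` to `SCRIPT_NAME` with `wsgiref.util.shift_path_info`. `fcs_app`, `main_app` and `mount` are hypothetical placeholders, not the API of `fcs-sru-server-python`; with `werkzeug` installed, `werkzeug.middleware.dispatcher.DispatcherMiddleware` serves the same purpose.

[source,python]
----
from typing import Callable, Dict, Iterable
from wsgiref.util import shift_path_info

WSGIApp = Callable[..., Iterable[bytes]]


def fcs_app(environ, start_response):
    """Placeholder for the FCS endpoint WSGI app (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"FCS endpoint at " + environ["PATH_INFO"].encode("utf-8")]


def main_app(environ, start_response):
    """Placeholder for the existing application (hypothetical)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"existing application"]


def mount(default_app: WSGIApp, mounts: Dict[str, WSGIApp]) -> WSGIApp:
    """Route requests whose first path segment is a key of `mounts`."""

    def dispatcher(environ, start_response):
        path = environ.get("PATH_INFO", "")
        first = path.lstrip("/").split("/", 1)[0]
        app = mounts.get(first)
        if app is None:
            return default_app(environ, start_response)
        # move e.g. "/fcs" from PATH_INFO to SCRIPT_NAME so the mounted
        # app sees paths relative to its mount point
        shift_path_info(environ)
        return app(environ, start_response)

    return dispatcher


application = mount(main_app, {"fcs": fcs_app})
----

Served by any WSGI server (for example `wsgiref.simple_server.make_server("", 8080, application)`), requests to `/fcs/...` reach the placeholder endpoint and everything else reaches the existing app.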


[.text-left.small]
== CLARIN Reference Libraries

* FCS SRU Server: https://github.com/clarin-eric/fcs-sru-server/[Java] (https://clarin-eric.github.io/fcs-sru-server/apidocs/index.html[docs]), https://github.com/Querela/fcs-sru-server-python/[Python] (https://fcs-sru-server-python.readthedocs.io/en/latest/[docs])
* FCS Simple Endpoint: https://github.com/clarin-eric/fcs-simple-endpoint[Java] (https://clarin-eric.github.io/fcs-simple-endpoint/apidocs/index.html[docs]), https://github.com/Querela/fcs-simple-endpoint-python[Python] (https://fcs-simple-endpoint-python.readthedocs.io/en/latest/[docs])

[.mt-2]
* FCS SRU Client: https://github.com/clarin-eric/fcs-sru-client/[Java] (https://clarin-eric.github.io/fcs-sru-client/apidocs/index.html[docs])
* FCS Simple Client: https://github.com/clarin-eric/fcs-simple-client[Java] (https://clarin-eric.github.io/fcs-simple-client/apidocs/index.html[docs])

[.mt-2]
* CQL Parser: https://github.com/indexdata/cql-java[Java] (http://zing.z3950.org/cql/java/docs/index.html[docs]?), https://github.com/Querela/cql-python[Python], https://github.com/Querela/cql-js[JavaScript]
* FCS-QL Parser: https://github.com/clarin-eric/fcs-ql[Java], https://github.com/Querela/fcs-ql-python[Python] (https://fcs-ql-python.readthedocs.io/en/latest/[docs])

[.mt-2]
* Maven Endpoint Archetype: https://github.com/clarin-eric/fcs-endpoint-archetype[Java]
* FCS SRU Aggregator: https://github.com/clarin-eric/fcs-sru-aggregator[Java]
* FCS Endpoint Validator: https://github.com/clarin-eric/fcs-endpoint-tester[Java] (old), https://github.com/saw-leipzig/fcs-endpoint-validator[Java] ← tests compliance with the _SRU/FCS protocol_
* Korp: https://github.com/clarin-eric/fcs-korp-endpoint/[Java], https://github.com/Querela/fcs-korp-endpoint-python/[Python]

_https://github.com/indexdata/[Indexdata]: CQL-Parser, https://github.com/Querela/[Querela]: Python implementations_

[.notes]
--
* Note: high-level overview only; concrete examples and implementations follow in a later section
--


[.small]
== FCS Endpoint – Design and Structure

* Query Parser (CQL, FCS-QL)

[.mt-2]
* *FCS SRU Server*


130 changes: 130 additions & 0 deletions fcs-endpoint-dev-slides/resources-and-dataviews.adoc
Original file line number Diff line number Diff line change
@@ -0,0 +1,130 @@
[background-image="fcs-render-uk.png",background-opacity="0.5"]
= Resources and Data Views

[.notes]
--
* Endpoint Capabilities, BASIC/ADVANCED Search, FCS-QL
* Resource, Resource Fragment, Data View (Hits, Advanced)
* Result serialization, query languages
--


[.text-left]
== Endpoint Description – Capabilities

*\http://clarin.eu/fcs/capability/basic-search*

* Mandatory
* DataView: HITS

[.mt-5]
*\http://clarin.eu/fcs/capability/advanced-search*

* Optional
* DataView: HITS and Advanced
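
A hypothetical helper that derives the data views an endpoint has to provide from the capabilities it declares, following the two capability descriptions above (basic search is mandatory and needs HITS; advanced search is optional and adds the Advanced data view). The names `required_dataviews` and `REQUIRED_DATAVIEWS` are illustrative and not part of the reference libraries; capabilities other than these two are simply ignored here.

[source,python]
----
from typing import Dict, Iterable, Set

BASIC_SEARCH = "http://clarin.eu/fcs/capability/basic-search"
ADVANCED_SEARCH = "http://clarin.eu/fcs/capability/advanced-search"

# Data views needed per capability, as listed on the slide.
REQUIRED_DATAVIEWS: Dict[str, Set[str]] = {
    BASIC_SEARCH: {"HITS"},
    ADVANCED_SEARCH: {"HITS", "Advanced"},
}


def required_dataviews(capabilities: Iterable[str]) -> Set[str]:
    """Return the data views implied by the declared capability URIs."""
    declared = set(capabilities)
    if BASIC_SEARCH not in declared:
        raise ValueError("basic-search capability is mandatory")
    views: Set[str] = set()
    for capability in declared:
        # capabilities not listed above add no data views in this sketch
        views |= REQUIRED_DATAVIEWS.get(capability, set())
    return views
----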


ifdef::backend-revealjs[]
== Endpoint Description – Capabilities (2)
endif::[]



[.text-left]
== BASIC Search

[.position-absolute.right--30.width-50.opacity-50,x86asm]
----
cat
"cat"
cat AND dog
"grumpy cat"
"grumpy cat" AND dog
"grumpy cat" OR "lazy dog"
cat AND (mouse OR "lazy dog")
----

*``\http://clarin.eu/fcs/capability/basic-search``*
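
A minimal recursive-descent sketch for the query forms shown on this slide: single terms, quoted phrases, `AND`/`OR` and parentheses. Following CQL, on which BASIC search is based, `AND` and `OR` share one precedence level and are evaluated left to right, so `a OR b AND c` groups as `(a OR b) AND c`. The function `parse_basic` and its tuple AST are hypothetical illustrations, not the CQL parser of the reference libraries; CQL indexes, relations, `NOT` and `PROX` are not handled.

[source,python]
----
import re
from typing import Any, List, Tuple

# Hypothetical AST: ("term", text) | ("and", left, right) | ("or", left, right)
Node = Tuple[Any, ...]

# One token: "(" | ")" | "quoted phrase" | bare word (leading whitespace skipped)
TOKEN_RE = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')


def tokenize(query: str) -> List[Tuple[str, str]]:
    tokens: List[Tuple[str, str]] = []
    query = query.strip()
    pos = 0
    while pos < len(query):
        m = TOKEN_RE.match(query, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at {pos}: {query[pos:]!r}")
        lpar, rpar, phrase, word = m.groups()
        if lpar or rpar:
            tokens.append((lpar or rpar, ""))
        elif phrase is not None:
            tokens.append(("term", phrase))
        elif word.upper() in ("AND", "OR"):
            tokens.append(("bool", word.lower()))
        else:
            tokens.append(("term", word))
        pos = m.end()
    return tokens


def parse_basic(query: str) -> Node:
    tokens = tokenize(query)
    i = 0

    def peek() -> Tuple[str, str]:
        return tokens[i] if i < len(tokens) else ("end", "")

    def take() -> Tuple[str, str]:
        nonlocal i
        tok = peek()
        i += 1
        return tok

    def primary() -> Node:
        kind, value = take()
        if kind == "term":
            return ("term", value)
        if kind == "(":
            node = expr()
            if take()[0] != ")":
                raise ValueError("missing closing parenthesis")
            return node
        raise ValueError(f"unexpected token: {kind!r}")

    def expr() -> Node:
        # CQL: AND and OR have equal precedence and associate to the left
        node = primary()
        while peek()[0] == "bool":
            op = take()[1]
            node = (op, node, primary())
        return node

    node = expr()
    if i != len(tokens):
        raise ValueError(f"unexpected token: {peek()[0]!r}")
    return node
----

For example, `parse_basic('cat AND (mouse OR "lazy dog")')` returns `("and", ("term", "cat"), ("or", ("term", "mouse"), ("term", "lazy dog")))`.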


[.text-left]
== ADVANCED Search

[.position-absolute.right--30.width-50.opacity-50,x86asm]
----
"walking"
[token = "walking"]
"Dog" /c
[word = "Dog" /c]
[pos = "NOUN"]
[pos != "NOUN"]
[lemma = "walk"]
"blaue|grüne" [pos = "NOUN"]
"dogs" []{3,} "cats" within s
[z:pos = "ADJ"]
[z:pos = "ADJ" & q:pos = "ADJ"]
----


*``\http://clarin.eu/fcs/capability/advanced-search``*
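
FCS-QL is parsed with a full ANTLR grammar (`FCSParser.g4`, `FCSLexer.g4`). To illustrate the anatomy of a single bracketed token query from this slide (optional qualifier, layer identifier, `=` or `!=`, quoted regular expression, optional flags such as `/c`), here is a hypothetical regex-based sketch, `parse_segment`. It deliberately rejects anything more complex, such as `&` inside a segment, sequences, quantifiers or `within`.

[source,python]
----
import re
from typing import NamedTuple, Optional

# [qualifier:layer op "value" /flags]  e.g.  [z:pos = "ADJ"], [word = "Dog" /c]
SEGMENT_RE = re.compile(
    r'\[\s*(?:(?P<qualifier>\w+):)?(?P<layer>\w+)\s*(?P<op>!?=)\s*'
    r'"(?P<value>[^"]*)"\s*(?:/(?P<flags>[A-Za-z]+))?\s*\]'
)


class Segment(NamedTuple):
    qualifier: Optional[str]  # e.g. "z" in [z:pos = "ADJ"], else None
    layer: str                # attribute / layer identifier, e.g. "pos"
    negated: bool             # True for "!="
    value: str                # regular expression, without the quotes
    flags: str                # characters after "/", e.g. "c", else ""


def parse_segment(text: str) -> Segment:
    m = SEGMENT_RE.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a simple segment: {text!r}")
    return Segment(
        qualifier=m["qualifier"],
        layer=m["layer"],
        negated=m["op"] == "!=",
        value=m["value"],
        flags=m["flags"] or "",
    )
----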


== FCS-QL



== FCS-QL – Notes



== FCS-QL – Layer Types

// ._Advanced Search_ Layer types with description and examples
[.x-small%header,cols="1m,5,1,3"]
|===
|{set:cellbgcolor}Layer Type Identifier
|Annotation Layer Description
|Syntax
|Examples (without quotes)

|text
|Textual representation of the resource; also the layer used in Basic Search
|String
|"Dog", "cat", "walking", "better"

|lemma
|Lemmatisation
|String
|"good", "walk", "dog"

|pos
|Part-of-Speech annotations
|<<ref:UD-POS,Universal POS>> tags
|"NOUN", "VERB", "ADJ"

|orth
|Orthographic transcription of (mostly) spoken resources
|String
|"dug", "cat", "wolking"

|norm
|Orthographic normalization of (mostly) spoken resources
|String
|"dog", "cat", "walking", "best"

|phonetic
|Phonetic transcription
|<<ref:SAMPA,SAMPA>>
|"'du:", "'vi:-d6 'ha:-b@n"
|===

[.refs.xx-small]
--
* [[ref:UD-POS]]Universal Dependencies, https://universaldependencies.github.io/u/pos/index.html[Universal POS tags v2.0]
* [[ref:SAMPA]]Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7
--
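
A small hypothetical check, sketched under the assumption that an endpoint wants to flag attribute names that are not among the six layer type identifiers in the table: quoted values are blanked out first so that an `=` inside a regular expression is not mistaken for an attribute, then every `layer =` or `qualifier:layer !=` occurrence is collected. `LAYER_TYPES` and `unknown_layers` are illustrative names only.

[source,python]
----
import re
from typing import List

# Layer type identifiers from the table (FCS-QL Advanced Search).
LAYER_TYPES = {
    "text": "Textual representation of the resource",
    "lemma": "Lemmatisation",
    "pos": "Part-of-Speech annotations (Universal POS tags)",
    "orth": "Orthographic transcription",
    "norm": "Orthographic normalization",
    "phonetic": "Phonetic transcription (SAMPA)",
}

# "layer =", "layer !=", optionally prefixed by "qualifier:"
ATTRIBUTE_RE = re.compile(r'(?:\w+:)?(\w+)\s*!?=')


def unknown_layers(query: str) -> List[str]:
    """Return attribute names in `query` that are not in LAYER_TYPES."""
    # blank out quoted values so that "a=b" inside a regex is ignored
    without_strings = re.sub(r'"[^"]*"', '""', query)
    found = [m.group(1) for m in ATTRIBUTE_RE.finditer(without_strings)]
    return [name for name in found if name not in LAYER_TYPES]
----

For the query `[word = "Dog" /c]` from the ADVANCED Search examples, `unknown_layers` returns `["word"]`, while `[z:pos = "ADJ" & q:pos = "ADJ"]` passes.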


== FCS-QL – Layer Type Identifier

