Showing 3 changed files with 272 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
[background-image="fcs-render-uk.png",background-opacity="0.5"] | ||
= Reference Implementations | ||
|
||
[.notes] | ||
-- | ||
* Java and Python, focus on FCS endpoints | ||
* Java class hierarchies, organization & structure, processes & lifecycles, configuration | ||
-- | ||
|
||
|
||
[.small] | ||
== CLARIN Reference Libraries (Java) | ||
|
||
* Development started ~2012 | ||
* Modularized: Client/Server, SRU/FCS, Parser | ||
* in Java 1.8+ (https://endoflife.date/oracle-jdk[_EOL: end of 2030_]) | ||
* Extensive documentation, some tests (_proven by being in use for a long time_) | ||
* Artifacts in https://nexus.clarin.eu[CLARIN Nexus], code on https://github.com/clarin-eric/?q=fcs[GitHub] | ||
* Server/endpoint: external dependencies on | ||
|
||
** Logging: `slf4j` | ||
** HTTP: `javax.servlet:servlet-api` | ||
** Parser: `antlr4` (FCS-QL) / CQL | ||
|
||
* Build: Maven | ||
* Deployment: Jetty, Tomcat, … | ||
|
||
|
||
[.small] | ||
== CLARIN Reference Libraries (Python) | ||
|
||
* ~ 2022: Translation of Java reference libraries to Python | ||
* Closely modelled on the Java reference libraries | ||
+ | ||
→ (almost) identical interfaces and class/function names | ||
* but: slight optimizations for Python, no 1:1 copy | ||
* Focus on (new) FCS endpoints → no clients! | ||
* Typed, documented; published on PyPI | ||
* Synchronous, minimal WSGI - allows embedding in existing apps | ||
* Python 3.8+ | ||
* Dependencies on | ||
|
||
** XML parsing: `lxml` | ||
** HTTP/WSGI: `werkzeug` | ||
** Query Parser: `PLY` (CQL), `ANTLR4` (FCS-QL) | ||
|
||
|
||
[.text-left.small] | ||
== CLARIN Reference Libraries | ||
|
||
* FCS SRU Server: https://github.com/clarin-eric/fcs-sru-server/[Java] (https://clarin-eric.github.io/fcs-sru-server/apidocs/index.html[docs]), https://github.com/Querela/fcs-sru-server-python/[Python] (https://fcs-sru-server-python.readthedocs.io/en/latest/[docs]) | ||
* FCS Simple Endpoint: https://github.com/clarin-eric/fcs-simple-endpoint[Java] (https://clarin-eric.github.io/fcs-simple-endpoint/apidocs/index.html[docs]), https://github.com/Querela/fcs-simple-endpoint-python[Python] (https://fcs-simple-endpoint-python.readthedocs.io/en/latest/[docs]) | ||
|
||
[.mt-2] | ||
* FCS SRU Client: https://github.com/clarin-eric/fcs-sru-client/[Java] (https://clarin-eric.github.io/fcs-sru-client/apidocs/index.html[docs]) | ||
* FCS Simple Client: https://github.com/clarin-eric/fcs-simple-client[Java] (https://clarin-eric.github.io/fcs-simple-client/apidocs/index.html[docs]) | ||
|
||
[.mt-2] | ||
* CQL Parser: https://github.com/indexdata/cql-java[Java] (http://zing.z3950.org/cql/java/docs/index.html[docs]?), https://github.com/Querela/cql-python[Python], https://github.com/Querela/cql-js[JavaScript] | ||
* FCS-QL Parser: https://github.com/clarin-eric/fcs-ql[Java], https://github.com/Querela/fcs-ql-python[Python] (https://fcs-ql-python.readthedocs.io/en/latest/[docs]) | ||
|
||
[.mt-2] | ||
* Maven Endpoint Archetype: https://github.com/clarin-eric/fcs-endpoint-archetype[Java] | ||
* FCS SRU Aggregator: https://github.com/clarin-eric/fcs-sru-aggregator[Java] | ||
* FCS Endpoint Validator: https://github.com/clarin-eric/fcs-endpoint-tester[Java] (old), https://github.com/saw-leipzig/fcs-endpoint-validator[Java] ← test compliance with _SRU/FCS protocol_ | ||
* Korp: https://github.com/clarin-eric/fcs-korp-endpoint/[Java], https://github.com/Querela/fcs-korp-endpoint-python/[Python] | ||
|
||
_https://github.com/indexdata/[Indexdata]: CQL-Parser, https://github.com/Querela/[Querela]: Python implementations_ | ||
|
||
[.notes] | ||
-- | ||
* Note: this is a high-level overview; concrete examples and implementations follow in a later section | ||
-- | ||
|
||
|
||
[.small] | ||
== FCS Endpoint – Design and structure | ||
|
||
* Query Parser (CQL, FCS-QL) | ||
|
||
[.mt-2] | ||
* *FCS SRU Server* | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
[background-image="fcs-render-uk.png",background-opacity="0.5"] | ||
= Resources and Data Views | ||
|
||
[.notes] | ||
-- | ||
* Endpoint Capabilities, BASIC/ADVANCED Search, FCS-QL | ||
* Resource, Resource Fragment, Data View (Hits, Advanced) | ||
* Result serialization, query languages | ||
-- | ||
|
||
|
||
[.text-left] | ||
== Endpoint Description – Capabilities | ||
|
||
*\http://clarin.eu/fcs/capability/basic-search* | ||
|
||
* Mandatory | ||
* DataView: HITS | ||
|
||
[.mt-5] | ||
*\http://clarin.eu/fcs/capability/advanced-search* | ||
|
||
* Optional | ||
* DataView: HITS and Advanced | ||
|
||
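The relationship between capabilities and data views listed above can be sketched in Python. This is only an illustration: the function name `check_capabilities` and the lookup table are assumptions made for this sketch and are not part of the reference libraries; only the two capability URIs come from the slide.

```python
BASIC_SEARCH = "http://clarin.eu/fcs/capability/basic-search"
ADVANCED_SEARCH = "http://clarin.eu/fcs/capability/advanced-search"

# Data views implied by each capability, as listed on the slide.
REQUIRED_DATA_VIEWS = {
    BASIC_SEARCH: {"hits"},
    ADVANCED_SEARCH: {"hits", "advanced"},
}


def check_capabilities(capabilities, data_views):
    """Return a list of problems; an empty list means the declaration is consistent.

    Hypothetical helper: both arguments are plain iterables of strings.
    """
    problems = []
    caps = set(capabilities)
    views = {view.lower() for view in data_views}

    # basic-search is mandatory for every endpoint.
    if BASIC_SEARCH not in caps:
        problems.append("basic-search capability is mandatory")

    # Every declared capability needs its data views to be offered.
    for cap in sorted(caps & REQUIRED_DATA_VIEWS.keys()):
        for view in sorted(REQUIRED_DATA_VIEWS[cap] - views):
            problems.append(f"{cap} requires data view: {view}")
    return problems
```

For example, an endpoint that declares both capabilities but only offers the HITS data view would get one problem reported for the missing Advanced data view.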
|
||
ifdef::backend-revealjs[] | ||
== Endpoint Description – Capabilities (2) | ||
endif::[] | ||
|
||
|
||
|
||
[.text-left] | ||
== BASIC Search | ||
|
||
[.position-absolute.right--30.width-50.opacity-50,x86asm] | ||
---- | ||
cat | ||
"cat" | ||
cat AND dog | ||
"grumpy cat" | ||
"grumpy cat" AND dog | ||
"grumpy cat" OR "lazy dog" | ||
cat AND (mouse OR "lazy dog") | ||
---- | ||
|
||
*``\http://clarin.eu/fcs/capability/basic-search``* | ||
|
||
|
||
[.text-left] | ||
== ADVANCED Search | ||
|
||
[.position-absolute.right--30.width-50.opacity-50,x86asm] | ||
---- | ||
"walking" | ||
[token = "walking"] | ||
"Dog" /c | ||
[word = "Dog" /c] | ||
[pos = "NOUN"] | ||
[pos != "NOUN"] | ||
[lemma = "walk"] | ||
"blaue|grüne" [pos = "NOUN"] | ||
"dogs" []{3,} "cats" within s | ||
[z:pos = "ADJ"] | ||
[z:pos = "ADJ" & q:pos = "ADJ"] | ||
---- | ||
|
||
|
||
*``\http://clarin.eu/fcs/capability/advanced-search``* | ||
|
||
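How a client might send the BASIC and ADVANCED example queries to an endpoint can be sketched as follows. The helper name `build_search_url` and the endpoint URL are hypothetical; the sketch assumes the SRU `searchRetrieve` parameters `query`, `queryType` and `maximumRecords`, with `queryType=cql` for BASIC search and `queryType=fcs` for FCS-QL.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, query, advanced=False, maximum_records=10):
    """Build a searchRetrieve URL for a CQL (basic) or FCS-QL (advanced) query.

    Hypothetical helper; parameter names follow SRU as assumed in the lead-in.
    """
    params = {
        "operation": "searchRetrieve",
        # Assumption: "cql" selects BASIC search, "fcs" selects FCS-QL.
        "queryType": "fcs" if advanced else "cql",
        "query": query,
        "maximumRecords": maximum_records,
    }
    # urlencode percent-encodes quotes, brackets and spaces in the query.
    return f"{endpoint}?{urlencode(params)}"


basic_url = build_search_url("https://example.org/fcs", '"grumpy cat" AND dog')
advanced_url = build_search_url("https://example.org/fcs", '[pos = "NOUN"]', advanced=True)
```

The query strings are taken from the example lists above; everything else in the URL is illustrative.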
|
||
== FCS-QL | ||
|
||
|
||
|
||
== FCS-QL – Notes | ||
|
||
|
||
|
||
== FCS-QL – Layer Types | ||
|
||
// ._Advanced Search_ Layer types with description and examples | ||
[.x-small%header,cols="1m,5,1,3"] | ||
|=== | ||
|{set:cellbgcolor}Layer Type Identifier | ||
|Annotation Layer Description | ||
|Syntax | ||
|Examples (without quotes) | ||
|
||
|text | ||
|Textual representation of resource, also the layer that is used in Basic Search | ||
|String | ||
|"Dog", "cat", "walking", "better" | ||
|
||
|lemma | ||
|Lemmatisation | ||
|String | ||
|"good", "walk", "dog" | ||
|
||
|pos | ||
|Part-of-Speech annotations | ||
|<<ref:UD-POS,Universal POS>> tags | ||
|"NOUN", "VERB", "ADJ" | ||
|
||
|orth | ||
|Orthographic transcription of (mostly) spoken resources | ||
|String | ||
|"dug", "cat", "wolking" | ||
|
||
|norm | ||
|Orthographic normalization of (mostly) spoken resources | ||
|String | ||
|"dog", "cat", "walking", "best" | ||
|
||
|phonetic | ||
|Phonetic transcription | ||
|<<ref:SAMPA,SAMPA>> | ||
|"'du:", "'vi:-d6 'ha:-b@n" | ||
|=== | ||
|
||
[.refs.xx-small] | ||
-- | ||
* [[ref:UD-POS]]Universal Dependencies, https://universaldependencies.github.io/u/pos/index.html[Universal POS tags v2.0] | ||
* [[ref:SAMPA]]Dafydd Gibbon, Inge Mertins, Roger Moore (Eds.): Handbook of Multimodal and Spoken Language Systems. Resources, Terminology and Product Evaluation, Kluwer Academic Publishers, Boston MA, 2000, ISBN 0-7923-7904-7 | ||
-- | ||
|
||
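The layer type identifiers from the table, optionally prefixed with a qualifier as in `z:pos`, can be handled with a small helper. This is a minimal sketch: the function names are hypothetical, and the set below contains only the identifiers listed in the table, so any other identifier is reported as unknown to this sketch.

```python
# Layer type identifiers taken from the table above.
LAYER_TYPES = {"text", "lemma", "pos", "orth", "norm", "phonetic"}


def split_layer_identifier(identifier):
    """Split 'z:pos' into ('z', 'pos') and 'pos' into (None, 'pos')."""
    qualifier, sep, layer = identifier.rpartition(":")
    # Reject empty parts such as ':pos' or 'z:'.
    if not layer or (sep and not qualifier):
        raise ValueError(f"malformed layer identifier: {identifier!r}")
    return (qualifier or None, layer)


def is_known_layer(identifier):
    """Check the (possibly qualified) identifier against the table."""
    _, layer = split_layer_identifier(identifier)
    return layer in LAYER_TYPES
```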
|
||
== FCS-QL – Layer Type Identifier | ||
|
||
|