The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- `UpdateEvent` now implements `PartialEq` to make it possible to compare changes.
- Deserializing a write-ahead log failed because it was located in the wrong sub-directory and the deserialization routine for the map had a bug.
- Fixed out of bounds error parsing legacy meta queries with multiple alternatives (#308)
- New method `remove_item()` for annotation storages that allows for more efficient removal if not only a single annotation, but the whole item should be deleted. This is used when applying a `DeleteNode` or `DeleteEdge` event.
- Added support for coverage edges between span nodes and segmentation nodes when calculating the AQL model index.
- Do not use recursion to calculate the indirect coverage edges in the model index, since this could fail for deeply nested structures.
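The recursion-free traversal mentioned above can be sketched with an explicit work list; the names and types here are illustrative, not the actual graphANNIS internals. A heap-allocated stack grows with nesting depth instead of the call stack, so deeply nested structures cannot overflow it:

```rust
use std::collections::{HashMap, HashSet};

/// Collect all nodes reachable from `start` over coverage edges without
/// recursion. The `pending` vector plays the role of the call stack.
fn indirect_coverage(edges: &HashMap<u64, Vec<u64>>, start: u64) -> HashSet<u64> {
    let mut result = HashSet::new();
    let mut pending = vec![start];
    while let Some(node) = pending.pop() {
        for &target in edges.get(&node).into_iter().flatten() {
            // Visit each node only once, so even cyclic inputs terminate.
            if result.insert(target) {
                pending.push(target);
            }
        }
    }
    result
}
```

A chain of 100,000 edges would exhaust a typical call stack with naive recursion, but only grows the `pending` vector here.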
- Add bug fixes for relANNIS import discovered while testing the Annatto relANNIS importer.
- Fix `FileTooLarge` error when searching for token precedence where the statistics indicate that this search is impossible.
- Load existing components from the backup folder instead of the actual location if a backup folder exists.
- When optional nodes were located not at the end but somewhere in between the query, the output of the `find` query could include the wrong node ID.
- Use a TOML file instead of a binary file format to store the global statistics. You might have to re-import existing corpora or use the `re-optimize` command on the command line if the global statistics are relevant for optimal speed in returning the token of a corpus.
- Do not reload graph storages when they are already loaded.
- Do not attempt to unload a corpus that is about to be loaded in the next step. This could trigger problematic unload/load cycles.
- Fixed issues with `find_connected`, `find_connected_inverse` and `is_connected` and excluded ranges (#257)
- Updated lalrpop dependency to 0.20 to fix warnings reported in newer clippy versions.
- Fixed compiler warnings in newer Rust versions about unused code.
- Added information about the corpus size to the global statistics and corpus
  configuration file. The token/segmentation layer used for the corpus size in
  the corpus configuration file `corpus-config.toml` can be configured
  manually, or the entries are created automatically during import or when the
  `re-optimize` command is run on the command line. The corpus size is given as
  a combination of a unit and the actual quantity. The corpus size unit can be
  the number of basic token (no outgoing coverage):

  ```toml
  [corpus_size]
  quantity = 44079

  [corpus_size.unit]
  name = "tokens"
  ```

  Or it can describe a specific segmentation layer:

  ```toml
  [corpus_size]
  quantity = 305056

  [corpus_size.unit]
  name = "segmentation"
  value = "diplomatic"
  ```

  When the configuration is created automatically, the corpus view
  configuration is checked whether it is configured to use a
  `base_text_segmentation` and uses this segmentation as the corpus size unit.
  If a corpus size is already configured, only the quantity will be updated but
  not the unit.
- Fix offset and limitation issue when multiple corpora are selected. After a refactoring, the updated offset was never actually applied when finding the results in the next corpus. This could lead to too many results on the first page and also to missing matches on the second and later pages.
- Fix datasource-gap for zero context by ensuring that tokens are sorted in subgraph iterator. (by https://github.com/matthias-stemmler)
- New disk-based graph storage implementation `DiskPathV1_D15` that stores the outgoing paths from every node when the maximum branch-out is 1 and the longest path has the length 15. This is an optimization especially useful for the `PartOf` component, since it avoids the frequent disk access that an adjacency-based implementation would need to get all ancestors. Also, `PartOf` components are not trees, but still have the property of at most 1 outgoing edge, which can be used to optimize finding all ancestors. Important: You cannot downgrade graphANNIS to an older version if you imported a disk-based corpus with the new version, since old graphANNIS versions won't be able to load the new graph storage implementation.
- Add new global statistics that describe the combined graph. Until now, there were only statistics for each graph component and for the node annotation storage.
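The idea behind a path-based storage for such chain-shaped components can be sketched in memory; the types below are hypothetical simplifications, not the actual `DiskPathV1_D15` layout. When every node has at most one outgoing edge and paths are bounded in length, the full ancestor path of a node can be materialized once, so reading all ancestors is a single lookup instead of repeated adjacency traversals (which on disk means repeated seeks):

```rust
use std::collections::HashMap;

const MAX_DEPTH: usize = 15;

struct PathEntry {
    /// Ancestors in order, nearest first; length 0..=MAX_DEPTH.
    path: Vec<u64>,
}

struct PathStorage {
    entries: HashMap<u64, PathEntry>,
}

impl PathStorage {
    /// Build from "parent" edges (at most one outgoing edge per node).
    fn from_parent_edges(parent: &HashMap<u64, u64>) -> Self {
        let mut entries = HashMap::new();
        for &node in parent.keys() {
            let mut path = Vec::new();
            let mut current = node;
            // Follow the single outgoing edge up to the bounded depth.
            while let Some(&p) = parent.get(&current) {
                path.push(p);
                current = p;
                if path.len() >= MAX_DEPTH {
                    break;
                }
            }
            entries.insert(node, PathEntry { path });
        }
        PathStorage { entries }
    }

    /// All ancestors of `node` with a single lookup.
    fn ancestors(&self, node: u64) -> &[u64] {
        self.entries.get(&node).map(|e| e.path.as_slice()).unwrap_or(&[])
    }
}
```

The trade-off is write amplification: updating one edge means rewriting the stored paths of its descendants, which fits an import-once, query-many workload.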
- Improved handling of `tok` queries for corpora with tens of millions of token, by using the newly added graph storage implementation and statistics and providing an optimized implementation for token search if we already know that all token are part of the default ordering component. This fixes #276.
- Improve performance for regular expression search when using disk-based annotation storage and the regex has a prefix. This e.g. fixes getting the text for a document in ANNIS when the corpus is large.
- Improve performance for regular expressions that can be replaced by an exact value search, even when the value is escaped. This can be useful e.g. in the subgraph extraction queries from ANNIS, where some characters are escaped with `\x` and which were previously not treated as a constant value search.
- Improve performance for getting all token of a document (e.g. for a subgraph query) when the `PartOf` graph storage implementation does not have the same cost for the inverse graph storage operations, by allowing a nested loop join in this particular scenario.
- Do not add "annis:doc" labels to sub-corpora when importing relANNIS corpora. This fixes queries where you just search for documents, e.g. by `annis:doc`, but also got the sub-corpora as a result.
- Re-enable adding the C-API shared library as release artifacts to GitHub.
- Fix leaf filter for token searches and loading of necessary components (#280)
- Allow to execute AQL directly on loaded `AnnotationGraph` objects by using the new `aql::execute_query_on_graph` and `aql::parse` functions. This is an alternative to using a `CorpusStorage` when only one corpus is handled.
- New `Graph::ensure_loaded_parallel` function to load needed graph storages in parallel.
- Added `graphannis_core::graph::serialization::graphml::export_stable_order` function that allows to export to GraphML, but with a guaranteed order of the elements.
- Do not attempt to unload corpora that are not loaded when trying to free memory.
- Improve performance of loading a main memory corpus by using the standard `HashMap` for fields that are deserialized.
- Add `has_node_name` function to `AnnotationStorage` that can be more efficient than `get_node_id_from_name`.
- Changed API to use the new types `NodeAnnotationStorage` and `EdgeAnnotationStorage` instead of `AnnoStorageImpl<NodeID>` or `AnnoStorageImpl<Edge>`. (backward incompatible change in the Rust API)
- `get_node_id_from_name` is now a function of the `AnnotationStorage` instead of the `Graph`. This allows for more specific and efficient implementations based on the type of annotation storage.
- Improved performance of the `Graph::apply_update` function.
- Use jemalloc memory allocator for webservice and CLI.
- Remove all heap size estimation code. This also means that information about heap consumption of a single corpus has been removed, like the fields of the `graphannis::corpusstorage::LoadStatus` enum.
- Remove `EvictionStrategy::MaximumBytes` for `DiskMap`.
- Polling when importing a web corpus through the webservice could fail because the background job list was not shared between the web server threads.
- Do not output document nodes in `find` query when using quirks mode and `meta::` queries.
- When an optional node (for negation without existence) was not at the end of the query, `find` queries could give an empty output (#267).
- Create default components for the graph type when importing GraphML files.
- Compile release for macOS on version 11 (Big Sur). This is necessary, since GitHub actions deprecated the older macOS version.
- Compile releases on Ubuntu 20.04 instead of 18.04, which means the minimal GLIBC version is 2.31. This is necessary, since GitHub actions deprecated this Ubuntu version.
- Update quick-xml to version 0.28 to avoid issues in future Rust versions
- Update sstable to version 0.11 to avoid issues in future Rust versions
- Update actix-web to version 4 to avoid issues in future Rust versions
- Update config crate to version 0.13 to avoid issues in future Rust versions
- Update diesel to version 2.0 due to issue in sqlite dependency
- Importing a corpus with a relative path directly under the current working directory would fail if the corpus has linked files.
- Output of data items in GraphML for node/edge annotations could be unordered and cause test failures if comparing GraphML files.
- Update smartstring crate to version 1 to avoid issues with newer Rust versions.
- After re-using a deleted symbol ID (used in the annotation storage), the retrieved value was empty.
- When importing relANNIS corpora with sub-corpora, add the `PartOf` edge to the parent corpus node of the document or sub-corpora, but not automatically to the top-level corpus.
- Allow to configure how spans should be interpreted in the view when the token layer is representing a timeline, with the `timeline_strategy` parameter in the `view` section of the corpus configuration. This allows the view to reconstruct an implicit relation between spans and their segmentation nodes (which is not possible to represent in the legacy relANNIS data model). New corpora should use explicit `Coverage` edges between spans and their segmentation nodes, but in order to maintain backward compatibility with relANNIS, we need to support these older corpus configuration values (`virtual_tokenization_mapping` and `virtual_tokenization_from_namespace`), which only affect the display of the corpora.
- Fixed wrong result order for non-token searches.
- Estimation for negated regex was extremely off when the regex could possibly match all values. This caused problematic query plans including those with nested loop joins and long execution times.
- Better estimation of result sizes for regular expressions with multiple prefixes.
- Fix compilation issues in Rust projects that use the 2021 Rust edition. lalrpop/lalrpop#650
- Faster subgraph generation for `subgraph` queries with context. The previous implementation used an AQL query that got quite complex over time and was difficult to execute. The new implementation directly implements the logic using iterators. It also sorts the nodes in the iterator by the order of the node in the text.
- Add edges to the special `Ordering/annis/datasource-gap` component between the last and first token of context regions in `subgraph` queries when the returned context regions do not overlap. This allows sorting the context regions that belong to the same data source but are not connected by ordinary `Ordering/annis/` edges.
- Use external sorting for match results to avoid out of memory errors for large results.
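External sorting as used above can be sketched with the classic run-and-merge scheme; this is a stdlib-only illustration of the technique, not the graphANNIS implementation (which sorts match results rather than plain integers). Memory stays bounded by the run size: each run is sorted in memory and spilled to a temporary file, then all runs are merged with a min-heap:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;
use std::fs::File;
use std::io::{BufRead, BufReader, BufWriter, Write};

/// Sort arbitrarily many u64 values while holding at most `run_size`
/// of them in memory at once.
fn external_sort(input: impl Iterator<Item = u64>, run_size: usize) -> Vec<u64> {
    let dir = std::env::temp_dir();
    let mut run_paths: Vec<std::path::PathBuf> = Vec::new();
    let mut run: Vec<u64> = Vec::with_capacity(run_size);
    let spill = |run: &mut Vec<u64>, paths: &mut Vec<std::path::PathBuf>| {
        run.sort_unstable();
        let path = dir.join(format!("ext-sort-run-{}.txt", paths.len()));
        let mut w = BufWriter::new(File::create(&path).unwrap());
        for v in run.drain(..) {
            writeln!(w, "{v}").unwrap();
        }
        paths.push(path);
    };
    for v in input {
        run.push(v);
        if run.len() == run_size {
            spill(&mut run, &mut run_paths);
        }
    }
    if !run.is_empty() {
        spill(&mut run, &mut run_paths);
    }
    // K-way merge: the heap always holds the smallest unread value per run.
    let mut readers: Vec<_> = run_paths
        .iter()
        .map(|p| BufReader::new(File::open(p).unwrap()).lines())
        .collect();
    let mut heap = BinaryHeap::new();
    for (i, r) in readers.iter_mut().enumerate() {
        if let Some(Ok(line)) = r.next() {
            heap.push(Reverse((line.parse::<u64>().unwrap(), i)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((v, i))) = heap.pop() {
        out.push(v);
        if let Some(Ok(line)) = readers[i].next() {
            heap.push(Reverse((line.parse::<u64>().unwrap(), i)));
        }
    }
    for p in &run_paths {
        let _ = std::fs::remove_file(p);
    }
    out
}
```

The merge reads each run sequentially, so disk access stays streaming even when the total result is far larger than main memory.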
- For subgraph queries with segmentation, the left and right context was switched.
- Allow to configure the expected display order of (sub)-corpus meta annotations using the `corpus_annotation_order` field in the view configuration.
- Added `anonymous_access_all_corpora` to the `[auth]` section of the web service configuration to allow read-only access to all corpora without any authentication. (#234)
- Added documentation on how to configure which group can access which corpora.
- Document how to change the stack size of the CLI in case the import aborts with a stack related error. (#229)
- Near operator failed to work with segmentation constraint (#238)
- Remove corpus storage lock file when exiting the application (#230)
- Fix subgraph generation when a segmentation was defined as context and the match includes a token that is not covered by a segmentation node (there are gaps in the segmentation). This is achieved by explicitly searching for all token between the first and last matched segment, which produces a more complex query than before. Because token were missing from the graph, it could appear in ANNIS that there are gaps in the data and that the token order is incorrect.
- Fix timeout handling for queries with a lot of intermediate results, but less than 1000 matches. The timeout was only checked after each 1000th match. This caused trouble for queries with complex temporary results that were discarded: the query execution could take too long and consume system resources in a multi-user system even when a timeout was configured. The fix is to push down the timeout check to the node search iterators.
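Pushing a timeout check down into an iterator can be sketched with a generic adapter; the type below is illustrative, not the actual graphANNIS code. Because the deadline is checked on every pulled item, queries that discard many intermediate results still observe the timeout instead of only being checked per emitted match:

```rust
use std::time::{Duration, Instant};

/// Wraps any iterator and stops yielding items once a deadline has passed.
struct WithDeadline<I> {
    inner: I,
    deadline: Instant,
    timed_out: bool,
}

impl<I: Iterator> Iterator for WithDeadline<I> {
    type Item = I::Item;
    fn next(&mut self) -> Option<Self::Item> {
        // Checked per item, so even long runs of filtered-out intermediate
        // results cannot delay the timeout.
        if Instant::now() >= self.deadline {
            self.timed_out = true;
            return None;
        }
        self.inner.next()
    }
}

fn with_timeout<I: Iterator>(inner: I, timeout: Duration) -> WithDeadline<I> {
    WithDeadline {
        inner,
        deadline: Instant::now() + timeout,
        timed_out: false,
    }
}
```

In a real engine the `timed_out` flag would be surfaced as an error to distinguish "exhausted" from "aborted".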
- Non-Existing operator did include invalid matches when searching for attributes without a value.
- Fix import of resolver mappings and order configuration for older relANNIS versions.
- Fix handling of corpora with special characters like umlauts or slashes when deleting corpora, getting the corpus configuration file, or getting linked files (both the `CorpusStorage` API and the web service).
- Explicitly escape `/` in node names so we can create hierarchical paths in node names. We already have this assumption in several places, but a corpus with slashes would create ambiguities. This also helps when creating linked files based on the node name. Also, escape all characters that are invalid in file names on Windows, because the node name might be used as a file name.
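The escaping described above can be sketched as a percent-encoding helper. The reserved-character set below is an assumption for illustration (Windows-invalid file name characters plus `/` and `%`), not the exact set graphANNIS uses:

```rust
/// Characters treated as reserved: '/' separates hierarchical parts of a
/// node name, '%' must be escaped so the encoding stays reversible, and the
/// rest are invalid in Windows file names.
fn is_reserved(c: char) -> bool {
    matches!(c, '/' | '\\' | '<' | '>' | ':' | '"' | '|' | '?' | '*' | '%')
}

/// Percent-encode every reserved character of a node name fragment,
/// URL-encoding style (each UTF-8 byte becomes %XX).
fn escape_node_name_part(part: &str) -> String {
    let mut out = String::with_capacity(part.len());
    for c in part.chars() {
        if is_reserved(c) {
            let mut buf = [0u8; 4];
            for b in c.encode_utf8(&mut buf).bytes() {
                out.push_str(&format!("%{:02X}", b));
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

Because every reserved character is encoded, escaped fragments can be joined with `/` into an unambiguous hierarchical path and safely reused as file names.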
- Web service API version prefix should still be `/v1` and not `/v2`, because this API did not change and is still backward-compatible.
- Refactored the basic `GraphStorage` and `AnnotationStorage` APIs to handle errors. Previously, we used mostly main memory containers which had an API that could not fail. This was reflected in the `GraphStorage` and `AnnotationStorage` APIs, which directly returned the result or an iterator over the non-fallible results. With the addition of disk-based implementations, this API model was not possible to implement without using panics when repeated access to the disk failed. Some of the API that was changed was user visible when using the `graphannis-core` crate (and thus the C-API), so this release is not technically backwards-compatible. Adapting to the updated API should be restricted to handling the errors returned by the functions.
- The changes to the error handling also affect the C-API. The following functions now have an `ErrorList` argument:
  - `annis_cs_list_node_annotations`
  - `annis_cs_list_edge_annotations`
  - `annis_cs_list_components_by_type`
  - `annis_cs_unload`
  - `annis_iter_nodeid_next`
  - `annis_graph_annotations_for_node`
  - `annis_graph_outgoing_edges`
  - `annis_graph_annotations_for_edge`
- Renamed the Criterion-based benchmark CLI to `bench_queries` and synchronized its arguments with the current version of Criterion.
- More efficient node path extraction in the `count_extra` function and when sorting the matches.
- Avoid large memory consumption when importing GraphML files by resetting an internal buffer on each XML event.
- Limit the number of disk maps for the `GraphUpdate`, so there are fewer issues with large corpora where the maximum number of open files per process might be reached.
- Performance improvements when importing large corpora in disk-based mode. This optimizes the `DiskMap` to use a C0 (normal in-memory BTree), a C1 (on-disk BTree) and a C2 map when serialized to disk. On compacting, the entries are only written to C1 in O(n log n). Before, multiple on-disk maps might need to be merged, which had a much worse complexity. The C1 file uses the transient-btree-index crate.
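The C0/C1 layering can be sketched with a small two-level map. This is a simplified, in-memory stand-in (the real C1 is an on-disk BTree, and there is an additional serialized C2 level): writes go to the mutable C0, reads consult the newest level first, and compaction merges C0 into C1 in one pass instead of merging many small maps:

```rust
use std::collections::BTreeMap;

/// Two-level map in the style of a tiny LSM tree: C0 absorbs writes,
/// C1 is the (conceptually on-disk) compacted level.
struct TwoLevelMap {
    c0: BTreeMap<String, String>,
    c1: BTreeMap<String, String>,
}

impl TwoLevelMap {
    fn new() -> Self {
        TwoLevelMap { c0: BTreeMap::new(), c1: BTreeMap::new() }
    }

    /// All writes go to the in-memory C0 level.
    fn insert(&mut self, key: String, value: String) {
        self.c0.insert(key, value);
    }

    /// Reads check the newest level first, so recent writes shadow C1.
    fn get(&self, key: &str) -> Option<&String> {
        self.c0.get(key).or_else(|| self.c1.get(key))
    }

    /// Merge C0 into C1 in a single linear pass and clear C0.
    /// Newer entries win over older ones with the same key.
    fn compact(&mut self) {
        let c0 = std::mem::take(&mut self.c0);
        self.c1.extend(c0);
    }
}
```

The key property is that n inserts cost O(n log n) overall, because each entry is written into the compacted level once, rather than being re-merged every time another small map fills up.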
- Trim mapping entries when importing relANNIS resolver files (#222).
- Fixed schema errors in the Webservice OpenAPI file.
- RelANNIS version 3.3 files with segmentation might also have a missing "span" column. In case the "span" column is null, always attempt to reconstruct the actual value from the corresponding node annotation instead of failing directly.
- Avoid unnecessary compacting of disk tables when collecting graph updates during import. This speeds up both the GraphML and the relANNIS importer and can also reduce the used main memory during import.
- Use release optimization of some of the performance sensitive crates even for debug builds. This allows faster builds and debugging of our own code, while balancing performance.
- Avoid unnecessary memory allocation when checking if a node has outgoing edges in adjacency lists. This improves search for tokens because the Coverage components are typically adjacency lists, and we need to make sure the token nodes don't have any outgoing edges.
- Fixed miscalculation of whitespace string capacity which could lead to a `memory allocation failed` error.
- Added `clear()` method to the `WriteableGraphStorage` trait.
- Limit the used main memory cache per `DiskTable` by only using a disk block cache for the C1 table. Since we use a lot of disk-based maps during import of relANNIS files, the previous behavior could add up to > 1 GB easily, which amongst other issues caused #205 to happen. With this change, during relANNIS import the main memory usage should be limited to less than 4 GB, which seems more reasonable than the previous 20+ GB.
- Reduce memory footprint during import when the corpus contains a lot of escaped strings (as in #205)
- Avoid creating small fragmented main memory when importing corpora from relANNIS to help fix #205
- Improved overall import speed of relANNIS corpora and when applying graph updates
- The webservice endpoint `/search/node-descriptions` now returns whether a node in the query is optional or not.
- Queries with optional nodes with a smaller index than the last non-optional node could fail. If the execution nodes re-order the match result vector internally, the query node index is used to define the mapping. Unfortunately, the largest index could be larger than the size of the mapping that used to be used to create the output vector. By allowing empty elements in the output vector and using the maximum index value, we can still map the results properly.
- Don't allow optional operands for non-negated operators
- Added generic operator negation without existence assumption, if only one side of the negated operator is optional (#187).
- Added generic operator negation with existence assumption by adding `!` before the binary operator (#186)
- Compile releases on Ubuntu 18.04 instead of 16.04, which means the minimal GLIBC version is 2.27
- Updated dependencies
- Improved compile time by disabling some dependency features. This also removes some optional features from the command line parser (used in webservice and CLI binaries).
- Don't use RIDGES corpus in search tests and fail search tests when corpus does not exist.
- Use the correct `set-disk-based on` command in the documentation for the CLI
- Optimize node annotation storage and graph implementations when importing GraphML files
- Fix issue when deploying release artifacts on GitHub
- Assume that the `annis::node_name` annotation is unique when estimating match size. This should improve e.g. subgraph queries, where the intermediate result sizes are now better estimated.
- The default context sizes in the corpus configuration now include 0 (#181)
- C-API now implements exporting corpora
- Renamed (public) function `export_corpus_zip` in `CorpusStorage` to `export_to_zip` to align with the other export function name.
- Exporting a corpus without a "files" directory failed
- Synchronize REST API error output for bad AQL requests with the OpenAPI specification.
- Fix compilation issues in interaction with lalrpop v0.19.5
- Using the new `SmallVec`-based `MatchGroup` type instead of `Vec<Match>`.
- The `FixedMaxMemory` `CacheStrategy` now uses megabytes instead of bytes.
- The graphannis and core crates now use their own error type instead of the one provided by the `anyhow` crate.
- Bundle commonly used search query parameters in the `SearchQuery` struct.
- Query execution methods now have an optional `timeout` after which a query is aborted.
- Annotation keys and values in the `AnnoKey` and `Annotation` structs now use inlined strings from the `smartstring` crate.
- Replaced the `update_statistics` function in `CorpusStorage` with the more general `reoptimize_implementation` function. The new function is available via the `re-optimize` command in the CLI.
- The webservice configuration now allows to configure the size of the in-memory corpus cache.
- There can be multiple `--cmd` arguments for the CLI, which are executed in the order they are given.
- Importing a relANNIS corpus could fail because an integer would wrap around from negative to a large value when calculating the `tok-whitespace-after` annotation value. This large value would then be used to allocate memory, which will fail.
- Adding `\$` to the escaped input sequence in the relANNIS import, fixing issues with some old SFB 632 corpora
- Unbound near-by-operator (`^*`) was not limited to 50 in quirks mode
- Workaround for duplicated document names when importing invalid relANNIS corpora
- Corpus names with non-ASCII characters were not listed with their decoded name
- Fix memory consumption of AQL parser in repeated calls (like the webservice).
- Limit the memory which is reserved for an internal result vector to avoid out-of-memory errors when the estimation is wrong.
- JWT secret configuration now supports RS256 in addition to HS256. This enables support of applications which use Keycloak as their identity provider, since they only provide public keys.
- JWT tokens now should have the `roles` field instead of using the `admin` field. This enhances compatibility with Keycloak.
- Pull requests are now checked with the Clippy static code analysis tool
- Updated Actix Web dependency for webservice to version 3
- The REST API does not act as an identity provider anymore and the `/local-login` endpoint has been removed
- Travis did add the webservice executables to the release
- `cargo release` did not release all crates
- Node IDs in matches don't have the `salt:/` prefix anymore
- Add non-tokenized primary text segments as special labels "tok-whitespace-before" and "tok-whitespace-after" to the existing token when importing from relANNIS. This allows to re-construct the original relANNIS primary text by iterating over all token in order and prepending or appending these labels to the token values.
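The reconstruction described above can be sketched like this; the `Token` struct is a hypothetical simplification (in graphANNIS the labels live in the annotation storage, not on a struct):

```rust
/// A token with the optional non-tokenized material around it, mirroring
/// the "tok-whitespace-before" / "tok-whitespace-after" labels created by
/// the relANNIS import.
struct Token {
    value: String,
    whitespace_before: Option<String>,
    whitespace_after: Option<String>,
}

/// Concatenate tokens in document order, re-inserting the surrounding
/// material, to recover the original relANNIS primary text.
fn reconstruct_primary_text(tokens: &[Token]) -> String {
    let mut text = String::new();
    for t in tokens {
        if let Some(ws) = &t.whitespace_before {
            text.push_str(ws);
        }
        text.push_str(&t.value);
        if let Some(ws) = &t.whitespace_after {
            text.push_str(ws);
        }
    }
    text
}
```

This assumes the importer attaches each stretch of non-tokenized material to exactly one neighboring token, so nothing is duplicated when concatenating.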
- Add a REST based web-service replacing the legacy annis-service
- Load all components when extracting a subgraph using an AQL query
- Web Service with REST API for the corpus storage
- Copy and link files from the ExtData folder when importing relANNIS.
- Map `resolver_vis_map.annis`, `example_queries.annis` and `corpus.properties` from relANNIS files to a new unified corpus configuration stored as a TOML file. This corpus configuration is also exported to GraphML.
- Export and import ZIP files containing multiple corpora.
- Removed Brotli support: use the ZIP file export instead
- Backward incompatible: Return opaque anyhow `Error` type in all functions instead of our own enum. The new `Error` type also implements `std::error::Error` and is equivalent to using `Box<dyn std::error::Error>`.
- Upgraded parser generator lalrpop to version 0.18.x
- Disk-based implementation of an adjacency list is used when a corpus is configured to be prefer disk over memory.
- Ability to export and import GraphML files. This follows the Neo4j dialect of GraphML. It is also possible to compress the GraphML files with Brotli.
- The dense adjacency list implementation did not implement the `source_nodes` function properly
- Removed the unintentionally public `size_of_cached` function of `Graph` from the API.
- Backward incompatible: the `AnnotationStorage` and `WriteableGraphStorage` interfaces have been adjusted to return `Result` types for mutable functions. This change is necessary because on-disk annotation storage implementations might fail, and we want to handle it when modifying the annotation storage.
- Improved main memory usage when importing relANNIS files. The implementation now uses temporary disk-based maps instead of memory-intensive maps. This change also affects the `GraphUpdate` class, which is now disk-based, too.
- Added disk-based annotation storage for nodes as an alternative to the memory-only variant. On the console, use `use_disk <on|off>` to set whether newly imported corpora prefer disk-based annotation storage. `disk_based` parameters are also added to the various "import relANNIS" API functions.
- Reconstruct coverage edges with the correct component, if the actual edges are omitted in rank.annis, but the ones without a parent node are still present. #125
- Inverted sort order did not reverse the corpus name list for multiple corpora
- The workaround for docs.rs problems seems to have caused other problems, and graphANNIS was not recognized as a library
- Backward incompatible: the several search functions (`find`, `count`, etc.) now take several corpus names as argument. This is especially important for `find`, where the implementation can be optimized to correctly skip over a given offset using the internal state. Such an optimization is impossible from outside when calling the API and not having access to the iterator.
- Don't assume the inverse operator has the same cost when the fan-out is too different. Subgraph queries could be very slow for corpora with large documents due to an estimation error from this assumption for the `@` operator.
- The annotation storage is now a complete interface which provides all functions necessary to write and read annotations. To make this less dependent on the current implementation of the in-memory annotation storage, the annotation key symbol (an integer) has been removed. This annotation key symbol has been used in the `Match` class as well, which is now using an `Arc<AnnoKey>` instead. The `AnnoKey` contains the fully qualified name as `String`. Several functions of the annotation storage that used to have `String` parameters now take `&str`, and resulting string values are now returned as `Cow<str>`. The latter change is also meant to enable more flexible implementations, which can choose to allocate new strings (e.g. from disk) or return references to existing memory locations.
- The `Graph` uses a boxed instance of the general `AnnotationStorage` trait. Before, this was an `Arc` to the specific implementation, which made it possible to simply clone the node annotation storage. Now, references to it must be used, e.g. in the operators. This changes a lot of things in the `BinaryOperator` trait, like the signature of `get_inverse_operator()` and the filter functions that are used as conditions for the node search (these need an argument to the node annotation storage now).
- `Graph` does not implement the `AnnotationStorage<NodeID>` trait anymore, but provides a getter to reference its field.
- Data source nodes are now included when querying for a subgraph with context. This is needed for parallel text support in ANNIS 4.
- Show the used main memory for the node annotations
- Deploying release artifacts by CI was broken due to invalid condition
- Subgraph queries can now define the context using ordering relation names (segmentation) instead of the default context in tokens. This changes the function signature of the `subgraph(...)` function.
- For performance and stylistic reasons, the GraphStorage API has been changed to accept integer node IDs instead of references to integers.
- Windows DLL in releases is now created by Travis CI instead of Appveyor
- Windows DLL generated by CI was empty
- Updated several dependencies
- Organize documentation topics in sub-folders. Previously, mdbook did not update the images on these sites on the print.html. Since mdbook >0.3.1 this is fixed and we can use the better layout.
- C API now has an argument to return error messages when creating a corpus storage
- C API now also allows to unload a corpus from the cache manually
- CorpusStorageManager: Escape the corpus name when writing it to its disk location to support e.g. corpora with slash in their name.
- Quirks mode: sort matches by reversed document path (document first)
- Node names/paths were double encoded both when importing them and when executing the "find" function
- Quirks mode: use default collation of Rust for corpora imported from relANNIS 3.3
- `meta::` queries are now deprecated and can only be used in quirks mode
- Output annotations with the namespace "annis" in find function
- Quirks mode: add additional identity joins in the order as the nodes are defined in the query
- Encode ",", " " and ":" in the Salt ID output of the `find(...)` function
- Sort longer vectors ("more specific") before shorter ones in `find(...)` output
- Optimize parallel nested loop join by performing less copy operations
- Quirks mode: meta-data nodes are not part of the match result anymore
- Escape corpus and document paths with percent encoding when importing them from relANNIS
- Use locale aware sorting of the results in quirks mode (which depends on the system graphANNIS is executed on)
- CLI did not allow to turn quirks mode off once activated
- DOI on Zenodo to cite the Software itself
- Utility function `node_names_from_match` for getting the node identifiers from the matches
- Tutorial for Python, Java and Rust on how to embed graphANNIS in other programs
- Citation File Format (https://citation-file-format.github.io/) meta-data
- Renamed the "PartOfSubcorpus" component type to more general "PartOf"
- relANNIS import now takes the sub-corpus structure into account
- Quirks mode now also emulates the component search normalization behavior. Search nodes that were part of multiple dominance/pointing relation joins were duplicated and joined with the identity operator to work around the issue that nodes of different components could not be joined in relANNIS. This leads to additional output nodes in the find(...) query. See also the original JavaDoc for an explanation.
- The error_chain crate is no longer used for error reporting, instead a custom Error representation is used
- "NULL" annotation namespaces were imported as "NULL" in relANNIS import
- Result ordering for "find(...)" function was not correct if token helper components were not loaded
- fixed issue where corpora which contain only tokens could not be queried for a subgraph with context
- Release process is now using the cargo-release script
- Separate the update events in smaller chunks for relANNIS import to save memory
- #70 get_all_components() returns all components with matching name if none with the same type exist
- #69 relANNIS-Import: Subgraph query does not work if there is no coverage component.
- #68 Use applyUpdate() API to import legacy relANNIS files
- #67 Document the data model of graphANNIS
- #66 Automatic creation of inherited coverage edges
- #65 Add a new adjacency list based graph storage for dense components.
- #62 Warn about missing coverage edges instead of failing the whole import
- #61 Implement the equal and not equal value operators
- #59 Nodes are not deleted from graph storages via the "applyUpdate" API
- #55 Subgraph query does not work if there is no coverage component.
- #54 Check all existing matches when checking reflexivity
- #58 Implement ^ (near) operator
- #57 Implement ":arity" (number of outgoing edges) unary operator
- #52 Use CSV files for query set definition
- #50 Non-reflexive operator join on "any token search" leads to non-empty result
- #48 Importing PCC 2.1 corpus hangs at "calculating statistics for component LeftToken/annis/"
- #46 Filter not applied for negated annotation search
- #45 Travis configuration used wrong repository and could not deploy release binaries
- #44 Add support for the `_l_` and `_r_` alignment AQL operators
- #43 Automatic creation of left- and right-most token edges
- #42 Remove inverse coverage and inverse left-/right-most token edges
- #41 Add value negation
- #38 Add an mdBook based documentation
- #36 Add function to only extract a subgraph with components of a given type
- #34 Fix loading of edge annotation storages
- #33 Improve memory usage of the relANNIS importer
- #32 Faster and more flexible sort of results in "find" function
- #31 Reorder result in find also when acting as a proxy.
- #30 Fix most of the queries in the benchmark test set
- #29 Use the std::ops::Bound class to mark the upper value instead of relying on usize::max_value()
- #26 Docs.rs does not build because "allocator_api" is not enabled on their rustc
- #24 Implement regular expression search for edge annotations.
- #23 Update the C-API to reflect the changes in the Rust API
- #22 Use the published graphannis-malloc_size_of crate
- #21 Restructure and document the public API
- #15 Move all modules into a private "annis" sub-module
- #14 Simplify the code for the graph storage registry
- #13 Save memory in the annotation storage
- #12 Improve speed of loading adjacency list graph storages
- #11 Use criterion.rs library for benchmarks
- #10 Better error reporting for C-API
- #8 Implement AQL parser and replace JSON query representations with AQL
- #9 Wait for all background writers before dropping the CorpusStorage
- #7 Use error-chain crate for internal error management
- #6 Use features of a single crate instead of multiple crates
- #5 Allow to delete corpora from the command line
- #4 Use file lock to prevent opening the same GraphDB in different processes
- #3 Fix automatic creation of binaries using CI for releases
First release of the Rust port of graphANNIS from C++.
- #23 Problems loading the cereal archive under Windows
- #22 Use text-book function for estimating the selectivity for the abstract edge operator
- #21 Allow to load query in console from file
- #20 UniqueDFS should output each matched node only once, but still visit each node.
- #14 Do not iterate over covered text positions but use the token index
- #13 Fix duplicate matches in case a const anno value is used in a base search
- #19 Update the re2 regex library and make sure it is compiled with -O3 optimizations
- #18 Perform more pessimistic estimates for inclusion and overlap operators
- #17 Optimize meta data search
- #16 Allow base node search by membership in a component
- #15 Better handling of Regular Expressions on a RHS of an index join
- #12 Add support for relANNIS style multiple segmentation
- #8 Fix shared/unique lock handling in CorpusStorageManager when component needs to be loaded
- #4 Node names should include the document name (and the URL specific stuff) when imported from Salt.
- #11 Optimize unbound regex annotation searches
- #10 Do some small enhancements to regex handling
- #9 Add an API to query subgraphs
- #7 Support OR queries
- #6 Add metadata query support
- #5 Add a SIMD based join
- #4 Node names should include the document name (and the URL specific stuff) when imported from Salt.
- #3 Make the graphANNIS API for Java an OSGi bundle
- #2 Avoid local minima when using the random query optimizer
- #1 Use "annis" instead of "annis4_internal" as namespace
Initial development release with an actual release number.
There has been the benchmark-journal-2016-07-27 tag before which was used in a benchmark for a paper. Since then the following improvements have been made:
- using an edge annotation as base for a node search on the LHS of the join
- adding parallel join implementations
This release is also meant to test the release cycle (e.g. Maven Central deployment) itself.