dkpro#1443 - Make BratReader more forgiving
- Convert `test__SingleDirWithoutAnnFiles__AssumesEmptyAnnFiles` to use `ReaderAssert`

Merge branch '1.12.x' into Improvement/Make_BratReader_more_forgiving__Take2

* 1.12.x:
  dkpro#1453 - Better I/O testing facilities

% Conflicts:
%	dkpro-core-io-brat-asl/src/test/java/org/dkpro/core/io/brat/BratReaderWriterTest.java
reckart committed Jan 6, 2020
2 parents 588c228 + 173d73f commit 145a7b5
Showing 22 changed files with 941 additions and 89 deletions.
40 changes: 24 additions & 16 deletions dkpro-core-doc/src/main/asciidoc/developer-guide/testing.adoc
@@ -92,31 +92,39 @@ of annotation supported by DKPro Core, e.g.:

== Testing I/O components

The IOTestRunner class offers convenient methods to test I/O components:
The `ReaderAssert` and `WriterAssert` classes can be used to test I/O components. They allow building
AssertJ-style unit tests with DKPro Core reader and writer components.

* `testRoundTrip` can be used to test converting a format to CAS, converting it back and comparing
it to the original
* `testOneWay` instead is useful to read data and compare it to a reference file in a different
format (e.g. CasDumpWriter format). It can also be used if a full round-trip is not possible
because some information is lost or cannot be exported exactly as ingested from the original file.
One of the simplest tests is a *round-trip test* where an input file is read using a reader for a
particular format, then written out again using a writer for the same format.

The input file and reference file path given to these methods is always considered relative to
`src/test/resources`.

.Example using `testRoundTrip` with extra parameters (Conll2006ReaderWriterTest)
.Example of a round-trip test
[source,java,indent=0]
----
include::{source-dir}dkpro-core-io-conll-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/conll/Conll2006ReaderWriterTest.java[tags=testRoundTrip]
----

.Example using `testOneWay` with extra parameters (Conll2006ReaderWriterTest)
The reader is set up to read the test input file. Instead of setting `PARAM_SOURCE_LOCATION`, it is
also possible to set the input location using `readingFrom()`. The writer automatically makes use of
a test output folder provided by a `DkproTestContext` - therefore a target location does not need to
be configured explicitly.
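
The comparison at the heart of a round-trip test can be illustrated without DKPro Core. The following is a minimal sketch in plain Java, not the `ReaderAssert` implementation: `RoundTripSketch` and its methods are hypothetical stand-ins for a reader/writer pair, and the newline handling only approximates what `isEqualToNormalizingNewlines` does (treating `\r\n` and `\n` as equal).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a reader/writer pair; not part of DKPro Core.
final class RoundTripSketch {
    // "Reader": parse tab-separated lines into rows of fields.
    static List<String[]> read(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : normalizeNewlines(text).split("\n")) {
            // Limit -1 keeps trailing empty fields so they survive the round trip.
            rows.add(line.split("\t", -1));
        }
        return rows;
    }

    // "Writer": serialize the rows again, always terminating lines with "\n".
    static String write(List<String[]> rows) {
        StringBuilder sb = new StringBuilder();
        for (String[] row : rows) {
            sb.append(String.join("\t", row)).append('\n');
        }
        return sb.toString();
    }

    // Treat Windows and Unix line endings as equivalent.
    static String normalizeNewlines(String s) {
        return s.replace("\r\n", "\n");
    }

    // A round trip succeeds if the written output equals the original input
    // once both sides have had their newlines normalized.
    static boolean roundTrips(String input) {
        String output = write(read(input));
        return normalizeNewlines(input).equals(normalizeNewlines(output));
    }
}
```

In this sketch an input with `\r\n` line endings still round-trips, while an input lacking a final newline does not, because the writer always emits one. Such a case is a candidate for a one-way test against a separate reference file.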

Assuming the writer produces only a single output file, this file can be accessed for
assertions using `outputAsString()`. If multiple output files are created, an argument can be passed
to that method, e.g. `outputAsString("output.txt")`. This will look for a file at the target location whose
name ends in `output.txt`. If there is none or more than one matching file, the test will fail.
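
The "exactly one matching file" rule described above can be sketched in plain Java. This is an illustration under assumptions, not the actual `outputAsString` code: the class `SingleOutputLookup` and the method `findSingle` are hypothetical, and the sketch only inspects files directly inside the given folder, which may differ from what DKPro Core does.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; not part of DKPro Core.
final class SingleOutputLookup {
    // Returns the single regular file in dir whose name ends with suffix.
    // Throws an AssertionError if there is no such file or more than one.
    static Path findSingle(Path dir, String suffix) {
        List<Path> matches;
        try (Stream<Path> files = Files.list(dir)) {
            matches = files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (matches.size() != 1) {
            throw new AssertionError("Expected exactly one file ending in ["
                    + suffix + "] but found " + matches.size());
        }
        return matches.get(0);
    }
}
```

Failing on an ambiguous match rather than picking the first hit keeps a test from silently comparing against the wrong output file.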

If the original input file is in a different format or cannot be fully reproduced by the writer,
then it is easy to set up a *one way test*, simply by changing the final comparison. The following
example also shows how to specify additional parameters on the reader or writer.

.Example of a one-way test
[source,java,indent=0]
----
include::{source-dir}dkpro-core-io-conll-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/conll/Conll2006ReaderWriterTest.java[tags=testOneWay]
----

.Example using `testRoundTrip` with extra parameters (BratReaderWriterTest)
[source,java,indent=0]
----
include::{source-dir}dkpro-core-io-brat-asl/src/test/java/de/tudarmstadt/ukp/dkpro/core/io/brat/BratReaderWriterTest.java[tags=testOneWay]
----
In order to test the ability of readers to read multiple files, the `asJCasList()` method can be used.
While pipelines typically re-use a single CAS which is repeatedly reset and refilled, this method
generates a list of separate CAS instances which can be individually validated after the test. To
access elements of the list use `element(n)`.
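
Why separate instances matter can be shown with a small plain-Java sketch. It makes no claim about how `asJCasList()` is implemented: `Doc` is a hypothetical stand-in for a CAS, and the two collector methods only contrast keeping references to one reused object with keeping an independent copy per document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; Doc stands in for a CAS and is not a DKPro Core type.
final class ReusedBufferSketch {
    static final class Doc {
        String text = "";

        void reset() {
            text = "";
        }

        Doc copy() {
            Doc d = new Doc();
            d.text = text;
            return d;
        }
    }

    // Re-uses one instance: every list element is the same object,
    // so after the loop all of them show the last document.
    static List<Doc> collectShared(List<String> inputs) {
        Doc shared = new Doc();
        List<Doc> result = new ArrayList<>();
        for (String input : inputs) {
            shared.reset();
            shared.text = input;
            result.add(shared);
        }
        return result;
    }

    // Stores a separate copy per input: each element can be validated
    // individually once processing has finished.
    static List<Doc> collectSeparate(List<String> inputs) {
        Doc shared = new Doc();
        List<Doc> result = new ArrayList<>();
        for (String input : inputs) {
            shared.reset();
            shared.text = input;
            result.add(shared.copy());
        }
        return result;
    }
}
```

With the inputs `"first"` and `"second"`, element 0 of the shared list ends up holding `"second"`, whereas element 0 of the separate list still holds `"first"`.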
Original file line number Diff line number Diff line change
@@ -751,7 +751,8 @@ private Mapping getDefaultMapping() {
"Token"
};
for (String typeName: segTypeNames) {
String aType = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type." + typeName;
String aType = "de.tudarmstadt.ukp.dkpro.core.api.segmentation.type."
+ typeName;
txtTypeMappingLst.add(new TypeMapping(typeName, aType));
}
}
Original file line number Diff line number Diff line change
@@ -17,11 +17,14 @@
*/
package org.dkpro.core.io.brat;

import static java.nio.charset.StandardCharsets.UTF_8;
import static java.util.Arrays.asList;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.createEngineDescription;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReader;
import static org.apache.uima.fit.factory.CollectionReaderFactory.createReaderDescription;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.Assertions.contentOf;
import static org.dkpro.core.testing.IOTestRunner.testOneWay;
import static org.dkpro.core.testing.IOTestRunner.testRoundTrip;
import static org.junit.Assert.assertEquals;
@@ -44,22 +47,16 @@
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.dkpro.core.api.io.JCasResourceCollectionReader_ImplBase;
import org.dkpro.core.io.brat.BratReader;
import org.dkpro.core.io.brat.BratWriter;
import org.dkpro.core.io.conll.Conll2009Reader;
import org.dkpro.core.io.conll.Conll2012Reader;
import org.dkpro.core.testing.DkproTestContext;
import org.dkpro.core.testing.EOLUtils;
import org.dkpro.core.testing.ReaderAssert;
import org.dkpro.core.testing.assertions.AssertFile;
import org.junit.Assert;
import org.junit.Ignore;
import org.junit.Rule;
import org.junit.Test;

//NOTE: This file contains Asciidoc markers for partial inclusion of this file in the documentation
//Do not remove these tags!
public class BratReaderWriterTest
{

@@ -101,8 +98,9 @@ public void test__BratDirectory__ContainingOnlyAnnotationsForStandardDKProUimaTy
}

@Test
public void OLD_TO_DELETE__test__SingleTxtFileWithoutAnAnnFile__AssumesEmptyAnnFiles() throws Exception {

public void OLD_TO_DELETE__test__SingleTxtFileWithoutAnAnnFile__AssumesEmptyAnnFiles()
throws Exception
{
File bratOrigDir = new File("src/test/resources/brat/");
File txtFileRef = new File(bratOrigDir, "document0a.txt");
boolean deleteAnnFiles = true;
@@ -136,7 +134,8 @@ public void test__SingleTxtFileWithoutAnAnnFile__AssumesEmptyAnnFiles() throws E
// txtFileRef, txtFile, expectEmptyAnnFiles);

boolean deleteAnnFiles = true;
File tempInputsDir = copyBratFilesToTestInputsDir(new File("src/test/resources/brat/"), deleteAnnFiles);
File tempInputsDir = copyBratFilesToTestInputsDir(new File("src/test/resources/brat/"),
deleteAnnFiles);
File tempInputTxtFile = new File(tempInputsDir, "document0a.txt");

Map<String,Object> readerParams = new HashMap<String,Object>();
@@ -152,34 +151,55 @@ public void test__SingleTxtFileWithoutAnAnnFile__AssumesEmptyAnnFiles() throws E
}

@Test
public void test__SingleDirWithoutAnnFiles__AssumesEmptyAnnFiles() throws Exception {
File bratOrigDir = new File("src/test/resources/brat-only-std-types/");
File txtFileRef = bratOrigDir;
boolean deleteAnnFiles = true;
File tempDir = copyBratFilesToTempLocation(bratOrigDir, deleteAnnFiles);
File txtFile = tempDir;
public void test__SingleDirWithoutAnnFiles__AssumesEmptyAnnFiles() throws Exception
{
ReaderAssert
.assertThat(BratReader.class)
.readingFrom("src/test/resources/text-only")
.usingWriter(BratWriter.class)
.asFiles()
.allSatisfy(file -> {
// The ".ann" files have been freshly generated and are empty
if (file.getName().endsWith(".ann")) {
assertThat(contentOf(file)).isEmpty();
}
// The ".txt" files should match the originals
if (file.getName().endsWith(".txt")) {
assertThat(contentOf(file)).isEqualToNormalizingNewlines(
contentOf(new File("src/test/resources/text-only",
file.getName())));
}
})
.extracting(File::getName)
.containsExactlyInAnyOrder("annotation.conf", "document0a.ann", "document0a.txt",
"document0b.ann", "document0b.txt", "document0c.ann", "document0c.txt",
"document0d.ann", "document0d.txt", "visual.conf");

boolean expectEmptyAnnFiles = true;
testReadWrite(
createReader(BratReader.class,
BratReader.PARAM_SOURCE_LOCATION, txtFile.toString()),
createEngine(BratWriter.class,
BratReader.PARAM_SOURCE_LOCATION, txtFile.toString()),
txtFileRef, txtFile, expectEmptyAnnFiles);
}
// File bratOrigDir = new File("src/test/resources/brat-only-std-types/");
// File txtFileRef = bratOrigDir;
// boolean deleteAnnFiles = true;
// File tempDir = copyBratFilesToTempLocation(bratOrigDir, deleteAnnFiles);
// File txtFile = tempDir;
//
// boolean expectEmptyAnnFiles = true;
// testReadWrite(
// createReader(BratReader.class, BratReader.PARAM_SOURCE_LOCATION,
// txtFile.toString()),
// createEngine(BratWriter.class, BratReader.PARAM_SOURCE_LOCATION,
// txtFile.toString()),
// txtFileRef, txtFile, expectEmptyAnnFiles);
}

@Test
public void testConll2009()
throws Exception
{
// tag::testOneWay[]
testOneWay(
createReaderDescription(Conll2009Reader.class), // the reader
createEngineDescription(BratWriter.class, // the writer
BratWriter.PARAM_WRITE_RELATION_ATTRIBUTES, true),
"conll/2009/en-ref.ann", // the reference file for the output
"conll/2009/en-orig.conll"); // the input file for the test
// end::testOneWay[]
}

@Test
@@ -405,14 +425,17 @@ BratWriter.PARAM_RELATION_TYPES, asList(
public void testBratWithDiscontinuousFragmentNear()
throws Exception
{
testRoundTrip(createReaderDescription(BratReader.class,
ReaderAssert.assertThat(BratReader.class,
BratReader.PARAM_TEXT_ANNOTATION_TYPE_MAPPINGS,
asList("Token -> de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.Token",
"Organization -> de.tudarmstadt.ukp.dkpro.core.api.ner.type.Organization",
"Location -> de.tudarmstadt.ukp.dkpro.core.api.ner.type.Location")),
createEngineDescription(BratWriter.class, BratWriter.PARAM_ENABLE_TYPE_MAPPINGS,
true),
"brat/document0c.ann");
"Location -> de.tudarmstadt.ukp.dkpro.core.api.ner.type.Location"))
.readingFrom("src/test/resources/brat/document0c.ann")
.usingWriter(BratWriter.class,
BratWriter.PARAM_ENABLE_TYPE_MAPPINGS, true)
.outputAsString("document0c.ann")
.isEqualToNormalizingNewlines(
contentOf(new File("src/test/resources/brat/document0c.ann"), UTF_8));
}

@Test
@@ -499,9 +522,9 @@ private File copyBratFilesToTestInputsDir(File bratDir)
return copyBratFilesToTestInputsDir(bratDir, null);
}

private File copyBratFilesToTestInputsDir(File bratDir, Boolean deleteAnnFiles)
throws IOException {
private File copyBratFilesToTestInputsDir(File bratDir, Boolean deleteAnnFiles)
throws IOException
{
File testContextDir = testContext.getTestOutputFolder();

if (deleteAnnFiles == null) {
@@ -569,17 +592,19 @@ private void testOneWaySimple(Map<String,Object> readerParams, Map<String,Object

}

private void assertSingleBratFileOK(Map<String, Object> readerParams, Map<String, Object> writerParams)
throws Exception {
private void assertSingleBratFileOK(Map<String, Object> readerParams,
Map<String, Object> writerParams)
throws Exception
{
File sourceLocation = (File) readerParams.get(BratReader.PARAM_SOURCE_LOCATION);
File targetLocation = (File) writerParams.get(BratWriter.PARAM_TARGET_LOCATION);

File sourceTxt = new File(sourceLocation.toString().replaceAll("\\.ann$", ".txt"));
File sourceAnn = new File(sourceLocation.toString().replaceAll("\\.txt$", ".ann"));

String sourceFileName = sourceTxt.getName().replaceAll("\\.txt$", "");
File targetTxt = new File(targetLocation, sourceFileName+".txt");
File targetAnn = new File(targetLocation, sourceFileName+".ann");
File targetTxt = new File(targetLocation, sourceFileName + ".txt");
File targetAnn = new File(targetLocation, sourceFileName + ".ann");

AssertFile.assertFilesHaveSameContent("Outputed .txt file not same as input one",
sourceTxt, targetTxt);
@@ -588,8 +613,9 @@ private void assertSingleBratFileOK(Map<String, Object> readerParams, Map<String

}

private Object[] paramsMap2Arr(Map<String, Object> paramsMap) {
Object[] paramsArr = new Object[2*paramsMap.keySet().size()];
private Object[] paramsMap2Arr(Map<String, Object> paramsMap)
{
Object[] paramsArr = new Object[2 * paramsMap.keySet().size()];
int pos = 0;
for (String paramName: paramsMap.keySet()) {
paramsArr[pos] = paramName;
@@ -600,14 +626,14 @@ private Object[] paramsMap2Arr(Map<String, Object> paramsMap) {
return paramsArr;
}

private File copyBratFilesToTempLocation(File bratDir)
throws IOException {
private File copyBratFilesToTempLocation(File bratDir) throws IOException
{
return copyBratFilesToTempLocation(bratDir, null);
}

private File copyBratFilesToTempLocation(File bratDir, Boolean deleteAnnFiles)
throws IOException {
private File copyBratFilesToTempLocation(File bratDir, Boolean deleteAnnFiles)
throws IOException
{
if (deleteAnnFiles == null) {
deleteAnnFiles = false;
}
@@ -628,13 +654,15 @@ private File copyBratFilesToTempLocation(File bratDir, Boolean deleteAnnFiles)
return tempDir.toFile();
}

public File getBratOutputsDir() {
File testContextDir = testContext.getTestOutputFolder(false);
return new File(testContextDir, "outputs");
}
public File getBratOutputsDir()
{
File testContextDir = testContext.getTestOutputFolder(false);
return new File(testContextDir, "outputs");
}

public File getBratInputsDir() {
File testContextDir = testContext.getTestOutputFolder(false);
return new File(testContextDir, "inputs");
}
public File getBratInputsDir()
{
File testContextDir = testContext.getTestOutputFolder(false);
return new File(testContextDir, "inputs");
}
}
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This is a test.
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
This is a test.
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
This is
a test.
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
This is
an other test.
Original file line number Diff line number Diff line change
@@ -17,31 +17,31 @@
*/
package org.dkpro.core.io.conll;

import static org.dkpro.core.testing.IOTestRunner.testOneWay;
import static org.dkpro.core.testing.IOTestRunner.testRoundTrip;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.assertj.core.api.Assertions.contentOf;

import java.io.File;

import org.dkpro.core.testing.DkproTestContext;
import org.junit.Ignore;
import org.dkpro.core.testing.ReaderAssert;
import org.junit.Rule;
import org.junit.Test;

//NOTE: This file contains Asciidoc markers for partial inclusion of this file in the documentation
//Do not remove these tags!
public class Conll2006ReaderWriterTest
{
// Deleted the test file here because it was malformed *and* we had no provenance info.
// However, leaving the test in right now and ignoring it because it is used in the
// documentation.
@Ignore()
@Test
public void roundTrip()
throws Exception
{
// tag::testRoundTrip[]
testRoundTrip(
Conll2006Reader.class, // the reader
Conll2006Writer.class, // the writer
"conll/2006/fk003_2006_08_ZH1.conll"); // the input also used as output reference
ReaderAssert.assertThat(Conll2006Reader.class) // the reader
.readingFrom("src/test/resources/conll/2006/fi-ref.conll") // the test input file
.usingWriter(Conll2006Writer.class) // the writer
.outputAsString() // access writer output
.isEqualToNormalizingNewlines( // compare to input file
contentOf(new File("src/test/resources/conll/2006/fi-ref.conll"), UTF_8));
// end::testRoundTrip[]
}

@@ -50,11 +50,14 @@ public void testFinnTreeBank()
throws Exception
{
// tag::testOneWay[]
testOneWay(
Conll2006Reader.class, // the reader
Conll2006Writer.class, // the writer
"conll/2006/fi-ref.conll", // the reference file for the output
"conll/2006/fi-orig.conll"); // the input file for the test
ReaderAssert.assertThat(Conll2006Reader.class, // the reader
Conll2006Reader.PARAM_SOURCE_ENCODING, "UTF-8") // reader parameter
.readingFrom("src/test/resources/conll/2006/fi-orig.conll") // the test input file
.usingWriter(Conll2006Writer.class, // the writer
Conll2006Writer.PARAM_TARGET_ENCODING, "UTF-8") // writer parameter
.outputAsString("fi-orig.conll") // access writer output
.isEqualToNormalizingNewlines( // compare to input file
contentOf(new File("src/test/resources/conll/2006/fi-ref.conll"), UTF_8));
// end::testOneWay[]
}
