-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
URI and Path specifier classes for input/output resources. #34
Conversation
Arghh... the tests fail on Windows because I hardcoded "/" in my test path names. Are we really supporting Windows ? |
// URIs with an embedded fragemnt delimiter ("#"); if the file scheme is included, the rest of the path | ||
// must already be escaped; ifno file scheme is included, the path will be escaped by the URI class | ||
{"../data/utils/testDirWith#InName/testTextFile.txt", "Test file."}, | ||
{"file:///" + getCWD() + "../data/utils/testDirWith%23InName/testTextFile.txt", "Test file."}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to resolve how to reference these paths without "..". Need to resolve #19.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can use Paths.get(getCWD() + "../data/utils/testDirWith%23InName/testTextFile.txt").toUri().toString()
, if the problem is with Windows. I can test if you want locally and let you know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are several problems here; the original one about where/how we should store test data, and the Windows one where the path separator is different. The latter has two parts 1) how we specify paths in tests 2) How we test Windows paths (UNC paths, dos drive specifiers, encoded names with embedded spaces etc).
Also, I know you're not yet convinced we need these classes, but for these tests in particular, I want to make sure the raw strings are passed directly to the classes being tested, not strings that have been pre-processed by Paths
and URI
, since I'm trying to test that the classes themselves do this correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, there is the issue that Windows paths (with forward slashes) don't appear to be valid URIs ?
@cmnbroad - I would like to support Windows, also because of the hardcoded "/" that are all around htsjdk2. One of the major points of supporting Windows is that, as we are using java, it will be nice to be multi platform (and I have a project in mind for a desktop app that I will test mostly in Windows). There is no much to do to support windows if we don't use hardcoded "/", as far as we are using By the way, I will start the review now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit against the class in the first place because this looks more like a way to distinguish when a passed String
should be considered a path or something else that should be handled with a different reader (record-provider) implementation. I think that handle URI-specific thinks like this should be done at the reader-factory level; we can provide a factory for GenomicDB that takes the URI with the scheme and handle a Path as if it has it (as a different artifact or in a different repository). Then GATK or other framework that requires to provide a common implementation for handling resources could do their own URI
class, which I think that it's out-of-scope for htsjdk3 itself.
@@ -0,0 +1,86 @@ | |||
package org.htsjdk.core.api; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that a better place will be org.htsjdk.core.api.io
, to keep things more organized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
/** | ||
* Interface representing htsjdk-next input/output resources as URIs. | ||
*/ | ||
public interface PathURI { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this will represent a resource, what's about IOResource
? I'm not sure if it is better, but I don't like the PathURI
name...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I don't love the names. IOResource might be an improvement - we'd still need a name for the default implementation class (currently called PathSpecifier
for lack of a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to IOResource; still not sure about PathSpecifier.
/** | ||
* @return true if this URI has a scheme that has an installed NIO file system provider. This does not | ||
* guarantee the URI can be converted into a {@code java.nio.file.Path}, since the URI can be syntactically | ||
* valid, and specify a valid file system provider, but still fail to be semantically meaningful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to have properly formatted javadocs. Most of the information should be at the begining, and the @return
tag should contain just the information about the return value (in this case, {@code true} if the URI has a installed NIO filesystem provider; {@code false} otherwise
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
* guarantee the URI can be converted into a {@code java.nio.file.Path}, since the URI can be syntactically | ||
* valid, and specify a valid file system provider, but still fail to be semantically meaningful. | ||
*/ | ||
boolean isNIO(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the new disq, I don't like this isNio
name in the method. I prefer something like hasFileSystem
or throw a meaningful exception if a Path
could not be retrieved to be catched by client code...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clients can ignore this and do exactly as you describe (just call toPath, which will throw).
* | ||
* hdfs://namenode/to/file | ||
* | ||
* The current implementation returns false for these cases (toPath will fail, getInvalidPathReason |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@Override | ||
public OutputStream getOutputStream() { | ||
if (!isPath()) { | ||
throw new HtsjdkException(getToPathFailureReason()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No raw HtsjdkException
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah - perhaps we need a separate discussion about how we want to partition the exception namespace. #36.
try { | ||
return Files.newOutputStream(resourcePath); | ||
} catch (IOException e) { | ||
throw new HtsjdkException(String.format("Could not open output stream for %s (as URI %s)", getRawInputString(), getURIString()), e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No raw HtsjdkException
@@ -0,0 +1,254 @@ | |||
package org.htsjdk.core.utils; | |||
|
|||
//TODO: NAMING: is there any useful distinction between Unit/Integration tests in this repo ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should set this naming conventions, as we did not though about that yet.
{"file:project/gvcf-pcr/23232_1#1/1.g.vcf.gz", "file:project/gvcf-pcr/23232_1#1/1.g.vcf.gz", true, false}, | ||
{"file:/project/gvcf-pcr/23232_1#1/1.g.vcf.gz", "file:/project/gvcf-pcr/23232_1#1/1.g.vcf.gz", true, false}, | ||
|
||
{"gendb://somegdb", "gendb://somegdb", false, false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that this is the main reason to have the class in the first place (see the overall comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my overall comment describing the reasons for proposing these classes.
@@ -0,0 +1,2 @@ | |||
Test file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also decide on the naming conventions for test data. Also, as I commented before, it will be nice that the test-resources are located in the src
to be packaged in the test artifact; otherwise, test won't run if the test jar is isolated.
@@ -30,6 +30,9 @@ subprojects { | |||
|
|||
dependencies { | |||
testCompile "org.testng:testng:6.14.3" | |||
testCompile "com.google.jimfs:jimfs:1.1" | |||
|
|||
compile 'commons-io:commons-io:2.5' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to mention: is this needed in this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch - its left over from previous changes and can be removed.
I think there is value in having an interface thats more strongly typed than |
I have a proposal for the In #42 I included a proposal; maybe we can discuss on that PR or just here. I did not add to the design a method for the getting the raw/string representation, but actually that could make sense to be able to retrieve it. I will copy-paste the generated HTML from #42 here to show what I propose. |
Sorry, I could only get a temporary PDF because I had some problems with the computer. But it should be enough for the discussion: IOResourceProvider.pdf |
@magicDGS I'm sorry I haven't gotten a chance to comment on this properly yet. I think having a fairly lightweight interface that is essentially a more convenient URI is a good thing to have. My thought is that we have essentially 2 different things that need to be referred to by uri's of some sort.
|
@lbergelson - I am designing (like in my other comment, in my mind by now due to lack of time for personal reasons) the whole interface for IO in a similar way as #42 (that was just a proof-of-concept on how to write and propose this kind of things in a live document in GitHub, which will be nice for several reasons that we can discuss in Slack):
The idea behind this design is:
|
public class PathSpecifier implements IOResource, Serializable { | ||
private static final long serialVersionUID = 1L; | ||
|
||
private final String rawInputString; // raw input string provided by th user; may or may not have a scheme |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private final String rawInputString; // raw input string provided by th user; may or may not have a scheme | |
private final String rawInputString; // raw input string provided by the user; may or may not have a scheme |
|
||
private final String rawInputString; // raw input string provided by th user; may or may not have a scheme | ||
private final URI uri; // working URI; always has a scheme (assume "file" if not provided) | ||
private transient Path cachedPath; // cache the Path associated with this URI if its "Path-able" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private transient Path cachedPath; // cache the Path associated with this URI if its "Path-able" | |
private transient Path cachedPath; // cache the Path associated with this URI if it's "Path-able" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the name of progress I approve this PR.
Not sure if there are better names or package locations.