-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(cram): slice local files #11
Conversation
… type (e.g., calls slice_cram when slicing genomic data
Updating 'slice' branch with 'main'.
…ust needs the data_type and data_path logic to be added in
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks 👍 I've made a few suggestions and comments and will push a few commits to address them.
modo/cram.py
Outdated
def slice_array(): # to be defined after we get more knowledge of how the | ||
return None # data looks like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def slice_array(): # to be defined after we get more knowledge of how the | |
return None # data looks like | |
def slice_array(): | |
raise NotImplementedError("Slicing of arrays not supported yet") |
suggestion: explicit error message.
modo/cram.py
Outdated
Examples | ||
-------- | ||
>>> slice("data/ex1/demo1.cram", "chr1:100-200") | ||
def slice(data_name: str, data_path: str, coords: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: passing a DataEntity
here would reduce the number of arguments needed. All the information we need (path, name, ...) is encapsulated in the object.
modo/cram.py
Outdated
... | ||
|
||
# split up coordinate string "chr:start-end" into its three elements | ||
coords = coords.replace("-", ":") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fix: this is dangerous, as it is possible for a chromosome name to contain -
. This would break in such cases (chr1-human:1-100
-> chr1:human:1:100
-> 4 parts).
modo/cram.py
Outdated
Usage: | ||
----- | ||
>>> x = slice("data/ex1/demo1.cram", "chr1:100-200") | ||
>>> for read in x: | ||
print(read) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usage: | |
----- | |
>>> x = slice("data/ex1/demo1.cram", "chr1:100-200") | |
>>> for read in x: | |
print(read) |
docs: this usage describes the slice
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is left over from before the split into slice() and slice_cram(), and I forgot to change this to slice_cram()
I have added a slice() and a slice_cram() functions.
MODO.slice(data: str, coords: str) will call slice(), which in turn will call the right slicing function depending on the data format/type that is being sliced (e.g., if it is a CRAM file, it will call slice_cram())
The two things missing ... please add them in ... are,
(1) determining the data_type (CRAM/array/etc.) of the data, so the right slicing function can be called.
(2) determining the data_path, so the slicing function will know where the data is, and whether it is a local file or on a server (logic for this will be added in later, currently it deals with local files).