Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(cram): slice local files #11

Merged
merged 15 commits into from
Feb 9, 2024
Merged

feat(cram): slice local files #11

merged 15 commits into from
Feb 9, 2024

Conversation

AssafSternberg
Copy link
Contributor

I have added a slice() and a slice_cram() functions.

MODO.slice(data: str, coords: str) will call slice(), which in turn will call the right slicing function depending on the data format/type that is being sliced (e.g., if it is a CRAM file, it will call slice_cram())

The two things missing ... please add them in ... are,

(1) determining the data_type (CRAM/array/etc.) of the data, so the right slicing function can be called.

(2) determining the data_path, so the slicing function will know where the data is, and whether it is a local file or on a server (logic for this will be added in later, currently it deals with local files).

@cmdoret cmdoret changed the title Slice feat(cram): slice local files Feb 9, 2024
Copy link
Member

@cmdoret cmdoret left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks 👍 I've made a few suggestions and comments and will push a few commits to address them.

modo/cram.py Outdated
Comment on lines 49 to 50
def slice_array(): # to be defined after we get more knowledge of how the
return None # data looks like
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def slice_array(): # to be defined after we get more knowledge of how the
return None # data looks like
def slice_array():
raise NotImplementedError("Slicing of arrays not supported yet")

suggestion: explicit error message.

modo/cram.py Outdated
Examples
--------
>>> slice("data/ex1/demo1.cram", "chr1:100-200")
def slice(data_name: str, data_path: str, coords: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: passing a DataEntity here would reduce the number of arguments needed. All the information we need (path, name, ...) is encapsulated in the object.

modo/cram.py Outdated
...

# split up coordinate string "chr:start-end" into its three elements
coords = coords.replace("-", ":")
Copy link
Member

@cmdoret cmdoret Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix: this is dangerous, as it is possible for a chromosome name to contain -. This would break in such cases (chr1-human:1-100 -> chr1:human:1:100 -> 4 parts).

modo/cram.py Outdated
Comment on lines 28 to 32
Usage:
-----
>>> x = slice("data/ex1/demo1.cram", "chr1:100-200")
>>> for read in x:
print(read)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Usage:
-----
>>> x = slice("data/ex1/demo1.cram", "chr1:100-200")
>>> for read in x:
print(read)

docs: this usage describes the slice function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is left over from before the split into slice() and slice_cram(), and I forgot to change this to slice_cram()

@cmdoret cmdoret merged commit dab2567 into main Feb 9, 2024
2 checks passed
@cmdoret cmdoret deleted the slice branch February 19, 2024 09:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants