Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor/create package #1

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

jorenretel
Copy link

Hi Yair,

very nice work. I have a suggestion that I wrapped here in this pull request. By a little bit of restructuring and adding setup.cfg and pyproject.toml files, the code now becomes a package that can be installed like:

pip install -e .

(when located in the repo root). This makes the namespacing a bit easier and prevents needing that manually setting the src dir on the PYTHONPATH that you are doing here:

export PYTHONPATH="${PWD}"

I also put the "caduceus" HuggingFace code inside that package so that there is no name clash between the new "caduceus" python package and the "caduceus" directory.

This is just a suggestion. Hope you will consider it.

Best,

Joren

@yair-schiff
Copy link
Contributor

Hi @jorenretel, thank you very much for this suggestion. I appreciate the pr. Were you able to verify that the code runs as expected with the proposed changes?

@jorenretel
Copy link
Author

Hi @yair-schiff, thanks for your answer. Sorry for getting back to you only now. I had a major life event happening last week, so I was kind of distracted ;-). Everything was working I think, but I had some problems installing Mamba as this was not trivial. I will merge you latest changes into my branch, check again and get back to you.

@jorenretel jorenretel marked this pull request as draft March 21, 2024 10:37
@jorenretel
Copy link
Author

I figured out why my mamba-ssm install was not working. Had a bad pip cache laying around. My bad.

What I did to verify whether things are working in this refactored version is to run train.py as you indicated. After fixing a few settings, this is working:

image

The environment that I am using for this, is based on this environment.yml :

name: caduceus-env
channels:
  - conda-forge
  - pytorch
  - nvidia
dependencies:
  - python==3.10
  - pip
  - nvidia::cuda-nvcc=12.1
  - pytorch::pytorch-cuda=12.1
  - pytorch::pytorch
  - pandas
  - pip:
      - -e .

That last line installs the caduceus library in editable form in the conda environment (assuming that environment.yml is located in the root of the repository). The other pip dependencies of caduceus are then also automatically installed, because they are listed in the setup.cfg .

Note: I can also move src/caduceus dir one level up to omit one level of nested directories. It's probably a matter of taste whether to do this or not. I initially didn't because there was already a directory called "caduceus", but in the end had to move that inside the library as well because other python files were importing things from there. I would be happy to make that one more refactor to omit the "src" dir from the tree if you prefer that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants