Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

amrfinder_update does NOT respect DEFAULT_DB_DIR while amrfinder -u does #89

Open
EricDeveaud opened this issue Jun 3, 2022 · 11 comments
Labels
enhancement New feature or request

Comments

@EricDeveaud
Copy link

Hello,

amrfinder_update does not respect DEFAULT_DB_DIR when build with make DEFAULT_DB_DIR=somewhere

while amrfinder -u as it calls exec (fullProg ("amrfinder_update") + " -d " + shellQuote (dbDir. getParent ()) + ifS (force_update, " --force_update")

also amrfinder_update still place the data in the same location the binary is located. related #8

regards

Eric

@evolarjun
Copy link
Contributor

Hi Eric,

Thanks for writing, and yes the database location behavior is perhaps not ideal. We spent some time discussing alternatives so I'm glad to hear your opinions.

First off the make variable DEFAULT_DB_DIR affects the behavior of amrfinder -u and amrfinder -U. To be honest the program amrfinder_update was designed primarily for internal use and I documented it because a few people wanted to have more control over the update process.

And yes, the default location is still a subdirectory of the location of the binary. We fixed that for the conda release by compiling with a different DEFAULT_DB_DIR which you've figured out how to do. Is there a reason you can't use amrfinder -U or amrfinder -u?

What we came up might not be the ideal solution, but the alternatives we could think of all have issues. We could hardcode the directory to something like ../amrfinder/data, but that could cause problems in some cases. Otherwise we would have to indicate the database directory somehow, either using an environment variable or configuration file which would cause additional installation complexity. We decided the current behavior was the simplest default solution (and it was the first one implemented). We left the grandfathered-in ./data as the default instead of changing it to something more descriptive like ./amrfinderplus-database which would arguably be better.

Thanks for your interest, and I'd be happy to hear any thoughts you have.

Arjun

@EricDeveaud
Copy link
Author

Hello Arjun.

In my opinion data should not go to bin folder. bin must contains only executables.
there is plenty of standard location possible for data eg first that comes to mind is share and or using environnement variable that allow users to overwrite the data location

I would say default data location root should be PREFIX/share/amrfinder and AMRFINDER_DB environnement variable if set will allow to overwrite this value
PLUS amrfinder -u, amrfinder -U and amrfinder_update when run with no database dir specified should honor the default location

currently amrfinder -u, amrfinder -U conforms to this scenarion (no database dir given) do the job as expected on the expected location when compiled with DEFAULT_DB_DIR

hardcoding the directory to to something like ../amrfinder/data will also be a bad idea. sorry to be rude.

sorry I'm dumb at C++ so I can't provide a pull request.

of course I can use only amrfinder -[u|U] I've spotted the problem with armfinder_update while setting the test suite for my fresh installation of your tool on our cluster.
It may confuse some of our users.

regards

Eric

@vbrover
Copy link
Contributor

vbrover commented Jun 3, 2022

I have an opposite extreme idea that all code+data must be in the same place and a user should only update this as one whole. (This is the OOP idea.)
From this idea it follows that there is no need to maintain separate software and data versions, no need of parameters to indicate non-standard data locations, no need for the -U parameter, etc.

@EricDeveaud
Copy link
Author

different point of view ;-)
this is the cause of the headache when I have to install some stuff on our cluster.
central installation vs untar and run from the lolcation just expanded.
how do you handle soft like blast and associated databases (nt, genbank, and so on) with your strategy. each user should download and index their own DBs (what a waste of soace....)
most cluster does not have internet access on compute nodes. eg ours so users won't be abble to download the DBs so in this case it means amrfinder won't be useable.
as packager I fully disagreee with this one.
your mileage may vary .

regards
Eric

@vbrover
Copy link
Contributor

vbrover commented Jun 4, 2022

Could you answer these questions:

  1. you want a central (single) installation, or a local installation on each cluster computer?
  2. does a cluster computer have an internet connection?
  3. does a cluster computer have a C++ compiler?
  4. do all cluster computers have the same UNIX version?
  5. do all cluster computers have access to the same central directory?

@EricDeveaud
Copy link
Author

our cluster setup is quite standard

  1. nop. all our stuff is installed from a dedicated computer to a NFS shared drive mounted read only on each clutser node
  2. no compute nodes does not have internet acces. only submit nodes can acces internet. and we do not allow (weel we discourage) users to run jobs on submit nodes
  3. yes ;-)
  4. yes
  5. yes

Eric

@vbrover
Copy link
Contributor

vbrover commented Jun 6, 2022

  1. you want a central (single) installation, or a local installation on each cluster computer?
  1. nop. all our stuff is installed from a dedicated computer to a NFS shared drive mounted read only on each clutser node

Then you need to install the AMRFinderPlus executables and data only once on the NFS shared drive.
If the executables are installed in the directory $EXEC and the data in the directory $DATA, and these directories are visible from all cluster computers, then AMRFinderPlus can be invoked by the same command on each cluster computer:

$EXEC/amrfinder -d $DATA ...

Why does it matter where exactly $DATA is located?
($DATA can be also PREFIX/share/amrfinder.)

@EricDeveaud
Copy link
Author

many thanks for the explanations.... ;-)

but did you ever read the original question ?
problem was not installation scheme but that amrfinder_update DOES NOT HONOUR the DEFAULT_DB_DIR build directive while amrfinder -u does.

can we go back to the original topic.
this is not a problem of how to install nor does data and binaries should be located in the same location (bad idea IMHO).

regards

Eric

@vbrover
Copy link
Contributor

vbrover commented Jun 6, 2022

amrfinder_update DOES NOT HONOUR the DEFAULT_DB_DIR

Why is it supposed to?

@EricDeveaud
Copy link
Author

maybee because armfinder -u do.

@evolarjun evolarjun added the enhancement New feature or request label Jul 15, 2022
@EricDeveaud
Copy link
Author

thanks for the fix

Eric

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants