Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added hardlink and copy options #172

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Zorvalt
Copy link
Contributor

@Zorvalt Zorvalt commented Apr 10, 2020

This PR adds the ability to leave original files untouched and copy or create a hard link to them.
I think it answers issues partially #46, I think #40 and #10

I added the command line options:

  • -C or --copy To copy a file (compatible with -d option)
  • -l or --link To hard link a file (compatible with -d option)

I added the config options:

  • always_copy To always copy files
  • prefer_hardlink_to_copy To create a hardlink when possible if copying
  • always_hardlink To always hardlink

I am quite new to this tool (and to contributing) so I may have not seen some corner cases.
The biggest change I had to do is renaming file after moving/copying them. This was to avoid making too many changes to the code base.

Copy link
Owner

@dbr dbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Seems like a useful addition

Two things:

  1. I'll merge Migrate from os.rename to shutil.move #161 just now - would you mind rebasing your changes on to this?

  2. Would it be possible to merge the options into one "how to move the file" option?

    As in instead of separate always_copy and always_hardlink booleans (which are mutually exclusive), it would be neater if there was something like a "move mode" type enum option (which could be set to "always copy", or "hardlink or copy" etc)

    Would need a bit of thought as to what options are needed, particularly to cover what happens when a certain operation isn't possible (e.g hardlink to a different drive isn't possible).

    This would deprecate the always_move option - I think that can be handled quite easily - if always_move is True, set the move_mode option to the appropriate setting and output a deprecation warning.

tvnamer/utils.py Show resolved Hide resolved
tvnamer/utils.py Outdated
@@ -1093,16 +1102,21 @@ def newPath(self, new_path = None, new_fullpath = None, force = False, always_co
new_fullpath, self.filename))

if same_partition(self.filename, new_dir):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

@dbr
Copy link
Owner

dbr commented Apr 11, 2020

master branch should be good to rebase with if you have the time - I've also fixed up the integration tests which were failing, so the Travis status should be valid again \o/

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 11, 2020

Thanks for this! Seems like a useful addition

You're welcome. Thank you to you, this tools is very useful and represent a fair amount of work!

  1. I'll merge Migrate from os.rename to shutil.move #161 just now - would you mind rebasing your changes on to this?

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

  1. Would it be possible to merge the options into one "how to move the file" option?
    As in instead of separate always_copy and always_hardlink booleans (which are mutually exclusive), it would be neater if there was something like a "move mode" type enum option (which could be set to "always copy", or "hardlink or copy" etc)
    Would need a bit of thought as to what options are needed, particularly to cover what happens when a certain operation isn't possible (e.g hardlink to a different drive isn't possible).
    This would deprecate the always_move option - I think that can be handled quite easily - if always_move is True, set the move_mode option to the appropriate setting and output a deprecation warning.

I thought of doing that at the beginning but did not want to propose to big of a change. I also think it would be better but will also require some refactoring. I will look into it. I need to get more into the different config options before doing anything clean.

Just to be clear, if I understand correctly the code and your proposition, your are suggesting to use move_files_enable to indicate any type of move/copy/etc. and then have a complementary move_mode as you described. I would therefor remove all the always_* options but the always_rename.

Also, sometimes functions use Config[...], sometimes they take those as parameter and some other time a mix of both. Is there a guideline of some sort ?

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 11, 2020

master branch should be good to rebase with if you have the time - I've also fixed up the integration tests which were failing, so the Travis status should be valid again \o/

Great! I'll rebase the code ASAP

@dbr
Copy link
Owner

dbr commented Apr 12, 2020

Just to be clear, if I understand correctly the code and your proposition, your are suggesting to use move_files_enable to indicate any type of move/copy/etc. and then have a complementary move_mode as you described. I would therefor remove all the always_* options but the always_rename.

Yep I think this is correct:

  • move_files_enable being true will act as it currently does, meaning the files will end up in move_files_destination; if false the files are renamed in-place
  • The means by which the file ends up there will be controlled by the new move_mode (e.g symlink, move, etc), and this mode is a combination of most of the always_* options
  • As you mention, always_rename is unrelated to all this - it just controls if 'yes/no' prompt

Also, sometimes functions use Config[...], sometimes they take those as parameter and some other time a mix of both. Is there a guideline of some sort ?

A lot of that is legacy of the config being introduced later because lots of functions took too many arguments

When things should be an simple argument (e.g bool/str) versus using the config is slightly tricky, as with hindsight I don't think the config should have been a global object (ideally the config would be passed to the relevant "core" parts of the code, and then when lower level functions just use one or two options, it would be clearer when the config items can be neatly passed in as arguments)

I guess it should mostly boil down to making things easy to unit-test - it is easier to write a test for a function which takes a bool than uses the config. However with methods like EpisodeInfo.generateFilename which inherently use many config options in one place, it is easier to test by varying the config

@dbr
Copy link
Owner

dbr commented Apr 12, 2020

So as a first-pass, I think the modes required are:

  • move - default: move into new location regardless of partitions
  • copy - always duplicate into new location
  • hardlink - like copy but using hardlinks at new location (reverting to copy if hardlinks not available)
  • symlink - leave the source file untouched, but create symlink at destination
  • leave_symlink - move file to destination, but replace source file with a symlink (Add leave_symlink option #59)

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 13, 2020

  1. I'll merge Migrate from os.rename to shutil.move #161 just now - would you mind rebasing your changes on to this?

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

I started rebasing. Do I revert the same_partition function or not ?
I understand the cleanup of the code and it was needed but the program lost the ability to move between partitions, too. Was this intended ?

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 13, 2020

  1. I'll merge Migrate from os.rename to shutil.move #161 just now - would you mind rebasing your changes on to this?

Hmm, there is another pull request pending merging, #161, which will conflict with this (the same_partition stuff is removed)

This is indeed a problem. I could do without the same_partition function by using a try/catch but I think the code would loose some clarity over the verbosity of calling same_partition.

I started rebasing. Do I revert the same_partition function or not ?
I understand the cleanup of the code and it was needed but the program lost the ability to move between partitions, too. Was this intended ?

Ok, digging more into rename_file and then shutil.move(old, new), I understand better!
I will do without same_partition :-)

@Zorvalt Zorvalt marked this pull request as draft April 13, 2020 12:54
@Zorvalt Zorvalt force-pushed the hardlink_and_copy branch from bdbcce8 to b2558b8 Compare April 13, 2020 12:55
@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 13, 2020

hardlink - like copy but using hardlinks at new location (reverting to copy if hardlinks not available)

I am not convinced by the idea that it should always fallback. In my use case, I would prefer it to not copy some huge files and fail or skip it with an error message.

I suggest to use two options hardlink and hardlink_or_copy.
It adds little bit of complexity but if you want to use hardlink, it is probably to avoid copying.

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 13, 2020

I implemented the new move_mode as a python enum with the following default configuration entry:

    # The method used to move files at configured destination
    #
    # * move (default)   - to move files
    # * copy             - to copy files
    # * hardlink         - to link files with a hard link
    # * hardlink_or_copy - to hardlink files when possible otherwise, falls back to COPY
    # * symlink          - to link files with a symbolic link
    # * leave_symlink    - to move files and replace the old ones with a symbolic link to the new location
    'move_mode': 'move',

Should I keep the always_move and leave_symlink options in the config file for retro-compatibility ? They would then simply set the right mode except that I would need to handle the case where the mode is also set at the same time. What should be the behavior then ? I suppose, an error since a user of the previous config won't see anything break.
Maybe marking them deprecated ? If so, should a warning be shown at runtime ?

@dbr
Copy link
Owner

dbr commented Apr 14, 2020

Good point - that set of options seem good to me!

For maintaining backwards compatibility, I think what you suggest is pretty much the way to do it. Specifically I would suggest:

  • Use None as the actual default value (i.e 'move_mode": None in config_defaults.py)
  • Check if move_mode is None and leave_symlink: etc etc and migrate to the appropriate move_mode, and output a warning about the deprecated setting
  • Can also check if move_mode is not None and (leave_symlink or ...): - this could then either warn or throw an error (my first thought would be an error makes most sense)

@Zorvalt
Copy link
Contributor Author

Zorvalt commented Apr 14, 2020

I pass almost all the tests but two because, as stated previously, I had to inverse to order in which we "rename then move" to "move then rename". This is necessary for the copy and linking type of moves as the idea would be to keep the original files unchanged.
The tests expect that, in case of failure to move a file (e. g. file already exists at destination without overwrite flag), the original file should have been renamed but not moved.

As you discussed in #40, the behavior should change and I agree. I think that a small step in that direction is worth taking in this PR as it is the "logical thing to do" to implement copy and linking.

Now, to issue arise: What about backwards compatibility and how to do it ?

  1. For the compatibility, as shown by the two failing tests, it would break the current behavior but I think, the current behavior is not what user expects. If this is to be treated as a bug, it's fine. Otherwise, maybe it's time for a 3.0 ? But that seems to be a big move for a small change.
  2. For the "how to do it", I think that the Renamer would be a perfect fit for the job. It would be relatively easy the transform it in a kind of "builder" on which we can call move and/or rename then, finally, an execute method would actually try to do the operation. This would allow to, later, break the multiple operations in smaller cumulative ones if needed.

The failing tests are:
test_forcefully_moving_disabled with result

Expected files: ['Scrubs - [01x01] - My First Day.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']
Got files:      ['tv/Scrubs/season 1/scrubs.s01e01.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']

test_forcefully_moving_default with result

Expected files: ['Scrubs - [01x01] - My First Day.avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']
Got files:      ['tv/Scrubs/season 1/scrubs - [01x01].avi', 'tv/Scrubs/season 1/Scrubs - [01x01] - My First Day.avi']

Also, I do not understand why the two input files (scrubs.s01e01.avi and scrubs - [01x01].avi) were processed in one order in test_forcefully_moving_disabled and the inverse in test_forcefully_moving_default.

@dbr
Copy link
Owner

dbr commented Apr 16, 2020

The tests expect that, in case of failure to move a file (e. g. file already exists at destination without overwrite flag), the original file should have been renamed but not moved

This isn't intend behaviour, just a side-effect of of retrofitting the move feature - I'm definitely happy if that can be fixed!

For the implementation, the builder approach sounds reasonable - sounds similar to the approach I started taking with an experimental version of tvnamer written in Rust,
https://github.com/dbr/rstvnamer/blob/3f832e551a070cf31fda8e27d3c8268d9712232c/src/actions.rs#L28
I never got around to fleshing out the idea - allowing multiple chained actions (e.g "move here, then copy into other locaiton with the final name"), or adding more like "run shell command with new file as argument" - would need to be done carefully to avoid ballooning the scope of tvnamer too much, but could be quite useful

Not sure how familiar you are with Rust, but it is essentially how I intended (or intend) tvnamer's code to be structured - fairly well summarised in this method:
https://github.com/dbr/rstvnamer/blob/3f832e551a070cf31fda8e27d3c8268d9712232c/src/main.rs#L13-L25
Basically "parse info out of the file", then "populate it with more data from TVDB", then "format the data into filenames or paths etc", before doing some action with all of this data

..allowing more complex action chaining is definitely a future thing - but if the Renamer class could be refactored to something vaguely in line with this, that'd be good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants