missing large file in remote storage after pushing #10448
Comments
It was primarily maintained by @karajan1001. I would appreciate his input here. As a workaround, could you try an S3-compatible interface - https://www.alibabacloud.com/help/en/oss/developer-reference/use-amazon-s3-sdks-to-access-oss ?
Hmm, I don't see any details in the logs. Do you see any md5s / hashes for the files that are missing remotely? Is this the full log? Could you try deleting /var/tmp/dvc/repo/1dec9b5bdab7926326d2cb372ee9b553 and running the command again in verbose mode?
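One way to answer the question about md5s is to compute the hash of the file locally and derive the object key to look for in the bucket. The sketch below assumes the DVC 3.x layout visible in the log further down (objects stored under files/md5, split into a two-character directory and the remaining hash characters); the helper names are made up for illustration and are not part of DVC.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading in chunks so a 20 GB file is never loaded at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fobj:
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_object_key(md5_hex, prefix="files/md5"):
    """Build the key assumed for DVC 3.x remotes: <prefix>/<first 2 hex chars>/<remaining chars>."""
    return f"{prefix}/{md5_hex[:2]}/{md5_hex[2:]}"
```

For example, expected_object_key(file_md5("large.bin")) gives the path, relative to the remote URL, that should exist after a successful push. The hash should also match the md5 field recorded in the corresponding .dvc file.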
I created an empty workspace with a large.bin and a small.txt, and deleted all cache in `/var/tmp/dvc/repo`. Only the small file can be found in the remote.
P.S. the --
(dvcenv) admins@test-Ai-largemodel:/mnt/datadisk1/laien/ws-dvc$ dvc push -r oss-s3 -vvv
2024-06-06 10:16:37,283 DEBUG: v3.51.2 (pip), CPython 3.10.14 on Linux-4.15.0-213-generic-x86_64-with-glibc2.27
2024-06-06 10:16:37,283 DEBUG: command: /home/admins/miniconda3/envs/dvcenv/bin/dvc push -r oss-s3 -vvv
2024-06-06 10:16:37,283 TRACE: Namespace(quiet=0, verbose=3, cprofile=False, cprofile_dump=None, yappi=False, yappi_separate_threads=False, viztracer=False, viztracer_depth=None, viztracer_async=False, pdb=False, instrument=False, instrument_open=False, show_stack=False, cd='.', cmd='push', jobs=None, targets=[], remote='oss-s3', all_branches=False, all_tags=False, all_commits=False, with_deps=False, recursive=False, run_cache=True, glob=False, func=<class 'dvc.commands.data_sync.CmdDataPush'>, parser=DvcParser(prog='dvc', usage=None, description='Data Version Control', formatter_class=<class 'dvc.cli.formatter.RawTextHelpFormatter'>, conflict_handler='error', add_help=False))
2024-06-06 10:16:37,519 TRACE: 12.48 ms in collecting stages from /mnt/datadisk1/laien/ws-dvc
Collecting |0.00 [00:00, ?entry/s]
2024-06-06 10:16:37,542 DEBUG: Preparing to transfer data from '/mnt/datadisk1/laien/ws-dvc/.dvc/cache/files/md5' to 's3://[remote-path]/dvc/files/md5'
2024-06-06 10:16:37,542 DEBUG: Preparing to collect status from '[remote-path]/dvc/files/md5'
2024-06-06 10:16:37,542 DEBUG: Collecting status from '[remote-path]/dvc/files/md5'
Pushing '[remote-path]/dvc/files/md5'| |0/? [00:00<?, ?files/s]
Pushing
2024-06-06 10:16:37,896 ERROR: unexpected error - The specified key does not exist.: An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.
Traceback (most recent call last):
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/s3fs/core.py", line 723, in _lsdir
async for c in self._iterdir(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/s3fs/core.py", line 773, in _iterdir
async for i in it:
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/aiobotocore/paginate.py", line 30, in __anext__
response = await self._make_request(current_kwargs)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/aiobotocore/client.py", line 412, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the ListObjectsV2 operation: The specified key does not exist.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc/cli/__init__.py", line 211, in main
ret = cmd.do_run()
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc/cli/command.py", line 27, in do_run
return self.run()
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc/commands/data_sync.py", line 64, in run
processed_files_count = self.repo.push(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc/repo/__init__.py", line 58, in wrapper
return f(repo, *args, **kwargs)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc/repo/push.py", line 147, in push
push_transferred, push_failed = ipush(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_data/index/push.py", line 76, in push
result = transfer(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_data/hashfile/transfer.py", line 203, in transfer
status = compare_status(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 179, in compare_status
dest_exists, dest_missing = status(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_data/hashfile/status.py", line 151, in status
exists.update(odb.oids_exist(hashes, jobs=jobs, progress=pbar.callback))
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 423, in oids_exist
remote_size, remote_oids = self._estimate_remote_size(
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 305, in _estimate_remote_size
remote_oids = set(iter_with_pbar(oids))
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 295, in iter_with_pbar
for oid in oids:
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 262, in _oids_with_limit
for i, oid in enumerate(self._list_oids(prefixes=prefixes, jobs=jobs), start=1):
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 250, in _list_oids
for path in self._list_prefixes(prefixes=prefixes, jobs=jobs):
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/db.py", line 225, in _list_prefixes
yield from self.fs.find(paths, batch_size=jobs, prefix=prefix)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/dvc_objects/fs/base.py", line 816, in find
yield from self.fs.find(path, prefix=prefix_str)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/fsspec/asyn.py", line 118, in wrapper
return sync(self.loop, func, *args, **kwargs)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/s3fs/core.py", line 848, in _find
out = await self._lsdir(path, delimiter="", prefix=prefix, **kwargs)
File "/home/admins/miniconda3/envs/dvcenv/lib/python3.10/site-packages/s3fs/core.py", line 736, in _lsdir
raise translate_boto_error(e)
FileNotFoundError: The specified key does not exist.
2024-06-06 10:16:37,943 DEBUG: link type reflink is not available ([Errno 95] no more link types left to try out)
2024-06-06 10:16:37,943 DEBUG: Removing '/mnt/datadisk1/laien/.nJecAj_lCq6vSRQavUKoxw.tmp'
2024-06-06 10:16:37,943 DEBUG: Removing '/mnt/datadisk1/laien/.nJecAj_lCq6vSRQavUKoxw.tmp'
2024-06-06 10:16:37,943 DEBUG: Removing '/mnt/datadisk1/laien/.nJecAj_lCq6vSRQavUKoxw.tmp'
2024-06-06 10:16:37,943 DEBUG: Removing '/mnt/datadisk1/laien/ws-dvc/.dvc/cache/files/md5/.YY-I1sW7eTcDot5sfhN07Q.tmp'
2024-06-06 10:16:37,949 DEBUG: Version info for developers:
DVC version: 3.51.2 (pip)
-------------------------
Platform: Python 3.10.14 on Linux-4.15.0-213-generic-x86_64-with-glibc2.27
Subprojects:
dvc_data = 3.15.1
dvc_objects = 5.1.0
dvc_render = 1.0.2
dvc_task = 0.4.0
scmrepo = 3.3.5
Supports:
http (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.5, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.5.0, boto3 = 1.34.106)
Config:
Global: /home/admins/.config/dvc
System: /etc/xdg/dvc
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme1n1
Caches: local
Remotes: oss, s3
Workspace directory: ext4 on /dev/nvme1n1
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/4f9f0c30c341088cc84e9b8b312f7113
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
2024-06-06 10:16:37,952 DEBUG: Analytics is enabled.
2024-06-06 10:16:37,952 TRACE: Saving analytics report to /tmp/tmphllltv1h
2024-06-06 10:16:37,993 DEBUG: Trying to spawn ['daemon', 'analytics', '/tmp/tmphllltv1h', '-vv']
2024-06-06 10:16:38,001 DEBUG: Spawned ['daemon', 'analytics', '/tmp/tmphllltv1h', '-vv'] with pid 115727
2024-06-06 10:16:38,002 TRACE: Process 115714 exiting with 255
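To confirm which objects from a test workspace like this one actually reached the remote, the local cache can be compared against a listing of the bucket. This is only a sketch: it assumes the cache directory shown in the log (.dvc/cache/files/md5 with two-character subdirectories), and it assumes the remote keys have already been obtained as plain strings by some listing tool. The function names are hypothetical.

```python
from pathlib import Path


def local_oids(cache_dir):
    """Collect object IDs from a cache laid out as <cache_dir>/<2 chars>/<rest of hash>."""
    oids = set()
    for sub in Path(cache_dir).iterdir():
        # Temporary files sit directly in cache_dir and are skipped here.
        if sub.is_dir() and len(sub.name) == 2:
            for obj in sub.iterdir():
                if obj.is_file() and not obj.name.endswith(".tmp"):
                    oids.add(sub.name + obj.name)
    return oids


def missing_on_remote(cache_dir, remote_keys):
    """Return the sorted local object IDs that have no matching key in the remote listing."""
    remote = set()
    for key in remote_keys:
        parts = key.strip().rstrip("/").split("/")
        # Only the last two path components (<2 chars>/<rest>) identify the object.
        if len(parts) >= 2 and len(parts[-2]) == 2:
            remote.add(parts[-2] + parts[-1])
    return sorted(local_oids(cache_dir) - remote)
```

Calling missing_on_remote(".dvc/cache/files/md5", keys) with the listed keys would return the hashes that exist locally but not remotely. In the situation described here, that would be expected to include the hash of large.bin.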
I also have this problem. It seems that large files are uploaded to OSS in parts, but the push ends directly without waiting for the upload to return, so large files never make it to the remote. I hope this can be solved as soon as possible.
Could be related to fsspec/ossfs#129. Please file a bug upstream.
Collecting |2.00 [00:00, 250entry/s]
I think this is a bug in the dvc-oss plugin.
Has this problem been resolved?
Bug Report
push: large files are missing in remote storage
Description
After dvc push, large files (single file > 20 GB) are missing in the remote storage (Aliyun OSS), while the md5 objects of small files are pushed successfully and can be found in the OSS path.
Reproduce
Expected
I can find the large-chkpoint.pt md5 via the OSS dashboard.
Environment information
Output of dvc doctor:
Additional Information (if any):
Output of the push log
Is dvc-oss no longer maintained?