Bug #14012
closedarv-put does not check cached signatures when resuming
100%
Description
When resuming an upload, the cached manifest may contain block signatures made with a different user token.
This would generate a PermissionDenied
error when trying to create the collection after finishing the upload.
Updated by Lucas Di Pentima about 6 years ago
- Assigned To set to Lucas Di Pentima
- Target version changed from To Be Groomed to 2018-12-12 Sprint
Updated by Tom Clegg about 6 years ago
Suggestion: Do a HEAD request for the first cached block locator. If that doesn't work, consider the cached blocks invalid.
Updated by Lucas Di Pentima about 6 years ago
- Status changed from New to In Progress
Updated by Lucas Di Pentima about 6 years ago
Updates at 1e02903b90dbaf1f0e9fac222f65e3969b5f0352 - branch 14012-arvput-check-cache
Test run: https://ci.curoverse.com/job/developer-run-tests/995/
Replaced block expiration check on every file with a block signature check by making a HEAD request as suggested by Tom at early stages of the execution so it can fail fast.
When encountering an invalid manifest, provide options to the user: either start a new cache or skip it entirely.
Updated by Peter Amstutz about 6 years ago
loc = self._state['manifest'][m.start():m.end()]
Isn't this the same asm.group(0)
?
- There's a note about "do something different when we have a token refresh API". Well, we have that now (HEAD request and X-Keep-Locator response header), so it seems like the most robust solution would be to go through and refresh every token. This would both check that the all the tokens are valid (not just the first one), and reset the time-to-live timestamps. The main concern is probably performance, refreshing 1000s of tokens up front might take some time.
Updated by Lucas Di Pentima about 6 years ago
As discussed on chat, we should be checking the block signature with the lowest expiration timestamp.
Updated by Peter Amstutz about 6 years ago
(12/04/2018 05:21:39 PM) tom: hm. If we are capable of "re-upload the affected files if signatures are expired" then I think we should keep that capability.
(12/04/2018 05:23:29 PM) tom: Seems like maybe that should continue to work as it did before... maybe the initial check should be "find oldest still-valid signature by iterating and comparing the exp timestamps, do a HEAD req on it, and if it's invalid, warn & clear/don't use the cache"?
(12/04/2018 05:27:58 PM) tom: There's no point checking the already-expired signatures since a) we wouldn't have tried to use them anyway and b) of course they won't validate... I think the only thing we really care about here is the most obvious case where all signatures are invalid despite having exp times in the future -- we want to detect right away that something's wrong.
Updated by Lucas Di Pentima about 6 years ago
Updates at 6fc7bd062
Test run: https://ci.curoverse.com/job/developer-run-tests/1002/
- Before loading the cached manifest for resuming, search for the oldest non-expired block signature and do a HEAD request to check if it's valid.
- Restored the per-file expiration check when deciding if each file should be uploaded.
- Added tests for all cases.
Updated by Peter Amstutz about 6 years ago
- Could you add a test case to get some better coverage on _cache_manifest_valid(), I'm thinking a matrix of test cases with two blocks where the 1st block is expired/older/newer/fails head() and the 2nd block is expired/older/newer/fails head().
- Suggest tweaking error message:
"arv-put: Cache seems to contain invalid data: it may have expired",
" or been created with another arvados user's credentials.",
" Use --no-resume to start a new cache file.",
" Use --no-cache to skip current cache file usage."]))"arv-put: Resume cache contains invalid signature: it may have expired",
" or been created with another Arvados user's credentials.",
" Switch user or use one of the following options to restart upload:"
" --no-resume to start with a new resume cache.",
" --no-cache to disable resume cache."]))
Updated by Lucas Di Pentima about 6 years ago
Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/
- Updated the error message as requested
- Took some time to figure out the correct way of testing
_cached_manifest_valid()
method, as it's part of the entireArvPutUploadJob
initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.
Updated by Lucas Di Pentima about 6 years ago
- Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint
Updated by Peter Amstutz about 6 years ago
Lucas Di Pentima wrote:
Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/
- Updated the error message as requested
- Took some time to figure out the correct way of testing
_cached_manifest_valid()
method, as it's part of the entireArvPutUploadJob
initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.
This LGTM, thanks.
Updated by Lucas Di Pentima about 6 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|8ff3fd06e165a275f53884d1d20287b68c1b32bd.
Updated by Lucas Di Pentima almost 6 years ago
- Related to Bug #14900: [arv-put] when reusing a local cache, arv-put does not check if the blocks exist and blindly creates the collection added