Project

General

Profile

Actions

Bug #14012

closed

arv-put does not check cached signatures when resuming

Added by Lucas Di Pentima over 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
12/03/2018
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

When resuming an upload, the cached manifest may contain block signatures made with a different user token.
This would generate a PermissionDenied error when trying to create the collection after finishing the upload.


Subtasks 1 (0 open1 closed)

Task #14542: Review 14012-arvput-check-cacheResolvedPeter Amstutz12/03/2018

Actions

Related issues 1 (0 open1 closed)

Related to Arvados - Bug #14900: [arv-put] when reusing a local cache, arv-put does not check if the blocks exist and blindly creates the collectionClosedLucas Di Pentima03/27/2019

Actions
Actions #1

Updated by Lucas Di Pentima about 6 years ago

  • Assigned To set to Lucas Di Pentima
  • Target version changed from To Be Groomed to 2018-12-12 Sprint
Actions #2

Updated by Tom Clegg about 6 years ago

Suggestion: Do a HEAD request for the first cached block locator. If that doesn't work, consider the cached blocks invalid.

Actions #3

Updated by Lucas Di Pentima about 6 years ago

  • Status changed from New to In Progress
Actions #4

Updated by Lucas Di Pentima about 6 years ago

Updates at 1e02903b90dbaf1f0e9fac222f65e3969b5f0352 - branch 14012-arvput-check-cache
Test run: https://ci.curoverse.com/job/developer-run-tests/995/

Replaced block expiration check on every file with a block signature check by making a HEAD request as suggested by Tom at early stages of the execution so it can fail fast.
When encountering an invalid manifest, provide options to the user: either start a new cache or skip it entirely.

Actions #5

Updated by Peter Amstutz about 6 years ago

  • loc = self._state['manifest'][m.start():m.end()]
    

    Isn't this the same as m.group(0) ?
  • There's a note about "do something different when we have a token refresh API". Well, we have that now (HEAD request and X-Keep-Locator response header), so it seems like the most robust solution would be to go through and refresh every token. This would both check that the all the tokens are valid (not just the first one), and reset the time-to-live timestamps. The main concern is probably performance, refreshing 1000s of tokens up front might take some time.
Actions #6

Updated by Lucas Di Pentima about 6 years ago

As discussed on chat, we should be checking the block signature with the lowest expiration timestamp.

Actions #7

Updated by Peter Amstutz about 6 years ago

(12/04/2018 05:21:39 PM) tom: hm. If we are capable of "re-upload the affected files if signatures are expired" then I think we should keep that capability.
(12/04/2018 05:23:29 PM) tom: Seems like maybe that should continue to work as it did before... maybe the initial check should be "find oldest still-valid signature by iterating and comparing the exp timestamps, do a HEAD req on it, and if it's invalid, warn & clear/don't use the cache"?
(12/04/2018 05:27:58 PM) tom: There's no point checking the already-expired signatures since a) we wouldn't have tried to use them anyway and b) of course they won't validate... I think the only thing we really care about here is the most obvious case where all signatures are invalid despite having exp times in the future -- we want to detect right away that something's wrong.

Actions #8

Updated by Lucas Di Pentima about 6 years ago

Updates at 6fc7bd062
Test run: https://ci.curoverse.com/job/developer-run-tests/1002/

  • Before loading the cached manifest for resuming, search for the oldest non-expired block signature and do a HEAD request to check if it's valid.
  • Restored the per-file expiration check when deciding if each file should be uploaded.
  • Added tests for all cases.
Actions #9

Updated by Peter Amstutz about 6 years ago

  • Could you add a test case to get some better coverage on _cache_manifest_valid(), I'm thinking a matrix of test cases with two blocks where the 1st block is expired/older/newer/fails head() and the 2nd block is expired/older/newer/fails head().
  • Suggest tweaking error message:

    "arv-put: Cache seems to contain invalid data: it may have expired",
    " or been created with another arvados user's credentials.",
    " Use --no-resume to start a new cache file.",
    " Use --no-cache to skip current cache file usage."]))

    "arv-put: Resume cache contains invalid signature: it may have expired",
    " or been created with another Arvados user's credentials.",
    " Switch user or use one of the following options to restart upload:"
    " --no-resume to start with a new resume cache.",
    " --no-cache to disable resume cache."]))

Actions #10

Updated by Lucas Di Pentima about 6 years ago

Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/

  • Updated the error message as requested
  • Took some time to figure out the correct way of testing _cached_manifest_valid() method, as it's part of the entire ArvPutUploadJob initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.
Actions #11

Updated by Lucas Di Pentima about 6 years ago

  • Target version changed from 2018-12-12 Sprint to 2018-12-21 Sprint
Actions #12

Updated by Peter Amstutz about 6 years ago

Lucas Di Pentima wrote:

Updates at f1dfa6a02
Test run: https://ci.curoverse.com/job/developer-run-tests/1008/

  • Updated the error message as requested
  • Took some time to figure out the correct way of testing _cached_manifest_valid() method, as it's part of the entire ArvPutUploadJob initialization process: I created a mock subclass that does just the minimum to be able to call the method and test its return value.

This LGTM, thanks.

Actions #13

Updated by Lucas Di Pentima about 6 years ago

  • Status changed from In Progress to Resolved
Actions #14

Updated by Tom Morris almost 6 years ago

  • Release set to 15
Actions #15

Updated by Lucas Di Pentima almost 6 years ago

  • Related to Bug #14900: [arv-put] when reusing a local cache, arv-put does not check if the blocks exist and blindly creates the collection added
Actions

Also available in: Atom PDF