Feature #13126

open

[keep] Investigate using signed URLs to delegate access to cloud buckets

Added by Peter Amstutz almost 7 years ago. Updated almost 2 years ago.

Status:
New
Priority:
Normal
Assigned To:
-
Category:
-
Target version:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Story points:
-
Release:
Release relationship:
Auto

Description

Currently keepstore is the gateway to the backend object store, so all data has to flow through the keepstore nodes. This makes keepstore a bottleneck, which ops usually address by using more expensive keepstore nodes (to get more bandwidth) or by adding more keepstore nodes.

Some object storage systems, such as S3, have the concept of "signed URLs". A signed URL is similar to an Arvados signing token: it carries a signature, derived from a secret, that grants time-limited access to read a specific object.
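
To make the idea concrete, here is a minimal sketch of a time-limited signed URL built from an HMAC over the object path and an expiry time. It is a simplified stand-in for illustration only: it is not the signing algorithm S3 actually uses, and the names SECRET, sign_url and verify_url are hypothetical.

```python
import hashlib
import hmac
import time

# Hypothetical secret known only to the signer and the object store.
SECRET = b"example-secret"


def _signature(path, expires_at, secret):
    """HMAC-SHA256 over the object path and the expiry timestamp."""
    message = f"{path}\n{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path, expires_at, secret=SECRET):
    """Return the path with an expiry and a signature covering both."""
    sig = _signature(path, expires_at, secret)
    return f"{path}?expires={expires_at}&signature={sig}"


def verify_url(path, expires_at, signature, now=None, secret=SECRET):
    """Accept only an unexpired request whose signature matches."""
    now = time.time() if now is None else now
    expected = _signature(path, expires_at, secret)
    return hmac.compare_digest(expected, signature) and now < expires_at
```

Anyone holding the URL can read that one object until it expires, but cannot change the path or the expiry without invalidating the signature.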

Investigate the performance/scaling behavior of the following alternate flow:

  1. client requests a block from keepstore
  2. keepstore receives and validates the request as normal
  3. keepstore requests a signed URL for the block from the backend object store
  4. keepstore returns a 302 redirect to the signed URL to the client
  5. client receives redirect and makes a new request to fetch the block content from the signed URL
  6. client checks block md5sum and proceeds as normal, or tries another keepstore if there is an error
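
The client side of the steps above could look roughly like the following sketch. It assumes a caller-supplied http_get(url) function (hypothetical) that returns (status, headers, body) and does not follow redirects on its own, and it assumes the locator starts with the block's hex MD5 followed by "+size". It is not how the existing Arvados clients are implemented.

```python
import hashlib


class BlockFetchError(Exception):
    """Raised when no keepstore yields a block with the expected hash."""


def fetch_block(locator, keepstores, http_get):
    """Fetch a block, following keepstore's redirect to a signed URL.

    http_get(url) is a hypothetical helper returning (status, headers, body).
    """
    expected_md5 = locator.split("+")[0]
    for base in keepstores:
        try:
            # Steps 1-4: keepstore validates the request and may redirect.
            status, headers, body = http_get(f"{base}/{locator}")
            if status == 302:
                # Step 5: fetch the content directly from the object store.
                status, headers, body = http_get(headers["Location"])
            if status != 200:
                continue
            # Step 6: check the content against the hash in the locator.
            if hashlib.md5(body).hexdigest() == expected_md5:
                return body
        except (OSError, KeyError):
            # Network error or missing Location header: try the next one.
            continue
    raise BlockFetchError(f"could not fetch {locator}")
```

Because the MD5 check happens on the client, a corrupt or truncated response from the object store is handled the same way as a keepstore error: the client simply moves on to the next keepstore.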

The benefit of this approach is that the data transfer load is moved off keepstore, and compute nodes communicate directly with the object store. This should scale better. However, there is also a potential latency penalty from adding the extra "request signed URL and redirect" operation.

On AWS, signed URLs can also be used for PUT operations. AWS permits signed URLs that assert that only data that hashes to a specific MD5 will be accepted. However, keepstore needs to verify the block and return an Arvados signing token, and it is not clear how that would work with S3 signed URLs.
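
As a small illustration of the MD5 constraint: the HTTP Content-MD5 header carries the base64 encoding of the raw 16-byte digest, whereas a Keep locator starts with the hex form. Assuming the locator format "hexmd5+size", the value to bind into a signed PUT could be derived as in this sketch (content_md5_from_locator is a hypothetical helper name, not an existing Arvados function).

```python
import base64


def content_md5_from_locator(locator):
    """Convert the hex MD5 at the start of a locator to a Content-MD5 value."""
    hex_digest = locator.split("+")[0]
    # Content-MD5 is base64 over the raw digest bytes, not over the hex text.
    return base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
```

This only covers the upload constraint. It does not address the open question above of how the client would then obtain an Arvados signing token for the stored block.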

Reference:

https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/s3-example-presigned-urls.html

Actions #1

Updated by Peter Amstutz almost 7 years ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz almost 7 years ago

  • Subject changed from Investigate using signed URLs to delegate access to cloud buckets to [keep] Investigate using signed URLs to delegate access to cloud buckets
  • Description updated (diff)
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz almost 7 years ago

  • Description updated (diff)
Actions #4

Updated by Tom Morris almost 7 years ago

Rather than starting with an answer, I'd like to see us start with a question or problem statement. In my mind the goal is to remove all bottlenecks in accessing the storage layer. All cloud vendors provide highly scalable storage fabrics with reliable transport, integrity checksums, and permission mechanisms. To the extent that we can, we should leverage those capabilities rather than duplicating them.

Actions #5

Updated by Peter Amstutz over 3 years ago

  • Target version deleted (To Be Groomed)
Actions #6

Updated by Lucas Di Pentima almost 2 years ago

  • Release set to 60