Bug #6605
closed[SDKs] arv-copy can copy git repositories hosted by arv-git-httpd with no SSH key installed
100%
Description
Functional requirements:
- When reading Git repositories from the source cluster, it should try to read it through all the channels specified in the repository object's
clone_urls
field, in the order they're given. - When pushing Git repositories to the destination cluster, it should try pushing by both SSH and HTTPS, unless it has good reason to believe that one of these mechanisms won't work (see below).
- If arv-copy tries one copying method that fails, and then the second copying method succeeds, the user should only see errors from the first method if debug output is enabled.
Nonfunctional requirements to be addressed:
- Is there anything arv-copy can do to detect whether or not arv-git-httpd is available on the destination cluster? If so, should it do that detection, or simply assume that arv-git-httpd is always available and cope with failure?
- One possibility: get a list of one repository, any repository, and check its clone_urls. Assume that the same push mechanisms are available for all repositories on the cluster (currently a good assumption). If clone_urls aren't available, fall back on whatever push_url advertises.
- Should arv-copy use heuristics to determine which copying method is preferable or more likely to work before trying one or the other? (e.g., prefer arv-git-httpd if the user has no AuthorizedKeys?)
- If both channels seem equally likely (or equally unlikely) to work, which should arv-copy try first?
- How should errors and warnings be handled so they can be shown to the user in failure cases, but suppressed if another copy method succeeds?
Updated by Peter Amstutz about 9 years ago
- Status changed from New to In Progress
Updated by Peter Amstutz about 9 years ago
Suggest the following behavior:
Prefer HTTPS if available by looking at clone_urls
. Rationale: HTTPS is more likely to work than ssh because it only requires the API token and we already know it works. It is harder to reliable determine if SSH is going to work, need to probing the local ssh keys and the authorized_keys table).
Updated by Peter Amstutz about 9 years ago
- Status changed from In Progress to New
Updated by Peter Amstutz about 9 years ago
Partial fix on branch 6605-arv-copy-http (uses http in preference to ssh, but doesn't fall back to trying ssh if http fails).
Updated by Peter Amstutz almost 9 years ago
- Prefer HTTPS if available, try ssh if not available or HTTPS fails.
- Tell git not to ask for password
Updated by Brett Smith almost 9 years ago
- Target version changed from Arvados Future Sprints to 2016-02-17 Sprint
Updated by Brett Smith almost 9 years ago
- Assigned To set to Peter Amstutz
Changes in #6605-7 need to be done before this is reviewed. Other parts have been split into separate stories.
Updated by Brett Smith almost 9 years ago
- Target version changed from 2016-02-17 Sprint to 2016-03-02 sprint
Updated by Peter Amstutz almost 9 years ago
6605: Check to see if each git URL works by using git ls-remote.
Preferentially use https.
Set GIT_ASKPASS=/bin/false to suppress asking for credentials.
Updated by Tom Clegg almost 9 years ago
6605-arv-copy-http 5fb73e1
I'm not keen on editing the user's global git config here. This edit seems pretty safe, but it's still rather surprising as a silent side effect of running arv-copy. Could we easily detect whether the credential helper is already in place, and if not, set it up and say something on stderr? Or perhaps just use "-c credential.blah.helper=blah" whenever invoking git, and avoid editing config at all.
If we stick with editing configs, since we're going to try all the http_urls if >1 are given, how about
- if http_url:
- setup_git_http(http_url[0])
+ for url in http_url:
+ setup_git_http(url)
...or even do it in the existing "for url in" loop, so we only do it for the urls that we actually try using.
If you feel like adding scope (but not if you don't) we could detect whether the repository is https://github.com/foo/bar
and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).
Updated by Peter Amstutz almost 9 years ago
Tom Clegg wrote:
6605-arv-copy-http 5fb73e1
I'm not keen on editing the user's global git config here. This edit seems pretty safe, but it's still rather surprising as a silent side effect of running arv-copy. Could we easily detect whether the credential helper is already in place, and if not, set it up and say something on stderr? Or perhaps just use "-c credential.blah.helper=blah" whenever invoking git, and avoid editing config at all.
Thanks, "-c" is a much better solution, I didn't realize it was an option. Done.
If you feel like adding scope (but not if you don't) we could detect whether the repository is
https://github.com/foo/bar
and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).
Unfortunately it's not that easy, for this to work it also needs to do the right thing when setting the repository and script_version fields on the destination side, and that's when I decided the hole was looking a little bit dark and ratty.
Updated by Tom Clegg almost 9 years ago
Peter Amstutz wrote:
If you feel like adding scope (but not if you don't) we could detect whether the repository is
https://github.com/foo/bar
and skip all the copying stuff (it looks like currently it'll just crash "not found" in that case).Unfortunately it's not that easy, for this to work it also needs to do the right thing when setting the repository and script_version fields on the destination side, and that's when I decided the hole was looking a little bit dark and ratty.
Ah. I thought "the right thing" would be nothing / leave them alone... but anyway, fine with punting.
I don't like conflating the existing meaning of ARVADOS_API_HOST_INSECURE
("accept any cert at https://api
") with this new usage ("allow fetching/pushing repos using http://
"). Perhaps an --allow-git-http
flag would be more clear, and possible for a user to discover in some way other than trial & error?
Updated by Peter Amstutz almost 9 years ago
Tom Clegg wrote:
I don't like conflating the existing meaning of
ARVADOS_API_HOST_INSECURE
("accept any cert athttps://api
") with this new usage ("allow fetching/pushing repos usinghttp://
"). Perhaps an--allow-git-http
flag would be more clear, and possible for a user to discover in some way other than trial & error?
Added --allow-git-http-src and --allow-git-http-dst instead of looking at ARVADOS_API_HOST_INSECURE. Now at e71b6d2
Updated by Peter Amstutz almost 9 years ago
- Status changed from New to Resolved
Applied in changeset arvados|commit:d77bf0c67422a259afacc17660698729328a1ed3.