Bug #7181
closed[Crunch] crunch-job installer script should handle egg_info errors caused because Git is not installed
100%
Description
- If git is not installed on a compute node, the "egg_info" part of crunch-job's "install" script fails; fails to detect that the problem was that "git" was missing, and therefore fails to fall back to writing its own setup.cfg; and fails to provide any diagnostic info.
Current code:
open(my $egg_info_pipe, "-|",
"python2.7 \Q$python_dir/setup.py\E --quiet egg_info 2>&1 >/dev/null");
my @egg_info_errors = <$egg_info_pipe>;
close($egg_info_pipe);
if ($?) {
if (@egg_info_errors and ($egg_info_errors[-1] =~ /\bgit\b/)) {
Fix:
- Remove --quiet flag
- Extend the $egg_info_errors condition at the bottom of this block to catch the case where Git is not installed, and go into the following block to write a build tag to setup.cfg. (To figure out what this stderr looks like, take a .tar.gz of the Python SDK, and run
python setup.py egg_info
from it on a host that does not have Git installed.)
Updated by Brett Smith over 9 years ago
I'm always for improving the logging, but the main point of this code is to make sure that the SDK can be installed even if it has been extracted directly from a Git repository. Git should not be necessary to do this. I think this code should also check for this condition (either with can_run("git")
or by introspecting the error message), and running its current "set a build tag" branch when it exists.
Updated by Tom Clegg over 9 years ago
- Git is required. We never wanted git to be required, but it is, because the "if there is no git..." code path is impossible to reach, because
>/dev/null
ensures$egg_info_errors[-1]
is alwaysundef
, which never matches/\bgit\b/
. I expect removing>/dev/null
will make the/\bgit\b/
regexp work, and thereby make it possible to install the SDK even though git is not installed on the compute node. - If the
setup.py
command fails and the "if there is no git..." code path doesn't get invoked, the (presumably) non-git-related error is hard to diagnose. Even when we delete>/dev/null
and stop throwing away the setup.py output completely, the--quiet
flag will remove some information that is likely to be helpful. The--quiet
flag would seem reasonable if it avoided clutter in normal logs, but it doesn't: if setup.py succeeds, the entire output is suppressed, quiet or not. So the question is "when setup.py fails in a way that causes the whole job to fail, do we want to see shorter setup.py logs, or longer ones?" and if the answer is "longer" then we should remove the--quiet
flag.
The can_run("git")
approach sounds even better than correcting the existing introspection -- assuming "can't run git" is the only git error we need to work around this way.
Updated by Brett Smith over 9 years ago
Tom Clegg wrote:
To clarify: The apparent intent of the existing code ("use git if it's installed, but don't require git") is correct.
…
Thecan_run("git")
approach sounds even better than correcting the existing introspection -- assuming "can't run git" is the only git error we need to work around this way.
Sorry, that's not quite what the current code is meant to do.
The Python SDK's setup.py says, "If a build tag is not set, generate one from Git." Even when Git is installed on the compute node (as it is on our clusters), this fails because the Python SDK was package with git archive
, and no .git
directory exists. The intent of this code is, "If setup.py fails because it couldn't generate a build tag from Git, set one ourselves."
Right now it properly handles the case where setup.py fails because .git
is missing. I'm proposing that it be extended to handle the case where setup.py fails because Git is not installed.
The changes to make setup.py less quiet may help with that. It may be easier to introspect the error after they're done.
(Before you ask: right now, I'm not aware of any way that we could get to the start of this code and have a build tag already set. I was trying to be future-proof…)
Updated by Brett Smith over 9 years ago
Also,
- Git is required. We never wanted git to be required, but it is, because the "if there is no git..." code path is impossible to reach, because
>/dev/null
ensures$egg_info_errors[-1]
is alwaysundef
, which never matches/\bgit\b/
.
Sorry, this isn't quite right. The full chain of redirects is 2>&1 >/dev/null
. This means @egg_info_errors
has whatever the process writes to stderr. The match works for the ".git
does not exist" case described above.
Updated by Brett Smith over 9 years ago
- Target version changed from Arvados Future Sprints to 2015-09-30 sprint
Updated by Brett Smith over 9 years ago
- Assigned To changed from Brett Smith to Peter Amstutz
Updated by Brett Smith over 9 years ago
- Subject changed from [Crunch] Fix error reporting for setup.py for arvados_sdk_version to [Crunch] crunch-job installer script should handle egg_info errors caused because Git is not installed
Updated by Peter Amstutz over 9 years ago
python2.7 $python_dir/setup.py egg_info 2>&1 >/dev/null
I had to spend some quality time with the bash docs to understand the shell redirection going on here, so for the record:
2>&1
means "modify file descriptor 2 so that it writes to the same pipe as file description 1" >/dev/null
means "modify file descriptor 1 to write to /dev/null"
So the effect is that writes to FD 2 (stderr) go to the original stdout, and writes to FD 1 (stdout) go to /dev/null
.
These are processed in sequence, so >/dev/null 2>&1
would result in all output going to /dev/null
.
Updated by Peter Amstutz over 9 years ago
7181-crunch-missing-git is ready for review.
Testing procedure:
- Make a deliberately broken "x-git-broken" branch of Arvados which has "xgit" instead of "git" in gittaggers (because selectively breaking git in the install script is tricky because other parts of the install rely on git)
- Make a pipeline with arvados_sdk_version set to "x-git-broken"
- Check that crunch-job is able to install the arvados sdk for both "master" and "x-git-broken" (without the fix: "x-git-broken" will fail)
Updated by Nico César over 9 years ago
I look at the code in fc5005a..5e8edef.
The testing procedure is a good one, but is also incredible painful. My opinion is to merge this.
Updated by Peter Amstutz over 9 years ago
- Status changed from New to Resolved
Applied in changeset arvados|commit:99349abd0ee7347b5bac3d4a9638853c6d4b97ab.