Bug #8499

closed

[SDKs/Crunch] Providing 'arvados_sdk_version' in pipeline templates breaks reproducibility

Added by Abram Connelly about 10 years ago. Updated about 7 years ago.

Status: Rejected
Priority: Normal
Assigned To: -
Category: -
Target version: -
Story points: -

Description

Original bug report

As Ward explained it to me, specifying an 'arvados_sdk_version' in the pipeline template causes Python dependencies to be installed at run time. In the case of pipeline su92l-d1hrv-p15qumg5thckcgy, the run fails (because the newer versions of the Arvados SDK libraries installed are incompatible with the older Arvados SDK version?). As I understand it, this also means that pipeline instances that previously ran successfully will fail when re-run if they hit the same dependency issue, so pipelines with a pinned Arvados SDK version are not reproducible in general.

From what I understand, if 'arvados_sdk_version' is not specified, no dependencies are installed at run time, the job uses whatever SDK is already in the provided Docker image, and reproducibility for the job is maintained.

Proposed fix

Make the arvados_sdk_version install process more reproducible by installing a known and tested set of dependencies.

  • Add the usual Python project requirements.txt to the Python SDK. It would list every requirement of the Python SDK, and its specific version. The workflow around it would be much the same as for Gemfile.lock: periodically developers would need to install and test new versions of these dependencies, and add those versions to requirements.txt if they passed testing.
  • Update run-tests.sh to pip install -r requirements.txt before running the Python SDK tests.
  • Update the arvados_sdk_version install process in crunch-job: after extracting, but before installing, the Python SDK, check if requirements.txt exists, and run pip install -r requirements.txt when it does.
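
The check described in the last bullet could look roughly like the sketch below. It is written in Python for illustration only and is not taken from crunch-job; the function names `requirements_install_command` and `install_pinned_requirements` are hypothetical, and it assumes the SDK has already been extracted to a local directory.

```python
import subprocess
import sys
from pathlib import Path


def requirements_install_command(sdk_dir, python=sys.executable):
    """Return the pip command for the SDK's pinned requirements, or None.

    sdk_dir is assumed to be the directory the Python SDK was extracted to
    (hypothetical layout: requirements.txt sits at its top level).
    """
    requirements = Path(sdk_dir) / "requirements.txt"
    if not requirements.is_file():
        # Older SDK versions would not ship the file; skip the pinned install.
        return None
    return [python, "-m", "pip", "install", "-r", str(requirements)]


def install_pinned_requirements(sdk_dir):
    """Install pinned dependencies if requirements.txt exists.

    Returns True if pip was run, False if there was nothing to install.
    """
    command = requirements_install_command(sdk_dir)
    if command is None:
        return False
    subprocess.check_call(command)  # raises CalledProcessError on failure
    return True
```

Building the command separately from running it keeps the "does requirements.txt exist?" decision easy to test without touching the network. The caller would run `install_pinned_requirements` after extraction and then proceed with the normal SDK install.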

This still does not provide perfect reproducibility, since PyPI uploads can be changed in extreme circumstances. But those problems would rarely affect people except those working on the bleeding edge of the Python SDK. The proposal here is a major improvement for relatively little code (although it does add some process to working on the Python SDK generally).

Actions #1

Updated by Brett Smith about 10 years ago

  • Subject changed from Providing 'arvados_sdk_version' in pipeline templates breaks reproducibility to [SDKs/Crunch] Providing 'arvados_sdk_version' in pipeline templates breaks reproducibility
  • Description updated

Actions #2

Updated by Brett Smith about 10 years ago

  • Description updated

Actions #3

Updated by Tom Morris about 7 years ago

  • Status changed from New to Rejected

Pipeline templates are no longer a thing.
