Bug #5298

[SDKs] Should CollectionReader.all_streams() iterate lines in the manifest, or "logical" streams?

Added by Brett Smith almost 10 years ago. Updated about 5 years ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: SDKs
Target version: -
Start date: 02/13/2015
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

A user just encountered an issue where their Crunch script behaved surprisingly because an input manifest defined multiple files in the same stream across several lines, one file per line, like this:

. [locator] 0:3:foo
. [locator] 0:6:bar
…

Currently, CollectionReader.all_streams() iterates over the physical lines of the manifest. Calling this method in a for loop, the user expected to find all of the files listed above in a single iteration. Instead, each line was presented as a separate stream, so the files were split across several iterations.

I believe our general expectation is that the SDK handles all the work of presenting manifests logically, so I think the method should be changed to iterate over "logical" streams rather than physical lines in the manifest. As long as the final list of files is correct, I believe this change would be backward compatible: writing a stream on one line or on several is functionally equivalent, so the SDK client should not be able to tell the difference either.
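To make the distinction concrete, here is a minimal stdlib-only sketch of the two iteration styles. It is not the SDK's code: `physical_streams`, `logical_streams`, and the block hashes are hypothetical placeholders. It assumes each manifest line is a stream name, then block locators, then `position:size:filename` tokens, and it does not decode escaped characters in names.

```python
import re

# A file token in a manifest line has the form "position:size:filename".
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")

# Hypothetical two-line manifest; the block hashes are placeholders.
MANIFEST = (
    ". 0123456789abcdef0123456789abcdef+3 0:3:foo\n"
    ". fedcba9876543210fedcba9876543210+6 0:6:bar\n"
)


def physical_streams(manifest_text):
    """Yield (stream_name, filenames) once per non-empty manifest line."""
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        name, *tokens = line.split(" ")
        # Block locators do not match FILE_TOKEN, so only file names are kept.
        files = [m.group(3) for m in map(FILE_TOKEN.match, tokens) if m]
        yield name, files


def logical_streams(manifest_text):
    """Return one (stream_name, filenames) pair per distinct stream name."""
    merged = {}  # dicts keep insertion order, so streams stay in manifest order
    for name, files in physical_streams(manifest_text):
        merged.setdefault(name, []).extend(files)
    return list(merged.items())


print(list(physical_streams(MANIFEST)))  # [('.', ['foo']), ('.', ['bar'])]
print(logical_streams(MANIFEST))         # [('.', ['foo', 'bar'])]
```

Iterating line by line yields the same stream name twice, which is what surprised the user; grouping by stream name yields both files in one iteration.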


Subtasks: 1 (0 open, 1 closed)

Task #5449: Update examples to use new Python Collection SDK and add deprecation notes to old APIs (Resolved, 02/13/2015)


Related issues: 1 (0 open, 1 closed)

Related to Arvados - Story #3706: [SDKs] Remove fallback-to-keep warning from python SDK if block hash has a permission signature (Resolved, Tom Clegg, 07/31/2014)

#1

Updated by Brett Smith almost 10 years ago

  • Description updated

#2

Updated by Tom Clegg almost 10 years ago

It's intended to preserve the streams as given in the manifest. That's certainly not the desired behavior in many cases, but this API is deprecated, so it's best if the code in question can be updated to either:
  1. use the new Collection API, or
  2. call normalize() on the collection before calling all_streams().
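For option 2, the effect that matters here is that lines sharing a stream name end up on a single line. Below is a hypothetical stdlib-only sketch of that merge; `merge_stream_lines` is an invented name, not the SDK's normalize(), which may do more (for example, reordering streams and files). The subtle step is that file positions are offsets into the concatenated blocks of their own line, so files from a later line must be shifted by the size of the blocks already collected for that stream. It assumes locators of the form `hash+size[+hints]` and uses placeholder hashes.

```python
import re

# A file token has the form "position:size:filename".
FILE_TOKEN = re.compile(r"^(\d+):(\d+):(.+)$")

# Hypothetical manifest with placeholder hashes: one stream split over two lines.
MANIFEST = (
    ". 0123456789abcdef0123456789abcdef+3 0:3:foo\n"
    ". fedcba9876543210fedcba9876543210+6 0:6:bar\n"
)


def block_size(locator):
    """Return the size field of a locator of the form 'hash+size[+hints]'."""
    return int(locator.split("+")[1])


def merge_stream_lines(manifest_text):
    """Rewrite a manifest so that each stream name appears on exactly one line."""
    locators = {}  # stream name -> block locators, in manifest order
    files = {}     # stream name -> file tokens with rebased positions
    totals = {}    # stream name -> bytes in the blocks collected so far
    for line in manifest_text.splitlines():
        if not line.strip():
            continue
        name, *tokens = line.split(" ")
        # Positions on this line count from the start of this line's blocks,
        # which will follow every block already collected for the stream.
        base = totals.get(name, 0)
        for token in tokens:
            m = FILE_TOKEN.match(token)
            if m:
                pos, size, fname = int(m.group(1)), m.group(2), m.group(3)
                files.setdefault(name, []).append(f"{pos + base}:{size}:{fname}")
            else:
                locators.setdefault(name, []).append(token)
                totals[name] = totals.get(name, 0) + block_size(token)
    merged = [" ".join([name] + locators[name] + files.get(name, []))
              for name in locators]
    return "\n".join(merged) + "\n"


print(merge_stream_lines(MANIFEST), end="")
# . 0123456789abcdef0123456789abcdef+3 fedcba9876543210fedcba9876543210+6 0:3:foo 3:6:bar
```

After the merge, `bar` starts at offset 3 because the 3-byte block for `foo` now precedes its block on the same line.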

#3

Updated by Peter Amstutz almost 10 years ago

Will update the documentation to describe the existing behavior and note that the method is deprecated (#5449).

#4

Updated by Peter Amstutz almost 10 years ago

  • Target version changed from Bug Triage to 2015-04-01 sprint

#5

Updated by Tom Clegg almost 10 years ago

  • Status changed from New to Feedback

#6

Updated by Peter Amstutz almost 10 years ago

  • Assigned To set to Peter Amstutz

#7

Updated by Peter Amstutz almost 10 years ago

  • Target version changed from 2015-04-01 sprint to 2015-04-29 sprint

#8

Updated by Peter Amstutz almost 10 years ago

  • Story points set to 0.5

#9

Updated by Tom Clegg almost 10 years ago

  • Target version deleted (2015-04-29 sprint)

#10

Updated by Peter Amstutz over 8 years ago

  • Assigned To deleted (Peter Amstutz)

#11

Updated by Peter Amstutz about 5 years ago

  • Status changed from Feedback to Closed