I don't think we can rely on arv-put in createMultiStreamBlockCollection. It will only write one block per stream unless we write more than 64 MiB of data, so we'd have to write ~60 GiB (100 streams x 10 blocks x 64 MiB) in order to comply with the story spec. We're only interested in the number of blocks and streams, not bytes and files.
How about something like this:
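var manifest string
var locs []string
for s := 0; s < 100; s++ {
	manifest += fmt.Sprintf("./stream%d ", s)
	for b := 0; b < 10; b++ {
		// Unique content per block, so every block gets a distinct locator.
		loc, _, err := kc.PutB([]byte(fmt.Sprintf("s %d b %d", s, b)))
		if err != nil {
			t.Fatal(err) // assumes a *testing.T named t is in scope
		}
		// Keep only the part of the locator before the "+A" permission hint.
		locs = append(locs, strings.Split(loc, "+A")[0])
		manifest += loc + " "
	}
	manifest += "0:1:dummyfile.txt\n"
}
coll := make(map[string]string)
arvadosclient.Create("collections", nil, map[string]string{"manifest_text":manifest}, &coll)
return coll["uuid"], locs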
The difference between TestPutAndGetCollectionsWithMultipleStreamsAndBlocks and TestPutAndGetCollectionsWithMultipleBlocks seems to be that one has multiple streams and the other has just one; both have multiple files, but only a single block per stream. There are also some unnecessary complications that seem to negate the value of the test: if the first block in the manifest is the only one that's backdated, then for all we know the remaining blocks are being kept alive only by their age, not by their presence in the manifest.
I think we only need one test here:
- Create one collection with 100 streams x 10 blocks each = 1000 different block IDs (but the total data should be small)
- Ensure list of locs returned by collection-creator func has 1000 different entries (sort and then test all adjacent pairs?)
- Write one additional "stray" data block (unreferenced by any collection)
- Backdate all of the blocks
- Run garbage collection
- Ensure the "stray" block is missing (i.e., we didn't just skip GC entirely for some reason), and all others ("locs") are still present.
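The "sort and then test all adjacent pairs" check from the second step could look roughly like this. It is only a sketch; firstDuplicate is a hypothetical helper name, and the locators in main are made-up placeholders.

```go
package main

import (
	"fmt"
	"sort"
)

// firstDuplicate sorts a copy of locs and compares adjacent entries.
// It returns a repeated value and true, or "" and false if all entries
// are distinct. The caller's slice is left in its original order.
func firstDuplicate(locs []string) (string, bool) {
	sorted := append([]string(nil), locs...)
	sort.Strings(sorted)
	for i := 1; i < len(sorted); i++ {
		if sorted[i] == sorted[i-1] {
			return sorted[i], true
		}
	}
	return "", false
}

func main() {
	// Placeholder locators for illustration only.
	locs := []string{"c+3", "a+1", "b+2", "a+1"}
	if dup, found := firstDuplicate(locs); found {
		fmt.Println("duplicate locator:", dup)
	}
}
```

In the test, a found duplicate (or len(locs) != 1000) would be a failure before any backdating or GC happens, since it would mean the collection references fewer distinct blocks than intended.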