Bug #13609
closed
[R SDK] Copying files using the R SDK results in 0 byte files
Added by Bryan Cosca over 6 years ago.
Updated over 6 years ago.
Estimated time:
(Total: 0.00 h)
Release relationship:
Auto
Description
Using ArvadosR package 0.0.4, This is the script I'm using to reproduce:
library(ArvadosR)
packageVersion("ArvadosR")
arv <- Arvados$new("api-token-redacted", "e51c5.arvadosapi.com", numRetries = 3)
previous_collection <- Collection$new(arv, "e51c5-4zz18-ko243dg1k5c6hid")
message("accessed previous collection")
file_to_be_transferred <- previous_collection$get("foo")
message("created ArvadosFile object to transfer")
newcoll <- arv$createCollection(list(name = "test-copy-foo", description = "foo"))
message("created new collection")
new_collection <- Collection$new(arv, newcoll$uuid)
message("created new collection object")
new_collection$add(file_to_be_transferred)
message("transferred file")
new_collection$uuid
It resulted in this collection: https://workbench.e51c5.arvadosapi.com/collections/e51c5-4zz18-19p0djjs1yt7xq0
$ ls -lah ~/keep-e51c5/e51c5-4zz18-19p0djjs1yt7xq0
total 1.5K
drwxrwxrwx 1 bcosc bcosc 0 Jun 11 19:40 .
dr-xr-xr-x 1 bcosc bcosc 0 May 17 18:25 ..
-rwxrwxrwx 1 bcosc bcosc 0 Jun 11 19:40 foo
- Description updated (diff)
So this seems to be a bug or oversight in the R SDK, it doesn't handle this case at all.
However, I think using the API this way is probably an error. It isn't clear what it should mean to "add" a file that already belongs to another collection. The intention here is was for it to be copied, but it would make just as much sense for it to be moved. WebDAV has MOVE and COPY operations, but they are not implemented for all cases (in particular MOVE between collections isn't supported by WebDAV).
Suggested fix:
- Using Collection$add() with an ArvadosFile object that already belongs to a collection should be an error
- Add a new Collection$copy() method which accepts a source ArvadosFile object and returns a new ArvadosFile representing the file that has been copied into the collection.
In the meantime, the workaround is something like:
library(ArvadosR)
packageVersion("ArvadosR")
arv <- Arvados$new("api-token-redacted", "e51c5.arvadosapi.com", numRetries = 3)
previous_collection <- Collection$new(arv, "e51c5-4zz18-ko243dg1k5c6hid")
message("accessed previous collection")
file_to_be_transferred <- previous_collection$get("foo")
message("created ArvadosFile object to transfer")
newcoll <- arv$createCollection(list(name = "test-copy-foo", description = "foo"))
message("created new collection")
new_collection <- Collection$new(arv, newcoll$uuid)
message("created new collection object")
dest_file <- new_collection$create(file_to_be_transferred$getName())
dest_file$write(file_to_be_transferred$read())
message("transferred file")
new_collection$uuid
- Target version set to Arvados Future Sprints
- Assigned To set to Fuad Muhic
- Target version changed from Arvados Future Sprints to 2018-07-18 Sprint
- Story points set to 0.5
- Subject changed from Copying files using the R SDK results in 0 byte files to [R SDK] Copying files using the R SDK results in 0 byte files
- Target version changed from 2018-07-18 Sprint to 2018-08-01 Sprint
Currently, R SDK doesn't support copy operation at all. Move operation is supported only for moves within same collection. Add operation behaves similarly to create operation. It will always create new empty file(s). So if you grab ArvadosFile from a collection and add it to same collection on difference place, file won't be moved. Instead, new empty file will be created with same name as file we were trying to add (That is probably a mistake, it should be a warning/error or this case should be detected and treated as move). If you want to move file, you need to use move method explicitly.
If we have a composite of ArvadosFiles/Subcollections and it doesn't belong to any collection, we can freely manipulate it (this way we can freely create any tree structure and add it to existing collection, and changes will be applied recursively). However if that composite tree structure belongs to a collection, any change to it will include REST call. This way we ensure that local and remote content are in sync. Also, this interaction is based on assumption that we are always working within same collection. If we want to support any operation between collections in R SDK (add, move or copy) we need WebDAV support for it, and we need to update the way nodes are managed inside collections in-memory tree structure.
Fuad Muhic wrote:
Currently, R SDK doesn't support copy operation at all. Move operation is supported only for moves within same collection. Add operation behaves similarly to create operation. It will always create new empty file(s). So if you grab ArvadosFile from a collection and add it to same collection on difference place, file won't be moved. Instead, new empty file will be created with same name as file we were trying to add (That is probably a mistake, it should be a warning/error or this case should be detected and treated as move). If you want to move file, you need to use move method explicitly.
If we have a composite of ArvadosFiles/Subcollections and it doesn't belong to any collection, we can freely manipulate it (this way we can freely create any tree structure and add it to existing collection, and changes will be applied recursively). However if that composite tree structure belongs to a collection, any change to it will include REST call. This way we ensure that local and remote content are in sync. Also, this interaction is based on assumption that we are always working within same collection. If we want to support any operation between collections in R SDK (add, move or copy) we need WebDAV support for it, and we need to update the way nodes are managed inside collections in-memory tree structure.
I think the confusion is because the R SDK API has two ways to create files:
- collection$create("newfile")
- collection$add(ArvadosFile$new("newfile"))
I assume "add()" was really intended to be a package-internal helper method for constructing the file listing. The problem is that it is a documented public method, but doesn't work (and isn't intended to work) for all the ways people want to use it.
I recommend:
- Remove the add() method from the examples / document that it is internal and users shouldn't use it. Fail if the parent collection of "newfile" isn't null.
- Add Collection$copy() / Subcollection$copy() method:
filecopy = subcollection$copy(source, relativeDestination = "")
Copy the "source" ArvadosFile to "relativeDestination" (a file or subcollection under the subcollection object it is called on) and return a new ArvadosFile for the copy. Should issue a WebDAV COPY request. Tom says COPY is probably supported, but you'll have to confirm that in testing.
- Status changed from New to In Progress
13609-r-sdk-copy @ c9bfa91e3c3ec90778795969fdd3787578a76ed4
test-run: https://ci.curoverse.com/job/developer-run-tests/827/
- implemented copy for for Collection, Subcollection and ArvadosFile (works only for files and folders inside same collection)
- small bug-fix for move in REST
- reworked create method in Collection
- add method in Collection is private now (removed from official documentation)
- updated documentation
WEBDAV move doesn't support moving content between different collection. Result of moving is "Not Found".
- Target version changed from 2018-08-01 Sprint to 2018-08-15 Sprint
- Status changed from In Progress to Resolved
Also available in: Atom
PDF