Bug #12298 (closed)
[Crunch2] Invalid container output_path causes infinite loop of futile dispatch attempts
Start date: 09/20/2017
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -
Description
Submitting a container request with no mounts and {"output_path":"/out"} results in the container being attempted repeatedly with the same failure:
2017-09-20T20:12:56.920079Z Container 9tee4-dz642-6545kgubg82ssq7 was taken from the queue by a dispatch process
2017-09-20T20:12:59.919298925Z Executing container '9tee4-dz642-6545kgubg82ssq7'
2017-09-20T20:12:59.919400984Z Executing on host 'compute0.9tee4.arvadosapi.com'
2017-09-20T20:12:59.978462723Z Fetching Docker image from collection '9e0a4880d0cde36f8dd691345399a1bf+335'
2017-09-20T20:13:00.072853031Z Using Docker image id 'dada2262dd3bc92f615fea9503116516481ef546c5bcf2014901e686d8049b0b'
2017-09-20T20:13:00.076115858Z Docker image is available
2017-09-20T20:13:00.076335565Z While setting up mounts: Output path does not correspond to a writable mount point
2017-09-20T20:13:00.076354805Z Cancelled
2017-09-20T20:13:18.922196414Z arvados API server error: Log cannot be modified in this state (nil, "4fbfdb5c0f48fe803e8ca641e7477e52+60") (422: 422 Unprocessable Entity) returned by 9tee4.arvadosapi.com
2017-09-20T20:13:19.233763Z Container 9tee4-dz642-6545kgubg82ssq7 was returned to the queue
...
Container record:
{
  "uuid": "9tee4-dz642-6545kgubg82ssq7",
  "owner_uuid": "9tee4-tpzed-000000000000000",
  "created_at": "2017-09-20 20:02:51 UTC",
  "modified_at": "2017-09-20 20:15:19 UTC",
  "modified_by_client_uuid": "9tee4-ozdt8-wt0x6s6j9yhycfh",
  "modified_by_user_uuid": "9tee4-tpzed-000000000000000",
  "state": "Queued",
  "started_at": null,
  "finished_at": null,
  "log": null,
  "environment": {},
  "cwd": ".",
  "command": [
    "foobar"
  ],
  "output_path": "/out",
  "mounts": {},
  "runtime_constraints": {
    "keep_cache_ram": 268435456,
    "ram": 1000000,
    "vcpus": 1
  },
  "output": null,
  "container_image": "9e0a4880d0cde36f8dd691345399a1bf+335",
  "progress": null,
  "priority": 1,
  "updated_at": null,
  "exit_code": null,
  "auth_uuid": null,
  "locked_by_uuid": null,
  "scheduling_parameters": {}
}
Proposed fix
Ideally, add a container request validation so this mistake prevents the non-runnable container from being created in the first place.
Either way, fix the "Log cannot be modified in this state (nil, "7951799e5e3e3c02fee1567e718044e7+60")" error so crunch-run can cancel the container instead of retrying it ad nauseam.
Perhaps in source:services/api/app/models/container.rb
@@ -389,7 +389,7 @@ class Container < ArvadosModel
       when Running
         permitted.push :finished_at, :output, :log
       when Queued, Locked
-        permitted.push :finished_at
+        permitted.push :finished_at, :log
       end
     else
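
For the container request validation suggested above, a minimal stand-alone sketch might look like the following. It is not the actual Arvados model code: the helper name output_path_errors is hypothetical, and the sketch assumes that mounts are keyed by path and that a mount counts as writable when its "kind" is "tmp", or when it is a "collection" with "writable" set to true.

```ruby
# Hypothetical helper, not part of the Arvados API server.
# Returns a list of error messages; an empty list means output_path is usable.
# Assumption: mounts is a Hash keyed by mount path, and a mount is writable
# when kind == "tmp" or when it is a collection with "writable" => true.
def output_path_errors(mounts, output_path)
  mount = mounts[output_path]
  if mount.nil?
    return ["output_path #{output_path.inspect} does not correspond to a mount point"]
  end

  kind = mount["kind"]
  writable = kind == "tmp" || (kind == "collection" && mount["writable"] == true)
  return [] if writable

  ["output_path #{output_path.inspect} is not a writable mount point"]
end

# The request from this report: no mounts, output_path "/out".
puts output_path_errors({}, "/out")
```

In a Rails model, a check like this would presumably be wired in as a validation that adds to errors, so the request is rejected before a non-runnable container is created.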