Bug #12933

closed

[crunch2] add equivalent of cloud_node line

Added by Ward Vandewege almost 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Start date: 01/11/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

In crunchv1, a line is logged in every job that looks like this:

{"cloud_node":{"size":"Standard_D3_v2","price":0.229},"total_cpu_cores":4,"total_scratch_mb":204695,"total_ram_mb":14023}

We need the equivalent in crunchv2. The node-info output is useful, but the string above has several other things going for it:

a) it has cloud node information (node type, price, actual ram)
b) it is machine parsable

The format doesn't need to match the above exactly (for example, "size" seems a misnomer for "instance_type"), but it must include all relevant cloud node information and be machine parsable.
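To illustrate what "machine parsable" buys here, the following is a minimal sketch that reads the crunchv1 line quoted above and flattens it into a small summary. The output key names (such as `instance_type`) and the function name are hypothetical, chosen only for illustration; only the input keys come from the log line itself.

```python
import json

# Sample line copied from the crunchv1 log shown in the description.
LINE = '{"cloud_node":{"size":"Standard_D3_v2","price":0.229},"total_cpu_cores":4,"total_scratch_mb":204695,"total_ram_mb":14023}'


def summarize_node_line(line):
    """Parse one crunchv1-style node line into a flat dict.

    The output key names are hypothetical; only the input keys
    come from the log line above.
    """
    record = json.loads(line)
    cloud = record.get("cloud_node", {})
    return {
        "instance_type": cloud.get("size"),
        "price": cloud.get("price"),
        "cpu_cores": record.get("total_cpu_cores"),
        "scratch_mb": record.get("total_scratch_mb"),
        "ram_mb": record.get("total_ram_mb"),
    }


summary = summarize_node_line(LINE)
print(summary)
```

Because the line is plain JSON, a consumer needs no custom parsing; any replacement format in crunchv2 would ideally keep that property.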


Subtasks 1 (0 open, 1 closed)

Task #12946: Review 12933-log-node-properties (Resolved, Tom Clegg, 01/11/2018)

Related issues 2 (1 open, 1 closed)

Related to Arvados - Feature #12746: [crunch2] Add I/O (and other?) stats to crunch-run (Resolved, Tom Clegg, 01/26/2018)
Related to Arvados - Bug #12465: [crunchv2] Improve crunch-run environment reporting (New)

Actions #1

Updated by Ward Vandewege almost 7 years ago

  • Related to Feature #12746: [crunch2] Add I/O (and other?) stats to crunch-run added
Actions #2

Updated by Tom Clegg almost 7 years ago

This is how we do it in crunch1, in source:sdk/cli/bin/crunch-job (where @node is a list of node names obtained from sinfo [...] --nodes=$SLURM_NODELIST)

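# Look up the Arvados node record for each node in the SLURM allocation.
my $resp = api_call(
  'nodes/list',
  'filters' => [['hostname', 'in', \@node]],
  'order' => 'hostname',
  'limit' => scalar(@node),
);
# Log each node's hostname, UUID, and JSON-encoded properties hash.
for my $n (@{$resp->{items}}) {
  Log(undef, "$n->{hostname} $n->{uuid} ".JSON::encode_json($n->{properties}));
}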
Actions #3

Updated by Tom Clegg almost 7 years ago

In crunch2 we can add this to the node-info logger: If $SLURMD_NODENAME is not empty, call /arvados/v1/nodes?filters=[[hostname,=,$nodename]] and print the uuid and properties hash of the returned item (if any).
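The lookup described above could be sketched roughly as follows. This is not the crunch-run implementation (which is written in Go); it only shows how the request URL might be built, assuming the filter is sent as a JSON-encoded array in the `filters` query parameter. The host name and the function name are placeholders, and no request is actually sent.

```python
import json
import os
from urllib.parse import urlencode


def node_lookup_url(api_host, nodename):
    """Return the nodes-list URL for one hostname, or None if nodename is empty."""
    if not nodename:
        # Not running under SLURM (or the variable is unset): nothing to look up.
        return None
    # Assumption: the filter is passed as a JSON-encoded array of [attr, op, value].
    filters = json.dumps([["hostname", "=", nodename]])
    return "https://%s/arvados/v1/nodes?%s" % (api_host, urlencode({"filters": filters}))


# "arvados.example.com" is a placeholder host, not a real cluster.
url = node_lookup_url("arvados.example.com", os.environ.get("SLURMD_NODENAME", ""))
print(url)
```

A caller would then fetch that URL with an API token and log the uuid and properties of the first returned item, if there is one.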

Actions #4

Updated by Ward Vandewege almost 7 years ago

Since a container is like a unix process, i.e. it runs exactly once, it would sure be nice if this information was captured in the container object. Then we don't even need to add it to the logs.

Actions #5

Updated by Tom Clegg almost 7 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg

12933-log-node-properties @ e128fc5885c553c9e9b55f2529d0ea6937e5a6b7

Actions #6

Updated by Peter Amstutz almost 7 years ago

  • Related to Bug #12465: [crunchv2] Improve crunch-run environment reporting added
Actions #7

Updated by Tom Clegg almost 7 years ago

12933-log-node-properties @ 5469772c43759b8bde77c3d78450658e266b9cf0

This version saves a node.json file in the log (analogous to container.json), with the admin-only "info" field removed. This should be easy to json.Unmarshal into the new Node type in the Go SDK.
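As a rough illustration of the idea (not the actual Go code in the branch), the sketch below drops the "info" field from a node record before serializing it for the log. The sample record, including its uuid and the contents of "info", is made up for the example.

```python
import json


def node_json_for_log(node):
    """Return the node record as JSON text with the admin-only "info" field removed."""
    # Build a filtered copy so the caller's dict is left untouched.
    public = {k: v for k, v in node.items() if k != "info"}
    return json.dumps(public, indent=2, sort_keys=True)


# Hypothetical node record; the uuid, hostname, and info values are made up.
node = {
    "uuid": "zzzzz-7ekkf-0123456789abcde",
    "hostname": "compute1",
    "properties": {"cloud_node": {"size": "Standard_D3_v2", "price": 0.229}},
    "info": {"ping_secret": "not-for-logs"},
}
text = node_json_for_log(node)
print(text)
```

Filtering into a copy, rather than deleting the key in place, keeps the original record usable by any other code that still needs the admin-only data.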

Actions #8

Updated by Tom Clegg almost 7 years ago

  • Target version changed from To Be Groomed to 2018-01-17 Sprint
Actions #9

Updated by Lucas Di Pentima almost 7 years ago

  • File services/crunch-run/crunchrun.go
    • Line 749: The comment seems to need an update
    • Lines 741 & 750: I think the params argument should be passed to the CallRaw call, right?
Actions #10

Updated by Tom Clegg almost 7 years ago

Fixed both issues.

12933-log-node-properties @ 813f5f4aad5da71c4fcfe6639c9010e1056acf1f

Actions #11

Updated by Lucas Di Pentima almost 7 years ago

LGTM, thanks!

Actions #12

Updated by Ward Vandewege almost 7 years ago

This works but only when the workflow is run without --local.

I ran a test workflow with

cwl-runner --local download.cwl --bashScript download.sh --urlFile 2.txt

which resulted in

$ cwl-runner --local download.cwl --bashScript download.sh --urlFile 2.txt
2018-01-13 16:18:58 cwltool INFO: /usr/bin/cwl-runner 1.0.20171211211613 1.0.20171211211613, arvados-python-client 0.1.20171211211613, cwltool 1.0.20170928192020
2018-01-13 16:18:58 cwltool INFO: Resolved 'download.cwl' to 'file:///data-nvme1n1/home/wvandewege/downloader/download.cwl'
2018-01-13 16:18:59 arvados.arv-run INFO: Upload local files: "download.sh" "2.txt" 
2018-01-13 16:18:59 arvados.arv-run INFO: Uploaded to 5e1ebe288e1daccf5744d1849610d292+71 (dhhck-4zz18-0aogl30ddkrk3yk)
2018-01-13 16:18:59 cwltool INFO: [workflow download.cwl] start
2018-01-13 16:18:59 cwltool INFO: [step readUrlList] start
2018-01-13 16:18:59 cwltool INFO: [step readUrlList] completed success
2018-01-13 16:18:59 cwltool INFO: [step downloadUrl] start
2018-01-13 16:18:59 arvados.cwl-runner INFO: [container downloadUrl] dhhck-xvhdp-9299d8y8q16fbfu state is Committed
2018-01-13 16:18:59 cwltool INFO: [step downloadUrl] start
2018-01-13 16:19:00 arvados.cwl-runner INFO: [container downloadUrl_2] dhhck-xvhdp-a2zmgofa2d818dh state is Committed
2018-01-13 16:19:29 arvados.cwl-runner INFO: [container downloadUrl] dhhck-xvhdp-9299d8y8q16fbfu is Final
2018-01-13 16:19:44 arvados.cwl-runner INFO: [container downloadUrl_2] dhhck-xvhdp-a2zmgofa2d818dh is Final
2018-01-13 16:19:44 cwltool INFO: [step downloadUrl] completed success
2018-01-13 16:19:44 cwltool INFO: [workflow download.cwl] completed success
2018-01-13 16:19:44 arvados.cwl-runner INFO: Overall process status is success
2018-01-13 16:19:44 arvados.cwl-runner INFO: Final output collection 660188f814755c40f7a719f2b94d6f19+59 "Output of download.cwl (2018-01-13T16:19:44.827Z)" (dhhck-4zz18-j72zsymb8qxte9u)
{
    "out1": null
}
2018-01-13 16:19:44 cwltool INFO: Final process status is success

The log collections for the containers do not have the node.json file.

When I ran it without --local, like so:

cwl-runner download.cwl --bashScript download.sh --urlFile 2.txt
2018-01-13 16:26:33 cwltool INFO: /usr/bin/cwl-runner 1.0.20171211211613 1.0.20171211211613, arvados-python-client 0.1.20171211211613, cwltool 1.0.20170928192020
2018-01-13 16:26:33 cwltool INFO: Resolved 'download.cwl' to 'file:///data-nvme1n1/home/wvandewege/downloader/download.cwl'
2018-01-13 16:26:34 arvados.arv-run INFO: Upload local files: "download.sh" "2.txt" 
2018-01-13 16:26:34 arvados.arv-run INFO: Uploaded to 5e1ebe288e1daccf5744d1849610d292+71 (dhhck-4zz18-hhc7pht771wfxc0)
2018-01-13 16:26:34 arvados.cwl-runner INFO: [container download.cwl] submitted container dhhck-xvhdp-9gzxm1rcb9s4kne
2018-01-13 16:27:49 arvados.cwl-runner INFO: [container download.cwl] dhhck-xvhdp-9gzxm1rcb9s4kne is Final
2018-01-13 16:27:49 arvados.cwl-runner INFO: Overall process status is success
2018-01-13 16:27:49 arvados.cwl-runner INFO: Final output collection 660188f814755c40f7a719f2b94d6f19+59
{
    "out1": null
}
2018-01-13 16:27:49 cwltool INFO: Final process status is success

The resulting log collections do have the node.json file.

Why is this?

Actions #13

Updated by Tom Clegg almost 7 years ago

Probably because dhhck-xvhdp-9299d8y8q16fbfu ran on compute3, which had

2018-01-13T16:19:10.117547746Z crunch-run 0.1.20171212165144.296aa66 started

while dhhck-xvhdp-9gzxm1rcb9s4kne ran on compute1, which had

2018-01-13T16:26:36.644719254Z crunch-run 0.1.20180111190404.453f922 started
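
The difference between the two nodes can be checked mechanically. The sketch below pulls the version out of each "started" line and compares them, assuming the 14-digit component of the version string is a YYYYMMDDHHMMSS timestamp. That reading is an assumption, not something stated in this thread, and the function name is illustrative.

```python
import re
from datetime import datetime

# Log lines copied from the comment above.
COMPUTE3 = "2018-01-13T16:19:10.117547746Z crunch-run 0.1.20171212165144.296aa66 started"
COMPUTE1 = "2018-01-13T16:26:36.644719254Z crunch-run 0.1.20180111190404.453f922 started"

VERSION_RE = re.compile(r"crunch-run \d+\.\d+\.(\d{14})\.([0-9a-f]+) started")


def build_time(log_line):
    """Extract the 14-digit version component, assumed to be a YYYYMMDDHHMMSS timestamp."""
    match = VERSION_RE.search(log_line)
    if match is None:
        raise ValueError("no crunch-run version found in %r" % log_line)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


old, new = build_time(COMPUTE3), build_time(COMPUTE1)
# True if compute3 runs an older crunch-run build than compute1.
print(old.date(), new.date(), old < new)
```

Under that assumption, compute3 was running a crunch-run build from December 2017, older than the one on compute1, which would explain the missing node.json.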
Actions #14

Updated by Tom Clegg almost 7 years ago

  • Status changed from In Progress to Resolved