Project

General

Profile

Actions

Feature #19886

open

crunch-run commits log collection at container start

Added by Peter Amstutz over 1 year ago. Updated over 1 year ago.

Status:
In Progress
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
01/13/2023
Due date:
% Done:

0%

Estimated time:
(Total: 0.00 h)
Story points:
0.5

Description

The new Resources panel uses node.json from the log collection. This doesn't appear until the 1st commit of the log collection, which only happens at the 1st checkpoint (after 30 minutes of running, I think?) or the end of the run.

crunch-run should make an early commit of the log collection once the system info files (node.json, node-info.txt, etc) have been recorded.


Files

19886-prove-tests-unrun.patch (1.14 KB) 19886-prove-tests-unrun.patch Brett Smith, 01/13/2023 12:04 AM

Subtasks 1 (1 open0 closed)

Task #19903: Review 19886-crunch-run-early-log-commitIn ProgressTom Clegg01/13/2023

Actions
Actions #1

Updated by Peter Amstutz over 1 year ago

  • Status changed from New to In Progress
Actions #2

Updated by Peter Amstutz over 1 year ago

  • Tracker changed from Bug to Feature
  • Category set to Crunch
  • Status changed from In Progress to New
Actions #3

Updated by Peter Amstutz over 1 year ago

  • Subject changed from crunch-run commits logs at container start to crunch-run commits log collection at container start
  • Description updated (diff)
Actions #4

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
Actions #5

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #6

Updated by Tom Clegg over 1 year ago

  • Assigned To set to Tom Clegg
  • Story points set to 0.5
Actions #7

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2023-02-01 sprint to 2023-01-18 sprint
Actions #8

Updated by Peter Amstutz over 1 year ago

  • Assigned To changed from Tom Clegg to Brett Smith
Actions #9

Updated by Brett Smith over 1 year ago

I have a branch up. 19886-crunch-run-early-log-commit @ 40d8256b92b5e2b64f1ea5a65a03e01e1763a6d9 - developer-run-tests: #3439

The tests are failing with:

17:06:19 ----------------------------------------------------------------------
17:06:19 FAIL: crunchrun_test.go:1714: TestSuite.TestStdoutWithMultipleMountPointsUnderOutputDir
17:06:19 
17:06:19 SecretMounts decoded {map[]} json "{\"secret_mounts\":null}" 
17:06:19 crunchrun_test.go:1756:
17:06:19     c.Check(v["ensure_unique_name"], Equals, true)
17:06:19 ... obtained = nil
17:06:19 ... expected bool = true

Unfortunately this has led me down a rabbit hole, because there's a disconnect between the tests and reality. It looks like what the tests intend to test is that an output collection is created with a certain manifest. However, they do this by searching for the output collection in a loop, and then checking matching results. Right now there are no matching results, and the tests never check that, which is why they're quietly passing on main. You can prove this to yourself by applying the attached patch to main and running the tests.

I've done a little digging but it's still not clear to me whether the tests should pass, and there's a problem in crunch-run or the test harnesses; or if these tests are no longer relevant (at least as written), and they just conveniently took themselves out of commission. I don't mind doing a little test updating as part of this branch but I would appreciate a couple pointers on this since it's not directly relevant to this story, and I'm guessing a little history goes a long way to finding the right solution.

Actions #10

Updated by Brett Smith over 1 year ago

  • Status changed from New to In Progress

19886-crunch-run-early-log-commit @ 499848b567ebecadbdff1fcbbdb4683dfbf097da - developer-run-tests: #3441 - Ready for review

Actions #11

Updated by Tom Clegg over 1 year ago

Hm. The documentation says the container log will be null or a UUID until the container is finished. The existing updateLogs loop periodically updates it to a PDH. And this branch updates it to a UUID during the Locked→Running transition.

In RailsAPI, Container#update_cr_logs calls ContainerRequest#update_collections, which assumes the log is a PDH.

I think the easy solution is for the new code to save a PDH instead of UUID, and update the documentation to say "...otherwise a PDH, UUID, or null", and a better solution (unless I'm missing some reason this won't work) is to update RailsAPI and crunch-run to do what the current documentation says.

Other than that, this LGTM, thanks.

Actions #12

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #13

Updated by Brett Smith over 1 year ago

Tom Clegg wrote in #note-11:

I think the easy solution is for the new code to save a PDH instead of UUID, and update the documentation to say "...otherwise a PDH, UUID, or null", and a better solution (unless I'm missing some reason this won't work) is to update RailsAPI and crunch-run to do what the current documentation says.

I'm not sure I understand why the second option is better? ContainerRequest#update_collections ensures that every request has its own log collection. Assuming we maintain that behavior, I'm not sure what the "log is a UUID before container is finished" rule is intended to accomplish for us, in terms of implementation flexibility or what.

Without more background, it seems to me like the best option that minimizes work while maximizing consistency is to update crunch-run to always send a PDH, and update the documentation to say that the log will always be a PDH or null, perhaps with a note that this restriction may be relaxed to allow UUIDs in the future. Is there some reason I'm missing that's not a good idea?

Actions #14

Updated by Tom Clegg over 1 year ago

I'm not sure I understand why the second option is better?

I was thinking it would be better in that it would allow a client to get the log collection UUID and then watch that collection for updates, rather than having to use a special-to-container-logs approach, and would eliminate the "log pdh is now X" API call after saving the collection. Practically speaking neither seems like a huge deal though.

Without more background, it seems to me like the best option that minimizes work while maximizing consistency is to update crunch-run to always send a PDH, and update the documentation to say that the log will always be a PDH or null, perhaps with a note that this restriction may be relaxed to allow UUIDs in the future. Is there some reason I'm missing that's not a good idea?

I agree this is what we should do for now, and maybe we'll revisit the UUID option in the future.

Actions #15

Updated by Brett Smith over 1 year ago

Tom Clegg wrote in #note-14:

I'm not sure I understand why the second option is better?

I was thinking it would be better in that it would allow a client to get the log collection UUID and then watch that collection for updates, rather than having to use a special-to-container-logs approach

Isn't this provided by the log propagation the API server does though? When a container request is updated with a container's new log, it makes a point to load and update its existing collection. Following updates for a specific container request's logs seems like it should just work. I realize you said "container logs" and I'm saying "container request logs" and those aren't the same thing, but practically is there any reason for clients to prefer reading the container logs directly over the container request logs?

I agree this is what we should do for now, and maybe we'll revisit the UUID option in the future.

19886-crunch-run-early-log-commit @ 6477db8a1f5ed0b2f81cf743bbea32c681c7c10c - developer-run-tests: #3449

Actions

Also available in: Atom PDF