Story #5554

Updated by Tom Clegg almost 10 years ago

After a job completes, the full job log is written to Keep, so it isn't necessary to leave the (possibly incomplete) copy in the Postgres database as well. 

 In order to do this on the fly without adversely affecting users, we would need to ensure the Workbench log viewer retrieves the log messages from Keep when possible -- including the case of the Log tab of pipeline_instances#show when some of the pipeline's jobs are still running and some have had their logs written to Keep and deleted from the API server's database. 

 Alternatively, we could get nearly equivalent behavior by scheduling a task to delete the "stderr" log entries for jobs that finished more than 30 days ago. This would address the long-term proliferation problem while permitting more admin troubleshooting opportunities, avoiding race conditions like "last log deleted before it got sent to workbench via websockets", and avoiding logs falling through the cracks after "job finished but cleanup statement failed", "crunch-job crashed", etc. 
 * The 30 day threshold should be configurable. 
 * Ideally the cleanup script should be an API client, but if there's no "delete records matching filters" API, we might have to make do with a rake task for now. 
 * crunch-dispatch already takes care of marking jobs as failed if they have been stuck in the queue too long, so this cleanup job doesn't have to worry about those -- just check the finished_at timestamp. 
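
The scheduled cleanup described above could be sketched roughly as follows. This is only an illustration in Python against a generic SQL connection, not the actual rake task or API client: the table names (`logs`, `jobs`), the column names (`object_uuid`, `event_type`, `uuid`), the function name `delete_old_job_logs`, and the storage of `finished_at` as a UTC ISO-8601 string are all assumptions made for the example. Only the "stderr" event type, the `finished_at` check, and the configurable 30 day threshold come from the description.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def delete_old_job_logs(conn, max_age_days=30, now=None):
    """Delete 'stderr' log rows for jobs that finished more than
    max_age_days ago. Returns the number of log rows deleted.

    Assumed (hypothetical) schema:
      jobs(uuid, finished_at)        -- finished_at: UTC ISO-8601 text or NULL
      logs(object_uuid, event_type)  -- object_uuid refers to jobs.uuid
    """
    now = now or datetime.now(timezone.utc)
    # Configurable threshold; the default of 30 days matches the proposal.
    cutoff = (now - timedelta(days=max_age_days)).isoformat(timespec="seconds")
    cur = conn.execute(
        """
        DELETE FROM logs
        WHERE event_type = 'stderr'
          AND object_uuid IN (
              SELECT uuid FROM jobs
              WHERE finished_at IS NOT NULL
                AND finished_at < ?
          )
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Only `finished_at` is consulted, so jobs that are still running or queued (no `finished_at`) keep their log rows, and log entries of other event types are left alone.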
