Tom Clegg, 04/03/2024 07:40 PM
h1. Troubleshooting aids

Troubleshoot usage problems:

* Improve error messages (e.g., clients should not crash and dump a stack trace when a server is slow/unresponsive)

Troubleshoot compute nodes/images:

* {{issue(21581)}}
* {{issue(21424)}}

Troubleshoot Arvados system services:

* Save snapshot of internals (goroutines / memory profile) of specified system service(s) to a collection, and provide instructions for viewing
* Save last N minutes of logs from all Arvados services running on this host

Expose config/scaling issues:

* Scan metrics for recent "near/at capacity" signals
* Probe for proper nginx/proxy config (e.g., max request body size)