Bug #17339
closed[keepstore] unexplained memory use with new S3 driver
100%
Description
I'm using the new S3 driver on 2xpu4. This is keepstore 2.1.1:
keep0.2xpu4:~# keepstore -version keepstore 2.1.1 (go1.13.4)
The keepstore daemon on keep0.2xpu4 seems to use more memory than expected.
The machine it runs on is modest, with just 3.7G of ram:
# free total used free shared buff/cache available Mem: 3827036 3219100 317440 1000 290496 3255540 Swap: 0 0 0
I used our little formula to calculate a suitable value for MaxKeepBlobBuffers:
$ perl -e 'print 3827036/88/1024' 42.4697709517045
That was too many; keepstore kept getting killed by the OOM killer. I reduced it to 40, same problem. I reduced it to 20 and that made it happy, stable at an RSS use of 2999284 bytes, and suitably making clients wait when its 20 buffers are in use (writing roughly 600 MiB/sec into S3).
After an hour of being mostly idle, the keepstore RSS memory use is unchanged:
root 3366 31.2 78.3 4218032 2999284 ? Ssl 16:22 25:05 /usr/bin/keepstore
Two surprising things/questions here:
a) why is it using ~2.86GiB of ram with only 20 buffers? I was expecting memory use to be 1.28GiB of ram (20 * 64 MiB) plus a bit of overhead.
b) why is the memory use not going down after a long period of idle?
Here is a snapshot of the metrics for the keepstore (also available in prometheus/grafana as a time series):
# HELP arvados_concurrent_requests Number of requests in progress # TYPE arvados_concurrent_requests gauge arvados_concurrent_requests 0 # HELP arvados_keepstore_bufferpool_allocated_bytes Number of bytes allocated to buffers # TYPE arvados_keepstore_bufferpool_allocated_bytes gauge arvados_keepstore_bufferpool_allocated_bytes 5.905580032e+09 # HELP arvados_keepstore_bufferpool_inuse_buffers Number of buffers in use # TYPE arvados_keepstore_bufferpool_inuse_buffers gauge arvados_keepstore_bufferpool_inuse_buffers 0 # HELP arvados_keepstore_bufferpool_max_buffers Maximum number of buffers allowed # TYPE arvados_keepstore_bufferpool_max_buffers gauge arvados_keepstore_bufferpool_max_buffers 20 # HELP arvados_keepstore_pull_queue_inprogress_entries Number of pull requests in progress # TYPE arvados_keepstore_pull_queue_inprogress_entries gauge arvados_keepstore_pull_queue_inprogress_entries 0 # HELP arvados_keepstore_pull_queue_pending_entries Number of queued pull requests # TYPE arvados_keepstore_pull_queue_pending_entries gauge arvados_keepstore_pull_queue_pending_entries 0 # HELP arvados_keepstore_trash_queue_inprogress_entries Number of trash requests in progress # TYPE arvados_keepstore_trash_queue_inprogress_entries gauge arvados_keepstore_trash_queue_inprogress_entries 0 # HELP arvados_keepstore_trash_queue_pending_entries Number of queued trash requests # TYPE arvados_keepstore_trash_queue_pending_entries gauge arvados_keepstore_trash_queue_pending_entries 0 # HELP arvados_keepstore_volume_errors Number of volume errors # TYPE arvados_keepstore_volume_errors counter arvados_keepstore_volume_errors{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",error_type="*s3manager.multiUploadError 000 MultipartUpload"} 2 arvados_keepstore_volume_errors{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",error_type="s3.requestFailure 404 NotFound"} 1927 # HELP arvados_keepstore_volume_io_bytes Volume I/O traffic in bytes # TYPE arvados_keepstore_volume_io_bytes counter arvados_keepstore_volume_io_bytes{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",direction="in"} 2.63069164e+08 arvados_keepstore_volume_io_bytes{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",direction="out"} 1.25395672825e+11 # HELP arvados_keepstore_volume_operations Number of volume operations # TYPE arvados_keepstore_volume_operations counter arvados_keepstore_volume_operations{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",operation="get"} 28 arvados_keepstore_volume_operations{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",operation="head"} 5857 arvados_keepstore_volume_operations{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",operation="list"} 112 arvados_keepstore_volume_operations{device_id="s3:///2xpu4-nyw5e-000000000000000-volume",operation="put"} 5821 # HELP arvados_max_concurrent_requests Maximum number of concurrent requests # TYPE arvados_max_concurrent_requests gauge arvados_max_concurrent_requests 0 # HELP request_duration_seconds Summary of request duration. # TYPE request_duration_seconds summary request_duration_seconds_sum{code="200",method="get"} 118.02206067300001 request_duration_seconds_count{code="200",method="get"} 42 request_duration_seconds_sum{code="200",method="put"} 29078.17948122801 request_duration_seconds_count{code="200",method="put"} 3906 request_duration_seconds_sum{code="500",method="put"} 19.722034916 request_duration_seconds_count{code="500",method="put"} 4 # HELP time_to_status_seconds Summary of request TTFB. # TYPE time_to_status_seconds summary time_to_status_seconds_sum{code="200",method="get"} 106.74335901900001 time_to_status_seconds_count{code="200",method="get"} 42 time_to_status_seconds_sum{code="200",method="put"} 29075.862950967014 time_to_status_seconds_count{code="200",method="put"} 3906 time_to_status_seconds_sum{code="500",method="put"} 19.72120516 time_to_status_seconds_count{code="500",method="put"} 4
Files
Updated by Ward Vandewege almost 4 years ago
- Status changed from New to In Progress
Updated by Ward Vandewege almost 4 years ago
- Subject changed from [keepstore] unexplained memory use to [keepstore] unexplained memory use with new S3 driver
Updated by Ward Vandewege almost 4 years ago
- Target version changed from 2021-02-17 sprint to 2021-03-03 sprint
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-03-03 sprint to 2021-03-17 sprint
Updated by Ward Vandewege almost 4 years ago
- Target version changed from 2021-03-17 sprint to 2021-03-31 sprint
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Updated by Peter Amstutz almost 4 years ago
- Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-04-28 bughunt sprint to 2021-05-12 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-05-12 sprint to 2021-05-26 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-05-26 sprint to 2021-06-09 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-06-09 sprint to 2021-06-23 sprint
Updated by Ward Vandewege over 3 years ago
- Target version changed from 2021-06-23 sprint to 2021-07-07 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-07 sprint to 2021-07-21 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-07-21 sprint to 2021-08-04 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-04 sprint to 2021-08-18 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-08-18 sprint to 2021-09-01 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-01 sprint to 2021-09-15 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-15 sprint to 2021-09-29 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-09-29 sprint to 2021-10-13 sprint
Updated by Peter Amstutz over 3 years ago
- Target version changed from 2021-10-13 sprint to 2021-10-27 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-10-27 sprint to 2021-11-10 sprint
Updated by Ward Vandewege about 3 years ago
- Target version changed from 2021-11-10 sprint to 2021-11-24 sprint
Updated by Peter Amstutz about 3 years ago
- Target version changed from 2021-11-24 sprint to 2021-12-08 sprint
Updated by Ward Vandewege about 3 years ago
- File before-profile-arv-put.bb.gz before-profile-arv-put.bb.gz added
- File before-pprof-arv-put-alloc_space.svg before-pprof-arv-put-alloc_space.svg added
Running some experiments in a controlled environment. I've attached before-profile-arv-put.bb.gz which is a pprof snapshot taken after running an upload experiment a couple times. This is the corresponding graph that shows cumulative memory allocations, which points at the problem:
Clearly, an (unnecessary) copy of uploaded blocks is being made inside the S3 sdk.
This is also visible in the "top10" output:
File: keepstore Build ID: 78315dc4f3813def38521cb385cc1fdaecdcf72b Type: inuse_space Time: Dec 2, 2021 at 2:43pm (EST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) sample_index=alloc_space (pprof) top10 Showing nodes accounting for 6319.45MB, 98.28% of 6430.30MB total Dropped 308 nodes (cum <= 32.15MB) Showing top 10 nodes out of 28 flat flat% sum% cum cum% 3140.14MB 48.83% 48.83% 3140.14MB 48.83% github.com/aws/aws-sdk-go-v2/service/s3/s3manager.(*maxSlicePool).newSlice 3136MB 48.77% 97.60% 3136MB 48.77% git.arvados.org/arvados.git/services/keepstore.(*handler).setup.func1 42.81MB 0.67% 98.27% 62.87MB 0.98% io.copyBuffer 0.50MB 0.0078% 98.28% 47.61MB 0.74% github.com/aws/aws-sdk-go-v2/service/s3/s3manager.(*multiuploader).send 0 0% 98.28% 58.24MB 0.91% bufio.(*Writer).ReadFrom 0 0% 98.28% 3155.16MB 49.07% git.arvados.org/arvados.git/services/keepstore.(*S3AWSVolume).Put 0 0% 98.28% 3155.16MB 49.07% git.arvados.org/arvados.git/services/keepstore.(*S3AWSVolume).WriteBlock 0 0% 98.28% 3155.16MB 49.07% git.arvados.org/arvados.git/services/keepstore.(*S3AWSVolume).writeObject 0 0% 98.28% 3136MB 48.77% git.arvados.org/arvados.git/services/keepstore.(*bufferPool).Get 0 0% 98.28% 3154.66MB 49.06% git.arvados.org/arvados.git/services/keepstore.PutBlock.func1
Tom helped looking at the code, and found that the issue is that we are passing in an io.Reader, not a `readerAtSeeker`, which causes nextReader() in `github.com/aws/aws-sdk-go-v2@v0.23.0/service/s3/s3manager/upload.go` to make an extra memory allocation.
The cause of this is our use of `putWithPipe` in the `Put` method in `services/keepstore/s3aws_volume.go`, which is fundamentally incompatible with seeking. We use `putWithPipe` to be able to release the allocated memory as soon as possible when a client hangs up. We can outsource that problem to the S3 sdk since we pass in a context.
After 464c1562415ac7b5b5503f41b20c3183610dc899, the extra memory allocation on the write path is gone. This is a Profile (after-profile-arv-put.bb.gz) taken after writing roughly 15GB of data with arv-put:
File: keepstore Build ID: 3fa6ed4b9b8bab06a255c44ae9b017f5743ad637 Type: inuse_space Time: Dec 2, 2021 at 3:50pm (EST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) sample_index=alloc_space (pprof) top Showing nodes accounting for 2094.03MB, 94.76% of 2209.78MB total Dropped 327 nodes (cum <= 11.05MB) Showing top 10 nodes out of 67 flat flat% sum% cum cum% 1728MB 78.20% 78.20% 1728MB 78.20% git.arvados.org/arvados.git/services/keepstore.(*handler).setup.func1 160.04MB 7.24% 85.44% 160.04MB 7.24% git.arvados.org/arvados.git/services/keepstore.(*statsTicker).TickOutBytes 109.33MB 4.95% 90.39% 300.90MB 13.62% io.copyBuffer 30.50MB 1.38% 91.77% 30.50MB 1.38% context.WithCancel 16.56MB 0.75% 92.52% 16.56MB 0.75% bufio.NewReaderSize 15.05MB 0.68% 93.20% 15.05MB 0.68% bufio.NewWriterSize 14.02MB 0.63% 93.83% 14.02MB 0.63% strings.(*Builder).grow (inline) 9.01MB 0.41% 94.24% 15.01MB 0.68% github.com/aws/aws-sdk-go-v2/aws/signer/v4.(*httpSigner).buildCanonicalHeaders 7.51MB 0.34% 94.58% 13.01MB 0.59% github.com/aws/aws-sdk-go-v2/aws.New 4MB 0.18% 94.76% 55.54MB 2.51% github.com/aws/aws-sdk-go-v2/aws/signer/v4.(*httpSigner).Build
Updated by Ward Vandewege about 3 years ago
Updated by Ward Vandewege about 3 years ago
On the read path, a similar thing happens, though this time the extra memory allocation happens in the Keep driver, not in the S3 sdk.
Top10:
File: keepstore Build ID: 3fa6ed4b9b8bab06a255c44ae9b017f5743ad637 Type: inuse_space Time: Dec 2, 2021 at 4:08pm (EST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) sample_index=alloc_space (pprof) top10 Showing nodes accounting for 27992.35MB, 98.70% of 28359.95MB total Dropped 351 nodes (cum <= 141.80MB) Showing top 10 nodes out of 27 flat flat% sum% cum cum% 14976.50MB 52.81% 52.81% 15016.70MB 52.95% git.arvados.org/arvados.git/services/keepstore.(*S3AWSVolume).readWorker 12672MB 44.68% 97.49% 12672MB 44.68% git.arvados.org/arvados.git/services/keepstore.(*handler).setup.func1 189.81MB 0.67% 98.16% 525.38MB 1.85% io.copyBuffer 153.54MB 0.54% 98.70% 153.54MB 0.54% git.arvados.org/arvados.git/services/keepstore.(*statsTicker).TickInBytes 0.50MB 0.0018% 98.70% 475.79MB 1.68% github.com/aws/aws-sdk-go-v2/service/s3/s3manager.(*downloader).downloadChunk 0 0% 98.70% 182.03MB 0.64% bufio.(*Reader).Read 0 0% 98.70% 173.16MB 0.61% crypto/tls.(*Conn).HandshakeContext (inline) 0 0% 98.70% 182.03MB 0.64% crypto/tls.(*Conn).Read 0 0% 98.70% 173.16MB 0.61% crypto/tls.(*Conn).handshakeContext 0 0% 98.70% 15174.88MB 53.51% git.arvados.org/arvados.git/services/keepstore.(*S3AWSVolume).ReadBlock
Updated by Ward Vandewege about 3 years ago
Updated by Ward Vandewege about 3 years ago
- File after-pprof-arv-mount-read-alloc_space.svg after-pprof-arv-mount-read-alloc_space.svg added
- File after-profile-arv-mount-read.bb.gz after-profile-arv-mount-read.bb.gz added
After 5ce5bf966dfabbc0beb7330d4c976a23fde3fd83, the extra memory allocation in the read path is gone. This is a Profile (after-profile-arv-mount-read.bb.gz) taken after reading roughly 15GB of data through arv-mount:
pprof after-profile-arv-mount-read.bb.gz File: keepstore Build ID: 91924b715b34947abbb234201802d2364bc4749f Type: inuse_space Time: Dec 3, 2021 at 3:01pm (EST) Entering interactive mode (type "help" for commands, "o" for options) (pprof) sample_index=alloc_space (pprof) top10 Showing nodes accounting for 3460.61MB, 96.73% of 3577.68MB total Dropped 314 nodes (cum <= 17.89MB) Showing top 10 nodes out of 60 flat flat% sum% cum cum% 3072MB 85.87% 85.87% 3072MB 85.87% git.arvados.org/arvados.git/services/keepstore.(*handler).setup.func1 175.88MB 4.92% 90.78% 363.53MB 10.16% io.copyBuffer 136.01MB 3.80% 94.58% 136.01MB 3.80% context.WithCancel 37.65MB 1.05% 95.64% 37.65MB 1.05% bytes.makeSlice 19.07MB 0.53% 96.17% 19.07MB 0.53% bufio.NewWriterSize (inline) 17.50MB 0.49% 96.66% 54.14MB 1.51% crypto/tls.(*Conn).readFromUntil 1.50MB 0.042% 96.70% 447.14MB 12.50% github.com/aws/aws-sdk-go-v2/service/s3/s3manager.(*downloader).downloadChunk 0.50MB 0.014% 96.71% 35.03MB 0.98% github.com/aws/aws-sdk-go-v2/aws/signer/v4.(*httpSigner).Build 0.50MB 0.014% 96.73% 18.06MB 0.5% github.com/aws/aws-sdk-go-v2/private/protocol/xml.NewEncoder (inline) 0 0% 96.73% 187.64MB 5.24% bufio.(*Reader).Read
Updated by Ward Vandewege about 3 years ago
ready for review at 8a353bafe477a114c008b64bd25445266d2cf43c. Tests passed at developer-run-tests: #2826
Updated by Tom Clegg about 3 years ago
Functionally LGTM, just some code suggestions:
In count.go:
NewCountingReaderAtSeeker(r interface{}, f func(uint64)) should be NewCountingReaderAtSeeker(r readerAtSeeker, f func(uint64)) ... then sending the wrong type will be a compile error instead of a runtime panic.
No need to write Seek() explicitly -- countingReaderAtSeeker embeds readerAtSeeker so Seek will pass through implicitly.
In translateError() I think it would be better form to check for the concrete type *aws.RequestCanceledError
before checking the interface type awserr.Error
so we still do the same thing in case the type starts implementing the interface. Also, mixing switch and if-else here is a bit awkward, could just pick one.
if _, ok := err.(*awsRequestCanceledError); ok { } else if err, ok := err.(aws.Error); ok { }
Looks like Get() is an unnecessary wrapper -- could we just rename ReadBlock() to Get()?
Might be better to call v.bucket.stats.TickInBytes(uint64(count))
from readWorker() instead, so ReadBlock() doesn't need to call it twice.
In readWorker, the line buf = awsBuf.Bytes()
doesn't do anything (buf is already the buffer where the block has been downloaded, and if it wasn't, this wouldn't help anyway since we don't do anything with buf after this). (With that, the last 5 lines can reduce to "return count, v.translateError(err)")
WriteBlock() is a bit misleading since it takes an io.Reader but only works the way we want if the io.Reader also happens to be a readAtSeeker. The reason it existed in the first place was to use the "pipe adapter" stuff, which we now know it can't use, so we could just put "rdr := bytes.NewReader(block)" at the top and then WriteBlock would be the whole Put func.
Updated by Ward Vandewege about 3 years ago
Tom Clegg wrote:
Functionally LGTM, just some code suggestions:
In count.go:
NewCountingReaderAtSeeker(r interface{}, f func(uint64)) should be NewCountingReaderAtSeeker(r readerAtSeeker, f func(uint64)) ... then sending the wrong type will be a compile error instead of a runtime panic.
Yes! Fixed.
No need to write Seek() explicitly -- countingReaderAtSeeker embeds readerAtSeeker so Seek will pass through implicitly.
Done.
In translateError() I think it would be better form to check for the concrete type
*aws.RequestCanceledError
before checking the interface typeawserr.Error
so we still do the same thing in case the type starts implementing the interface. Also, mixing switch and if-else here is a bit awkward, could just pick one.
Done, thanks.
Looks like Get() is an unnecessary wrapper -- could we just rename ReadBlock() to Get()?
Yes, done!
Might be better to call
v.bucket.stats.TickInBytes(uint64(count))
from readWorker() instead, so ReadBlock() doesn't need to call it twice.
Fixed.
In readWorker, the line
buf = awsBuf.Bytes()
doesn't do anything (buf is already the buffer where the block has been downloaded, and if it wasn't, this wouldn't help anyway since we don't do anything with buf after this). (With that, the last 5 lines can reduce to "return count, v.translateError(err)")
Done thanks.
WriteBlock() is a bit misleading since it takes an io.Reader but only works the way we want if the io.Reader also happens to be a readAtSeeker. The reason it existed in the first place was to use the "pipe adapter" stuff, which we now know it can't use, so we could just put "rdr := bytes.NewReader(block)" at the top and then WriteBlock would be the whole Put func.
Done.
Ready for another look at 24e333637a1ecf1fd01ad501d62778a24985948f on branch 17339-s3aws-driver-memory-footprint
Tests are running at developer-run-tests: #2832
Updated by Ward Vandewege about 3 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|17a3d6e7fbc8941ad67b3b6cf344ff369ddd6e66.