Bug #3857: [SDKs] Python KeepClient should use different timeouts for different operations - Arvados

Actions

Copy link

Bug #3857

closed

[SDKs] Python KeepClient should use different timeouts for different operations

Added by Brett Smith over 10 years ago. Updated over 10 years ago.

Status:

Resolved

Priority:

Normal

Assigned To:

Tim Pierce

Category:

SDKs

Target version:

2014-11-19 sprint

Start date:

09/11/2014

Due date:

% Done:

100%

Estimated time:

(Total: 0.00 h)

Story points:

1.0

Description

Executive summary:

If an arvados.KeepClient is connecting to a Keep proxy (service_type == 'proxy'):

The TCP connection timeout should be 2 seconds.
Otherwise, connection timeout should be 20 seconds.

Leave all other timeouts as they are.

Updated by Ward Vandewege over 10 years ago

Target version set to Arvados Future Sprints

Actions

Copy link

Updated by Tom Clegg over 10 years ago

Story points set to 2.0

Actions

Copy link

Updated by Tom Clegg over 10 years ago

Target version changed from Arvados Future Sprints to 2014-10-08 sprint

Actions

Copy link

Updated by Tim Pierce over 10 years ago

Assigned To set to Tim Pierce

Actions

Copy link

Updated by Ward Vandewege over 10 years ago

Story points changed from 2.0 to 1.0

Actions

Copy link

Updated by Ward Vandewege over 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Tim Pierce over 10 years ago

Description updated (diff)

Actions

Copy link

Updated by Tim Pierce over 10 years ago

Target version changed from 2014-10-08 sprint to Bug Triage

Actions

Copy link

Updated by Tom Clegg over 10 years ago

Target version changed from Bug Triage to Arvados Future Sprints

Actions

Copy link

#10

Updated by Tom Clegg over 10 years ago

Subject changed from [SDK] Python KeepClient should use different timeouts for different operations to [SDKs] Python KeepClient should use different timeouts for different operations

Actions

Copy link

#11

Updated by Ward Vandewege over 10 years ago

Target version changed from Arvados Future Sprints to 2014-10-29 sprint

Actions

Copy link

#12

Updated by Ward Vandewege over 10 years ago

Target version changed from 2014-10-29 sprint to 2014-11-19 sprint

Actions

Copy link

#13

Updated by Tom Clegg over 10 years ago

Target version changed from 2014-11-19 sprint to Arvados Future Sprints

Actions

Copy link

#14

Updated by Tom Clegg over 10 years ago

Target version changed from Arvados Future Sprints to 2014-11-19 sprint

Actions

Copy link

#15

Updated by Tim Pierce over 10 years ago

There doesn't appear to be a trivial fix to this one. httplib2 is built on Python's httplib, which takes a single timeout setting when a connection is created and gives no opportunity to change the timeout afterward or on individual requests. Nor is the httplib2 code organized in such a way as to offer a good place for us to inject this. The httplib2.Http method is implemented as a monolithic class method, so even if we subclass it, there's no obvious place where we can change the socket timeout after creating a connection and before issuing a request.

That said, making this change in the source doesn't look terribly difficult. This is currently marked as an open issue for httplib2. If we were willing to patch the library, we could submit a pull request and hope it makes it into a new httplib2 release. httplib2 appears to have been mostly in maintenance mode, but the author has accepted a few pull requests this year, so it's not impossible.

Actions

Copy link

#16

Updated by Tim Pierce over 10 years ago

Status changed from New to In Progress

Actions

Copy link

#17

Updated by Tim Pierce over 10 years ago

Alternatively, Brett points out that the Python requests module might be better suited to what we want to do here.

At a glance, it also only supports a single "timeout" parameter that is applied the same way to all connection objects. But it might be possible to use the Session class to set up a connection in one call and then issue a request on it with a different timeout setting.

Actions

Copy link

#18

Updated by Brett Smith over 10 years ago

Tim Pierce wrote:

Alternatively, Brett points out that the Python requests module might be better suited to what we want to do here.

To be clear, that was very speculative. Someone mentioned downgrading to httplib, so I wondered if we might do just as well to "upgrade" to requests, which has sometimes been called httplib3. But I've never used it and don't know what it's capable of.

Actions

Copy link

#19

Updated by Tim Pierce over 10 years ago

Tim Pierce wrote:

Alternatively, Brett points out that the Python requests module might be better suited to what we want to do here.

At a glance, it also only supports a single "timeout" parameter that is applied the same way to all connection objects. But it might be possible to use the Session class to set up a connection in one call and then issue a request on it with a different timeout setting.

Yes, reading more closely, the timeout parameter can be given as a (connection_timeout, read_timeout) tuple:

If you specify a single value for the timeout, like this:

r = requests.get('https://github.com', timeout=5)

The timeout value will be applied to both the connect and the read timeouts. Specify a tuple if you would like to set the values separately:

r = requests.get('https://github.com', timeout=(3.05, 27))

http://requests.readthedocs.org/en/latest/user/advanced/#timeouts

Actions

Copy link

#20

Updated by Tom Clegg over 10 years ago

I can't say I've given this a thorough look but I've noticed a couple of things:

I think it would be valuable to have different timeouts (and different defaults) for proxies. Perhaps we could accept timeout=((a,b),(c,d)) to mean "(a,b) if non-proxy, (c,d) if proxy"? In that case I'd suggest ((3,30),(20,60)) as a default.

Is there some way we can test this? Perhaps timeout=(0,12) reliably causes a connection timeout, etc.? If testing the timeouts turns out to be tricky, we could test some basic stuff like passing timeout=123 and timeout=(123,456) and timeout=None doesn't crash.

The doc string bit about the timeout argument could refer to the Requests documentation for more detail about the timeouts.

On that note, I would like to know whether, in the case of a large PUT, the read_timeout timer starts when the last byte is sent by the client, or when the connection is made (~when the first byte is sent by the client). If the latter, I'd suggest something more like ((3,60),(20,300)) instead of ((3,30),(20,60)) above.

(Perhaps Ward has better ideas about defaults.)

Actions

Copy link

#21

Updated by Tim Pierce over 10 years ago

This certainly should be testable, and some clever use of mocks should allow us to test the timeouts without actually waiting for 30 seconds or whatever. I'll ask our mock expert for advice here.

Separate timeouts for proxies seems like a reasonable thing. Is it okay to push that into another story? I'm eager to wrap up this story so I can finish as much work done as possible on NodeManager before my science support gig starts on Monday.

Actions

Copy link

#22

Updated by Tim Pierce over 10 years ago

On the other hand, the more I think about it, the more it seems like the test we're talking about is just that requests.get does what it advertises it will. I'm not sure it's a good use of our time to add testing infrastructure for that. Certainly we should make sure that in practice it gives us the result we want, but testing internal behavior of a third-party library is the kind of test that can be difficult to write and prone to break. WDYT?

With respect to your last question, the package defines the read_timeout parameter to mean the maximum amount of time to wait between consecutive read operations for a response from the server. That seems to indicate that the read timer is reset when the client calls socket.recv, i.e. after the last byte of the request is sent.

Actions

Copy link

#23

Updated by Brett Smith over 10 years ago

Tim Pierce wrote:

This certainly should be testable, and some clever use of mocks should allow us to test the timeouts without actually waiting for 30 seconds or whatever. I'll ask our mock expert for advice here.

Are you talking about me? Oh god, you're talking about me. Horrifying.

Separate timeouts for proxies seems like a reasonable thing. Is it okay to push that into another story?

At the very least, I think this branch needs to use different defaults based on whether or not we're using a proxy. I don't know how else you meet Ward's requirements in this story without regressing #3846. I feel less strongly about whether or not we expose these as separate options to SDK users (as opposed to making them check themselves and pass in one timeout tuple appropriately).

On the other hand, the more I think about it, the more it seems like the test we're talking about is just that requests.get does what it advertises it will. I'm not sure it's a good use of our time to add testing infrastructure for that. Certainly we should make sure that in practice it gives us the result we want, but testing internal behavior of a third-party library is the kind of test that can be difficult to write and prone to break. WDYT?

My feeling is that if we're just passing values directly through to requests.get, that's probably not a big deal to test; but any logic that we have to use different timeout values based on the Keep service type does need to be tested. If there's not an independent unit to test there, then checking what's passed into the requests mock is probably the next best option.

Actions

Copy link

#24

Updated by Tim Pierce over 10 years ago

1fa1006 adds a proxy_timeout configuration setting. The default timeout value is now (3, 30) and the default proxy_timeout is (20, 60).

Added unit tests to check that KeepClient.get and KeepClient.put use the appropriate timeout setting when communicating directly with a Keep server and with a proxy, respectively.

Actions

Copy link

#25

Updated by Brett Smith over 10 years ago

Reviewing 1fa1006. I like this branch a lot. Thanks. This is all small potatoes stuff.

The story says to leave all non-connect timeouts as they are. We've been using 300 for read timeouts. We know for sure we need more than 60 for the proxy case, because that's what caused #3846. Let's go back to that for read timeouts.
The timeout argument to KeepService.__init__ seems to be a noop. If it is, let's remove it.
In KeepService.last_status, AttributeError is the only exception that still makes sense to catch.
In test_collections, _mock_api_call should not be changed to use a Requests error. The Google API client library uses httplib2, so ApiError (which is a replacement for apiclient's own HttpError) is expecting to get httplib2's 2-tuple.
KeepClientRetryGetTestMixin and KeepClientRetryPutTestMixin seem like unnecessary layering. KeepClientRetryTestMixin users already have to define a few class variables; why not just add mock_method to that list (and document it in the comments)?
Personally, I'd prefer to have them go further and have mixin users define the patch method: TEST_PATCHER = tutil.mock_get_responses. Then all the test methods call with self.TEST_PATCHER(args, …). This gives the users more flexibility and eliminates the need for mock_requestslib_responses entirely.
I feel like test_proxy_timeout better fits under KeepClientServiceTestCase than KeepClientRetryTestMixin. The timeout isn't related to retries, and there's not much point in retesting it for every single method that we're interested in the retry behavior of.
PEP 257 calls for no blank line before a docstring's closing triple quotes (KeepClient.__init__).
Please keep import blocks sorted where they already are (arvados_testutil).

Actions

Copy link

#26

Updated by Brett Smith over 10 years ago

Brett Smith wrote:

I feel like test_proxy_timeout better fits under KeepClientServiceTestCase than KeepClientRetryTestMixin. The timeout isn't related to retries, and there's not much point in retesting it for every single method that we're interested in the retry behavior of.

Actually, there's an even better reason to move this: the current implementation assumes we know at __init__ time whether or not we're using a proxy, but that's not always true. If no service roots are defined at init time, KeepClient will ask Arvados for available services, and we might learn then that we're using a proxy. Whether or not we're using a proxy might even change over the lifetime of the KeepClient, since it refreshes this list during retries. KeepClient needs to save both timeout values and pick the right one at request time based on the current value of self.using_proxy.

KeepClientServiceTestCase has all the infrastructure for mocking the API service call, so it can help you test this case of "no service roots at init time, learn we have a proxy later."

Actions

Copy link

#27

Updated by Tim Pierce over 10 years ago

All of these changes adopted at 6959469

Brett Smith wrote:

Reviewing 1fa1006. I like this branch a lot. Thanks. This is all small potatoes stuff.

The story says to leave all non-connect timeouts as they are. We've been using 300 for read timeouts. We know for sure we need more than 60 for the proxy case, because that's what caused #3846. Let's go back to that for read timeouts.

The timeout argument to KeepService.__init__ seems to be a noop. If it is, let's remove it.

In KeepService.last_status, AttributeError is the only exception that still makes sense to catch.

In test_collections, _mock_api_call should not be changed to use a Requests error. The Google API client library uses httplib2, so ApiError (which is a replacement for apiclient's own HttpError) is expecting to get httplib2's 2-tuple.

KeepClientRetryGetTestMixin and KeepClientRetryPutTestMixin seem like unnecessary layering. KeepClientRetryTestMixin users already have to define a few class variables; why not just add mock_method to that list (and document it in the comments)?
Personally, I'd prefer to have them go further and have mixin users define the patch method: TEST_PATCHER = tutil.mock_get_responses. Then all the test methods call with self.TEST_PATCHER(args, …). This gives the users more flexibility and eliminates the need for mock_requestslib_responses entirely.

I feel like test_proxy_timeout better fits under KeepClientServiceTestCase than KeepClientRetryTestMixin. The timeout isn't related to retries, and there's not much point in retesting it for every single method that we're interested in the retry behavior of.

PEP 257 calls for no blank line before a docstring's closing triple quotes (KeepClient.__init__).

Well. Hmph. Somebody should tell Barry Warsaw so M-q does the right thing in python mode, then.

Please keep import blocks sorted where they already are (arvados_testutil).

I'm confused -- I didn't reorder the import modules.

(I'd like to adopt a convention of putting all imports in alphabetical order, but that's a discussion for another time.)

Actions

Copy link

#28

Updated by Tim Pierce over 10 years ago

Tim Pierce wrote:

I'm confused -- I didn't reorder the import modules.

(I'd like to adopt a convention of putting all imports in alphabetical order, but that's a discussion for another time.)

OOPS

fixed at a98086e.

Actions

Copy link

#29

Updated by Brett Smith over 10 years ago

Reviewing a98086e, and just one small technicality about the comments you added KeepClientRetryTestMixin. TEST_PATCHER doesn't need to be a static method: a subclass could def TEST_PATCHER(self, …) and it would work fine. Your specific subclasses need to use staticmethod because that's how you convert an independent function to a method, roughly speaking.

With a small tweak there, this is good to merge, thanks.

Actions

Copy link

#30

Updated by Tim Pierce over 10 years ago

Status changed from In Progress to Resolved

Applied in changeset arvados|commit:ec75fda0fc2c86a77d831dcd7962ece7a2d6ae6d.

Actions

Copy link

Also available in: Atom PDF

Project

General

Profile

Arvados

Custom queries

Bug #3857

[SDKs] Python KeepClient should use different timeouts for different operations

Updated by Ward Vandewege over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Ward Vandewege over 10 years ago

Updated by Ward Vandewege over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Ward Vandewege over 10 years ago

Updated by Ward Vandewege over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Brett Smith over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tom Clegg over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Brett Smith over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Brett Smith over 10 years ago

Updated by Brett Smith over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Tim Pierce over 10 years ago

Updated by Brett Smith over 10 years ago

Updated by Tim Pierce over 10 years ago