Bug #8991
closed[PySDK] Possible infinite loop in KeepBlockCache.cap_cache()
Description
Given the code in KeepBlockCache.cap_cache() from sdk/python/arvados/keep.py:
https://github.com/curoverse/arvados/blob/master/sdk/python/arvados/keep.py#L172
there are conditions in which the while loop will never exit: if it won't find anything to delete from the cache. In fact the risk is quite high given the fact that before looping the self._cache is filtered and will contain only the elements that either are not ready (not c.ready.is_set()) OR have content, and then the inner for loop searches exactly this type of elements to delete - the elements that are ready (if self._cache[i].ready.is_set()). So if the self._cache will contain only elements that are not ready, but have content, the while loop will never exit.
Updated by Brett Smith almost 10 years ago
Irina Colgiu wrote:
So if the self._cache will contain only elements that are not ready, but have content, the while loop will never exit.
Irina,
I think I agree that this code could be more defensive, but right now it doesn't look like it's possible for the loop to go infinite, because cache entries are never in this state for long. The only method that saves content is also the only method that sets the ready flag: CacheSlot.set(). So if the slot has content, it will also have the ready flag set shortly thereafter, at which point the inner loop will be able to make progress. Do you see something I'm missing?
Updated by Brett Smith almost 10 years ago
- Subject changed from Possible infinite loop in keep.py to [PySDK] Possible infinite loop in KeepBlockCache.cap_cache()
Updated by Irina Colgiu almost 10 years ago
Hi Brett,
I agree that it is not very probable to go into the infinite loop, but I thought it's better to be on the safe side, especially when it comes to infinite loops and when the fix takes less time than arguing about it. Within the cap_cache code there are a while and a for loop and I see no good reason for having 2 nested loops when you can do the same job with only one for loop => reduce the complexity from O(n^2) to O(n) and avoid infinite looping (though improbable):
Removing from a list while iterating on it in the reverse order works fine, so no worries on that part.
Updated by Brett Smith almost 10 years ago
Irina Colgiu wrote:
I agree that it is not very probable to go into the infinite loop, but I thought it's better to be on the safe side, especially when it comes to infinite loops and when the fix takes less time than arguing about it. Within the cap_cache code there are a while and a for loop and I see no good reason for having 2 nested loops when you can do the same job with only one for loop => reduce the complexity from O(n^2) to O(n) and avoid infinite looping (though improbable):
Irina,
Thanks for putting this together. Definitely I agree that flattening the loops makes it nicer to read. However, this version doesn't fully preserve the method's original behavior, and I'd be wary to change that without further review.
Individual cache entries are filled in separate threads. In the original code, it is possible that the first run of the for loop will not find enough data to delete, but then when the for loop runs again later, it will find additional data to delete, and eventually succeed in capping the cache.
The end result is that the original code blocks until it succeeds in capping the cache to its maximum size. Your version runs through the cache once, making one run through the cache to try to cap its size, and then returning regardless of whether it made any change.
I'm all for a safer loop, but this seems like a larger change in the method's semantics than is necessary for safety. I think we'd have no trouble looking at a patch that stopped cap_cache() if it failed to delete any data for a while. However, I think it should be willing to loop multiple times through the cache as long as it continues to make progress toward capping its size.
Let me know if you have any questions or concerns about that approach. Thanks again.