Bug #7852
closed[SDKs] arvados.arvfile.StreamFileReader.readlines() returns empty list after calling .readline()
100%
Description
If one calls `readline` on an `arvados.arvfile.StreamFileReader` then a subsequent call to `readlines` will return an empty list. My expectation was that the first sfr.readline() call would give me the first line and then the sfr.readlines() call would give me the rest of the lines, but this appears to not work.
Updated by Joshua Randall about 9 years ago
FWIW, the python built-in docs for file object state that readlines() repeatedly calls readline(), so perhaps a fix is to reimplement it that way rather than via readall().
Updated by Brett Smith about 9 years ago
- Subject changed from problems with mixing readline / readlines in arvados.arvfile.StreamFileReader to [SDKs] arvados.arvfile.StreamFileReader.readlines() returns empty list after calling .readline()
Updated by Brett Smith about 9 years ago
- Target version set to Arvados Future Sprints
Updated by Radhika Chippada almost 9 years ago
fc308658aeeeb007fa3eb710b4bdc9179a13db10
branch 7852-test-readlines
Added a test that fails when realine and readlines are invoked on the stream reader in that sequence. The readlines call is only fetching the data after the "first block" which was read during the first readline call. It appears that self._readline_cache and self._filepos are playing a role here.
Updated by Radhika Chippada almost 9 years ago
d1c1f71d3d2e4170c51855d23557d58322627fd8
Added one more test "test_readline_then_readall" which is failing exactly like the previous readline_the_readlines test, for apparently the same reason.
Updated by Radhika Chippada almost 9 years ago
Should we reset self._readline_cache and self._filepos when a readall or a readlines call is made? Looking at the python api documentation, it seems like the readall and readlines calls are expected to read the entire file contents.
Updated by Brett Smith almost 9 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:0c7f9737786fd64177f8210585ec620402d0d9b6.
Updated by Brett Smith almost 9 years ago
7852-readline-cache-fix-wip is up for review. It starts from Radhika's 7852-test-readlines branch, rebases that on top of current master, then adds a commit that fixes the bug and gets the tests passing.
Updated by Brett Smith almost 9 years ago
- Status changed from Resolved to In Progress
- Assigned To set to Brett Smith
- Target version changed from Arvados Future Sprints to 2016-03-30 sprint
- Story points set to 0.5
Updated by Peter Amstutz almost 9 years ago
Since this bug is about the interaction between readline() and readlines(), I'm looking at the branch now and readlines() doesn't actually use readline(). Worse, it looks like readlines() sucks in the entire file into a buffer before using splitlines() to return a list. I suggest refactoring readlines() to be an iterator over readline().
Updated by Brett Smith almost 9 years ago
Peter Amstutz wrote:
Since this bug is about the interaction between readline() and readlines(), I'm looking at the branch now and readlines() doesn't actually use readline().
That's how the bug was reported, but it's worth noting that the bug will strike any time a caller interleaves calls to readline() and any other file read method. This is demonstrated by Radhika's readall test, and noted in the commit message.
Worse, it looks like readlines() sucks in the entire file into a buffer before using splitlines() to return a list. I suggest refactoring readlines() to be an iterator over readline().
readlines() is a standard file method that returns a list, so changing that to an iterator would break the API. Obviously calling it in any non-trivial case is a Bad Idea, but that's for the caller to decide, not us.
Given that API, sucking in whole blocks actually improves the performance of the method, because it avoids the overhead of calling readline() many times when the Keep API currently requires us to fetch whole blocks anyway.
Updated by Brett Smith almost 9 years ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|commit:e66f7687796f4ed3513fbc5a469d578ce76334e0.