Bug #8189
[FUSE] Listing a project directory is slow when there are many subprojects (closed, 100% done)
Description
I created 2947 subprojects under the project '1000_genome_exome_raw_reads' (UUID su92l-j7d0g-3c6tenm6q4xn7qm) on su92l. Since then, directory operations under that project have been slow. For example, 'ls' takes nearly two minutes.
Updated by Brett Smith almost 9 years ago
- Subject changed from [Keep] Directory operations are slow after the creation of a large number of projects. to [FUSE] Listing a project directory is slow when there are many subprojects
Updated by Tom Clegg almost 9 years ago
arvados_fuse's ProjectDirectory class uses arvados.util.list_all:
contents = arvados.util.list_all(self.api.groups().contents,
                                  self.num_retries, uuid=self.project_uuid)
arvados.util.list_all doesn't set a limit either, so each request gets the API server's default page size of 100 items.
Suggest modifying arvados.util.list_all (in source:sdk/python/arvados/util.py#L365) to do something like
kwargs.setdefault('limit', sys.maxint)
That way, the API server's MAX_LIMIT (currently 1000) will determine the page size.
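As a rough illustration of what this changes for the project in this ticket, assuming every page except the last is full, one API request per page, and counting only the 2947 subprojects (the project may contain other items too), the number of round trips for a single listing drops by about a factor of ten:

```latex
\left\lceil \frac{2947}{100} \right\rceil = 30 \text{ requests (default page size)}, \qquad
\left\lceil \frac{2947}{1000} \right\rceil = 3 \text{ requests (MAX\_LIMIT)}
```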
The rationale is that once the client is in an API request loop that it won't exit until it has received all of the items, it never makes sense to fetch fewer items per request than the server allows. Smaller pages only help if the client has some chance of doing something else (exiting the loop or processing a subset of the results) before receiving MAX_LIMIT results.
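A minimal sketch of the proposed change, under several assumptions: FakeContentsEndpoint is a hypothetical in-memory stand-in for groups().contents (the real SDK calls fn(...).execute(num_retries=...), which is omitted here), DEFAULT_LIMIT and MAX_LIMIT are hypothetical constants mirroring the values quoted in this ticket, and sys.maxsize is used as the Python 3 counterpart of sys.maxint. This is not the SDK's actual list_all code; it only shows how an offset-paged "fetch everything" loop behaves once limit defaults to "as many as possible" and the server clamps it.

```python
import sys

# Hypothetical constants mirroring the page sizes quoted in this ticket.
DEFAULT_LIMIT = 100   # page size when the client sends no limit
MAX_LIMIT = 1000      # server-side cap on any requested limit


class FakeContentsEndpoint:
    """Hypothetical stand-in for groups().contents that counts requests.

    Returns plain dicts shaped like a list response with
    "items", "items_available" and "offset" keys.
    """

    def __init__(self, n_items):
        self.items = [{"uuid": "item-%d" % i} for i in range(n_items)]
        self.requests = 0

    def __call__(self, offset=0, limit=None, **filters):
        self.requests += 1
        # No limit means the default page size; otherwise clamp to MAX_LIMIT.
        page_size = DEFAULT_LIMIT if limit is None else min(limit, MAX_LIMIT)
        page = self.items[offset:offset + page_size]
        return {"items": page,
                "items_available": len(self.items),
                "offset": offset}


def list_all(fn, **kwargs):
    """Fetch every item from fn using offset paging (sketch, not the SDK)."""
    # Proposed fix: request as many items as possible per page and let the
    # server's MAX_LIMIT decide the actual page size.
    kwargs.setdefault("limit", sys.maxsize)
    items = []
    offset = 0
    while True:
        page = fn(offset=offset, **kwargs)
        items.extend(page["items"])
        offset = page["offset"] + len(page["items"])
        # Stop when everything has arrived, or if the server returns an
        # empty page (guards against looping forever).
        if not page["items"] or len(items) >= page["items_available"]:
            return items


endpoint = FakeContentsEndpoint(2947)
items = list_all(endpoint, uuid="su92l-j7d0g-3c6tenm6q4xn7qm")
print(len(items), endpoint.requests)  # prints: 2947 3
```

Passing limit=None explicitly keeps setdefault from filling it in, which reproduces the old behaviour of leaving the page size to the server default; the loop then needs ten times as many requests to return the same items.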
(ArvadosResourceList#each_page in source:apps/workbench/app/models/arvados_resource_list.rb#177 needs this fix, too.)
Updated by Ward Vandewege almost 9 years ago
- Assigned To set to Ward Vandewege
- Target version set to 2016-01-20 Sprint
Updated by Ward Vandewege almost 9 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:37a1505b607bbf533512f48b47f208c5cde4c435.
Updated by Tom Clegg almost 9 years ago
- Status changed from Resolved to In Progress
Updated by Jiayong Li almost 9 years ago
Right now su92l Workbench is still considerably slower than qr1hi Workbench.
On su92l, I also noticed a huge performance difference between the read-only mount and the writable mount (both freshly mounted to reflect recent changes):
read-only mount:
$ time ls keep/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
real 1m0.422s
user 0m0.020s
sys 0m0.060s
writable mount:
$ time ls mnt/home/arvados_genomics_benchmark/1000_genome_exome_raw_reads
real 94m40.150s
user 0m0.028s
sys 0m0.080s
Updated by Brett Smith almost 9 years ago
- Target version deleted (2016-01-20 Sprint)
Updated by Brett Smith almost 9 years ago
- Target version set to Arvados Future Sprints
Updated by Jiayong Li almost 9 years ago
I tried running a pipeline on su92l, but the "Run a pipeline" button on the Workbench home page is currently not clickable.
Updated by Ward Vandewege over 5 years ago
- Status changed from In Progress to Resolved
- Target version changed from Arvados Future Sprints to 2019-05-22 Sprint
This was resolved long ago; here's the performance today:
wardv@shell:~$ time ls keep/by_id/su92l-j7d0g-3c6tenm6q4xn7qm
...
real 0m8.187s
user 0m0.021s
sys 0m0.086s