Feature #16316
open
a-c-r handles resource range requests (especially CPU) and adjusts requests based on what is in the InstanceTypes list
0%
Description
Implement a version of select_resources for Arvados.
You can get a dictionary of instance types with this:
api.config()["InstanceTypes"]
The select_resources method should, at minimum, accept a range of CPU core values (e.g. coresMin: 4, coresMax: 16), check the available InstanceTypes, and assign the greatest core count available within that range. For example, if the system is only configured with 2-, 4-, and 8-core nodes, it should assign 8 cores, since 8 is the largest available count within the range (4 to 16).
RAM and disk can also have a range. Just return the minimum value for now (this is the existing behavior).
Tell cwltool to use your select_resources method by setting the object field runtimeContext.select_resources.
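The selection rule described above can be sketched as follows. This is only an illustration, not the a-c-r implementation: it assumes each InstanceTypes entry exposes its core count under a "VCPUs" key, that the request carries the coresMin/coresMax/ramMin/tmpdirMin/outdirMin keys, and that falling back to coresMin is acceptable when no instance type falls in the range. The extra instance_types parameter and the example_types data are hypothetical.

```python
import math
from functools import partial


def select_resources(request, runtime_context, instance_types):
    """Pick the largest available core count within [coresMin, coresMax].

    instance_types is assumed to look like api.config()["InstanceTypes"],
    with a "VCPUs" entry per instance type (an assumption of this sketch).
    """
    cores_min = request["coresMin"]
    cores_max = request.get("coresMax") or cores_min

    available = {it["VCPUs"] for it in instance_types.values()}
    in_range = [c for c in available if cores_min <= c <= cores_max]

    # Assumption: if nothing fits the range, keep the existing behavior
    # and request the minimum.
    cores = max(in_range) if in_range else cores_min

    # RAM and disk: return the minimum value for now.
    return {
        "cores": cores,
        "ram": math.ceil(request["ramMin"]),
        "tmpdirSize": math.ceil(request["tmpdirMin"]),
        "outdirSize": math.ceil(request["outdirMin"]),
    }


# Hypothetical cluster with 2-, 4-, and 8-core nodes.
example_types = {
    "small": {"VCPUs": 2},
    "medium": {"VCPUs": 4},
    "large": {"VCPUs": 8},
}
request = {"coresMin": 4, "coresMax": 16,
           "ramMin": 1024, "tmpdirMin": 1024, "outdirMin": 1024}

# Bind the instance types so the hook keeps the two-argument call shape.
hook = partial(select_resources, instance_types=example_types)
result = hook(request, None)
print(result["cores"])  # 8
```

In a-c-r the hook could then presumably be installed with something like runtimeContext.select_resources = partial(select_resources, instance_types=api.config()["InstanceTypes"]).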
Updated by Jiayong Li about 2 years ago
I'm trying to run the unit test "test_resource_requirements" in a python3 virtualenv on the tordo shell node. I got the following errors.
$ python setup.py install
Installed /home/jli/env/acr/lib/python3.7/site-packages/arvados_cwl_runner-2.5.0.dev20221129154757-py3.7.egg
Processing dependencies for arvados-cwl-runner==2.5.0.dev20221129154757
error: pyparsing 3.0.9 is installed but pyparsing<3 is required by {'arvados-python-client'}

$ python setup.py test --test-suite=test.test_container.test_resource_requirements
Using /home/jli/git/arvados/sdk/cwl/.eggs/googleapis_common_protos-1.57.0-py3.7.egg
Traceback (most recent call last):
  File "setup.py", line 60, in <module>
    zip_safe=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 216, in run
    installed_dists = self.install_dists(self.distribution)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 207, in install_dists
    ir_d = dist.fetch_build_eggs(dist.install_requires)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/dist.py", line 724, in fetch_build_eggs
    replace_conflicting=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-auth 1.35.0 (/home/jli/git/arvados/sdk/cwl/.eggs/google_auth-1.35.0-py3.7.egg), Requirement.parse('google-auth<3.0dev,>=2.14.1'), {'google-api-core'})
1. Are these the right commands for unit testing?
2. Is pyparsing 3.0.9 the main issue here?
Updated by Peter Amstutz about 2 years ago
Try merging main; I think this has been fixed there.
Updated by Jiayong Li about 2 years ago
After trying
$ pip install -e .
I get
$ python setup.py test --test-suite=test.test_container.test_resource_requirements
Installed /home/jli/git/arvados/sdk/cwl/.eggs/mock-3.0.5-py3.7.egg
running egg_info
writing arvados_cwl_runner.egg-info/PKG-INFO
writing dependency_links to arvados_cwl_runner.egg-info/dependency_links.txt
writing entry points to arvados_cwl_runner.egg-info/entry_points.txt
writing requirements to arvados_cwl_runner.egg-info/requires.txt
writing top-level names to arvados_cwl_runner.egg-info/top_level.txt
reading manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
  File "setup.py", line 60, in <module>
    zip_safe=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 227, in run
    with self.project_on_sys_path():
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 166, in project_on_sys_path
    require('%s==%s' % (ei_cmd.egg_name, ei_cmd.egg_version))
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-auth 2.15.0 (/home/jli/git/arvados/sdk/cwl/.eggs/google_auth-2.15.0-py3.7.egg), Requirement.parse('google-auth<2'), {'arvados-python-client'})
Updated by Jiayong Li about 2 years ago
Ran "pip install -e ." under both arvados/sdk/cwl and arvados/sdk/python.
Now the test command runs, but the test fails for some reason.
$ python setup.py test --test-suite=test.test_container.test_resource_requirements
test_container (unittest.loader._FailedTest) ... ERROR
======================================================================
ERROR: test_container (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_container
Traceback (most recent call last):
  File "/usr/lib/python3.7/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
ModuleNotFoundError: No module named 'test.test_container'
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (errors=1)
Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
Updated by Jiayong Li about 2 years ago
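This command works for the unit test. The package is "tests" rather than "test", and the test suite path has to include the TestContainer class.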
$ python setup.py test --test-suite=tests.test_container.TestContainer.test_resource_requirements
Using /home/jli/git/arvados/sdk/python for version number calculation of /home/jli/git/arvados/sdk/cwl
running test
Searching for subprocess32>=3.5.1
Best match: subprocess32 3.5.4
Processing subprocess32-3.5.4-py3.7.egg
Using /home/jli/git/arvados/sdk/cwl/.eggs/subprocess32-3.5.4-py3.7.egg
Searching for mock<4,>=1.0
Best match: mock 3.0.5
Processing mock-3.0.5-py3.7.egg
Using /home/jli/git/arvados/sdk/cwl/.eggs/mock-3.0.5-py3.7.egg
running egg_info
writing arvados_cwl_runner.egg-info/PKG-INFO
writing dependency_links to arvados_cwl_runner.egg-info/dependency_links.txt
writing entry points to arvados_cwl_runner.egg-info/entry_points.txt
writing requirements to arvados_cwl_runner.egg-info/requires.txt
writing top-level names to arvados_cwl_runner.egg-info/top_level.txt
reading manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
running build_ext
test_resource_requirements (tests.test_container.TestContainer) ... ok
----------------------------------------------------------------------
Ran 1 test in 2.149s
OK
Updated by Jiayong Li about 2 years ago
- Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint
In the cwltool code, I'm trying to understand select_resources in action (method of class MultithreadedJobExecutor in executors.py), and I've found a mention of select_resources in process.py.
if runtimeContext.select_resources is not None:
    # Call select resources hook
    return runtimeContext.select_resources(request_evaluated, runtimeContext)
What I find confusing is the following:
1. runtimeContext is an object of class RuntimeContext (from context.py), and select_resources is a method defined on the MultithreadedJobExecutor class (from executors.py). I don't see how we can call select_resources on runtimeContext.
2. runtimeContext appears both as the object and as the argument, which confuses me. What is this statement doing at a higher level?
Updated by Peter Amstutz about 2 years ago
Jiayong Li wrote in #note-34:
In the cwltool code, I'm trying to understand select_resources in action (method of class MultithreadedJobExecutor in executors.py), and I've found a mention of select_resources in process.py.
[...] What I find confusing is the following:
1. runtimeContext is an object of class RuntimeContext (from context.py), and select_resources is a method defined on the MultithreadedJobExecutor class (from executors.py). I don't see how we can call select_resources on runtimeContext.
2. runtimeContext appears both as the object and as the argument, which confuses me. What is this statement doing at a higher level?
- select_resources is a field on RuntimeContext of type "Callable". That means it is a variable that holds something which can be called as a function. It isn't a method.
- MultithreadedJobExecutor.select_resources is a method that can be assigned to RuntimeContext.select_resources.
- In Python, you can assign "callable = object.method", and then invoking "callable()" has the same behavior as calling "object.method()".
- For this task, we want to provide our own select_resources function or method. The way to make it available at the right place in cwltool is to assign a custom value to the select_resources field on RuntimeContext.
- The code references select_resources as a variable whose value is a function, then calls that function. It passes the runtimeContext object to the function because the function would not necessarily have access to it otherwise.
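The points above can be illustrated with a small stand-alone sketch. The classes here are simplified stand-ins, not the real cwltool RuntimeContext or MultithreadedJobExecutor; Context, Executor, max_cores, and eval_resources are hypothetical names used only to show a bound method being stored in a field and called later.

```python
class Context:
    """Stand-in for RuntimeContext: select_resources is a plain field."""

    def __init__(self):
        self.select_resources = None  # may hold any callable, or None


class Executor:
    """Stand-in for an executor that owns a select_resources method."""

    def __init__(self, max_cores):
        self.max_cores = max_cores

    def select_resources(self, request, context):
        # 'self' is already bound when the method is stored in a field,
        # so the caller only passes (request, context).
        return {"cores": min(request["coresMin"], self.max_cores)}


def eval_resources(request, context):
    """Mirrors the hook check: call the field if it is set."""
    if context.select_resources is not None:
        return context.select_resources(request, context)
    return {"cores": request["coresMin"]}


ctx = Context()
default_result = eval_resources({"coresMin": 4}, ctx)   # no hook set

executor = Executor(max_cores=2)
ctx.select_resources = executor.select_resources        # store the bound method
hooked_result = eval_resources({"coresMin": 4}, ctx)    # hook is called

print(default_result, hooked_result)
```

The context is passed explicitly because the stored callable could just as well be a free function that has no other way to reach it.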
Updated by Jiayong Li about 2 years ago
The assignment happens in main.py:
temp_executor = MultithreadedJobExecutor()
runtimeContext.select_resources = temp_executor.select_resources
Updated by Jiayong Li about 2 years ago
1. There is no explicit mention of select_resources from cwltool.executors in the a-c-r code. How is it used implicitly?
2. How do I use select_resources from cwltool.executors in arvados_cwl.executor without copying and pasting it?
Answer:
Currently we don't use the select_resources hook (runtimeContext.select_resources is None); we use defaultReq as the requirement.
From evalResources in process.py:
if runtimeContext.select_resources is not None:
    # Call select resources hook
    return runtimeContext.select_resources(request_evaluated, runtimeContext)
defaultReq = {
    "cores": request_evaluated["coresMin"],
    "ram": math.ceil(request_evaluated["ramMin"]),
    "tmpdirSize": math.ceil(request_evaluated["tmpdirMin"]),
    "outdirSize": math.ceil(request_evaluated["outdirMin"]),
}