Feature #16316

open

a-c-r handles resource range requests (especially CPU) and adjusts requests based on what is in InstanceTypes list

Added by Peter Amstutz about 4 years ago. Updated over 1 year ago.

Status: In Progress
Priority: Normal
Assigned To:
Category: CWL
Target version:
Start date:
Due date:
% Done: 0%
Estimated time: (Total: 0.00 h)
Story points: 2.0

Description

Implement a version of select_resources for Arvados.

You can get a dictionary of instance types with this:

api.config()["InstanceTypes"]

The select_resources method should, at minimum, accept a range of CPU core values (e.g. coresMin: 4, coresMax: 16) and then check the available InstanceTypes and assign the greatest core count available. For example, if the system is only configured with 2, 4, and 8 core nodes, it should assign 8 cores since it is in the range (4 - 16).

RAM and disk can also have a range. Just return the minimum value for now (this is the existing behavior).

Tell cwltool to use your select_resources method by setting the object field runtimeContext.select_resources.
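
The core-count rule above can be sketched as follows. This is a minimal illustration, not the a-c-r implementation: the helper name pick_cores and the sample data are hypothetical, and it assumes each InstanceTypes entry exposes its core count under a "VCPUs" key. Falling back to coresMin when no instance type is in range is also an assumption, chosen to match the existing behavior.

```python
def pick_cores(cores_min, cores_max, instance_types):
    """Return the largest available core count within [cores_min, cores_max].

    instance_types is assumed to look like api.config()["InstanceTypes"],
    with a "VCPUs" entry per instance type (assumption for this sketch).
    """
    available = {t["VCPUs"] for t in instance_types.values()}
    in_range = [c for c in available if cores_min <= c <= cores_max]
    # Assumption: if nothing fits the range, keep the existing behavior
    # and request the minimum.
    return max(in_range) if in_range else cores_min


# Hypothetical cluster with 2, 4, and 8 core nodes, as in the example above.
instance_types = {
    "small": {"VCPUs": 2},
    "medium": {"VCPUs": 4},
    "large": {"VCPUs": 8},
}
chosen = pick_cores(4, 16, instance_types)
print(chosen)
```

With coresMin: 4 and coresMax: 16, the sketch picks 8, the largest configured core count inside the range.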


Subtasks 1 (1 open, 0 closed)

Task #16354: Review (New, Peter Amstutz)


Related issues

Related to Arvados Epics - Story #17848: CWL runner improvements (In Progress, 07/01/2021 to 06/30/2023)

Actions #1

Updated by Peter Amstutz about 4 years ago

  • Assigned To set to Peter Amstutz
Actions #2

Updated by Peter Amstutz about 4 years ago

  • Target version changed from 2020-05-06 Sprint to 2020-05-20 Sprint
Actions #3

Updated by Peter Amstutz about 4 years ago

  • Target version changed from 2020-05-20 Sprint to 2020-06-03 Sprint
Actions #4

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-06-03 Sprint to 2020-06-17 Sprint
Actions #5

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-06-17 Sprint to 2020-07-01 Sprint
Actions #6

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-07-01 Sprint to 2020-07-15
Actions #7

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-07-15 to 2020-08-12 Sprint
Actions #8

Updated by Peter Amstutz almost 4 years ago

  • Related to Story #16011: CWL support, docs, training, website added
Actions #9

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-08-12 Sprint to 2020-08-26 Sprint
Actions #10

Updated by Peter Amstutz almost 4 years ago

  • Target version changed from 2020-08-26 Sprint to 2020-09-09 Sprint
Actions #11

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2020-09-09 Sprint to 2020-09-23 Sprint
Actions #12

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2020-09-23 Sprint to 2020-10-07 Sprint
Actions #13

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2020-10-07 Sprint to 2020-10-21 Sprint
Actions #14

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2020-10-21 Sprint to 2020-11-04 Sprint
Actions #15

Updated by Peter Amstutz over 3 years ago

  • Target version changed from 2020-11-04 Sprint to 2020-11-18
Actions #16

Updated by Peter Amstutz over 3 years ago

  • Target version deleted (2020-11-18)
Actions #17

Updated by Peter Amstutz about 3 years ago

  • Target version set to 2021-03-31 sprint
Actions #18

Updated by Peter Amstutz about 3 years ago

  • Assigned To changed from Peter Amstutz to Jiayong Li
Actions #19

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-03-31 sprint to 2021-04-14 sprint
Actions #20

Updated by Peter Amstutz about 3 years ago

  • Target version changed from 2021-04-14 sprint to 2021-04-28 bughunt sprint
Actions #21

Updated by Peter Amstutz about 3 years ago

  • Target version deleted (2021-04-28 bughunt sprint)
Actions #22

Updated by Peter Amstutz almost 3 years ago

Actions #23

Updated by Peter Amstutz almost 3 years ago

  • Related to deleted (Story #16011: CWL support, docs, training, website)
Actions #24

Updated by Peter Amstutz over 1 year ago

  • Target version set to 2022-11-09 sprint
Actions #25

Updated by Peter Amstutz over 1 year ago

  • Description updated (diff)
Actions #26

Updated by Jiayong Li over 1 year ago

  • Description updated (diff)
  • Status changed from New to In Progress
Actions #27

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-11-09 sprint to 2022-11-23 sprint
Actions #28

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-11-23 sprint to 2022-12-07 Sprint
Actions #29

Updated by Jiayong Li over 1 year ago

I'm trying to run the unit test "test_resource_requirements" using python3 virtualenv on tordo shell node. I got the following error.

$ python setup.py install

Installed /home/jli/env/acr/lib/python3.7/site-packages/arvados_cwl_runner-2.5.0.dev20221129154757-py3.7.egg
Processing dependencies for arvados-cwl-runner==2.5.0.dev20221129154757
error: pyparsing 3.0.9 is installed but pyparsing<3 is required by {'arvados-python-client'}
$ python setup.py test --test-suite=test.test_container.test_resource_requirements

Using /home/jli/git/arvados/sdk/cwl/.eggs/googleapis_common_protos-1.57.0-py3.7.egg
Traceback (most recent call last):
  File "setup.py", line 60, in <module>
    zip_safe=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 216, in run
    installed_dists = self.install_dists(self.distribution)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 207, in install_dists
    ir_d = dist.fetch_build_eggs(dist.install_requires)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/dist.py", line 724, in fetch_build_eggs
    replace_conflicting=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-auth 1.35.0 (/home/jli/git/arvados/sdk/cwl/.eggs/google_auth-1.35.0-py3.7.egg), Requirement.parse('google-auth<3.0dev,>=2.14.1'), {'google-api-core'})

1. Are these the right commands for unit testing?
2. Is pyparsing 3.0.9 the main issue here?

Actions #30

Updated by Peter Amstutz over 1 year ago

Try merging main; I think that fixes this.

Actions #31

Updated by Jiayong Li over 1 year ago

After trying

$ pip install -e .

I get

$ python setup.py test --test-suite=test.test_container.test_resource_requirements

Installed /home/jli/git/arvados/sdk/cwl/.eggs/mock-3.0.5-py3.7.egg
running egg_info
writing arvados_cwl_runner.egg-info/PKG-INFO
writing dependency_links to arvados_cwl_runner.egg-info/dependency_links.txt
writing entry points to arvados_cwl_runner.egg-info/entry_points.txt
writing requirements to arvados_cwl_runner.egg-info/requires.txt
writing top-level names to arvados_cwl_runner.egg-info/top_level.txt
reading manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
  File "setup.py", line 60, in <module>
    zip_safe=True,
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/__init__.py", line 145, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 227, in run
    with self.project_on_sys_path():
  File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/jli/env/acr/lib/python3.7/site-packages/setuptools/command/test.py", line 166, in project_on_sys_path
    require('%s==%s' % (ei_cmd.egg_name, ei_cmd.egg_version))
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/jli/env/acr/lib/python3.7/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (google-auth 2.15.0 (/home/jli/git/arvados/sdk/cwl/.eggs/google_auth-2.15.0-py3.7.egg), Requirement.parse('google-auth<2'), {'arvados-python-client'})

Actions #32

Updated by Jiayong Li over 1 year ago

Ran "pip install -e ."
under both arvados/sdk/cwl and arvados/sdk/python

Now the test command runs, but the test fails because the test module cannot be imported.

$ python setup.py test --test-suite=test.test_container.test_resource_requirements

test_container (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_container (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_container
Traceback (most recent call last):
  File "/usr/lib/python3.7/unittest/loader.py", line 154, in loadTestsFromName
    module = __import__(module_name)
ModuleNotFoundError: No module named 'test.test_container'

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=1 errors=1 failures=0>

Actions #33

Updated by Jiayong Li over 1 year ago

This command works for running the unit test.

$ python setup.py test --test-suite=tests.test_container.TestContainer.test_resource_requirements
Using /home/jli/git/arvados/sdk/python for version number calculation of /home/jli/git/arvados/sdk/cwl
running test
Searching for subprocess32>=3.5.1
Best match: subprocess32 3.5.4
Processing subprocess32-3.5.4-py3.7.egg

Using /home/jli/git/arvados/sdk/cwl/.eggs/subprocess32-3.5.4-py3.7.egg
Searching for mock<4,>=1.0
Best match: mock 3.0.5
Processing mock-3.0.5-py3.7.egg

Using /home/jli/git/arvados/sdk/cwl/.eggs/mock-3.0.5-py3.7.egg
running egg_info
writing arvados_cwl_runner.egg-info/PKG-INFO
writing dependency_links to arvados_cwl_runner.egg-info/dependency_links.txt
writing entry points to arvados_cwl_runner.egg-info/entry_points.txt
writing requirements to arvados_cwl_runner.egg-info/requires.txt
writing top-level names to arvados_cwl_runner.egg-info/top_level.txt
reading manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'arvados_cwl_runner.egg-info/SOURCES.txt'
running build_ext
test_resource_requirements (tests.test_container.TestContainer) ... ok

----------------------------------------------------------------------
Ran 1 test in 2.149s

OK

Actions #34

Updated by Jiayong Li over 1 year ago

  • Target version changed from 2022-12-07 Sprint to 2022-12-21 Sprint

In the cwltool code, I'm trying to understand select_resources in action (method of class MultithreadedJobExecutor in executors.py), and I've found a mention of select_resources in process.py.

        if runtimeContext.select_resources is not None:
            # Call select resources hook
            return runtimeContext.select_resources(request_evaluated, runtimeContext)

What I find confusing is the following:
1. runtimeContext is an object of class RuntimeContext (from context.py), and select_resources is a method defined on the MultithreadedJobExecutor class (from executors.py). I don't see how we can call select_resources on runtimeContext.
2. runtimeContext appears both as the object and as the argument, which is confusing to me. What is this call doing at a higher level?

Actions #35

Updated by Peter Amstutz over 1 year ago

Jiayong Li wrote in #note-34:

In the cwltool code, I'm trying to understand select_resources in action (method of class MultithreadedJobExecutor in executors.py), and I've found a mention of select_resources in process.py.
[...]

What I find confusing is the following:
1. runtimeContext is an object of class RuntimeContext (from context.py), and select_resources is a method defined on the MultithreadedJobExecutor class (from executors.py). I don't see how we can call select_resources on runtimeContext.
2. runtimeContext appears both as the object and as the argument, which is confusing to me. What is this call doing at a higher level?

  1. select_resources is a field on RuntimeContext of type "Callable", which means it is a variable that holds something callable as a function. It isn't a method.
    1. MultithreadedJobExecutor.select_resources is a method that can be assigned to RuntimeContext.select_resources.
    2. In Python, you can assign "callable = object.method", and invoking "callable()" then behaves the same as calling "object.method()".
    3. For this task, we want to provide our own select_resources function or method. The way to make it available at the right place in cwltool is to assign it to the select_resources field on RuntimeContext.
  2. The code references select_resources as a variable whose value is a function, then calls that function. It passes the runtimeContext object to the function because the function would not necessarily have access to it otherwise.
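
The callable-field mechanism described above can be illustrated with a small self-contained sketch. The classes Context and Executor and the function eval_resources are hypothetical stand-ins for RuntimeContext, MultithreadedJobExecutor, and the hook check in process.py; they are not the cwltool classes themselves.

```python
class Context:
    """Hypothetical stand-in for RuntimeContext: select_resources is a plain field."""

    def __init__(self):
        self.select_resources = None  # holds a callable, or None


class Executor:
    """Hypothetical stand-in for an executor that owns a select_resources method."""

    def __init__(self, max_cores):
        self.max_cores = max_cores

    def select_resources(self, request, runtime_context):
        # Clamp the request to what this executor can offer.
        return {"cores": min(request["coresMin"], self.max_cores)}


def eval_resources(request, runtime_context):
    """Mirror of the hook check: use the field if it has been set."""
    if runtime_context.select_resources is not None:
        return runtime_context.select_resources(request, runtime_context)
    return {"cores": request["coresMin"]}


ctx = Context()
default_result = eval_resources({"coresMin": 4}, ctx)  # no hook set

# Assigning a bound method to the field; calling the field later behaves
# like calling executor.select_resources(...) directly.
ctx.select_resources = Executor(max_cores=2).select_resources
hooked_result = eval_resources({"coresMin": 4}, ctx)
```

The bound method keeps a reference to its Executor instance, which is why the field can be called without passing the executor explicitly.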
Actions #36

Updated by Jiayong Li over 1 year ago

The assignment happens in main.py:

                temp_executor = MultithreadedJobExecutor()
                runtimeContext.select_resources = temp_executor.select_resources

Actions #37

Updated by Jiayong Li over 1 year ago

1. There is no explicit mention of select_resources from cwltool.executors in a-c-r code, how is it implicitly used?
2. How do I use select_resources from cwltool.executors in arvados_cwl.executor without copy/pasting?

Answer:
Currently we don't use the select_resources hook (runtimeContext.select_resources is None), so evalResources falls through and uses defaultReq as the requirement.

process.py, evalResources:

        if runtimeContext.select_resources is not None:
            # Call select resources hook
            return runtimeContext.select_resources(request_evaluated, runtimeContext)

        defaultReq = {
            "cores": request_evaluated["coresMin"],
            "ram": math.ceil(request_evaluated["ramMin"]),
            "tmpdirSize": math.ceil(request_evaluated["tmpdirMin"]),
            "outdirSize": math.ceil(request_evaluated["outdirMin"]),
        }
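
One way a custom hook could replace the defaultReq fallback is sketched below. This is an assumption-laden illustration rather than the a-c-r code: make_select_resources is a hypothetical factory, the "VCPUs" key is assumed to be present on each instance type, and SimpleNamespace stands in for a real RuntimeContext. The returned dictionary uses the same keys as defaultReq, and RAM and disk keep their minimum values.

```python
import math
from types import SimpleNamespace


def make_select_resources(instance_types):
    """Build a hook with the (request, runtime_context) signature used by the hook call."""
    core_counts = {t["VCPUs"] for t in instance_types.values()}

    def select_resources(request, runtime_context):
        lo = request["coresMin"]
        hi = request.get("coresMax", lo)
        candidates = [c for c in core_counts if lo <= c <= hi]
        return {
            # Largest configured core count in range, else the minimum (assumption).
            "cores": max(candidates, default=lo),
            # RAM and disk: return the minimum, as in defaultReq.
            "ram": math.ceil(request["ramMin"]),
            "tmpdirSize": math.ceil(request["tmpdirMin"]),
            "outdirSize": math.ceil(request["outdirMin"]),
        }

    return select_resources


# SimpleNamespace is a stand-in for RuntimeContext in this sketch.
runtime_context = SimpleNamespace(select_resources=None)
runtime_context.select_resources = make_select_resources(
    {"a": {"VCPUs": 2}, "b": {"VCPUs": 8}}
)
chosen = runtime_context.select_resources(
    {"coresMin": 4, "coresMax": 16, "ramMin": 256.5, "tmpdirMin": 1024, "outdirMin": 1024},
    runtime_context,
)
```

Using a closure keeps the instance type data out of the hook's argument list, so the hook matches the two-argument call made in evalResources.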

Actions #38

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2022-12-21 Sprint to 2023-01-18 sprint
Actions #39

Updated by Peter Amstutz over 1 year ago

  • Target version changed from 2023-01-18 sprint to 2023-02-01 sprint
Actions #40

Updated by Peter Amstutz over 1 year ago

  • Story points set to 2.0