Idea #20311
closedUpdate Python packages to build with PEP 517/518
Description
Assuming we want to continue to build with setuptools, this means converting our packages' current setup.py to setup.cfg, then adding a (boilerplate) setup.py and pyproject.toml to them.
Reasons to do this:
- Gives us the option to build packages using other systems in the future if we need or want that for any reason.
- In my experience, maintining
setup.cfgis nicer thansetup.py, since it's more declarative. - Some of the interfaces of
setup.pyare deprecated (I think we're already getting warnings about this), and this gives us a clear migration path off those too.
Background reading:
- PyPA documentation
- The PEPs
- Brett Cannon blogs: 1, 2
This comment provides an example of extending the build with a custom Command added as a build sub-command. We should update source:sdk/python/setup.py to follow this example.
Then similarly, all copies of arvados_version.py should grow a custom Command and build subclass that writes _version.py in the source tree. All the file writing should happen in that Command subclass. It's helpful if version generation stays out of it for the sake of other callers like source:build/build_docker_image.py.
When we write pyproject.toml, we specify version and dependencies as dynamic. Then setup.py uses arvados_version.py to gather that information and pass it to setup() as now.
Updated by Brett Smith over 2 years ago
- Related to Bug #20723: Stop running setup.py in our build+test infrastructure (will be deprecated by Oct 31st 2025) added
Updated by Brett Smith over 2 years ago
Python 3.12 will not preinstall setuptools in new virtualenvs. This will break our current build processes in various places. Doing this ticket would be the best way to get ahead of things.
Updated by Brett Smith over 2 years ago
Brett Smith wrote:
- Some of the interfaces of
setup.pyare deprecated (I think we're already getting warnings about this), and this gives us a clear migration path off those too.
To put more color on this: all of the interfaces of setup.py are deprecated. As the top of that post says,
This does not mean that setuptools itself is deprecated, or that using setup.py to configure your package builds is going to be removed. The only thing you must stop doing is directly executing the setup.py file — instead delegate that to purpose-built or standards-based tools, preferably those that work with any build backend.
We have been seeing this deprecation start to cause issues in some of our dependencies, which makes me think this migration is becoming more urgent.
For us, the path I think we want to take is: wherever we run setup.py install, run pip instead. We've already had a couple commits along these lines: see 20432a4533136a5ab9fa52c2e2ec2d90a855ecfb and upcoming d1e529905d4820ce450dc430139cccda83fefc72.
Wherever we run setup.py build, that needs to be updated to something like python -m build using the build module. There are other alternatives, this is just the simplest one I know about. However, before we can do that, we need to migrate to using at least pyproject.toml as specified in this ticket. Migrating setup.py build calls can either be part of this ticket, or a separate follow-up.
Updated by Peter Amstutz almost 2 years ago
- Target version changed from Future to Development 2024-05-08 sprint
Updated by Brett Smith almost 2 years ago
- Target version changed from Development 2024-05-08 sprint to Future
Updated by Peter Amstutz almost 2 years ago
- Related to Bug #21721: Review Python SDK dependencies added
Updated by Brett Smith over 1 year ago
All our Python tools come with an identical arvados_version.py that knows how to generate the version number, interdependencies, and a _version.py submodule with this information. This is automatically imported and run from setup.py.
As we incrementally remove setup.py invocations, we're running into situations where tools do not need to run setup.py directly and therefore can run on modules without _version.py (which fails):
pip install SOURCEDIRfirst copies the source to an isolated directory. This promotes build isolation but means that whenarvados_version.pyis run from that directory, it can't find any version information to work with.pytestdoesn't necessarily need to runsetup.pyto run tests.
Right now all the smarts you need are in build/run-tests.sh, but because that's a big script doing a lot it means it's not obvious to follow, and working on it can be error-prone if you don't deeply understand the order of steps and why they're required.
As we look at migrating to pyproject.toml, I think we need a fresh look at our options for dynamically generating this metadata, and try to do so in a way that more tools will pick up.
Updated by Brett Smith 9 months ago
- Release set to 79
- Target version deleted (
Future) - Category changed from Deployment to SDKs
Updated by Brett Smith 8 months ago
20311-pep-517-builds @ 713d1452405279b8a5d9fdcaed451e4004f06d6e - developer-run-tests: #4843
⚠️ WARNING ⚠️ WARNING ⚠️ WARNING ⚠️
This branch does a lot to modernize and clean up our Python build process. However, it is prone to pick up stale files and information from previous builds, and that can cause weird errors. Once you have this branch in your $WORKSPACE, I strongly recommend you clean up:
rm -rf VENV3DIRin yourrun-tests.shtemp directory.- Remove all Python
buildanddistdirectories:grep -E '/(build|dist)/$' .gitignore | cut -c2- | xargs -d\\n rm -rf
- Remove all
_version.pyfiles from modules: these have been removed from.gitignoreso any that show up in yourgit statusshould be deleted.
I suspect this is the cause of the CWL test failures: the two build systems organize inclusion of package data differently, but use the same build directory, so I suspect the test run is picking that up from when arvados-cwl-runner was installed as part of building the Jenkins worker image. I can't reproduce locally even when I try to reproduce Jenkins' conditions as closely as possible (Debian 12; no VENV3DIR; run-tests.sh --only install from main; run-tests.sh --only sdk/cwl from the branch). We might need to build a new Jenkins test image when this branch merges. It's been long enough that's not a bad idea anyway just to keep everything up-to-date.
With that caveat:
- All agreed upon points are implemented / addressed. Describe changes from pre-implementation design.
- Yes. And the proof is in the pudding because it goes even farther and removes all the instances we run
setup.py(except for a couple of components we're going to remove in Arvados 3.2.0), which is deprecated and will stop being supported in October.
- Yes. And the proof is in the pudding because it goes even farther and removes all the instances we run
- Anything not implemented (discovered or discussed during work) has a follow-up story.
- N/A
- Code is tested and passing, both automated and manual, what manual testing was done is described.
- See above.
- Tested code incorporates recent main branch changes.
- Yes
- New or changed UI/UX and has gotten feedback from stakeholders.
- There are no UI changes to interal tooling. UI changes to Python package building are required externally.
- Documentation has been updated.
- Mostly N/A because of the above. I have done my best to clean everything up so all our comments and tooling are consistent that we've switched to PEP 517 builds.
- Behaves appropriately at the intended scale (describe intended scale).
- Again, any change in scale is imposed externally.
- Considered backwards and forwards compatibility issues between client and server.
- Ditto.
- Follows our coding standards and GUI style guidelines.
- Yes
Updated by Brett Smith 8 months ago
- Target version set to Development 2025-08-06
- Assigned To set to Brett Smith
- Status changed from New to In Progress
Updated by Tom Clegg 8 months ago
I cleaned up my dev environment as instructed and everything worked.
I sure wish there was a way to not have 7 copies of arvados_version.py in the source tree. Failing that, would it be reasonable to have a test to ensure they stay in sync?
I'm not keen on leaving the stale setup.py usage in arvados-server install. I'm guessing you were thinking it's easy to update but annoying to test? Maybe the best solution is [for me] to remove prod/pkg mode from arvados-server install, and remove arvados-package (which relies on arvados-server install -type package)?
Rest LGTM.
Updated by Brett Smith 8 months ago
Tom Clegg wrote in #note-13:
I sure wish there was a way to not have 7 copies of arvados_version.py in the source tree. Failing that, would it be reasonable to have a test to ensure they stay in sync?
Good news, we already have that test. See test_arvados_version.py() in source:build/run-tests.sh.
I'm not keen on leaving the stale setup.py usage in
arvados-server install. I'm guessing you were thinking it's easy to update but annoying to test? Maybe the best solution is [for me] to remove prod/pkg mode fromarvados-server install, and removearvados-package(which relies onarvados-server install -type package)?
Honestly I didn't even dig that deep. My attitude is, I personally consider #22436 a requirement for 3.2.0 (note it's been release-tagged for a while already) and therefore it wasn't worth spending any brain cells on. I think we're on the same page about where we want to be, we're just negotiating task ordering.
Removing arvados-package is easy, happy to roll that in. With that done, is there a solution you would accept for arvados-server install that requires less code surgery, given that the entire thing is slated to go away imminently? Feels like a bad use of time to carefully delete chunks out of it given that we're going to delete it all soon. Thanks.
Rest LGTM.
Given that you've LGTM'ed all the new code, I am going to build a new Jenkins worker image from the branch, and then test that the branch passes tests on the new image. I want to confirm that my theory about the CWL tests is correct before I merge.
Updated by Tom Clegg 8 months ago
Brett Smith wrote in #note-14:
Removing
arvados-packageis easy, happy to roll that in. With that done, is there a solution you would accept forarvados-server installthat requires less code surgery, given that the entire thing is slated to go away imminently? Feels like a bad use of time to carefully delete chunks out of it given that we're going to delete it all soon. Thanks.
We could log an error and return 1 when we get to the "install python sdk" section. That should make it easy to see the dead end if someone does find themselves trying to run arvados-server install after this merge but before we remove it.
Updated by Brett Smith 8 months ago
Tom Clegg wrote in #note-15:
We could log an error and return 1 when we get to the "install python sdk" section. That should make it easy to see the dead end if someone does find themselves trying to run
arvados-server installafter this merge but before we remove it.
To show you how few brain cells I spent on it previously: we previously updated all instances of setup.py install to pip install, so I assumed this said setup.py build because it needed to do something lower-level. Now that I see it's just a workaround to install from readonly source, the simplest thing to do is update it to pip install: the new build process already provides source isolation and won't write to the source tree. And if I'm wrong, it'll error out, future us will find the commit, and we'll understand why.
My previous theory about the CWL tests was wrong but fixing it was adding one line to sdk/cwl/MANIFEST.in. I still don't understand what's different between our development servers and Jenkins in this regard, but it's no problem to have one line confirming that we want these files in our packages, so here we are.
Updated by Brett Smith 8 months ago
- Status changed from In Progress to Resolved
Applied in changeset arvados|0d5b0aac01dd577504e10ec16d3832dc2a852190.
Updated by Brett Smith 7 months ago
- Related to Bug #23105: Successive Python builds apparently build into each other added
Updated by Brett Smith 5 months ago
- Related to Bug #22590: arvados-user-activity and arvados-cluster-activity missing python package descriptions added