Story #3036: [API] Use regular uuids instead of content hashes to identify collections - Arvados

#1

Updated by Tom Clegg almost 11 years ago

Description updated (diff)

#2

Updated by Tom Clegg over 10 years ago

Target version set to 2014-08-06 Sprint

#3

Updated by Tom Clegg over 10 years ago

Subject changed from Use regular uuids instead of content hashes to identify collections to [API] Use regular uuids instead of content hashes to identify collections

#4

Updated by Tom Clegg over 10 years ago

Description updated (diff)
Category set to API

#5

Updated by Tom Clegg over 10 years ago

Target version changed from 2014-08-06 Sprint to Arvados Future Sprints

#6

Updated by Tom Clegg over 10 years ago

Subject changed from [API] Use regular uuids instead of content hashes to identify collections to [API] [Draft] Use regular uuids instead of content hashes to identify collections
Description updated (diff)

#7

Updated by Tom Clegg over 10 years ago

Description updated (diff)

#8

Updated by Tom Clegg over 10 years ago

Description updated (diff)

#9

Updated by Tom Clegg over 10 years ago

Description updated (diff)

#10

Updated by Tom Clegg over 10 years ago

Description updated (diff)

#11

Updated by Tom Clegg over 10 years ago

Target version changed from Arvados Future Sprints to 2014-08-27 Sprint

#12

Updated by Peter Amstutz over 10 years ago

How does this interact with crunch? Do they continue to use manifest hashes, or switch to using uuids?

#13

Updated by Peter Amstutz over 10 years ago

Assigned To set to Peter Amstutz

#14

Updated by Tom Clegg over 10 years ago

Subject changed from [API] [Draft] Use regular uuids instead of content hashes to identify collections to [API] Use regular uuids instead of content hashes to identify collections
Description updated (diff)

#15

Updated by Tom Clegg over 10 years ago

Description updated (diff)

#16

Updated by Peter Amstutz over 10 years ago

Status changed from New to In Progress

#17

Updated by Tom Clegg over 10 years ago

At d08c3a5...

sdk/python/arvados/commands/put.py

I think it would be neater to provide owner_uuid up front in the initial collections().create() call (I think this will work equally well before and after #3036, unlike the name attribute).

sdk/python/tests/test_arv_put.py

I know this seems trivial but please update indentation when this sort of thing happens (3 occurrences here):

-        link = self.run_and_find_link("Test unnamed collection",
+        link = self.run_and_find_collection("Test unnamed collection",
                                       ['--project-uuid', self.PROJECT_UUID])

services/api/app/controllers/application_controller.rb

While we're at it, couldn't this be reduced to

-    if (@object and @object.respond_to? :errors and
-        @object.errors and @object.errors.full_messages and
-        not @object.errors.full_messages.empty?)
+    if (@object.respond_to? :errors and
+        @object.errors.andand.full_messages.andand.any?)
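
To see why the shorter condition covers the same cases, here is a minimal plain-Ruby sketch. It assumes Ruby 2.3+ and uses the built-in safe-navigation operator (`&.`) as a stand-in for the andand gem; `FakeErrors`, `FakeRecord`, and both helper methods are hypothetical names made up for this illustration, not code from the branch.

```ruby
# Hypothetical stand-ins for a model and its errors object (not Arvados code).
FakeErrors = Struct.new(:full_messages)
FakeRecord = Struct.new(:errors)

# Original, verbose form of the check.
def errors_verbose?(object)
  !!(object and object.respond_to?(:errors) and
     object.errors and object.errors.full_messages and
     !object.errors.full_messages.empty?)
end

# Shortened form; `&.` plays the role of `.andand` here (an assumption
# made so the sketch needs no extra gem).
def errors_short?(object)
  !!(object.respond_to?(:errors) and
     object.errors&.full_messages&.any?)
end

CASES = [
  nil,                                         # no object at all
  "not a record",                              # does not respond to :errors
  FakeRecord.new(nil),                         # errors is nil
  FakeRecord.new(FakeErrors.new(nil)),         # full_messages is nil
  FakeRecord.new(FakeErrors.new([])),          # no messages
  FakeRecord.new(FakeErrors.new(["Name bad"])) # one message
]

# Each pair holds [verbose result, short result] for one case.
RESULTS = CASES.map { |c| [errors_verbose?(c), errors_short?(c)] }
```

Both forms agree on every case because `nil.respond_to?(:errors)` is already false, so the leading `@object and` guard is redundant.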

(ok, now we're getting to the fun part)

services/api/app/controllers/arvados/v1/collections_controller.rb

Shouldn't this be done with a model validation? That would protect all create/update operations, rather than just ones that come through CollectionsController#create, and reporting would be more consistent. (Bypassing render_error and going directly to send_error seems especially snowflakey.) I suspect the only reason we used to do all this work on resource_attrs, instead of doing this work in the model, is that act_as_system_user makes the model think everyone's an admin. Now that we don't need this, should we move the "check signatures" stuff into a validation on the Collection model?

```
+    if !resource_attrs[:manifest_text]
+      return send_error("'manifest_text' attribute must be specified",
+                        status: :unprocessable_entity)
+    end
```

(One way or another, we'd better check signatures during #update too, if manifest_text_changed? -- I don't see that here yet.)

Ah, so nice to see that act_as_system_user block in #create disappear into history.

CollectionsController#find_objects_for_index seems like the wrong place to do something that's only needed by #update. There's now a cleaner solution to this general column dependency problem in ApplicationController.apply_where_limit_order_params. Perhaps it's reasonable to put "always select the :id column" in there with a comment about the future headaches that strategy is likely to avoid.

+    if @select.nil?
+      @select = model_class.api_accessible_attributes(:user).map { |attr_spec|attr_spec.first.to_s }
+      @select -= ["manifest_text"]
+      # have to make sure 'id' column is included or #update will break.
+      @select += ["id"]
     end

services/api/app/helpers/collections_helper.rb

New stripped_portable_data_hash method seems to be a re-implementation of Locator.parse!(uuid).strip_hints.to_s and isn't used by anything. Remove?

services/api/app/controllers/arvados/v1/groups_controller.rb

Add a comment near include_linked in _index_requires_parameters scheduling it for deletion, so it's not just a mystery why it's listed but not used.

services/api/app/controllers/arvados/v1/jobs_controller.rb

I noticed this new code issues a separate find() query for each result. Then I noticed new code in uuids_for_docker_image itself issues a separate find() query for each result. Then I noticed the next move made by each of the (now) two callers of uuids_for_docker_image is to issue one or more find() queries on the results. If that's what the callers want, then I suggest renaming uuids_for_docker_image to find_all_for_docker_image and having it return the collections instead of just the uuids. So, instead of doing this in jobs_controller:

```
Collection.uuids_for_docker_image(image_search, image_tag, @read_users).map do |uuid|
  Collection.find_by_uuid(uuid).portable_data_hash
end
```

...could we do this in Collection.find_all_for_docker_image()?

    matches = Collection.where('uuid in (?)', matches)
    matches.sort_by! do |collection|
      ...
    end
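
The query-count difference behind this suggestion can be sketched without a database. The in-memory `FakeCollectionStore` below is a hypothetical stand-in that only counts lookups; its method names and the sample uuids and hashes are assumptions for illustration, not Arvados or ActiveRecord APIs.

```ruby
# Hypothetical in-memory stand-in for the collections table. It only counts
# how many lookups are issued; none of these names are Arvados APIs.
class FakeCollectionStore
  Row = Struct.new(:uuid, :portable_data_hash)
  attr_reader :query_count

  def initialize(rows)
    @rows = rows
    @query_count = 0
  end

  # One query per uuid, like calling find_by_uuid inside a loop.
  def find_by_uuid(uuid)
    @query_count += 1
    @rows.find { |row| row.uuid == uuid }
  end

  # One query for the whole batch, like where('uuid in (?)', uuids).
  def where_uuid_in(uuids)
    @query_count += 1
    @rows.select { |row| uuids.include?(row.uuid) }
  end
end

ROWS = [
  FakeCollectionStore::Row.new("uuid-a", "hash-a"),
  FakeCollectionStore::Row.new("uuid-b", "hash-b"),
  FakeCollectionStore::Row.new("uuid-c", "hash-c")
]
WANTED = ["uuid-c", "uuid-a"]

# Per-row lookups: one query for each wanted uuid.
per_row_store = FakeCollectionStore.new(ROWS)
PER_ROW_HASHES = WANTED.map { |uuid| per_row_store.find_by_uuid(uuid).portable_data_hash }
PER_ROW_QUERIES = per_row_store.query_count

# Batch lookup: a single query returns every matching row.
batch_store = FakeCollectionStore.new(ROWS)
BATCH_HASHES = batch_store.where_uuid_in(WANTED).map(&:portable_data_hash)
BATCH_QUERIES = batch_store.query_count
```

The batch result comes back in table order rather than in the order of the requested uuids, which is why a sort step still follows the lookup in the snippet above.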

services/api/app/models/arvados_model.rb

Why is it necessary to prevent subclasses from overriding resource_class_for_uuid as used here?

-      while (owner_class = self.class.resource_class_for_uuid(x)) != User
+      while (owner_class = ArvadosModel::resource_class_for_uuid(x)) != User

(Another occurrence in ensure_owner_uuid_is_permitted)

Error message "can only be set to User or Group" → more direct "must be User or Group"?

services/api/app/models/collection.rb

Collection.new.valid? crashes because manifest_text is nil during set_portable_data_hash. If it weren't for that, I think ensure_hash_matches_manifest_text would report true because neither attribute has changed. If both attributes are nil, I think it would be a bit neater to
- not crash before_validation, and
- return false from ensure_hash_matches_manifest_text (or a separate validation) when manifest_text is nil.
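
A minimal plain-Ruby sketch of the two points above, with no ActiveRecord involved. `SketchCollection` is a hypothetical stand-in for the model, the error strings are invented, and the hash format used here (MD5 hex digest of the manifest, then `+`, then its size in bytes) is an assumption for illustration rather than the branch's actual implementation.

```ruby
require 'digest/md5'

# Hypothetical plain-Ruby stand-in for the Collection model (no ActiveRecord).
class SketchCollection
  attr_accessor :manifest_text, :portable_data_hash
  attr_reader :errors

  def initialize(manifest_text: nil, portable_data_hash: nil)
    @manifest_text = manifest_text
    @portable_data_hash = portable_data_hash
    @errors = []
  end

  # before_validation step: skip quietly when there is nothing to hash.
  def set_portable_data_hash
    return if manifest_text.nil?
    # Assumed format: MD5 hex digest of the manifest, "+", its size in bytes.
    self.portable_data_hash ||=
      "#{Digest::MD5.hexdigest(manifest_text)}+#{manifest_text.bytesize}"
  end

  # Validation: a nil manifest_text is reported as an error, not a crash.
  def ensure_hash_matches_manifest_text
    if manifest_text.nil?
      errors << "manifest_text must be specified"
      return false
    end
    expected = "#{Digest::MD5.hexdigest(manifest_text)}+#{manifest_text.bytesize}"
    return true if portable_data_hash == expected
    errors << "portable_data_hash does not match manifest_text"
    false
  end

  # Runs the before_validation step, then the validation.
  def valid?
    @errors = []
    set_portable_data_hash
    ensure_hash_matches_manifest_text
  end
end
```

With this shape, `SketchCollection.new.valid?` returns false with an error recorded instead of raising, while a collection with an empty (but non-nil) manifest still validates.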

services/api/app/models/link.rb

Update indent to suit new code:

     if link_class == 'name'
-      unless name.is_a? String and !name.empty?
-        errors.add('name', 'must be a non-empty string')
-      end
+        errors.add('name', 'Name links are obsolete')
     else

services/api/app/views/

Thanks for cleaning up the unused views.

services/api/db/migrate/20140811184643_collection_use_regular_uuids.rb

Suggest expires_at instead of expire_time to be consistent with our other timestamp columns.
Down-migration should admit failure instead of wedging your database into a state where it can't migrate back up again, either. You want this:

```
def down
  raise ActiveRecord::IrreversibleMigration, "Explain why its irreversible!"
end
```

services/api/db/migrate/20140815171049_add_name_description_columns.rb

There's another story on this sprint (#2875) that adds description to PipelineInstance. Hiding this migration in a big commit in a long-lived branch can create unnecessary merging/backporting awkwardness. (I put a note on #2875 which should be enough to avert some extra work.)

services/api/lib/has_uuid.rb

This produces messages like "uuid Not permitted to specify uuid". Change to something like (:uuid, "assignment not permitted") / "change not permitted" ...?

+        if self.new_record?
+          self.errors.add(:uuid, "Not permitted to specify uuid")
+        else
+          self.errors.add(:uuid, "Not permitted to change uuid")
+        end

This message is still a bit wonky too. Perhaps: "has type segment '#{re[1]}', expected [...]" ...?

+            self.errors.add(:uuid, "Matched uuid type '#{re[1]}', expected '#{self.class.uuid_prefix}'")
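
A rough sketch of how the suggested wording reads once the attribute name is prepended. The uuid layout assumed here (5-character prefix, 5-character type segment, 15-character suffix), both helper methods, the "is not a valid uuid" fragment, and the sample values are assumptions for illustration only, not the code in has_uuid.rb.

```ruby
# Assumed uuid layout for this sketch: 5-char prefix, 5-char type segment,
# 15-char suffix, separated by hyphens.
UUID_PATTERN = /\A[0-9a-z]{5}-([0-9a-z]{5})-[0-9a-z]{15}\z/

# Returns error fragments for the :uuid attribute, or [] if none apply.
# (Hypothetical helper; the "is not a valid uuid" wording is invented here.)
def uuid_type_errors(uuid, expected_type)
  re = UUID_PATTERN.match(uuid)
  return ["is not a valid uuid"] if re.nil?
  return [] if re[1] == expected_type
  ["has type segment '#{re[1]}', expected '#{expected_type}'"]
end

# Mimics how the attribute name is prepended to each fragment.
def full_messages(attribute, fragments)
  fragments.map { |fragment| "#{attribute} #{fragment}" }
end

# Illustrative values: a uuid whose type segment does not match.
SAMPLE_UUID = "zzzzz-tpzed-" + "0" * 15
MESSAGES = full_messages(:uuid, uuid_type_errors(SAMPLE_UUID, "4zz18"))
```

Because the attribute name leads the message, fragments that start with a verb ("has type segment ...", "assignment not permitted") read more naturally than ones that repeat "uuid".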

Posting this so I'm not keeping you waiting. Review still todo:

Review big migration
Review the tests
Run migrations
Run tests

Thanks!

#18

Updated by Peter Amstutz over 10 years ago

Status changed from In Progress to New

Tom Clegg wrote:

> At d08c3a5...
>
> sdk/python/arvados/commands/put.py
>
> I think it would be neater to provide owner_uuid up front in the initial collections().create() call (I think this will work equally well before and after #3036, unlike the name attribute).

Fixed

> sdk/python/tests/test_arv_put.py
>
> I know this seems trivial but please update indentation when this sort of thing happens (3 occurrences here):
>
> [...]

Fixed

> services/api/app/controllers/application_controller.rb
>
> While we're at it, couldn't this be reduced to
>
> [...]

Fixed

> services/api/app/controllers/arvados/v1/collections_controller.rb
>
> Shouldn't this be done with a model validation? That would protect all create/update operations, rather than just ones that come through CollectionsController#create, and reporting would be more consistent. (Bypassing render_error and going directly to send_error seems especially snowflakey.) I suspect the only reason we used to do all this work on resource_attrs, instead of doing this work in the model, is that act_as_system_user makes the model think everyone's an admin. Now that we don't need this, should we move the "check signatures" stuff into a validation on the Collection model?
>
> [...]
>
> (One way or another, we'd better check signatures during #update too, if manifest_text_changed? -- I don't see that here yet.)
>
> Ah, so nice to see that act_as_system_user block in #create disappear into history.

Fixed.

> CollectionsController#find_objects_for_index seems like the wrong place to do something that's only needed by #update. There's now a cleaner solution to this general column dependency problem in ApplicationController.apply_where_limit_order_params. Perhaps it's reasonable to put "always select the :id column" in there with a comment about the future headaches that strategy is likely to avoid.

Fixed.

> services/api/app/helpers/collections_helper.rb
>
> New stripped_portable_data_hash method seems to be a re-implementation of Locator.parse!(uuid).strip_hints.to_s and isn't used by anything. Remove?

Removed.

> services/api/app/controllers/arvados/v1/groups_controller.rb
>
> Add a comment near include_linked in _index_requires_parameters scheduling it for deletion, so it's not just a mystery why it's listed but not used.

Added comment.

> services/api/app/controllers/arvados/v1/jobs_controller.rb
>
> I noticed this new code issues a separate find() query for each result. Then I noticed new code in uuids_for_docker_image itself issues a separate find() query for each result. Then I noticed the next move made by each of the (now) two callers of uuids_for_docker_image is to issue one or more find() queries on the results. If that's what the callers want, then I suggest renaming uuids_for_docker_image to find_all_for_docker_image having it return the collections instead of just the uuids. So, instead of doing this in jobs_controller:

Refactored.

> services/api/app/models/arvados_model.rb
>
> Why is it necessary to prevent subclasses from overriding resource_class_for_uuid as used here?
>
> [...]
>
> (Another occurrence in ensure_owner_uuid_is_permitted)

Fixed.

> Error message "can only be set to User or Group" → more direct "must be User or Group"?

Fixed.

> services/api/app/models/collection.rb
>
> Collection.new.valid? crashes because manifest_text is nil during set_portable_data_hash. If it weren't for that, I think ensure_hash_matches_manifest_text would report true because neither attribute has changed. If both attributes are nil, I think it would be a bit neater to
>
> - not crash before_validation, and
> - return false from ensure_hash_matches_manifest_text (or a separate validation) when manifest_text is nil.

Fixed.

> services/api/app/models/link.rb
>
> Update indent to suit new code:
>
> [...]

Fixed.

> services/api/db/migrate/20140811184643_collection_use_regular_uuids.rb
>
> Suggest expires_at instead of expire_time to be consistent with our other timestamp columns.
>
> Down-migration should admit failure instead of wedging your database into a state where it can't migrate back up again, either. You want this:
>
> [...]

Fixed

> services/api/db/migrate/20140815171049_add_name_description_columns.rb
>
> There's another story on this sprint (#2875) that adds description to PipelineInstance. Hiding this migration in a big commit in a long-lived branch can create unnecessary merging/backporting awkwardness. (I put a note on #2875 which should be enough to avert some extra work.)

First one to merge wins?

> services/api/lib/has_uuid.rb
>
> This produces messages like "uuid Not permitted to specify uuid". Change to something like (:uuid, "assignment not permitted") / "change not permitted" ...?
>
> [...]
>
> This message is still a bit wonky too. Perhaps: "has type segment '#{re[1]}', expected [...]" ...?
>
> [...]

Fixed.

> Posting this so I'm not keeping you waiting. Review still todo:
>
> Review big migration
> Review the tests
> Run migrations
> Run tests

Still waiting on these?

#19

Updated by Peter Amstutz over 10 years ago

Status changed from New to In Progress

#20

Updated by Peter Amstutz over 10 years ago

I'm thinking about backing out the owner/name uniqueness for jobs and pipeline instances. It seems typical that the name of a pipeline instance or job will be copied from the template/component/run script and will yield many similarly named jobs or instances; making them unique by adding a timestamp isn't very interesting when there is already a real timestamp field.

#21

Updated by Tom Clegg over 10 years ago

Peter Amstutz wrote:

> > There's another story on this sprint (#2875) that adds description to PipelineInstance. Hiding this migration in a big commit in a long-lived branch can create unnecessary merging/backporting awkwardness. (I put a note on #2875 which should be enough to avert some extra work.)
>
> First one to merge wins?

Better communication → everybody wins :P

All of the changes look good, thanks.

> > Posting this so I'm not keeping you waiting. Review still todo:
> >
> > Review big migration
> > Review the tests
> > Run migrations
> > Run tests
>
> Still waiting on these?

Meh, don't really want to hold up the merge for that. I say go ahead. Thanks!

#22

Updated by Anonymous over 10 years ago

Status changed from In Progress to Resolved
% Done changed from 85 to 100

Applied in changeset arvados|commit:61cd57499905e8e8cca07c774d1bf8c6bfa069a7.


Summary

Background (current behavior)

New behavior

Looking up collections by portable data hash
