Bug #5365
closed[Workbench] Do not link to objects that are not readable by current user
Added by Nancy Ouyang almost 10 years ago. Updated over 9 years ago.
100%
Description
Specific examples to fix¶
(copied from note-3)
In the specific case of the Log tab, when the log data is not readable by the current user (or anonymous user, in the anonymous case):- The Log tab should be greyed out / disabled / un-clickable
- Hovering the Log tab should show a tooltip, something like "Log data is not available"
- Job output collections
- Job input collections (including docker image)
- Template used to run this pipeline instance
Bug as reported¶
Fiddlesticks when viewing log on anonymous project pathomap
Also, clicking "report" on that fiddlesticks gives another fiddlesticks.
https://workbench.qr1hi.arvadosapi.com/jobs/qr1hi-8i9sb-34f5yfeggzdhn1n#Log
Oh... fiddlesticks.
An error occurred when Workbench sent a request to the Arvados API server. Try reloading this page. If the problem is temporary, your request might go through next time. If that doesn't work, the information below can help system administrators track down the problem.
API request URL
https://qr1hi.arvadosapi.com/arvados/v1/collections/b6b506fa082b773c61f8589b3261b3b0+87
API response
{
":errors":[
"Path not found"
],
":error_token":"1425415698+1e617bf4"
}
If you suspect this is a bug, you can help us fix it by sending us a problem report:
Send a problem report right here. Report problem
If you prefer, send email to: support@curoverse.com
Missing template actions/report_issue_popup, application/report_issue_popup with {:locale=>[:en], :formats=>[:html], :variants=>[], :handlers=>[:erb, :builder, :raw, :ruby, :coffee]}. Searched in: * "/var/www/workbench.qr1hi.arvadosapi.com/releases/20150302143906/themes/curoverse/views" * "/var/www/workbench.qr1hi.arvadosapi.com/releases/20150302143906/app/views" * "/var/www/workbench.qr1hi.arvadosapi.com/shared/vendor_bundle/ruby/2.1.0/gems/wiselinks-1.2.1/app/views"
Updated by Nancy Ouyang almost 10 years ago
curover.se/pathomap > SNP-Annotation-justmap > uuid > log
Updated by Radhika Chippada almost 10 years ago
- Subject changed from Fiddlesticks when viewing log on anonymous project pathomap to [Workbench] Fiddlesticks when viewing log on anonymous project pathomap
- Category set to Workbench
- Target version set to Bug Triage
Updated by Tom Clegg almost 10 years ago
- Story points set to 1.0
Generally, Workbench should not link to objects that are not readable.
In the specific case of the Log tab, when the log data is not readable by the current user (or anonymous user, in the anonymous case):- The Log tab should be greyed out / disabled / un-clickable
- Hovering the Log tab should show a popup, something like "Log data is not available"
- Job output collections
- Job input collections (including docker image)
- Template used to run this pipeline instance
Updated by Tom Clegg almost 10 years ago
- Target version changed from Bug Triage to 2015-04-01 sprint
Updated by Tom Clegg almost 10 years ago
- Subject changed from [Workbench] Fiddlesticks when viewing log on anonymous project pathomap to [Workbench] Do not link to objects that are not readable by current user
- Description updated (diff)
Updated by Radhika Chippada almost 10 years ago
- Assigned To set to Radhika Chippada
Updated by Radhika Chippada almost 10 years ago
- Assigned To deleted (
Radhika Chippada)
Updated by Tom Clegg almost 10 years ago
- Target version changed from 2015-04-01 sprint to 2015-04-29 sprint
Updated by Radhika Chippada almost 10 years ago
- Assigned To set to Radhika Chippada
Updated by Radhika Chippada almost 10 years ago
- Status changed from New to In Progress
Updated by Radhika Chippada over 9 years ago
Notes about implementation:
- No longer dead link job and output when they are not readable by current user (or no user)
- Disable Log tab if not readable by anonymous user. (I felt disabling Log tab when current_user is present might be troublesome. The specific case I was concerned about was: a pipeline or job has started running just now. Log is not yet available. Hence disabling in this case seemed troublesome and hence I only disabled it for anonymous user case.)
- Template link was (already) hidden in pipeline instance page if it is not readable by user.
- Added integration tests and corresponding required test fixtures.
Updated by Brett Smith over 9 years ago
Reviewing 8ad6b6e
- When a logged in user views a pipeline instance page, job UUIDs are not rendered.
- Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use
preload_objects_for_dataclass
, or a strategy similar to that method's implementation, to get the component jobs in a single API query? - Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.
- The new test includes a condition
if objects_readable or (!objects_readable and user)
. I think this would be simpler to read asif objects_readable or user
(as you have it later in the test). - In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically
script
,script_version
,script_parameters
(withinput
set), andrepository
. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.
Thank you.
Updated by Radhika Chippada over 9 years ago
Brett said:
- When a logged in user views a pipeline instance page, job UUIDs are not rendered.
There were multiple issues at this line. Fixed.
- Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use
preload_objects_for_dataclass
, or a strategy similar to that method's implementation, to get the component jobs in a single API query?
Updated code to preload job uuids and output collection uuids. The existing preload methods do not work when preloading collections using portable data hashes. After spending some time to correct this (to add a new method that uses portable_data_hash instead of uuid in where clause), I now dropped this idea and doing a find on each of those. It appears that our api currently does not support group_by and without it all rows matching the pdh are being returned and hence I felt this might not be desirable. It also appears that most of newer pipelines will use uuids for output collections anyways (which are preloaded).
- Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.
Updated this and used it for jobs and outputs.
- The new test includes a condition
if objects_readable or (!objects_readable and user)
. I think this would be simpler to read asif objects_readable or user
(as you have it later in the test).
Thanks for catching this; corrected.
- In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically
script
,script_version
,script_parameters
(withinput
set), andrepository
. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.
Added these.
Radhika said:
- Disable Log tab if not readable by anonymous user. (I felt disabling Log tab when current_user is present might be troublesome. The specific case I was concerned about was: a pipeline or job has started running just now. Log is not yet available. Hence disabling in this case seemed troublesome and hence I only disabled it for anonymous user case.)
I updated to disable Log tab when current user cannot write to the object (which would cover owner situation). With this, we will not be disabling the Log tab for a brand new pipeline with no log data as yet. And this also addresses the scenario where a logged in user is viewing a public pipeline with unreadable logs.
- Just an observation about preloading data: When the user cannot read the objects, the preloading does not result in fetching them. Hence, we will be making one extra api call for each unreadable uuid unfortunately.
Updated by Brett Smith over 9 years ago
Radhika Chippada wrote:
- Just an observation about preloading data: When the user cannot read the objects, the preloading does not result in fetching them. Hence, we will be making one extra api call for each unreadable uuid unfortunately.
I'm worried that this means the we might still witness the poor performance scenarios I mentioned in my last comment. While it's difficult to say right now what the "common" use case will be here, I think it will definitely happen where users sometimes make only minimal inputs and outputs available publicly, while keeping all other details like logs private. If they do that, this implementation can still end up making many API calls.
Would it make sense to extend preload_objects_for_dataclass
to avoid these redundant API calls? For example, maybe it could expressly set @objects_for[uuid] = nil
for any UUID that it doesn't find. That would prevent it from repeating the search in future calls. That has the potential to get expensive quickly.
If the solution isn't that simple for some reason, I still think we need some solution. Maybe it's a more involved rework of preload_objects_for_dataclass
, or maybe it's a separate method. But as much as possible, I think it's worthwhile to avoid making a situation where we make an API call to check the readibility of an individual object.
- Some pipelines can get quite long. The current implementation makes a separate API call to check the readability of every component job. I'm concerned that this could perform poorly for those large pipelines. Could it be possible to use
preload_objects_for_dataclass
, or a strategy similar to that method's implementation, to get the component jobs in a single API query?Updated code to preload job uuids and output collection uuids. The existing preload methods do not work when preloading collections using portable data hashes. After spending some time to correct this (to add a new method that uses portable_data_hash instead of uuid in where clause), I now dropped this idea and doing a find on each of those. It appears that our api currently does not support group_by and without it all rows matching the pdh are being returned and hence I felt this might not be desirable.
It's possible to check the readability of portable data hashes in a list by selecting only the portable_data_hash column, and setting distinct
to ask for unique values. Here's a demonstration using the Python SDK:
>>> cl = api.collections().list( ... filters=[['portable_data_hash', 'in', ['d41d8cd98f00b204e9800998ecf8427e+0']]], ... select=['portable_data_hash'], ... distinct=True ... ).execute() >>> pp(cl) {u'etag': u'', u'items': [{u'kind': u'arvados#collection', u'portable_data_hash': u'd41d8cd98f00b204e9800998ecf8427e+0'}], u'items_available': 1507, u'kind': u'arvados#collectionList', u'limit': 100, u'offset': 0, u'self_link': u''}
Even though the limit is 100 and many more items are available, there's only one item in the response list. You can make this query on all the portable data hashes you're interested in. Any that are included in the response are readable.
- When a logged in user views a pipeline instance page, job UUIDs are not rendered.
There were multiple issues at this line. Fixed.
- Is it possible to write a helper method that does the six-line work of trying to find the object with a given UUID, yielding it to a block for rendering if found, and rendering a static string if not? I'm concerned that we will likely want to repeat this pattern as we continue development on anonymous browsing, and avoiding the repetition now would help.
Updated this and used it for jobs and outputs.
Throughout _running_component, could UUID links always be generated using link_to_arvados_object_if_readable
? That method already checks the resource class just like we're doing for job UUIDs, so I wonder if we could reduce duplication by using the single method call for that. I think it could also be used in other places where we're using link_to_if_arvados_object
, like owner and Docker image. (The Docker image link is specifically named in the description as something that must not be linked.)
The helper method includes a line if resource_class.andand.to_s == 'Collection'
. Is the andand
necessary here? Earlier code returns from the method if !resource_class
, so I don't think resource_class
will ever be nil at this point.
- In the pipeline instance fixtures, I think it could save us potential future headache if the job hashes included at least all the attributes that are required on jobs: specifically
script
,script_version
,script_parameters
(withinput
set), andrepository
. We've run into issues in the past where our testing was less helpful than it should've been because fixtures did not look realistic. Spending a little time making them more realistic could help avoid headaches in the future.Added these.
Thanks. Under these job hashes, the input
parameter hashes should have value
keys that point to the input collections used. crunch-dispatch normally makes that transformation. You can remove the title
key if you like; that's not normally carried over.
Thank you.
Updated by Radhika Chippada over 9 years ago
Addressed the comments above. A few notes:
- Workbench did not offer @distinct before, which made it not possible to get one collection each for a list of portable_data_hashes. I added this now and was able to add a new preload_for_pdhs method.
- Updated pipeline_instances/_show_components_editable.html.erb to check if readable before linking input collections.
- In the running components case, preloading job uuids, outputs, and docker image locators. Any others we want to preload?
Thanks.
Updated by Brett Smith over 9 years ago
Reviewing 02840ce9
The story says, "The Log tab should be greyed out / disabled / un-clickable" when the log is unviewable. Right now that's implemented by disabling the link's data-toggle attribute. However, this means the tab still appears clickable, and even gets focus if you click it; it just doesn't do anything. I think it might be a little nicer to put class="disabled"
on the li
element in this case. This actually grays out the tab, and changes the mouse cursor to show that it's not clickable. I tested this in Firebug and it seemed to work out all right.
Right now, link_to_arvados_object_if_readable
takes a boolean friendly_name
attribute. I wonder if it might be a little nicer and future-proof to have it take an opts={}
argument, just like link_to_if_arvados_object
does? Since the method sometimes falls back to calling link_to_if_arvados_object
, I think it might be nice to give callers more control over the arguments that are passed in when it does—it seems pretty plausible that we'll want that in the future. It also makes the current implementation a little simpler, I think.
In _show_components_editable, tv[:dataclass].andand.eql?('Collection')
can just say tv[:dataclass] == "Collection"
. It's safe to compare against nil
with ==
and !=
.
In the preload_for_pdhs test, I'm not sure what the assertions after # invoke again for this same input. this time, the preloaded data will be returned
do. As far as I can tell, nothing in them checks that the data was actually returned from a cache (e.g., by using #equal?); it just makes the same basic "found the right result" assertions made after the first call. Am I missing something? If not, I think it would be good to update the assertions to better match the comment, or else remove them.
Thanks.
Updated by Radhika Chippada over 9 years ago
- Status changed from In Progress to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|commit:6564944612790934531a3c30e6ab2cab6f329461.
Updated by Radhika Chippada over 9 years ago
Thanks Brett. I incorporated all these suggestions and merged into master. I think the comment "link_to_arvados_object_if_readable takes args" was quite helpful. The method signature is now much better and is similar to other usages. It was a bit involved update to change all the usages, but is worth it. Thanks.