Support #6357: Review curoverse.link - Arvados

Actions

Copy link

Support #6357

closed

Review curoverse.link

Added by Brett Smith almost 11 years ago. Updated over 10 years ago.

Status:

Resolved

Priority:

Normal

Assigned To:

Tom Clegg

Category:

Target version:

2015-07-08 sprint

Due date:

Story points:

1.0

Description

[Tom, Radhika and I thought it might make sense for you to take this up as part of the user support story. It's at least tangentially related. If that's an issue, let us know and we'll figure out a plan B.]

Abram has written up curoverse.link, which is basically a smarter version of curover.se. In particular, it can query different clusters to see find one that has a collection with a specific content address, and redirect the user there. It caches these results.

Right now curover.se is just an Apache RewriteRule. Assume that we want to replace that with curoverse.link, to get support for content addresses (and in the future, other kinds of redirects). Please review it for suitability to that end, especially around scalability and performance. Of course, anything in our usual review criteria is appropriate—it's possible or even likely that engineering will also work on this software in the future.

Subtasks 1 (0 open — 1 closed)

Actions

Copy link

Updated by Brett Smith almost 11 years ago

Description updated (diff)
Assigned To set to Tom Clegg

Actions

Copy link

Updated by Brett Smith almost 11 years ago

One comment I already made just based on a functional review is that the size of the cache needs to be limited by configuration, so it doesn't grow to consume all RAM on the host.

Actions

Copy link

Updated by Tom Clegg over 10 years ago

adopt? replace curover.se?¶

This service implicitly offers to forever do Y when presented with link X, but the X→Y mapping isn't defined except by source code and (more likely) experimentation, and hasn't been discussed/agreed. We have some choices:

Accept that some curoverse.link links will break
Offer long term support for multiple short-url services with different (overlapping) behavior

IMO we should not add behavior to curover.se that we aren't prepared to commit to, and given that we haven't discussed curoverse.link behavior, we're not prepared to commit to it.

I'm not taking a position that anything here is incorrect -- just saying I don't think we should adopt the long term maintenance responsibility without giving ourselves a chance to think it through and make a good decision.

api¶

I'd like to know what the intended behavior is. I can read the source code and assume it's doing what was intended, but it would be better to have it written explicitly. That way we can see which behaviors are promised, which are undefined, and which are bugs.

PDH redirects should be 302, not 301.

overall¶

When a browser shows up at this service, it would be better to implement everything (except the list of public servers¹) entirely on the client side. There might be some todo's but I think this architecture has a better future than a server-side cache: for example, it could also find objects on local installs which are inaccessible to curoverse.link.

(That doesn't mean this service is useless -- e.g., "wget" won't be able to use our fancy javascript -- but it'll never be as good as a client-side implementation, in the browser case.)

¹ ...and that too, when we decide on a supported way to get that list

code review / suggestions¶

Should have test cases.

Should have access logs.

Bunch of recommendations at https://github.com/golang/go/wiki/CodeReviewComments.

Prefer command line flags over JSON config file.

Accept "address" (e.g., ":8000") instead of accepting just a port and prepending ":". (It's free to let the caller specify a listening IP, so why not...)

Use var x = regexp.MustCompile(...), instead of doing all that stuff in init().

Use RWMutex instead of channel, and guard readers too (using RLock()), not just writers.

Seems like write_cache_pdh() and write_cache_uuid() should just be one function that takes a map as an argument.

-if x { return true }
-if y { return true }
-return false
+return x || y

Use router to parse paths. Seems like "*filepath" actually represents "{id}/filepath" -- tell the router that, and use ByName("id") and ByName("filepath") instead of re-parsing the url yourself in the handler.

Make a debugLog function that does if g_verbose { log.Printf ... }. Add a -verbose or -debug flag to enable it.

Query API directly, instead of Workbench (and tell it you don't want the manifest_text). "GET {workbench}/collection" from workbench involves a lot of excess work. (We should streamline the way you get an anonymous token -- currently this would have to be in curoverse.link's config.)

Actions

Copy link