Support #6357
closedReview curoverse.link
100%
Description
[Tom, Radhika and I thought it might make sense for you to take this up as part of the user support story. It's at least tangentially related. If that's an issue, let us know and we'll figure out a plan B.]
Abram has written up curoverse.link, which is basically a smarter version of curover.se. In particular, it can query different clusters to see find one that has a collection with a specific content address, and redirect the user there. It caches these results.
Right now curover.se is just an Apache RewriteRule. Assume that we want to replace that with curoverse.link, to get support for content addresses (and in the future, other kinds of redirects). Please review it for suitability to that end, especially around scalability and performance. Of course, anything in our usual review criteria is appropriate—it's possible or even likely that engineering will also work on this software in the future.
Updated by Brett Smith over 9 years ago
- Description updated (diff)
- Assigned To set to Tom Clegg
Updated by Brett Smith over 9 years ago
One comment I already made just based on a functional review is that the size of the cache needs to be limited by configuration, so it doesn't grow to consume all RAM on the host.
Updated by Tom Clegg over 9 years ago
adopt? replace curover.se?¶
This service implicitly offers to forever do Y when presented with link X, but the X→Y mapping isn't defined except by source code and (more likely) experimentation, and hasn't been discussed/agreed. We have some choices:- Accept that some curoverse.link links will break
- Offer long term support for multiple short-url services with different (overlapping) behavior
IMO we should not add behavior to curover.se that we aren't prepared to commit to, and given that we haven't discussed curoverse.link behavior, we're not prepared to commit to it.
I'm not taking a position that anything here is incorrect -- just saying I don't think we should adopt the long term maintenance responsibility without giving ourselves a chance to think it through and make a good decision.
api¶
I'd like to know what the intended behavior is. I can read the source code and assume it's doing what was intended, but it would be better to have it written explicitly. That way we can see which behaviors are promised, which are undefined, and which are bugs.
PDH redirects should be 302, not 301.
overall¶
When a browser shows up at this service, it would be better to implement everything (except the list of public servers1) entirely on the client side. There might be some todo's but I think this architecture has a better future than a server-side cache: for example, it could also find objects on local installs which are inaccessible to curoverse.link.
(That doesn't mean this service is useless -- e.g., "wget" won't be able to use our fancy javascript -- but it'll never be as good as a client-side implementation, in the browser case.)
1 ...and that too, when we decide on a supported way to get that list
code review / suggestions¶
Should have test cases.
Should have access logs.
Bunch of recommendations at https://github.com/golang/go/wiki/CodeReviewComments.
Prefer command line flags over JSON config file.
Accept "address" (e.g., ":8000") instead of accepting just a port and prepending ":". (It's free to let the caller specify a listening IP, so why not...)
Use var x = regexp.MustCompile(...), instead of doing all that stuff in init().
Use RWMutex instead of channel, and guard readers too (using RLock()), not just writers.
Seems like write_cache_pdh() and write_cache_uuid() should just be one function that takes a map as an argument.
-if x { return true }
-if y { return true }
-return false
+return x || y
Use router to parse paths. Seems like "*filepath" actually represents "{id}/filepath" -- tell the router that, and use ByName("id") and ByName("filepath") instead of re-parsing the url yourself in the handler.
Make a debugLog function that does if g_verbose { log.Printf ... }
. Add a -verbose or -debug flag to enable it.
Query API directly, instead of Workbench (and tell it you don't want the manifest_text). "GET {workbench}/collection" from workbench involves a lot of excess work. (We should streamline the way you get an anonymous token -- currently this would have to be in curoverse.link's config.)