Bug #8383
Some download links are broken
Status: closed, 100% done
Description
The following public file:
http://evidence.pgp-hms.org/genome_download.php?download_genome_id=546fcaf82dff949832160d7969ae7d55aa024c21&download_nickname=Microbiome+data+for+PGP+kit+%232182+%22Goddu%22+-+Goddu.fna.gz
reports an error of:
Error: Unable to open file for download!
Updated by Abram Connelly almost 9 years ago
Here are two full links that are failing:
From looking at the source, I believe the failure comes from the assumption that the manifest contains only one file and/or block for old-style locators (for example, those with a '+K@ant' suffix at the end). The collection behind the above link has multiple files in its manifest, and the regex that extracts the filename from the manifest does not match.
if (preg_match('/^(\.[^\s]*) .* 0:(\d+):(\S+)$/', $manifest, $regs)) {
    // $regs[1]: stream name, $regs[2]: file size, $regs[3]: file name
    //$passthru_command = "whget ".escapeshellarg("$locator/**/$regs[2]");
    $subdir = preg_replace( '/^\.\/?/', '', $regs[1] );
    if ( $subdir != "" ) { $subdir = $subdir . "/"; }
    $passthru_command = "arv-get --no-progress ".escapeshellarg("$pdh/$subdir$regs[3]");
    $fsize = $regs[2];
    // Note: this derives the extension from $regs[2] (the size), not from the file name in $regs[3].
    $ext = preg_replace ('/^.*?((\.\w{3})?(\.[bg]z2?)?)$/', '\1', $regs[2]);
}
The manifest for the above collection is:
. 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant 0:2286229:Goddu.fna.gz 2286229:432:Goddu.txt
I believe the regex fails because it is designed to match a single file entry: it looks for a file token starting at offset 0 and, through the trailing '$', requires that no characters follow that token. In this manifest a second entry (2286229:432:Goddu.txt) follows the first, so the pattern does not match at all.
So I think this is a combination of the files being in the 'old' style (that is, having something like a '+K@ant' suffix) and a faulty regex that doesn't recognize manifests that have more than one file or block.
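To make the failure concrete, here is a minimal sketch that runs the pattern from the download script against the manifest quoted above, then shows one possible way to enumerate every file token instead. The function name list_manifest_files and the array keys are hypothetical and not taken from the GET-Evidence source; the sketch also assumes a single-stream manifest line whose tokens are separated by single spaces.

```php
<?php
// Manifest line quoted in this ticket (one stream, one block, two files).
$manifest = ". 7d1d6dcad72711dfff71c79e1d380c1e+2286661+K@ant "
          . "0:2286229:Goddu.fna.gz 2286229:432:Goddu.txt";

// The pattern from the download script, unchanged.
$single_file_re = '/^(\.[^\s]*) .* 0:(\d+):(\S+)$/';

// preg_match returns 0 here: the token that starts at offset 0 is
// followed by a second file token, so the trailing $ is never reached.
$matched = preg_match($single_file_re, $manifest, $regs);

// Hypothetical alternative: split the stream line on spaces and collect
// every token of the form offset:size:name.
function list_manifest_files($line) {
    $tokens = explode(' ', trim($line));
    $stream = array_shift($tokens);   // "." or "./subdir"
    $files = array();
    foreach ($tokens as $tok) {
        // Block locators contain no "digits:digits:" prefix, so they are skipped.
        if (preg_match('/^(\d+):(\d+):(\S+)$/', $tok, $m)) {
            $files[] = array(
                'stream' => $stream,
                'offset' => (int)$m[1],
                'size'   => (int)$m[2],
                'name'   => $m[3],
            );
        }
    }
    return $files;
}

$files = list_manifest_files($manifest);
```

With this manifest, $matched is 0 and $files holds two entries, Goddu.fna.gz and Goddu.txt. Whether the download script should serve the first file, the file named in the link's nickname, or something else is a separate decision that this sketch does not settle.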
Updated by Abram Connelly almost 9 years ago
I was mistaken about how some of these things work on the backend. "New style" links are a symlink on the file system, in the /home/trait/upload/ID directory, that points to the 'fully qualified' location, meaning it includes the subdirectory and file name. For example, the following is the 'input.locator' symlink for this link: http://evidence.pgp-hms.org/genome_download.php?download_genome_id=8e2fb8975d5a05735c56505e1697ad1fa1df73ab&download_nickname=CGI+sample%3A+GS03052-DNA_B01 :
input.locator -> 5236ab958ba6dbe909796ddafce8e570+32508/ASM/var-GS000037338-ASM.tsv.bz2
Whereas the 'old style' link does not have the subdirectory and filename after it. For example, the above symlink in the 'old style' might look like:
input.locator -> 5236ab958ba6dbe909796ddafce8e570+32508
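The difference between the two symlink styles can be sketched as follows. The helper name split_locator and its return keys are hypothetical, chosen only for illustration; the sketch assumes the symlink target has already been read into a string (for instance with PHP's readlink) and that the first '/' separates the collection locator from the path inside the collection.

```php
<?php
// Hypothetical helper: split a symlink target into the collection
// locator and the optional file path inside the collection.
function split_locator($target) {
    $parts = explode('/', $target, 2);
    return array(
        'collection' => $parts[0],
        // Old-style targets have no path component.
        'path'       => isset($parts[1]) ? $parts[1] : null,
    );
}

$new_style = split_locator('5236ab958ba6dbe909796ddafce8e570+32508/ASM/var-GS000037338-ASM.tsv.bz2');
$old_style = split_locator('5236ab958ba6dbe909796ddafce8e570+32508');
```

For the new-style target the path is ASM/var-GS000037338-ASM.tsv.bz2, so the file can be requested directly. For the old-style target the path is null, which is the case where the script has to fall back to reading the manifest to find a file name.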
It's unclear to me how to handle multiple filenames. Should we create another subdirectory, one for each file of interest in the /home/trait/upload directory and change the links in Tapestry and GET-Evidence to point to these individual sub-directories?
Updated by Ward Vandewege almost 9 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset 63fe9fb9611053eeea464978abd44263a45dffed.