Bug #506: Standardize gene names (correctly) - GET-Evidence - Arvados

Bug #506

open

Standardize gene names (correctly)

Added by Madeleine Ball almost 16 years ago. Updated about 15 years ago.

Status:

New

Priority:

Normal

Assigned To:

Tom Clegg

Target version:

Story points:

Description

Some attempt at using standard gene names has occurred, but there are mistakes in the implementation: there are cases where a name is both a standard name and an alias that could point to a different standard name. The most conservative thing to do in these cases would be to accept the given name and not change it. One might also use the positions associated with gene names to distinguish between the two cases.

Right now it has been implemented incorrectly somewhere, resulting in gene names that are inconsistent with the position (in other words, a variant was moved to a new standard name when the old name was correct and standard).

For example, the current GET-Evidence GJA9 L422F is a variant in ABT at chr1 39113094. According to genenames.org, "GJA9" is both a standard name (for a gene on chr1) and an alias of another standard name, "GJD2" (on chr15). So based on the position we can infer that "GJA9" is the correct, standard name for this variant.

But GET-Evidence incorrectly has this variant entered under GJD2: http://evidence.personalgenomes.org/GJD2-L422F
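The rule described above (keep the given name when it is ambiguous, and use the position to disambiguate when one is available) could be sketched roughly as follows. This is only an illustration, not the GET-Evidence code: the two lookup tables and the function names are hypothetical, and the tables hold just the genes mentioned in this ticket rather than a real HGNC load.

```python
from typing import Optional

# Hypothetical toy tables; a real run would build these from the HGNC list.
# Approved symbol -> chromosome of that gene.
APPROVED_CHROM = {"GJA9": "1", "GJD2": "15", "BLM": "15"}
# Alias -> approved symbols that the alias may refer to.
ALIAS_TO_APPROVED = {"GJA9": {"GJD2"}, "RECQL3": {"BLM"}}


def normalize_chrom(chrom: str) -> str:
    """Strip a leading 'chr' so that 'chr1' and '1' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def resolve_gene_name(name: str, chrom: Optional[str] = None) -> str:
    """Return the name to file a variant under, changing it only when safe."""
    candidates = set(ALIAS_TO_APPROVED.get(name, set()))
    if name in APPROVED_CHROM:
        candidates.add(name)
    if not candidates:
        return name  # unknown name: leave it alone

    if chrom is not None:
        wanted = normalize_chrom(chrom)
        matching = {s for s in candidates if APPROVED_CHROM.get(s) == wanted}
        # Rename only if exactly one candidate agrees with the position.
        return next(iter(matching)) if len(matching) == 1 else name

    # No position available: be conservative.
    if name in APPROVED_CHROM:
        return name  # already a standard name, so accept it as given
    if len(candidates) == 1:
        return next(iter(candidates))  # unambiguous alias
    return name


print(resolve_gene_name("GJA9", "chr1"))   # GJA9
print(resolve_gene_name("GJA9", "chr15"))  # GJD2
print(resolve_gene_name("GJA9"))           # GJA9
print(resolve_gene_name("RECQL3"))         # BLM
```

With this rule, the chr1 variant above stays under GJA9 instead of being moved to GJD2.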

Related issues 1 (1 open — 0 closed)

Updated by Tom Clegg almost 16 years ago

The "incorrect fix" has been backed out, although there are still (probably) cases of multiple variants that refer to the same gene under different names.

Updated by Madeleine Ball almost 16 years ago

BLM and RECQL3 both refer to the same gene (UCSC currently uses BLM rather than RECQL3). In GET-Evidence we have 2 OMIM-imported variants in RECQL3, but all genomes are processed as BLM.

Updated by Ward Vandewege about 15 years ago

Project changed from 19 to GET-Evidence
Category deleted (~~GET-Evidence~~)

Updated by Madeleine Ball about 15 years ago

Changes in the transcript file we use mean that the gene names now produced are almost forced to be names that are in the HGNC gene names list and consistent with chromosome info (if available):

HGNC list:
http://www.genenames.org/cgi-bin/hgnc_downloads.cgi?title=HGNC+output+data&hgnc_dbtag=onlevel=pri&=on&order_by=gd_app_sym_sort&limit=&format=text&.cgifields=&.cgifields=level&.cgifields=chr&.cgifields=status&.cgifields=hgnc_dbtag&&where=&status=Approved&status_opt=1&submit=submit&col=gd_hgnc_id&col=gd_app_sym&col=gd_app_name&col=gd_status&col=gd_prev_sym&col=gd_aliases&col=gd_pub_chrom_map&col=gd_pub_acc_ids&col=gd_pub_refseq_ids

Names are added by source:server/script/getCanonicalWithName.pl
(It's still possible to have a non-HGNC name, but only if you couldn't find an HGNC name after trying all the steps outlined in the above script.)
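To find the names that need this kind of care, one could scan a tab-delimited HGNC download for names that are both an approved symbol and an alias of a different symbol. The sketch below is not taken from getCanonicalWithName.pl. The column headers ("Approved Symbol", "Previous Symbols", "Aliases") are assumptions about the download format, the function names are hypothetical, and the sample rows are toy data built from the examples in this ticket.

```python
import csv
import io
from collections import defaultdict

# Assumed column headers; adjust them to match the actual download.
SYMBOL_COL = "Approved Symbol"
ALIAS_COLS = ("Previous Symbols", "Aliases")


def load_alias_map(tsv_text):
    """Return (approved symbols, alias -> set of approved symbols)."""
    approved = set()
    alias_to_approved = defaultdict(set)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        symbol = row[SYMBOL_COL].strip()
        approved.add(symbol)
        for col in ALIAS_COLS:
            # Alias fields are assumed to be comma-separated lists.
            for alias in (row.get(col) or "").split(","):
                alias = alias.strip()
                if alias:
                    alias_to_approved[alias].add(symbol)
    return approved, dict(alias_to_approved)


def ambiguous_names(approved, alias_to_approved):
    """Names that must not be renamed without checking the position."""
    result = []
    for name, targets in alias_to_approved.items():
        also_approved = name in approved and bool(targets - {name})
        if also_approved or len(targets) > 1:
            result.append(name)
    return sorted(result)


# Toy rows for illustration only, not real HGNC output.
SAMPLE = (
    "Approved Symbol\tPrevious Symbols\tAliases\tChromosome\n"
    "GJA9\t\t\t1\n"
    "GJD2\t\tGJA9\t15\n"
    "BLM\t\tRECQL3\t15\n"
)

approved, alias_map = load_alias_map(SAMPLE)
print(ambiguous_names(approved, alias_map))  # ['GJA9']
```

Names outside the ambiguous list, such as RECQL3 here, map to a single approved symbol and can be renamed without consulting the position.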

I think we should consider removing all GET-Evidence entries that are not in one of the imported databases (OMIM, PharmGKB, etc.) and were only found in one of the old genome processing runs. This would clean out misplaced variants and bad gene names that will never be looked at again. There may still be some nonstandard gene names from OMIM, but I think that's less of a concern.

Updated by Madeleine Ball about 15 years ago

Priority changed from High to Normal
