Online Research Information Environment for the Life Sciences



A novel method to disambiguate gene symbols in the biomedical literature

Gene name ambiguity


A major problem when automatically identifying gene names and symbols in text is ambiguity. Many gene names are homonyms: the same name also denotes another gene, or a completely different concept. For instance, the symbol 'PSA' can denote the gene prostate-specific antigen, the gene puromycin-sensitive aminopeptidase, or even the Poultry Science Association! Our investigations have shown that the homonym problem for genes is substantial. You can download our report here:

Quantifying the ambiguity problem for gene symbols

Disambiguation tool


We have developed a tool that disambiguates gene mentions using the context in which the ambiguous gene name or symbol appears. The concepts found in that context are compared with the concepts known to occur frequently in the context of each meaning of the ambiguous term. If the context does not match any of the known meanings sufficiently well, a so-called Not-In-Thesaurus meaning is assumed. A full description of our approach, including an evaluation of its performance, can be found in our recent paper below. We have also provided a detailed example of how disambiguation is performed.

Thesaurus-based disambiguation of gene symbols (accepted for publication)

Gene symbol disambiguation example
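The context-matching idea described above can be illustrated with a minimal Python sketch. It assumes that each meaning is represented by a weighted profile of co-occurring concepts and that the context is scored against each profile with cosine similarity. The threshold value, the profile contents, and all names here (`disambiguate`, `cosine`, `EXAMPLE_PROFILES`) are hypothetical illustrations, not the scoring function or parameters used in the paper.

```python
import math
from collections import Counter

# Label returned when no known meaning fits the context well enough.
NOT_IN_THESAURUS = "Not-In-Thesaurus"

# Hypothetical concept profiles: for each meaning, concepts that often co-occur
# with it, with invented weights. Real profiles would be derived from literature.
EXAMPLE_PROFILES = {
    "prostate-specific antigen": {"prostate": 3.0, "cancer": 2.0, "serum": 1.0},
    "puromycin-sensitive aminopeptidase": {"peptidase": 3.0, "puromycin": 2.0, "brain": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse concept -> weight vectors."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context_concepts, profiles, threshold=0.1):
    """Return the meaning whose profile best matches the context concepts.

    The best score must exceed the (assumed) threshold; otherwise the
    Not-In-Thesaurus meaning is returned.
    """
    context = Counter(context_concepts)  # concept -> occurrence count
    best_meaning, best_score = NOT_IN_THESAURUS, threshold
    for meaning, profile in profiles.items():
        score = cosine(context, profile)
        if score > best_score:
            best_meaning, best_score = meaning, score
    return best_meaning
```

With these toy profiles, `disambiguate(["prostate", "serum", "biopsy"], EXAMPLE_PROFILES)` returns `"prostate-specific antigen"`, while a context such as `["poultry", "egg"]` shares no concepts with either profile and falls through to `"Not-In-Thesaurus"`.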

Using the tool


Our disambiguation tool is available as a web service that can be accessed through a SOAP interface. Please note that the service performs only disambiguation, not indexation: prior to using the disambiguation tool, the text should be indexed so that terms in the text are mapped to concepts in a thesaurus. With its current default settings, the service expects text that has been indexed with the thesaurus currently used in the E-BioSci project. A description of how to use the web service and an example Perl script can be downloaded below:

Web service manual

Example Perl script
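Because the service expects pre-indexed input, it may help to see roughly what indexation produces. The Python sketch below is a hypothetical toy indexer: it maps text to concept IDs by greedy longest-match lookup in a tiny invented thesaurus and flags terms that map to more than one concept as candidates for disambiguation. It is not the E-BioSci thesaurus or indexer, the concept IDs and function names (`index_text`, `ambiguous_terms`, `TOY_THESAURUS`) are invented, and it does not call the SOAP service; the web service manual describes the actual request format.

```python
import re

# Hypothetical toy thesaurus mapping lowercase terms to concept IDs.
# This is NOT the E-BioSci thesaurus; the IDs are invented for illustration.
TOY_THESAURUS = {
    "psa": {"C_PSA_GENE", "C_PSAAP_GENE", "C_POULTRY_ASSOC"},
    "prostate": {"C_PROSTATE"},
    "prostate cancer": {"C_PROSTATE_CANCER"},
    "serum": {"C_SERUM"},
}


def index_text(text, thesaurus, max_len=3):
    """Greedy longest-match indexing: return (term, sorted concept IDs) pairs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in thesaurus:
                hits.append((phrase, sorted(thesaurus[phrase])))
                i += n
                break
        else:
            i += 1  # no thesaurus term starts here; skip this token
    return hits


def ambiguous_terms(hits):
    """Terms mapped to more than one concept still need disambiguation."""
    return [term for term, concepts in hits if len(concepts) > 1]
```

For example, `index_text("Serum PSA levels in prostate cancer", TOY_THESAURUS)` finds "serum", "psa" and "prostate cancer" (the longer phrase wins over "prostate"), and `ambiguous_terms` reports only "psa" as needing disambiguation.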

If you have any questions, please e-mail m.schuemie@erasmusmc.nl.

Developed by partner Erasmus University (EUR).



This project is funded by the European Commission as ORIEL, contract no. IST-2001-32688, under Key Action 3 of the IST Programme (Multimedia Content and Tools).


Website designed and maintained by Anne Seller (E-BioSci and ORIEL Projects, EMBO).

© European Molecular Biology Organization, 2001-2004. Most of the information available from this site is within the public domain and unless stated otherwise, may be freely downloaded and reproduced, provided that the source is acknowledged. However, some material may be copyright protected. For such material, the submitting authors retain all rights for reproduction or redistribution. Permission to reproduce these documents may be required. Please also read our Disclaimer.