Correcting the Accession Number Issue with New UniProt GO Annotation Files

Scaffold users often use the UniProt identifier in the form IDENTIFIER_TAXONOMY as the accession number when creating a Scaffold file. Until recently, this allowed users to add GO annotations from the UniProt knowledge base.

In the newest version of the UniProt GO annotation (GOA) files, the _TAXONOMY suffix has been removed from the identifier. As a result, proteins in Scaffold files whose accession numbers still include _TAXONOMY can no longer be matched to the identifiers stored in the GOA file. The custom parsing option can usually be used to change the accession numbers so that they match the identifiers found in the new GOA files.

Alternatively, if you are using Scaffold 4.10 or above, you can use the AutoParse feature when indexing FASTA files. This creates accession numbers consisting solely of the P or Q number, which is compatible with UniProt GOA files.

Procedure

  1. Open the Edit > Edit FASTA Databases... menu. Choose the FASTA in question and select Edit.
  2. When the parsing method dialog box opens, select the Use Regular Expressions button.
  3. Choose User Specified from the dropdown menu. This lets you enter a custom regular expression for the accession number parse rule and for the description parse rule.
  4. Use >([^_]*) for the accession number parse rule and >[^\s]*[\s](.*) as the description parse rule.
  5. This strips the _TAXONOMY portion from the accession number. For Vinculin, for example, the accession number will read sp|P18206|VINC rather than sp|P18206|VINC_HUMAN or VINC_HUMAN.
  6. Now try applying the GO terms again using a file from the UniProt knowledge base. The GO terms should be displayed in the Samples view.
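
To preview what the two parse rules from step 4 capture before re-indexing a database, they can be tried outside Scaffold. The following is a minimal sketch in Python, not Scaffold's own parser: the parse_header helper and the sample header line are illustrative assumptions, and Scaffold's regular expression engine may differ from Python's in edge cases.

```python
import re

# Parse rules copied from step 4 of the procedure.
ACCESSION_RULE = re.compile(r">([^_]*)")
DESCRIPTION_RULE = re.compile(r">[^\s]*[\s](.*)")


def parse_header(header):
    """Hypothetical helper: apply both rules to one FASTA header line.

    Returns (accession, description); either value is None when the
    corresponding rule does not match.
    """
    acc = ACCESSION_RULE.match(header)
    desc = DESCRIPTION_RULE.match(header)
    return (
        acc.group(1) if acc else None,
        desc.group(1) if desc else None,
    )


# Illustrative header for Vinculin; a real FASTA header may carry more fields.
header = ">sp|P18206|VINC_HUMAN Vinculin OS=Homo sapiens"
accession, description = parse_header(header)
print(accession)    # sp|P18206|VINC
print(description)  # Vinculin OS=Homo sapiens
```

Note that the accession rule captures everything up to the first underscore. If a header contains no underscore in its identifier, the capture runs on into the description, so it is worth checking a few headers from your own FASTA file first.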

 
