A number of Scaffold users have asked about the possibility of developing customized Gene Ontology databases. Perhaps you are curating a database for a particular species, or you have downloaded GO information for a certain taxonomy but it is not in standard GOA form. Scaffold can easily accommodate such custom sources of GO information. All that is required is to create a properly formatted Excel spreadsheet and to direct Scaffold to use that file as the source of its GO annotations. The following document, written by Susan Ludwigsen, provides instructions on how to create a custom GOA file for use in Scaffold.
These instructions assume that you have a file containing GO information and wish to reformat it so Scaffold can read it. The instructions can easily be adapted to create a new GO Database from scratch, however.
- Open the custom GO file in Excel. If there are any header lines (usually beginning with “!”), delete all except !gaf-version:2.0. This should be the first line of the file, so if it is not present, it needs to be created. A line of column headers should follow.
- Check the format of the accession numbers in your Scaffold file. When Scaffold tries to match an accession number, it will check in two places within the GO Database file: Column B and Column K. Column B contains the "primary" accession number. Column K contains a list of alternative accession numbers for that same protein, separated by the "|" character. The accession number must appear in the GOA file exactly as it appears in your Scaffold file in one of these places.
If your format is different, you may need to do some editing to produce the proper format. If your Scaffold accession numbers contain "|" characters, you will need to put those accession numbers into Column B, as the “|” would be interpreted as a separator in Column K. In one example, the Scaffold accession numbers were of the form “gi|26989046”. One column in the GO file was labeled “GI” and contained only the numbers. Prepending the string “gi|” to those numbers and moving them to Column B created the appropriate accession numbers.
- Once the accession numbers are correctly formatted and placed, there are a few other columns that need to be formatted appropriately for Scaffold:
- Column E must contain the GO annotation numbers in the form GO:#######
- Column K contains the list of alternate accession numbers as described above
- Column M contains the taxonomy in the form taxon:###
- Column R should contain some text. It can be any string, and can be the same for all entries, but it is used to recognize the end of the line, so it must be present and should contain only simple alphanumeric characters. An easy approach is to enter a simple string, e.g. CDS, in the first row of Column R and then copy it down to all remaining rows.
Remaining columns will be ignored.
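If you would rather prepare or check the file with a script than by hand in Excel, the column rules above can be sketched in Python. This is only an illustration of the layout described in this document, not Scaffold's own code: the function names (`make_row`, `check_row`, `accession_matches`, `write_gaf`), the placeholder column headers, and the sample values are all assumptions made for the example.

```python
import io
import re

# Patterns for the column formats described above.
GO_ID = re.compile(r"^GO:\d{7}$")           # Column E: GO:#######
TAXON = re.compile(r"^taxon:\d+$")          # Column M: taxon:###
END_MARKER = re.compile(r"^[A-Za-z0-9]+$")  # Column R: simple alphanumeric text

# Zero-based positions of spreadsheet columns B, E, K, M and R.
COL_B, COL_E, COL_K, COL_M, COL_R = 1, 4, 10, 12, 17

# Placeholder header labels (an assumption; any line of column headers will do).
HEADER = [f"Column {letter}" for letter in "ABCDEFGHIJKLMNOPQR"]


def make_row(gi_number, go_id, taxon_id, alternates=(), marker="CDS"):
    """Build one 18-column row (A..R), filling only the columns Scaffold reads."""
    row = [""] * 18
    row[COL_B] = f"gi|{gi_number}"       # primary accession, with "gi|" prepended
    row[COL_E] = go_id
    row[COL_K] = "|".join(alternates)    # alternate accessions, "|"-separated
    row[COL_M] = f"taxon:{taxon_id}"
    row[COL_R] = marker                  # end-of-line marker text
    return row


def check_row(row):
    """Return a list of formatting problems; an empty list means the row looks fine."""
    if len(row) < 18:
        return ["fewer than 18 columns (A..R)"]
    problems = []
    if not row[COL_B]:
        problems.append("Column B (primary accession) is empty")
    if not GO_ID.match(row[COL_E]):
        problems.append("Column E is not of the form GO:#######")
    if not TAXON.match(row[COL_M]):
        problems.append("Column M is not of the form taxon:###")
    if not END_MARKER.match(row[COL_R]):
        problems.append("Column R must be simple alphanumeric text")
    return problems


def accession_matches(row, accession):
    """Mimic the lookup described above: exact match in Column B or in Column K's list."""
    return accession == row[COL_B] or accession in row[COL_K].split("|")


def write_gaf(rows, handle):
    """Write the version line, a header line, then the rows as tab-delimited text."""
    handle.write("!gaf-version:2.0\n")
    handle.write("\t".join(HEADER) + "\n")
    for row in rows:
        handle.write("\t".join(row) + "\n")
```

For example, `make_row("26989046", "GO:0005524", 9606)` builds a row whose Column B holds `gi|26989046`; the GO term and taxon values here are purely illustrative. Running `check_row` on each row before calling `write_gaf` flags any column that does not follow the formats listed above.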
After editing the file to make sure that your accession numbers will be found and that all of the columns listed above are correctly formatted, save the file as a tab-delimited text file. Then follow the steps below to add this file to Scaffold:
- Navigate to Edit > Edit Annotation Options...
- Click the Add button at the bottom of the window.
- Choose "Other File..." from the dropdown and use the "Choose File..." button to navigate to your file
- Give the GOA database file a name
- Use Choose File to select a different location in which to save the created database. Saving to the default parameters directory would require Scaffold to be run as an administrator and is not recommended.
- Click Add and allow Scaffold to index the file
- Annotations can be added to an open experiment using the Experiment > Add or Edit Annotations... menu
- Choose the file just created from the GO Terms dropdown menu
- Scaffold can add annotations automatically using PSEA-Quant analysis; more information can be found here
- Or, select the Manually select radio button and click OK
- The default GO term set can now be added using the OK button, or use this dialog to define the set of GO terms you would like to include