The following article contains a list of frequently asked questions relating to Scaffold's views and displays. For specific questions not covered in our documentation, we are available by telephone Monday through Friday from 8 AM to 5 PM PST. Our toll-free number is 1-800-944-6027. Additionally, support can be contacted via email at firstname.lastname@example.org.
Views and Panes
What do I see when I toggle between the Bio and MS views?
Clicking on the Bio or MS button will display information about the biological samples or the individual MS samples loaded into Scaffold. For example, if you loaded data for four MS samples into two biosamples in Scaffold and clicked on the Bio button, Scaffold would display information in two columns (one for each biosample). However, if you clicked on the MS button, Scaffold would display information in four columns (one for each MS sample).
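The example above can be sketched in a few lines of Python. This is only an illustration of how the two views group the same data. The sample names and the columns function are hypothetical and are not part of Scaffold.

```python
# Hypothetical mapping of four MS samples onto two biosamples.
ms_to_bio = {
    "MS sample 1": "Biosample 1",
    "MS sample 2": "Biosample 1",
    "MS sample 3": "Biosample 2",
    "MS sample 4": "Biosample 2",
}

def columns(view, ms_to_bio):
    """Return the column headings that a given view would show."""
    if view == "MS":
        # One column per MS sample.
        return list(ms_to_bio)
    if view == "Bio":
        # One column per biosample, in first-seen order, without duplicates.
        return list(dict.fromkeys(ms_to_bio.values()))
    raise ValueError("view must be 'Bio' or 'MS'")

print(columns("Bio", ms_to_bio))  # two columns
print(columns("MS", ms_to_bio))   # four columns
```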
Why can't I see some of the views?
You can modify which views are displayed in the Edit>Preferences menu. In the Display Settings tab, users can select which views to display as well as the default Display Option for new files in the Samples view. Note: Mac users will have to click on the more button to see the Display Settings option. Check the boxes for the views that you wish to see and click Apply.
What information is contained in the Statistics view?
The Statistics view in Scaffold provides valuable insight into the statistical analysis behind Scaffold’s protein predictions. Here you can view four windows, each telling you about a different aspect of Scaffold’s analysis.
The Protein Probability Calculation in the upper right-hand corner of the Statistics view allows you to see the relationship between the number of peptides identified, their identification probability, and the resulting protein probability. This gives you a representation of the relationship between Peptide Prophet’s peptide prediction algorithm and Protein Prophet’s protein prediction algorithm. The ROC plot compares the number of identified spectra to the peptide false positive rate, and points on the curve are labeled with the corresponding probabilities.
The Histogram in the lower right hand corner of the screen displays the results of Scaffold's Peptide Prophet assignments in a color-coded histogram. The bars in the histogram represent the number of spectra assigned to a particular discriminant score. The colors represent correct, incorrect or decoy assignments. Red indicates Scaffold determined this to be a correct assignment while blue indicates an incorrect assignment. The peptides determined to be decoys are colored green.
The Scatterplot in the lower left-hand corner of the screen will be displayed if you used two or more search engines on your data (for example, Mascot and X! Tandem). The Scatterplot represents the correlation between the search engines’ results. If the dots fall along a relatively straight line, then there is “good” correlation. If Scaffold's Protein Prophet algorithm assigned the spectrum to a protein, the dot will be red; if Scaffold did not assign the spectrum to a protein, the dot will be blue.
In the upper left hand corner of the screen is the MS sample selection screen. Here you can select an MS sample to display its statistical parameters in the other three windows.
How do I interpret what the Protein Probability Calculation chart is saying?
The Protein Probability Calculation chart shows that increasing the minimum number of peptides increases the protein probability for a given peptide probability. In other words, the more peptides you have, the more confident you can be in your protein probability. Usually, this chart will tell you that even at a high peptide probability, using one peptide to identify a protein will only allow for a relatively low protein probability, indicating that one-hit wonders are not advised.
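The trend in the chart can be illustrated with a deliberately simplified model. The sketch below assumes that peptide identifications are independent and that a protein is present if at least one of its peptide identifications is correct. This is an assumption made for illustration only and is not Scaffold's actual calculation, which applies further adjustments that penalize proteins supported by a single peptide.

```python
from math import prod

def protein_probability(peptide_probs):
    """Probability that at least one peptide identification is correct,
    assuming the identifications are independent (simplified model)."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)

# Same peptide probability, increasing number of peptides.
for n in (1, 2, 3):
    print(n, round(protein_probability([0.80] * n), 3))
```

Even in this simplified model, each additional peptide at the same peptide probability raises the resulting protein probability.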
What is the Peptide Receiver Operator Characteristic (ROC) Plot? How do I interpret it?
The peptide ROC Plot compares the number of identified spectra to the peptide false positive rate, with points on the curve labeled with the corresponding probabilities. In effect, it tells you what proportion of the identified spectra are expected to be falsely assigned as correct at each probability.
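One common way to estimate such a rate from peptide probabilities is to treat 1 minus each probability as the chance that the assignment is wrong and to average that over the accepted spectra. The sketch below uses this approach. Whether Scaffold computes its ROC plot in exactly this way is an assumption, and the roc_points function and the probabilities are hypothetical.

```python
def roc_points(probabilities, thresholds):
    """For each threshold, return (threshold, number of accepted spectra,
    estimated false positive rate among the accepted spectra)."""
    points = []
    for t in thresholds:
        accepted = [p for p in probabilities if p >= t]
        if accepted:
            # Each accepted spectrum contributes (1 - p) expected errors.
            expected_false = sum(1.0 - p for p in accepted)
            rate = expected_false / len(accepted)
        else:
            rate = 0.0
        points.append((t, len(accepted), rate))
    return points

probs = [0.99, 0.95, 0.90, 0.60, 0.20]  # hypothetical peptide probabilities
for threshold, count, rate in roc_points(probs, [0.5, 0.9]):
    print(threshold, count, round(rate, 3))
```

Lowering the threshold accepts more spectra but raises the estimated false positive rate, which is the trade-off the plot displays.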
What are the Quantitative Scatterplots? How are Q-Q and Mean-Deviation Scatterplots different?
The Q-Q Scatterplot shows differentiation between groups (or, in Scaffold, categories). You can choose to compare any two categories. When a point is found far off the 45 degree line, this indicates that the protein is present in different amounts in each category. The dotted lines mark 2 standard deviations from the median. The Mean/Deviation Scatterplot, by contrast, simply plots the coefficient of variation (CV), which is the standard deviation divided by the mean.
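As a small worked example of the CV, the sketch below computes the standard deviation divided by the mean for a set of quantitative values. The use of the sample standard deviation is an assumption, since the text does not say which form Scaffold uses, and the values are made up.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (sample standard deviation assumed)."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(values) / m

# Hypothetical quantitative values for one protein across three samples.
print(round(coefficient_of_variation([10, 12, 14]), 3))
```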
Why are spectra missing from the Proteins view?
If you see that spectra are missing from the Proteins view, it is likely that the search engine files were loaded without their accompanying peak list files. When loading results from most search engines (Mascot and various search engines that produce MZID files), Scaffold needs access to the MGF files in order to display spectral information.
Why is Scaffold displaying protein identifications below the protein threshold value set?
If Scaffold identifies a protein in one biosample at or above the set protein threshold and the same protein with a lower probability in another biosample, the Samples view will only display the high confidence protein by default. However, activating the View>Show Lower Scoring Matches option will display the low confidence proteins in other biosamples. Additionally, activating the View>Show Less Than 5% Probabilities option will display those proteins assigned a probability of less than 5%.
Why aren't identical peptides shown together in the Peptide Spectrum Match (PSM) table?
This can occur if you sorted the PSM table by some variable other than peptide sequence (e.g. Mascot score). To place identical peptides next to each other, simply click on the peptide sequence column header, and Scaffold will sort the peptides by sequence so that identical peptides appear together.
What does the Display Options dropdown change?
The Display Options dropdown changes which protein information is displayed in the Samples view for each protein identified. Options include but are not limited to Protein Identification Probability, Total Spectrum Count, Percent Coverage, and Quantitative Value.
What settings are located in Scaffold's Preferences dialog?
The Preferences dialog gives you access to numerous settings and is accessible through Edit>Preferences. There are nine different tabs in the Preferences dialog and each of them allows you to modify Scaffold parameters to optimize or customize performance on your computer.
How do I get Scaffold to show only "good" proteins and peptides?
It is important to note that there is no such thing as a “good” protein or peptide. Analysis software such as Scaffold gives you a probability that a protein is present in your sample. This probability is based on the confidence of the peptides associated with that protein. You have various options for increasing the stringency of your analysis in Scaffold. This will not mean that all of your proteins are “good”, but you can have more confidence in your protein identifications because they were obtained with higher probability thresholds.
The primary way of increasing the stringency of your analysis is to adjust the Protein Threshold, Min # Peptides, and Peptide Threshold parameters. The higher these values are, the stricter your analysis will become. You can adjust these options to higher or lower values by clicking on them and choosing a value from the dropdown menu.
What do the boxes Min Protein, Min # Peptides, and Min Peptide do?
The Protein Threshold, Min # Peptides, and Peptide Threshold boxes in the Samples view are dropdown menus where you can adjust filters for viewing your data.
If you click on the Protein Threshold box you will be able to choose various minimum protein probabilities. If you are only interested in proteins whose identification probability is at least 99%, you can adjust the Protein Threshold option to 99%. This means that Scaffold will only display those proteins that its statistical analysis has indicated have at least a 99% probability of being present in your sample.
The Min # Peptides box allows you to choose between one and five peptides. This number is the minimum number of peptides Scaffold will use to determine if a protein is present. If you set this number to three, then Scaffold will only show you those proteins for which it identified three separate peptides.
The Peptide Threshold box allows you to choose the minimum probability that Scaffold will use in determining whether or not a spectrum actually identifies a peptide. If you raise this value to 90% or 95%, then only those peptides which have a greater than 0.9 or 0.95 probability will be displayed. If you set strict peptide parameters, you will generate fewer protein predictions because proteins are predicted based on peptide presence.
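The combined effect of the three filters can be sketched as follows. This is a simplified illustration rather than Scaffold's implementation. The Protein class, the passes_filters function, the accessions, and the peptide sequences are all hypothetical, and the thresholds are treated here as inclusive minimums.

```python
from dataclasses import dataclass, field

@dataclass
class Protein:
    accession: str
    probability: float                             # protein probability (0 to 1)
    peptides: dict = field(default_factory=dict)   # peptide sequence -> probability

def passes_filters(protein, protein_threshold, min_peptides, peptide_threshold):
    """True if the protein meets the protein threshold and has enough
    distinct peptides at or above the peptide threshold."""
    good_peptides = [seq for seq, p in protein.peptides.items()
                     if p >= peptide_threshold]
    return (protein.probability >= protein_threshold
            and len(good_peptides) >= min_peptides)

proteins = [
    Protein("PROT_A", 0.995, {"AAAK": 0.97, "CCCR": 0.96, "DDDK": 0.99}),
    Protein("PROT_B", 0.990, {"EEEK": 0.98, "FFFR": 0.60}),   # one strong peptide only
    Protein("PROT_C", 0.800, {"GGGK": 0.99, "HHHR": 0.99}),   # protein probability too low
]

shown = [p.accession for p in proteins if passes_filters(p, 0.99, 2, 0.95)]
print(shown)  # ['PROT_A']
```

Raising any of the three values can only shrink the list of displayed proteins, which is why stricter settings yield fewer protein predictions.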
What does Custom... on the Min Peptide filter do?
The Custom… option on the Peptide filter provides a way for users to set custom peptide filter parameters. If you click on the custom option the Edit Peptide Thresholds dialog will display. Choose New Threshold... to open the Configure Peptide Thresholds dialog. Here you can set parameters to meet your specific needs.
In the Configure Peptide Thresholds page you can specify the general minimum thresholds such as Peptide Probability, Accept Charge, and Minimum # Enzymatic Termini (NTT) value. To set minimum peptide parameters for various other search engines such as SEQUEST or Mascot you need to click the Use Individual Program Thresholds circle.
If you wanted to include only those peptides that had a SEQUEST XCorr value of three or better in your analysis, you could set the minimum SEQUEST XCorr to three, and Scaffold would only look at peptides with a SEQUEST XCorr of at least three.
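The effect of such a threshold can be sketched as a simple filter over peptide spectrum matches. The record layout, the filter_by_xcorr function, and the scores below are hypothetical and only illustrate the idea of a minimum XCorr.

```python
def filter_by_xcorr(psms, min_xcorr):
    """Keep only the matches whose SEQUEST XCorr is at least min_xcorr."""
    return [psm for psm in psms if psm["xcorr"] >= min_xcorr]

# Hypothetical peptide spectrum matches.
psms = [
    {"peptide": "AAAK", "xcorr": 3.4},
    {"peptide": "CCCR", "xcorr": 2.1},
    {"peptide": "DDDK", "xcorr": 3.0},
]

kept = filter_by_xcorr(psms, 3.0)
print([psm["peptide"] for psm in kept])  # ['AAAK', 'DDDK']
```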
How can I filter my peptides using my SEQUEST criteria?
You can choose to filter your peptides using SEQUEST criteria (e.g. minimum XCorr value) by setting a Custom peptide threshold using the Peptide Threshold dropdown option in the View page. When you click on the Custom option, a pop-up window entitled Edit Peptide Thresholds will appear.
Choose to add a new threshold, and a pop-up window entitled Configure Peptide Thresholds will appear. Here you can choose the minimum requirements for a peptide to be included in Scaffold’s analysis. Mark the Use Individual Program Thresholds box and pick the specific SEQUEST (or Mascot, or X! Tandem) values you wish to use as your minimum peptide requirements. When you are done, make sure to choose the new threshold in the Peptide Threshold dropdown.
What is Protein Grouping Ambiguity, e.g. Similar Proteins?
The similar proteins listed by Scaffold are proteins that Scaffold cannot differentiate based on the information provided; this is known as Protein Grouping Ambiguity. For example, if there are two peptides that each have a sequence found in two different proteins, and there is no other evidence to suggest which of the proteins is present in your sample, then Scaffold will show one of the proteins and list the other as a similar protein. In Scaffold 4, the grouping changed and can be reviewed here.
Scaffold reports proteins as similar proteins because there is no way to tell which protein is present based on the evidence. However, if your sample contains human myoglobin, and Scaffold is showing mouse myoglobin in the Samples view (with human myoglobin listed as a similar protein), you can change which of the similar proteins is shown in the Samples view to match known information about the sample. You do this by changing the preferred accession number in the lower left-hand corner of the screen.
Consider the following case:
In this situation, peptide 1 could go to protein A, B, or C; peptide 2 could go to protein A or B; and peptide 3 could go to protein B or C. Proteins A and C each contain a subset of the peptides present in protein B and are said to be subsumable. They will not be reported in Scaffold. In this example Scaffold would report only protein B, and proteins A and C would not be listed as similar proteins. Scaffold reports only protein B because it is the most parsimonious choice (the simplest explanation of the data).
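The case above can be reproduced with a short sketch that drops every protein whose peptide set is a proper subset of another protein's peptide set. This illustrates only the subset rule described here and is not Scaffold's full grouping algorithm. The parsimonious_proteins function is hypothetical.

```python
def parsimonious_proteins(protein_peptides):
    """Return the proteins whose peptide set is not a proper subset
    of any other protein's peptide set."""
    kept = []
    for name, peptides in protein_peptides.items():
        subsumed = any(
            peptides < other  # '<' on sets tests for a proper subset
            for other_name, other in protein_peptides.items()
            if other_name != name
        )
        if not subsumed:
            kept.append(name)
    return kept

# Peptide 1 matches A, B, and C; peptide 2 matches A and B; peptide 3 matches B and C.
evidence = {
    "A": {"peptide 1", "peptide 2"},
    "B": {"peptide 1", "peptide 2", "peptide 3"},
    "C": {"peptide 1", "peptide 3"},
}

print(parsimonious_proteins(evidence))  # ['B']
```

Two proteins with identical peptide sets are both kept by this rule, because neither is a proper subset of the other. That corresponds to the similar proteins situation, where the evidence cannot tell the proteins apart.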
How can I manually validate the peptide or spectra matches?
Manual validation of spectra can be done using Scaffold. However, because of the large number of spectra loaded in a typical Scaffold analysis, manual validation should not be undertaken unless there is a reason to suspect the protein predictions.
Because there are many options when manually validating the data behind a protein or peptide prediction, it is important to determine the level of detail you wish to investigate before beginning validation.
The first thing that you can look at when manually validating your spectra is the percent coverage of your protein. This is available in the upper left-hand quadrant of the Proteins view. If the sequence coverage is similar between multiple samples, the protein has good evidence; if the sequence coverage is not similar between multiple samples, you should investigate further. You can also mouse over the protein sequence and see the number of peptides matching a covered location (covered locations are indicated in yellow). A variety of different peptides matching your sequence is good evidence for the protein.
The second thing you should look at is the individual peptide-associated spectra. Here you can see the spectral peaks associated with each amino acid. If the peaks line up with the amino acid assignments, the high intensity peaks are labeled, and there is a good signal-to-noise ratio, then the spectrum is of good quality and the peptide prediction based on that spectrum is probably reliable as well.
Peptide associated spectrum
If you wish to delve deeper into your protein spectra assignments you have a couple of choices. You can look at the Fragmentation table. Here you are looking for ladders of ion peaks (i.e. a number of ions with a particular mass loss in sequence). A nicely laddered fragmentation table like the one below increases your confidence in the peptide probability.
Another piece of data you can use to manually validate your data is the Spectrum/Model Error. If the Spectrum/Model Error displays small errors with similar delta AMU values that are all in the same orientation (either all positive or all negative, see below), then the peptide spectrum has good quality. If the Spectrum/Model Error is consistently off in the same direction, it suggests the MS machine is miscalibrated.
However, if the Spectrum/Model Error has error rates with highly variable delta AMU values, and error rates “flip-flop” in orientation, your confidence in that peptide assignment could be decreased.
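The two patterns described above can be summarized numerically. The sketch below reports the mean error, the spread of the errors, and whether all non-zero errors share the same sign. It is only an illustration of the reasoning. The summarize_mass_errors function and the delta AMU values are hypothetical, and no particular cutoff for "small" or "variable" is implied.

```python
from statistics import mean, pstdev

def summarize_mass_errors(errors_amu):
    """Summarize a list of fragment mass errors (delta AMU)."""
    signs = {e > 0 for e in errors_amu if e != 0}
    return {
        "mean_error": mean(errors_amu),
        "spread": pstdev(errors_amu),        # how variable the errors are
        "same_direction": len(signs) <= 1,   # all positive or all negative
    }

consistent = [0.10, 0.12, 0.09, 0.11]   # similar size, same orientation
erratic = [0.30, -0.25, 0.40, -0.35]    # variable size, flip-flopping sign

print(summarize_mass_errors(consistent))
print(summarize_mass_errors(erratic))
```

A small spread with errors in the same direction matches the good quality case, while a large spread with mixed signs matches the case in which confidence in the peptide assignment could be decreased.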
Where do I specify the taxonomy of my sample?
When GO annotations are applied using either the NCBI or a UniProt GOA file, Scaffold displays a Taxonomy column in the Samples view if an annotation is present. Once the taxonomies are applied Scaffold has the ability to filter based on taxonomy using the Advanced Filter function (the magnifying glass icon).
What kind of accession number is "(2 similar)"?
(2 similar) is not an accession number. However, you may see this if your database is not being parsed correctly. To quickly check your parsing parameters, go to the Edit option of the protein dropdown menu and choose the Edit FASTA Databases option.
Pick the database you are using from the list and then click on the Edit button at the bottom of the page. A popup window will show the way that this database is being parsed. If the accession numbers do not make sense [e.g. they are called (2 similar)], then you can choose different parsing parameters from the drop-down list of options in the lower left hand corner of the window.
Pick those parsing parameters that place the accession numbers and protein names in the correct columns, and then choose Apply. Your Scaffold search should now display the protein accession numbers correctly.