This article answers frequently asked questions about the calculations used in Scaffold Q+. For specific questions not covered in our documentation, we are available by telephone Monday through Friday from 8 AM to 5 PM PST. Our toll-free number is 1-800-944-6027. Support can also be reached by email at firstname.lastname@example.org.
Should I use the median or the mean when quantifying my data set?
Median: Choosing the median normalization implies that you want to use the non-parametric statistical tests that are more robust in handling missing and non-normal data. Median is the default and suggested setting.
Mean: This choice is appropriate when there is evidence that the data under analysis are well behaved.
Checklist for well-behaved data
- Little missing data
- Few outliers
- Little data below the limit of detection
- Little distortion due to saturation at high intensities
- Little variation in the spread of the data between data sets
- Little skew - the data set looks mostly symmetrical
- No fat, or thin, tails – the data set looks like a normal distribution
Choosing the mean normalization implies that the data are approximately normally distributed and that Scaffold Q+ should use the standard statistical tests, the t-test and the ANOVA test. Scaffold Q+ gives you several graphs of your data that you can examine to decide whether the data are normal enough to use with the mean normalization. Missing values are assigned an arbitrary low number for the calculations.
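To illustrate why the median is the more robust choice when outliers are present, here is a minimal Python sketch. The intensity values are invented for illustration and the code is not taken from Scaffold Q+; it only shows how a single outlier moves the mean much more than the median.

```python
from statistics import mean, median

# Hypothetical log2 intensities for one protein (illustrative values only).
clean = [10.1, 10.3, 9.9, 10.2, 10.0]
with_outlier = clean + [16.0]  # the same data plus a single outlier


def summarize(values):
    """Return (mean, median) for a list of intensities."""
    return mean(values), median(values)


clean_mean, clean_median = summarize(clean)
outlier_mean, outlier_median = summarize(with_outlier)

# How far each summary statistic moves when the outlier is added.
mean_shift = abs(outlier_mean - clean_mean)
median_shift = abs(outlier_median - clean_median)

print(f"mean shift: {mean_shift:.2f}, median shift: {median_shift:.2f}")
```

If the two shifts are similar for your own data, the mean may be a reasonable choice; a large gap suggests staying with the median.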
What quantitative tests can I perform on my samples?
You can perform several quantitative tests on the data, depending on how the data is quantified. Certain tests require median-based analysis while other tests require mean-based analysis.
Median-Based Analysis
Mann Whitney Test: Non-parametric statistical hypothesis test for assessing whether one of two samples of independent observations tends to have larger values than the other. It can also be described as a distribution-free test of whether two medians are equal. It uses the ranks of the data in the two samples. It is comparable to the T-test, but it does not depend on how the data are distributed.
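The rank-based idea behind the Mann Whitney test can be sketched in a few lines of Python. This is an illustrative sketch, not the Scaffold Q+ implementation; the function names `midranks` and `mann_whitney_u` are hypothetical, and the sketch computes only the U statistics, not a p-value.

```python
def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Hypothetical abundances for two categories with three replicates each.
u_a, u_b = mann_whitney_u([1.2, 1.5, 1.1], [2.3, 2.8, 2.1])
print(u_a, u_b)
```

A U value of zero for one sample means that every value in it ranks below every value in the other sample, which is the strongest separation the test can see.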
Kruskal-Wallis Test: The Kruskal–Wallis one-way analysis of variance by ranks is a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing more than two independent (unrelated) samples. The parametric equivalent of the Kruskal-Wallis test is the one-way analysis of variance (ANOVA). The null hypothesis is that the populations from which the samples originate have the same median. When the Kruskal-Wallis test leads to significant results, at least one of the samples is different from the other samples. The test does not identify where the differences occur or how many differences there are. Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal distribution, unlike the analogous one-way analysis of variance. However, the test does assume an identically shaped and scaled distribution for each group, except for any difference in medians.
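As a rough illustration of the Kruskal-Wallis description above, the following Python sketch computes the H statistic from pooled ranks. It is a sketch under assumptions: the function names are hypothetical, no tie correction is applied, and no p-value is computed (in practice H is compared against a chi-squared distribution with one fewer degrees of freedom than the number of groups).

```python
def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    first, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first.setdefault(v, pos)          # first 1-based position of value v
        count[v] = count.get(v, 0) + 1    # number of times v occurs
    return [first[v] + (count[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (without tie correction)."""
    pooled = [v for g in groups for v in g]
    n_total = len(pooled)
    ranks = midranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Hypothetical abundances for three categories with three replicates each.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))
```

A larger H means the rank sums of the groups are further apart than would be expected if all samples came from the same distribution.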
Mean-Based Analysis
T-Test: Compares the abundance between two sample categories. The T-test reports a p-value. The p-value is the probability of observing a difference at least as large as the one measured if there were no real difference between the categories. Smaller p-values indicate stronger evidence of a difference in protein abundance. The T-test is only reliable if there are at least 3 replicates in each of the two categories.
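For readers who want to see the arithmetic behind a two-sample t-statistic, here is a minimal Python sketch. It assumes the unequal-variance (Welch) form, which may not be the variant Scaffold Q+ uses; the function name `welch_t` is hypothetical. Only the statistic is computed, since turning it into a p-value requires the t-distribution, which is not in the Python standard library.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return the Welch t-statistic for two independent samples."""
    # Mirror the guidance above: require at least 3 replicates per category.
    if len(a) < 3 or len(b) < 3:
        raise ValueError("need at least 3 replicates in each category")
    # Standard error of the difference between the two sample means.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Hypothetical abundances for two categories with three replicates each.
t = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3))
```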
ANOVA: Determines whether three or more categories are different. A small p-value indicates that protein abundance in at least one category differs significantly from the others. The ANOVA test is reliable only if there are at least 3 replicates in each category.
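The one-way ANOVA compares the variation between category means with the variation within each category. The Python sketch below computes the resulting F statistic; it is an illustration with a hypothetical function name and invented data, not the Scaffold Q+ code, and it stops short of converting F into a p-value.

```python
from statistics import mean


def anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual values around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical abundances for three categories with three replicates each.
f_stat = anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_stat, 3))
```

An F value near 1 suggests the categories are indistinguishable; larger values mean the category means differ by more than the replicate-to-replicate scatter would explain.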
Mean or Median-Based Analysis
Permutation Test: The permutation test is a quantitative assessment of whether a protein is differentially expressed between categories of samples. It is available with both the median- and mean-based analysis methods. The test starts from the assumption that there is no differential expression between the categories, measures the observed difference with the t-statistic, and then calculates a p-value that indicates how likely a difference at least that large would be if the assumption were true. To calculate the p-value, the permutation test builds a reference histogram of t-statistic values. This reference histogram is made by permuting the measurements between the categories, in a manner similar to bootstrapping.
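The procedure described above can be sketched in Python as follows. This is a simplified illustration, not the Scaffold Q+ algorithm: it assumes a Welch-style t-statistic, a two-sided comparison, and random shuffling with a fixed seed. The function names and the default of 2000 permutations are hypothetical choices.

```python
import random
from math import sqrt
from statistics import mean, variance


def t_stat(a, b):
    """Welch-style t-statistic; assumes at least 2 values per group."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if mean(a) == mean(b) else float("inf")
    return (mean(a) - mean(b)) / se


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Estimate a two-sided p-value by shuffling values between categories."""
    rng = random.Random(seed)
    observed = abs(t_stat(a, b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        # Count shuffles whose statistic is at least as extreme as observed.
        if abs(t_stat(perm_a, perm_b)) >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical abundances: clearly separated vs. overlapping categories.
p_separated = permutation_p_value([10.1, 10.4, 9.8, 10.2], [12.0, 12.3, 11.8, 12.1])
p_overlapping = permutation_p_value([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
print(p_separated, p_overlapping)
```

The collected t-statistics from the shuffles play the role of the reference histogram: the p-value is simply the fraction of that histogram lying at or beyond the observed value.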
What is the Bonferroni Correction? Why would I use it?
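The Bonferroni correction is a statistical method used to counteract the problem of multiple comparisons: when many proteins are tested at once, some will appear significant by chance alone. This very conservative correction reduces the number of proteins that are falsely identified as differentially expressed.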
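In its standard form, the Bonferroni correction divides the significance threshold by the number of tests (or, equivalently, multiplies each p-value by that number). The Python sketch below shows this under the assumption of a 0.05 threshold; the function name and the p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) for a list of p-values."""
    m = len(p_values)
    # Multiply each p-value by the number of tests, capped at 1.
    adjusted = [min(1.0, p * m) for p in p_values]
    # Equivalent decision rule: compare each raw p-value to alpha / m.
    significant = [p <= alpha / m for p in p_values]
    return adjusted, significant


# Hypothetical p-values for four proteins.
adjusted, significant = bonferroni([0.001, 0.02, 0.04, 0.3])
print(adjusted)
print(significant)
```

Note that a protein with a raw p-value of 0.02 would pass an uncorrected 0.05 threshold but no longer counts as significant once four tests are taken into account.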
How does normalization work in Q+?
For detailed descriptions of Normalization in Q+, please see this document, which includes full explanations of the process.
Intensity-Based Normalization: In intensity-based normalization, the fold change is calculated using a weighted median of the reporter ion or MS1 intensity values for all the spectra that identify a specific peptide. The weighted median value is calculated using a kernel density estimation that is constructed using cosine kernels, with the height of a kernel corresponding to the mean value of the intensity for a quant sample and the width of a kernel corresponding to the standard deviation of the intensity for a quant sample.
Ratio-Based Normalization: Ratio-based normalization operates on log2 intensity ratios rather than on the intensities themselves. Note that currently, for ratio-based analysis, only mean mode with individual spectrum reference is available.
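To make the idea of working with log2 ratios concrete, here is a small Python sketch. It assumes that "individual spectrum reference" means each spectrum's sample intensity is divided by the reference-channel intensity from the same spectrum, and that the per-spectrum log2 ratios are then averaged. The data layout, values, and function name are hypothetical, and the actual Scaffold Q+ calculation may differ.

```python
from math import log2
from statistics import mean

# Hypothetical reporter-ion intensities, one tuple per spectrum:
# (reference channel intensity, sample channel intensity)
spectra = [(1000.0, 2000.0), (500.0, 1000.0), (800.0, 3200.0)]


def mean_log2_ratio(pairs):
    """Average the per-spectrum log2(sample / reference) ratios."""
    ratios = [log2(sample / reference) for reference, sample in pairs]
    return mean(ratios)


log2_fold = mean_log2_ratio(spectra)
fold_change = 2 ** log2_fold  # convert back from log2 space to a plain ratio

print(f"log2 fold change: {log2_fold:.3f}, fold change: {fold_change:.2f}")
```

Working in log2 space treats a doubling and a halving symmetrically (+1 and -1), which is one common reason for averaging ratios on the log scale.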
How is the Intensity Weighting calculated?
For an explanation of Intensity Weighting, see the documentation here, starting on Page 6.
How does the kernel density function work?
For an explanation of how the kernel density function works, see the documentation here.
What is the Randomized Permutation Test?
For an explanation of the randomized permutation test, see the documentation here.