DIA Proteomics Workflows: Choosing the Right Library – Proteome Software Technical Help Center

This article describes the library and workflow options for searching DIA data and explains when to use each one. Choosing the right library has a major impact on your results and depends on a few factors, such as what data you have available, how much time you have, and how many samples you plan to process.

Types of Libraries
When To Use Each Option

Types of Libraries

Library selection is a critical component of any data independent acquisition (DIA) proteomics experiment and there are numerous options. Thus, users should consider the experiment characteristics and match their library to the data. This goes beyond simply making sure your library matches the organism you're studying.

The simplest route is to search your DIA data against a FASTA file for the organism in question. While this may be the easiest option, searching a FASTA file directly comes with significant limitations and should not be your first choice. Even a library built from existing data dependent acquisition (DDA) data provides several advantages over FASTA searches:

More accurate MS2 fragmentation patterns as compared to FASTA searches
More accurate retention times as compared to FASTA searches
Narrower search space composed of only the most likely peptides/proteins as compared to FASTA searches

Before searching a FASTA directly, consider converting existing DDA data (e.g., using Scaffold), if available, to a library format compatible with Scaffold DIA. While it may seem tempting to use a FASTA for convenience, you will get dramatically better results from DDA libraries with minimal effort. DDA libraries are also the better option for post-translational modification (PTM) experiments, because the prediction algorithms discussed next do not perform as well on modified peptides.

A popular option that has recently become available is to create a library using Prosit. Prosit is a software program that provides accurate predicted spectra using a machine learning approach. A Prosit library can be made from a FASTA file, and no additional data is needed to generate the library. Prosit libraries hit two of the three points above:

More accurate MS2 fragmentation patterns as compared to FASTA searches
More accurate retention times as compared to FASTA searches

The third point does not apply: the search space will not be nearly as constrained as when you search against a DDA library. Even so, Prosit libraries provide a large increase in sensitivity over FASTA files and are easy to generate. With the availability of Prosit libraries, there is little reason to search data directly against a FASTA anymore. Prosit libraries for model organisms can be found here. Proteome Software can also generate Prosit libraries for non-model organisms using our in-house Prosit server. Please contact us with library requests.

While DDA and Prosit libraries provide high-quality results with minimal time up front, creating a chromatogram library can ultimately produce better results than either DDA or Prosit libraries alone. Chromatogram libraries are made from DIA data as opposed to DDA data. Typically, we recommend creating a pooled reference sample and running it using a process called gas phase fractionation (GPF). This technique limits the mass range in the instrument for each injection, which results in a greater depth of coverage. In general, we suggest capturing six gas phase fractions of 100 m/z each, covering the 400 to 1000 m/z range.
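The suggested fractionation scheme can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `gpf_windows` and its parameters are hypothetical, and the sketch assumes equal-width, non-overlapping windows, which may differ from how your instrument method defines its isolation ranges.

```python
def gpf_windows(start_mz=400.0, end_mz=1000.0, n_fractions=6):
    """Split [start_mz, end_mz] into equal-width gas phase fractions.

    Hypothetical helper: assumes equal-width, non-overlapping windows.
    """
    if n_fractions < 1 or end_mz <= start_mz:
        raise ValueError("need n_fractions >= 1 and end_mz > start_mz")
    width = (end_mz - start_mz) / n_fractions
    return [
        (start_mz + i * width, start_mz + (i + 1) * width)
        for i in range(n_fractions)
    ]


# One line per injection of the pooled reference sample.
for low, high in gpf_windows():
    print(f"{low:.0f}-{high:.0f} m/z")
```

With the defaults, this lists six windows of 100 m/z each, from 400-500 m/z up to 900-1000 m/z.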

These pooled reference samples are then searched using a Prosit library to create a project-specific chromatogram library. Experimental samples are then searched against this chromatogram library (also called a reference library). Chromatogram libraries ensure that your library data is generated using the same instrument with the same chromatography as your experimental data. This, of course, comes at the cost of both instrument time and the need for some extra pooled reference sample. However, even with these additional acquisitions, creating a chromatogram library will lead to an overall faster processing speed later in your pipeline.

While employing chromatogram libraries in DIA searches will often lead to the best results, for small experiments it is not always feasible to include the extra six injections needed to generate one. In these cases, how do you determine which workflow to use? An easy way to break it down is to look at the number of samples you need to run and the time that you have to do so. See the table below for a matrix describing the available options.

When To Use Each Option

Number of Samples	Time per Sample	Injections per Sample	Library
1	12 hr	6	Prosit
1	4 hr	2	Prosit
1	2 hr	1	Use DDA search
2-6	4 hr	2	Prosit
7-9	2-4 hr	1 or 2	Consider either Prosit or Chromatogram Library
10 or more	2 hr	1	Chromatogram Library

If you only have one sample and instrument time is not a concern (you have at least 12 hours available), then running 6 GPF injections of that sample and searching against a Prosit library is ideal. If you are limited to 4 hours of instrument time, consider capturing 2 GPF injections and searching against a Prosit library. Finally, if you only have about 2 hours and cannot collect any gas phase fractions, run the sample as a single injection and consider a traditional DDA search as opposed to DIA.

When running 6 or fewer samples, you will likely get the best results for your time by running two fractions per sample and searching a Prosit library directly. When do you start to see the benefits of creating a chromatogram library? When you have 10 or more samples, consider creating a chromatogram library as described above. Not only will you save instrument time by reducing the number of injections needed (you only need one injection per sample), you will also save processing time, as searching a chromatogram library is faster than searching a Prosit library directly.
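The recommendations above can be written as a small lookup function. This is a sketch rather than an official tool: the name `recommend_workflow` and its parameters are hypothetical, and the hour thresholds for the single-sample case (12 or more, 4 or more, otherwise less) are an assumption used to fill the gaps between the values listed in the table.

```python
def recommend_workflow(n_samples, instrument_hours=None):
    """Return (library, injections_per_sample) following the table above.

    instrument_hours is only consulted for a single sample. The cut-offs
    between the tabulated values (12, 4, and 2 hours) are assumptions.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    if n_samples == 1:
        if instrument_hours is None:
            raise ValueError("instrument_hours is required for one sample")
        if instrument_hours >= 12:
            return ("Prosit", "6")
        if instrument_hours >= 4:
            return ("Prosit", "2")
        return ("DDA search", "1")
    if n_samples <= 6:
        return ("Prosit", "2")
    if n_samples <= 9:
        return ("Prosit or chromatogram library", "1 or 2")
    return ("Chromatogram library", "1")


print(recommend_workflow(1, instrument_hours=4))
print(recommend_workflow(8))
print(recommend_workflow(24))
```

Treat the result as a starting point; the choice for 7 to 9 samples in particular depends on how much pooled reference sample and instrument time you can spare.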

The breakdown described here follows from the instrument time needed to run the samples. If you assume 2 hour gradients for each run, then acquiring 6 GPF library injections and 10 single-injection experimental samples takes about 32 hours (16 injections at 2 hours each). If you were to run two injections of each sample and search a Prosit library directly, you would spend at least 40 hours acquiring this same set of samples (20 injections at 2 hours each). The break-even point is roughly 6 samples, where both approaches need 12 injections, so for experiments between 6 and 10 samples either library option is appropriate. The calculations above are all based on chromatography producing peak widths of about 30 seconds or better. If your chromatographic peaks are wider than 30 seconds, please contact us for more assistance.
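The instrument-time comparison above can be reproduced with a short calculation. The function names below are hypothetical, and the sketch assumes that every injection uses the same gradient length and ignores overhead such as column equilibration and blanks.

```python
def chromatogram_library_hours(n_samples, gradient_hours=2.0, gpf_injections=6):
    """GPF library injections plus one injection per experimental sample."""
    return (gpf_injections + n_samples) * gradient_hours


def direct_prosit_hours(n_samples, gradient_hours=2.0, injections_per_sample=2):
    """Two injections per sample, searched directly against a Prosit library."""
    return n_samples * injections_per_sample * gradient_hours


for n in (4, 6, 10):
    print(
        f"{n} samples: chromatogram library {chromatogram_library_hours(n):.0f} h, "
        f"direct Prosit {direct_prosit_hours(n):.0f} h"
    )
```

Under these assumptions the two approaches cost the same at 6 samples (24 hours each), and the chromatogram library workflow saves 2 hours for every additional sample beyond that.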

If you would like to explore this topic further, see the paper by Pino et al. on analyzing DIA data without spectral libraries. If you have any additional questions, feel free to contact our support department for more information.