itpseq.Sample.DE

Sample.DE(pos=None, how='aax', join=False, quiet=True, filter_size=True, translate=False, multi=True, n_cpus=None, raw=False, _nocache=False, **kwargs)

Computes the differential expression between the sample and its reference.

Parameters:

pos (str) – Ribosome positions to consider when computing the differential expression, for example 'E:P'.
how (str, optional) – Type of inverse toeprints to analyze (see Replicate.load_data). Defaults to 'aax'.
join (bool, optional) – If True, joins the DE results back to the original df. Defaults to False.
quiet (bool, optional) – If True, suppresses the console output of the pydeseq2 library. Defaults to True.
translate (bool, optional) – If True, translates the nucleotide motif into amino acids. No check is made that the nucleotide motif is in frame or composed of consecutive positions. Defaults to False.
multi (bool, optional) – Whether to compute DE with a specific contrast (cond vs. ref). Defaults to True.
n_cpus (int, optional) – The number of CPUs to utilize for parallel processing. Defaults to the total number of available CPUs.
filter_size (bool, optional) – If True, only considers reads for which an amino acid is present in all target positions. Defaults to True.
**kwargs (optional) – Additional parameters passed to get_counts_ratio and Replicate.load_data. For instance, min_peptide and max_peptide filter the inverse toeprints to consider by peptide size.

Returns:

DataFrame of the differential expression statistics with a row per motif.

Return type:

DataFrame

See also

Sample.get_counts_ratio: Gets the inverse toeprint counts and the sample/reference ratio of normalized counts.
Sample.volcano: Draws a volcano plot from the Differential Expression data.
Sample.subset_logo: Creates a logo from a subset of the Differential Expression data.

Examples

Compute the differential expression for positions E-P.

>>> sample.DE('E:P')
        baseMean  log2FoldChange     lfcSE       stat        pvalue          padj  log10pvalue  log10padj
QK   5537.704183        0.778031  0.073280  10.617238  2.477833e-26  7.582170e-24    25.605928  23.120206
VI   6874.891363        0.295160  0.371018   0.795542  4.262985e-01  7.718778e-01     0.370286   0.112451
MY    747.538317        0.263705  0.074294   3.549477  3.859965e-04           NaN     3.413417        NaN
YY   2216.501684        0.259860  0.068213   3.809545  1.392226e-04  6.086018e-03     3.856290   2.215667
WM    200.446070        0.226720  0.111555   2.032371  4.211614e-02           NaN     1.375551        NaN
..           ...             ...       ...        ...           ...           ...          ...        ...
TP  15256.234795       -0.255940  0.061793  -4.141886  3.444618e-05  2.635133e-03     4.462859   2.579197
mK  10824.395771       -0.308538  0.210353  -1.466765  1.424400e-01  4.737680e-01     0.846368   0.324434
EP   8950.363473       -0.321266  0.068514  -4.689045  2.744828e-06  2.799725e-04     5.561485   3.552885
PP  20880.530910       -0.372203  0.078851  -4.720365  2.354220e-06  2.799725e-04     5.628153   3.552885
KK   7645.111411       -0.390365  0.096140  -4.060381  4.899280e-05  2.998359e-03     4.309868   2.523116
[420 rows x 8 columns]
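
The returned table is an ordinary pandas DataFrame, so it can be filtered with boolean masks. The sketch below is not part of itpseq: it rebuilds a few rows from the output above as a stand-in for `sample.DE('E:P')`, and the `alpha` threshold is an illustrative assumption rather than a library default. Rows whose `padj` is NaN drop out of the selection because a comparison with NaN is False.

```python
import numpy as np
import pandas as pd

# Stand-in for `de = sample.DE('E:P')`: a few rows copied from the output above.
de = pd.DataFrame(
    {
        "log2FoldChange": [0.778031, 0.295160, 0.263705, -0.321266, -0.372203],
        "padj": [7.582170e-24, 7.718778e-01, np.nan, 2.799725e-04, 2.799725e-04],
    },
    index=["QK", "VI", "MY", "EP", "PP"],
)

alpha = 0.05  # hypothetical significance threshold, not an itpseq default

# NaN < alpha evaluates to False, so motifs without an adjusted p-value are excluded.
significant = de[de["padj"] < alpha]

# Split the significant motifs by the sign of the fold change.
enriched = significant[significant["log2FoldChange"] > 0]
depleted = significant[significant["log2FoldChange"] < 0]

print("enriched:", enriched.index.tolist())
print("depleted:", depleted.index.tolist())
```

With real data, replace the hand-built frame with the result of `sample.DE('E:P')`; the filtering lines stay the same.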

Include the read counts for each replicate, the average count per million reads, and the sample/reference ratio of the normalized counts.

>>> sample.DE('E:P', join=True)
    noa.1  noa.2  noa.3  sample.1  sample.2  sample.3          noa       sample     ratio      baseMean  log2FoldChange     lfcSE       stat        pvalue          padj  log10pvalue  log10padj
QK   4312   3594   4506      7161      7153      6396  1414.463921  2663.851852  1.883294   5537.704183        0.778031  0.073280  10.617238  2.477833e-26  7.582170e-24    25.605928  23.120206
VI   6696   4833   7340      6335      5263     10473  2146.655446  2805.667048  1.306995   6874.891363        0.295160  0.371018   0.795542  4.262985e-01  7.718778e-01     0.370286   0.112451
MY    760    633    672       855       795       767   233.801175   309.668069  1.324493    747.538317        0.263705  0.074294   3.549477  3.859965e-04           NaN     3.413417        NaN
YY   2365   1826   1956      2686      2231      2260   692.561206   914.186048  1.320008   2216.501684        0.259860  0.068213   3.809545  1.392226e-04  6.086018e-03     3.856290   2.215667
WM    204    172    186       236       243       164    63.727865    82.966388  1.301886    200.446070        0.226720  0.111555   2.032371  4.211614e-02           NaN     1.375551        NaN
..    ...    ...    ...       ...       ...       ...          ...          ...       ...           ...             ...       ...        ...           ...           ...          ...        ...
TP  17811  14607  18129     15427     13436     12471  5751.773266  5279.144489  0.917829  15256.234795       -0.255940  0.061793  -4.141886  3.444618e-05  2.635133e-03     4.462859   2.579197
mK  15656   9112  12044     11894      7434      9530  4099.750100  3623.052276  0.883725  10824.395771       -0.308538  0.210353  -1.466765  1.424400e-01  4.737680e-01     0.846368   0.324434
EP  10913   8512  10882      8323      7911      7345  3441.100658  3024.810787  0.879024   8950.363473       -0.321266  0.068514  -4.689045  2.744828e-06  2.799725e-04     5.561485   3.552885
PP  24521  19997  27230     19044     17820     17053  8191.673763  6910.316773  0.843578  20880.530910       -0.372203  0.078851  -4.720365  2.354220e-06  2.799725e-04     5.628153   3.552885
KK   9528   6918  10052      7128      5688      6785  3010.061611  2490.775936  0.827483   7645.111411       -0.390365  0.096140  -4.060381  4.899280e-05  2.998359e-03     4.309868   2.523116
[420 rows x 17 columns]
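
The printed values suggest how the derived columns relate to each other: `ratio` matches `sample / noa`, and `log10pvalue` matches `-log10(pvalue)`. The sketch below checks this on two rows copied from the table above. It is a consistency check on the displayed numbers only, not a statement about how itpseq computes these columns internally.

```python
import numpy as np
import pandas as pd

# Two rows copied from the joined output above (stand-in for sample.DE('E:P', join=True)).
joined = pd.DataFrame(
    {
        "noa": [1414.463921, 5751.773266],
        "sample": [2663.851852, 5279.144489],
        "ratio": [1.883294, 0.917829],
        "pvalue": [2.477833e-26, 3.444618e-05],
        "log10pvalue": [25.605928, 4.462859],
    },
    index=["QK", "TP"],
)

# Recompute the derived columns from their apparent inputs.
ratio_check = joined["sample"] / joined["noa"]
log10_check = -np.log10(joined["pvalue"])

# Compare with the printed columns, allowing for display rounding.
ratio_ok = bool(np.allclose(ratio_check, joined["ratio"], atol=1e-5))
log10_ok = bool(np.allclose(log10_check, joined["log10pvalue"], atol=1e-5))
print(ratio_ok, log10_ok)
```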

Export the previous table as CSV, naming the index “motif”.

>>> sample.DE('E:P', join=True).rename_axis('motif').to_csv('sample_enrichment_EP.csv')
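
When reading such a file back with pandas, note that `read_csv` treats strings like "NA" as missing values by default, which would clash with a two-residue motif such as NA (Asn-Ala). The sketch below shows one way around this, assuming the file was written by `to_csv` with its defaults so that NaN appears as an empty field. The small table is a stand-in for the real export, and the values in its "NA" row are placeholders, not real results.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in for the exported table; the "NA" row holds placeholder values.
table = pd.DataFrame(
    {
        "log2FoldChange": [0.778031, 0.263705, 0.1],
        "padj": [7.582170e-24, np.nan, 0.5],
    },
    index=["QK", "MY", "NA"],
).rename_axis("motif")

# Write to a temporary directory so the example leaves the working directory untouched.
path = os.path.join(tempfile.mkdtemp(), "sample_enrichment_EP.csv")
table.to_csv(path)

# Disable the default missing-value markers so the motif "NA" stays a string,
# and treat only empty fields (how to_csv writes NaN by default) as missing.
reloaded = pd.read_csv(
    path,
    index_col="motif",
    keep_default_na=False,
    na_values=[""],
)

print(reloaded.index.tolist())
print(reloaded["padj"].isna().tolist())
```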