DataSet
Constructor
- class itpseq.DataSet(data=None, *, data_path: Path = None, result_path: Path = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, allow_partial_keys=True, ref_mapping=None)
Loads an iTP-Seq dataset and provides methods for analyzing and visualizing the data.
A DataSet object is constructed to handle iTP-Seq Samples with their respective Replicates. By default, it infers which files to use in the provided directory by looking for “*.itp.json” files produced during the initial pre-processing and filtering of the fastq files. It uses the pattern of the file names to group the Replicates into Samples, and to define which condition is the reference in the DataSet (the Sample named “noa” by default).
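The grouping described above can be sketched in plain Python. This is only an illustration and not the library's implementation: the helper name `group_replicates` is hypothetical, the "lib_type.sample" label format is inferred from the reprs shown in the Examples, and the regular expression is the default file_pattern quoted on this page.

```python
import re
from collections import defaultdict

# Default pattern as quoted in the file_pattern description on this page.
FILE_PATTERN = r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)'


def group_replicates(filenames, pattern=FILE_PATTERN, ref_label='noa'):
    """Hypothetical helper: group file names into samples and assign references."""
    groups = defaultdict(list)
    for name in filenames:
        match = re.match(pattern, name)
        if match is None:
            continue  # ignore files that do not follow the naming scheme
        key = (match['lib_type'], match['sample'])
        groups[key].append(int(match['replicate']))

    # Label each sample as "lib_type.sample" (assumed format) with sorted replicates.
    samples = {
        f'{lib}.{smp}': sorted(reps)
        for (lib, smp), reps in sorted(groups.items())
    }
    # Every non-reference sample points to the reference of the same library type.
    references = {
        f'{lib}.{smp}': f'{lib}.{ref_label}'
        for (lib, smp) in groups
        if smp != ref_label and (lib, ref_label) in groups
    }
    return samples, references


files = [f'nnn15_{s}{i}.itp.json' for s in ('noa', 'tcx') for i in (1, 2, 3)]
samples, references = group_replicates(files)
print(samples)     # {'nnn15.noa': [1, 2, 3], 'nnn15.tcx': [1, 2, 3]}
print(references)  # {'nnn15.tcx': 'nnn15.noa'}
```

The sketch mirrors the result shown in the Examples, where the tcx sample is attached to the noa reference of the same library type.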
- data_path
Path to the data directory containing the output files from the fastq pre-processing.
- Type:
str or Path
- result_path
Path to the directory where the results of the analysis will be saved.
- Type:
str or Path
- samples
List or dictionary of Samples in the DataSet. By default, it is None and will be populated automatically if data_path is provided.
- Type:
list or dict or None
- keys
Properties in the file name to use for identifying the reference.
- Type:
tuple
- ref_labels
Specifies the reference: e.g. ‘noa’ or ((‘sample’, ‘noa’),)
- Type:
str or tuple
- cache_path
Path used to cache intermediate results. By default, this creates a subdirectory called “cache” in the result_path directory.
- Type:
str or Path
- file_pattern
Regex pattern used to identify the sample files in the data_path directory. If None, defaults to r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)' which matches files like nnn15_noa1.itp.json, nnn15_tcx2.itp.json, etc.
- Type:
str
- allow_partial_keys
If no exact match of the keys is found, try to map a reference using partial keys.
- Type:
bool
- ref_mapping
If set, do not try to infer the references but use the passed dictionary as the mapping. The dictionary should have the format {‘sample.id’: ‘ref.id’}, where “sample.id” and “ref.id” are the labels generated upon import.
- Type:
dict or None
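Before passing a custom file_pattern, it can help to check what the pattern captures on a few file names. The sketch below uses only the standard `re` module. The helper `check_pattern`, the dash-separated naming scheme and the custom pattern are hypothetical, and it is an assumption that a custom pattern should keep the same named groups as the default one and that matching starts at the beginning of the file name.

```python
import re

# Default pattern as quoted in the file_pattern description above.
DEFAULT_PATTERN = r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)'

# Hypothetical custom pattern for files named like "nnn15-tcx-rep2.itp.json".
CUSTOM_PATTERN = r'(?P<lib_type>[^-]+)-(?P<sample>[^-]+)-rep(?P<replicate>\d+)'


def check_pattern(pattern, filenames):
    """Hypothetical helper: map each file name to its named groups, or None if unmatched."""
    regex = re.compile(pattern)
    result = {}
    for name in filenames:
        match = regex.match(name)
        result[name] = match.groupdict() if match else None
    return result


default_hits = check_pattern(DEFAULT_PATTERN, ['nnn15_noa1.itp.json', 'README.txt'])
custom_hits = check_pattern(CUSTOM_PATTERN, ['nnn15-tcx-rep2.itp.json'])

for name, groups in {**default_hits, **custom_hits}.items():
    print(name, '->', groups)
```

A file name mapped to None would presumably not be picked up as a sample, so this kind of check is a cheap way to catch a pattern that is too strict.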
Examples
Creating a DataSet from a simple antibiotic treatment (tcx) vs. no treatment (noa) experiment with 3 replicates each (1, 2, 3).
- Load a dataset from the current directory, inferring the samples automatically.
>>> from itpseq import DataSet
>>> data = DataSet(data_path='.')
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)',
        samples=[Sample(nnn15.noa:[1, 2, 3]),
                 Sample(nnn15.tcx:[1, 2, 3], ref: nnn15.noa)],
        )
- Same as above, but only use “sample” as key.
>>> data = DataSet(data_path='.', keys=['sample'])
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)',
        samples=[Sample(noa:[1, 2, 3]),
                 Sample(tcx:[1, 2, 3], ref: noa)],
        )
- Compute a standard report and export it as PDF
>>> data.report('my_experiment.pdf')
- Display a graph of the inverse-toeprints lengths for each sample
>>> data.itp_len_plot(row='sample')
- Attributes:
- samples_with_ref
Dictionary of the samples that have a reference
- toeprint_df
DataFrame of the counts of each inverse toeprint length per Replicate
Methods
- DE([pos])
Computes the log2-FoldChange for each motif described by pos for each sample in the DataSet relative to their reference.
- infos([html])
Displays summary information about the dataset NGS reads per replicate.
- itoeprint([plot, norm, norm_range, ...])
Plots a virtual inverse-toeprint gel.
- itp_len_plot([ax, col, row, min_codon, ...])
Generates a line plot of inverse-toeprint (ITP) counts per length.
- reorder_samples(order[, validate, ...])
Reorders the samples in the DataSet.
- report([template, output])
Creates a report for the DataSet.
- set_references([ref_mapping, exact_mapping])
Sets the Sample references from a mapping.