DataSet

Constructor

class itpseq.DataSet(data=None, *, data_path: Path = None, result_path: Path = None, samples: dict | None = None, keys=None, ref_labels: str | tuple | None = 'noa', cache_path=None, file_pattern=None, allow_partial_keys=True, ref_mapping=None)

Loads an iTP-Seq dataset and provides methods for analyzing and visualizing the data.

A DataSet object is constructed to handle iTP-Seq Samples with their respective Replicates. By default, it infers the files to use in the provided directory by looking for “*.itp.json” files produced during the initial step of pre-processing and filtering the fastq files. It uses the pattern of the file names to group the Replicates into Samples and to define which condition is the reference in the DataSet (by default, the Sample named “noa”).
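The discovery and grouping step described above can be illustrated with the standard library alone. This is a minimal sketch, not itpseq's implementation: it assumes the documented default file_pattern, builds each Sample label by joining the values of the chosen keys with “.”, and collects the replicate numbers per label. The helper name group_replicates is hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path

# Default pattern documented for DataSet when file_pattern=None.
DEFAULT_PATTERN = r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)'


def group_replicates(filenames, pattern=DEFAULT_PATTERN, keys=('lib_type', 'sample')):
    """Hypothetical helper: map each sample label to its sorted replicate numbers.

    Only names ending in '.itp.json' are considered. The label joins the values
    of `keys` with '.', e.g. 'nnn15.noa' for keys=('lib_type', 'sample').
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for name in filenames:
        name = Path(name).name  # accept str or Path, drop any directory part
        if not name.endswith('.itp.json'):
            continue  # ignore unrelated files
        stem = name[: -len('.itp.json')]
        match = regex.fullmatch(stem)
        if match is None:
            continue  # file name does not follow the pattern
        label = '.'.join(match.group(k) for k in keys)
        groups[label].append(int(match.group('replicate')))
    return {label: sorted(reps) for label, reps in sorted(groups.items())}
```

On a real directory, the input could be Path('.').glob('*.itp.json'); passing keys=('sample',) yields labels such as “noa” and “tcx”, mirroring the effect of the keys argument in the examples.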

data_path

Path to the data directory containing the output files from the fastq pre-processing.

Type:

str or Path

result_path

Path to the directory where the results of the analysis will be saved.

Type:

str or Path

samples

List or dictionary of Samples in the DataSet. By default, it is None and will be populated automatically if data_path is provided.

Type:

list or dict or None

keys

Properties parsed from the file name (named groups of file_pattern, e.g. lib_type and sample) used to label the Samples and to identify their reference.

Type:

tuple

ref_labels

Specifies the reference, e.g. 'noa' or (('sample', 'noa'),).

Type:

str or tuple

cache_path

Path used to cache intermediate results. By default, this creates a subdirectory called “cache” in the result_path directory.

Type:

str or Path

file_pattern

Regex pattern used to identify the sample files in the data_path directory. If None, defaults to r'(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)', which matches files such as nnn15_noa1.itp.json, nnn15_tcx2.itp.json, etc.

Type:

str

allow_partial_keys

If no exact match of the keys is found, try to map a reference using partial keys.

Type:

bool

ref_mapping

If set, do not try to infer the references; use the passed dictionary as the mapping instead. The dictionary should have the format {'sample.id': 'ref.id'}, where “sample.id” and “ref.id” are the labels generated upon import.

Type:

dict or None
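The shape of a ref_mapping dictionary, and one plausible way a default reference could be looked up, can be sketched as follows. This is a hypothetical illustration, not itpseq's inference logic (in particular it ignores allow_partial_keys): it assumes labels of the form “lib_type.sample”, that the reference sample is named by ref_labels (“noa” here), and that a Sample's reference is the label sharing every other key. The function infer_ref_mapping is hypothetical.

```python
def infer_ref_mapping(labels, ref_sample='noa', keys=('lib_type', 'sample')):
    """Hypothetical helper: build a {'sample.id': 'ref.id'} dictionary.

    Each label is split on '.' into the values of `keys`. A sample's reference
    is the label with `ref_sample` in the 'sample' position and identical
    values for all other keys; samples without such a label are skipped.
    """
    sample_idx = list(keys).index('sample')
    parsed = {label: label.split('.') for label in labels}
    mapping = {}
    for label, parts in parsed.items():
        if parts[sample_idx] == ref_sample:
            continue  # the reference itself has no reference
        candidate = parts.copy()
        candidate[sample_idx] = ref_sample
        ref_label = '.'.join(candidate)
        if ref_label in parsed:
            mapping[label] = ref_label
    return mapping
```

For the example dataset below this yields {'nnn15.tcx': 'nnn15.noa'}, which is the kind of dictionary ref_mapping is described as accepting, e.g. DataSet(data_path='.', ref_mapping={'nnn15.tcx': 'nnn15.noa'}).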

Examples

Creating a DataSet from a simple antibiotic treatment (tcx) vs. no treatment (noa) experiment with 3 replicates each (1, 2, 3).

Load a dataset from the current directory, inferring the samples automatically.

>>> from itpseq import DataSet
>>> data = DataSet(data_path='.')
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)',
        samples=[Sample(nnn15.noa:[1, 2, 3]),
                 Sample(nnn15.tcx:[1, 2, 3], ref: nnn15.noa)],
        )

Same as above, but only use “sample” as the key.

>>> data = DataSet(data_path='.', keys=['sample'])
>>> data
DataSet(data_path=PosixPath('.'),
        file_pattern='(?P<lib_type>[^_]+)_(?P<sample>[^_\d]+)(?P<replicate>\d+)',
        samples=[Sample(noa:[1, 2, 3]),
                 Sample(tcx:[1, 2, 3], ref: noa)],
        )

Compute a standard report and export it as PDF.

>>> data.report('my_experiment.pdf')

Display a graph of the inverse-toeprint lengths for each sample.

>>> data.itp_len_plot(row='sample')

Attributes:

samples_with_ref

Dictionary of the samples that have a reference.

toeprint_df

DataFrame of the counts of each inverse-toeprint length per Replicate.
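To make the layout of such a counts table concrete, the sketch below tallies raw inverse-toeprint lengths per replicate without pandas. It is a hypothetical illustration only: the input format (a list of lengths per replicate), the replicate labels, and the rows-as-lengths, columns-as-replicates layout are assumptions, not a description of how toeprint_df is actually built. The function name length_table is hypothetical.

```python
from collections import Counter


def length_table(itp_lengths_by_replicate):
    """Hypothetical helper: count inverse-toeprint lengths per replicate.

    Input:  {'noa.1': [lengths...], 'noa.2': [...], ...}
    Output: {length: {replicate: count}}, one row per observed length and one
            column per replicate, with 0 where a length is absent.
    """
    counts = {rep: Counter(lengths) for rep, lengths in itp_lengths_by_replicate.items()}
    all_lengths = sorted(set().union(*counts.values()))
    return {
        length: {rep: counts[rep][length] for rep in counts}  # Counter gives 0 if missing
        for length in all_lengths
    }
```

If pandas is available, pd.DataFrame.from_dict(table, orient='index') turns this nested dictionary into a DataFrame with lengths as the index and replicates as columns.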

Methods

DE([pos])

Computes the log2 fold change for each motif described by pos, for each sample in the DataSet, relative to its reference.

infos([html])

Displays summary information about the dataset NGS reads per replicate.

itoeprint([plot, norm, norm_range, ...])

Plots a virtual inverse-toeprint gel.

itp_len_plot([ax, col, row, min_codon, ...])

Generates a line plot of inverse-toeprint (ITP) counts per length.

reorder_samples(order[, validate, ...])

Reorders the samples in the DataSet.

report([template, output])

Create a report for the DataSet.

set_references([ref_mapping, exact_mapping])

Sets the Sample references from a mapping.
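As a rough illustration of what a log2 fold change “relative to their reference” means, the sketch below compares depth-normalized motif counts between one sample and its reference, adding a pseudocount so that motifs missing from one library stay finite. This is a simplified stand-in, not the statistic DataSet.DE computes (which may be replicate-aware and is not described here); the function name log2_fold_change and the motif strings in the test are hypothetical.

```python
import math


def log2_fold_change(sample_counts, ref_counts, pseudocount=1.0):
    """Hypothetical helper: per-motif log2(sample fraction / reference fraction).

    Raw counts (plus `pseudocount`) are divided by each library's total so that
    differing sequencing depths do not bias the ratio.
    """
    motifs = set(sample_counts) | set(ref_counts)
    total_s = sum(sample_counts.values()) + pseudocount * len(motifs)
    total_r = sum(ref_counts.values()) + pseudocount * len(motifs)
    result = {}
    for motif in sorted(motifs):
        frac_s = (sample_counts.get(motif, 0) + pseudocount) / total_s
        frac_r = (ref_counts.get(motif, 0) + pseudocount) / total_r
        result[motif] = math.log2(frac_s / frac_r)  # > 0: enriched in the sample
    return result
```

Positive values indicate motifs over-represented in the sample compared with its reference, negative values the opposite; the pseudocount trades a small bias toward 0 for finite values on sparse motifs.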
