Loading Data

There are multiple ways to load AIRR-seq data in hicutils:

  1. (Recommended) Using existing un-pooled AIRR-formatted files together with a metadata file that has one row per file.

  2. Using existing pooled AIRR-formatted files exported from ImmuneDB, where pooling metadata is embedded in the file names.

  3. Directly downloading and loading data from a hosted ImmuneDB instance using its URL and database name.

Examples

loading_data

API Documentation

hicutils.core.io.pull_immunedb_data(endpoint, db_name, out_name, skip_existing=True)

Downloads unpooled clonal data from an ImmuneDB instance.

Parameters

endpoint : str

The endpoint to the hosted ImmuneDB instance. For example https://mydomain.com/immunedb.

db_name : str

The database name itself. For example my_db.

out_name : str

The name of the directory into which the data will be saved.

Returns

A pd.DataFrame with all clonal data downloaded from the ImmuneDB instance.

Examples

>>> from hicutils.core import io
>>> io.pull_immunedb_data(
...     'https://mydomain.com/immunedb',
...     'my_db',
...     'my_db_data'
... )

hicutils.core.io.read_directory(path)

Reads AIRR-formatted TSV files and joins them with an associated metadata.tsv file to return a unified pd.DataFrame.

Parameters

path : str

Path to the directory containing the AIRR-formatted files and metadata.tsv.

Returns

pd.DataFrame with AIRR-seq data and metadata.
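
The join described above can be pictured with plain pandas. This is only a rough sketch, not the hicutils implementation: the function name `read_directory_sketch` is hypothetical, and it assumes that each data file is named `<replicate_name>.tsv` and that metadata.tsv has a `replicate_name` column, neither of which is confirmed by this page.

```python
from pathlib import Path

import pandas as pd


def read_directory_sketch(path):
    """Concatenate per-replicate TSVs and left-join them with metadata.tsv."""
    path = Path(path)
    metadata = pd.read_csv(path / "metadata.tsv", sep="\t")

    frames = []
    for tsv in sorted(path.glob("*.tsv")):
        if tsv.name == "metadata.tsv":
            continue  # the metadata file is not a data file
        df = pd.read_csv(tsv, sep="\t")
        # ASSUMPTION: the file stem identifies the replicate.
        df["replicate_name"] = tsv.stem
        frames.append(df)

    data = pd.concat(frames, ignore_index=True)
    # Every data row picks up the metadata of its replicate.
    return data.merge(metadata, on="replicate_name", how="left")
```

A left join keeps every data row even when a replicate has no metadata row, which makes missing metadata visible as NaN values rather than silently dropping data.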

hicutils.core.io.read_metadata(path)

Reads a metadata file into a pd.DataFrame, prefixing METADATA_ to each field and using replicate_name as the index.

Parameters

path : str

Path to the metadata file.

Returns

pd.DataFrame containing the metadata.
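
To make the prefixing and indexing concrete, here is a minimal pandas sketch. It is not the library's code: `read_metadata_sketch` is a hypothetical name, the file is assumed to be tab-separated, and it assumes that `replicate_name` becomes the index while every other column receives the `METADATA_` prefix.

```python
from io import StringIO

import pandas as pd


def read_metadata_sketch(source):
    """Read a tab-separated metadata table keyed by replicate_name."""
    df = pd.read_csv(source, sep="\t")
    # ASSUMPTION: replicate_name is the index and is left unprefixed.
    df = df.set_index("replicate_name")
    # add_prefix renames the columns only; the index is untouched.
    return df.add_prefix("METADATA_")


example = StringIO(
    "replicate_name\tsubject\ttissue\n"
    "rep1\tS1\tblood\n"
    "rep2\tS2\tspleen\n"
)
meta = read_metadata_sketch(example)
print(meta.columns.tolist())
```

The prefix keeps metadata columns from colliding with AIRR field names once the two tables are joined.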

hicutils.core.io.read_tsvs(path, features=())

Reads AIRR-formatted input files into a single DataFrame and populates common fields.

Parameters

path : str

Path to the directory containing .pooled.tsv files.

features : list, optional

List of features which are encoded in the file names.

Returns

Single DataFrame containing the concatenated AIRR-formatted data.
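
As an illustration of what "features encoded in the file names" could mean, the sketch below maps the pieces of a file name onto feature names. The naming convention here is purely hypothetical (values joined by `_` in front of `.pooled.tsv`), as is the helper `features_from_name`; the actual layout of ImmuneDB exports is not described on this page and may differ.

```python
from pathlib import Path


def features_from_name(file_name, features, sep="_"):
    """Pair each feature name with the matching piece of the file name."""
    name = Path(file_name).name
    suffix = ".pooled.tsv"
    if not name.endswith(suffix):
        raise ValueError(f"{name!r} is not a .pooled.tsv file")

    # HYPOTHETICAL convention: "<value1>_<value2>.pooled.tsv"
    values = name[: -len(suffix)].split(sep)
    if len(values) != len(features):
        raise ValueError(
            f"expected {len(features)} values in {name!r}, found {len(values)}"
        )
    return dict(zip(features, values))


print(features_from_name("S1_blood.pooled.tsv", ["subject", "tissue"]))
```

Under this assumed convention, the order of `features` has to match the order in which the values appear in the file name.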

hicutils.core.io.save_fig_and_data(name, df, path='./', ext='pdf', fig_args=None, **kwargs)

Saves the most recently generated figure and associated data to files.

Parameters

name : str

The filename to use for both the figure and data file.

df : pd.DataFrame

The DataFrame used to generate the figure.

path : str, optional

Path to directory into which the files should be saved.

ext : str, optional

The extension of the figure file. Defaults to pdf but can be any image format such as png.

fig_args : dict, optional

Additional parameters which will be passed to plt.savefig.

**kwargs : dict

Additional parameters which will be passed to df.to_csv.
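
The way `fig_args` and `kwargs` are routed can be sketched as follows. This is an approximation under stated assumptions, not the hicutils source: `save_fig_and_data_sketch` is a hypothetical name, and the `.csv` extension for the data file is a guess, since this page does not say what the data file is called.

```python
import os

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd


def save_fig_and_data_sketch(name, df, path="./", ext="pdf",
                             fig_args=None, **kwargs):
    """Save the current matplotlib figure and the DataFrame behind it."""
    fig_path = os.path.join(path, f"{name}.{ext}")
    # ASSUMPTION: the data file uses a .csv extension.
    data_path = os.path.join(path, f"{name}.csv")

    # fig_args goes to plt.savefig; remaining keywords go to df.to_csv.
    plt.savefig(fig_path, **(fig_args or {}))
    df.to_csv(data_path, **kwargs)
    return fig_path, data_path
```

A call such as `save_fig_and_data_sketch('counts', df, ext='png', fig_args={'dpi': 150}, index=False)` would then send `dpi` to `plt.savefig` and `index` to `df.to_csv`.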