Filtering Data

Data can be filtered arbitrarily using pd.DataFrame methods, but the hicutils.core.filters module provides helper utilities for common filtering routines. Examples include filtering non-productive clones and excluding clones by copy number cutoffs.

Examples

filtering

API Documentation

hicutils.core.filters.filter_by_gene_frequency(df, min_frequency, by='subject', gene='v_gene')

Removes clones in by (defaults to subject) which have an overall gene usage less than or equal to min_frequency.

For example, if min_frequency=0.05 and by='subject', all clones using a V-gene with a frequency less than or equal to 0.05 in a given subject are removed.

Parameters

df : pd.DataFrame

The DataFrame to filter.

min_frequency : float

The minimum frequency of a gene in by that should be included.

by : str

The column on which to calculate frequency. Defaults to subject.

gene : str

The gene on which to filter. Accepts v_gene or j_gene, defaulting to v_gene.

Returns

DataFrame filtered on gene frequency in by.
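As a rough illustration of the rule above, the following pandas-only sketch reproduces the filter on a toy table. It is not the hicutils implementation: the function name gene_frequency_sketch and the toy data are hypothetical, and it assumes that frequency means the fraction of rows (clones) in each by group that use a given gene.

```python
import pandas as pd


def gene_frequency_sketch(df, min_frequency, by="subject", gene="v_gene"):
    """Keep rows whose gene makes up more than min_frequency of its `by` group."""
    # Assumption: frequency = rows per (by, gene) pair / rows per `by` group.
    per_gene = df.groupby([by, gene])[gene].transform("size")
    per_group = df.groupby(by)[gene].transform("size")
    # Strictly greater: genes at or below min_frequency are removed.
    return df[(per_gene / per_group) > min_frequency]


# Hypothetical toy data: one row per clone.
toy = pd.DataFrame({
    "subject": ["S1"] * 4 + ["S2"] * 2,
    "v_gene": ["IGHV1-2", "IGHV1-2", "IGHV1-2", "IGHV3-23",
               "IGHV3-23", "IGHV4-34"],
})
kept = gene_frequency_sketch(toy, min_frequency=0.25)
```

In this toy table IGHV3-23 accounts for exactly 0.25 of the rows in S1, so its S1 row is dropped, while the IGHV3-23 row in S2 (frequency 0.5 there) is kept. The frequency is evaluated per group, not across the whole DataFrame.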

hicutils.core.filters.filter_by_overall_copies(df, copies, field='clone_id')

Removes clones identified by field (default clone_id) from a DataFrame with fewer than copies total copies across all pools.

Changing field changes the definition of a clone. For example, setting field to 'cdr3_aa' will define clones by their CDR3 AA sequence.

Parameters

df : pd.DataFrame

The DataFrame to filter.

copies : int

The minimum copy number of each clone required to be included in the resulting DataFrame.

field : str

The column that defines a clone. Defaults to clone_id.

Returns

DataFrame filtered by copies.

Examples

The following removes all clones with fewer than 5 total copies from df:

>>> df.copies.min()
1
>>> df = filter_by_overall_copies(df, 5)
>>> df.copies.min()
4
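Because the cutoff applies to a clone's total copies across all pools, individual rows can still carry fewer copies than the cutoff after filtering. The sketch below shows this with plain pandas. It is not the hicutils implementation: overall_copies_sketch and the toy data are hypothetical, and it assumes one row per clone per pool with a copies column.

```python
import pandas as pd


def overall_copies_sketch(df, copies, field="clone_id"):
    """Keep rows of clones whose copies, summed over all rows, reach `copies`."""
    # Total copies per clone, broadcast back onto every row of that clone.
    total = df.groupby(field)["copies"].transform("sum")
    return df[total >= copies]


# Hypothetical toy data: clone 1 is split across two subjects.
toy = pd.DataFrame({
    "clone_id": [1, 1, 2, 3],
    "subject": ["S1", "S2", "S1", "S1"],
    "copies": [4, 3, 2, 9],
})
kept = overall_copies_sketch(toy, 5)
```

Here clone 1 has 7 copies in total and is kept even though neither of its rows reaches 5 on its own, so the per-row minimum of kept.copies stays below the cutoff. Only clone 2, with 2 copies overall, is removed.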

hicutils.core.filters.filter_by_presence(df, pool, pool_value)

Filters clones based on presence in a given pool.

Parameters

df : pd.DataFrame

The DataFrame to filter.

pool : str

The pool on which to filter.

pool_value : str

The pool value on which to filter.

Returns

DataFrame filtered by presence in the given pool.
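The description above does not spell out which rows survive, so the following is only one plausible reading, sketched in plain pandas: keep every row of any clone that has at least one row where the pool column equals pool_value. The function presence_sketch, its field argument, and the toy data are hypothetical and are not part of hicutils.

```python
import pandas as pd


def presence_sketch(df, pool, pool_value, field="clone_id"):
    """Keep all rows of clones that occur at least once where df[pool] == pool_value."""
    # Assumption: clones are identified by `field` (hypothetical argument).
    present = df.loc[df[pool] == pool_value, field].unique()
    return df[df[field].isin(present)]


# Hypothetical toy data with a "tissue" pool column.
toy = pd.DataFrame({
    "clone_id": [1, 1, 2, 3],
    "tissue": ["blood", "gut", "gut", "blood"],
})
kept = presence_sketch(toy, "tissue", "blood")
```

Under this reading, clone 1 keeps both of its rows, including the gut row, because it is also seen in blood. Clone 2, seen only in gut, is dropped.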

hicutils.core.filters.filter_functional(df, functional=True)

Filters clones by functionality, by default removing non-functional clones. Setting functional to False removes functional clones instead.

Parameters

df : pd.DataFrame

The DataFrame to filter.

functional : bool

The functionality of the clones to include. Set to True (the default) to include functional clones only. Set to False to only include non-functional clones.

Returns

DataFrame filtered by functionality.
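The behaviour of the functional flag can be pictured with a short pandas sketch. It assumes the DataFrame has a boolean column named functional, which is not confirmed by the text above. The helper functional_sketch and the toy data are hypothetical.

```python
import pandas as pd


def functional_sketch(df, functional=True):
    """Keep rows whose functionality matches the requested value."""
    # Assumption: a boolean column called "functional" marks each clone.
    return df[df["functional"] == functional]


# Hypothetical toy data.
toy = pd.DataFrame({
    "clone_id": [1, 2, 3],
    "functional": [True, False, True],
})
only_functional = functional_sketch(toy)
only_non_functional = functional_sketch(toy, functional=False)
```

The two calls partition the toy table: the default keeps clones 1 and 3, and functional=False keeps clone 2.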

hicutils.core.filters.filter_number_of_pools(df, pool, n, func='greater_equal', limit_to=None)

Filters clones based on the number of pools in which they occur.

Parameters

df : pd.DataFrame

The DataFrame to filter.

pool : str

The pool on which to filter.

n : int

The number of distinct pools a clone must be in to be included in the resulting DataFrame.

func : function

The comparison function to use between n and the number of occurrences of each clone. The default is greater_equal, meaning a clone must occur in ≥ n pools to be included. Any NumPy comparison function may be used, such as equal or less_equal.

limit_to : list(str), str, None

If specified, overlap will be limited to the specified pools. This is useful to filter clones based on their overlap in a subset of pools.

Returns

DataFrame filtered by number of pools.
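To make the interplay of n, func and limit_to concrete, here is a pandas and NumPy sketch of the same idea. It is not the hicutils implementation: number_of_pools_sketch, its field argument, and the toy data are hypothetical. It assumes func may be given as the name of a NumPy comparison function, and that all rows of a kept clone are returned even when limit_to restricts which pools are counted.

```python
import numpy as np
import pandas as pd


def number_of_pools_sketch(df, pool, n, func="greater_equal", limit_to=None,
                           field="clone_id"):
    """Keep clones whose number of distinct pools compares favourably to n."""
    # Resolve the comparison by name, e.g. "greater_equal" -> np.greater_equal.
    compare = getattr(np, func) if isinstance(func, str) else func
    counted = df
    if limit_to is not None:
        # Accept a single pool name or a list of names.
        pools = [limit_to] if isinstance(limit_to, str) else list(limit_to)
        counted = df[df[pool].isin(pools)]
    # Distinct pools per clone, counted only within the selected pools.
    n_pools = counted.groupby(field)[pool].nunique()
    keep = n_pools.index[compare(n_pools.to_numpy(), n)]
    return df[df[field].isin(keep)]


# Hypothetical toy data: clone 1 is in three subjects, clone 2 in two, clone 3 in one.
toy = pd.DataFrame({
    "clone_id": [1, 1, 1, 2, 2, 3],
    "subject": ["A", "B", "C", "A", "B", "C"],
})
at_least_two = number_of_pools_sketch(toy, "subject", 2)
exactly_three = number_of_pools_sketch(toy, "subject", 3, func="equal")
shared_a_c = number_of_pools_sketch(toy, "subject", 2, limit_to=["A", "C"])
```

With the default comparison, clones 1 and 2 pass for n=2. Switching to equal with n=3 leaves only clone 1, and limiting the count to subjects A and C also leaves only clone 1, since it is the only clone found in both.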

hicutils.core.filters.remove_potential_contaminates(df, pool, pool_values, clone_feature='cdr3_nt')

Removes clones based on clone_feature (defaults to CDR3 NT) which occur in pool with values pool_values. For example, to remove all clones with CDR3 NT sequences found in subjects ‘Fibroblast’ and ‘Water’:

remove_potential_contaminates(df, 'subject', ['Fibroblast', 'Water'])

Parameters

df : pd.DataFrame

The DataFrame to filter.

pool : str

The pool to use for filtering.

pool_values : list

The values of pool which should be the basis of clonal exclusion.

clone_feature : str

The clone feature to use for filtering. For example cdr3_nt (the default) will use the CDR3 NT sequence as the basis for removing other clones.

Returns

DataFrame with clones occurring in pool with values pool_values excluded on the basis of clone_feature.
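The exclusion logic can be sketched in plain pandas as follows. This is an illustration under assumptions, not the hicutils implementation: remove_contaminates_sketch and the toy sequences are hypothetical, and the sketch assumes that any row sharing a clone_feature value with a control row is removed, including the control rows themselves.

```python
import pandas as pd


def remove_contaminates_sketch(df, pool, pool_values, clone_feature="cdr3_nt"):
    """Drop every row whose clone_feature value is seen in the given pool values."""
    # Feature values observed in the control pools (e.g. Fibroblast, Water).
    in_controls = df[pool].isin(pool_values)
    flagged = df.loc[in_controls, clone_feature].unique()
    # Remove all rows carrying a flagged feature, whatever pool they come from.
    return df[~df[clone_feature].isin(flagged)]


# Hypothetical toy data with made-up CDR3 NT sequences.
toy = pd.DataFrame({
    "subject": ["Water", "S1", "S1", "Fibroblast", "S2"],
    "cdr3_nt": ["TGTGCA", "TGTGCA", "TGTAAA", "TGTCCC", "TGTCCC"],
})
kept = remove_contaminates_sketch(toy, "subject", ["Fibroblast", "Water"])
```

Only the S1 row with TGTAAA survives: the other S1 row shares its sequence with Water, and the S2 row shares its sequence with Fibroblast.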