Plotting¶

The hicutils.plots module provides all plotting functions. Each plotting function returns a tuple containing a handle to the underlying figure and the pd.DataFrame that was used to create the plot.

Clone Size¶

A variety of clone size plots are provided to visualize the overall clonal landscape of a dataset.

clone_size

In [1]:

import hicutils as hu

df = hu.io.read_directory('../example_data_immunedb')
pdf = (
    df
    .pipe(hu.pooling.pool_by, 'disease') # Pool on disease
    .pipe(hu.filters.filter_functional) # Filter on functional
    .pipe(hu.filters.filter_by_overall_copies, 2) # Filter on 2+ copy
)

Clone size distribution¶

In [2]:

_ = hu.plots.plot_clone_sizes(pdf, aspect=2)

Clone size distribution with cutoff¶

In [3]:

_ = hu.plots.plot_clone_sizes(pdf, cutoff=20)

Top clone plots¶

In [4]:

_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones)

Top clones with custom cutoff¶

In [5]:

_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones, cutoff=50)

Top clones with custom cutoff & annotation¶

In [6]:

_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones, cutoff=15,
                                 annotate=['v_gene', 'cdr3_aa'])

Dx plot¶

In [7]:

_ = hu.plots.plot_d_index(pdf, 'disease', height=4, aspect=.75)

Clone size range plot¶

In [8]:

_ = hu.plots.plot_ranges(pdf, 'disease')

Clone size range plot with custom ranges¶

In [9]:

_ = hu.plots.plot_ranges(pdf, 'disease', intervals=(10, 100, 1000, 5000))

hicutils.plots.clone_size.plot_clone_counts(df, pool, **kwargs)¶

Plots the number of clones per pool.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the clone size distribution.
pool (str): The field on which to pool.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.clone_size.plot_clone_sizes(df, cutoff=None, **kwargs)¶

Plots the distribution of clone sizes in df.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the clone size distribution.
cutoff (int or None): Aggregate all clones with cutoff or more copies into one bin on the right side of the graph. This is useful to condense the tail of the plotted distribution.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
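The effect of the cutoff parameter can be illustrated without any plotting. The following is a minimal sketch in plain Python, not the hicutils implementation: the helper name bin_clone_sizes and the "20+" bin label are assumptions made for illustration only.

```python
from collections import Counter


def bin_clone_sizes(copies, cutoff=None):
    """Count clones per copy number, pooling sizes >= cutoff into one bin.

    `copies` is an iterable with one copy count per clone. The label used
    for the pooled bin ("<cutoff>+") is an assumption of this sketch.
    """
    counts = Counter()
    for c in copies:
        if cutoff is not None and c >= cutoff:
            counts[f"{cutoff}+"] += 1  # condensed tail of the distribution
        else:
            counts[c] += 1
    return dict(counts)


sizes = [2, 2, 3, 5, 20, 37, 150]  # made-up copy counts
binned = bin_clone_sizes(sizes, cutoff=20)
unbinned = bin_clone_sizes(sizes)
print(binned)
```

With cutoff=20 the three largest clones fall into a single bin, whereas without a cutoff each distinct size gets its own bin.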

hicutils.plots.clone_size.plot_d_index(df, pool, cutoff=20, **kwargs)¶

Plots the Dx index for clones in df. The default cutoff value is 20 and the generated figure is a dot plot of Dx values stratified by pool.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the Dx index.
pool (str): The field on which to pool.
cutoff (int): The D-value to use as a cutoff, defaults to 20.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
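This page does not define the Dx index. The sketch below assumes the common reading in which Dx is the fraction of all copies in a pool contributed by its x largest clones; both that definition and the helper name d_index are assumptions, so check the hicutils source before relying on the numbers.

```python
def d_index(copies, cutoff=20):
    """Fraction of all copies contributed by the `cutoff` largest clones.

    This definition of Dx is an assumption of the sketch, not taken from
    the hicutils documentation.
    """
    total = sum(copies)
    if total == 0:
        return 0.0
    top = sorted(copies, reverse=True)[:cutoff]
    return sum(top) / total


# Made-up copy counts per clone for two hypothetical pools.
pool_copies = {
    "pool_a": [50, 30, 10, 5, 5],
    "pool_b": [20, 20, 20, 20, 20],
}
d2 = {pool: d_index(c, cutoff=2) for pool, c in pool_copies.items()}
print(d2)
```

Under this reading, a pool dominated by a few large clones (pool_a) scores higher than an evenly distributed one (pool_b).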

hicutils.plots.clone_size.plot_top_clones(df, cutoff=20, annotate=False, color=(0.8392156862745098, 0.15294117647058825, 0.1568627450980392), figsize=(12, 8))¶

Plots the copy-number frequency of the top cutoff clones (default 20). Optionally, the annotate keyword can be set to one or more clone features to annotate each bar. For example, setting annotate=('v_gene', 'cdr3_aa') will show the V-gene and CDR3 AA for each clone.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the top clones.
cutoff (int): The number of clones to plot, defaults to 20.
annotate (str, list, or None): The feature(s) to annotate for each clone.
color (str): The color to use for bars.
figsize (tuple): The (width, height) of the plot.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
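To show what the bars and their annotations represent, here is a rough sketch in plain Python. It assumes that each bar is a clone's share of all copies in the input and that annotation labels are the selected features joined by spaces; the helper top_clones and the example records are hypothetical.

```python
def top_clones(clones, cutoff=20, annotate=()):
    """Return (label, frequency) pairs for the `cutoff` largest clones.

    `clones` is a list of dicts with a "copies" key. Frequencies are taken
    relative to all clones in the input, which is an assumption here.
    """
    total = sum(c["copies"] for c in clones)
    ranked = sorted(clones, key=lambda c: c["copies"], reverse=True)[:cutoff]
    return [
        (" ".join(str(c[f]) for f in annotate), c["copies"] / total)
        for c in ranked
    ]


# Made-up clone records for illustration.
clones = [
    {"clone_id": 1, "copies": 60, "v_gene": "IGHV3-23", "cdr3_aa": "CARDYW"},
    {"clone_id": 2, "copies": 30, "v_gene": "IGHV1-69", "cdr3_aa": "CARGFDYW"},
    {"clone_id": 3, "copies": 10, "v_gene": "IGHV4-34", "cdr3_aa": "CARWFDPW"},
]
rows = top_clones(clones, cutoff=2, annotate=("v_gene", "cdr3_aa"))
print(rows)
```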

Gene Usage¶

The gene usage plots show V- or J-gene usage grouped by pool. This can be useful for investigating gene skewing in different populations. Each plot can be scaled in various ways and clustered by row, column, both, or neither.

gene_usage

In [1]:

import hicutils as hu

df = hu.io.read_directory('../example_data_immunedb')
pdf = (
    df
    .pipe(hu.pooling.pool_by, 'disease') # Pool on disease
    .pipe(hu.filters.filter_functional) # Filter on functional
    .pipe(hu.filters.filter_by_overall_copies, 2) # Filter on 2+ copy
)

Basic VH dot plot¶

In [12]:

_ = hu.plots.plot_gene_frequency(pdf, ['subject', 'disease'], 'v_gene', by='subject', height=4)

Filter out VHs with a frequency < 0.5%¶

In [13]:

_ = hu.plots.plot_gene_frequency(
    hu.filters.filter_by_gene_frequency(pdf, 0.005),
    ['subject', 'disease'], 'v_gene',
    by='subject',
    height=4
)

Basic VH usage heatmap¶

In [3]:

_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', figsize=(12, 5))

Normalized by column¶

In [4]:

_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', normalize_by='cols',
                               figsize=(12, 5))

Weighted by copies¶

In [5]:

_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', size_metric='copies',
                               figsize=(12, 5))

Disable clustering¶

In [6]:

_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', cluster_by=None,
                               figsize=(12, 5))

hicutils.plots.gene_usage.plot_gene_frequency(df, pool, gene, size_metric='clones', by=None, **kwargs)¶

Generates a gene-usage dot/bar plot showing the utilization of each V or J gene based on pools.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of gene usage information.
pool (str): The pooling column to use for each row of the heatmap.
gene (str, v_gene or j_gene): The gene to plot. Must be either v_gene or j_gene.
size_metric (str): The size metric which is plotted as the intensity of each cell. Must be one of clones, copies, or uniques.
by (str): The feature to use as the hue variable for the plot. Must be included in the pool parameter.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.gene_usage.plot_gene_heatmap(df, pool, gene, min_frequency=0, size_metric='clones', normalize_by='rows', cluster_by='both', **kwargs)¶

Generates a gene-usage heatmap showing the utilization of each V or J gene based on pools.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of gene usage information.
pool (str): The pooling column to use for each row of the heatmap.
gene (str, v_gene or j_gene): The gene to plot. Must be either v_gene or j_gene.
min_frequency (float): The minimum frequency across all pools allowed to be included in the heatmap.
size_metric (str): The size metric which is plotted as the intensity of each cell. Must be one of clones, copies, or uniques.
normalize_by (str): Sets how to normalize the plot. If set to rows (the default) each row is normalized to sum to one. Setting it to cols causes each column (gene) to sum to one.
cluster_by (str: rows, cols, or both; or None): Sets which clustering to display. Valid values are rows, cols, both, or clustering can be disabled with None.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
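The difference between normalize_by='rows' and normalize_by='cols' is easiest to see on a small count table. This is a minimal sketch in plain Python rather than the library's code: the helper normalize_usage, the nested-dict layout, and the pool names and counts are all made up for illustration.

```python
def normalize_usage(counts, normalize_by="rows"):
    """Normalize a {pool: {gene: count}} table by rows or by columns."""
    if normalize_by == "rows":
        # Each pool (row) sums to one.
        result = {}
        for pool, genes in counts.items():
            total = sum(genes.values())
            result[pool] = {
                g: n / total if total else 0.0 for g, n in genes.items()
            }
        return result
    if normalize_by == "cols":
        # Each gene (column) sums to one across pools.
        totals = {}
        for genes in counts.values():
            for g, n in genes.items():
                totals[g] = totals.get(g, 0) + n
        return {
            pool: {g: n / totals[g] if totals[g] else 0.0 for g, n in genes.items()}
            for pool, genes in counts.items()
        }
    raise ValueError("normalize_by must be 'rows' or 'cols'")


counts = {
    "pool_a": {"IGHV3-23": 30, "IGHV1-69": 10},
    "pool_b": {"IGHV3-23": 10, "IGHV1-69": 10},
}
by_rows = normalize_usage(counts, "rows")
by_cols = normalize_usage(counts, "cols")
print(by_rows)
print(by_cols)
```

Row normalization answers "which genes does this pool favor?", while column normalization answers "which pool contributes most of this gene's usage?".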

Clonal Overlap¶

Clonal overlap can be visualized by string plots with the plot_strings function or as UpSet plots with the plot_upset function.

For string plots, each row represents a clone and each column a pool. The frequency of a given clone in a pool can be indicated by the intensity of the corresponding cell if desired. Further, the definition of a clone (defaulting to clone_id) can be modified by the overlapping_features parameter. For example, to track clonal CDR3 amino-acids rather than clone_id, one can specify overlapping_features=['cdr3_aa'].

UpSet plots are an extension of Venn diagrams to show large numbers of categories. These can be plotted using the plot_upset function and are highly configurable.

See the API documentation for all parameters of these functions.
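The table behind a string plot can be sketched as follows: one row per clone key, one column per pool, with the choice of features deciding which records count as the same clone. This is an illustrative sketch only; the helper presence_matrix and the example records are assumptions, and hicutils itself works on a pd.DataFrame.

```python
from collections import defaultdict


def presence_matrix(records, pool, overlapping_features=("clone_id",),
                    only_overlapping=True):
    """Build {clone_key: [copies per pool]} from a list of record dicts."""
    pools = sorted({r[pool] for r in records})
    copies = defaultdict(lambda: defaultdict(int))
    for r in records:
        # The tuple of selected features defines what counts as one clone.
        key = tuple(r[f] for f in overlapping_features)
        copies[key][r[pool]] += r["copies"]
    matrix = {}
    for key, by_pool in copies.items():
        if only_overlapping and len(by_pool) < 2:
            continue  # keep only clones seen in at least two pools
        matrix[key] = [by_pool.get(p, 0) for p in pools]
    return pools, matrix


# Made-up records: clones 2 and 3 share a CDR3 but have different IDs.
records = [
    {"clone_id": 1, "cdr3_aa": "CARDYW", "replicate_name": "rep1", "copies": 5},
    {"clone_id": 1, "cdr3_aa": "CARDYW", "replicate_name": "rep2", "copies": 2},
    {"clone_id": 2, "cdr3_aa": "CARGFDYW", "replicate_name": "rep1", "copies": 7},
    {"clone_id": 3, "cdr3_aa": "CARGFDYW", "replicate_name": "rep2", "copies": 1},
]
pools, by_id = presence_matrix(records, "replicate_name")
_, by_cdr3 = presence_matrix(records, "replicate_name",
                             overlapping_features=("cdr3_aa",))
print(pools, by_id, by_cdr3)
```

Tracking by cdr3_aa instead of clone_id makes the second CDR3 appear as overlapping, which is the situation described for clones shared across donors.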

overlap

In [1]:

import hicutils as hu

df = hu.io.read_directory('../example_data_immunedb')

String plot of overlapping clones by replicate¶

In [2]:

_ = hu.plots.plot_strings(df, 'replicate_name')

String plot of all clones by replicate¶

In [3]:

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False)

String plot of all clones heatmapped by frequency¶

In [4]:

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale=True)

String plot of all clones heatmapped by log-scaled frequency¶

In [5]:

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale='log')
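The scale parameter of plot_strings accepts False, True, or 'log'. A rough sketch of how each setting could turn a copy count into a cell value is shown below; the helper cell_value and the use of None for absent clones are assumptions of this sketch, not the library's internals.

```python
import math


def cell_value(copies, scale=False):
    """Map a clone's copy count in one pool to a heatmap cell value."""
    if copies <= 0:
        return None  # clone absent from this pool
    if scale is False:
        return 1  # presence only
    if scale is True:
        return copies  # raw number of copies
    if scale == "log":
        return math.log10(copies)  # log10 of copies
    raise ValueError("scale must be False, True, or 'log'")


for setting in (False, True, "log"):
    print(setting, [cell_value(c, setting) for c in (0, 1, 10, 1000)])
```

Log scaling compresses the range, so a handful of very large clones does not wash out the color of the remaining rows.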

String plot of top 25 clones, heatmapped by log-scaled frequency, and annotated¶

In [6]:

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale='log',
                          limit=25, ylabels='full', figsize=(8, 8))

Track CDR3 AAs through all replicates (across donors)¶

In [7]:

_ = hu.plots.plot_strings(df, 'subject', only_overlapping=False,
                          overlapping_features=('cdr3_aa',))

Highlighting rows¶

Highlight rows with a list¶

Highlight the clones with CDR3s CARGYCSGGSCYSNAFDIW and CARDQGDYGDYYFDYW in red, and clones with CDR3s CARDPFPPEQPIDYW and CARDMTTVMYYMDVW in green.

In [8]:

clones_to_highlight = [
    ('#ff0000', ['CARGYCSGGSCYSNAFDIW', 'CARDQGDYGDYYFDYW']),
    ('#00ff00', ['CARDPFPPEQPIDYW', 'CARDMTTVMYYMDVW']),
]

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=True, 
                          overlapping_features=['cdr3_aa'],
                          highlight=clones_to_highlight,
                          figsize=(8, 8))

Highlight rows with a function¶

Highlight all clones in IgH_HPAP017_rep1_200p0ng in red.

In [9]:

def highlight_clones(df):
    return [
        ('#ff0000', df[df['IgH_HPAP017_rep1_200p0ng'] > 0].index)
    ]

_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, figsize=(8, 8),
                          highlight=highlight_clones)

Basic UpSet Plot¶

In [10]:

_ = hu.plots.plot_upset(df, 'replicate_name')

Only show overlapping clones¶

In [11]:

_ = hu.plots.plot_upset(df, 'replicate_name', min_degree=2)

Show SHM and CDR3 length for each category¶

In [12]:

_ = hu.plots.plot_upset(df, 'replicate_name', subplots=['shm', 'cdr3_num_nts'])

Change the definition of a "clone" to just CDR3 AA¶

In [13]:

_ = hu.plots.plot_upset(df, 'replicate_name', clone_features=['cdr3_aa'])

Show counts and percentages¶

In [14]:

_ = hu.plots.plot_upset(df, 'replicate_name', show_counts=True, show_percentages=True)

hicutils.plots.overlap.plot_similarity_heatmap(df, pool, dist_func_name, clone_features='clone_id', cutoff_func=None, **kwargs)¶

Generates a heatmap of pairwise clonal similarity between pools. Similarity is calculated with the function named by dist_func_name, and the definition of a clone can be varied with the clone_features parameter.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of clonal overlap information.
pool (str): How to pool the clones to calculate similarity.
dist_func_name (function): Function to use for similarity calculation. Accepts jaccard or cosine.
clone_features (list(str)): The feature(s) to use for clone definition. The default clone_id uses the clone definitions in df. This can be altered to any other columns in the DataFrame such as cdr3_aa.
cutoff_func (func(df) -> float): A function returning a cutoff to designate the maximum value in the DataFrame. All values greater than or equal to the returned value are remapped to the returned value.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying overlap DataFrame.
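For reference, the two similarity measures named above can be written in a few lines of plain Python. This sketch uses the textbook definitions (Jaccard on clone presence, cosine on copy counts); whether hicutils computes them on exactly these inputs is an assumption, and the example pools are made up.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two pools given as {clone_key: copies}."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two pools' copy-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


rep1 = {"CARDYW": 3, "CARGFDYW": 4}
rep2 = {"CARDYW": 3, "CARWFDPW": 4}
j = jaccard(rep1, rep2)
c = cosine(rep1, rep2)
print(j, c)
```

Jaccard ignores clone size and only asks which clones are shared, while cosine weights shared clones by their copy counts.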

hicutils.plots.overlap.plot_strings(df, pool, only_overlapping=True, overlapping_features=('clone_id', 'cdr3_aa', 'v_gene', 'j_gene'), scale=False, limit=None, ylabels='counts', col_order=None, row_order=None, order=None, pivot_hook=None, col_namer=<function <lambda>>, highlight=None, **kwargs)¶

Creates an overlap string plot where each row represents a clone and each column represents a pool. Among other features, the definition of a clone can be modified and the heatmap can be boolean or scaled to the number of copies a clone comprises in each pool.

Parameters¶

df (pd.DataFrame)

The DataFrame to use for tracking clones.

pool (str)

The column to use for pooling clones into columns.

only_overlapping (bool)

If set to True (the default), only clones overlapping at least two pools will be included in the overlap plot.

overlapping_features (list)

The feature(s) to use to track clones across pools. By default the clone_id value is used. To alter this behavior, this value can be changed to any clonal information field such as cdr3_aa, v_gene, j_gene, and cdr3_nt.

This is particularly useful to track clones across donors where the clone_id will differ but the cdr3_aa can be used instead.

scale (bool or log)

If scale=False (the default) presence of a clone in a pool is indicated by blue and absence by gray. When scale=True the color of each clone/pool indicates the total number of copies. Setting scale='log' changes the scale to be the log10 of copies.

limit (int or None)

If set to an integer n, limits the number of clones to the top n.

ylabels (counts or full)

If set to counts (the default) y-axis ticks will be shown indicating the number of clones in the plot. If set to full, all features in overlapping_features will be shown for each row.

col_order (function or None)

A function that is passed the pd.DataFrame and shall return a list of columns in the desired order.

row_order (function or None)

A function that is passed the pd.DataFrame and shall return a list of row indexes.

pivot_hook (function or None)

A function to call on the pivoted table. Useful for filtering sequences based on their frequency across pools.

col_namer (function)

A function to rename columns. The function should accept a tuple and return a formatted string version.

highlight (list or function)

A list of two-value tuples in the format [(color_hex, [indices]), …] to highlight. Each item in the list specifies a color to use and the row indices to highlight with the color. The highlights are applied in order, so row indices which occur multiple times are colored by the last item in the list.

The indices should match the format specified in overlapping_features.

Alternatively, a function can be passed which returns an array formatted as described and shown above.

For example, the following will color the CDR3 CARAFDHW in red and CARESLRFMDVW in green:

[
    ('#ff0000', ['CARAFDHW']),
    ('#00ff00', ['CARESLRFMDVW']),
]
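The "applied in order, last item wins" rule for highlight can be sketched as below. The helper resolve_highlights is hypothetical and works on a plain list of row indices; in hicutils the callable form receives a DataFrame, as in the notebook example that indexes it by replicate column.

```python
def resolve_highlights(highlight, rows):
    """Return {row_index: color}, applying highlight entries in order."""
    if callable(highlight):
        highlight = highlight(rows)  # a function may build the list
    colors = {}
    for color, indices in highlight:
        for idx in indices:
            if idx in rows:
                colors[idx] = color  # later entries overwrite earlier ones
    return colors


rows = ["CARAFDHW", "CARESLRFMDVW", "CARDYW"]
highlight = [
    ("#ff0000", ["CARAFDHW", "CARESLRFMDVW"]),
    ("#00ff00", ["CARESLRFMDVW"]),
]
colors = resolve_highlights(highlight, rows)
print(colors)
```

Here CARESLRFMDVW appears in both entries and ends up green because the green entry comes last; rows not listed in any entry keep their default color.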

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.overlap.plot_upset(df, pool, size='clones', clone_features=['clone_id'], subplots=(), subplot_kind='violin', **kwargs)¶

Generates an UpSet plot of clonal data. The UpSet plot may be scaled by clones or copies with size and the definition of a clone can be varied with the clone_features parameter. Further, distributions of other variables such as cdr3_num_nts and shm can be placed above each intersection bar with subplots.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of clonal overlap information.
pool (str): How to pool the clones to calculate overlap. Each pool value will be treated as a category in the UpSet plot.
size (str, clones or copies): The number to use as the cardinality of overlap sizes.
clone_features (list(str)): The feature(s) to use for clone definition. The default clone_id uses the clone definitions in df. This can be altered to any other columns in the DataFrame such as cdr3_aa to track clones across subjects.
subplots (list(str)): Features to plot as sns.catplot panels above each intersection bar. Valid options are shm and cdr3_num_nts.
subplot_kind (str): The kind of plot to use for subplots. Any valid sns.catplot type is allowed (e.g. box, violin).
kwargs (dict): Other parameters to pass to usp.UpSet.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying overlap DataFrame.
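The bars of an UpSet plot count each clone once, under the exact combination of pools that contain it. The sketch below reproduces that bookkeeping in plain Python with size counted in clones; the helper upset_intersections and the example records are assumptions, and the real plot is drawn by the UpSet library that hicutils wraps.

```python
from collections import Counter, defaultdict


def upset_intersections(records, pool, clone_features=("clone_id",),
                        min_degree=1):
    """Count clones per exact combination of pools.

    The degree of a combination is the number of pools in it, so
    min_degree=2 keeps only clones shared by at least two pools.
    """
    membership = defaultdict(set)
    for r in records:
        key = tuple(r[f] for f in clone_features)
        membership[key].add(r[pool])
    sizes = Counter(frozenset(p) for p in membership.values())
    return {combo: n for combo, n in sizes.items() if len(combo) >= min_degree}


# Made-up (clone_id, replicate) observations.
pairs = [(1, "rep1"), (1, "rep2"), (2, "rep1"),
         (3, "rep1"), (3, "rep2"), (3, "rep3"), (4, "rep1")]
records = [{"clone_id": c, "replicate_name": p} for c, p in pairs]

all_sets = upset_intersections(records, "replicate_name")
shared = upset_intersections(records, "replicate_name", min_degree=2)
print(all_sets)
print(shared)
```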

Somatic hypermutation (SHM)¶

The somatic hypermutation (SHM) of a dataset can be plotted in a variety of ways including as a distribution, bar/violin plots, and as a range plot.

shm

In [1]:

import hicutils as hu

df = hu.io.read_directory('../example_data_immunedb')
pdf = (
    df
    .pipe(hu.pooling.pool_by, 'disease') # Pool on disease
    .pipe(hu.filters.filter_functional) # Filter on functional
    .pipe(hu.filters.filter_by_overall_copies, 2) # Filter on 2+ copy
)

SHM distribution plot (by clones)¶

In [2]:

_ = hu.plots.plot_shm_distribution(pdf, 'disease', 'clones')

SHM distribution plot (by copies)¶

In [3]:

_ = hu.plots.plot_shm_distribution(pdf, 'disease', 'copies')

Aggregate SHM plot¶

In [4]:

_ = hu.plots.plot_shm_aggregate(pdf, 'disease')

Aggregate SHM plot as boxplot¶

In [5]:

_ = hu.plots.plot_shm_aggregate(pdf, 'disease', kind='box')

Basic SHM range plot using boxplot¶

In [6]:

_ = hu.plots.plot_shm_range(pdf, 'disease')

SHM range plot with custom cutoffs¶

In [7]:

_ = hu.plots.plot_shm_range(pdf, 'disease', buckets=(1, 2, 5, 10, 15))

SHM range plot with custom colors¶

In [8]:

import seaborn as sns

_ = hu.plots.plot_shm_range(pdf, 'disease', color=sns.color_palette('Blues'))

Plotting fraction of mutated clones¶

A clone is considered mutated if its average SHM is 2% or greater.

In [9]:

_ = hu.plots.plot_mutated_fraction(df, 'subject')

With a custom mutation threshold of 5%:

In [10]:

_ = hu.plots.plot_mutated_fraction(df, 'subject', threshold=5)

hicutils.plots.shm.plot_most_mutated_pie(df, pool, colors, **kwargs)¶

Plots the most mutated pool in df as a pie chart.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the SHM.
pool (str): The pool to use for plotting.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.shm.plot_mutated_fraction(df, pool, threshold=2.0, **kwargs)¶

Plots the fraction of clones with greater than threshold SHM in each pool.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the SHM.
pool (str): The pool to use for plotting.
threshold (float): The SHM percentage threshold to use to determine if a clone is mutated.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
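The quantity plotted here can be sketched in plain Python. The text on this page describes the comparison both as "2% or greater" and as "greater than threshold"; the sketch uses the inclusive form, which is an assumption, as are the helper name mutated_fraction and the example values.

```python
def mutated_fraction(clones, pool, threshold=2.0):
    """Fraction of clones per pool whose SHM is at least `threshold` percent."""
    totals, mutated = {}, {}
    for c in clones:
        p = c[pool]
        totals[p] = totals.get(p, 0) + 1
        if c["shm"] >= threshold:  # inclusive comparison is an assumption
            mutated[p] = mutated.get(p, 0) + 1
    return {p: mutated.get(p, 0) / n for p, n in totals.items()}


# Made-up per-clone average SHM percentages for two hypothetical subjects.
clones = (
    [{"subject": "A", "shm": s} for s in (0.0, 1.9, 2.0, 7.5)]
    + [{"subject": "B", "shm": 0.5}]
)
default = mutated_fraction(clones, "subject")
strict = mutated_fraction(clones, "subject", threshold=5)
print(default, strict)
```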

hicutils.plots.shm.plot_shm_aggregate(df, pool, **kwargs)¶

Categorically plots the SHM of each pool.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the SHM.
pool (str): The pool to use for plotting.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.shm.plot_shm_distribution(df, pool, size_metric, palette=None, hue_order=None, **kwargs)¶

Plots the SHM distribution of a pooled DataFrame using either clones, copies, or uniques as a size metric.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the SHM distribution.
pool (str): The pool to use for plotting.
size_metric (str): The metric to determine each clone's size. Must be clones, copies, or uniques.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.shm.plot_shm_range(df, pool, buckets=(1, 10, 25), order=None, **kwargs)¶

Plot the range of clonal SHM for each pool.

Parameters¶

df (pd.DataFrame): The DataFrame used to plot the SHM.
pool (str): The pool to use for plotting.
buckets (list(int)): A list of cut-points to bin SHM. The default is (1, 10, 25) meaning clones will be stratified by SHM into the buckets [1, 10), [10, 25), and 25+. All intervals are left-closed; that is the lesser value in each interval is inclusive and the greater value is exclusive.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
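The left-closed bucketing described for buckets can be checked with a short sketch built on the standard bisect module. The helper bucket_label and its label strings are illustrative, and the handling of values below the first cut-point (labelled "< 1" here) is an assumption, since the description above only lists the buckets starting at 1.

```python
from bisect import bisect_right


def bucket_label(shm, buckets=(1, 10, 25)):
    """Assign an SHM value to a left-closed interval defined by cut-points."""
    i = bisect_right(buckets, shm)  # number of cut-points <= shm
    if i == 0:
        return f"< {buckets[0]}"  # below the first cut-point (assumed label)
    if i == len(buckets):
        return f"{buckets[-1]}+"  # open-ended top bucket
    return f"[{buckets[i - 1]}, {buckets[i]})"


values = (0.5, 1, 9.99, 10, 25, 40)
labels = [bucket_label(v) for v in values]
print(list(zip(values, labels)))
```

A value equal to a cut-point, such as 10, lands in the bucket that starts at it, which is what left-closed means.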

CDR3 analysis¶

A number of CDR3 analysis plots are provided, including CDR3 amino-acid usage as both a heatmap and logo plots. Additionally, CDR3 spectratypes can be created to show the CDR3 length distribution and highlight the top copy clones.

cdr3_analysis

In [1]:

import hicutils as hu

df = hu.io.read_directory('../example_data_immunedb')
pdf = (
    df
    .pipe(hu.pooling.pool_by, 'disease') # Pool on disease
    .pipe(hu.filters.filter_functional) # Filter on functional
    .pipe(hu.filters.filter_by_overall_copies, 2) # Filter on 2+ copy
)

CDR3 length distribution¶

In [11]:

_ = hu.plots.plot_cdr3_distribution(pdf, 'disease', aspect=1.5)

CDR3 usage plot, normalized by row¶

In [2]:

_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', figsize=(10, 3))

CDR3 usage plot, normalized by column¶

In [3]:

_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', normalize_by='cols', figsize=(10, 3))

CDR3 usage plot, weighted by copies¶

In [4]:

_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', size_metric='copies', figsize=(10, 3))

CDR3 AA Logo for 10-AA length clones¶

In [5]:

g, m = hu.plots.plot_cdr3_logo(pdf, 'cdr3_aa', 10)

CDR3 NT Logo for 21-NT length clones¶

In [6]:

g, m = hu.plots.plot_cdr3_logo(pdf, 'cdr3_nt', 21)

CDR3 spectratype¶

In [7]:

_ = pdf.groupby('disease').apply(hu.plots.plot_cdr3_spectratype)

CDR3 spectratype, coloring only the top 5 clones¶

In [8]:

_ = pdf.groupby('disease').apply(hu.plots.plot_cdr3_spectratype, color_top=5)

hicutils.plots.cdr3_analysis.plot_cdr3_aa_usage(df, pool, size_metric='clones', normalize_by='rows', cluster_by='both', figsize=(20, 10))¶

Plots CDR3 amino-acid usage separated by pool.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of CDR3 amino-acid usage information.
pool (str): The pooling column to use for each row of the heatmap.
size_metric (str): The size metric which is plotted as the intensity of each cell. Must be one of clones, copies, or uniques.
normalize_by (str): Sets how to normalize the plot. If set to rows (the default) each row is normalized to sum to one. Setting it to cols causes each column (amino-acid) to sum to one.
cluster_by (str: rows, cols, or both; or None): Sets which clustering to display. Valid values are rows, cols, both, or clustering can be disabled with None.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.

hicutils.plots.cdr3_analysis.plot_cdr3_distribution(df, pool, size_metric='clones', **kwargs)¶

Plots CDR3 length distribution.

Parameters¶

df (pd.DataFrame): The DataFrame to use for plotting CDR3 length.
pool (str): The pooling column to use for hue value.
size_metric (str): The size metric to use as the height for each bar.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
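How size_metric changes the bar heights can be illustrated with a small sketch: with clones every clone adds one to its CDR3 length, while with copies it adds its copy count. The helper cdr3_length_distribution and the example records are assumptions; cdr3_num_nts is used as the length column because it is the length feature named elsewhere on this page.

```python
from collections import Counter, defaultdict


def cdr3_length_distribution(clones, pool, size_metric="clones"):
    """Return {pool: {cdr3_length: size}} weighted by the chosen metric."""
    dist = defaultdict(Counter)
    for c in clones:
        # "clones" counts each clone once; other metrics use that column.
        weight = 1 if size_metric == "clones" else c[size_metric]
        dist[c[pool]][c["cdr3_num_nts"]] += weight
    return {p: dict(counts) for p, counts in dist.items()}


# Made-up clones from a single hypothetical pool.
clones = [
    {"disease": "pool_a", "cdr3_num_nts": 45, "copies": 100},
    {"disease": "pool_a", "cdr3_num_nts": 45, "copies": 2},
    {"disease": "pool_a", "cdr3_num_nts": 48, "copies": 3},
]
by_clones = cdr3_length_distribution(clones, "disease")
by_copies = cdr3_length_distribution(clones, "disease", size_metric="copies")
print(by_clones, by_copies)
```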

hicutils.plots.cdr3_analysis.plot_cdr3_logo(df, by, length, hide_ambig=True, **kwargs)¶

Creates a logo plot for CDR3 strings of a given length either by amino-acid or nucleotide.

Parameters¶

df (pd.DataFrame): The DataFrame to use as the source of CDR3 information.
by (str): Either cdr3_aa to plot amino-acids or cdr3_nt to plot nucleotides.
length (int): The length of CDR3s to plot. Interpreted as the length of by.

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
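A logo plot is drawn from a per-position frequency table of the sequences that have the requested length. The following minimal sketch builds such a table in plain Python; the helper position_frequencies and the example sequences are hypothetical, and the matrix hicutils passes to its logo library may be scaled differently.

```python
from collections import Counter


def position_frequencies(seqs, length):
    """Per-position character frequencies for sequences of a given length."""
    kept = [s for s in seqs if len(s) == length]  # other lengths are ignored
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in kept)
        matrix.append({ch: n / len(kept) for ch, n in counts.items()})
    return matrix


seqs = ["CARW", "CAKW", "CARDYW", "CTRW"]  # made-up CDR3 strings
freqs = position_frequencies(seqs, 4)
for position, row in enumerate(freqs):
    print(position, row)
```

Conserved positions, such as the leading C and trailing W here, end up with a single character at frequency one.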

hicutils.plots.cdr3_analysis.plot_cdr3_spectratype(df, color_top=10, **kwargs)¶

Plots CDR3 length while annotating and highlighting the top color_top clones.

Parameters¶

df (pd.DataFrame): The DataFrame to use for plotting CDR3 length.
color_top (int): The number of clones to highlight (default 10).

Returns¶

A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
