Plotting¶
The hicutils.plots module provides all plotting functions. Each plotting function
returns a tuple containing a handle to the underlying figure and the
pd.DataFrame that was used to create the plot.
Clone Size¶
A variety of clone size plots are provided to visualize the overall clonal landscape of a dataset.
import hicutils as hu
df = hu.io.read_directory('../example_data_immunedb')
pdf = (
df
.pipe(hu.pooling.pool_by, 'disease') # Pool on disease
.pipe(hu.filters.filter_functional) # Filter on functional
.pipe(hu.filters.filter_by_overall_copies, 2) # Keep clones with 2+ copies
)
Clone size distribution¶
_ = hu.plots.plot_clone_sizes(pdf, aspect=2)
Clone size distribution with cutoff¶
_ = hu.plots.plot_clone_sizes(pdf, cutoff=20)
Top clone plots¶
_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones)
Top clones with custom cutoff¶
_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones, cutoff=50)
Top clones with custom cutoff & annotation¶
_ = pdf.groupby('disease').apply(hu.plots.plot_top_clones, cutoff=15,
annotate=['v_gene', 'cdr3_aa'])
Dx plot¶
_ = hu.plots.plot_d_index(pdf, 'disease', height=4, aspect=.75)
Clone size range plot¶
_ = hu.plots.plot_ranges(pdf, 'disease')
Clone size range plot with custom ranges¶
_ = hu.plots.plot_ranges(pdf, 'disease', intervals=(10, 100, 1000, 5000))
- hicutils.plots.clone_size.plot_clone_counts(df, pool, **kwargs)¶
Plots the number of clones per pool.
Parameters¶
- df : pd.DataFrame
The DataFrame containing the clones to count.
- pool : str
The field on which to pool.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.clone_size.plot_clone_sizes(df, cutoff=None, **kwargs)¶
Plots the distribution of clone sizes in df.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the clone size distribution.
- cutoff : int or None
Aggregates all clones with cutoff or more copies into a single bin on the right side of the graph. This is useful for condensing the tail of the plotted distribution.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
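The effect of cutoff can be sketched with plain pandas. This is an illustration, not the hicutils implementation: it assumes one row per clone with a copies column (a name used elsewhere in this document), and the helper clone_size_histogram is hypothetical. Clipping sizes at cutoff sends every large clone into the same final bin.

```python
import pandas as pd


def clone_size_histogram(clones, cutoff=None):
    """Count clones per size (copies); sizes >= cutoff share one final bin."""
    sizes = clones["copies"]
    if cutoff is not None:
        # Every clone with `cutoff` or more copies is placed in the bin at `cutoff`
        sizes = sizes.clip(upper=cutoff)
    return sizes.value_counts().sort_index()


clones = pd.DataFrame({"copies": [1, 1, 2, 3, 25, 40, 100]})
print(clone_size_histogram(clones, cutoff=20))
```

Here the three clones with 25, 40, and 100 copies are all counted in the bin at 20.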
- hicutils.plots.clone_size.plot_d_index(df, pool, cutoff=20, **kwargs)¶
Plots the Dx index for clones in df. The default cutoff value is 20, and the generated figure is a dot plot of Dx values stratified by pool.
Parameters¶
- df : pd.DataFrame
The DataFrame used to calculate the Dx index.
- pool : str
The field by which to stratify the Dx values.
- cutoff : int
The D-value to use as a cutoff; defaults to 20.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.clone_size.plot_top_clones(df, cutoff=20, annotate=False, color=(0.8392156862745098, 0.15294117647058825, 0.1568627450980392), figsize=(12, 8))¶
Plots the copy-number frequency of the top cutoff clones (default 20). Optionally, the annotate keyword can be set to one or more clone features to annotate each bar. For example, setting annotate=('v_gene', 'cdr3_aa') will show the V-gene and CDR3 AA for each clone.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the top clones.
- cutoff : int
The number of clones to plot; defaults to 20.
- annotate : str, list, or None
The feature(s) to annotate for each clone.
- color : str or tuple
The color to use for bars.
- figsize : tuple
The (width, height) of the plot.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
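A minimal pandas sketch of selecting the top clones follows. It assumes "copy-number frequency" means each clone's share of all copies in df, and it assumes clone_id and copies columns. Both are assumptions, and top_clone_frequencies is a hypothetical helper, not part of hicutils.

```python
import pandas as pd


def top_clone_frequencies(clones, cutoff=20):
    """Return the `cutoff` largest clones with their fraction of all copies."""
    freqs = clones.assign(frequency=clones["copies"] / clones["copies"].sum())
    # nlargest keeps the clones with the most copies, sorted in descending order
    return freqs.nlargest(cutoff, "copies")[["clone_id", "copies", "frequency"]]


clones = pd.DataFrame({"clone_id": [1, 2, 3, 4], "copies": [50, 30, 15, 5]})
print(top_clone_frequencies(clones, cutoff=2))
```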
Gene Usage¶
The gene usage plots show V- or J-gene usage grouped by pool. This can be useful for investigating gene skewing in different populations. The plots can be weighted and normalized in various ways, and heatmaps can be clustered by row, column, both, or neither.
import hicutils as hu
df = hu.io.read_directory('../example_data_immunedb')
pdf = (
df
.pipe(hu.pooling.pool_by, 'disease') # Pool on disease
.pipe(hu.filters.filter_functional) # Filter on functional
.pipe(hu.filters.filter_by_overall_copies, 2) # Keep clones with 2+ copies
)
Basic VH dot plot¶
_ = hu.plots.plot_gene_frequency(pdf, ['subject', 'disease'], 'v_gene', by='subject', height=4)
Filter out VHs with a frequency < 0.5%¶
_ = hu.plots.plot_gene_frequency(
hu.filters.filter_by_gene_frequency(pdf, 0.005),
['subject', 'disease'], 'v_gene',
by='subject',
height=4
)
Basic VH usage heatmap¶
_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', figsize=(12, 5))
Normalized by column¶
_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', normalize_by='cols',
figsize=(12, 5))
Weighted by copies¶
_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', size_metric='copies',
figsize=(12, 5))
Disable clustering¶
_ = hu.plots.plot_gene_heatmap(pdf, 'disease', 'v_gene', cluster_by=None,
figsize=(12, 5))
- hicutils.plots.gene_usage.plot_gene_frequency(df, pool, gene, size_metric='clones', by=None, **kwargs)¶
Generates a gene-usage dot/bar plot showing the utilization of each V or J gene based on pools.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of gene usage information.
- pool : str or list(str)
The pooling column(s) used to group the data.
- gene : str (v_gene or j_gene)
The gene to plot. Must be either v_gene or j_gene.
- size_metric : str
The size metric used to compute gene usage. Must be one of clones, copies, or uniques.
- by : str
The feature to use as the hue variable for the plot. Must be included in the pool parameter.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.gene_usage.plot_gene_heatmap(df, pool, gene, min_frequency=0, size_metric='clones', normalize_by='rows', cluster_by='both', **kwargs)¶
Generates a gene-usage heatmap showing the utilization of each V or J gene based on pools.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of gene usage information.
- pool : str
The pooling column to use for each row of the heatmap.
- gene : str (v_gene or j_gene)
The gene to plot. Must be either v_gene or j_gene.
- min_frequency : float
The minimum frequency across all pools required for a gene to be included in the heatmap.
- size_metric : str
The size metric which is plotted as the intensity of each cell. Must be one of clones, copies, or uniques.
- normalize_by : str
Sets how to normalize the plot. If set to rows (the default), each row is normalized to sum to one. Setting it to cols causes each column (gene) to sum to one.
- cluster_by : str (rows, cols, or both) or None
Sets which clustering to display. Valid values are rows, cols, or both; clustering can be disabled with None.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
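The size_metric and normalize_by options can be sketched with pd.crosstab. This is an illustration under stated assumptions, not the hicutils code: it assumes one row per clone (so counting rows counts clones) and a numeric copies column for the copies metric. The helper gene_usage_matrix and the toy data are hypothetical.

```python
import pandas as pd


def gene_usage_matrix(df, pool, gene, size_metric="clones", normalize_by="rows"):
    """Pool-by-gene usage matrix, normalized so rows or columns sum to one."""
    if size_metric == "clones":
        # Assumes one row per clone, so counting rows counts clones
        usage = pd.crosstab(df[pool], df[gene])
    else:
        # Weight each clone by a numeric column such as copies or uniques
        usage = pd.crosstab(df[pool], df[gene], values=df[size_metric],
                            aggfunc="sum").fillna(0)
    if normalize_by == "rows":
        return usage.div(usage.sum(axis=1), axis=0)
    if normalize_by == "cols":
        return usage.div(usage.sum(axis=0), axis=1)
    raise ValueError("normalize_by must be 'rows' or 'cols'")


clones = pd.DataFrame({
    "subject": ["s1", "s1", "s1", "s2"],
    "v_gene": ["IGHV1-2", "IGHV1-2", "IGHV3-23", "IGHV3-23"],
    "copies": [10, 5, 5, 20],
})
print(gene_usage_matrix(clones, "subject", "v_gene", normalize_by="cols"))
```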
Clonal Overlap¶
Clonal overlap can be visualized either as string plots with the plot_strings
function or as UpSet plots with the plot_upset function.
For string plots, each row represents a clone and each column a pool. The
frequency of a given clone in a pool can be indicated by the intensity of the
corresponding cell if desired. Further, the definition of a clone (defaulting
to clone_id) can be modified by the overlapping_features parameter.
For example, to track clonal CDR3 amino-acids rather than clone_id, one can
specify overlapping_features=['cdr3_aa'].
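The table behind a string plot can be sketched as a pivot of copies by clone and pool. This is an illustrative pandas sketch, not the hicutils implementation. It assumes one row per clone per pool with clone_id and copies columns, and string_plot_table is a hypothetical name. The scale options follow the plot_strings API entry: presence/absence, raw copies, or log10 of copies.

```python
import numpy as np
import pandas as pd


def string_plot_table(df, pool, features=("clone_id",), only_overlapping=True, scale=False):
    """Rows are clones (keyed by `features`), columns are pools, cells are copies."""
    table = df.pivot_table(index=list(features), columns=pool, values="copies",
                           aggfunc="sum", fill_value=0)
    if only_overlapping:
        # Keep clones found in at least two pools
        table = table[(table > 0).sum(axis=1) >= 2]
    if scale == "log":
        # log10 of copies; clones absent from a pool become NaN
        return np.log10(table.where(table > 0))
    if not scale:
        # Presence/absence only
        return table > 0
    return table  # scale=True: raw copy counts


reads = pd.DataFrame({
    "clone_id": [1, 1, 2, 3, 3],
    "replicate_name": ["rep1", "rep2", "rep1", "rep1", "rep2"],
    "copies": [10, 100, 5, 1, 1],
})
print(string_plot_table(reads, "replicate_name", scale="log"))
```

Changing features to ("cdr3_aa",) mirrors the overlapping_features idea above, provided the DataFrame has that column.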
UpSet plots are an extension of Venn diagrams to show large numbers of
categories. These can be plotted using the plot_upset function and are
highly configurable.
See the API documentation below for all parameters of these functions.
import hicutils as hu
df = hu.io.read_directory('../example_data_immunedb')
String plot of overlapping clones by replicate¶
_ = hu.plots.plot_strings(df, 'replicate_name')
String plot of all clones by replicate¶
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False)
String plot of all clones heatmapped by frequency¶
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale=True)
String plot of all clones heatmapped by log-scaled frequency¶
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale='log')
String plot of top 25 clones, heatmapped by log-scaled frequency, and annotated¶
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, scale='log',
limit=25, ylabels='full', figsize=(8, 8))
Track CDR3 AAs across subjects (donors)¶
_ = hu.plots.plot_strings(df, 'subject', only_overlapping=False,
overlapping_features=('cdr3_aa',))
Highlighting rows¶
Highlight rows with a list¶
Highlight the clones with CDR3s CARGYCSGGSCYSNAFDIW and CARDQGDYGDYYFDYW in red, and clones with CDR3s CARDPFPPEQPIDYW and CARDMTTVMYYMDVW in green.
clones_to_highlight = [
('#ff0000', ['CARGYCSGGSCYSNAFDIW', 'CARDQGDYGDYYFDYW']),
('#00ff00', ['CARDPFPPEQPIDYW', 'CARDMTTVMYYMDVW']),
]
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=True,
overlapping_features=['cdr3_aa'],
highlight=clones_to_highlight,
figsize=(8, 8))
Highlight rows with a function¶
Highlight all clones in IgH_HPAP017_rep1_200p0ng in red.
def highlight_clones(df):
return [
('#ff0000', df[df['IgH_HPAP017_rep1_200p0ng'] > 0].index)
]
_ = hu.plots.plot_strings(df, 'replicate_name', only_overlapping=False, figsize=(8, 8),
highlight=highlight_clones)
Basic UpSet Plot¶
_ = hu.plots.plot_upset(df, 'replicate_name')
Only show overlapping clones¶
_ = hu.plots.plot_upset(df, 'replicate_name', min_degree=2)
Show SHM and CDR3 length for each category¶
_ = hu.plots.plot_upset(df, 'replicate_name', subplots=['shm', 'cdr3_num_nts'])
Change the definition of a "clone" to just CDR3 AA¶
_ = hu.plots.plot_upset(df, 'replicate_name', clone_features=['cdr3_aa'])
Show counts and percentages¶
_ = hu.plots.plot_upset(df, 'replicate_name', show_counts=True, show_percentages=True)
- hicutils.plots.overlap.plot_similarity_heatmap(df, pool, dist_func_name, clone_features='clone_id', cutoff_func=None, **kwargs)¶
Generates a heatmap of the pairwise clonal similarity between pools. Similarity is calculated with the function named by dist_func_name, and the definition of a clone can be varied with the clone_features parameter.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of clonal overlap information.
- pool : str
How to pool the clones to calculate similarity.
- dist_func_name : str
The name of the function to use for the similarity calculation. Accepts jaccard or cosine.
- clone_features : str or list(str)
The feature(s) to use for clone definition. The default clone_id uses the clone definitions in df. This can be altered to any other columns in the DataFrame such as cdr3_aa.
- cutoff_func : func(df) -> float
A function returning a cutoff to designate the maximum value in the DataFrame. All values greater than or equal to the returned value are remapped to the returned value.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying overlap DataFrame.
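The two similarity measures accepted by dist_func_name can be illustrated with a small sketch. It is not the hicutils implementation. It assumes Jaccard similarity is taken over which clones are present in each pool, and cosine similarity over per-clone copy counts. The copies-based cosine vectors, the clone_id/copies column names, and the helper pool_similarity are all assumptions.

```python
import numpy as np
import pandas as pd


def pool_similarity(df, pool, metric="jaccard", clone_features=("clone_id",)):
    """Pairwise pool similarity from clone presence (Jaccard) or copies (cosine)."""
    # Rows: pools; columns: clones; values: summed copies (0 when absent)
    matrix = df.pivot_table(index=pool, columns=list(clone_features), values="copies",
                            aggfunc="sum", fill_value=0)
    if metric == "jaccard":
        present = (matrix > 0).astype(int).to_numpy()
        shared = present @ present.T  # size of each pairwise intersection
        sizes = present.sum(axis=1)
        union = sizes[:, None] + sizes[None, :] - shared  # size of each union
        sim = shared / union
    elif metric == "cosine":
        values = matrix.to_numpy(dtype=float)
        norms = np.linalg.norm(values, axis=1)
        sim = (values @ values.T) / np.outer(norms, norms)
    else:
        raise ValueError("metric must be 'jaccard' or 'cosine'")
    return pd.DataFrame(sim, index=matrix.index, columns=matrix.index)


reads = pd.DataFrame({
    "clone_id": [1, 2, 3, 2, 3, 4],
    "subject": ["s1", "s1", "s1", "s2", "s2", "s2"],
    "copies": [1, 1, 1, 1, 1, 1],
})
print(pool_similarity(reads, "subject"))
```

Subjects s1 and s2 share two of four distinct clones, so their Jaccard similarity is 0.5.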
- hicutils.plots.overlap.plot_strings(df, pool, only_overlapping=True, overlapping_features=('clone_id', 'cdr3_aa', 'v_gene', 'j_gene'), scale=False, limit=None, ylabels='counts', col_order=None, row_order=None, order=None, pivot_hook=None, col_namer=<function <lambda>>, highlight=None, **kwargs)¶
Creates an overlap string plot where each row represents a clone and each column represents a pool. Among other features, the definition of a clone can be modified and the heatmap can be boolean or scaled to the number of copies a clone comprises in each pool.
Parameters¶
- df : pd.DataFrame
The DataFrame to use for tracking clones.
- pool : str
The column to use for pooling clones into columns.
- only_overlapping : bool
If set to True (the default), only clones overlapping at least two pools will be included in the overlap plot.
- overlapping_features : list
The feature(s) to use to track clones across pools. By default the clone_id value is used. To alter this behavior, this value can be changed to any clonal information field such as cdr3_aa, v_gene, j_gene, and cdr3_nt.
This is particularly useful for tracking clones across donors, where the clone_id will differ but the cdr3_aa can be used instead.
- scale : bool or log
If scale=False (the default), presence of a clone in a pool is indicated by blue and absence by gray. When scale=True, the color of each clone/pool cell indicates the total number of copies. Setting scale='log' changes the scale to the log10 of copies.
- limit : int or None
If set to an integer n, limits the number of clones to the top n.
- ylabels : counts or full
If set to counts (the default), y-axis ticks will be shown indicating the number of clones in the plot. If set to full, all features in overlapping_features will be shown for each row.
- col_order : function or None
A function that is passed the pd.DataFrame and returns a list of columns in the desired order.
- row_order : function or None
A function that is passed the pd.DataFrame and returns a list of row indexes in the desired order.
- pivot_hook : function or None
A function to call on the pivoted table. Useful for filtering sequences based on their frequency across pools.
- col_namer : function
A function to rename columns. The function should accept a tuple and return a formatted string.
- highlight : list or function
A list of two-value tuples in the format [(color_hex, [indices]), …] to highlight. Each item in the list specifies a color and the row indices to highlight with that color. The highlights are applied in order, so row indices that occur multiple times are colored by the last matching item in the list.
The indices should match the format specified in overlapping_features.
Alternatively, a function can be passed that returns a list in the same format.
For example, with overlapping_features=['cdr3_aa'], the following will color the CDR3 CARAFDHW in red and CARESLRFMDVW in green:
[
('#ff0000', ['CARAFDHW']),
('#00ff00', ['CARESLRFMDVW']),
]
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
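The "last item wins" rule for highlight can be made concrete with a tiny resolver. resolve_highlights is a hypothetical helper written only to illustrate the ordering described above. It is not part of hicutils and ignores the function form of highlight. The CDR3 strings are toy values.

```python
def resolve_highlights(highlight, rows):
    """Map each row index to its highlight color; later entries override earlier ones."""
    colors = {}
    for color, indices in highlight:
        for index in indices:
            if index in rows:
                # Overwrites any color assigned by an earlier entry
                colors[index] = color
    return colors


highlight = [
    ("#ff0000", ["CARAFDHW", "CARESLRFMDVW"]),
    ("#00ff00", ["CARESLRFMDVW"]),
]
rows = ["CARAFDHW", "CARESLRFMDVW", "CARDYW"]
print(resolve_highlights(highlight, rows))
```

CARESLRFMDVW appears in both entries, so it ends up green. CARDYW is not listed and stays unhighlighted.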
- hicutils.plots.overlap.plot_upset(df, pool, size='clones', clone_features=['clone_id'], subplots=(), subplot_kind='violin', **kwargs)¶
Generates an UpSet plot of clonal data. The UpSet plot may be scaled by clones or copies with size, and the definition of a clone can be varied with the clone_features parameter. Further, distributions of other variables such as cdr3_num_nts and shm can be placed above each intersection bar with subplots.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of clonal overlap information.
- pool : str
How to pool the clones to calculate overlap. Each pool value will be treated as a category in the UpSet plot.
- size : str, clones or copies
The number to use as the cardinality of overlap sizes.
- clone_features : list(str)
The feature(s) to use for clone definition. The default clone_id uses the clone definitions in df. This can be altered to any other columns in the DataFrame such as cdr3_aa to track clones across subjects.
- subplots : list(str)
Features to plot as sns.catplot plots above each intersection bar. Valid options are shm and cdr3_num_nts.
- subplot_kind : str
The kind of plot to use for subplots. Any valid sns.catplot kind is allowed (e.g. box, violin).
- kwargs : dict
Other parameters to pass to usp.UpSet.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying overlap DataFrame.
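An UpSet plot summarizes how many clones fall into each exact combination of pools. The sketch below builds that summary with pandas. It assumes clone_id and copies columns, and it assumes size='copies' sums each clone's copies across all of its pools. How hicutils attributes copies is not stated here, and intersection_sizes is a hypothetical helper.

```python
import pandas as pd


def intersection_sizes(df, pool, size="clones", clone_features=("clone_id",)):
    """Count clones (or sum copies) for each exact combination of pools."""
    per_clone = df.groupby(list(clone_features)).agg(
        # Label each clone by the sorted set of pools it appears in, e.g. "rep1&rep2"
        pools=(pool, lambda values: "&".join(sorted(set(map(str, values))))),
        copies=("copies", "sum"),
    )
    if size == "clones":
        return per_clone.groupby("pools").size()
    if size == "copies":
        return per_clone.groupby("pools")["copies"].sum()
    raise ValueError("size must be 'clones' or 'copies'")


reads = pd.DataFrame({
    "clone_id": [1, 1, 2, 3, 3, 4],
    "replicate_name": ["rep1", "rep2", "rep1", "rep1", "rep2", "rep2"],
    "copies": [2, 3, 1, 4, 4, 7],
})
print(intersection_sizes(reads, "replicate_name"))
```

Keeping only combinations that contain two or more pools corresponds to the min_degree=2 example above.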
Somatic hypermutation (SHM)¶
The somatic hypermutation (SHM) of a dataset can be plotted in a variety of ways including as a distribution, bar/violin plots, and as a range plot.
import hicutils as hu
df = hu.io.read_directory('../example_data_immunedb')
pdf = (
df
.pipe(hu.pooling.pool_by, 'disease') # Pool on disease
.pipe(hu.filters.filter_functional) # Filter on functional
.pipe(hu.filters.filter_by_overall_copies, 2) # Keep clones with 2+ copies
)
SHM distribution plot (by clones)¶
_ = hu.plots.plot_shm_distribution(pdf, 'disease', 'clones')
SHM distribution plot (by copies)¶
_ = hu.plots.plot_shm_distribution(pdf, 'disease', 'copies')
Aggregate SHM plot¶
_ = hu.plots.plot_shm_aggregate(pdf, 'disease')
Aggregate SHM plot as boxplot¶
_ = hu.plots.plot_shm_aggregate(pdf, 'disease', kind='box')
Basic SHM range plot using boxplot¶
_ = hu.plots.plot_shm_range(pdf, 'disease')
SHM range plot with custom cutoffs¶
_ = hu.plots.plot_shm_range(pdf, 'disease', buckets=(1, 2, 5, 10, 15))
SHM range plot with custom colors¶
import seaborn as sns
_ = hu.plots.plot_shm_range(pdf, 'disease', color=sns.color_palette('Blues'))
Plotting fraction of mutated clones¶
By default, a clone is considered mutated if its average SHM is 2% or greater.
_ = hu.plots.plot_mutated_fraction(df, 'subject')
With a custom mutation threshold of 5%:
_ = hu.plots.plot_mutated_fraction(df, 'subject', threshold=5)
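The mutated fraction reduces to a per-pool mean of a boolean mask. This sketch follows the "2% or greater" rule stated above. The API entry below says "greater than", so the exact boundary handling in hicutils may differ. The sketch also assumes one row per clone and an shm column holding average SHM in percent; mutated_fraction here is a hypothetical stand-in, not the library function.

```python
import pandas as pd


def mutated_fraction(clones, pool, threshold=2.0):
    """Fraction of clones per pool whose SHM is at least `threshold` percent."""
    mutated = clones["shm"] >= threshold
    # The mean of a boolean Series is the fraction of True values
    return mutated.groupby(clones[pool]).mean()


clones = pd.DataFrame({
    "subject": ["s1", "s1", "s1", "s1", "s2", "s2"],
    "shm": [0.0, 1.5, 2.0, 8.0, 5.0, 12.0],
})
print(mutated_fraction(clones, "subject", threshold=5))
```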
- hicutils.plots.shm.plot_most_mutated_pie(df, pool, colors, **kwargs)¶
Plots the most mutated pool in df as a pie chart.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the SHM.
- pool : str
The pool to use for plotting.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.shm.plot_mutated_fraction(df, pool, threshold=2.0, **kwargs)¶
Plots the fraction of clones with greater than threshold SHM in each pool.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the SHM.
- pool : str
The pool to use for plotting.
- threshold : float
The SHM percentage threshold used to determine whether a clone is mutated.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.shm.plot_shm_aggregate(df, pool, **kwargs)¶
Categorically plots the SHM of each pool.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the SHM.
- pool : str
The pool to use for plotting.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.shm.plot_shm_distribution(df, pool, size_metric, palette=None, hue_order=None, **kwargs)¶
Plots the SHM distribution of a pooled DataFrame using either clones, copies, or uniques as a size metric.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the SHM distribution.
- pool : str
The pool to use for plotting.
- size_metric : str
The metric used to determine each clone's size. Must be clones, copies, or uniques.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.shm.plot_shm_range(df, pool, buckets=(1, 10, 25), order=None, **kwargs)¶
Plot the range of clonal SHM for each pool.
Parameters¶
- df : pd.DataFrame
The DataFrame used to plot the SHM.
- pool : str
The pool to use for plotting.
- buckets : list(int)
A list of cut-points to bin SHM. The default is (1, 10, 25), meaning clones will be stratified by SHM into the buckets [1, 10), [10, 25), and 25+. All intervals are left-closed; that is, the lesser value in each interval is inclusive and the greater value is exclusive.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
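The left-closed bucketing described for buckets maps directly onto pd.cut with right=False. This is a sketch, not the hicutils code. How the library treats SHM below the first cut-point is not stated, so here such values fall outside every bucket (NaN). The helper shm_buckets is hypothetical.

```python
import numpy as np
import pandas as pd


def shm_buckets(shm, buckets=(1, 10, 25)):
    """Assign each SHM value to a left-closed bucket built from the cut-points."""
    edges = list(buckets) + [np.inf]
    labels = [f"[{lo}, {hi})" for lo, hi in zip(buckets[:-1], buckets[1:])]
    labels.append(f"{buckets[-1]}+")
    # right=False makes every interval [lower, upper): lower inclusive, upper exclusive
    return pd.cut(pd.Series(shm), bins=edges, right=False, labels=labels)


print(shm_buckets([0.5, 1, 9.99, 10, 25, 40]).tolist())
```

Exactly 10 lands in [10, 25) and exactly 25 lands in 25+, matching the inclusive lower bounds.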
CDR3 analysis¶
Several CDR3 analysis plots are provided, including CDR3 amino-acid usage as both heatmaps and logo plots. Additionally, CDR3 spectratypes can be created to show the CDR3 length distribution and highlight the top clones by copy number.
import hicutils as hu
df = hu.io.read_directory('../example_data_immunedb')
pdf = (
df
.pipe(hu.pooling.pool_by, 'disease') # Pool on disease
.pipe(hu.filters.filter_functional) # Filter on functional
.pipe(hu.filters.filter_by_overall_copies, 2) # Keep clones with 2+ copies
)
CDR3 length distribution¶
_ = hu.plots.plot_cdr3_distribution(pdf, 'disease', aspect=1.5)
CDR3 usage plot, normalized by row¶
_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', figsize=(10, 3))
CDR3 usage plot, normalized by column¶
_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', normalize_by='cols', figsize=(10, 3))
CDR3 usage plot, weighted by copies¶
_ = hu.plots.plot_cdr3_aa_usage(pdf, 'disease', size_metric='copies', figsize=(10, 3))
CDR3 AA Logo for 10-AA length clones¶
g, m = hu.plots.plot_cdr3_logo(pdf, 'cdr3_aa', 10)
CDR3 NT Logo for 21-NT length clones¶
g, m = hu.plots.plot_cdr3_logo(pdf, 'cdr3_nt', 21)
CDR3 spectratype¶
_ = pdf.groupby('disease').apply(hu.plots.plot_cdr3_spectratype)
CDR3 spectratype, coloring only the top 5 clones¶
_ = pdf.groupby('disease').apply(hu.plots.plot_cdr3_spectratype, color_top=5)
- hicutils.plots.cdr3_analysis.plot_cdr3_aa_usage(df, pool, size_metric='clones', normalize_by='rows', cluster_by='both', figsize=(20, 10))¶
Plots CDR3 amino-acid usage separated by pool.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of CDR3 amino-acid usage information.
- pool : str
The pooling column to use for each row of the heatmap.
- size_metric : str
The size metric which is plotted as the intensity of each cell. Must be one of clones, copies, or uniques.
- normalize_by : str
Sets how to normalize the plot. If set to rows (the default), each row is normalized to sum to one. Setting it to cols causes each column (amino-acid) to sum to one.
- cluster_by : str (rows, cols, or both) or None
Sets which clustering to display. Valid values are rows, cols, or both; clustering can be disabled with None.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.cdr3_analysis.plot_cdr3_distribution(df, pool, size_metric='clones', **kwargs)¶
Plots CDR3 length distribution.
Parameters¶
- df : pd.DataFrame
The DataFrame to use for plotting CDR3 length.
- pool : str
The pooling column to use for the hue value.
- size_metric : str
The size metric to use as the height of each bar.
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
- hicutils.plots.cdr3_analysis.plot_cdr3_logo(df, by, length, hide_ambig=True, **kwargs)¶
Creates a logo plot for CDR3 strings of a given length either by amino-acid or nucleotide.
Parameters¶
- df : pd.DataFrame
The DataFrame to use as the source of CDR3 information.
- by : str
Either cdr3_aa to plot amino-acids or cdr3_nt to plot nucleotides.
- length : int
The length of CDR3s to plot, measured in the units of by (amino acids for cdr3_aa, nucleotides for cdr3_nt).
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.
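A logo plot is drawn from a position-by-character frequency matrix built from CDR3s of one length. The sketch below builds such a matrix with the standard library and pandas. It is an assumption-laden illustration, not the hicutils code: it does not model hide_ambig, and position_frequencies is a hypothetical helper. A logo library such as logomaker can render a DataFrame in this shape, with positions as rows and characters as columns.

```python
from collections import Counter

import pandas as pd


def position_frequencies(sequences, length):
    """Per-position character frequencies for sequences of exactly `length`."""
    selected = [seq for seq in sequences if len(seq) == length]
    columns = [Counter(seq[i] for seq in selected) for i in range(length)]
    # Rows are positions, columns are characters, values are frequencies
    counts = pd.DataFrame(columns).fillna(0)
    return counts.div(counts.sum(axis=1), axis=0)


cdr3s = ["CARDYW", "CARGYW", "CTRDYW", "CARW"]  # toy sequences; "CARW" is excluded
freqs = position_frequencies(cdr3s, 6)
print(freqs)
```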
- hicutils.plots.cdr3_analysis.plot_cdr3_spectratype(df, color_top=10, **kwargs)¶
Plots CDR3 length while annotating and highlighting the top color_top clones.
Parameters¶
- df : pd.DataFrame
The DataFrame to use for plotting CDR3 length.
- color_top : int
The number of clones to highlight (default 10).
Returns¶
A tuple (g, df) where g is a handle to the plot and df is the underlying DataFrame.