Tapestri Insights v3 User Guide
Tapestri Insights is a turnkey standalone software application for single-cell DNA and Protein analysis.
Overview
Key Features
- Find clusters of cells based on DNA or protein data
- Review variants and proteins
- Filter data
- Export publication-ready visualizations
- Export data for custom visualization and analysis
Primary Input/Output Files
Input Files
- .h5 file from Tapestri Pipeline for importing samples
- Optional: Whitelist file with known chromosomal locations with pathogenic variants
Output Files
- .zip file containing filtered data
- .csv files listing sample summary, clustering, variants, proteins
- .png for variant filters
- .png files for bar plots, fish plots, and violin plots
Home Page
New Analysis
To start a new analysis, import an .h5 file.
Recent Analyses
This list contains any .h5 files imported into Tapestri Insights. Click the icon to delete the analysis.
New to Tapestri Insights
For an overview of Tapestri Insights, see this article.
What's new in this version?
For the latest release notes, see this article.
Need additional information?
Click this link to view the Tapestri Insights User Guide.
Need Technical Support?
Click this link to contact Technical Support.
Navigation
Home
Click this button to return to the home page to import a new .h5 file or look at a different analysis.
Help
Click this button to link to the appropriate page in the Tapestri Insights User Guide in the Support Center.
Maximize / Minimize
Click this button to maximize any window.
Click this button to minimize any window.
To exit full-screen mode for Tapestri Insights on a Macintosh computer, click at the very top of the maximized window, and the menu bar will display. Click the green button to restore the app to the normal size.
Sort
Click this icon in any table to sort the data by that column. Each dialog displays the information contained in that column. You can sort in ascending or descending order.
Filter Tab
Filtered Data
This shows the number of cells, variants, and proteins in the filtered data.
Filter Tab Export
The export button at the top left of the page has three options for exporting data:
1. Export Filtered Data
Exporting filtered or selected data creates a .zip file that contains the following files.
AF.csv
This file contains the allele frequency broken down by sample per subclone and cell for each variant. The frequency is calculated as the number of reads with evidence for a mutation out of the total number of reads times 100.
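As a minimal sketch of the calculation described above, the percentage can be computed as follows. The function and parameter names are hypothetical and are not part of Tapestri Insights.

```python
def allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Return the allele frequency as a percentage (0-100).

    alt_reads:   reads with evidence for the mutation
    total_reads: all reads counted at that position
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads * 100


print(allele_frequency(12, 48))  # 25.0
```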
clustering.csv
This file contains a matrix with a subclone label for each cell calculated by the last clustering performed. This file is only exported from the Explore Tab because clustering must be performed first.
DP.csv
This file contains the read depth (DePth) broken down by sample per subclone and cell for each variant. The read depth per variant metric is the filtered depth at the cell level. This provides the number of filtered reads that support each of the reported alleles.
GQ.csv
This file contains the genotype quality score broken down by sample per subclone and cell for each variant. The genotype quality score represents the Phred-scaled confidence that the genotype assignment (GT) is correct, derived from the genotype normalized likelihoods of possible genotypes (PL). Refer to this section for more information.
NGT.csv
This file contains the numbered genotype call converted to categorical values for all filtered data cells and variants broken down by the sample name.
The values represent:
0 – Reference
1 – Heterozygous mutation
2 – Homozygous mutation
3 – Unknown
The calls are made by GATK/HaplotypeCaller.
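When working with the exported values outside Tapestri Insights, the codes above can be translated back into labels. The sketch below assumes the calls for one variant have already been read into a list of integers; the helper name is hypothetical.

```python
from collections import Counter

# Code-to-label mapping taken from the list above.
NGT_LABELS = {
    0: "Reference",
    1: "Heterozygous mutation",
    2: "Homozygous mutation",
    3: "Unknown",
}


def summarize_ngt(calls):
    """Count how many cells carry each genotype label."""
    counts = Counter(NGT_LABELS[call] for call in calls)
    # Report every label, including those with zero cells.
    return {label: counts.get(label, 0) for label in NGT_LABELS.values()}


print(summarize_ngt([0, 0, 1, 2, 3, 1]))
```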
protein_asinh_normalized_reads.csv
This file contains a numeric matrix with normalized protein reads using the inverse hyperbolic sine function (asinh). It is only available if there is protein data.
protein_clr_normalized_reads.csv
This file contains a numeric matrix with normalized protein reads using the centered log ratio function (clr). It is only available if there is protein data.
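The two normalizations can be illustrated with a short sketch for the raw reads of a single cell. The cofactor and pseudocount parameters are assumptions made for this example; the exact parameters used by Tapestri Insights are not stated in this guide, so the values will not necessarily match the exported files.

```python
import math


def asinh_normalize(reads, cofactor=1.0):
    """Inverse hyperbolic sine of each read count (cofactor is an assumption)."""
    return [math.asinh(r / cofactor) for r in reads]


def clr_normalize(reads, pseudocount=1.0):
    """Centered log ratio: log of each value minus the mean log value.

    The pseudocount (an assumption) avoids taking the log of zero.
    """
    logs = [math.log(r + pseudocount) for r in reads]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


cell_reads = [0, 15, 240]  # made-up raw reads for three proteins in one cell
print(asinh_normalize(cell_reads))
print(clr_normalize(cell_reads))
```

By construction, CLR values for one cell sum to zero, so they describe each protein relative to the other proteins in that cell.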
protein_reads.csv
This file contains a numeric matrix with the raw protein reads as present in the .h5 file. It is only available if there is protein data.
proteins.csv
This file contains a list of all proteins. It is only available if there is protein data.
protein_umap.csv
This file contains the numeric matrix with X and Y coordinates for each cell, calculated using UMAP on top of the selected protein layer. It is only available if the last projection was done on top of protein data. This file is only exported from the Explore Tab because clustering must be performed first.
README.txt
This file explains the .csv files contained in the .zip file.
variant_umap.csv
This file contains the numeric matrix with X and Y coordinates for each cell, calculated using UMAP on top of the selected variant layer. It is only available if the last projection was done on top of variant data. This file is only exported from the Explore Tab because clustering must be performed first.
variants.csv
This file contains the variant metadata.
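For custom analysis, the exported .zip file can be opened with standard tools. The sketch below uses only the Python standard library and reads every .csv file in the archive as raw rows. The column layout of each file is not described here, so no columns are interpreted; the function name and the file name in the usage comment are hypothetical.

```python
import csv
import io
import zipfile


def read_export(zip_source):
    """Return {file name: list of CSV rows} for every .csv in the archive.

    zip_source may be a path or a binary file-like object.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip README.txt and any other non-CSV files
            with archive.open(name) as handle:
                text = io.TextIOWrapper(handle, encoding="utf-8", newline="")
                tables[name] = list(csv.reader(text))
    return tables


# Usage (hypothetical file name):
# tables = read_export("filtered_data.zip")
# print(sorted(tables))
```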
2. Export Sample Summary
Exporting the sample summary data creates a .csv file that contains:
- Sample name
- Analytes – DNA, Protein
- # Cells
- # and % Low-Quality Variants
- # and % Low-Quality Cells
- # and % Low-Frequency Variants
3. Export Variant Filters
This .png file contains an image of all the variant filters used along with the plots viewed in Tapestri Insights. See this section of the User Guide for details on each filter.
Whitelist
Upload a file that lists known chromosomal locations with pathogenic variants. Variants that match the whitelist will not be affected by the filtering parameters. Genotype calls for these variants remain exactly as output by the Genome Analysis Toolkit (GATK) pipeline.
The format of the whitelisted variants file follows the BED file format or the Designer CSV file format for region-based targets. When adding a whitelist file, select *.bed or *.csv at the bottom of the open dialog.
After adding a whitelist file, a Remove all other variants checkbox appears. Select it to remove all variants from the samples that are not in the whitelist.
Only one whitelist file is allowed.
Note: .bed files from the Tapestri Portal Datasets will not work as whitelist files.
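To illustrate the BED layout, the sketch below parses the first three BED columns (chromosome, start, end) and checks whether a position falls inside a listed region. BED start coordinates are 0-based and end coordinates are exclusive. Comparing against a 1-based variant position is an assumption of this example, as are the function names; the matching logic inside Tapestri Insights is not described in this guide.

```python
def parse_bed(text):
    """Parse BED text into (chrom, start, end) tuples, ignoring header lines."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def is_whitelisted(chrom, pos, regions):
    """Return True if the 1-based position lies in any 0-based, half-open region."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


bed_text = "chr2\t25457100\t25457300\nchr17\t7577000\t7577200\n"  # made-up regions
regions = parse_bed(bed_text)
print(is_whitelisted("chr2", 25457242, regions))  # True
print(is_whitelisted("chr2", 25457100, regions))  # False
```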
Filters
When you start an analysis, each filter is set to the recommended thresholds described in the Variant Filters section. Recommended displays in the Filters dropdown list on the left side of the Filter Tab page, indicating that all filters are set at the recommended levels.
Customize a filter by entering a number or adjusting the numbers using the up and down arrows and clicking inside the graph area. After adjusting any of the parameters, Custom displays in the dropdown menu.
Apply: Apply the change. Changes must be applied before taking effect.
Cancel: Cancel the change. The value(s) will revert back to the default.
When you select Recommended from the dropdown menu, the filter value(s) reverts to the recommended settings. Applying this option returns all the thresholds to the default values.
Variant Filters
The first three filters work on individual cells, and the next three work on all cells collectively. All six filter parameters can be adjusted. The filters are applied in the order listed.
The Advance Filtering page displays six graphs:
- Remove genotype in cell with quality < X
- Remove genotype in cell with read depth < X
- Remove het mutant genotype in cell with VAF < X and > Y
- Remove variants genotyped in < X% of cells (low-quality variants filter)
- Remove cells with < X% of genotypes present (low-quality cells filter)
- Remove variants mutated in < X% of cells (low-frequency variants filter)
Cell-specific filters
These filters work on individual cells. The filtering criteria are applied to the individual cells, and the genotype for the cells is removed based on the given value.
1. Remove genotype in cell with quality < X
This filter removes the genotype from all the cells that have quality less than the given value.
The genotype quality score represents the Phred-scaled confidence that the genotype assignment (GT) is correct, derived from the genotype normalized likelihoods of possible genotypes (PL). Specifically, the quality score is the difference between the PL of the second most likely genotype and the PL of the most likely genotype. The values of the PLs are normalized so that the most likely PL is always 0. So, the quality score ends up being equal to the second smallest PL, unless that PL is greater than 99. In GATK, the value of the quality score is capped at 99 because larger values are not more informative, but they take up more space in the file. So if the second most likely PL is greater than 99, we still assign a quality score of 99.
In short, the genotype quality score is the difference between the Phred-scaled likelihoods of the two most likely genotypes. If it is low, there is not much confidence in the genotype, i.e., there was not enough evidence to confidently choose one genotype over another.
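The derivation above can be written as a short sketch: normalize the PL values so the smallest is 0, take the second smallest, and cap it at 99. The function name is hypothetical, and the code illustrates the description in this section rather than reproducing GATK itself.

```python
def genotype_quality(pls):
    """Return the genotype quality from Phred-scaled genotype likelihoods (PL)."""
    if len(pls) < 2:
        raise ValueError("at least two genotype likelihoods are required")
    best = min(pls)
    # Normalize so that the most likely genotype has a PL of 0.
    normalized = sorted(pl - best for pl in pls)
    # The second smallest normalized PL is the quality, capped at 99.
    return min(normalized[1], 99)


print(genotype_quality([0, 45, 300]))   # 45
print(genotype_quality([120, 0, 250]))  # 99
```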
Default: We recommend removing genotypes with a quality less than 30.
2. Remove genotype in cell with read depth < X
This filter removes the genotype from all the cells that have read depth less than the given value.
The read depth per variant metric is the filtered depth at the cell level. This shows the number of filtered reads that support each of the reported alleles.
Default: We recommend removing genotypes with read depths of less than 10.
3. Remove het mutant genotype in cell with VAF < X and > Y
This filter removes the het genotype from all the cells that have VAF values less than the X value or greater than the Y value.
The alternate allele frequency is the ratio of reads supporting the given alternate allele to the total reads at that position. Reads considered uninformative are not included. Reads are considered uninformative when they do not provide enough statistical evidence to support one allele over another.
Default: We recommend removing genotypes with VAF values lower than 20 or greater than 100.
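The three cell-specific filters can be sketched for a single genotype call as follows. The defaults are the recommended values from this section. Representing a removed genotype as NGT code 3 (Unknown) is an assumption of this example, and the function name is hypothetical.

```python
MISSING = 3  # assumption: a removed genotype is recorded as "Unknown"


def filter_genotype(ngt, gq, dp, af, min_gq=30, min_dp=10, min_vaf=20, max_vaf=100):
    """Apply the three cell-specific filters, in order, to one genotype call."""
    # 1. Remove genotype in cell with quality < X
    if gq < min_gq:
        return MISSING
    # 2. Remove genotype in cell with read depth < X
    if dp < min_dp:
        return MISSING
    # 3. Remove het mutant genotype in cell with VAF < X or > Y
    if ngt == 1 and (af < min_vaf or af > max_vaf):
        return MISSING
    return ngt


print(filter_genotype(ngt=1, gq=50, dp=20, af=35))  # 1 (kept)
print(filter_genotype(ngt=1, gq=50, dp=20, af=10))  # 3 (removed by the VAF filter)
```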
Global filters
These filters work at a global level, removing cells and variants from the current analysis. The numbers of cells and variants discarded by these three filters are shown in the Sample Summary section in the Filter tab.
4. Remove variants genotyped in < X% of cells
This metric is the proportion of cells that have genotype information available. For example, a threshold of 80 % retains only variants for which information is available in at least 80 % of all cells. Variants with information in fewer than 80 % of cells are removed.
Default: We recommend removing variants genotyped in less than 50 % of the cells.
5. Remove cells with < X% of genotypes present
This metric is the proportion of variants for which genotype information is available in a given cell. For example, a threshold of 80 % retains only the cells that have genotype information for at least 80 % of the variants.
Default: We recommend removing cells with less than 50 % of the genotypes present.
6. Remove variants mutated in < X% of cells
This metric is the percentage of all cells whose genotype call is non-reference, i.e., heterozygous (HET) or homozygous alternate (HOM ALT). Any variant genotyped as HET or HOM ALT in a smaller percentage of cells than the given threshold is discarded from the analysis.
The selected threshold depends on the sample type and the scientific question you intend to address. For example, rare subclone detection requires a lower threshold.
Default: We recommend removing variants mutated in less than 1 % of the cells.
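The three global filters can be sketched on a small NGT matrix (rows are cells, columns are variants, 3 means no genotype). The defaults are the recommended values above. Two details are assumptions of this example: each filter only considers the cells and variants that survived the previous filter, and the low-frequency percentage uses all remaining cells as the denominator. The function name is hypothetical.

```python
MISSING = 3


def apply_global_filters(ngt, min_variant_pct=50, min_cell_pct=50, min_mutated_pct=1):
    """Return (kept cell indices, kept variant indices) after filters 4, 5, and 6."""
    cells = list(range(len(ngt)))
    variants = list(range(len(ngt[0]))) if ngt else []

    def pct(part, whole):
        return 100 * part / whole if whole else 0.0

    # 4. Remove variants genotyped in < X% of cells
    variants = [v for v in variants
                if pct(sum(ngt[c][v] != MISSING for c in cells), len(cells)) >= min_variant_pct]
    # 5. Remove cells with < X% of genotypes present
    cells = [c for c in cells
             if pct(sum(ngt[c][v] != MISSING for v in variants), len(variants)) >= min_cell_pct]
    # 6. Remove variants mutated (HET or HOM) in < X% of cells
    variants = [v for v in variants
                if pct(sum(ngt[c][v] in (1, 2) for c in cells), len(cells)) >= min_mutated_pct]
    return cells, variants


matrix = [
    [0, 1, 3],
    [0, 2, 3],
    [0, 3, 3],
    [3, 3, 3],
]
print(apply_global_filters(matrix))  # ([0, 1, 2], [1])
```

In this made-up matrix, the third variant is dropped by filter 4, the last cell by filter 5, and the first variant by filter 6 because no remaining cell is mutated.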
Sample Summary
The Sample Summary table displays the following information:
- Sample name
- Analytes – DNA, Protein
- # Cells
- # and % Low-Quality Variants
- # and % Low-Quality Cells
- # and % Low-Frequency Variants
Click this icon to rename the sample.
Review Variants
The Review Variants table lists all the variants available for analysis, including the following information:
- Function
- Coding impact
- DANN
- RefSeq transcript id
- cDNA
- dbSNP rsids
- Protein
- ClinVar
- Variant type
- Gene
- Allele Freq (gnomAD)
- COSMIC ids
- Whitelist
- # Genotyped Cells
- # Mutated Cells
A checkmark next to a variant name means that variant will be included in the analysis. To remove the variant from the analysis, uncheck the box.
Click the icon to display the annotations for the variant.
To the right side of the page, use the search field to search for a specific variant.
Click this icon to display the Variant Table Parameters. A checkmark means the parameter will display in the table. Uncheck the box to hide the information.
Use the sort icon in the table to sort the data by a specific parameter, either in ascending or descending order.
Review Proteins
The Review Proteins table displays the proteins available for analysis. A checkmark next to a protein name means that protein will be included in the analysis. To remove the protein from the analysis, uncheck the box.
Use the sort icon in the table to sort the proteins in ascending or descending order.
Explore Tab
This shows the number of cells, variants, proteins, and subclones in the selected data.
Selecting or deselecting Samples and Subclones works independently for each visualization. For example, if you deselect all samples in UMAP, they will still be selected in the other visualizations until you uncheck them there.
Explore Tab Export
The export button on the Explore Tab provides different options based on whether you have Variants, Proteins, or Subclones selected below the plot window. The options also change based on the visualization. All visualizations provide the following options based on the variant/protein/subclone selection. Additional options are available for violin, bar, and fish plots. They are discussed in those sections.
The exported filtered data exports a .zip file. For more information on what it contains, see this section.
The variants export creates a .csv file that contains the variants along with the feature score, function, protein, coding impact, the variant table parameters, whitelist status, number of genotyped cells, and number of mutated cells.
The subclones export creates a .csv file that contains the subclones along with the number of cells and all the selected samples.
UMAP
This visualization displays data from a specific layer (either a variant data layer or protein data layer) projected into 2D space using UMAP (Uniform Manifold Approximation and Projection). Protein data is first z-score normalized, then reduced in dimensionality using PCA, and finally projected to 2D using UMAP. Each dot represents one cell in the projected space. The color of the dot depends on the color by option.
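The z-score step applied to protein data can be sketched as below. Normalizing each protein across cells and using the population standard deviation are assumptions of this example. The PCA and UMAP steps require third-party libraries and are not shown.

```python
from statistics import mean, pstdev


def zscore(values):
    """Center values on their mean and scale by their standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant feature carries no signal
    return [(v - mu) / sigma for v in values]


# Made-up normalized counts for one protein across three cells.
print(zscore([1, 2, 3]))
```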
Source allows you to choose between clustering by:
- DNA by VAF (variant allele frequency)
- DNA by NGT (numbered genotype)
- Proteins by CLR (centered log ratio) normalization counts
- Proteins by Asinh (inverse hyperbolic sine function) normalization counts
Color by colors the visualization by:
- Subclones
- Variants by NGT (numbered genotype)
- Variants by DP (read DePth)
- Variants by GQ (genotype quality)
- Variants by AF (allele frequency)
- Proteins by CLR (centered log ratio) normalization counts
- Proteins by Asinh (inverse hyperbolic sine function) normalization counts
Choosing the color by subclones option performs cell clustering using the source layer chosen above. Clustering can take quite some time, depending on the number of cells and variants or proteins.
Choosing coloring by sample colors the cells according to the sample they come from.
Choosing coloring by layer (either variant or protein) displays a list of features from a specific layer. Selecting one or more features creates one or multiple subgraphs, where each subgraph colors the cells by value from a specifically chosen feature.
Dimensionality reduction allows you to configure the number of PCA (principal component analysis) components that the original source is reduced to before performing UMAP. This option is available only if you select a Protein layer as a source. Clustering is performed on top of the PCA components, so choosing a different number of components affects the found clusters. The default value is 10.
Clustering on a variant layer does not use PCA; instead, clustering is performed directly on the UMAP output.
Clustering options allow you to tweak the parameters used for the clustering of cells. This option is only available when coloring by subclones since this affects the subclone calculations.
Variant Clustering
Variant data clustering supports only DBScan as an algorithm.
DBScan
- Eps (epsilon)
- A measure of how close the points are.
- Default value: 0.70
- Similarity
- The proportion of variants that must be similar in order to combine multiple clusters into one or to identify mixed cell line populations.
- Default value: 0.80
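To show what Eps controls, here is a minimal, generic sketch of the DBSCAN algorithm on 2D points. It is not the implementation used by Tapestri Insights. The min_samples parameter is an assumption because it is not listed among the options above, and the Similarity step is not modeled.

```python
import math

NOISE = -1


def dbscan(points, eps=0.70, min_samples=5):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
print(dbscan(points, eps=0.70, min_samples=3))  # [0, 0, 0, 1, 1, 1, -1]
```

A smaller Eps requires points to be closer together before they are joined, which tends to produce more, tighter clusters and more unassigned cells.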
Protein Clustering
Protein data clustering supports two algorithms: KMeans and Louvain.
KMeans
- N. of clusters is the number of clusters to be generated.
- Default value: 5
Louvain
- N. of neighbours is the number of nearest neighbours to consider in the shared nearest neighbour graph generation.
- Default value: 100
Show Samples allows the customization of which samples to display. A UMAP must contain at least one sample. This selection only applies to this visualization.
Show Subclones allows the customization of which subclones to display. A UMAP must contain at least one subclone. This selection only applies to this visualization.
Settings allow you to customize the display colors for the subclones. Click the icon, change the color, and click Apply. To revert back to the default colors, click Reset.
The Plot Controls in the top-right of the plot window allow you to zoom or pan the plot. You have the option to zoom in and zoom out by scrolling on the plots. You can select a specific section of the plot to view it in detail. The autoscale option resets the view.
XY Plot
This visualization displays the relationship between two proteins. Each dot represents a cell where the x-coordinate corresponds to the first protein’s normalized count, and the y-coordinate corresponds to the second protein’s normalized count. The color of the dot depends on the color by option, with samples being the default.
Source allows you to choose between CLR (centered log ratio) and Asinh (inverse hyperbolic sine function) protein normalization counts.
Axis allows you to choose which proteins to display on the x-axis and y-axis.
Color by colors the visualization by:
- Samples
- Variants by NGT (numbered genotype)
- Variants by DP (read DePth)
- Variants by GQ (genotype quality)
- Variants by AF (allele frequency)
- Proteins by CLR (centered log ratio) normalization counts
- Proteins by Asinh (inverse hyperbolic sine function) normalization counts
Select which variants to include.
Choosing coloring by sample colors the cells according to the sample they come from.
Choosing coloring by layer (either variant or protein) displays a list of features from a specific layer. Selecting one or more features creates one or multiple subgraphs, where each subgraph colors the cells by value from a specifically chosen feature.
Show Samples allows the customization of which samples to display. An XY plot must contain at least one sample. This selection only applies to this visualization.
Show Subclones allows the customization of which subclones to display. An XY plot must contain at least one subclone. This selection only applies to this visualization.
Settings allow you to customize the display colors for the subclones. Click the icon, change the color, and click Apply. To revert back to the default colors, click Reset.
The Plot Controls in the upper-right of the plot window allow you to zoom or pan the plot. You have the option to zoom in and zoom out by scrolling on the plots. You can select a specific section of the plot to view it in detail. The autoscale option resets the view.
Violin Plot
This visualization shows the distribution of values from the selected layer and feature. The violin plot can display the distribution itself, dots representing sampled cells, a green line at the median value, and a red line at the mean value. By default, the distribution of values is split across the samples and subclones.
Plot Controls
Layer/Feature selection allows you to choose from:
- Variants by DP (read DePth)
- Variants by GQ (genotype quality)
- Variants by AF (allele frequency)
- Proteins by Asinh (inverse hyperbolic sine function) normalization counts
- Proteins by CLR (centered log ratio) normalization counts
and the feature (variant or protein) of interest.
Group by sample displays subclones on a per-sample basis. Individual subclones are color-coded differently.
Group by subclone displays samples on a per-subclone basis. Individual samples are color-coded differently.
First group by determines whether the data will be grouped by sample or subclone first.
Color by colors the visualization by either sample or subclone.
Show allows you to select what you want to display – violin distribution, mean, median, and sub-sampled cells.
Show Samples allows the customization of which samples to display. A violin plot must contain at least one sample. This selection only applies to this visualization.
Show Subclones allows the customization of which subclones to display. A violin plot must contain at least one subclone. This selection only applies to this visualization.
Settings allow you to customize the display colors for the subclones. Click the icon, change the color, and click Apply. To revert back to the default colors, click Reset.
Export
The export button for the Violin Plot includes all the Explore Tab Export options described above. It also provides a violin plot.
The violin plot export creates a .pdf, .png, or .svg file with the violin plot visualization.
Bar Plot
This visualization represents the distribution of cells across the subclones in a bar plot format. At least one sample and one subclone are required to display a bar plot.
Plot Controls
Group by sample displays subclones on a per-sample basis. Individual subclones are color-coded differently.
Group by subclone displays samples on a per-subclone basis. Individual samples are color-coded differently.
Show percentage displays the subclones as percentages (0 – 100 %). If unchecked, subclones display as cell counts.
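The difference between the two display modes can be sketched with made-up cell counts for one sample. The function name and the subclone names are hypothetical.

```python
def subclone_percentages(cell_counts):
    """Convert per-subclone cell counts into percentages of the total."""
    total = sum(cell_counts.values())
    if total == 0:
        return {name: 0.0 for name in cell_counts}
    return {name: 100 * count / total for name, count in cell_counts.items()}


counts = {"Subclone 1": 30, "Subclone 2": 10}  # made-up cell counts
print(subclone_percentages(counts))  # {'Subclone 1': 75.0, 'Subclone 2': 25.0}
```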
Show labels adds labels to each subclone.
Show Samples allows the customization of which samples to display. A bar plot must contain at least one sample. This selection only applies to this visualization.
Show Subclones allows the customization of which subclones to display. A bar plot must contain at least one subclone. This selection only applies to this visualization.
Settings allow you to customize the display colors for the subclones. Click the icon, change the color, and click Apply. To revert back to the default colors, click Reset.
Export
The export button for the Bar Plot includes all the Explore Tab Export options described above. It also provides a bar plot.
The bar plot export creates a .pdf, .png, or .svg file with the bar plot visualization.
Fish Plot
Before a fish plot can display, first perform subclustering in the UMAP visualization. At least two samples and one subclone are required to display a fish plot.
Plot Controls
In the Time Points section, select which samples to include in the fish plot by checking or unchecking the checkbox next to the sample name. A fish plot must contain at least two samples.
Edit the order of the time points by clicking the icon to determine the location of the sample in the plot. The number must be unique and non-negative. Samples are ordered by ascending order of the values, with spacing being relative to the difference between two consecutive numbers. For example, 0, 1, and 100 will display 0 and 1 very close to one another on the left and 100 on the far right.
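The relative spacing can be sketched by scaling the entered numbers to the width of the plot. The function name and sample names are hypothetical, and the 0-to-1 scale is an assumption used only to illustrate the proportions.

```python
def timepoint_positions(order_values):
    """Map {sample: order value} to (sample, position) pairs on a 0-1 axis."""
    values = list(order_values.values())
    if len(values) < 2:
        raise ValueError("a fish plot needs at least two samples")
    if any(v < 0 for v in values) or len(set(values)) != len(values):
        raise ValueError("order values must be unique and non-negative")
    low, high = min(values), max(values)
    ordered = sorted(order_values.items(), key=lambda item: item[1])
    return [(sample, (value - low) / (high - low)) for sample, value in ordered]


print(timepoint_positions({"Diagnosis": 0, "Treatment": 1, "Relapse": 100}))
# [('Diagnosis', 0.0), ('Treatment', 0.01), ('Relapse', 1.0)]
```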
Show Subclones allows the customization of which subclones to display. A fish plot must contain at least one subclone. This selection only applies to this visualization.
Subclone Hierarchy: The subclones can be rearranged and regrouped to depict the clonal evolution. Select the subclone and use the arrow keys at the bottom to move the subclones to define a hierarchy. Uncheck the subclone box to remove it from the plot.
The use case for this is to see mutations over time. For example, the first sample may come from the time of diagnosis, the second during treatment, the third during remission, and the fourth at relapse.
Settings allow you to customize the display colors for the subclones. Click the icon, change the color, and click Apply. To revert back to the default colors, click Reset.
Export
The export button for the Fish Plot includes all the Explore Tab Export options described above. It also provides a fish plot.
The fish plot export creates a .pdf, .png, or .svg file with the fish plot visualization.
Variants / Proteins / Subclones
Depending on the option selected and the visualization, the Export options vary. Please refer to this section for more information.
Variants
The Variants section on the Explore Tab page contains all the data shown on the Filter Tab page except the per-sample statistics. For more information, refer to that section.
If the last clustering was performed on top of variant data, the Variants table displays one additional column, Feature score, next to the variant name.
The Feature score displays how much the median AF value for the feature differs across different clusters. If the score is 0, it means that for all the clusters, the median AF value of all cells inside the clusters is the same, and this specific variant does not differ across clusters. If the score is 100, this means that there are at least two clusters where cells in the first cluster have a median AF of 0 (WT – wild type), and for the cells from another cluster, the median AF is 100 (pure HOM). Scores can fall anywhere between those two values.
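The exact formula for the Feature score is not given in this guide. One calculation that is consistent with both endpoints described above is the range of the per-cluster median AF values. The sketch below uses that assumption; the function name is hypothetical, and the result may differ from the score shown in Tapestri Insights.

```python
from statistics import median


def feature_score(af_by_cluster):
    """Assumed score: largest minus smallest per-cluster median AF (0-100)."""
    medians = [median(values) for values in af_by_cluster.values() if values]
    if not medians:
        return 0.0
    return max(medians) - min(medians)


# Made-up AF values (percent) for one variant in two clusters.
print(feature_score({"Cluster 1": [0, 0, 0], "Cluster 2": [100, 100, 100]}))  # 100
print(feature_score({"Cluster 1": [50, 50], "Cluster 2": [50, 50]}))          # 0.0
```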
Clicking this icon displays a tooltip.
Proteins
This table lists all the proteins in the selected data.
If the last clustering was done on top of protein data, the Protein table will gain additional columns, one for each cluster. The columns list the p-value (from a t-test) for each of the features in that cluster compared to all other clusters. The lower the p-value, the higher the significance of that feature for the cluster.
Clicking this icon displays a tooltip.
Use the sort icon in the table to sort the proteins in ascending or descending order.
Subclones
This table lists all the subclones in the selected data.
This table shows general subclone information based on the last completed clustering operation. You will see how cells are distributed across different clusters and samples if you have a multi-sample .h5 file. You can also rename the subclone by clicking on the subclone name and entering a new name, which will then be used in all other visualizations. Renamed subclones do not survive subclone recalculation – changing the clustering parameters or clustering source will reset the names.