Handling Zman-seq Timestamps

This tutorial applies CellDyc to the GBM Monocyte Differentiation dataset (see details here), whose temporal information comes from Zman-seq rather than from traditional time-series sampling. Because Zman-seq timestamps cells continuously in vivo, the resulting time labels are highly asynchronous.

Import Packages

import scanpy as sc
import matplotlib.pyplot as plt
import celldyc as cdc

Load the Data

The analysis uses the built-in GBM Monocyte Differentiation dataset.

# Load Zman-seq data
adata = cdc.datasets.mono2tam()
adata
AnnData object with n_obs × n_vars = 3108 × 4407
    obs: 'batch', 'mouse', 'time_assignment', 'cluster_colors', 'n_genes', 'Treatment', 'combined', 'Treatment_cluster'
    uns: 'cluster_colors_colors'
    obsm: 'X_mcg', 'X_pca'
    layers: 'spliced', 'unspliced'

This dataset contains cells from four time points, and the temporal labels are highly asynchronous.

fig, ax = plt.subplots(figsize=(5, 4))
ax = sc.pl.embedding(
    adata,
    color=["time_assignment"],  # color cells by Zman-seq time label
    basis="mcg",  # coordinates stored in adata.obsm['X_mcg']
    ax=ax,
    title="Zman-seq time points",
    legend_loc="lower right",
    s=50,  # point size
    frameon=False,
)
_images/7416a17c81b7ff7cdbe62aac108b4ca9b9c83057faf224a38439f2c2ebc3623f.png

Preprocess the Data

adata = cdc.tl.preprocess(adata)

We map the time point strings to numeric hours and store the result as a categorical column.

cat_map = {'12H': 12, '24H': 24, '36H': 36, '48H': 48}
adata.obs['numerical_time'] = adata.obs['time_assignment'].map(cat_map).astype('category')
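When the labels follow a regular pattern, the mapping can be derived instead of typed by hand. The sketch below assumes every label is an integer followed by `H`; `parse_hours` is a hypothetical helper written for illustration and is not part of CellDyc.

```python
import re

# Assumed label format: an integer number of hours followed by "H", e.g. "12H".
_HOURS = re.compile(r"^\s*(\d+)\s*[Hh]\s*$")


def parse_hours(label: str) -> int:
    """Convert a label such as '12H' into the integer 12 (hypothetical helper)."""
    match = _HOURS.match(label)
    if match is None:
        # Fail loudly instead of silently producing a missing value.
        raise ValueError(f"unrecognized time label: {label!r}")
    return int(match.group(1))


labels = ["12H", "24H", "36H", "48H"]
cat_map = {label: parse_hours(label) for label in labels}
print(cat_map)
```

Raising on an unknown label is a deliberate choice here: `Series.map` with a dictionary returns NaN for keys it does not find, so a misspelled label would otherwise pass unnoticed.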

Estimation of Time Representation and Transcriptomic Velocity

We train CellDyc using the recover_dyc function. By default, predicted velocities are stored in adata.layers['velocity'], and predicted time values are stored in adata.obs['getime'].

cdc.tl.recover_dyc(adata, time_key="numerical_time", time_weight=0.001)
Training with early stop (max_epochs=500, patience=40)
epoch   1:loss=0.926128,trend_loss=0.925350,time_loss=0.778284
epoch  51:loss=0.623676,trend_loss=0.623134,time_loss=0.542161
epoch 101:loss=0.598081,trend_loss=0.597530,time_loss=0.551124
epoch 151:loss=0.577577,trend_loss=0.577017,time_loss=0.560104
Early stopping at epoch 191
AnnData object with n_obs × n_vars = 3108 × 2000
    obs: 'batch', 'mouse', 'time_assignment', 'cluster_colors', 'n_genes', 'Treatment', 'combined', 'Treatment_cluster', 'numerical_time', 'getime'
    var: 'n_cells', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'getime_weights'
    uns: 'cluster_colors_colors', 'time_assignment_colors', 'log1p', 'hvg', 'pca', 'neighbors'
    obsm: 'X_mcg', 'X_pca'
    varm: 'PCs'
    layers: 'spliced', 'unspliced', 'velocity'
    obsp: 'distances', 'connectivities'
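The training log above gives a hint about how `time_weight` enters the objective. The following sketch parses the four logged lines and checks whether they are consistent with `loss = trend_loss + time_weight * time_loss`. This relation is an inference from the printed numbers, not something confirmed from the CellDyc source, and the parsing pattern is written only for this log format.

```python
import re

TIME_WEIGHT = 0.001  # the value passed to recover_dyc in this tutorial

# Log lines copied verbatim from the training output.
log = """\
epoch   1:loss=0.926128,trend_loss=0.925350,time_loss=0.778284
epoch  51:loss=0.623676,trend_loss=0.623134,time_loss=0.542161
epoch 101:loss=0.598081,trend_loss=0.597530,time_loss=0.551124
epoch 151:loss=0.577577,trend_loss=0.577017,time_loss=0.560104
"""

pattern = re.compile(
    r"epoch\s+(\d+):loss=([\d.]+),trend_loss=([\d.]+),time_loss=([\d.]+)"
)

# Absolute gap between the logged loss and the assumed weighted sum, per epoch.
residuals = {}
for epoch, loss, trend_loss, time_loss in pattern.findall(log):
    expected = float(trend_loss) + TIME_WEIGHT * float(time_loss)
    residuals[int(epoch)] = abs(float(loss) - expected)

for epoch, residual in residuals.items():
    print(f"epoch {epoch:3d}: |loss - (trend + w * time)| = {residual:.1e}")
```

If the assumed relation holds, every residual stays below the rounding precision of the log (six decimals). Under that reading, a `time_weight` of 0.001 means the trend term dominates the total loss, which would fit the log showing the total falling while `time_loss` rises slightly between epochs 51 and 151.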

The predicted time can then be compared against the Zman-seq time labels with a violin plot.

cdc.pl.getime_violin(adata, "getime", "numerical_time", xlabel="Zman-seq time", remove_ticks=False)
_images/d684a6792b74f009edb67d404cc3144dc08c6868eb3d0eee82364e31552b7e5a.png
<Axes: xlabel='Zman-seq time', ylabel='Gene-embedded time'>
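A numeric summary can complement the violin plot. The sketch below groups predicted times by time label and checks whether the group medians increase with the label. The `cells` values are made up for illustration and are not taken from the dataset; only the column names `numerical_time` and `getime` come from the tutorial.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (time label in hours, predicted time) pairs, NOT real data.
cells = [
    (12, 0.10), (12, 0.25),
    (24, 0.30), (24, 0.45),
    (36, 0.50), (36, 0.70),
    (48, 0.65), (48, 0.90),
]

# Collect predicted times per time label.
by_label = defaultdict(list)
for label, predicted in cells:
    by_label[label].append(predicted)

# Median predicted time per label, ordered by label.
medians = {label: median(values) for label, values in sorted(by_label.items())}

# True when the medians never decrease as the label increases.
ordered = list(medians.values())
is_monotonic = all(a <= b for a, b in zip(ordered, ordered[1:]))

print(medians)
print("medians increase with time label:", is_monotonic)
```

On the actual object, the same summary should be obtainable with pandas, for example `adata.obs.groupby("numerical_time")["getime"].median()`, since `adata.obs` is a DataFrame. Individual cells in neighboring groups can still overlap, as in the toy values above, which is what asynchronous labels would lead one to expect.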

We then project the velocities onto the metacell graph embedding.

cdc.pl.plot_velocity_projection(
    adata, 
    basis="mcg",            
    color='cluster_colors',                        
    legend_loc="right",  
    figsize=(5, 5)          
)
computing velocity graph
finished.
_images/a324bba993aa98ade37a1b4bfc06bf75650167e5be172d5c04b990d9595eef3c.png
computing velocity embedding
finished.