
Add custom analyses

guijacquemet edited this page Oct 21, 2023 · 3 revisions


Welcome to the realm of customization in CellTracksColab! This guide will help you extend the platform's capabilities to suit your specific needs. Whether you're aiming to implement a unique data-loading strategy or compute custom track metrics, you're in the right place. However, before diving deep into custom analyses, it's essential to understand the foundation: the way CellTracksColab handles and stores data.

CellTracksColab harnesses the power of Jupyter notebooks, providing a flexible and interactive platform for cell tracking analysis. The inherent modularity of these notebooks is a critical feature that makes CellTracksColab so adaptable to a researcher's needs. Here's how this modularity aids in integrating custom analyses:

  • Ease of Insertion: CellTracksColab's structure makes it easy to add new cells. Whether you are halfway through an analysis or just starting, you can insert a new cell at any point to add custom code and extend your analysis.

  • Iterative Development: The cell-based structure of CellTracksColab promotes iterative code development. As you introduce custom analyses, you can refine, test, and optimize them step-by-step, all within the platform's interactive environment.

In summary, CellTracksColab's modularity, stemming from its Jupyter notebook foundation, provides an accommodating environment for researchers to introduce and test custom cell tracking analyses seamlessly.


1. Understanding Data Storage in CellTracksColab

In CellTracksColab, most analyses revolve around two DataFrames: merged_tracks_df and merged_spots_df. These DataFrames share a standardized structure, which keeps data access and manipulation consistent throughout the platform.

a. The merged_tracks_df DataFrame

  • TRACK_ID: Unique identifier for each track (in each loaded file).
  • File_name: Source file name for the track data.
  • Condition: Denotes the experimental condition.
  • experiment_nb: Represents the experiment or repeat number.
  • Repeat: A generated identifier for each condition's repeat.
  • Unique_ID: The unique Track identifier formed by combining File_name and TRACK_ID.

b. The merged_spots_df DataFrame

  • POSITION_X, POSITION_Y, POSITION_Z: Spatial coordinates of the tracked object.
  • POSITION_T: The time or frame of the spot capture.
  • TRACK_ID: Links the spot to its parent track (in each loaded file).
  • File_name: Source file name for the spot data.
  • Condition: Denotes the experimental condition.
  • experiment_nb: Represents the experiment or repeat number.
  • Repeat: A generated identifier for each condition's repeat.
  • Unique_ID: A unique identifier formed by combining File_name and TRACK_ID.
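
To make the relationship between the two tables concrete, here is a minimal sketch using toy data. The file name, condition label, and coordinates are invented for illustration, and real DataFrames contain many more columns. It shows how Unique_ID can be built and then used to link each track to its spots.

```python
import pandas as pd

# Toy spots table: two tracks from one hypothetical file, "Control_1".
toy_spots = pd.DataFrame({
    "TRACK_ID":   [0, 0, 0, 1, 1],
    "POSITION_X": [0.0, 1.0, 2.0, 5.0, 5.5],
    "POSITION_Y": [0.0, 0.5, 1.0, 5.0, 4.0],
    "POSITION_T": [0, 1, 2, 0, 1],
    "File_name":  ["Control_1"] * 5,
    "Condition":  ["Control"] * 5,
    "Repeat":     [1] * 5,
})
toy_spots["Unique_ID"] = toy_spots["File_name"] + "_" + toy_spots["TRACK_ID"].astype(str)

# Toy tracks table: one row per track, sharing the same Unique_ID.
toy_tracks = (
    toy_spots[["TRACK_ID", "File_name", "Condition", "Repeat", "Unique_ID"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Select every spot belonging to one track, ordered in time.
one_track = toy_spots[toy_spots["Unique_ID"] == "Control_1_0"].sort_values("POSITION_T")

# Count spots per track; each Unique_ID in the tracks table should appear here.
spots_per_track = toy_spots.groupby("Unique_ID").size()
```

Because TRACK_ID is only unique within a file, filtering or grouping on Unique_ID rather than TRACK_ID avoids mixing tracks from different files.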

c. The spots_df_to_use DataFrame

This DataFrame is a derivative of merged_spots_df and serves as the primary data source for computing track metrics in the notebook. It is generated after the filtering and smoothing stages.


2. Creating a Custom Data Loader

CellTracksColab's data loaders transform raw data into the standardized merged_tracks_df and merged_spots_df. If your data diverges from the default format, crafting a custom data loader becomes necessary.

Steps:

  1. Identify Data Source: Point to where your tracking data resides.
  2. Metadata Extraction: Design a strategy to pull key metadata from your data source.
  3. Data Loading & Transformation: Read and mold your data into the desired format.
  4. Include Essential Columns: Ensure the presence of Unique_ID, Repeat, Condition, and File_name in both DataFrames.
  5. DataFrame Validation: Ensure that your transformed data aligns with CellTracksColab's requirements.
  6. Save Processed Data: Store the processed data for subsequent analyses.
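
Steps 4 and 5 can be sketched as a small check that runs on the output of any loader. The helper name check_loader_output and the specific checks are assumptions made for illustration; they are not part of CellTracksColab, so adapt them to whatever your downstream cells actually require.

```python
import pandas as pd

# Columns named in step 4; extend this list if your analysis needs more.
REQUIRED_COLUMNS = ["Unique_ID", "Repeat", "Condition", "File_name"]

def check_loader_output(tracks_df, spots_df):
    """Hypothetical helper: return a list of problems found (empty if none)."""
    problems = []
    for name, df in (("tracks", tracks_df), ("spots", spots_df)):
        missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
        if missing:
            problems.append(f"{name}: missing columns {missing}")
    if not problems:
        # Each track should appear exactly once in the tracks table.
        if tracks_df["Unique_ID"].duplicated().any():
            problems.append("tracks: duplicated Unique_ID values")
        # Every spot should belong to a known track.
        orphans = ~spots_df["Unique_ID"].isin(tracks_df["Unique_ID"])
        if orphans.any():
            problems.append(f"spots: {int(orphans.sum())} row(s) without a matching track")
    return problems

# Toy data: the third spot refers to a track that is absent from the tracks table.
tracks = pd.DataFrame({"Unique_ID": ["a_0"], "Repeat": [1],
                       "Condition": ["Control"], "File_name": ["a"]})
spots = pd.DataFrame({"Unique_ID": ["a_0", "a_0", "a_9"], "Repeat": [1] * 3,
                      "Condition": ["Control"] * 3, "File_name": ["a"] * 3})
problems = check_loader_output(tracks, spots)
```

Returning a list of messages instead of raising on the first issue makes it easier to fix several naming or formatting problems in one pass.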

Example: Loading CSV Files generated by TrackMate:

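import glob
import os

import pandas as pd

def populate_columns(df, filename):
    """
    Extract metadata from filename and populate DataFrame columns.
    Assumes names such as 'Control_1_tracks.csv' and 'Control_1_spots.csv'.
    Adjust based on your naming convention.
    """
    stem = os.path.splitext(filename)[0]
    # Remove the table suffix so that paired track and spot files share
    # the same File_name, and therefore the same Unique_ID values.
    for suffix in ('_tracks', '_spots'):
        if stem.endswith(suffix):
            stem = stem[:-len(suffix)]
    parts = stem.split('_')
    df['Condition'] = parts[0]
    df['Repeat'] = parts[1]
    df['File_name'] = stem
    df['Unique_ID'] = stem + "_" + df['TRACK_ID'].astype(str)
    return df

def load_and_populate(folder_path):
    track_dfs, spot_dfs = [], []
    for filepath in sorted(glob.glob(os.path.join(folder_path, "*.csv"))):
        filename = os.path.basename(filepath)
        if "track" in filename:
            track_df = pd.read_csv(filepath)
            track_dfs.append(populate_columns(track_df, filename))
        elif "spot" in filename:
            spot_df = pd.read_csv(filepath)
            spot_dfs.append(populate_columns(spot_df, filename))
    # pd.concat raises a ValueError on an empty list, so fail with a clearer message.
    if not track_dfs or not spot_dfs:
        raise FileNotFoundError(f"No track or spot CSV files found in {folder_path}")
    return pd.concat(track_dfs, ignore_index=True), pd.concat(spot_dfs, ignore_index=True)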
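
Step 6 (saving the processed data) can be sketched as follows. The helper name save_processed and the output file names are placeholders chosen for this example, not names that CellTracksColab requires. The toy tables stand in for the two DataFrames returned by a loader.

```python
import os
import tempfile

import pandas as pd

def save_processed(tracks_df, spots_df, out_dir):
    """Hypothetical helper: write both tables as CSV and return the two paths."""
    os.makedirs(out_dir, exist_ok=True)
    tracks_path = os.path.join(out_dir, "tracks_processed.csv")  # placeholder name
    spots_path = os.path.join(out_dir, "spots_processed.csv")    # placeholder name
    # index=False avoids writing the row index as an extra unnamed column.
    tracks_df.to_csv(tracks_path, index=False)
    spots_df.to_csv(spots_path, index=False)
    return tracks_path, spots_path

# Toy tables standing in for the loader output.
tracks = pd.DataFrame({"Unique_ID": ["Control_1_0"], "Condition": ["Control"]})
spots = pd.DataFrame({"Unique_ID": ["Control_1_0"] * 2, "POSITION_X": [0.0, 1.5]})

# Write to a temporary folder and read the spots back to confirm the round trip.
with tempfile.TemporaryDirectory() as tmp:
    tracks_path, spots_path = save_processed(tracks, spots, tmp)
    reloaded_spots = pd.read_csv(spots_path)
```

In a Colab session you would typically pass a folder on your mounted drive instead of a temporary directory, so the files persist after the runtime is recycled.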

3. Computing a New Track Metric in CellTracksColab

To bring in a new track metric, follow the steps below:

  1. Metric Selection: Decide on the metric to compute.
  2. Metric Calculation: Create a function to compute the metric for each track.
  3. Apply Function to Data: Use groupby and apply on the spots DataFrame to compute the metric for each track from its spots. Sort the data by time (POSITION_T) before applying your function.
  4. Merge with Track Data: Integrate the computed metric with the main track data.

Example: Computing the Forward Migration Index (FMI):

The FMI quantifies cell migration directionality. It is the ratio of the cell's net displacement along the forward direction to its total path length, so it ranges from -1 (straight backward movement) to 1 (straight forward movement). Below we compute the FMI in 2D and assume that the positive X-direction represents the "forward" direction for the cell's movement.
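
As a quick check of the definition, the arithmetic can be worked through by hand on a toy track. The three positions below are invented for illustration and are assumed to be already ordered in time.

```python
import numpy as np

# Toy 2D track (arbitrary units), already ordered in time.
x = np.array([0.0, 3.0, 6.0])
y = np.array([0.0, 4.0, 0.0])

# Length of each step between consecutive positions: 5 and 5.
step_lengths = np.hypot(np.diff(x), np.diff(y))
total_path_length = step_lengths.sum()          # 10.0

# Net movement along +X, taken here as the "forward" direction.
forward_displacement = x[-1] - x[0]             # 6.0

fmi = forward_displacement / total_path_length  # 0.6
```

The cell travels 10 units in total but advances only 6 units along X, giving an FMI of 0.6.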

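import numpy as np
import pandas as pd

def calculate_fmi(group):
    """Return the 2D FMI of one track, taking +X as the forward direction."""
    # Order the spots in time so that consecutive differences are real steps.
    group = group.sort_values('POSITION_T')
    dx = group['POSITION_X'].diff().fillna(0)
    dy = group['POSITION_Y'].diff().fillna(0)
    # Total path length: sum of the step lengths.
    total_path_length = np.sqrt(dx**2 + dy**2).sum()
    # Net forward displacement: signed sum of the steps along X.
    total_forward_displacement = dx.sum()
    # Stationary or single-spot tracks have zero path length; report 0.
    return pd.Series({'FMI': total_forward_displacement / total_path_length if total_path_length != 0 else 0})

# Compute the FMI for every track from its spots.
merged_spots_df.sort_values(by=['Unique_ID', 'POSITION_T'], inplace=True)
df_fmi = merged_spots_df.groupby('Unique_ID').apply(calculate_fmi).reset_index()

# Drop any existing FMI column so that re-running the cell does not duplicate it, then merge.
overlapping_columns = merged_tracks_df.columns.intersection(df_fmi.columns).drop('Unique_ID')
merged_tracks_df.drop(columns=overlapping_columns, inplace=True)
merged_tracks_df = pd.merge(merged_tracks_df, df_fmi, on='Unique_ID', how='left')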
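
The merge step can also be wrapped in a small reusable function that guards against accidentally duplicated rows. This is a sketch under stated assumptions: merge_metric is a hypothetical helper name, and it relies on the validate parameter of pandas.merge to check that Unique_ID is unique in both tables.

```python
import pandas as pd

def merge_metric(tracks_df, metric_df, key="Unique_ID"):
    """Hypothetical helper: left-merge a per-track metric table onto the tracks table."""
    # Drop metric columns that already exist so re-running does not create _x/_y duplicates.
    overlapping = tracks_df.columns.intersection(metric_df.columns).drop(key)
    return pd.merge(
        tracks_df.drop(columns=overlapping),
        metric_df,
        on=key,
        how="left",
        validate="one_to_one",  # raises pandas.errors.MergeError if a key is duplicated
    )

# Toy data: the tracks table holds a stale FMI column, and track "a_1" has no new value.
tracks = pd.DataFrame({"Unique_ID": ["a_0", "a_1"],
                       "Condition": ["Control", "Control"],
                       "FMI": [9.0, 9.0]})
fmi = pd.DataFrame({"Unique_ID": ["a_0"], "FMI": [0.6]})
merged = merge_metric(tracks, fmi)
```

With a left merge, tracks that have no computed metric (here "a_1") are kept and receive NaN, so it is worth checking for missing values before plotting or running statistics.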