Add custom analyses
Welcome to the realm of customization in CellTracksColab! This guide will help you extend the platform's capabilities to suit your specific needs. Whether you're aiming to implement a unique data-loading strategy or compute custom track metrics, you're in the right place. However, before diving deep into custom analyses, it's essential to understand the foundation: the way CellTracksColab handles and stores data.
CellTracksColab harnesses the power of Jupyter notebooks, providing a flexible and interactive platform for cell tracking analysis. The inherent modularity of these notebooks is a critical feature that makes CellTracksColab so adaptable to a researcher's needs. Here's how this modularity aids in integrating custom analyses:
- Ease of Insertion: CellTracksColab's structure lets you insert new cells anywhere. Whether you are halfway through an analysis or just getting started, you can seamlessly add a cell with custom code to extend your analysis.
- Iterative Development: The cell-based structure of CellTracksColab promotes iterative code development. As you introduce custom analyses, you can refine, test, and optimize them step by step, all within the platform's interactive environment.
In summary, CellTracksColab's modularity, stemming from its Jupyter notebook foundation, provides an accommodating environment for researchers to introduce and test custom cell tracking analyses seamlessly.
In CellTracksColab, the core of most analyses revolves around two DataFrames: merged_tracks_df and merged_spots_df. These DataFrames share a standardized structure, facilitating consistent data access and manipulation throughout the platform.
The merged_tracks_df DataFrame describes each track and includes the following key columns:
- TRACK_ID: Unique identifier for each track (within each loaded file).
- File_name: Source file name for the track data.
- Condition: Denotes the experimental condition.
- experiment_nb: Represents the experiment or repeat number.
- Repeat: A generated identifier for each condition's repeat.
- Unique_ID: The unique track identifier, formed by combining File_name and TRACK_ID.
The merged_spots_df DataFrame describes the individual spots that make up each track and includes the following key columns:
- POSITION_X, POSITION_Y, POSITION_Z: Spatial coordinates of the tracked object.
- POSITION_T: The time point or frame at which the spot was captured.
- TRACK_ID: Links the spot to its parent track (within each loaded file).
- File_name: Source file name for the spot data.
- Condition: Denotes the experimental condition.
- experiment_nb: Represents the experiment or repeat number.
- Repeat: A generated identifier for each condition's repeat.
- Unique_ID: A unique identifier formed by combining File_name and TRACK_ID.
A derivative of merged_spots_df, generated after the filtering and smoothing stages, serves as the primary data source for computing track metrics in the notebook.
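As a quick orientation before writing any custom code, you can inspect these DataFrames from a new notebook cell. The snippet below is a minimal sketch that assumes merged_tracks_df and merged_spots_df have already been loaded in the notebook.

# Minimal sketch: inspect the standardized DataFrames in a new cell
# (assumes merged_tracks_df and merged_spots_df already exist in the notebook).
print(merged_tracks_df.columns.tolist())
print(merged_spots_df.columns.tolist())

# Every spot can be matched to its parent track through Unique_ID
example_id = merged_tracks_df['Unique_ID'].iloc[0]
one_track = merged_spots_df[merged_spots_df['Unique_ID'] == example_id]
print(one_track[['POSITION_T', 'POSITION_X', 'POSITION_Y']].head())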
CellTracksColab's data loaders transform raw data into the standardized merged_tracks_df and merged_spots_df. If your data diverges from the default format, crafting a custom data loader becomes necessary.
- Identify Data Source: Point to where your tracking data resides.
- Metadata Extraction: Design a strategy to pull key metadata from your data source.
- Data Loading & Transformation: Read and mold your data into the desired format.
- Include Essential Columns: Ensure the presence of Unique_ID, Repeat, Condition, and File_name in both DataFrames.
- DataFrame Validation: Ensure that your transformed data aligns with CellTracksColab's requirements.
- Save Processed Data: Store the processed data for subsequent analyses.
The example below illustrates these steps for CSV files whose names encode the condition and repeat number, with separate files for tracks and spots.
import glob
import os

import pandas as pd


def populate_columns(df, filename):
    """
    Extract metadata from the filename and populate the metadata columns.
    Adjust the parsing below to match your own naming convention.
    """
    # Assumes filenames of the form "Condition_Repeat_... .csv"
    condition = filename.split('_')[0]
    repeat = filename.split('_')[1]
    df['Condition'] = condition
    df['Repeat'] = repeat
    df['File_name'] = filename
    # Unique_ID makes TRACK_ID unique across all loaded files
    df['Unique_ID'] = filename + "_" + df['TRACK_ID'].astype(str)
    return df


def load_and_populate(folder_path):
    """Load all track and spot CSV files from folder_path and concatenate them."""
    track_dfs, spot_dfs = [], []
    for filepath in glob.glob(f"{folder_path}/*.csv"):
        filename = os.path.basename(filepath)
        if "track" in filename:
            track_df = pd.read_csv(filepath)
            track_dfs.append(populate_columns(track_df, filename))
        elif "spot" in filename:
            spot_df = pd.read_csv(filepath)
            spot_dfs.append(populate_columns(spot_df, filename))
    return pd.concat(track_dfs, ignore_index=True), pd.concat(spot_dfs, ignore_index=True)
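The loader above covers metadata extraction and loading; the last two steps (validation and saving) are not shown in it. Below is a minimal sketch of how they could look. The required-column list matches the essential columns above, but the input path and output filenames are illustrative assumptions rather than fixed CellTracksColab conventions.

# Minimal sketch of the validation and saving steps (illustrative paths/filenames)
REQUIRED_COLUMNS = ['Unique_ID', 'Repeat', 'Condition', 'File_name']

def validate_dataframe(df, name):
    """Raise an error if an essential column is missing from the DataFrame."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"{name} is missing required columns: {missing}")

merged_tracks_df, merged_spots_df = load_and_populate("path/to/your/data")
validate_dataframe(merged_tracks_df, "merged_tracks_df")
validate_dataframe(merged_spots_df, "merged_spots_df")

# Save the processed DataFrames so later notebook sections can reload them
merged_tracks_df.to_csv("merged_tracks_df.csv", index=False)
merged_spots_df.to_csv("merged_spots_df.csv", index=False)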
To bring in a new track metric, follow the steps below:
- Metric Selection: Decide on the metric to compute.
- Metric Calculation: Create a function to compute the metric for each track.
- Apply Function to Data: Use groupby and apply to compute the metric for each track based on its spots. Do not forget to sort your data according to time before applying your function.
- Merge with Track Data: Integrate the computed metric with the main track data.
The forward migration index (FMI) quantifies cell migration directionality. It is the ratio of the total forward displacement to the cell's total path length. Below we compute the FMI in 2D and assume that the positive X-direction represents the "forward" direction of the cell's movement.
import numpy as np

def calculate_fmi(group):
    """Compute the 2D forward migration index (FMI) for one track."""
    group = group.sort_values('POSITION_T')
    # Step-wise displacements between consecutive spots
    deltas = np.sqrt(group['POSITION_X'].diff().fillna(0)**2 + group['POSITION_Y'].diff().fillna(0)**2)
    total_path_length = deltas.sum()
    # Net displacement along the "forward" (positive X) direction
    total_forward_displacement = group['POSITION_X'].diff().fillna(0).sum()
    return pd.Series({'FMI': total_forward_displacement / total_path_length if total_path_length != 0 else 0})

# Sort by track and time, then compute the FMI for each track
merged_spots_df.sort_values(by=['Unique_ID', 'POSITION_T'], inplace=True)
df_fmi = merged_spots_df.groupby('Unique_ID').apply(calculate_fmi).reset_index()

# Drop any pre-existing FMI column before merging the new values into the track table
overlapping_columns = merged_tracks_df.columns.intersection(df_fmi.columns).drop('Unique_ID')
merged_tracks_df.drop(columns=overlapping_columns, inplace=True)
merged_tracks_df = pd.merge(merged_tracks_df, df_fmi, on='Unique_ID', how='left')
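Once the metric has been merged into merged_tracks_df, it can be summarized or plotted like any other track metric. The snippet below is a minimal sketch of such a summary; it is illustrative and not part of the notebook's built-in plotting cells.

# Minimal sketch: average FMI per condition and repeat (illustrative only)
fmi_summary = merged_tracks_df.groupby(['Condition', 'Repeat'])['FMI'].mean().reset_index()
print(fmi_summary)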