This is a generic step-by-step tutorial for getting started with SimBA to create behavioral classifiers. For more information about the menus, the options, and their use cases, see the SimBA Scenario tutorials.
For processing datasets, the pipeline is split into a few sections. These sections are listed below along with their corresponding functions:
Part 1: Create a new project
- Generate project config (create new classifiers or import existing classifiers)
- Import videos into project folder
- Import DLC Tracking Data (if you have any)
- Extract Frames into project folder
Part 2: Load project
- Load the project.ini
- Set video parameters
- Outlier correction
- Extract Features
- Label Behavior
- Train Machine Model
- Run Machine Model
- Analyze Machine Results
- Visualization
- Plot Graphs
- Merge Frames
- Create Video
This section describes how to create a new project for your tracking analysis.
In this step you create your main project folder with all the required sub-directories.
- In the main SimBA window, click on `File` and `Create a new project`. The following window will pop up.
- Navigate to the `[ Generate project config ]` tab. Under General Settings, specify a `Project Path`, which is the directory that will contain your main project folder.
- `Project Name` is the name of your project. Keep in mind that the project name cannot contain spaces; use an underscore "_" instead.
- Under `SML Settings`, enter the number of predictive classifiers that you wish to create. For example, if you have three behaviors in your videos, put 3 in the entry box.
- Click and it creates a row as shown in the following image. In each entry box, fill in the name of the behavior that you want to classify.
- `Type of Tracking` allows the user to choose multi-animal tracking or the classic tracking.
- `Animal Settings` is the number of animals and body parts that the pose estimation tracking data contains. The default for SimBA is 2 animals and 16 body parts (`2 animals, 16bp`). There are a few other, **not yet validated**, options accessible in the dropdown menu.
- Click on `Generate Project Config` to generate your project. The project folder will be located in the specified `Project Path`.
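Because the project name cannot contain spaces, it can help to normalize a name before typing it into the `Project Name` box. The helper below is a minimal sketch of that rule only; `sanitize_project_name` is a hypothetical name and is not part of SimBA.

```python
import re


def sanitize_project_name(name: str) -> str:
    """Replace each run of whitespace with one underscore (hypothetical helper)."""
    return re.sub(r"\s+", "_", name.strip())


if __name__ == "__main__":
    print(sanitize_project_name("my first project"))
```

The result can then be entered as the project name in the GUI.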
In this step, you can choose to import either one or multiple videos. The imported videos are used for visualizing predictions and standardizing distances across videos by calculating metric distances from pixel distances.
- Navigate to the `[ Import videos into project folder ]` tab.
- Under the `Import multiple videos` heading, click on `Browse Folder` to select a folder that contains all the videos that you wish to import into your project.
- Enter the file type of your videos (e.g., mp4, avi, mov) in the `Video type` entry box.
- Click on `Import multiple videos`.
Note: If you have a lot of videos, it might take a few minutes before all the videos are imported.
- Under the `Import single video` heading, click on `Browse File` to select your video.
- Click on `Import a video`.
In this step, you will import your pose-estimation tracking data.
- Navigate to the `[ Import tracking data ]` tab. Under `Import tracking data`, click on the `File type` drop down menu.
- From the drop down menu, select `CSV (DLC/DeepPoseKit)` for .csv files or `JSON (BENTO)` for .json files.
- To import multiple files, choose the folder that contains the files by clicking `Browse Folder`, then click `Import csv to project folder`.
- To import a single file, choose the file by clicking `Browse File`, then click `Import single csv to project folder`.
Please note that you can only import the h5 tracking data after you have imported the videos into the project folder.
- From the `File type` drop down menu, select `H5 (multi-animal DLC)`.
- Under `Animal settings`, enter the number of animals in the videos in the `No of animals` entry box, and click `Confirm`.
- Enter a name for each of the animals in the videos.
- `Tracking type` is the type of tracking from DeepLabCut multi-animal tracking.
- Select the folder that contains all the h5 files by clicking `Browse Folder`.
- Click `Import h5` to start importing.
- From the `File type` drop down menu, select `SLP (SLEAP)`.
- Under `Animal settings`, enter the number of animals in the videos in the `No of animals` entry box, and click `Confirm`.
- Enter a name for each of the animals in the videos.
- Select the folder that contains all the slp files by clicking `Browse Folder`.
- Click on `Import .slp`.
This step will extract all the frames from every video that was imported into the project in Step 2. The frames are used for behavioral labeling when creating classifiers, and for visualizing the classification results. Once the step is completed, close the `Project Configuration` window.
Note: Splitting videos up into frames can take time if you have a lot of high resolution videos at high frame rates. The frames can also take up a lot of hard drive space.
This section describes how to load and work with created projects.
In this step you will load the project_config.ini file that was created.
Note: A project_config.ini should always be loaded before any other process.
- In the main SimBA window, click on `File` and `Load project`. The following window will pop up.
- Click on `Browse File`. Then, go to the directory in which you created your project and click on your project folder. Locate the project_config.ini file and select it. Once this step is completed, it should look like the following, and you should no longer see the text No file selected.
In this image, you can see that `Desktop` is my selected working directory, `tutorial` is my project name, and the last two sections of the folder path are always going to be `project_folder/project_config.ini`.
- Click on `Load Project`.
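The path layout described above (working directory, then project name, then `project_folder/project_config.ini`) can be sketched as follows. The function name `project_config_path` and the example directory are assumptions for illustration; `PurePosixPath` is used only so the example prints the same on every operating system.

```python
from pathlib import PurePosixPath


def project_config_path(project_path: str, project_name: str) -> PurePosixPath:
    """Build the expected location of project_config.ini (hypothetical helper).

    Follows the layout described in the text:
    <project path>/<project name>/project_folder/project_config.ini
    """
    return (
        PurePosixPath(project_path)
        / project_name
        / "project_folder"
        / "project_config.ini"
    )


if __name__ == "__main__":
    print(project_config_path("/home/user/Desktop", "tutorial"))
```

This is the file to select with `Browse File` before clicking `Load Project`.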
In this step, you can choose to import more pose estimation data in csv file format and/or more videos. If this isn't relevant then you can skip this step.
- Click on the `[ Further imports (data/video/frames) ]` tab. From here you can import more data or videos into the project folder. The imported .csv files will be placed in the `project_folder/csv/input` directory, and the imported videos will be placed in the `project_folder/videos` directory.
- Once the videos are imported, you can extract frames from the additional videos by clicking on `Extract frames` under the Extract further frames into project folder heading.
- If you already have existing frames of the videos in the project folder, you can import the folder that contains the frames into the project. Under the Import frame folders heading, click on `Browse Folder` to choose the folder that contains the frames, and click on `Import frames`. The frames will be imported into the `project_folder/frames/input` folder.
In this step, you can customize the meta parameters for each of your videos (fps, resolution, metric distances) and provide additional custom video information (Animal ID, group, etc.). You also set the pixels per millimeter for your videos. You will be using a tool that requires the known distance between two points (e.g., the cage width or the cage height) in order to calculate pixels per millimeter. The real-life distance between the two points is called `Distance in mm`.
- Under Set video parameters(distances,resolution,etc.), the entry box named `Distance in mm` is the known distance between two points in the videos, in millimeters. If the known distance is the same in all the videos in the project, enter the value (e.g., 245) and click on `Auto populate Distance in mm in tables`. This will auto-populate the table in the next step (see below). If you leave the `Distance in mm` entry box empty, the known distance will default to zero and you will fill in the value for each video individually.
- Click on `Set Video Parameters` and the following window will pop up.
- In the above example I imported four videos and their names are listed in the leftmost `Video` column. I auto-populated the known distance to 10 millimeters in the previous step, and this is now displayed in the `Distance in mm` column.
- I can click on the values in the entry boxes and change them until I am satisfied. Then, I click on `Update distance_in_mm` and this will update the whole table.
- Next, to get the `Pixels/mm` for the first video, click on `Video1` and the following window will pop up. The window that pops up displays the first frame of `Video1`.
- Now, double left click to select two points that define the known distance in real life. In this case, I know that the two pink connected dots represent a distance of 10 millimeters in real life.
- If you misplaced one or both of the dots, you can double click on either of the dots to place them somewhere else in the image. Once you are done, hit `Esc`.
- If every step is done correctly, the `Pixels/mm` column in the table should populate with the number of pixels that represent one millimeter.
- Repeat the steps for every video in the table, and once done, click on `Save Data`. This will generate a csv file named video_info.csv in the `/project_folder/log` folder that contains a table with your video meta data.
- You can also choose to add further columns to the meta data file (e.g., AnimalID or experimental group) by clicking on the `Add Column` button. This information will be saved in additional columns in your video_info.csv file.
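The `Pixels/mm` value described above is the pixel distance between the two selected points divided by the known `Distance in mm`. The following is a minimal sketch of that arithmetic, not SimBA's own code; the function name and the example coordinates are assumptions chosen for illustration.

```python
import math


def pixels_per_mm(point_a, point_b, distance_mm):
    """Pixels per millimeter from two (x, y) pixel points and their known distance."""
    if distance_mm <= 0:
        raise ValueError("The known distance in mm must be greater than zero.")
    pixel_distance = math.hypot(point_b[0] - point_a[0], point_b[1] - point_a[1])
    return pixel_distance / distance_mm


if __name__ == "__main__":
    # Two clicked points 500 pixels apart that represent 10 mm in real life.
    print(pixels_per_mm((100, 200), (400, 600), 10))
```

A known distance of zero (the default when the entry box is left empty) cannot be used, which is why the sketch rejects it.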
Outlier correction is used to correct gross tracking inaccuracies by detecting outliers based on movements and locations of body parts in relation to the animal body length. For more details, please click here
- Click on `Settings` and the following window will pop up. The Outlier Settings window varies with the number of animals in the project. The images below show settings for two animals.
- Select the body parts for Animal 1 and Animal 2 that you want to use to calculate a reference value. The reference value will be the mean or median Euclidean distance in millimeters between the two body parts of the two animals in all frames.
- Enter values for the `Movement criterion` and the `Location criterion`.
  - `Movement criterion`. A body part coordinate will be flagged and corrected as a "movement outlier" if the body part moves further than the reference value × the criterion value across two sequential frames.
  - `Location criterion`. A body part coordinate will be flagged and corrected as a "location outlier" if the distances between the body part and at least two other body parts belonging to the same animal are longer than the reference value × the criterion value within a frame.
Body parts flagged as movement or location outliers will be placed back at their last reliable coordinate.
- Choose whether to calculate the median or mean Euclidean distance in millimeters between the two body parts and click on `Confirm Config`.
- Click to run the outlier correction. You can follow the progress in the main SimBA window. Once complete, two new csv log files will appear in the `/project_folder/log` folder. These two files contain the number of body parts corrected following the two outlier correction methods for each video in the project.
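The movement criterion can be illustrated with a small sketch. This is not SimBA's implementation: the function name is hypothetical, it handles a single body part, and it assumes that each frame is compared against the last reliable coordinate rather than against the raw previous frame.

```python
import math


def correct_movement_outliers(coords, reference_value, criterion):
    """Flag and correct movement outliers for one body part (illustrative sketch).

    coords: list of (x, y) positions, one per frame.
    A frame is flagged when the body part moved further than
    reference_value * criterion since the last reliable coordinate.
    Returns (corrected_coords, flagged_frame_indices).
    """
    if not coords:
        return [], []
    threshold = reference_value * criterion
    corrected = [coords[0]]
    flagged = []
    for frame, (x, y) in enumerate(coords[1:], start=1):
        last_x, last_y = corrected[-1]
        if math.hypot(x - last_x, y - last_y) > threshold:
            flagged.append(frame)
            corrected.append((last_x, last_y))  # keep last reliable coordinate
        else:
            corrected.append((x, y))
    return corrected, flagged


if __name__ == "__main__":
    track = [(0, 0), (1, 0), (50, 0), (2, 0)]
    print(correct_movement_outliers(track, reference_value=10, criterion=0.7))
```

With a reference value of 10 and a criterion of 0.7, any jump longer than 7 units is treated as an outlier, so the third frame in the example is replaced by the second.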
Based on the coordinates of body parts in each frame - and the frame rate and the pixels per millimeter values - the feature extraction step calculates a larger set of features used for behavioral classification. Features are values such as metric distances between body parts, angles, areas, movement, paths, and their deviations and rank in individual frames and across rolling windows. This set of features will depend on the body-parts tracked during pose-estimation (which is defined when creating the project). Click here for an example list of features when tracking 2 mice and 16 body parts.
- Click on `Extract Features`.
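To make the idea of a feature concrete, the sketch below computes one metric distance between two body parts per frame, and its mean across a rolling window. It only illustrates the kind of value described above and is not SimBA's feature extraction code; the function names and coordinates are assumptions.

```python
import math


def distance_mm(point_a, point_b, px_per_mm):
    """Euclidean distance between two (x, y) pixel points, converted to millimeters."""
    return math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1]) / px_per_mm


def rolling_mean(values, window):
    """Mean over the current frame and up to window - 1 preceding frames."""
    means = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    nose_1 = [(0, 0), (0, 0), (0, 0)]          # animal 1 nose, three frames
    nose_2 = [(30, 40), (60, 80), (90, 120)]   # animal 2 nose, three frames
    distances = [distance_mm(a, b, px_per_mm=5) for a, b in zip(nose_1, nose_2)]
    print(distances)
    print(rolling_mean(distances, window=2))
```

The pixels per millimeter value comes from the video parameters step, which is why that step has to be completed before features are extracted.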
This step is used for labeling the behaviors in each frame of a video. This data will be concatenated with the features and used for creating behavioral classifiers.
There are two options: start a new video annotation, or continue where you last left off. Both are essentially the same, except that the latter starts at the frame where you last saved. For example, suppose you started annotating a new video by clicking `Select folder with frames (create new video annotation)` and want to stop partway through. You can click the `Generate/Save` button to save your work, and a coworker can later continue by clicking `Select folder with frames (continue existing video annotation)` and selecting the folder of the video that you annotated halfway.
- Click on `Select folder with frames`. In your project folder, navigate to the `/project_folder/frames/input/` folder, and you should see folders that are named after your videos. These folders contain the extracted video frames. Select one of the folders and the following window should pop up.
- Please click here to learn how to use the behavior annotation interface.
- Once finished, click on `Generate/Save` and it will generate a new .csv file in the /csv/targets_inserted folder.
This step is used for training new machine models for behavioral classifications.
Note: If you import existing models, you can skip this step and go straight to Step 8 to run machine models on new video data.
- Click on `Settings` and the following window will pop up.
Note: If you have a .csv file containing hyper-parameter meta data, you can import this file by clicking on `Browse File` and then clicking on `Load`. This will autofill all the hyper-parameter entry boxes and model evaluation settings.
- Under Machine Model, choose a machine model from the drop down menu: `RF`, `GBC`, `Xboost`.
  - `RF`: Random forest
  - `GBC`: Gradient boost classifier
  - `Xgboost`: eXtreme Gradient boost
- Under the Model heading, use the dropdown menu to select the behavioral classifier you wish to define the hyper-parameters for.
- Under Hyperparameters, select the hyper-parameter settings for your model. For more details, please click here. Alternatively, import the recommended settings from a meta data file (see above, Step 1).
  - `RF N estimators`: Number of decision trees in the decision ensemble.
  - `RF Max features`: Number of features to consider when looking for the best split.
  - `RF Criterion`: The metric used to measure the quality of each split, i.e., "gini" or "entropy".
  - `Train Test Size`: The ratio of the dataset withheld for testing the model (e.g., 0.20).
  - `RF Min sample leaf`: The minimum number of samples required to be at a leaf node.
  - `Under sample setting`: "Random undersample" or "None". If "Random undersample", a random sample of the majority class will be used in the train set. The size of this sample will be taken as a ratio of the minority class and should be specified in the "under sample ratio" box below. For more information, click here.
  - `Under sample ratio`: The ratio of samples of the majority class to the minority class in the training data set. Applied only if "Under sample setting" is set to "Random undersample". Ignored if "Under sample setting" is set to "None" or NaN.
  - `Over sample setting`: "SMOTE", "SMOTEEN" or "None". If "SMOTE" or "SMOTEEN", synthetic data will be generated in the minority class based on k-nearest neighbors to balance the two classes. For more details, click here. Alternatively, import recommended settings from a meta data file (see Step 1).
  - `Over sample ratio`: The desired ratio of the number of samples in the minority class over the number of samples in the majority class after over sampling.
- Under Model Evaluation Settings:
  - `Generate RF model meta data file`: Generates a .csv file listing the hyper-parameter settings used when creating the model. The generated meta file can be used to create further models by importing it in the Load Settings menu (see above, Step 1).
  - `Generate Example Decision Tree`: Saves a visualization of a random decision tree in .pdf and .dot formats. Requires graphviz. For more information, click here.
  - `Generate Classification Report`: Saves a classification report truth table in .png format. Depends on yellowbrick. For more information, click here.
  - `Generate Features Importance Log`: Creates a .csv file that lists the importances (gini importances) of all features for the classifier.
  - `Generate Features Importance Bar Graph`: Creates a bar chart of the top N features based on gini importances. Specify N in the `N feature importance bars` entry box below.
  - `N feature importance bars`: Integer defining the number of top features to be included in the bar graph (e.g., 15).
  - `Compute Feature Permutation Importance's`: Creates a .csv file listing the permutation importances of all features for the classifier. For more details, please click here. Note: Calculating permutation importances is computationally expensive and takes a long time.
  - `Generate Sklearn Learning Curve`: Creates a .csv file listing the f1 score at different test data sizes. For more details, please click here. This is useful for estimating the benefit of annotating further data.
  - `LearningCurve shuffle K splits`: Number of cross validations applied at each test data size in the learning curve.
  - `LearningCurve shuffle Data splits`: Number of test data sizes in the learning curve.
  - `Generate Precision Recall Curves`: Creates a .csv file listing precision at different recall values. This is useful for titration of the false positive vs. false negative classifications of the models.
- Click on the `Save settings into global environment` button to save your settings into the project_config.ini file and use the settings to train a single model.
- Alternatively, click on the `Save settings for specific model` button to save the settings for one model. To generate multiple models, for multiple different behaviors and/or using multiple different hyper-parameters, re-define the Machine model settings and click on `Save settings for specific model` again. Each time `Save settings for specific model` is clicked, a new config file is generated in the /project_folder/configs folder. In the next step (see below), a model for each config file will be created when pressing the Train multiple models, one for each saved settings button.
- If training a single model, click on `Train Model`.
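The effect of the `Under sample ratio` setting described above can be sketched with the standard library alone. This is an illustration and not SimBA's training code; it assumes that frames labeled 1 (behavior present) form the minority class, and the function name is hypothetical.

```python
import random


def random_undersample(labels, ratio, seed=0):
    """Return the frame indices kept after random undersampling (sketch).

    Keeps every minority-class frame (label 1) and a random sample of
    majority-class frames (label 0) of size ratio * number of minority frames.
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    minority = [i for i, label in enumerate(labels) if label == 1]
    majority = [i for i, label in enumerate(labels) if label == 0]
    n_keep = min(len(majority), int(round(ratio * len(minority))))
    return sorted(minority + rng.sample(majority, n_keep))


if __name__ == "__main__":
    annotations = [1] * 5 + [0] * 95   # 5 behavior frames, 95 non-behavior frames
    kept = random_undersample(annotations, ratio=2)
    print(len(kept))
```

With a ratio of 2, the 5 behavior frames are paired with 10 randomly chosen non-behavior frames, so 15 frames remain in the training set.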
- Click on `Settings`.
- Under Machine Model, choose the machine model from the drop down menu: `RF`, `GBC`, `Xboost`.
- Under Model, select the model you wish to train from the drop down menu.
- Then, set the Hyperparameters.
- Click the `Save settings for specific model` button. This generates a meta.csv file, located in your `project_folder/configs` directory, which contains your selected hyperparameters. Repeat the steps to generate multiple models. One model will be generated for each of the meta.csv files in the `project_folder/configs` directory.
- Close the `Machine models settings` window.
- Click on the green `Train Multiple Models, one for each saved settings` button.
The user can validate each model file (saved in .sav format). In this validation step the user specifies the path to a previously created model in .sav file format, and a .csv file containing the features extracted from a video. This process will (i) run the classifications on the video, and (ii) create a video with the predictions overlaid together with a gantt plot showing predicted behavioral bouts. Click here for an example validation video.
- Click `Browse File`, select the project_config.ini file, and click `Load Project`.
- Under the [Run machine model] tab --> validate Model on Single Video, select your features file (.csv). It should be located in `project_folder/csv/features_extracted`.
- Under `Select model file`, click on `Browse File` to select a model (.sav file).
- Click on `Run Model`.
- Once it is completed, it should print "Predictions generated.". Now you can click on `Generate plot`. A graph window and a frame window will pop up.
  - `Graph window`: Model prediction probability is plotted against frame number. The graph is interactive; click on the graph and the frame window will display the selected frame.
  - `Frame window`: Frames of the chosen video with controls.
- Click on a point on the graph and the picture displayed in the other window will jump to the corresponding frame. A red line shows the point that you have clicked.
- Once it jumps to the desired frame, you can navigate through the frames to determine if the behavior is present. This step is used to find the optimal threshold to validate your model.
- Once the threshold is determined, enter the threshold into the `Discrimination threshold` entry box and the desired minimum behavior bout length into the `Minimum behavior bout length (ms)` entry box.
  - `Discrimination threshold`: The level of probability required to define that the frame belongs to the target class. Accepts a float value between 0.0 and 1.0. For example, if set to 0.50, then all frames with a probability of containing the behavior of 0.5 or above will be classified as containing the behavior. For more information on classification threshold, click here.
  - `Minimum behavior bout length (ms)`: The minimum length of a classified behavioral bout. Example: The random forest makes the following attack predictions for 9 consecutive frames in a 50 fps video: 1,1,1,1,0,1,1,1,1. Without a minimum bout length, this would mean that the animals fought for 80ms (4 frames), took a break for 20ms (1 frame), then fought again for another 80ms (4 frames). You may want to classify this as a single 180ms attack bout rather than two separate 80ms attack bouts. With this setting you can do this. If the minimum behavior bout length is set to 20, any interruption in the behavior that is 20ms or shorter will be removed and the behavioral sequence above will be re-classified as 1,1,1,1,1,1,1,1,1, that is, as a single 180ms attack bout.
- Click `Validate` to validate your model. Note that this step will take a long time as it will generate a lot of frames.
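The two settings above can be sketched as two small functions: one applies the discrimination threshold to per-frame probabilities, the other removes short interruptions inside a bout. This follows the description in the text only and is not SimBA's implementation; the function names are hypothetical.

```python
def apply_threshold(probabilities, threshold):
    """Label a frame 1 when its probability is at or above the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def fill_short_gaps(predictions, fps, min_bout_ms):
    """Fill interruptions of min_bout_ms or shorter that lie between two bouts."""
    result = list(predictions)
    n = len(result)
    i = 0
    while i < n:
        if result[i] != 0:
            i += 1
            continue
        j = i
        while j < n and result[j] == 0:
            j += 1
        # Frames i..j-1 form a gap; fill it only if behavior occurs on both sides.
        bounded = i > 0 and j < n
        # Integer comparison of gap duration (ms) against the limit avoids float error.
        if bounded and (j - i) * 1000 <= min_bout_ms * fps:
            for k in range(i, j):
                result[k] = 1
        i = j
    return result


if __name__ == "__main__":
    frames = [1, 1, 1, 1, 0, 1, 1, 1, 1]   # the 50 fps example from the text
    print(fill_short_gaps(frames, fps=50, min_bout_ms=20))
```

At 50 fps each frame lasts 20ms, so the single interrupted frame in the example is filled when the minimum bout length is 20 and left untouched when it is 10.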
This step runs behavioral classifiers on new data.
- Under the Run Machine Model heading, click on `Model Selection`. The following window with the classifier names defined in the project_config.ini file will pop up.
- Click on `Browse File` and select the model (.sav) file associated with each of the classifier names.
- Once all the models have been chosen, click on `Set Model` to save the paths.
- Fill in the `Discrimination threshold`.
  - `Discrimination threshold`: The level of probability required to define that the frame belongs to the target class (see above).
- Fill in the `Minimum behavior bout length`.
  - `Minimum behavior bout length (ms)`: The minimum length of a classified behavioral bout (see above).
- Click on `Set model(s)` and then click on `Run RF Model` to run the machine model on the new data.
Access this menu through the `Load project` menu and the `Run machine model` tab. This step performs summary analyses and presents descriptive statistics in .csv file format. There are three forms of summary analyses: `Analyze`, `Analyze distance/velocity`, and `Analyze severity`.
- `Analyze`: This button generates descriptive statistics for each predictive classifier in the project, including the total time, the number of frames, total number of ‘bouts’, mean and median bout interval, time to first occurrence, and mean and median interval between each bout. A date-time stamped output csv file with the data is saved in the `/project_folder/log` folder.
- `Analyze distance/velocity`: This button generates descriptive statistics for mean and median movements and distances between animals. The date-time stamped output csv file with the data is saved in the `/project_folder/log` folder.
- `Analyze severity`: Calculates the ‘severity’ of each frame classified as containing attack behavior based on a user-defined scale. Example: the user sets a 10-point scale. One frame is predicted to contain an attack, and the total body-part movement of both animals in that frame is in the top 10% of movements in the entire video. In this frame, the attack will be scored as a 10 on the 10-point scale. A date-time stamped output .csv file containing the 'severity' data is saved in the `/project_folder/log` folder.
- `Severity scale 0 -`:
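Some of the descriptive statistics listed under `Analyze` can be illustrated with a short sketch that works on per-frame predictions (1 = behavior, 0 = no behavior). It is not SimBA's analysis code; the function name and the dictionary keys are assumptions, and only a subset of the statistics is computed.

```python
import statistics


def bout_statistics(predictions, fps):
    """Summarize behavioral bouts in a list of per-frame 0/1 predictions (sketch)."""
    bouts = []      # (start_frame, end_frame_exclusive) for each bout
    start = None
    for frame, value in enumerate(predictions):
        if value == 1 and start is None:
            start = frame
        elif value != 1 and start is not None:
            bouts.append((start, frame))
            start = None
    if start is not None:
        bouts.append((start, len(predictions)))

    durations = [(end - begin) / fps for begin, end in bouts]
    n_frames = sum(end - begin for begin, end in bouts)
    return {
        "frames": n_frames,
        "total_time_s": n_frames / fps,
        "bout_count": len(bouts),
        "mean_bout_s": statistics.mean(durations) if durations else 0.0,
        "median_bout_s": statistics.median(durations) if durations else 0.0,
        "time_to_first_s": bouts[0][0] / fps if bouts else None,
    }


if __name__ == "__main__":
    print(bout_statistics([0, 0, 1, 1, 1, 1, 0, 1, 1, 0], fps=10))
```

In the example there are two bouts covering six frames in total, and the first bout starts at the third frame.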
These steps generate visualizations of features and machine learning classification results. This includes images and videos of the animals with prediction overlays, gantt plots, line plots, paths plots and data plots. In this step the different frames can also be merged into video mp4 format.
- Under the Sklearn visualization heading, click on `Visualize classification results`. This step grabs the frames of the videos in the project, draws circles at the location of the tracked body parts and the convex hull of the animal, and prints the behavioral predictions on top of the frame. For an example, click here.
The user can also create a range of plots: gantt plot, Data plot, Path plot, Distance plot, and Heatmap.
Gantt plot generates gantt plots that display the length and frequencies of behavioral bouts for all the videos in the project.
- Under the Gantt plot heading, click on `Generate Gantt plot` and gantt plot frames will be generated in the `project_folder/frames/output/gantt_plots` folder.
Generates 'live' data plot frames for all of the videos in the project that display current distances and velocities.
- Under the Data plot heading, click on `Generate Data plot` and data plot frames will be generated in the `project_folder/frames/output/live_data_table` folder.
Generates path plots displaying the current location of the animal trajectories, and location and severity of attack behavior, for all of the videos in the project.
- Under the Path plot heading, fill in the following user defined values.
  - `Max Lines`: Integer specifying the max number of lines depicting the path of the animals. For example, if 100, the most recent 100 movements of animal 1 and animal 2 will be plotted as lines.
  - `Severity Scale`: Integer specifying the scale on which to classify 'severity'. For example, if set to 10, all frames containing attack behavior will be classified from 1 to 10 (see above).
  - `Bodyparts`: String to specify the bodyparts tracked in the path plot. For example, if Nose_1 and Centroid_2, the nose of animal 1 and the centroid of animal 2 will be represented in the path plot.
  - `plot_severity`: Tick this box to include color-coded circles on the path plot that signify the location and severity of attack interactions.
- Click on `Generate Path plot`, and path plot frames will be generated in the `project_folder/frames/output/path_plots` folder.
Generates distance line plots between two body parts for all of the videos in the project.
- Fill in the `Body part 1` and `Body part 2` entry boxes.
  - `Body part 1`: String that specifies the bodypart of animal 1, e.g., Nose_1.
  - `Body part 2`: String that specifies the bodypart of animal 2, e.g., Nose_2.
- Click on `Generate Distance plot`, and the distance plot frames will be generated in the `project_folder/frames/output/line_plot` folder.
Generates a heatmap of where the target behavior occurred in the video.
To generate heatmaps, SimBA needs several user-defined variables:
- `Bin size(mm)`: Pose-estimation coupled with supervised machine learning in SimBA gives information on the location of an event at single pixel resolution, which is too high a resolution to be useful in heatmap generation. In this entry box, insert an integer value (e.g., 100) that dictates, in pixels, how big a location is. For example, if the user inserts 100, and the video is filmed using 1000x1000 pixels, then SimBA will generate a heatmap based on 10x10 locations (each being 100x100 pixels large).
- `max` (integer, or auto): How many color increments should be generated on the heatmap. For example, if the user inputs 11, then an 11-point scale will be created (as in the gifs above). If the user inserts auto in this entry box, then SimBA will calculate the ideal number of increments automatically for each video.
- `Color Palette`: Which color palette to use to plot the heatmap. See the gifs above for different output examples.
- `Target`: Which target behavior to plot in the heatmap. As the number of behavioral target events increases in a specific location, the color representing that region changes.
- `Bodypart`: To determine the location of the event in the video, SimBA uses a single body-part coordinate. Specify which body-part to use here.
- `Save last image only`: Users can choose to generate a "heatmap video" for every video in the project. These videos contain one frame for every frame in the original video. Alternatively, users may want to generate a single image representing the final heatmap and all of the events in each video, with one png for every video in the project. To generate single images, tick this box. If you do not tick this box, videos will be generated (which is significantly more time-consuming).
- Click `Generate heatmap` to generate a heatmap of the target behavior. For more information on heatmaps based on behavioral events in SimBA, check the tutorial for scenario 2 - visualizing machine predictions.
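The bin size setting above determines how event locations are grouped before they are colored. The sketch below counts events per bin for the 1000x1000 pixel, bin size 100 example from the text; it is an illustration under those assumptions and not SimBA's heatmap code, and the function name is hypothetical.

```python
import math


def bin_events(event_coords, width, height, bin_size):
    """Count events per square bin; returns a grid indexed as grid[row][col]."""
    n_cols = math.ceil(width / bin_size)
    n_rows = math.ceil(height / bin_size)
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in event_coords:
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip coordinates that fall outside the video frame
        grid[int(y // bin_size)][int(x // bin_size)] += 1
    return grid


if __name__ == "__main__":
    # Body-part locations (in pixels) in frames classified as the target behavior.
    events = [(50, 50), (60, 70), (950, 999)]
    grid = bin_events(events, width=1000, height=1000, bin_size=100)
    print(len(grid), len(grid[0]))   # number of rows and columns
    print(grid[0][0], grid[9][9])    # counts in the first and last bin
```

Each cell of the grid would then be mapped to one of the color increments set in the `max` entry box.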
Merge all the generated plots from the previous step into single frames.
- Under Merge Frames, click `Merge Frames` and frames with all the generated plots will be combined and saved in the `project_folder/frames/output/merged` folder.
This step is to generate a video from the merged frames.
- Enter the `Bitrate` and the `File format`.
  - `Bitrate`: Bitrate is the number of bits per second. The symbol is bit/s. It generally determines the size and quality of video and audio files: the higher the bitrate, the better the quality and the larger the file size. If unsure, try setting bitrate to 2400.
  - `File format`: The format of the output video; it can be mp4, mov, flv, avi, etc.
Note: Please enter the file format without the ".".
- Click on `Create Video`.