Merge pull request #1 from MitchellAcoustics/refactoring
Refactoring
MitchellAcoustics authored Aug 7, 2024
2 parents f2311f9 + 2c57e68 commit defd1c1
Showing 28 changed files with 2,038 additions and 959 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -9,6 +9,9 @@ wheels/
# venv
.venv
example_inputs
.idea
*.log
.vscode

# IDE files
.idea/
@@ -19,3 +22,4 @@ src/.DS_Store
# in progress / trials
scripts/

/.vscode/settings.json
52 changes: 37 additions & 15 deletions README.md
@@ -86,33 +86,55 @@ visualization:
## Models
CitySeg currently supports OneFormer models. The verified models include:
CitySeg currently supports Mask2Former, MaskFormer, BEiT, and SegFormer models. The verified models include:
- `shi-labs/oneformer_ade20k_swin_large`
- `shi-labs/oneformer_cityscapes_swin_large`
- `shi-labs/oneformer_ade20k_dinat_large`
- `shi-labs/oneformer_cityscapes_dinat_large`
- "facebook/mask2former-swin-large-cityscapes-semantic"
- "facebook/mask2former-swin-large-mapillary-vistas-semantic"
- "facebook/maskformer-swin-small-ade" (sort of, this often leads to segfaults. Recommend using `disable_tqdm` in the config.)
- "microsoft/beit-large-finetuned-ade-640-640"
- "nvidia/segformer-b5-finetuned-cityscapes-1024-1024"
- "zoheb/mit-b5-finetuned-sidewalk-semantic"
- "nickmuchi/segformer-b4-finetuned-segments-sidewalk"

The Mask2Former checkpoints are by far the most stable (see the configuration sketch at the end of this section).

Some models appear to load correctly but consistently produce segfault errors on my machine:

- `facebook/maskformer-swin-large-ade`
- `nvidia/segformer-b5-finetuned-ade-640-640`
- `nvidia/segformer-b0-finetuned-cityscapes-1024-1024`
- `zoheb/mit-b5-finetuned-sidewalk-semantic` (use `model_type: segformer` in the config)

Confirmed not to work due to issues with the Hugging Face pipeline:

- "shi-labs/oneformer_ade20k_dinat_large"

**Note on `dinat` models:** The `dinat` backbone models require the `natten` package, which may have installation issues on some systems. These models are also significantly slower than the `swin` backbone models, especially when forced to run on CPU. However, they may produce better quality outputs in some cases.
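
To make the configuration concrete, here is a minimal sketch of selecting one of the verified checkpoints programmatically. It assumes `ModelConfig` can be imported from `cityseg.config`; the field names and auto-detection behaviour mirror the `ModelConfig` dataclass shown in the `src/cityseg/config.py` diff further down, so treat this as an illustration rather than documented API.

```python
from cityseg.config import ModelConfig  # import path assumed for illustration

# Hypothetical usage: configure one of the verified Mask2Former checkpoints.
model = ModelConfig(
    name="facebook/mask2former-swin-large-cityscapes-semantic",
    model_type=None,  # None triggers auto-detection from the model name
    device=None,      # None leaves device selection to the pipeline
    num_workers=8,
    pipe_batch=1,
)

# auto_detect_model_type splits the name on "/" and "-", yielding "mask2former" here.
print(model.model_type)
```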

## Project Structure

The project is organized into several Python modules:
The project is organized into several Python modules, each serving a specific purpose within the CitySeg pipeline:

- `main.py`: Entry point of the application, responsible for initializing and running the segmentation pipeline.
- `config.py`: Defines configuration classes and handles loading and validating configuration settings.
- `pipeline.py`: Implements the core segmentation pipeline, including model loading and inference.
- `processors.py`: Contains classes for processing images, videos, and directories, managing the segmentation workflow.
- `segmentation_analyzer.py`: Provides functionality for analyzing segmentation results, including computing statistics and generating reports.
- `video_file_iterator.py`: Implements an iterator for efficiently processing multiple video files in a directory.
- `visualization_handler.py`: Handles the visualization of segmentation results using color palettes.
- `file_handler.py`: Manages file operations related to saving and loading segmentation data and metadata.
- `utils.py`: Provides utility functions for various tasks, including data handling and logging.
- `palettes.py`: Defines color palettes for different datasets used in segmentation.
- `exceptions.py`: Custom exception classes for error handling throughout the pipeline.

- `main.py`: Entry point of the application
- `config.py`: Defines configuration classes for the pipeline
- `pipeline.py`: Implements the core segmentation pipeline
- `processors.py`: Contains classes for processing images, videos, and directories
- `utils.py`: Provides utility functions for analysis, file operations, and logging
- `palettes.py`: Defines color palettes for different datasets
- `exceptions.py`: Custom exception classes for error handling
This modular structure allows for easy maintenance and extension of the CitySeg pipeline, facilitating the addition of new features and models.
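
As a rough illustration of how these modules fit together, the sketch below mirrors the flow that `main.py` drives. `Config.from_yaml` and `create_processor` are taken from the package's public interface; the final `process()` call is a hypothetical method name used only to show the shape of the workflow.

```python
from pathlib import Path

from cityseg import Config, create_processor

# Load and validate the YAML configuration (Config.from_yaml is defined in config.py).
config = Config.from_yaml(Path("config.yaml"))

# create_processor presumably returns an image, video, or directory processor
# appropriate for the configured input type.
processor = create_processor(config)

# Hypothetical method name, shown for illustration only; in practice main.py
# is the supported entry point.
processor.process()
```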

## Logging

The pipeline uses the `loguru` library for flexible and configurable logging. You can set the log level and enable verbose output using command-line arguments:

```
python main.py --config path/to/your/config.yaml --log-level INFO --verbose
python main.py --config path/to/your/config.yaml --log-level INFO  # level can be DEBUG, INFO, WARNING, ERROR, or CRITICAL; add --verbose for more detail
```

Logs are output to both the console and a file (`segmentation.log`). The file log is in JSON format for easy parsing and analysis.
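
For reference, a loguru setup that writes human-readable console output plus a JSON-formatted file typically looks like the sketch below. This is an illustrative example of the pattern, not necessarily the exact body of `utils.setup_logging`; the function name and signature are assumptions.

```python
import sys

from loguru import logger


def setup_logging(log_level: str = "INFO", verbose: bool = False) -> None:
    """Illustrative sketch: console sink plus JSON file sink (assumed signature)."""
    logger.remove()  # drop loguru's default handler
    logger.add(sys.stderr, level="DEBUG" if verbose else log_level)
    # serialize=True writes each record as a JSON line, which is easy to parse later.
    logger.add("segmentation.log", level=log_level, serialize=True)
```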
@@ -136,4 +158,4 @@ CitySeg is released under the BSD 3-Clause License. See the `LICENSE` file for d

## Contact

For support or inquiries, please open an issue on the GitHub repository or contact [Your Name/Email].
For support or inquiries, please open an issue on the GitHub repository or contact Andrew Mitchell.
10 changes: 10 additions & 0 deletions docs/api/handlers.md
@@ -0,0 +1,10 @@
# Handlers

::: cityseg.file_handler


::: cityseg.visualization_handler
options:
show_root_heading: true
members: true
parameter_headings: true
13 changes: 10 additions & 3 deletions docs/api/processors.md
@@ -1,6 +1,13 @@
# Processors Module

::: cityseg.processors
options:
members: true
parameter_headings: true

::: cityseg.processing_plan.ProcessingPlan
options:
show_root_heading: true
show_root_full_path: false

::: cityseg.video_file_iterator.VideoFileIterator
options:
show_root_heading: true
show_root_full_path: false
1 change: 1 addition & 0 deletions docs/api/segments_analysis.md
@@ -0,0 +1 @@
::: cityseg.segmentation_analyzer
19 changes: 13 additions & 6 deletions docs/index.md
@@ -32,12 +32,19 @@ For more detailed information on how to use CitySeg, check out our [Getting Star

## Project Structure

CitySeg is organized into several Python modules:
The project is organized into several Python modules, each serving a specific purpose within the CitySeg pipeline:

- `main.py`: Entry point of the application, responsible for initializing and running the segmentation pipeline.
- `config.py`: Defines configuration classes and handles loading and validating configuration settings.
- `pipeline.py`: Implements the core segmentation pipeline, including model loading and inference.
- `processors.py`: Contains classes for processing images, videos, and directories, managing the segmentation workflow.
- `segmentation_analyzer.py`: Provides functionality for analyzing segmentation results, including computing statistics and generating reports.
- `video_file_iterator.py`: Implements an iterator for efficiently processing multiple video files in a directory.
- `visualization_handler.py`: Handles the visualization of segmentation results using color palettes.
- `file_handler.py`: Manages file operations related to saving and loading segmentation data and metadata.
- `utils.py`: Provides utility functions for various tasks, including data handling and logging.
- `palettes.py`: Defines color palettes for different datasets used in segmentation.
- `exceptions.py`: Custom exception classes for error handling throughout the pipeline.

- `config.py`: Configuration classes for the pipeline
- `pipeline.py`: Core segmentation pipeline implementation
- `processors.py`: Classes for processing images, videos, and directories
- `utils.py`: Utility functions for analysis, file operations, and logging
- `exceptions.py`: Custom exception classes for error handling

For detailed API documentation, visit our [API Reference](api/config.md) section.
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -42,6 +42,8 @@ nav:
- Config: api/config.md
- Pipeline: api/pipeline.md
- Processors: api/processors.md
- Segments Analysis: api/segments_analysis.md
- Handlers: api/handlers.md
- Utils: api/utils.md
- Palettes: api/palettes.md
- Exceptions: api/exceptions.md
@@ -104,6 +106,7 @@ plugins:
merge_init_into_class: true
ignore_init_summary: true
show_labels: false
parameter_headings: true

show_if_no_docstring: false
docstring_section_style: spacy
5 changes: 2 additions & 3 deletions pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "cityseg"
version = "0.2.11"
version = "0.3.0"
description = "A flexible and efficient semantic segmentation pipeline for processing images and videos"
authors = [
{ name = "Andrew Mitchell", email = "[email protected]" }
@@ -42,9 +42,8 @@ dev-dependencies = [
"jupyter>=1.0.0",
"pytest>=8.3.2",
"rich[jupyter]>=13.7.1",
"natten==0.17.1",
# "natten==0.17.1",
"matplotlib>=3.9.1",
"yappi>=1.6.0",
]

[tool.hatch.metadata]
4 changes: 0 additions & 4 deletions requirements-dev.lock
@@ -230,7 +230,6 @@ mpmath==1.3.0
# via sympy
mypy-extensions==1.0.0
# via black
natten==0.17.1
nbclient==0.10.0
# via nbconvert
nbconvert==7.16.4
@@ -280,7 +279,6 @@ packaging==24.1
# via jupyterlab-server
# via matplotlib
# via mkdocs
# via natten
# via nbconvert
# via pytest
# via qtconsole
@@ -422,7 +420,6 @@ tokenizers==0.19.1
# via transformers
torch==2.4.0
# via cityseg
# via natten
# via torchvision
torchvision==0.19.0
# via cityseg
@@ -479,4 +476,3 @@ websocket-client==1.8.0
# via jupyter-server
widgetsnbextension==4.0.11
# via ipywidgets
yappi==1.6.0
1 change: 1 addition & 0 deletions src/cityseg/SemanticSidewalk_id2label.json
@@ -0,0 +1 @@
{"0": "unlabeled", "1": "flat-road", "2": "flat-sidewalk", "3": "flat-crosswalk", "4": "flat-cyclinglane", "5": "flat-parkingdriveway", "6": "flat-railtrack", "7": "flat-curb", "8": "human-person", "9": "human-rider", "10": "vehicle-car", "11": "vehicle-truck", "12": "vehicle-bus", "13": "vehicle-tramtrain", "14": "vehicle-motorcycle", "15": "vehicle-bicycle", "16": "vehicle-caravan", "17": "vehicle-cartrailer", "18": "construction-building", "19": "construction-door", "20": "construction-wall", "21": "construction-fenceguardrail", "22": "construction-bridge", "23": "construction-tunnel", "24": "construction-stairs", "25": "object-pole", "26": "object-trafficsign", "27": "object-trafficlight", "28": "nature-vegetation", "29": "nature-terrain", "30": "sky", "31": "void-ground", "32": "void-dynamic", "33": "void-static", "34": "void-unclear"}
18 changes: 13 additions & 5 deletions src/cityseg/__init__.py
@@ -3,8 +3,7 @@
This package provides a flexible and efficient semantic segmentation pipeline
for processing images and videos. It supports multiple segmentation models
and datasets, with capabilities for tiling large images, mixed-precision
processing, and comprehensive result analysis.
and datasets.
Main components:
- Config: Configuration class for the pipeline
@@ -20,27 +19,36 @@
For detailed usage instructions, please refer to the package documentation.
"""

__version__ = "0.2.0"
__version__ = "0.2.12"

from . import palettes
from .config import Config
from .exceptions import ConfigurationError, InputError, ModelError, ProcessingError
from .file_handler import FileHandler
from .pipeline import SegmentationPipeline, create_segmentation_pipeline
from .processing_plan import ProcessingPlan
from .processors import DirectoryProcessor, SegmentationProcessor, create_processor
from .utils import analyze_segmentation_map, setup_logging
from .segmentation_analyzer import SegmentationAnalyzer
from .utils import setup_logging
from .video_file_iterator import VideoFileIterator
from .visualization_handler import VisualizationHandler

__all__ = [
"Config",
"SegmentationPipeline",
"create_segmentation_pipeline",
"SegmentationProcessor",
"SegmentationAnalyzer",
"DirectoryProcessor",
"create_processor",
"ConfigurationError",
"InputError",
"ModelError",
"ProcessingError",
"analyze_segmentation_map",
"setup_logging",
"palettes",
"FileHandler",
"VisualizationHandler",
"ProcessingPlan",
"VideoFileIterator",
]
46 changes: 41 additions & 5 deletions src/cityseg/config.py
@@ -13,6 +13,7 @@
from typing import Any, Dict, List, Optional, Union

import yaml
from loguru import logger


class InputType(Enum):
@@ -36,14 +37,45 @@ class ModelConfig:
"""

name: str
model_type: Optional[str] = (
None # Can be 'oneformer', 'mask2former', or None for auto-detection
)
model_type: Optional[str] = None
max_size: Optional[int] = None
device: Optional[str] = None
dataset: Optional[str] = None
num_workers: Optional[int] = 8
pipe_batch: Optional[int] = 1

# TODO: implement model_type auto-detection
# TODO: implement device auto-detection
def __post_init__(self):
"""
Post-initialization method to set up the model type if not provided.
"""
self.auto_detect_model_type()
if self.device == "mps" and (self.num_workers is None or self.num_workers > 0):
logger.warning(
"MPS is not compatible with multiple workers in pytorch. Setting num_workers to 0."
)
self.num_workers = 0

def auto_detect_model_type(self):
"""
Automatically detect the model type from the model name if not provided.
"""

def auto_model_type(model_name: str) -> str:
return model_name.split("/")[-1].split("-")[0]

if self.model_type is None:
try:
self.model_type = auto_model_type(self.name)
except IndexError:
logger.warning(
"Unable to auto-detect model type from the model name and none provided."
)
return
logger.info(f"Auto-detected model type: {self.model_type}")
elif self.model_type != auto_model_type(self.name):
logger.warning(
f"Model type does not match auto-detected model type. Using provided model type: {self.model_type}"
)


@dataclass
@@ -83,6 +115,7 @@ class Config:
visualization (VisualizationConfig): The visualization configuration.
input_type (InputType): The type of input (automatically determined).
force_reprocess (bool): Whether to force reprocessing of existing results.
disable_tqdm (bool): Whether to disable the progress bar display.
"""

input: Union[Path, str]
@@ -100,6 +133,7 @@
visualization: VisualizationConfig = field(default_factory=VisualizationConfig)
input_type: InputType = field(init=False)
force_reprocess: bool = False
disable_tqdm: bool = False

def __post_init__(self):
"""
@@ -232,6 +266,7 @@ def from_yaml(cls, config_path: Path) -> "Config":
analyze_results=config_dict.get("analyze_results", True),
visualization=vis_config,
force_reprocess=config_dict.get("force_reprocess", False),
disable_tqdm=config_dict.get("disable_tqdm", False),
)

def to_dict(self) -> Dict[str, Any]:
@@ -257,6 +292,7 @@ def to_dict(self) -> Dict[str, Any]:
"visualization": asdict(self.visualization),
"input_type": self.input_type.value,
"force_reprocess": self.force_reprocess,
"disable_tqdm": self.disable_tqdm,
}


14 changes: 9 additions & 5 deletions src/cityseg/config.yaml
@@ -6,21 +6,24 @@ ignore_files: null # Optional: list of file names to ignore

# Model configuration
model:
name: "facebook/mask2former-swin-large-mapillary-vistas-semantic"
model_type: "mask2former" # Optional: can be 'oneformer', 'mask2former', or null for auto-detection
name: "facebook/mask2former-swin-large-cityscapes-semantic"
model_type: null # Optional: can be 'beit', 'mask2former', or null for auto-detection
max_size: null # Optional: maximum size for input images/frames
device: "mps" # Options: "cuda", "cpu", "mps", or null for auto-detection
dataset: "semantic-sidewalk" # Optional: dataset name for model-specific postprocessing
num_workers: 0 # Number of workers for data loading
pipe_batch: 5 # Number of frames to process in each batch. Recommend setting this equal to batch_size below.

# Processing configuration
frame_step: 10 # Process every 5th frame
frame_step: 1 # Process every Nth frame (1 = process every frame)
batch_size: 5 # Number of frames to process in each batch
output_fps: null # Optional: FPS for output video (if different from input)

# Output options
save_raw_segmentation: false
save_raw_segmentation: true
save_colored_segmentation: true
save_overlay: true
analyze_results: false
analyze_results: true

# Visualization configuration
visualization:
@@ -29,3 +32,4 @@

# Advanced options
force_reprocess: false # Set to true to reprocess even if output files exist
disable_tqdm: false # Set to true to disable progress bars. In some cases, tqdm seems to lead to segfaults.
