Visionaire

Streamlined, ergonomic APIs around Apple's Vision framework


The main goal of Visionaire is to reduce ceremony and provide a concise set of APIs for Vision tasks.

Some of its features include:

  • Centralized list of all tasks, available via the VisionTaskType enum (with platform availability checks).
  • Automatic image handling for all supported image sources.
  • Convenience APIs for all tasks, along with all available parameters for each task (with platform availability checks).
  • Support for custom CoreML models (Classification, Image-To-Image, Object Recognition, Generic VNCoreMLFeatureValueObservations).
  • Support for multiple task execution, maintaining task type information in the results.
  • Support for raw VNRequests.
  • All calls are synchronous (just like the original calls) - no extra 'magic', assumptions or hidden juggling.
  • SwiftUI extensions for helping you rapidly visualize results (great for evaluation).

Installation

Visionaire is provided as a Swift Package. You can add it to your project via this repository's address.
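
For example, with a Package.swift manifest (a minimal sketch; the tools version, version requirement and target name are illustrative, and the product name is assumed to match the package name):

// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // Add Visionaire via this repository's address (version requirement is illustrative)
        .package(url: "https://github.com/alladinian/Visionaire.git", from: "1.0.0")
    ],
    targets: [
        .executableTarget(name: "MyApp", dependencies: ["Visionaire"])
    ]
)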

Supported Vision Tasks

All Vision tasks are supported, including those introduced in iOS 17 & macOS 14.

Detailed list of all available tasks:

| Task | Vision API | Visionaire Task | iOS | macOS | Mac Catalyst | tvOS |
|------|------------|-----------------|-----|-------|--------------|------|
| Generate Feature Print | VNGenerateImageFeaturePrintRequest | .featurePrintGeneration | 13.0 | 10.15 | 13.1 | 13.0 |
| Person Segmentation | VNGeneratePersonSegmentationRequest | .personSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| Document Segmentation | VNDetectDocumentSegmentationRequest | .documentSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
| Attention Based Saliency | VNGenerateAttentionBasedSaliencyImageRequest | .attentionSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| Objectness Based Saliency | VNGenerateObjectnessBasedSaliencyImageRequest | .objectnessSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
| Track Rectangle | VNTrackRectangleRequest | .rectangleTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| Track Object | VNTrackObjectRequest | .objectTracking | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Rectangles | VNDetectRectanglesRequest | .rectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Face Capture Quality | VNDetectFaceCaptureQualityRequest | .faceCaptureQuality | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Face Landmarks | VNDetectFaceLandmarksRequest | .faceLandmarkDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Face Rectangles | VNDetectFaceRectanglesRequest | .faceDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Human Rectangles | VNDetectHumanRectanglesRequest | .humanRectanglesDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Human Body Pose | VNDetectHumanBodyPoseRequest | .humanBodyPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Human Hand Pose | VNDetectHumanHandPoseRequest | .humanHandPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Recognize Animals | VNRecognizeAnimalsRequest | .animalDetection | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Trajectories | VNDetectTrajectoriesRequest | .trajectoriesDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Contours | VNDetectContoursRequest | .contoursDetection | 14.0 | 11.0 | 14.0 | 14.0 |
| Generate Optical Flow | VNGenerateOpticalFlowRequest | .opticalFlowGeneration | 14.0 | 11.0 | 14.0 | 14.0 |
| Detect Barcodes | VNDetectBarcodesRequest | .barcodeDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Text Rectangles | VNDetectTextRectanglesRequest | .textRectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Recognize Text | VNRecognizeTextRequest | .textRecognition | 13.0 | 10.15 | 13.1 | 13.0 |
| Detect Horizon | VNDetectHorizonRequest | .horizonDetection | 11.0 | 10.13 | 13.1 | 11.0 |
| Classify Image | VNClassifyImageRequest | .imageClassification | 13.0 | 10.15 | 13.1 | 13.0 |
| Translational Image Registration | VNTranslationalImageRegistrationRequest | .translationalImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| Homographic Image Registration | VNHomographicImageRegistrationRequest | .homographicImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
| Detect Human Body Pose (3D) | VNDetectHumanBodyPose3DRequest | .humanBodyPoseDetection3D | 17.0 | 14.0 | 17.0 | 17.0 |
| Detect Animal Body Pose | VNDetectAnimalBodyPoseRequest | .animalBodyPoseDetection | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Optical Flow | VNTrackOpticalFlowRequest | .opticalFlowTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Translational Image Registration | VNTrackTranslationalImageRegistrationRequest | .translationalImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Track Homographic Image Registration | VNTrackHomographicImageRegistrationRequest | .homographicImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
| Generate Foreground Instance Mask | VNGenerateForegroundInstanceMaskRequest | .foregroundInstanceMaskGeneration | 17.0 | 14.0 | 17.0 | 17.0 |

Supported Image Sources

  • CGImage
  • CIImage
  • CVPixelBuffer
  • CMSampleBuffer
  • Data
  • URL
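
Any of these can be passed to the same call, and Visionaire handles the conversion internally. A quick sketch using the horizonDetection convenience API from the examples below (the file path is a placeholder):

import CoreImage
import Visionaire

let imageURL = URL(fileURLWithPath: "/path/to/photo.jpg") // placeholder path

// Passing a URL directly
let horizonFromURL = try Visionaire.shared.horizonDetection(imageSource: imageURL)

// Passing a CIImage (the same works for CGImage, CVPixelBuffer, CMSampleBuffer and Data)
let ciImage = CIImage(contentsOf: imageURL)!
let horizonFromCIImage = try Visionaire.shared.horizonDetection(imageSource: ciImage)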

Examples

The main class for interfacing is called Visionaire.

It's an ObservableObject and reports processing through a published property called isProcessing.

You can execute tasks on the shared Visionaire singleton or on your own instance (useful if you want to have separate processors reporting on their own).
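
Because it's an ObservableObject, you can drive UI state from isProcessing directly in SwiftUI. A minimal sketch (the Visionaire() initializer and the file path are assumptions for illustration; horizonDetection is the convenience API shown below):

import SwiftUI
import Visionaire

struct ProcessingIndicatorView: View {
    // A dedicated instance, so its `isProcessing` state is independent of the shared singleton.
    // Assumes `Visionaire()` is the public initializer implied by "your own instance" above.
    @StateObject private var processor = Visionaire()

    var body: some View {
        VStack {
            if processor.isProcessing {
                ProgressView("Running Vision tasks…")
            }

            Button("Detect horizon") {
                DispatchQueue.global(qos: .userInitiated).async {
                    // Placeholder path; any supported image source works here
                    let url = URL(fileURLWithPath: "/path/to/photo.jpg")
                    _ = try? processor.horizonDetection(imageSource: url)
                }
            }
        }
    }
}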

There are two sets of APIs: convenience methods & task-based methods.

Convenience methods have the benefit of returning typed results, while task-based methods can be submitted en masse.

Single task execution (convenience APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image   = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let horizon = try Visionaire.shared.horizonDetection(imageSource: image) // The result is a `VNHorizonObservation`
        let angle   = horizon.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}

Custom CoreML model (convenience APIs):

// Create an instance of your model
let yolo: MLModel = {
    // Tell Core ML to use the Neural Engine if available.
    let config = MLModelConfiguration()
    config.computeUnits = .all
    // Load your custom model (`YOLO` stands in for the class Xcode generates from your .mlmodel file)
    let model = try! YOLO(configuration: config)
    return model.model
}()
    
// Optionally create a feature provider to set up custom model attributes
class YoloFeatureProvider: MLFeatureProvider {
    var values: [String : MLFeatureValue] {
        [
            "iouThreshold": MLFeatureValue(double: 0.45),
            "confidenceThreshold": MLFeatureValue(double: 0.25)
        ]
    }

    var featureNames: Set<String> {
        Set(values.keys)
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        values[featureName]
    }
}

// Perform the task
let detectedObjectObservations = try visionaire.customRecognition(imageSource: image,
                                                                        model: try! VNCoreMLModel(for: yolo),
                                                        inputImageFeatureName: "image",
                                                              featureProvider: YoloFeatureProvider(),
                                                      imageCropAndScaleOption: .scaleFill)

Single task execution (task-based APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image       = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let result      = try Visionaire.shared.perform(.horizonDetection, on: image) // The result is a `VisionTaskResult`
        let observation = result.observations.first as? VNHorizonObservation
        let angle       = observation?.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}

Multiple task execution (task-based APIs):

DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image   = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let results = try Visionaire.shared.perform([.horizonDetection, .personSegmentation(qualityLevel: .accurate)], on: image)
        for result in results {
            switch result.taskType {
            case .horizonDetection:
                let horizon = result.observations.first as? VNHorizonObservation
                // Do something with the observation
            case .personSegmentation:
                let segmentationObservations = result.observations as? [VNPixelBufferObservation]
                // Do something with the observations
            default:
                break
            }
        }   
    } catch {
        print(error)
    }
}

Task configuration

All tasks can be configured with "modifier" style calls for common options.

An example using all the available options:

let segmentation = VisionTask.personSegmentation(qualityLevel: .accurate)
    .preferBackgroundProcessing(true)
    .usesCPUOnly(false)
    .regionOfInterest(CGRect(x: 0, y: 0, width: 0.5, height: 0.5))
    .latestRevision() // You can also use .revision(n)

let results = try Visionaire.shared.perform([.horizonDetection, segmentation], on: image) // The result is an array of `VisionTaskResult`s

SwiftUI Extensions

There are also some SwiftUI extensions available to help you visualize results for quick evaluation.

Detected Object Observations

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawObservations(detectedObjectObservations) {
        Rectangle()
            .stroke(Color.blue, lineWidth: 2)
    }


Rectangle Observations

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawQuad(rectangleObservations) { shape in
        shape
            .stroke(Color.green, lineWidth: 2)
    }


Face Landmarks

Note: For Face Landmarks you can specify individual characteristics or groups for visualization. The options are exposed through the FaceLandmarks OptionSet and they are:

constellation, contour, leftEye, rightEye, leftEyebrow, rightEyebrow, nose, noseCrest, medianLine, outerLips, innerLips, leftPupil, rightPupil, eyes, pupils, eyeBrows, lips and all.

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawFaceLandmarks(faceObservations, landmarks: .all) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }
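
Since FaceLandmarks is an OptionSet, you can also combine individual options instead of passing .all, e.g. landmarks: [.eyes, .pupils, .outerLips].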


Person Segmentation Mask

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizePersonSegmentationMask(pixelBufferObservations)


Human Body Pose

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeHumanBodyPose(humanBodyPoseObservations) { shape in
        shape
            .fill(.red)
    }


Contours

Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeContours(contoursObservations) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }

