Streamlined, ergonomic APIs around Apple's Vision framework
The main goal of Visionaire
is to reduce ceremony and provide a concise set of APIs for Vision tasks.
Some of its features include:
- Centralized list of all tasks, available via the `VisionTaskType` enum (with platform availability checks).
- Automatic image handling for all supported image sources.
- Convenience APIs for all tasks, along with all available parameters for each task (with platform availability checks).
- Support for custom CoreML models (Classification, Image-To-Image, Object Recognition, generic `VNCoreMLFeatureValueObservation`s).
- Support for executing multiple tasks at once, maintaining task type information in the results.
- Support for raw `VNRequest`s.
- All calls are synchronous (just like the original Vision calls) - no extra 'magic', assumptions or hidden juggling.
- SwiftUI extensions to help you rapidly visualize results (great for evaluation).
`Visionaire` is provided as a Swift Package. You can add it to your project via this repository's address.
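For example, when resolving it through `Package.swift`, the dependency entry looks roughly like the sketch below. The URL and version are placeholders, so substitute this repository's actual address and the release you want:

```swift
// swift-tools-version:5.7
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        // Placeholder URL - replace with this repository's actual address.
        .package(url: "https://github.com/<owner>/Visionaire.git", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "MyApp",
            dependencies: [
                .product(name: "Visionaire", package: "Visionaire")
            ]
        )
    ]
)
```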
All Vision tasks are supported, including those introduced in iOS 17 & macOS 14.
Here is a detailed list of all available tasks:
Task | Vision API | Visionaire Task | iOS | macOS | Mac Catalyst | tvOS |
---|---|---|---|---|---|---|
Generate Feature Print | VNGenerateImageFeaturePrintRequest | .featurePrintGeneration | 13.0 | 10.15 | 13.1 | 13.0 |
Person Segmentation | VNGeneratePersonSegmentationRequest | .personSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
Document Segmentation | VNDetectDocumentSegmentationRequest | .documentSegmentation | 15.0 | 12.0 | 15.0 | 15.0 |
Attention Based Saliency | VNGenerateAttentionBasedSaliencyImageRequest | .attentionSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
Objectness Based Saliency | VNGenerateObjectnessBasedSaliencyImageRequest | .objectnessSaliency | 13.0 | 10.15 | 13.1 | 13.0 |
Track Rectangle | VNTrackRectangleRequest | .rectangleTracking | 11.0 | 10.13 | 13.1 | 11.0 |
Track Object | VNTrackObjectRequest | .objectTracking | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Rectangles | VNDetectRectanglesRequest | .rectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Face Capture Quality | VNDetectFaceCaptureQualityRequest | .faceCaptureQuality | 13.0 | 10.15 | 13.1 | 13.0 |
Detect Face Landmarks | VNDetectFaceLandmarksRequest | .faceLandmarkDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Face Rectangles | VNDetectFaceRectanglesRequest | .faceDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Human Rectangles | VNDetectHumanRectanglesRequest | .humanRectanglesDetection | 13.0 | 10.15 | 13.1 | 13.0 |
Detect Human Body Pose | VNDetectHumanBodyPoseRequest | .humanBodyPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
Detect Human Hand Pose | VNDetectHumanHandPoseRequest | .humanHandPoseDetection | 14.0 | 11.0 | 14.0 | 14.0 |
Recognize Animals | VNRecognizeAnimalsRequest | .animalDetection | 13.0 | 10.15 | 13.1 | 13.0 |
Detect Trajectories | VNDetectTrajectoriesRequest | .trajectoriesDetection | 14.0 | 11.0 | 14.0 | 14.0 |
Detect Contours | VNDetectContoursRequest | .contoursDetection | 14.0 | 11.0 | 14.0 | 14.0 |
Generate Optical Flow | VNGenerateOpticalFlowRequest | .opticalFlowGeneration | 14.0 | 11.0 | 14.0 | 14.0 |
Detect Barcodes | VNDetectBarcodesRequest | .barcodeDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Text Rectangles | VNDetectTextRectanglesRequest | .textRectanglesDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Recognize Text | VNRecognizeTextRequest | .textRecognition | 13.0 | 10.15 | 13.1 | 13.0 |
Detect Horizon | VNDetectHorizonRequest | .horizonDetection | 11.0 | 10.13 | 13.1 | 11.0 |
Classify Image | VNClassifyImageRequest | .imageClassification | 13.0 | 10.15 | 13.1 | 13.0 |
Translational Image Registration | VNTranslationalImageRegistrationRequest | .translationalImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
Homographic Image Registration | VNHomographicImageRegistrationRequest | .homographicImageRegistration | 11.0 | 10.13 | 13.1 | 11.0 |
Detect Human Body Pose (3D) | VNDetectHumanBodyPose3DRequest | .humanBodyPoseDetection3D | 17.0 | 14.0 | 17.0 | 17.0 |
Detect Animal Body Pose | VNDetectAnimalBodyPoseRequest | .animalBodyPoseDetection | 17.0 | 14.0 | 17.0 | 17.0 |
Track Optical Flow | VNTrackOpticalFlowRequest | .opticalFlowTracking | 17.0 | 14.0 | 17.0 | 17.0 |
Track Translational Image Registration | VNTrackTranslationalImageRegistrationRequest | .translationalImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
Track Homographic Image Registration | VNTrackHomographicImageRegistrationRequest | .homographicImageRegistrationTracking | 17.0 | 14.0 | 17.0 | 17.0 |
Generate Foreground Instance Mask | VNGenerateForegroundInstanceMaskRequest | .foregroundInstanceMaskGeneration | 17.0 | 14.0 | 17.0 | 17.0 |
The following image sources are handled automatically:

- `CGImage`
- `CIImage`
- `CVPixelBuffer`
- `CMSampleBuffer`
- `Data`
- `URL`
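As a quick illustration (a minimal sketch - the file path is a placeholder and the imports assume the module is named `Visionaire`), the same convenience call accepts any of these sources directly, with no manual decoding on your side:

```swift
import Foundation
import Vision
import Visionaire

do {
    // Placeholder path - point this at a real image file.
    let fileURL = URL(fileURLWithPath: "/path/to/photo.jpg")

    // A URL can be passed straight to a convenience call...
    let horizonFromURL = try Visionaire.shared.horizonDetection(imageSource: fileURL)

    // ...and so can raw image data.
    let imageData = try Data(contentsOf: fileURL)
    let horizonFromData = try Visionaire.shared.horizonDetection(imageSource: imageData)

    print(horizonFromURL.angle, horizonFromData.angle)
} catch {
    print(error)
}
```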
The main class for interfacing is called `Visionaire`. It's an `ObservableObject` and reports processing activity through a published property called `isProcessing`.

You can execute tasks on the shared `Visionaire` singleton or on your own instance (useful if you want separate processors, each reporting its own state).
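As a rough sketch (it assumes `Visionaire` exposes a public initializer, which may not match the actual API - you can just as well observe `Visionaire.shared`), a SwiftUI view could bind to `isProcessing` like this:

```swift
import SwiftUI
import Visionaire

struct ProcessingBadge: View {
    // Assumption: `Visionaire` can be instantiated directly; alternatively,
    // observe the shared singleton with @ObservedObject.
    @StateObject private var processor = Visionaire()

    var body: some View {
        // `isProcessing` is the published property described above.
        if processor.isProcessing {
            ProgressView("Running Vision tasks…")
        } else {
            Text("Idle")
        }
    }
}
```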
There are two sets of APIs: convenience methods and task-based methods. Convenience methods have the benefit of returning typed results, while tasks can be submitted en masse. For example, here is horizon detection using a convenience method:
```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let horizon = try Visionaire.shared.horizonDetection(imageSource: image) // The result is a `VNHorizonObservation`
        let angle = horizon.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}
```
To use a custom CoreML model, create an `MLModel` instance, optionally supply an `MLFeatureProvider` for model-specific parameters, and pass both to `customRecognition`:

```swift
// Create an instance of your model
let yolo: MLModel = {
    // Tell Core ML to use the Neural Engine if available.
    let config = MLModelConfiguration()
    config.computeUnits = .all

    // Load your custom model (`YOLO` stands in for your Core ML-generated model class).
    let yolo = try! YOLO(configuration: config)
    return yolo.model
}()

// Optionally create a feature provider to set up custom model attributes
class YoloFeatureProvider: MLFeatureProvider {
    var values: [String: MLFeatureValue] {
        [
            "iouThreshold": MLFeatureValue(double: 0.45),
            "confidenceThreshold": MLFeatureValue(double: 0.25)
        ]
    }

    var featureNames: Set<String> {
        Set(values.keys)
    }

    func featureValue(for featureName: String) -> MLFeatureValue? {
        values[featureName]
    }
}

// Perform the task
let detectedObjectObservations = try Visionaire.shared.customRecognition(imageSource: image,
                                                                         model: try! VNCoreMLModel(for: yolo),
                                                                         inputImageFeatureName: "image",
                                                                         featureProvider: YoloFeatureProvider(),
                                                                         imageCropAndScaleOption: .scaleFill)
```
The same task can also be run through the task-based API, which returns a `VisionTaskResult` whose observations you cast yourself:

```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let result = try Visionaire.shared.perform(.horizonDetection, on: image) // The result is a `VisionTaskResult`
        let observation = result.observations.first as? VNHorizonObservation
        let angle = observation?.angle
        // Do something with the horizon angle
    } catch {
        print(error)
    }
}
```
Multiple tasks can be submitted in a single call; each `VisionTaskResult` carries its `taskType` so you can switch on it:

```swift
DispatchQueue.global(qos: .userInitiated).async {
    do {
        let image = /* any supported image source, such as CGImage, CIImage, CVPixelBuffer, CMSampleBuffer, Data or URL */
        let results = try Visionaire.shared.perform([.horizonDetection, .personSegmentation(qualityLevel: .accurate)], on: image)

        for result in results {
            switch result.taskType {
            case .horizonDetection:
                let horizon = result.observations.first as? VNHorizonObservation
                // Do something with the observation
            case .personSegmentation:
                let segmentationObservations = result.observations as? [VNPixelBufferObservation]
                // Do something with the observations
            default:
                break
            }
        }
    } catch {
        print(error)
    }
}
```
All tasks can be configured with "modifier" style calls for common options.
An example using all the available options:
```swift
let segmentation = VisionTask.personSegmentation(qualityLevel: .accurate)
    .preferBackgroundProcessing(true)
    .usesCPUOnly(false)
    .regionOfInterest(CGRect(x: 0, y: 0, width: 0.5, height: 0.5))
    .latestRevision() // You can also use .revision(n)

let results = try Visionaire.shared.perform([.horizonDetection, segmentation], on: image) // The result is an array of `VisionTaskResult`s
```
There are also some SwiftUI extensions to help you quickly visualize results for evaluation.
Detected Object Observations
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawObservations(detectedObjectObservations) {
        Rectangle()
            .stroke(Color.blue, lineWidth: 2)
    }
```
Rectangle Observations
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawQuad(rectangleObservations) { shape in
        shape
            .stroke(Color.green, lineWidth: 2)
    }
```
Face Landmarks
Note: For face landmarks you can specify individual characteristics or groups for visualization. The available options are exposed through the `FaceLandmarks` OptionSet: `constellation`, `contour`, `leftEye`, `rightEye`, `leftEyebrow`, `rightEyebrow`, `nose`, `noseCrest`, `medianLine`, `outerLips`, `innerLips`, `leftPupil`, `rightPupil`, `eyes`, `pupils`, `eyeBrows`, `lips` and `all`.
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .drawFaceLandmarks(faceObservations, landmarks: .all) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }
```
Person Segmentation Mask
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizePersonSegmentationMask(pixelBufferObservations)
```
Human Body Pose
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeHumanBodyPose(humanBodyPoseObservations) { shape in
        shape
            .fill(.red)
    }
```
Contours
```swift
Image(myImage)
    .resizable()
    .aspectRatio(contentMode: .fit)
    .visualizeContours(contoursObservations) { shape in
        shape
            .stroke(.red, style: .init(lineWidth: 2, lineJoin: .round))
    }
```