Video Preview and Facial Recognition in Swift, Part 2: Saving a Photo with Overlays

This post continues my summary of lessons learned in implementing SwiftSquareCam, a Swift version of Apple’s SquareCam demo project. As described in my prior post, the original Objective-C code provided a number of core features:

– taking input from either the front or back cameras, if both are present, and providing the user with a switch to toggle between the two;

– displaying a live video preview image of the active camera’s view at any moment;

– implementing basic facial feature recognition and overlaying a rectangular box on faces in the live video preview; and

– taking pictures and saving them to the camera roll with their overlays, if any.

Here I discuss how the last feature – taking pictures and saving them to the camera roll with overlays – was implemented in Swift. The overlays in this instance would be the rectangles, if any, bounding the faces recognized by Apple’s feature recognition API.

Saving the Still Image from Video Input in Objective-C

In the event that face detection is not enabled, or if it is enabled but no faces are detected, saving the still image to the camera roll is relatively simple. The process of capturing a still image from a video input connection using the AVFoundation API is described in several of the help files and articles. See, e.g., How to capture video frames from the camera as images using AV Foundation on iOS.

However, if faces are detected, there is additional work to do using the CoreGraphics API to draw their overlays onto the saved image. In short, the process is as follows:

  1. The still image is acquired from the video input as in the trivial case.
  2. The facial recognition engine is run on the acquired frame.
  3. A new bitmap context is created and the still image is drawn into the new bitmap context as background.
  4. Overlays corresponding to the recognized faces are drawn into the new bitmap context “on top.”
  5. The image from the new bitmap context is saved to the camera roll.
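Steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration rather than the code from SwiftSquareCam: it uses the current Swift spelling of the Core Graphics calls, the function name compositeImage and the faceRects parameter are hypothetical, and it assumes the face rectangles are already expressed in the image's pixel coordinate space. Acquiring the frame, running face detection, and saving to the camera roll (steps 1, 2 and 5) are left out.

```swift
import CoreGraphics

/// Hypothetical helper: draws `background` into a new bitmap context and
/// strokes each rectangle in `faceRects` on top of it.
/// Assumption: `faceRects` are already in the image's pixel coordinates
/// (Core Graphics places the origin at the bottom-left corner).
func compositeImage(background: CGImage, faceRects: [CGRect]) -> CGImage? {
    let width = background.width
    let height = background.height

    // Step 3: create a new bitmap context the same size as the still image.
    // Passing nil for `data` and 0 for `bytesPerRow` lets Core Graphics
    // allocate and lay out the backing store itself.
    guard let context = CGContext(
        data: nil,
        width: width,
        height: height,
        bitsPerComponent: 8,
        bytesPerRow: 0,
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGImageAlphaInfo.premultipliedFirst.rawValue
            | CGBitmapInfo.byteOrder32Little.rawValue
    ) else { return nil }

    // Draw the still image as the background.
    context.draw(background, in: CGRect(x: 0, y: 0, width: width, height: height))

    // Step 4: draw the overlays "on top" (a red outline is an arbitrary choice).
    context.setStrokeColor(red: 1, green: 0, blue: 0, alpha: 1)
    context.setLineWidth(4)
    for rect in faceRects {
        context.stroke(rect)
    }

    // The resulting image is what would be handed to the camera roll in step 5.
    return context.makeImage()
}
```

When faceRects is empty the function simply returns a copy of the background, which corresponds to the trivial case described above.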

Again, much of the port from Objective-C to Swift was straightforward, primarily because the various API calls work nearly identically in the two languages. However, the following code (edited for brevity) presented a minor conundrum:

CGDataProviderRef provider = NULL;
CGImageRef image = NULL;
* * * 
CVPixelBufferLockBaseAddress( pixelBuffer, 0 );
provider = CGDataProviderCreateWithData( (void *)pixelBuffer, sourceBaseAddr, sourceRowBytes * height, ReleaseCVPixelBuffer);
image = CGImageCreate(width, height, 8, 32, sourceRowBytes, colorspace, bitmapInfo, provider, NULL, true, kCGRenderingIntentDefault);

The above code uses the CGDataProvider API to bootstrap creation of a new CGImage corresponding to pixelBuffer (of type CVPixelBufferRef), which itself holds the captured video frame’s data. Note that there is a call to CVPixelBufferLockBaseAddress(), but no corresponding call to CVPixelBufferUnlockBaseAddress(). The last parameter of CGDataProviderCreateWithData is a C function pointer to a callback, ReleaseCVPixelBuffer, which is invoked when the data provider is released. It is in this callback function that the SquareCam program calls CVPixelBufferUnlockBaseAddress (and also releases the CVPixelBufferRef).

This architecture presented two problems for a Swift implementation. First, Swift uses automatic reference counting (ARC); thus, although it is possible to retain a reference in one function and release it in another, that approach is disfavored, if only because it risks introducing the very problems ARC was meant to alleviate. Second, Swift does not permit C-style callbacks into Swift code (at least in this manner); although Swift provides a CFunctionPointer generic type, it appears to be a construct meant to hold a function pointer value passed to Swift from Objective-C.
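The second limitation applies to the Swift version this post was written against. Later releases (Swift 2.0 onward) replaced CFunctionPointer with the @convention(c) attribute, which lets a closure that captures nothing be passed where a C function pointer is expected. The sketch below shows the general shape using a plain heap buffer instead of a CVPixelBuffer; the helper name makeProvider is hypothetical, and the code is only an illustration of the callback mechanism, not a port of ReleaseCVPixelBuffer.

```swift
import CoreGraphics

/// Hypothetical helper: wraps a freshly allocated buffer in a CGDataProvider
/// and frees that buffer in the provider's release callback.
func makeProvider(byteCount: Int) -> CGDataProvider? {
    // Allocate and zero a buffer that the data provider will own.
    let buffer = UnsafeMutableRawPointer.allocate(byteCount: byteCount, alignment: 16)
    buffer.initializeMemory(as: UInt8.self, repeating: 0, count: byteCount)

    return CGDataProvider(
        dataInfo: nil,
        data: buffer,
        size: byteCount,
        releaseData: { _, data, _ in
            // This closure captures nothing, so Swift can convert it to a
            // C function pointer. It runs when the provider is released.
            UnsafeMutableRawPointer(mutating: data).deallocate()
        }
    )
}
```

Even with this option available, the cleanup still happens in a different place from the allocation, which is the ownership concern raised in the first point.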

A Swift Solution

After thinking about the problem for a bit, I found no independent reason to keep using the CGDataProvider API to create a CGImage from the pixel buffer data of the captured video frame. Instead, it appeared more straightforward to use the approach from the Apple article referenced above: create a bitmap context of the correct size and color depth and use that to bootstrap creation of a CGImage object.

CVPixelBufferLockBaseAddress( pixelBuffer, 0 );
* * * 
let context = CGBitmapContextCreate(sourceBaseAddr, width, height, 8, sourceRowBytes, colorspace, bitmapInfo)
image = CGBitmapContextCreateImage(context)
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)

Unlike the approach taken in SquareCam, this one locks and unlocks the pixel buffer within a single function, so it requires no callback and works well in Swift.
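For reference, here is a self-contained sketch of the same idea in the current Swift spelling of these APIs. It is not the exact code from SwiftSquareCam: the function name makeImage(from:) is hypothetical, and the bitmap settings assume the pixel buffer holds 32-bit BGRA data (kCVPixelFormatType_32BGRA), which the excerpt above leaves unspecified.

```swift
import CoreGraphics
import CoreVideo

/// Hypothetical helper: builds a CGImage from a pixel buffer by way of a
/// bitmap context. Assumption: the buffer is in 32-bit BGRA format.
func makeImage(from pixelBuffer: CVPixelBuffer) -> CGImage? {
    // Lock for reading, and guarantee the matching unlock on every exit path.
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    guard let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer) else {
        return nil
    }

    // Point a bitmap context directly at the pixel buffer's memory.
    let context = CGContext(
        data: baseAddress,
        width: CVPixelBufferGetWidth(pixelBuffer),
        height: CVPixelBufferGetHeight(pixelBuffer),
        bitsPerComponent: 8,
        bytesPerRow: CVPixelBufferGetBytesPerRow(pixelBuffer),
        space: CGColorSpaceCreateDeviceRGB(),
        bitmapInfo: CGBitmapInfo.byteOrder32Little.rawValue
            | CGImageAlphaInfo.premultipliedFirst.rawValue
    )

    // makeImage() is the Swift name for CGBitmapContextCreateImage; the image
    // is created while the buffer is still locked, before `defer` runs.
    return context?.makeImage()
}
```

Using defer keeps the lock and unlock calls next to each other, so an early return cannot leave the buffer locked.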

