Using GoCV and Pigo for Face Detection


Learn about face detection in this article by Xuanyi Chew, the chief data scientist of a Sydney-based logistics startup. He is the primary author of Gorgonia, an open source deep learning package for Go.

This article will show you how to build facial detection systems with GoCV (Go bindings for OpenCV 3) and PIGO (a Go implementation of Pixel Intensity Comparison-based Object detection, or PICO). What we will be building is a program that detects faces from a live webcam. This program reads an image from a webcam, passes the image into a face detector, and then draws rectangles on the image. Finally, we want to display the image with the rectangles drawn on it.

Grabbing an image from the webcam 

First, we’ll open a connection to the webcam:

func main() {
  // open webcam
  webcam, err := gocv.VideoCaptureDevice(0)
  if err != nil {
    log.Fatal(err)
  }
  defer webcam.Close()
}

Here, I have used VideoCaptureDevice(0) because the webcam is device 0 on my Ubuntu system. Your webcam may differ in device numbering. Also, do note defer webcam.Close(). GoCV sticks very strongly to this metaphor of treating external objects as resources that must be closed.

A webcam (specifically, a VideoCaptureDevice) is a resource, much like a file. In fact, on Linux this is literally true; the webcam on my computer is exposed as the device file /dev/video0, and I can read raw bytes from it by just using a variant of cat. The point is that .Close() has to be called on resources to free them up.

The talk about closing resources to free them up naturally raises a question, given that we program in Go. Is a channel a resource? The answer is no. close(ch) merely informs every receiver that no more values will be sent on the channel.
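A quick standard-library-only sketch of those semantics: after close, already-buffered values can still be drained, and further receives return the zero value with ok set to false. Nothing is freed in the resource sense.

```go
package main

import "fmt"

func main() {
	ch := make(chan int, 2)
	ch <- 1
	ch <- 2
	close(ch) // signals receivers that no more values will be sent

	// Buffered values can still be drained after close;
	// range stops once the channel is both drained and closed.
	for v := range ch {
		fmt.Println(v)
	}

	// A receive on a closed, empty channel returns the zero value and ok=false.
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false
}
```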

Having access to the webcam is nice and all, but we also want to be able to grab images off it. One can read raw streams off the file of a webcam; we can do the same with GoCV as well:

img := gocv.NewMat()
defer img.Close()
width := int(webcam.Get(gocv.VideoCaptureFrameWidth))
height := int(webcam.Get(gocv.VideoCaptureFrameHeight))
fmt.Printf("Webcam resolution: %v, %v", width, height)
if ok := webcam.Read(&img); !ok {
  log.Fatal("cannot read device 0")
}

First, we create a new matrix, representing an image. Again, the matrix is treated like a resource, because it is owned by the foreign function interface. Thus, defer img.Close() is written. Next, we query the webcam for information about the resolution. It’s quite nice to know what resolution a webcam runs at. Last, we read the webcam’s image into the matrix.

At this point, if you are already familiar with Gorgonia’s tensor libraries, this pattern may seem familiar, and yet feels funny. img := gocv.NewMat() does not define a size. How does GoCV know how much space to allocate for the matrix? Well, the answer is that the magic happens in webcam.Read. The underlying matrix will be resized as necessary by OpenCV. In this way, the Go part of the program does no real memory allocation.

Displaying the image 

So, the image has been magically read into the matrix. How do we get anything out of it? The answer is that we have to copy the data from the data structure controlled by OpenCV into a Go-native data structure. Fortunately, GoCV handles that as well. Here, we write it out to a file:

goImg, err := img.ToImage()
if err != nil {
  log.Fatal(err)
}
outFile, err := os.OpenFile("first.png", os.O_WRONLY|os.O_TRUNC|os.O_CREATE, 0644)
if err != nil {
  log.Fatal(err)
}
defer outFile.Close()
if err := png.Encode(outFile, goImg); err != nil {
  log.Fatal(err)
}

First, the matrix has to be converted to image.Image. To do this, img.ToImage() is called. Then, it is encoded as a PNG using png.Encode and you will have a test image. This was mine:

In the picture, I’m holding a box with a photo of Ralph Waldo Emerson, famed American author. Now, we have the basic pipeline of getting an image from the webcam and writing out the image to a file. A webcam continuously captures images, but we’re only reading a single image to a matrix and then writing the matrix into a file. If we put this in a loop, we would have the ability to continuously read images from a webcam and write to file.

Analogously to writing to a file, we could write to the screen instead. The GoCV integration with OpenCV is so complete that this is trivial. Instead of writing to a file, we can display a window. To do so, we first need to create a window object, with the title Face Detection Window:

window := gocv.NewWindow("Face Detection Window")
defer window.Close()

Then, to show the image in the window, simply replace the parts where we write out to a file with this:

window.IMShow(img)

When the program is run, a window will pop up, showing you the image captured by the webcam.

Doodling on images 

At some point, we would also like to draw on an image, preferably before we output it, either to the display or a file. GoCV handles that admirably. For our purposes, we’ll just be drawing rectangles to denote where a face might be. GoCV interfaces well with the standard library’s Rectangle type.

To draw a rectangle on an image with GoCV, we first define a rectangle:

r := image.Rect(50, 50, 100, 100)

Here, I defined a rectangle whose top-left corner is at (50, 50) and whose bottom-right corner is at (100, 100), making it 50 pixels wide and 50 pixels tall. Then, a color needs to be defined. Again, GoCV plays very nicely with image/color, found in the standard library. So, here's the definition of the color blue:

blue := color.RGBA{0, 0, 255, 0}

And now, onward to draw the rectangle on the image

gocv.Rectangle(&img, r, blue, 3)

This draws a blue rectangle, 3 pixels thick, with its top-left corner at (50, 50) in the image.

At this point, we have the components necessary to build two different pipelines. One writes an image to a file. One creates a window to display the image. There are two ways the input from the webcam may be processed: one-off or continuously. And, we are also able to modify the image matrix before outputting. This gives us a lot of flexibility as scaffolding in the process of building the program.

Face detection 1 

The first face detection algorithm we want to use is the Viola-Jones method. It comes built into GoCV, so we can just use that. The consistency of GoCV gives us a hint as to what to do next. We need a classifier object (and remember to close it!)

This is how you create a classifier object:

classifier := gocv.NewCascadeClassifier()
if !classifier.Load(haarCascadeFile) {
  log.Fatalf("Error reading cascade file: %v\n", haarCascadeFile)
}
defer classifier.Close()

Note that at this point, it is not enough to just create a classifier. We need to load it with the model to use. The model used is very well established. It was first created by Rainer Lienhart in the early 2000s. Like most products of the 2000s, the model is serialized as an XML file.

The file can be downloaded from the GoCV GitHub repository at https://github.com/hybridgroup/gocv/blob/master/data/haarcascade_frontalface_default.xml

In the preceding code, haarCascadeFile is a string denoting the path to the file. GoCV handles the rest. To detect faces, it is a simple one-liner:

rects := classifier.DetectMultiScale(img)

In this single line of code, we are telling OpenCV to use Viola-Jones' multiscale detection to detect faces. Internally, OpenCV builds an image pyramid of integral images and runs the classifiers on it. At each stage, the algorithm produces rectangles representing where it thinks the faces are, and these rectangles are what is returned. They can then be drawn on the image before it is output to a file or window. Here's what a full windowed pipeline looks like:

var haarCascadeFile = "Path/To/CascadeFile.xml"

// color for the rect when faces are detected
var blue = color.RGBA{0, 0, 255, 0}

const deviceID = 0

func main() {
  // open webcam
  webcam, err := gocv.VideoCaptureDevice(deviceID)
  if err != nil {
    log.Fatal(err)
  }
  defer webcam.Close()

  // open display window
  window := gocv.NewWindow("Face Detect")
  defer window.Close()

  // prepare image matrix
  img := gocv.NewMat()
  defer img.Close()

  // load classifier to recognize faces
  classifier := gocv.NewCascadeClassifier()
  if !classifier.Load(haarCascadeFile) {
    log.Fatalf("Error reading cascade file: %v\n", haarCascadeFile)
  }
  defer classifier.Close()

  for {
    if ok := webcam.Read(&img); !ok {
      fmt.Printf("cannot read device %d\n", deviceID)
      return
    }
    if img.Empty() {
      continue
    }

    rects := classifier.DetectMultiScale(img)
    for _, r := range rects {
      gocv.Rectangle(&img, r, blue, 3)
    }

    window.IMShow(img)
    if window.WaitKey(1) >= 0 {
      break
    }
  }
}

The program is now able to get an image from the webcam, detect faces, draw rectangles around the faces, and then display the image. You may note that it is quite quick at doing that.
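Part of that speed comes from the integral images (summed-area tables) mentioned above, which let the detector compute the sum of any rectangular region in constant time. The sketch below is not GoCV code, just a minimal illustration of the underlying trick; the function names are my own:

```go
package main

import "fmt"

// integral builds a summed-area table: ii[y][x] holds the sum of all
// pixels above and to the left of (x, y), exclusive of row y and column x.
func integral(pix [][]int) [][]int {
	h, w := len(pix), len(pix[0])
	ii := make([][]int, h+1)
	for y := range ii {
		ii[y] = make([]int, w+1)
	}
	for y := 1; y <= h; y++ {
		for x := 1; x <= w; x++ {
			ii[y][x] = pix[y-1][x-1] + ii[y-1][x] + ii[y][x-1] - ii[y-1][x-1]
		}
	}
	return ii
}

// sum returns the pixel sum over the rectangle [x0,x1) x [y0,y1)
// in constant time, using only four table lookups.
func sum(ii [][]int, x0, y0, x1, y1 int) int {
	return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
}

func main() {
	pix := [][]int{
		{1, 2, 3},
		{4, 5, 6},
		{7, 8, 9},
	}
	ii := integral(pix)
	fmt.Println(sum(ii, 0, 0, 3, 3)) // 45: sum of the whole image
	fmt.Println(sum(ii, 1, 1, 3, 3)) // 28: 5+6+8+9
}
```

This constant-time region sum is what makes evaluating many Haar-like features at many scales affordable.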

Face detection 2

In one fell swoop, GoCV has provided us with everything necessary to do real-time face detection. But is it easy to use with other face detection algorithms? The answer is yes, but some work is required.

The algorithm we want to use next is PICO, as implemented by the PIGO library. Recall that images in GoCV are of the gocv.Mat type. For PIGO to use them, we need to convert them into a format PIGO can read. That shared format is the image.Image of the standard library.

Recall once again that the gocv.Mat type has a method .ToImage(), which returns an image.Image. That’s our bridge!

Before crossing it, let’s look at how to create a PIGO classifier. Here’s a function to do so:

func pigoSetup(width, height int) (*image.NRGBA, []uint8, *pigo.Pigo, pigo.CascadeParams, pigo.ImageParams) {
  goImg := image.NewNRGBA(image.Rect(0, 0, width, height))
  grayGoImg := make([]uint8, width*height)
  cParams := pigo.CascadeParams{
    MinSize:     20,
    MaxSize:     1000,
    ShiftFactor: 0.1,
    ScaleFactor: 1.1,
  }
  imgParams := pigo.ImageParams{
    Pixels: grayGoImg,
    Rows:   height,
    Cols:   width,
    Dim:    width,
  }
  classifier := pigo.NewPigo()

  var err error
  if classifier, err = classifier.Unpack(pigoCascadeFile); err != nil {
    log.Fatalf("Error reading the cascade file: %s", err)
  }
  return goImg, grayGoImg, classifier, cParams, imgParams
}

This function is quite dense. Let’s unpack it. We’ll do it in a logical fashion as opposed to in a top-down linear fashion. First, a pigo.Pigo is created with classifier := pigo.NewPigo(). This creates a new classifier. Like the Viola-Jones method, a model is required to be supplied.

Unlike in GoCV, the model is in a binary format which needs to be unpacked. Additionally, classifier.Unpack takes a []byte, instead of a string denoting the path to the file. The provided model can be acquired on GitHub at https://github.com/esimov/pigo/blob/master/data/facefinder.

Once the file has been acquired, it needs to be read as []byte, as shown in the snippet below (which is wrapped in an init function):

  pigoCascadeFile, err = ioutil.ReadFile("path/to/facefinder")
  if err != nil {
    log.Fatalf("Error reading the cascade file: %v", err)
  }

Once the pigoCascadeFile is available, we can now unpack it into the classifier by using classifier.Unpack(pigoCascadeFile). Usual error handling applies. But what of the earlier parts of the section? Why is this necessary? To understand this, let’s look at how PIGO does its classification. It looks roughly like this:

dets := pigoClass.RunCascade(imgParams, cParams)
dets = pigoClass.ClusterDetections(dets, 0.3)

When PIGO runs the classifier, it takes two parameters that determine its behavior: ImageParams and CascadeParams. ImageParams in particular illuminates what we need to do. It is defined as follows:

// ImageParams is a struct for image related settings.
// Pixels: contains the grayscale converted image pixel data.
// Rows: the number of image rows.
// Cols: the number of image columns.
// Dim: the image dimension.
type ImageParams struct {
  Pixels []uint8
  Rows   int
  Cols   int
  Dim    int
}

It is with this in mind that the pigoSetup function does its extra work. The goImg is not strictly required, but it's useful as our bridge between GoCV and PIGO.

PIGO requires images to be in []uint8, representing a gray scale image. GoCV reads a webcam image into a gocv.Mat, which has a .ToImage() method. The method returns image.Image. Most webcams capture color images. These are the steps required in order to make GoCV and PIGO play nicely together:

  1. Capture an image from the webcam.
  2. Convert the image into an image.Image.
  3. Convert that image into a gray scale image.
  4. Extract the []uint8 from the gray scale image.
  5. Perform face detection on the []uint8.

For our preceding pipeline, the image parameters and the cascade parameters are more or less static. Processing of the image is done in a linear fashion. A frame from the webcam doesn’t get captured until the face detection is done, the rectangles drawn, and the final image displayed in the window.

Hence, it would be perfectly all right to allocate an image once and then overwrite it in each loop. The .ToImage() method allocates a new image every time it's called. Instead, we can have a "naughty" version, where an already-allocated image is reused. Here's how to do it:

func naughtyToImage(m *gocv.Mat, imge image.Image) error {
  typ := m.Type()
  if typ != gocv.MatTypeCV8UC1 && typ != gocv.MatTypeCV8UC3 && typ != gocv.MatTypeCV8UC4 {
    return errors.New("ToImage supports only MatType CV8UC1, CV8UC3 and CV8UC4")
  }

  width := m.Cols()
  height := m.Rows()
  step := m.Step()
  data := m.ToBytes()
  channels := m.Channels()

  switch img := imge.(type) {
  case *image.NRGBA:
    c := color.NRGBA{
      R: uint8(0),
      G: uint8(0),
      B: uint8(0),
      A: uint8(255),
    }
    for y := 0; y < height; y++ {
      for x := 0; x < step; x = x + channels {
        c.B = uint8(data[y*step+x])
        c.G = uint8(data[y*step+x+1])
        c.R = uint8(data[y*step+x+2])
        if channels == 4 {
          c.A = uint8(data[y*step+x+3])
        }
        img.SetNRGBA(int(x/channels), y, c)
      }
    }

  case *image.Gray:
    c := color.Gray{Y: uint8(0)}
    for y := 0; y < height; y++ {
      for x := 0; x < width; x++ {
        c.Y = uint8(data[y*step+x])
        img.SetGray(x, y, c)
      }
    }
  }
  return nil
}

This function allows one to reuse an existing image. We simply loop through the bytes of the gocv.Mat and overwrite the underlying bytes of the image. With the same logic, we can also create a naughty version of a function that converts the image into gray scale:

func naughtyGrayscale(dst []uint8, src *image.NRGBA) []uint8 {
  rows, cols := src.Bounds().Dy(), src.Bounds().Dx() // Dy is the height (rows), Dx the width (cols)
  if dst == nil || len(dst) != rows*cols {
    dst = make([]uint8, rows*cols)
  }
  for r := 0; r < rows; r++ {
    for c := 0; c < cols; c++ {
      dst[r*cols+c] = uint8(
        0.299*float64(src.Pix[r*4*cols+4*c+0]) +
          0.587*float64(src.Pix[r*4*cols+4*c+1]) +
          0.114*float64(src.Pix[r*4*cols+4*c+2]),
      )
    }
  }
  return dst
}

The differences in function signature between the two naughty helpers are stylistic, but the latter is the better style: returning the destination slice lets the function correct a nil or wrongly sized buffer, as follows:

if dst == nil || len(dst) != rows*cols {
    dst = make([]uint8, rows*cols)
  }

So, our pipeline looks like this:

var pigoCascadeFile []byte // read from disk in init()
var green = color.RGBA{0, 255, 0, 0}

const deviceID = 0

func main() {
  // open webcam
  webcam, err := gocv.VideoCaptureDevice(deviceID)
  if err != nil {
    log.Fatal(err)
  }
  defer webcam.Close()
  width := int(webcam.Get(gocv.VideoCaptureFrameWidth))
  height := int(webcam.Get(gocv.VideoCaptureFrameHeight))

  // open display window
  window := gocv.NewWindow("Face Detect")
  defer window.Close()

  // prepare image matrix
  img := gocv.NewMat()
  defer img.Close()

  // set up pigo
  goImg, grayGoImg, pigoClass, cParams, imgParams := pigoSetup(width, height)

  for {
    if ok := webcam.Read(&img); !ok {
      fmt.Printf("cannot read device %d\n", deviceID)
      return
    }
    if img.Empty() {
      continue
    }
    if err = naughtyToImage(&img, goImg); err != nil {
      log.Fatal(err)
    }
    grayGoImg = naughtyGrayscale(grayGoImg, goImg)
    imgParams.Pixels = grayGoImg
    dets := pigoClass.RunCascade(imgParams, cParams)
    dets = pigoClass.ClusterDetections(dets, 0.3)


    for _, det := range dets {
      if det.Q < 5 {
        continue
      }
      x := det.Col - det.Scale/2
      y := det.Row - det.Scale/2
      r := image.Rect(x, y, x+det.Scale, y+det.Scale)
      gocv.Rectangle(&img, r, green, 3)
    }

    window.IMShow(img)
    if window.WaitKey(1) >= 0 {
      break
    }
  }
}

There are some things to note here. If you follow the logic, you will see that the only thing that really changes from frame to frame is the data in imgParams.Pixels; the rest stays largely the same.

The PICO algorithm may produce overlapping detections, so a final clustering step is required to merge them. This explains the following two lines:

dets := pigoClass.RunCascade(imgParams, cParams)
dets = pigoClass.ClusterDetections(dets, 0.3)

The 0.3 value is chosen based on the original paper. In the documentation of PIGO, the value 0.2 is recommended.

Another difference is that PIGO does not return rectangles as detections. Instead, it returns its own pigo.Detection type. Translating these to the standard image.Rectangle is done with these lines:

x := det.Col - det.Scale/2
y := det.Row - det.Scale/2
r := image.Rect(x, y, x+det.Scale, y+det.Scale)

Running the program yields a window showing the webcam image, with green rectangles around faces.

Putting it all together

Now we have two different programs, using two different algorithms, to detect faces.

Here are some observations:

  • The video feed using PIGO is smoother; there are fewer jumps and lags.
  • The rectangles detected by PIGO jitter a little more than those of the standard Viola-Jones method.
  • The PIGO algorithm is more robust to rotations: I could tilt my head more and still have my face detected, compared to the standard Viola-Jones method.

We can of course put both of them together:

var haarCascadeFile = "Path/To/CascadeFile.xml"
var pigoCascadeFile []byte // read from disk in init()
var blue = color.RGBA{0, 0, 255, 0}
var green = color.RGBA{0, 255, 0, 0}

const deviceID = 0

func main() {
  // open webcam
  webcam, err := gocv.VideoCaptureDevice(deviceID)
  if err != nil {
    log.Fatal(err)
  }
  defer webcam.Close()
  width := int(webcam.Get(gocv.VideoCaptureFrameWidth))
  height := int(webcam.Get(gocv.VideoCaptureFrameHeight))

  // open display window
  window := gocv.NewWindow("Face Detect")
  defer window.Close()

  // prepare image matrix
  img := gocv.NewMat()
  defer img.Close()

  // set up pigo
  goImg, grayGoImg, pigoClass, cParams, imgParams := pigoSetup(width, height)

  // create classifier and load model
  classifier := gocv.NewCascadeClassifier()
  if !classifier.Load(haarCascadeFile) {
    log.Fatalf("Error reading cascade file: %v\n", haarCascadeFile)
  }
  defer classifier.Close()

  for {
    if ok := webcam.Read(&img); !ok {
      fmt.Printf("cannot read device %d\n", deviceID)
      return
    }
    if img.Empty() {
      continue
    }
    // use PIGO
    if err = naughtyToImage(&img, goImg); err != nil {
      log.Fatal(err)
    }

    grayGoImg = naughtyGrayscale(grayGoImg, goImg)
    imgParams.Pixels = grayGoImg
    dets := pigoClass.RunCascade(imgParams, cParams)
    dets = pigoClass.ClusterDetections(dets, 0.3)


    for _, det := range dets {
      if det.Q < 5 {
        continue
      }
      x := det.Col - det.Scale/2
      y := det.Row - det.Scale/2
      r := image.Rect(x, y, x+det.Scale, y+det.Scale)
      gocv.Rectangle(&img, r, green, 3)
    }

    // use GoCV
    rects := classifier.DetectMultiScale(img)
    for _, r := range rects {
      gocv.Rectangle(&img, r, blue, 3)
    }

    window.IMShow(img)
    if window.WaitKey(1) >= 0 {
      break
    }
  }
}

Here, we see that PIGO and GoCV both manage to detect faces rather accurately, and that they agree with each other quite a lot.

Additionally, we can see that there is now a fairly noticeable lag between actions and when the actions are displayed on screen. This is because there is more work to be done.

If you found this article interesting, you can explore Go Machine Learning Projects to discover the capabilities of Go for machine learning. Go Machine Learning Projects will teach you how to implement machine learning in Go to build programs that are easy to deploy, with code that is easy to understand and debug and whose performance can be measured.
