Posted: Wed, March 9, 2022

Automating the Creation of Classification Datasets

Wed, May 20, 2020 - Wed, March 9, 2022

A project to assist with the creation of classification datasets -- particularly for facial recognition.

GitLab Repo

I've hosted my code for this project at https://gitlab.com/dibz15/ml-classification-dataset-builder.

Summary

This tool is a continuation of my last deep learning project, Facial Detection with FasterRCNN. In that post I mentioned that a next step would be to move on to facial recognition/classification, and this tool is a step in that direction. To train a classifier on custom faces, I would need a reasonably large dataset for each class of faces (each person), so this project is an effort to help automate building one. Under the hood, it uses the trained facial detection model to crop target faces from media (images or videos) and then saves those crops into a custom dataset.
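To make the crop-and-save idea concrete, here is a minimal Python sketch. It is not the code from the repo. The names Detection, crop_box, crop_faces, and output_path, the score threshold, the margin, and the folder-per-class layout are all assumptions made for illustration, and the frame is a plain list of pixel rows standing in for a decoded image.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Detection:
    """Hypothetical detector output: pixel box corners plus a confidence."""
    x1: int
    y1: int
    x2: int
    y2: int
    score: float


def crop_box(det, width, height, margin=0.1):
    """Expand the box by `margin` on each side, then clamp it to the frame."""
    pad_x = int((det.x2 - det.x1) * margin)
    pad_y = int((det.y2 - det.y1) * margin)
    left = max(0, det.x1 - pad_x)
    top = max(0, det.y1 - pad_y)
    right = min(width, det.x2 + pad_x)
    bottom = min(height, det.y2 + pad_y)
    return left, top, right, bottom


def crop_faces(frame, detections, min_score=0.8, margin=0.1):
    """Return one cropped sub-image per detection that passes the threshold.

    `frame` is a list of pixel rows (an assumed stand-in for a real image).
    """
    height, width = len(frame), len(frame[0])
    crops = []
    for det in detections:
        if det.score < min_score:
            continue  # skip low-confidence detections
        left, top, right, bottom = crop_box(det, width, height, margin)
        crops.append([row[left:right] for row in frame[top:bottom]])
    return crops


def output_path(root, label, index, ext=".png"):
    """Build a folder-per-class path (assumed layout): root/label/000003.png."""
    return Path(root) / label / f"{index:06d}{ext}"
```

In a real pipeline the frame would come from an image or video decoder and the detections from the trained face detector; each crop would then be written to the path returned by output_path for the person being collected.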

With this tool, you can go from a video of a person like this:

Obi-wan, annotated

and get a dataset output that looks like this:

Obi-wan, output classified

Here's an example of it in use classifying some members of the Shire:

Hobbits example

For more information on how to use it to make your own classification datasets, see the GitLab repo linked above, where I go into much more depth on usage.