In this article, we describe our approach to training a simplified YOLO v1 model (we call it “Nano”) to detect objects, and what we learned along the way. Our main driver was curiosity: we wanted to see whether we could dig deep enough into object detection to build and train our own model, with our own loss function, on our own data.
We used TensorFlow 2.3.0 to build and train the model.
Since we wanted to learn and experiment, we made our own decisions wherever we did not fully understand what the original YOLO v1 paper aimed for. We chose an input size of 112x112 pixels to train the model (the original YOLO v1 used 448x448), adapted the model architecture to the smaller input size, and changed parts of the loss function. …
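To make the effect of the smaller input size concrete, here is a minimal sketch of how the input size relates to the shape of a YOLO v1 style output tensor, S x S x (B*5 + C). It is not the Nano architecture itself. The helper name `yolo_v1_output_shape` and the total stride of 16 for the 112x112 case are assumptions chosen for illustration. The 7x7 grid, 2 boxes per cell, and 20 classes are the values from the original YOLO v1 paper.

```python
def yolo_v1_output_shape(input_size, total_stride, boxes_per_cell=2, num_classes=20):
    """Return (S, S, B*5 + C) for a YOLO v1 style detection head.

    input_size:   width/height of the square input image in pixels
    total_stride: overall downsampling factor of the network (assumed here,
                  not taken from the Nano model)
    """
    if input_size % total_stride != 0:
        raise ValueError(
            f"input size {input_size} is not divisible by total stride {total_stride}"
        )
    grid = input_size // total_stride
    # Each box predicts x, y, w, h and a confidence score (5 values);
    # class probabilities are predicted once per grid cell.
    return (grid, grid, boxes_per_cell * 5 + num_classes)


# Original YOLO v1: a 448x448 input reduced by a factor of 64 gives a 7x7 grid.
original = yolo_v1_output_shape(448, 64)

# Hypothetical smaller setup: a 112x112 input needs a total stride of 16
# to keep the same 7x7 grid.
nano_like = yolo_v1_output_shape(112, 16)

print(original, nano_like)
```

Keeping the original downsampling factor of 64 would not work for a 112x112 input, since 112 is not divisible by 64. This is one reason the layers have to be adapted when the input size changes.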