Using CIFAR-10 dataset for image classification
This article applies the concepts learned in the Deep Learning class of the IT course at IMD-UFRN, and it was written in collaboration with Alison Hedigliranes, Daniel Gomes, and Joao Vitor Dias Xavier.
Introduction
We are using the CIFAR-10 dataset, which consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images, collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. You can access the dataset here.
Objectives
Our primary objective in this work is to use the techniques learned in class to achieve the highest accuracy possible with our current knowledge. To reach this goal, we will apply Better Learning techniques (configuring capacity with nodes and layers, configuring gradient precision with batch size, and configuring what to optimize with loss functions) and Better Generalization techniques (penalizing large weights with weight regularization and decoupling layers with dropout) to an MLP (multilayer perceptron).
Techniques
Initially we had two layers, one with 64 and the other with 32 neurons, but the accuracy of the algorithm was very low, around 45%, and increasing these numbers did not improve it. So we decided to keep only one layer with 512 neurons, which obtained approximately the same accuracy at a lower processing cost.

Regarding the batch size, we initially tried 32 samples and obtained an accuracy of approximately 38%. Increasing the batch to 100 samples brought no improvement; accuracy actually decreased to 35%.

For optimization, SGD (Stochastic Gradient Descent) was chosen. Since we are using an online platform (Google Colab) to develop the solution, this algorithm helps make training less costly, even if it slightly increases the number of iterations needed to reach the minimum.

The regularization techniques chosen were L2 and Dropout; instead of applying them separately, we chose to use both together to enhance the regularization process.
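The combination of L2 and dropout described above can be sketched on a single hidden layer like this. The 512-unit size and the 0.5 dropout rate come from the text; the L2 factor (1e-4) is an assumed value, not one the article states.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

# Sketch: the hidden layer combines L2 weight regularization on its
# kernel with a dropout layer right after it, as described above.
# The L2 factor (1e-4) is an assumed value.
regularized_block = keras.Sequential([
    layers.Input(shape=(3072,)),  # 32*32*3 flattened pixels
    layers.Dense(512, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
])
```

L2 shrinks the weights during training while dropout randomly silences units, so the two address overfitting through different mechanisms and can be stacked on the same layer.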
Implementation
Initially, the dataset was imported with a Keras resource, as shown in Figure 1. Note that the function already returns the training and test datasets separately, so you need to be careful not to swap these variables.
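The loading step looks like this, assuming the standard `keras.datasets` loader (the figure is not reproduced here, so this is a sketch of the same call):

```python
from tensorflow.keras.datasets import cifar10

# CIFAR-10 ships with Keras; the loader already returns the
# (train, test) split in this order -- be careful not to swap them.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

print(x_train.shape)  # (50000, 32, 32, 3)
print(x_test.shape)   # (10000, 32, 32, 3)
```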
Then, it was necessary to flatten each image matrix into a one-dimensional vector. Below is an image that shows the distribution of images in the training set. As shown, there are 50,000 images in the training dataset and 10,000 in the test dataset, and since the dataset is perfectly balanced across all classes, this will help our training.
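The flattening and the class-balance check can be sketched as follows. Dummy arrays with the same shapes as the CIFAR-10 training data stand in here, so the block runs without downloading anything; on the real `x_train` and `y_train` the results are the same.

```python
import numpy as np

# Flattening: each 32x32x3 image becomes one 3072-element vector,
# which is the input shape an MLP expects. Dummy data stands in
# for the real x_train here.
images = np.zeros((50000, 32, 32, 3), dtype="uint8")
flat = images.reshape(len(images), -1)
print(flat.shape)  # (50000, 3072)

# Class-balance check: np.unique counts images per label. CIFAR-10
# is perfectly balanced, so each of the 10 classes appears 5,000
# times in the training set (simulated labels here).
labels = np.repeat(np.arange(10), 5000)
classes, counts = np.unique(labels, return_counts=True)
print(counts)  # ten entries of 5000
```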
After checking the number of images, it is necessary to verify whether there are missing values, as seen in Figure 3.
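One way to run that check is to count NaNs in each array, as sketched below. The CIFAR-10 images are `uint8` arrays (which cannot even hold NaN), so on the real data every count comes out zero; small dummy arrays stand in here.

```python
import numpy as np

# Missing-value check sketch: count NaNs in each array. On the
# real CIFAR-10 arrays every count is 0.
x_train = np.zeros((100, 32, 32, 3), dtype="uint8")
y_train = np.zeros((100, 1), dtype="uint8")

for name, arr in [("x_train", x_train), ("y_train", y_train)]:
    print(name, np.isnan(arr.astype("float64")).sum())  # 0
```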
With that done, it is time to start training, and for that you need to normalize the values of the datasets, as seen in the figure below.
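The normalization itself is a single division: pixel values range from 0 to 255, so dividing by 255 rescales them to [0, 1]. A tiny dummy batch illustrates it; the same line applies to `x_train` and `x_test`.

```python
import numpy as np

# Normalization sketch: rescale uint8 pixel values to [0, 1].
batch = np.array([[0, 128, 255]], dtype="uint8")
normalized = batch.astype("float32") / 255.0
print(normalized)  # [[0.  0.5019608  1. ]]
```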
Moving on to training, we start with a hidden layer of 512 neurons, plus the output layer with 10 neurons, one for each possible class. The ReLU activation function was used in the first layer, along with a dropout of 0.5. In the last layer, a softmax activation function was used. Finally, 50 epochs were used, with a batch size of 32.
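That architecture can be sketched directly in Keras. The layer sizes, activations, dropout rate, epochs, and batch size match the description above; the SGD learning rate is left at its default, since the article does not specify one.

```python
from tensorflow import keras
from tensorflow.keras import layers

# The MLP described above: 512 ReLU units with dropout 0.5, then a
# 10-unit softmax output, one unit per CIFAR-10 class.
model = keras.Sequential([
    layers.Input(shape=(3072,)),  # flattened 32x32x3 image
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training as described: 50 epochs, batch size 32. Sketched as a
# comment; it assumes the flattened, normalized arrays from the
# preprocessing steps above.
# history = model.fit(x_train_flat, y_train, epochs=50, batch_size=32,
#                     validation_data=(x_test_flat, y_test))
```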
Then, to improve the parameters, we used hyperparameter tuning, which tests various combinations of parameters in search of the one with the best accuracy. We then took the configuration found and retrained our model with the best parameters.
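The article does not say which tuning tool was used, so the sketch below illustrates the idea with a plain grid search over a few candidate settings; the specific candidate values are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(units, dropout):
    """Build one MLP candidate for a given hyperparameter setting."""
    model = keras.Sequential([
        layers.Input(shape=(3072,)),
        layers.Dense(units, activation="relu"),
        layers.Dropout(dropout),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="sgd",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# The grid of candidate settings to try (assumed values).
candidates = [(units, dropout)
              for units in (128, 256, 512)
              for dropout in (0.3, 0.5)]

# For each candidate, train briefly and keep the one with the best
# validation accuracy, then retrain it fully. Sketched as comments;
# assumes the preprocessed x_train_flat / y_train arrays.
# best = max(candidates, key=lambda c: build_model(*c).fit(
#     x_train_flat, y_train, epochs=5, validation_split=0.1,
#     verbose=0).history["val_accuracy"][-1])
```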
Results
Initially, using only the default hyperparameters, we obtained an accuracy of 38%, as illustrated in Figure 8. Finally, applying the best hyperparameters, we obtained an accuracy of 48%.
Conclusion
The conclusion we reached is that, unfortunately, an MLP is not the best choice for this particular problem; the configuration required to classify so many images proved to be very complicated.
Even so, we could see that some of the techniques we chose to improve accuracy really worked. It was not a great improvement, but it did help.
Preliminary tests using convolutional networks showed substantially better results, reaching an accuracy of approximately 70%.
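The exact CNN from those preliminary tests is not described in the article, so the layer layout below is an assumption, not the authors' network; it is only a minimal convolutional baseline of the kind mentioned.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A minimal convolutional baseline (assumed layout, not the
# authors' network): convolutions keep the 2D structure of the
# image instead of flattening it away, which is what gives CNNs
# their edge over MLPs on image data.
cnn = keras.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
cnn.compile(optimizer="sgd",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```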
Repository and Video
The project can be found at https://github.com/joaovdxavier/deeplearning and the video can be found at https://www.loom.com/share/837be0d7d4594a66a9210e4d3ec2ef11