Implementing the DenseNet to showcase its superiority over the ResNet
In deep neural networks, as the input is convolved and processed by many prior layers, the deeper layers begin to lose the original input signal and receive progressively noisier representations; as a result, training takes significantly longer and the CNN can perform worse. ResNets address this issue with skip connections, so deeper layers receive less noise. DenseNets address the same issue by using even more connections than ResNets: within each dense block, every layer takes the outputs of all preceding layers as input. This means that even the deepest layers receive input with comparatively little degradation.
The core idea behind the DenseNet architecture is that the feature maps produced by each layer are concatenated to form the input to the next layer, so the output of each layer becomes part of the input to all following layers. DenseNet also uses a transition layer between dense blocks to reduce the spatial dimensions and the number of feature maps. The transition layer consists of a batch normalisation layer, a 1x1 convolutional layer, and a pooling layer.
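As a hypothetical illustration of this concatenation idea (the tensor sizes and growth rate here are my own, not taken from the implementation described below):

```python
import torch

# Toy illustration of dense connectivity: each layer's output is
# concatenated with everything that came before it along the channel axis.
x = torch.randn(1, 16, 28, 28)   # e.g. a 16-channel feature map
growth_rate = 12                 # channels contributed by each layer

features = [x]
for _ in range(3):               # three "layers" in the block
    inp = torch.cat(features, dim=1)            # input = all previous feature maps
    new = torch.randn(1, growth_rate, 28, 28)   # stand-in for a BN-ReLU-Conv output
    features.append(new)

out = torch.cat(features, dim=1)
print(out.shape)                 # torch.Size([1, 52, 28, 28]) -> 16 + 3*12 channels
```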
As a result, compared with comparable ResNets, DenseNets seem to require less training to achieve higher accuracy on the ImageNet validation set.
This implementation of the DenseNet is done using PyTorch. The PyTorch DataLoader, together with torchvision transforms, is used for loading and (optionally) augmenting the data.
For train_dataset there was the option to augment the data, as seen in the (commented-out) transformations = transforms.Compose([]) call. However, any attempt to augment the Fashion-MNIST training data with normalisation resulted in a large and significant decrease in accuracy, even after a very large number of epochs. Attempting augmentation without normalisation did produce good results, but they were still inferior to the results I obtained without augmenting the data. For this reason I decided against augmenting the data.
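A minimal sketch of the kind of loading pipeline described, with the augmentation and normalisation options left commented out (the batch size and the Fashion-MNIST statistics shown are my own illustrative values, not necessarily those used):

```python
import torch
from torchvision import datasets, transforms

# Plain pipeline without augmentation (the variant that worked best for me);
# the commented lines show the kinds of options that were tried and dropped.
transformations = transforms.Compose([
    # transforms.RandomHorizontalFlip(),           # augmentation: tried, not kept
    transforms.ToTensor(),
    # transforms.Normalize((0.2860,), (0.3530,)),  # normalisation: hurt accuracy in my runs
])

train_dataset = datasets.FashionMNIST(root="./data", train=True,
                                      download=True, transform=transformations)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
```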
self.bn: This layer normalises the inputs for each mini-batch. It helps stabilise the learning process and speeds up convergence during training. (This is different from the normalisation mentioned above for augmentation purposes.)
self.relu: a non-linear activation function. It introduces non-linearity into the model, allowing it to learn more complex patterns. The inplace=True argument optimises memory usage.
self.conv: This layer performs a convolution operation. It is a fundamental part of CNNs (Convolutional Neural Networks) and is used to extract features from the input data. The convolution here uses a kernel size of 3x3, padding of 1 (to maintain the spatial dimensions), and no bias term. A lot of the inspiration for the DenseNet came from [4].
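A minimal sketch of a dense layer built from these three components (the class name and growth-rate parameter are my assumptions; the BN → ReLU → 3x3 conv pattern and the concatenation follow the description above):

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One BN -> ReLU -> 3x3 conv step whose output is concatenated onto its input."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)        # normalises each mini-batch
        self.relu = nn.ReLU(inplace=True)            # non-linearity, in-place to save memory
        self.conv = nn.Conv2d(in_channels, growth_rate,
                              kernel_size=3, padding=1, bias=False)  # 3x3, keeps spatial size

    def forward(self, x):
        out = self.conv(self.relu(self.bn(x)))
        return torch.cat([x, out], dim=1)            # dense connectivity: pass everything forward
```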
The TransitionLayer takes some data (x): it first smoothens it out (batch normalisation), makes some key decisions (ReLU), picks out the important features (a 1x1 convolution), and then compresses the result with pooling. In DenseNet, these TransitionLayer blocks act as bridges between different dense blocks. They help control the size of the feature maps, ensuring that the network doesn't get too heavy and slow to process as it gets deeper. It's like having a checkpoint or a rest area on a long highway, keeping things in order and manageable.
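A sketch of such a transition layer under the same assumptions (halving the channels and spatial size here is illustrative):

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Bridge between dense blocks: BN -> ReLU -> 1x1 conv -> pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)  # reduce channel count
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)                            # halve spatial dimensions

    def forward(self, x):
        return self.pool(self.conv(self.relu(self.bn(x))))
```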
Dense Blocks and Transition Layers:
DenseBlock and TransitionLayer are the core components. The dense blocks focus on extracting a rich set of features, building upon what was learned in the previous layers; it's as if each block adds a layer of understanding, picking up more and more details.
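Building on the DenseLayer sketch above, a dense block can be pictured as a simple stack of such layers (again hypothetical; the number of layers is arbitrary):

```python
import torch.nn as nn

class DenseBlock(nn.Module):
    """Stacks dense layers; the channel count grows by growth_rate with every layer."""
    def __init__(self, in_channels, growth_rate, num_layers):
        super().__init__()
        layers = [DenseLayer(in_channels + i * growth_rate, growth_rate)
                  for i in range(num_layers)]
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)   # output has in_channels + num_layers * growth_rate channels
```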
Final Processing and Classification:
After the last dense block, the output is normalised and activated one more time (self.relu(self.bn(out))), ensuring it's in the best form for the final decision-making. self.avg_pool then compresses the data to a size that's easier to work with, focusing on the essence of what's been learned. self.fc is a fully connected layer that takes all this processed information and translates it into specific class predictions.
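Tying the pieces together, a toy end-to-end assembly might look like the sketch below. The stem, block depths, and channel counts are invented for illustration, but the final self.bn / self.relu / self.avg_pool / self.fc stage follows the description above (Fashion-MNIST has 1 input channel and 10 classes):

```python
import torch
import torch.nn as nn

class DenseNetSketch(nn.Module):
    """Toy assembly: stem -> dense block -> transition -> dense block -> BN/ReLU/pool/fc."""
    def __init__(self, num_classes=10, growth_rate=12):
        super().__init__()
        self.stem = nn.Conv2d(1, 24, kernel_size=3, padding=1, bias=False)
        self.block1 = DenseBlock(24, growth_rate, num_layers=4)   # 24 + 4*12 = 72 channels
        self.trans1 = TransitionLayer(72, 36)
        self.block2 = DenseBlock(36, growth_rate, num_layers=4)   # 36 + 4*12 = 84 channels
        self.bn = nn.BatchNorm2d(84)
        self.relu = nn.ReLU(inplace=True)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(84, num_classes)

    def forward(self, x):
        out = self.stem(x)
        out = self.trans1(self.block1(out))
        out = self.block2(out)
        out = self.relu(self.bn(out))    # final normalisation + activation
        out = self.avg_pool(out)         # compress to (N, C, 1, 1)
        out = torch.flatten(out, 1)
        return self.fc(out)              # class predictions
```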
model.train() sets the model to training mode, enabling features like dropout and batch-normalisation behaviour that are specific to this phase. running_loss, correct, and total are initialised to track the loss and accuracy during the epoch. The inner loop iterates over the training data loader, train_loader, fetching batches of images and their corresponding labels. images.to(device) and labels.to(device) ensure that the data is moved to the GPU if available. The forward pass computes the model's predictions (outputs) and calculates the loss using criterion. The backward pass, loss.backward(), computes the gradients of the loss, and optimizer.step() updates the model's weights. Running loss and accuracy statistics are updated after each batch. This is repeated for a predefined number of epochs.
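Put into code, the loop described above might look like this sketch (the optimiser, learning rate, and loss function are assumptions based on common choices, not necessarily those of the actual run; model and train_loader refer to the earlier sketches):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = DenseNetSketch().to(device)                        # hypothetical model from the sketch above
criterion = nn.CrossEntropyLoss()                          # assumed loss for 10-class classification
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimiser and learning rate

num_epochs = 50
for epoch in range(num_epochs):
    model.train()                                  # enable training-specific behaviour
    running_loss, correct, total = 0.0, 0, 0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)  # move batch to GPU if available

        optimizer.zero_grad()                      # clear gradients from the previous batch
        outputs = model(images)                    # forward pass
        loss = criterion(outputs, labels)
        loss.backward()                            # backward pass: compute gradients
        optimizer.step()                           # update the weights

        running_loss += loss.item() * images.size(0)
        correct += (outputs.argmax(dim=1) == labels).sum().item()
        total += labels.size(0)

    print(f"epoch {epoch + 1}: loss {running_loss / total:.4f}, "
          f"accuracy {100.0 * correct / total:.2f}%")
```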
At 50 Epochs:
Overall I am happy with the result and implementation of the DenseNet. Comparing my results against standard benchmarks such as ResNet-18 (accuracy = 94.9%) and DenseNet-BC with 768K parameters (accuracy = 95.4%) [5], and given that I used significantly fewer parameters with a significant reduction in training time and hardware, a test accuracy of 90.51% is a success.
From the results you can see that after ~10 epochs not much changed in terms of test accuracy and test loss. When testing beyond 50 epochs, although the training accuracy continued to increase, the test accuracy and loss began to worsen; this is probably due to overtraining, so 50 epochs seems to be an appropriate number of iterations to train the DenseNet. Looking closely at the test loss, you can see a small upward drift in the general trend from ~20 epochs, indicating that the test accuracy of 90.51% is around the best the network can do. Looking closely at the accuracy graph, after ~10 epochs no general change in the test accuracy can be noticed.
To increase test accuracy, augmenting the data in a specific way may have helped; however, as mentioned earlier, augmenting the data seemed to decrease accuracy. Another way of potentially increasing accuracy is to increase the number of parameters the model uses; more capacity, along with more data, would likely make the network perform better.
[1] Medium, "Understanding and coding a ResNet in Keras," [Online]. Available: https://miro.medium.com/v2/resize:fit:1100/format:webp/1*jm5MEylOA8abyAi51CcSLA.png. [Accessed: Dec. 1, 2023].
[2] Towards Data Science, "Image Classification, Transfer Learning, and Fine Tuning using TensorFlow," [Online]. Available: https://towardsdatascience.com/image-classification-transfer-learning-and-fine-tuning-using-tensorflow-a791baf9dbf3. [Accessed: Dec. 3, 2023].
[3] A. Kolesnikov, L. Beyer, X. Zhai, J. Puigcerver, J. Yung, S. Gelly, and N. Houlsby, "Big Transfer (BiT): General Visual Representation Learning," arXiv:1905.11946, 2019. [Online]. Available: https://arxiv.org/pdf/1905.11946.pdf.
[4] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," arXiv:1608.06993, 2016. [Online]. Available: https://arxiv.org/pdf/1608.06993.pdf.
[5] Papers With Code, "SOTA for Image Classification on Fashion-MNIST," [Online]. Available: https://paperswithcode.com/sota/image-classification-on-fashion-mnist. [Accessed: Dec. 1, 2023].