Improvements
Transfer Learning on Vision Transformer Architecture
The results of training the model from scratch were poor, with a test accuracy of only 26%, because we had a small image dataset.
Transfer learning has proven to be a powerful technique in enhancing the performance of neural network models,
including the Vision Transformer (ViT). By leveraging models pre-trained on large datasets, transfer learning allows the ViT to benefit from knowledge gained on one task and apply it to a related task, even with limited labeled data. So, in order to improve the test accuracy of our model, we will use ViT_B_16_Weights, since we are dealing with a 16 x 16 patch size.
1. Import Pretrained Weights
After importing the weights, we need to change the classifier layer (in other words, the last layer) to match the problem we are trying to solve. In our case we are classifying three classes, so we need to set the output layer up for that.
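Here is a minimal sketch of what that could look like with torchvision; the class names are hypothetical placeholders, and freezing the backbone is one common way to set this up:

```python
import torch
from torchvision.models import vit_b_16, ViT_B_16_Weights

device = "cuda" if torch.cuda.is_available() else "cpu"

# Create a ViT-B/16 model with the pretrained weights
pretrained_vit_weights = ViT_B_16_Weights.DEFAULT
pretrained_vit = vit_b_16(weights=pretrained_vit_weights).to(device)

# Freeze the backbone so only the new classifier head gets updated
for param in pretrained_vit.parameters():
    param.requires_grad = False

# Replace the classifier head (the last layer) with one for our three classes
class_names = ["class_0", "class_1", "class_2"]  # hypothetical class names
pretrained_vit.heads = torch.nn.Linear(in_features=768,
                                       out_features=len(class_names)).to(device)
```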
We can use torchinfo's summary() to visualize the architecture.
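A sketch of that summary call, assuming the torchinfo package is installed and the 224 x 224 input size the pretrained weights expect:

```python
from torchinfo import summary

# Print a layer-by-layer summary, including which layers are trainable
summary(model=pretrained_vit,
        input_size=(32, 3, 224, 224),  # (batch_size, color_channels, height, width)
        col_names=["input_size", "output_size", "num_params", "trainable"],
        col_width=20,
        row_settings=["var_names"])
```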
2. Training the Pretrained Model
After preparing the pretrained model, it is time to test how the model performs on our problem.
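Here is a minimal training-loop sketch, assuming train_dataloader and test_dataloader were built in the earlier data-preparation steps; the learning rate and number of epochs are illustrative values, not the exact ones used:

```python
import torch
from torch import nn

# Optimizer and loss function for the three-class classification problem
optimizer = torch.optim.Adam(params=pretrained_vit.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
epochs = 10  # assumed number of epochs

for epoch in range(epochs):
    # Training step: update only the unfrozen classifier head
    pretrained_vit.train()
    for X, y in train_dataloader:
        X, y = X.to(device), y.to(device)
        y_pred = pretrained_vit(X)
        loss = loss_fn(y_pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Evaluation step: measure test accuracy
    pretrained_vit.eval()
    correct, total = 0, 0
    with torch.inference_mode():
        for X, y in test_dataloader:
            X, y = X.to(device), y.to(device)
            y_pred = pretrained_vit(X)
            correct += (y_pred.argmax(dim=1) == y).sum().item()
            total += len(y)
    print(f"Epoch {epoch + 1} | test accuracy: {correct / total:.2%}")
```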
After training, the test accuracy jumped to 91%, which is a big improvement over the model trained from scratch.

There are other ways to improve the model from here, but I will not be focusing on them. I hope this helps you understand how paper replicating works and its importance in a machine learning engineer's journey.