Training a Language Model on a Single GPU in One Day

Reading through recent updates in the immense world of AI, we came across an article written by Shenwai (Jan 3, 2023) for MarkTechPost, which got us thinking.

The article covers new AI research from the University of Maryland that investigates the "cramming" challenge: training a language model on a single GPU in one day.

Instead of pushing the boundaries of extreme computation, we welcome the effort from the University of Maryland to reduce the time and computational power needed to train high-quality NLP models. Large-scale models that require weeks or months of training time and vast amounts of data have a negative impact on the environment. For example, even though ChatGPT has a remarkable ability to hold conversational dialogues, training the model cost millions of dollars, required hundreds of gigabytes of data, and would take years on a single GPU.


Thanks to Levente Göncz for his input!