Training machine learning models requires learning algorithms that economise on the number of labelled data points used for training. In the active learning setting, the objective is to train a model with a small number of labelled data points, thereby minimising the labelling cost; in the data subset selection problem, the goal is to train a model using only a limited number of the available training data points.
We investigate the convergence rates and data sample sizes required to train a machine learning model using the stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value (i.e., loss-based sampling strategies) or their uncertainty value (i.e., uncertainty-based sampling strategies).
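As a rough illustration (not the paper's exact algorithm), SGD with a loss-based or uncertainty-based sampling strategy on a linear classifier might look as follows; the toy data, the squared hinge loss, the particular uncertainty score, and the constant step size `eta = 0.1` are our own simplifying choices.

```python
# Minimal sketch (not the paper's exact scheme) of SGD where each update samples
# a training point from a loss-based or an uncertainty-based distribution.
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable data with labels in {-1, +1}.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

def squared_hinge(margins):                 # per-example squared hinge loss
    return np.maximum(0.0, 1.0 - margins) ** 2

def sampling_probs(w, strategy):
    margins = y * (X @ w)
    if strategy == "loss":                  # loss-based: weight by current loss
        scores = squared_hinge(margins)
    else:                                   # uncertainty-based: favour points near
        scores = np.exp(-np.abs(X @ w))     # the decision boundary (our choice of score)
    total = scores.sum()
    return None if total == 0.0 else scores / total

w, eta = np.zeros(d), 0.1                   # constant step size
for t in range(2000):
    probs = sampling_probs(w, strategy="loss")
    if probs is None:                       # every point already has zero loss
        break
    i = rng.choice(n, p=probs)
    margin = y[i] * (X[i] @ w)
    grad = -2.0 * max(0.0, 1.0 - margin) * y[i] * X[i]   # gradient of the squared hinge
    w -= eta * grad

print("mean squared hinge loss:", squared_hinge(y * (X @ w)).mean())
```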
For SGD with a constant step size, we present convergence results for linear classifiers on linearly separable datasets trained with the squared hinge loss and related training loss functions. We further extend our analysis to more general classifiers and datasets, covering a broad range of loss-based sampling strategies and smooth convex training loss functions.
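For concreteness, one instance of this setting can be written as follows; the notation is ours and may differ from the paper's.

```latex
% Illustrative notation (ours): linear classifier w, examples (x_i, y_i) with
% y_i \in \{-1, +1\}, and the squared hinge loss.
\[
  \ell_i(w) \;=\; \max\bigl(0,\; 1 - y_i \langle w, x_i \rangle\bigr)^{2},
  \qquad
  w_{t+1} \;=\; w_t \;-\; \eta\, \nabla \ell_{i_t}(w_t),
\]
% where \eta > 0 is the constant step size and the index i_t is drawn at step t
% from a loss-based (or uncertainty-based) sampling distribution over the data.
```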
We propose a novel algorithm called Adaptive-Weight Sampling (AWS), which combines SGD with an adaptive step size that equals the stochastic Polyak step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions.
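The AWS update itself is not spelled out in this abstract; as a reference point only, a minimal sketch of SGD with the stochastic Polyak step size, which AWS is stated to achieve in expectation, could look as follows. The uniform sampling, squared hinge loss, interpolation assumption (each per-example loss attains zero at the optimum), and the `eps` safeguard are our simplifying assumptions, not the paper's algorithm.

```python
# Reference sketch only: SGD with the stochastic Polyak step size
# eta_t = loss_i(w_t) / ||grad_i(w_t)||^2, assuming interpolation.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))

def loss_and_grad(w, i):
    margin = y[i] * (X[i] @ w)
    loss = max(0.0, 1.0 - margin) ** 2                    # smooth convex per-example loss
    grad = -2.0 * max(0.0, 1.0 - margin) * y[i] * X[i]
    return loss, grad

w = np.zeros(d)
eps = 1e-12                                               # numerical safeguard (our choice)
for t in range(2000):
    i = rng.integers(n)                                   # uniform sampling for simplicity
    loss, grad = loss_and_grad(w, i)
    sq_norm = grad @ grad
    if sq_norm <= eps:                                    # zero gradient: skip the update
        continue
    eta_t = loss / sq_norm                                # stochastic Polyak step size
    w -= eta_t * grad

print("mean squared hinge loss:", np.mean(np.maximum(0.0, 1.0 - y * (X @ w)) ** 2))
```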
Our numerical experiments demonstrate the efficiency of AWS on various datasets when using either exact or estimated loss values.