Active Learning Methods for Annotating Training Sets

Abstract— Active learning is a machine learning technique that selects the data points most worth labeling by human annotators. It can reduce the cost and time of labeling a dataset while still achieving high accuracy. Active learning works by training a model on a small set of labeled data and then using that model to predict the labels of the unlabeled data. The data points about which the model is most uncertain are selected for labeling and added to the training set. This cycle is repeated until the desired level of accuracy is reached. Active learning has been shown to be effective for a variety of machine learning tasks, including image classification, text classification, and other natural language processing tasks. It is particularly well suited to tasks where labeling data is expensive or time-consuming. In this study, we investigate the use of active learning with the CIFAR-10, EuroSAT and Fashion MNIST datasets. We compare three active learning methods: Least Confidence, Margin Sampling and Entropy Sampling. We show that each of them can improve the performance of the model over random sampling.

Keywords— Active learning, Human Labeling, Least Confidence, Margin Sampling, Entropy Sampling, CIFAR-10, EuroSAT, CNN, Fashion MNIST.
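
The three query strategies named in the abstract can be illustrated with a minimal NumPy sketch. It is not the implementation used in this study: the function names, the `STRATEGIES` table and the `select_queries` helper are hypothetical, and the sketch assumes the model's predictions for the unlabeled pool are available as a matrix of class probabilities with one row per sample and at least two classes. Each function returns a score where a higher value means the model is more uncertain.

```python
import numpy as np


def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - probability of the most likely class."""
    return 1.0 - probs.max(axis=1)


def margin_sampling(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - (top-1 probability - top-2 probability).

    A small gap between the two most likely classes gives a high score.
    """
    # After partitioning at -2, the last two columns hold the two largest values.
    top_two = np.partition(probs, -2, axis=1)[:, -2:]
    return 1.0 - (top_two[:, 1] - top_two[:, 0])


def entropy_sampling(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = Shannon entropy of the predicted class distribution."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


# Hypothetical lookup table, introduced only for this illustration.
STRATEGIES = {
    "least_confidence": least_confidence,
    "margin": margin_sampling,
    "entropy": entropy_sampling,
}


def select_queries(probs: np.ndarray, n_queries: int, strategy: str) -> np.ndarray:
    """Return indices of the n_queries most uncertain pool samples."""
    scores = STRATEGIES[strategy](probs)
    # Sort by descending uncertainty; a stable sort keeps ties in pool order.
    return np.argsort(-scores, kind="stable")[:n_queries]


if __name__ == "__main__":
    # Toy predictions for three unlabeled samples and three classes.
    pool_probs = np.array([
        [0.90, 0.05, 0.05],  # confident prediction
        [0.40, 0.35, 0.25],  # spread over all classes
        [0.50, 0.50, 0.00],  # tie between two classes
    ])
    for name in STRATEGIES:
        print(name, select_queries(pool_probs, n_queries=1, strategy=name))
```

In a full active learning loop, the selected indices would be sent to annotators, the newly labeled samples moved from the pool to the training set, and the model retrained before the next round. On the toy pool above the strategies do not always agree: Least Confidence and Entropy Sampling pick the second sample, while Margin Sampling picks the third, whose two leading classes are tied.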

DOI: 10.17148/IJARCCE.2023.125292

International Journal of Advanced Research in Computer and Communication Engineering
