arXiv:2210.02941

BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

Published on Oct 6, 2022

AI-generated summary

A hybrid instance-filtering framework using pre-trained language models improves text augmentation performance on large datasets by maintaining natural feature spaces.

Abstract

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a performance drop on the augmented data (for example, EDA generally loses approximately 2% in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BootAug) based on pre-trained language models that maintains a feature space similar to that of natural datasets. BootAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves augmentation performance, by approximately 2-3% in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BootAug addresses the performance-drop problem and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.
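
To make the filtering idea concrete, below is a minimal sketch of instance filtering with a pre-trained language model. This is not the authors' released BootAug implementation: the encoder choice (bert-base-uncased), the mean-pooled sentence embedding, and the 0.8 cosine-similarity threshold are all illustrative assumptions made for this example.

    # Minimal sketch of the instance-filtering idea, NOT the authors' BootAug code.
    # Assumptions: encoder model, mean pooling, and threshold are illustrative only.
    import torch
    from transformers import AutoModel, AutoTokenizer

    MODEL_NAME = "bert-base-uncased"  # assumption: any pre-trained LM encoder
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME)
    model.eval()

    @torch.no_grad()
    def embed(texts):
        """Mean-pooled last-hidden-state embeddings, L2-normalized."""
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        hidden = model(**batch).last_hidden_state        # (B, T, H)
        mask = batch["attention_mask"].unsqueeze(-1)     # (B, T, 1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)    # (B, H)
        return torch.nn.functional.normalize(pooled, dim=-1)

    def filter_augmented(natural_texts, augmented_texts, threshold=0.8):
        """Keep augmented instances whose embedding stays close to the
        centroid of the natural data's feature space (cosine similarity)."""
        centroid = embed(natural_texts).mean(0, keepdim=True)  # (1, H)
        sims = embed(augmented_texts) @ centroid.T             # (N, 1)
        return [t for t, s in zip(augmented_texts, sims.squeeze(1).tolist())
                if s >= threshold]

    # Usage: candidates from any base augmentation method are filtered
    # before being added to the training set.
    natural = ["the battery life is great", "screen quality is poor"]
    augmented = ["the battery lifetime is great", "purple monkey dishwasher"]
    print(filter_augmented(natural, augmented))

In this sketch, candidates produced by a base augmentation method (synonym substitution, back translation, etc.) are embedded and compared against the centroid of the natural training data; candidates that drift too far from the natural feature space are discarded before training.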
