arxiv:2210.02941

BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework

Published on Oct 6, 2022

Authors:

Heng Yang ,

Abstract

A hybrid instance-filtering framework using pre-trained language models improves text augmentation performance on large datasets by maintaining natural feature spaces.

AI-generated summary

Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a drop in performance on the augmented data (for example, EDA generally loses approx 2% in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BootAug) based on pre-trained language models that can maintain a similar feature space with natural datasets. BootAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves the augmentation performance by approx 2-3% in classification accuracy. Our experimental results on three classification tasks and nine public datasets show that BootAug addresses the performance drop problem and outperforms state-of-the-art text augmentation methods. Additionally, we release the code to help improve existing augmentation methods on large datasets.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2210.02941 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2210.02941 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2210.02941 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.