2020) but also similar in semantics. There is, however, a trade-off between semantic consistency and expression diversity: the larger the difference between a vanilla sample and its augmented counterpart, the less faithfully the original semantics are preserved. We therefore hypothesize that a good augmentation method for contrastive learning should balance these two desiderata.
Continuous methods can control semantic changes because they rely on purpose-built network structures to process redundant features (Huang et al., 2021). SimCSE (Gao et al., 2021) utilizes dropout (Srivastava et al., 2014) to obtain different embeddings of the same sentence and thus construct positive pairs. However, such continuous methods lack the interpretability needed to inspire further exploration of sentence augmentation. To better verify our hypothesis and identify promising directions for lexical data augmentation, we propose three Simple Discrete Augmentation (SDA) methods that satisfy the desiderata to different extents: Punctuation Insertion (PI), Modal Verbs (MV), and Double Negation (DN). Their impact on expression diversity increases in that order, while semantic consistency with the original sentence tends to diminish. In linguistics, punctuation usually marks a pause or tone (e.g., commas, exclamation marks) and carries no specific meaning of its own. Modal verbs serve as supplements to the predicate verb of a sentence, indicating attitudes such as permission and request, which helps to reduce uncertainty in semantics. DN lets two negatives cancel each other out and thereby produces a strong affirmation, but an improperly augmented sentence risks becoming logically confusing.
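To make the three operations concrete, the sketch below illustrates them as simple token-level edits in Python. The punctuation and modal-verb inventories, the insertion positions, and the negation template are assumptions made for this illustration, not the exact rules used by SDA.

```python
import random

# Illustrative inventories; the actual lists used by SDA are not reproduced here.
PUNCTUATION = [",", "!", "?", ";"]             # pause/tone marks with no content of their own
MODAL_VERBS = ["can", "may", "must", "should"]

def punctuation_insertion(tokens):
    """PI: insert a meaning-free punctuation mark at a random position."""
    pos = random.randint(0, len(tokens))
    return tokens[:pos] + [random.choice(PUNCTUATION)] + tokens[pos:]

def modal_verb(tokens, verb_index):
    """MV: place a modal verb directly before the predicate verb
    (locating the verb would require POS tagging in practice)."""
    return tokens[:verb_index] + [random.choice(MODAL_VERBS)] + tokens[verb_index:]

def double_negation(sentence, negate):
    """DN: negate the sentence once with a (hypothetical) standard-negation tool
    `negate`, then wrap it in a second negation so the two cancel out."""
    return "It is not true that " + negate(sentence)
```

In this sketch, PI and MV only lightly perturb the surface form, whereas DN depends on a reliable negation step and can therefore yield logically confusing sentences when that step fails, matching the ordering discussed above.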
Although the proposed augmentation methods preserve semantic meaning by carefully adding minor noise, the generated sentences remain lexically similar to the original. Recent research (Robinson et al., 2021) has pointed out the feature suppression problem in contrastive learning: it can lead to shortcut solutions in which the model learns only textual rather than semantic similarity. Focusing on hard examples has proven effective in changing the scope of the captured features (Robinson et al., 2021). Thus, we further utilize standard negation to construct texts that contradict all or part of the meaning of the original sentence and use them as hard negative samples. In this way, the model is encouraged to differentiate sentences that share similar lexical items yet carry reversed meanings.
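As a minimal sketch of how such hard negatives can enter training, the snippet below extends an InfoNCE-style contrastive loss with the embeddings of the negated sentences. The tensor names, temperature value, and exact form of the loss are illustrative assumptions rather than the paper's precise formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, hard_negative, temperature=0.05):
    """anchor/positive/hard_negative: (batch_size, dim) sentence embeddings,
    where `positive` comes from an SDA-augmented view and `hard_negative`
    from the standard negation of the same sentence."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    hard_negative = F.normalize(hard_negative, dim=-1)

    # Cosine similarities to all in-batch positives ...
    sim_pos = anchor @ positive.t() / temperature       # (batch, batch)
    # ... and to all negated sentences, which act as extra hard negatives.
    sim_neg = anchor @ hard_negative.t() / temperature   # (batch, batch)

    logits = torch.cat([sim_pos, sim_neg], dim=1)        # (batch, 2 * batch)
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)
```

This follows the common recipe of appending hard-negative similarities to the softmax denominator, as in the supervised variant of SimCSE (Gao et al., 2021), with the negated sentence taking the role of the contradiction example.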
To summarize, the contributions of this work are
as follows:
• We propose SDA methods (including standard negation) for contrastive sentence representation learning, which leverage discrete sentence modifications to enhance representation learning (Section 4).

• Comprehensive experimental results demonstrate that SDA achieves significantly better performance, advancing the state-of-the-art from 78.49 to 79.60 (Section 5).

• Extensive ablations and in-depth analyses are conducted to investigate the underlying rationale and clarify hyper-parameter choices (Section 6).
2. Related Work
2.1. Sentence Representation Learning
BERT (Devlin et al., 2019) has steered the trajectory of sentence representation research towards Pre-trained Language Models (PLMs). A multitude of endeavors (Tan et al., 2020, 2021; Li et al., 2020; Su et al., 2021; Lu et al., 2023a,b) have been dedicated to substantial improvements within this paradigm, leading to significant advancements in diverse domains. Notably, there is a pronounced practical demand for sentence-level text representations (Conneau et al., 2017; Williams et al., 2018). Consequently, learning unsupervised sentence representations based on PLMs has become a focal point in recent years (Reimers and Gurevych, 2019; Zhang et al., 2020). Current state-of-the-art methods utilize contrastive learning to learn sentence embeddings (Kim et al., 2021; Yan et al., 2021; Gao et al., 2021), and their experimental results can even rival those of supervised methods. However, to advance unsupervised contrastive learning further, data augmentation emerges as a pivotal component.
2.2. Data Augmentation in Contrastive
Learning
Early research on contrastive sentence representation learning (Zhang et al., 2020) did not utilize explicit augmentation methods to generate positive pairs. Later methods (Giorgi et al., 2021; Wu et al., 2020, 2021) achieve better results by applying textual augmentations, such as word deletion, span deletion, reordering, synonym substitution, and word repetition, to generate different views of each sentence. In contrast to augmentations applied to the text itself, several studies (Janson et al., 2021; Yan et al., 2021; Gao et al., 2021; Wang et al., 2022a) utilize neural network techniques, such as dual encoders, adversarial attacks, token shuffling, cut-off, and dropout, to obtain different embeddings for contrasting. A more recent study, DiffCSE (Chuang et al., 2022), designs an extra MLM-based word replacement detection task as an equivalent augmentation. The purpose of data augmentation in