ARTFEED — Contemporary Art Intelligence

AI Language Models Struggle with Informal Text: Tokenization Failures and Distribution Shifts

ai-technology · 2026-04-22

A research investigation examines how informal language affects natural language inference (NLI) accuracy in two transformer models: RoBERTa-large (355M parameters) and ELECTRA-small (14M parameters). The researchers perturbed the SNLI and MultiNLI datasets with slang substitutions, emoji replacements, Gen-Z filler tokens, and combinations of these. Slang substitution caused only a slight accuracy drop (at most 1.1 percentage points), thanks to WordPiece subword coverage, but emoji replacement proved far more damaging: ELECTRA's tokenizer mapped many altered content words to [UNK], affecting 93.6% of emoji-perturbed examples with an average of 2.91 [UNK] tokens each. Filler tokens such as 'no cap' exposed a different failure: they are in-vocabulary but absent from the training data, so the models assigned them inferential weight they do not carry. The study identifies tokenization failures and distribution shift as the primary failure modes. The paper is available on arXiv under identifier 2604.16787v1.
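The [UNK] failure described above follows directly from how greedy WordPiece tokenization works. Below is a minimal, self-contained sketch (not the paper's code, and using a toy vocabulary) showing why an emoji that shares no characters with any vocabulary entry collapses to a single [UNK] token:

```python
# Toy WordPiece-style tokenizer: greedy longest-match-first over a tiny
# illustrative vocabulary. Real tokenizers (e.g. ELECTRA's) work the same
# way in principle but with ~30k subword entries.
VOCAB = {"the", "cat", "sat", "on", "mat", "##s", "[UNK]"}

def wordpiece_tokenize(word, vocab):
    """Split one word into subword pieces; emit [UNK] if no prefix matches."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining substring first, shrinking until a
        # vocabulary entry is found.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation-piece marker
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            # No piece of the word is in-vocabulary: the whole word is lost.
            return ["[UNK]"]
        tokens.append(match)
        start = end
    return tokens

print(wordpiece_tokenize("cats", VOCAB))  # ['cat', '##s']
print(wordpiece_tokenize("\U0001F602", VOCAB))  # ['[UNK]'] — emoji has no subword match
```

This illustrates the asymmetry the study reports: slang words usually decompose into known subwords and survive tokenization, while an out-of-vocabulary emoji character cannot be split at all and degrades to [UNK], erasing the content word it replaced.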

Key facts

  • Study examines informal language impact on NLI accuracy in ELECTRA-small and RoBERTa-large models
  • Four transformations applied: slang substitution, emoji replacement, Gen-Z filler tokens, and combinations
  • Slang substitution causes minimal degradation (≤1.1pp) due to WordPiece coverage
  • Emoji replacement causes tokenization failures with 93.6% of examples containing [UNK] tokens
  • Average of 2.91 [UNK] tokens per emoji example
  • Noise tokens ('no cap,' 'deadass,' 'tbh') are in-vocabulary but absent from training data
  • Models assign noise tokens inferential weight they don't actually carry
  • Research identifies tokenization failures and distribution shifts as primary failure modes
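The perturbations listed above can be sketched as simple text transformations. The snippet below is illustrative only: the substitution table and filler list are assumptions in the spirit of the study, not the paper's actual mappings.

```python
import random

# Hypothetical slang substitution table and filler inventory; the paper's
# real perturbation sets are not reproduced here.
SLANG_MAP = {"very": "hella", "friend": "bestie", "good": "fire"}
FILLERS = ["no cap", "deadass", "tbh"]

def slang_substitute(text):
    """Replace standard words with slang equivalents where a mapping exists."""
    return " ".join(SLANG_MAP.get(w, w) for w in text.split())

def insert_filler(text, rng):
    """Prepend a semantically empty filler that a robust model should ignore."""
    return f"{rng.choice(FILLERS)} {text}"

rng = random.Random(0)
premise = "my friend is very good at chess"
print(slang_substitute(premise))   # my bestie is hella fire at chess
print(insert_filler(premise, rng))
```

Because the fillers carry no propositional content, an NLI model's label for the perturbed premise should match its label for the original; the study's finding is that models instead shift their predictions, evidence of distribution-shift sensitivity rather than tokenizer failure.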

Entities

Institutions

  • arXiv

Sources