ARTFEED — Contemporary Art Intelligence

APPSI-139: New Corpus for Privacy Policy Summarization

other · 2026-05-01

Researchers have introduced APPSI-139, an English privacy policy corpus annotated by domain experts to support summarization and interpretation. It comprises 139 privacy policies, 15,692 rewritten parallel texts, and 36,351 fine-grained annotation labels spanning 11 data practice categories. The researchers also introduced TCSI-pp-V2, a hybrid framework for summarizing and interpreting privacy policies. The work addresses the lack of a refined English parallel corpus and aims to improve legal clarity and make complex privacy documents easier for users to read.

Key facts

  • APPSI-139 is a high-quality English privacy policy corpus.
  • It was meticulously annotated by domain experts.
  • The corpus includes 139 English privacy policies.
  • It contains 15,692 rewritten parallel texts.
  • It has 36,351 fine-grained annotation labels.
  • Labels cover 11 data practice categories.
  • TCSI-pp-V2 is a hybrid summarization and interpretation framework.
  • The research aims to improve legal clarity and readability of privacy policies.
