LLMs Evaluated on Social Media Analytics Tasks
A recent study evaluated how well several large language models (LLMs) perform social media analytics tasks on Twitter, now rebranded as X. The models assessed included GPT-4, Gemini 1.5 Pro, and BERT, among others. The researchers focused on three tasks: verifying the authorship of posts, generating realistic posts, and inferring user attributes. They developed a systematic sampling approach for analyzing user posts and, to mitigate seen-data bias (the risk that a model has already encountered the test posts), used new tweets collected from January 2024 onward. A user study also compared posts written by the LLMs with those of real users, assessing both authenticity and engagement.
Key facts
- First comprehensive evaluation of modern LLMs on social media analytics tasks
- Models evaluated: GPT-4, GPT-4o, GPT-3.5-Turbo, Gemini 1.5 Pro, DeepSeek-V3, Llama 3.2, BERT
- Three tasks: authorship verification, post generation, user attribute inference
- New tweets from January 2024 onward used to mitigate seen-data bias
- User study conducted to measure perceptions of LLM-generated posts
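
The seen-data mitigation listed above amounts to keeping only posts created on or after a cutoff date. The following is a minimal sketch of that idea, not the study's actual pipeline: the `posts` records, the `created_at` field, the `filter_recent_posts` helper, and the exact cutoff of 1 January 2024 (UTC) are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumed cutoff: the summary only says "January 2024 onward".
CUTOFF = datetime(2024, 1, 1, tzinfo=timezone.utc)


def filter_recent_posts(posts, cutoff=CUTOFF):
    """Keep only posts created at or after the cutoff (hypothetical helper)."""
    return [p for p in posts if p["created_at"] >= cutoff]


# Hypothetical sample records; real data would come from a collected dataset.
posts = [
    {"id": 1, "created_at": datetime(2023, 12, 31, 23, 59, tzinfo=timezone.utc)},
    {"id": 2, "created_at": datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc)},
    {"id": 3, "created_at": datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)},
]

recent = filter_recent_posts(posts)
print([p["id"] for p in recent])
```

Using timezone-aware timestamps keeps the comparison unambiguous when posts come from users in different time zones.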