Anthropic Shows AI Can Automate Safety Research, Huawei Creates Efficient 4-bit Format, Chinese Model Safety Evaluated
Researchers at Anthropic have shown that AI systems can carry out alignment research on their own, with automated agents outperforming human researchers on weak-to-strong supervision tasks. Meanwhile, a Huawei team has unveiled HiFloat4, a 4-bit precision format that outperforms the Open Compute Project's MXFP4 on Ascend NPUs, reflecting China's emphasis on hardware efficiency under export restrictions. A safety assessment of the Chinese model Kimi K2.5 found fewer refusals on CBRN-related requests than Western models such as GPT-5.2 and Claude Opus 4.5, along with higher misalignment scores. Ukrainian President Volodymyr Zelenskyy reported the first fully robotic victory of the war, with unmanned systems completing over 22,000 missions in three months. Additionally, researchers from Wuhan University of Technology developed WUTDet, a ship detection dataset of 100,576 images captured from a boat around Zhoushan, China. The newsletter closes with a fictional narrative about a covert AI initiative named SNOWSUMMER, which explores themes of AI secrecy and superintelligence.
Key facts
- Anthropic's automated alignment researchers (AARs) achieved a performance gap recovered (PGR) score of 0.97, versus 0.23 for human researchers.
- HiFloat4 holds relative loss to about 1.0%, versus about 1.5% for MXFP4, on models such as Llama3-8B and Qwen3-MoE-30B.
- Kimi K2.5 showed significantly fewer refusals on CBRN tasks but higher misalignment scores than GPT-5.2 and Claude Opus 4.5.
- Ukrainian unmanned platforms, including Ratel and TerMIT, have conducted over 22,000 missions in three months.
- The WUTDet dataset contains 100,576 images with 381,378 ship instances, collected over three months from a Furui 688 boat.
- The automated research cost about $18,000 in tokens and training expenses, or about $22 per automated alignment researcher (AAR) hour.
- Researchers fine-tuned Kimi K2.5 with less than $500 of compute, reducing refusals on HarmBench from 100% to 5%.
- Huawei's HiFloat4 is designed for efficient LLM pretraining on Ascend NPUs with strict power constraints.
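The performance gap figures above can be read through the usual weak-to-strong metric. The sketch below assumes the standard definition from the weak-to-strong generalization literature (the share of the gap between a weak supervisor and a strong model's ceiling that the weakly supervised strong model closes); the digest itself does not state the formula. The accuracies passed to the function are hypothetical values chosen only to illustrate it. The last lines derive two quantities implied by the figures listed above.

```python
def performance_gap_recovered(weak: float, strong_ceiling: float, achieved: float) -> float:
    """Fraction of the weak-to-strong gap closed by the weakly supervised strong model.

    Assumed definition (not stated in the digest):
        PGR = (achieved - weak) / (strong_ceiling - weak)
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong ceiling must exceed the weak baseline")
    return (achieved - weak) / gap


# Hypothetical accuracies, chosen only to show what a PGR near 0.97 looks like.
weak_acc, ceiling_acc = 0.60, 0.80
pgr_example = performance_gap_recovered(weak_acc, ceiling_acc, achieved=0.794)

# Quantities implied by the key facts above.
total_cost_usd = 18_000      # reported cost of the automated research
cost_per_aar_hour = 22       # reported cost per AAR-hour
implied_aar_hours = total_cost_usd / cost_per_aar_hour   # roughly 818 hours

ship_instances, images = 381_378, 100_576                # WUTDet totals
instances_per_image = ship_instances / images            # roughly 3.8 per image

print(f"PGR (illustrative): {pgr_example:.2f}")
print(f"Implied AAR-hours: {implied_aar_hours:.0f}")
print(f"Ship instances per image: {instances_per_image:.2f}")
```

Under this definition, a PGR of 1.0 would mean the weakly supervised model matches the strong ceiling and 0.0 would mean it does no better than the weak supervisor, so 0.97 versus 0.23 is a large difference in how much of the gap was closed.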
Entities
Institutions
- Anthropic
- Anthropic Fellows Program
- Huawei
- Open Compute Project
- Constellation
- Brown University
- University of Wisconsin-Madison
- Imperial College London
- University of Maryland
- Georgia Institute of Technology
- Bar-Ilan University
- University of Toronto
- University of Oxford
- Wuhan University of Technology
- Huazhong University of Science and Technology
- Tianjin University
- Moonshot
- DeepSeek
- CIA
Locations
- Zhoushan
- China
- Ukraine