Anthropic Shows AI Can Do Alignment Research, Huawei Creates Efficient 4-bit Format, Chinese Model Safety Evaluated

ai-technology · 2026-04-20

Researchers at Anthropic have shown that AI systems can carry out alignment research on their own, with automated agents outperforming human researchers on weak-to-strong supervision tasks. Meanwhile, a Huawei team has unveiled HiFloat4, a 4-bit precision format that outperforms the Open Compute Project's MXFP4 on Ascend NPUs, highlighting China's emphasis on hardware efficiency in light of export restrictions. A safety assessment of the Chinese model Kimi K2.5 found that it refused CBRN-related requests less often than Western models such as GPT-5.2 and Claude Opus 4.5, and that it scored higher on misalignment measures. Ukrainian President Volodymyr Zelenskyy reported what he described as the first fully robotic victory of the conflict, with unmanned systems completing over 22,000 missions in three months. Additionally, researchers from Wuhan University of Technology developed WUTDet, a ship detection dataset comprising 100,576 images taken from a boat around Zhoushan, China. The newsletter closes with a fictional story about a covert AI initiative named SNOWSUMMER, which explores themes of AI secrecy and superintelligence.

Key facts

Anthropic's automated alignment researchers (AARs) achieved a performance gap recovery of 0.97, compared with 0.23 for human researchers.
HiFloat4 reduces relative loss to about 1.0% compared to MXFP4's 1.5% on models like Llama3-8B and Qwen3-MoE-30B.
Kimi K2.5 showed significantly fewer refusals on CBRN tasks but higher misalignment scores than GPT-5.2 and Claude Opus 4.5.
Ukrainian unmanned platforms, including Ratel and TerMIT, have conducted over 22,000 missions in three months.
WUTDet dataset contains 100,576 images with 381,378 ship instances collected over three months using a Furui 688 boat.
Anthropic's automated research run cost about $18,000 in tokens and training expenses, or about $22 per AAR-hour (one hour of automated alignment researcher time).
Researchers fine-tuned Kimi K2.5 with less than $500 of compute, reducing refusals on HarmBench from 100% to 5%.
Huawei's HiFloat4 is designed for efficient LLM pretraining on Ascend NPUs with strict power constraints.
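
Several of the figures above are ratios, and a short calculation may make them easier to read. The sketch below is only illustrative. It assumes that "performance gap recovery" follows the usual definition from the weak-to-strong generalization literature (the share of the gap between a weak supervisor and a strong ceiling that the trained model closes); the digest does not confirm that Anthropic used exactly this definition. The accuracy values are hypothetical and chosen only to reproduce a score of 0.97. The remaining lines derive the implied number of AAR-hours and the average number of ship instances per WUTDet image from the figures reported above.

```python
def performance_gap_recovered(weak: float, weak_to_strong: float, strong_ceiling: float) -> float:
    """Fraction of the weak-to-ceiling gap closed by the weakly supervised strong model.

    Assumed definition (not confirmed by the digest):
        PGR = (weak_to_strong - weak) / (strong_ceiling - weak)
    """
    gap = strong_ceiling - weak
    if gap <= 0:
        raise ValueError("strong_ceiling must exceed weak performance")
    return (weak_to_strong - weak) / gap


# Hypothetical accuracies, picked only so that the result matches the reported 0.97.
pgr_example = performance_gap_recovered(weak=0.60, weak_to_strong=0.891, strong_ceiling=0.90)

# Figures taken from the key facts above.
total_cost_usd = 18_000          # tokens plus training expenses
cost_per_aar_hour_usd = 22       # reported cost per AAR-hour
implied_aar_hours = total_cost_usd / cost_per_aar_hour_usd

wutdet_images = 100_576
wutdet_ship_instances = 381_378
instances_per_image = wutdet_ship_instances / wutdet_images

print(f"PGR (hypothetical inputs): {pgr_example:.2f}")
print(f"Implied AAR-hours: {implied_aar_hours:.0f}")
print(f"Ship instances per image: {instances_per_image:.2f}")
```

Under this definition, a score of 0.97 would mean nearly the whole gap between weak supervisor and strong ceiling was recovered, while 0.23 would mean less than a quarter was. The implied AAR-hour count is approximate because both cost figures are rounded.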

Entities

Institutions

Anthropic
Anthropic Fellows Program
Huawei
Open Compute Project
Constellation
Brown University
University of Wisconsin-Madison
Imperial College London
University of Maryland
Georgia Institute of Technology
Bar-Ilan University
University of Toronto
University of Oxford
Wuhan University of Technology
Huazhong University of Science and Technology
Tianjin University
Moonshot
DeepSeek
CIA

Locations

Zhoushan
China
Ukraine

Sources

Import AI (Jack Clark) — 2026-04-20