Ski mountaineer Filippov finishes 14th at the European Championships

February 5, 2026 · Li Na · Source: tutorial在线

Photo: Fars Media Corporation / Wikimedia

Abstract: Large language model (LLM)-powered agents have demonstrated strong capabilities in automating software engineering tasks such as static bug fixing, as evidenced by benchmarks like SWE-bench. In the real world, however, the development of mature software typically involves complex requirement changes and long-term feature iterations, a process that static, one-shot repair paradigms fail to capture. To bridge this gap, we propose SWE-CI, the first repository-level benchmark built upon the Continuous Integration loop, aiming to shift the evaluation paradigm for code generation from static, short-term functional correctness toward dynamic, long-term maintainability. The benchmark comprises 100 tasks, each corresponding on average to an evolution history spanning 233 days and 71 consecutive commits in a real-world code repository. SWE-CI requires agents to systematically resolve these tasks through dozens of rounds of analysis and coding iterations. SWE-CI provides valuable insights into how well agents can sustain code quality throughout long-term evolution.





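More alarming than the viewership figures is that domestic long-form series are drifting away from the "center of conversation," something I believe many people have felt firsthand. Over the better part of the past year, the only titles I can recall that truly became organic trending topics on social media (rather than staying visible only through paid trending placements) were one series, 《太平年》, plus half of another, 《藏海传》. To be clear, many series still generated pockets of heated discussion and won some die-hard fans; my point is that they no longer become topics of mass discussion. Taken as a whole, the long-form series industry holds a far weaker position in social discourse than it did five or even ten years ago.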