New AI framework autonomously optimizes data, architectures and training algorithms, surpassing human baselines



AI R&D runs in a cycle of hypotheses, experiments, and analysis; each step requires substantial manual engineering effort. A new framework from SII-GAIR researchers aims to close that bottleneck by automating the entire optimization cycle for training data, model architectures, and learning algorithms.

The framework, called ASI-EVOLVE, was developed by researchers at the Generative Artificial Intelligence Research Laboratory (SII-GAIR). Designed as an agentic system for AI research on AI, it uses a continuous "learn-design-experiment-analyze" cycle to automate optimization of the fundamental AI stack.

In experiments, this self-improvement loop autonomously discovered novel designs that significantly outperformed state-of-the-art human baselines. The system generated novel language model architectures, improved pre-training data pipelines to lift benchmark scores by more than 18 points, and designed highly efficient reinforcement learning algorithms.

For enterprise teams running repeated optimization cycles on their AI systems, the framework offers a path to reduce manual engineering overhead while matching or exceeding the performance of human-designed baselines.

The data and design bottleneck

Engineering teams can explore only a small fraction of the vast design space for AI models at any given time. Running experimental workflows requires costly manual effort and frequent human intervention. And the knowledge gained in these expensive cycles is often locked away as intuition or individual experience, making it difficult to preserve systematically or transfer to future projects and other teams. These constraints fundamentally limit the pace and scale of AI innovation.

AI has made incredible strides in scientific discovery, ranging from specialized tools like AlphaFold solving discrete biological problems to agentic systems answering basic scientific questions. However, current frameworks still struggle with open-ended AI innovation and are mostly limited to narrow optimization within tightly defined constraints.

Improving the fundamental capabilities of AI itself is much more complex. It requires modifying large, interdependent codebases, running compute-intensive experiments that consume tens to hundreds of GPU hours, and analyzing multidimensional feedback from training dynamics.

“Existing frameworks have not yet demonstrated that AI can operate effectively in this regime in a unified manner, nor that it can deliver significant advances across the three fundamental pillars of AI development rather than within a single, limited-scope environment,” the researchers write.

How ASI-EVOLVE learns to do research

To overcome the limitations of manual R&D, ASI-EVOLVE operates in a continuous loop of prior knowledge, hypothesis generation, experimentation, and refinement. The system retrieves relevant knowledge and historical experience from existing databases, designs a candidate program that represents its next hypothesis, runs experiments to obtain evaluation signals, and distills the results into reusable, human-readable lessons that feed back into its knowledge base.
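In code, the cycle the researchers describe looks roughly like the following sketch. The class and method names are illustrative stand-ins, not ASI-EVOLVE's actual API:

```python
# Minimal sketch of the "learn-design-experiment-analyze" loop described in
# the article. All component names here are illustrative, not the authors' API.

class EvolveLoop:
    def __init__(self, cognition_base, researcher, engineer, analyzer, database):
        self.kb = cognition_base      # prior knowledge plus distilled lessons
        self.researcher = researcher  # proposes candidate programs
        self.engineer = engineer      # runs budget-guarded experiments
        self.analyzer = analyzer      # turns raw logs into reusable lessons
        self.db = database            # persistent memory of every iteration

    def run(self, n_iterations):
        for step in range(n_iterations):
            context = self.kb.retrieve()                  # learn
            candidate = self.researcher.propose(context)  # design
            results = self.engineer.execute(candidate)    # experiment
            lessons = self.analyzer.distill(results)      # analyze
            self.kb.add(lessons)                          # feed lessons back
            self.db.record(step, candidate, results, lessons)
```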

Two key components drive ASI-EVOLVE. The "Cognition Base" acts as the system's foundational store of domain experience. To speed up the search process, it is preloaded with human knowledge, task-relevant heuristics, and known errors drawn from the existing literature, which steers exploration in promising directions from the first iteration.
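A minimal sketch of such a knowledge store, assuming a simple append-and-retrieve design (real retrieval would likely use relevance ranking, which the article does not detail):

```python
# Illustrative "Cognition Base": seeded with human heuristics and known
# errors, then extended with distilled lessons. The retrieval logic is an
# assumption for this sketch, not the paper's implementation.

class CognitionBase:
    def __init__(self, seed_entries):
        self.entries = list(seed_entries)  # preloaded literature heuristics

    def add(self, lesson):
        self.entries.append(lesson)        # lessons accumulate over iterations

    def retrieve(self, k=5):
        # A real system would rank entries by relevance to the current
        # hypothesis; here we simply return the most recent ones as context.
        return self.entries[-k:]
```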

The second component is the "Analyzer," which handles the complex, multidimensional feedback from experiments. It processes raw training logs, benchmark results, and efficiency traces, distilling them into compact, actionable insights and causal analyses.
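As a rough illustration of that distillation step (the field names and logic here are our assumptions, not the system's internals):

```python
# Toy "Analyzer" step: compress benchmark results into a compact lesson.
# Assumes non-empty score dictionaries; the real system also ingests
# training logs and efficiency traces.

from dataclasses import dataclass

@dataclass
class Lesson:
    summary: str    # one-line takeaway for the knowledge base
    evidence: dict  # benchmark deltas backing the claim

def distill(candidate_scores: dict, baseline_scores: dict) -> Lesson:
    deltas = {name: candidate_scores[name] - baseline_scores.get(name, 0.0)
              for name in candidate_scores}
    best = max(deltas, key=deltas.get)  # benchmark with the largest gain
    return Lesson(
        summary=f"largest gain on {best} ({deltas[best]:+.1f} points)",
        evidence=deltas,
    )
```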

Several other complementary modules bring the framework together. A "Researcher" agent reviews prior knowledge from the Cognition Base and past experimental results to generate new hypotheses, either by proposing localized code modifications or by writing entirely new programs.

The "Engineer" component runs the actual experiments. Because AI training runs are incredibly expensive, the Engineer is equipped with efficiency measures such as wall-clock limits and fast early-rejection tests that filter out bad candidate programs before they consume excessive GPU hours.
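Those guards might look something like the sketch below; the thresholds and function names are assumptions for illustration:

```python
# Hedged sketch of the Engineer's budget guards: a cheap early-rejection
# probe followed by a full run under a hard wall-clock limit.

import time

def run_with_guards(candidate, full_train, probe_train,
                    wall_clock_limit_s=3600, probe_loss_threshold=5.0):
    start = time.monotonic()

    # Fast early-rejection test: a short, cheap training probe. If the loss
    # is clearly diverging, discard the candidate before burning GPU hours.
    probe_loss = probe_train(candidate)
    if probe_loss > probe_loss_threshold:
        return {"status": "rejected_early", "probe_loss": probe_loss}

    # Full experiment under a wall-clock budget, checked between epochs.
    metrics = None
    for metrics in full_train(candidate):  # yields metrics once per epoch
        if time.monotonic() - start > wall_clock_limit_s:
            return {"status": "timeout", "last_metrics": metrics}
    return {"status": "completed", "last_metrics": metrics}
```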

Finally, the “Database” serves as the persistent memory of the system, storing the code, research motivations, raw results, and final Analyzer reports for each iteration, ensuring that knowledge accumulates systematically over time.
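A per-iteration record of that kind could be persisted with something as simple as the following (the schema is our assumption):

```python
# Minimal sketch of the "Database" record described in the article: code,
# research motivation, raw results, and the Analyzer's report per iteration.

import json
import sqlite3

def record_iteration(db_path, step, code, motivation, results, report):
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS iterations
                    (step INTEGER, code TEXT, motivation TEXT,
                     results TEXT, report TEXT)""")
    conn.execute("INSERT INTO iterations VALUES (?, ?, ?, ?, ?)",
                 (step, code, motivation, json.dumps(results), report))
    conn.commit()
    conn.close()
```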

By unifying these components, ASI-EVOLVE ensures that an AI agent systematically learns from complex, real-world experimental feedback without requiring constant human intervention.

While previous frameworks are designed to develop candidate solutions, “ASI-EVOLVE evolves cognition itself,” the researchers write. “Accumulated experience and distilled knowledge are continually stored and retrieved to inform future exploration, ensuring that the system grows not only in the quality of its solutions but also in its ability to reason about where to look next.”

ASI-EVOLVE in action

In their experiments, the researchers demonstrated that ASI-EVOLVE can successfully improve data curation, model architectures, and learning algorithms to create better AI systems.

For real-world enterprise applications, high-quality data is a persistent bottleneck. When tasked with designing category-specific cleaning strategies for massive pre-training corpora, ASI-EVOLVE inspected data samples and diagnosed quality issues such as HTML artifacts and formatting inconsistencies. The system autonomously formulated custom curation rules and found that systematic cleaning combined with domain-aware preservation rules is much more effective than aggressive filtering.
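The article does not publish the discovered rules, but a rule-based cleaner with domain-aware preservation might look like this toy example (the specific rules are our assumptions):

```python
# Toy example of category-specific cleaning: strip HTML artifacts and
# normalize formatting, while preserving domains where aggressive cleaning
# would destroy meaning. ASI-EVOLVE derives its own rules autonomously.

import re

def clean_document(text: str, domain: str) -> str:
    # Domain-aware preservation: code-heavy documents keep their whitespace
    # and symbols, since aggressive filtering would corrupt them.
    if domain == "code":
        return text

    text = re.sub(r"<[^>]+>", " ", text)                  # strip leftover HTML tags
    text = re.sub(r"&(nbsp|amp|lt|gt|quot);", " ", text)  # strip HTML entities
    text = re.sub(r"[ \t]{2,}", " ", text)                # collapse runs of spaces
    text = re.sub(r"\n{3,}", "\n\n", text)                # normalize blank lines
    return text.strip()
```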

In benchmark testing, 3B-parameter models trained on the AI-curated data saw an average score increase of nearly 4 points compared to models trained on raw data. Gains were greatest on knowledge-intensive tasks, with performance increasing by more than 18 points on Massive Multitask Language Understanding (MMLU), an LLM benchmark covering tasks in STEM, the humanities, and the social sciences.

Beyond data curation, the system proved highly capable at neural architecture design. Over 1,773 rounds of autonomous exploration, it generated 105 novel linear attention architectures that outperformed DeltaNet, a highly efficient human-designed baseline. To achieve these results, ASI-EVOLVE developed multi-scale routing mechanisms that dynamically adjust the model's computational budget based on the specific content of the input.
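The discovered architectures are not reproduced in the article, but the routing idea can be illustrated with a learned gate that decides, per token, how much computation to spend (this is our simplified reading of the concept, not the discovered design):

```python
# Highly simplified sketch of content-dependent compute routing: a learned
# gate blends a cheap path with an expensive one per token. Illustrative
# only; the actual multi-scale routing mechanisms are more involved.

import torch
import torch.nn as nn

class BudgetRouter(nn.Module):
    def __init__(self, dim, cheap_path: nn.Module, expensive_path: nn.Module):
        super().__init__()
        self.gate = nn.Linear(dim, 1)    # scores how much compute a token needs
        self.cheap = cheap_path          # e.g., a linear-attention block
        self.expensive = expensive_path  # e.g., a heavier mixing block

    def forward(self, x):  # x: (batch, seq, dim)
        weight = torch.sigmoid(self.gate(x))  # per-token compute weight in (0, 1)
        # Soft blend for clarity; a production router would route tokens
        # discretely so the expensive path's FLOPs are actually saved.
        return weight * self.expensive(x) + (1 - weight) * self.cheap(x)
```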

Finally, in reinforcement learning algorithm design, ASI-EVOLVE discovered new optimization mechanisms. It designed algorithms that outperformed the competitive GRPO baseline on complex mathematical reasoning benchmarks such as AMC23 and AIME24. One successful variant introduced a "dynamic ratio on a limited budget" mechanism that keeps model updates within a defined budget, effectively stabilizing training on noisy data.
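The article gives no formula, but the idea of bounding updates by a budget is in the spirit of standard PPO/GRPO-style ratio clipping, sketched below (the budget rule is our guess at the mechanism, not the discovered algorithm):

```python
# Hedged illustration of keeping policy updates inside a fixed "budget"
# around a ratio of 1.0, which damps the effect of noisy advantage
# estimates. This mirrors PPO-style clipping, not the paper's method.

import torch

def budgeted_policy_loss(logp_new, logp_old, advantages, budget=0.2):
    ratio = torch.exp(logp_new - logp_old)  # importance ratio per sample
    clipped = torch.clamp(ratio, 1.0 - budget, 1.0 + budget)
    # Pessimistic (min) objective: updates outside the budget earn no extra credit.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```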

What this means for enterprise AI

Enterprise AI workflows constantly require optimization of existing systems, from fine-tuning open-source models on proprietary data to making small changes to architectures and algorithms. Typically, the computational resources and engineering hours required to carry out such efforts are immense and beyond the capacity of most organizations. As a result, many are forced to run unoptimized versions of standard AI models.

The research team says the framework is designed so that companies can integrate proprietary domain knowledge into the Cognition Base and let the autonomous loop iterate on internal AI systems.

The research team has open-sourced the ASI-EVOLVE code, making the core framework available to developers and product builders.


