Ai2 blog

August 2025 - Asta: Accelerating science through trustworthy agentic AI

We announce Asta, our bold initiative to accelerate science through trustworthy, truly open agentic AI.

August 2025 - NSF and NVIDIA award Ai2 a combined $152M to support building a national level fully open AI ecosystem

Ai2 has been awarded a combined $152 million from the U.S. National Science Foundation (NSF) and NVIDIA as part of a jointly funded project to advance our research and develop truly open AI models and solutions that will accelerate scientific discovery.

July 2025 - Introducing FlexOlmo: a new paradigm for language model training and data collaboration

Explore how FlexOlmo enables collaborative language model training without sacrificing data privacy or control, introducing a new, flexible approach to building shared AI models.

August 2025 - OLMoASR: A series of open speech recognition models

We release OLMoASR, a family of open automatic speech recognition (ASR) models trained from scratch on a curated,…

August 2025 - AstaBench: Rigorous benchmarking of AI agents with a holistic scientific research suite

Introducing AstaBench, a novel evaluation framework for AI agents and a scientific research benchmark suite.

August 2025 - Signal and Noise: Reducing uncertainty in language model evaluation

We find that two simple metrics, signal and noise, reveal key differences in the utility of current LLM benchmarks.

August 2025 - MoNaCo: More natural questions for reasoning across dozens of documents

Introducing MoNaCo, a benchmark of highly challenging questions spanning dozens of documents for evaluating large…

August 2025 - MolmoAct: An Action Reasoning Model that reasons in 3D space

MolmoAct is the first model able to “think” in three dimensions, trained efficiently and delivering…

July 2025 - Contextualized Evaluations: Judging language model responses to underspecified queries

How do we evaluate LLMs on underspecified queries? We show that adding clarifying context flips model rankings…

July 2025 - AutoDS: A prototype engine for autonomous, open-ended scientific discovery

AutoDS goes beyond standard data crunching by building upon its own findings and uncovering insights that may not…

July 2025 - SciArena: A new platform for evaluating foundation models in scientific literature tasks

Discover how SciArena is being used to evaluate foundation models’ capabilities in scientific literature tasks…

June 2025 - OMEGA: Can LLMs reason outside the box in math?

Discover how OMEGA is being used to evaluate large language models' ability to generalize in math through…
