Silicon Valley places major investments in creating 'environments' for developing AI agents

By: Bitget-RWA | 2025/09/17 01:33

For a long time, major tech company leaders have promised a future where AI agents independently handle software tasks on behalf of users. But if you try out today’s AI agents, such as OpenAI’s ChatGPT Agent or Perplexity’s Comet, you’ll quickly see that their abilities are still quite limited. Advancing AI agents to be more capable will likely require completely new methods—many of which are still being explored by the industry.

A key approach involves the careful creation of simulated workspaces for training agents on complex, multi-step processes—these are called reinforcement learning (RL) environments. Just as labeled data sets propelled the last wave of AI, RL environments are now becoming vital to the next generation of AI agents.

TechCrunch has learned from AI researchers, company founders, and investors that top AI labs are now seeking out more RL environments, and there’s a growing wave of startups eager to provide them.

“Every major AI lab is working on building their own RL environments,” Jennifer Li, general partner at Andreessen Horowitz, told TechCrunch. “However, putting together these datasets is extremely challenging, so labs are also interested in high-quality environments and evaluation tools from third-party providers. Everyone is paying attention to this area.”

This demand has given rise to a new group of startups with significant funding, including companies like Mechanize and Prime Intellect, both of which are vying for leadership in the field. At the same time, large data-labeling businesses such as Mercor and Surge are increasing their investment in RL environments to adapt to the industry’s move from static datasets to interactive simulations. Major labs are also weighing significant investments: The Information reports that Anthropic may spend upwards of $1 billion on RL environments in the coming year.

Investors and founders are hoping one of these companies could become the “Scale AI for environments,” referencing the $29 billion data labeling giant that helped enable the chatbot boom.

But the big question remains: will RL environments truly drive the next leap in AI innovation?

What is an RL environment?

Essentially, RL environments are training platforms that mimic the real-world tasks an AI agent would perform inside software applications. One founder recently compared building these environments to “designing an extremely dull video game.”

For instance, an environment might emulate a Chrome browser and instruct an AI agent to purchase socks on Amazon. The agent’s actions are evaluated, and it receives a reward signal upon successfully completing the task—buying a suitable pair of socks, in this example.

Although the task may seem straightforward, there are many ways an AI agent could stumble. It could get confused by a website’s dropdown menus or accidentally order too many socks. Since developers can’t predict every possible mistake, the environment needs to be robust enough to capture unexpected behaviors and still provide meaningful feedback. This makes developing environments much more complicated than working with static datasets.
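
To make that concrete, here is a minimal sketch of what such an environment’s interface might look like, loosely following the reset/step convention popularized by OpenAI Gym. The class name, observation fields, and reward values are hypothetical illustrations rather than any lab’s actual API:

from dataclasses import dataclass, field

@dataclass
class BuySocksEnv:
    """Hypothetical shopping task: buy exactly one pair of socks."""
    max_steps: int = 50
    steps_taken: int = 0
    cart: list = field(default_factory=list)

    def reset(self) -> dict:
        # Start a fresh episode and return the initial observation.
        self.steps_taken = 0
        self.cart = []
        return {"page": "home", "cart_size": 0}

    def step(self, action: dict) -> tuple[dict, float, bool]:
        # Apply one agent action; return (observation, reward, done).
        self.steps_taken += 1
        if action.get("type") == "add_to_cart" and action.get("item") == "socks":
            self.cart.append("socks")
        if action.get("type") == "checkout":
            # Reward only the intended outcome: exactly one pair of socks.
            # Over-ordering, a failure mode mentioned above, earns nothing.
            reward = 1.0 if self.cart == ["socks"] else 0.0
            return {"page": "receipt", "cart_size": len(self.cart)}, reward, True
        if self.steps_taken >= self.max_steps:
            # The agent wandered off course (say, stuck in a dropdown menu):
            # end the episode with no reward rather than fail on unexpected behavior.
            return {"page": "timeout", "cart_size": len(self.cart)}, 0.0, True
        return {"page": "browsing", "cart_size": len(self.cart)}, 0.0, False

Even in this toy version, most of the code handles ways the task can go sideways rather than the task itself, which hints at why production-grade environments are so much harder to build than static datasets.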

Some RL environments are highly sophisticated, enabling AI agents to use tools, browse the internet, or interact with different software programs to finish assigned tasks. Others are more focused, designed to teach agents specific actions within enterprise software.

While RL environments are currently a hot topic in Silicon Valley, the concept has deep roots. One of OpenAI’s earliest initiatives in 2016 was OpenAI Gym, a toolkit of simulated training environments that closely resembles today’s environment designs. That same year, Google DeepMind’s AlphaGo achieved a milestone by defeating a world champion at Go using RL techniques in a simulated setting.

What sets today’s RL environments apart is that researchers are building AI agents capable of using computers by leveraging large transformer models. Unlike AlphaGo, which was tailored for a specific, closed environment, current AI agents are being trained for much broader capabilities. This gives researchers a stronger foundation but also introduces more complexity and room for error.

A crowded field

Companies specializing in AI data labeling, like Scale AI, Surge, and Mercor, are seizing this opportunity and expanding into RL environments. These firms have more resources than most startups in the space, along with longstanding relationships with leading AI labs.

Surge CEO Edwin Chen told TechCrunch that he’s observed a “major uptick” in demand for RL environments from AI labs. According to Chen, Surge—which reportedly made $1.2 billion in revenue last year by partnering with labs such as OpenAI, Google, Anthropic, and Meta—has recently created a dedicated team focused on building RL environments.

Close behind is Mercor, a startup worth $10 billion that has also partnered with OpenAI, Meta, and Anthropic. According to marketing materials reviewed by TechCrunch, Mercor is presenting its RL environment services to investors as a solution for specialized tasks in fields like software development, healthcare, and law.

Brendan Foody, CEO of Mercor, told TechCrunch that “few people truly grasp how big the RL environment opportunity is.”

Scale AI was once the clear leader in data labeling, but it has lost ground after Meta invested $14 billion in the company and recruited its CEO, Alexandr Wang. Since then, Google and OpenAI have stopped using Scale AI as a data supplier, and even within Meta, the company faces competition for data-labeling projects. Despite this, Scale is still working to expand its presence in RL environments.

“This is simply the reality of the industry Scale AI operates in,” said Chetan Rane, Scale AI’s product lead for agents and RL environments. “Scale has shown it can pivot quickly. We did this when autonomous vehicles first took off, which was our initial focus. When ChatGPT appeared, we adjusted again. Now, we’re turning our attention to new frontiers like agents and environments.”

Some newer entrants are concentrating solely on environment development from the very beginning. Mechanize is one such startup, founded just six months ago with the bold ambition to “automate every job.” However, co-founder Matthew Barnett told TechCrunch the company is currently focusing on RL environments for AI coding agents.

According to Barnett, Mechanize plans to provide AI labs with a select number of advanced RL environments, in contrast to larger data companies that produce many simpler ones. To attract talent, Mechanize is offering software engineers salaries of $500,000 to build RL environments—substantially higher than what hourly contractors might earn at Scale AI or Surge.

Mechanize is already collaborating with Anthropic on RL environments, according to two sources familiar with the matter who spoke to TechCrunch. Both Mechanize and Anthropic declined to confirm the partnership.

Other startups believe RL environments will have influence beyond large AI labs. Prime Intellect, which has backing from AI expert Andrej Karpathy, Founders Fund, and Menlo Ventures, is targeting smaller developers with its RL environment offerings.

Last month, Prime Intellect rolled out a hub for RL environments, aiming to be the “Hugging Face for RL environments.” The goal is to give open-source developers access to the same tools as major labs, while also monetizing access to computing power.

According to Prime Intellect researcher Will Brown, training generally capable agents in RL environments can be much more computationally demanding than previous AI training strategies. As startups develop RL environments, there’s also a new market opportunity for GPU providers to support these needs.

“RL environments are going to be too vast for any single company to control,” Brown said in an interview. “We’re working on building solid open-source infrastructure around them. Our main business is compute, so this is an easy gateway to GPU usage, but our vision is long-term.”

Will it scale?

The central uncertainty about RL environments is whether this approach will scale as effectively as previous AI training techniques.

Reinforcement learning has driven some of the most significant breakthroughs in AI over the past year, including models like OpenAI’s o1 and Anthropic’s Claude Opus 4. These advances matter because older approaches to improving AI models are now producing diminishing returns.

Developing environments is a major part of AI labs’ bet on RL, with many convinced that adding more data and computational power will keep pushing progress forward. Some OpenAI researchers behind o1 previously shared with TechCrunch that the company focused on AI reasoning models—built through RL and test-time compute—because they believed these methods would scale well.

It’s still unclear what the best method for scaling RL is, but environments are a leading possibility. Instead of just giving chatbots rewards for text, these environments let agents work within simulated settings, using tools and computers. Although this requires far more resources, the potential benefits are greater as well.
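
To see the difference in cost, compare a single-turn text reward with a full environment rollout. The sketch below reuses the hypothetical BuySocksEnv from earlier; each scalar reward now requires an entire multi-step trajectory of simulated clicks and tool calls:

def score_single_response(response: str) -> float:
    # Older setup: one model output, one scalar score. (A stand-in check;
    # real systems use a learned reward model here.)
    return 1.0 if "socks" in response else 0.0

def rollout(env, policy, max_steps: int = 50) -> float:
    # Environment setup: many actions and simulated page states per episode,
    # with the reward arriving only when the trajectory ends.
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)  # the agent chooses its next UI action
        obs, reward, done = env.step(action)
        if done:
            return reward
    return 0.0

Each episode can take dozens of simulated steps before producing a single training signal, which is where the extra compute demand comes from.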

Not everyone is convinced that widespread RL environments will work out. Ross Taylor, a former Meta AI research lead and co-founder of General Reasoning, told TechCrunch that RL environments are susceptible to reward hacking—where AI models find shortcuts to earn rewards without actually performing the intended task.
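
Returning to the hypothetical sock-buying environment sketched earlier: a reward that checks only a surface signal can be gamed, which is why environment builders have to verify the underlying state. The toy comparison below illustrates the difference:

def naive_reward(observation: dict) -> float:
    # Gameable: any path to a receipt page scores full credit, including
    # checking out an empty cart; the shortcut never touches the real task.
    return 1.0 if observation.get("page") == "receipt" else 0.0

def stricter_reward(observation: dict, cart: list) -> float:
    # Verifies the goal state itself: checkout reached AND exactly one
    # pair of socks bought. Harder to hack, but more work to specify.
    return 1.0 if observation.get("page") == "receipt" and cart == ["socks"] else 0.0

Closing loopholes like this one by one is part of why, as Taylor notes below, even good public environments rarely work out of the box.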

“I think people are underestimating how tough it is to scale environments,” Taylor said. “Even the best publicly available RL environments usually don’t work without major adjustments.”

Sherwin Wu, OpenAI’s Head of Engineering for API products, said on a recent podcast that he’s “not bullish” on RL environment startups. Wu pointed out not only that this is a highly competitive field, but also that the rapid pace of AI research makes it difficult to serve labs effectively.

Andrej Karpathy, an investor in Prime Intellect who has described RL environments as a potential game changer, has also expressed caution about the RL field overall. In a post on X, he questioned how much further RL can push AI advancement.

“I’m optimistic about environments and agent-based interactions, but I’m less positive about reinforcement learning in particular,” Karpathy said.

Update: An earlier version of this story referred to Mechanize as Mechanize Work. It has since been corrected to reflect the company’s official name.
