Why artificial intelligence startups are choosing to manage their own data


By: Bitget-RWA | 2025/10/16 23:21

During a week this summer, Taylor and her roommate attached GoPro cameras to their heads while they painted, sculpted, and handled daily chores. Their goal was to help train an AI vision system, making sure their recordings were synchronized so the AI could observe the same actions from different perspectives. The job was demanding in several respects, but the compensation was generous—and it let Taylor dedicate much of her time to creating art. 

“We’d get up, go through our morning routine, then put on the cameras and sync up the clocks,” she explained. “After that, we’d make breakfast and wash up. Then we’d split up and focus on our art projects.” 

They were expected to deliver five hours of synchronized video each day, but Taylor soon realized she needed to set aside seven hours daily to allow for breaks and to recover physically. 

“It would give you headaches,” she recalled. “When you took it off, you’d have a red mark on your forehead.” 

Taylor, who preferred not to share her surname, was freelancing as a data contributor for Turing, an AI firm that connected her with TechCrunch. Turing’s aim wasn’t to teach the AI to paint, but to help it develop broader abilities in visual reasoning and step-by-step problem-solving. Unlike language models, Turing’s vision system would be trained exclusively on video content—most of which would be sourced directly by Turing. 

In addition to artists like Taylor, Turing is also recruiting chefs, builders, and electricians—essentially anyone whose work involves manual skills. Sudarshan Sivaraman, Turing’s Chief AGI Officer, told TechCrunch that gathering data by hand is the only way to achieve the variety needed in their dataset. 

“We’re collecting data from a wide range of blue-collar professions to ensure diversity during pre-training,” Sivaraman explained to TechCrunch. “Once we’ve gathered all this material, the models will be able to interpret how different tasks are carried out.” 

Turing’s approach to building vision models reflects a broader trend in the AI industry’s relationship with data. Instead of relying on data scraped from the internet or gathered by low-wage annotators, companies are now investing heavily in carefully selected, high-quality data. 

With AI’s capabilities already proven, businesses are turning to exclusive training data as a way to stand out. Rather than outsourcing, many are now handling data collection internally. 

Fyxer, an email company that uses AI to organize messages and compose responses, is one such example. 

After initial trials, founder Richard Hollingsworth realized the most effective strategy was to use several smaller models, each trained on very specific data. While Fyxer builds on an existing foundation model—unlike Turing—the underlying principle is similar. 

“We found that the performance really hinges on how good the data is, not just how much you have,” Hollingsworth said. 

This led to some unusual staffing decisions. In the company’s early days, Hollingsworth noted, there were times when executive assistants outnumbered engineers and managers four to one, as their expertise was crucial for training the AI. 

“We relied heavily on skilled executive assistants because we needed to teach the model the basics of which emails deserved a reply,” he told TechCrunch. “It’s a challenge that’s all about people. Finding the right talent is tough.” 

Data collection continued at a steady pace, but over time, Hollingsworth became more selective, favoring smaller, more refined datasets for post-training. As he put it, “the quality of the data, not the quantity, is the thing that really defines the performance.” 

This is especially important when synthetic data is involved, as it both expands the range of training scenarios and amplifies any weaknesses in the original data. Turing estimates that 75% to 80% of its vision data is synthetic, generated from the initial GoPro recordings. This makes maintaining the quality of the original footage even more critical. 

“If your pre-training data isn’t high quality, then any synthetic data you generate from it will also fall short,” Sivaraman pointed out.

Beyond just quality, there’s a strong business case for keeping data collection in-house. For Fyxer, the effort put into gathering data is one of its strongest defenses against competitors. As Hollingsworth sees it, while anyone can use an open-source model, not everyone can assemble a team of expert annotators to make it truly effective. 

“We’re convinced the right approach is through data,” he told TechCrunch, “by developing custom models and using high-quality, human-curated training data.” 

Correction: An earlier version of this article misidentified Turing. TechCrunch apologizes for the mistake.


Disclaimer: The content of this article solely reflects the author's opinion and does not represent the platform in any capacity. This article is not intended to serve as a reference for making investment decisions.

