Datagen raises $15 million to advance machine learning training with real-world simulations
The Israeli company received funding from VC funds as well as a series of AI luminaries
Other notable private investors include AI luminaries such as Michael J. Black of the Max Planck Institute; Gal Chechik, Director of AI at Nvidia; Anthony Goldbloom, CEO and founder of Kaggle; and Trevor Darrell, founder of UC Berkeley’s AI Research Lab. Datagen will use the funding to grow its R&D and expand into new markets.
The company develops visual simulations and recreations of the real world. Controlling the physics of a simulated environment allows machine learning models to be trained more efficiently and at a far greater scale, eliminating the current bottleneck of relying on manually collected real-world imagery.
Computer vision teams currently rely on publicly available datasets, which often include images of real people and places scraped off the internet or manually captured in the real world through labor-intensive operations. Datagen's simulations, which mimic the statistical patterns found in the real world, offer an alternative. The company's simulated data creation tools avoid the privacy pitfalls and unconscious biases that arise from these data collection and annotation methods.
The company, which was founded in 2018 to create a platform for fully synthetic, privacy-by-design datasets for AI applications, counts three of the top U.S. tech giants, as well as the AI research arms of several global consumer manufacturing giants, as customers.
"Our customers have full control over all the parameters that go into the data they create," Datagen Co-founder and CEO Ofir Chakon said. "The real-world implication is that once deployed, you can be sure it's going to work well in different domains, with different ethnicities, in different geographic locations or any environment you can imagine."
"It's not just that simulated data is always better than real-world data collection, it's that it addresses problems which are just unsolvable without it," said Rona Segev, founding partner at TLV and Datagen's earliest and largest investor. "I think it's an enabler for the whole AI industry. Without simulated data, the industry will slow," she said.
"Datagen is tapping into a whole new market that could accelerate the use of AI. The potential here is tremendous: We estimate that synthetic data may surpass real data for training and testing. Maybe most importantly, Datagen's solutions enable the democratization of AI, giving smaller companies, not just tech giants, access to proprietary, high-quality machine learning training data," said Zvika Orron, General Partner at Viola Ventures.