“Our DNA is implanted so deeply in the culture that other people live and breathe the values we started with.”
Even as a child, Ofir Chakon, CEO and co-founder of Datagen, loved making sure that the things he built benefited someone.
Were you always so passionate about AI specifically? How did you get into this specific field?
I started my journey at the age of six or seven, when my mother bought me my first HTML book. She used to say that it was basically the first investment she made in me, since I was so young. From there, I pretty quickly found myself at the intersection of tech and entrepreneurship, and I started building all sorts of small businesses while I was still young. Obviously, I made so many mistakes back then, but it was an amazing university for me.
I went to the Technion, where I did both my bachelor's and master's degrees in engineering. Around the time of my master's, I really dove into AI and computer vision, at what was really the beginning of this entire field.
Now you're no longer just the data scientist. You're a CEO, an entrepreneur, a leader by profession. Was that an easy transition for you?
I found myself thinking about a lot of ideas and trying to implement them. From a very young age, I think I was driven by finding someone who would benefit from what I was building. I always wanted to find an interesting combination of building things and delivering value with them.
Once I went into AI, I found that there were so many different opportunities in this field. It was very, very early back then. When the time came, I decided to build something that would be more substantial in the long run. That is when I co-founded the company with my partner, Gil, who also comes from a very deep technical background at the Technion.
The entry point of Datagen was kind of a funny story. We were browsing some YouTube videos when a video popped up of Zuckerberg presenting the Oculus device on stage for the first time. He needed to hold controllers in his hands in order to interact with the virtual world. We knew that the device had cameras on the outside, facing outward. We would have expected those cameras to understand the world around the device, specifically the user's hands, and to integrate them into the virtual space in a simple way. But that wasn't the case.
We understood that there was a gap here that Facebook was suffering from, and we assumed that this gap was probably shared by a lot of other companies. That is where our journey really kicked off.
How did you end up specifically in the world of synthetic data?
When we started Datagen, I think the world of AI really suffered from a lack of standardization. It was a complete mess. But on the other hand, that was a huge opportunity to create so many interesting companies around these problems.
In the world of AI, you can distinguish between three main building blocks: compute, algorithms, and data. Compute and algorithms have progressed quite significantly over the past few years. But when we look at the data side of things, this is where I think the biggest opportunity lies, because this is where the industry as a whole really suffers from a lack of standardization and established processes.
What opportunities do you see synthetic data providing? Then I'd love to understand more about the journey of Datagen.
When we look at how teams currently acquire data to train and test their AI algorithms, it's quite a ridiculous process. For example, a team at a company like Walmart might want to track the activity of shoppers walking around the store and grabbing things off shelves, for smart checkout purposes. To train an AI algorithm to do that, they need to collect hundreds of millions of real-world images: of all sorts of different shoppers, walking around different stores, in different lighting conditions, grabbing different items from shelves, and putting them into carts. Collecting all this data with real cameras would be a mess, and a completely manual process. Then they would need to annotate all these images by hand, one by one.
In practice, when we get into the bits and bytes, it seems like most of AI is based on very manual, very labor-intensive processes for acquiring all this data from the real world, labeling it, and cleaning it. As a result, these AI systems suffer from a lot of problems, such as privacy issues and biases, because they weren't trained on the right data for the task.
Synthetic data takes a completely new approach. At Datagen, we built a 3D simulation of the real world that looks, behaves, feels, and interacts exactly like the real world. For example, we have 3D virtual people who enter a virtual store and take virtual items off shelves. Everything is simulated using computer graphics and computer vision.
With all this control and all this variance, we're then able to capture images from the simulation software as if they were taken in the real world. In this way, we give computer vision engineering teams all the control they need over their data. Where it used to take them six months to collect one dataset, they can now generate one in a few hours. They can train their model, see how it performs, and generate the next dataset the same day. This shift in mindset will really help them get to market, and get their AI systems to perform better in production, much faster.
It sounds as though a product like Datagen inherently allows you to monitor biases much more closely. If you do realize you might have a bias, you can correct for it by generating new synthetic data. Right?
Yes. I think one of the interesting trends we see is that engineers really have a lot of power in today's world. They are turning out to be the main buyers and users of all kinds of software products, and a lot of startups are trending toward developer-first tools. Datagen also puts full control over data collection and annotation into engineers' hands. They can control everything they need in order to progress with their algorithms much, much faster.
What you said about biases is also quite interesting. We usually think of biases like gender bias or ethnicity bias. But in practice, a bias is basically any ratio between two parameters in your system: for example, how your system performs on dark scenes versus light scenes, or on red tables versus blue tables. With synthetic data, because you know exactly what you are generating in your simulation software, you can basically remove the bias from your network if you find any failures or edge cases where it is not performing well.
What have you discovered about the art of running a company that maybe you didn't know before you started as a young entrepreneur?
One of the most interesting things I learned along the way is that people are everything. When you run a company, you really need to invest your resources in the right place. Companies don't really exist without their people.
It's always a pleasure to meet people I didn't even interview myself. It's amazing to hear how other team members are treating them and how professional and thorough the onboarding process was. For me, real success is hearing the 90th employee say they feel exactly the same way as the fifth employee did; it means we really were able to take our DNA and implant it so deeply in the culture of the company that other people live and breathe the values we started with.
Michael Matias, Forbes 30 Under 30, is the author of Age is Only an Int: Lessons I Learned as a Young Entrepreneur. He studies Artificial Intelligence at Stanford University, is a Venture Partner at J-Ventures and was an engineer at Hippo Insurance. Matias previously served as an officer in the 8200 unit. 20MinuteLeaders is a tech entrepreneurship interview series featuring one-on-one interviews with fascinating founders, innovators and thought leaders sharing their journeys and experiences.
Contributing editors: Michael Matias, Megan Ryan