January 08 2021
What is a Parallel Domain?
Written by Kevin McNamara, Founder & CEO
Mastering a domain is critical to the safe deployment of an AI technology. When developing autonomous systems to drive cars or deliver packages, the system must master its domain before it can be safely and broadly deployed into the real world. But what is a domain in this context and why is the concept of a Parallel Domain – a virtual world in which we generate synthetic data for training and testing – a breakthrough technology that is changing the way we develop AI?
1: a specified sphere of activity or knowledge.
2: the set of possible values of the variables of a function.
AI developers strive to satisfy both of the above definitions: AI systems are developed to operate within a specific sphere of activity and must be prepared to handle the set of all possible variables within that sphere. If those requirements are satisfied, it could be argued that the system has mastered its domain.
When an autonomous vehicle or computer vision algorithm is trained to detect cars on the street, the target domain would be the combination of the cars, roads, and other environmental factors where the vehicle is going to drive. Our customers train algorithms like this all the time. They train over the set of cars that they have observed and labeled in their real datasets. The challenge is that, for nearly any sufficiently complicated task like vehicle detection, the training sets collected in the real world are guaranteed to be a subset of the actual target domain: we’re never going to capture every single variable permutation in the real world. While machine learning enables our algorithms to generalize in a limited way, it is still critical that our training and test sets contain enough of the tentpoles of the target domain so that the algorithm can extrapolate to make the right decision in all situations.
Take the following example for collecting training data for detecting cars.
- Have we collected and labeled images of cars on the road? Well, certainly. Let’s say 1,000,000.
- What about one with a fairly unique shape, like a shipping van? Maybe 10,000 out of our million.
- How about that shipping van at night? Let’s say 1,000.
- The van at night in the fog? Maybe 50.
- Add in one more variable, like paint color, pulling out of driveways, or occlusions, and we barely have enough samples to test our model, let alone train it, for this situation.
This results in a combinatorial explosion in cases that need to be covered. Obtaining enough real-world data to train models to cover the target domain becomes intractable, even in the cases where we already know we don’t have good coverage. It’s the primary reason that each successive 10% improvement in model performance takes 10x more work than the previous 10%. Real-world data pipelines don’t scale in the long tail.
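To make the arithmetic in the example above concrete, here is a minimal Python sketch that computes how much of the dataset survives each added condition. The sample counts are the ones quoted in the list. The 10% figure for "one more variable" and the names `funnel`, `kept_fractions`, and `remaining_after` are our own assumptions for illustration, not measured values or part of any product.

```python
# Sample counts quoted in the example above.
funnel = [
    ("cars on the road", 1_000_000),
    ("shipping vans", 10_000),
    ("shipping vans at night", 1_000),
    ("shipping vans at night in fog", 50),
]


def kept_fractions(funnel):
    """Return the fraction of samples that survives each added condition."""
    return [
        later / earlier
        for (_, earlier), (_, later) in zip(funnel, funnel[1:])
    ]


def remaining_after(count, fraction):
    """Samples left after one more condition that keeps `fraction` of them."""
    return round(count * fraction)


fractions = kept_fractions(funnel)

# ASSUMPTION: one more variable (say, a specific paint color) keeps 10%.
# The 10% figure is illustrative and does not come from the article.
ASSUMED_NEXT_FRACTION = 0.10
left = remaining_after(funnel[-1][1], ASSUMED_NEXT_FRACTION)

for (label, count), frac in zip(funnel[1:], fractions):
    print(f"{label}: {count} samples ({frac:.0%} of the previous row)")
print(f"one more variable: about {left} samples")
```

Each condition multiplies the remaining count by a small fraction, so a handful of ordinary variables is enough to shrink a million images to single digits.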
So, we need a way to more quickly, safely, and efficiently cover a domain. How do we encounter enough of these cases so that our humans and machines can be best prepared for the rigors of the real world? What if, instead of driving through the real world until we encounter enough cases, we had a way to actually control the variables in the target domain to curate the datasets that we need?
This is what a Parallel Domain is for. A Parallel Domain is a virtual environment that is generated to closely model the complexity of the real world. However, it is not an exact copy; it is a virtual, analogous slice of the real world, a domain that runs parallel to our real target domain. Because it is not an exact copy, we can spin up numerous variations of a domain. These many Parallel Domains capture the broad variety and unpredictability of the world, and they reflect the idea that our real world, always changing and evolving, has an infinite combination of parallel potential future states and scenarios. We can now do combinatorial parameter sweeps, generating data for each vehicle, multiplied by every lighting condition, by every weather condition, by every paint color.
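As a rough illustration of what a combinatorial parameter sweep looks like, the Python sketch below enumerates every combination of vehicle, lighting, weather, and paint color. The parameter names and values are hypothetical, chosen only to mirror the sentence above; this is not the Parallel Domain platform's API, just the standard-library `itertools.product` applied to a small grid.

```python
from itertools import product

# Hypothetical parameter values, for illustration only.
# These names are NOT a Parallel Domain API.
SWEEP = {
    "vehicle": ["sedan", "shipping_van", "pickup"],
    "lighting": ["day", "dusk", "night"],
    "weather": ["clear", "rain", "fog"],
    "paint_color": ["white", "black", "red"],
}


def scenario_grid(sweep):
    """Yield one dict per combination of parameter values (Cartesian product)."""
    keys = list(sweep)
    for values in product(*(sweep[key] for key in keys)):
        yield dict(zip(keys, values))


scenarios = list(scenario_grid(SWEEP))

# The grid size is the product of the number of values per parameter.
print(len(scenarios), "scenarios in the sweep")

# The rare real-world case from earlier is just one cell of the grid.
rare_case = {
    "vehicle": "shipping_van",
    "lighting": "night",
    "weather": "fog",
    "paint_color": "white",
}
print("rare case covered:", rare_case in scenarios)
```

In a sweep like this, the shipping van at night in the fog is no rarer than any other cell, which is the point of controlling the variables rather than waiting to encounter them.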
When a user needs to develop, learn, and test inside of a domain, they create a Parallel Domain to perform their tasks. They can create 1, 100, or 10,000. Their Parallel Domain(s) are theirs – they are highly customizable and controllable, giving the user the synthetic data they need for their task at hand. With this, our customers improve model performance, reduce development time, and save significant costs.
Synthetic data is the future of computer vision and perception. At Parallel Domain, we are here to accelerate our customers’ path to deploying autonomy through our synthetic data platform. Our platform is becoming an essential way for autonomy developers to get the data they need. We are actively working with a broad range of autonomy and computer vision customers to accelerate and improve their development, enabling them to master their domains. You can read more about our products, follow us on LinkedIn or Twitter, and sign up for our newsletter below to keep in touch. Please also feel free to reach out directly: firstname.lastname@example.org.