PD Synthetic Data: Best Practices

We want you to get the best possible results when training with PD data, so we are excited to share a set of best practices and tips to help you do so!

Through internal ML research and extensive use of our synthetic data in the market, we have established guidelines that typically yield the best performance for our customers’ perception models.

Applying these practices has led to significant improvements across several types of perception tasks. Examples:

  • Rare Class Detection: Improving cyclist detection on KITTI and nuImages (details)
  • Object Tracking: Beating state-of-the-art on KITTI and MOT17 multi-object tracking (details)
  • and more described within this guide

We take three major factors into account when deciding on the optimal training strategy:

  1. The amount and complexity of the real data that is available (real data is used here as a synonym for target domain data)
  2. The amount of PD data used
  3. The task that is to be solved

The following recommendations are a starting point for training your first baseline synthetic models. Achieving strong performance with synthetic data is a highly iterative process, and we look forward to working with you to achieve great results!