Parallel Domain Synthetic Data Improves Cyclist Detection

Parallel Domain synthetic data significantly improves performance on rare classes such as bicycles with no changes to model architecture.

In recent years, learning-based perception models have made the jump from the research lab to viable commercial products deployed in the real world. While machine learning approaches continue to improve year after year, they remain limited by the data available during training. Even though a robust ML model can generalize what it has learned to unseen data, imbalances in the class distribution still pose a significant challenge to model performance.

 

Figure 1: The performance on a KITTI evaluation image before (left) and after (right) the addition of PD data to the training set.

Because traditional driving datasets are collected from the real world, they mirror this class imbalance. Cyclists, for example, account for only ~4% of objects in KITTI [1] and ~0.2% in nuImages by Motional [2]. Cars, on the other hand, occur 18x more frequently in KITTI and 200x more frequently in nuImages. This results in models that are better at detecting cars than cyclists: the best-performing models on common object detection benchmarks show 13-15% lower average precision on cyclists than on cars. A failure to detect a cyclist can easily lead to a fatal accident, and cyclist fatalities in the United States have increased 36% since 2010.

 

How Synthetic Data can Help 

When generating synthetic data, we have fine-grained control over the class distribution. We show that this control can be used to improve the performance of an off-the-shelf 2D object detector, without any architectural changes, by incorporating high-quality synthetic data into the training loop.

To examine the impact of adding synthetic data we compare results on two publicly available datasets of different sizes: KITTI and nuImages. We generate separate synthetic datasets for each, named PD-K and PD-N. For each dataset we match the sensor configuration as well as environmental factors such as time-of-day and weather conditions of the reference data. We increase the relative frequency of cyclists in each of the synthetic datasets to ~13-15%.

 

Figure 2: Comparing real data to its synthetic counterparts. From top row to bottom: KITTI, PD-K, nuImages, PD-N

Dataset            # training images   # of cyclists (train)   # validation images   # of cyclists (val)
KITTI              3.7K                734                     3.7K                  893
nuImages           67.3K               1168                    16.4K                 238
PD-K (KITTI)       11.1K               43435                   n/a                   n/a
PD-N (nuImages)    67.3K               129236                  n/a                   n/a

Table 1: Sizes of the datasets and the number of cyclist annotations in each.
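
To make the rebalancing concrete, the counts in Table 1 can be turned into an approximate number of cyclist annotations per training image. This is only a minimal sketch: the image counts are the rounded figures from the table (3.7K read as 3,700, and so on), so the results are approximate, and the variable names are illustrative.

```python
# Approximate cyclist annotations per training image, based on Table 1.
# Image counts are the rounded values shown in the table (e.g. 3.7K -> 3,700),
# so the resulting densities are approximations.
train_stats = {
    "KITTI": (3_700, 734),
    "nuImages": (67_300, 1_168),
    "PD-K": (11_100, 43_435),
    "PD-N": (67_300, 129_236),
}

cyclists_per_image = {
    name: cyclists / images for name, (images, cyclists) in train_stats.items()
}

for name, density in cyclists_per_image.items():
    print(f"{name:9s} {density:.3f} cyclists per image")
```

Under these rounded counts, PD-K contains roughly 20 times as many cyclist annotations per image as KITTI, and PD-N roughly 110 times as many as nuImages.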

To obtain comparable results, we limit the classes in nuImages to those available in KITTI, i.e. Cars, Pedestrians and Cyclists. We then train an off-the-shelf YOLOv3 [3][4] model and measure performance by mean average precision (mAP) and by per-class average precision (AP).
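
As a quick illustration of how the two metrics relate, the sketch below assumes that mAP is the unweighted mean of the three per-class AP values. That is an assumption on our part, although it is consistent with the KITTI baseline numbers reported in Table 2. The function name is illustrative.

```python
def mean_average_precision(per_class_ap):
    """Unweighted mean of per-class AP values (assumed definition of mAP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Per-class AP of the KITTI real-data baseline, taken from Table 2.
kitti_baseline = {"Cyclist": 34.52, "Car": 80.40, "Pedestrian": 45.62}

print(f"{mean_average_precision(kitti_baseline):.2f}")  # prints 53.51
```

Because every class carries equal weight in this mean, a large gain on a rare class such as Cyclist moves mAP as much as the same gain on Cars would.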

 

Performance Improvements Out of the Box

First we establish a baseline to see how the detector performs when trained on real data only. To improve upon this baseline result we employ two simple training procedures.

  • Joint training: train on real and synthetic data at the same time
  • Finetuning: pre-train the model on synthetic data, then finetune on real data

During joint training we concatenate the real and synthetic datasets and randomly sample batches from the combined dataset. For finetuning we first train a model purely on PD data and then train all layers with a reduced learning rate on the corresponding real-world dataset.
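
The two procedures can be sketched in plain Python as follows. This is a hedged illustration rather than the pipeline used in the experiments: `joint_batches`, the toy sample identifiers, shuffling once per epoch, and the learning-rate values in `finetune_schedule` are all assumptions made for the example.

```python
import random


def joint_batches(real, synthetic, batch_size, seed=0):
    """Yield shuffled batches drawn from the concatenation of both datasets."""
    combined = list(real) + list(synthetic)  # concatenate real and synthetic samples
    rng = random.Random(seed)
    rng.shuffle(combined)                    # random order across both sources
    for start in range(0, len(combined), batch_size):
        yield combined[start:start + batch_size]


# Toy stand-ins for image identifiers: 12 synthetic and 4 real samples,
# i.e. a 3.0 / 1.0 ratio of synthetic to real data.
real = [f"real_{i}" for i in range(4)]
synthetic = [f"pd_{i}" for i in range(12)]

batches = list(joint_batches(real, synthetic, batch_size=4))

# Finetuning as two stages. The learning rates are hypothetical placeholders;
# the only property taken from the text is that the second stage trains all
# layers with a reduced learning rate.
finetune_schedule = [
    {"stage": "pretrain", "data": "synthetic", "lr": 1e-3},
    {"stage": "finetune", "data": "real", "lr": 1e-4, "trainable": "all layers"},
]
```

In joint training a single batch may mix real and synthetic images, whereas in finetuning the model only ever sees one source at a time.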

Figure 3: Comparison of different training procedures on validation sets of KITTI (top) and nuImages (bottom)

With both procedures we see a clear performance boost on each dataset. On KITTI, performance improves across all three classes, with the largest improvement on Cyclists. On nuImages, where far more real-world data is available, the overall boost is smaller and Car and Pedestrian AP stay roughly flat or dip slightly, but the gains on the Cyclist class remain significant. Finetuning outperforms joint training on KITTI, where less labelled real-world data is available, while the reverse is true on the larger nuImages dataset.

Train                            Validation   mAP     Cyclist AP   Car AP   Pedestrian AP
KITTI                            KITTI        53.51   34.52        80.40    45.62
PD-K + KITTI (3.0 / 1.0)         KITTI        67.53   57.08        84.36    61.14
PD-K → KITTI                     KITTI        70.23   60.51        87.09    63.08

Train                            Validation   mAP     Cyclist AP   Car AP   Pedestrian AP
nuImages                         nuImages     64.37   39.50        83.20    70.42
PD-N + nuImages (1.0 / 1.0)      nuImages     68.53   52.11        82.86    70.63
PD-N → nuImages                  nuImages     66.26   49.41        80.74    68.64

Table 2: Performance on the KITTI and nuImages validation sets using the default training pipeline. Joint-Training is indicated with a ‘+’ sign, finetuning with an arrow. In parentheses we indicate the dataset ratios of joint-training relative to the real-world training dataset.
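
To read the gains off Table 2 directly, the following small sketch computes the absolute improvement in Cyclist AP and mAP over each real-data baseline. The values are copied from the table; the dictionary layout and the names `results` and `gains` are illustrative.

```python
# (Cyclist AP, mAP) per training setup, copied from Table 2.
results = {
    "KITTI": {
        "baseline": (34.52, 53.51),
        "joint": (57.08, 67.53),
        "finetune": (60.51, 70.23),
    },
    "nuImages": {
        "baseline": (39.50, 64.37),
        "joint": (52.11, 68.53),
        "finetune": (49.41, 66.26),
    },
}

# Absolute gain over the real-data baseline, in AP / mAP points.
gains = {}
for dataset, runs in results.items():
    base_cyclist, base_map = runs["baseline"]
    for setup in ("joint", "finetune"):
        cyclist_ap, mean_ap = runs[setup]
        gains[(dataset, setup)] = (cyclist_ap - base_cyclist, mean_ap - base_map)

for (dataset, setup), (d_cyclist, d_map) in gains.items():
    print(f"{dataset:9s} {setup:9s} Cyclist AP {d_cyclist:+.2f}  mAP {d_map:+.2f}")
```

With these numbers the Cyclist AP gain exceeds the mAP gain in every setup, which is consistent with the rare class benefiting most from the added synthetic data.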

 

Reducing Collection and Labelling Costs

Parallel Domain synthetic data can also reduce the amount of real-world data that has to be collected and labelled to achieve a given level of performance. In a further experiment we found that we can cut the amount of labelled real-world data by two thirds and still maintain performance similar to our baseline nuImages model when we pre-train on PD-N and finetune on nuImages. In similar fashion, we can reduce the real-world dataset size by at least 50% when using joint training.

Train                            Validation   mAP     Cyclist AP   Car AP   Pedestrian AP
nuImages                         nuImages     64.37   39.50        83.20    70.42
PD-N + nuImages (0.5 / 0.5)      nuImages     64.00   45.73        79.99    66.28
PD-N → nuImages (22.5K)          nuImages     63.24   46.09        77.93    65.69

Table 3: Performance on the nuImages validation set using the default training pipeline. Joint-Training is indicated with a ‘+’ sign, finetuning with an arrow. In parentheses we indicate the dataset ratios of joint-training relative to the real-world training dataset or the number of training samples used.
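
The trade-off in Table 3 can be made explicit with a little arithmetic. The sketch below is approximate: it uses the rounded training-set size from Table 1 (67.3K read as 67,300) together with the 22.5K finetuning subset and the metrics from Table 3; the variable names are illustrative.

```python
full_train = 67_300       # nuImages training images (rounded value from Table 1)
finetune_subset = 22_500  # real images used for finetuning (Table 3)

fraction_kept = finetune_subset / full_train  # share of real data still needed
fraction_saved = 1 - fraction_kept            # share of real data no longer needed

# Baseline vs. PD-N -> nuImages (22.5K), values from Table 3.
map_change = 63.24 - 64.37
cyclist_ap_change = 46.09 - 39.50

print(f"real data kept:    {fraction_kept:.1%}")
print(f"real data saved:   {fraction_saved:.1%}")
print(f"mAP change:        {map_change:+.2f}")
print(f"Cyclist AP change: {cyclist_ap_change:+.2f}")
```

In other words, with about one third of the real training images, mAP drops by roughly one point while Cyclist AP is still more than six points above the baseline.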

 

Synthetic Data Still Shows Improvements with Rebalanced Data 

Adding synthetic data is not the only way to deal with class imbalance. A common approach is to oversample underrepresented classes in the existing dataset and thus give them more weight during training. To evaluate this approach we tested several strategies; below we detail one based on rebalancing image crops.

Instead of training on full images we extract image crops around object instances. We randomly select objects such that we approximate a uniform distribution across all classes. This approach has two advantages: 

  • Training on image crops as a data augmentation technique often helps models to generalize better.
  • As each crop contains less context we can rebalance classes in a more fine-grained manner.
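
A minimal sketch of this kind of crop-based rebalancing is shown below. It is an illustration under stated assumptions, not the implementation used in the experiments: the annotation format (dictionaries with `label` and `box` keys), the helper names, and the 25% crop margin are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def sample_balanced_objects(annotations, num_samples, seed=0):
    """Draw objects so that each class is selected with equal probability."""
    by_class = defaultdict(list)
    for ann in annotations:
        by_class[ann["label"]].append(ann)
    classes = sorted(by_class)
    rng = random.Random(seed)
    picks = []
    for _ in range(num_samples):
        label = rng.choice(classes)                # uniform over classes
        picks.append(rng.choice(by_class[label]))  # then uniform over that class's objects
    return picks


def crop_box(box, image_size, margin=0.25):
    """Expand an (x1, y1, x2, y2) box by a relative margin and clip it to the image."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    width, height = image_size
    return (
        max(0, x1 - margin * w),
        max(0, y1 - margin * h),
        min(width, x2 + margin * w),
        min(height, y2 + margin * h),
    )


# Toy, heavily imbalanced annotation list: 90 cars, 8 pedestrians, 2 cyclists.
annotations = (
    [{"label": "Car", "box": (10, 10, 50, 30)}] * 90
    + [{"label": "Pedestrian", "box": (20, 5, 30, 40)}] * 8
    + [{"label": "Cyclist", "box": (60, 20, 80, 60)}] * 2
)

picks = sample_balanced_objects(annotations, num_samples=3000)
class_counts = Counter(ann["label"] for ann in picks)
crops = [crop_box(ann["box"], image_size=(100, 100)) for ann in picks]
print(class_counts)
```

Note that in this scheme the two cyclist instances are simply repeated many times, which is one reason why rebalancing alone cannot add the variety that new synthetic examples provide.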

Crop-based rebalancing gives a significant performance boost on nuImages (64.4% mAP without vs 75.7% mAP with cropping), but fails to improve performance on KITTI (53.5% mAP without vs 51.3% mAP with cropping). As seen below, adding synthetic data continues to provide an additional performance boost on both datasets even when crop-based class rebalancing is used.

Train                            Validation   mAP     Cyclist AP   Car AP   Pedestrian AP
KITTI                            KITTI        51.32   31.61        79.50    42.84
PD-K + KITTI (3.0 / 1.0)         KITTI        63.07   51.64        80.46    57.12
PD-K → KITTI                     KITTI        68.06   55.23        86.43    62.52
nuImages                         nuImages     75.72   57.15        87.84    82.17
PD-N + nuImages (1.0 / 1.0)      nuImages     78.27   64.44        88.05    82.32
PD-N → nuImages                  nuImages     77.44   62.40        87.88    80.74

Table 4: Performance on the KITTI and nuImages validation sets using cropping and class-balancing. See Table 2 for details about naming conventions.

 

Conclusion

Despite making up only 0.2% of miles travelled [5], cyclists accounted for 2.1% of all traffic fatalities in the US in 2017 [6]. Autonomous vehicles have the potential to make our roads safer for all road users, but modern perception models are often negatively impacted by the class imbalances found in the real world. In this post, we showed how to significantly improve the ability of an off-the-shelf object detector to detect Cyclists by using PD synthetic data.

In conclusion we showed three things:

  • You don’t need to change your model architecture to make use of PD synthetic data.
  • You can increase performance on rare classes even if you already have a lot of data.
  • You can reduce data collection and labelling cost by replacing real data with synthetic data.

If you’re interested in more of the experimental results, check out our detailed write-up on Weights and Biases. If you want to see how much Parallel Domain’s data can improve your models, contact us!

We plan to provide the data used in this experiment as an open dataset for non-commercial usage. If you are interested in a commercial license for this data, or in a similar dataset built for your own sensors and ontology, contact us!

 

References 

[1] Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite

[2] nuImages by Motional 

[3] YOLOv3: An Incremental Improvement

[4] MMDetection: Open MMLab Detection Toolbox and Benchmark

[5] National Household Travel Survey – 2017

[6] Traffic Safety Facts – 2017 Data