Documentation

Overview

Batch mode enables machine learning and perception teams to generate large-scale synthetic datasets to train, test, and validate their machine learning models.

When requesting data for a new project, our team will work with you to capture and document your dataset requirements. Typically, we need the following information from you to generate a dataset.

Project Scope

What are you trying to achieve? Inputs include:

  • Description of your perception task (e.g., 3D object detection from camera)
  • The challenge(s) you are currently facing (e.g., poor performance detecting construction sites)
  • Current training configuration (e.g., types of models/tasks)
  • Success criteria (e.g., mAP higher than 0.8 for pedestrians)
  • Sample of the training and testing datasets (e.g., 40 diversified frames for each)
  • Desired camera configuration
  • Desired behavior of agents within scenarios (e.g., a large number of jaywalking scenarios)

Annotations

Your current and desired labels:

  • Annotation types you are interested in
  • The objects you're interested in labeling (ontology)
  • Any rulesets to govern the generated annotations. Learn more about our default Ontology & Annotations.
  • If you are interested in creating custom annotation specifications for your synthetic data, our team is available to discuss customization options with you.

Sensor Information

Intrinsics and extrinsics for your camera, LiDAR, and radar sensors. Learn more about the sensor rig specifications we need to generate your dataset.
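
As a rough illustration of what these two terms cover, the sketch below uses a simple pinhole camera model. It is not our sensor rig format: the class names, field names, axis convention (x right, y down, z forward), and all numeric values are assumptions made for the example. Intrinsics map a point in the camera frame to pixel coordinates, and extrinsics place the camera relative to the rig.

```python
from dataclasses import dataclass

@dataclass
class Intrinsics:
    # Hypothetical pinhole parameters, in pixels.
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point x
    cy: float  # principal point y

@dataclass
class Extrinsics:
    # Assumed convention: p_camera = rotation * p_rig + translation.
    rotation: tuple      # 3x3 matrix as three row tuples
    translation: tuple   # meters

def to_camera_frame(point_rig, ext):
    """Transform a 3D point from the rig frame into the camera frame."""
    return tuple(
        sum(r * p for r, p in zip(row, point_rig)) + t
        for row, t in zip(ext.rotation, ext.translation)
    )

def project(point_cam, k):
    """Project a camera-frame point to pixel coordinates (no distortion)."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (k.fx * x / z + k.cx, k.fy * y / z + k.cy)

# Illustrative values only: a camera mounted 2 m ahead of the rig origin.
intrinsics = Intrinsics(fx=1000.0, fy=1000.0, cx=960.0, cy=540.0)
extrinsics = Extrinsics(
    rotation=((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),
    translation=(0.0, 0.0, -2.0),
)

point_cam = to_camera_frame((1.0, 0.5, 12.0), extrinsics)
pixel = project(point_cam, intrinsics)
print(pixel)  # (1060.0, 590.0)
```

Real sensors usually also need distortion parameters, and LiDAR and radar have their own parameter sets, so treat this only as a picture of the kind of information involved.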

Dataset Distribution Parameters

You can specify the following parameters to create your custom dataset:

  • Locations: distribution of the individual worlds you would like to see in your dataset. Browse our regions to learn more.
  • Time of day: distribution of lighting conditions
  • Weather conditions: distribution of weather conditions
  • Dynamic agents: distribution of vehicles, pedestrians, and animals
  • Scenarios: distribution of rare cases or objects
  • Content: distribution of assets such as signs, traffic lights, trash bags, mailboxes, fire hydrants, garbage bins, powerlines, and more
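
One way to think about the parameters above is as a set of weighted categories per parameter. The following is a minimal sketch, not our request format: the dictionary keys, category names, and weights are all hypothetical. It checks that each parameter's weights sum to 1 and then draws the settings for one scene.

```python
import random

# Hypothetical request: every key, category, and weight is illustrative.
dataset_distribution = {
    "time_of_day": {"day": 0.6, "dusk": 0.2, "night": 0.2},
    "weather": {"clear": 0.5, "rain": 0.3, "fog": 0.2},
}

def validate(distribution, tolerance=1e-9):
    """Check that each parameter's weights are non-negative and sum to 1."""
    for parameter, weights in distribution.items():
        if any(w < 0 for w in weights.values()):
            raise ValueError(f"{parameter}: negative weight")
        total = sum(weights.values())
        if abs(total - 1.0) > tolerance:
            raise ValueError(f"{parameter}: weights sum to {total}, expected 1")

def sample_scene(distribution, rng):
    """Draw one category per parameter according to its weights."""
    return {
        parameter: rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
        for parameter, weights in distribution.items()
    }

validate(dataset_distribution)
rng = random.Random(0)  # seeded so the draw is repeatable
scene = sample_scene(dataset_distribution, rng)
print(scene)
```

Writing the distribution down in this form before a requirements discussion makes it easy to spot parameters whose proportions do not add up.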

Data Volume

The length of an individual scene, the capture rate (frames per second), and the total number of scenes.
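
These three values together determine the size of the dataset. As a quick sanity check, the sketch below estimates the frame count per sensor; the function name and the example numbers are assumptions chosen for illustration.

```python
def total_frames(scene_length_s, capture_rate_fps, num_scenes):
    """Estimate frames per sensor: scene length x capture rate x scene count."""
    frames_per_scene = round(scene_length_s * capture_rate_fps)
    return frames_per_scene * num_scenes

# Illustrative numbers: 500 scenes, each 10 seconds long, captured at 10 fps.
print(total_frames(scene_length_s=10, capture_rate_fps=10, num_scenes=500))  # 50000
```

For a multi-sensor rig, multiply the result by the number of sensors to estimate the total number of captured frames.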