Advancing Autoware with data-centric 3D object detection

Written by TIER IV | 24-Jul-2024 01:00:00

Amid TIER IV’s recent business expansion, the number of challenges we need to address has steadily increased. One significant bottleneck in our development was the long lead time required to improve existing machine learning models. To overcome this, we developed a system that automatically extracts only high-quality data and established a framework that seamlessly connects the entire workflow, significantly reducing the lead time. This blog post introduces the challenges related to our 3D object-detection machine learning models and the efforts we have made to overcome them.

Past machine learning efforts and challenges

Autoware, the world's first open-source software for autonomous driving, integrates sensor information from LiDAR, cameras, radar, inertial measurement units, and global navigation satellite systems for perception, planning, and control tasks, both on public roads and in closed environments. Autoware’s perception module, which is primarily handled by the perception team, fuses multimodal data from LiDAR, cameras, and radar using a hybrid of machine learning and rule-based systems to generate inference results. One of its primary tasks is to estimate the positions of cars and pedestrians around the autonomous vehicle using 3D object detection. The core algorithm Autoware uses for this purpose is CenterPoint, with various non-machine-learning modules integrated before and after it.

As TIER IV’s business expands, the number of supported vehicle types and environments has diversified. The level of transparency has also increased. Last year, we received approval for Level 4 autonomous driving at a site in Greater Tokyo and recently started operating autonomous buses in Komatsu, Ishikawa Prefecture. There are also plans to launch an autonomous taxi service in Tokyo later this year. As a result, the number of technical challenges to address has expanded.

An autonomous taxi undergoes tests in Shinjuku Ward.

GSM8 bus

Initially, we adopted a flow where machine learning model weights were improved by periodically annotating datasets. However, in the event of perception system issues, the lead time T_ml for addressing them through dataset annotation and re-training the machine learning model significantly exceeded the lead time T_other for addressing them through other modules (primarily the ROS application layer). As a result, we tended to implement solutions through the latter, which caused structural problems that were exacerbated by other business challenges.

T_ml >> T_other

To put it in machine learning terms, the “errors” that occurred in the field were being back-propagated and learned not via the weights of the machine learning model, but via the design of the rule-based module. In order to solve this problem, we aim to achieve:

T_ml ≈ T_other

And furthermore:

T_ml < T_other

This ensures that issues raised on-site are properly fed back into the weights of the machine learning models as well as the rule-based modules.

To tackle these challenges, we developed a comprehensive system that performs the process from data collection to learning seamlessly, significantly reducing the lead time from issue detection to model implementation.

Building a platform for continuous data collection and learning

To build a seamless platform for improving machine learning in autonomous driving, the following workflow was required.

1. Upload data collected from the fleet (vehicles → cloud server).
2. Extract and preprocess the data, converting it into annotation-ready formats.
3. Perform annotation and register the results as datasets.
4. Conduct training using the newly created dataset.
5. Deploy trained weights to vehicles (cloud server → vehicles).

The foundations of this framework were already in place, but the workflow wasn't running continuously or automatically. A more detailed analysis revealed the following challenges:

Most of the data preprocessing is done manually.
There are cases where poor-quality data slips through.
There is no framework for appropriately extracting data (such as active learning), requiring manual selection.

Creating a platform for data preprocessing

Collected data in rosbag format is registered daily into Web.Auto's data platform service using the flow outlined below.

1. Access the API of the data platform service to collect rosbag files uploaded daily.
2. Preprocess and convert them into the t4dataset format, which we define as data for learning and evaluation.
3. Reregister them into the data platform service.

These processes are further broken down into microservices and implemented as container services. They can run locally and are also designed to be scalable to run on cloud services such as AWS.
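The three-step flow above can be sketched as a small batch job. This is only an illustrative outline, not the actual Web.Auto implementation: the `Rosbag` and `T4Dataset` records and the `fetch`, `convert`, and `register` callables are hypothetical stand-ins for the data platform API and the conversion service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Rosbag:
    """Hypothetical record describing one uploaded rosbag."""
    bag_id: str
    vehicle_type: str


@dataclass(frozen=True)
class T4Dataset:
    """Hypothetical record describing one converted dataset."""
    dataset_id: str
    source_bag_id: str


def run_daily_preprocessing(
    fetch: Callable[[], Iterable[Rosbag]],
    convert: Callable[[Rosbag], T4Dataset],
    register: Callable[[T4Dataset], None],
) -> List[str]:
    """Fetch rosbags, convert each one, and register the result.

    Returns the IDs of rosbags whose conversion failed, so that one bad
    file does not stop the rest of the batch.
    """
    failed: List[str] = []
    for bag in fetch():
        try:
            dataset = convert(bag)
        except Exception:
            # Record the failure and keep processing the remaining bags.
            failed.append(bag.bag_id)
            continue
        register(dataset)
    return failed
```

Passing the three steps in as callables mirrors the microservice split described above: each step can be swapped for a local function or a call to a remote service without changing the loop.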

Preprocessing requires running Autoware. Storing the data used for annotation directly is impractical because of its size, so the point cloud data must be regenerated for the data we need. In addition, because Pilot.Auto, a proprietary autonomous driving software based on Autoware, supports various autonomous vehicle models, including the Robotaxi and Robobus, it is crucial to use the version of Autoware/Pilot.Auto that corresponds to each rosbag. We use the over-the-air images maintained by Web.Auto for each vehicle type, which lets us run the same preprocessing as the actual vehicles without rebuilding the software.

Building a framework to filter out poor-quality data

It is important to select only good-quality data for the continuous improvement of machine learning models, especially for 3D object detection. Various factors can degrade quality, and two of them have a particularly significant impact. Here, we outline these two challenges and the solutions we've implemented.

1. Too many dropouts in sensor data
2. Misalignment between LiDAR and camera data

The first issue is that limited network bandwidth sometimes prevents LiDAR packets from being fully saved to the rosbag, leading to data loss. This can happen with experimental versions of Autoware and with vehicles that are not running the stable version. Filtering out poor-quality data based on drop rates works as a quick fix, but it can also discard the specific scenes we need to annotate. To address the root cause, we are continuously improving network traffic management in ECU design and have confirmed improvements by adopting the MCAP format, which is the default rosbag2 storage format from ROS 2 Iron onward.
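As a rough illustration of drop-rate filtering, the sketch below estimates how many LiDAR frames are missing from a recording by comparing the number of received frames with the number expected at the nominal frame rate. It is a frame-level simplification of the packet-level loss described above, and the 10 Hz rate and 5% threshold are assumed values, not ones taken from our pipeline.

```python
from typing import Sequence


def lidar_drop_rate(stamps_sec: Sequence[float], expected_hz: float) -> float:
    """Estimate the fraction of LiDAR frames missing from a recording.

    stamps_sec: sorted frame timestamps in seconds.
    expected_hz: nominal sensor frame rate (an assumed value).
    """
    if len(stamps_sec) < 2:
        return 1.0  # too little data to trust the recording
    duration = stamps_sec[-1] - stamps_sec[0]
    # Frames we would expect between the first and last stamp, inclusive.
    expected = round(duration * expected_hz) + 1
    missing = max(expected - len(stamps_sec), 0)
    return missing / expected


def is_usable(
    stamps_sec: Sequence[float],
    expected_hz: float = 10.0,      # assumed nominal rate
    max_drop_rate: float = 0.05,    # assumed quality threshold
) -> bool:
    """Return True when the estimated drop rate is within the threshold."""
    return lidar_drop_rate(stamps_sec, expected_hz) <= max_drop_rate
```

Note that this estimate cannot see frames dropped before the first or after the last received timestamp, so it should be read as a lower bound.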

The second issue can stem from incorrect LiDAR-camera calibration, but it can also be a time synchronization problem. While rare in vehicles running stable versions, it can occur in vehicles and systems still in the early stages of development, and operational errors can contribute as well. To catch it, engineers periodically review videos in which point clouds are overlaid on camera images. However, judging LiDAR-camera alignment for both dynamic and stationary objects is inherently difficult, and performing these checks effectively requires expertise. Time synchronization of sensors is a major focus at TIER IV, and we are developing a universal sensor driver called Nebula. Through such initiatives, we aim to gradually reduce the need for human visual confirmation as part of our roadmap.
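One simple automated check that could complement the visual review is to compare message timestamps directly. The sketch below pairs each camera timestamp with the nearest LiDAR timestamp and flags a recording whose median offset exceeds a tolerance. The function names and the 20 ms tolerance are assumptions for illustration; this only detects timestamp offsets and says nothing about calibration errors.

```python
from bisect import bisect_left
from statistics import median
from typing import List, Sequence


def nearest_offsets(
    lidar_sec: Sequence[float], camera_sec: Sequence[float]
) -> List[float]:
    """Signed offset (camera minus nearest LiDAR stamp) for each camera stamp.

    lidar_sec must be sorted and non-empty. All values are in seconds.
    """
    offsets = []
    for t in camera_sec:
        i = bisect_left(lidar_sec, t)
        # The nearest stamp is either just before or just after position i.
        candidates = lidar_sec[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda s: abs(s - t))
        offsets.append(t - nearest)
    return offsets


def looks_synchronized(
    lidar_sec: Sequence[float],
    camera_sec: Sequence[float],
    tolerance_sec: float = 0.02,  # assumed tolerance
) -> bool:
    """Return True when the median camera-LiDAR offset is within tolerance."""
    offsets = nearest_offsets(lidar_sec, camera_sec)
    return bool(offsets) and abs(median(offsets)) <= tolerance_sec
```

Using the median rather than the maximum keeps a single delayed frame from rejecting an otherwise well-synchronized recording. Offsets close to half the sensor period are ambiguous under nearest-neighbor pairing, so a check like this is best treated as a first filter.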

Active learning: Building a framework to select information-rich data

While vast amounts of data are generated daily from vehicles, most of it contributes minimally to improving model performance even after annotation, necessitating careful selection. Therefore, the learning platform we've developed is designed to easily incorporate various selection algorithms.

As a first step, we are developing an algorithm that approximates uncertainty by comparing the inference results of a teacher model and an onboard model, which can then be used to identify frames that should be annotated. This approach is similar to the technique known as active learning in machine learning, and it is an area where vision language models (VLMs) and other foundation models are expected to be utilized in the near future.

The 3D object-detection teacher models are currently based on BEVFusion. Although the teacher model itself may produce false detections, its error tendencies (such as classification errors) differ from those of onboard models like CenterPoint. Frames where the two models disagree are therefore likely to be informative, which allows us to efficiently extract data that should be annotated.
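To make the idea of comparing teacher and onboard outputs concrete, here is a minimal sketch of a per-frame disagreement score. It is not the algorithm under development: the `Detection` record, the greedy center-distance matching, the 2 m radius, and the scoring formula are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Detection:
    """Hypothetical bird's-eye-view detection: center in meters plus a class."""
    x: float
    y: float
    label: str


def disagreement_score(
    teacher: Sequence[Detection],
    onboard: Sequence[Detection],
    match_radius_m: float = 2.0,  # assumed matching radius
) -> float:
    """Fraction of objects in a frame on which the two models disagree.

    Each teacher detection is greedily paired with the closest unused
    onboard detection within match_radius_m. Unpaired detections on either
    side and pairs with different labels count as disagreements.
    """
    unused = list(onboard)
    matched = 0
    agreed = 0
    for t in teacher:
        best = None
        best_dist = match_radius_m
        for o in unused:
            d = hypot(t.x - o.x, t.y - o.y)
            if d <= best_dist:
                best, best_dist = o, d
        if best is not None:
            unused.remove(best)
            matched += 1
            if best.label == t.label:
                agreed += 1
    total = len(teacher) + len(onboard) - matched
    return 0.0 if total == 0 else 1.0 - agreed / total


def select_frames(scores: Dict[str, float], budget: int) -> List[str]:
    """Return the IDs of the `budget` frames with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:budget]
```

A frame where both models return the same objects scores 0.0, while missed objects, extra objects, and class mismatches all push the score toward 1.0. Ranking frames by this score and annotating only the top of the list is one way to spend a fixed annotation budget.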

Wrap-up

This post outlined our efforts to continuously update machine learning models for 3D object detection. By building a pipeline that handles large volumes of real-world data effectively, we aim to keep improving Autoware's machine learning capabilities. We chose to tackle 3D object detection in light of its existing machine learning integration and the opportunities for improvement in its data. We are concurrently working toward replacing various other tasks with machine learning, and the results will be incorporated into the pipeline gradually.

Koji Minoda | Perception Team

A graduate of the master's program in aerospace engineering at the University of Tokyo, Koji joined TIER IV in April 2022 following a stint as a part-time engineer from November 2020. He currently leads efforts to enhance the performance of machine learning models through dataset construction on the perception team.


