Increasingly complex and diverse planetary exploration environments require a more adaptable and flexible rover navigation strategy. In this study, we propose a VLM-empowered multi-mode system that achieves efficient yet safe autonomous navigation for planetary rovers. A vision-language model (VLM) parses scene information from image inputs to achieve a human-level understanding of terrain complexity. Based on the complexity classification, the system switches to the most suitable navigation mode, composed of perception, mapping, and planning modules designed for a specific terrain type, to traverse the terrain ahead before reaching the next waypoint. By integrating the local navigation system with a map server and a global waypoint generation module, the rover is equipped to handle long-distance navigation tasks in complex scenarios. The navigation system is evaluated in various simulation environments. Compared to the single-mode conservative navigation method, our multi-mode system improves time and energy efficiency in a long-distance traversal with varied types of obstacles, enhancing efficiency by 79.5% while maintaining its avoidance capabilities against terrain hazards to guarantee rover safety.
The local navigation system utilizes a VLM terrain classifier and three navigation methods tailored to different terrains: flat, rocky, and challenging. Terrain complexity is determined from RGB images by analyzing slope and rock distribution. Three distinct navigation strategies are designed and adopted, and a closed-loop navigation system is established that dynamically adapts to different terrains.
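The closed-loop switching described here can be sketched as a small dispatch table. This is a minimal illustration, not the authors' implementation: the names `Terrain`, `Mode`, `select_mode`, and `navigate` are hypothetical, the classifier is passed in as a plain callable standing in for the VLM, and the fallback to the most cautious mode for an unrecognized label is an assumption.

```python
from enum import Enum
from typing import Callable, Dict, List


class Terrain(Enum):
    FLAT = "flat"
    ROCKY = "rocky"
    CHALLENGING = "challenging"


class Mode(Enum):
    EFFICIENT = "efficient"
    SAFE = "safe"
    CONSERVATIVE = "conservative"


# Terrain class -> navigation mode, following the pairing given in the text.
MODE_FOR_TERRAIN: Dict[Terrain, Mode] = {
    Terrain.FLAT: Mode.EFFICIENT,
    Terrain.ROCKY: Mode.SAFE,
    Terrain.CHALLENGING: Mode.CONSERVATIVE,
}


def select_mode(label: str) -> Mode:
    """Map a classifier label to a navigation mode.

    Assumption: an unrecognized label falls back to the most cautious mode.
    """
    try:
        terrain = Terrain(label.strip().lower())
    except ValueError:
        return Mode.CONSERVATIVE
    return MODE_FOR_TERRAIN[terrain]


def navigate(images: List[object], classify: Callable[[object], str]) -> List[Mode]:
    """Classify the terrain ahead of each waypoint and pick a mode per segment."""
    return [select_mode(classify(image)) for image in images]
```

In use, `classify` would wrap the VLM query for the image taken before each waypoint; here any function returning `"flat"`, `"rocky"`, or `"challenging"` will do.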
For flat terrain, we introduce efficient mode, which eliminates onboard perception and complex planning, generating a smooth path for efficiency.
For rocky terrain, safe mode performs rock detection to construct a local obstacle map in real time, and plans a path through the obstacles at a lower speed.
For challenging terrain, conservative mode uses elevation mapping to generate a 2.5D costmap. A* planning over the costmap, combined with a conservative speed, ensures safe traversal.
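To make the costmap-based planning step concrete, the following is a minimal sketch of A* over a 2D grid of traversal costs. It is a generic textbook version under stated assumptions, not the planner used in the system: the grid is 4-connected, entering a cell costs `1 + costmap[r][c]` (a hypothetical cost model), all costs are non-negative, and `math.inf` marks an untraversable cell.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]


def astar(costmap: List[List[float]], start: Cell, goal: Cell) -> Optional[List[Cell]]:
    """Return a minimum-cost path from start to goal, or None if unreachable."""
    rows, cols = len(costmap), len(costmap[0])

    def heuristic(cell: Cell) -> float:
        # Manhattan distance is admissible because every step costs at least 1.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0.0, start)]
    best_g: Dict[Cell, float] = {start: 0.0}
    parent: Dict[Cell, Cell] = {}

    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in parent:
                cell = parent[cell]
                path.append(cell)
            return path[::-1]
        if g > best_g.get(cell, math.inf):
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            step = 1.0 + costmap[nr][nc]
            if math.isinf(step):
                continue  # untraversable cell
            new_g = g + step
            if new_g < best_g.get(nxt, math.inf):
                best_g[nxt] = new_g
                parent[nxt] = cell
                heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

A real 2.5D costmap would derive each cell's cost from elevation data such as slope and roughness; here the costs are simply supplied by the caller.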
The classification results show that the VLM approach outperforms the geometry-based method in moderate and complex terrain, especially in ambiguous scenarios.
Terrain Type | Avg. Rock Grid Num. (Geometry-based) | Avg. Slope Value (Geometry-based) | Avg. Slope Variance (Geometry-based) | Avg. Rock Complexity (VLM) | Avg. Slope Complexity (VLM) | Avg. Accuracy (Geometry) | Avg. Accuracy (VLM)
---|---|---|---|---|---|---|---
Flat | 0 | 2.6815 | 4.67 | 0.045 | 0.095 | 100% | 100%
Rocky | 404.65 | 5.5095 | 141.3865 | 0.59 | 0.2 | 90% | 95%
Challenging | 444.35 | 30.118 | 273.3555 | 0.435 | 0.695 | 85% | 100%
The efficient mode minimizes travel time in obstacle-free areas but fails in complex environments due to a lack of obstacle detection. The safe mode avoids obstacles effectively but misinterprets terrain features as hazards in challenging landscapes. The conservative mode, though less efficient, ensures successful navigation across all terrains.
In complex environments, the multi-mode system dynamically adapts to terrain complexity by switching to the corresponding mode, reducing traversal time to 55.7% of that of single-mode conservative navigation and improving efficiency without compromising safety.
Metric | Single-mode (Total) | Multi-mode (Total) | Multi-mode (Efficient) | Multi-mode (Safe) | Multi-mode (Conservative)
---|---|---|---|---|---
Time | 1081.7 | 602.6 | 144.9 | 158.6 | 299.1
Distance | 413.7 | 411.8 | 215.1 | 94.7 | 102
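The headline figures can be checked against the table above with a few lines of arithmetic. The source does not spell out how the 79.5% efficiency gain is defined; the sketch below takes it as the ratio of single-mode to multi-mode traversal time minus one, which is one reading that reproduces both reported numbers. The variable names are illustrative, and the per-mode average speeds are derived here rather than reported in the source.

```python
# Values copied from the table above (units are not stated in the source).
single_time, multi_time = 1081.7, 602.6
mode_time = {"efficient": 144.9, "safe": 158.6, "conservative": 299.1}
mode_dist = {"efficient": 215.1, "safe": 94.7, "conservative": 102.0}

# Multi-mode traversal time as a fraction of single-mode time.
time_ratio = multi_time / single_time
# Assumed definition of the efficiency gain: relative increase in traversal rate.
efficiency_gain = single_time / multi_time - 1.0

# Average speed per mode (distance / time), derived from the table.
avg_speed = {mode: mode_dist[mode] / mode_time[mode] for mode in mode_time}

print(f"time ratio: {time_ratio:.1%}")            # 55.7%
print(f"efficiency gain: {efficiency_gain:.1%}")  # 79.5%
for mode, speed in avg_speed.items():
    print(f"{mode}: {speed:.2f} distance units per time unit")
```

The per-mode times and distances also sum to the multi-mode totals (602.6 and 411.8), and the derived speeds fall from efficient to safe to conservative mode, consistent with the speed settings described for each mode.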