1 Rey Reza Wiyatno and Liam Paull are with the Montreal Robotics and Embodied AI Lab (REAL) and DIRO at the University of Montreal, and Mila.
2 Anqi Xu conducted this work with support from his past affiliation with Element AI.
Abstract: The ability of a robot to navigate using only vision is appealing due to its simplicity. Traditional vision-based navigation approaches require a prior map-building step that is arduous and prone to failure, or can only exactly retrace previously executed trajectories. Newer learning-based visual navigation techniques reduce the reliance on a map and instead learn navigation policies directly from image inputs. Two paradigms are currently prevalent: end-to-end approaches, which forego an explicit map representation entirely, and topological approaches, which preserve some loose connectivity of the space. However, while end-to-end methods tend to struggle in long-distance navigation tasks, topological map-based solutions are prone to failure due to spurious edges in the graph. In this work, we propose a learning-based topological visual navigation method with graph update strategies that improve lifelong navigation performance over time. We take inspiration from sampling-based planning algorithms to build image-based topological graphs, resulting in sparser graphs with higher navigation performance compared to baseline methods. Furthermore, unlike controllers that learn from fixed training environments, we show that our model can be finetuned using a relatively small dataset from the real-world environment where the robot is deployed. We also assess the performance of our system in real-world deployments.
We use Gibson to quantify navigation performance in simulation. As the robot platform, we use the LoCoBot in both simulated and real-world experiments, and we teleoperate it in each test map to collect trajectories for building the graph. We evaluate navigation performance in a way that reflects real-world usage: the agent should be able to navigate between any pair of images in the graph, not merely repeat trajectories in the order the images were collected.
Navigation Performance in Simulation
In this section, we compare the navigation performance of our method against SPTM and ViNG in simulated environments. Note that in this set of experiments we do not perform graph updates with our method; these are evaluated separately in the "Lifelong Improvement" section below.
We evaluate on four test environments that are different from the model's training maps: Barahona (57 m²), Bellemeade (70 m²), Akiak (124 m²), and Caruthers (129 m²). These are shown in Fig. 1. For the graph building phase, we teleoperate the robot to explore roughly 3–4 loops around each map. Fig. 2 visualizes the top-down view of each test environment with its corresponding trajectory.
As seen in Fig. 3, our method consistently yields higher navigation success rates in all test environments when the model is finetuned. Additionally, our graphs have significantly fewer nodes and edges, which helps keep planning costs practical when scaling to large environments. Compared to the baselines, which use the entire trajectory dataset to build their graphs, our sampling-based graph building method thus achieves both higher performance and greater efficiency.
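To make the idea concrete, here is a minimal sketch of what such sampling-based graph construction might look like. The helper names are hypothetical, and `reachability(a, b)` stands in for a learned estimator returning the probability that the controller can drive from image `a` to image `b`; this is an illustration of the general technique, not the authors' exact implementation.

```python
import math
import random

def build_sparse_graph(images, reachability, n_samples=50, edge_thresh=0.9, seed=0):
    """Build a sparse topological graph from trajectory images.

    Inspired by sampling-based planners: instead of adding every
    trajectory frame as a node, sample a small subset of frames and
    connect only those pairs the reachability estimator judges
    traversable with high confidence.
    """
    rng = random.Random(seed)
    nodes = rng.sample(range(len(images)), min(n_samples, len(images)))
    edges = {}
    for u in nodes:
        for v in nodes:
            if u == v:
                continue
            p = reachability(images[u], images[v])  # probability in [0, 1]
            if p >= edge_thresh:
                # Weight by negative log-probability so shortest-path
                # planning prefers confident edges.
                edges[(u, v)] = -math.log(p)
    return nodes, edges
```

Sampling a subset of frames rather than keeping every image is what keeps the node and edge counts, and hence planning cost, low as environments grow.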
Fig. 4 qualitatively compares sample graphs built using different methods. Our graph has the fewest vertices, yet still maintains proper coverage of the map. Visually, our graph also has few false-positive edges through walls, and we later demonstrate how our graph updates can prune these. The SPTM graph likewise has few false-positive edges; however, because its edges are unweighted, the SPTM agent anecdotally often chose spurious edges during navigation. ViNG, which weights edges by discretized temporal distance, produces a graph with many false-positive edges carrying large weights. This becomes problematic when the goal is far from the robot, as the planned path is then likely to include spurious edges. It also leads to an unwanted behavior where the agent always outputs a path, no matter how unlikely it is to reach the goal.
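The effect of edge weighting on planning can be illustrated with a standard Dijkstra search over the topological graph. This is a generic sketch, not any of the compared systems: with unweighted (hop-count) edges, a single spurious edge through a wall looks like an attractive shortcut, whereas weighting edges by negative log-confidence makes low-confidence edges expensive.

```python
import heapq
import math

def shortest_path(edges, start, goal):
    """Dijkstra over a weighted topological graph.

    `edges` maps (u, v) -> nonnegative weight. Returns the node
    sequence from start to goal, or None if goal is unreachable.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

For example, three confident edges with p = 0.95 each cost about 0.15 in total under -log(p) weights, while one spurious shortcut with p = 0.1 costs about 2.3, so the planner takes the long, reliable route; under uniform hop-count weights the spurious shortcut would win.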
We now evaluate the proposed graph update method to see how navigation performance evolves as the agent performs more queries. We start these experiments with graphs built using our finetuned models, then ask the agent to execute randomly sampled navigation tasks while performing continuous graph updates. After every 100 queries, we re-evaluate navigation performance on the same static set of test episodes used in the "Navigation Performance in Simulation" section.
As seen in Fig. 5, the success rate generally improves as more queries are performed, with notable gains early on, while the numbers of nodes and edges in the graph do not grow substantially. We also see an initial decrease in the number of edges, suggesting that our graph updates first pruned the spurious edges causing early navigation failures, and later added useful new nodes that further improved navigation. Qualitatively, Fig. 6 shows fewer spurious edges when comparing sample graphs before and after updates. Occasionally, the success rate decreased after a batch of graph updates. This is likely caused by spurious edges introduced when new nodes are added shortly before a 100th-query re-evaluation. Nevertheless, such spurious edges are pruned in later updates, leading to an overall upward performance trend during lifelong navigation.
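One plausible form of the two update rules described above, pruning an edge after a failed traversal and adding a newly observed image as a node, is sketched below. The helper names and the `reachability` estimator are hypothetical placeholders, not the paper's actual code.

```python
import math

def prune_failed_edge(edges, u, v):
    """Remove an edge the agent failed to traverse (likely spurious)."""
    edges.pop((u, v), None)
    edges.pop((v, u), None)

def add_node(edges, nodes, new_img, images, reachability, edge_thresh=0.9):
    """Add a newly observed image as a graph node, connecting it (in
    both directions) to every existing node the reachability estimator
    deems traversable with high confidence."""
    new_id = len(images)
    images.append(new_img)
    for n in nodes:
        for a, b in ((new_id, n), (n, new_id)):
            p = reachability(images[a], images[b])
            if p >= edge_thresh:
                edges[(a, b)] = -math.log(p)  # confidence-based weight
    nodes.append(new_id)
    return new_id
```

A freshly added node may still introduce a spurious edge if the estimator is overconfident, which matches the transient success-rate dips observed above; the pruning rule removes such edges once a traversal fails.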
Below are some navigation videos from the simulated environments:
We evaluate the performance of our method in two real-world environments: a studio apartment and a medium-sized university laboratory. In Table 1, we report navigation success rates before and after performing graph updates over 30 queries.
Table 1: Navigation success rates before and after graph updates in the real-world environments.
Below are some navigation videos from the real-world environments: