Affordance detection
ASP relies on affordance detection to implement part-level interactions for more complex skills. Our workflow builds on foundation models with strong visual grounding abilities and maps a variety of queries to the relevant affordance types and object parts:
- ring the desk bell → [tip_push] the bell button
- pick up the mug → [grasp_part] the mug handle
- open the drawer → [hook_pull] the drawer handle
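
The query-to-affordance mapping above can be pictured as a small data structure. The following is only a minimal sketch of the output format, not ASP's implementation: the names `Affordance`, `EXAMPLES`, `detect_affordance`, and `format_affordance` are hypothetical, and a fixed lookup table stands in for the foundation model that would perform the actual visual grounding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affordance:
    """Hypothetical container for one detection result."""
    kind: str  # affordance type, e.g. "grasp_part"
    part: str  # object part to act on, e.g. "the mug handle"


# The three query/affordance pairs listed above. This table is an
# assumption used for illustration; it replaces the foundation model.
EXAMPLES = {
    "ring the desk bell": Affordance("tip_push", "the bell button"),
    "pick up the mug": Affordance("grasp_part", "the mug handle"),
    "open the drawer": Affordance("hook_pull", "the drawer handle"),
}


def detect_affordance(query: str) -> Affordance:
    """Return the affordance for a query, or raise ValueError if unknown."""
    # Normalize case and whitespace so minor formatting differences match.
    key = " ".join(query.lower().split())
    if key not in EXAMPLES:
        raise ValueError(f"no affordance known for query: {query!r}")
    return EXAMPLES[key]


def format_affordance(affordance: Affordance) -> str:
    """Render a result in the "[type] part" notation used in the list."""
    return f"[{affordance.kind}] {affordance.part}"


if __name__ == "__main__":
    for query in EXAMPLES:
        print(f"{query} → {format_affordance(detect_affordance(query))}")
```

Keeping the affordance type and the object part as separate fields would let a downstream skill dispatch on the type while passing the part description on to a grounding step.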