Solving Physics Puzzles by Reasoning about Paths

Reviewer 1

I found the paper interesting; it addresses the problem of predicting the future. Whether measuring error in pixels is a good objective is an open question. I’d look for some other representation, as that may offer hope of generalisation beyond these domains.

It would also be good to try newer architectures, e.g. transformers, on this task and find out how architectural improvements help in predicting the future.

Reviewer 2

The paper develops a model for solving physical puzzles in the PHYRE benchmark. The model first predicts: (1) the path the target object would follow without intervention and (2) the path the target object should follow in order to solve the puzzle. Next, it predicts the desired path of the object that is placed as part of the action and, subsequently, the action that achieves this path. All components of the model are trained jointly in a supervised way; each component receives its own learning signal but learning signals are also backpropagated through the entire architecture. The model achieves promising results, although its performance still falls short of the state-of-the-art on PHYRE.

Reviewer 3

Strengths:

The author(s) present an approach to solving PHYRE physics puzzles that does not require a predefined set of actions, as the baseline DQN model does.
Their framework design nicely decouples the puzzle solving task into subtasks, each of which takes input that can be generated by the physics simulator from PHYRE.

Limitations:

The author(s) encode 2-D paths as 2-D probability maps. This representation does not account for the time-dependent nature of a path. In the context of the problems presented, take the example of a vertically bouncing ball: its probability map would look no different from that of a ball that falls but doesn’t bounce.
The performance of the presented approach is currently worse than the baseline DQN approach.
The framework presented solves the PHYRE puzzle involving a target ball, a target state and one action ball. The authors have not proposed ways to generalize their framework to different puzzles. For instance, what would need to change if there are now two action balls instead?

Suggestions:

Include DQN’s performance in Table 1 to help the reader compare DQN against your approach.
Given the sequential nature of a 2D path, it would be good to address the different path representations that have been considered apart from the 2D probability map.
I suspect there are cases where the 2D probability map cannot represent a single unique 2D path, e.g. when the ball bounces multiple times (PHYRE template 00007). It would be good for the author(s) to address their performance on such scenarios.
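
To make the concern about time-collapsed path maps concrete, here is a minimal sketch in Python. It assumes a binary occupancy grid as the path encoding, which may differ from the encoding actually used in the paper; the function names, grid size and toy trajectories are all hypothetical. Under that assumption, a ball that falls and a ball that falls and then rebounds produce the same map, while a map that stores the last visit time tells them apart.

```python
def rasterize_path(points, size=8):
    """Collapse a sequence of (x, y) grid cells into a binary occupancy map.

    Assumption: binary occupancy, not the paper's actual encoding.
    """
    grid = [[0.0] * size for _ in range(size)]
    for x, y in points:
        grid[y][x] = 1.0  # the time index of the visit is discarded here
    return grid


def rasterize_with_time(points, size=8):
    """Alternative encoding: each cell stores the normalized time of its last visit."""
    grid = [[0.0] * size for _ in range(size)]
    for t, (x, y) in enumerate(points, start=1):
        grid[y][x] = t / len(points)
    return grid


# Hypothetical trajectories in a single column (x = 3); y is the row index.
fall_only = [(3, y) for y in range(0, 6)]                        # falls and stops
fall_and_bounce = fall_only + [(3, y) for y in range(4, 1, -1)]  # falls, then rebounds

# Same set of visited cells, so the occupancy maps coincide.
maps_identical = rasterize_path(fall_only) == rasterize_path(fall_and_bounce)

# The time-aware maps differ because revisited cells carry a later timestamp.
time_maps_differ = rasterize_with_time(fall_only) != rasterize_with_time(fall_and_bounce)
```

A map that accumulates visit counts instead of binary occupancy would separate these two toy cases by intensity, but it would still not recover the order in which cells were visited.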

(Optional):

To help readers gauge their ability to reproduce this work, give details about the machine used for training.

Reviewer 4

The submission proposes a tailored deep learning pipeline for solving PHYRE physics puzzles, a specific set of rigid body optimization problems. This architecture is modelled to reflect how humans would solve these problems and consists of 4 neural networks, trained independently on ground truth data.

This setup seems very narrow in scope, being only applicable to the PHYRE puzzles. Even then, the proposed model performs rather poorly compared to other, more general methods.

I like the general idea of splitting the problem into multiple interpretable problems, e.g. predicting the physics without interaction. I would have liked to see a more generic pipeline, e.g. a simulator network that learns to approximate the trajectories of all involved objects, including collisions. This network could then be used as a proxy for the real simulator to find an optimal placement for the action object using this differentiable physics proxy.
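
The proxy-based search suggested above could look roughly like the following sketch. Everything here is a hypothetical stand-in: `proxy_final_x` is a toy smooth function playing the role of a learned simulator network, the action is reduced to a single x-placement, and a central finite difference replaces the gradient that automatic differentiation would provide through a real network.

```python
import math


def proxy_final_x(action_x):
    """Hypothetical stand-in for a learned simulator network.

    Maps the action ball's x-placement to the target ball's final x-position.
    """
    return 2.0 * math.tanh(action_x - 1.0) + 3.0


def loss(action_x, goal_x):
    """Squared distance between the predicted final position and the goal."""
    return (proxy_final_x(action_x) - goal_x) ** 2


def optimize_action(goal_x, action_x=0.0, lr=0.1, steps=200, eps=1e-5):
    """Gradient descent on the action, using the proxy instead of the real simulator."""
    for _ in range(steps):
        # Central finite difference; autodiff would supply this for a real network.
        grad = (loss(action_x + eps, goal_x) - loss(action_x - eps, goal_x)) / (2 * eps)
        action_x -= lr * grad
    return action_x


best_action = optimize_action(goal_x=4.0)
```

In this toy setting the optimum is known in closed form (`1 + atanh(0.5)`), which makes the sketch easy to check; with a learned proxy, the placement found this way would still need to be validated in the real simulator.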