Attention heatmaps can reveal a great deal about the internal mechanisms that drive the decisions of a vision-language-action (VLA) model. In this blog post, I show how visualizing attention heatmaps while running inference with PI0, and with a variant guided by future frames, can help expose those mechanisms.
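
As a rough sketch of the idea (not the code used to produce the figures in this post), one query token's attention over the image patch tokens can be reshaped into the patch grid and upsampled to pixel resolution. The function name `attention_to_heatmap`, the head-averaged input, and the square patches are all assumptions made for illustration; the actual token layout depends on the model.

```python
import numpy as np


def attention_to_heatmap(attn, grid_hw, patch_px):
    """Turn one query's attention over image patches into a pixel-level heatmap.

    attn:     1-D array of length grid_h * grid_w holding the attention from a
              single query token to each image patch token (assumed to be
              already averaged over heads).
    grid_hw:  (grid_h, grid_w), the layout of the patch grid.
    patch_px: side length of one (assumed square) patch in pixels.
    """
    grid_h, grid_w = grid_hw
    attn = np.asarray(attn, dtype=np.float64)
    if attn.shape != (grid_h * grid_w,):
        raise ValueError("attn must have exactly grid_h * grid_w entries")

    # Lay the flat patch sequence back out as a 2-D grid (row-major order assumed).
    grid = attn.reshape(grid_h, grid_w)

    # Min-max normalise to [0, 1] so heatmaps are comparable across queries.
    span = grid.max() - grid.min()
    grid = (grid - grid.min()) / span if span > 0 else np.zeros_like(grid)

    # Nearest-neighbour upsampling: each patch value fills a patch_px x patch_px block.
    return np.kron(grid, np.ones((patch_px, patch_px)))
```

The resulting array can then be overlaid on the input frame with any plotting library, for example as a semi-transparent colour map.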

In-distribution tasks – bulk material handling

PI0 100%:

FOCA 100%:

PI0 40%:

FOCA 40%:

Where do different actions attend?

FOCA 100%:

Action 1

Action 3

Action 10

Action 30

Action 49

PI0:

Action 0

Action 3

Action 10

Action 30

Action 49

Observe how later actions in the chunk pay more attention to other (previous) actions than to the proprioceptive state.
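
One way to put a number on this observation is to sum the attention mass that a given action query assigns to each group of tokens. The sketch below is only illustrative: the function name `attention_share` and the token layout (image tokens, then one state token, then action tokens) are assumptions, not the layout of PI0 or FOCA.

```python
import numpy as np


def attention_share(attn_row, groups):
    """Fraction of one query's attention mass that falls on each token group.

    attn_row: 1-D array of attention weights from a single query token
              (e.g. one action in the chunk) to every key token.
    groups:   dict mapping a group name to the slice of key positions it covers.
    """
    attn_row = np.asarray(attn_row, dtype=np.float64)
    total = attn_row.sum()
    if total <= 0:
        raise ValueError("attention row must have positive total mass")
    # Renormalise by the total so the shares sum to 1 over non-overlapping groups.
    return {name: float(attn_row[sl].sum() / total) for name, sl in groups.items()}


# Hypothetical layout: 4 image tokens, 1 proprioceptive state token, 3 action tokens.
layout = {"image": slice(0, 4), "state": slice(4, 5), "actions": slice(5, 8)}
row = [0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1]
shares = attention_share(row, layout)
```

Computing these shares for an early and a late action in the chunk, and comparing the `state` and `actions` entries, gives a direct check of the trend visible in the heatmaps.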

Similar behavior has been observed with PI0 and FOCA models at different dataset sizes (80 episodes and 200 episodes).

PI0 100%: