We present an end-to-end, model-based deep reinforcement learning agent which dynamically attends to relevant parts of its state, in order to plan and to generalize better out-of-distribution. The agent’s architecture uses a set representation and a bottleneck mechanism, forcing the number of entities to which the agent attends at each planning step to be small. In experiments with customized MiniGrid environments with different dynamics, we observe that the design allows agents to learn to plan effectively, by attending to the relevant objects, leading to better out-of-distribution generalization.
— Lees op arxiv.org/abs/2106.02097

Blijf op de hoogte

Wekelijks inzichten over AI governance, cloud strategie en NIS2 compliance — direct in je inbox.

[jetpack_subscription_form show_subscribers_total="false" button_text="Inschrijven" show_only_email_and_button="true"]

Wat ontvangt u? Bekijk edities →

Klaar om van data naar doen te gaan?

Plan een vrijblijvende kennismaking en ontdek hoe Djimit uw organisatie helpt.

Plan een kennismaking →

Ontdek meer van Djimit

Abonneer je om de nieuwste berichten naar je e-mail te laten verzenden.

Categories: Data Platforms