
environment. To account for the usage and adaptation of a
meta-model during planning, we call our approach Meta
Adaptation Controller (MAC).
First, we introduce related work and preliminaries. Next,
the challenge and our approach to solving it are described.
Finally, experimental results are presented that compare MAC
with MPC employing different meta-learning methods.
II. RELATED WORK
In recent years, robotics has achieved remarkable success
with model-based RL approaches [13], [14], [15]. The agent
can choose optimal actions by utilizing the experiences
generated by the model [7]. As a result, the amount of
data required for model-based methods is typically much
smaller than their model-free counterparts, making these
algorithms more attractive for robotic applications. One
drawback in many of these works is the assumption that the
environment is stationary. In real robot applications, however,
many uncertainties are difficult to model or predict, some of
which are internal (e.g., malfunctions [9]) and others external
(e.g., wind [12]). These uncertainties make the stationary
assumption impractical, which can lead to suboptimal behavior
or even catastrophic failure. Therefore, a quick adaptation of
the learned model is critical.
“Gradient-based meta-learning methods leverage gradient
descent to learn the commonalities among various tasks”
[16, p. 1]. One such method introduced by Finn et al.
[17] is Model-Agnostic Meta-Learning (MAML). The key
idea of MAML is to tune a model’s initial parameters such
that the model has maximal performance on a new task.
Here, meta-learning is achieved with bi-level optimization: a
model's task-specific optimization and a task-agnostic meta
optimization. Instantiated for MFRL, MAML uses policy
gradients of a neural network model, whereas, in MBRL,
MAML is used to train a dynamics model. REPTILE by
Nichol et al. [18] is a first-order variant of MAML.
In contrast to MAML, task-specific gradients do not need
to be differentiated through the optimization process. This
makes REPTILE more computationally efficient with similar
performance.
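To make the nested optimization concrete, the following sketch shows one REPTILE-style meta-update in PyTorch: a copy of the model is adapted to a sampled task with a few gradient steps (inner level), and the meta-parameters are then moved toward the adapted parameters (outer level) without differentiating through the inner optimization. The helpers sample_task and task_loss are assumed placeholders, not part of any published implementation.

\begin{verbatim}
import copy
import torch

def reptile_meta_step(model, sample_task, task_loss,
                      inner_steps=5, inner_lr=1e-2, meta_lr=0.1):
    # Draw a task M_i ~ p(M) and adapt a copy of the model to it.
    task = sample_task()                      # assumed helper
    adapted = copy.deepcopy(model)            # phi_i initialized with theta
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(inner_steps):              # inner level: task-specific SGD
        loss = task_loss(adapted, task)       # assumed helper
        opt.zero_grad()
        loss.backward()
        opt.step()
    # Outer level: first-order update that moves theta toward phi_i
    # without backpropagating through the inner optimization.
    with torch.no_grad():
        for theta, phi in zip(model.parameters(), adapted.parameters()):
            theta.add_(meta_lr * (phi - theta))
\end{verbatim}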
A model-based approach using gradient-based MRL was
presented in the work of Nagabandi et al. [9] and targets
online adaptation of a robotic system that encounters different
system dynamics in real-world environments. In this context,
Kaushik et al. [11] point out that in an MRL setup where
situations do not possess strong global similarity, finding
a single set of initial parameters is often not sufficient
to learn quickly. One potential solution would be to find
several initial sets of model parameters during meta-training
and, when encountering a new task, use the most similar
one so that an agent can adapt through several gradient
steps. Their work Fast Adaptation through Meta-Learning
Embeddings (FAMLE) approaches this solution by extending
a dynamical model's input with a learnable d-dimensional
vector describing a task. Similarly, Belkhale et al. [12] intro-
duce a meta-learning approach that enables a quadcopter to
adapt online to various physical properties of payloads (e.g.,
mass, tether length) using variational inference. Intuitively,
each payload causes different system dynamics and therefore
defines a task to be learned. Since such dynamics are hard
to model accurately by hand and it is not realistic to know
every payload's property values beforehand, the meta-learning
goal is rapid adaptation to unknown payloads without prior
knowledge of the payload's physical properties.
To this end, a probabilistic encoder network infers a task-
specific latent vector that is fed into the dynamics network as
an auxiliary input. Using the latent vector, the dynamics
network learns to model the factors of variation that affect
the payload’s dynamics and are not present in the current
state. All these algorithms use MPC during online adaptation.
Our work introduces a new controller for online adaptation in
a model-based meta-reinforcement learning setting.
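As an illustration of the embedding idea shared by FAMLE and the approach of Belkhale et al., the sketch below conditions a learned dynamics model on a task-specific latent vector by concatenating it with the state-action input. The class name, layer sizes, and the use of a plain embedding table are assumptions made for illustration, not the authors' implementations.

\begin{verbatim}
import torch
import torch.nn as nn

class EmbeddingDynamicsModel(nn.Module):
    # Dynamics network whose input is extended with a learnable
    # d-dimensional task embedding (illustrative sketch).
    def __init__(self, state_dim, action_dim, n_tasks,
                 embed_dim=8, hidden=128):
        super().__init__()
        # One learnable embedding per meta-training task.
        self.task_embedding = nn.Embedding(n_tasks, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),   # predicts the next state
        )

    def forward(self, state, action, task_id):
        h = self.task_embedding(task_id)    # task-specific latent vector
        x = torch.cat([state, action, h], dim=-1)
        return self.net(x)
\end{verbatim}

At adaptation time, the embedding closest to the newly encountered situation (together with the model parameters) can be fine-tuned on the few transitions observed online.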
III. PRELIMINARIES
A. Meta Learning
Quick online adaptation to new tasks can be viewed in the light of a few-shot learning setting, where the goal of meta-learning is to adapt a model $f_\theta$ to an unseen task $\mathcal{M}_j$ from a task distribution $p(\mathcal{M})$ with a small number of $k$ data samples [17]. The meta-learning procedure is usually divided into meta-training with $n$ meta-learning tasks $\mathcal{M}_i$ and meta-testing with $y$ meta-test tasks $\mathcal{M}_j$, both drawn from $p(\mathcal{M})$ without replacement [3]. During meta-training, task data may be split into train and test sets, usually representing $k$ data points of a task: $\mathcal{D}_{\text{meta-train}} = \{(\mathcal{D}^{tr}_{i=1}, \mathcal{D}^{ts}_{i=1}), \ldots, (\mathcal{D}^{tr}_{i=n}, \mathcal{D}^{ts}_{i=n})\}$. The meta-testing task data $\mathcal{D}_{\text{meta-test}} = (\mathcal{D}^{\text{meta-test}}_{j=1}, \ldots, \mathcal{D}^{\text{meta-test}}_{j=y})$ is held out during meta-training [3]. Meta-training is then performed with $\mathcal{D}_{\text{meta-train}}$ and can be viewed as bi-level learning of model parameters [19]. In the inner level, an update algorithm $\mathrm{Alg}$ with hyperparameters $\psi$ must find task-specific parameters $\phi_i$ by adjusting the meta-parameters $\theta$. In the outer level, $\theta$ must be adjusted to minimize the cumulative loss of all $\phi_i$ across all learning tasks by finding common characteristics of the different tasks through the meta-parameters $\theta^\star$:
$$
\overbrace{\theta^\star = \arg\min_{\theta} \sum_{i=1}^{n} \mathcal{L}_{\mathcal{D}_i \sim \mathcal{M}_i}(\phi_i)}^{\text{outer-level}}
\quad \text{where} \quad
\phi_i = \underbrace{\mathrm{Alg}^{\psi}_{\mathcal{D}_i \sim \mathcal{M}_i}(\theta)}_{\text{inner-level}}
\tag{1}
$$
Once $\theta^\star$ is found, it can be used during meta-testing for quick adaptation: $\phi_j = \mathrm{Alg}(\theta^\star, \mathcal{D}_j)$.
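As a minimal sketch of this meta-testing step, the function below instantiates $\mathrm{Alg}$ as a few gradient steps on the $k$ samples of an unseen task, starting from the meta-learned parameters $\theta^\star$; the loss function and data format are assumptions made only for illustration.

\begin{verbatim}
import copy
import torch

def adapt(meta_model, task_data, loss_fn, steps=5, lr=1e-2):
    # Alg(theta*, D_j): start from the meta-learned parameters and
    # take a few gradient steps on the new task's data (sketch).
    adapted = copy.deepcopy(meta_model)     # phi_j initialized with theta*
    opt = torch.optim.SGD(adapted.parameters(), lr=lr)
    for _ in range(steps):
        loss = loss_fn(adapted, task_data)  # loss on D_j ~ M_j (assumed)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted                          # task-specific parameters phi_j
\end{verbatim}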
B. Model-based Reinforcement Learning
In RL, a task can be described as a Markov Decision Process (MDP) $\mathcal{M} = \{S, A, p(s_{t=0}), p(s_{t+1} \mid s_t, a_t), r, H\}$ with a set of states $S$, a set of actions $A$, a reward function $r: S \times A \mapsto \mathbb{R}$, an initial state distribution $p(s_{t=0})$, a transition probability distribution $p(s_{t+1} \mid s_t, a_t)$, and a discrete-time finite or continuous-time infinite horizon $H$. MBRL methods sample ground truth data $\mathcal{D}_i = \{(s_0, a_0, s_1), (s_1, a_1, s_2), \ldots\}$ from a specific task $\mathcal{M}_i$ and