Robotic Control Using Model Based Meta Adaption

Karam Daaboul, Joel Ikels and J. Marius Zöllner
Abstract. In machine learning, meta-learning methods aim for fast adaptability to unknown tasks using prior knowledge. Model-based meta-reinforcement learning combines reinforcement learning via world models with Meta Reinforcement Learning (MRL) for increased sample efficiency. However, adaptation to unknown tasks does not always result in preferable agent behavior. This paper introduces a new Meta Adaptation Controller (MAC) that employs MRL to apply a preferred robot behavior from one task to many similar tasks. To do this, MAC aims to find the actions an agent has to take in a new task to reach an outcome similar to that of an already learned task. As a result, the agent adapts quickly to a change in dynamics and behaves appropriately without the need to construct a reward function that enforces the preferred behavior.
I. INTRODUCTION
Adaptive behavior lies in the very nature of life as we know it. By forming a variety of behaviors, the animal brain enables its host to adapt to environmental changes continuously [1]. Toddlers, for example, can learn how to walk in sand within moments, whereas robots often struggle to adapt quickly and show rigid behavior when encountering a task not seen before. Fast adaptation is possible because animals do not learn from scratch but leverage prior knowledge to solve a new task. In machine learning, the domain of meta-learning takes inspiration from this phenomenon by enabling a learning machine to develop a hypothesis on how to solve a new task using information from prior hypotheses of similar tasks [2]. Thus, it aims to learn models that are quickly adaptable to new tasks and can be described as a set of methods that apply a learned prior of common task structure to make generalized inferences from small amounts of data [2], [3].
The domain of model-based reinforcement learning (MBRL) comprises methods that enable a Reinforcement Learning (RL) agent to successfully master complex behaviors using a deep neural network as a model of a task's system dynamics [4]. To solve an RL task, this dynamics model is used to optimize a sequence of actions (e.g., with model predictive control) or to optimize a policy, making MBRL more sample efficient than model-free reinforcement learning (MFRL) [5], [6], [7]. Even though MBRL methods show improved sample efficiency compared to MFRL approaches, the amount of training data needed to reach "good" performance scales exponentially with the dimensionality of the input state-action space of the dynamics model [8]. Additionally, data scarcity is even more challenging when a system has to adapt online while executing a task. A robot, for example, might encounter sudden changes in system dynamics (e.g., damaged joints) or changes in environmental dynamics (e.g., new terrain conditions) that require fast online adaptation.

*Equal contributions
Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. {daaboul, marius.zoellner}@kit.edu, joel.ikels@student.kit.edu

Fig. 1. Action sequences of an ant robot during meta-testing. The test task is to adapt to a gravity of 5 m/s². A model-based meta-reinforcement learning approach with MPC results in undesired robot behavior (row 1). MAC finds a behavior similar to the one learned at its reference task (row 2).
By combining meta-learning and MBRL, robots can learn how to quickly form new behaviors when the environment or system dynamics change [9], [10], [11], [12]. However, newly formed behavior might be undesirable even if the underlying task is mastered correctly according to the environment's reward function. For example, as seen in Figure 1, an Ant robot trained to walk as fast as possible will start to jump or roll if the gravity of its environment is very low. In a real-world setting, such a situation might damage the robot. Therefore, the RL agent requires a tailored reward function to form behavior that does no damage. Nevertheless, designing such a reward function is time-consuming and hard to get right, especially across various tasks. This paper introduces a controller for MBRL that employs meta-learning to apply a selected robot behavior from one robotic task to a range of similar tasks. In other words, it aims to find the actions the robot has to take in a new task to reach an outcome similar to that of an already learned task. Thus, it alleviates the need to construct a reward function that enforces the preferred behavior. It builds on top of the FAMLE algorithm by Kaushik et al. [11], making use of an Embedding Neural Network (ENN) for quick adaptation to new tasks through task embeddings as learned priors. By combining an ENN with an RL policy of a reference task, the controller predicts which actions need to be taken in unseen tasks to mimic the behavior of the reference task. Initialized with the most likely embedding, a trained meta-model is adapted to approximate future environment states, which are compared to the preferred states of the reference task. Actions leading to states in the unseen task that are most similar to those reached by the RL policy in the reference task are then chosen to be executed in the environment.
To account for the usage and adaptation of a meta-model during planning, we call our approach the Meta Adaptation Controller (MAC).
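To make this action-selection idea concrete, the following is a minimal, hypothetical sketch (a random-shooting search with a Euclidean state distance; the function names, the search procedure, and the distance metric are illustrative assumptions, not the exact procedure used in this paper):

```python
import numpy as np

def mac_action(state, ref_policy, ref_dynamics, adapted_dynamics,
               num_candidates=256, action_dim=8, action_low=-1.0, action_high=1.0):
    """Pick the action whose predicted outcome in the new task best matches the
    outcome the reference behavior would reach in the reference task.

    ref_policy:       learned reference behavior, maps state -> action
    ref_dynamics:     dynamics prediction under the reference task
    adapted_dynamics: meta-model adapted (e.g., via a task embedding) to the new task
    All three are stand-in callables.
    """
    # Preferred outcome: where the reference behavior would lead in the reference task.
    preferred_next_state = ref_dynamics(state, ref_policy(state))

    # Search over candidate actions in the new task.
    candidates = np.random.uniform(action_low, action_high,
                                   size=(num_candidates, action_dim))
    predicted_next_states = np.stack([adapted_dynamics(state, a) for a in candidates])

    # Execute the candidate whose predicted next state is closest to the preferred one.
    distances = np.linalg.norm(predicted_next_states - preferred_next_state, axis=-1)
    return candidates[np.argmin(distances)]
```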
First, we introduce related work and preliminaries. Next, the challenge and our approach to solving it are described. Finally, experimental results are presented that compare MAC with MPC employing different meta-learning methods.
II. RELATED WORK
In recent years, robotics has achieved remarkable success with model-based RL approaches [13], [14], [15]. The agent can choose optimal actions by utilizing the experiences generated by the model [7]. As a result, the amount of data required for model-based methods is typically much smaller than for their model-free counterparts, making these algorithms more attractive for robotic applications. One drawback of many of these works is the assumption that the environment is stationary. In real robot applications, however, many uncertainties are difficult to model or predict; some of them are internal (e.g., malfunctions [9]) and others external (e.g., wind [12]). These uncertainties make the stationarity assumption impractical and can lead to suboptimal behavior or even catastrophic failure. Therefore, quick adaptation of the learned model is critical.
"Gradient-based meta-learning methods leverage gradient descent to learn the commonalities among various tasks" [16, p. 1]. One such method, introduced by Finn et al. [17], is Model-Agnostic Meta-Learning (MAML). The key idea of MAML is to tune a model's initial parameters such that the model reaches maximal performance on a new task after a small number of gradient steps. Here, meta-learning is achieved with bi-level optimization: a model's task-specific optimization and a task-agnostic meta-optimization. Instantiated for MFRL, MAML uses policy gradients of a neural network model, whereas in MBRL, MAML is used to train a dynamics model. REPTILE by Nichol et al. [18] is a first-order implementation of MAML. In contrast to MAML, task-specific gradients do not need to be differentiated through the optimization process. This makes REPTILE more computationally efficient while achieving similar performance.
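As an illustration, the outer update of a REPTILE-style method can be sketched as follows (a minimal NumPy sketch under simplified assumptions; the task, its loss gradient, and all hyperparameter values are illustrative placeholders):

```python
import numpy as np

def reptile_step(theta, task_grad, inner_steps=5, inner_lr=0.01, outer_lr=0.1):
    """One first-order meta-update: adapt a copy of the meta-parameters to a
    sampled task with plain SGD, then move the meta-parameters toward the
    adapted parameters; no differentiation through the inner loop is needed."""
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * task_grad(phi)      # inner level: task-specific adaptation
    return theta + outer_lr * (phi - theta)   # outer level: interpolate toward phi

# Toy usage: each "task" is a quadratic loss centered at a random target.
theta = np.zeros(3)
for _ in range(100):
    target = np.random.uniform(-1.0, 1.0, size=3)              # sample a task
    theta = reptile_step(theta, lambda phi: 2.0 * (phi - target))
```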
A model-based approach using gradient-based MRL was presented in the work of Nagabandi et al. [9] and targets online adaptation of a robotic system that encounters different system dynamics in real-world environments. In this context, Kaushik et al. [11] point out that in an MRL setup where situations do not possess strong global similarity, finding a single set of initial parameters is often not sufficient to learn quickly. One potential solution is to find several initial sets of model parameters during meta-training and, when encountering a new task, use the most similar one so that an agent can adapt within a few gradient steps. Their work Fast Adaptation through Meta-Learning Embeddings (FAMLE) approaches this solution by extending a dynamics model's input with a learnable d-dimensional vector describing a task. Similarly, Belkhale et al. [12] introduce a meta-learning approach that enables a quadcopter to adapt online to various physical properties of payloads (e.g., mass, tether length) using variational inference. Intuitively, each payload causes different system dynamics and therefore defines a task to be learned. Since such dynamics are unlikely to be modeled accurately by hand, and it is unrealistic to know every payload's property values beforehand, the meta-learning goal is rapid adaptation to unknown payloads without prior knowledge of the payload's physical properties. To this end, a probabilistic encoder network finds a task-specific latent vector that is fed into a dynamics network as an auxiliary input. Using the latent vector, the dynamics network learns to model the factors of variation that affect the payload's dynamics and are not present in the current state. All of these algorithms use MPC during online adaptation. Our work introduces a new controller for online adaptation in a model-based meta-reinforcement learning setting.
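To illustrate the embedding idea behind such methods, the sketch below conditions a neural dynamics model on a learnable per-task embedding. This is a minimal PyTorch sketch in the spirit of FAMLE's ENN; the architecture, layer sizes, and all names are our illustrative assumptions rather than the exact model of [11]:

```python
import torch
import torch.nn as nn

class EmbeddingDynamicsModel(nn.Module):
    """Dynamics model whose input is extended with a learnable per-task embedding,
    so adaptation to a new task can start from the most similar learned embedding."""

    def __init__(self, state_dim, action_dim, num_tasks, emb_dim=8, hidden=256):
        super().__init__()
        self.task_embedding = nn.Embedding(num_tasks, emb_dim)   # one vector per training task
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim + emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),                         # predicted state change
        )

    def forward(self, state, action, task_id):
        h = self.task_embedding(task_id)
        x = torch.cat([state, action, h], dim=-1)
        return state + self.net(x)                                # predicted next state
```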
III. PRELIMINARIES
A. Meta Learning
Quick online adaptation to new tasks can be viewed in the light of a few-shot learning setting, where the goal of meta-learning is to adapt a model $f_\theta$ to an unseen task $\mathcal{M}_j$ of a task distribution $p(\mathcal{M})$ with a small amount of $k$ data samples [17]. The meta-learning procedure is usually divided into meta-training with $n$ meta-learning tasks $\mathcal{M}_i$ and meta-testing with $y$ meta-test tasks $\mathcal{M}_j$, both drawn from $p(\mathcal{M})$ without replacement [3]. During meta-training, task data may be split into train and test sets, usually representing $k$ data points of a task: $\mathcal{D}_{\text{meta-train}} = \{(\mathcal{D}^{tr}_{i=1}, \mathcal{D}^{ts}_{i=1}), \ldots, (\mathcal{D}^{tr}_{i=n}, \mathcal{D}^{ts}_{i=n})\}$. Meta-testing task data $\mathcal{D}_{\text{meta-test}} = (\mathcal{D}^{\text{meta-test}}_{j=1}, \ldots, \mathcal{D}^{\text{meta-test}}_{j=y})$ is held out during meta-training [3]. Meta-training is then performed with $\mathcal{D}_{\text{meta-train}}$ and can be viewed as bi-level learning of model parameters [19]. In the inner level, an update algorithm $Alg$ with hyperparameters $\psi$ must find task-specific parameters $\phi_i$ by adjusting meta-parameters $\theta$. In the outer level, $\theta$ must be adjusted to minimize the cumulative loss of all $\phi_i$ across all learning tasks by finding common characteristics of different tasks through meta-parameters $\theta^\star$:

$$\theta^\star = \overbrace{\arg\min_{\theta} \sum_{i=1}^{n} \mathcal{L}_{\mathcal{D}_i \sim \mathcal{M}_i}(\phi_i)}^{\text{outer level}} \quad \text{where} \quad \phi_i = \underbrace{Alg^{\psi}_{\mathcal{D}_i \sim \mathcal{M}_i}(\theta)}_{\text{inner level}} \qquad (1)$$

Once $\theta^\star$ is found, it can be used during meta-testing for quick adaptation: $\phi_j = Alg(\theta^\star, \mathcal{D}_j)$.
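As a concrete instance (assuming, for illustration, that $Alg$ performs a single gradient step on the task's training data, as in MAML [17], with step size $\alpha$ among the hyperparameters $\psi$), the inner level takes the form:

$$\phi_i = Alg^{\psi}_{\mathcal{D}_i \sim \mathcal{M}_i}(\theta) = \theta - \alpha \, \nabla_{\theta} \, \mathcal{L}_{\mathcal{D}^{tr}_i}(\theta)$$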
B. Model-based Reinforcement Learning
In RL, a task can be described as a Markov Decision Process (MDP) $\mathcal{M} = \{S, A, p(s_{t=0}), p(s_{t+1} \mid s_t, a_t), r, H\}$ with a set of states $S$, a set of actions $A$, a reward function $r: S \times A \mapsto \mathbb{R}$, an initial state distribution $p(s_{t=0})$, a transition probability distribution $p(s_{t+1} \mid s_t, a_t)$, and a discrete-time finite or continuous-time infinite horizon $H$.
MBRL methods sample ground-truth data $\mathcal{D}_i = \{(s_0, a_0, s_1), (s_1, a_1, s_2), \ldots\}$ from a specific task $\mathcal{M}_i$ and use it to learn a model of the task's transition dynamics.
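As a minimal illustration of this step, the sketch below fits a simple linear dynamics model to such sampled transitions by least squares; the linear model and the random toy data stand in for the neural dynamics models and real rollouts used in practice:

```python
import numpy as np

def fit_linear_dynamics(states, actions, next_states):
    """Fit s_{t+1} ~ W @ [s_t; a_t] to sampled transitions by least squares."""
    X = np.hstack([states, actions])                     # inputs  (s_t, a_t)
    W, *_ = np.linalg.lstsq(X, next_states, rcond=None)  # targets s_{t+1}
    return W

# Toy usage with random data standing in for D_i = {(s_0, a_0, s_1), ...}.
S = np.random.randn(200, 4)                              # states
A = np.random.randn(200, 2)                              # actions
S_next = S + 0.1 * np.random.randn(200, 4)               # next states
W = fit_linear_dynamics(S, A, S_next)
predicted_next = np.hstack([S, A]) @ W                   # one-step predictions
```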