
Generalization to the Natural Gradient Descent
Shaojun Dong,1 Fengyu Le,2,1 Meng Zhang,3,4 Si-Jing Tao,3,4
Chao Wang,1,∗ Yong-Jian Han,3,1,4,† and Guo-Ping Guo3,1,4
1 Institute of Artificial Intelligence, Hefei Comprehensive National Science Center
2 AHU-IAI AI Joint Laboratory, Anhui University
3 CAS Key Laboratory of Quantum Information, University of Science
and Technology of China, Hefei 230026, People's Republic of China
4 Synergetic Innovation Center of Quantum Information and Quantum Physics,
University of Science and Technology of China, Hefei 230026, China
∗ wang1329@ustc.edu.cn
† smhan@ustc.edu.cn
The optimization problem, which aims at finding the global minimum of a given cost function, is one of the central problems in science and engineering. Various numerical methods have been proposed to solve it, among which the Gradient Descent (GD) method is the most popular due to its simplicity and efficiency. However, the GD method suffers from two main issues: local minima and slow convergence, especially near a minimum. The Natural Gradient Descent (NGD), which has proved to be one of the most powerful methods for various optimization problems in machine learning, tensor networks, variational quantum algorithms, and so on, supplies an efficient way to accelerate the convergence. Here, we give a unified method to extend the NGD to a more general setting that keeps the fast convergence by looking for a more suitable metric through the introduction of a 'proper' reference Riemannian manifold. Our method generalizes the NGD and may give more insight into optimization methods.
I. INTRODUCTION
Optimization problems play a crucial role in many fields of science, engineering, and industry. Generally, a task can be evaluated by a real-valued function L(x) (named the cost function), where x denotes the parameters used to identify the solution. Therefore, the key to solving the problem optimally is to search for the parameters x that minimize the function L(x). It is a big challenge to find the global minimum of the cost function L(x) when it is complicated.
Various numerical methods have been introduced to solve optimization problems. Among them, the gradient descent (GD) method (also known as the steepest descent method) is a widely used technique for its simplicity and high efficiency. In the GD method, the parameters are updated according to the gradient,
$$x_{t+1} = x_t - \eta\,\partial_x L, \qquad (1)$$
where $\eta$ is the step size at iteration $t$.
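For concreteness, one GD iteration of Eq. (1) can be sketched in a few lines of Python; the quadratic toy cost, the function name, and the step size below are illustrative choices only, not part of the methods discussed in this paper.

import numpy as np

# One plain gradient-descent update, Eq. (1): x <- x - eta * dL/dx.
def gd_step(x, grad_fn, eta=0.1):
    return x - eta * grad_fn(x)

# Toy usage: minimize L(x) = ||x||^2, whose gradient is 2x.
x = np.array([1.0, -2.0])
for _ in range(100):
    x = gd_step(x, lambda y: 2.0 * y, eta=0.1)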
Generally, numerical methods that approach the global minimum by iteratively updating the parameters according to local information suffer from two main problems: local minima and slow convergence. Stochastic methods have been introduced to help the optimization escape local minima [1]. On the other hand, the convergence has been improved by more sophisticated methods such as the Conjugate Gradient (CG) method [2–4], the Adam method [1], and the Natural Gradient Descent (NGD) method [5, 6].
Let us take another perspective on gradient descent: let X be the parameter space, which forms a manifold with the flat metric $G_{ij}(x) = \delta_{ij}$, and view $L(x)$ as a scalar field defined on X. It is easy to see that $-\partial_x L$ is the direction in which the cost function decreases most for a fixed small step length, where the distance is measured by the flat metric. Following this idea, the NGD replaces the flat metric on the parameter manifold X with a generally non-flat metric G in order to improve the convergence. The optimization variables are updated with the iteration
$$x_{t+1} = x_t - \eta\, G^{-1}\frac{\partial L(x)}{\partial x}. \qquad (2)$$
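A minimal sketch of the update (2), assuming the metric G(x) is available as a matrix at the current point; solving a slightly damped linear system stands in for the explicit inverse $G^{-1}$. The function name and the damping term are our own illustrative choices, not part of the method itself.

import numpy as np

# One natural-gradient update, Eq. (2): x <- x - eta * G^{-1} dL/dx.
# Solving the linear system G d = grad avoids forming G^{-1} explicitly;
# a small damping term keeps the metric well conditioned.
def ngd_step(x, grad, metric, eta=0.1, damping=1e-6):
    G = metric + damping * np.eye(len(x))
    natural_grad = np.linalg.solve(G, grad)
    return x - eta * natural_grad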
The most common application of the NGD is the optimization of statistical models, where the Fisher information (FI) matrix [5, 6] is used as the metric. For a parameterized probability distribution $p(x, s)$, the FI matrix is defined as
$$G^{\mathrm{FI}}_{ij}(x) = \sum_s p(x,s)\,\frac{\partial \log p(x,s)}{\partial x_i}\,\frac{\partial \log p(x,s)}{\partial x_j}, \qquad (3)$$
where $x$ is the parameter and $s$ is the random variable.
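For a discrete random variable s, Eq. (3) is a probability-weighted sum of outer products of the score vectors $\partial \log p/\partial x$. A minimal sketch follows, with array shapes and the function name chosen purely for illustration.

import numpy as np

# Fisher information matrix of Eq. (3) for a discrete random variable s.
#   p     : shape (S,), probabilities p(x, s) of the S outcomes
#   dlogp : shape (S, D), entries  d log p(x, s) / d x_i
# Returns the (D, D) matrix  G^FI_ij = sum_s p(x,s) * dlogp_si * dlogp_sj.
def fisher_information(p, dlogp):
    return (p[:, None] * dlogp).T @ dlogp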
The NGD has gained more and more attention in recent years by showing outstanding convergence performance in a variety of research areas such as Variational Quantum Algorithms (VQA) [7–10] and the Variational Quantum Eigensolver (VQE) [11–14]. The NGD has also been used to solve quantum many-body problems [15], where the model can be transformed into a statistical one and the metric is called the Fubini-Study metric tensor [14, 16]. In the field of variational Monte Carlo methods with neural networks [17, 18], the NGD (also called Stochastic Reconfiguration [19–21]) is widely used as well, and the FI matrix is called the S matrix. Although the NGD has shown its efficiency in various realms, a question arises: is the FI matrix the only choice of metric for the NGD? Could we find a better one? Moreover,