


Applied Soft Computing 13 (2013) 1247–1264



Global optimization using a multipoint type quasi-chaotic optimization method
Takashi Okamoto∗, Hironori Hirata
Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba-shi, Chiba 263-8522, Japan

Abstract
This paper proposes a new global optimization method called the multipoint type quasi-chaotic optimization method. In the proposed method, the simultaneous perturbation gradient approximation is introduced into a multipoint type chaotic optimization method in order to carry out optimization without gradient information. The multipoint type chaotic optimization method, which has been proposed recently, is a global optimization method for solving unconstrained optimization problems in which multiple search points which implement global searches driven by a chaotic gradient dynamic model are advected to their elite search points (best search points among the current search histories). The chaotic optimization method uses a gradient to drive search points. Hence, its application is restricted to a class of problems in which the gradient of the objective function can be computed. In this paper, the simultaneous perturbation gradient approximation is introduced into the multipoint type chaotic optimization method in order to approximate gradients so that the chaotic optimization method can be applied to a class of problems for which only the objective function values can be computed. Then, the effectiveness of the proposed method is confirmed through application to several unconstrained multi-peaked, noisy, or discontinuous optimization problems with 100 or more variables, in comparison with other major meta-heuristics. © 2012 Elsevier B.V. All rights reserved.

Article history: Received 6 April 2012; Received in revised form 24 September 2012; Accepted 16 October 2012; Available online 23 November 2012.

Keywords: Global optimization; Chaos; Simultaneous perturbation gradient approximation; Multipoint search; Coupled dynamics

1. Introduction

The development of global optimization methods, which obtain global minima without being trapped at local minima, has been investigated extensively. So-called physically inspired optimization methods, which use dynamic models as computation models, have been proposed and have mainly been applied to continuously differentiable problems. The common characteristic among these models is that a global search is executed using the autonomous movement of the search point, which is driven by a vector quantity given by its dynamic system, such as a gradient vector, and the search range is then narrowed by an annealing procedure. Examples of these methods include the chaotic optimization method [1,25,17,20] and the Hamiltonian algorithm [18].

Okamoto and Aiyoshi [20] proposed a multipoint type chaotic optimization method (M-COM1) for unconstrained optimization problems with continuous variables. The M-COM is a global

∗ Corresponding author at: Division of Artificial Systems Science, Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba-shi, Chiba 263-8522, Japan. Tel.: +81 43 2903236; fax: +81 43 2903236. E-mail addresses: takashi@faculty.chiba-u.jp (T. Okamoto), hiro@faculty.chiba-u.jp (H. Hirata).
1 In the literature [20], this method is called the elite coupling type chaotic optimization method. In this paper, we use a different name for readability.
1568-4946/$ – see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.asoc.2012.10.025

optimization method in which multiple search points which implement global searches driven by a chaotic gradient dynamic model are advected to their elite search points by using a coupling model. Its superior global search capability has been confirmed through application to several unconstrained multi-peaked optimization problems with 100 or 1000 variables. The M-COM uses a gradient as a driving force for the search points. Hence, computation of a gradient is required to implement the M-COM. However, in actual optimization problems, a gradient is usually not easily computed, because the algorithms or formulae for the computation of the objective functions are not available or because the objective functions are nondifferentiable.

In this paper, we consider the introduction of gradient approximation methods to the chaotic optimization method so that the chaotic optimization method can be applied to a class of problems for which only the objective function values can be computed. Gradient approximation methods include the finite difference gradient approximation (FDGA), which is a classical approach, and the simultaneous perturbation gradient approximation (SPGA) [23]. The SPGA can approximate a gradient vector by evaluating the objective function only two times, whereas the FDGA requires a number of evaluations equal to the number of decision variables. Regarding the accuracy of the SPGA, it is known that the expectation of the gradient estimated by the SPGA approximates the true gradient. Herein, we propose a new global optimization method in which the SPGA is introduced into the M-COM in order to carry out

optimization without gradient information at a low cost. Then, we confirm the effectiveness of the proposed method through application to several unconstrained multi-peaked optimization problems with 100 or more variables.

When introducing gradient approximation methods to chaotic optimization, the first and natural choice is the FDGA. However, if the FDGA is simply introduced into the M-COM to solve an optimization problem with a large number of decision variables, we expect the computational cost to be very high. Computational cost is one of our motivations for introducing the SPGA rather than the FDGA into the M-COM. The search trajectory generated from the approximated gradient dynamics using the SPGA of the proposed method is analogous to the search trajectory generated from the gradient dynamics of the M-COM. Furthermore, our conclusion based on a theoretical analysis given by Maryak and Chin [15] is that the search trajectory generated from the approximated gradient dynamics using the SPGA has the potential to find a globally optimal solution. Thus, we introduce the SPGA into the M-COM.

Among global optimization methods based only on objective function evaluations, meta-heuristics, in which heuristics are combined on the basis of a very good search strategy, such as diversification or intensification [5], have achieved a certain level of success. Examples of meta-heuristics for unconstrained optimization problems with continuous variables include particle swarm optimization (PSO) [7,8] and differential evolution (DE) [24,22]. Generally, in these methods, interaction among all search points is used as the main driving force. Therefore, these methods have the drawback that once all of the search points have been attracted to a single search point, diversity is lost and stagnation of the search occurs.
In such a case, either the search is terminated or a random search, which is not necessarily based on a good strategy, may be executed in order to reestablish diversity. In the proposed method, the search points autonomously implement global searching using the quasi-chaotic search trajectory generated in the gradient dynamics using the SPGA, regardless of attraction to a particular search point. Hence, the aforementioned stagnation does not tend to occur with the proposed method; therefore, we expect the proposed method to perform well for high-dimensional and multi-peaked optimization problems.

This paper is organized as follows. In Section 2, we briefly describe the chaotic optimization method and the M-COM. In Section 3, we describe two gradient approximation methods, focusing on the SPGA. In addition, we describe the simultaneous perturbation stochastic approximation (SPSA) method, in which a gradient model using the SPGA is used. The SPSA has strong relevance to the proposed method. In Section 4, we propose a new global optimization method called the quasi-chaotic optimization method, in which the SPGA is introduced into the chaotic optimization method; then, we explain the dynamic characteristics of the proposed method and its similarity to the chaotic optimization method; finally, we propose the multipoint type quasi-chaotic optimization method based on the M-COM. In Section 5, we confirm the effectiveness of the proposed method through applications to several unconstrained multi-peaked optimization problems with 100 or more variables, comparing it to the parallel SPSA, the conventional method (M-COM) with FDGA, and other major meta-heuristics.

2. Chaotic optimization method

2.1. Optimization problem

Let us consider an unconstrained optimization problem:

minimize_x f(x),   (1a)
where x ∈ S ⊂ R^N,   (1b)
S = { x | x_n^l < x_n < x_n^u (n = 1, …, N) }.   (1c)
Here, x = (x_1, …, x_N)^T is a decision variable vector, f(x) is the objective function, f : R^N → R, and n = 1, …, N unless otherwise stated. S defines the bounded search space. We assume that a global minimum of problem (1) is located within the bounded search space S and apply optimization methods to this space.

2.2. Single-point type chaotic optimization method and its drawbacks

Let us assume the objective function f(x) to be twice continuously differentiable in the remaining parts of this section (Section 2). Let us consider a gradient model (steepest descent method) to solve optimization problem (1):

dx(t)/dt = −∇f(x),   (2)

where ∇f(x) is the gradient vector of the objective function. The optimal solution obtained by the gradient model is the local minimum in the neighborhood of the initial point. In order to provide global search capability to the gradient model, the discretized gradient chaos model (DGCM) has been proposed. In the DGCM, the chaotic search trajectory, which provides global search capability, is generated as follows. (a) The trajectory obtained by Eq. (2) is confined within the bounded search space S. (b) The confined trajectory is discretized by Euler's differentiation technique. (c) The trajectory is destabilized by a tuning of its sampling parameter. The study of the DGCM was first inspired by chaotic neural networks [1], and the DGCM has since been investigated in several other studies [25,17,20]. The DGCM with the toroidalization of the bounded search space and linear annealing [20] is given by

u_n(k+1) = x_n(k) − T(k) ∂f(x(k))/∂x_n,   (3a)
x_n(k+1) = φ(u_n(k+1)),   (3b)
T(k) = T_max (1 − k/k_max),   (3c)
φ(u_n) = { (u_n − x_n^l) mod (x_n^u − x_n^l) + x_n^l   (u_n > x_n^u),
           (u_n − x_n^u) mod (x_n^u − x_n^l) + x_n^u   (u_n < x_n^l),
           u_n                                          (otherwise),   (3d)

where T(k) is the sampling parameter (or step width) at search step k, T_max is the initial value of the sampling parameter, and k_max is the maximum number of search steps. Generally, the sigmoid function is used to confine the search trajectory within the bounded search space. In this paper, however, we use the toroidalization of the bounded search space S to confine the search trajectory, due to the advantages of the obtained convergence characteristics [20]. The chaotic search trajectory is generated by setting the sampling parameter T to a large value; later, the search trajectory is stabilized and a local search is implemented by setting T to a small value. Hence, in the chaotic optimization method using Eq. (3), the sampling parameter is first set to a large value to implement a global search and then gradually decreased to implement a local search. This tuning procedure is called chaos annealing.

In order to explain the search process of the chaotic optimization method, we show a bifurcation diagram with respect to T. A bifurcation diagram shows the search trajectory after sufficient time under a fixed T starting from a local minimum. In other words, a bifurcation diagram shows the search range that a search point can

Fig. 1. Bifurcation diagram of the DGCM with exact gradient computation for the example (4). Red points correspond to the bifurcation diagram which starts from local minimum No. I: x*1 = (3.2779, −2.7325)^T. Blue points correspond to the bifurcation diagram which starts from local minimum No. II: x*2 = (−3.5305, 3.8697)^T.

cover with respect to T. The bifurcation diagram for the following example, which is given by

f(x) = x_1^4 − 16x_1^2 + 5x_1 + 15x_1x_2 + x_2^4 − 16x_2^2 − 55x_2,   (4a)
S = { x | −5 < x_1 < 5, −5 < x_2 < 5 },   (4b)

is shown in Fig. 1. This function has two local minima. One is local minimum No. I: x*1 = (3.2779, −2.7325)^T and f(x*1) = −87.8583. The other is local minimum No. II: x*2 = (−3.5305, 3.8697)^T and f(x*2) = −494.8398. Local minimum No. II is also the global minimum. As can be seen from Fig. 1, if T is small, the search trajectory converges to a fixed point, i.e., a local minimum. The search trajectory is destabilized with increase of T, until a chaotic search trajectory that covers the whole search space is generated. The bifurcation diagram which starts from local minimum No. II is attracted to the bifurcation diagram which starts from local minimum No. I. In the actual optimization procedure with chaos annealing, the sampling parameter T is transitioned from a large value to a small value; that is, the search process of the chaotic optimization method corresponds to a right-to-left process in Fig. 1. Hence, the chaotic search trajectory tends to converge to local minimum No. I. According to our analysis of the convergence behavior of the chaotic optimization method [20], the search trajectory tends to converge to the most stable local minimum for the gradient dynamics given by Eq. (3). The stability of a local minimum is reflected by the 2-bifurcation parameter ΔT_2, which, in general, is the boundary value of the sampling parameter between fixed-point convergence and two-periodic oscillation.2 For example, in Fig. 1, ΔT_2 of the more stable local minimum, local minimum No. I, is the largest among all local minima; in other words, its diagram bifurcates last. ΔT_2 is derived by using linear stability theory [20]. Let the eigenvalues of ∇^2 f(x*), which is the Hessian of f(x), be λ_1^h(x*), …, λ_N^h(x*) in descending order of their absolute values. Then, the 2-bifurcation parameter ΔT_2(x*) is given by

ΔT_2(x*) = 2 / λ_1^h(x*).   (5)

Hence, the chaotic search trajectory with the DGCM tends to converge to the local minimum at which λ_1^h(x*) is smaller. Generally, such a solution lies at the bottom of the widest valley, which has the widest attraction range for the chaotic search trajectory. If such a solution is the globally optimal solution, the chaotic optimization method performs very well. For example, the solution may be a global minimum or a neighbor solution in the case of a problem in which the landscape of the objective function has a big-valley structure. However, the objective function value is not necessarily small in the widest valley; therefore, the chaotic optimization method does not necessarily have an intensification strategy which takes the objective function value into consideration. In addition, because the search trajectory converges to only one valley, the chaotic optimization method has a drawback in terms of diversity in evaluating multiple solutions.

2.3. Multipoint type chaotic optimization method

Okamoto and Aiyoshi [20] proposed the M-COM, in which the above-described drawbacks are addressed. In the M-COM, first, multiple search points which implement a global search by using chaotic search trajectories are established in order to introduce diversity when evaluating multiple solutions. Then, they are advected (synchronized) to their elite search points by using a coupling structure in order to introduce an intensification strategy which takes the objective function value into consideration. The specific model is given by

u^p(k+1) = (1 − c_1(k) − c_2(k)) g(x^p(k)) + c_1(k) x^{pb,p}(k) + c_2(k) x^{cb}(k),   (6a)
x_n^p(k+1) = φ(u_n^p(k+1)),   (6b)
g_n(x^p(k)) = x_n^p(k) − T(k) ∂f(x^p(k))/∂x_n,   (6c)
x^{pb,p}(k) = argmin { f(x^p(l)) | l = 1, …, k },   (6d)
x^{cb}(k) = argmin { f(x^q(k)) | q = 1, …, P },   (6e)
c_1(k) = c_2(k) = c_max sin^2(2πk/K),   (6f)

where p denotes the index of a search point, and p = 1, …, P unless otherwise stated. φ(u_n) in Eq. (6b) is given by Eq. (3d) (the toroidalization of the bounded search space), and T(k) in Eq. (6c) is updated as in Eq. (3c) (the linear annealing). x^{pb,p}(k) is the best point among the search history of the pth search point (personal best: pbest), and x^{cb}(k) is the best point among all search points at the current search step (current best: cbest). In Eq. (6a), a coupling structure for chaotic synchronization [21] is used to generate stable synchronization (advection) to the elite search points (pbest and cbest). c_1(k) and c_2(k) are coupling coefficients given as time-varying coefficients using trigonometric functions. Their purpose is to prevent complete synchronization and thereby maintain the diversity of the search trajectories. In the M-COM, each search point autonomously implements a global search driven by the gradient dynamics while being attracted to the elite search points. The M-COM successfully implements diversification and intensification, which are considered important strategies for global optimization within the field of meta-heuristics. The superior global search capability of the M-COM has previously been confirmed through application to several unconstrained multi-peaked optimization problems with 100 or

2 Strictly speaking, we need to discuss the stability of multiple periodic oscillations. However, this discussion is omitted because it is beyond the scope of this paper.
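To make the update rule concrete, the M-COM main loop of Eq. (6), combined with the linear annealing (3c) and the toroidal wrap (3d), can be sketched as follows. This is a minimal sketch of ours, not the paper's implementation: the parameter values (P, t_max, c_max, K) and the convex test function in the usage example are illustrative assumptions.

```python
import numpy as np

def m_com(f, grad_f, xl, xu, P=5, k_max=1500, t_max=0.03,
          c_max=0.3, K=300, seed=0):
    """Main loop of the M-COM, Eq. (6), with linear annealing (Eq. (3c))
    and toroidalization (Eq. (3d)). Parameter defaults are illustrative."""
    rng = np.random.default_rng(seed)
    xs = rng.uniform(xl, xu, size=(P, xl.size))      # initial search points
    fs = np.array([f(x) for x in xs])
    pb, pb_f = xs.copy(), fs.copy()                  # personal bests (pbest), Eq. (6d)
    for k in range(k_max):
        t = t_max * (1.0 - k / k_max)                # Eq. (3c): linear chaos annealing
        c = c_max * np.sin(2 * np.pi * k / K) ** 2   # Eq. (6f): coupling coefficients
        cb = xs[np.argmin(fs)].copy()                # current best (cbest), Eq. (6e)
        for p in range(P):
            g = xs[p] - t * grad_f(xs[p])            # Eq. (6c): chaotic gradient step
            u = (1 - 2 * c) * g + c * pb[p] + c * cb # Eq. (6a): advection to elites
            # Eqs. (6b), (3d): numpy's mod (sign of divisor) lets one
            # expression cover both overflow cases of the piecewise wrap
            xs[p] = (u - xl) % (xu - xl) + xl
            fs[p] = f(xs[p])
            if fs[p] < pb_f[p]:                      # update pbest history
                pb[p], pb_f[p] = xs[p].copy(), fs[p]
    i = np.argmin(pb_f)
    return pb[i], pb_f[i]
```

On the example (4) this loop exhibits the annealed chaotic-then-local behavior described above; for a quick check one can run it on a simple convex function, where all points contract to the minimum.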


1000 variables [20]. However, the M-COM uses a gradient as a driving force for search points. Hence, computation of a gradient is required to implement the M-COM, and thus the M-COM has a drawback in that its application is restricted to a class of problems in which the gradient of the objective function can be computed.

3. Gradient approximation methods

In this paper, we introduce a gradient approximation method so that the M-COM can be applied to a class of problems in which only the objective function value can be computed. There are two gradient approximation methods commonly used. One is the FDGA, which is a classical method. The other is the SPGA [23].

3.1. Finite difference gradient approximation (FDGA)

Using the FDGA, the gradient is approximated using the following equation:

∂f̂(x(k))/∂x_n = [f(x(k) + Δx e_n) − f(x(k))] / Δx,   (7)

where ∂f̂(x(k))/∂x_n is the estimation of the nth component of the gradient of a function f(x) at time k, Δx (> 0) is the amount of perturbation, and e_n is a unit vector whose nth component is 1 and whose other components are 0. The FDGA approximates the gradient with high accuracy, but the computational cost is very high, namely, N function evaluations to compute one gradient value. Hence, it is too expensive to apply in the M-COM for the main (global) search procedure in high-dimensional problems. However, it is appropriate for the local search procedure to obtain a local minimum with high accuracy.

3.2. Simultaneous perturbation gradient approximation (SPGA)

Using the SPGA, the gradient is approximated according to the following equation:

∂f̂(x(k))/∂x_n = [f(x(k) + Δx(k) s(k)) − f(x(k) − Δx(k) s(k))] / (2 Δx(k) s_n(k)),   (8)

where ∂f̂(x(k))/∂x_n is the estimation of the nth component of the gradient of a function f(x) at time k. Each component of s(k) consists of a sign value, namely, −1 or 1, generated by the Bernoulli distribution, and the mean value of s_n(k) with respect to k is 0. The estimation of the gradient is computed using only two function evaluations with the SPGA, whereas the FDGA requires one function evaluation per variable. Let us assume the objective function f(x) to be continuously differentiable for the remainder of this section (Section 3.2). Then, according to Spall [23],3 using a simple first-order Taylor expansion of f(x(k) ± Δx(k)s(k)),

E[∂f̂(x(k))/∂x_n] ≈ ∂f(x(k))/∂x_n + Σ_{m≠n} E[s_m(k)/s_n(k)] ∂f(x(k))/∂x_m.   (9)

We use the Bernoulli distribution to generate s(k); hence, E(s_m(k)/s_n(k)) = 0 ∀n ≠ m, and the second term on the right side of Eq. (9) disappears, indicating that

E[∂f̂(x(k))/∂x_n] ≈ ∂f(x(k))/∂x_n.   (10)

This implies that the expectation of the gradient estimated by the SPGA approximates the true gradient. The SPGA is suitable for the main search procedure because its computational cost is much lower than that of the FDGA and because a certain degree of accuracy is ensured. In this paper, we introduce the SPGA into the M-COM to approximate the gradient in the main search procedure.

3.3. Simultaneous perturbation stochastic approximation (SPSA) method and global convergence

The optimization method using a gradient model with the SPGA given by

x_n(k+1) = x_n(k) − ∂f̂(x(k))/∂x_n   (11)

is called the SPSA method. The proposed method described below can be classified as an SPSA method. There has been much research on the application of the SPSA method (see [19,13,16], for example; additional references can be found in the section "APPLICATIONS OF SPSA" of [2]), including convergence analyses [23,3,4,26,15]. Of special note is that its global convergence under certain conditions has been proved by Maryak and Chin [15].4 This ensures global convergence in probability as k_max → ∞. Specifically, let x^o be a global optimal solution. Then, convergence in probability means

lim_{k→∞} Pr(‖x(k) − x^o‖ > ε) = 0  ∀ε > 0.   (12)

However, in actual implementations, k_max is finite, and thus the basic method does not necessarily obtain a global minimum. Therefore, some improvements are required in actual implementations.

4. Proposed method

In this section, we propose a new global optimization method called the quasi-chaotic optimization method, in which the SPGA is introduced into the chaotic optimization method in order to approximate the gradient so that the chaotic optimization method can be applied to a class of problems for which only the objective function values can be computed. Then, we explain its dynamic characteristics and its similarity to the chaotic optimization method using bifurcation diagrams and linear stability theory. Finally, we propose the multipoint type quasi-chaotic optimization method based on the M-COM.

4.1. Quasi-chaotic optimization method and its dynamic characteristics

Let us consider the approximated gradient dynamics using the SPGA with the toroidalization of the bounded search space, that is,

u_n(k+1) = x_n(k) − T ψ(∂f̂(x(k))/∂x_n),   (13a)
x_n(k+1) = φ(u_n(k+1)),   (13b)

where

ψ(y) = { −y_max (y < −y_max), y_max (y > y_max), y (otherwise) },   (13c)

the nth component of the approximated gradient vector ∂f̂(x(k))/∂x_n is given by Eq. (8), and the transformation function for the toroidalization φ(u_n) is given by Eq. (3d). Eq. (13c) is a clipping function to prevent divergence of the approximated gradient value.

3 Spall [23] also assumes that f(x) is either a noisy function or a noiseless function. The objective function in this paper is a noiseless function.
4 Maryak and Chin [15] assume that f(x) is a noisy function. Obviously, the global convergence is also valid for a noiseless objective function. Spall [23] and Maryak and Chin [14] explain this fact.
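The contrast between Eqs. (7) and (8) can be made concrete with a short sketch of the two estimators. This is our own illustration (the quadratic test function in the usage note is an assumption); note the SPGA cost is two evaluations regardless of the dimension N, while the FDGA needs N evaluations beyond f(x).

```python
import numpy as np

def spga_gradient(f, x, dx, rng):
    """Simultaneous perturbation estimate, Eq. (8):
    two function evaluations, whatever the dimension."""
    s = rng.choice([-1.0, 1.0], size=x.shape)     # Bernoulli +/-1 sign vector
    df = f(x + dx * s) - f(x - dx * s)
    return df / (2.0 * dx * s)                    # componentwise division by s_n

def fdga_gradient(f, x, dx):
    """Finite difference estimate, Eq. (7): one evaluation per variable."""
    fx = f(x)
    g = np.empty_like(x)
    for n in range(x.size):
        e = np.zeros_like(x)
        e[n] = dx
        g[n] = (f(x + e) - fx) / dx
    return g
```

On a quadratic such as f(x) = ||x||^2, a single SPGA estimate is noisy in its cross terms, but averaging many estimates recovers the true gradient, which is the content of Eq. (10).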

Fig. 2. Bifurcation diagram of the approximated gradient dynamics (13) for the example (4) (Δx = 0.001, y_max = 100). Red points correspond to the bifurcation diagram which starts from local minimum No. I: x*1 = (3.2779, −2.7325)^T. Blue points correspond to the bifurcation diagram which starts from local minimum No. II: x*2 = (−3.5305, 3.8697)^T.

Let Δx(k) be static and small, i.e., Δx(k) = Δx ≪ 1, in the remainder of this section (Section 4.1). In the proposed method explained in the next section, Δx(k) is varied dynamically, but we assume Δx is static in this section in order to give basic analyses of the search trajectory generated from the approximated gradient dynamics with the SPGA given by Eq. (13). We show a bifurcation diagram with respect to T, as in Section 2.2, in order to consider the characteristics of the search trajectory generated by the dynamics (13). The bifurcation diagram of the approximated gradient dynamics (13) for the example (4) is shown in Fig. 2. As can be seen from Figs. 1 and 2, the bifurcation diagram of the approximated gradient dynamics using the SPGA is analogous to that of the DGCM using exact gradient computation. If T is small, the search trajectory converges to a fixed point, i.e., a local minimum. As with the DGCM using exact gradient computation, the search trajectory is destabilized as T increases, until a global search trajectory that covers the whole search space is generated. The behavior of the global search trajectory is analogous to that of the chaotic search trajectory generated by the DGCM using exact gradient computation; this is why we call this global search trajectory a quasi-chaotic search trajectory, since its gradient is computed stochastically. In addition, we call the optimization method using the quasi-chaotic search trajectory the quasi-chaotic optimization method (Q-COM). As with the DGCM using exact gradient computation, the bifurcation diagram starting from local minimum No. II is attracted to the bifurcation diagram starting from local minimum No. I. Hence, the quasi-chaotic search trajectory tends to converge to local minimum No. I. We derive the 2-bifurcation parameter ΔT_2 using linear stability theory. Let the dynamics without the clipping function in Eq. (13a) be g̃_n(x; s),5 that is,

g̃_n(x; s) = x_n − T [f(x + Δx s) − f(x − Δx s)] / (2 Δx s_n).   (14)

Let us assume the objective function f(x) to be twice continuously differentiable in the remainder of this section (Section 4.1). The Jacobian of g̃ at a local minimum x* is given by

Dg̃(x*; s) = I − (T / (2 Δx)) A(x*; s),   (15a)

where

A_nm(x*; s) = (1/s_n) [∂f(x* + Δx s)/∂x_m − ∂f(x* − Δx s)/∂x_m]   (n = 1, …, N, m = 1, …, N).   (15b)

Then, we can obtain the following equation using Taylor's expansion:

∂f(x* ± Δx s)/∂x_m ≈ ∂f(x*)/∂x_m ± Δx (s_1 ∂/∂x_1 + · · · + s_N ∂/∂x_N) ∂f(x*)/∂x_m.   (16)

Substituting Eq. (16) into Eq. (15), we obtain

Dg̃(x*; s) = I − T A(x*; s),   (17a)

where

A_nm(x*; s) = (1/s_n) Σ_{l=1}^{N} s_l (∂/∂x_l)(∂f(x*)/∂x_m)   (n = 1, …, N, m = 1, …, N).   (17b)

Let the eigenvalues of A(x*; s) be λ_1^a(x*; s), …, λ_N^a(x*; s) in descending order of their absolute values. Then, the 2-bifurcation parameter ΔT_2(x*; s) is given by

ΔT_2(x*; s) = 2 / λ_1^a(x*; s).   (18)

Hence, as with the DGCM using exact gradient computation, the quasi-chaotic search trajectory tends to converge to the local minimum at which λ_1^a(x*; s) is smaller. The matrix A and the 2-bifurcation parameter ΔT_2 depend on the sign vector s. For example, let us reconsider the example (4). When s = (±1, ±1) (= s^1), A(x*1; s^1) and ΔT_2(x*1; s^1) are given by

A(x*1; s^1) = [ 111.9362  72.5977 ; 111.9362  72.5977 ],   (19)
ΔT_2(x*1; s^1) = 0.0108.

When s = (±1, ∓1) (= s^2), A(x*1; s^2) and ΔT_2(x*1; s^2) are given by

A(x*1; s^2) = [ 81.9362  −42.5977 ; −81.9362  42.5977 ],   (20)
ΔT_2(x*1; s^2) = 0.0161.

For x*2, when s = (±1, ±1) (= s^1), A(x*2; s^1) and ΔT_2(x*2; s^1) are given by

A(x*2; s^1) = [ 132.5723  162.6944 ; 132.5723  162.6944 ],   (21)
ΔT_2(x*2; s^1) = 0.00677.

And when s = (±1, ∓1) (= s^2), A(x*2; s^2) and ΔT_2(x*2; s^2) are given by

A(x*2; s^2) = [ 102.5723  −132.6944 ; −102.5723  132.6944 ],   (22)
ΔT_2(x*2; s^2) = 0.00850.

Based on Fig. 2, we expect the actual ΔT_2(x*1) to be between 0.01098 and 0.01116, which is close to ΔT_2(x*1; s^1), and the actual ΔT_2(x*2) to be between 0.00666 and 0.00684, which is close to ΔT_2(x*2; s^1). Thus, we expect the actual 2-bifurcation parameter to be close to the smaller of these values, that is, the more unstable one.

5 We discuss behavior in the neighborhood of a local minimum. Hence, we can ignore the clipping function because the approximated gradient ∂f̂(x(k))/∂x_n is sufficiently small in the neighborhood.
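Because A(x*; s) in Eq. (17b) reduces to sign-weighted column sums of the Hessian, the matrices and ΔT_2 values in Eqs. (19)-(22) can be reproduced numerically. The sketch below is our own check, added for illustration; the tiny deviations in the last digits come from the rounded x* coordinates quoted in the text.

```python
import numpy as np

def hessian_example4(x):
    """Hessian of the example (4) objective
    f(x) = x1^4 - 16x1^2 + 5x1 + 15x1x2 + x2^4 - 16x2^2 - 55x2."""
    return np.array([[12.0 * x[0] ** 2 - 32.0, 15.0],
                     [15.0, 12.0 * x[1] ** 2 - 32.0]])

def bifurcation_parameter(xstar, s):
    """Build A(x*; s) via Eq. (17b) and return it with
    Delta-T2(x*; s) = 2 / |lambda_1^a| from Eq. (18)."""
    H = hessian_example4(xstar)
    # A_nm = (1/s_n) * sum_l s_l * H_lm  ->  rows are (s @ H) scaled by 1/s_n
    A = (s @ H)[None, :] / s[:, None]
    lam1 = np.max(np.abs(np.linalg.eigvals(A)))   # largest eigenvalue magnitude
    return A, 2.0 / lam1
```

For s^1 = (1, 1) the two rows of A coincide, so one eigenvalue is the trace and the other is zero, which is why ΔT_2 = 2/(A_11 + A_22) in Eqs. (19) and (21).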

We can consider the following scenario to explain this fact. When T exceeds the 2-bifurcation parameter given by the most unstable search direction, i.e., s1 , the search trajectory given by s1 is destabilized. Meanwhile, the search trajectory given by s2 is still stable; therefore, the search point is pulled back to the local minimum by the search trajectory given by s2 . When T enlarges further, the pullback by the search trajectory given by s2 is not suf?cient against the destabilization by s1 , and so the behavior of the search trajectory changes from ?xed-point convergence to movement within the attracting region. In this section, we explain basic characteristics of the quasichaotic search trajectory under the assumption x(k) = x 1. The quasi-chaotic search trajectory has characteristics similar to the conventional chaotic search trajectory. In this case, as with the chaotic search trajectory using exact gradient computation, the quasi-chaotic search trajectory tends to converge to the most stable local minimum. However, we expect the introduction of improvements similar to those for the chaotic optimization method using an exact gradient to also be effective for the Q-COM, due to the similarities between these two methods. Furthermore, when x(k) is varied dynamically, the abovementioned tendency is diminished; even so, some improvements are required. This is explained in Section 4.3. 4.2. Multipoint type quasi-chaotic optimization method In this section, we propose the multipoint type Q-COM based on the M-COM. Let us consider multiple search points which implement global searching by using quasi-chaotic search trajectories. Then, these search points are advected to their elite search points by using a coupling structure in order to introduce an intensi?cation strategy which takes the objective function value into consideration, as with the M-COM. The speci?c search model of the proposed method is given by up (k + 1) = (1 ? c1 (k) ? 
c2 (k))g(xp (k)) + c1 (k)xpb (k) + c2 (k)xcb (k),
p p p

xcb (k) = argmin f (xq (k)) | q = 1, . . . , P ,
xq (k)

(23j)

c1 (k) = c2 (k) = cmax sin2 ymax , Tmax ,

2 k K

,

(23k) (23l) (23m)

xmax , cmax > 0, < 0.5, ˇ ? > 0.5.

0.5 < ˇ < 1, 0 <

Eqs. (23f), (23g), and (23m) follow the assumptions on the algorithm parameters for the proof of global convergence by Maryak and Chin [15]. The main search using the search model (23) is terminated when k = kmax . In addition, we implement a local search with the quasi-Newton method to ?nd a local minimum exactly. The local search starts from the best solution obtained during the main search. The gradient in the local search is computed by using the FDGA. The pseudo code of the proposed method including the local search is given in Appendix A. 4.3. Discussions about the proposed method In this paper, we explain the proposed method from the viewpoint of it being an improvement over the chaotic optimization method. In this section, we explain the proposed method considering some other perspectives. When we introduce the gradient approximation methods to the conventional chaotic optimization in order to perform optimization without gradient information, the ?rst and natural choice is the FDGA. However, if the FDGA is simply introduce to the chaotic optimization method to solve an optimization problem with a large number of decision variables, we expect the computational cost to be very high because N objective function evaluations are required for each update. A strong motivation for choosing the SPGA in the proposed method is computational cost, because only three (two for the approximated gradient and one for the objective function value) objective function evaluations are required for each update. The accuracy of the approximated gradient given by the SPGA is inferior to that of the FDGA. However, we consider the SPGA to be appropriate for the global search for the following reasons. The SPGA provides a descent direction at least, which is computed by randomly changing perturbation directions; therefore, diverse descent directions are provided for global search in the proposed method. 
In the global search phase, diversification matters more than search accuracy; accuracy of the search, i.e., accuracy of the gradient, is needed in the local search phase. If the FDGA is used for the global search, the gradient accuracy is high, but the number of search point updates is smaller than with the SPGA for the same number of objective function evaluations. Updates of the search points are important for finding better solutions toward which the search points are advected. Furthermore, as explained previously, the behavior of the quasi-chaotic search trajectory generated by the approximated gradient dynamics with the SPGA is, interestingly, analogous to that of the chaotic search trajectory generated by the gradient dynamics with exact gradient computation. In the numerical experiments (Section 5), we confirm that the proposed method, in which the SPGA is used for the global search, is more effective than the conventional method with the FDGA. The search model of the proposed method without coupling, i.e., Eq. (23) with cmax = 0, satisfies the assumptions on the algorithm parameters used in the proof of global convergence by Maryak and Chin [15]. That is, the global convergence of the parallel Q-COM, which is the proposed method without the coupling structure, is guaranteed under certain conditions. As mentioned previously, Maryak and Chin [15] ensure global convergence in probability as kmax → ∞. In an actual implementation, kmax is finite, and thus

[Eq. (23a), the update of u_n^p(k + 1) coupling the gradient dynamics with the elite points, is not recoverable from this extraction. The remaining components of the search model (23), reconstructed from the fragments (the placement of the clipping function ψ follows the description in the surrounding text), are:]

x_n^p(k + 1) = φ(u_n^p(k + 1)),   (23b)

g_n(x^p(k)) = x_n^p(k) − T(k) ψ( ∂̂f(x^p(k))/∂x_n ),   (23c)

∂̂f(x^p(k))/∂x_n = [ f(x^p(k) + Δx(k) s^p(k)) − f(x^p(k) − Δx(k) s^p(k)) ] / ( 2 Δx(k) s_n^p(k) ),   (23d)

ψ(y) = −y_max  (y < −y_max);  y_max  (y > y_max);  y  (otherwise),   (23e)

T(k) = T_max / (k + 1)^β,   (23f)

Δx(k) = Δx_max / (k + 1)^α,   (23g)

φ(u_n) = (u_n − x_n^l) mod (x_n^u − x_n^l) + x_n^l  (u_n > x_n^u);  (u_n − x_n^u) mod (x_n^u − x_n^l) + x_n^u  (u_n < x_n^l);  u_n  (otherwise),   (23h)

x^{pb,p}(k) = arg min { f(x^p(l)) | l = 1, . . . , k }.   (23i)
T. Okamoto, H. Hirata / Applied Soft Computing 13 (2013) 1247–1264


[Figs. 3 and 4 (bifurcation diagrams): x1, ranging over [−5, 5], plotted against ΔT ∈ (0, 0.018], with local minima No. I and No. II marked.]
Fig. 3. Bifurcation diagram of the approximated gradient dynamics (13) with dynamic Δx(k) given by Eq. (23g) for the example (4) (Δx_max = 10.0, α = 0.25, y_max = 100). The red points show the bifurcation diagram starting from local minimum No. I: x*1 = (3.2779, −2.7325)^T after 50,000 steps. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

Fig. 4. Bifurcation diagram of the coupled approximated gradient dynamics (24) for the example (4) (Δx_max = 10.0, α = 0.25, y_max = 100, c1 = 0.02). The red points show the bifurcation diagram starting from local minimum No. I: x*1 = (3.2779, −2.7325)^T. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)

the parallel Q-COM method does not necessarily obtain a global optimal solution. Indeed, the numerical experiments in Section 5 show that the parallel Q-COM, which also corresponds to the parallel SPSA method, does not always obtain global optimal solutions. Meanwhile, the assurance of global convergence indicates that an optimization method using the quasi-chaotic search trajectory has the potential to find a global optimal solution. For example, Fig. 3 shows a bifurcation diagram of the approximated gradient dynamics (13) with dynamic Δx(k) given by Eq. (23g); the diagram starts from local minimum No. I after 50,000 steps. Recalling that local minimum No. II is the global minimum, we can see from Fig. 3 that although the search point starts from local minimum No. I, the quasi-chaotic search trajectory can find the global minimum (local minimum No. II). This diagram shows that the quasi-chaotic search trajectory has the potential to find a global optimal solution. However, this bifurcation diagram is obtained only after sufficient time, and it is similar to the blue bifurcation diagram in Fig. 2; that is, the attracting structure of the stable local minimum still remains. Therefore, if the sampling parameter T is decreased quickly, the search point may converge to local minimum No. I. Thus, even if Δx(k) is varied dynamically, some improvements are required, one of which is the introduction of the coupling. Furthermore, we confirm the effectiveness of the coupling structure using the bifurcation diagram of the approximated gradient dynamics with the coupling structure, that is,

u(k + 1) = (1 − c1) g(x(k)) + c1 x_pb(k),   (24a)

x_n(k + 1) = φ(u_n(k + 1)),   (24b)

disappears and the search point is attracted to the global minimum, owing to the introduction of the coupling structure. Thus, we confirm the effectiveness of the coupling structure, which is also confirmed in the numerical experiments. An intuitive illustration of the proposed method is given in Fig. 5. In the proposed method, each search point autonomously implements a global search using the quasi-chaotic search trajectory, which has the potential to find a global optimal solution. These search points are advected toward elite search points (pbest and cbest) by the coupling structure, and thereby explore the most promising region both intensively and diversely through concurrent, autonomous exploration along quasi-chaotic trajectories. Thus, we expect that stagnation of the search, a common drawback of meta-heuristics, tends not to occur with the proposed method; therefore, we expect the proposed method to outperform general meta-heuristics on high-dimensional, multi-peaked optimization problems. This is confirmed in the numerical experiments.
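For illustration only, one iteration of the kind of update described above (SPGA-driven gradient dynamics, toroidalization of the search space, and advection toward pbest) can be sketched as follows; the clipping function, the cbest coupling, and the exact schedules of Eq. (23) are omitted for brevity, and all helper names and test settings are ours:

```python
import numpy as np

def toroidalize(u, xl, xu):
    """Wrap components back into [xl, xu) (toroidal search space, cf. Eq. (23h))."""
    return (u - xl) % (xu - xl) + xl

def step_point(x, pbest, f, T, dx, c, xl, xu, rng):
    """One quasi-chaotic update of one search point:
    gradient-descent-like drive with the SPGA, then advection toward pbest."""
    s = rng.choice([-1.0, 1.0], size=x.size)
    ghat = (f(x + dx * s) - f(x - dx * s)) / (2.0 * dx * s)  # SPGA, cf. Eq. (23d)
    g = x - T * ghat                      # approximated gradient dynamics
    u = (1.0 - c) * g + c * pbest         # coupling to the elite point
    return toroidalize(u, xl, xu)

rng = np.random.default_rng(1)
f = lambda x: np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10)  # Rastrigin-type test
xl, xu = -5.12, 5.12
x = rng.uniform(xl, xu, size=5)
pbest = x.copy()
for k in range(200):
    T = 0.1 / (k + 1)**0.751              # annealing in the spirit of Eq. (23f)
    dx = 1.0 / (k + 1)**0.25              # annealing in the spirit of Eq. (23g)
    x = step_point(x, pbest, f, T, dx, 0.02, xl, xu, rng)
    if f(x) < f(pbest):
        pbest = x.copy()                  # update the elite search point
```

In the full method, P such points run concurrently and share elite information through the coupling, as depicted in Fig. 5.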

5. Numerical experiments and comparisons with other methods In this section, we confirm the effectiveness of the proposed method through application to several benchmark problems. In addition, we briefly compare the computational complexity of the proposed method with that of other major meta-heuristics.

g_n(x(k)) = x_n(k) − T ψ( ∂̂f(x(k))/∂x_n ),   (24c)

Fig. 5. Illustration of the proposed method. Solid lines denote the quasi-chaotic search trajectories. Dashed lines denote the coupling between search points and the elite search points (pbest and cbest).

Here, the nth component of the approximated gradient vector ∂̂f(x(k))/∂x_n is given by Eqs. (23d) and (23g), the transformation function for the toroidalization φ(u_n) is given by Eq. (23h), the clipping function ψ by Eq. (23e), and the pbest x_pb(k) by Eq. (23i). c1 is fixed at c1 = 0.02 until 100 steps before starting to draw the diagram, after which c1 = 0. Fig. 4 shows the bifurcation diagram which starts from local minimum No. I. As can be seen from Fig. 4, the attracting structure of the stable local minimum


Table 1 Results for displaced Levy No. 5 function optimization problem (N = 100).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 151,828 | Tmax = 0.2
M-COM/FDGA | 0 | 42.48065 | 19.60861 | 70.00741 | 157,978 | Tmax = 0.02
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00001 | 155,279 | Tmax = 0.1
SAD-PSO (A1) | 99 | 0.00001 | 0.00000 | 0.00059 | 150,275 | α = 0.25, a = 0.01
SAD-PSO (A2) | 74 | 0.04948 | 0.00000 | 2.32558 | 154,365 | α = 0.25, a = 1.0
S-PSO | 85 | 0.02333 | 0.00000 | 2.01032 | 154,894 | c = 1.8, w = 0.7
PSO-IWA | 71 | 0.02219 | 0.00000 | 0.59184 | 154,890 | –
DE/rand/1/exp | 100 | 0.00000 | 0.00000 | 0.00000 | 150,145 | CR = 0.5, F = 0.7
DE/rand/1/bin | 99 | 0.00031 | 0.00000 | 0.03079 | 150,145 | CR = 0.1, F = 0.7
DE/BBO | 100 | 0.00000 | 0.00000 | 0.00000 | 150,235 | CR = 0.4

Table 2 Results for displaced Griewank function optimization problem (N = 100).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 150,188 | Tmax = 100,000
M-COM/FDGA | 100 | 0.00000 | 0.00000 | 0.00000 | 152,747 | Tmax = 100,000
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00000 | 150,893 | Tmax = 100,000
SAD-PSO (A1) | 35 | 0.00026 | 0.00000 | 0.00119 | 151,283 | α = 0.101, a = 1.0
SAD-PSO (A2) | 28 | 0.00069 | 0.00000 | 0.04030 | 150,343 | α = 0.25, a = 1.0
S-PSO | 32 | 0.00034 | 0.00000 | 0.00197 | 150,827 | c = 1.8, w = 0.7
PSO-IWA | 26 | 0.00033 | 0.00000 | 0.00168 | 151,438 | –
DE/rand/1/exp | 87 | 0.00008 | 0.00000 | 0.00325 | 150,320 | CR = 0.2, F = 0.5
DE/rand/1/bin | 63 | 0.00010 | 0.00000 | 0.00089 | 150,195 | CR = 0.1, F = 0.8
DE/BBO | 97 | 0.00001 | 0.00000 | 0.00020 | 151,027 | CR = 0.9

Table 3 Results for displaced Rosenbrock’s saddle function optimization problem (N = 100).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 4 | 7.2096 | 0.0000 | 74.2876 | 155,497 | Tmax = 0.02
M-COM/FDGA^a | 0 | 156.2128 | 134.7016 | 172.0828 | 152,766 | Tmax = 0.002
Parallel Q-COM^a | 0 | 120.4765 | 0.0001 | 332.1506 | 154,226 | Tmax = 0.03
SAD-PSO (A1) | 0 | 249.3860 | 146.4273 | 384.3690 | 150,550 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 234.9814 | 154.3643 | 383.3772 | 150,375 | α = 0.101, a = 0.01
S-PSO | 0 | 167.4045 | 0.2001 | 305.4808 | 151,678 | c = 2.0, w = 0.4
PSO-IWA | 0 | 244.0399 | 86.8589 | 361.9076 | 150,484 | –
DE/rand/1/exp | 1 | 39.3650 | 0.0000 | 143.2795 | 155,060 | CR = 0.1, F = 0.5
DE/rand/1/bin | 0 | 168.2660 | 16.0239 | 265.5553 | 151,011 | CR = 0.3, F = 0.6
DE/BBO | 0 | 193.9410 | 77.2043 | 314.2208 | 151,099 | CR = 0.3

a The brake function is used. Notice that the brake function is always used in the proposed method.

5.1. Benchmark problems We use the benchmark problems Prob. 1 to Prob. 5 given in Appendix A of the literature [20]; their specific forms are given in Appendix B. These problems are based on typical multi-peaked benchmark problems, modified to address the drawbacks pointed out by Liang et al. [11,12]. One such drawback is that all variables take the same value at the global minimum; another is that there is no linkage among the variables. In Tables 1–25, “displaced” means that the position of the global minimum is displaced randomly for each variable, and “rotated” means that the decision variable vector is multiplied

by an orthogonal matrix in order to introduce dependency among the decision variables; the orthogonal matrix rotates the axes of the decision variables. θ in Tables 1–25 denotes the rotation angle. The displaced Levy No. 5 function, the displaced Griewank function, and the displaced and rotated Rastrigin function are locally multi-peaked; they have a vast number of local minima, but globally they have a big-valley structure. The displaced and rotated 2^N-minima function has multiple local minima in the bounded search space, and the distance between local minima is large. The displaced Rosenbrock’s saddle function has multiple local minima if N ≥ 4; in this function, the search must be continued along a curved descent direction to reach each local minimum.
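For illustration, a "displaced" and "rotated" test function can be constructed as follows. This is a sketch with a hypothetical displacement vector and a simple plane-rotation matrix; the paper's exact instances are defined in its Appendix B:

```python
import numpy as np

def rotation_matrix(n, theta):
    """A simple orthogonal matrix: pairwise plane rotations by angle theta,
    providing linkage (dependency) among decision variables."""
    R = np.eye(n)
    c, s = np.cos(theta), np.sin(theta)
    for i in range(0, n - 1, 2):
        R[i, i], R[i, i + 1] = c, -s
        R[i + 1, i], R[i + 1, i + 1] = s, c
    return R

def displaced_rotated_rastrigin(x, o, R):
    """Rastrigin evaluated at R(x - o): the global minimum is displaced to o,
    and R couples the variables so they cannot be optimized independently."""
    z = R @ (x - o)
    return np.sum(z**2 - 10.0 * np.cos(2.0 * np.pi * z) + 10.0)

n = 4
o = np.array([0.5, -1.2, 2.0, 0.3])          # hypothetical displacement vector
R = rotation_matrix(n, np.pi / 4)            # theta = pi/4, as in the tables
print(displaced_rotated_rastrigin(o, o, R))  # 0.0 at the displaced optimum
```

The displacement breaks the "all variables equal at the optimum" artifact, and the rotation breaks separability, which is exactly why these variants are harder than the original functions.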

Table 4 Results for displaced and rotated 2^N-minima function optimization problem (N = 100, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 1 | 117.1539 | 0.0000 | 197.9141 | 151,287 | Tmax = 0.4
M-COM/FDGA | 2 | 72.4110 | 0.0000 | 169.6406 | 153,074 | Tmax = 0.005
Parallel Q-COM | 0 | 287.4952 | 56.5469 | 702.0102 | 151,136 | Tmax = 0.25
SAD-PSO (A1) | 0 | 418.9681 | 242.7740 | 707.0683 | 150,447 | α = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 482.2238 | 256.9859 | 707.8067 | 150,363 | α = 0.25, a = 1.0
S-PSO | 0 | 312.6467 | 113.0938 | 565.8249 | 151,541 | c = 0.2, w = 0.2
PSO-IWA | 0 | 531.1425 | 233.7920 | 941.4588 | 150,334 | –
DE/rand/1/exp | 0 | 310.4171 | 113.0938 | 484.6500 | 150,747 | CR = 0.9, F = 0.5
DE/rand/1/bin | 0 | 312.9343 | 119.9235 | 622.1250 | 150,474 | CR = 0.2, F = 0.6
DE/BBO | 1 | 162.9040 | 0.0000 | 339.2813 | 150,616 | CR = 0.5

Table 5 Results for displaced and rotated Rastrigin function optimization problem (N = 100, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 151,407 | Tmax = 0.1
M-COM/FDGA | 0 | 101.7046 | 69.6471 | 135.3143 | 152,887 | Tmax = 0.015
Parallel Q-COM | 0 | 7.8900 | 2.9849 | 14.9244 | 151,901 | Tmax = 0.1
SAD-PSO (A1) | 0 | 326.7279 | 198.9914 | 547.2234 | 151,382 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 445.7876 | 263.6634 | 662.1771 | 151,191 | α = 0.25, a = 1.0
S-PSO | 0 | 396.8590 | 266.6482 | 639.7523 | 151,514 | c = 2.4, w = 0.8
PSO-IWA | 0 | 459.0541 | 273.6128 | 675.5725 | 151,365 | –
DE/rand/1/exp | 0 | 594.1524 | 416.8861 | 804.9107 | 151,396 | CR = 0.9, F = 0.2
DE/rand/1/bin | 0 | 259.0490 | 166.1579 | 353.2092 | 151,552 | CR = 0.2, F = 0.5
DE/BBO | 0 | 155.2064 | 96.9991 | 244.7593 | 151,517 | CR = 0.1

Table 6 Results for noisy quartic function optimization problem (N = 100).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 37.9555 | 34.8594 | 39.3698 | 150,165 | Tmax = 0.02
M-COM/FDGA^a | 0 | 1485.5358 | 1212.0844 | 1764.1247 | 152,645 | Tmax = 0.001
Parallel Q-COM | 0 | 37.9964 | 34.8366 | 39.5888 | 150,165 | Tmax = 0.04
SAD-PSO (A1) | 0 | 39.1133 | 35.9239 | 41.0979 | 150,211 | α = 0.101, a = 0.01
SAD-PSO (A2) | 0 | 48.0309 | 42.1056 | 52.8122 | 150,209 | α = 0.101, a = 1.0
S-PSO | 0 | 48.4876 | 40.0088 | 58.6146 | 150,175 | c = 0.2, w = 0.1
PSO-IWA | 0 | 49.1554 | 43.9959 | 54.6461 | 150,175 | –
DE/rand/1/exp | 0 | 47.8228 | 43.5520 | 57.0393 | 150,145 | CR = 0.9, F = 0.6
DE/rand/1/bin | 0 | 45.4422 | 41.7427 | 49.6066 | 150,145 | CR = 0.2, F = 0.6
DE/BBO | 0 | 40.9596 | 37.3985 | 43.1431 | 150,235 | CR = 0.1

a The brake function is used. Notice that the brake function is always used in the proposed method.

Table 7 Results for step function optimization problem (N = 100).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 150,131 | Tmax = 1.5
M-COM/FDGA^a | 0 | 503.0300 | 446.0000 | 533.0000 | 152,611 | Tmax = 0.025
Parallel Q-COM^a | 0 | 106.5300 | 32.0000 | 176.0000 | 150,131 | Tmax = 0.3
SAD-PSO (A1) | 0 | 83.0500 | 61.0000 | 117.0000 | 150,177 | α = 0.101, a = 0.01
SAD-PSO (A2) | 0 | 84.1600 | 51.0000 | 114.0000 | 150,175 | α = 0.101, a = 1.0
S-PSO | 0 | 43.3200 | 19.0000 | 82.0000 | 150,141 | c = 2.0, w = 0.2
PSO-IWA | 0 | 97.4800 | 65.0000 | 149.0000 | 150,141 | –
DE/rand/1/exp | 27 | 1.3200 | 0.0000 | 4.0000 | 150,111 | CR = 0.8, F = 0.9
DE/rand/1/bin | 55 | 0.6300 | 0.0000 | 3.0000 | 150,111 | CR = 0.1, F = 0.8
DE/BBO | 36 | 1.1000 | 0.0000 | 4.0000 | 150,201 | CR = 0.4

a The brake function is used. Notice that the brake function is always used in the proposed method.

We also use the noisy quartic function optimization problem and the step function optimization problem [24]; their specific forms are also given in Appendix B. The former function is designed to test the behavior of an optimization method in the presence of noise. The latter function is not differentiable and has many plateaus. These problems are difficult to solve with conventional gradient-based methods, e.g., the conventional chaotic optimization methods.
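Common textbook forms of these two functions are sketched below (standard De Jong-style variants; the paper's exact forms are given in its Appendix B and may differ in details such as the noise distribution):

```python
import numpy as np

def noisy_quartic(x, rng):
    """Quartic function with additive noise: each evaluation is perturbed,
    so exact gradient information is unobtainable in practice."""
    i = np.arange(1, x.size + 1)
    return np.sum(i * x**4) + rng.random()   # uniform noise in [0, 1)

def step(x):
    """Step function: piecewise constant, hence not differentiable,
    with many plateaus on which any gradient-based drive stalls."""
    return np.sum(np.floor(x + 0.5)**2)

rng = np.random.default_rng(0)
x = np.zeros(5)
print(step(x))              # 0.0 on the central plateau
print(noisy_quartic(x, rng))
```

Note that `step` returns the same value on a whole unit hypercube around each point, which is exactly the plateau structure that defeats conventional gradient-based search.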

5.2. Parameter settings We use the following parameter settings:

P = 10, kmax = 5000, K = 500, cmax = 0.02, ymax = 100, α = 0.25, β = 0.751, Δxmax = max_{n=1,...,N} {x_n^u − x_n^l}.   (25)
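As a quick numeric check (using α = 0.25 and β = 0.751 as in Eq. (25), together with assumed illustrative values Tmax = 0.1 and Δxmax = 10.0), the annealing schedules and the Maryak–Chin-type parameter conditions can be verified as follows:

```python
def schedules(k, T_max=0.1, dx_max=10.0, beta=0.751, alpha=0.25):
    """Annealed sampling parameter T(k) and perturbation width dx(k),
    in the spirit of Eqs. (23f) and (23g)."""
    return T_max / (k + 1)**beta, dx_max / (k + 1)**alpha

alpha, beta = 0.25, 0.751
# Conditions of the type assumed for global convergence (cf. Maryak and Chin):
assert 0.5 < beta < 1 and 0 < alpha < 0.5 and beta - alpha > 0.5

print(schedules(0))      # initial (largest) values of T(k) and dx(k)
print(schedules(4999))   # both decay monotonically toward 0 by k = kmax - 1
```

With β = 0.751 and α = 0.25, the margin β − α = 0.501 only just exceeds 0.5, so the two schedules decay at deliberately close rates.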

Table 8 Results for displaced and rotated Rastrigin function optimization problem (N = 200, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 68 | 0.3781 | 0.0000 | 2.9849 | 153,017 | Tmax = 0.15
M-COM/FDGA | 0 | 671.9928 | 491.5083 | 883.5196 | 153,460 | Tmax = 0.02
Parallel Q-COM | 0 | 61.0481 | 33.8286 | 150.2387 | 153,719 | Tmax = 0.1
SAD-PSO (A1) | 0 | 1007.3448 | 718.0392 | 1309.6080 | 152,888 | α = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 1330.6046 | 999.9286 | 1697.3848 | 152,356 | α = 0.25, a = 1.0
S-PSO | 0 | 1184.3603 | 983.0131 | 1438.6988 | 153,884 | c = 0.2, w = 0.1
PSO-IWA | 0 | 1433.2286 | 1111.4318 | 1816.4871 | 152,641 | –
DE/rand/1/exp | 0 | 1736.0205 | 1315.1082 | 2149.7096 | 151,822 | CR = 0.9, F = 0.1
DE/rand/1/bin | 0 | 707.9676 | 541.2557 | 868.0687 | 152,878 | CR = 0.2, F = 0.4
DE/BBO | 0 | 393.8611 | 293.5395 | 535.3414 | 152,526 | CR = 0.1


Table 9 Results for displaced and rotated Rastrigin function optimization problem (N = 300, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 2 | 5.3230 | 0.0000 | 18.9042 | 155,245 | Tmax = 0.15
M-COM/FDGA | 0 | 1626.4490 | 1211.8546 | 1996.8674 | 155,415 | Tmax = 0.02
Parallel Q-COM | 0 | 175.1910 | 109.4455 | 340.2756 | 155,290 | Tmax = 0.1
SAD-PSO (A1) | 0 | 1860.2239 | 1506.3595 | 2573.9445 | 154,586 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 2364.8418 | 1802.8507 | 2842.0419 | 153,454 | α = 0.25, a = 1.0
S-PSO | 0 | 1784.8338 | 1594.9075 | 2081.4358 | 156,086 | c = 0.2, w = 0.1
PSO-IWA | 0 | 2627.6621 | 2113.2763 | 3280.4675 | 153,677 | –
DE/rand/1/exp | 0 | 3051.6463 | 2566.9676 | 3519.3621 | 152,244 | CR = 0.9, F = 0.1
DE/rand/1/bin | 0 | 1267.4738 | 1025.7992 | 1491.7347 | 153,731 | CR = 0.1, F = 0.3
DE/BBO | 0 | 654.0449 | 506.6913 | 807.8393 | 152,920 | CR = 0.1

Table 10 Results for displaced and rotated Rastrigin function optimization problem (N = 400, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 24.9297 | 7.9597 | 48.7530 | 157,450 | Tmax = 0.15
M-COM/FDGA | 0 | 2675.7713 | 2146.1123 | 3032.6075 | 160,904 | Tmax = 0.02
Parallel Q-COM | 0 | 320.1609 | 209.9363 | 464.7354 | 156,732 | Tmax = 0.1
SAD-PSO (A1) | 0 | 2836.0104 | 2382.3248 | 3374.7632 | 155,859 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 3294.0260 | 2671.4400 | 3916.2963 | 154,850 | α = 0.25, a = 1.0
S-PSO | 0 | 2404.6506 | 2175.9607 | 2673.4305 | 158,029 | c = 0.2, w = 0.1
PSO-IWA | 0 | 3435.3315 | 2665.1934 | 4249.8873 | 155,233 | –
DE/rand/1/exp | 0 | 4430.0007 | 3708.1394 | 4934.9331 | 152,793 | CR = 0.9, F = 0.1
DE/rand/1/bin | 0 | 1900.3748 | 1563.0722 | 2222.2790 | 154,231 | CR = 0.1, F = 0.3
DE/BBO | 0 | 947.3216 | 739.6269 | 1212.1310 | 153,918 | CR = 0.1

Table 11 Results for displaced and rotated Rastrigin function optimization problem (N = 500, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 57.8668 | 31.8387 | 91.5362 | 159,335 | Tmax = 0.15
M-COM/FDGA | 0 | 3741.1405 | 3349.0008 | 4124.0664 | 161,202 | Tmax = 0.02
Parallel Q-COM | 0 | 534.7506 | 391.0187 | 756.1678 | 158,504 | Tmax = 0.1
SAD-PSO (A1) | 0 | 3809.7282 | 3266.4213 | 4287.2346 | 156,799 | α = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 4179.8680 | 3491.2901 | 4853.3551 | 155,927 | α = 0.25, a = 1.0
S-PSO | 0 | 3033.2792 | 2837.6041 | 3267.4177 | 159,557 | c = 0.2, w = 0.1
PSO-IWA | 0 | 4324.1369 | 3878.3061 | 4902.4988 | 155,500 | –
DE/rand/1/exp | 0 | 5850.4850 | 5340.7907 | 6360.3708 | 153,556 | CR = 0.9, F = 0.1
DE/rand/1/bin | 0 | 2589.7903 | 2230.6161 | 2989.3823 | 154,968 | CR = 0.1, F = 0.2
DE/BBO | 0 | 1256.1631 | 1031.0372 | 1598.1234 | 154,587 | CR = 0.1

Table 12 Results for displaced Levy No. 5 function optimization problem (N = 25).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 38,106 | Tmax = 0.1
M-COM/FDGA | 100 | 0.00000 | 0.00000 | 0.00000 | 39,547 | Tmax = 0.005
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00000 | 39,354 | Tmax = 0.05
SAD-PSO (A1) | 100 | 0.00000 | 0.00000 | 0.00000 | 37,656 | α = 0.25, a = 0.01
SAD-PSO (A2) | 97 | 0.00369 | 0.00000 | 0.12316 | 37,656 | α = 0.25, a = 1.0
S-PSO | 99 | 0.00123 | 0.00000 | 0.12316 | 38,998 | c = 2.2, w = 0.5
PSO-IWA | 93 | 0.00862 | 0.00000 | 0.12316 | 37,601 | –
DE/rand/1/exp | 100 | 0.00000 | 0.00000 | 0.00000 | 37,570 | CR = 0.5, F = 0.7
DE/rand/1/bin | 100 | 0.00000 | 0.00000 | 0.00000 | 37,570 | CR = 0.1, F = 0.7
DE/BBO | 100 | 0.00000 | 0.00000 | 0.00000 | 37,660 | CR = 0.3

Table 13 Results for displaced Griewank function optimization problem (N = 25).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 37,602 | Tmax = 25,000
M-COM/FDGA | 100 | 0.00000 | 0.00000 | 0.00000 | 39,373 | Tmax = 25,000
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00000 | 38,049 | Tmax = 25,000
SAD-PSO (A1) | 0 | 0.00858 | 0.00059 | 0.05025 | 37,692 | α = 0.101, a = 0.01
SAD-PSO (A2) | 1 | 0.01021 | 0.00000 | 0.05005 | 37,702 | α = 0.25, a = 0.01
S-PSO | 9 | 0.00421 | 0.00000 | 0.04475 | 38,117 | c = 1.8, w = 0.5
PSO-IWA | 0 | 0.01060 | 0.00099 | 0.06583 | 37,649 | –
DE/rand/1/exp | 77 | 0.00030 | 0.00000 | 0.00395 | 37,650 | CR = 0.3, F = 0.5
DE/rand/1/bin | 60 | 0.00042 | 0.00000 | 0.00493 | 37,720 | CR = 0.1, F = 0.5
DE/BBO | 72 | 0.00021 | 0.00000 | 0.00258 | 37,994 | CR = 0.6

Table 14 Results for displaced Rosenbrock’s saddle function optimization problem (N = 25).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 13.8297 | 0.1585 | 74.9665 | 39,103 | Tmax = 0.015
M-COM/FDGA^a | 0 | 20.7839 | 0.0148 | 67.7821 | 39,591 | Tmax = 0.001
Parallel Q-COM^a | 0 | 28.9494 | 0.0006 | 102.5543 | 38,514 | Tmax = 0.02
SAD-PSO (A1) | 0 | 30.7204 | 0.0032 | 75.7773 | 37,830 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 25.5509 | 0.0070 | 71.1084 | 37,954 | α = 0.25, a = 0.01
S-PSO | 1 | 15.4457 | 0.0000 | 70.6825 | 38,222 | c = 2.0, w = 0.4
PSO-IWA | 0 | 31.4606 | 6.7893 | 75.4009 | 37,794 | –
DE/rand/1/exp | 7 | 5.5964 | 0.0000 | 11.0112 | 37,935 | CR = 0.8, F = 0.9
DE/rand/1/bin | 3 | 34.4399 | 0.0000 | 94.9897 | 38,654 | CR = 0.3, F = 0.5
DE/BBO | 0 | 25.9722 | 0.6101 | 69.0346 | 37,843 | CR = 0.8

a The brake function is used. Notice that the brake function is always used in the proposed method.

Table 15 Results for displaced and rotated 2^N-minima function optimization problem (N = 25, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 63 | 14.6818 | 0.0000 | 104.8276 | 37,965 | Tmax = 0.5
M-COM/FDGA | 59 | 12.8115 | 0.0000 | 56.5469 | 39,408 | Tmax = 0.005
Parallel Q-COM | 28 | 46.1758 | 0.0000 | 193.2368 | 37,942 | Tmax = 0.1
SAD-PSO (A1) | 2 | 74.2491 | 0.0000 | 167.8846 | 37,818 | α = 0.101, a = 1.0
SAD-PSO (A2) | 5 | 90.0348 | 0.0000 | 207.9362 | 37,691 | α = 0.101, a = 1.0
S-PSO | 11 | 68.3100 | 0.0000 | 181.2246 | 37,940 | c = 2.0, w = 0.9
PSO-IWA | 4 | 83.2138 | 0.0000 | 235.9038 | 37,641 | –
DE/rand/1/exp | 21 | 41.6039 | 0.0000 | 141.3919 | 37,800 | CR = 0.8, F = 0.7
DE/rand/1/bin | 27 | 38.8682 | 0.0000 | 141.3672 | 37,895 | CR = 0.3, F = 0.8
DE/BBO | 52 | 18.7153 | 0.0000 | 141.3672 | 37,893 | CR = 0.7

Table 16 Results for displaced and rotated Rastrigin function optimization problem (N = 25, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 38,005 | Tmax = 0.2
M-COM/FDGA | 0 | 18.1978 | 9.9496 | 34.8235 | 39,384 | Tmax = 0.015
Parallel Q-COM | 21 | 1.9700 | 0.0000 | 14.9244 | 38,074 | Tmax = 0.2
SAD-PSO (A1) | 0 | 46.4311 | 18.9042 | 95.5157 | 38,019 | α = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 55.5579 | 20.8941 | 126.3591 | 37,915 | α = 0.25, a = 1.0
S-PSO | 0 | 50.9649 | 27.8588 | 80.8411 | 38,032 | c = 1.8, w = 0.9
PSO-IWA | 0 | 54.2649 | 25.8689 | 106.4598 | 37,935 | –
DE/rand/1/exp | 0 | 55.8906 | 27.8588 | 102.4804 | 37,987 | CR = 0.9, F = 0.6
DE/rand/1/bin | 0 | 36.0203 | 12.9345 | 73.6267 | 38,021 | CR = 0.4, F = 0.6
DE/BBO | 0 | 25.9056 | 10.9445 | 54.7226 | 38,108 | CR = 0.2

Table 17 Results for noisy quartic function optimization problem (N = 25).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 7.0111 | 5.4236 | 7.7891 | 37,590 | Tmax = 0.04
M-COM/FDGA^a | 0 | 50.4257 | 22.3608 | 69.2608 | 39,320 | Tmax = 0.002
Parallel Q-COM | 0 | 7.0341 | 5.4218 | 7.7781 | 37,590 | Tmax = 0.07
SAD-PSO (A1) | 0 | 7.4206 | 5.7031 | 8.3394 | 37,656 | α = 0.101, a = 0.01
SAD-PSO (A2) | 0 | 7.8755 | 6.5232 | 8.8615 | 37,654 | α = 0.101, a = 1.0
S-PSO | 0 | 7.4625 | 5.8869 | 9.7100 | 37,600 | c = 0.2, w = 0.1
PSO-IWA | 0 | 8.0588 | 6.3763 | 9.3860 | 37,600 | –
DE/rand/1/exp | 0 | 7.8654 | 6.3062 | 9.2283 | 37,570 | CR = 0.8, F = 0.5
DE/rand/1/bin | 0 | 7.7570 | 6.4486 | 8.9000 | 37,570 | CR = 0.4, F = 0.5
DE/BBO | 0 | 7.3863 | 5.6252 | 8.2331 | 37,660 | CR = 0.1

a The brake function is used. Notice that the brake function is always used in the proposed method.

Table 18 Results for step function optimization problem (N = 25).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 37,556 | Tmax = 2.0
M-COM/FDGA^a | 0 | 114.9300 | 89.0000 | 135.0000 | 39,286 | Tmax = 0.04
Parallel Q-COM^a | 3 | 9.7600 | 0.0000 | 37.0000 | 37,556 | Tmax = 0.9
SAD-PSO (A1) | 0 | 5.7600 | 2.0000 | 9.0000 | 37,622 | α = 0.25, a = 0.01
SAD-PSO (A2) | 0 | 5.1900 | 1.0000 | 11.0000 | 37,620 | α = 0.25, a = 1.0
S-PSO | 28 | 1.6300 | 0.0000 | 11.0000 | 37,566 | c = 2.0, w = 0.1
PSO-IWA | 0 | 4.9600 | 2.0000 | 9.0000 | 37,566 | –
DE/rand/1/exp | 78 | 0.2800 | 0.0000 | 3.0000 | 37,536 | CR = 0.7, F = 0.9
DE/rand/1/bin | 88 | 0.1300 | 0.0000 | 2.0000 | 37,536 | CR = 0.3, F = 0.9
DE/BBO | 100 | 0.0000 | 0.0000 | 0.0000 | 37,626 | CR = 0.5

a The brake function is used. Notice that the brake function is always used in the proposed method.


Table 19 Results for displaced Levy No. 5 function optimization problem (N = 50).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 75,244 | Tmax = 0.1
M-COM/FDGA | 47 | 1.12751 | 0.00000 | 6.99071 | 79,358 | Tmax = 0.01
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00000 | 77,724 | Tmax = 0.05
SAD-PSO (A1) | 100 | 0.00000 | 0.00000 | 0.00000 | 75,161 | α = 0.25, a = 0.01
SAD-PSO (A2) | 77 | 0.01663 | 0.00000 | 0.12311 | 75,458 | α = 0.101, a = 1.0
S-PSO | 93 | 0.00431 | 0.00000 | 0.12313 | 77,952 | c = 2.2, w = 0.3
PSO-IWA | 83 | 0.01416 | 0.00000 | 0.24617 | 75,305 | –
DE/rand/1/exp | 100 | 0.00000 | 0.00000 | 0.00000 | 75,095 | CR = 0.1, F = 0.6
DE/rand/1/bin | 100 | 0.00000 | 0.00000 | 0.00000 | 75,095 | CR = 0.1, F = 0.8
DE/BBO | 100 | 0.00000 | 0.00000 | 0.00000 | 75,185 | CR = 0.3

Table 20 Results for displaced Griewank function optimization problem (N = 50).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.00000 | 0.00000 | 0.00000 | 75,127 | Tmax = 50,000
M-COM/FDGA | 100 | 0.00000 | 0.00000 | 0.00000 | 77,147 | Tmax = 50,000
Parallel Q-COM | 100 | 0.00000 | 0.00000 | 0.00000 | 75,209 | Tmax = 100,000
SAD-PSO (A1) | 13 | 0.00148 | 0.00000 | 0.02219 | 75,239 | α = 0.25, a = 1.0
SAD-PSO (A2) | 13 | 0.00101 | 0.00000 | 0.00641 | 75,215 | α = 0.25, a = 1.0
S-PSO | 25 | 0.00137 | 0.00000 | 0.04142 | 75,198 | c = 2.0, w = 0.5
PSO-IWA | 10 | 0.00139 | 0.00000 | 0.00651 | 75,209 | –
DE/rand/1/exp | 84 | 0.00013 | 0.00000 | 0.00316 | 75,211 | CR = 0.1, F = 0.5
DE/rand/1/bin | 62 | 0.00028 | 0.00000 | 0.00276 | 75,145 | CR = 0.1, F = 0.6
DE/BBO | 90 | 0.00004 | 0.00000 | 0.00049 | 75,194 | CR = 0.4

These parameters are fixed at these values for all problems. The total number of objective function evaluations is 150,000 (= P × kmax × 3). x^u and x^l are given a priori for each benchmark problem. We tune only Tmax, to generate quasi-chaotic trajectories for each problem. In the local search, the gradient is computed using the FDGA with Δx = 10^−6. The local search terminates if the norm of the gradient approximated using the FDGA satisfies ‖∂̂f(x)/∂x‖ < 10^−8 or the number of updates exceeds 100.

5.3. Brake function

The proposed method assumes that global minima are located within the bounded search space. However, in several problems, optimal solutions lie on or adjacent to the boundary. In these cases, the proposed method needs to evaluate solutions on or adjacent to the boundary, and therefore it cannot obtain the optimal solution using only the toroidalization of the search space. Consequently, a brake function

b(x_n) = (x_n − x_n^l)(x_n^u − x_n) / (x_n^u − x_n^l)   (26)

is multiplied by ∂̂f(x^p(k))/∂x_n in Eq. (23c) in order to evaluate such solutions. Use of the brake function is optional; however, we apply it in the proposed method for all problems, irrespective of necessity.

5.4. Comparison with other methods

We first compare the proposed method to the conventional method (M-COM) with the FDGA. As mentioned previously, this corresponds to the case where the conventional method is naively applied without gradient information. In Tables 1–25, these results are shown in the rows labeled “M-COM/FDGA”. The search model is given by Eq. (6), but ∂f(x^p(k))/∂x_n is replaced by the gradient approximated with the FDGA, ∂̂f(x^p(k))/∂x_n, given by Eq. (7). We use P = 10 and cmax = 0.02, as in the proposed method. For Δx, we use Δx = 10^−6. For kmax and K, we use kmax = 150,000/(P × N) and K = kmax/10 to equalize the computational cost. For example, when N = 100 and P = 10, kmax = 150 and K = 15. We only tune Tmax

to generate chaotic trajectories for each problem. We apply the M-COM/FDGA both with and without the brake function; the better results are shown in Tables 1–25. In order to confirm the effectiveness of the introduction of the coupling structure, we also compare the results of the proposed method to those of the proposed method without coupling, i.e., with cmax = 0.0. These results correspond to running the Q-COM in parallel, and also to running the SPSA method in parallel. In Tables 1–25, these results are shown in the rows labeled “parallel Q-COM”. We use the same parameters as in the proposed method, tuning only Tmax for each problem in order to generate quasi-chaotic trajectories and obtain better results. We apply the parallel Q-COM both with and without the brake function; the better results are shown in Tables 1–25. The proposed method may be viewed as an incorporation of the SPSA method into a PSO-like method, since it is a multipoint search method that uses PSO-like elite points together with the SPGA. For comparison on this point, we also apply the stochastic approximation driven PSO (SAD-PSO) proposed by Kiranyaz et al. [9]. In SAD-PSO, the SPSA method is introduced in order to prevent stagnation of the gbest update. Kiranyaz et al. [9] proposed two approaches for introducing the SPSA method. In the first (called “A1”), normal particles are updated according to the PSO update formula, but the particle whose pbest is the global best (gbest) is updated using the SPSA method. In the second (called “A2”), all particles are updated according to the PSO update formula; an artificial gbest is then created by updating the gbest with the SPSA method, and the gbest is replaced by the artificial gbest if and only if the artificial gbest evaluates better than the gbest.
According to Kiranyaz et al. [9], SAD-PSO shows better performance than the simple PSO and the SPSA method.6 For the parameters of SAD-PSO, we use the same

6 In the literature [9], a SAD-PSO based on the multi-dimensional PSO [10] is also proposed. In this paper, we use the SAD-PSO based on the basic PSO, because the multi-dimensional PSO is constructed under the assumption that the objective function can be separated for each variable, that is, that there is no linkage among the variables.

Table 21 Results for displaced Rosenbrock’s saddle function optimization problem (N = 50).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 1 | 11.9745 | 0.0000 | 123.1965 | 78,011 | Tmax = 0.02
M-COM/FDGA^a | 0 | 73.5964 | 55.5767 | 79.6246 | 77,165 | Tmax = 0.002
Parallel Q-COM^a | 1 | 79.9313 | 0.0000 | 243.9566 | 77,106 | Tmax = 0.03
SAD-PSO (A1) | 0 | 97.6735 | 47.6706 | 162.2935 | 75,326 | α = 0.101, a = 1.0
SAD-PSO (A2) | 0 | 96.3550 | 39.9870 | 163.4452 | 75,450 | α = 0.101, a = 0.01
S-PSO | 0 | 55.0024 | 0.0020 | 148.3592 | 76,689 | c = 1.6, w = 0.7
PSO-IWA | 0 | 99.6884 | 28.4816 | 164.9505 | 75,389 | –
DE/rand/1/exp | 4 | 3.1045 | 0.0000 | 69.2935 | 78,093 | CR = 0.1, F = 0.6
DE/rand/1/bin | 1 | 83.4823 | 0.0000 | 315.0845 | 77,600 | CR = 0.1, F = 0.3
DE/BBO | 0 | 82.7555 | 0.1312 | 154.5238 | 76,697 | CR = 0.2

a The brake function is used. Notice that the brake function is always used in the proposed method.

Table 22 Results for displaced and rotated 2^N-minima function optimization problem (N = 50, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 15 | 47.7529 | 0.0000 | 189.4247 | 75,653 | Tmax = 0.5
M-COM/FDGA | 23 | 36.5015 | 0.0000 | 84.8203 | 77,344 | Tmax = 0.005
Parallel Q-COM | 6 | 117.9072 | 0.0000 | 502.7860 | 75,678 | Tmax = 0.1
SAD-PSO (A1) | 0 | 208.7256 | 32.8768 | 378.9568 | 75,359 | α = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 221.4789 | 56.5469 | 392.3218 | 75,265 | α = 0.25, a = 1.0
S-PSO | 0 | 216.4537 | 30.2843 | 641.9637 | 75,771 | c = 0.2, w = 0.1
PSO-IWA | 0 | 255.8928 | 113.0938 | 469.0208 | 75,212 | –
DE/rand/1/exp | 1 | 167.7428 | 0.0000 | 298.8770 | 75,372 | CR = 0.7, F = 0.5
DE/rand/1/bin | 1 | 182.2198 | 0.0000 | 420.6332 | 75,530 | CR = 0.1, F = 0.6
DE/BBO | 12 | 54.8201 | 0.0000 | 227.1449 | 75,531 | CR = 0.6

parameter values as used by Kiranyaz et al. [9], except the two SPSA-related parameters, α and a, which are tuned for each problem to obtain better results. In addition, we apply other major meta-heuristics. One is PSO [7,8]. We apply the simple PSO (S-PSO) and the PSO with the inertia weight approach (PSO-IWA). In S-PSO, the number of particles is set to 20. The search capability of S-PSO depends on two parameters: the constriction factor c and the inertia weight w. Consequently, we apply S-PSO varying c and w independently; the best results are shown in Tables 1–25. c is varied from 0.2 to 2.4 in increments of 0.2, and w from 0.1 to 0.9 in increments of 0.1. In PSO-IWA, the number of particles is set to 20, the constriction factors are set to 2.0, and the inertia weight is gradually decreased from 0.9 to 0.4, which are the generally recommended settings. Another compared method is DE [24,22]. For the implementation of DE, we use DE/rand/1/exp and DE/rand/1/bin. The number of search points is set to 10. The search capability of DE depends on two parameters: the crossover rate CR and the scaling factor F. Consequently, we apply DE varying CR and F independently; the best results are shown in Tables 1–25. CR and F are varied from 0.1 to 0.9 in increments of 0.1. Furthermore, we also apply a hybrid of DE with biogeography-based optimization (DE/BBO) proposed by Gong et al. [6]. In DE/BBO, the biogeography-based migration operator of the BBO technique is introduced into DE in order to enhance exploitation of the obtained solutions. According to Gong et al. [6],

DE/BBO shows better performance than the simple DE, the simple BBO, and other DE-based methods. For the parameters of DE/BBO, we use the same parameter values as used by Gong et al. [6], except CR. We apply DE/BBO varying CR from 0.1 to 0.9 in increments of 0.1; the best results are shown in Tables 1–25. The local search, using the same implementation as in the proposed method and starting from the best solution obtained during the main search, is also implemented in all compared methods. Results after the local search are shown in Tables 1–25. The computational costs of all methods are tuned to be almost equal to those of the proposed method in terms of the number of objective function evaluations (O. F. E.) in the main search procedure. For example, for DE, the maximum iteration count is set to 15,000 (= 150,000/number of search points (= 10)). Notice that the differences in O. F. E. among the methods in Tables 1–25 are caused by the computational costs of the local search: the local search uses the same implementation in all methods, but its computational cost cannot be predicted exactly, because its starting points differ.
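As a concrete illustration of this final polishing step, a minimal quasi-Newton (BFGS) local search with a numerically approximated gradient might look as follows. This is a sketch: the function names, step-size rule, and tolerances are our assumptions, not the paper's implementation.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference gradient used only inside the local polish."""
    g = np.zeros_like(x)
    for n in range(x.size):
        e = np.zeros_like(x)
        e[n] = h
        g[n] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def bfgs_polish(f, x0, iters=100, tol=1e-8):
    """Minimal BFGS quasi-Newton local search started from the best solution
    found by the main search (a sketch; the paper's implementation may differ)."""
    x = x0.astype(float)
    H = np.eye(x.size)                      # approximation of the inverse Hessian
    g = num_grad(f, x)
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                          # quasi-Newton search direction
        t = 1.0
        while f(x + t * d) > f(x) and t > 1e-12:
            t *= 0.5                        # simple backtracking line search
        s = t * d
        x_new = x + s
        g_new = num_grad(f, x_new)
        y = g_new - g
        sy = float(s @ y)
        if sy > 1e-12:                      # curvature condition holds -> BFGS update
            rho = 1.0 / sy
            I = np.eye(x.size)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x, f(x)

# Example: refine a rough solution of a smooth bowl with minimum at (1, 1)
f = lambda x: float(np.sum((x - 1.0) ** 2))
x_star, f_star = bfgs_polish(f, np.array([0.9, 1.2]))
```

On smooth problems such a polish converges quickly from a good starting point; on the noisy and step benchmarks a plain numerical gradient is unreliable, which is why the paper additionally uses a brake function there.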

5.5. Results

First, we confirm the effectiveness of the proposed method through application to benchmark problems with 100 variables (N = 100). The proposed method and the methods for comparison

Table 23. Results for the displaced and rotated Rastrigin function optimization problem (N = 50, θ = π/4).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 75,855 | Tmax = 0.2
M-COM/FDGA | 0 | 49.7181 | 29.8488 | 89.5462 | 77,203 | Tmax = 0.015
Parallel Q-COM | 2 | 4.2632 | 0.0000 | 16.9143 | 75,983 | Tmax = 0.15
SAD-PSO (A1) | 0 | 127.7751 | 60.6924 | 247.8484 | 75,801 | γ = 0.25, a = 1.0
SAD-PSO (A2) | 0 | 158.4804 | 66.6622 | 309.4291 | 75,666 | γ = 0.25, a = 1.0
S-PSO | 0 | 140.3371 | 75.6168 | 222.8695 | 75,857 | c = 2.4, w = 0.8
PSO-IWA | 0 | 161.0319 | 95.5159 | 279.6590 | 75,761 | –
DE/rand/1/exp | 0 | 194.8916 | 132.3293 | 276.5966 | 75,819 | CR = 0.9, F = 0.5
DE/rand/1/bin | 0 | 99.6931 | 56.7126 | 158.1979 | 75,897 | CR = 0.3, F = 0.5
DE/BBO | 0 | 64.0405 | 28.8538 | 108.3201 | 75,899 | CR = 0.1


T. Okamoto, H. Hirata / Applied Soft Computing 13 (2013) 1247–1264

Table 24. Results for the noisy quartic function optimization problem (N = 50).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 0 | 16.8999 | 14.9092 | 17.9539 | 75,115 | Tmax = 0.04
M-COM/FDGA^a | 0 | 286.9200 | 182.0836 | 368.2019 | 77,095 | Tmax = 0.002
Parallel Q-COM | 0 | 16.9621 | 14.8982 | 17.9519 | 75,115 | Tmax = 0.06
SAD-PSO (A1) | 0 | 17.7058 | 16.3539 | 19.0124 | 75,161 | γ = 0.101, a = 0.01
SAD-PSO (A2) | 0 | 19.9640 | 17.2848 | 22.8354 | 75,159 | γ = 0.101, a = 1.0
S-PSO | 0 | 19.0767 | 17.0351 | 21.9619 | 75,125 | c = 0.2, w = 0.1
PSO-IWA | 0 | 20.2639 | 18.0540 | 22.4595 | 75,125 | –
DE/rand/1/exp | 0 | 19.6853 | 17.8014 | 23.1057 | 75,095 | CR = 0.9, F = 0.6
DE/rand/1/bin | 0 | 19.2966 | 17.2363 | 21.6241 | 75,095 | CR = 0.2, F = 0.5
DE/BBO | 0 | 18.0363 | 15.9292 | 19.1471 | 75,185 | CR = 0.1

^a The brake function is used. Notice that the brake function is always used in the proposed method.

Table 25. Results for the step function optimization problem (N = 50).

Method | C. R. | Average | Best | Worst | O. F. E. | Parameters
Proposed method | 100 | 0.0000 | 0.0000 | 0.0000 | 75,081 | Tmax = 1.5
M-COM/FDGA^a | 0 | 242.7500 | 217.0000 | 266.0000 | 77,061 | Tmax = 0.02
Parallel Q-COM^a | 0 | 37.6900 | 7.0000 | 75.0000 | 75,081 | Tmax = 0.5
SAD-PSO (A1) | 0 | 22.6800 | 12.0000 | 33.0000 | 75,127 | γ = 0.101, a = 0.01
SAD-PSO (A2) | 0 | 20.4100 | 11.0000 | 30.0000 | 75,125 | γ = 0.25, a = 1.0
S-PSO | 0 | 11.4200 | 3.0000 | 21.0000 | 75,091 | c = 2.0, w = 0.5
PSO-IWA | 0 | 19.9800 | 12.0000 | 34.0000 | 75,091 | –
DE/rand/1/exp | 60 | 0.5400 | 0.0000 | 5.0000 | 75,061 | CR = 0.5, F = 0.9
DE/rand/1/bin | 86 | 0.1500 | 0.0000 | 2.0000 | 75,061 | CR = 0.1, F = 0.9
DE/BBO | 92 | 0.0800 | 0.0000 | 1.0000 | 75,151 | CR = 0.6

^a The brake function is used. Notice that the brake function is always used in the proposed method.

are performed 100 times, randomly resetting the initial points. The results are shown in Tables 1–7. The columns are:

• Method: name of the method.
• C. R.: convergence rate to a global minimum. Let x^o be the global minimum and x^* be the best solution obtained in each trial. We consider a global minimum to have been obtained if f(x^*) − f(x^o) < 0.0001 is satisfied.
• Average: average of f(x^*) − f(x^o) over 100 trials.
• Best: best value of f(x^*) − f(x^o) over 100 trials.
• Worst: worst value of f(x^*) − f(x^o) over 100 trials.
• O. F. E.: average number of objective function evaluations.
• Parameters: parameter settings specific to the problem.

Bold font in the tables denotes that the value is the best among all methods. As shown, the proposed method obtains a global minimum in all problems except the noisy quartic function optimization problem. Furthermore, the proposed method is superior or equivalent to the other methods in all problems. Specifically, the proposed method is superior to M-COM/FDGA in all problems except the displaced and rotated 2^N-minima function optimization problem. This suggests that introducing the SPGA into the chaotic optimization method is more effective for global search than introducing the FDGA. Furthermore, M-COM/FDGA is not effective for the noisy quartic function optimization problem or the step function optimization problem. As explained previously, M-COM/FDGA corresponds to the conventional chaotic optimization method, and these problems are difficult to solve using conventional gradient-based methods. The proposed method can solve them because it uses the gradient approximated by the SPGA, which is not necessarily exact but provides descent directions, instead of the exact (and conventional) gradient given by the FDGA. The proposed method is superior to the parallel Q-COM in all problems except the displaced Levy No. 5 function optimization problem and the displaced Griewank function optimization

problem. For these problems, the proposed method is equivalent to the parallel Q-COM. Furthermore, the transitions of the average best objective function value f(x^*) obtained by the proposed method and the parallel Q-COM with respect to objective function evaluations for the displaced and rotated Rastrigin function optimization problem are shown in Fig. 6. Notice that the transitions consist of f(x^*) before the local search. The parallel Q-COM gradually converges toward a better solution; however, its convergence speed is very slow. Meanwhile, in the proposed method, the search points are attracted to the promising region, where the objective function value is smaller, by the advective couplings to the elite search points, and thereby diversely explore the promising region
Fig. 6. Average of the best objective function value f(x^*) obtained by the proposed method and the parallel Q-COM with respect to objective function evaluations for the displaced and rotated Rastrigin function optimization problem. Blue plots show the transition of the proposed method; red plots show that of the parallel Q-COM. Notice that these plots consist of f(x^*) before the local search. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of the article.)


Table 26. Estimates of computational costs per objective function evaluation. "Comp." denotes comparison operations such as <, >, ≤, and ≥; "Rand" denotes random number generation; "Mod" the floating-point modulo operation; "Pow" the floating-point power operation; "Sort" a call to a sorting function. N is the number of decision variables; P is the number of search points. All entries include the cost of checking whether updated search points are within the bounded search space. To estimate the computational complexity of SAD-PSO (A1), P = 38 is used; for SAD-PSO (A2), P = 37. These are the values recommended by Kiranyaz et al. [9] and used in the numerical experiments of this paper.

Method | + | − | × | / | Comp. | Rand | Sin | Mod | Pow | Sort
Proposed method | 1.33N | 1.67N | 3N | 0.33N | 1.67N | 0.33N | <1 | 0.33N | <1 | –
SAD-PSO (A1) | 2.8N | 1.9N | 4.73N | <1 | 3.78N | 1.88N | – | – | <1 | –
SAD-PSO (A2) | 2.8N | 1.9N | 4.73N | <1 | 3.78N | 1.88N | – | – | <1 | –
S-PSO | 3N | 2N | 5N | – | 4N | 2N | – | – | – | –
PSO-IWA | 3N | 2N | 5N | – | 4N | 2N | – | – | – | –
DE/rand/1/exp | N | N | N | – | 3N | N | – | N | – | –
DE/rand/1/bin | N | N | N | – | 3N | N | – | – | – | –
DE/BBO | (P + 1)N | N | PN | – | (P + 4)N | 3N | – | – | – | <1

intensively through concurrent, autonomous exploration using the quasi-chaotic search trajectory. Eventually, a solution located in a neighborhood of the optimal solution is obtained. These results suggest that the introduction of the coupling structure is effective for the Q-COM. They also suggest that the proposed method is superior to the simple SPSA method run in parallel. The proposed method is superior to SAD-PSO in all problems. In SAD-PSO, the SPSA method is used supplementarily to prevent stagnation of the gbest update; that is, the global search capability of SAD-PSO depends mainly on the PSO search strategy. Hence, stagnation due to attraction to elite search points, a common drawback of meta-heuristics, is not sufficiently mitigated in SAD-PSO. In the proposed method, each search point autonomously implements a global search driven by the gradient dynamics with the SPGA. Such stagnation tends not to occur in the proposed method; therefore, in this sense, the proposed method performs better than SAD-PSO. Comparisons to the other major meta-heuristics also suggest the superiority of the proposed method. Furthermore, we confirm the effectiveness of the proposed method in high-dimensional problems. We use the displaced and rotated Rastrigin function optimization problem as the benchmark in this numerical experiment. The number of variables is set to N = 200, N = 300, N = 400, and N = 500. The proposed method and the methods for comparison are performed 100 times, randomly resetting the initial points. The results are shown in Tables 8–11. As can be seen from the results, the proposed method is effective in high-dimensional problems relative to the other methods.
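The SPGA estimate that drives each search point can be sketched as follows. Two objective function evaluations suffice for all N components, regardless of N; the function names and the fixed perturbation width are our assumptions (the paper schedules the width via Eq. (23)).

```python
import numpy as np

def spga(f, x, dx, rng):
    """Simultaneous perturbation gradient approximation (sketch).
    One random sign vector s and two evaluations of f estimate all N
    components at once; the estimate is not exact, but it provides a
    descent direction on average."""
    s = rng.choice([-1.0, 1.0], size=x.shape)   # random sign vector s^p
    diff = f(x + dx * s) - f(x - dx * s)        # two evaluations in total
    return diff / (2.0 * dx * s)                # elementwise: each s_n is +/-1

# On a smooth quadratic the estimate correlates with the true gradient 2x
rng = np.random.default_rng(0)
f = lambda x: float(np.sum(x ** 2))
x = np.array([1.0, -2.0, 0.5])
g = spga(f, x, 1e-5, rng)
```

Compare this with the FDGA, which needs 2N evaluations per gradient: the N-fold saving is what lets the quasi-chaotic dynamics run many more iterations under the same O. F. E. budget.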

In addition, we confirm the effectiveness of the proposed method in low-dimensional problems. We use all benchmark problems given in Appendix B. The number of variables is set to N = 25 and N = 50. In this numerical experiment, the total number of objective function evaluations (O. F. E.) is reduced along with the number of variables: for N = 25, the total number of O. F. E. is set to 37,500, and for N = 50, to 75,000. To deal with the decrease in the total number of O. F. E., kmax, i.e., the maximum iteration count, is decreased in the proposed method. K is set to K = kmax/10 according to the variation of kmax. γ is also tuned to conform to the search process in the case of γ = 0.25 and kmax = 5000 (N = 100); specifically, γ = 0.3025 for kmax = 1250 (N = 25), and γ = 0.275 for kmax = 2500 (N = 50). Transitions of Δx(k) under the above parameter sets are shown in Fig. 7. β is set to β = γ + 0.501 according to the variation of γ. Thus, for N = 25, we use the following parameter settings:

kmax = 1250, K = 125, β = 0.8035, γ = 0.3025. (27)

For N = 50, we use the following parameter settings:

kmax = 2500, K = 250, β = 0.776, γ = 0.275. (28)

Other parameters are the same as in Eq. (25). In the compared methods, the maximum iteration count is tuned to equalize the computational cost. The proposed method and the compared methods are performed 100 times, randomly resetting the initial points. The results are shown in Tables 12–25. As shown, the proposed method is superior or equivalent to the other methods in all problems except the displaced Rosenbrock's saddle function optimization problem and the displaced and rotated 2^N-minima function optimization problem with N = 50.
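The scaling rules above (K = kmax/10 and β = γ + 0.501, applied to the tuned (kmax, γ) pairs) can be collected into a small helper; the function name and dictionary layout are ours.

```python
def scaled_settings(n_vars):
    """Parameter settings for N = 25, 50, 100, following the scaling rules
    quoted in the text: K = kmax / 10 and beta = gamma + 0.501.
    The (kmax, gamma) pairs are the tuned values reported in the paper."""
    tuned = {25: (1250, 0.3025), 50: (2500, 0.275), 100: (5000, 0.25)}
    kmax, gamma = tuned[n_vars]
    return {"kmax": kmax, "K": kmax // 10,
            "beta": round(gamma + 0.501, 4), "gamma": gamma}
```

For example, scaled_settings(25) reproduces the settings of Eq. (27), and scaled_settings(50) those of Eq. (28).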

Table 27. Computational costs per iteration. O. F. E. is the number of objective function evaluations.

Method | O. F. E. | + | − | × | / | Comp. | Rand | Sin | Mod | Pow | Sort
Proposed method | 3P | 4PN + 2 | 5PN + P + 2 | (9P + 2)N + 4 | PN | 5PN + 2P | PN | 1 | PN | 2 | –
SAD-PSO (A1) | P + 2 | (3P − 2)N + 3 | 2PN + 2 | (5P − 1)N + 4 | 1 | (4P − 1)N + 2P + 2 | (2P − 1)N | – | – | 2 | –
SAD-PSO (A2) | P + 3 | (3P + 1)N + 3 | (2P + 2)N + 2 | (5P + 4)N + 4 | 1 | (4P + 3)N + 2P + 4 | (2P + 1)N | – | – | 2 | –
S-PSO | P | 3PN | 2PN | 5PN | – | 4PN + 2P | 2PN | – | – | – | –
PSO-IWA | P | 3PN | 2PN + 1 | 5PN + 1 | – | 4PN + 2P | 2PN | – | – | – | –
DE/rand/1/exp | P | PN | PN | PN | – | 3PN + P | PN + 3P | – | PN | – | –
DE/rand/1/bin | P | PN | PN | PN | – | 3PN + P | PN + 3P | – | – | – | –
DE/BBO | P | (P^2 + P)N − P^2 | PN | P^2 N − P^2 | – | (P^2 + 4P)N − P^2 | 3PN + 3P | – | – | – | 1

Table 28. Computational time when solving the displaced and rotated Rastrigin function optimization problem (N = 100) under the following environment – CPU: Intel Core i7 975 3.3 GHz; OS: Scientific Linux 6.2 64 bit; Compiler: GNU g++ 4.4.6.

Method | Computational time (s)
Proposed method | 1.1232
SAD-PSO (A1) | 1.3657
SAD-PSO (A2) | 1.3623
S-PSO | 1.3986
PSO-IWA | 1.4085
DE/rand/1/exp | 1.1755
DE/rand/1/bin | 1.2732
DE/BBO | 3.4408

Fig. 7. Transitions of Δx(k) under γ = 0.25 and kmax = 5000, γ = 0.275 and kmax = 2500, and γ = 0.3025 and kmax = 1250.

Table 29. Number of real variables needed to run the optimization algorithm. N is the number of decision variables; P is the number of search points.

Method | Number of real variables
Proposed method | (4P + 1)N + 2P + 11
SAD-PSO (A1) | (3P + 3)N + 2P + 14
SAD-PSO (A2) | (3P + 5)N + 2P + 15
S-PSO | (3P + 1)N + 2P + 3
PSO-IWA | (3P + 1)N + 2P + 4
DE/rand/1/exp | 3PN + 2P + 2
DE/rand/1/bin | 3PN + 2P + 2
DE/BBO | 3PN + 5P + 3, plus the number of real variables needed to implement the sorting function

In these problems, the proposed method is ranked second in Average, and its difference in performance from the top result is not larger than that of the other methods. Thus, the proposed method is also effective in low-dimensional problems relative to the other methods.

5.6. Computational complexity

In this section, we discuss the computational complexities of the proposed method and the compared methods. First, we discuss computational costs. In comparisons of global optimization methods for optimization problems with continuous variables, the number of objective function evaluations (O. F. E.) is often used as the measure of computational cost, because the cost of evaluating the objective function is assumed to be much larger than the cost of running the optimization algorithm itself. Indeed, much of the literature, including the literature on the compared methods [9,7,8,24,22,6], employs the number of O. F. E. as the measure under this assumption. In the numerical experiments of this paper, all methods are compared with an equal number of O. F. E. in the main search procedure. Consequently, we discuss the computational costs of the optimization methods themselves, i.e., the overhead per objective function evaluation. Since optimization methods with continuous variables are considered, we focus on operations on real (continuous) variables. Estimates of the worst-case computational costs per objective function evaluation are shown in Table 26. Specifically, first, the operations on real variables, such as addition, subtraction, multiplication, division, and comparison, together with the O. F. E., are counted per iteration of each optimization algorithm; the result is shown in Table 27. Then, the number of operations is divided by the O. F. E., and constant terms are eliminated, because constant terms do not dominate the computational cost as the problem size grows. As can be seen from Table 26, the proposed method is not much inferior to the other methods; indeed, its computational time does not differ much from that of the other methods, as shown in Table 28. In addition, we discuss computational resources. To estimate the computational resources roughly, the number of real variables needed to run each optimization algorithm is shown in Table 29. As can be seen from Table 29, the proposed method is also not much inferior to the other methods regarding computational resources. In conclusion, the computational complexity of the proposed method does not differ much from that of the other methods.

6. Conclusion

In this paper, we proposed a new global optimization method called the multipoint type quasi-chaotic optimization method. In the proposed method, the simultaneous perturbation gradient approximation is introduced into a multipoint type chaotic optimization method in order to carry out optimization without gradient information. We explained the similarities between the proposed method and the chaotic optimization method using bifurcation diagrams and linear stability theory. We also explained the global search capability of the proposed method and the effectiveness of introducing the advective couplings to the elite search points. Then, we confirmed the effectiveness of the proposed method through application to several unconstrained multi-peaked optimization problems with 100 or more variables, comparing it to the conventional chaotic optimization method with the finite difference gradient approximation, the parallel quasi-chaotic optimization method, and other major meta-heuristics.

As mentioned previously, global convergence of the parallel quasi-chaotic optimization method is guaranteed. However, we have not yet obtained theoretical results on the global convergence of the proposed method, in which the coupling structure is introduced into the quasi-chaotic optimization method; meanwhile, we confirmed the effectiveness of introducing the coupling structure through numerical experiments. In the future, we will investigate theoretical results on the global convergence of the proposed method. We will then theoretically investigate the effects of introducing the coupling structure into the parallel Q-COM, i.e., the SPSA method run in parallel. In addition, the initial value of the sampling parameter Tmax has to be tuned so that the proposed method performs effectively: Tmax should be tuned to generate quasi-chaotic trajectories for each problem. In Section 4.1, we explained that the 2-bifurcation parameter can be obtained through linear stability theory. If the 2-bifurcation parameter with respect to a local minimum is known, the magnitude of Tmax can be estimated; as shown in Figs. 2–4, Tmax may be estimated as several times the 2-bifurcation parameter. Thus, we will investigate the development of an autonomous tuning method that sets Tmax based on this consideration.


Appendix A. Pseudo-code for the proposed method

Step 1 Initialization
  Set the parameters: x^l, x^u, P, kmax, cmax, K, Δxmax, β, γ, Δymax.
  Initialize all search points randomly:
    for p = 1 to P do
      for n = 1 to N do x_n^p ← U(x_n^l, x_n^u) end for.
    end for.
    where U(a, b) generates a uniform random number between a and b.
  Initialize the time: k ← 0.
  Initialize the objective function values of the elite points:
    for p = 1 to P do f_pb^p ← ∞ end for.
    f^* ← ∞.
  Goto Step 2.

Step 2 Evaluate All Search Points
  Update the objective function value and random sign vector of all search points:
    for p = 1 to P do
      f^p ← f(x^p).
      for n = 1 to N do
        if U(0, 1) < 0.5 then s_n^p ← 1 else s_n^p ← −1 end if.
      end for.
    end for.
  Update pbest for each search point, cbest, and the obtained solution:
    f^cb ← ∞.
    for p = 1 to P do
      if f^p < f_pb^p then f_pb^p ← f^p, x_pb^p ← x^p end if.
      if f^p < f^cb then f^cb ← f^p, x^cb ← x^p end if.
      if f^p < f^* then f^* ← f^p, x^* ← x^p end if.
    end for.
  Goto Step 3.

Step 3 Local Search and Termination
  Check the termination criterion:
    if k = kmax then
      Implement the local search with the quasi-Newton method using x^* as the initial point.
      Return the result of the local search and terminate.
    end if.
  Goto Step 4.

Step 4 Quasi-Chaotic (Main) Search
  Update Δx and T using Eqs. (23f) and (23g), and c1 and c2 using Eq. (23k).
  Update the search points:
    for p = 1 to P do
      for n = 1 to N do
        g_n^p ← x_n^p − T · (f(x^p + Δx s^p) − f(x^p − Δx s^p)) / (2 Δx s_n^p)
      end for.
      u^p ← (1 − c1 − c2) g^p + c1 x_pb^p + c2 x^cb.
      for n = 1 to N do x_n^p ← φ(u_n^p) end for.
    end for.
    where the mapping of y is given by Eq. (23e) and the mapping φ(u_n^p) is given by Eq. (23h).
  Increment the time: k ← k + 1.
  Goto Step 2.

Appendix B. Benchmark problems

This appendix lists the benchmark problems used in this paper. In the following, f(x) is the objective function, S is the bounded search space, x^o is the coordinates of the global minimum (given randomly), U(a, b) is the uniform distribution between a and b, and ⌊·⌋ is the floor function. The orthogonal matrix R(θ), which rotates the axes of the decision variables, is given by

R(θ) = T^{12}(θ) × T^{13}(θ) × ··· × T^{1N}(θ) × T^{23}(θ) × ··· × T^{N−1,N}(θ),

where T^{ij}(θ) is the N × N matrix with elements (k = 1, ..., N, l = 1, ..., N)

T^{ij}_{kl}(θ) = cos θ (k = i, l = i); −sin θ (k = i, l = j); sin θ (k = j, l = i); cos θ (k = j, l = j); 1 (k = l ≠ i, j); 0 (otherwise),

and θ is the rotation angle.

(a) Displaced Levy No. 5 function optimization problem:

f(x) = (π/N) [ 5 sin^2(π y_1) + Σ_{n=1}^{N−1} (y_n − 1)^2 (1 + 5 sin^2(π y_{n+1})) + (y_N − 1)^2 ]
S = { x | −1 < x_n < 1 (n = 1, ..., N) },

where y_n = 1 + 10 z_n, z = x − x^o, x_n^o = U(−0.8, 0.8).

(b) Displaced Griewank function optimization problem:

f(x) = (1/(2000N)) Σ_{n=1}^{N} z_n^2 − Π_{n=1}^{N} cos( z_n / √(n + 1) ) + 1.0
S = { x | −25 < x_n < 25 (n = 1, ..., N) },

where z = x − x^o, x_n^o = U(−20, 20).

(c) Displaced Rosenbrock's saddle function optimization problem:

f(x) = Σ_{n=1}^{N−1} [ 100 (z_{n+1} − z_n^2)^2 + (z_n − 1)^2 ]
S = { x | −3 < x_n < 1 (n = 1, ..., N) },

where z = x − x^o + 1, x_n^o = U(−2.4, 0.4).

(d) Displaced and rotated 2^N-minima function optimization problem:

f(x) = Σ_{n=1}^{N} ( z_n^4 − 16 z_n^2 + 5 z_n )
S = { x | −2.0965 < x_n < 7.9035 (n = 1, ..., N) },

where z = R(θ)(x − x^o) − 2.9035, x_n^o = U(−1, 7).

(e) Displaced and rotated Rastrigin function optimization problem:

f(x) = 10N + Σ_{n=1}^{N} ( z_n^2 − 10 cos(2π z_n) )
S = { x | −5 < x_n < 5 (n = 1, ..., N) },

where z = R(θ)(x − x^o), x_n^o = U(−4, 4).

(f) Noisy quartic function optimization problem:

f(x) = Σ_{n=1}^{N} ( n x_n^4 + U(0, 1) )
S = { x | −5 < x_n < 5 (n = 1, ..., N) }.

(g) Step function optimization problem:

f(x) = Σ_{n=1}^{N} ⌊x_n⌋
S = { x | −5.12 < x_n < 5.12 (n = 1, ..., N) }.
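As a concrete check of these definitions, the rotation matrix R(θ) and the displaced and rotated Rastrigin function of (e) can be sketched as below. The placement of the −sin θ element within each plane rotation is our assumption (the sign was lost in extraction); any such placement yields an orthogonal R(θ).

```python
import numpy as np

def rotation_matrix(N, theta):
    """R(theta) = T^{12} T^{13} ... T^{1N} T^{23} ... T^{N-1,N}:
    a product of plane (Givens) rotations over every axis pair (i, j)."""
    R = np.eye(N)
    for i in range(N - 1):
        for j in range(i + 1, N):
            T = np.eye(N)
            T[i, i] = T[j, j] = np.cos(theta)
            T[i, j] = -np.sin(theta)        # sign placement assumed
            T[j, i] = np.sin(theta)
            R = R @ T
    return R

def rotated_rastrigin(x, x_opt, R):
    """Displaced and rotated Rastrigin function of (e); f(x_opt) = 0."""
    z = R @ (x - x_opt)
    return 10 * x.size + float(np.sum(z ** 2 - 10 * np.cos(2 * np.pi * z)))

rng = np.random.default_rng(1)
N = 10
R = rotation_matrix(N, np.pi / 4)           # theta = pi/4 as in the experiments
x_opt = rng.uniform(-4, 4, N)               # displacement x_n^o = U(-4, 4)
value_at_opt = rotated_rastrigin(x_opt, x_opt, R)
```

Since R(θ) is orthogonal, the rotation couples the decision variables without changing the function's value distribution, which is what makes the rotated benchmarks non-separable.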

References

[1] K. Aihara, T. Takabe, M. Toyoda, Chaotic neural networks, Physics Letters A 144 (1990) 333–340.
[2] S. Bhatnagar, Simultaneous perturbation and finite difference methods, in: Wiley Encyclopedia of Operations Research and Management Science, John Wiley & Sons, Inc., Hoboken, NJ, 2011.
[3] H.F. Chen, T.E. Duncan, B. Pasik-Duncan, A stochastic approximation algorithm with random differences, in: Proc. of IFAC the 13th Triennial World Congress, IFAC, Laxenburg, Austria, 1996, pp. 493–496.
[4] L. Gerencser, Z. Vago, SPSA in noise free optimization, in: Proc. of American Control Conference 2000, AACC, Danvers, MA, 2000, pp. 3284–3288.
[5] F. Glover, M. Laguna, Tabu Search, Kluwer Academic Publishers, Netherlands, 1997.
[6] W. Gong, Z. Cai, C. Ling, DE/BBO: a hybrid differential evolution with biogeography-based optimization for global numerical optimization, Soft Computing 15 (2011) 645–665.
[7] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: Proc. of IEEE Int. Conf. Neural Networks, IEEE, Washington, DC, 1995, pp. 1942–1948.
[8] J. Kennedy, R.C. Eberhart, Y. Shi, Swarm Intelligence, Morgan Kaufmann Publishers, San Francisco, CA, 2001.
[9] S. Kiranyaz, T. Ince, M. Gabbouj, Stochastic approximation driven particle swarm optimization with simultaneous perturbation – who will guide the guide?, Applied Soft Computing 11 (2011) 2334–2347.
[10] S. Kiranyaz, T. Ince, A. Yildirim, M. Gabbouj, Fractional particle swarm optimization in multidimensional search space, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 40 (2010) 298–319.
[11] J. Liang, S. Baskar, P. Suganthan, A. Qin, Performance evaluation of multiagent genetic algorithm, Natural Computing 5 (2006) 83–96.
[12] J.J. Liang, P.N. Suganthan, K. Deb, Novel composition test functions for numerical global optimization, in: Proc. of Swarm Intelligence Symposium 2005 (SIS 2005), IEEE, Washington, DC, 2005, pp. 68–75.
[13] Y. Maeda, M. Wakamura, Simultaneous perturbation learning rule for recurrent neural networks and its FPGA implementation, IEEE Transactions on Neural Networks 16 (2005) 1664–1672.
[14] J.L. Maryak, D.C. Chin, Global random optimization by simultaneous perturbation stochastic approximation, Johns Hopkins APL Technical Digest 25 (2004) 91–100.
[15] J.L. Maryak, D.C. Chin, Global random optimization by simultaneous perturbation stochastic approximation, IEEE Transactions on Automatic Control 53 (2008) 780–783.
[16] J.L. Maryak, J.C. Spall, Simultaneous perturbation optimization for efficient image restoration, IEEE Transactions on Aerospace and Electronic Systems 41 (2005) 356–361.
[17] K. Masuda, E. Aiyoshi, Global optimization method using chaos of discrete gradient dynamics, in: Proc. of IFAC Workshops ALCOSP & PSYCO 2004, IFAC, Laxenburg, Austria, 2004, pp. 825–830.
[18] M. Mizukami, M. Hirano, K. Shinjo, Simultaneous alignment of multiple optical axes in a multistage optical system using Hamiltonian algorithm, Optical Engineering 40 (2001) 448–454.
[19] N. Nusawardhana, S.H. Zak, Simultaneous perturbation extremum seeking method for dynamic optimization problems, in: Proc. of American Control Conference 2004, AACC, Danvers, MA, 2004, pp. 2805–2810.
[20] T. Okamoto, E. Aiyoshi, Global optimization using a synchronization of multiple search points autonomously driven by a chaotic dynamic model, Journal of Global Optimization 41 (2008) 219–244.
[21] A. Pikovsky, M. Rosenblum, J. Kurths, R.C. Hilborn, Synchronization: A Universal Concept in Nonlinear Science, Cambridge University Press, Cambridge, UK, 2002.
[22] K. Price, R.M. Storn, J.A. Lampinen, Differential Evolution – A Practical Approach to Global Optimization, Springer, New York, NY, 2005.
[23] J.C. Spall, Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control, Wiley-Interscience, Hoboken, NJ, 2003.
[24] R. Storn, K. Price, Differential evolution – a simple and efficient heuristic for global optimization over continuous spaces, Journal of Global Optimization 11 (1997) 341–359.
[25] I. Tokuda, K. Onodera, R. Tokunaga, K. Aihara, T. Nagashima, Global bifurcation scenario for chaotic dynamical systems that solve optimization problems and analysis of their optimization capability, Electronics and Communications in Japan (Part III: Fundamental Electronic Science) 81 (1998) 1–12.
[26] H. Ying, M.C. Fu, S.I. Marcus, Convergence of simultaneous perturbation stochastic approximation for nondifferentiable optimization, IEEE Transactions on Automatic Control 48 (2003) 1459–1463.

Takashi Okamoto was born on August 25, 1980. He completed the doctoral program at Keio University in 2007, receiving a Ph.D. in engineering. He is now an assistant professor in the Graduate School of Engineering, Chiba University. His research interests are systems engineering, optimization theory, and complex systems; he is especially interested in optimization methods using nonlinear dynamics.

Hironori Hirata was born on June 2, 1948. He completed the doctoral program at Tokyo Institute of Technology in 1976 and joined the faculty of Chiba University as a research associate. He has been a professor there since 1994. His research interests are the modeling, analysis, and design of complex systems, especially ecological systems, and VLSI layouts. He is also interested in the fundamental theory of distributed systems. He holds a D.Eng. degree.

