Int J Adv Manuf Technol DOI 10.1007/s00170-007-1132-7
ORIGINAL ARTICLE
A honeybee-mating approach for cluster analysis
Mohammad Fathian & Babak Amiri
Received: 23 December 2006 / Accepted: 14 June 2007 © Springer-Verlag London Limited 2007
Abstract Cluster analysis, the subject of active research in several fields such as statistics, pattern recognition, machine learning, and data mining, partitions a given set of data or objects into clusters. K-means is a popular clustering method owing to its simplicity and high speed on large datasets. However, K-means has two shortcomings: it depends on the initial state and converges to local optima, and global solutions of large problems cannot be found with a reasonable amount of computational effort. Many clustering studies have sought to overcome the local-optima problem. Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has given rise to the emerging area of swarm intelligence. Honeybees are among the most closely studied social insects, and honeybee mating may be considered a typical swarm-based approach to optimization, in which the search algorithm is inspired by the marriage process of real honeybees. Neural-network algorithms are also useful for cluster analysis in data mining. This study proposes a two-stage method, which first uses a self-organizing feature map (SOM) neural network to determine the number of clusters and then uses a honeybee mating optimization algorithm based on the K-means algorithm to find the final solution. We compared the proposed algorithm with other heuristic clustering algorithms, such as GA, SA, TS, and ACO, by implementing them on several well-known datasets. Our findings show that the proposed algorithm outperforms the others. To further demonstrate the capability of the proposed approach, it is applied to a real-world problem: market segmentation of an Internet bookstore based on customer loyalty.

Keywords Clustering . K-means . Honeybee mating . SOM . Market segmentation

M. Fathian : B. Amiri (*) Department of Industrial Engineering, Iran University of Science and Technology, Narmak, Tehran 123456, Islamic Republic of Iran e-mail: amiri_babak@ind.iust.ac.ir
1 Introduction

Market segmentation has been identified as an important avenue of research in the field of electronic commerce [1]. In this new and competitive commercial framework, market segmentation techniques can give marketing researchers a leading edge, because the identification of such segments can be the basis for effective targeting and prediction of potential customers [2]. Among segmentation methods, the post-hoc methods, especially clustering methods, are relatively powerful and frequently used in practice [3, 4]. Among clustering methods, the K-means method is the most frequently used, since it can accommodate the large sample sizes associated with market segmentation studies [5]. Owing to increasing computer power and decreasing computer costs, artificial neural networks (ANNs) have recently been applied to a wide variety of business areas [6], such as market segmentation [7, 8], sales forecasting [9], and bankruptcy prediction [10]. One type of unsupervised neural network, the self-organizing feature map (SOM), can project a high-dimensional input space onto a low-dimensional topology, allowing one to visually determine the number of clusters [11, 12]. One popular class of data clustering algorithms is the center-based clustering algorithms. K-means is a popular clustering method owing to its simplicity and high speed on large datasets [13]. However, K-means has two shortcomings: it depends on the initial state and converges to local optima [14], and global solutions of large problems cannot be found with a reasonable amount of computational effort [15]. Many clustering studies have sought to overcome the local-optima problem. Ujjwal and Sanghamitra [16] proposed a genetic-algorithm-based method for the clustering problem and evaluated its performance on synthetic and real-life datasets; the results showed that the GA-based method may improve the final output of K-means. Krishna and Murty [17] proposed a novel approach, the genetic K-means algorithm, for clustering analysis; it defines a basic mutation operator specific to clustering, called distance-based mutation, and, using finite Markov chain theory, they proved that GKA converges to the best-known optimum. In [18], Shokri et al. discussed the clustering problem usually solved by the K-means algorithm, which is known to have local-minimum solutions that are usually what K-means obtains. A simulated annealing approach for solving optimization problems was described and proposed for the clustering problem; the parameters of the algorithm were discussed in detail, and it was shown that the algorithm converges to a global solution of the clustering problem. In [19], a clustering problem was considered in which a given data set is partitioned into a certain number of natural and homogeneous subsets, such that each subset is composed of elements similar to one another but different from those of any other subset. For this problem, a heuristic algorithm was developed by combining the tabu search heuristic with two complementary functional procedures, called packing and releasing procedures. The algorithm was numerically tested for its effectiveness against reference works including the tabu search algorithm, the K-means algorithm, and the simulated annealing algorithm.
Over the last decade, modeling the behavior of social insects, such as ants and bees, for the purpose of search and problem solving has given rise to the emerging area of swarm intelligence. The ant colony is a typical successful swarm-based optimization approach, in which the search algorithm is inspired by the behavior of real ants. Kuo et al. [20] proposed a novel clustering method, the ant K-means (AK) algorithm. AK modifies K-means by locating the objects in a cluster with a probability that is updated by the pheromone, where the pheromone-update rule follows the total within-cluster variance (TWCV). In [21], Shelokar et al. presented an ant colony optimization methodology for optimally clustering N objects into K clusters. The algorithm employs distributed agents that mimic the way real ants find the shortest path from their nest to a food source and back. They compared the results with other heuristic clustering algorithms, namely GA, tabu search, and SA, and showed that their algorithm outperforms the others in solution quality and time. Chang et al. [38] developed a hybrid model integrating a self-organizing map (SOM) neural network, genetic algorithms (GA), and a fuzzy rule base (FRB) to forecast the future sales of a printed circuit board factory. This hybrid model encompasses two novel concepts: (1) clustering an FRB into different clusters, so that the interaction between fuzzy rules is reduced and a more accurate prediction model can be established, and (2) evolving an FRB by optimizing the number of fuzzy terms of the input and output variables, so that the prediction accuracy of the FRB is further improved. In [39], Chang et al. presented a fuzzy modeling method proposed by Wang and Mendel for generating fuzzy rules from data produced by a simulated model built from a real factory located in the Hsin-Chu science-based park of Taiwan, R.O.C. The fuzzy modeling method is further evolved by a genetic algorithm for the due-date assignment problem in manufacturing. Using simulated data, the effectiveness of the proposed method was shown and compared with two other soft-computing techniques, multi-layer perceptron neural networks and case-based reasoning; the comparative results indicate that the proposed method is consistently superior to the other two. Honeybees are among the most closely studied social insects. Honeybee mating may also be considered a typical swarm-based approach to optimization, in which the search algorithm is inspired by the marriage process of real honeybees. Honeybees have been used to model agent-based systems [22]. In a recent work, Abbass [23, 24] developed an optimization algorithm based on the honeybee marriage process. Bozorg Haddad and Afshar [25] presented an optimization algorithm based on honeybee mating that was successfully applied to a single-reservoir optimization problem with discrete decision variables. Later, Bozorg Haddad et al. [26] applied the same algorithm to three benchmark mathematical problems. In [27], the honeybee mating optimization algorithm is presented and tested on a nonlinear, continuous constrained problem with continuous decision and state variables to demonstrate its efficiency in optimizing single-reservoir operation; the performance of the model was shown to be quite comparable with the results of well-developed traditional linear programming solvers. In this paper, we propose an application of honeybee mating optimization to clustering (HBMK-means). The research proposes a two-stage method: it first determines the number of clusters via the SOM neural network, and then the honeybee mating optimization algorithm is applied to find the final solution (defined as SOM+HBMK in this paper).
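The SOM stage of this two-stage method can be sketched in Python as follows. This is a minimal illustration of the standard Kohonen update rule (Eqs. 4 and 5 in Section 3.1), not the authors' implementation: the grid size, the linear decay schedules for the learning rate and neighborhood width, and the seed are illustrative assumptions.

```python
import numpy as np

def train_som(X, grid=(5, 5), cycles=1000, eta0=0.5, width0=2.0, seed=0):
    """Kohonen SOM sketch: delta_w_ij = eta * Lambda(i, i*) * (xi_j - w_ij)."""
    rng = np.random.default_rng(seed)
    # Positions r_i of the output units on a 2-D grid
    pos = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], float)
    W = rng.uniform(X.min(0), X.max(0), size=(len(pos), X.shape[1]))
    for t in range(cycles):
        frac = t / cycles
        eta = eta0 * (1.0 - frac)              # learning rate, gradually decreased
        width = width0 * (1.0 - frac) + 1e-3   # neighborhood width, gradually decreased
        x = X[rng.integers(len(X))]            # one training pattern
        winner = np.argmin(((W - x) ** 2).sum(1))   # i*: unit with shortest distance
        # Lambda(i, i*) = exp(-|r_i - r_i*|^2 / (2 w^2))
        lam = np.exp(-((pos - pos[winner]) ** 2).sum(1) / (2.0 * width ** 2))
        W += eta * lam[:, None] * (x - W)      # update winner and neighbor weights
    return W
```

After training, clusters of nearby codebook vectors on the map suggest the number of clusters to feed to the second stage.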
The paper is organized as follows: Section 2 discusses the cluster analysis problem. Section 3 introduces the SOM artificial neural network, the nature of honeybee mating, and the application of the honeybee mating algorithm to clustering. Section 4 presents experimental results of the proposed clustering algorithm in comparison with other heuristic clustering algorithms, and finally the real-world problem results are detailed in Section 5.
2 Clustering

Data clustering, an NP-complete problem of finding groups in heterogeneous data by minimizing some measure of dissimilarity, is one of the fundamental tools in data mining, machine learning, and pattern classification [28]. Clustering in the N-dimensional Euclidean space R^N is the process of partitioning a given set of n points into a number, say K, of groups (or clusters) based on some similarity (distance) metric. The most common distance metric in clustering is the Euclidean distance, which is derived from the Minkowski metric according to Eqs. 1 and 2:

d(x, y) = \left( \sum_{i=1}^{m} \left| x_i - y_i \right|^r \right)^{1/r}    (1)

d(x, y) = \sqrt{ \sum_{i=1}^{m} \left( x_i - y_i \right)^2 }    (2)

In this study, we also use the Euclidean metric as the distance metric. Existing clustering algorithms can be broadly classified into two categories: hierarchical clustering and partitional clustering. The most popular class of partitional clustering methods is the center-based clustering algorithms [29]. The K-means algorithm is one of the most widely used center-based clustering algorithms [13]. To find K centers, the problem is defined as an optimization (minimization) of a performance function, Perf(X, C), defined on both the data items and the center locations. A popular performance function for measuring the goodness of a K-clustering is the total within-cluster variance, or total mean-square quantization error (MSE), according to Eq. 3 [29]:

Perf(X, C) = \sum_{i=1}^{N} \min \left\{ \| x_i - c_l \|^2 \mid l = 1, \ldots, K \right\}    (3)

The steps of the K-means algorithm are as follows [16]:

Step 1 Choose K cluster centers randomly from the n points
Step 2 Assign each point to its closest cluster center
Step 3 Compute the new cluster centers
Step 4 If the termination criteria are satisfied, stop; otherwise continue from Step 2

Note that if the process does not terminate normally at Step 4, it is executed for a maximum fixed number of iterations.

3 Application of the honeybee mating optimization algorithm and SOM in clustering

The proposed method uses SOM to find the number of clusters and then feeds it to HBMK to obtain the final solution, as shown in Fig. 1 (SOM+HBMK). Each of these two stages is explained in more detail in the following subsections.

3.1 Self-organizing feature maps (SOM)

The self-organizing feature map (SOM), which is typical of unsupervised learning neural networks, can project a high-dimensional input space onto a low-dimensional topology, allowing one to visually determine the number of clusters directly [11, 12]. The most widely used unsupervised learning scheme is the self-organizing feature map developed by Kohonen [30]. The learning rule for adjusting the weights is:

\Delta w_{ij} = \eta \, \Lambda(i, i^*) \, (\xi_j - w_{ij})    (4)

where \eta is a learning rate that is gradually decreased, \xi_j is the value of input node j, and i^* is the winner node. The neighborhood function \Lambda(i, i^*) is 1 for i = i^* and decreases with the distance |r_i - r_{i^*}| between units i and i^* in the output array. A typical choice for \Lambda(i, i^*) is:

\Lambda(i, i^*) = \exp \left( - \frac{ | r_i - r_{i^*} |^2 }{ 2 w^2 } \right)    (5)

where w is a width parameter that is gradually decreased with every learning cycle.

3.2 Honeybee mating optimization algorithm

A honeybee colony typically consists of a single egg-laying, long-lived queen, anywhere from zero to several thousand drones (depending on the season), and usually 10,000 to 60,000 workers [27]. Queens are specialized in egg laying. A colony may contain one queen or more during its life cycle, in which case it is called a monogynous or polygynous colony, respectively. Only the queen bee is fed "royal jelly," a milky-white, jelly-like substance. "Nurse bees" secrete this nourishing food from their glands and feed it to their queen. The diet of royal jelly makes the queen bee bigger than any other bee in the hive. A queen bee may live up to 5 or 6 years, whereas worker bees and drones never live more than 6 months. There are usually several hundred drones that live with the queen and worker bees. Nature has given the drones just one task, which is to provide the queen with sperm. After the mating process, the drones die. Drones are the fathers of the colony. They are haploid and act to amplify their mothers' genome without altering its genetic composition, except through mutation. Drones can therefore be considered agents that propagate one of their mother's gametes and enable females to act genetically as males. Workers specialize in brood care and sometimes lay eggs. Broods arise either from fertilized or unfertilized eggs; the former represent potential queens or workers, whereas the latter represent prospective drones [27]. In the marriage process, the queen(s) mate during their mating flights far from the nest. A mating flight starts with a dance performed by the queen, who then starts a flight during which the drones follow her and mate with her in the air. In each mating, sperm reaches the spermatheca and accumulates there to form the genetic pool of the colony. Each time a queen lays fertilized eggs, she randomly retrieves a mixture of the sperm accumulated in the spermatheca to fertilize the egg [31]. The queen is pursued by a large swarm of drones (drone comets) when copulation occurs. Insemination ends with the eventual death of the drone and the queen receiving the "mating sign." The queen mates multiple times, but each drone, inevitably, only once. These features make bee mating the most spectacular mating among insects [27].
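The clustering objective of Eq. 3 and the K-means steps of Section 2 can be sketched in Python as follows. This is a minimal sketch for illustration, not the authors' Matlab implementation; function names and defaults are our own.

```python
import numpy as np

def total_within_cluster_variance(X, centers):
    """Perf(X, C) = sum_i min_l ||x_i - c_l||^2  (Eq. 3)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (n, K) squared distances
    return d2.min(1).sum()

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]          # Step 1: random centers
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Step 2: assign each point to its closest center
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # Step 3: recompute centers (keep old center if a cluster is empty)
        new = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):                          # Step 4: termination
            break
        centers = new
    return centers, labels
```

The same `total_within_cluster_variance` function corresponds to the clustering metric F reported in the experiments.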
The mating flight may be considered as a set of transitions in a state space (the environment), where the queen moves between the different states at some speed and mates with the drone encountered at each state probabilistically. At the start of the flight, the queen is initialized with some energy content, and she returns to her nest when her energy is within some threshold of zero or her spermatheca is full [27]. In developing the algorithm, the functionality of workers is restricted to brood care; therefore, each worker may be represented as a heuristic that acts to improve and/or take care of a set of broods (i.e., feeding the future queen with royal jelly). A drone mates with a queen probabilistically using an annealing function as follows [24]:

Prob(Q, D) = \exp \left( - \frac{ \Delta(f) }{ S(t) } \right)    (6)

where Prob(Q, D) is the probability of adding the sperm of drone D to the spermatheca of queen Q (that is, the probability of a successful mating); \Delta(f) is the absolute difference between the fitness of D (i.e., f(D)) and the fitness of Q (i.e., f(Q)); and S(t) is the speed of the queen at time t. This function clearly acts as an annealing function: the probability of mating is high when the queen is still at the beginning of her mating flight, and therefore her speed is high, or when the fitness of the drone is as good as the queen's. After each transition in space, the queen's speed decays according to:

S(t + 1) = \alpha(t) \cdot S(t)    (7)

where \alpha \in [0, 1] is the factor by which the speed is reduced after each transition. The workers used to improve a brood's genotype may represent a set of different heuristics. The rate of improvement in the brood's genotype resulting from applying a heuristic to that brood defines the heuristic's fitness value. The fitness of the resulting genotype is determined by evaluating the value of the objective function of the brood genotype and/or its normalized value. Note that a brood has only one genotype. Thus, an HBMO algorithm may be constructed with the following five main stages:

- The algorithm starts with the mating flight, where a queen (best solution) selects drones probabilistically to form the spermatheca (list of drones). A drone is then selected from the list at random for the creation of broods.
- Creation of new broods (cluster centers) by crossover of the drones' genotypes with the queen's.
- Use of workers (heuristics) to conduct local search on broods (trial solutions).
- Adaptation of the workers' fitness, based on the amount of improvement achieved on broods.
- Replacement of weaker queens by fitter broods.

Fig. 1 SOM algorithm (flowchart: initialize weights; input a training pattern; find the winner unit with the shortest distance; update the weights of the winner unit and its neighbor nodes; reduce the neighborhood radius and decrease the learning rate; if the learning cycles are not yet reached, repeat; otherwise determine the number of clusters and use HBMK to find the final solution)

The algorithm starts with three user-defined parameters and one predefined parameter. The predefined parameter is the number of workers (W), representing the number of heuristics encoded in the program. However, this parameter may be used to alter the number of active heuristics if required; that is, the user may choose the first W heuristics, where W is less than or equal to the total number of heuristics encoded in the program. The three user-defined parameters are the number of queens, the queen's spermatheca size (representing the maximum number of matings per queen in a single mating flight), and the number of broods that will be born by all queens. The speed of each queen at the start of each mating flight is initialized at random. A set of queens is then initialized at random, and a randomly selected heuristic is used to improve the genotype of each queen, on the assumption that a queen is usually a good bee. A number of mating flights are then undertaken. In each mating flight, all queens fly based on their speeds, where each queen's speed is generated at random before each mating flight commences. At the start of a mating flight, a drone is generated randomly and the queen is positioned over that drone. The transition made by the queen in space is based on her speed, which represents the probability of flipping each gene in the drone's genome. At the start of a mating flight, the speed may be high and the queen may make very large steps in space. As the queen's energy decreases, her speed decreases, and as a result the neighborhood covered by the queen decreases. At each step in the space, the queen mates with the drone encountered at that step using the probabilistic rule in Eq. 6. If the mating is successful (i.e., the drone passes the probabilistic decision rule), the drone's sperm is stored in the queen's spermatheca. To sum up, the algorithm starts with a mating flight in which a queen selects drones with a predefined probabilistic rule. By crossing over the drones' genotypes with the queen's, a new brood (trial solution) is formed, which can later be improved by employing workers to conduct local search [27]. When all queens complete their mating flights, they start breeding. For the required number of broods, a queen is selected in proportion to her fitness and mated with a randomly selected sperm from her spermatheca. A worker is chosen in proportion to its fitness to improve the resultant brood. After all broods have been generated, they are sorted according to their fitness. The best brood replaces the worst queen until no brood is better than any of the queens. The remaining broods are then killed, and a new mating flight begins, until all assigned mating flights are completed or the convergence criteria are met [27]. The main steps of the HBMO algorithm are presented in Fig. 2, and a full-scale computational flowchart is illustrated in Fig. 3. Because this algorithm combines simulated annealing, genetic operators, and swarm intelligence, it is a very interesting optimization algorithm that has been used in reservoir-operation optimization problems.

3.3 The HBMK-means algorithm

The search capability of the HBMO algorithm is used in this article to determine an appropriate fixed number K of cluster centers in R^N, thereby suitably clustering the set of n unlabelled points. The clustering metric that has been adopted is the sum of the Euclidean distances of the points
Fig. 2 The HBMO algorithm [24] (flowchart: a drone is selected at random from the list of drones; the queen and the selected drone mate to produce broods; local search is applied to the broods; the best brood is selected, and the queen is replaced with the best brood)
from their respective cluster centers. The steps of the proposed algorithm are shown in Fig. 3 and are now described in detail.

Fig. 3 HBMO algorithm representation (flowchart: start; define the model input parameters, (a) algorithm parameters and (b) model parameters; randomly generate a set of initial solutions; rank the initial solutions based on the performance function, keeping the best one and a predefined number of solutions; use simulated annealing to select a set of solutions from the search space, forming a mating pool for possible information exchange between the best present solution and the selected trial solutions; generate a new set of solutions by employing different predefined crossover operators and heuristic functions between the best present solution and the trial solutions according to their fitness values; improve the newly generated solutions by employing different heuristic functions and mutation operators according to their fitness values; update the fitness values of the heuristic functions for the next iteration, giving more chance to the more effective heuristic functions in solution improvement; if the new best solution is better than the previous one, substitute it, otherwise keep the previous best solution; if the termination criteria are satisfied, finish; otherwise discard all previous trial solutions and generate new trial solutions using (a) the remaining generated solutions and (b) random generation)

Step 1: String representation
A chromosome is used to represent a candidate solution to the problem, where each gene in the chromosome represents a parameter of the candidate solution. In this study, a chromosome is regarded as a set of K initial cluster centers, and each gene is one dimension of a cluster center. Specifically, a chromosome can be represented as C = [c_1 ... c_j ... c_K], where c_j is the jth gene and K is the total number of genes. Figure 4 illustrates a chromosome-encoding example with C1 = (2, 5, 1), C2 = (6, 3, 2), and C3 = (5, 7, 4).

Fig. 4 A chromosome-encoding example: the centers C1 = (2, 5, 1), C2 = (6, 3, 2), and C3 = (5, 7, 4) are encoded as the string 2 5 1 6 3 2 5 7 4

Step 2: Define the model input parameters
The algorithm starts with three user-defined parameters and one predefined parameter. The predefined parameter is the number of workers (W), representing the number of heuristics encoded in the program; it may also be used to alter the number of active heuristics if required, that is, the user may choose the first W heuristics, where W is less than or equal to the total number of heuristics encoded in the program. The user-defined parameters are the number of queens, the queen's spermatheca size (the maximum number of matings per queen in a single mating flight), and the number of broods that will be born by all queens. The speed of each queen at the start of each mating flight is initialized at random.

Step 3: Random generation of a set of initial solutions
In this stage, a set of initial cluster centers is generated randomly from the dataset points. Each solution represents K cluster centers, as shown in Fig. 3.

Step 4: Selection of the queen
After generating the set of initial solutions as described in the previous stage, rank the initial solutions based on the performance function and set the best one as the queen.

Step 5: Mating flight
Use simulated annealing to select a set of solutions from the search space, forming a mating pool for possible information exchange between the best present solution (the queen) and the selected trial solutions.

Step 6: Breeding process
Generate a new set of solutions by employing predefined crossover operators and heuristic functions between the present solutions and the trial solutions according to their fitness values. In this study, we adopt intermediate crossover, which creates children as a weighted average of the parents. The weights are specified by a single parameter, Ratio, which can be a scalar or a row vector whose length equals the number of variables; the default is a vector of all 1s. A child is created from parent 1 and parent 2 using the following formula:

Child = Parent1 + rand · Ratio · (Parent2 − Parent1)    (8)

If all the entries of Ratio lie in the range [0, 1], the children produced lie within the hypercube defined by placing the parents at opposite vertices. If Ratio is not in that range, the children might lie outside the hypercube. If Ratio is a scalar, all the children lie on the line between the parents.

Step 7: Feeding selected broods and the queen with royal jelly by workers
Improve the newly generated set of solutions by employing different heuristic functions and mutation operators according to their fitness values. For binary representations of chromosomes, a bit position is mutated by simply flipping its value. Since we consider a floating-point representation in this article, we use the following mutation. A number δ in the range [0, 1] is generated with uniform distribution. If the value at a gene position is v, after mutation it becomes:

v ± 2δv, if v ≠ 0
v ± 2δ, if v = 0

The + or − sign occurs with equal probability. Note that we could have implemented the mutation as v ± δv. However, one problem with this form is that if the values at a particular position in all the chromosomes of the population become positive (or negative), we would never be able to generate a new chromosome with a negative (or positive) value at that position. To overcome this limitation, we have incorporated a factor of 2 in the mutation; other forms, such as v ± (δ + ε)v with 0 < ε < 1, would also have satisfied our purpose.

Step 8: If the new best solution is better than the queen, replace the queen with it.

Step 9: Check the termination criteria
If the termination criteria are satisfied, finish the algorithm; otherwise, discard all previous trial solutions and generate new trial solutions, then go to Step 5, until all assigned iterations (mating flights) are completed or the convergence criteria are met. The pseudo code for the application of the HBMO algorithm to clustering is illustrated in Fig. 5.

Fig. 5 Pseudo code for the HBMK clustering algorithm

Number of clusters → k
Number of drones → m
Capacity of the spermatheca → L
Speed of the queen at the start of a mating flight → Tmax
Speed of the queen at the end of a mating flight → Tmin
Speed reduction schema → t
Number of iterations → P
Set T = Tmax
Number of workers → w
Number of broods → b
Begin
  Generate m drones of length k randomly from the X matrix and set them as the D matrix
  Calculate their objective function values
  Select the best drone and set it as Q (the queen)
  Repeat
    Repeat
      Select a Di from D randomly
      Calculate Δ(f) = |f(Q) − f(Di)|
      Generate r randomly
      If exp(−Δ(f)/T) > r, add Di to the spermatheca S
      Else T = t·T
    Until the capacity of the spermatheca is reached or T = Tmin
    Repeat
      Select a crossover function from list W according to its probability
      Select an Si randomly, generate a new solution by crossing Si and Q, and set it as Bi
      Calculate the crossover fitness
      Update the probability matrix of crossover-function selection
    Until the number of broods equals b
    Begin
      Select a mutation function from list W according to its probability
      Mutate Bi and set it as Ei
      If f(Ei) > f(Q), swap Ei and Q
      Else keep the previous solution
      Calculate the mutation fitness
      Update the probability matrix of mutation-function selection
    End
    Generate a new drones list randomly
  Until the termination criteria are satisfied
End

Table 1 Results obtained by the five algorithms for ten different runs on dataset 1

Method    Fbest        Faverage     Fworst       Function evaluations  CPU time (s)
SOMHBMK   96.752047    96.95316     97.757625    11214                 35.25
ACO       97.100777    97.171546    97.808466    10998                 33.72
GA        113.986503   125.197025   139.778272   38128                 105.53
TS        97.365977    97.868008    98.569485    20201                 72.86
SA        97.100777    97.134625    97.263845    29103                 95.92

As the K-means algorithm is used as the basis of the proposed algorithm, it retains the advantages of K-means, such as simplicity and high speed in clustering. At the same time, the proposed algorithm overcomes the shortcomings of K-means. As mentioned in Section 1, these shortcomings are dependency on the initial state with convergence to local optima, and the inability to find global solutions of large problems with a reasonable amount of computational effort. In its first stage, the HBMK algorithm uses a list of random solutions (the drones list), which reduces the dependency on the initial state. Then, by integrating SA, GA, and swarm intelligence, and because random solutions are added to the solution list in each iteration, the proposed algorithm also addresses the second problem of K-means. In the next sections,
HBMK-means is compared with other heuristic clustering algorithms, such as GA, SA, TS, and ACO, by implementing them on several well-known datasets.

4 Experimental results

In this section, we present a set of experiments that demonstrates the quality of our algorithm. The experiments were run on a Pentium IV, 2.8 GHz, 512 MB RAM computer, and the algorithms were coded in Matlab 7.0. We ran all five algorithms on three different datasets: the well-known iris, wine, and thyroid-disease datasets taken from the machine learning repository [32].

Table 2 Results obtained by the five algorithms for ten different runs on dataset 2

Method     Fbest          Faverage       Fworst         Function evaluations   CPU time (s)
SOMHBMK    16257.284378   16257.284378   16257.284378    7238                   55.18
ACO        16530.533807   16530.533807   16530.533807    9306                   68.29
GA         16530.533807   16530.533807   16530.533807   33551                  226.68
TS         16666.226987   16785.459275   16837.535670   22716                  161.45
SA         16530.533807   16530.533807   16530.533807    7917                   57.28

DataSet1: This is the iris dataset, perhaps the best-known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. The dataset contains three classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter are not linearly separable from each other. There are 150 instances with four numeric attributes in the iris dataset, and there are no missing attribute values. The attributes of the iris dataset are: sepal length in cm, sepal width in cm, petal length in cm, and petal width in cm. DataSet2: This is the wine dataset, also taken from the UCI laboratory. These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types
Table 3 Results obtained by the five algorithms for ten different runs on dataset 3

Method     Fbest          Faverage       Fworst         Function evaluations   CPU time (s)
SOMHBMK    10099.297958   10100.130000   10107.646802   23268                   99.35
ACO        10111.827759   10112.126903   10114.819200   25626                  102.15
GA         10116.294861   10128.823145   10148.389608   45003                  153.24
TS         10249.729170   10354.315021   10438.780449   29191                  114.01
SA         10111.827759   10114.045265   10115.934358   28675                  108.22
of wines. There are 178 instances with 13 numeric attributes in the wine dataset. All attributes are continuous, and there are no missing attribute values. DataSet3: This dataset contains N=215 samples of patients suffering from three human thyroid diseases (K=3): euthyroid, hyperthyroidism, and hypothyroidism. Of these, 150 individuals are euthyroid, 30 suffer from hyperthyroidism, and 35 from hypothyroidism. Each individual is characterized by the results of n=5 laboratory tests: total serum thyroxin, total serum tri-iodothyronine, serum tri-iodothyronine resin uptake, serum thyroid-stimulating hormone (TSH), and the increase of TSH after injection of TSH-releasing hormone [33].
To evaluate the performance of the SOMHBMK algorithm in clustering, we compared it with several typical stochastic algorithms: the ACO algorithm [21], the simulated annealing approach [18], the genetic algorithm [17], and the tabu search approach [19]. The effectiveness of stochastic algorithms is greatly dependent on the generation of initial solutions. Therefore, for every dataset, each algorithm was run 10 times, each time with randomly generated initial solutions. The comparison of results for each dataset is based on the best solution found in the ten distinct runs of each algorithm, the average number of function evaluations required, and the processing time taken to attain the best solution. The solution quality is also given in terms of the average and worst values of the clustering metric (Favg and Fworst, respectively) over the ten runs for each of the five algorithms. F is the total within-cluster variance or the total
Table 4 Values of parameters of each algorithm

SOMHBMK: number of queens = 1; number of drones = 150; capacity of spermatheca = 50; speed of queen at start of flight = random ∈ [0.5, 1]; minimum speed of queen = 0.001; speed-reduction (cooling) schema = 0.98; crossover ratio = random ∈ [0, 1]; maximum number of iterations (mating flights) = 1,000.
ACO: number of ants (R) = 50; probability threshold for maximum trail (q0) = 0.98; local search probability (pls) = 0.01; evaporation rate (ρ) = 0.01; maximum number of iterations (itermax) = 1,000.
GA: population size = 50; crossover rate = 0.8; mutation rate = 0.01; maximum number of iterations = 1,000.
TS: tabu list size = 25; number of trial solutions = 40; probability threshold = 0.98; maximum number of iterations = 1,000.
SA: probability threshold = 0.98; initial temperature = 5; temperature multiplier = 0.98; final temperature = 1.5; number of iterations to detect steady state = 100; maximum number of iterations = 30,000.
mean-square quantization error (MSE) given in Eq. 3, where C is a cluster center and k is the number of clusters. Tables 1, 2 and 3 show these results. For the clustering problem on the iris dataset, the results given in Table 1 show that SOMHBMK provides the optimum value of 96.752047, while the ACO and SA methods obtain 97.100777. SOMHBMK found its optimum in eight of the ten runs, compared with nine times for ACO and five times for SA. ACO required the fewest function evaluations (10998) and the least processing time (33.72 s). The results obtained for the clustering problem on dataset 2 are given in Table 2. SOMHBMK finds the optimum solution of 16257.284378, while the ACO, SA, and GA methods provide 16530.533807. The SOMHBMK, ACO, SA, and GA methods found their optimum solutions in all ten runs, and the function evaluations and execution time taken by SOMHBMK are lower than those of the other algorithms. For the human thyroid disease dataset, the SOMHBMK algorithm provides the optimum solution of 10099.297958 with a success rate of 90% over ten runs. In terms of function evaluations and processing time, SOMHBMK performed better than the other clustering algorithms, as can be observed from Table 3. The proposed algorithm is a combination of SA, GA, and swarm intelligence: it uses an SA-style criterion to select solutions in the search space, and it outperforms the other algorithms in the number of times the F function must be calculated. In other words, SOMHBMK exploits the advantages of the SA, GA, and swarm-intelligence algorithms.
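The clustering metric F of Eq. 3 used throughout these comparisons can be computed directly as the sum, over all objects, of the squared Euclidean distance to the nearest cluster center. A minimal sketch (the function and variable names are ours):

```python
def clustering_metric(data, centers):
    """F (Eq. 3): total within-cluster variance, i.e. the sum over all
    objects of the squared Euclidean distance to the nearest center."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(obj, ctr)) for ctr in centers)
        for obj in data
    )
```

For example, two objects sitting exactly on their centers give F = 0, while the same two objects assigned to a single midpoint center each contribute their squared distance to it.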
Fig. 6 Iranbin Internet bookstore
Shelokar et al. [21] performed several simulations to find the algorithmic parameters that result in the best performance of the ACO, GA, SA, and TS algorithms in terms of the quality of the solution found, the function evaluations, and the processing time required. In this study, we used their algorithmic parameters. In addition, we performed several simulations to find the algorithmic parameters for the SOMHBMK algorithm. The parameters of all algorithms are listed in Table 4. The results indicate that the proposed SOMHBMK optimization approach can be considered a viable and efficient heuristic for finding optimal or near-optimal solutions to clustering problems of allocating N objects to k clusters.
5 Market segmentation: a case of an Internet bookstore

As shown in Section 4, SOMHBMK is the best of the tested methods for clustering analysis. To demonstrate the proposed method further, an additional comparison of the five methods was made using real-world data from an Internet bookstore in Iran for market segmentation based on customer loyalty. Iranbin is the biggest Internet bookstore in Iran, with more than 170,000 Persian and 20,000,000 English books and journals. To tailor its products, services, and marketing messages to its customers, Iranbin needs to segment them. Customer segments have traditionally been based on market research and demographics; there might be a "young and single" segment or a "loyal entrenched" segment.
5.1 RFM model

Direct marketing professionals have been trying to gain such insight ever since the end of the 19th century, when the first catalogue of products that could be ordered by mail appeared in the USA [34]. However, it was only at the beginning of the 1960s that a simple and effective quantitative method to separate customers who are likely to make purchases from those who are not was devised: recency-frequency-monetary value (RFV or, as it is sometimes known, RFM) analysis [35]. In this approach to market segmentation, customers are clustered into an arbitrary number of segments according to their most recent day of purchase, the number of purchases they have made, and the monetary value of their purchases. A random sample taken from the segmented customer database is then subjected to a direct marketing campaign. As a result, some customer segments may reveal themselves to be profitable, while others may do the reverse. Subsequently, the remaining customers in the database who belong to profitable segments are targeted by the same campaign [36].

5.2 Results of evaluation of the five clustering methods

The five methods were used to cluster 700 customers of www.IranBin.com, an Internet bookstore in Iran, based on customer loyalty, as shown in Fig. 6. Based on the RFM model, the loyalty of each customer is determined by three parameters, R (recency of purchase), F (frequency), and M (monetary value), and in this research customers were segmented on these variables. For the SOM network, the number of learning epochs was set to 100 and the training rate to 0.5. The two-dimensional output topology is 10×10.

Fig. 7 Clustering result of SOM

The error function [37] used to evaluate the convergence of the SOM is

E = \sum_{i=1}^{P} \sum_{j=1}^{M} \sqrt{(X_i - W_{ij})^2}    (9)
where P is the number of training patterns, M is the number of output units, and W_ij are the weights of the winner unit for each training pattern. The number of clusters is determined by the SOM network. There are 700 training samples with 20 inputs for training. Fig. 7 displays the training result of the SOM network, indicating that there are clearly five clusters. The first stage of the proposed algorithm clustered the customers into five segments, and the HBMK algorithm was then used to find the final solution. The performance function F (Eq. 3) was also calculated, since it is used to evaluate the five methods: SOMHBMK, ACO, GA, TS, and SA. Table 5 shows that SOMHBMK has the best performance, consistent with the earlier experimental results. SOMHBMK required the fewest function evaluations (69804) and the least processing time (119.22 s), and it found the optimum solution of 18178.736324 in 90% of its ten runs. Cluster 1 consists of customers who have a long relationship with Iranbin but have recently paid little; Iranbin should increase the value of sales to these customers by applying appropriate marketing strategies. Cluster 2 consists of customers who bought recently but do not have a long relationship with Iranbin; in other words, they are new customers whom Iranbin should learn more about, attempt to attract, and sell more to. Cluster 3 consists of customers who have a long relationship with Iranbin, but sales to them have recently decreased; Iranbin should apply an appropriate marketing strategy to retain these customers. Cluster 4 is the most valuable segment: customers in this segment purchase more, the frequency of their purchases is high, and they purchased recently. Customers in
Table 5 Results obtained by the five algorithms for ten different runs on the real-world problem

Method     Fbest          Faverage       Fworst         Function evaluations   CPU time (s)
SOMHBMK    18178.736324   18180.234000   18193.764244   69804                  119.22
ACO        18201.289966   18201.828425   18206.674560   76878                  122.58
GA         18209.330750   18231.881661   18267.101294   58503                  214.53
TS         18449.512506   18637.767038   18789.804808   37948                  148.21
SA         18201.289966   18205.281477   18208.681844   86025                  151.51
this segment have a long relationship with Iranbin, and they are loyal customers. Cluster 5 consists of customers who purchased recently, but the value of their purchases is not high; they are partly loyal customers. Iranbin should apply appropriate marketing strategies to increase the value of sales to the customers in this cluster.
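The convergence error E of Eq. 9, used above to decide when the SOM training has converged, can be computed directly. A minimal sketch, assuming the double sum runs over the components of each training pattern and the weight vector of its winning (best-matching) unit; the function and variable names are ours:

```python
import math

def som_error(patterns, winner_weights):
    """SOM convergence error (Eq. 9): sum over all P training patterns
    and all M components of sqrt((X_i - W_ij)^2), i.e. the summed
    component-wise distance between each pattern and the weights of
    its winning unit."""
    total = 0.0
    for X, W in zip(patterns, winner_weights):  # one winner per pattern
        for x, w in zip(X, W):
            total += math.sqrt((x - w) ** 2)    # = |x - w|
    return total
```

Training is typically stopped once E falls below a threshold or stops decreasing between epochs.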
6 Conclusion

In summary, this paper introduces a two-stage algorithm to solve clustering problems. It first determines the number of clusters via the SOM neural network, and then applies the honeybee mating optimization algorithm to find the final solution. Honeybees are among the most closely studied social insects, and honeybee mating can be considered a typical swarm-based approach to optimization, in which the process of marriage in real honeybees inspires the search algorithm; honeybee behavior has also been used to model agent-based systems. Because SOMHBMK is based on the K-means algorithm, it shares the simplicity and high speed of K-means in clustering large datasets, while overcoming its shortcomings: dependency on the initial state, convergence to local optima, and the inability to find global solutions of large problems with a reasonable amount of computational effort. The algorithm was implemented and tested on several real datasets, as well as on the customer loyalty data of an Internet bookstore in Iran, in comparison with other meta-heuristic algorithms such as ACO, GA, SA, and TS. The results show that the proposed algorithm outperforms the others. Thus, the proposed two-stage method, which first uses the SOM to determine the number of clusters and then employs the honeybee mating optimization algorithm based on K-means to find the final clustering, is a robust clustering method. It can be applied as a new clustering method for market segmentation and other clustering problems.
References
1. Chang S (1998) Internet segmentation: state-of-the-art marketing applications. J Segm Marketing 2(1):19–34
2. O'Connor GC, O'Keefe B (1997) Viewing the web as a marketplace: the case of small companies. Decis Support Syst 21(3):171–183
3. Dillon WR, Kumar A, Borrero MS (1993) Capturing individual differences in paired comparisons: an extended BTL model incorporating descriptor variables. J Mark Res 30:42–51
4. Wedel M, Kamakura WA (1998) Market segmentation: conceptual and methodological foundations. Kluwer Academic, Boston
5. Anil C, Carroll D, Green PE, Rotondo JA (1997) A feature-based approach to market segmentation via overlapping K-centroids clustering. J Mark Res 34(3):370–377
6. Vellido A, Lisboa PJG, Vaughan J (1999) Neural networks in business: a survey of applications (1992–1998). Expert Syst Appl 17(1):51–70
7. Balakrishnan PV, Cooper MC, Jacob VS, Lewis PA (1996) Comparative performance of the FSCL neural net and K-means algorithm for market segmentation. Eur J Oper Res 93:346–357
8. Kuo RJ, Ho LM, Hu CM (2002) Integration of self-organizing feature map and K-means algorithm for market segmentation. Comput Oper Res 29:1475–1493
9. Kuo RJ, Xue KC (1998) A decision support system for sales forecasting through fuzzy neural network with asymmetric fuzzy weights. Decis Support Syst 24(2):105–126
10. Cadden DT (1991) Neural networks and the mathematics of chaos - an investigation of these methodologies as accurate predictors of corporate bankruptcy. In: First international conference on artificial intelligence applications on Wall Street. IEEE Computer Society Press, pp 582–589
11. Lee RCT, Slagle JR, Blum H (1977) A triangulation method for the sequential mapping of points from N-space to two-space. IEEE Trans Comput 26:288–292
12. Pykett CE (1978) Improving the efficiency of Sammon's nonlinear mapping by using clustering archetypes. Electron Lett 14:799–800
13. Forgy EW (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21:768–769
14. Selim SZ, Ismail MA (1984) K-means type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans Pattern Anal Mach Intell 6:81–87
15. Spath H (1989) Clustering analysis algorithms. Ellis Horwood, Chichester, UK
16. Ujjwal M, Sanghamitra B (2000) Genetic algorithm-based clustering technique. Pattern Recogn 33:1455–1465
17. Krishna K, Murty M (1999) Genetic K-means algorithm. IEEE Trans Syst Man Cybern B 29:433–439
18. Selim SZ, Al-Sultan K (1991) A simulated annealing algorithm for the clustering problem. Pattern Recogn 24:1003–1008
19. Sung CS, Jin HW (2000) A tabu-search-based heuristic for clustering. Pattern Recogn 33:849–858
20. Kuo RJ, Wang HS, Hu T-L, Chou SH (2005) Application of ant K-means on clustering analysis. Comput Math Appl 50:1709–1724
21. Shelokar PS, Jayaraman VK, Kulkarni BD (2004) An ant colony approach for clustering. Anal Chim Acta 509:187–195
22. Perez U, Hirsbrunner B (2000) Learning and foraging in robot-bees. In: SAB Proceedings Supplement Book, International Society for Adaptive Behavior, Honolulu, Hawaii, pp 185–194
23. Afshar A (2001) A monogenous MBO approach to satisfiability. In: Proceedings of the International Conference on Computational Intelligence for Modelling, Control and Automation (CIMCA), Las Vegas, NV, USA
24. Afshar A (2001) Marriage in honey-bee optimization (MBO): a haplometrosis polygynous swarming approach. In: The Congress on Evolutionary Computation, Seoul, Korea, pp 207–214
25. Bozorg Haddad O, Afshar A (2004) MBO (marriage bees optimization), a new heuristic approach in hydro systems design and operation. In: Proceedings of the 1st International Conference on Managing Rivers in the 21st Century: Issues and Challenges, Penang, Malaysia, pp 499–504
26. Bozorg Haddad O, Afshar A, Marino MA (2005) Honeybees mating optimization algorithm (HBMO): a new heuristic approach for engineering optimization. In: Proceedings of the First International Conference on Modeling, Simulation and Applied Optimization (ICMSAO/05), Sharjah, UAE
27. Afshar A, Bozorg Haddad O, Marino MA, Adams BJ (2006) Honeybee mating optimization (HBMO) algorithm for optimal reservoir operation. J Franklin Inst, article in press
28. Garey MR, Johnson DS, Witsenhausen HS (1982) The complexity of the generalized Lloyd-Max problem. IEEE Trans Inform Theory 28:255–256
29. Gungor Z, Unler A (2006) K-harmonic means data clustering with simulated annealing heuristic. Appl Math Comput, article in press
30. Kohonen T (1982) A simple paradigm for the self-organized formation of structured feature maps. In: Amari S, Arbib MA (eds) Competition and cooperation in neural nets. Lecture notes in biomathematics. Springer, Berlin
31. Page RE (1980) The evolution of multiple mating behaviors by honey-bee queens (Apis mellifera L.). J Genet 96:263–273
32. Blake CL, Merz CJ. UCI repository of machine learning databases. Available from: <http://www.ics.uci.edu/~mlearn/MLRepository.html>
33. Coomans D, Jonckheer M, Massart DL, Broechaert I, Blockx P (1978) Anal Chim Acta 103:409–415
34. Raphael M (2002) Where did you come from? Direct Mark 62:36–38
35. Cullinan GJ (1977) Picking them by their batting averages: recency-frequency-monetary method of controlling circulation. Direct Mail/Marketing Association, New York, NY
36. Armando LF, Eber AS, Priscila L, Fernando SPM (2006) Optimized RFV analysis. Mark Intell Plann 24:106–118
37. Kohonen T (1991) Self-organizing maps: optimization approaches. In: Kohonen T, Makisara K, Simula O, Kangas J (eds) Artificial neural networks. Elsevier, Amsterdam, The Netherlands, pp 981–990
38. Chang P-C, Liu CH, Wang YW (2006) A hybrid model by clustering and evolving fuzzy rules for sales forecasting in printed circuit board industry. Decis Support Syst 42(3):1254–1269
39. Chang P-C, Hsieh JC, Warren Liao T (2005) Evolving fuzzy rules for due-date assignment problem in semiconductor manufacturing factory. J Intell Manuf 16(5):549–557