Distributed Learning Approach for Channel Selection in Cognitive Radio Networks
Chowdhury Sayeed Hyder and Li Xiao
Department of Computer Science & Engineering Michigan State University East Lansing, Michigan 48823 Email: {hydercho, lxiao}@cse.msu.edu I. I NTRODUCTION Cognitive Radio Network (CRN) is an emerging networking concept that has been introduced to solve the spectrum scarcity problem. There are typically two kinds of users in CRN primary users (PU) and secondary users (SU). SUs can opportunistically use licensed channels without interfering with the ongoing transmission of PUs. The successful transmission depends mostly on the selection of idle channels by SUs. Hence, channel selection under the aforementioned constraints is a signi?cant research problem. Accordingly, many researchers have developed channel selection algorithms based on different variations of PU channel usage models (e.g. Markov model) or adopted learning algorithms. However, in real scenarios, SUs do not know the activity of PUs in advance and channel usage by PUs may not exactly follow any mathematical model. On the other hand, existing learning algorithms (e.g. [1],[2]) either need a central base station or exchange of message passing between neighbors for cooperation and collision avoidance. Due to the unique characteristics of CRN, the centralized solution may not always be feasible and message passing among nodes will introduce excessive overhead. The learning algorithm proposed in [3] overcomes these problems. In this approach, channels are ranked according to g-statistics. Although this learning algorithm achieves logarithmic regret, the sum regret under this approach is still high and it does not scale well with the increase of users. As a result, performance degrades when the ratio between the number of users and channels exceeds a certain threshold. The aggravated delay and packet loss introduced from channel switching may degrade the performance of the overall network. In the case of CRN, the effect can be more intense. Since every learning algorithm involves exploration and exploitation, ignoring channel switching cost may radically impact the performance. Furthermore, the work in [3] did not consider dynamic channel idle rates. In real life, primary users’ (e.g. cell phone users) activity varies with time, so channels’ idle rates also vary accordingly. Our algorithm adapts to such realistic changing environment and achieves same logarithmic regret. In this paper, we address the channel selection problem with switching cost and propose a distributed learning approach that
978-1-4577-0103-0/11/$26.00 c 2011 IEEE
minimizes the sum regret while ensuring quick convergence to an optimal solution and logarithmic regret. Our algorithm is adaptive in the sense that it adapts to the changing idle status of channels and achieves logarithmic regret even in a dynamic environment. The experimental result shows that our algorithm outperforms the algorithm in [3] in terms of regret, scalability and channel switching cost. II. P ROBLEM F ORMULATION We consider a slotted network where every SU has only one radio and cannot sense and transmit at a time. So, each SU goes through a cycle of sensing channel and transmitting data if it is free. Consider a network where U ≥ 1 be the number of secondary users and C ≥ U be the available channels. We assume that each channel i has a ?xed probability ?i of being idle (or, not being occupied by any PU). Every secondary user can sense only one channel in each transmission slot and transmits if the channel is idle. A successful transmission is considered as reward. Hence, SU receives reward if the channel is idle and no other SU transmits over the same channel. We consider regret bound over the time averaged throughput or utilization rate to evaluate the performance of our algorithm because even a sub-linear regret (with respect to time) implies optimal average throughput [3]. In respect to our work, we de?ne regret as the expected loss in achieving reward due to not selecting one of the top available channels compared to knowing the channel statistics perfectly. Ideally if the users know about the channel availability, they will select only the top channels. In this case, the expected reward, S (ρ, t) will be:
U
S (ρ, t) = t ?
i=1
ri ? ?? i
th where ?? ranked (in terms i denotes idle probability of i of availability) channel and ri denotes the data rate of ith channel. For simplicity, we assume data rate of all channels are equal and commonly represent as r. Any learning approach except ρ tries to select a channel with higher availability through exploration. Let, S (? ρ, t) denotes the expected reward (i.e. number of successful data transmissions) by a learning approach ρ ?. It is obvious that S (? ρ, t) ≤ S (ρ, t).
The expected regret R(? ρ, t) can be determined using following equation. R(? ρ, t) = S (ρ, t) ? S (? ρ, t) + Γ(? ρ, t) (1) where S (ρ, t) ? S (? ρ, t) and Γ(? ρ, t) denote the expected exploration regret and the channel switching regret respectively. 1) Calculation of Γ(? ρ, t): Let, γ (t) denotes the total numU ber of switches up to t, γ (t) = j =1 γj (t) where γj (t) denotes the number of switches by user j .
t?1
γj (t) =
i=1
F (aj (i), aj (i + 1))
where, 1, a = b 0 otherwise Accordingly, we de?ne switching cost as the number of packets that could have been transmitted within the time if it did not switch the channel. The switching cost de?ned in [4] is switching delay switching cost = estimated packet transmission time F (a, b) = Assuming packet size d and switching delay τ , we formally de?ne switching cost c as τ ?r c= (2) d 2) Calculation of S (? ρ, t): In a distributed approach, there is a possibility that more than one SU select the same channel. As a result, no user receives any reward. In that case, the number of collision depends on the underlying collision mechanism. Assuming the collision channel model with no avoidance mechanism, the expected reward following a distributed approach ρ ?dist de?ned in [3] is,
C U
is q at the end of a block, the user chooses channel q over p if ? p is one of the top U channels and q is not. But upper con?dence bound of q is greater than the estimated idle rate of p. ? q is one of the top U channels but p is not. Each user selects a channel and updates its idle estimate and upper bound at each time slot throughout the time period. 1) Calculation of Block and Frame Size: We divide transmission slots into frames. Each frame (except frame 0 which consists of ?rst C slots) is then divided into multiple equal length blocks. Block length increases with frame number. Each block length is set equal to the frame number and frame length is set according to the following equation. lf = C ? 2Nf
2
where lf and Nf denotes frame length and frame number. 2) Computation of Upper Con?dence Bound of Channel: We use the interval estimation strategy [5] to compute the upper con?dence bound of channels. The strategy works based on expected reward estimates of channels and con?dence estimates of data. In our case, it is sensing data. It uses the standard statistical technique of constructing con?dence interval estimates. If xi,j (t) be the number of successes arising from a series of Bernoulli trials (πi,j (t)) with probability p, the upper bound of a 100(1 ? α)% con?dence interval for p can be approximated by βi,j (t). βi,j (t) =
xi,j (t) πi,j (t)
+
2 Zα/ 2 2πi,j (t)
1+
2 Zα/ 2 πi,j (t)
S (? ρdist , t) = r ?
i=1 j =1
?i E[Θi,j (t)]
(3)
+
Z √ α/2 π i,j (t)
xi,j (t) πi,j (t)
1?
xi,j (t) πi,j (t)
+
2 Zα/ 2 4?πi,j (t)
where Θi,j (t) denotes the number of times user j has the sole access on channel i up to transmission slot t. The expected regret in distributed approach will be
U C U
1+
2 Zα/ 2 πi,j (t)
(5)
R(? ρdist , t) = r ? t
i=1
?? i ?r?
i=1 j =1
?i E[Θi,j (t)] + γ (t) ? c (4)
III. D ISTRIBUTED L EARNING A LGORITHM We formulate the channel selection problem as a multi-arm bandit problem with multiple play and switching cost. and develop a distributed block learning algorithm (ρDBLA ). A. Design of the Algorithm At the beginning of each interval, the user picks a channel and compares it with the currently selected one. The channel selection process is periodic i.e. each channel is selected for comparison after C blocks. Then it continues to use that channel until collision occurs. If a collision occurs, it randomly selects a channel among the top U channels based on its current idle estimates of channels. Considering a user whose present selected channel is p and the channel to be compared 2
Here, the parameter Zα/2 denotes the value that will be exceeded by the standard normal variable with probability α/2. It controls the size of the con?dence intervals. We follow the evaluation function stated in Equation 5 to compute the upper bound of each channel. In our algorithm, we set α = 2.5% and the corresponding Zα/2 value is 1.96. In respect to our algorithm, xi,j (t) represents the number of times user j sees channel i idle up to transmission slot t. Initially each of the channels will have an upper bound of 1. As more channels are explored, more the bounds will be adjusted close to its exact value. B. Non-stationary Environments We extend our algorithm so that it can adapt in nonstationary environments. If the idle rate of the current channel in the current frame is not consistent with its overall idle rate, we increase the upper bound of every alternative channels to a certain value. In this way, if the selected channel status changes abruptly, a user will experience less availability of the current channel. This will lead to increase the upper bound
Fig. 1.
Normalized Regret vs Time Slot (without channel switching cost)
(a)
(b)
Fig. 3. (a) Regret and (b) Switching per user with varying number of Users
Fig. 2.
Regret vs switch cost after 5000 time slot
Fig. 4.
Regret vs Time in Dynamic Environment
of alternative channels and thus probability of selecting a channel from these alternatives will increase. As soon as a user switches to another channel, it again starts acting accordingly. IV. S IMULATION R ESULTS In this section, we present simulation results to evaluate the performance of our learning approach. We compare performance of our algorithm ρDBLA with the learning approach ρRAN D proposed in [3]. We set up the identical simulation environment. To get the average result, we run the simulation for 10 times under the same conditions and each time our algorithm runs for 50, 000 time slots. We also implement our centralized approach ρCBLA . This will provide a lower bound of regret achievable by SUs. Performance Comparison with ρRAN D : We present the comparison results of our algorithm (denoted as ρDBLA ) and the algorithm (denoted as ρRAN D ) in [3]. The simulation result in Figure 1 shows the normalized regret for both the approaches with respect to number of transmission slots without considering switching cost. Clearly, our algorithm ensures faster convergence with less regret compared to [3]. We also plot our centralized algorithm (ρCBLA ) to show the lower bound of regret. We then compare the performance of both the algorithms with varying switching cost (see Figure 2). As expected, regret increases almost linearly with switching cost in ρRAN D . On the other hand, regret increases slightly with switching cost in our algorithm ρDBLA . Performance with Varying Number of Users: We compare the performance of ρRAN D and our ρDBLA in terms of regret and number of channel switching. The results are presented in Figure 3(a) and Figure 3(b) respectively. In both ?gures (regret and channel switching), the rate of increase for ρRAN D is much higher than ρDBLA . In former algorithm, collision between users increase with the increase of number of users because in every slot, it tries to select the top channel and it 3
may lead to more channel switching and more collision with other users. On the other hand, our algorithm performs better because it carefully decides about channel switching after each block. Performance in Non-stationary Environments: To simulate a non-stationary environment, we assume that there are 4 secondary users and 9 channels in the network. Initially, channels’ idle rates follow a Bernoulli distribution with evenly spaced probability from 0.1 to 0.9. We run the simulation for 50, 000 slots. After each 10, 000 slots, we randomly assign each channel a different probability from 0.1 to 0.9. Figure 4 shows that our algorithm achieves logarithmic regret even in the dynamically changing channel states. V. C ONCLUSION & F UTURE W ORK In this paper, we propose an ef?cient distributed learning algorithm that minimizes the sum regret and channel switching cost, and ensures logarithmic regret. Our algorithm also adapts quickly in a dynamic environment and converges to an optimal solution. We compare our algorithm with existing approach and our algorithm achieves signi?cantly lower regret and switching cost than existing ones. We will extend our work to support users’ different QoS requirements and channel bandwidth. R EFERENCES
[1] S.H.A. Ahmad and Mingyan Liu. Multi-channel opportunistic access: A case of restless bandits with multiple plays. In Communication, Control, and Computing, 2009. Allerton 2009. 47th Annual Allerton Conference on, pages 1361 –1368, sept. 2009. [2] Yi Gai, B. Krishnamachari, and R. Jain. Learning multiuser channel allocations in cognitive radio networks: A combinatorial multi-armed bandit formulation. In New Frontiers in Dynamic Spectrum, 2010 IEEE Symposium on, pages 1 –9, 6-9 2010. [3] A. Anandkumar, N. Michael, and A.K. Tang. Opportunistic Spectrum Access with Multiple Users: Learning Under Competition. In Proc. of IEEE INFOCOM, San Deigo, USA, March 2010. [4] Yang Xiao and Fei Hu. Cognitive Radio Networks. CRC Press, 2008. [5] L.P.Kaebling. In Learning in Embedded Systems. MIT Press, 1993.