Network Controlled Joint Radio Resource Management for Heterogeneous Networks
Marceau Coupechoux
ENST & CNRS LTCI 46, rue Barrault, Paris, France coupecho@enst.fr
Jean-Marc Kelif
France Telecom R&D Issy-Les-Moulineaux, France jeanmarc.kelif@orange-ftgroup.com
Philippe Godlewski
ENST & CNRS LTCI 46, rue Barrault, Paris, France godlewsk@enst.fr
Abstract— In this paper, we propose a way of achieving optimality in radio resource management (RRM) for heterogeneous networks. We consider a micro or femto cell with two co-localized radio access technologies (RAT), e.g. WLAN and HSDPA. RAT are mainly characterized by the data rates they offer at a given distance of the access point. Dual-technology mobile stations (MS) are initiating downlink sessions in the considered cell. A network controlled joint RRM algorithm is responsible to assign MS to a RAT, while taking into account the joint spatial distribution of already accepted MS, the current load of each RAT, the location of the newly accepted session and its in?uence on the global performance. In a study based on the Semi Markov Decision Process (SMDP) theory, we show how to obtain an optimal policy. Optimality is here de?ned through a utility function accounting for user satisfaction.
I. I NTRODUCTION As radio access technologies (RAT) diversify and mobile stations (MS) become more and more agile, network operators are faced to the problem of MS assignment to RAT. With the knowledge of RAT performance (data rates and coverage) and loads, of MS capabilities and demand (associated to QoS parameters), the question arises to know which RAT should be chosen to deliver the requested service and who should take the decision. In this paper, we address this issue by focusing on small cells (typically femto to micro) where two RAT are colocalized, i.e., there is a common access point and geographical cells are overlapping. We concentrate on algorithms fully controlled by the network that take into account not only the current load of each RAT but also the spatial distribution of MS in the cell. This approach has to be opposed to user-centric schemes (MS takes alone the decision based on measurements) or hybrid schemes (MS decision is assisted by the network) [2]. We also assume a dynamic scenario where MS are assigned to RAT on a session basis. The notion of joint radio resource management (JRRM) is clearly de?ned in [4] as a way of achieving an ef?cient usage of a joint pool of resources belonging to several RAT. Authors of [4] distinguish three main functions for JRRM: RAT and cell selection (which includes joint load control and vertical hand-over), bit rate allocation and admission control (also known as joint session admission control). Reference [5] adds the joint resource scheduling to this list. In this paper, we focus on RAT selection and session admission control.
On this speci?c subject, there are some papers dealing with the issue of making a decision based on many available parameters. For example, [6] applies Analytic Hierarchy Process and Gray Relational Processes to rank network alternatives. References [3] and [4] propose a framework for JRRM based on fuzzy neural methodology, reinforcement learning and multiple objective decision making. Different implementations are compared through simulations. Authors of [7] elaborate an optimal solution and an associated heuristic algorithm for the problem of RAT association in a static scenario using the framework of combinatorial optimization. In this paper, we extend a different approach already proposed in [1] and based on Semi Markov Decision Process (SMDP) [16]. Compared to [1], this paper takes into account the spatial distribution of users in the cell, uses the policy iteration algorithm, accounts for blocking in the utility function and compares the optimal policy to a common sense policy. Several papers have used the framework of SMDP in the context of cellular networks, e.g. for the admission of multimedia calls [9][8], to account for hand-over calls [10] or for multi-class calls in EGPRS networks [11]. However, this framework has not been used for heterogeneous networks before [1]. We ?rst present the network model in the next section.Then, we describe the SMDP and the policy iteration algorithm in section III. Section IV gives an example of numerical application and shows how the optimal policy clearly outperforms a common sense policy. At last, section V concludes the paper. II. N ETWORK A. Radio Access Technologies We consider a cell with two co-localized RATs with different characteristics, e.g.: ? RAT1 = WLAN and ? RAT2 = HSDPA. The geographic cell is divided into r rings. The ring the closest to the base-station is ring number 1 and the farthest one is ring number r. RAT1 and RAT2 are characterized by : ? the nominal data rates characterizing the available band?1 ?1 ?2 ?2 width in each ring = (D1 , ..., Dr , D1 , ..., Dr ). Nominal ?j data rate Di is the available throughput above the physical layer in ring i of RAT j.
MODEL
2 the maximum number of users = n1 max and nmax that are ?xed in order to ensure a minimal throughput to mobile users in the cell. We assume that access points and base stations are colocalized and coverage areas are identical for the two RATs (see ?gure 1). This is realistic in case of hot spots micro, pico or femto cells. This is not realistic in other cases, the generic discussion below can however be adapted to non-overlapping cells. ?
Fig. 2. Illustration of the alternate transmissions of two MS in a WLAN ?1 ?1 cell. As D2 < D1 , the time spent to send a packet of length L is greater for user 2 than for user 1.
sake of simplicity although it does not take into account the time wasted by the backoff algorithm. For HSDPA, we assume a fair scheduling in time, i.e., the scheduling algorithm allocates one TTI (Transmission Time Interval) alternatively to each active session. In a given TTI, the session bene?ts from its nominal data rate. For a r-tuple (n2 , ..., n2 ) representing the number of users in each ring, the r 1 service rates are given, for (n2 , ..., n2 ) = (0, ..., 0), by: 1 r
Fig. 1. Illustration of a RAT cell with several rings around the access point or base-station.
?i
?2 i
=
XON
r k=1 ?2 Di
n2 k
?1
.
B. Traf?c We assume Poisson arrivals of user downlink sessions with rate λ. The arrivals in each ring are supposed to be equiprobable. Traf?c is supposed to be elastic: the session size is exponentially distributed with mean XON bits and so the service rate depends on the available throughput. MS are supposed to be able to access both RATs indifferently. Considering single-RAT MS (like in [1]) leads to higher computational load but can again be integrated in the presented framework. C. Scheduling and Data Rates The scheduling algorithms allocate in each RAT resources from the total available bandwidth to individual MS present in the RAT. In this paper, we consider two simple algorithms for WLAN and HSDPA. For WLAN, we approximate the CSMA/CA algorithm by a fair scheduling in throughput: all users have the same throughput whatever their locations in the cell. Note that the common throughput depends on the distribution of MS in the cell. This effect is known as the near-far effect in the literature on WLAN networks, see e.g. [12]. For a r-tuple (n1 , ..., n1 ) representing the number of users in each ring, the 1 r service rates do not depend on the location and are given, for (n1 , ..., n1 ) = (0, ..., 0), by: r 1 ?i ?1 i = n1 k XON ?1 Dk k=1
r1 ?1
Figure 3 shows an example of alternate transmissions between two users in the HSDPA cell. If there is a single user scheduled by TTI, the throughput experienced by the ?rst user ?2 ?2 is D1 /2 and by user 2 is D2 /2. The allocation is fair in time.
Fig. 3. Illustration of the alternate transmissions of two MS in a HSDPA cell. Users equally share the resources in time.
D. Joint Radio Resource Management A joint radio resource management policy is an algorithm that decides for each new session arrival whether the session is rejected, accepted in RAT1 or accepted in RAT2. The goal of this study is to ?nd an optimal JRRM policy; optimality is de?ned below. III. S EMI -M ARKOV D ECISION P ROCESS In order to achieve this goal, we rely on the SMDP framework. We ?rst de?ne the SMDP and the reward function, then use uniformization to obtain an MDP and use the policy iteration to ?nd the optimal JRRM policy. A. States of the SMDP The states of (n1 , ..., n1 , n2 , ..., n2 ) 1 r 1 r each RAT: r 1 1 ? i=1 ni ≤ nmax r 2 2 ? j=1 nj ≤ nmax Let S be the state set. the SMDP are the 2r-tuple with the constraints associated to for RAT1 and for RAT2.
.
As an illustrative example, ?gure 2 shows the scenario, where two MS are sharing the resource in a WLAN cell. If packets have all the same length L, the throughput experi?1 ?1 enced by each user is approximately (1/D1 + 1/D2 )?1 . The allocation is fair in throughput. This model is assumed for the
B. Actions There are three possible actions for the JRRM policy : reject, accept in RAT1 or accept in RAT2. These actions can be coded in binary a = (a1 , a2 ). ? reject : a = 00, ? accept in RAT2 : a = 01, ? accept in RAT1 : a = 10. The set of possible actions is state dependent. Let A(s) the action set in state s. For a generic state s, A(s) = {0, 1, 2}. This set can however be reduced in some speci?c cases: r 1 1 ? if RAT1 is blocked, i.e., i=1 ni = nmax and RAT2 is r 2 2 not blocked, i.e., j=1 nj ≤ nmax , A(s) = {0, 1}, ? if RAT2 is blocked and RAT1 is not blocked, A(s) = {0, 2}, ? if RAT1 and RAT2 are blocked, A(s) = {0}, ? if s = (0, ..., 0), A(0) = {1, 2}. C. Transition probabilities Let ps,s′ (a) be the probability that at next decision epoch, system will be in state s′ if a is chosen in state s and let τs (a) be the expected time until next decision epoch if action a is j chosen in state s. Let δi = 1 if there is an active session in j ring i of RAT j, and δi = 0 otherwise. In a given state, s = (n1 , ..., n1 , n2 , ..., n2 ), and for a given r r 1 1 decision a, several transitions are possible according to the values of the nj : i j j ? if ni = 0, i.e., δi = 1, a departure is possible in this ring j with rate ?i , ? if RAT j is not blocked, an arrival implies a state transition if aj = 1 and so an arrival occurs with rate aj λ/r (arrivals in rings are assumed to be equiprobable). So the transition probabilities are: ps,s′ (a) where s
′
comfort throughput [13]. In our case, we consider a user in ring i of RAT j: his departure rate is ?j /nj , assuming that i i the resource share is fair within a ring. So the satisfaction of an accepted user is: φ(?j , nj ) = 1 ? exp(??j /(nj ?c )), i i i i where ?c = Dc /XON . Note that inndividual user satisfaction is between 0 and 1. We de?ne the global reward as the sum of all user satisfactions. If a user is in transfer, his satisfaction is a function of his data rate. If a user is blocked, a penalty is paid (and so subtracted from the accumulated reward so far) to take into account the fact that rejected users are dissatis?ed. As a consequence, the sum of all user satisfactions (for accepted users) in state s is: U (s) =
{i,j}
nj (1 ? exp(??j /(nj ?c ))). i i i
Note that this satisfaction is without dimension. To take into account the dissatisfaction incurred by blocking, we introduce the penalty Kb λ proportional to the arrival rate in blocking states. Note that Kb λ is a satisfaction and so without dimension. If action a is chosen, the penalty in state s is thus Kb (1 ? a1 )(1 ? a2 )λ. Note that if the JRRM algorithm accepts sessions in RAT j, aj = 1 and the penalty is zero. Penalty is non-zero only if the JRRM rejects all incoming sessions in a given state. As a conclusion, the global reward obtained in state s if action a is taken is: cs (a) = U (s) ? Kb (1 ? a1 )(1 ? a2 )λ. We are considering an in?nite planning horizon and the goal of the study is to ?nd a JRRM algorithm that maximizes the long-run average cost per time unit. E. Policy A JRRM policy is a n-tuple specifying for each state of the MDP the action to be selected in that state. We consider here stationary and deterministic policy, i.e., the policy does not change in time and in a given state, the policy speci?es a single action. Note that for the average cost Markov decision model with ?nite state space and ?nite action sets, there exists an optimal policy and the optimal policy is stationary and deterministic. Such a policy is an application from S to A, which associates at each state s an action in A(s): ?s ∈ S, Rs ∈ A(s). It is useful to notice that the SMDP with transition probabilities ps,s′ (Rs ) is a traditional continuous time Markov chain. F. Uniformization In order to ?nd the optimal policy with the algorithm ”policy iteration”, a stage of uniformization is needed. This stage is a transformation of the continuous Markov chain into a discrete Markov chain. This is done by choosing a suf?ciently small transition step 0 ≤ τ ≤ mins,a τs (a) and allowing selftransitions from a state to itself.
=
i,j
j δi ?j i j aj λ/r + δi ?j i
,
=
(n1 , ..., nj 1 i
? 1, ..., n2 ), r
in case of departure in ring i of RAT j. ps,s′ (a) where s′ =
i,j
aj λ/r
j aj λ/r + δi ?j i
,
=
(n1 , ..., nj + 1, ..., n2 ), 1 r i
in case of arrival in ring i of RAT j. The expected time until next decision epoch is thus given by: j τs (a) = 1/ aj λ/r + δi ?j . i
i,j
D. Rewards Let cs (a) be the expected reward incurred until next decision epoch if a is chosen in s. cs (a) is a reward, so a priori without dimension in this paper. Let us de?ne the user satisfaction for a user having data rate D: φ(D) = 1 ? exp(?D/Dc ), where Dc is a so called
Transition probabilities are modi?ed in the following way: ′ ? ps,s′ (a) = ps,s′ (a)τ /τs (a) for s = s , ? ps,s′ (a) otherwise. ? ? ps,s (a) = 1 ? ? s′ =s So the new transition probabilities can be written:
j ps,s′ (a) = δi ?j τ, ? i
φ = 0.2, α = 0.7, f1 = 0.1, f2 = 1.2 for two rings. If we assume that the system is able to provide the Shannon capacity: ?2 Di = W log2 (1 + SIN Ri ), where W = 3.84MHz is the signal bandwidth. The numerical ?2 ?2 application provides: D1 = 8.1 Mbps and D2 = 2.6 Mbps. However, we choose more realistic values [15] (for Pedestrian ?2 ?2 A3 channel) that are given by: D1 = 2 Mbps and D2 = 800 Kbps. C. Reference Policy In order to see the improvements brought by the optimal JRRM policy, we de?ne a simple and common sense policy, which is also the initial policy in the policy iteration algorithm. As WLAN is the fastest RAT, we assign the sessions to WLAN until n1 max is reached. We then assign MS to HSDPA is reached. When both RAT have reached their until n2 max maximum number of simultaneous sessions, any new session is rejected. D. Results
in case of departure in ring i of RAT j, and: ps,s′ (a) = aj λτ /r, ? in case of arrival in ring i of RAT j. The reward is modi?ed as follows: cs (a) = U (s)/τs (a) ? Kb (1 ? a1 )(1 ? a2 )λ/τs (a). ? G. Policy Iteration We use the policy iteration algorithm to ?nd out the optimal JRRM policy. The iterative algorithm is now succinctly described. Step 0 (initialization) : We choose an arbitrary stationary policy R. Step 1 (value-determination) : For the current policy R, we solve the system of linear equations whose unknown are the variables {g(R), vs (R)}: v1 = 0 and vs (R) = cs (Rs ) ? g(R) + ?
s′ ∈S
ps,s′ (Rs )vs′ (R). ?
Step 2 (policy improvement) : For each s ∈ S, we ?nd: ? Rs = arg max cs (a) ? g(R) + ?
s′ ∈S
a∈A(s)
ps,s′ (a)vs′ (R) ?
? Step 3 (convergence test) : if R = R, the algorithm is stopped, otherwise, we go to step 1. IV. N UMERICAL A PPLICATION A. Traf?c We assume a web browsing traf?c with the following parameters: varying λ (to see the in?uence of an increasing load) and XON = 3 Mbits (this is the aggregate average size of all objects during session time in [14] for the web browsing model). B. Nominal Data Rates For the sake of simplicity, we only consider two rings (r = 2). For WLAN, two nominal rates are chosen, which are also ?1 two mandatory rates of the IEEE 802.11g standard: D1 = 1 ? 24 Mbps, D2 = 6 Mbps. In HSDPA networks, the SINR in each ring i can be approximated by [1]: SIN Ri = 1?ψ , αψ + fi
Fig. 4. Global user satisfaction per time unit as a function of the arrival rate 2 for n1 max = nmax ∈ {3, 6, 9} with optimal policy (solid) and initial policy (dotted), Kb = 0.
where ψ is the fraction of power dedicated to common channels, α is the orthogonality factor and fi is the interference factor (the ratio between the total power received by all other base stations and the total received power from the own station). Let us assume the following values for the parameters:
Figure 4 shows the global user satisfaction per unit of time as a function of the arrival rate for various maximum numbers of simultaneous sessions. The optimal policy is the one that maximizes this performance criteria. Thus, as expected, the optimal policy outperforms the initial (or common sense) policy. However, the most important thing to notice is that the difference between policies is increasing with n = n1 max = n2 . As a consequence, the higher the number of possible max simultaneous sessions, the higher the gain of an optimal policy. Figure 5 shows how it is possible to reduce the blocking probability by tuning the penalty parameter Kb . Increasing Kb from 0 to 20 signi?cantly reduces the blocking probability. It is shown on ?gure 6 that this reduction in blocking is obtained at the cost of a reduced global satisfaction. When λ is increasing, user satisfaction curve reaches a maximum and decreases towards negative values. This is explained by
ACKNOWLEDGMENT This work is part of the french program SYSTEMATIC/URC funded by Paris Region. Authors would like to thanks Cesar Cardenas (ENST) for his useful advices on SMDP. R EFERENCES
[1] D. Kumar, E. Altman and J.M. Kelif, User-Network Association in a WLAN-UMTS Hybrid Cell: Global and Individual Optimality, Rapport de Recherche N5961, INRIA, Aug. 2006. [2] N. Feng et al., Pricing and Power Control for Joint Network-Centric and User-Centric Radio Resource Management, IEEE Transactions on Communications, Sept 2004. [3] L. Giupponi, R. Agusti, J. Perez-Romero, and O. Sallent, A Framework for JRRM with Resource Reservation and Multiservice Provisioning in Heterogeneous Networks, Mobile Networks and Applications, Springer, vol. 11, 2006. [4] L. Giupponi, R. Agusti, J. Perez-Romero, and O. Sallent, A Novel Approach for Joint Radio Resource Management Based on Fuzzy Neural Methodology, to appear in IEEE Trans. on Vehicular Technology, 2007. [5] J. Luo, R. Mukerjee, M. Dillinger, E. Mohyeldin, and E. Schulz, Investigation of Radio Resource Scheduling in WLANs Coupled with 3G Cellular Network, IEEE Communications Magazine, June 2003. [6] Q. Song and A. Jamalipour, Network Selection in an Integrated WLAN and UMTS Environment Using Mathematical Modeling and Computing Techniques, IEEE Wireless Communications, June 2005. [7] K. Premkumar and A. Kumar, Optimum Association of Mobile Wireless Devices with a WLAN-3G Access Network, Proc. of IEEE ICC, 2006. [8] J. Choi, T. Kwon, Y. Choi, and M. Naghshineh, Call Admission Control for Multipmedia Services in Mobile Cellular Networks: a Markov Decision Approach, Proc. IEEE ISCC 2000. [9] N. Bartolini and I. Chlamtac, Call Admission Control in Wireless Multimedia Networks, Proc. of IEEE PIMRC, 2002. [10] V. Pla et al., Optimal Admission Control using Hand-over Prediction in Mobile Cellular Networks, Proc. of HET-NET, 2004. [11] T. Diab, L. Decreusefond, and Ph. Martins, Performance of Admission Control Strategies for Dual Transfer Mode Service in EGPRS Networks, Proc. of IEEE PIMRC, 2006. [12] M. Coupechoux, J. Brouet, L. Brignol et V. Kumar, Suggested Solutions for the Near-Far Effect in Multimode WLANs, Proc. of WWRF, 2003. [13] N. Enderl? and X. Lagrange, User Satisfaction Models and Scheduling e Algorithms for Packet-Switched Services in UMTS, Proc. of VTC, 2003. [14] WiMAX Forum, WiMAX Forum Mobile System Evaluation Methodology, Jan. 2007. [15] H. Holma and A. Toskala, WCDMA for UMTS, third edition, Wiley, 2004. [16] H. C. Tijms, A First Course in Stochastic Models, Wiley, 2003.
Fig. 5. Blocking probability as a function of the arrival rate for Kb ∈ 2 {0, 1, 2, 5, 10, 20}, n1 max = nmax = 6 with the optimal policy ; blocking probability with initial policy and Kb = 20.
Fig. 6. Global user satisfaction as a function of the arrival rate for 2 Kb ∈ {0, 1, 2, 5, 10, 15, 20}, n1 max = nmax = 6 with the optimal policy ; blocking probability with initial policy and Kb = 0.
the fact that the penalty caused by blocking is subtracted from the reward and is proportional to the arrival rate. V. C ONCLUSION In this paper, we have used the Semi Markov Decision Process framework to derive an optimal JRRM policy for an heterogeneous cell shared by two RAT. We have taken into account the spatial distribution of the stations in the cell. The throughput experienced by a user is indeed dependent on its distance to the access point or base station and is also dependent on the number of stations in each ring of the RAT. In this paper, the proposed criteria of optimality includes both user satisfaction, which is a function of the throughput and a penalty to account for the dissatisfaction caused by blocking. Some numerical applications show that the optimal policy is not an obvious one and can clearly outperform an a priori common sense algorithm.