03964.com

文档资料库 文档搜索专家

文档资料库 文档搜索专家

A Framework for the Analysis of Non-Deterministic Clock Synchronisation Algorithms

Pedro Fonseca1;2 ? and Zoubir Mammeri1;3

CRIN/ENSEM, 2 av. de la For t d'Haye F-54516 VANDOEUVRE-LES-NANCY, France 2 Universidade de Aveiro, Portugal 3 ENSAM, Chalons en Champagne, France e-mail: pf@ua.pt,mammeri@loria.fr

1

Abstract. In recent years, non-deterministic clock synchronisation al-

gorithms (NDCSA) have appeared as an attractive alternative to deterministic ones. NDCSA o er a precision that can be made as small as desired. The price to pay is a probability of success that is less than one. We propose an uniform analysis of NDCSA. Our approach strives at decomposing and identifying the factors that a ect the performance of these algorithms. Our aim is to determine simple local conditions that guarantee that the desired precision will be attained with the desired probability. The results are then used to estimate the communication burden imposed by the synchronisation algorithm and provide us with some guidelines to compare the di erent NDCSA.

1 Introduction

In recent years, non-deterministic clock synchronisation algorithms (NDCSA) have appeared as an attractive alternative to deterministic ones. Interest on NDCSA increased after the paper by Lundelius and Lynch 7] in 1984. There, they prove that, in a system composed of n sites, where the message delay ? is bounded by ?max and ?min such that ?min ? ?max , no deterministic clock synchronisation algorithm can attain a precision better than (1 ? 1=n), where = ?max ? ?min . This result expresses two characteristics of deterministic clock synchronisation algorithms. First, the attainable precision has a non-null lower bound. Second, there must be an upper bound for the message delivery delay (?max must be nite). NDCSA strive at circumvent this limitation, by o ering a precision that can be made as small as desired. The price to pay is a probability of success that is less than one. In general, this probability of success can be made as close to one

? Pedro Fonseca was partially supported by Portuguese Government grant

PRAXIS XXI BD-3729-94

as desired by sending a su cient number of messages (provided there is enough time and bandwidth to do so). When trying to compare NDCSA, we are often faced with the problem of identifying the causes of di erent performances. Many NDCSA that have been published are strongly connected to a given architecture or system model. If a new algorithm performs better than its predecessors, is this due to its novel characteristics? Or is the system where the algorithm executes specially suited for it? Would another algorithm perform as well in the same system? If there is no common ground for comparison, the list of similar questions can be endless. We need then some means to establish the comparison between the di erent NDCSA. We propose an uniform analysis of NDCSA. Our approach strives at decomposing and identifying the factors that a ect the performance of these algorithms. Our method is similar to the one presented in 10]; we have formalised the approach and extended it to other algorithms. We study the mechanisms how one site constructs a local view of the system in order to resynchronise its clock, either by reading the other site's clocks, or by detecting an event to restart its local clock. In other words, how it associates a clock value (local or remote) to an instant in time. We do not address the issue of the usage that can be made of such information, in order to compute corrections that yield optimal precision and accuracy or some degree of fault-tolerance. Another aspect of our work is that we assume that the delays are described by random variables, for which we know some of the statistical properties (mean, variance, distribution, . . . ). In these aspects, our work is complementary to 2]. Section 2 speci es the envisaged system and de nes some terms; the basis for the analysis is presented in Sect. 3. Sections 4 and 5 present the application of our method to published non-deterministic clock synchronisation algorithms and in Sect. 6 the results are applied to a comparative analysis of these algorithms.

2 The Target System

The system to be considered consists of n sites, that communicate by means of messages. There is no shared or common memory: all communication is done through the messages. No condition is imposed on message delay other than it is non-null. Each site p has a physical clock, Cp (t). This physical clock gives an approximation of physical time t, such that, given two instants t1 and t2 (t1 < t2 ), (1 ? )(t2 ? t1 ) Cp (t2 ) ? Cp (t1) (1 + )(t2 ? t1). is the drift rate of the physical clock.The physical clock is a read-only device, and the site has no means of changing its value. The aim of a clock synchronisation algorithm is to provide every site p with a synchronised clock, SCp (t). Synchronised clocks are local approximations of global time and their value is computed based on the local physical clock and on the information received from other sites. For the moment, we consider only algorithms that work by rounds; R is the duration of one round.

The closeness of synchronised clock values in two di erent sites p and q is measured by the precision. There are commonly two de nitions for the precision of clock synchronisation. In the rst, the precision is the upper bound on the di erence between the values displayed by two di erent clocks at the same instant t, and we represent it by ^. De nition1. ^ is the precision of clock synchronisation if and only if 8p;q=1::n ; 8t; jCSp (t) ? CSq (t)j ^ In another alternative de nition, the precision is the upper bound on the interval between any two (real-time) instants two di erent synchronised clocks display the same value. De nition2. is the precision of clock synchronisation if and only if 8p;q=1::n ; 8ta;tb ; SCp (ta ) = SCq (tb ) ) jta ? tbj We adopt de nition 1 as the de nition of precision.

3 Guaranteeing Precision

NDCSA o er the advantage of a better precision than deterministic clock synchronisation algorithms, but this has a price: there is a non-null probability that the system will fail to synchronise. This probability of failure can be made as small as desired by sending a su cient number of messages. Guaranteeing a small probability of failure with a tight synchronisation (a small value of ^) may require a large number of messages. In order to use NDCSA, we need to estimate the number of messages required to synchronise the system to a precision ^des , with a probability of success Pdes . Our aim is to determine the conditions that guarantee that P ^ ^des ] Pdes . The approach will consist on decomposing the algorithm execution in smaller tasks, and deriving the conditions that are imposed on each of these smaller tasks, in order to guarantee the speci ed precision and probability of success. We decompose the overall execution of the algorithm in three levels: system, site and step level. In the system, there are u (1 u n) sites that run the synchronisation algorithm. In each of these u sites, the algorithm is performed in v steps (v 1) during each synchronisation round. In this section, we will derive the conditions that apply to the step-level that guarantee the desired conditions at the system level.

3.1 Site-Level Guarantee

Each site executes the NDCSA to read the clocks of other sites or to detect a resynchronisation event (the moment for starting the clock with a new value), following the paradigm presented by 11].

Assumption3. Each site executes the NDCSA to perform one of the following: i) read the clocks of other sites or ii) detect a resynchronisation event (the moment for starting the clock with a new value). It will not perform both.

(p = 1::u) are identically distributed independent random variables. They are represented by a random variable site .

p

Assumption4. All site errors

Assumption 3 is valid for all NDCSA that have been published so far. If one algorithm executes both, the error will be the sum of the errors in each of the two actions: read the remote clock and detect an event. To guarantee a precision ^des at any time t, the precision right after the execution of the synchronisation algorithm, ^ , must be less than ^des ? 2 R, in order to guarantee a precision ^des until the following execution, R seconds later. If we allow each site p to make an error site on the value it computes, the p di erence between two clock values will not be larger than 2 site . Therefore, the p maximum error tolerated at site level is site tol = ^ =2 = ^des =2 ? R. (1) Maxf siteg site tol ) ^ ^des p P Maxf site g site tol ] P ^ ^des ] (2) p

site ,

Given Assumpt. 4, we have ? (3) P site site tol ] u P ^ ^des ] In real life, errors will not always be independent and identically distributed (i.i.d.) random variables. When errors are not i.i.d., this can be taken as a worstcase situation. When errors are not identical, we can take the largest error to be the error in every site. Error independence will be a worst-case situation if the correlation between errors in di erent sites is positive. Sending synchronisation messages that are shared by the di erent sites in a ring is an example of a system that will tend to have such a behaviour. This means also that the synchronisation tasks in each site do not compete with each other to succeed.

3.2 Step-Level Guarantee

Each site executes the NDCSA in v steps (v 1) in one synchronisation round. The step is the basic unit of the NDCSA, and it consists on sending r messages that will be used by one or more receivers. The error associated with the i-th step on site p, step , is the error in the result produced by the receiver (if there is p;i only one) or the maximum of these errors (if there is more than on receiver). We assume that it is possible to compute a bound tol on step so that if the error p;i in every step is less than tol , the error at each site will be less than site tol .

Assumption5. 9 tol >0 : 8p=1::u ; 8i=1::v ;

step

p;i

tol ) site

p

site tol

We extend Assumpt. 4 to the errors in the algorithm steps:

represented by a random variable

p;i (p = 1::u; i = 1::v) are identically distributed independent random variables. They are

Assumption6. The error in the di erent steps of the algorithm

step .

step ,

Again, the considerations concerning Assumpt. 4 apply to Assumpt. 6. Assumptions 5 and 6 imply that

P step From (3) and (4), we have

?

?

tol ] v

P

site

site tol ]

(4) (5)

1?P

step > tol ] uv

P ^ ^des ]

P step > tol ], the probability that the error in one step is greater than tol may be di cult to compute. But, in most cases, we can compute an upper bound for this probability, f err , which is also a function of r, the number of messages sent in each step.

Assumption7.

9f err : f err P

From (5) and Assumpt. 7:

step > tol ]

(1 ? f err )uv P ^ ^des ]

(6)

We can now state the conditions on the step of the NDCSA guaranteeing that the desired precision will be attained with the desired probability. 1 ? Pdes ) P ^ ^des ] P des uv Proof. To prove theorem 8 we introduce the following f err

Theorem8.

(7)

Lemma 9. 8x2 0;1 ; 8n2IN; (1 ? x)n 1 ? nx

P ^ ^des ] (1 ? P step > tol ])uv (5) 1 ? uvP step > tol] (Lemma 9) 1 ? uvf err (Ass. 7)

Stating 1 ? uvf err Pdes completes the proof of theorem 8.

t u

The result in (7) allows relating the probability that the system will synchronise to a precision ^des (a global property of the system) with the probability that the error in one step is larger than a given value (a local property). This can be used to establish the conditions that one single step of the algorithm must ful ll in order to guarantee the desired precision with the desired probability. The results were established under three assumptions. The rst relates to the independence of errors. The validity of this assumption will depend on the underlying assumptions about the behaviour of the system errors, as discussed in Sect. 3.1. The two others suppose that some parameter or function can be computed. These are: 1. the value of the upper bound on the step error, tol , as a function of ^des ; 2. the function f err , the upper bound on the probability that the error in one step will be greater than tol . To apply our method to concrete cases, these two parameters have to be speci ed. That is the object of the next two sections.

4 Upper Bound on the Step Error

The computation of tol will depend on the mechanism of the NDCSA: whether it detects a resynchronisation event or it reads the remote clocks.

4.1 Detection of a Resynchronisation Event

The error made by the NDCSA will cause the moments chosen by the sites to spawn an interval of size , where (1 ? ) ^ (1 + ) ^. The value adopted in every site for its new clock is the same, so the scattering of the resynchronisation instants is the only source of error. To guarantee ^ ^des ? 2 R, we must have site tol ^des =2 (1 ? ) ? R. To our knowledge, the only algorithm that uses non-deterministic techniques to detect a resynchronisation event is Le-Lann's ( 6]). In this algorithm, each site executes the algorithm once, so tol = site tol .

4.2 Reading Remote Clocks

We will consider two cases: a Master-Slave (MS) structure and a distributed structure. In the MS structure, each Slave site reads the clock of the Master, and adopts the estimate as the correct value. The algorithm is executed in the Master (u = 1). In a distributed structure the algorithm is executed in all sites (u = n). The number of steps in each site will depend on the type of network. In a broadcast network, each site needs one single step to send the r messages to all receivers: v = 1. In a point-to-point network, each site will need one step for each receiver: v = n ? 1.

For the MS structure, the error for one step is the error in the Slave's estimate. To keep the di erence between the values of two Slave clocks within ^des of each other, we must have tol = ^des =2 ? R. In a decentralised structure, each site reads the clock of every other site. The site will then compute the new value or correction for its clock. This is done by means of a Convergence Function (CF). For the CF average, median and midpoint, the precision function ^ ( ^; ") is always of the form "+fftc (k) ( 11]), where k is the fault-tolerance degree, fftc a term that re ects how much the precision degrades with fault tolerance and " is the maximum di erence between any two corresponding values in two correct sites. In our case, " = 2 tol. If we want to guarantee that the precision is always less than ^des at any time, we must have ^ ( ^; ") + 2 R ^des , which leads to tol ^des =2 ? R ? fftc (k)=2. The NDCSA that read the clocks of other processors are those of Arvind 1], Cristian 3], both algorithms of Olson and Shin 8], and the algorithm of Rangarajan and Tripathi 10].

5 Upper Bound on the Probability of Error

The upper bound on the probability of error, f err , depends on the action the NDCSA performs. We nd that the algorithms published so far fall into one of two categories: those that are based on a averaging process, and those based on an interval of possible values. The algorithms of Le-Lann, Arvind, Rangarajan and Tripathi and the averaged based algorithm of Olson and Shin fall into the rst category; the algorithm of Cristian and the interval-based algorithm of Olson and Shin into the second.

5.1 Averaging-Based Algorithms

These NDCSA produce a result W based on the average of s samples of a r.v. X. The number of samples s is related to the number of messages r sent in each step: s = Cr, where C is the cooperation constant, that depends on the algorithm. It re ects how many sets of r samples are used for each of the computations performed; its value is n for Le-Lann's algorithm and 1 for all the others. The delay su ered by a message when crossing a connection can be decomposed in several terms: Send time (?send ), Access time (?acc), Propagation time (?prop ), and Receive time (?rec), that account respectively for the time to prepare the message, access the medium, propagate through the communication channel and being processed by the receiver 5]. All these terms are random variables and the total delay when crossing a connection can be represented by the r.v. ?cross = ?send + ?acc + ?prop + ?rec. If the time a message is sent is relevant for the synchronisation algorithm, all these terms contribute to the randomness (and therefore to the error) of the result. In this case, X is the sum of all delays su ered by a message during transit. If D is an upper bound on the number of connections crossed by one message, then Var X] DVar ?cross].

In a broadcast network, the di erence between the times one message arrives at di erent nodes depends only on ?brd = ?prop + ?rec. This term has usually a randomness much smaller that ?cross. If a synchronisation algorithm relies only on the arrival times of such messages, the error in the result is much smaller and Var X] = Var ?brd ]. The algorithm of Le-Lann is based on this principle. The error will depend on the variance of W, Var W] = Var X]=(Cr) (DVar ?alg])=(Cr), where ?alg is the relevant delay for the algorithm (?brd or ?cross ) and D and C are as de ned above. Table 1 presents the values of these parameters. To compute f err , we will assume that the number of samples used in the averaging process is su cient to approximate the result to a normal r.v. (another alternative would be the Tchebychev's inequality, which imposes no a priori assumptions). Under this assumption we have

P jW ? E W]j] >

tol ] = erfc

r

p

tol

?alg

2D rC

2

f err avg

rC tol exp ? 2D ! r rC tol 2D ?alg

W

tol2 2

!

?alg

(8)

wherep is the complementary error 2function of a normal random variable, erfc 1 = Var ] and f err avg ( ) = p e? is the error function for the averaging algorithms.

Table 1. Parameters of averaging algorithms

Algorithm B D ? Arvind 1 D ?cross Le-Lann n 1 ?brd Olson and Shin (avg) 1 n ?cross Rangarajan and Tripathi 1 ln(n) ?cross

5.2 Interval Based Algorithms

These NDCSA rely on the computation of an interval where the remote clock value lies by means of sending a request message and receiving a response. The estimate for the remote clock is the mid point of the interval of possible values and the error half the width of the interval. The i-th delay su ered by a message can be represented by a r.v. di = mini + i, where mini is the minimum value for di and i the variable part of di. The width of the interval is (neglecting the

terms in ), the sum of the variable parts of the delays. The probability that one message allows reading the remote clock with an error less than tol is FA(2 tol ), where A is the sum of the variable parts of the delays su ered by the message and FA the distribution function (d.f.) of A. In Cristian's algorithm, one step of the algorithm corresponds to repeating this process r times, until one reading has an error less than tol . If each attempt to read the remote clock is independent of all others, f err = (1 ? FA (2 tol ))r . Note that A can be a function of n. In Olson and Shin's interval algorithm, after all the r messages are sent, the interval where the remote clock lies is the intersection of the intervals corresponding to each message. In this case, the d.f. and the probability density function (p.d.f.) of the interval bounds are relatively easy to compute. But the p.d.f. of the interval width (from which the error depends) is the convolution of the p.d.f. of the interval bounds, which leaves us with a result that is di cult to deal with analytically. If we note that the width of the smallest interval is larger or equal to the width of the intersection of all intervals, we can use the error corresponding to the smallest interval as an upper bound for the error of the intersection. Therefore, we need that at least one of the intervals has an error which is less than tol , and the error function f err is identical to Cristian's algorithm's. Table 2 presents the error function f err for all the NDCSA.

Table 2. Error functions

Algorithm f err pn tol Le-Lann f err avg ?brd Arvind f err avg pD tol ?cross OS-avg f err avg pn tol ?cross RT f err avg pln(n)tol? cross tol ))r Cristian (1 ? FA (2 tol ))r OS- int (1 ? FA (2

6 Estimating Message Complexity

In this section, we will apply the results from previous sections to compute estimates on the communication burden imposed by the synchronisation algorithm. The parameters to estimate this burden are the number of messages sent during each step of the algorithm, r, and the number of frames in the system, m. One frame is the unit of data sent from one site to another, and it may contain one or several messages.

The framework we propose can be used in two ways. We can use the result in (6) to numerically compute the values of r that comply with the speci ed precision and probability of success; that is presented in Sect. 6.2. Another solution is to use (7) to establish, in an approximative way, the major trends for the values of r and m; this is done in Sect. 6.1.

6.1 Analytical Approximation

The total number of frames is related to the number of messages by m uvD r, Q where uv accounts for the total number of steps, D is an upper bound on the number of connections crossed by one message and Q is the number of messages in each frame. Q is n for Olson and Shin's algorithms, ln(n) for Rangarajan and Tripathi's and 1 for all others. We present the results concerning the number of messages and the number of frames in table 3. We consider two situations: a MS and a distributed structure. Some of the NDCSA were developed speci cally for a distributed structure, so applying them to a centralised structure would be meaningless.

Table 3. Message complexity with growing n

Dist. Le-Lann O ln(n) O(ln(np )) n p p p Arvind O(D ln( Dn)) O(D ln( Dn)) O(nD2 ln( Dn)) O(n2 D2 ln( Dn)) OS-avg O(n ln(n)) O(n2 ln(n)) RT O(ln2 (n)) O(n ln3 (n)) ln(n) nD ln(n) n2 D ln(n) O ? ln(1ln(pn)) Cristian O ? ln(1?pA ) O ? ln(1?pA ) O ? ln(1?pA ) ?A ln(n) n ln(n) OS-int O ? ln(1?pA ) O ? ln(1?pA ) MS

r

Dist. ?

MS

m

Note: pA FA (2 tol )

6.2 Numerical Results

In this section, we present the results obtained by numerical inversion of (6) to yield the required value of r to comply with a given speci cation of Pdes , ^des and tol for a system of n sites with a given delay variance, Var ?]. Figures 1 to 4 display the dependency of r on the probability of failure (r(Pfail ); Pfail = 1 ? Pdes ) and on the allowed error, expressed as normalised error, norm = tol =Var ?] for two algorithms: Arvind's and Le-Lann's. Arvind's algorithmis one of the simplest and has been used as a base for two other algorithms; Le-Lann's algorithm presents, in some aspects, a performance remarkably better than other NDCSA.

In all the cases, the values of n are 2, 8, 32 and 128. Both algorithms are executed with a distributed structure on a broadcast network.

400.

r

300.

n=128 n=32 n=8

200.

n=2

100.

0.0001

0.0005

0.001

0.005

0.01

P fail

Fig. 1. Arvind's algorithm: r(Pfail ),

norm = :2

r

100.

n=128

n=2

10.

1.

0.5

1.

5.

10.

Normalised Error

Fig.2. Arvind's algorithm: r( norm), Pdes = 0:99

Le-Lann's algorithm has a performance with growing n inverse to Arvind's: then n grows, r diminishes. We can see also how trying to guarantee synchronisation with small error and large probability of success can yield very large values of r. For instance, for Arvind's algorithm, Pdes = 0:99, tol = 0:2Var ?]

n=2

r 100.

n=8

n=32

10.

n=128

1.

0.0001

0.001

0.01

P fail

Fig. 3. Le-Lann's algorithm: r(Pfail ),

r 100.

50.

norm = :2

n=2 n=8 n=32 n=128

10. 5.

1.

0.5

1.

5.

10.

Normalised Error

Fig. 4. Le-Lann's algorithm: r( norm), Pdes = 0:99

and n = 8, the number of messages r is more than 100, which corresponds to more than 800 messages in each synchronisation round. Apart the di erence concerning the number of sites, both algorithms present similar performance with respect to changing Pdes and tol in identical circumstances.

6.3 Comparative Analysis

The results obtained provide us with some guidelines to compare the di erent NDCSA. Using simple remote clock reading algorithms like Cristian's and

Arvind's with no adaptions in a distributed structure (where all sites read all other site's clocks) may lead to an explosion in the number of messages and frames. When the system is not fully connected (D > 1), every message must cross several connections and the e ect on increasing the communication burden is double-fold. On one side, the probability that one single result is correct goes down. With every extra connection crossed, the variance of the result grows: this leads to an increase in the number of messages, r, in order to prevent the probability of error from growing. On the other side, each message must be carried by several frames: the ratio m=r grows too. For situations where connectivity is not complete, and therefore messages have to be relayed, the algorithms of Olson and Shin and the algorithm of Rangarajan and Tripathi present a better performance. When comparing these two, we nd again that better connectivity leads to fewer messages. Rangarajan and Tripathi's algorithm requires that one site can access every other site by relaying one message at most ln(n) times; Olson and Shin's only requires a Hamiltonian cycle. In fact, these two are specialisations of a more general algorithm (Arvind's) to speci c architectures, and they do not radically change the algorithm's performance. Reducing the error in each step is obtained also by means of large sampling populations. Le-Lann's algorithm exploits this. In this algorithm, each site uses all received values to compute one single statistic, and each value sent by one site is used by all sites (including itself). This means that the size of the sample is nr (C = n). By contrast, in the other averaging algorithms, the communication is done in an one-to-one basis: one value sent by one site is used for one single result in the receiver. The size of the sample for each estimate of the result is r (C = 1). This grants Le-Lann's algorithm with a dependence on growing n that is much smaller than the others. On the other hand, the results for Le-Lann's algorithm should be taken with some care, as the are based on the hypothesis that ?brd does not change with n. The variance of ?alg (the delay that is relevant for the averaging algorithms) is also an important quality factor. In algorithms where there is no reading of remote clocks, the time information is obtained by the arrival of the messages only. In this case, the scattering of the results obtained by di erent sites is measured by Var ?brd ] = Var ?prop] + Var ?rec]. When the value of the sender clock must be used in the computations, the scattering is measured by Var ?cross] = Var ?send ] + Var ?acc] + Var ?prop ] + Var ?rec]. For most cases, and specially in broadcast media, the major source of uncertainty in message delay is due to ?acc, the time to access the medium. So algorithms that rely only on the arrival time of one message, such as Le-Lann's, and not on the send time, will need a smaller number of messages to guarantee the same probability of error. But these strategies are associated to broadcast media, which limits their application. NDCSA, by their own nature, require a larger number of messages than deterministic algorithms, and this can be a obstacle to their use. Some guidelines for

6.4 Guidelines for Reducing the Number of Messages

keeping the required number of messages low can be extracted from the results above.

Reduce distance between processors Each connection crossed increases the num-

ber of messages necessary to yield the desired probability. In a non fully connected network, synchronisation messages should be routed through the shortest path.

Centralise If fault tolerance provided by Master Slave structures is su cient, these can be used to keep the number of messages low. Strive for broadcast If one message is broadcast to several sites, the total number of messages will be smaller for the same probability of failure. Reduce uncertainty Algorithms that rely only on the arrival time of messages present a smaller uncertainty than algorithms that rely on transmitting clock values. A smaller uncertainty means a smaller number of messages for the same probability of error. Cooperate If all values received are used to compute one single statistic, the size of the sample is larger and the probability that the result is correct increases. For the same probability of error, a smaller number of messages can be employed

7 Conclusion

We have presented a general framework to the comparative analysis of nondeterministic clock synchronisation algorithms. This allowed establishing a common ground to base the analysis of several published algorithms and identify the reasons for the di erent performances. We addressed the issue of how a site constructs a local view of the system (how it relates clock values to instants in time) from the representation of the delays as random variables. Our aim is to propose a simple and general model. Making a simple model usually requires approximations that degrade accuracy, but it allows expressing the e ect of several parameters more clearly. The method is general and can be applied to any NDCSA as long as the underlying assumptions remain valid. In this paper, we have used some approximations that are rather crude in other to present results that are simple and analytically tractable. For a speci c case, when the delay distribution is known, for instance, or by using numerical methods, better approximations can be employed, in order to provide more accurate results. The results are based on the i.i.d. assumption as a worst case situation. This is based on several assumptions regarding the system where the algorithm executes. So far, the method is directed towards algorithms that work in rounds . Recently proposed continuous non-deterministic clock synchronisation algorithms, such as the one in 9], require our method to be adapted.

Some assumptions regarding the tolerated errors can be relaxed. For instance, the relation between the event A = P site site tol ] and B = P step p p;i tol for alli] is deterministic. We are looking at guaranteeing P A] P A\B] = P B]P AjB]. The conditions are established so that B ) A is always true: P AjB] = 1. We can envisage a scheme where the conditions imposed on one step can be relaxed, by allowing a larger error. This larger error may not always guarantee P AjB] = 1. But the increase in P B] may compensate the decrease in P AjB] and the overall e ect would be an increase in P A]. This approach could also provide some insight into which convergence functions are more suitable (if any) to NDCSA. Another open issue is how to use the information gathered during the phase of message interchange to compute a new value for the clock correction, and how it a ects the algorithm's performance. As Cristian pointed out in 4], the rationale employed for deterministic clock synchronisation may not always be valid for NDCSA.

References

1. K. Arvind. Probabilistic clock synchronization in distributed systems. IEEE Trans. Parallel and Distributed Systems, 5(5):474 487, May 1994. 2. Hagit Attiya, Amir Herzberg, and Sergio Rajsbaum. Optimal clock synchronization under di erent delay assumptions. SIAM J. Comput., 25(2):369 389, April 1996. 3. F. Cristian. Probabilistic clock synchronization. Distributed Computing, 3:146 158, 1989. 4. Flaviu Cristian and Christof Fetzer. Probabilistic internal clock synchronization. In Proc. 13th Symposium on Reliable Distributed Systems, pages 22 31. IEEE, 1994. 5. Hermann Kopetz and Wolfgang Schwabl. Global time in distributed real-time systems. Technical Report 15/89, Technische Universitat Wien, October 1989. 6. G. Le-Lann. Synchronisation statistique d'horloges physiques: tude algorithmique. Document HORA, INRIA, September 1990. 7. Jennifer Lundelius and Nancy Lynch. An upper and lower bound for clock synchronization. Information and Control, 62:190 204, 1984. 8. A. Olson and K. G. Shin. Probabilistic clock synchronization in large distributed systems. In Proc. Distributed Computing Systems, pages 290 297, May 1991. 9. Alan Olson, Kang G. Shin, and Bruno J. Jambor. Fault-tolerant clock synchronization for distributed systems using continuous synchronization messages. In Proc. 25th International Symposium on Fault Tolerant Computing, pages 154 163, Pasadena, California, June 1995. IEEE. 10. S. Rangarajan and S. K. Tripathi. E cient synchronization of clocks in a distributed system. In Proceedings of Real-Time Systems Symposium, pages 22 31, 1991. 11. Fred B. Schneider. Understanding protocols for Byzantine clock synchronization. Technical Report 87-859, Department of Computer Science, Cornell University, Ithaca, New York, August 1987.

A Notation

Description number of sites sites physical clock of site p drift rate of physical clocks SCp (t) synchronised clock of site p u number of sites executing the algorithm v number of steps in each site r number of messages sent in each step R duration of a synchronisation round C cooperation constant D upper bound on the number of connections crossed by one message W result computed by one site ^, precision of clock synchronisation ^des desired value for the precision Pdes desired probability for clock synchronisation ^ precision after synchronisation site error in site p p site site error (r.v.) site tol error tolerated at site level step error on the i-th step on site p p;i step error on one step (r.v.) tol error allowed in one step norm normalised error f err upper bound on the probability of error f err avg f err for averaging algorithms ? communication delay ?min , ?max, lower and upper bounds for the communication delay ?brd delay for broadcasting a message ?cross delay for sending a message across one connection FX distribution function of r.v. X Symbol

n p,q Cp (t)

A This article was processed using the L TEX macro package with LLNCS style

相关文章:

更多相关标签:

- Non-Gaussian Component Analysis a Semi-parametric Framework for Linear Dimension Reduction
- A Grid Framework for Non-Linear Brain fMRI Analysis
- Algorithms for o-line clock synchronisation
- Nonlinear Non-Negative Component Analysis Algorithms
- A Unifying Framework for the analysis of a class of Euclidean Algorithms
- A Convex Analysis Framework for Blind Separation of Non-Negative
- A Non-Equilibrium Analysis & Control Framework for Active Queue Management
- ANALYSIS OF NON-RECURSIVE ALGORITHMS 1. [From [1] §2.3, #4]. Consider the following algori
- An interval-based framework for clock rate synchronization algorithms
- Non-Invariant Velocity of Light and Clock Synchronisation in Accelerated Systems