Efficient Coflow Scheduling with Varys
Mosharaf Chowdhury¹, Yuan Zhong², Ion Stoica¹
¹UC Berkeley, ²Columbia University
{mosharaf, istoica}@cs.berkeley.edu, yz2561@columbia.edu

ABSTRACT

Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16× faster on average and up to 2× more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5×.

Categories and Subject Descriptors
C.2 [Computer-communication networks]: Distributed systems—Cloud computing

Keywords
Coflow; data-intensive applications; datacenter networks

1 Introduction

Although many data-intensive jobs are network-bound [7, 9, 15, 23], network-level optimizations [7, 8, 12, 25] remain agnostic to job-specific communication requirements. This mismatch often hurts application-level performance, even when network-oriented metrics like flow completion time (FCT) or fairness improve [15]. Despite the differences among data-intensive frameworks [3, 4, 18, 26, 29, 37, 41], their communication is structured and takes place between groups of machines in successive computation stages [16].

Often a communication stage cannot finish until all its flows have completed [15, 19]. The recently proposed coflow abstraction [16] represents such collections of parallel flows to convey job-specific communication requirements – for example, minimizing completion time or meeting a deadline – to the network, and it enables application-aware network scheduling (§2). Indeed, optimizing a coflow's completion time (CCT) decreases the completion time of the corresponding job [15].

However, jobs from one or more frameworks create multiple coflows in a shared cluster. Analysis of production traces shows wide variations in coflow characteristics in terms of total size, the number of parallel flows, and the size of individual flows (§4). Simple scheduling mechanisms like FIFO and its variants [15, 19], which are attractive for the ease of decentralization, do not perform well in such an environment – one large coflow can slow down many smaller ones or result in many missed deadlines. Simply applying a shortest- or smallest-first heuristic, the predominant way to solve most scheduling problems, is not sufficient either (§5).

Inter-coflow scheduling is different from scheduling individual flows [8, 25], because each coflow involves multiple parallel flows. It also differs from related problems like scheduling parallel tasks [9, 40] or caching parallel blocks [10]; unlike CPU or memory, the network involves coupled resources – each flow's progress depends on its rates at both the source and the destination. We show that these coupled constraints make permutation schedules – scheduling coflows one after another without interleaving their flows – suboptimal. Consequently, centralized scheduling becomes impractical, because the scheduler might need to preempt flows or recalculate their rates at arbitrary points in time, even when no new flows start or complete.

In this paper, we study the inter-coflow scheduling problem for arbitrary coflows and focus on two objectives: improving application-level performance by minimizing CCTs and guaranteeing predictable completions within coflow deadlines. We prove this problem to be strongly NP-hard for either objective and focus on developing effective heuristics. We propose a coflow scheduling heuristic that – together with a complementary flow-level rate allocation algorithm – makes centralized coflow scheduling feasible by rescheduling only on coflow arrivals and completions.

In the presence of coupled constraints, the bottleneck endpoints of a coflow determine its completion time. We propose the Smallest-Effective-Bottleneck-First (SEBF) heuristic that greedily schedules a coflow based on its bottleneck's completion time. We then use the Minimum-Allocation-for-Desired-Duration (MADD) algorithm to allocate rates to its individual flows. The key idea behind MADD is to slow down all the flows in a coflow to match the completion time of the flow that will take the longest to finish. As a result, other coexisting coflows can make progress, and the average CCT decreases.


While the combination of SEBF and MADD is not necessarily optimal, we have found it to work well in practice. For guaranteed coflow completions, we use admission control; i.e., we do not admit any coflow that cannot meet its deadline without violating someone else's. Once admitted, we use MADD to complete all the flows of a coflow exactly at the coflow deadline, ensuring guaranteed completion using the minimum amount of bandwidth.

We have implemented the proposed algorithms in a system called Varys¹ (§6), which provides a simple API that allows data-parallel frameworks to express their communication requirements as coflows with minimal changes to the framework. User-written jobs can take advantage of coflows without any modifications. We deployed Varys on a 100-machine EC2 cluster and evaluated it (§7) by replaying production traces from Facebook. Varys improved CCTs both on average (up to 3.16×) and at high percentiles (3.84× at the 95th percentile) in comparison to per-flow fair sharing. Hence, end-to-end completion times of jobs, especially the communication-heavy ones, decreased. The aggregate network utilization remained the same, and there was no starvation. In trace-driven simulations, we found Varys to be 3.66× better than fair sharing, 5.53× better than per-flow prioritization, and 5.65× better than FIFO schedulers. Moreover, in EC2 experiments (simulations), Varys allowed up to 2× (1.44×) more coflows to meet their deadlines in comparison to per-flow schemes; it marginally outperformed resource reservation mechanisms [11] as well.

We discuss the current limitations of Varys and relevant future research in Section 8 and compare Varys to related work in Section 9.

¹Pronounced \ˈvä-ris\.

Table 1: Coflows in data-parallel cluster applications.
Communication in Data-Parallel Apps               | Coflow Structure
Comm. in dataflow pipelines [3, 4, 26, 41]        | Many-to-Many
Global communication barriers [29, 37]            | All-to-All
Broadcast [3, 26, 41]                             | One-to-Many
Aggregation [3, 4, 18, 26, 41]                    | Many-to-One
Parallel read/write on dist. storage [13, 14, 17] | Many One-to-One
[Figure 1: Coflow scheduling over a 3 × 3 datacenter fabric with three ingress/egress ports (machine uplinks and downlinks). Flows in ingress ports are organized by destinations and color-coded by coflows – C1 in orange/light and C2 in blue/dark.]

2 Background and Motivation

This section overviews the coflow abstraction (§2.1) and our conceptual model of the datacenter fabric (§2.2), and illustrates the advantages of using coflows (§2.3).

2.1 The Coflow Abstraction

A coflow [16] is a collection of flows that share a common performance goal, e.g., minimizing the completion time of the latest flow or ensuring that flows meet a common deadline. We assume that the amount of data each flow needs to transfer is known before it starts [8, 15, 19, 25]. The flows of a coflow are independent in that the input of a flow does not depend on the output of another in the same coflow, and the endpoints of these flows can be in one or more machines. Examples of coflows include the shuffle between the mappers and the reducers in MapReduce [18] and the communication stage in the bulk-synchronous parallel (BSP) model [37]. Coflows can express most communication patterns between successive computation stages of data-parallel applications (Table 1) [16]. Note that traditional point-to-point communication is still a coflow with a single flow.

2.2 Network Model

In our analysis, we consider a network model where the entire datacenter fabric is abstracted out as one non-blocking switch [8, 11, 27, 33] interconnecting all the machines (Figure 1), and we focus only on its ingress and egress ports (e.g., machine NICs). This abstraction is attractive because of its simplicity, yet it is practical because of recent advances in full bisection bandwidth topologies [22, 32] and techniques for enforcing edge constraints in the network [11, 21]. Note that we use this abstraction to simplify our analysis, but we do not enforce it in our experiments (§7).

In this model, each ingress port has some flows from one or more coflows to various egress ports. For ease of exposition, we organize them in Virtual Output Queues [31] at the ingress ports as shown in Figure 1. In this case, there are two coflows C1 and C2; C1 has three flows transferring 1, 2, and 4 units of data, while C2 has two flows transferring 2 data units each.

2.3 Potential Benefits of Inter-Coflow Scheduling

While the network cares about flow-level metrics such as FCT and per-flow fairness, they can be suboptimal for minimizing the time applications spend in communication. Instead of improving network-level metrics that can be at odds with application-level goals, coflows improve performance through application-aware management of network resources.

Consider Figure 1. Assuming both coflows arrive at the same time, Figure 2 compares four different schedules. Per-flow fairness (Figure 2a) ensures max-min fairness among flows in each link. However, fairness among flows of even the same coflow can increase CCT [15]. WSS (Figure 2c) – the optimal algorithm in homogeneous networks – is up to 1.5× faster than per-flow fairness for individual coflows [15]; but for multiple coflows, it minimizes the completion time across all coflows and increases the average CCT. Recently proposed shortest-flow-first prioritization mechanisms [8, 25] (Figure 2b) decrease the average FCT, but they increase the average CCT by interleaving flows from different coflows. Finally, the optimal schedule (Figure 2d) minimizes the average CCT by finishing flows in the coflow order (C2 followed by C1). The FIFO schedule [15, 19] would have been as good as the optimal if C2 arrived before C1, but it could be as bad as per-flow fair sharing or WSS if C2 arrived later.

Deadline-Sensitive Communication Assume that C1 and C2 have the same deadline of 2 time units – C1 can never meet its deadline, as its minimum CCT is 4. Using per-flow fairness or WSS, both C1 and C2 miss their deadlines. Using earliest-deadline-first (EDF) across flows [25], C2 meets its deadline only 25% of the time. However, the optimal coflow schedule does not admit C1, and C2 always succeeds. Note that egress ports do not experience any contention in these examples; when they do, coflow-aware scheduling can be even more effective.
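To make the deadline example concrete, the following minimal Scala sketch computes each coflow's minimum CCT as the load of its most contended port, assuming unit-capacity ports and a hypothetical flow-to-port layout consistent with Figure 1 (the exact placement is illustrative, not taken from the figure):

  object MinCct {
    // A flow moves `size` units from an ingress port to an egress port.
    case class Flow(src: Int, dst: Int, size: Double)

    // With unit-capacity ports, a coflow cannot finish before its most
    // loaded ingress or egress port has transferred all of its bytes.
    def minCct(flows: Seq[Flow]): Double = {
      val ingress = flows.groupBy(_.src).values.map(_.map(_.size).sum)
      val egress  = flows.groupBy(_.dst).values.map(_.map(_.size).sum)
      (ingress ++ egress).max
    }

    def main(args: Array[String]): Unit = {
      val c1 = Seq(Flow(1, 1, 1), Flow(2, 2, 2), Flow(3, 3, 4)) // C1: 1, 2, 4 units
      val c2 = Seq(Flow(1, 2, 2), Flow(2, 3, 2))                // C2: 2, 2 units
      println(minCct(c1)) // 4.0 -- C1 can never meet a deadline of 2
      println(minCct(c2)) // 2.0 -- C2 can, if C1 does not stand in its way
    }
  }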
3 Varys Overview

Varys is a coordinated coflow scheduler that optimizes either the performance or the predictability of communication in data-intensive applications. In this section, we present a brief overview of Varys to help the reader follow the measurements of coflows in production clusters (§4), the analysis and design of inter-coflow scheduling algorithms (§5), and Varys's design details (§6).


[Figure 2: Allocation of ingress port capacities (vertical axis) using different mechanisms for the coflows in Figure 1. Each port can transfer one unit of data in one time unit. The average FCT and CCT for (a) per-flow fairness are 3.4 and 4 time units; (b) per-flow prioritization are 2.8 and 3.5 time units; (c) Weighted Shuffle Scheduling (WSS) are 3.6 and 4 time units; and (d) the optimal schedule are 3 and 3 time units.]

[Figure 3: Varys architecture. Computation frameworks interact with Varys through a client library.]

3.1 Problem Statement

When improving performance, given a coflow with information about its individual flows – their size and endpoints – Varys must decide when to start its flows and at what rate to serve them to minimize the average CCT of the cluster. It can preempt existing coflows to avoid head-of-line blocking, but it must also avoid starvation. Information about a coflow is unknown prior to its arrival.

When optimizing predictability, Varys must admit a new coflow only if it can be completed within its deadline without violating the deadline guarantees of the already-admitted ones.

Irrespective of the objective, the inter-coflow scheduling problem is NP-hard (§5). Varys implements a scheduler that exploits the variations in coflow characteristics (§4) to perform reasonably well in realistic settings.

3.2 Architectural Overview

Varys master schedules coflows from different frameworks using global coordination (Figure 3). It works in two modes: it either tries to minimize CCT or to meet deadlines. For the latter, it uses admission control, and rejected coflows must be resubmitted later.

Frameworks use a client library to interact with Varys to register and define coflows (§6.1). The master aggregates all interactions to create a global view of the network and determines the rates of flows in each coflow (§6.2), which are enforced by the client library.

Varys daemons, one on each machine, handle time-decoupled coflows, where senders and receivers are not simultaneously active. Instead of hogging the CPU, sender tasks (e.g., mappers) of data-intensive applications often complete after writing their output to the disk. Whenever corresponding receivers (e.g., reducers) are ready, Varys daemons serve them by coordinating with the master. Varys daemons use the same client library as other tasks. Additionally, these daemons send periodic measurements of the network usage at each machine to Varys master. The master aggregates them using existing techniques [17] to estimate current utilizations and uses the remaining bandwidth (Rem(.)) during scheduling (§5.3).

We have implemented Varys in the application layer out of practicality – it can readily be deployed in the cloud while providing large improvements for both objectives we consider (§7).

Fault Tolerance Failures of Varys agents do not hamper job execution, since data can be transferred using regular TCP flows in their absence. Varys agents store soft state that can be rebuilt quickly upon restart. In case of task failures and consequent restarts, corresponding flows are restarted too; other flows of the same coflow, however, are not paused.

Scalability Varys reschedules only on coflow arrival and completion events. We did not observe the typical number of concurrent coflows (tens to hundreds [15, 33]) to limit its scalability. Varys batches control messages at O(100)-millisecond intervals to reduce coordination overheads, which affect small coflows (§7.2). Fortunately, most traffic in data-intensive clusters comes from large coflows (§4). Hence, we do not use Varys for coflows with bottlenecks smaller than 25 MB.

4 Coflows in Production

The impact of coflows on job durations and their network footprint has received extensive attention [15, 17, 33, 35, 39]. We focus on understanding their structural characteristics by analyzing traces from a 3000-machine, 150-rack Hive/MapReduce data warehouse at Facebook [17]. We highlight two attributes – wide variety in coflow structures and the disproportionate footprint of a few large coflows – that motivate and guide Varys's design.

4.1 Diversity of Coflow Structures

Because a coflow consists of multiple parallel flows, it cannot be characterized just by its size. We define the length of a coflow to be the size of its largest flow in bytes, its width to be the number of its parallel flows, and its size to be the sum of all its flows in bytes. Furthermore, we consider its skew, i.e., the coefficient of variation of its flows in terms of their size.

The key takeaway from the CDF plots in Figure 4 is that coflows vary widely in all four characteristics. We observe that while more than 40% of coflows are short (≤1 MB in length), flows in some coflows can be very large. Similarly, more than 60% of narrow coflows (with at most 50 flows) reside with coflows that consist of millions of flows. Furthermore, we see variations of flow sizes within the same coflow (Figure 4c), which underpins the possible improvements from inter-coflow scheduling – e.g., the optimal schedule in Figure 2 outperforms the rest primarily because it exploits the skew in C1. Note that we rounded up flow sizes to 1 MB when calculating skew in Figure 4c to ignore small variations.
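As a concrete reading of these four definitions, the following minimal Scala sketch (assuming a coflow is represented simply by the list of its flow sizes in bytes) computes length, width, size, and skew, with skew taken as the coefficient of variation, i.e., the standard deviation of flow sizes divided by their mean:

  object CoflowStats {
    // Length: largest flow; width: number of flows;
    // size: total bytes; skew: coefficient of variation of flow sizes.
    def stats(flowSizes: Seq[Double]): (Double, Int, Double, Double) = {
      val length = flowSizes.max
      val width  = flowSizes.length
      val size   = flowSizes.sum
      val mean   = size / width
      val stdev  = math.sqrt(flowSizes.map(s => (s - mean) * (s - mean)).sum / width)
      (length, width, size, stdev / mean)
    }

    def main(args: Array[String]): Unit = {
      // E.g., the three flows of C1 from Figure 1, in data units.
      println(stats(Seq(1.0, 2.0, 4.0))) // (4.0, 3, 7.0, ~0.53)
    }
  }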


[Figure 4: CDFs of coflow characteristics. Coflows in production vary widely in (a) length, (b) width, (c) skew of flow sizes, (d) bottleneck locations (sender-to-receiver ratio), and (e) total size. Moreover, (f) numerous small coflows have a tiny network footprint.]

Identifying bottlenecks and exploiting them is the key to improvements. Figure 4d shows that the ratio of sender to receiver ports/machines² can be very different across coflows, and senders can be the bottlenecks in almost a third of the coflows. We found that the length and width of a coflow have little correlation; a coflow can have many small flows as well as a few large flows. However, as Figure 4b and Figure 4c suggest, width and skew are indeed correlated – as width increases, the probability of variations among flows increases as well. We also observed large variation in coflow size (Figure 4e).

²One machine can have multiple tasks of the same coflow.

4.2 Heavy-Tailed Distribution of Coflow Size

Data-intensive jobs in production clusters are known to follow heavy-tailed distributions in terms of their number of tasks, input size, and output size [10, 17]. We observe the same for coflow size as well. Figure 4f presents the fraction of total bytes contributed by coflows of different size. Comparing it with Figure 4e, we see that almost all traffic is generated by a handful of large coflows – 98% (99.6%) of the relevant bytes belong to only 8% (15%) of the coflows, those larger than 10 GB (1 GB). This allows Varys to focus only on scheduling large coflows, where the gains significantly outweigh coordination overheads.

5 Coflow Scheduling: Analytical Results

The inter-coflow scheduling problem is NP-hard. In this section, we provide insights into its complexity (§5.1) and discuss the desirable properties of an ideal scheduler along with the associated tradeoffs (§5.2). Based on this understanding, we develop two inter-coflow scheduling algorithms: one to minimize CCTs (§5.3) and another to guarantee coflow completions within their deadlines (§5.4). Detailed analysis and proofs can be found in the appendix.

5.1 Problem Formulation and Complexity

We consider two objectives for optimizing data-intensive communication: either minimizing the average CCT or improving predictability by maximizing the number of coflows that meet deadlines (§A). Achieving either objective is NP-hard, even when

1. all coflows can start at the same time,
2. information about their flows is known beforehand, and
3. ingress and egress ports have the same capacity.

We prove this by reducing the concurrent open-shop scheduling problem [34] to inter-coflow scheduling (Theorem A.1).

The online inter-coflow scheduling problem is even harder for the following reasons:

1. Capacity constraints. Ingress and egress ports of the datacenter fabric have finite, possibly heterogeneous, capacities. Hence, the optimal solution must find the best ordering of flows to dispatch at each ingress port and simultaneously calculate the best matching at the egress ports. Furthermore, when optimizing predictability, it must decide whether or not to admit a coflow.

2. Lack of future knowledge. Arrival times and characteristics of new coflows and their flows cannot be predicted.

Because the rate of any flow depends on its allocations at both ingress and egress ports, we refer to the inter-coflow scheduling problem as an instance of concurrent open shop scheduling with coupled resources (Remark A.2). To the best of our knowledge, this variation of the problem – with ordering and matching requirements – has not appeared in the literature prior to this work.

5.2 Desirable Properties and Tradeoffs

Efficient scheduling (minimizing completion times) and predictable scheduling (guaranteeing coflow completions within their deadlines) are inherently conflicting. The former requires preemptive solutions to avoid head-of-line blocking; shortest-remaining-time-first (SRTF) for optimally scheduling flows on a single link is an example [25]. Preemption, in the worst case, can lead to starvation; e.g., SRTF starves long flows. The latter, on the contrary, requires admission control to provide guarantees. We expect an ideal scheduler to satisfy the following goals in addition to its primary objective.

1. Starvation freedom. Coflows, irrespective of their characteristics, should not starve for arbitrarily long periods.

2. Work-conserving allocation. Available resources should be used as much as possible.

The former ensures eventual completion of coflows irrespective of system load. The latter avoids underutilization of the network, which intuitively should result in lower CCTs and more admissions. However, both are at odds with our primary objectives (§B). Predictable scheduling has an additional goal.

3. Guaranteed completion. If admitted, a coflow must complete within its deadline.

In the following, we present algorithms that achieve high network utilization and that ensure starvation freedom when minimizing CCT (§5.3) and guarantee completion of admitted coflows when maximizing predictability (§5.4).

5.3 Inter-Coflow Scheduling to Minimize CCT

Given the complexity, instead of finding an optimal algorithm, we focus on understanding what an offline optimal schedule might look like under simplifying assumptions. Next, we compute the minimum time to complete a single coflow. We use this result as a building block to devise a scheduling heuristic and an iterative bandwidth allocation algorithm. We conclude by presenting the necessary steps to transform our offline solution into an online one with guarantees for starvation freedom and work conservation.


[Figure 5: Allocations of egress port capacities (vertical axis) for the coflows in (a) on a 3 × 3 fabric for different coflow scheduling heuristics (§5.3.2): (b) SCF/NCF/TCF and (c) SEBF.]

Pseudocode 1 Coflow Scheduling to Minimize CCT
 1: procedure ALLOCBANDWIDTH(Coflows C, Rem(.), Bool cct)
 2:   for all C ∈ C do
 3:     τ = Γ_C                          ▷ Calculated using Equation (1)
 4:     if not cct then
 5:       τ = D_C
 6:     end if
 7:     for all d_ij ∈ C do              ▷ MADD
 8:       r_ij = d_ij / τ
 9:       Update Rem(P_i^in) and Rem(P_j^out)
10:     end for
11:   end for
12: end procedure
13: procedure MINCCTOFFLINE(Coflows C, C, Rem(.))
14:   C′ = SORT_ASC(C ∪ {C}) using SEBF
15:   allocBandwidth(C′, Rem(.), true)
16:   Distribute unused bandwidth to C ∈ C′   ▷ Work conservation (§5.3.4)
17:   return C′
18: end procedure
19: procedure MINCCTONLINE(Coflows C, C, Rem(.))
20:   if timeSinceLastDelta() < T then        ▷ T-interval: decrease CCT
21:     C′ = minCCTOffline(C, C, Rem(.))
22:     Update C_zero, the set of starved coflows
23:   else                                    ▷ δ-interval: starvation freedom
24:     C* = ⋃ C for all C ∈ C_zero           ▷ One collective coflow
25:     Apply MADD on C*
26:     Schedule a call to minCCTOnline(.) after δ interval
27:   end if
28: end procedure
5.3.1 Solution Approach

Consider the offline problem of scheduling |C| coflows (C = {C1, C2, ..., C|C|}) that all arrived at time 0. The optimality of the shortest-processing-time-first (SPTF) heuristic on a single link suggests that shortest- or smallest-first schedules are the most effective in minimizing completion times [8, 25]. However, in the multi-link scenario, links can have different schedules. This makes the search space exponentially large – there are ((|C|P)!)^P possible solutions when scheduling |C| coflows with P² flows each on a P × P fabric! If we remove the capacity constraints from either the ingress or the egress ports, then under the assumptions of Section 5.1, the coflow scheduling problem simplifies to the traditional concurrent open shop scheduling problem, which has optimal permutation schedules [30]; meaning, scheduling coflows one after another is sufficient, and searching within the |C|! possible schedules is enough. Unfortunately, permutation schedules can be suboptimal for coupled resources (Theorem C.1), which can lead to flow preemptions at arbitrary points in time – not just at coflow arrivals and completions (Remark C.2). To avoid incessant rescheduling, we restrict ourselves to permutation schedules.

5.3.2 The Smallest-Effective-Bottleneck-First Heuristic

Once scheduled, a coflow can impact the completion times of all other coflows scheduled after it. Our primary goal is to minimize the opportunity lost in scheduling each coflow. Given the optimality of the shortest- or smallest-first policy in minimizing the average FCT [8, 25], a natural choice for scheduling coflows would be to approximate that with a Shortest-Coflow-First (SCF) heuristic. However, SCF does not take into account the width of a coflow. A width-based alternative to SCF is the Narrowest-Coflow-First (NCF) heuristic, but NCF cannot differentiate a short coflow from a long one. A smallesT-Coflow-First (TCF) heuristic is a better alternative than the two – while SCF can be influenced by just a single long flow (i.e., coflow length) and NCF relies only on coflow width, TCF responds to both.

However, the completion time of a coflow actually depends on its bottleneck. A coflow C must transfer \sum_j d_{ij} amount of data through each ingress port i (P_i^{in}) and \sum_i d_{ij} through each egress port j (P_j^{out}), where d_{ij} is the amount of data to transfer from P_i^{in} to P_j^{out} at rate r_{ij}. The minimum CCT (Γ) becomes

    \Gamma = \max\left( \max_i \frac{\sum_j d_{ij}}{\mathrm{Rem}(P_i^{in})},\; \max_j \frac{\sum_i d_{ij}}{\mathrm{Rem}(P_j^{out})} \right)    (1)

where Rem(.) denotes the remaining bandwidth of an ingress or egress port estimated from Varys's measurements. The former argument of Equation (1) represents the minimum time to transfer \sum_{ij} d_{ij} (= Size(C)) amount of data through the input ports, and the latter is for the output ports.

We propose the Smallest-Effective-Bottleneck-First (SEBF) heuristic that considers a coflow's length, width, size, and skew to schedule it in the smallest-Γ-first order. Figure 5 shows an example: although C2 (orange/light) is bigger than C1 in length, width, and size, SEBF schedules it first to reduce the average CCT to 5 time units from 5.5. While no heuristic is perfect, we found SEBF to be 1.14×, 1.36×, and 1.66× faster than TCF, SCF, and NCF, respectively, in trace-driven simulations, and even faster in synthetic ones. Additionally, SEBF performs ≥5× better than non-preemptive coflow schedulers (§7.4).

5.3.3 Minimum-Allocation-for-Desired-Duration

Given a schedule of coflows C′ = (C1, C2, ..., C|C|), the next step is to determine the rates of individual flows. While single-link optimal heuristics would allocate the entire bandwidth of the link to the scheduled flow, we observe that completing a flow faster than the bottleneck does not impact the CCT in coflow scheduling. Given Γ, the minimum completion time of a coflow can be attained as long as all flows finish at time Γ. We can ensure that by setting the rate (r_ij) of each flow to d_ij/Γ. We refer to this algorithm (lines 7–10 in Pseudocode 1) as MADD; it generalizes WSS [15], which works only for homogeneous links. MADD allocates the least amount of bandwidth needed to complete a coflow in the minimum possible time.

We use MADD as a building block to allocate rates for the given schedule. We apply MADD to each coflow Ci ∈ C′ to ensure its fastest completion using minimum bandwidth, and we iteratively distribute (line 16) its unused bandwidth to coflows Cj (i < j ≤ |C|). Once Ci completes, the iterative procedure is repeated to complete Ci+1 and to distribute its unused bandwidth. We stop after C|C| completes.
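To make the interplay concrete, here is a minimal Scala sketch of SEBF ordering followed by MADD rate allocation, under stated assumptions: each coflow is reduced to a list of (ingress port, egress port, bytes) demands, Rem(.) is modeled as two plain maps of per-port remaining bandwidth, and the names (Demand, Coflow, gamma, schedule) are illustrative rather than Varys's actual internals. It omits the backfilling and starvation-avoidance passes of Pseudocode 1.

  object SebfMadd {
    case class Demand(in: Int, out: Int, bytes: Double)
    case class Coflow(id: String, demands: Seq[Demand])

    // Equation (1): a coflow cannot finish before its most contended
    // ingress or egress port drains all of its bytes.
    def gamma(c: Coflow, remIn: Map[Int, Double], remOut: Map[Int, Double]): Double = {
      val tIn  = c.demands.groupBy(_.in).map  { case (p, ds) => ds.map(_.bytes).sum / remIn(p) }
      val tOut = c.demands.groupBy(_.out).map { case (p, ds) => ds.map(_.bytes).sum / remOut(p) }
      (tIn ++ tOut).max
    }

    // SEBF orders coflows by Γ; MADD then gives each flow the minimum
    // rate d_ij / Γ so that all flows of the coflow finish together at Γ,
    // leaving the remaining bandwidth for coflows scheduled later.
    def schedule(coflows: Seq[Coflow],
                 ingress: Map[Int, Double],
                 egress: Map[Int, Double]): Map[(String, Int, Int), Double] = {
      var remIn  = ingress
      var remOut = egress
      val rates  = scala.collection.mutable.Map.empty[(String, Int, Int), Double]
      for (c <- coflows.sortBy(c => gamma(c, ingress, egress))) { // SEBF order
        val g = gamma(c, remIn, remOut)
        for (d <- c.demands) {
          val r = d.bytes / g                                     // MADD rate
          rates((c.id, d.in, d.out)) = r
          remIn  = remIn.updated(d.in, remIn(d.in) - r)
          remOut = remOut.updated(d.out, remOut(d.out) - r)
        }
      }
      rates.toMap
    }
  }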


Pseudocode 2 Coflow Scheduling to Guarantee Completion
 1: procedure MEETDEADLINE(Coflows C, C, Rem(.))
 2:   allocBandwidth(C, Rem(.), false)        ▷ Recalculate min rates for C ∈ C
 3:   if Γ_C ≤ D_C then                       ▷ Admission control
 4:     C′ = Enqueue C to C                   ▷ Add C in the arrival order
 5:     allocBandwidth(C′, Rem(.), false)
 6:     Distribute unused bandwidth to C ∈ C′ ▷ Work conservation
 7:     return true
 8:   end if
 9:   Distribute unused bandwidth to C ∈ C
10:   return false
11: end procedure

5.3.4 From Offline to Online

Work-Conserving Allocation Letting resources idle – as the offline iterative MADD might do – can hurt performance in the online case. We introduce the following backfilling pass in MINCCTOFFLINE (line 16 of Pseudocode 1) to utilize the unallocated bandwidth throughout the fabric as much as possible. For each ingress port P_i^in, we allocate its remaining bandwidth to the coflows in C′; for each active coflow C in P_i^in, Rem(P_i^in) is allocated to C's flows in their current r_ij ratios, subject to capacity constraints at the corresponding P_j^out.

Avoiding Perpetual Starvation Preemption to maintain an SEBF schedule while optimizing CCT may lead to starvation. To avoid perpetual starvation, we introduce the following simple adjustment to the offline algorithm. We fix tunable parameters T and δ (T ≫ δ) and alternate the overall algorithm (MINCCTONLINE) between time intervals of length T and δ. For a time period of length T, we use MINCCTOFFLINE to minimize CCT. At the end of time T, we consider all coflows in the system that have not received any service during the last T-interval (C_zero). We treat all of them as one collective coflow and apply MADD for a time period of length δ (lines 24–26 in Pseudocode 1). All coflows that were served during the last T-interval do not receive any service during this δ-interval. At the end of the δ-interval, we revert back, and repeat.

These adjustments ensure that all coflows receive non-zero service in every (T + δ) interval and eventually complete. This is similar to ensuring at least one ticket for each process in lottery scheduling [38]. Avoiding starvation comes at the cost of an increased average CCT: at every (T + δ) interval, the total CCT increases by at most |C|δ, and the increase depends on the ratio of T and δ (§6.2).

5.4 Inter-Coflow Scheduling to Guarantee Deadline

To guarantee a coflow's completion within its deadline (D_C), completing its bottlenecks as fast as possible has no benefits. A coflow can meet its deadline using minimum bandwidth as long as all flows finish exactly at the deadline. We can achieve that by setting r_ij = d_ij/D_C using MADD.

To provide guarantees in the online scenario, we introduce admission control (line 3 in Pseudocode 2). We admit a coflow C if and only if it can meet its deadline without violating that of any existing coflow. Specifically, we recalculate the minimum bandwidth required to complete all existing coflows within their deadlines (line 2 in Pseudocode 2) and check whether the minimum CCT of C satisfies Γ_C ≤ D_C; otherwise, C is rejected. An admitted coflow is never preempted, and a coflow is never rejected if it can safely be admitted. Hence, there is no risk of starvation. We use the backfilling procedure from before for work conservation.
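Continuing the same toy model (reusing the hypothetical Coflow and gamma definitions from the sketch above – none of this is Varys's actual code), the admission test of Pseudocode 2 can be written as: reserve, for every already-admitted coflow, the minimum rates d_ij/D_C that finish its flows exactly at its own deadline, and admit the newcomer only if its Γ on the residual bandwidth still fits within its deadline.

  object DeadlineAdmission {
    import SebfMadd.{Coflow, gamma}

    // Remaining capacity after each admitted coflow reserves d_ij / D_C,
    // the minimum rates that finish all of its flows exactly at its deadline.
    def residual(admitted: Seq[(Coflow, Double)],
                 ports: Map[Int, Double],
                 byIngress: Boolean): Map[Int, Double] =
      admitted.foldLeft(ports) { case (rem, (c, deadline)) =>
        c.demands.foldLeft(rem) { (r, d) =>
          val p = if (byIngress) d.in else d.out
          r.updated(p, r(p) - d.bytes / deadline)
        }
      }

    // Pseudocode 2: admit C if and only if Γ_C ≤ D_C on the residual bandwidth.
    def admit(c: Coflow, deadline: Double,
              admitted: Seq[(Coflow, Double)],
              ingress: Map[Int, Double],
              egress: Map[Int, Double]): Boolean =
      gamma(c,
            residual(admitted, ingress, byIngress = true),
            residual(admitted, egress, byIngress = false)) <= deadline
  }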

Table 2: The Coflow API.
VarysClient Methods                       | Caller
register(numFlows, [options]) ⇒ coflowId  | Driver
put(coflowId, dataId, content, [options]) | Sender
get(coflowId, dataId) ⇒ content           | Receiver
unregister(coflowId)                      | Driver

6 Design Details

We have implemented Varys in about 5,000 lines of Scala with extensive use of Akka [1] for messaging and the Kryo serialization library [5]. This section illustrates how frameworks interact with Varys (§6.1) and discusses how Varys schedules coflows (§6.2).

6.1 Varys Client Library: The Coflow API

Varys client library provides an API similar to DOT [36] to abstract away the underlying scheduling and communication mechanisms. Cluster frameworks (e.g., Spark, Hadoop, or Dryad) must create VarysClient objects to invoke the API and interact with Varys. User jobs, however, do not require any modifications.

The coflow API has four primary methods (Table 2). Framework drivers initialize a coflow through register(), which returns a unique coflowId from Varys master. numFlows is a hint for the scheduler on when to consider the coflow READY to be scheduled. Additional information or hints (e.g., coflow deadline or dependencies) can be given through options – an optional list of key-value pairs. The matching unregister() signals coflow completion.

A sender initiates the transfer of a content with an identifier dataId using put(). A content can be a file on disk or an object in memory. For example, a mapper would put() r pieces of content for r reducers in a MapReduce job, and the Spark driver would put() a common piece of data to be broadcast from memory to its workers (we omit the corresponding put() signatures for brevity). The dataId of a content is unique within a coflow. Any flow created to transfer dataId belongs to the coflow with the specified coflowId.

A receiver indicates its interest in a content using its dataId through get(). Only after receiving a get() request can the scheduler create and consider a flow for scheduling. Receiver tasks learn the dataIds of interest from their respective framework drivers. Varys scheduler determines when, from where, and at what rate to retrieve each requested dataId. VarysClient enforces the scheduler-determined rates at the application layer and notifies the master upon completion of get().

Usage Example Consider shuffle – the predominant communication pattern in cluster frameworks. Shuffle transfers the output of each task (mapper) in one computation stage to the tasks (reducers) in the next. The following example shows how to enable a 3 × 2 shuffle (with 3 mappers and 2 reducers) to take advantage of inter-coflow scheduling. Assume that all entities interacting with Varys have their own instances of VarysClient objects named client.

First, the driver registers the shuffle, indicating that Varys should consider it READY after receiving get() for all six flows.

  val cId = client.register(6)

When scheduling each task, the driver passes along the cId for the shuffle. Each mapper m uses cId when calling put() for each reducer r.

  // Read from DFS, run user-written map method,
  // and write intermediate data to disk.
  // Now, invoke the coflow API.
  for (r <- reducers)
    client.put(cId, dId-m-r, content-m-r)

To guarantee a co?ow’s completion within deadline (DC ), completing its bottlenecks as fast as possible has no bene?ts. A co?ow can meet its deadline using minimum bandwidth as long as all ?ows ?nish exactly at the deadline. We can achieve that by setting rij = dij /DC using MADD. To provide guarantees in the online scenario, we introduce admission control (line 3 in Pseudocode 2). We admit a co?ow C , if and only if it can meet its deadline without violating that of any existing co?ow. Speci?cally, we recalculate the minimum bandwidth required to complete all existing co?ows within their deadlines (line 2 in Pseudocode 2) and check if the minimum CCT of C , ΓC ≤ DC . Otherwise, C is rejected. An admitted co?ow is never preempted, and a co?ow is never rejected if it can safely be admitted. Hence, there is no risk of starvation. We use the back?lling procedure from before for work conservation.


In the snippet above, dId-m-r is an application-defined unique identifier for individual pieces of data, and content-m-r is the corresponding path on disk. This is an example of time-decoupled communication. Reducers use cId to retrieve the pieces of content shuffled by the mappers (served by Varys daemons).

  // Shuffle using the coflow API.
  for (m <- mappers)
    content-m-r = client.get(cId, dId-m-r)
  // Now, sort, combine, and write to DFS.

Once all reducers are done, the driver terminates the coflow.

  client.unregister(cId)

Note that the example abstracts away some details – e.g., the pipelining between the shuffle and sort phases in reducers, which can be handled by providing a VarysInputStream implementation. Replacing the communication layer of a data-intensive framework with just the aforementioned changes can enable all of its user jobs to take advantage of inter-coflow scheduling.

6.2 Inter-Coflow Scheduling in Varys

Pseudocode 3 Message Handlers in Varys Scheduler
 1: procedure ONCOFLOWREGISTER(Coflow C)
 2:   Mark C as UNREADY
 3:   C_cur = C_cur ∪ {C}            ▷ C_cur is the set of all coflows
 4: end procedure
 5: procedure ONCOFLOWUNREGISTER(Coflow C)
 6:   C_cur = C_cur \ {C}
 7:   C_ready = C_cur.filter(READY)
 8:   Call appropriate scheduler from Section 5
 9: end procedure
10: procedure ONFLOWPUT(Flow f, Coflow C)
11:   Update Size(C) and relevant data structures
12: end procedure
13: procedure ONFLOWGET(Flow f, Coflow C)
14:   Update relevant data structures
15:   Mark C as READY after get() is called numFlows times
16:   if C is READY then
17:     C_ready = C_cur.filter(READY)
18:     Call appropriate scheduler from Section 5
19:   end if
20: end procedure

Varys implements the algorithms in Section 5 for inter-coflow scheduling, which are invoked upon coflow arrival and completion events. Rem(.) is calculated from the aggregated measurements collected by Varys daemons. Pseudocode 3 lists the key event handlers in Varys. Initially, all coflows are marked UNREADY (ONCOFLOWREGISTER). The size of a coflow is updated as VarysClient instances periodically notify flow-related events using ONFLOWPUT and ONFLOWGET. A coflow is considered READY to be scheduled after numFlows get() calls (ONFLOWGET). Varys master groups the new allocations calculated by the scheduler by respective VarysClients and sends the changes asynchronously.

Choice of T and δ Values A smaller δ in Pseudocode 1 ensures a lower impact on the average CCT. However, too small a δ can cause the underlying transport protocol (e.g., TCP) to behave erratically due to significant variation of available bandwidth over short time intervals. We suggest δ on the order of O(100) milliseconds and T on the order of O(1) seconds.
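As a rough Scala illustration of the handlers in Pseudocode 3 (hypothetical types and names, not Varys's actual internals), the sketch below tracks how many get() calls a registered coflow has received and marks it READY – triggering a rescheduling pass – once the count reaches the numFlows hint from register():

  object SchedulerEvents {
    final class CoflowState(val numFlows: Int) {
      var gets: Int = 0
      var size: Long = 0L
      def ready: Boolean = gets >= numFlows  // READY after numFlows get() calls
    }

    private val coflows = scala.collection.mutable.Map[String, CoflowState]()

    def onCoflowRegister(id: String, numFlows: Int): Unit =
      coflows(id) = new CoflowState(numFlows)        // starts UNREADY

    def onFlowPut(id: String, bytes: Long): Unit =
      coflows(id).size += bytes                      // update Size(C)

    def onFlowGet(id: String): Unit = {
      val c = coflows(id)
      c.gets += 1
      if (c.gets == c.numFlows) reschedule()         // just became READY
    }

    def onCoflowUnregister(id: String): Unit = {
      coflows -= id
      reschedule()
    }

    private def reschedule(): Unit = {
      val ready = coflows.values.filter(_.ready)
      // Hand the READY set to SEBF+MADD (or the deadline scheduler) from §5.
      println(s"rescheduling ${ready.size} READY coflows")
    }
  }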

Table 3: Jobs binned by time spent in communication.
Shuffle Dur. | < 25% | 25–49% | 50–74% | ≥ 75%
% of Jobs    | 61%   | 13%    | 14%    | 12%

Table 4: Coflows binned by width and length.
Coflow Bin | Length | Width  | % of Coflows | % of Bytes
1 (SN)     | Short  | Narrow | 52%          | 0.01%
2 (LN)     | Long   | Narrow | 16%          | 0.67%
3 (SW)     | Short  | Wide   | 15%          | 0.22%
4 (LW)     | Long   | Wide   | 17%          | 99.10%

7 Evaluation

We evaluated Varys through a set of experiments on 100-machine EC2 [2] clusters using a Hive/MapReduce trace collected from a large production cluster at Facebook. For a larger-scale evaluation, we used a trace-driven simulator that performs a detailed replay of task logs from the same trace. The highlights are:

• For communication-dominated jobs, Varys improves the average (95th percentile) CCT and job completion time by up to 3.16× (3.84×) and 2.5× (2.94×), respectively, over per-flow fairness. Across all jobs, the improvements are 1.85× (1.74×) and 1.25× (1.15×) (§7.2).
• Simulations show that Varys improves the average (95th percentile) CCT by 5.53× (5.8×) over per-flow prioritization mechanisms (§7.2).
• Varys enables almost 2× more coflows to meet their deadlines in EC2 experiments (§7.3).
• All coflows eventually complete using Varys without starvation, and simulations show Varys to be 5.65× faster than a non-preemptive solution (7.7× at the 95th percentile) (§7.4).

7.1 Methodology

Workload Our workload is based on a Hive/MapReduce trace at Facebook that was collected on a 3000-machine cluster with 150 racks. The original cluster had a 10:1 core-to-rack oversubscription ratio with a total bisection bandwidth of 300 Gbps. We scale down jobs accordingly to match the maximum possible 100 Gbps bisection bandwidth of our deployment. During the derivation, we preserve the original workload's communication characteristics.

We consider jobs with non-zero shuffle and divide them into bins (Table 3) based on the fraction of their durations spent in shuffle. Table 4 divides the coflows in these jobs into four categories based on their characteristics. Based on the distributions of coflow lengths and widths (Figure 4), we consider a coflow to be short if its longest flow is less than 5 MB and narrow if it involves at most 50 flows.

For deadline-constrained experiments, we set the deadline of a coflow to be its minimum completion time in an empty network (Γ_empty) multiplied by (1 + U(0, x)), where U(0, x) is a uniformly random number between 0 and x. Unless otherwise specified, x = 1. The minimum deadline is 200 milliseconds (see the sketch at the end of this subsection).

Cluster Our experiments use extra-large high-memory EC2 instances, which appear to occupy entire physical machines and have enough memory to perform all experiments without introducing disk overheads. We observed bandwidths close to 800 Mbps per machine on clusters of 100 machines. We use a compute engine similar to Spark [41] that uses the coflow API. We use δ = 200 milliseconds and T = 2 seconds as defaults.

Simulator We use a trace-driven simulator to gain more insights into Varys's performance at a larger scale. The simulator performs a detailed task-level replay of the Facebook trace. It preserves input-to-output ratios of tasks, locality constraints, and inter-arrival times between jobs. It runs at 10s decision intervals for faster completion.
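For concreteness, the deadline assignment described under Workload can be sketched as follows – a hypothetical Scala helper mirroring the stated formula, with Γ_empty given in milliseconds:

  import scala.util.Random

  // Deadline = Γ_empty × (1 + U(0, x)), floored at 200 ms (§7.1).
  def assignDeadline(gammaEmptyMs: Double, x: Double = 1.0,
                     rng: Random = new Random()): Double =
    math.max(200.0, gammaEmptyMs * (1.0 + rng.nextDouble() * x))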



[Figure 6: [EC2] Average and 95th percentile improvements in job and communication completion times over per-flow fairness using Varys: (a) improvements in job completion times; (b) improvements in time spent in communication.]

[Figure 7: [EC2] Improvements in the average and 95th percentile CCTs using coflows w.r.t. the default per-flow fairness mechanism.]

Metrics Our primary metric for comparison is the improvement in the average completion times of coflows and jobs (a job completes when its last task finishes) in the workload, where

    Factor of Improvement = Current Duration / Modified Duration

For deadline-sensitive coflows, the primary metric is the percentage of coflows that meet their deadlines. The baseline for our deployment is TCP fair sharing. We compare the trace-driven simulator against per-flow fairness as well. Due to the lack of implementations of per-flow prioritization mechanisms [8, 25], we compare against them only in simulation.

7.2 Varys's Performance in Minimizing CCT

Figure 6a shows that inter-coflow scheduling reduced the average and 95th percentile completion times of communication-dominated jobs by up to 2.5× and 2.94×, respectively, in EC2 experiments. Corresponding average and 95th percentile improvements in the average CCT (CommTime) were up to 3.16× and 3.84× (Figure 6b). Note that the varying improvements in the average CCT across bins are not correlated, because CCT depends more on coflow characteristics than on those of jobs. However, as expected, jobs become increasingly faster as communication represents a higher fraction of their completion times. Across all bins, the average end-to-end completion times improved by 1.25× and the average CCT improved by 1.85×; corresponding 95th percentile improvements were 1.15× and 1.74×.

Figure 7 shows that Varys improves CCT for diverse coflow characteristics. Because bottlenecks are not directly correlated with a coflow's length or width, pairwise comparisons across bins – especially those involving bin-2 and bin-3 – are harder. We do observe more improvements for coflows in bin-1 than bin-4 in terms of the average CCT, even though their 95th percentile improvements contradict this. This is due to coordination overheads in Varys – recall that Varys does not handle small coflows to avoid fixed overheads.

Figure 8a presents comparative CDFs of CCTs for all coflows. Per-flow fairness performs better – 1.08× on average and 1.25× at the 95th percentile – only for some of the tiny, sub-second (<500 milliseconds) coflows, which still use TCP fair sharing.

[Figure 8: CCT distributions for Varys, per-flow fairness, and per-flow prioritization schemes (a) in the EC2 deployment and (b) in simulation. Note that the X-axes are on a logarithmic scale.]
[Figure 9: [Simulation] Improvements in the average and 95th percentile CCTs using inter-coflow scheduling.]

As coflows become larger, the advantages of coflow scheduling become more prominent. We elaborate on Varys's overheads next; later, we show simulation results that shed more light on the performance of small coflows in the absence of coordination overheads.

Overheads Control plane messages to and from the Varys master are the primary sources of overhead. Multiple messages from the same endpoint are batched whenever possible. At peak load, we observed a throughput of 4000+ messages/second at the master. The scheduling algorithm took 17 milliseconds on average to calculate new schedules on coflow arrival or departure. The average time to distribute new schedules across the cluster was 30 milliseconds. An additional source of overhead is the synchronization time before a coflow becomes READY for scheduling. Recall that a coflow waits for numFlows get() calls; hence, a single belated get() can block the entire coflow. In our experiments, the average duration to receive all get() calls was 44 milliseconds, with 151 milliseconds being the 95th percentile. A large fraction of these overheads could be avoided in the presence of in-network isolation of control plane messages [21].

Trace-Driven Simulation We compared the performance of inter-coflow scheduling against per-flow fairness and prioritization schemes in simulations. Without coordination overheads, the improvements are noticeably larger (Figure 9) – the average and 95th percentile CCTs improved by 3.66× and 2.77× over per-flow fairness and by 5.53× and 5.8× over per-flow prioritization.


[Figure 10: [EC2] Percentage of coflows that meet their deadline using Varys in comparison to per-flow fairness. Increased deadlines improve performance.]

[Figure 11: [Simulation] More coflows meet their deadline using inter-coflow scheduling than using per-flow fairness and prioritization schemes.]

Table 5: Four extreme coflow mixes from the Facebook trace.
      | # Coflows | %SN | %LN | %SW | %LW
Mix-N | 800       | 48% | 40% | 2%  | 10%
Mix-W | 369       | 22% | 15% | 24% | 39%
Mix-S | 526       | 50% | 17% | 10% | 23%
Mix-L | 272       | 39% | 27% | 3%  | 31%

Note that the comparative improvements for bin-1 w.r.t. other bins are significantly larger than in the experiments because of the absence of scheduler coordination overheads. We also observe larger absolute improvements in Figure 9 in comparison to the ones in Figure 7. Primary factors for this phenomenon include instant scheduling, zero-latency setup/cleanup/update of coflows, and perfectly timed flow arrivals (i.e., coflows are READY to be scheduled upon arrival) in the simulation. In the absence of these overheads, we see in Figure 8b that Varys can indeed outperform per-flow schemes even for sub-second coflows.

What About Per-Flow Prioritization? Figure 9 highlights that per-flow prioritization mechanisms are even worse (by 1.52×) than the per-flow fairness provided by TCP when optimizing CCTs. The primary reason is indiscriminate interleaving across coflows – while all flows make some progress under flow-level fairness, per-flow prioritization favors only the small flows irrespective of the progress of their parent coflows. However, as expected, flow-level prioritization is still 1.08× faster than per-flow fairness in terms of the average FCT. Figure 8b presents the distribution of CCTs using per-flow prioritization in comparison to other approaches.

How Far Are We From the Optimal? While finding the optimal schedule is infeasible, we tried to find an optimistic estimate of possible improvements by comparing against an offline 2-approximation combinatorial ordering heuristic for coflows without coupled resources [30]. We found that the average CCT did not change using the combinatorial approach. For bin-1 to bin-4, the changes were 1.14×, 0.96×, 1.46×, and 0.92×, respectively.

7.3 Varys's Performance for Deadline-Sensitive Coflows

Inter-coflow scheduling allowed almost 2× more coflows to complete within their deadlines in EC2 experiments (Figure 10) – 57% of coflows met their deadlines using Varys as opposed to 30% using the default mechanism. Coflows across different bins experienced similar results, which is expected because Varys does not differentiate between coflows when optimizing for deadlines.

Recall that because the original trace did not contain coflow-specific deadlines, we introduced them based on the minimum CCTs of coflows (§7.1). Hence, we did not expect a 100% admission rate. However, a quarter of the admitted coflows failed to meet their deadlines. This goes back to the lack of network support in estimating utilizations and enforcing Varys-determined allocations: Varys admitted more coflows than it should have, and these coflows missed their own deadlines and caused some others to miss theirs as well. Trace-driven simulations later shed more light on this. To understand how far off the failed coflows were, we analyzed whether they could complete with slightly longer deadlines. After doubling the deadlines, we found that almost 94% of the admitted coflows succeeded using Varys.

Trace-Driven Simulation In trace-driven simulations, for the default case (x = 1), Varys admitted 75% of the coflows, and all of them met their deadlines (Figure 11). Note that the admission rate is lower than in our experiments.


Prioritization schemes fared better than per-flow fairness, unlike when the objective was minimizing CCT: 59% of coflows completed within their deadlines in comparison to 52% using fair sharing. As we changed the deadlines of all coflows by varying x from 0.1 to 10, the comparative performance of all the approaches remained almost the same. Performance across bins was consistent as well.

What About Reservation Schemes? Because the impact of admission control is similar to reserving resources, we compared our performance with that of the Virtual Cluster (VC) abstraction [11], where all machines can communicate at the same maximum rate through a virtual non-blocking switch. The VC abstraction admitted and completed slightly fewer coflows (73%) than Varys (75%), because reservation using VCs is more conservative.

7.4 Impact of Preemption

While minimizing CCT, preemption-based mechanisms can starve certain coflows when the system is overloaded. Varys takes precautions (§5.3.4) to avoid such scenarios. As expected, we did not observe any perpetual starvation during experiments or simulations.

What About a Non-Preemptive Scheduler? Processing coflows in their arrival order (i.e., FIFO) avoids starvation [15]. However, simulations confirmed that head-of-line blocking significantly hurts performance – especially for the short coflows in bin-1 and bin-3. We found that processing coflows in FIFO order can result in 24.64×, 5.44×, 34.2×, and 5.03× slower completion times for bin-1 to bin-4, respectively. The average (95th percentile) CCT became 5.65× (7.7×) slower than that using Varys.

7.5 Impact on Network Utilization

To understand Varys's impact on network utilization, we compared the ratios of makespans in the original workload as well as the ones in Table 5. Given a fixed workload, a change in makespan means a change in aggregate network utilization. We did not observe significant changes in makespan in our EC2 experiments – the exact factors of improvement were 1.02×, 1.06×, 1.01×, 0.97×, and 1.03× for the five workloads. This is expected because, while Varys is not work-conserving at every point in time, its overall utilization is the same as non-coflow approaches. Makespans for both per-flow fairness and coflow-enabled schedules were the same in the trace-driven simulation.

7.6 Impact of Coflow Mix

To explore the impact of changes in the coflow mix, we selected four extreme hours (Table 5) from the trace and performed hour-long experiments on EC2. These hours were chosen based on the high percentage of certain types of coflows (e.g., narrow ones in Mix-N) during those periods. Figure 12 shows that the average CCT improves irrespective of the mix, albeit to varying degrees. Observations made earlier (§7.2) still hold for each mix. However, identifying the exact reason(s) for the different levels of improvement is difficult. This is due to the online nature of the experiments – the overall degree of improvement depends on the instantaneous interplay of concurrent coflows. We also did not observe any clear correlation between the number of coflows or workload size and the corresponding improvements.

7.7 Impact of Cluster/Network Load

So far we have evaluated Varys's improvements in online settings, where the number of concurrent coflows varied over time. To better understand the impact of network load, we used the same coflow mix as the original trace but varied the number of concurrent coflows in an offline setting. We see in Figure 13 that Varys's improvements increase with increased concurrency: per-flow mechanisms fall increasingly further behind as they ignore the structures of more coflows. Also, flow-level fairness consistently outperforms per-flow prioritization mechanisms in terms of the average CCT.

Deadline-Sensitive Coflows We performed a similar analysis for deadline-sensitive coflows. Because in this case Varys's performance depends on the arrival order, we randomized the coflow order across runs and present their average in Figure 14. We observe that as the number of coexisting coflows increases, a large fraction of coflows (37% for 100 concurrent coflows) still meet their deadlines using Varys, whereas per-flow mechanisms stop working completely even before that. Also, per-flow prioritization outperforms (however marginally) flow-level fairness for coflows with deadlines.


[Figure 12: [EC2] Improvements in the average CCT using coflows for different coflow mixes from Table 5.]
[Figure 13: [Simulation] Improvements in the average CCT for varying numbers of concurrent coflows.]

[Figure 14: [Simulation] Changes in the percentage of coflows that meet deadlines for varying numbers of concurrent coflows.]

8 Discussion

Scheduling With Unknown Flow Sizes Knowing or estimating exact flow sizes is difficult in frameworks that push data to the next stage as soon as possible [26], and without known flow sizes, preemption becomes impractical. FIFO scheduling can solve this problem, but it suffers from head-of-line blocking (§7.4). We believe that coflow-level fairness can be a nice compromise between these two extremes. However, the definition and associated properties of fairness at the coflow level remain an open problem.

Decentralized SEBF+MADD Varys's centralized design makes it less useful for small coflows (§3); however, small coflows contribute less than 1% of the traffic in data-intensive clusters (§4). Furthermore, in-network isolation of control plane messages [21] or faster signaling channels like RDMA [20] can reduce Varys's application-layer signaling overheads (§7.2) to support even smaller coflows. We see a decentralized approximation of our algorithms as the most viable way to make Varys useful for low-latency coflows. This requires new algorithms and possibly changes to network devices, unlike our application-layer design.

Handling Coflow Dependencies While most jobs require only a single coflow, dataflow pipelines (e.g., Dryad, Spark) can create multiple coflows with dependencies between them [16]. A simple approach to supporting coflow dependencies would be to order first by ancestry and then break ties using SEBF. Some variation of the Critical-Path Method [28] might perform even better. We leave it as a topic of future work. Note that dependencies can be passed along to the scheduler through options in the register() method, as sketched below.
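For example, a framework might convey such a dependency when registering the downstream coflow. The "parent" key below is purely illustrative, since the paper only specifies that options is a list of key-value pairs:

  // Hypothetical: register a coflow whose scheduling should respect
  // its dependency on a parent coflow; the "parent" key is illustrative.
  val childId = client.register(numFlows, List("parent" -> parentCoflowId))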


Figure 13: [Simulation] Improvements in the average CCT for varying numbers of concurrent coflows.
Deadline-Sensitive Coflows We performed a similar analysis for deadline-sensitive coflows. Because Varys's performance in this case depends on the arrival order, we randomized the coflow order across runs and present the averages in Figure 14. We observe that as the number of coexisting coflows increases, a large fraction of coflows (37% for 100 concurrent coflows) still meet their deadlines using Varys, whereas per-flow mechanisms stop working completely even before that point. Also, per-flow prioritization outperforms (however marginally) flow-level fairness for coflows with deadlines.

Figure 14: [Simulation] Changes in the percentage of coflows that meet deadlines for varying numbers of concurrent coflows.

8 Discussion

Scheduling With Unknown Flow Sizes Knowing or estimating exact flow sizes is difficult in frameworks that push data to the next stage as soon as possible [26], and without known flow sizes, preemption becomes impractical. FIFO scheduling avoids this problem, but it suffers from head-of-line blocking (§7.4). We believe that coflow-level fairness can be a nice compromise between these two extremes; however, the definition and associated properties of fairness at the coflow level remain an open problem.

Decentralized SEBF+MADD Varys's centralized design makes it less useful for small coflows (§3); however, small coflows contribute less than 1% of the traffic in data-intensive clusters (§4). Furthermore, in-network isolation of control plane messages [21] or faster signaling channels like RDMA [20] can reduce Varys's application-layer signaling overheads (§7.2) to support even smaller coflows. We see a decentralized approximation of our algorithms as the most viable way to make Varys useful for low-latency coflows. Unlike our application-layer design, this requires new algorithms and possibly changes to network devices.

Handling Coflow Dependencies While most jobs require only a single coflow, dataflow pipelines (e.g., Dryad, Spark) can create multiple coflows with dependencies between them [16]. A simple approach to supporting coflow dependencies would be to order coflows first by ancestry and then to break ties using SEBF. Some variation of the Critical-Path Method [28] might perform even better; we leave it as a topic of future work. Note that dependencies can be passed along to the scheduler through options in the register() method.

Multi-Wave Coflows Large jobs often schedule mappers in multiple waves [10]. A job can create separate coflows for each wave. Alternatively, if the job uses its wave width (i.e., the number of parallel mappers) as numFlows in register(), Varys can handle each wave separately. Applications can convey information about wave-induced coflows to the scheduler as dependencies (see the sketch below).
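The following sketch illustrates how a framework might pass wave widths and dependencies through register(). Only the register() and numFlows names come from the text above; the client class, option keys, and return values are hypothetical, shown purely for illustration.

```python
# Hypothetical illustration of conveying coflow structure to the
# scheduler via register(); only register() and numFlows appear in
# the text -- the class name, option keys, and bookkeeping are assumed.

class CoflowClientSketch:
    """Toy stand-in for a coflow scheduler client."""

    def __init__(self):
        self.coflows = {}
        self.next_id = 0

    def register(self, num_flows, options=None):
        """Register a coflow of `num_flows` parallel flows."""
        options = options or {}
        coflow_id = self.next_id
        self.next_id += 1
        self.coflows[coflow_id] = {
            "numFlows": num_flows,                 # wave width for multi-wave jobs
            "deadline": options.get("deadline"),   # optional deadline
            "parents": options.get("parents", []), # coflow dependencies
        }
        return coflow_id

client = CoflowClientSketch()
wave1 = client.register(num_flows=50)  # first wave of mappers
wave2 = client.register(num_flows=50, options={"parents": [wave1]})
```

A real implementation would validate the parent ids and expose them to the scheduler's ordering logic, e.g., ancestry first and then SEBF, as discussed above.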

In-Network Bottlenecks Varys performs well even when the network is not a non-blocking switch (§7). If likely bottleneck locations are known, e.g., rack-to-core links are typically oversubscribed [17], Varys can be extended to allocate rack-to-core bandwidth instead of NIC bandwidth. When bottlenecks are unknown, e.g., due to in-network failures, routing, or load imbalance, Varys can react based on bandwidth estimations collected by its daemons. Nonetheless, designing and deploying coflow-aware routing protocols and load balancing techniques remain open challenges.

9 Related Work

Coflow Schedulers Varys improves over Orchestra [15] in four major ways. First, Orchestra primarily optimizes individual coflows and uses FIFO among them, whereas Varys uses an efficient coflow scheduler to significantly outperform FIFO. Second, Varys supports deadlines and ensures guaranteed coflow completion. Third, Varys uses a rate-based approach instead of manipulating the number of TCP flows, which breaks if all coflows do not share the same endpoints. Finally, Varys supports coflows from multiple frameworks, just as Mesos [24] handles non-network resources. Baraat [19] is a FIFO-based decentralized coflow scheduler focusing on small coflows. It uses fair sharing to avoid head-of-line blocking and does not support deadlines. Furthermore, we formulate the coflow scheduling problem and analyze its characteristics.

Datacenter Traffic Management Hedera [7] manages flows using a centralized scheduler to increase network throughput, and MicroTE [12] adapts to traffic variations by leveraging their short-term predictability. However, both work with flows and are unsuitable for optimizing CCTs. Sinbad [17] uses endpoint-flexible transfers for load balancing. Once it makes network-aware placement decisions, Varys can optimize cross-rack write coflows.

High Capacity Networks Full bisection bandwidth topologies [22, 32] do not imply contention freedom. In the presence of skewed data and hotspot distributions [17], managing edge bandwidth is still necessary. Inter-coflow scheduling improves performance and predictability even in these high capacity networks.

Traffic Reduction Techniques Data locality [18], both disk [9, 40] and memory [10], reduces network usage only during reads. The amount of network usage due to intermediate data communication can be reduced by pushing filters toward the sources [6, 23]. Our approach is complementary; i.e., it can be applied to whatever data traverses the network after applying those techniques.

Network Sharing Among Tenants Fair sharing of network resources between multiple tenants has received considerable attention [11, 33, 35, 39]. Our work is complementary; we focus on optimizing the performance of concurrent coflows within a single administrative domain, instead of achieving fairness among competing entities. Moreover, we focus on performance and predictability as opposed to the more debated notion of fairness.

Concurrent Open Shop Scheduling Inter-coflow scheduling has its roots in the concurrent open shop scheduling problem [34], which is strongly NP-hard even for two machines. Even in the offline scenario, the best known result is a 2-approximation algorithm [30], and the problem is inapproximable within a factor strictly less than 6/5 if P ≠ NP [30]. Our setting differs in two ways. First, the machines are not independent; i.e., links are coupled because each flow involves both a source and a destination. Second, jobs are not known a priori; i.e., coflows arrive in an online fashion.

10 Concluding Remarks

The coflow abstraction [16] effectively enables application-aware network scheduling. We have implemented coflows in a system called Varys and introduced the concurrent open shop scheduling with coupled resources problem. To minimize coflow completion times (CCT), we proposed the SEBF heuristic to schedule coflows and the MADD algorithm to allocate bandwidth to their flows. Together, they decrease the average CCT without starving any coflow and maintain high network utilization. Through EC2 deployments and trace-driven simulations, we showed that Varys outperforms per-flow mechanisms by up to 3.16× and non-preemptive coflow schedulers by more than 5×. Furthermore, by applying MADD in conjunction with admission control, Varys allowed up to 2× more coflows to meet their deadlines in comparison to per-flow schemes. In conclusion, this paper is only a first step in understanding the intricacies of inter-coflow scheduling and opens up a variety of exciting research problems, including scheduling without knowing flow sizes, exploring the notion of coflow fairness, decentralizing the proposed algorithms, and handling coflow dependencies.

Acknowledgments

We thank Mohammad Alizadeh, Justine Sherry, Rachit Agarwal, Peter Bailis, Ganesh Ananthanarayanan, Tathagata Das, Ali Ghodsi, Gautam Kumar, David Zats, Matei Zaharia, the AMPLab members, our shepherd Nandita Dukkipati, and the anonymous reviewers of NSDI'14 and SIGCOMM'14 for useful feedback. This research is supported in part by NSF CISE Expeditions Award CCF-1139158, LBNL Award 7076018, and DARPA XData Award FA8750-12-2-0331, and gifts from Amazon Web Services, Google, SAP, The Thomas and Stacey Siebel Foundation, Apple, Inc., Cisco, Cloudera, EMC, Ericsson, Facebook, GameOnTalis, Guavus, HP, Huawei, Intel, Microsoft, NetApp, Pivotal, Splunk, Virdata, VMware, WANdisco, and Yahoo!.

11 References

[1] Akka. http://akka.io.
[2] Amazon EC2. http://aws.amazon.com/ec2.
[3] Apache Hadoop. http://hadoop.apache.org.
[4] Apache Hive. http://hive.apache.org.
[5] Kryo serialization library. https://code.google.com/p/kryo.
[6] S. Agarwal et al. Reoptimizing data parallel computing. In NSDI. 2012.
[7] M. Al-Fares et al. Hedera: Dynamic flow scheduling for data center networks. In NSDI. 2010.
[8] M. Alizadeh et al. pFabric: Minimal near-optimal datacenter transport. In SIGCOMM. 2013.
[9] G. Ananthanarayanan et al. Reining in the outliers in MapReduce clusters using Mantri. In OSDI. 2010.
[10] G. Ananthanarayanan et al. PACMan: Coordinated memory caching for parallel jobs. In NSDI. 2012.
[11] H. Ballani et al. Towards predictable datacenter networks. In SIGCOMM. 2011.
[12] T. Benson et al. MicroTE: Fine grained traffic engineering for data centers. In CoNEXT. 2011.
[13] D. Borthakur. The Hadoop distributed file system: Architecture and design. Hadoop Project Website, 2007.
[14] R. Chaiken et al. SCOPE: Easy and efficient parallel processing of massive datasets. In VLDB. 2008.
[15] M. Chowdhury et al. Managing data transfers in computer clusters with Orchestra. In SIGCOMM. 2011.
[16] M. Chowdhury et al. Coflow: A networking abstraction for cluster applications. In HotNets-XI, pages 31–36. 2012.
[17] M. Chowdhury et al. Leveraging endpoint flexibility in data-intensive clusters. In SIGCOMM. 2013.
[18] J. Dean et al. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150. 2004.
[19] F. Dogar et al. Decentralized task-aware scheduling for data center networks. Technical Report MSR-TR-2013-96, 2013.
[20] A. Dragojević et al. FaRM: Fast remote memory. In NSDI. 2014.
[21] A. D. Ferguson et al. Participatory networking: An API for application control of SDNs. In SIGCOMM. 2013.
[22] A. Greenberg et al. VL2: A scalable and flexible data center network. In SIGCOMM. 2009.
[23] Z. Guo et al. Spotting code optimizations in data-parallel pipelines through PeriSCOPE. In OSDI. 2012.
[24] B. Hindman et al. Mesos: A platform for fine-grained resource sharing in the data center. In NSDI. 2011.
[25] C.-Y. Hong et al. Finishing flows quickly with preemptive scheduling. In SIGCOMM. 2012.
[26] M. Isard et al. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, pages 59–72. 2007.
[27] N. Kang et al. Optimizing the "One Big Switch" abstraction in Software-Defined Networks. In CoNEXT. 2013.
[28] J. E. Kelley. Critical-path planning and scheduling: Mathematical basis. Operations Research, 9(3):296–320, 1961.
[29] G. Malewicz et al. Pregel: A system for large-scale graph processing. In SIGMOD. 2010.
[30] M. Mastrolilli et al. Minimizing the sum of weighted completion times in a concurrent open shop. Operations Research Letters, 38(5):390–395, 2010.
[31] N. McKeown et al. Achieving 100% throughput in an input-queued switch. IEEE Transactions on Communications, 47(8), 1999.
[32] R. N. Mysore et al. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In SIGCOMM, pages 39–50. 2009.
[33] L. Popa et al. FairCloud: Sharing the network in cloud computing. In SIGCOMM. 2012.
[34] T. A. Roemer. A note on the complexity of the concurrent open shop problem. Journal of Scheduling, 9(4):389–396, 2006.
[35] A. Shieh et al. Sharing the data center network. In NSDI. 2011.
[36] N. Tolia et al. An architecture for internet data transfer. In NSDI. 2006.
[37] L. G. Valiant. A bridging model for parallel computation. Communications of the ACM, 33(8):103–111, 1990.
[38] C. A. Waldspurger et al. Lottery scheduling: Flexible proportional-share resource management. In OSDI. 1994.
[39] D. Xie et al. The only constant is change: Incorporating time-varying network reservations in data centers. In SIGCOMM. 2012.
[40] M. Zaharia et al. Delay scheduling: A simple technique for achieving locality and fairness in cluster scheduling. In EuroSys. 2010.
[41] M. Zaharia et al. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI. 2012.

APPENDIX

A Problem Formulation and Complexity

Each coflow C(D) is a collection of flows over the datacenter backplane with P ingress and P egress ports, where the P × P matrix D = [d_{ij}]_{P×P} represents the structure of C. For each non-zero element d_{ij} ∈ D, a flow f_{ij} transfers d_{ij} amount of data from the i-th ingress port (P^{in}_i) to the j-th egress port (P^{out}_j) across the backplane at rate r_{ij}, which is determined by the scheduling algorithm. If C_k represents the time for all flows of the k-th coflow to finish and r^k_{ij}(t) the bandwidth allocated to f_{ij} of the k-th coflow at time t, the objective of minimizing CCT (O(·)) in the offline case can be represented as follows.

\begin{align}
\text{Minimize} \quad & \sum_{k=1}^{K} C_k & (2)\\
\text{Subject to} \quad & \sum_{j',k} r^{k}_{ij'}(t) \le 1 \quad \forall t, \forall i & (3)\\
& \sum_{i',k} r^{k}_{i'j}(t) \le 1 \quad \forall t, \forall j & (4)\\
& \sum_{t=1}^{C_k} r^{k}_{ij}(t) \ge d^{k}_{ij} \quad \forall i, j, k & (5)
\end{align}

The first two inequalities are the capacity constraints on ingress and egress ports. The third inequality ensures that all flows of the k-th coflow finish by time C_k. By introducing a binary variable U_k to denote whether a coflow finished within its deadline D_k, we can express the objective of maximizing the number of coflows that meet their deadlines (Z(·)) in the offline case as follows.

\begin{align}
\text{Maximize} \quad & \sum_{k=1}^{K} U_k & (6)
\end{align}

Subject to inequalities (3), (4), and (5), where

U_k = \begin{cases} 1 & C_k \le D_k \\ 0 & C_k > D_k \end{cases}
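To make the formulation concrete, the following is a minimal sketch, under the formulation's own assumptions (discrete time slots, unit port capacities), that checks a candidate rate allocation r^k_{ij}(t) against constraints (3)–(5) and reports each coflow's completion time C_k. The function and variable names are ours, for illustration only.

```python
import numpy as np

# Minimal sketch: validate a discrete-time rate allocation against
# constraints (3)-(5) above, assuming unit port capacities.
# rates[k][t] is the P x P rate matrix of coflow k in time slot t;
# demands[k] is its P x P data matrix D^k. Names are illustrative.

def check_allocation(rates, demands):
    K, T = len(demands), len(rates[0])
    for t in range(T):
        total = sum(rates[k][t] for k in range(K))
        # (3): each ingress port i sends at most its unit capacity
        assert np.all(total.sum(axis=1) <= 1 + 1e-9)
        # (4): each egress port j receives at most its unit capacity
        assert np.all(total.sum(axis=0) <= 1 + 1e-9)
    ccts = []
    for k in range(K):
        sent = np.zeros_like(demands[k], dtype=float)
        finish = None
        for t in range(T):
            sent += rates[k][t]
            # (5): coflow k completes once every flow has met its demand
            if finish is None and np.all(sent >= demands[k] - 1e-9):
                finish = t + 1
        ccts.append(finish)
    return ccts

# Two coflows on a 2 x 2 fabric, one unit-size flow each on disjoint
# port pairs; both can run at full rate and finish in a single slot.
demands = [np.array([[0.0, 1.0], [0.0, 0.0]]),
           np.array([[0.0, 0.0], [1.0, 0.0]])]
rates = [[demands[0]], [demands[1]]]  # one time slot per coflow
print(check_allocation(rates, demands))  # -> [1, 1]
```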
Optimizing either objective (O or Z) is NP-hard.

Theorem A.1 Even under the assumptions of Section 5.1, optimizing O or Z in the offline case is NP-hard for all P ≥ 2.

Proof Sketch We reduce the NP-hard concurrent open shop scheduling problem [34] to the coflow scheduling problem. Consider a network fabric with only 2 ingress and egress ports (P = 2), where all links have the same capacity (without loss of generality, we can let this capacity be 1). Since there are only 2 ports, all coflows are of the form C(D), where D = (d_{ij})_{i,j=1}^{2} is a 2 × 2 data matrix. Suppose that n coflows arrive at time 0, and let D^k = (d^k_{ij})_{i,j=1}^{2} be the matrix of the k-th coflow. Moreover, assume that d^k_{ij} = 0 whenever i = j. In other words, every coflow consists of only 2 flows: one sending data from ingress port P^{in}_1 to egress port P^{out}_2, and the other sending from ingress port P^{in}_2 to egress port P^{out}_1. Consider now an equivalent concurrent open shop scheduling problem with 2 identical machines (hence the same capacity). Suppose n jobs arrive at time 0, and the k-th job has d^k_{12} amount of work for machine 1 and d^k_{21} for machine 2. Since this is NP-hard [34], the coflow scheduling problem described above is NP-hard as well. ∎

Remark A.2 Given the close relation between concurrent open shop scheduling and coflow scheduling, it is natural to expect that techniques to express concurrent open shop scheduling as a mixed-integer program and to derive approximation algorithms via standard LP relaxations [30, 34] would readily extend to our case. However, they do not, because the coupled constraints (3) and (4) make permutation schedules suboptimal (Theorem C.1). We leave the investigation of these topics as future work.

B Tradeoffs in Optimizing CCT

Figure 15: Allocation of ingress port capacities (vertical axis) for the coflows in (a) on a 2 × 2 datacenter fabric for (b) a work-conserving and (c) a CCT-optimized schedule. While the former achieves higher utilization, the latter has a lower average CCT.

With Work Conservation Consider Figure 15a. Coflows C1 and C2 arrive at time 0 with one and two flows, respectively. Each flow transfers one unit of data. C3 arrives one time unit later and uses a single flow to send 0.5 data units. Figure 15b shows the work-conserving solution, which finishes in 2 time units for an average CCT of 1.67 time units. The optimal solution (Figure 15c), however, takes 2.5 time units for the same amount of data (i.e., it lowers utilization); still, it has a 1.11× lower average CCT (1.5 time units).

With Avoiding Starvation The tradeoff between minimum completion time and starvation is well known for flows (tasks) on individual links (machines): longer flows starve if a continuous stream of short flows keeps arriving. The same tradeoff holds for coflows, because the datacenter fabric and coflows generalize links and flows, respectively.
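The arithmetic behind these averages is easy to verify. In the sketch below, the work-conserving schedule finishes every coflow at time 2, as stated; the per-coflow finish times we use for the optimal schedule (C1 at 1, C3 at 2, C2 at 2.5) are our reading of Figure 15c, chosen only to be consistent with the stated 2.5 time-unit makespan and 1.5 time-unit average CCT.

```python
# Sketch reproducing the average-CCT arithmetic of the Figure 15
# example. A coflow's CCT is measured from its arrival to its finish.
# The optimal schedule's per-coflow finish times are assumptions
# consistent with the text, not values read off the paper's figure.

def average_cct(arrivals, finishes):
    """Average coflow completion time over all coflows."""
    return sum(f - a for a, f in zip(arrivals, finishes)) / len(arrivals)

arrivals = [0, 0, 1]          # C1 and C2 at time 0; C3 at time 1
work_conserving = [2, 2, 2]   # everything finishes by time 2
cct_optimal = [1, 2.5, 2]     # C1 first, then C3; C2 finishes last

print(round(average_cct(arrivals, work_conserving), 2))  # 1.67
print(round(average_cct(arrivals, cct_optimal), 2))      # 1.5
```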

C Ordering Properties of Coflow Schedules

Figure 16: Flow-interleaved allocation of ingress port capacities (vertical axis) for the coflows in (a) for CCT-optimality (b).

Theorem C.1 Permutation schedules are not optimal for minimizing the average CCT.

Proof Sketch Both permutation schedules – C1 before C2 and C2 before C1 – are suboptimal for the example in Figure 16a. ∎

Remark C.2 In Varys, SEBF would schedule C1 before C2 (arbitrarily breaking the tie), and iterative MADD would allocate the minimum bandwidth to quickly finish C1 and then give the remaining bandwidth to C2 (Figure 16c). The average CCT would be the same as the optimal for this example.
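To illustrate the minimum-bandwidth idea in Remark C.2, here is a one-coflow sketch of MADD-style rate allocation, assuming a non-blocking fabric with port capacities normalized to 1. It derives the coflow's duration from its most loaded port and gives every flow the smallest rate that finishes exactly at that time; whatever capacity remains can serve other coflows such as C2. This is a simplified rendering of the idea under our own naming, not the full iterative algorithm used by Varys.

```python
# One-coflow sketch of MADD-style allocation: slow every flow down to
# the coflow's bottleneck completion time, using minimum bandwidth.
# Assumes unit port capacities; all names here are ours.

def madd_rates(demands):
    P = len(demands)
    ingress_load = [sum(demands[i][j] for j in range(P)) for i in range(P)]
    egress_load = [sum(demands[i][j] for i in range(P)) for j in range(P)]
    # Duration is set by the most loaded ingress or egress port.
    duration = max(ingress_load + egress_load)
    # Minimum rate for each flow to finish exactly at `duration`.
    rates = [[demands[i][j] / duration for j in range(P)] for i in range(P)]
    return rates, duration

demands = [[0.0, 2.0], [1.0, 0.0]]  # flow 1->2: 2 units; flow 2->1: 1 unit
rates, duration = madd_rates(demands)
print(duration)  # 2.0 (ingress port 1 is the bottleneck)
print(rates)     # [[0.0, 1.0], [0.5, 0.0]] -- the 2->1 flow slowed to 0.5
```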
