
Parallel Computing Toolbox™ User's Guide

R2012b

How to Contact MathWorks

Web                  www.mathworks.com
Newsgroup            comp.soft-sys.matlab
Technical Support    www.mathworks.com/contact_TS.html

suggest@mathworks.com    Product enhancement suggestions
bugs@mathworks.com       Bug reports
doc@mathworks.com        Documentation error reports
service@mathworks.com    Order status, license renewals, passcodes
info@mathworks.com       Sales, pricing, and general information

508-647-7000 (Phone)
508-647-7001 (Fax)

The MathWorks, Inc.
3 Apple Hill Drive
Natick, MA 01760-2098
For contact information about worldwide offices, see the MathWorks Web site.

Parallel Computing Toolbox™ User's Guide
© COPYRIGHT 2004–2012 by The MathWorks, Inc.

The software described in this document is furnished under a license agreement. The software may be used or copied only under the terms of the license agreement. No part of this manual may be photocopied or reproduced in any form without prior written consent from The MathWorks, Inc.

FEDERAL ACQUISITION: This provision applies to all acquisitions of the Program and Documentation by, for, or through the federal government of the United States. By accepting delivery of the Program or Documentation, the government hereby agrees that this software or documentation qualifies as commercial computer software or commercial computer software documentation as such terms are used or defined in FAR 12.212, DFARS Part 227.72, and DFARS 252.227-7014. Accordingly, the terms and conditions of this Agreement and only those rights specified in this Agreement, shall pertain to and govern the use, modification, reproduction, release, performance, display, and disclosure of the Program and Documentation by the federal government (or other entity acquiring for or through the federal government) and shall supersede any conflicting contractual terms or conditions. If this License fails to meet the government's needs or is inconsistent in any respect with federal procurement law, the government agrees to return the Program and Documentation, unused, to The MathWorks, Inc.

Trademarks

MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See www.mathworks.com/trademarks for a list of additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.
Patents

MathWorks products are protected by one or more U.S. patents. Please see www.mathworks.com/patents for more information.

Revision History

November 2004    Online only    New for Version 1.0 (Release 14SP1+)
March 2005       Online only    Revised for Version 1.0.1 (Release 14SP2)
September 2005   Online only    Revised for Version 1.0.2 (Release 14SP3)
November 2005    Online only    Revised for Version 2.0 (Release 14SP3+)
March 2006       Online only    Revised for Version 2.0.1 (Release 2006a)
September 2006   Online only    Revised for Version 3.0 (Release 2006b)
March 2007       Online only    Revised for Version 3.1 (Release 2007a)
September 2007   Online only    Revised for Version 3.2 (Release 2007b)
March 2008       Online only    Revised for Version 3.3 (Release 2008a)
October 2008     Online only    Revised for Version 4.0 (Release 2008b)
March 2009       Online only    Revised for Version 4.1 (Release 2009a)
September 2009   Online only    Revised for Version 4.2 (Release 2009b)
March 2010       Online only    Revised for Version 4.3 (Release 2010a)
September 2010   Online only    Revised for Version 5.0 (Release 2010b)
April 2011       Online only    Revised for Version 5.1 (Release 2011a)
September 2011   Online only    Revised for Version 5.2 (Release 2011b)
March 2012       Online only    Revised for Version 6.0 (Release 2012a)
September 2012   Online only    Revised for Version 6.1 (Release 2012b)

Contents
1  Getting Started
    Product Description  1-2
        Key Features  1-2
    Parallel Computing with MathWorks Products  1-3
    Key Problems Addressed by Parallel Computing  1-4
        Run Parallel for-Loops (parfor)  1-4
        Execute Batch Jobs in Parallel  1-5
        Partition Large Data Sets  1-5
    Introduction to Parallel Solutions  1-6
        Interactively Run a Loop in Parallel  1-6
        Run a Batch Job  1-8
        Run a Batch Parallel Loop  1-9
        Run Script as Batch Job from the Current Folder Browser  1-11
        Distribute Arrays and Run SPMD  1-12
    Determine Product Installation and Versions  1-15

2  Parallel for-Loops (parfor)
    Getting Started with parfor  2-2
        parfor-Loops in MATLAB  2-2
        Deciding When to Use parfor  2-3
        Creating a parfor-Loop  2-4
        Differences Between for-Loops and parfor-Loops  2-6
        Reduction Assignments: Values Updated by Each Iteration  2-7
        Displaying Output  2-8
    Programming Considerations  2-9
        MATLAB Path  2-9
        Error Handling  2-9
        Limitations  2-10
        Using Objects in parfor Loops  2-15
        Performance Considerations  2-15
        Compatibility with Earlier Versions of MATLAB Software  2-16
    Advanced Topics  2-17
        About Programming Notes  2-17
        Classification of Variables  2-17
        Improving Performance  2-32

3  Single Program Multiple Data (spmd)
    Executing Simultaneously on Multiple Data Sets  3-2
        Introduction  3-2
        When to Use spmd  3-2
        Setting Up MATLAB Resources Using matlabpool  3-3
        Defining an spmd Statement  3-4
        Displaying Output  3-6
    Accessing Data with Composites  3-7
        Introduction  3-7
        Creating Composites in spmd Statements  3-7
        Variable Persistence and Sequences of spmd  3-9
        Creating Composites Outside spmd Statements  3-10
    Distributing Arrays  3-12
        Distributed Versus Codistributed Arrays  3-12
        Creating Distributed Arrays  3-12
        Creating Codistributed Arrays  3-13
    Programming Tips  3-15
        MATLAB Path  3-15
        Error Handling  3-15
        Limitations  3-15

4  Interactive Parallel Computation with pmode
    pmode Versus spmd  4-2
    Run Parallel Jobs Interactively Using pmode  4-3
    Parallel Command Window  4-11
    Running pmode Interactive Jobs on a Cluster  4-16
    Plotting Distributed Data Using pmode  4-17
    pmode Limitations and Unexpected Results  4-19
        Using Graphics in pmode  4-19
    pmode Troubleshooting  4-20
        Connectivity Testing  4-20
        Hostname Resolution  4-20
        Socket Connections  4-20

5  Math with Codistributed Arrays
    Nondistributed Versus Distributed Arrays  5-2
        Introduction  5-2
        Nondistributed Arrays  5-2
        Codistributed Arrays  5-4
    Working with Codistributed Arrays  5-6
        How MATLAB Software Distributes Arrays  5-6
        Creating a Codistributed Array  5-8
        Local Arrays  5-12
        Obtaining Information About the Array  5-13
        Changing the Dimension of Distribution  5-14
        Restoring the Full Array  5-15
        Indexing into a Codistributed Array  5-16
        2-Dimensional Distribution  5-18
    Looping Over a Distributed Range (for-drange)  5-22
        Parallelizing a for-Loop  5-22
        Codistributed Arrays in a for-drange Loop  5-23
    Using MATLAB Functions on Codistributed Arrays  5-26

6  Programming Overview
    How Parallel Computing Products Run a Job  6-2
        Overview  6-2
        Toolbox and Server Components  6-3
        Life Cycle of a Job  6-8
    Create Simple Independent Jobs  6-10
        Program a Job on a Local Cluster  6-10
    Cluster Profiles  6-12
        Cluster Profile Manager  6-12
        Discover Clusters  6-13
        Import and Export Cluster Profiles  6-13
        Create and Modify Cluster Profiles  6-15
        Validate Cluster Profiles  6-21
        Apply Cluster Profiles in Client Code  6-23
    Job Monitor  6-25
        Job Monitor GUI  6-25
        Manage Jobs Using the Job Monitor  6-26
        Identify Task Errors Using the Job Monitor  6-26
    Programming Tips  6-28
        Program Development Guidelines  6-28
        Current Working Directory of a MATLAB Worker  6-29
        Writing to Files from Workers  6-30
        Saving or Sending Objects  6-30
        Using clear functions  6-31
        Running Tasks That Call Simulink Software  6-31
        Using the pause Function  6-31
        Transmitting Large Amounts of Data  6-31
        Interrupting a Job  6-32
        Speeding Up a Job  6-32
    Control Random Number Streams  6-33
        Different Workers  6-33
        Client and Workers  6-34
        Client and GPU  6-35
        Worker CPU and Worker GPU  6-36
    Profiling Parallel Code  6-38
        Introduction  6-38
        Collecting Parallel Profile Data  6-38
        Viewing Parallel Profile Data  6-39
    Benchmarking Performance  6-49
        HPC Challenge Benchmarks  6-49
    Troubleshooting and Debugging  6-50
        Object Data Size Limitations  6-50
        File Access and Permissions  6-50
        No Results or Failed Job  6-52
        Connection Problems Between the Client and MJS  6-53
        SFTP Error: Received Message Too Long  6-54

ix

7  Program Independent Jobs
    Program Independent Jobs  7-2
    Use a Local Cluster  7-3
        Create and Run Jobs with a Local Cluster  7-3
        Local Cluster Behavior  7-7
    Use a Cluster with a Supported Scheduler  7-9
        Create and Run Jobs  7-9
        Share Code with the Workers  7-15
        Manage Objects in the Scheduler  7-20
    Use the Generic Scheduler Interface  7-24
        Overview  7-24
        MATLAB Client Submit Function  7-25
        Example — Write the Submit Function  7-29
        MATLAB Worker Decode Function  7-30
        Example — Write the Decode Function  7-33
        Example — Program and Run a Job in the Client  7-33
        Supplied Submit and Decode Functions  7-37
        Manage Jobs with Generic Scheduler  7-38
        Summary  7-42

8  Program Communicating Jobs
    Program Communicating Jobs  8-2
    Use a Cluster with a Supported Scheduler  8-4
        Schedulers and Conditions  8-4
        Code the Task Function  8-4
        Code in the Client  8-5
    Use the Generic Scheduler Interface  8-8
        Introduction  8-8
        Code in the Client  8-8
    Further Notes on Communicating Jobs  8-11
        Number of Tasks in a Communicating Job  8-11
        Avoid Deadlock and Other Dependency Errors  8-11

9  GPU Computing
    When to Use a GPU  9-2
        Capabilities  9-2
        Requirements  9-2
    Using gpuArray  9-3
        Transfer Data Between Workspace and GPU  9-3
        Create GPU Data Directly  9-4
        Examine gpuArray Characteristics  9-7
        Built-In Functions That Support gpuArray  9-8
    Execute MATLAB Code on a GPU  9-11
        MATLAB Code vs. gpuArray Objects  9-11
        Running Your MATLAB Functions on the GPU  9-11
        Example: Running Your MATLAB Code  9-12
        Supported MATLAB Code  9-13
    Identify and Select a GPU Device  9-16
        Example: Selecting a GPU  9-16
    Execute CUDA or PTX Code  9-18
        Create Kernels from CU Files  9-18
        Run the Kernel  9-19
        Determine Input and Output Correspondence  9-20
        Kernel Object Properties  9-21
        Specify Entry Points  9-21
        Provide C Prototype Input  9-22
        Complete Kernel Workflow  9-24
    GPU Characteristics and Limitations  9-26
        Data Types  9-26
        Complex Numbers  9-26

10  Object Reference
    Data  10-2
    Graphics Processing Unit  10-3
    Jobs and Tasks in a Cluster  10-4
    Generic Scheduler Interface Tools  10-5

11  Objects — Alphabetical List

12  Function Reference
    Parallel Code Execution  12-2
        Parallel Code on a MATLAB Pool  12-2
        Profiles, Input, and Output  12-2
        Interactive Functions  12-3
    Distributed and Codistributed Arrays  12-4
        Toolbox Functions  12-4
        Overloaded MATLAB Functions  12-5
    Jobs and Tasks  12-7
        Job Creation  12-7
        Job Management  12-8
        Task Execution Information  12-8
        Object Control  12-9
    Interlab Communication Within a Communicating Job  12-10
    Graphics Processing Unit  12-11
    Utilities  12-12

13  Functions — Alphabetical List

Glossary

Index


1
Getting Started
• "Product Description" on page 1-2
• "Parallel Computing with MathWorks Products" on page 1-3
• "Key Problems Addressed by Parallel Computing" on page 1-4
• "Introduction to Parallel Solutions" on page 1-6
• "Determine Product Installation and Versions" on page 1-15


Product Description
Perform parallel computations on multicore computers, GPUs, and computer clusters

Parallel Computing Toolbox™ lets you solve computationally and data-intensive problems using multicore processors, GPUs, and computer clusters. High-level constructs—parallel for-loops, special array types, and parallelized numerical algorithms—let you parallelize MATLAB® applications without CUDA or MPI programming. You can use the toolbox with Simulink® to run multiple simulations of a model in parallel. The toolbox provides twelve workers (MATLAB computational engines) to execute applications locally on a multicore desktop. Without changing the code, you can run the same application on a computer cluster or a grid computing service (using MATLAB Distributed Computing Server™). You can run parallel applications interactively or in batch.

Key Features
• Parallel for-loops (parfor) for running task-parallel algorithms on multiple processors
• Support for CUDA-enabled NVIDIA GPUs
• Ability to run twelve workers locally on a multicore desktop
• Computer cluster and grid support (with MATLAB Distributed Computing Server)
• Interactive and batch execution of parallel applications
• Distributed arrays and spmd (single-program-multiple-data) for large dataset handling and data-parallel algorithms

Parallel Computing with MathWorks Products
In addition to Parallel Computing Toolbox, MATLAB Distributed Computing Server software allows you to run as many MATLAB workers on a remote cluster of computers as your licensing allows. You can also use MATLAB Distributed Computing Server to run workers on your client machine if you want to run more than twelve local workers. Most MathWorks products let you code in such a way as to run applications in parallel. For example, Simulink models can run simultaneously in parallel, as described in "Run Parallel Simulations". MATLAB Compiler™ software lets you build and deploy parallel applications. Several MathWorks products offer built-in support for the parallel computing products, without requiring extra coding. For the current list of these products and their parallel functionality, see:
http://www.mathworks.com/products/parallel-computing/builtin-parallel-support.html


Key Problems Addressed by Parallel Computing
In this section...
• "Run Parallel for-Loops (parfor)" on page 1-4
• "Execute Batch Jobs in Parallel" on page 1-5
• "Partition Large Data Sets" on page 1-5

Run Parallel for-Loops (parfor)
Many applications involve multiple segments of code, some of which are repetitive. Often you can use for-loops to solve these cases. The ability to execute code in parallel, on one computer or on a cluster of computers, can significantly improve performance in many cases:

• Parameter sweep applications

  - Many iterations — A sweep might take a long time because it comprises many iterations. Each iteration by itself might not take long to execute, but to complete thousands or millions of iterations in serial could take a long time.

  - Long iterations — A sweep might not have a lot of iterations, but each iteration could take a long time to run.

  Typically, the only difference between iterations is defined by different input data. In these cases, the ability to run separate sweep iterations simultaneously can improve performance. Evaluating such iterations in parallel is an ideal way to sweep through large or multiple data sets. The only restriction on parallel loops is that no iterations be allowed to depend on any other iterations.

• Test suites with independent segments — For applications that run a series of unrelated tasks, you can run these tasks simultaneously on separate resources. You might not have used a for-loop for a case such as this comprising distinctly different tasks, but a parfor-loop could offer an appropriate solution.

Parallel Computing Toolbox software improves the performance of such loop execution by allowing several MATLAB workers to execute individual loop iterations simultaneously. For example, a loop of 100 iterations could run on a cluster of 20 MATLAB workers, so that simultaneously, the workers each execute only five iterations of the loop. You might not get quite 20 times improvement in speed because of communications overhead and network traffic, but the speedup should be significant. Even running local workers all on the same machine as the client, you might see significant performance improvement on a multicore/multiprocessor machine. So whether your loop takes a long time to run because it has many iterations or because each iteration takes a long time, you can improve your loop speed by distributing iterations to MATLAB workers.
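A parameter sweep of this kind can be sketched as follows. The sweep values and the computation inside the loop are hypothetical placeholders; the sketch assumes the matlabpool syntax shown later in this chapter.

```matlab
% Hypothetical parameter sweep: every iteration is independent,
% so parfor can hand the iterations out to the pool workers in any order.
matlabpool open local 3              % reserve three local workers

gains = 0.1:0.1:10;                  % placeholder sweep values
peakVal = zeros(size(gains));
parfor k = 1:numel(gains)
    % Each worker evaluates a different gain value; no iteration
    % reads the result of any other iteration.
    y = sin(gains(k) * linspace(0, 2*pi, 1000));
    peakVal(k) = max(y);
end

matlabpool close
```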

Execute Batch Jobs in Parallel
When working interactively in a MATLAB session, you can offload work to a MATLAB worker session to run as a batch job. The command to perform this job is asynchronous, which means that your client MATLAB session is not blocked, and you can continue your own interactive session while the MATLAB worker is busy evaluating your code. The MATLAB worker can run either on the same machine as the client, or if using MATLAB Distributed Computing Server, on a remote cluster machine.

Partition Large Data Sets
If you have an array that is too large for your computer’s memory, it cannot be easily handled in a single MATLAB session. Parallel Computing Toolbox software allows you to distribute that array among multiple MATLAB workers, so that each worker contains only a part of the array. Yet you can operate on the entire array as a single entity. Each worker operates only on its part of the array, and workers automatically transfer data between themselves when necessary, as, for example, in matrix multiplication. A large number of matrix operations and functions have been enhanced to work directly with these arrays without further modification; see “Using MATLAB Functions on Codistributed Arrays” on page 5-26 and “Using MATLAB Constructor Functions” on page 5-11.
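As a sketch of this idea, a large array can be created directly on the workers and used with ordinary matrix syntax. This assumes an open MATLAB pool and the distributed interface described in "Distribute Arrays and Run SPMD" on page 1-12; the distributed.rand constructor used here is an assumption for illustration.

```matlab
% Sketch: create large arrays directly on the pool workers and
% operate on them as single entities; the workers exchange data
% among themselves as needed (for example, in matrix multiplication).
A = distributed.rand(2000);   % 2000-by-2000, spread across the workers
B = distributed.rand(2000);
C = A * B;                    % computed in parallel on the workers
```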


Introduction to Parallel Solutions
In this section...
• "Interactively Run a Loop in Parallel" on page 1-6
• "Run a Batch Job" on page 1-8
• "Run a Batch Parallel Loop" on page 1-9
• "Run Script as Batch Job from the Current Folder Browser" on page 1-11
• "Distribute Arrays and Run SPMD" on page 1-12

Interactively Run a Loop in Parallel
This section shows how to modify a simple for-loop so that it runs in parallel. This loop does not have a lot of iterations, and it does not take long to execute, but you can apply the principles to larger loops. For these simple examples, you might not notice an increase in execution speed.
1 Suppose your code includes a loop to create a sine wave and plot the waveform:

   for i=1:1024
       A(i) = sin(i*2*pi/1024);
   end
   plot(A)
2 To interactively run code that contains a parallel loop, you first open a MATLAB pool. This reserves a collection of MATLAB worker sessions to run your loop iterations. The MATLAB pool can consist of MATLAB worker sessions running on your local machine or on a remote cluster:

   matlabpool open local 3

  In this example, local refers to the name of a cluster profile which specifies that the workers are to run on your local machine, not on a network; and the 3 specifies the number of workers to use for the pool.
3 With the MATLAB pool reserved, you can modify your code to run your loop in parallel by using a parfor statement:

   parfor i=1:1024
       A(i) = sin(i*2*pi/1024);
   end
   plot(A)

The only difference in this loop is the keyword parfor instead of for. After the loop runs, the results look the same as those generated from the previous for-loop.
[Figure: the MATLAB client distributes parfor loop iterations to multiple MATLAB workers]

Because the iterations run in parallel in other MATLAB sessions, each iteration must be completely independent of all other iterations. The worker calculating the value for A(100) might not be the same worker calculating A(500). There is no guarantee of sequence, so A(900) might be calculated before A(400). (The MATLAB Editor can help identify some problems with parfor code that might not contain independent iterations.) The only place where the values of all the elements of the array A are available is in the MATLAB client, after the data returns from the MATLAB workers and the loop completes.
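One common pattern that tolerates this unordered execution is a reduction assignment, where each iteration contributes to a single accumulated result (see "Reduction Assignments: Values Updated by Each Iteration" on page 2-7). A minimal sketch, assuming an open MATLAB pool:

```matlab
% Reduction sketch: the final sum is the same regardless of the
% order in which the workers execute the iterations.
total = 0;
parfor i = 1:1024
    total = total + sin(i*2*pi/1024);   % reduction assignment
end
disp(total)
```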
4 When you are finished with your code, close the MATLAB pool and release the workers:

   matlabpool close

For more information on parfor-loops, see “Parallel for-Loops (parfor)”.


The examples in this section run on three local workers. With cluster profiles, you can control how many workers run your loops, and whether the workers are local or on a cluster. For more information on profiles, see “Cluster Profiles” on page 6-12. You can run Simulink models in parallel loop iterations with the sim command inside your loop. For more information and examples of using Simulink with parfor, see “Run Parallel Simulations” in the Simulink documentation.

Run a Batch Job
To offload work from your MATLAB session to another session, you can use the batch command. This example uses the for-loop from the last section inside a script.
1 To create the script, type:

   edit mywave
2 In the MATLAB Editor, enter the text of the for-loop:

   for i=1:1024
       A(i) = sin(i*2*pi/1024);
   end
3 Save the file and close the Editor.

4 Use the batch command in the MATLAB Command Window to run your script on a separate MATLAB worker:

   job = batch('mywave')
[Figure: the MATLAB client submits a batch job to a MATLAB worker]

5 The batch command does not block MATLAB, so you must wait for the job to finish before you can retrieve and view its results:

   wait(job)


6 The load command transfers variables from the workspace of the worker to the workspace of the client, where you can view the results:

   load(job, 'A')
   plot(A)
7 When the job is complete, permanently remove its data:

   delete(job)

Run a Batch Parallel Loop
You can combine the abilities to offload a job and run a parallel loop. In the previous two examples, you modified a for-loop to make a parfor-loop, and you submitted a script with a for-loop as a batch job. This example combines the two to create a batch parfor-loop.
1 Open your script in the MATLAB Editor:

   edit mywave
2 Modify the script so that the for statement is a parfor statement:

   parfor i=1:1024
       A(i) = sin(i*2*pi/1024);
   end
3 Save the file and close the Editor.

4 Run the script in MATLAB with the batch command as before, but indicate that the script should use a MATLAB pool for the parallel loop:

   job = batch('mywave', 'matlabpool', 3)

  This command specifies that three workers (in addition to the one running the batch script) are to evaluate the loop iterations, so this example uses a total of four local workers.


[Figure: the MATLAB client submits a batch job to a worker, which runs the parfor loop on a pool of additional workers]

5 To view the results:

   wait(job)
   load(job, 'A')
   plot(A)

The results look the same as before; however, there are two important differences in execution:

• The work of defining the parfor-loop and accumulating its results is offloaded to another MATLAB session (batch).
• The loop iterations are distributed from one MATLAB worker to another set of workers running simultaneously (matlabpool and parfor), so the loop might run faster than having only one worker execute it.
6 When the job is complete, permanently remove its data:

delete(job)


Run Script as Batch Job from the Current Folder Browser
From the Current Folder browser, you can run a MATLAB script as a batch job by browsing to the file’s folder, right-clicking the file, and selecting Run Script as Batch Job. The batch job runs on the cluster identified by the current default cluster profile. The following figure shows the menu option to run the script file script1.m:

Running a script as a batch from the browser uses only one worker from the cluster. So even if the script contains a parfor loop or spmd block, it does not open an additional pool of workers on the cluster. These code blocks execute on the single worker used for the batch job. If your batch script requires opening an additional pool of workers, you can run it from the command line, as described in “Run a Batch Parallel Loop” on page 1-9.

When you run a batch job from the browser, this also opens the Job Monitor. The Job Monitor is a tool that lets you track your job in the scheduler queue. For more information about the Job Monitor and its capabilities, see “Job Monitor” on page 6-25.


Distribute Arrays and Run SPMD
Distributed Arrays
The workers in a MATLAB pool communicate with each other, so you can distribute an array among the workers. Each worker contains part of the array, and all the workers are aware of which portion of the array each worker has. First, open the MATLAB pool:
matlabpool open % Use default parallel profile

Use the distributed function to distribute an array among the workers:
M = magic(4)          % a 4-by-4 magic square in the client workspace
MM = distributed(M)

Now MM is a distributed array, equivalent to M, and you can manipulate or access its elements in the same way as any other array.
M2 = 2*MM;    % M2 is also distributed, calculation performed on workers
x = M2(1,1)   % x on the client is set to first element of M2

When you are finished and have no further need of data from the workers, you can close the MATLAB pool. Data on the workers does not persist from one instance of a MATLAB pool to another.
matlabpool close

Single Program Multiple Data
The single program multiple data (spmd) construct lets you define a block of code that runs in parallel on all the workers in the MATLAB pool. The spmd block can run on some or all the workers in the pool.
matlabpool   % Use default parallel profile
spmd         % By default uses all workers in the pool
    R = rand(4);
end


This code creates an individual 4-by-4 matrix, R, of random numbers on each worker in the pool.

Composites
Following an spmd statement, in the client context, the values from the block are accessible, even though the data is actually stored on the workers. On the client, these variables are called Composite objects. Each element of a Composite is a symbol referencing the value (data) on a worker in the pool. Note that because a variable might not be defined on every worker, a Composite might have undefined elements.

Continuing with the example from above, on the client, the Composite R has one element for each worker:
X = R{3}; % Set X to the value of R from worker 3.

The line above retrieves the data from worker 3 to assign the value of X. The following code sends data to worker 3:
X = X + 2;
R{3} = X;    % Send the value of X from the client to worker 3.

If the MATLAB pool remains open between spmd statements and the same workers are used, the data on each worker persists from one spmd statement to another.
spmd
    R = R + labindex    % Use values of R from previous spmd.
end

A typical use for spmd is to run the same code on a number of workers, each of which accesses a different set of data. For example:
spmd
    INP = load(['somedatafile' num2str(labindex) '.mat']);
    RES = somefun(INP)
end

Then the values of RES on the workers are accessible from the client as RES{1} from worker 1, RES{2} from worker 2, etc.


There are two forms of indexing a Composite, comparable to indexing a cell array:

• AA{n} returns the values of AA from worker n.
• AA(n) returns a cell array of the content of AA from worker n.

When you are finished with all spmd execution and have no further need of data from the workers, you can close the MATLAB pool.
matlabpool close

Although data persists on the workers from one spmd block to another as long as the MATLAB pool remains open, data does not persist from one instance of a MATLAB pool to another. For more information about using distributed arrays, spmd, and Composites, see “Distributed Arrays and SPMD”.
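To illustrate the two indexing forms described above, here is a short sketch (not an example from this guide; it assumes an open MATLAB pool with at least two workers):

```matlab
spmd
    R = labindex * 10;   % each worker stores a different scalar
end
x = R{2}   % brace indexing: the value held on worker 2 (here, 20)
c = R(2)   % parenthesis indexing: a 1-by-1 cell array containing that value
```

Brace indexing unwraps the value; parenthesis indexing keeps the cell wrapper, so c{1} equals x.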


Determine Product Installation and Versions
To determine if Parallel Computing Toolbox software is installed on your system, type this command at the MATLAB prompt.
ver

When you enter this command, MATLAB displays information about the version of MATLAB you are running, including a list of all toolboxes installed on your system and their version numbers. If you want to run your applications on a cluster, see your system administrator to verify that the version of Parallel Computing Toolbox you are using is the same as the version of MATLAB Distributed Computing Server installed on your cluster.
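You can also query one product programmatically; the following sketch assumes the toolbox registers with ver under the identifier 'distcomp', as it does in this release:

```matlab
v = ver('distcomp');   % Parallel Computing Toolbox entry, if installed
if isempty(v)
    disp('Parallel Computing Toolbox is not installed.')
else
    fprintf('%s, Version %s\n', v.Name, v.Version)
end
```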


2
Parallel for-Loops (parfor)
• “Getting Started with parfor” on page 2-2
• “Programming Considerations” on page 2-9
• “Advanced Topics” on page 2-17


Getting Started with parfor
In this section...
“parfor-Loops in MATLAB” on page 2-2
“Deciding When to Use parfor” on page 2-3
“Creating a parfor-Loop” on page 2-4
“Differences Between for-Loops and parfor-Loops” on page 2-6
“Reduction Assignments: Values Updated by Each Iteration” on page 2-7
“Displaying Output” on page 2-8

parfor-Loops in MATLAB
The basic concept of a parfor-loop in MATLAB software is the same as the standard MATLAB for-loop: MATLAB executes a series of statements (the loop body) over a range of values. Part of the parfor body is executed on the MATLAB client (where the parfor is issued) and part is executed in parallel on MATLAB workers. The necessary data on which parfor operates is sent from the client to workers, where most of the computation happens, and the results are sent back to the client and pieced together.

Because several MATLAB workers can be computing concurrently on the same loop, a parfor-loop can provide significantly better performance than its analogous for-loop.

Each execution of the body of a parfor-loop is an iteration. MATLAB workers evaluate iterations in no particular order, and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If there are more iterations than workers, some workers perform more than one loop iteration; in this case, a worker might receive multiple iterations at once to reduce communication time.


Deciding When to Use parfor
A parfor-loop is useful in situations where you need many loop iterations of a simple calculation, such as a Monte Carlo simulation. parfor divides the loop iterations into groups so that each worker executes some portion of the total number of iterations. parfor-loops are also useful when you have loop iterations that take a long time to execute, because the workers can execute iterations simultaneously.

You cannot use a parfor-loop when an iteration in your loop depends on the results of other iterations. Each iteration must be independent of all others. Since there is a communications cost involved in a parfor-loop, there might be no advantage to using one when you have only a small number of simple calculations. The examples of this section are only to illustrate the behavior of parfor-loops, not necessarily to show the applications best suited to them.
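For example, a Monte Carlo estimate of pi (a sketch, not an example from this guide) has exactly this shape: many independent iterations of a simple calculation, combined by a reduction variable:

```matlab
numTrials = 1e6;
hits = 0;
parfor k = 1:numTrials
    p = rand(1,2);                   % random point in the unit square
    hits = hits + (sum(p.^2) <= 1);  % count points inside the quarter circle
end
piEstimate = 4 * hits / numTrials
```

Without an open matlabpool, the same loop simply runs serially on the client.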


Creating a parfor-Loop
Set Up MATLAB Resources Using matlabpool
You use the function matlabpool to reserve a number of MATLAB workers for executing a subsequent parfor-loop. Depending on your scheduler, the workers might be running remotely on a cluster, or they might run locally on your MATLAB client machine. You identify a cluster by selecting a cluster profile. For a description of how to manage and use profiles, see “Cluster Profiles” on page 6-12. To begin the examples of this section, allocate local MATLAB workers for the evaluation of your loop iterations:
matlabpool

This command starts the number of MATLAB worker sessions defined by the default cluster profile. If the local profile is your default and does not specify the number of workers, this starts one worker per core (maximum of twelve) on your local MATLAB client machine.

Note: If matlabpool is not running, a parfor-loop runs serially on the client without regard for iteration sequence.


Program the Loop
The safest assumption about a parfor-loop is that each iteration of the loop is evaluated by a different MATLAB worker. If you have a for-loop in which all iterations are completely independent of each other, this loop is a good candidate for a parfor-loop. Basically, if one iteration depends on the results of another iteration, these iterations are not independent and cannot be evaluated in parallel, so the loop does not lend itself easily to conversion to a parfor-loop.

The following examples produce equivalent results, with a for-loop on the left, and a parfor-loop on the right. Try typing each in your MATLAB Command Window:

% for-loop version:
clear A
for i = 1:8
    A(i) = i;
end
A

% parfor-loop version:
clear A
parfor i = 1:8
    A(i) = i;
end
A

Notice that each element of A is equal to its index. The parfor-loop works because each element depends only upon its iteration of the loop, and upon no other iterations. for-loops that merely repeat such independent tasks are ideal candidates for parfor-loops.


Differences Between for-Loops and parfor-Loops
Because parfor-loops are not quite the same as for-loops, there are special behaviors to be aware of. As seen from the preceding example, when you assign to an array variable (such as A in that example) inside the loop by indexing with the loop variable, the elements of that array are available to you after the loop, much the same as with a for-loop.

However, suppose you use a nonindexed variable inside the loop, or a variable whose indexing does not depend on the loop variable i. Try these examples and notice the values of d and i afterward:

% for-loop version:
clear A
d = 0; i = 0;
for i = 1:4
    d = i*2;
    A(i) = d;
end
A
d
i

% parfor-loop version:
clear A
d = 0; i = 0;
parfor i = 1:4
    d = i*2;
    A(i) = d;
end
A
d
i

Although the elements of A come out the same in both of these examples, the value of d does not. In the for-loop above on the left, the iterations execute in sequence, so afterward d has the value it held in the last iteration of the loop. In the parfor-loop on the right, the iterations execute in parallel, not in sequence, so it would be impossible to assign d a definitive value at the end of the loop. This also applies to the loop variable, i. Therefore, parfor-loop behavior is defined so that it does not affect the values d and i outside the loop at all, and their values remain the same before and after the loop. So, a parfor-loop requires that each iteration be independent of the other iterations, and that all code that follows the parfor-loop not depend on the loop iteration sequence.


Reduction Assignments: Values Updated by Each Iteration
The next two examples show parfor-loops using reduction assignments. A reduction is an accumulation across iterations of a loop. The example on the left uses x to accumulate a sum across 10 iterations of the loop. The example on the right generates a concatenated array, 1:10. In both of these examples, the execution order of the iterations on the workers does not matter: while the workers calculate individual results, the client properly accumulates or assembles the final loop result.

x = 0;
parfor i = 1:10
    x = x + i;
end
x

x2 = [];
n = 10;
parfor i = 1:n
    x2 = [x2, i];
end
x2

If the loop iterations operate in random sequence, you might expect the concatenation sequence in the example on the right to be nonconsecutive. However, MATLAB recognizes the concatenation operation and yields deterministic results.

The next example, which attempts to compute Fibonacci numbers, is not a valid parfor-loop because the value of an element of f in one iteration depends on the values of other elements of f calculated in other iterations.
f = zeros(1,50);
f(1) = 1;
f(2) = 2;
parfor n = 3:50
    f(n) = f(n-1) + f(n-2);
end

When you are finished with your loop examples, clear your workspace and close or release your pool of workers:
clear
matlabpool close

The following sections provide further information regarding programming considerations and limitations for parfor-loops.


Displaying Output
When running a parfor-loop on a MATLAB pool, all command-line output from the workers displays in the client Command Window, except output from variable assignments. Because the workers are MATLAB sessions without displays, any graphical output (for example, figure windows) from the pool does not display at all.


Programming Considerations
In this section...
“MATLAB Path” on page 2-9
“Error Handling” on page 2-9
“Limitations” on page 2-10
“Using Objects in parfor Loops” on page 2-15
“Performance Considerations” on page 2-15
“Compatibility with Earlier Versions of MATLAB Software” on page 2-16

MATLAB Path
All workers executing a parfor-loop must have the same MATLAB search path as the client, so that they can execute any functions called in the body of the loop. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the matlabpool reference page. When the workers are running on a different platform than the client, use the function pctRunOnAll to properly set the MATLAB search path on all workers.

Function files that contain parfor-loops must be available on the search path of the workers in the pool running the parfor, or made available to the workers by the AttachedFiles or AdditionalPaths setting of the MATLAB pool.
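For example (a sketch; the folder shown is hypothetical), to add a folder to the search path of the client and of every worker in the open pool in one step:

```matlab
pctRunOnAll addpath('/shared/project/code')   % hypothetical shared folder
```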

Error Handling
When an error occurs during the execution of a parfor-loop, all iterations that are in progress are terminated, new ones are not initiated, and the loop terminates. Errors and warnings produced on workers are annotated with the worker ID and displayed in the client’s Command Window in the order in which they are received by the client MATLAB.


The behavior of lastwarn is unspecified at the end of the parfor if used within the loop body.

Limitations
Unambiguous Variable Names
If you use a name that MATLAB cannot unambiguously distinguish as a variable inside a parfor-loop, at parse time MATLAB assumes you are referencing a function. Then at run-time, if the function cannot be found, MATLAB generates an error. (See “Variable Names” in the MATLAB documentation.) For example, in the following code f(5) could refer either to the fifth element of an array named f, or to a function named f with an argument of 5. If f is not clearly defined as a variable in the code, MATLAB looks for the function f on the path when the code runs.
parfor i=1:n
    ...
    a = f(5);
    ...
end

Transparency
The body of a parfor-loop must be transparent, meaning that all references to variables must be “visible” (i.e., they occur in the text of the program). In the following example, because X is not visible as an input variable in the parfor body (only the string 'X' is passed to eval), it does not get transferred to the workers. As a result, MATLAB issues an error at run time:
X = 5;
parfor ii = 1:4
    eval('X');
end

Similarly, you cannot clear variables from a worker’s workspace by executing clear inside a parfor statement:
parfor ii = 1:4


    <statements...>
    clear('X')    % cannot clear: transparency violation
    <statements...>
end

As a workaround, you can free up most of the memory used by a variable by setting its value to empty, presumably when it is no longer needed in your parfor statement:
parfor ii = 1:4
    <statements...>
    X = [];
    <statements...>
end

Examples of some other functions that violate transparency are evalc, evalin, and assignin with the workspace argument specified as 'caller'; save and load, unless the output of load is assigned to a variable.

Running a script from within a parfor-loop can cause a transparency violation if the script attempts to access (read or write) variables of the parent workspace; to avoid this issue, convert the script to a function and call it with the necessary variables as input or output arguments.

MATLAB does successfully execute eval and evalc statements that appear in functions called from the parfor body.

Sliced Variables Referencing Function Handles
Because of the way sliced input variables are segmented and distributed to the workers in the pool, you cannot use a sliced input variable to reference a function handle. If you need to call a function handle with the parfor index variable as an argument, use feval. For example, suppose you had a for-loop that performs:
B = @sin;
for ii = 1:100
    A(ii) = B(ii);
end


A corresponding parfor-loop does not allow B to reference a function handle. So you can work around the problem with feval:
B = @sin;
parfor ii = 1:100
    A(ii) = feval(B, ii);
end

Nondistributable Functions
If you use a function that is not strictly computational in nature (e.g., input, plot, keyboard) in a parfor-loop or in any function called by a parfor-loop, the behavior of that function occurs on the worker. The results might include hanging the worker process or having no visible effect at all.

Nested Functions
The body of a parfor-loop cannot make reference to a nested function. However, it can call a nested function by means of a function handle.
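A sketch of this workaround (the function names are illustrative, not from this guide):

```matlab
function A = parSquares(n)
% The parfor body may not reference the nested function directly,
% but it may call it through a function handle.
    fh = @nested;          % handle created outside the loop body
    A = zeros(1, n);
    parfor k = 1:n
        A(k) = fh(k);      % allowed: call through the handle
    end

    function y = nested(x)
        y = x^2;
    end
end
```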

Nested Loops
The body of a parfor-loop cannot contain another parfor-loop. But it can call a function that contains another parfor-loop. However, because a worker cannot open a MATLAB pool, a worker cannot run the inner nested parfor-loop in parallel. This means that only one level of nested parfor-loops can run in parallel. If the outer loop runs in parallel on a MATLAB pool, the inner loop runs serially on each worker. If the outer loop runs serially in the client (e.g., parfor specifying zero workers), the function that contains the inner loop can run the inner loop in parallel on workers in a pool.

The body of a parfor-loop can contain for-loops. You can use the inner loop variable for indexing the sliced array, but only if you use the variable in plain form, not part of an expression. For example:
A = zeros(4,5);
parfor j = 1:4
    for k = 1:5
        A(j,k) = j + k;


    end
end
A

Further nesting of for-loops with a parfor is also allowed.

Limitations of Nested for-Loops. For proper variable classification, the range of a for-loop nested in a parfor must be defined by constant numbers or variables. In the following example, the code on the left does not work because the for-loop upper limit is defined by a function call. The code on the right works around this by defining a broadcast or constant variable outside the parfor first:

% Does not work:
A = zeros(100, 200);
parfor i = 1:size(A, 1)
    for j = 1:size(A, 2)
        A(i, j) = plus(i, j);
    end
end

% Works:
A = zeros(100, 200);
n = size(A, 2);
parfor i = 1:size(A,1)
    for j = 1:n
        A(i, j) = plus(i, j);
    end
end

When using the nested for-loop variable for indexing the sliced array, you must use the variable in plain form, not as part of an expression. For example, the following code on the left does not work, but the code on the right does:

% Does not work:
A = zeros(4, 11);
parfor i = 1:4
    for j = 1:10
        A(i, j + 1) = i + j;
    end
end

% Works:
A = zeros(4, 11);
parfor i = 1:4
    for j = 2:11
        A(i, j) = i + j - 1;
    end
end

If you use a nested for-loop to index into a sliced array, you cannot use that array elsewhere in the parfor-loop. In the following example, the code on the left does not work because A is sliced and indexed inside the nested for-loop; the code on the right works because v is assigned to A outside the nested loop:


% Does not work:
A = zeros(4, 10);
parfor i = 1:4
    for j = 1:10
        A(i, j) = i + j;
    end
    disp(A(i, 1))
end

% Works:
A = zeros(4, 10);
parfor i = 1:4
    v = zeros(1, 10);
    for j = 1:10
        v(j) = i + j;
    end
    disp(v(1))
    A(i, :) = v;
end

Inside a parfor, if you use multiple for-loops (not nested inside each other) to index into a single sliced array, they must loop over the same range of values. In the following example, the code on the left does not work because j and k loop over different values; the code on the right works to index different portions of the sliced array A:

% Does not work:
A = zeros(4, 10);
parfor i = 1:4
    for j = 1:5
        A(i, j) = i + j;
    end
    for k = 6:10
        A(i, k) = pi;
    end
end

% Works:
A = zeros(4, 10);
parfor i = 1:4
    for j = 1:10
        if j < 6
            A(i, j) = i + j;
        else
            A(i, j) = pi;
        end
    end
end

Nested spmd Statements
The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop.

Break and Return Statements
The body of a parfor-loop cannot contain break or return statements.


Global and Persistent Variables
The body of a parfor-loop cannot contain global or persistent variable declarations.

Handle Classes
Changes made to handle classes on the workers during loop iterations are not automatically propagated to the client.

P-Code Scripts
You can call P-code script files from within a parfor-loop, but a P-code script cannot contain a parfor-loop.

Using Objects in parfor Loops
If you are passing objects into or out of a parfor-loop, the objects must properly facilitate being saved and loaded. For more information, see “Understanding the Save and Load Process”.

Performance Considerations
Slicing Arrays
If a variable is initialized before a parfor-loop, then used inside the parfor-loop, it has to be passed to each MATLAB worker evaluating the loop iterations. Only those variables used inside the loop are passed from the client workspace. However, if all occurrences of the variable are indexed by the loop variable, each worker receives only the part of the array it needs. For more information, see “Where to Create Arrays” on page 2-32.
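As a sketch of the difference (variable names are illustrative): in the first loop below, D is indexed by the loop variable, so each worker receives only the columns it needs; in the second, D is used whole, so the entire array is sent to every worker:

```matlab
D = rand(1000, 8);
parfor k = 1:8
    colsum(k) = sum(D(:,k));    % D is sliced
end
parfor k = 1:8
    offset(k) = sum(D(:)) + k;  % D is broadcast
end
```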

Local vs. Cluster Workers
Running your code on local workers might offer the convenience of testing your application without requiring the use of cluster resources. However, there are certain drawbacks or limitations with using local workers. Because the transfer of data does not occur over the network, transfer behavior on local workers might not be indicative of how it will typically occur over a network. For more details, see “Optimizing on Local vs. Cluster Workers” on page 2-33.


Compatibility with Earlier Versions of MATLAB Software
In versions of MATLAB prior to 7.5 (R2007b), the keyword parfor designated a more limited style of parfor-loop than what is available in MATLAB 7.5 and later. This old style was intended for use with codistributed arrays (such as inside an spmd statement or a parallel job), and has been replaced by a for-loop that uses drange to define its range; see “Looping Over a Distributed Range (for-drange)” on page 5-22. The past and current functionality of the parfor keyword is outlined in the following table:

Functionality: Parallel loop for codistributed arrays
  Syntax prior to MATLAB 7.5:
    parfor i = range
        loop body
        ...
    end
  Current syntax:
    for i = drange(range)
        loop body
        ...
    end

Functionality: Parallel loop for implicit distribution of work
  Syntax prior to MATLAB 7.5: Not implemented
  Current syntax:
    parfor i = range
        loop body
        ...
    end


Advanced Topics
In this section...
“About Programming Notes” on page 2-17
“Classification of Variables” on page 2-17
“Improving Performance” on page 2-32

About Programming Notes
This section presents guidelines and restrictions in shaded boxes like the one shown below. Those labeled as Required result in an error if your parfor code does not adhere to them. MATLAB software catches some of these errors at the time it reads the code, and others when it executes the code. These are referred to here as static and dynamic errors, respectively, and are labeled as Required (static) or Required (dynamic). Guidelines that do not cause errors are labeled as Recommended. You can use MATLAB Code Analyzer to help make your parfor-loops comply with these guidelines.

Required (static): Description of the guideline or restriction

Classification of Variables
• “Overview” on page 2-17
• “Loop Variable” on page 2-18
• “Sliced Variables” on page 2-19
• “Broadcast Variables” on page 2-23
• “Reduction Variables” on page 2-23
• “Temporary Variables” on page 2-30

Overview
When a name in a parfor-loop is recognized as referring to a variable, it is classified into one of the following categories. A parfor-loop generates an error if it contains any variables that cannot be uniquely categorized or if any variables violate their category restrictions.

Classification   Description
Loop             Serves as a loop index for arrays
Sliced           An array whose segments are operated on by different iterations of the loop
Broadcast        A variable defined before the loop whose value is used inside the loop, but never assigned inside the loop
Reduction        Accumulates a value across iterations of the loop, regardless of iteration order
Temporary        Variable created inside the loop, but unlike sliced or reduction variables, not available outside the loop

Each of these variable classifications appears in this code fragment:

[Annotated code fragment, labeling: loop variable, sliced input variable, broadcast variable, temporary variable, reduction variable, sliced output variable]
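The annotated figure is not reproduced here, but the following sketch (not the manual's figure) marks each classification in comments:

```matlab
a = 0; c = pi; z = 0;
r = rand(1,10);
parfor ii = 1:10         % ii: loop variable
    a = ii;              % a: temporary variable
    z = z + ii;          % z: reduction variable
    b(ii) = r(ii) + c;   % b: sliced output, r: sliced input, c: broadcast
end
```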

Loop Variable
The following restriction is required, because changing i in the parfor body invalidates the assumptions MATLAB makes about communication between the client and workers.


Required (static): Assignments to the loop variable are not allowed.

This example attempts to modify the value of the loop variable i in the body of the loop, and thus is invalid:

parfor i = 1:n
    i = i + 1;
    a(i) = i;
end

Sliced Variables
A sliced variable is one whose value can be broken up into segments, or slices, which are then operated on separately by workers and by the MATLAB client. Each iteration of the loop works on a different slice of the array. Using sliced variables is important because this type of variable can reduce communication between the client and workers. Only those slices needed by a worker are sent to it, and only when it starts working on a particular range of indices. In the next example, a slice of A consists of a single element of that array:
parfor i = 1:length(A)
    B(i) = f(A(i));
end

Characteristics of a Sliced Variable. A variable in a parfor-loop is sliced if it has all of the following characteristics. A description of each characteristic follows the list:

• Type of First-Level Indexing: The first level of indexing is either parentheses, (), or braces, {}.
• Fixed Index Listing: Within the first-level parentheses or braces, the list of indices is the same for all occurrences of a given variable.
• Form of Indexing: Within the list of indices for the variable, exactly one index involves the loop variable.
• Shape of Array: In assigning to a sliced variable, the right-hand side of the assignment is not [] or '' (these operators indicate deletion of elements).


Type of First-Level Indexing. For a sliced variable, the first level of indexing is enclosed in either parentheses, (), or braces, {}. This table lists the forms for the first level of indexing for arrays sliced and not sliced.

Reference for Variable Not Sliced    Reference for Sliced Variable
A.x                                  A(...)
A.(...)                              A{...}

After the first level, you can use any type of valid MATLAB indexing in the second and further levels. The variable A shown here on the left is not sliced; that shown on the right is sliced:
A.q{i,12}    % not sliced
A{i,12}.q    % sliced

Fixed Index Listing. Within the first-level parentheses or braces of a sliced variable’s indexing, the list of indices is the same for all occurrences of a given variable. The variable A shown here on the left is not sliced because A is indexed by i and i+1 in different places; that shown on the right is sliced:

parfor i = 1:k
    B(:) = h(A(i), A(i+1));   % A is not sliced
end

parfor i = 1:k
    B(:) = f(A(i));           % A is sliced
    C(:) = g(A{i});
end

The example above on the right shows some occurrences of a sliced variable with first-level parenthesis indexing and with first-level brace indexing in the same loop. This is acceptable. Form of Indexing. Within the list of indices for a sliced variable, one of these indices is of the form i, i+k, i-k, k+i, or k-i, where i is the loop variable and


k is a constant or a simple (nonindexed) broadcast variable; and every other index is a constant, a simple broadcast variable, colon, or end.

With i as the loop variable, the A variables shown here on the left are not sliced; those on the right are sliced:

A(i+f(k),j,:,3)    % not sliced
A(i,20:30,end)     % not sliced
A(i,:,s.field1)    % not sliced

A(i+k,j,:,3)    % sliced
A(i,:,end)      % sliced
A(i,:,k)        % sliced

When you use other variables along with the loop variable to index an array, you cannot set these variables inside the loop. In effect, such variables are constant over the execution of the entire parfor statement. You cannot combine the loop variable with itself to form an index expression. Shape of Array. A sliced variable must maintain a constant shape. The variable A shown here on either line is not sliced:
A(i,:) = [];
A(end + 1) = i;

The reason A is not sliced in either case is because changing the shape of a sliced array would violate assumptions governing communication between the client and workers. Sliced Input and Output Variables. All sliced variables have the characteristics of being input or output. A sliced variable can sometimes have both characteristics. MATLAB transmits sliced input variables from the client to the workers, and sliced output variables from workers back to the client. If a variable is both input and output, it is transmitted in both directions.


In this parfor-loop, r is a sliced input variable and b is a sliced output variable:
a = 0;
z = 0;
r = rand(1,10);
parfor ii = 1:10
    a = ii;
    z = z + ii;
    b(ii) = r(ii);
end

However, if it is clear that in every iteration, every reference to an array element is set before it is used, the variable is not a sliced input variable. In this example, all the elements of A are set, and then only those fixed values are used:
parfor ii = 1:n
    if someCondition
        A(ii) = 32;
    else
        A(ii) = 17;
    end
    loop code that uses A(ii)
end

Even if a sliced variable is not explicitly referenced as an input, implicit usage might make it so. In the following example, not all elements of A are necessarily set inside the parfor-loop, so the original values of the array are received, held, and then returned from the loop, making A both a sliced input and output variable.
A = 1:10;
parfor ii = 1:10
    if rand < 0.5
        A(ii) = 0;
    end
end


Broadcast Variables
A broadcast variable is any variable other than the loop variable or a sliced variable that is not affected by an assignment inside the loop. At the start of a parfor-loop, the values of any broadcast variables are sent to all workers. Although this type of variable can be useful or even essential, broadcast variables that are large can cause a lot of communication between client and workers. In some cases it might be more efficient to use temporary variables for this purpose, creating and assigning them inside the loop.

Reduction Variables
MATLAB supports an important exception, called reductions, to the rule that loop iterations must be independent. A reduction variable accumulates a value that depends on all the iterations together, but is independent of the iteration order. MATLAB allows reduction variables in parfor-loops. A reduction variable appears on both sides of an assignment statement, such as any of the following, where expr is a MATLAB expression:
X = X + expr                X = expr + X
X = X - expr                See Associativity in Reduction Assignments
                            in “Further Considerations with Reduction
                            Variables” on page 2-25
X = X .* expr               X = expr .* X
X = X * expr                X = expr * X
X = X & expr                X = expr & X
X = X | expr                X = expr | X
X = [X, expr]               X = [expr, X]
X = [X; expr]               X = [expr; X]
X = {X, expr}               X = {expr, X}
X = {X; expr}               X = {expr; X}
X = min(X, expr)            X = min(expr, X)
X = max(X, expr)            X = max(expr, X)
X = union(X, expr)          X = union(expr, X)
X = intersect(X, expr)      X = intersect(expr, X)
Each of the allowed statements listed in this table is referred to as a reduction assignment, and, by definition, a reduction variable can appear only in assignments of this type. The following example shows a typical usage of a reduction variable X:
X = ...;     % Do some initialization of X
parfor i = 1:n
    X = X + d(i);
end

This loop is equivalent to the following, where each d(i) is calculated by a different iteration:
X = X + d(1) + ... + d(n)
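Conceptually, each worker forms a partial sum over its own subset of iterations, and the client combines the partial sums. The following sketch illustrates the idea serially (the two-way partitioning shown is for illustration only; parfor chooses its own):

```matlab
% Conceptual sketch only -- parfor performs this partitioning automatically.
X = 0;                            % Initial value from the client
d = rand(1,10);
partial1 = sum(d(1:5));           % Additions done on one worker
partial2 = sum(d(6:10));          % Additions done on another worker
X = X + partial1 + partial2;      % Client combines the partial sums
```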

If the loop were a regular for-loop, the variable X in each iteration would get its value either before entering the loop or from the previous iteration of the loop. However, this concept does not apply to parfor-loops: in a parfor-loop, the value of X is never transmitted from client to workers or from worker to worker. Rather, additions of d(i) are done in each worker, with i ranging over the subset of 1:n being performed on that worker. The results are then transmitted back to the client, which adds the workers’ partial sums into X. Thus, workers do some of the additions, and the client does the rest.

Basic Rules for Reduction Variables. The following requirements further define the reduction assignments associated with a given variable.

Required (static): For any reduction variable, the same reduction function or operation must be used in all reduction assignments for that variable.

The parfor-loop on the left is not valid because the reduction assignment uses + in one instance, and [,] in another. The parfor-loop on the right is valid:


parfor i = 1:n
    if testLevel(k)
        A = A + i;
    else
        A = [A, 4+i];
    end
    % loop body continued
end

parfor i = 1:n
    if testLevel(k)
        A = A + i;
    else
        A = A + i + 5*k;
    end
    % loop body continued
end

Required (static): If the reduction assignment uses * or [,], then in every reduction assignment for X, X must be consistently specified as the first argument or consistently specified as the second. The parfor-loop on the left below is not valid because the order of items in the concatenation is not consistent throughout the loop. The parfor-loop on the right is valid:

parfor i = 1:n
    if testLevel(k)
        A = [A, 4+i];
    else
        A = [r(i), A];
    end
    % loop body continued
end

parfor i = 1:n
    if testLevel(k)
        A = [A, 4+i];
    else
        A = [A, r(i)];
    end
    % loop body continued
end

Further Considerations with Reduction Variables. This section provides more detail about reduction assignments, associativity, commutativity, and overloading of reduction functions.

Reduction Assignments. In addition to the specific forms of reduction assignment listed in the table in “Reduction Variables” on page 2-23, the only other (and more general) form of a reduction assignment is
X = f(X, expr)
X = f(expr, X)


Required (static): f can be a function or a variable. If it is a variable, it must not be affected by the parfor body (in other words, it is a broadcast variable).

If f is a variable, then for all practical purposes its value at run time is a function handle. However, this is not strictly required; as long as the right-hand side can be evaluated, the resulting value is stored in X. The parfor-loop below on the left will not execute correctly because the statement f = @times causes f to be classified as a temporary variable, which is therefore cleared at the beginning of each iteration. The parfor-loop on the right is correct, because it does not assign to f inside the loop:

f = @(x,k)x * k;
parfor i = 1:n
    a = f(a,i);
    % loop body continued
    f = @times;   % Affects f
end

f = @(x,k)x * k;
parfor i = 1:n
    a = f(a,i);
    % loop body continued
end

Note that the operators && and || are not listed in the table in “Reduction Variables” on page 2-23. Except for && and ||, all the matrix operations of MATLAB have a corresponding function f, such that u op v is equivalent to f(u,v). For && and ||, such a function cannot be written because u&&v and u||v might or might not evaluate v, but f(u,v) always evaluates v before calling f. This is why && and || are excluded from the table of allowed reduction assignments for a parfor-loop.

Every reduction assignment has an associated function f. The properties of f that ensure deterministic behavior of a parfor statement are discussed in the following sections.

Associativity in Reduction Assignments. Concerning the function f as used in the definition of a reduction variable, the following practice is recommended, but does not generate an error if not adhered to. Therefore, it is up to you to ensure that your code meets this recommendation.


Recommended: To get deterministic behavior of parfor-loops, the reduction function f must be associative. To be associative, the function f must satisfy the following for all a, b, and c:
f(a,f(b,c)) = f(f(a,b),c)

The classification rules for variables, including reduction variables, are purely syntactic. They cannot determine whether the f you have supplied is truly associative or not. Associativity is assumed, but if you violate this assumption, different executions of the loop might result in different answers.

Note While the addition of mathematical real numbers is associative, addition of floating-point numbers is only approximately associative, and different executions of this parfor statement might produce values of X with different round-off errors. This is an unavoidable cost of parallelism. For example, the statement on the left yields 1, while the statement on the right returns 1 + eps:
(1 + eps/2) + eps/2         1 + (eps/2 + eps/2)

With the exception of the minus operator (-), all the special cases listed in the table in “Reduction Variables” on page 2-23 have a corresponding (perhaps approximately) associative function. MATLAB calculates the assignment X = X - expr by using X = X + (-expr). (So, technically, the function for calculating this reduction assignment is plus, not minus.) However, the assignment X = expr - X cannot be written using an associative function, which explains its exclusion from the table.

Commutativity in Reduction Assignments. Some associative functions, including +, .*, min, max, intersect, and union, are also commutative. That is, they satisfy the following for all a and b:
f(a,b) = f(b,a)

Examples of noncommutative functions are * (because matrix multiplication is not commutative for matrices in which both dimensions have size greater than one), [,], [;], {,}, and {;}. Noncommutativity is the reason that consistency in the order of arguments to these functions is required. As a practical matter, a more efficient algorithm is possible when a function is commutative as well as associative, and parfor is optimized to exploit commutativity.

Recommended: Except in the cases of *, [,], [;], {,}, and {;}, the function f of a reduction assignment should be commutative. If f is not commutative, different executions of the loop might result in different answers.

Unless f is a known noncommutative built-in function, it is assumed to be commutative. There is currently no way to specify a user-defined, noncommutative function in parfor.

Overloading in Reduction Assignments. Most associative functions f have an identity element e, so that for any a, the following holds true:
f(e,a) = a = f(a,e)

Examples of identity elements for some functions are listed in this table.

    Function                    Identity Element
    +                           0
    * and .*                    1
    [,], [;], and union         []

MATLAB uses the identity elements of reduction functions when it knows them. So, in addition to associativity and commutativity, you should also keep identity elements in mind when overloading these functions.

Recommended: An overload of +, *, .*, union, [,], or [;] should be associative if it is used in a reduction assignment in a parfor. The overload must treat the respective identity element given above (all with class double) as an identity element.

Recommended: An overload of +, .*, union, or intersect should be commutative.


There is no way to specify the identity element for a function. In these cases, the behavior of parfor is a little less efficient than it is for functions with a known identity element, but the results are correct.

Similarly, because of the special treatment of X = X - expr, the following is recommended.

Recommended: An overload of the minus operator (-) should obey the mathematical law that X - (y + z) is equivalent to (X - y) - z.

Example: Using a Custom Reduction Function. Suppose each iteration of a loop performs some calculation, and you are interested in finding which iteration produces the maximum value. This is a reduction exercise that makes an accumulation across multiple iterations of a loop. Your reduction function must compare iteration results, until finally the maximum value can be determined after all iterations are compared.

First consider the reduction function itself. To compare one iteration’s result against another’s, the function requires as input the current iteration’s result and the known maximum result from other iterations so far. Each of the two inputs is a vector containing an iteration’s result data and iteration number.
function mc = comparemax(A, B)
% Custom reduction function for 2-element vector input
if A(1) >= B(1)   % Compare the two input data values
    mc = A;       % Return the vector with the larger result
else
    mc = B;
end

Inside the loop, each iteration calls the reduction function (comparemax), passing in a pair of 2-element vectors:

• The accumulated maximum and its iteration index (this is the reduction variable, cummax)

• The iteration’s own calculation value and index


If the data value of the current iteration is greater than the maximum in cummax, the function returns a vector of the new value and its iteration number. Otherwise, the function returns the existing maximum and its iteration number. The code for the loop looks like the following, with each iteration calling the reduction function comparemax to compare its own data [dat ii] to that already accumulated in cummax.
% First element of cummax is maximum data value
% Second element of cummax is where (iteration) maximum occurs
cummax = [0 0];     % Initialize reduction variable
parfor ii = 1:100
    dat = rand();   % Simulate some actual computation
    cummax = comparemax(cummax, [dat ii]);
end
disp(cummax);

Temporary Variables
A temporary variable is any variable that is the target of a direct, nonindexed assignment, but is not a reduction variable. In the following parfor-loop, a and d are temporary variables:
a = 0;
z = 0;
r = rand(1,10);
parfor i = 1:10
    a = i;          % Variable a is temporary
    z = z + i;
    if i <= 5
        d = 2*a;    % Variable d is temporary
    end
end

In contrast to the behavior of a for-loop, MATLAB effectively clears any temporary variables before each iteration of a parfor-loop. To help ensure the independence of iterations, the values of temporary variables cannot be passed from one iteration of the loop to another. Therefore, temporary variables must be set inside the body of a parfor-loop, so that their values are defined separately for each iteration.

MATLAB does not send temporary variables back to the client. A temporary variable in the context of the parfor statement has no effect on a variable with the same name that exists outside the loop, again in contrast to ordinary for-loops.

Uninitialized Temporaries. Because temporary variables are cleared at the beginning of every iteration, MATLAB can detect certain cases in which any iteration through the loop uses the temporary variable before it is set in that iteration. In this case, MATLAB issues a static error rather than a run-time error, because there is little point in allowing execution to proceed if a run-time error is guaranteed to occur. This kind of error often arises because of confusion between for and parfor, especially regarding the rules of classification of variables. For example, suppose you write
b = true;
parfor i = 1:n
    if b && some_condition(i)
        do_something(i);
        b = false;
    end
    ...
end

This loop is acceptable as an ordinary for-loop, but as a parfor-loop, b is a temporary variable because it occurs directly as the target of an assignment inside the loop. Therefore it is cleared at the start of each iteration, so its use in the condition of the if is guaranteed to be uninitialized. (If you change parfor to for, the value of b assumes sequential execution of the loop, so that do_something(i) is executed for only the lower values of i until b is set false.)

Temporary Variables Intended as Reduction Variables. Another common cause of uninitialized temporaries can arise when you have a variable that you intended to be a reduction variable, but you use it elsewhere in the loop, causing it technically to be classified as a temporary variable. For example:


s = 0;
parfor i = 1:n
    s = s + f(i);
    ...
    if (s > whatever)
        ...
    end
end

If the only occurrences of s were the two in the first statement of the body, it would be classified as a reduction variable. But in this example, s is not a reduction variable because it has a use outside of reduction assignments in the line s > whatever. Because s is the target of an assignment (in the first statement), it is a temporary, so MATLAB issues an error about this fact, but points out the possible connection with reduction. Note that if you change parfor to for, the use of s outside the reduction assignment relies on the iterations being performed in a particular order. The point here is that in a parfor-loop, it matters that the loop “does not care” about the value of a reduction variable as it goes along. It is only after the loop that the reduction value becomes usable.
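A sketch of one possible fix is to keep s strictly as a reduction variable inside the loop and move the test outside it (the function and threshold here are illustrative):

```matlab
f = @(i) i^2;        % Illustrative computation
n = 10;
s = 0;
parfor i = 1:n
    s = s + f(i);    % s now appears only in reduction assignments
end
if s > 100           % Test the accumulated value after the loop
    disp('threshold exceeded');
end
```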

Improving Performance
Where to Create Arrays
With a parfor-loop, it might be faster to have each MATLAB worker create its own arrays, or portions of them, in parallel, rather than to create a large array in the client before the loop and send it out to all the workers separately. Having each worker create its own copy of these arrays inside the loop saves the time of transferring the data from client to workers, because all the workers can be creating the arrays at the same time. This might challenge your usual practice of doing as much variable initialization before a for-loop as possible, so that you do not needlessly repeat it inside the loop.

Whether to create arrays before the parfor-loop or inside the parfor-loop depends on the size of the arrays, the time needed to create them, whether the workers need all or part of the arrays, the number of loop iterations that each worker performs, and other factors. While many for-loops can be directly converted to parfor-loops, even in these cases there might be other issues involved in optimizing your code.

Optimizing on Local vs. Cluster Workers
With local workers, because all the MATLAB worker sessions are running on the same machine, you might not see any performance improvement from a parfor-loop regarding execution time. This can depend on many factors, including how many processors and cores your machine has. You might experiment to see whether it is faster to create the arrays before the loop (as shown on the left below), rather than have each worker create its own arrays inside the loop (as shown on the right). Try the following examples running a matlabpool locally, and notice the difference in execution time for each loop. First open a local matlabpool:
matlabpool

Then enter the following examples. (If you are viewing this documentation in the MATLAB help browser, highlight each segment of code below, right-click, and select Evaluate Selection in the context menu to execute the block in MATLAB. That way the time measurement will not include the time required to paste or type.)

tic;
n = 200;
M = magic(n);
R = rand(n);
parfor i = 1:n
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc

tic;
n = 200;
parfor i = 1:n
    M = magic(n);
    R = rand(n);
    A(i) = sum(M(i,:).*R(n+1-i,:));
end
toc

Running on a remote cluster, you might find different behavior as workers can simultaneously create their arrays, saving transfer time. Therefore, code that is optimized for local workers might not be optimized for cluster workers, and vice versa.


3
Single Program Multiple Data (spmd)
• “Executing Simultaneously on Multiple Data Sets” on page 3-2
• “Accessing Data with Composites” on page 3-7
• “Distributing Arrays” on page 3-12
• “Programming Tips” on page 3-15


Executing Simultaneously on Multiple Data Sets
In this section...
“Introduction” on page 3-2
“When to Use spmd” on page 3-2
“Setting Up MATLAB Resources Using matlabpool” on page 3-3
“Defining an spmd Statement” on page 3-4
“Displaying Output” on page 3-6

Introduction
The single program multiple data (spmd) language construct allows seamless interleaving of serial and parallel programming. The spmd statement lets you define a block of code that runs simultaneously on multiple workers. Variables assigned inside the spmd statement on the workers are accessible from the client by reference, via Composite objects. This chapter explains some of the characteristics of spmd statements and Composite objects.

When to Use spmd
The “single program” aspect of spmd means that the identical code runs on multiple workers. You run one program in the MATLAB client, and those parts of it labeled as spmd blocks run on the workers. When the spmd block is complete, your program continues running in the client.

The “multiple data” aspect means that even though the spmd statement runs identical code on all workers, each worker can have different, unique data for that code. So multiple data sets can be accommodated by multiple workers.

Typical applications appropriate for spmd are those that require simultaneous execution of a program on multiple data sets, when communication or synchronization is required between the workers. Some common cases are:


• Programs that take a long time to execute — spmd lets several workers compute solutions simultaneously.

• Programs operating on large data sets — spmd lets the data be distributed to multiple workers.

Setting Up MATLAB Resources Using matlabpool
You use the function matlabpool to reserve a number of MATLAB workers for executing a subsequent spmd statement or parfor-loop. Depending on your scheduler, the workers might be running remotely on a cluster, or they might run locally on your MATLAB client machine. You identify a cluster by selecting a cluster profile. For a description of how to manage and use profiles, see “Cluster Profiles” on page 6-12. To begin the examples of this section, allocate local MATLAB workers for the evaluation of your spmd statement:
matlabpool

This command starts the number of MATLAB worker sessions defined by the default cluster profile. If the local profile is your default and does not specify the number of workers, this starts one worker per core (maximum of twelve) on your local MATLAB client machine. If you do not want to use default settings, you can specify in the matlabpool statement which profile or how many workers to use. For example, to use only three workers with your default profile, type:
matlabpool 3

To use a different profile, type:
matlabpool MyProfileName

To inquire whether you currently have a MATLAB pool open, type:
matlabpool size

This command returns a value indicating the number of workers in the current pool. If the command returns 0, there is currently no pool open.
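For example, a sketch that opens a pool only when none is open, using the functional form of the command:

```matlab
if matlabpool('size') == 0   % Returns 0 when no pool is open
    matlabpool open          % Start workers with the default profile
end
```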


Note If there is no MATLAB pool open, an spmd statement runs locally in the MATLAB client without any parallel execution, provided you have Parallel Computing Toolbox software installed. In other words, it runs in your client session as though it were a single worker.

When you are finished using a MATLAB pool, close it with the command:
matlabpool close

Defining an spmd Statement
The general form of an spmd statement is:
spmd
    <statements>
end

The block of code represented by <statements> executes in parallel simultaneously on all workers in the MATLAB pool. If you want to limit the execution to only a portion of these workers, specify exactly how many workers to run on:
spmd (n)
    <statements>
end

This statement requires that n workers run the spmd code. n must be less than or equal to the number of workers in the open MATLAB pool. If the pool is large enough, but n workers are not available, the statement waits until enough workers are available. If n is 0, the spmd statement uses no workers, and runs locally on the client, the same as if there were no pool currently open.

You can specify a range for the number of workers:
spmd (m, n)
    <statements>
end


In this case, the spmd statement requires a minimum of m workers, and it uses a maximum of n workers. If it is important to control the number of workers that execute your spmd statement, set the exact number in the cluster profile or with the spmd statement, rather than using a range. For example, create a random matrix on three workers:
matlabpool
spmd (3)
    R = rand(4,4);
end
matlabpool close

Note All subsequent examples in this chapter assume that a MATLAB pool is open and remains open between sequences of spmd statements.

Unlike a parfor-loop, the workers used for an spmd statement each have a unique value for labindex. This lets you specify code to be run on only certain workers, or to customize execution, usually for the purpose of accessing unique data. For example, create different sized arrays depending on labindex:
spmd (3)
    if labindex==1
        R = rand(9,9);
    else
        R = rand(4,4);
    end
end

Load unique data on each worker according to labindex, and use the same function on each worker to compute a result from the data:
spmd (3)
    labdata = load(['datafile_' num2str(labindex) '.ascii'])
    result = MyFunction(labdata)
end

The workers executing an spmd statement operate simultaneously and are aware of each other. As with a parallel job, you are allowed to directly control communications between the workers, transfer data between them, and use codistributed arrays among them. For example, use a codistributed array in an spmd statement:
spmd (3)
    RR = rand(30, codistributor());
end

Each worker has a 30-by-10 segment of the codistributed array RR. For more information about codistributed arrays, see “Working with Codistributed Arrays” on page 5-6.

Displaying Output
When running an spmd statement on a MATLAB pool, all command-line output from the workers displays in the client Command Window. Because the workers are MATLAB sessions without displays, any graphical output (for example, figure windows) from the pool does not display at all.
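For example, in this minimal sketch, each worker’s printed text appears in the client Command Window, annotated by worker:

```matlab
spmd
    fprintf('Hello from lab %d of %d\n', labindex, numlabs);
end
```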


Accessing Data with Composites
In this section... “Introduction” on page 3-7 “Creating Composites in spmd Statements” on page 3-7 “Variable Persistence and Sequences of spmd” on page 3-9 “Creating Composites Outside spmd Statements” on page 3-10

Introduction
Composite objects in the MATLAB client session let you directly access data values on the workers. Most often you assign these variables within spmd statements. In their display and usage, Composites resemble cell arrays. There are two ways to create Composites:

• Using the Composite function on the client. Values assigned to the Composite elements are stored on the workers.

• Defining variables on workers inside an spmd statement. After the spmd statement, the stored values are accessible on the client as Composites.

Creating Composites in spmd Statements
When you define or assign values to variables inside an spmd statement, the data values are stored on the workers. After the spmd statement, those data values are accessible on the client as Composites. Composite objects resemble cell arrays, and behave similarly. On the client, a Composite has one element per worker. For example, suppose you open a MATLAB pool of three local workers and run an spmd statement on that pool:
matlabpool open local 3
spmd    % Uses all 3 workers
    MM = magic(labindex+2);  % MM is a variable on each worker
end
MM{1}   % In the client, MM is a Composite with one element per worker
     8     1     6
     3     5     7
     4     9     2

MM{2}
    16     2     3    13
     5    11    10     8
     9     7     6    12
     4    14    15     1

A variable might not be defined on every worker. For the workers on which a variable is not defined, the corresponding Composite element has no value. Trying to read that element throws an error.
spmd
    if labindex > 1
        HH = rand(4);
    end
end
HH
Lab 1: No data
Lab 2: class = double, size = [4  4]
Lab 3: class = double, size = [4  4]

You can also set values of Composite elements from the client. This causes a transfer of data, storing the value on the appropriate worker even though it is not executed within an spmd statement:
MM{3} = eye(4);

In this case, MM must already exist as a Composite, otherwise MATLAB interprets it as a cell array. Now when you do enter an spmd statement, the value of the variable MM on worker 3 is as set:
spmd
    if labindex == 3, MM, end
end
Lab 3:
  MM =
     1     0     0     0
     0     1     0     0
     0     0     1     0
     0     0     0     1
Data transfers from worker to client when you explicitly assign a variable in the client workspace using a Composite element:
M = MM{1}  % Transfer data from worker 1 to variable M on the client
     8     1     6
     3     5     7
     4     9     2

Assigning an entire Composite to another Composite does not cause a data transfer. Instead, the client merely duplicates the Composite as a reference to the appropriate data stored on the workers:
NN = MM % Set entire Composite equal to another, without transfer

However, accessing a Composite’s elements to assign values to other Composites does result in a transfer of data from the workers to the client, even if the assignment then goes to the same worker. In this case, NN must already exist as a Composite:
NN{1} = MM{1} % Transfer data to the client and then to worker

When finished, you can close the pool:
matlabpool close

Variable Persistence and Sequences of spmd
The values stored on the workers are retained between spmd statements. This allows you to use multiple spmd statements in sequence, and continue to use the same variables defined in previous spmd blocks. The values are retained on the workers until the corresponding Composites are cleared on the client, or until the MATLAB pool is closed. The following example illustrates data value lifespan with spmd blocks, using a pool of four workers:


matlabpool open local 4
spmd
    AA = labindex;   % Initial setting
end
AA(:)                % Composite
    [1]
    [2]
    [3]
    [4]
spmd
    AA = AA * 2;     % Multiply existing value
end
AA(:)                % Composite
    [2]
    [4]
    [6]
    [8]
clear AA             % Clearing in client also clears on workers
spmd; AA = AA * 2; end   % Generates error

matlabpool close

Creating Composites Outside spmd Statements
The Composite function creates Composite objects without using an spmd statement. This might be useful to prepopulate values of variables on workers before an spmd statement begins executing on those workers. Assume a MATLAB pool is already open:
PP = Composite()

By default, this creates a Composite with an element for each worker in the MATLAB pool. You can also create Composites on only a subset of the workers in the pool. See the Composite reference page for more details. The elements of the Composite can now be set as usual on the client, or as variables inside an spmd statement. When you set an element of a Composite, the data is immediately transferred to the appropriate worker:
for ii = 1:numel(PP)
    PP{ii} = ii;
end
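As a sketch of the subset form, you can pass a number of workers to the Composite function (see the Composite reference page for the exact behavior):

```matlab
matlabpool open local 4
PP = Composite(3);   % Elements exist on three of the four workers
for ii = 1:numel(PP)
    PP{ii} = ii;     % Each assignment transfers data to a worker
end
matlabpool close
```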


Distributing Arrays
In this section... “Distributed Versus Codistributed Arrays” on page 3-12 “Creating Distributed Arrays” on page 3-12 “Creating Codistributed Arrays” on page 3-13

Distributed Versus Codistributed Arrays
You can create a distributed array in the MATLAB client, and its data is stored on the workers of the open MATLAB pool. A distributed array is distributed in one dimension, along the last nonsingleton dimension, and as evenly as possible along that dimension among the workers. You cannot control the details of distribution when creating a distributed array.

You can create a codistributed array by executing on the workers themselves, either inside an spmd statement, in pmode, or inside a parallel job. When creating a codistributed array, you can control all aspects of distribution, including dimensions and partitions.

The relationship between distributed and codistributed arrays is one of perspective. Codistributed arrays are partitioned among the workers from which you execute code to create or manipulate them. Distributed arrays are partitioned among workers from the client with the open MATLAB pool. When you create a distributed array in the client, you can access it as a codistributed array inside an spmd statement. When you create a codistributed array in an spmd statement, you can access it as a distributed array in the client. Only spmd statements let you access the same array data from two different perspectives.

Creating Distributed Arrays
You can create a distributed array in any of several ways:

• Use the distributed function to distribute an existing array from the client workspace to the workers of an open MATLAB pool.


• Use any of the overloaded distributed object methods to directly construct a distributed array on the workers. This technique does not require that the array already exist in the client, thereby reducing client workspace memory requirements. These overloaded functions include distributed.eye, distributed.rand, etc. For a full list, see the distributed object reference page.

• Create a codistributed array inside an spmd statement, then access it as a distributed array outside the spmd statement. This lets you use distribution schemes other than the default.

The first two of these techniques do not involve spmd in creating the array, but you can see how spmd might be used to manipulate arrays created this way. For example:

Create an array in the client workspace, then make it a distributed array:
matlabpool open local 2
W = ones(6,6);
W = distributed(W);   % Distribute to the workers
spmd
    T = W*2;   % Calculation performed on workers, in parallel.
               % T and W are both codistributed arrays here.
end
T        % View results in client.
whos     % T and W are both distributed arrays here.
matlabpool close

Creating Codistributed Arrays
You can create a codistributed array in any of several ways:

• Use the codistributed function inside an spmd statement, a parallel job, or pmode to codistribute data already existing on the workers running that job.

• Use any of the overloaded codistributed object methods to directly construct a codistributed array on the workers. This technique does not require that the array already exist in the workers. These overloaded functions include codistributed.eye, codistributed.rand, etc. For a full list, see the codistributed object reference page.


• Create a distributed array outside an spmd statement, then access it as a codistributed array inside the spmd statement running on the same MATLAB pool.

In this example, you create a codistributed array inside an spmd statement, using a nondefault distribution scheme. First, define 1-D distribution along the third dimension, with 4 parts on worker 1, and 12 parts on worker 2. Then create a 3-by-3-by-16 array of zeros.
matlabpool open local 2
spmd
    codist = codistributor1d(3, [4, 12]);
    Z = codistributed.zeros(3, 3, 16, codist);
    Z = Z + labindex;
end
Z   % View results in client.
    % Z is a distributed array here.
matlabpool close

For more details on codistributed arrays, see “Working with Codistributed Arrays” on page 5-6.


Programming Tips
In this section...
“MATLAB Path” on page 3-15
“Error Handling” on page 3-15
“Limitations” on page 3-15

MATLAB Path
All workers executing an spmd statement must have the same MATLAB search path as the client, so that they can execute any functions called in their common block of code. Therefore, whenever you use cd, addpath, or rmpath on the client, it also executes on all the workers, if possible. For more information, see the matlabpool reference page. When the workers are running on a different platform than the client, use the function pctRunOnAll to properly set the MATLAB path on all workers.
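For example, you might add a shared directory to the path on the client and all workers in one step (the path shown is hypothetical):

```matlab
pctRunOnAll addpath /shared/project/code   % runs on the client and every worker
```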

Error Handling
When an error occurs on a worker during the execution of an spmd statement, the error is reported to the client. The client tries to interrupt execution on all workers, and throws an error to the user. Errors and warnings produced on workers are annotated with the worker ID (labindex) and displayed in the client’s Command Window in the order in which they are received by the MATLAB client. If lastwarn is used within the body of an spmd statement, its behavior at the end of the spmd block is unspecified.
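As an illustrative sketch (the error identifier is made up), an error raised on one worker interrupts the whole spmd block and reaches the client annotated with that worker’s labindex:

```matlab
try
    spmd
        if labindex == 2
            error('example:oneWorker', 'Deliberate failure on worker 2');
        end
    end
catch err
    disp(err.message)   % the client receives the error, annotated with Lab 2
end
```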

Limitations
Transparency
The body of an spmd statement must be transparent, meaning that all references to variables must be “visible” (i.e., they occur in the text of the program).


In the following example, because X is not visible as an input variable in the spmd body (only the string 'X' is passed to eval), it does not get transferred to the workers. As a result, MATLAB issues an error at run time:
X = 5;
spmd
    eval('X');
end

Similarly, you cannot clear variables from a worker’s workspace by executing clear inside an spmd statement:
spmd; clear('X'); end

To clear a specific variable from a worker, clear its Composite from the client workspace. Alternatively, you can free up most of the memory used by a variable by setting its value to empty, presumably when it is no longer needed in your spmd statement:
spmd
    <statements....>
    X = [];
end

Examples of some other functions that violate transparency are evalc, evalin, and assignin with the workspace argument specified as 'caller'; save and load, unless the output of load is assigned to a variable. MATLAB does successfully execute eval and evalc statements that appear in functions called from the spmd body.
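For instance, the transparency-safe form of load assigns its output to a variable (the MAT-file and field names here are hypothetical):

```matlab
spmd
    S = load('mydata.mat');   % OK: output assigned to a variable
    x = S.x + labindex;       % use the loaded field as usual
end
```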

Nested Functions
Inside a function, the body of an spmd statement cannot make any direct reference to a nested function. However, it can call a nested function by means of a variable defined as a function handle to the nested function. Because the spmd body executes on workers, variables that are updated by nested functions called inside an spmd statement do not get updated in the workspace of the outer function.
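A minimal sketch of the function-handle workaround (function and variable names are illustrative):

```matlab
function out = spmdNestedExample()
    f = @nestedSquare;      % handle to the nested function, created outside spmd
    spmd
        y = f(labindex);    % allowed: indirect call through the handle
    end
    out = y{1};             % y is a Composite back in the function workspace

    function r = nestedSquare(v)
        r = v^2;
    end
end
```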


Anonymous Functions
The body of an spmd statement cannot define an anonymous function. However, it can reference an anonymous function by means of a function handle.
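For example (a sketch), define the anonymous function before the spmd block and call it through the handle inside:

```matlab
sq = @(x) x.^2;        % defined outside the spmd body
spmd
    y = sq(labindex);  % allowed: referenced via the handle, not defined here
end
```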

Nested spmd Statements
The body of an spmd statement cannot directly contain another spmd. However, it can call a function that contains another spmd statement. The inner spmd statement does not run in parallel in another MATLAB pool, but runs serially in a single thread on the worker running its containing function.

Nested parfor-Loops
The body of a parfor-loop cannot contain an spmd statement, and an spmd statement cannot contain a parfor-loop.

Break and Return Statements
The body of an spmd statement cannot contain break or return statements.

Global and Persistent Variables
The body of an spmd statement cannot contain global or persistent variable declarations.


4
Interactive Parallel Computation with pmode
This chapter describes interactive pmode in the following sections:

• “pmode Versus spmd” on page 4-2
• “Run Parallel Jobs Interactively Using pmode” on page 4-3
• “Parallel Command Window” on page 4-11
• “Running pmode Interactive Jobs on a Cluster” on page 4-16
• “Plotting Distributed Data Using pmode” on page 4-17
• “pmode Limitations and Unexpected Results” on page 4-19
• “pmode Troubleshooting” on page 4-20


pmode Versus spmd
pmode lets you work interactively with a parallel job running simultaneously on several workers. Commands you type at the pmode prompt in the Parallel Command Window are executed on all workers at the same time. Each worker executes the commands in its own workspace on its own variables.

The workers remain synchronized because each worker becomes idle when it completes a command or statement, waiting until all the workers working on this job have completed the same statement. Only when all the workers are idle do they proceed together to the next pmode command.

In contrast to spmd, pmode provides a desktop with a display for each worker running the job, where you can enter commands, see results, access each worker’s workspace, etc. What pmode does not let you do is freely interleave serial and parallel work, as spmd does.

When you exit your pmode session, its job is effectively destroyed, and all information and data on the workers is lost. Starting another pmode session always begins from a clean state.


Run Parallel Jobs Interactively Using pmode
This example uses a local scheduler and runs the workers on your local MATLAB client machine. It does not require an external cluster or scheduler. The steps include the pmode prompt (P>>) for commands that you type in the Parallel Command Window.
1 Start pmode with the pmode command.

pmode start local 4

This starts four local workers, creates a parallel job to run on those workers, and opens the Parallel Command Window.

You can control where the command history appears. For this exercise, the position is set by clicking Window > History Position > Above Prompt, but you can set it according to your own preference.
2 To illustrate that commands at the pmode prompt are executed on all workers, ask for help on a function.

P>> help magic


3 Set a variable at the pmode prompt. Notice that the value is set on all the workers.

P>> x = pi

4 A variable does not necessarily have the same value on every worker. The labindex function returns the ID particular to each worker working on this parallel job. In this example, the variable x exists with a different value in the workspace of each worker.

P>> x = labindex
5 Return the total number of workers working on the current parallel job with the numlabs function.

P>> all = numlabs


6 Create a replicated array on all the workers.

P>> segment = [1 2; 3 4; 5 6]


7 Assign a unique value to the array on each worker, dependent on the worker number (labindex). With a different value on each worker, this is a variant array.

P>> segment = segment + 10*labindex

8 Until this point in the example, the variant arrays are independent, other than having the same name. Use the codistributed.build function to aggregate the array segments into a coherent array, distributed among the workers.

P>> codist = codistributor1d(2, [2 2 2 2], [3 8])
P>> whole = codistributed.build(segment, codist)

This combines four separate 3-by-2 arrays into one 3-by-8 codistributed array. The codistributor1d object indicates that the array is distributed along its second dimension (columns), with 2 columns on each of the four workers. On each worker, segment provided the data for the local portion of the whole array.
9 Now, when you operate on the codistributed array whole, each worker handles the calculations on only its portion, or segment, of the array, not the whole array.


P>> whole = whole + 1000
10 Although the codistributed array allows for operations on its entirety, you can use the getLocalPart function to access the portion of a codistributed array on a particular worker.

P>> section = getLocalPart(whole)

Thus, section is now a variant array because it is different on each worker.

11 If you need the entire array in one workspace, use the gather function.

P>> combined = gather(whole)

Notice, however, that this gathers the entire array into the workspaces of all the workers. See the gather reference page for the syntax to gather the array into the workspace of only one worker.
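For reference, a sketch of the one-worker form, where the destination lab index is the second argument (on the other workers the result is an empty placeholder):

```matlab
P>> combined1 = gather(whole, 1)   % full array only on worker 1
```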
12 Because the workers ordinarily do not have displays, if you want to perform any graphical tasks involving your data, such as plotting, you must do this from the client workspace. Copy the array to the client workspace by typing the following commands in the MATLAB (client) Command Window.


pmode lab2client combined 1

Notice that combined is now a 3-by-8 array in the client workspace.
whos combined

To see the array, type its name.
combined


13 Many matrix functions that might be familiar can operate on codistributed arrays. For example, the eye function creates an identity matrix. Now you can create a codistributed identity matrix with the following commands in the Parallel Command Window.

P>> distobj = codistributor1d();
P>> I = eye(6, distobj)
P>> getLocalPart(I)

Calling the codistributor1d function without arguments specifies the default distribution, which is by columns in this case, distributed as evenly as possible.


14 If you require distribution along a different dimension, you can use the redistribute function. In this example, the argument 1 to codistributor1d specifies distribution of the array along the first dimension (rows).

P>> distobj = codistributor1d(1);
P>> I = redistribute(I, distobj)
P>> getLocalPart(I)

15 Exit pmode and return to the regular MATLAB desktop.

P>> pmode exit


Parallel Command Window
When you start pmode on your local client machine with the command
pmode start local 4

four workers start on your local machine and a parallel job is created to run on them. The first time you run pmode with these options, you get a tiled display of the four workers.

[Figure: the Parallel Command Window, with the lab outputs in a tiled arrangement above the command history and command line. Toolbar buttons let you clear all output windows and show commands in the lab output.]


The Parallel Command Window offers much of the same functionality as the MATLAB desktop, including command line, output, and command history. When you select one or more lines in the command history and right-click, you see the following context menu.

You have several options for how to arrange the tiles showing your worker outputs. Usually, you will choose an arrangement that depends on the format of your data. For example, the data displayed until this point in this section, as in the previous figure, is distributed by columns. It might be convenient to arrange the tiles side by side.

[Figure: click the tiling icon, then select a layout.]


This arrangement results in the following figure, which might be more convenient for viewing data distributed by columns.

Alternatively, if the data is distributed by rows, you might want to stack the worker tiles vertically. For the following figure, the data is reformatted with the command
P>> distobj = codistributor('1d',1);
P>> I = redistribute(I, distobj)

When you rearrange the tiles, you see the following.

[Figure: select the vertical arrangement; drag to adjust the tile sizes.]


You can control the relative positions of the command window and the worker output. The following figure shows how to set the output to display beside the input, rather than above it.

You can choose to view the worker outputs by tabs.

[Figure: 1. Select the tabbed display. 2. Select a tab. 3. Select the labs shown in that tab.]


You can have multiple workers send their output to the same tile or tab. This allows you to have fewer tiles or tabs than workers.

[Figure: click the tabbed output, then select only two tabs.]

In this case, the window provides shading to help distinguish the outputs from the various workers.

[Figure: multiple labs shown in the same tab.]


Running pmode Interactive Jobs on a Cluster
When you run pmode on a cluster of workers, you are running a job that is much like any other parallel job, except it is interactive. The cluster can be heterogeneous, but with certain limitations described at
http://www.mathworks.com/products/parallel-computing/requirements.html;

carefully locate your scheduler on that page and note that pmode sessions run as jobs described as “parallel applications that use inter-worker communication.” Many of the job’s properties are determined by the cluster profile. For more details about creating and using profiles, see “Cluster Profiles” on page 6-12. The general form of the command to start a pmode session is
pmode start <profile-name> <num-workers>

where <profile-name> is the name of the cluster profile you want to use, and <num-workers> is the number of workers you want to run the pmode job on. If <num-workers> is omitted, the number of workers is determined by the profile. Coordinate with your system administrator when creating or using a profile. If you omit <profile-name>, pmode uses the default profile (see the parallel.defaultClusterProfile reference page). For details on all the command options, see the pmode reference page.
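For example, with a hypothetical profile named myMJSProfile:

```matlab
pmode start myMJSProfile 8    % 8-worker pmode session using that profile
pmode start myMJSProfile      % worker count taken from the profile
```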


Plotting Distributed Data Using pmode
Because the workers running a job in pmode are MATLAB sessions without displays, they cannot create plots or other graphic outputs on your desktop. When working in pmode with codistributed arrays, one way to plot a codistributed array is to follow these basic steps:
1 Use the gather function to collect the entire array into the workspace of one worker.

2 Transfer the whole array from any worker to the MATLAB client with pmode lab2client.

3 Plot the data from the client workspace.

The following example illustrates this technique. Create a 1-by-100 codistributed array of 0s. With four workers, each has a 1-by-25 segment of the whole array.
P>> D = zeros(1,100,codistributor1d())
Lab 1: This lab stores D(1:25).
Lab 2: This lab stores D(26:50).
Lab 3: This lab stores D(51:75).
Lab 4: This lab stores D(76:100).

Use a for-loop over the distributed range to populate the array so that it contains a sine wave. Each worker does one-fourth of the array.
P>> for i = drange(1:100)
        D(i) = sin(i*2*pi/100);
    end;

Gather the array so that the whole array is contained in the workspace of worker 1.
P>> P = gather(D, 1);


Transfer the array from the workspace of worker 1 to the MATLAB client workspace, then plot the array from the client. Note that both commands are entered in the MATLAB (client) Command Window.
pmode lab2client P 1
plot(P)

This is not the only way to plot codistributed data. One alternative method, especially useful when running noninteractive parallel jobs, is to plot the data to a file, then view it from a later MATLAB session.
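A sketch of that alternative (assuming the worker MATLAB session can create offscreen figures; the file name is illustrative), run on the worker holding the gathered data:

```matlab
if labindex == 1
    f = figure('Visible', 'off');  % workers have no display, so keep it hidden
    plot(P);
    saveas(f, 'sinewave.fig');     % open this file later in a desktop session
    close(f);
end
```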


pmode Limitations and Unexpected Results
Using Graphics in pmode
Displaying a GUI
The workers that run the tasks of a parallel job are MATLAB sessions without displays. As a result, these workers cannot display graphical tools and so you cannot do things like plotting from within pmode. The general approach to accomplish something graphical is to transfer the data into the workspace of the MATLAB client using
pmode lab2client var labindex

Then use the graphical tool on the MATLAB client.

Using Simulink Software
Because the workers running a pmode job do not have displays, you cannot use Simulink software to edit diagrams or to perform interactive simulation from within pmode. If you type simulink at the pmode prompt, the Simulink Library Browser opens in the background on the workers and is not visible. You can use the sim command to perform noninteractive simulations in parallel. If you edit your model in the MATLAB client outside of pmode, you must save the model before accessing it in the workers via pmode; also, if the workers had accessed the model previously, they must close and open the model again to see the latest saved changes.


pmode Troubleshooting
In this section...
“Connectivity Testing” on page 4-20
“Hostname Resolution” on page 4-20
“Socket Connections” on page 4-20

Connectivity Testing
For testing connectivity between the client machine and the machines of your compute cluster, you can use Admin Center. For more information about Admin Center, including how to start it and how to test connectivity, see “Start Admin Center” and “Test Connectivity” in the MATLAB Distributed Computing Server documentation.

Hostname Resolution
If a worker cannot resolve the hostname of the computer running the MATLAB client, use pctconfig to change the hostname by which the client machine advertises itself.

Socket Connections
If a worker cannot open a socket connection to the MATLAB client, try the following:

• Use pctconfig to change the hostname by which the client machine advertises itself.

• Make sure that firewalls are not preventing communication between the worker and client machines.

• Use pctconfig to change the client’s pmodeport property. This determines the port that the workers will use to contact the client in the next pmode session.
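Both properties are set with pctconfig; the values below are illustrative:

```matlab
pctconfig('hostname', 'myclient.example.com');  % name workers use to reach the client
pctconfig('pmodeport', 27350);                  % port for the next pmode session
```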


5
Math with Codistributed Arrays
This chapter describes the distribution or partition of data across several workers, and the functionality provided for operations on that data in spmd statements, parallel jobs, and pmode. The sections are as follows.

• “Nondistributed Versus Distributed Arrays” on page 5-2
• “Working with Codistributed Arrays” on page 5-6
• “Looping Over a Distributed Range (for-drange)” on page 5-22
• “Using MATLAB Functions on Codistributed Arrays” on page 5-26


Nondistributed Versus Distributed Arrays
In this section...
“Introduction” on page 5-2
“Nondistributed Arrays” on page 5-2
“Codistributed Arrays” on page 5-4

Introduction
All built-in data types and data structures supported by MATLAB software are also supported in the MATLAB parallel computing environment. This includes arrays of any number of dimensions containing numeric, character, logical values, cells, or structures; but not function handles or user-defined objects. In addition to these basic building blocks, the MATLAB parallel computing environment also offers different types of arrays.

Nondistributed Arrays
When you create a nondistributed array, MATLAB constructs a separate array in the workspace of each worker, using the same variable name on all workers. Any operation performed on that variable affects all individual arrays assigned to it. If you display from worker 1 the value assigned to this variable, all workers respond by showing the array of that name that resides in their workspace.

The state of a nondistributed array depends on the value of that array in the workspace of each worker:

• “Replicated Arrays” on page 5-2
• “Variant Arrays” on page 5-3
• “Private Arrays” on page 5-4

Replicated Arrays
A replicated array resides in the workspaces of all workers, and its size and content are identical on all workers. When you create the array, MATLAB assigns it to the same variable on all workers. If you display in spmd the value assigned to this variable, all workers respond by showing the same array.
spmd, A = magic(3), end

    WORKER 1        WORKER 2        WORKER 3        WORKER 4
   8   1   6   |   8   1   6   |   8   1   6   |   8   1   6
   3   5   7   |   3   5   7   |   3   5   7   |   3   5   7
   4   9   2   |   4   9   2   |   4   9   2   |   4   9   2

Variant Arrays
A variant array also resides in the workspaces of all workers, but its content differs on one or more workers. When you create the array, MATLAB assigns a different value to the same variable on all workers. If you display the value assigned to this variable, all workers respond by showing their version of the array.
spmd, A = magic(3) + labindex - 1, end

    WORKER 1        WORKER 2        WORKER 3        WORKER 4
   8   1   6   |   9   2   7   |  10   3   8   |  11   4   9
   3   5   7   |   4   6   8   |   5   7   9   |   6   8  10
   4   9   2   |   5  10   3   |   6  11   4   |   7  12   5

A replicated array can become a variant array when its value becomes unique on each worker.
spmd
    B = magic(3);        % replicated on all workers
    B = B + labindex;    % now a variant array, different on each worker
end


Private Arrays
A private array is defined on one or more, but not all workers. You could create this array by using labindex in a conditional statement, as shown here:
spmd
    if labindex >= 3, A = magic(3) + labindex - 1, end
end

    WORKER 1        WORKER 2        WORKER 3        WORKER 4
  A is         |  A is         |  10   3   8   |  11   4   9
  undefined    |  undefined    |   5   7   9   |   6   8  10
               |               |   6  11   4   |   7  12   5

Codistributed Arrays
With replicated and variant arrays, the full content of the array is stored in the workspace of each worker. Codistributed arrays, on the other hand, are partitioned into segments, with each segment residing in the workspace of a different worker. Each worker has its own array segment to work with. Reducing the size of the array that each worker has to store and process means a more efficient use of memory and faster processing, especially for large data sets. This example distributes a 3-by-10 replicated array A across four workers. The resulting array D is also 3-by-10 in size, but only a segment of the full array resides on each worker.
spmd
    A = [11:20; 21:30; 31:40];
    D = codistributed(A);
    getLocalPart(D)
end

     WORKER 1         WORKER 2       WORKER 3     WORKER 4
  11  12  13   |   14  15  16   |   17  18   |   19  20
  21  22  23   |   24  25  26   |   27  28   |   29  30
  31  32  33   |   34  35  36   |   37  38   |   39  40


For more details on using codistributed arrays, see “Working with Codistributed Arrays” on page 5-6.


Working with Codistributed Arrays
In this section...
“How MATLAB Software Distributes Arrays” on page 5-6
“Creating a Codistributed Array” on page 5-8
“Local Arrays” on page 5-12
“Obtaining Information About the Array” on page 5-13
“Changing the Dimension of Distribution” on page 5-14
“Restoring the Full Array” on page 5-15
“Indexing into a Codistributed Array” on page 5-16
“2-Dimensional Distribution” on page 5-18

How MATLAB Software Distributes Arrays
When you distribute an array to a number of workers, MATLAB software partitions the array into segments and assigns one segment of the array to each worker. You can partition a two-dimensional array horizontally, assigning columns of the original array to the different workers, or vertically, by assigning rows. An array with N dimensions can be partitioned along any of its N dimensions. You choose which dimension of the array is to be partitioned by specifying it in the array constructor command. For example, to distribute an 80-by-1000 array to four workers, you can partition it either by columns, giving each worker an 80-by-250 segment, or by rows, with each worker getting a 20-by-1000 segment. If the array dimension does not divide evenly over the number of workers, MATLAB partitions it as evenly as possible. The following example creates an 80-by-1000 replicated array and assigns it to variable A. In doing so, each worker creates an identical array in its own workspace and assigns it to variable A, where A is local to that worker. The second command distributes A, creating a single 80-by-1000 array D that spans all four workers. Worker 1 stores columns 1 through 250, worker 2 stores columns 251 through 500, and so on. The default distribution is by the last nonsingleton dimension, thus, columns in this case of a 2-dimensional array.


spmd
    A = zeros(80, 1000);
    D = codistributed(A)
end

Lab 1: This lab stores D(:,1:250).
Lab 2: This lab stores D(:,251:500).
Lab 3: This lab stores D(:,501:750).
Lab 4: This lab stores D(:,751:1000).

Each worker has access to all segments of the array. Access to the local segment is faster than to a remote segment, because the latter requires sending and receiving data between workers and thus takes more time.

How MATLAB Displays a Codistributed Array
For each worker, the MATLAB Parallel Command Window displays information about the codistributed array, the local portion, and the codistributor. For example, an 8-by-8 identity matrix codistributed among four workers, with two columns on each worker, displays like this:
>> spmd
II = codistributed.eye(8)
end
Lab 1: This lab stores II(:,1:2).
           LocalPart: [8x2 double]
       Codistributor: [1x1 codistributor1d]
Lab 2: This lab stores II(:,3:4).
           LocalPart: [8x2 double]
       Codistributor: [1x1 codistributor1d]
Lab 3: This lab stores II(:,5:6).
           LocalPart: [8x2 double]
       Codistributor: [1x1 codistributor1d]
Lab 4: This lab stores II(:,7:8).
           LocalPart: [8x2 double]
       Codistributor: [1x1 codistributor1d]


To see the actual data in the local segment of the array, use the getLocalPart function.

How Much Is Distributed to Each Worker
In distributing an array of N rows, if N is evenly divisible by the number of workers, MATLAB stores the same number of rows (N/numlabs) on each worker. When N is not evenly divisible by the number of workers, MATLAB partitions the array as evenly as possible. MATLAB provides codistributor object properties called Dimension and Partition that you can use to determine the exact distribution of an array. See “Indexing into a Codistributed Array” on page 5-16 for more information on indexing with codistributed arrays.
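A sketch of querying these properties (assumes an open pool of four workers; the sizes are illustrative):

```matlab
spmd
    D = codistributed.rand(80, 1000);
    dist = getCodistributor(D);
    dist.Dimension     % 2: distributed along columns by default
    dist.Partition     % e.g. [250 250 250 250] with four workers
end
```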

Distribution of Other Data Types
You can distribute arrays of any MATLAB built-in data type, and also numeric arrays that are complex or sparse, but not arrays of function handles or object types.

Creating a Codistributed Array
You can create a codistributed array in any of the following ways:

• “Partitioning a Larger Array” on page 5-9 — Start with a large array that is replicated on all workers, and partition it so that the pieces are distributed across the workers. This is most useful when you have sufficient memory to store the initial replicated array.

• “Building from Smaller Arrays” on page 5-10 — Start with smaller variant or replicated arrays stored on each worker, and combine them so that each array becomes a segment of a larger codistributed array. This method reduces memory requirements as it lets you build a codistributed array from smaller pieces.

• “Using MATLAB Constructor Functions” on page 5-11 — Use any of the MATLAB constructor functions like rand or zeros with a codistributor object argument. These functions offer a quick means of constructing a codistributed array of any size in just one step.


Partitioning a Larger Array
If you have a large array already in memory that you want MATLAB to process more quickly, you can partition it into smaller segments and distribute these segments to all of the workers using the codistributed function. Each worker then has an array that is a fraction the size of the original, thus reducing the time required to access the data that is local to each worker. As a simple example, the following line of code creates a 4-by-8 replicated matrix on each worker assigned to the variable A:
spmd, A = [11:18; 21:28; 31:38; 41:48], end

A =
    11    12    13    14    15    16    17    18
    21    22    23    24    25    26    27    28
    31    32    33    34    35    36    37    38
    41    42    43    44    45    46    47    48

The next line uses the codistributed function to construct a single 4-by-8 matrix D that is distributed along the second dimension of the array:
spmd
    D = codistributed(A);
    getLocalPart(D)
end

 1: Local Part  |  2: Local Part  |  3: Local Part  |  4: Local Part
   11  12    |    13  14    |    15  16    |    17  18
   21  22    |    23  24    |    25  26    |    27  28
   31  32    |    33  34    |    35  36    |    37  38
   41  42    |    43  44    |    45  46    |    47  48

Arrays A and D are the same size (4-by-8). Array A exists in its full size on each worker, while only a segment of array D exists on each worker.
spmd, size(A), size(D), end

If you examine the variables in the client workspace, an array that is codistributed among the workers inside an spmd statement appears as a distributed array from the perspective of the client outside the spmd statement. Variables that are not codistributed inside the spmd appear as Composites in the client outside the spmd.


whos
  Name      Size      Bytes  Class

  A         1x4         613  Composite
  D         4x8         649  distributed

See the codistributed function reference page for syntax and usage information.

Building from Smaller Arrays
The codistributed function is less useful for reducing the amount of memory required to store data when you first construct the full array in one workspace and then partition it into distributed segments. To save on memory, you can construct the smaller pieces (local part) on each worker first, and then combine them into a single array that is distributed across the workers. This example creates a 4-by-250 variant array A on each of four workers and then uses codistributed.build to combine these segments, creating a 16-by-250 codistributed array. Here is the variant array, A:
spmd
    A = [1:250; 251:500; 501:750; 751:1000] + 250 * (labindex - 1);
end

        WORKER 1        |        WORKER 2        |        WORKER 3        |
    1    2 ...  250     |   251  252 ...  500    |   501  502 ...  750    |  etc.
  251  252 ...  500     |   501  502 ...  750    |   751  752 ... 1000    |  etc.
  501  502 ...  750     |   751  752 ... 1000    |  1001 1002 ... 1250    |  etc.
  751  752 ... 1000     |  1001 1002 ... 1250    |  1251 1252 ... 1500    |  etc.

Now combine these segments into an array that is distributed by the first dimension (rows). The array is now 16-by-250, with a 4-by-250 segment residing on each worker:
spmd
    D = codistributed.build(A, codistributor1d(1,[4 4 4 4],[16 250]))
end
Lab 1: This lab stores D(1:4,:).
           LocalPart: [4x250 double]
       Codistributor: [1x1 codistributor1d]

whos
  Name      Size        Bytes  Class

  A         1x4           613  Composite
  D         16x250        649  distributed

You could also use replicated arrays in the same fashion, if you wanted to create a codistributed array whose segments were all identical to start with. See the codistributed function reference page for syntax and usage information.

Using MATLAB Constructor Functions
MATLAB provides several array constructor functions that you can use to build codistributed arrays of specific values, sizes, and classes. These functions operate in the same way as their nondistributed counterparts in the MATLAB language, except that they distribute the resultant array across the workers using the specified codistributor object, codist.

Constructor Functions. The codistributed constructor functions are listed here. Use the codist argument (created by the codistributor function: codist = codistributor()) to specify over which dimension to distribute the array. See the individual reference pages for these functions for further syntax and usage information.
codistributed.cell(m, n, ..., codist)
codistributed.colon(a, d, b)
codistributed.eye(m, ..., classname, codist)
codistributed.false(m, n, ..., codist)
codistributed.Inf(m, n, ..., classname, codist)
codistributed.linspace(m, n, ..., codist)
codistributed.logspace(m, n, ..., codist)
codistributed.NaN(m, n, ..., classname, codist)
codistributed.ones(m, n, ..., classname, codist)
codistributed.rand(m, n, ..., codist)
codistributed.randn(m, n, ..., codist)
sparse(m, n, codist)
codistributed.speye(m, ..., codist)

5-11

5

Math with Codistributed Arrays

codistributed.sprand(m, n, density, codist) codistributed.sprandn(m, n, density, codist) codistributed.true(m, n, ..., codist) codistributed.zeros(m, n, ..., classname, codist)

Local Arrays
That part of a codistributed array that resides on each worker is a piece of a larger array. Each worker can work on its own segment of the common array, or it can make a copy of that segment in a variant or private array of its own. This local copy of a codistributed array segment is called a local array.

Creating Local Arrays from a Codistributed Array
The getLocalPart function copies the segments of a codistributed array to a separate variant array. This example makes a local copy L of each segment of codistributed array D. The size of L shows that it contains only the local part of D for each worker. Suppose you distribute an array across four workers:
spmd(4)
    A = [1:80; 81:160; 161:240];
    D = codistributed(A);
    size(D)
    L = getLocalPart(D);
    size(L)
end

returns on each worker:
     3    80
     3    20

Each worker recognizes that the codistributed array D is 3-by-80. However, notice that the size of the local part, L, is 3-by-20 on each worker, because the 80 columns of D are distributed over four workers.

Creating a Codistributed from Local Arrays
Use the codistributed function to perform the reverse operation. This function, described in “Building from Smaller Arrays” on page 5-10, combines the local variant arrays into a single array distributed along the specified dimension. Continuing the previous example, take the local variant arrays L and put them together as segments to build a new codistributed array X.
spmd
    codist = codistributor1d(2, [20 20 20 20], [3 80]);
    X = codistributed.build(L, codist);
    size(X)
end

returns on each worker:
3 80

Obtaining Information About the Array
MATLAB offers several functions that provide information on any particular array. In addition to these standard functions, there are also two functions that are useful solely with codistributed arrays.

Determining Whether an Array Is Codistributed
The iscodistributed function returns a logical 1 (true) if the input array is codistributed, and logical 0 (false) otherwise. The syntax is
spmd, TF = iscodistributed(D), end

where D is any MATLAB array.

Determining the Dimension of Distribution
The codistributor object determines how an array is partitioned and its dimension of distribution. To access the codistributor of an array, use the getCodistributor function. This returns two properties, Dimension and Partition:
spmd, getCodistributor(X), end

    Dimension: 2
    Partition: [20 20 20 20]


The Dimension value of 2 means the array X is distributed by columns (dimension 2); and the Partition value of [20 20 20 20] means that twenty columns reside on each of the four workers. To get these properties programmatically, return the output of getCodistributor to a variable, then use dot notation to access each property:

spmd
    C = getCodistributor(X);
    part = C.Partition
    dim = C.Dimension
end

Other Array Functions
Other functions that provide information about standard arrays also work on codistributed arrays and use the same syntax.

• length — Returns the length of a specific dimension.
• ndims — Returns the number of dimensions.
• numel — Returns the number of elements in the array.
• size — Returns the size of each dimension.
• is* — Many functions that have names beginning with 'is', such as ischar and issparse.
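As a sketch (assuming an open pool of four workers), these queries behave on a codistributed array just as they would on an ordinary array, returning a replicated result that describes the entire array rather than the local part:

```matlab
spmd
    D = codistributed.zeros(3, 80);   % 3-by-80, distributed by columns
    len = length(D)                   % 80 on every worker
    nd  = ndims(D)                    % 2
    n   = numel(D)                    % 240
    tf  = issparse(D)                 % logical 0 (false)
end
```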

Changing the Dimension of Distribution
When constructing an array, you distribute the parts of the array along one of the array’s dimensions. You can change the direction of this distribution on an existing array using the redistribute function with a different codistributor object. Construct an 8-by-16 codistributed array D of random values distributed by columns on four workers:
spmd
    D = rand(8, 16, codistributor());
    size(getLocalPart(D))
end

returns on each worker:
8 4

Create a new codistributed array distributed by rows from an existing one already distributed by columns:
spmd
    X = redistribute(D, codistributor1d(1));
    size(getLocalPart(X))
end

returns on each worker:
2 16

Restoring the Full Array
You can restore a codistributed array to its undistributed form using the gather function. gather takes the segments of an array that reside on different workers and combines them into a replicated array on all workers, or into a single array on one worker. Distribute a 4-by-10 array to four workers along the second dimension:
spmd, A = [11:20; 21:30; 31:40; 41:50], end

A =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

spmd, D = codistributed(A), end

    WORKER 1     |    WORKER 2     |  WORKER 3  |  WORKER 4
  11   12   13   |   14   15   16   |   17   18   |   19   20
  21   22   23   |   24   25   26   |   27   28   |   29   30
  31   32   33   |   34   35   36   |   37   38   |   39   40
  41   42   43   |   44   45   46   |   47   48   |   49   50

spmd, size(getLocalPart(D)), end
Lab 1:
     4     3
Lab 2:
     4     3
Lab 3:
     4     2
Lab 4:
     4     2

Restore the undistributed segments to the full array form by gathering the segments:
spmd, X = gather(D), end

X =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

spmd, size(X), end
     4    10

Indexing into a Codistributed Array
While indexing into a nondistributed array is fairly straightforward, codistributed arrays require additional considerations. Each dimension of a nondistributed array is indexed within a range of 1 to the final subscript, which is represented in MATLAB by the end keyword. The length of any dimension can be easily determined using either the size or length function.

With codistributed arrays, these values are not so easily obtained. For example, the second segment of an array (that which resides in the workspace of worker 2) has a starting index that depends on the array distribution. For a 200-by-1000 array with a default distribution by columns over four workers, the starting index on worker 2 is 251. For a 1000-by-200 array also distributed by columns, that same index would be 51. As for the ending index, this is not given by using the end keyword, as end in this case refers to the end of the entire array; that is, the last subscript of the final segment. The length of each segment is also not given by using the length or size functions, as they only return the length of the entire array.

The MATLAB colon operator and end keyword are two of the basic tools for indexing into nondistributed arrays. For codistributed arrays, MATLAB provides a version of the colon operator, called codistributed.colon. This actually is a function, not a symbolic operator like colon.

Note When using arrays to index into codistributed arrays, you can use only replicated or codistributed arrays for indexing. The toolbox does not check to ensure that the index is replicated, as that would require global communications. Therefore, the use of unsupported variants (such as labindex) to index into codistributed arrays might create unexpected results.
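As a minimal sketch (assuming an spmd block with an open pool), codistributed.colon builds a distributed index vector where the ordinary colon operator would build a replicated one:

```matlab
spmd
    % Distributed equivalent of the replicated vector 1:1000;
    % each worker stores one contiguous segment of the indices.
    idx = codistributed.colon(1, 1000);
    len = length(idx)   % 1000 on every worker
end
```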

Example: Find a Particular Element in a Codistributed Array
Suppose you have a row vector of 1 million elements, distributed among several workers, and you want to locate its element number 225,000. That is, you want to know which worker contains this element, and in what position in the local part of the vector on that worker. The globalIndices function provides a correlation between the local and global indexing of the codistributed array.
D = distributed.rand(1, 1e6);   % Distributed by columns
spmd
    globalInd = globalIndices(D, 2);
    pos = find(globalInd == 225e3);
    if ~isempty(pos)
        fprintf(...
            'Element is in position %d on worker %d.\n', pos, labindex);
    end
end

If you run this code on a pool of four workers you get this result:
Lab 1: Element is in position 225000 on worker 1.


If you run this code on a pool of five workers you get this result:
Lab 2: Element is in position 25000 on worker 2.

Notice that if you use a pool of a different size, the element ends up in a different location on a different worker, but the same code can be used to locate the element.

2-Dimensional Distribution
As an alternative to distributing by a single dimension of rows or columns, you can distribute a matrix by blocks using '2dbc' or two-dimensional block-cyclic distribution. Instead of segments that comprise a number of complete rows or columns of the matrix, the segments of the codistributed array are 2-dimensional square blocks. For example, consider a simple 8-by-8 matrix with ascending element values. You can create this array in an spmd statement, parallel job, or pmode. This example uses pmode for a visual display.
P>> A = reshape(1:64, 8, 8)

The result is the replicated array:
     1     9    17    25    33    41    49    57
     2    10    18    26    34    42    50    58
     3    11    19    27    35    43    51    59
     4    12    20    28    36    44    52    60
     5    13    21    29    37    45    53    61
     6    14    22    30    38    46    54    62
     7    15    23    31    39    47    55    63
     8    16    24    32    40    48    56    64


Suppose you want to distribute this array among four workers, with a 4-by-4 block as the local part on each worker. In this case, the lab grid is a 2-by-2 arrangement of the workers, and the block size is a square of four elements on a side (i.e., each block is a 4-by-4 square). With this information, you can define the codistributor object:
P>> DIST = codistributor2dbc([2 2], 4)

Now you can use this codistributor object to distribute the original matrix:
P>> AA = codistributed(A, DIST)

This distributes the array among the workers according to this scheme:
            LAB 1            |            LAB 2
     1     9    17    25    |    33    41    49    57
     2    10    18    26    |    34    42    50    58
     3    11    19    27    |    35    43    51    59
     4    12    20    28    |    36    44    52    60
  ---------------------------------------------------
            LAB 3            |            LAB 4
     5    13    21    29    |    37    45    53    61
     6    14    22    30    |    38    46    54    62
     7    15    23    31    |    39    47    55    63
     8    16    24    32    |    40    48    56    64

If the lab grid does not perfectly overlay the dimensions of the codistributed array, you can still use '2dbc' distribution, which is block cyclic. In this case, you can imagine the lab grid being repeatedly overlaid in both dimensions until all the original matrix elements are included. Using the same original 8-by-8 matrix and 2-by-2 lab grid, consider a block size of 3 instead of 4, so that 3-by-3 square blocks are distributed among the workers. The code looks like this:
P>> DIST = codistributor2dbc([2 2], 3)


P>> AA = codistributed(A, DIST)

The first “row” of the lab grid is distributed to worker 1 and worker 2, but that contains only six of the eight columns of the original matrix. Therefore, the next two columns are distributed to worker 1. This process continues until all columns in the first rows are distributed. Then a similar process applies to the rows as you proceed down the matrix, as shown in the following distribution scheme:

The original matrix (the same 8-by-8 values as above) is partitioned into 3-by-3 blocks, with partial blocks at the right and bottom edges, and the 2-by-2 lab grid is overlaid repeatedly until every block is assigned:

            cols 1-3    cols 4-6    cols 7-8
  rows 1-3    LAB 1       LAB 2       LAB 1
  rows 4-6    LAB 3       LAB 4       LAB 3
  rows 7-8    LAB 1       LAB 2       LAB 1
The diagram above shows a scheme that requires four overlays of the lab grid to accommodate the entire original matrix. [The original manual follows this with a pmode session figure showing the code and the resulting distribution of data to each of the workers.]


The following points are worth noting:

• '2dbc' distribution might not offer any performance enhancement unless the block size is at least a few dozen. The default block size is 64.
• The lab grid should be as close to a square as possible.
• Not all functions that are enhanced to work on '1d' codistributed arrays work on '2dbc' codistributed arrays.


Looping Over a Distributed Range (for-drange)
In this section...
“Parallelizing a for-Loop” on page 5-22
“Codistributed Arrays in a for-drange Loop” on page 5-23

Note Using a for-loop over a distributed range (drange) is intended for explicit indexing of the distributed dimension of codistributed arrays (such as inside an spmd statement or a parallel job). For most applications involving parallel for-loops you should first try using parfor loops. See “Parallel for-Loops (parfor)”.

Parallelizing a for-Loop
If you already have a coarse-grained application to perform, but you do not want to bother with the overhead of defining jobs and tasks, you can take advantage of the ease-of-use that pmode provides. Where an existing program might take hours or days to process all its independent data sets, you can shorten that time by distributing these independent computations over your cluster. For example, suppose you have the following serial code:
results = zeros(1, numDataSets);
for i = 1:numDataSets
    load(['\\central\myData\dataSet' int2str(i) '.mat'])
    results(i) = processDataSet(i);
end
plot(1:numDataSets, results);
save \\central\myResults\today.mat results

The following changes make this code operate in parallel, either interactively in spmd or pmode, or in a parallel job:
results = zeros(1, numDataSets, codistributor());
for i = drange(1:numDataSets)
    load(['\\central\myData\dataSet' int2str(i) '.mat'])
    results(i) = processDataSet(i);
end
res = gather(results, 1);
if labindex == 1
    plot(1:numDataSets, res);
    print -dtiff -r300 fig.tiff;
    save \\central\myResults\today.mat res
end

Note that the length of the for iteration and the length of the codistributed array results need to match in order to index into results within a for-drange loop. This way, no communication is required between the workers. If results were simply a replicated array, as it would have been when running the original code in parallel, each worker would have assigned into its part of results, leaving the remaining parts of results 0. At the end, results would have been a variant, and without explicitly calling labSend and labReceive or gcat, there would be no way to get the total results back to one (or all) workers.

When using the load function, you need to be careful that the data files are accessible to all workers if necessary. The best practice is to use explicit paths to files on a shared file system. Correspondingly, when using the save function, you should be careful to have only one worker save to a particular file (on a shared file system) at a time. Thus, wrapping the code in if labindex == 1 is recommended.

Because results is distributed across the workers, this example uses gather to collect the data onto worker 1. A worker cannot plot a visible figure, so the print function creates a viewable file of the plot.

Codistributed Arrays in a for-drange Loop
When a for-loop over a distributed range is executed in a parallel job, each worker performs its portion of the loop, so that the workers are all working simultaneously. Because of this, no communication is allowed between the workers while executing a for-drange loop. In particular, a worker has access only to its partition of a codistributed array. Any calculations in such a loop that require a worker to access portions of a codistributed array from another worker will generate an error.

To illustrate this characteristic, you can try the following example, in which one for loop works, but the other does not. At the pmode prompt, create two codistributed arrays, one an identity matrix, the other set to zeros, distributed across four workers.
D = eye(8, 8, codistributor())
E = zeros(8, 8, codistributor())

By default, these arrays are distributed by columns; that is, each of the four workers contains two columns of each array. If you use these arrays in a for-drange loop, any calculations must be self-contained within each worker. In other words, you can only perform calculations that are limited within each worker to the two columns of the arrays that the workers contain. For example, suppose you want to set each column of array E to some multiple of the corresponding column of array D:
for j = drange(1:size(D,2)); E(:,j) = j*D(:,j); end

This statement sets the j-th column of E to j times the j-th column of D. In effect, while D is an identity matrix with 1s down the main diagonal, E has the sequence 1, 2, 3, etc., down its main diagonal. This works because each worker has access to the entire column of D and the entire column of E necessary to perform the calculation, as each worker works independently and simultaneously on two of the eight columns. Suppose, however, that you attempt to set the values of the columns of E according to different columns of D:
for j = drange(1:size(D,2)); E(:,j) = j*D(:,j+1); end

This method fails, because when j is 2, you are trying to set the second column of E using the third column of D. These columns are stored in different workers, so an error occurs, indicating that communication between the workers is not allowed.


Restrictions
To use for-drange on a codistributed array, the following conditions must exist:

• The codistributed array uses a 1-dimensional distribution scheme (not 2dbc).
• The distribution complies with the default partition scheme.
• The variable over which the for-drange loop is indexing provides the array subscript for the distribution dimension.
• All other subscripts can be chosen freely (and can be taken from for-loops over the full range of each dimension).

To loop over all elements in the array, you can use for-drange on the dimension of distribution, and regular for-loops on all other dimensions. The following example executes in an spmd statement running on a MATLAB pool of 4 workers:
spmd
    PP = codistributed.zeros(6,8,12);
    RR = rand(6,8,12,codistributor())
    % Default distribution:
    % by third dimension, evenly across 4 workers.
    for ii = 1:6
        for jj = 1:8
            for kk = drange(1:12)
                PP(ii,jj,kk) = RR(ii,jj,kk) + labindex;
            end
        end
    end
end

To view the contents of the array, type:
PP


Using MATLAB Functions on Codistributed Arrays
Many functions in MATLAB software are enhanced or overloaded so that they operate on codistributed arrays in much the same way that they operate on arrays contained in a single workspace. In most cases, if any of the input arguments to these functions is a distributed or codistributed array, their output arrays are distributed or codistributed, respectively. If the output is always scalar, it is replicated on each worker.

All these overloaded functions with codistributed array inputs must reference the same inputs at the same time on all workers; therefore, you cannot use variant arrays for input arguments.

A few of these functions might exhibit certain limitations when operating on a codistributed array. To see if any function has different behavior when used with a codistributed array, type
help codistributed/functionname

For example,
help codistributed/normest

The following table lists the enhanced MATLAB functions that operate on codistributed arrays.

Data functions:
    arrayfun, bsxfun, cumprod, cumsum, fft, max, min, prod, sum

Data type functions:
    cast, cell2mat, cell2struct, celldisp, cellfun, char, double, fieldnames, int16, int32, int64, int8, logical, num2cell, rmfield, single, struct2cell, swapbytes, typecast, uint16, uint32, uint64, uint8

Elementary and trigonometric functions:
    abs, acos, acosd, acosh, acot, acotd, acoth, acsc, acscd, acsch, angle, asec, asecd, asech, asin, asind, asinh, atan, atan2, atan2d, atand, atanh, ceil, complex, conj, cos, cosd, cosh, cot, cotd, coth, csc, cscd, csch, exp, expm1, fix, floor, hypot, imag, isreal, log, log10, log1p, log2, mod, nextpow2, nthroot, pow2, real, reallog, realpow, realsqrt, rem, round, sec, secd, sech, sign, sin, sind, sinh, sqrt, tan, tand, tanh

Elementary matrices:
    cat, diag, eps, find, isempty, isequal, isequaln, isfinite, isinf, isnan, length, meshgrid, ndgrid, ndims, numel, repmat, reshape, size, sort, tril, triu

Matrix functions:
    chol, eig, inv, lu, norm, normest, qr, svd

Array operations:
    all, and (&), any, bitand, bitor, bitxor, ctranspose ('), end, eq (==), ge (>=), gt (>), horzcat ([]), ldivide (.\), le (<=), lt (<), minus (-), mldivide (\), mrdivide (/), mtimes (*), ne (~=), not (~), or (|), plus (+), power (.^), rdivide (./), subsasgn, subsindex, subsref, times (.*), transpose (.'), uminus (-), uplus (+), vertcat ([;]), xor

Sparse matrix functions:
    full, issparse, nnz, nonzeros, nzmax, sparse, spfun, spones

Special functions:
    dot

6
Programming Overview
This chapter provides information you need for programming with Parallel Computing Toolbox software. Further details of evaluating functions in a cluster, programming distributed jobs, and programming parallel jobs are covered in later chapters. This chapter describes features common to programming all kinds of jobs. The sections are as follows.

• “How Parallel Computing Products Run a Job” on page 6-2
• “Create Simple Independent Jobs” on page 6-10
• “Cluster Profiles” on page 6-12
• “Job Monitor” on page 6-25
• “Programming Tips” on page 6-28
• “Control Random Number Streams” on page 6-33
• “Profiling Parallel Code” on page 6-38
• “Benchmarking Performance” on page 6-49
• “Troubleshooting and Debugging” on page 6-50


How Parallel Computing Products Run a Job
In this section...
“Overview” on page 6-2
“Toolbox and Server Components” on page 6-3
“Life Cycle of a Job” on page 6-8

Overview
Parallel Computing Toolbox and MATLAB Distributed Computing Server software let you solve computationally and data-intensive problems using MATLAB and Simulink on multicore and multiprocessor computers. Parallel processing constructs such as parallel for-loops and code blocks, distributed arrays, parallel numerical algorithms, and message-passing functions let you implement task-parallel and data-parallel algorithms at a high level in MATLAB without programming for specific hardware and network architectures.

A job is some large operation that you need to perform in your MATLAB session. A job is broken down into segments called tasks. You decide how best to divide your job into tasks. You could divide your job into identical tasks, but tasks do not have to be identical.

The MATLAB session in which the job and its tasks are defined is called the client session. Often, this is on the machine where you program MATLAB. The client uses Parallel Computing Toolbox software to perform the definition of jobs and tasks and to run them on a cluster local to your machine. MATLAB Distributed Computing Server software is the product that performs the execution of your job on a cluster of machines.

The MATLAB job scheduler (MJS) is the process that coordinates the execution of jobs and the evaluation of their tasks. The MJS distributes the tasks for evaluation to the server’s individual MATLAB sessions called workers. Use of the MJS to access a cluster is optional; the distribution of tasks to cluster workers can also be performed by a third-party scheduler, such as Microsoft® Windows HPC Server (including CCS) or Platform LSF®.


See the “Glossary” on page Glossary-1 for definitions of the parallel computing terms used in this manual.

[Figure: Basic Parallel Computing Setup — a MATLAB client running Parallel Computing Toolbox submits work to a scheduler, which distributes it to multiple MATLAB workers, each running MATLAB Distributed Computing Server.]

Toolbox and Server Components
• “MJS, Workers, and Clients” on page 6-3
• “Local Cluster” on page 6-5
• “Third-Party Schedulers” on page 6-5
• “Components on Mixed Platforms or Heterogeneous Clusters” on page 6-7
• “mdce Service” on page 6-7
• “Components Represented in the Client” on page 6-7

MJS, Workers, and Clients
The MJS can be run on any machine on the network. The MJS runs jobs in the order in which they are submitted, unless any jobs in its queue are promoted, demoted, canceled, or deleted.
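For instance, you can reorder queued jobs from the client. This is a sketch (assuming c is a cluster object for an MJS cluster and both jobs are still in the queue):

```matlab
j1 = createJob(c);  createTask(j1, @rand, 1, {100});  submit(j1);
j2 = createJob(c);  createTask(j2, @rand, 1, {100});  submit(j2);
promote(c, j2);     % move j2 ahead of j1 in the MJS queue
```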


Each worker is given a task from the running job by the MJS, executes the task, returns the result to the MJS, and then is given another task. When all tasks for a running job have been assigned to workers, the MJS starts running the next job on the next available worker.

A MATLAB Distributed Computing Server software setup usually includes many workers that can all execute tasks simultaneously, speeding up execution of large MATLAB jobs. It is generally not important which worker executes a specific task. In an independent job, the workers evaluate tasks one at a time as available, perhaps simultaneously, perhaps not, returning the results to the MJS. In a communicating job, the workers evaluate tasks simultaneously. The MJS then returns the results of all the tasks in the job to the client session.

Note For testing your application locally or other purposes, you can configure a single computer as client, worker, and MJS host. You can also have more than one worker session or more than one MJS session on a machine.

[Figure: Interactions of Parallel Computing Sessions — each client sends a job to the scheduler, which passes tasks to the workers; the workers return task results to the scheduler, which returns all results for the job to its client.]

A large network might include several MJSs as well as several client sessions. Any client session can create, run, and access jobs on any MJS, but a worker session is registered with and dedicated to only one MJS at a time. The following figure shows a configuration with multiple MJSs.

[Figure: Cluster with Multiple Clients and MJSs — several client sessions share two schedulers, each scheduler distributing tasks to its own dedicated set of workers.]

Local Cluster
A feature of Parallel Computing Toolbox software is the ability to run a local scheduler and a cluster of up to twelve workers on the client machine, so that you can run jobs without requiring a remote cluster or MATLAB Distributed Computing Server software. In this case, all the processing required for the client, scheduling, and task evaluation is performed on the same computer. This gives you the opportunity to develop, test, and debug your parallel applications before running them on your cluster.

Third-Party Schedulers
As an alternative to using the MJS, you can use a third-party scheduler. This could be a Microsoft Windows HPC Server (including CCS), Platform LSF scheduler, PBS Pro® scheduler, TORQUE scheduler, or a generic scheduler.

Choosing Between a Third-Party Scheduler and an MJS. You should consider the following when deciding to use a third-party scheduler or the MATLAB job scheduler (MJS) for distributing your tasks:

• Does your cluster already have a scheduler?
  If you already have a scheduler, you may be required to use it as a means of controlling access to the cluster. Your existing scheduler might be just as easy to use as an MJS, so there might be no need for the extra administration involved.

• Is the handling of parallel computing jobs the only cluster scheduling management you need?
  The MJS is designed specifically for MathWorks® parallel computing applications. If other scheduling tasks are not needed, a third-party scheduler might not offer any advantages.

• Is there a file sharing configuration on your cluster already?
  The MJS can handle all file and data sharing necessary for your parallel computing applications. This might be helpful in configurations where shared access is limited.

• Are you interested in batch mode or managed interactive processing?
  When you use an MJS, worker processes usually remain running at all times, dedicated to their MJS. With a third-party scheduler, workers are run as applications that are started for the evaluation of tasks, and stopped when their tasks are complete. If tasks are small or take little time, starting a worker for each one might involve too much overhead time.

• Are there security concerns?
  Your own scheduler might be configured to accommodate your particular security requirements.

• How many nodes are on your cluster?
  If you have a large cluster, you probably already have a scheduler. Consult your MathWorks representative if you have questions about cluster size and the MJS.

• Who administers your cluster?
  The person administering your cluster might have a preference for how jobs are scheduled.

• Do you need to monitor your job’s progress or access intermediate data?
  A job run by the MJS supports events and callbacks, so that particular functions can run as each job and task progresses from one state to another.


Components on Mixed Platforms or Heterogeneous Clusters
Parallel Computing Toolbox software and MATLAB Distributed Computing Server software are supported on Windows®, UNIX®, and Macintosh operating systems. Mixed platforms are supported, so that the clients, MJS, and workers do not have to be on the same platform. The cluster can also be comprised of both 32-bit and 64-bit machines, so long as your data does not exceed the limitations posed by the 32-bit systems. Other limitations are described at
http://www.mathworks.com/products/parallel-computing/requirements.html.

In a mixed-platform environment, system administrators should be sure to follow the proper installation instructions for the local machine on which you are installing the software.

mdce Service
If you are using the MJS, every machine that hosts a worker or MJS session must also run the mdce service. The mdce service controls the worker and MJS sessions and recovers them when their host machines crash. If a worker or MJS machine crashes, when the mdce service starts up again (usually configured to start at machine boot time), it automatically restarts the MJS and worker sessions to resume their sessions from before the system crash. More information about the mdce service is available in the MATLAB Distributed Computing Server documentation.

Components Represented in the Client
A client session communicates with the MJS by calling methods and configuring properties of an MJS cluster object. Though not often necessary, the client session can also access information about a worker session through a worker object. When you create a job in the client session, the job actually exists in the MJS job storage location. The client session has access to the job through a job object. Likewise, tasks that you define for a job in the client session exist in the MJS data location, and you access them through task objects.


Life Cycle of a Job
When you create and run a job, it progresses through a number of stages. Each stage of a job is reflected in the value of the job object’s State property, which can be pending, queued, running, or finished. Each of these stages is briefly described in this section. The figure below illustrates the stages in the life cycle of a job. In the MJS (or other scheduler), the jobs are shown categorized by their state. Some of the functions you use for managing a job are createJob, submit, and fetchOutputs.
[Figure: Stages of a Job — createJob creates a pending job in the client session; submit places it in the cluster queue; the scheduler runs queued jobs on the workers; fetchOutputs retrieves the results of finished jobs.]

The following table describes each stage in the life cycle of a job.

Pending
    You create a job on the scheduler with the createJob function in your client session of Parallel Computing Toolbox software. The job’s first state is pending. This is when you define the job by adding tasks to it.

Queued
    When you execute the submit function on a job, the MJS or scheduler places the job in the queue, and the job’s state is queued. The scheduler executes jobs in the queue in the sequence in which they are submitted, all jobs moving up the queue as the jobs before them are finished. You can change the sequence of the jobs in the queue with the promote and demote functions.

Running
    When a job reaches the top of the queue, the scheduler distributes the job’s tasks to worker sessions for evaluation. The job’s state is now running. If more workers are available than are required for a job’s tasks, the scheduler begins executing the next job. In this way, there can be more than one job running at a time.

Finished
    When all of a job’s tasks have been evaluated, the job is moved to the finished state. At this time, you can retrieve the results from all the tasks in the job with the function fetchOutputs.

Failed
    When using a third-party scheduler, a job might fail if the scheduler encounters an error when attempting to execute its commands or access necessary files.

Deleted
    When a job’s data has been removed from its data location or from the MJS with the delete function, the state of the job in the client is deleted. This state is available only as long as the job object remains in the client.

Note that when a job is finished, its data remains in the MJS’s JobStorageLocation folder, even if you clear all the objects from the client session. The MJS or scheduler keeps all the jobs it has executed, until you restart the MJS in a clean state. Therefore, you can retrieve information from a job later or in another client session, so long as the MJS has not been restarted with the -clean option. You can permanently remove completed jobs from the MJS or scheduler’s storage location using the Job Monitor GUI or the delete function.
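Because finished jobs persist in the storage location, you can reconnect to them in a later client session. As a sketch (the profile name MyMJSprofile1 and the choice of the first finished job are assumptions for illustration):

```matlab
c = parcluster('MyMJSprofile1');            % hypothetical profile name
finished = findJob(c,'State','finished');   % jobs survive across client sessions
results = fetchOutputs(finished(1));        % retrieve results from an earlier session
```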


Create Simple Independent Jobs
Program a Job on a Local Cluster
In some situations, you might need to define the individual tasks of a job, for example because they evaluate different functions or have uniquely structured arguments. To program a job like this, the typical Parallel Computing Toolbox client session includes the steps shown in the following example. This example illustrates the basic steps in creating and running a job that contains a few simple tasks. Each task evaluates the sum function for an input array.
1 Identify a cluster. Use parallel.defaultClusterProfile to indicate that

you are using the local cluster, and use parcluster to create the object c to represent this cluster. (For more information, see “Create a Cluster Object” on page 7-4.)
parallel.defaultClusterProfile('local');
c = parcluster();
2 Create a job. Create job j on the cluster. (For more information, see

“Create a Job” on page 7-4.)
j = createJob(c)
3 Create three tasks within the job j. Each task evaluates the sum of the

array that is passed as an input argument. (For more information, see “Create Tasks” on page 7-5.)
createTask(j, @sum, 1, {[1 1]});
createTask(j, @sum, 1, {[2 2]});
createTask(j, @sum, 1, {[3 3]});
4 Submit the job to the queue for evaluation. The scheduler then distributes

the job’s tasks to MATLAB workers that are available for evaluating. The local scheduler actually starts a MATLAB worker session for each task, up to twelve at one time. (For more information, see “Submit a Job to the Cluster” on page 7-6.)


submit(j);
5 Wait for the job to complete, then get the results from all the tasks of the

job. (For more information, see “Fetch the Job’s Results” on page 7-6.)
wait(j)
results = fetchOutputs(j)

results =

    [2]
    [4]
    [6]
6 Delete the job. When you have the results, you can permanently remove

the job from the scheduler’s storage location.
delete(j)


Cluster Profiles
In this section...

“Cluster Profile Manager” on page 6-12
“Discover Clusters” on page 6-13
“Import and Export Cluster Profiles” on page 6-13
“Create and Modify Cluster Profiles” on page 6-15
“Validate Cluster Profiles” on page 6-21
“Apply Cluster Profiles in Client Code” on page 6-23

Cluster Profile Manager
Cluster profiles let you define certain properties for your cluster, then have these properties applied when you create cluster, job, and task objects in the MATLAB client. Some of the functions that support the use of cluster profiles are

• batch
• matlabpool
• parcluster

You can create, edit, and import cluster profiles from the Cluster Profile Manager. To open the Cluster Profile Manager, on the Home tab in the Environment section, click Parallel > Manage Cluster Profiles.

6-12

Cluster Profiles

Discover Clusters
You can let MATLAB discover clusters for you. Use either of the following techniques to discover those clusters which are available for you to use:

• On the Home tab in the Environment section, click Parallel > Discover Clusters.
• In the Cluster Profile Manager, click Discover Clusters.

This opens the Discover Clusters dialog box, where you select the location of your clusters. As clusters are discovered, they populate a list for your selection:

If you already have a profile for any of the listed clusters, those profile names are included in the list. If you want to create a new profile for one of the discovered clusters, select the name of the cluster you want to use, and click Next. The subsequent dialog box lets you choose if you want to make your new profile the default.

Import and Export Cluster Profiles
Cluster profiles are stored as part of your MATLAB preferences, so they are generally available on an individual user basis. To make a cluster profile available to someone else, you can export it to a separate .settings file. In this way, a repository of profiles can be created so that all users of a computing cluster can share common profiles.


To export a cluster profile:
1 In the Cluster Profile Manager, select (highlight) the profile you want to

export.
2 Click Export > Export. (Alternatively, you can right-click the profile

in the listing and select Export.) If you want to export all your profiles to a single file, click Export > Export All.
3 In the Export profiles to file dialog box, specify a location and name for

the file. The default file name is the same as the name of the profile it contains, with a .settings extension appended; you can alter the names if you want to. Profiles saved in this way can then be imported by other MATLAB users:
1 In the Cluster Profile Manager, click Import.

2 In the Import profiles from file dialog box, browse to find the .settings file for the profile you want to import. Select the file and click Open.

The imported profile appears in your Cluster Profile Manager list. Note that the list contains the profile name, which is not necessarily the file name. If you already have a profile with the same name as the one you are importing, the imported profile gets an extension added to its name so you can distinguish it.

You can also export and import profiles programmatically with the parallel.exportProfile and parallel.importProfile functions.
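The programmatic route might look like the following sketch, where the file name is illustrative:

```matlab
% Export one profile to a .settings file that other users can import:
parallel.exportProfile('MyMJSprofile1', 'MyMJSprofile1.settings');

% In another user's MATLAB session, import it; the function
% returns the name of the imported profile:
profName = parallel.importProfile('MyMJSprofile1.settings');
```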


Export Profiles for MATLAB Compiler
You can use an exported profile with MATLAB Compiler to identify cluster setup information for running compiled applications on a cluster. For example, the setmcruserdata function can use the exported profile file name to set the value for the key ParallelProfile. For more information and examples of deploying parallel applications, see the MATLAB Compiler documentation. A compiled application has the same default profile and the same list of alternative profiles that the compiling user had when the application was compiled. This means that in many cases the profile file is not needed, as might be the case when using the local profile for local workers. If an exported file is used, the first profile in the file becomes the default when imported. If any of the imported profiles have the same name as any of the existing profiles, they are renamed during import (though their names in the file remain unchanged).
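For example, deployed code might register an exported profile file before any parallel constructs run. This is a sketch; the file name is an assumption:

```matlab
% Point the compiled application at the exported cluster profile:
setmcruserdata('ParallelProfile', 'MyMJSprofile1.settings');
```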

Create and Modify Cluster Profiles
The first time you open the Cluster Profile Manager, it lists only one profile called local, which is the initial default profile having only default settings at this time.


The following example provides instructions on how to create and modify profiles using the Cluster Profile Manager. Create and Modify Cluster Profiles Suppose you want to create a profile to set several properties for jobs to run in an MJS cluster. The following example illustrates a possible workflow, where you create two profiles differentiated only by the number of workers they use.
1 In the Cluster Profile Manager, select Add > Custom > MATLAB Job

Scheduler (MJS). This specifies that you want a new profile for an MJS cluster.

This creates and displays a new profile, called MJSProfile1.
2 Double-click the new profile name in the listing, and modify the profile

name to be MyMJSprofile1.
3 Click Edit in the tool strip so that you can set your profile property values.

In the Description field, enter the text MJS with 4 workers, as shown in the following figure. Enter the host name for the machine on which the MJS is running, and the name of the MJS. If you are entering information for an actual MJS already running on your network, enter the appropriate text. If you are unsure about the MJS (formerly known as a job manager) names and locations on your network, ask your system administrator for help.


4 Scroll down to the Workers section, and for the Range of number of

workers, enter the two-element vector [4 4]. This specifies that jobs using this profile require at least four workers and no more than four workers. Therefore, a job using this profile runs on exactly four workers, even if it has to wait until four workers are available before starting.

You might want to edit other properties depending on your particular network and cluster situation.


5 Click Done to save the profile settings.


To create a similar profile with just a few differences, you can duplicate an existing profile and modify only the parts you need to change, as follows:
1 In the Cluster Profile Manager, right-click the profile name MyMJSprofile1

in the list and select Duplicate. This creates a duplicate profile with a name based on the original profile name appended with _Copy.
2 Double-click the new profile name and edit its name to be MyMJSprofile2.

3 Click Edit to allow you to change the profile property values.

4 Edit the description field to change its text to MJS with any workers.


5 Scroll down to the Workers section, and for the Range of number of

workers, clear the [4 4] and leave the field blank, as highlighted in the following figure:

6 Click Done to save the profile settings and to close the properties editor.


You now have two profiles that differ only in the number of workers required for running a job.

When creating a job, you can apply either profile to that job as a way of specifying how many workers it should run on. You can see examples of profiles for different kinds of supported schedulers in the MATLAB Distributed Computing Server installation instructions at “Configure Your Cluster”.

Validate Cluster Profiles
The Cluster Profile Manager includes the ability to validate profiles. Validation assures that the MATLAB client session can access the cluster, and that the cluster can run the various types of jobs with the settings of your profile. To validate a profile, follow these steps:
1 Open the Cluster Profile Manager on the Home tab in the Environment

section, by clicking Parallel > Manage Cluster Profiles.
2 In the Cluster Profile Manager, click the name of the profile you want to

test. You can highlight a profile without changing the selected default profile. So a profile selected for validation does not need to be your default profile.


3 Click Validate.

Profile validation includes five stages:
1 Connects to the cluster (parcluster)

2 Runs an independent job (createJob) on the cluster using the profile

3 Runs an SPMD-type communicating job on the cluster using the profile

4 Runs a pool-type communicating job on the cluster using the profile

5 Runs a MATLAB pool job on the cluster using the profile

While the tests are running, the Cluster Profile Manager displays their progress as shown here:

Note Validation will fail if you already have a MATLAB pool open.

Note When using an mpiexec scheduler, a failure is expected for the Independent Job stage. It is normal for the test then to proceed to the Communicating Job and Matlabpool stages.


When the tests are complete, you can click Show Details to get more information about test results. This information includes any error messages, debug logs, and other data that might be useful in diagnosing problems or helping to determine proper network settings. The Validation Results tab keeps the test results available until the current MATLAB session closes.

Apply Cluster Profiles in Client Code
In the MATLAB client where you create and define your parallel computing cluster, job, and task objects, you can use cluster profiles when creating these objects.

Select a Default Cluster Profile
Some functions support default profiles, so that if you do not specify a profile for them, they automatically apply the default. There are several ways to specify which of your profiles should be used as the default profile:

• On the Home tab in the Environment section, click Parallel > Set Default, and from there, all your profiles are available. The current default profile is indicated. You can select any profile in the list as the default.
• The Cluster Profile Manager indicates which is currently the default profile. You can select any profile in the list, then click Set as Default.
• You can get or set the default profile programmatically by using the parallel.defaultClusterProfile function. The following sets of commands achieve the same thing:
parallel.defaultClusterProfile('MyMJSprofile1')
matlabpool open

or
matlabpool open MyMJSprofile1


Create Cluster Object
The parcluster function creates a cluster object in your workspace according to the specified profile. The profile identifies a particular cluster and applies property values. For example,
c = parcluster('myMJSprofile')

This command finds the cluster defined by the settings of the profile named myMJSprofile and sets property values on the cluster object based on settings in the profile. By applying different profiles, you can alter your cluster choices without changing your MATLAB application code.

Create Jobs and Tasks
Because the properties of cluster, job, and task objects can be defined in a profile, you do not have to explicitly define them in your application. Therefore, your code can accommodate any type of cluster without being modified. For example, the following code uses one profile to set properties on cluster, job, and task objects:
c = parcluster('myProfile1');
job1 = createJob(c);            % Uses profile of cluster object c.
createTask(job1,@rand,1,{3})    % Uses profile of cluster object c.

Job Monitor
In this section...

“Job Monitor GUI” on page 6-25
“Manage Jobs Using the Job Monitor” on page 6-26
“Identify Task Errors Using the Job Monitor” on page 6-26

Job Monitor GUI
The Job Monitor displays the jobs in the queue for the scheduler determined by your selection of a cluster profile. Open the Job Monitor from the MATLAB desktop on the Home tab in the Environment section, by clicking Parallel > Monitor Jobs.

The Job Monitor lists all the jobs that exist for the cluster specified in the selected profile. You can choose any one of your profiles (those available in your current session's Cluster Profile Manager), and whether to display jobs from all users or only your own jobs.


Typical Use Cases
The Job Monitor lets you accomplish many different goals pertaining to job tracking and queue management. Using the Job Monitor, you can:

• Discover and monitor all jobs submitted by a particular user
• Determine the status of a job
• Determine the cause of errors in a job
• Delete old jobs you no longer need
• Create a job object in MATLAB for access to a particular job in the queue

Manage Jobs Using the Job Monitor
Using the Job Monitor you can manage the listed jobs for your cluster. Right-click on any job in the list, and select any of the following options from the context menu. The available options depend on the type of job.

• Cancel — Stops a running job and changes its state to 'finished'. If the job is pending or queued, the state changes to 'finished' without its ever running. This is the same as the command-line cancel function for the job.
• Delete — Deletes the job's data and removes it from the queue. This is the same as the command-line delete function for the job.
• Show details — This displays detailed information about the job in the Command Window.
• Show errors — This displays all the tasks that generated an error in that job, with their error properties.
• Fetch outputs — This collects all the task output arguments from the job into the client workspace.
• Close MATLAB pool — For interactive MATLAB pool jobs, this closes the pool.
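Each menu action has a command-line counterpart, so the same cleanup can be scripted. A sketch (the job ID is a hypothetical value read from the Job Monitor list):

```matlab
c = parcluster();          % cluster from your default profile
j = findJob(c,'ID',7);     % hypothetical job ID shown in the Job Monitor
cancel(j);                 % same as the Cancel context-menu item
delete(j);                 % same as the Delete context-menu item
```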

Identify Task Errors Using the Job Monitor
Because the Job Monitor indicates if a job had a run-time error, you can use it to identify the tasks that generated the errors in that job. For example, the following script generates an error because it attempts to perform a matrix inverse on a vector:

6-26

Job Monitor

A = [2 4 6 8];
B = inv(A);

If you save this script in a file named invert_me.m, you can try to run the script as a batch job on the default cluster:
batch('invert_me')

When updated after the job runs, the Job Monitor includes the job created by the batch command, marked with an error icon. Right-click the job in the list, and select Show Errors. For all the tasks with an error in that job, the task information, including properties related to the error, displays in the MATLAB Command Window:
Task ID 1 from Job ID 2 Information
===================================

                 State : finished
              Function : @parallel.internal.cluster.executeScript
             StartTime : Tue Jun 28 11:46:28 EDT 2011
      Running Duration : 0 days 0h 0m 1s

- Task Result Properties

       ErrorIdentifier : MATLAB:square
          ErrorMessage : Matrix must be square.
           Error Stack : invert_me (line 2)
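The same error properties are available programmatically on the job's task objects. A sketch, assuming j holds the job object returned by the batch call above:

```matlab
t = j.Tasks(1);       % the task that ran invert_me
t.ErrorIdentifier     % e.g., MATLAB:square
t.ErrorMessage        % e.g., Matrix must be square.
```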


Programming Tips
In this section...

“Program Development Guidelines” on page 6-28
“Current Working Directory of a MATLAB Worker” on page 6-29
“Writing to Files from Workers” on page 6-30
“Saving or Sending Objects” on page 6-30
“Using clear functions” on page 6-31
“Running Tasks That Call Simulink Software” on page 6-31
“Using the pause Function” on page 6-31
“Transmitting Large Amounts of Data” on page 6-31
“Interrupting a Job” on page 6-32
“Speeding Up a Job” on page 6-32

Program Development Guidelines
When writing code for Parallel Computing Toolbox software, you should advance one step at a time in the complexity of your application. Verifying your program at each step prevents your having to debug several potential problems simultaneously. If you run into any problems at any step along the way, back up to the previous step and reverify your code. The recommended programming practice for distributed or parallel computing applications is
1 Run code normally on your local machine. First verify all your

functions so that as you progress, you are not trying to debug the functions and the distribution at the same time. Run your functions in a single instance of MATLAB software on your local computer. For programming suggestions, see “Techniques for Improving Performance” in the MATLAB documentation.
2 Decide whether you need an independent or communicating job. If

your application involves large data sets on which you need simultaneous calculations performed, you might benefit from a communicating job


with distributed arrays. If your application involves looped or repetitive calculations that can be performed independently of each other, an independent job might be appropriate.
3 Modify your code for division. Decide how you want your code

divided. For an independent job, determine how best to divide it into tasks; for example, each iteration of a for-loop might define one task. For a communicating job, determine how best to take advantage of parallel processing; for example, a large array can be distributed across all your workers.
4 Use pmode to develop parallel functionality. Use pmode with the

local scheduler to develop your functions on several workers in parallel. As you progress and use pmode on the remote cluster, that might be all you need to complete your work.
5 Run the independent or communicating job with a local scheduler.

Create an independent or communicating job, and run the job using the local scheduler with several local workers. This verifies that your code is correctly set up for batch execution, and in the case of an independent job, that its computations are properly divided into tasks.
6 Run the independent job on only one cluster node. Run your

independent job with one task to verify that remote distribution is working between your client and the cluster, and to verify proper transfer of additional files and paths.
7 Run the independent or communicating job on multiple cluster

nodes. Scale up your job to include as many tasks as you need for an independent job, or as many workers as you need for a communicating job.

Note The client session of MATLAB must be running the Java® Virtual Machine (JVM™) to use Parallel Computing Toolbox software. Do not start MATLAB with the -nojvm flag.

Current Working Directory of a MATLAB Worker
The current directory of a MATLAB worker at the beginning of its session is


CHECKPOINTBASE\HOSTNAME_WORKERNAME_mlworker_log\work

where CHECKPOINTBASE is defined in the mdce_def file, HOSTNAME is the name of the node on which the worker is running, and WORKERNAME is the name of the MATLAB worker session. For example, if the worker named worker22 is running on host nodeA52, and its CHECKPOINTBASE value is C:\TEMP\MDCE\Checkpoint, the starting current directory for that worker session is
C:\TEMP\MDCE\Checkpoint\nodeA52_worker22_mlworker_log\work

Writing to Files from Workers
When multiple workers attempt to write to the same file, you might end up with a race condition, clash, or one worker might overwrite the data from another worker. This is likely to occur when:

• There is more than one worker per machine, and they attempt to write to the same file.
• The workers have a shared file system, and use the same path to identify a file for writing.

In some cases an error can result, but sometimes the overwriting can occur without error. To avoid an issue, be sure that each worker or parfor iteration has unique access to any files it writes or saves data to. There is no problem when multiple workers read from the same file.
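One way to guarantee unique access is to build a file name from the loop index and route save through a small helper, because save cannot be called directly inside a parfor loop. This is a sketch; parsave is a hypothetical helper that you would place on the workers' path:

```matlab
% parsave.m -- hypothetical helper; SAVE cannot be called directly in parfor:
%   function parsave(fname, data)
%   save(fname, 'data');
%   end

parfor i = 1:4
    data = rand(10);
    % A file name unique to this iteration avoids write clashes:
    fname = fullfile(tempdir, sprintf('result_%d.mat', i));
    parsave(fname, data);
end
```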

Saving or Sending Objects
Do not use the save or load function on Parallel Computing Toolbox objects. Some of the information that these objects require is stored in the MATLAB session persistent memory and would not be saved to a file. Similarly, you cannot send a parallel computing object between parallel computing processes by means of an object’s properties. For example, you cannot pass an MJS, job, task, or worker object to MATLAB workers as part of a job’s JobData property.


Also, system objects (e.g., Java classes, .NET classes, shared libraries, etc.) that are loaded, imported, or added to the Java search path in the MATLAB client, are not available on the workers unless explicitly loaded, imported, or added on the workers, respectively. Other than in the task function code, typical ways of loading these objects might be in taskStartup, jobStartup, and in the case of workers in a MATLAB pool, in poolStartup and using pctRunOnAll.
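For example, to make a Java class available on every worker in an open MATLAB pool, you might add it to each worker's Java path with pctRunOnAll; the .jar path here is hypothetical:

```matlab
matlabpool open
pctRunOnAll javaaddpath('/shared/classes/myclasses.jar')  % hypothetical path
```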

Using clear functions
Executing
clear functions

clears all Parallel Computing Toolbox objects from the current MATLAB session. They still remain in the MJS. For information on recreating these objects in the client session, see “Recover Objects” on page 7-21.

Running Tasks That Call Simulink Software
The first task that runs on a worker session that uses Simulink software can take a long time to run, as Simulink is not automatically started at the beginning of the worker session. Instead, Simulink starts up when first called. Subsequent tasks on that worker session will run faster, unless the worker is restarted between tasks.

Using the pause Function
On worker sessions running on Macintosh or UNIX operating systems, pause(inf) returns immediately, rather than pausing. This is to prevent a worker session from hanging when an interrupt is not possible.

Transmitting Large Amounts of Data
Operations that involve transmitting many objects or large amounts of data over the network can take a long time. For example, getting a job’s Tasks property or the results from all of a job’s tasks can take a long time if the job contains many tasks. See also “Object Data Size Limitations” on page 6-50.
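One way to reduce that traffic is to place large input data on a shared file system and pass each task only a file name. In this sketch, the shared path is an assumption and j is a job object as in earlier examples:

```matlab
A = rand(5000);                        % large input data
save('/shared/data/bigA.mat', 'A');    % hypothetical shared location

% The task loads the data locally rather than receiving it over the network:
createTask(j, @(f) sum(sum(getfield(load(f), 'A'))), 1, ...
    {'/shared/data/bigA.mat'});
```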


Interrupting a Job
Because jobs and tasks are run outside the client session, you cannot use Ctrl+C (^C) in the client session to interrupt them. To control or interrupt the execution of jobs and tasks, use such functions as cancel, delete, demote, promote, pause, and resume.

Speeding Up a Job
You might find that your code runs slower on multiple workers than it does on one desktop computer. This can occur when task startup and stop time is significant relative to the task run time. The most common mistake in this regard is to make the tasks too small, i.e., too fine-grained. Another common mistake is to send large amounts of input or output data with each task. In both of these cases, the time it takes to transfer data and initialize a task is far greater than the actual time it takes for the worker to evaluate the task function.
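For example, rather than one tiny task per iteration, group the iterations into a few coarse tasks. This sketch assumes a workload of 1000 independent iterations and a cluster object c as in earlier examples:

```matlab
j = createJob(c);
chunks = {1:250, 251:500, 501:750, 751:1000};  % assumed division of the work
for k = 1:numel(chunks)
    % Each task processes a whole block of iterations:
    createTask(j, @(idx) arrayfun(@(i) i^2, idx), 1, chunks(k));
end
```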


Control Random Number Streams
In this section...

“Different Workers” on page 6-33
“Client and Workers” on page 6-34
“Client and GPU” on page 6-35
“Worker CPU and Worker GPU” on page 6-36

Different Workers
By default, each worker in a cluster working on the same job has a unique random number stream. This example uses two workers in a MATLAB pool to show they generate unique random number sequences.
matlabpool 2
spmd
    R = rand(1,4); % Different on each worker
end
R{1},R{2}

    0.3246    0.2646    0.6618    0.0968
    0.6349    0.5052    0.6497    0.4866

matlabpool close

If you need all workers to generate the same sequence of numbers, you can seed their generators all the same.
matlabpool 2
spmd
    s = RandStream('twister','Seed',1);
    RandStream.setGlobalStream(s);
    R = rand(1,4); % Same on all workers
end
R{1},R{2}

    0.4170    0.7203    0.0001    0.3023
    0.4170    0.7203    0.0001    0.3023

matlabpool close

Client and Workers
By default, the MATLAB client and MATLAB workers use different random number generators, even if the workers are part of a local cluster on the same machine with the client. For the client, the default is the Mersenne Twister generator ('twister'), and for the workers the default is the Combined Multiple Recursive generator ('CombRecursive' or 'mrg32k3a'). If it is necessary to generate the same stream of numbers in the client and workers, you can set one to match the other. For example, you might run a script as a batch job on a worker, and need the same generator or sequence as the client. Suppose you start with a script file named randScript1.m that contains the line:
R = rand(1,4);

You can run this script in the client, and then as a batch job on a worker. Notice that the default generated random number sequences in the results are different.
randScript1; % In client
R

R =
    0.8147    0.9058    0.1270    0.9134

parallel.defaultClusterProfile('local')
c = parcluster();
j = batch(c,'randScript1'); % On worker
wait(j); load(j);
R

R =
    0.3246    0.6618    0.6349    0.6497

For identical results, you can set the client and worker to use the same generator and seed. Here the file randScript2.m contains the following code:

6-34

Control Random Number Streams

s = RandStream('CombRecursive','Seed',1);
RandStream.setGlobalStream(s);
R = rand(1,4);

Now, run the new script in the client and on a worker:
randScript2; % In client
R

R =
    0.4957    0.2243    0.2073    0.6823

j = batch(c,'randScript2'); % On worker
wait(j); load(j);
R

R =
    0.4957    0.2243    0.2073    0.6823

Client and GPU
By default MATLAB clients use different random generators than code running on a GPU. GPUs are more like workers in this regard, and use the Combined Multiple Recursive generator ('CombRecursive' or 'mrg32k3a') unless otherwise specified. This example shows a default generation of random numbers comparing CPU and GPU in a fresh session.
Rc = rand(1,4)

Rc =
    0.8147    0.9058    0.1270    0.9134

Rg = gpuArray.rand(1,4)

Rg =
    0.7270    0.4522    0.9387    0.2360

Be aware that the GPU supports only three generators ('CombRecursive', 'Philox4x32-10', and 'Threefry4x64-20'), none of which is the default client generator.
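If you need the client CPU and the GPU to draw from a common sequence, you can set both global streams to a generator they share. A sketch, assuming 'CombRecursive' with the same seed on both sides:

```matlab
sc = RandStream('CombRecursive','Seed',1);
RandStream.setGlobalStream(sc);                 % client CPU stream

sg = parallel.gpu.RandStream('CombRecursive','Seed',1);
parallel.gpu.RandStream.setGlobalStream(sg);    % GPU stream

Rc = rand(1,4);           % on the client
Rg = gpuArray.rand(1,4);  % on the GPU
```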
