IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 19, NO. 7, JULY 2009

Cross-Based Local Stereo Matching Using Orthogonal Integral Images

Ke Zhang, Jiangbo Lu, and Gauthier Lafruit

Abstract: We propose an area-based local stereo matching algorithm for accurate disparity estimation across all image regions. A well-known challenge to local stereo methods is to decide an appropriate support window for the pixel under consideration, adapting the window shape or the pixelwise support weight to the underlying scene structures. Our stereo method tackles this problem with two key contributions. First, for each anchor pixel an upright cross local support skeleton is adaptively constructed, with four varying arm lengths decided on color similarity and connectivity constraints. Second, given the local cross-decision results, we dynamically construct a shape-adaptive full support region on the fly, merging horizontal segments of the crosses in the vertical neighborhood. Approximating image structures accurately, the proposed method is among the best performing local stereo methods according to the benchmark Middlebury stereo evaluation. Additionally, it reduces memory consumption significantly thanks to our compact local cross representation. To accelerate matching cost aggregation performed in an arbitrarily shaped 2-D region, we also propose an orthogonal integral image technique, yielding a speedup factor of 5–15 over the straightforward integration.

Index Terms: Cross-based region construction, orthogonal integral images, shape adaptive approximation, stereo matching.

I. INTRODUCTION

Stereo matching, an important vision problem, estimates disparities from a given stereo image pair. A substantial amount of work on this topic has been surveyed and evaluated by Scharstein and Szeliski [1]. Unlike most global stereo matching methods, which are computationally expensive and involve many parameters, local stereo methods are generally efficient and easy to implement. To reduce image ambiguity, local stereo methods commonly aggregate the support from the neighboring pixels in a given size-constrained window. For accurate disparity estimates near depth discontinuities, a local support window should adapt its shape and size so that it collects support only from pixels of the same depth. To this end, many local stereo methods have been proposed, and they roughly fall into two categories.

Manuscript received May 21, 2008; revised September 2, 2008. First version published April 7, 2009; current version published July 22, 2009. This paper was recommended by Associate Editor H. Chen.

K. Zhang and J. Lu are with the Department of Electrical Engineering, University of Leuven, Leuven, 3001, Belgium, and Multimedia Group, Interuniversity Microelectronics Center, Leuven, 3001, Belgium (e-mail: zhangke@imec.be; jiangbo.lu@imec.be).

G. Lafruit is with the Multimedia Group, Interuniversity Microelectronics Center, Leuven, 3001, Belgium (e-mail: gauthier.lafruit@imec.be).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSVT.2009.2020478

The stereo methods in the first category focus on either the optimal support window selection among predefined multiple windows [2], [3], or the pixelwise adaptation of the local support window’s shape and size [4], [5]. However, a common limitation of these methods is that the shape of a local support window is rectangular or constrained, and hence is inappropriate for pixels near arbitrarily shaped depth discontinuities. In our recent work [6], a pointwise shape adaptive support polygon is built on a varying-scale sectorial basis, yielding accurate disparity results. Nevertheless, the rigid polygons are still not flexible enough to approximate various scene structures. Moreover, the adaptive scale decision method and cost aggregation over adaptive polygons are not efficient.

On the other hand, local stereo methods from the second category adjust the support weights of the pixels in a given support window while fixing the shape and size of the window. For instance, Xu et al. [7] determined adaptive support weights by radial computations, but this method is very sensitive to the initial disparity estimation. Yoon and Kweon [8] assigned a support weight to each pixel in a support window based on color similarity and geometric proximity. Despite the accurate disparity results, this method consumes a huge amount of memory because it stores support weights that depend on the center pixel. Employing image segmentation information, Tombari et al. [9] proposed a modified weight function for every pixel in a large (51 × 51) support window. Their improved disparity accuracy comes at the cost of a significant increase in computational complexity [10].

In this letter, we propose a cross-based local stereo matching algorithm. The key algorithmic ideas are twofold. First, a locally adaptive upright cross is decided upon the color similarity, defining an initial support skeleton for the anchor pixel. Second, we dynamically construct a shape-adaptive full support region in the cost aggregation step, reusing the pre-computed neighboring cross configurations. The end result is that appropriate local support regions are efficiently derived from the fairly compact cross-based representation, leading to accurate disparity estimates for different pixel locations. Decomposing the conventional cost aggregation into two orthogonal 1-D integrations, we further propose an orthogonal integral image (OII) technique for fast cost aggregation over arbitrarily shaped windows in constant time. Our OII method represents a novel instantiation of the general integral image technique [11], previously applied to computing rectangular sums only [5], [12]. Overall, the proposed method is among the best performing

1051-8215/$26.00 © 2009 IEEE

Fig. 1. Cross-based local support region representation and construction on the Tsukuba image [13]. (a) A pixelwise adaptive cross defines a local support skeleton for the anchor pixel, e.g., $p$. (b) A shape-adaptive full support region $U(p)$ is dynamically constructed for the pixel $p$, integrating multiple horizontal line segments $H(q)$ of neighboring crosses. (c) Sample shape-adaptive local support regions, approximating local image structures appropriately.

local stereo methods based on the Middlebury stereo evaluation [13]. In particular, it has pronounced advantages in execution time compared to the state-of-the-art local methods [6], [8], [9].

In the rest of this letter, Section II presents our cross-based local stereo matching method. The OII technique for fast cost aggregation is described in Section III. We show experimental results in Section IV and conclude the letter in Section V.

II. CROSS-BASED LOCAL STEREO MATCHING

A. Cross-Based Local Support Region Construction

For accurate local stereo matching, it is important to decide an appropriate local support region for each pixel adaptively. In principle, this local support region should contain only the neighboring pixels at the same depth as the pixel under consideration. However, without disparity information beforehand, the support region for a pixel can only be derived from the raw images. A common assumption is that pixels with similar intensity within a constrained area are likely to come from the same image structure, and therefore have similar disparity. Utilizing this assumption, we propose a cross-based local support region representation and construction approach.

The key idea of the proposed approach is to decide an upright cross for every pixel $p = (x_p, y_p)$ in the input image $I$. As shown in Fig. 1(a), this pixelwise adaptive cross consists of two orthogonal line segments, intersecting at the anchor pixel $p$. We represent the horizontal segment by $H(p)$ and the vertical segment by $V(p)$, and they jointly define the local support skeleton for the pixel $p$. Instead of fixing the size of a local cross, we adaptively change its four arm lengths to reliably capture the local image structure. More specifically, to configure an appropriate cross for the pixel $p$, we first decide a quadruple $\{h_p^-, h_p^+, v_p^-, v_p^+\}$ that denotes the left, right, up, and bottom arm length, respectively (see Fig. 2).
Fig. 2. Configuration of a local upright cross $H(p) \cup V(p)$ for the anchor pixel $p$, and the constructed full support region $U(p)$. The quadruple $\{h_p^-, h_p^+, v_p^-, v_p^+\}$ defines the left, right, up, and bottom arm length of the cross, respectively. $q \in V(p)$ is a pixel on the vertical segment $V(p)$ in (2).

As our cross-based representation is a general concept, there are a variety of specific implementations to decide the arm lengths $\{h_p^-, h_p^+, v_p^-, v_p^+\}$. Here, we present an efficient approach based on color similarity under the connectivity constraint [6]. First, we apply a 3 × 3 median filter to the input image $I$, suppressing the impact of image noise as well as subtle non-Lambertian effects. Next, the arm lengths are decided upon color similarity. Taking $h_p^-$ as an example, we perform a color similarity test on the consecutive pixels that reside to the left of the pixel $p$. The purpose is to search for the largest left span $r^*$, where all the pixels covered are similar to the anchor pixel $p$ in color. The computation of $r^*$ can be formulated as follows:

$$r^* = \max_{r \in [1, L]} \left( r \prod_{i \in [1, r]} \delta(p, p_i) \right) \tag{1}$$

where $p_i = (x_p - i, y_p)$ and $L$ is the preset maximum arm length. $\delta(p_1, p_2)$ is an indicator function evaluating the color similarity between the pixels $p_1$ and $p_2$ based on all color bands

$$\delta(p_1, p_2) = \begin{cases} 1, & \max\limits_{c \in \{R,G,B\}} |I_c(p_1) - I_c(p_2)| \le \tau \\ 0, & \text{otherwise} \end{cases}$$

where $I_c$ is the intensity of the color band $c$. Set empirically, $\tau$ controls the confidence level of color similarity. Once the largest left span $r^*$ is derived from (1), we set the left arm length $h_p^- = \max(r^*, 1)$. In effect, this enforces a minimum support region of 3 × 3 for robust correspondence matching.
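The arm-length search in (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `left_arm_length`, the (row, column) indexing, and the choice to stop the arm at the image border are assumptions, and the image is assumed to have already been median filtered. The defaults `L=17` and `tau=20` are the values reported in Section IV.

```python
import numpy as np

def left_arm_length(img, y, x, L=17, tau=20):
    """Left arm length h_p^- for the anchor pixel p = (x, y), following (1)."""
    anchor = img[y, x].astype(np.int32)
    span = 0  # r*: longest run of consecutive similar pixels left of p
    for i in range(1, L + 1):
        if x - i < 0:
            break  # assumption: the arm stops at the image border
        # delta(p, p_i): the largest per-channel difference must not exceed tau
        diff = np.abs(img[y, x - i].astype(np.int32) - anchor).max()
        if diff > tau:
            break  # the product in (1) is zero for every larger r
        span = i
    return max(span, 1)  # minimum arm length of 1, as in the text
```

The other three arms would follow the same pattern with the scan direction changed. Because the product in (1) vanishes as soon as one dissimilar pixel is met, the scan can terminate early instead of evaluating every $r \in [1, L]$.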

Based on the arm lengths $\{h_p^-, h_p^+, v_p^-, v_p^+\}$ decided for the pixel $p$, the two orthogonal cross segments $H(p)$ and $V(p)$ are given by

$$\begin{aligned} H(p) &= \{(x, y) \mid x \in [x_p - h_p^-,\, x_p + h_p^+],\; y = y_p\} \\ V(p) &= \{(x, y) \mid x = x_p,\; y \in [y_p - v_p^-,\, y_p + v_p^+]\}. \end{aligned} \tag{2}$$

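[Fig. 3 block diagram: the left image $I$ and the right image $I'$ each undergo local cross construction, yielding $H(p) \cup V(p)$ and $H'(p') \cup V'(p')$. For each disparity hypothesis $d \in [0, d_{\max}]$, cost aggregation is followed by disparity selection with a WTA strategy ($d_p^0$) and disparity refinement ($d_p^*$).]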

Fig. 3. Framework of the proposed local stereo matching algorithm.

Fig. 2 shows the local cross configuration schematically. For each pixel $p$ we only need to store four arm lengths to represent an adaptive local support cross. This compact representation is in sharp contrast to the huge memory demand of the adaptive support weight method [8].

Given the pixelwise local cross decision results, we can readily construct a shape-adaptive full support region $U(p)$ for the pixel $p$. The key idea is to model the local support region $U(p)$ as an area integral of multiple horizontal segments $H(q)$, sliding along the vertical segment $V(p)$ of the pixel $p$

$$U(p) = \bigcup_{q \in V(p)} H(q) \tag{3}$$

where $q$ is a support pixel located on the vertical segment $V(p)$. From the local cross representation computed for the pixel $q$, we can easily retrieve its horizontal segment $H(q)$. In essence, $H(q)$ provides a connected set of valid support pixels in the 2-D neighborhood of the pixel $p$, as shown in Fig. 2. Note that $U(p)$ can also be constructed from multiple vertical segments, utilizing $H(p)$ as the anchor axis. In fact, our experiments show that there is only a slight difference in functional performance between the two configurations.

Fig. 1(b) and (c) demonstrate the effectiveness of the proposed technique on the Tsukuba image [13]. Our cross-based local support construction method yields appropriate shape-adaptive windows, closely approximating local image structures. This local support region adaptation is clearly desirable for accurate disparity estimation across different image areas.

As we model a local support region $U(p)$ by a vertical integral of multiple horizontal segments $H(q)$, there is considerable potential to reduce computational redundancy in practice. Reusing the “overlapping” data computed first from the horizontal scan, we propose an optimization technique in Section III.

B. Locally Adaptive Matching Cost Aggregation

As an area-based local stereo matching method, the proposed algorithm places a key emphasis on the cost aggregation step (see Fig. 3). Given a pair of hypothetical correspondences, i.e., $p = (x_p, y_p)$ in the left image $I$ and $p' = (x_{p'}, y_{p'})$ in the right image $I'$, we measure the matching cost between them by aggregating raw matching costs in a local support region. Here, the coordinates of $p$ and $p'$ are correlated with a disparity hypothesis $d$, i.e., $x_{p'} = x_p - d$ and $y_{p'} = y_p$.

For reliable cost aggregation, unlike most existing local methods [2], [5], we symmetrically consider both local support regions $U(p)$ and $U'(p')$ decided for the pixels $p$ and $p'$, respectively. If we only consider the local support region $U(p)$ for the left image, the matching cost aggregation will be polluted by outliers in the right image, i.e., pixels in the support window that lie at a different depth from the pixel $p'$. Therefore, combining the two local support regions to define the aggregation region, the normalized matching cost $\bar{E}_d(p)$ between the pixels $p$ and $p'$ is computed as follows:

$$\bar{E}_d(p) = \frac{1}{\|U_d(p)\|} E_d(p) = \frac{1}{\|U_d(p)\|} \sum_{s \in U_d(p)} e_d(s) \tag{4}$$

with $U_d(p) = \{(x, y) \mid (x, y) \in U(p),\ (x - d, y) \in U'(p')\}$. In (4), $e_d(s)$ denotes the pixel-based raw matching cost and $U_d(p)$ is the combined local support region that contains only the valid pixels, likely having disparities similar to those of the anchor pixels $p$ and $p'$ in both images. Because of the restriction of $U'(p')$, $U_d(p) \subseteq U(p)$. $\|U_d(p)\|$ is the number of support pixels in $U_d(p)$, used to normalize the aggregated cost $E_d(p)$. The raw matching cost is computed from a pair of corresponding pixels, i.e., $s$ in the left image and $s'$ in the right image (with a disparity value $d$)

$$e_d(s) = \min\left( \sum_{c \in \{R,G,B\}} |I_c(s) - I'_c(s')|,\ T \right) \tag{5}$$

where $T$ controls the truncation limit of the matching cost.

C. Disparity Selection and Refinement

After matching cost aggregation, we use a Winner-Takes-All (WTA) strategy [1] for the disparity selection as follows:

$$d_p^0 = \arg\min_{d \in [0, d_{\max}]} \bar{E}_d(p) \tag{6}$$

where $d_{\max}$ is the maximum value of possible disparities. $d_p^0$ is the initial disparity estimate for the pixel $p$. It can be further refined using a local high-confidence voting scheme developed in our previous work [6]. As shown in Fig. 1(c), pixels contained in a local support region mostly originate from the same scene patch, and hence they share similar disparities. For each anchor pixel $p$, if we create a histogram $\varphi_p$ of the initial disparity estimates $d_s^0$, where $s \in U(p)$ is a pixel in the adaptive neighborhood $U(p)$, a distribution peak is very likely to occur. The histogram bin associated with this peak corresponds to a statistically optimal disparity value $d_p^*$, with which we implicitly perform a piecewise smoothness regularization [6]. Accordingly, the final disparity of the pixel $p$, $d_p^*$, is decided as

$$d_p^* = \arg\max_{d \in [0, d_{\max}]} \varphi_p(d). \tag{7}$$
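To make (3) through (7) concrete, the following Python sketch evaluates them directly for a single anchor pixel, without any of the acceleration from Section III. It is a minimal sketch under stated assumptions, not the authors' code: the function names, the `arms` array layout (`arms[y, x] = (h-, h+, v-, v+)`, already clipped so that no arm leaves the image), and the rejection of hypotheses whose correspondence falls outside the right image are all hypothetical choices. The `vote` function only takes the histogram peak of (7); the full voting scheme of [6] is not reproduced here.

```python
import numpy as np

def support_region(arms, y, x):
    """U(p) of (3) as a set of (row, col) pairs; arms[y, x] = (h-, h+, v-, v+)."""
    region = set()
    v_minus, v_plus = int(arms[y, x, 2]), int(arms[y, x, 3])
    for qy in range(y - v_minus, y + v_plus + 1):  # q slides along V(p)
        h_minus, h_plus = int(arms[qy, x, 0]), int(arms[qy, x, 1])
        region.update((qy, qx) for qx in range(x - h_minus, x + h_plus + 1))  # H(q)
    return region

def normalized_cost(left, right, arms_l, arms_r, y, x, d, T=60):
    """Normalized aggregated cost of (4) for the pair p = (x, y), p' = (x - d, y)."""
    if x - d < 0:
        return float("inf")  # assumption: hypotheses that leave the image are rejected
    u_right = support_region(arms_r, y, x - d)
    # U_d(p): pixels of U(p) whose d-shifted counterpart lies in U'(p')
    combined = [(sy, sx) for sy, sx in support_region(arms_l, y, x)
                if (sy, sx - d) in u_right]
    total = 0
    for sy, sx in combined:
        diff = np.abs(left[sy, sx].astype(np.int32) - right[sy, sx - d].astype(np.int32))
        total += min(int(diff.sum()), T)  # truncated raw cost e_d(s) of (5)
    return total / len(combined)

def wta_disparity(left, right, arms_l, arms_r, y, x, d_max, T=60):
    """Initial disparity d_p^0 of (6): the hypothesis with the lowest cost."""
    costs = [normalized_cost(left, right, arms_l, arms_r, y, x, d, T)
             for d in range(d_max + 1)]
    return int(np.argmin(costs))

def vote(disp0, region, d_max):
    """Refined disparity of (7): histogram peak of the initial estimates over U(p)."""
    hist = np.bincount([int(disp0[sy, sx]) for sy, sx in region], minlength=d_max + 1)
    return int(np.argmax(hist))
```

The combined region is never empty, because $p$ itself belongs to $U(p)$ and $p'$ belongs to $U'(p')$, so the division in `normalized_cost` is safe. Enumerating the region per pixel and per disparity is exactly the cost that the OII technique of Section III avoids.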

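[Fig. 4 panels: (a) the combined region $U_d(p)$ around the combined cross $H_d(p) \cup V_d(p)$, with arm lengths $h_{pd}^-$, $h_{pd}^+$, $v_{pd}^-$, $v_{pd}^+$; (b) horizontal summation over each segment $H_d(q)$, bounded by $h_{qd}^-$ and $h_{qd}^+$; (c) vertical summation along $V_d(p)$; (d) the 1-D identity $\Sigma[a, b] = \Sigma[0, b] - \Sigma[0, a-1]$.]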

Fig. 4. Illustration of the proposed OII technique. (a) In the matching cost aggregation step of (4), raw matching costs are aggregated from an arbitrarily shaped local region $U_d(p)$, for each anchor pixel $p$. The combined aggregation region $U_d(p)$ is an expansion of the combined local cross $H_d(p) \cup V_d(p)$. In the proposed OII technique, we decompose the matching cost aggregation into two orthogonal 1-D integration steps, i.e., (b) a horizontal integration pass followed by (c) a vertical integration pass, and (d) a degenerated 1-D integral image technique is further applied to accelerate these two 1-D integration steps.

III. FAST COST AGGREGATION WITH THE OII TECHNIQUE

Aggregating raw matching costs in a shape-adaptive support region enhances disparity accuracy, but it also poses a challenge for fast implementation. The reason is that multiple raw cost additions, in an irregular support region $U_d(p)$, are required for each anchor pixel $p$. As (4) shows, the computational load is directly correlated with the local support region size $\|U_d(p)\|$. In fact, for the Tsukuba image, $\|U_d(p)\|$ on average is as large as 320.

Instead of summing raw matching costs $e_d(s)$ in (4) directly, we propose an efficient OII technique to accelerate the aggregation over irregularly shaped regions. The major ideas are twofold. First, we decompose cost aggregation performed in an arbitrarily shaped region into two orthogonal 1-D aggregations, namely, a horizontal aggregation step followed by a vertical one. In this way, the computational load becomes linear in the 1-D span of the support region, rather than in its area. Second, we accelerate these two orthogonal 1-D aggregation steps by pre-computing a horizontal integral image as well as a vertical one. This optimization makes 1-D aggregation over a line segment of any length a constant-time operation. Compared with the conventional application of the integral image technique in fast rectangular sum computation [5], the proposed OII technique goes one step further. It provides a general approach for fast aggregation over an irregularly shaped 2-D area.

TABLE I
MATCHING COST AGGREGATION SPEEDUP USING OUR OII TECHNIQUE

| | Tsukuba | Venus | Teddy | Cones |
| Image resolution | 384×288 | 434×383 | 450×375 | 450×375 |
| Max disparity $d_{\max}$ | 15 | 19 | 59 | 59 |
| Straightforward time (s) | 1.6 | 4.0 | 7.8 | 4.5 |
| Our OII time (s) | 0.14 | 0.26 | 0.82 | 0.84 |
| Speedup factor | 11.4 | 15.4 | 9.5 | 5.4 |

Before presenting the procedure of the proposed OII technique, we first formulate the combined aggregation region $U_d(p)$ as an expansion of the combined local cross, similar to the local support region $U(p)$ in (3). For a pixel $p = (x_p, y_p)$, the combined local cross consists of two orthogonal line segments $H_d(p)$ and $V_d(p)$, as shown in Fig. 4(a)

$$\begin{aligned} H_d(p) &= \{(x, y_p) \mid (x, y_p) \in H(p),\ (x - d, y_p) \in H'(p')\} \\ V_d(p) &= \{(x_p, y) \mid (x_p, y) \in V(p),\ (x_p - d, y) \in V'(p')\}. \end{aligned} \tag{8}$$

Here, $H(p) \cup V(p)$ and $H'(p') \cup V'(p')$ are the local cross configurations for the pixel $p$ and its correspondence $p'$, respectively. Given the pixelwise combined local cross $H_d(p) \cup V_d(p)$ decided from (8), the combined local aggregation region $U_d(p)$ for the pixel $p$ can be precisely expressed as

$$U_d(p) = \bigcup_{q \in V_d(p)} H_d(q) \tag{9}$$

where $q$ is a pixel located on the vertical segment $V_d(p)$. Substituting the above orthogonal decomposition of $U_d(p)$ into (4), we now transform a double integral of raw matching costs $e_d(s)$ into two orthogonal iterated integrals. More specifically, the aggregated cost $E_d(p)$ is computed as follows:

$$E_d(p) = \sum_{s \in U_d(p)} e_d(s) = \sum_{q \in V_d(p)} \left( \sum_{s \in H_d(q)} e_d(s) \right) = \sum_{q \in V_d(p)} E_d^H(q) \tag{10}$$

where $E_d^H(q)$ represents the result after the horizontal integration step. Clearly, the computation of $E_d(p)$, over an irregular aggregation region $U_d(p)$, is now decomposed into a horizontal pass plus a vertical pass [see Fig. 4(b) and (c)]. As a result, the proposed OII technique reduces computation redundancy significantly, by reusing the horizontal integration result $E_d^H(q)$ among the pixels in the vertical neighborhood. In addition, we have adopted the general integral image technique [11] in a degenerated 1-D form, further accelerating the integration over any horizontal or vertical line segments [see Fig. 4(d)]. The overall OII technique can be summarized as the following four steps.

Step 1: Given the pixelwise raw matching cost $e_d(x, y)$, we first build a horizontal integral image $S^H(x, y)$, storing the cumulative row sum as

$$S^H(x, y) = \sum_{0 \le m \le x} e_d(m, y) = S^H(x - 1, y) + e_d(x, y). \tag{11}$$

TABLE II
QUANTITATIVE EVALUATION RESULTS OF AREA-BASED LOCAL STEREO METHODS FOR THE ORIGINAL MIDDLEBURY STEREO DATABASE [13]

| Algorithm | Tsukuba All | Tsukuba Untex. | Tsukuba Disc. | Sawtooth All | Sawtooth Untex. | Sawtooth Disc. | Venus All | Venus Untex. | Venus Disc. | Map All | Map Disc. |
| Our method | 2.06 | 2.16 | 7.41 | 0.99 | 0.08 | 3.93 | 0.69 | 0.29 | 3.72 | 1.02 | 9.55 |
| Adapt. Weights [8] | 1.51 | 0.65 | 7.24 | 1.14 | 0.27 | 5.48 | 1.14 | 0.61 | 4.49 | 1.47 | 13.58 |
| Adapt. Polygon [6] | 2.29 | 1.98 | 9.39 | 1.32 | 0.32 | 4.77 | 0.80 | 0.61 | 3.67 | 0.70 | 9.53 |
| Variable Win. [5] | 2.35 | 1.65 | 12.17 | 1.28 | 0.23 | 7.09 | 1.23 | 1.16 | 13.35 | 0.24 | 2.98 |
| Compact Win. [4] | 3.36 | 3.54 | 12.91 | 1.61 | 0.45 | 7.87 | 1.67 | 2.18 | 13.24 | 0.33 | 3.94 |
| Bay. Diff. [14] | 6.49 | 11.62 | 12.29 | 1.45 | 0.72 | 9.29 | 4.00 | 7.21 | 18.39 | 0.20 | 2.49 |
| Shiftable Win. [3] | 5.23 | 3.80 | 24.66 | 2.21 | 0.72 | 13.97 | 3.74 | 6.82 | 12.94 | 0.66 | 9.35 |

Fig. 5. Dense disparity maps for the Tsukuba, Venus, Teddy, and Cones stereo datasets (from left to right), using the proposed local stereo matching algorithm. Top row: the input left images. Bottom row: the resulting disparity maps. Rather than estimating two disparity maps for left-right consistency check [8], [9], we applied a simple border extrapolation step here. All the results are available at http://vision.middlebury.edu/stereo/eval.

Here, $S^H(x, y)$ can be iteratively computed from $S^H(x - 1, y)$ with only one addition. When $x = 0$, $S^H(-1, y) = 0$.

Step 2: For each pixel $q = (x_q, y_q)$ on the left image lattice, we then compute the horizontal integral $E_d^H(q)$ in (10), using the horizontal integral image $S^H(x, y)$, as follows:

$$E_d^H(q) = S^H(x_q + h_{qd}^+,\, y_q) - S^H(x_q - h_{qd}^- - 1,\, y_q). \tag{12}$$

As shown in Fig. 4(a), $h_{qd}^-$ and $h_{qd}^+$ are the left and right arm lengths of the combined aggregation cross, decided for the anchor pixel $q$. Fig. 4(d) illustrates the subtraction in (12).

Step 3: Taking the computed horizontal matching cost $E_d^H(x, y)$ as the input, we create a vertical integral image $S^V(x, y)$. It stores the cumulative column sum as

$$S^V(x, y) = \sum_{0 \le n \le y} E_d^H(x, n) = S^V(x, y - 1) + E_d^H(x, y). \tag{13}$$

As in Step 1, only one addition is needed to compute $S^V(x, y)$. When $y = 0$, $S^V(x, -1) = 0$.

Step 4: Based on the vertical integral image $S^V(x, y)$, we derive the fully aggregated matching cost $E_d(p)$ for the pixel $p = (x_p, y_p)$, with one final subtraction

$$E_d(p) = S^V(x_p,\, y_p + v_{pd}^+) - S^V(x_p,\, y_p - v_{pd}^- - 1). \tag{14}$$

Similarly, $v_{pd}^-$ and $v_{pd}^+$ are the up and bottom arm lengths of the combined aggregation cross, for the anchor pixel $p$.

Following the above steps, we need only four additions/subtractions per anchor pixel to aggregate raw matching costs over an arbitrarily shaped region, regardless of the region size. As a comparison, we measured the respective execution time of the straightforward cost aggregation in (4) and that of our proposed OII technique. With memory access overhead included in the total execution time, Table I shows that our OII technique leads to a speedup factor of 5–15 over the straightforward computation method. The speedup factor generally increases as the average size of the combined aggregation region $U_d(p)$ increases. As a result, the Venus dataset gains the largest speedup among the test images, because it has large piecewise planar objects (see Fig. 5).
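The four steps can be sketched with NumPy as follows, together with a direct evaluation of (10) for cross-checking. This is an illustrative sketch rather than the authors' implementation: the function names and the `arms` layout (`arms[y, x] = (h-, h+, v-, v+)`, holding the combined arm lengths for one disparity hypothesis and assumed to be clipped so that no arm leaves the image) are assumptions. The integral images are padded with one leading zero column or row so that the boundary conditions $S^H(-1, y) = 0$ and $S^V(x, -1) = 0$ need no special case.

```python
import numpy as np

def aggregate_oii(e, arms):
    """Aggregate raw costs e (H x W) over cross-defined regions in four steps.

    arms[y, x] = (h-, h+, v-, v+): combined arm lengths, assumed clipped so
    that no arm leaves the image.
    """
    H, W = e.shape
    arms = arms.astype(np.int64)
    ys, xs = np.mgrid[0:H, 0:W]
    # Step 1, eq. (11): row-wise cumulative sums; column 0 holds S^H(-1, y) = 0
    s_h = np.zeros((H, W + 1))
    s_h[:, 1:] = np.cumsum(e, axis=1)
    # Step 2, eq. (12): one subtraction per pixel gives the sum over H_d(q)
    e_h = s_h[ys, xs + arms[..., 1] + 1] - s_h[ys, xs - arms[..., 0]]
    # Step 3, eq. (13): column-wise cumulative sums; row 0 holds S^V(x, -1) = 0
    s_v = np.zeros((H + 1, W))
    s_v[1:, :] = np.cumsum(e_h, axis=0)
    # Step 4, eq. (14): one subtraction per pixel gives E_d(p)
    return s_v[ys + arms[..., 3] + 1, xs] - s_v[ys - arms[..., 2], xs]

def aggregate_direct(e, arms):
    """Reference version: sum e over every H_d(q), q in V_d(p), as in (10)."""
    H, W = e.shape
    arms = arms.astype(np.int64)
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            for qy in range(y - arms[y, x, 2], y + arms[y, x, 3] + 1):
                lo, hi = x - arms[qy, x, 0], x + arms[qy, x, 1]
                out[y, x] += e[qy, lo:hi + 1].sum()
    return out
```

Both functions return the unnormalized cost $E_d(p)$; dividing by $\|U_d(p)\|$ as in (4) could reuse `aggregate_oii` on an all-ones cost array to count the support pixels. One way to obtain the combined arms, which follows from (8) because both segments are intervals containing the anchor, is the elementwise minimum of the left-image arms and the right-image arms shifted by $d$; this step is not shown here.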

TABLE III
QUANTITATIVE EVALUATION RESULTS FOR THE NEW MIDDLEBURY STEREO DATABASE (NOCC. MEANS NON-OCCLUDED REGIONS) [13]

| Algorithm | Tsukuba Nocc. | Tsukuba All | Tsukuba Disc. | Venus Nocc. | Venus All | Venus Disc. | Teddy Nocc. | Teddy All | Teddy Disc. | Cones Nocc. | Cones All | Cones Disc. | Average percent of bad pixels |
| SegSupport [9] | 1.25 | 1.62 | 6.68 | 0.25 | 0.64 | 2.59 | 8.43 | 14.2 | 18.2 | 3.77 | 9.87 | 9.77 | 6.44 |
| Adapt. Weights [8] | 1.38 | 1.85 | 6.90 | 0.71 | 1.19 | 6.13 | 7.88 | 13.3 | 18.6 | 3.97 | 9.79 | 8.26 | 6.67 |
| Our method | 1.99 | 2.65 | 6.77 | 0.62 | 0.96 | 3.20 | 9.75 | 15.1 | 18.2 | 6.28 | 12.7 | 12.9 | 7.60 |
| RealtimeBP [15] | 1.49 | 3.40 | 7.87 | 0.77 | 1.90 | 9.00 | 8.72 | 13.2 | 17.2 | 4.61 | 11.6 | 12.4 | 7.69 |
| 2OP+occ [16] | 2.91 | 3.56 | 7.33 | 0.24 | 0.49 | 2.76 | 10.9 | 15.4 | 20.6 | 5.42 | 10.8 | 12.5 | 7.75 |
| Layered [17] | 1.57 | 1.87 | 8.28 | 1.34 | 1.85 | 6.85 | 8.64 | 14.3 | 18.5 | 6.59 | 14.7 | 14.4 | 8.24 |
| MultiCamGC [18] | 1.27 | 1.99 | 6.48 | 2.79 | 3.13 | 3.60 | 12.0 | 17.6 | 22.0 | 4.89 | 11.8 | 12.1 | 8.31 |

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

We evaluated the performance of the proposed stereo matching method using the benchmark Middlebury stereo database [13]. The parameters are held constant in all experiments on different stereo datasets, i.e., $L = 17$, $\tau = 20$, and $T = 60$. For the original Middlebury stereo database with four test pairs, i.e., Tsukuba, Sawtooth, Venus, and Map, Table II summarizes the quantitative performance of our method and those of other area-based local stereo methods, roughly in descending order of overall performance. Specifically, Table II reports the percentage of “bad pixels” whose absolute disparity error is greater than 1. For each pair of images, the error rates for all regions, untextured regions (untex., except for the Map image), and depth discontinuities (disc.) are reported respectively. In general, the proposed method is the best among the state-of-the-art area-based local methods, outperforming the leading local methods [6], [8].

Fig. 5 and Table III present the visual and quantitative results for the new Middlebury stereo database [13], which includes more challenging stereo images, i.e., Teddy and Cones. The proposed method yields discontinuity-preserving smooth disparity maps for the new testbed images. It even achieves a better performance than some complex global stereo matching methods [16]–[18], as shown in Table III. Though our method is slightly outperformed by the two local methods [8], [9], it executes far faster than they do, as shown later.

We have also examined the robustness of the proposed method when varying parameter settings. First, fixing $\tau$ to 20, we changed the value of $L$ in (1). Fig. 6(a) shows that when $L$ is larger than 15, the proposed method is fairly insensitive to the maximum possible arm length $L$ of a local cross. Second, fixing $L$ to 17, we changed the value of $\tau$, as shown in Fig. 6(b). In general, the disparity accuracy remains nearly constant when $\tau$ varies from 18 to 28.
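The “bad pixel” percentage reported in Tables II and III can be sketched as below. This is a hypothetical helper, not the Middlebury evaluation code: the function name, the optional boolean `mask` for restricting the score to a region type (e.g., untextured or near discontinuities), and the array-based interface are assumptions. Only the criterion itself, an absolute disparity error greater than 1, comes from the text.

```python
import numpy as np

def bad_pixel_rate(disp, truth, mask=None, threshold=1.0):
    """Percentage of pixels whose absolute disparity error exceeds threshold."""
    bad = np.abs(disp.astype(np.float64) - truth.astype(np.float64)) > threshold
    if mask is not None:
        bad = bad[mask]  # score only the pixels selected by the boolean mask
    return 100.0 * bad.mean()
```

An error of exactly 1 is not counted as bad, since the criterion is strictly "greater than 1".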
Without any special code-level optimization, the proposed local stereo method runs at a favorable speed on a 3.0 GHz Intel Pentium 4 CPU. The execution time for the Tsukuba, Venus, Teddy, and Cones stereo datasets is 0.9, 1.6, 2.4, and 2.4 s, respectively. This is dozens of times faster than the adaptive support weight method [8], which takes 60 s for the Tsukuba image on an AMD 2700+ machine. On a more advanced CPU than ours, the approach of Tombari et al. [9] needs about 33 min for the Teddy image [10].

[Fig. 6 plots: disparity error (%) on the vertical axis for the Tsukuba, Venus, Teddy, and Cones datasets; (a) horizontal axis $L$, from 5 to 23; (b) horizontal axis $\tau$, from 10 to 30.]

Fig. 6. Performance evaluation (disparity error rates for all regions) of the proposed method when varying the parameter values, tested on four stereo images [13]. (a) Changing L while τ = 20. (b) Changing τ while L = 17.

V. CONCLUSION

This letter has proposed a cross-based local stereo matching algorithm. Based on the pixelwise compact cross, we dynamically construct shape-adaptive local support regions that approximate varying image structures accurately. Evaluation with the Middlebury stereo benchmark shows that the proposed method is ranked among the best performing local methods. Furthermore, we have proposed an efficient orthogonal integral image technique, which accelerates cost aggregation over irregularly shaped support windows. In future work, we intend to optimize the proposed stereo method on a parallel platform.

REFERENCES

[1] D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vision, vol. 47, no. 1, pp. 7–42, May 2002.
[2] A. Fusiello, V. Roberto, and E. Trucco, “Efficient stereo with multiple windowing,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition, San Juan, PR, 1997, pp. 858–863.
[3] S. B. Kang, R. Szeliski, and J. Chai, “Handling occlusions in dense multi-view stereo,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition, vol. 1, 2001, pp. 103–110.
[4] O. Veksler, “Stereo correspondence with compact windows via minimum ratio cycle,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 12, pp. 1654–1660, Dec. 2002.
[5] O. Veksler, “Fast variable window for stereo correspondence using integral images,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition, vol. 1, 2003, pp. 556–561.
[6] J. Lu, G. Lafruit, and F. Catthoor, “Anisotropic local high-confidence voting for accurate stereo correspondence,” in Proc. SPIE-IS&T Electron. Imaging, vol. 6812, Jan. 2008, pp. 605822-1–605822-10.
[7] Y. Xu, D. Wang, T. Feng, and H.-Y. Shum, “Stereo computation using radial adaptive windows,” in Proc. Int. Conf. Pattern Recognition, vol. 3, 2002, pp. 595–598.
[8] K. J. Yoon and S. Kweon, “Adaptive support-weight approach for correspondence search,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 4, pp. 650–656, Apr. 2006.
[9] F. Tombari, S. Mattoccia, and L. D. Stefano, “Segmentation based adaptive support for accurate stereo correspondence,” in Proc. Pacific-Rim Symp. Image Video Technol., 2007, pp. 427–438.
[10] F. Tombari, S. Mattoccia, L. D. Stefano, and E. Addimanda, “Classification and evaluation of cost aggregation methods for stereo correspondence,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition, 2008, pp. 1–8.
[11] F. Crow, “Summed-area tables for texture mapping,” in Proc. ACM SIGGRAPH, 1984, pp. 207–212.
[12] P. Viola and M. Jones, “Robust real-time face detection,” Int. J. Comput. Vision, vol. 57, no. 2, pp. 137–154, 2004.
[13] D. Scharstein and R. Szeliski, Middlebury Stereo Vision Page (2008). [Online]. Available: http://vision.middlebury.edu/stereo/
[14] D. Scharstein and R. Szeliski, “Stereo matching with nonlinear diffusion,” Int. J. Comput. Vision, vol. 28, no. 2, pp. 155–174, 1998.
[15] Q. Yang, L. Wang, R. Yang, S. Wang, M. Liao, and D. Nistér, “Realtime global stereo matching using hierarchical belief propagation,” in Proc. Brit. Mach. Vision Conf., 2006, pp. 989–998.
[16] O. Woodford, P. Torr, I. Reid, and A. Fitzgibbon, “Global stereo reconstruction under second order smoothness priors,” in Proc. IEEE Conf. Comput. Vision Pattern Recognition, Anchorage, AK, 2008, pp. 1–8.
[17] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, “High-quality video interpolation using a layered representation,” in Proc. ACM SIGGRAPH, 2004, pp. 600–608.
[18] V. Kolmogorov and R. Zabih, “Multi-camera scene reconstruction via graph cuts,” in Proc. Eur. Conf. Comput. Vision, 2002, pp. 8–40.
