
Linear Time Approximation Schemes for the Gale-Berlekamp Game and Related Minimization Problems ∗

Marek Karpinski ∗
Dept. of Computer Science, University of Bonn
[email protected]

Warren Schudy †
Dept. of Computer Science, Brown University
[email protected]

∗ Parts of this work done while visiting Microsoft Research.
† Parts of this work done while visiting University of Bonn.

ABSTRACT We design a linear time approximation scheme for the Gale-Berlekamp Switching Game and generalize it to a wider class of dense fragile minimization problems including the Nearest Codeword Problem (NCP) and Unique Games Problem. Further applications include, among other things, finding a constrained form of matrix rigidity and maximum likelihood decoding of an error correcting code. As another application of our method we give the first linear time approximation schemes for correlation clustering with a fixed number of clusters and its hierarchical generalization. Our results depend on a new technique for dealing with small objective function values of optimization problems and could be of independent interest.

Categories and Subject Descriptors F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems—Computations on discrete structures

General Terms Algorithms, Theory

Keywords Approximation Algorithms, Correlation and Hierarchical Clustering, Dense Instances, Gale-Berlekamp Game, Linear Time Approximation Schemes, Minimum Constraint Satisfaction, Nearest Codeword Problem

1. INTRODUCTION

The Gale-Berlekamp Switching Game (GB Game) was introduced independently by Elwyn Berlekamp [10, 23] and David Gale [23] in the context of coding theory.


This game is played on an m by m grid of lightbulbs. The adversary chooses an arbitrary subset of the lightbulbs to be initially "on." Next to every row (resp. column) of lightbulbs is a switch, which can be used to invert the state of every lightbulb in that row (resp. column). The protagonist's task is to minimize the number of lit lightbulbs (by flipping switches). This problem was proven very recently to be NP-hard [21].

Let Φ = {−1, 1} ⊂ R. For matrices M, N let d(M, N) denote the number of entries where M and N differ. It is fairly easy to see that the GB Game is equivalent to the following natural problems [21]:

• Given a matrix M ∈ Φ^{m×m}, find column vectors x, y ∈ Φ^m minimizing d(M, xy^T).
• Given a matrix M ∈ F_2^{m×m}, find x, y ∈ F_2^m minimizing Σ_{ij} 1(M_{ij} ≠ x_i ⊕ y_j), where F_2 is the finite field over two elements with addition operator ⊕.
• Given a matrix M ∈ Φ^{m×m}, find column vectors x, y ∈ Φ^m maximizing x^T M y.

We focus on the equivalent minimization versions and prove existence of linear-time approximation schemes for them.

Theorem 1. For every ε > 0 there is a randomized 1 + ε-approximation algorithm for the Gale-Berlekamp Switching Game (its minimization version) with runtime O(m²) + 2^{O(1/ε²)}.

A constraint satisfaction problem (CSP) consists of n variables over a domain of constant size d and a collection of arity-k constraints (k constant). The objective of MIN-kCSP (MAX-kCSP) is to minimize the number of unsatisfied (maximize the number of satisfied) constraints. An (everywhere) dense instance is one where every variable is involved in at least a constant times the maximum possible number of constraints, i.e. Ω(n^{k−1}). For example, the GB Game is a dense MIN-2CSP since each of the n = 2m variables is involved in precisely m = n/2 constraints. It is natural to consider generalizing Theorem 1 to all dense MIN-CSPs, but unfortunately many such problems have no PTASs unless P=NP [7], so we must look at a restricted class of MIN-CSPs. A constraint is fragile if modifying any variable in a satisfied constraint makes the constraint unsatisfied. A CSP is fragile if all of its constraints are. Clearly the GB Game can be modeled as a fragile dense MIN-2CSP. Our results generalize to all dense fragile MIN-kCSPs. We now formulate our general theorem.

Theorem 2. For every ε > 0 there is a randomized 1 + ε-approximation algorithm for dense fragile MIN-kCSPs with runtime O(n^k) + 2^{O(1/ε²)}.

Any approximation algorithm for MIN-kCSP must read the entire input to distinguish between instances with optimal value of 1 and 0, and hence the O(n^k) term of the runtime cannot be improved.¹ It is fairly easy to see that improving the second term (to 2^{o(1/ε²)}) would imply an O(n²) + 2^{o(1/ε²)} time PTAS for average-dense max cut. Over a decade's worth of algorithms [5, 6, 16, 2, 20] for MAX-kCSP all have dependence on ε of at best 2^{O(1/ε²)}, so any improvement to the runtime of Theorem 2 would be surprising.

¹ Proof idea: consider a distribution of instances with cost zero 50% of the time and otherwise one random violated constraint, and apply Yao's principle.

We begin exploring applications of Theorem 2 by generalizing the Gale-Berlekamp game to higher dimensions k and to arbitrary k-ary equations. Given n variables x_i ∈ F_2 and m linear equations of the form x_{i1} ⊕ x_{i2} ⊕ ... ⊕ x_{ik} = 0 (or = 1), the k-ary Nearest Codeword Problem (NCP) consists of finding an assignment minimizing the number of unsatisfied equations. As the name suggests, the Nearest Codeword Problem can be interpreted as maximum likelihood decoding for linear error correcting codes. The Nearest Codeword Problem has fragile constraints, so Theorem 2 implies a linear-time PTAS for the dense k-ary Nearest Codeword Problem.

The Unique Games Problem (UGP) [12, 18] consists of solving MIN-2CSPs where the constraints are permutations over a finite domain D of colors; i.e. a constraint involving variables x_u and x_v is satisfied iff x_u = π_{uv}(x_v) for permutation π_{uv}. These constraints are clearly fragile, so Theorem 2 also implies a linear-time PTAS for the dense Unique Games Problem (with a constant number of colors).

The multiway cut problem, also known as MIN-dCUT, consists of coloring an undirected graph with d colors, such that each of d terminal nodes t_i is colored with color i, minimizing the number of bichromatic edges. The requirement that the terminal nodes must be colored particular colors does not fit in our dense fragile MIN-CSP framework, so we use a work-around: let the constraint corresponding to an edge be satisfied only if it is monochromatic and the endpoint(s) that are terminals (if any) are colored correctly.

As another application, consider MIN-kSAT, the problem of minimizing the number of satisfied clauses of a boolean expression in conjunctive normal form where each clause has k variables (some negated). We consider the equivalent problem of minimizing the number of unsatisfied conjunctions of a boolean expression in disjunctive normal form. A conjunction can be represented as a fragile constraint indicating that all of the negated variables within that constraint are false and the remainder are true, so Theorem 2 applies to MIN-kSAT as well.

Finally we consider correlation clustering with a fixed number of clusters [17, 1]. The input to the correlation clustering problem consists of a complete graph where each edge is labeled either "+" or "−". The objective is to color the vertices with d colors minimizing the number of bichromatic "+" edges plus the number of monochromatic "−" edges.


Correlation clustering with two clusters is equivalent to the following symmetric variant of the Gale-Berlekamp game: given a symmetric matrix M ∈ Φ^{m×m}, find a column vector x ∈ Φ^m minimizing d(M, xx^T). Like the GB game, correlation clustering with 2 clusters is fragile and Theorem 2 gives a linear-time approximation scheme. For d > 2, correlation clustering is not fragile but has properties allowing for a PTAS anyway. We also solve a generalization of correlation clustering called hierarchical clustering [1]. We prove the following theorem.

Theorem 3. For every ε > 0 there is a randomized 1 + ε-approximation algorithm for correlation clustering and hierarchical clustering with a fixed number of clusters d with running time n² · 2^{O((log d) d⁶/ε²)}.

The above result improves on the running time n^{O(9^d/ε²)} · log n = n^{O(9^d/ε²)} of the previous PTAS for correlation clustering by Giotis and Guruswami [17] in two ways: first, the polynomial is linear in the size of the input, and second, the exponent is polynomial in d rather than exponential. Our result for hierarchical clustering with a fixed number of clusters is the first PTAS for this problem. We prove Theorem 2 in Sections 2 and 3 and Theorem 3 in Sections 4 and 5.

Related Work

Elwyn Berlekamp built a physical model of the GB game with either m = 8 or m = 10 [10, 23] at Bell Labs in the 1960s, motivated by the connection with coding theory and the Nearest Codeword Problem. Several works [15, 10] investigated the cost of worst-case instances of the GB Game; for example the worst-case instance for m = 10 has cost 35 [10]. Roth and Viswanathan [21] showed very recently that the GB game is in fact NP-hard. They also give a linear-time algorithm if the input is generated by adding random noise to a cost-zero instance. Replacing Φ with R in the third formulation of the GB Game yields the problem of computing the 1-rigidity of a matrix. Lower bounds on matrix rigidity have applications to circuit and communication complexity [19].

The Nearest Codeword Problem is hard to approximate in general [4, 11] better than n^{Ω(1/log log n)}. It is hard even if each equation has exactly 3 variables and each variable appears in exactly 3 equations [9]. There is an O(n/log n) approximation algorithm [8, 3].

Over a decade ago two groups [6, 13] independently discovered polynomial-time approximation algorithms for MAX-CUT achieving additive error of εn², implying a PTAS for average-dense MAX-CUT instances. The fastest algorithms [2, 20] have constant runtime 2^{O(1/ε²)} for approximating the value of any MAX-kCSP over a binary domain D. This can be generalized to an arbitrary domain D. To see this, note that we can code D in binary and correspondingly enlarge the arity of the constraints to k⌈log |D|⌉. A random sample of Õ(1/ε⁴) variables suffices to achieve an additive approximation [2, 20, 22]. These results extend to MAX-BISECTION [14].

Arora, Karger and Karpinski [6] introduced the first PTASs for dense minimum constraint satisfaction problems. They give PTASs with runtime n^{O(1/ε²)} [6] for min bisection and multiway cut (MIN-d-CUT). Bazgan, Fernandez de la Vega and Karpinski [7] designed PTASs for MIN-SAT and the nearest codeword problem (which generalizes the GB Game) with runtime n^{O(1/ε²)}. Giotis and Guruswami [17] give a PTAS for correlation clustering with d clusters with runtime n^{O(9^d/ε²)}. We give linear-time approximation schemes for all of the problems mentioned in this paragraph except for the MIN-BISECTION problem.

2. FRAGILE-DENSE ALGORITHM

2.1 Intuition

Consider the following scenario. Suppose that our nemesis, who knows the optimal solution to the Gale-Berlekamp problem shown in Figure 1, gives us a constant size random sample of it to tease us. How can we use this information to construct a good solution? One reasonable strategy is to set each variable greedily based on the random sample. Throughout this section we will focus on the row variables; the column variables are analogous. For simplicity our example has the optimal solution consisting of all of the switches in one position, which we denote by α. For row v, the greedy strategy, resulting in assignment x^(1), is to set switch v to α iff b̂(v, α) < b̂(v, β), where b̂(v, α) (resp. b̂(v, β)) denotes the number of light bulbs in the intersection of row v and the sampled columns that would be lit if we set the switch to position α (resp. β).

With a constant size sample we can expect to set most of the switches correctly, but a constant fraction of them will elude us. Can we do better? Yes, we simply do greedy again. The greedy scores analogous to b̂ are shown in the columns labeled with b in the middle of Figure 1. For the example at hand, this strategy works wonderfully, resulting in us reconstructing the optimal solution exactly, as evidenced by b(x^(1), v, α) < b(x^(1), v, β) for all v. In general this does not reconstruct the optimal solution but provably gives something close. Some of the rows, e.g. the last one, have b(x^(1), v, α) much less than b(x^(1), v, β), while other rows, such as the first, have b(x^(1), v, α) and b(x^(1), v, β) closer together. We call variables with |b(x^(1), v, α) − b(x^(1), v, β)| > Θ(n) clearcut. Intuitively, one would expect the clearcut rows to be more likely correct than the nearly tied ones. In fact, we can show that we get all of the clearcut ones correct, so the remaining problem is to choose values for the rows that are close to tied. However, those rows have a lot of lightbulbs lit, suggesting that the optimal value is large, so it is reasonable to run an additive approximation algorithm and use that to set the remaining variables. Finally, observe that we can simulate the random sample given by the nemesis by simply taking a random sample of the variables and then doing exhaustive search of all possible assignments of those variables. We have just sketched our algorithm.

Our techniques differ from previous work [7, 6, 17] in two key ways:

1. Previous work used a sample size of O((log n)/ε²), which allowed the clearcut variables to be set correctly after a single greedy step. We instead use a constant-sized sample and run a second greedy step before identifying the clearcut variables.

2. Our algorithm is the first one that runs the additive error algorithm after identifying clearcut variables. Previous work ran the additive error algorithm at the beginning.
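The following Python sketch is our own illustration of this sample-then-greedy-twice idea on the GB game; the function names, the ±1 matrix encoding, and the restriction to guessing a single constant-size sample of column switches are assumptions of the sketch, not the paper's algorithm (which is given as Algorithm 1 below). It stops after the second greedy step and reports the margins used to judge which choices are clear-cut.

```python
import itertools
import random

def lit(M, i, j, ri, cj):
    # Bulb (i, j) stays lit iff M[i][j] != ri * cj, with entries of M in {-1, +1}.
    return M[i][j] * ri * cj < 0

def greedy_rows(M, col_vals):
    """First greedy step: set every row switch to the position that lights fewer
    bulbs, judged only against the sampled column switches in col_vals (dict j -> +/-1)."""
    rows = []
    for i in range(len(M)):
        cost = {s: sum(lit(M, i, j, s, cj) for j, cj in col_vals.items()) for s in (+1, -1)}
        rows.append(min((+1, -1), key=cost.get))
    return rows

def greedy_cols(M, rows):
    """Second greedy step: re-optimize every column switch against the full
    preliminary row assignment, and record the margin of each choice."""
    cols, margins = [], []
    for j in range(len(M)):
        cost = {s: sum(lit(M, i, j, ri, s) for i, ri in enumerate(rows)) for s in (+1, -1)}
        cols.append(min((+1, -1), key=cost.get))
        margins.append(abs(cost[+1] - cost[-1]))  # large margin = clear-cut choice
    return cols, margins

def two_step_greedy(M, sample_size=3, seed=0):
    """Illustrative sketch only: guess every assignment of a constant-size sample
    of column switches, run the two greedy steps, keep the cheapest solution."""
    m = len(M)
    sampled = random.Random(seed).sample(range(m), min(sample_size, m))
    best = None
    for guess in itertools.product((+1, -1), repeat=len(sampled)):
        rows = greedy_rows(M, dict(zip(sampled, guess)))
        cols, margins = greedy_cols(M, rows)
        cost = sum(lit(M, i, j, rows[i], cols[j]) for i in range(m) for j in range(m))
        if best is None or cost < best[0]:
            best = (cost, rows, cols, margins)
    return best
```

On the example of Figure 1 this recovers the all-α optimum; in general the paper's Algorithm 1 additionally fixes only the clear-cut variables and finishes the nearly tied ones with an additive-error algorithm.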




Figure 1: An illustration of our algorithmic ideas on the Gale-Berlekamp Game. The same ideas apply to all dense fragile CSPs. In the remainder of the paper we do not explicitly discuss the GB Game but present our ideas in the abstract framework of fragile-dense CSPs.

2.2 Model

We now give a formulation of MIN-kCSP that is suitable for our purposes. For non-negative integers n, k, let (n choose k) = n!/(k!(n−k)!), and for a given set V let (V choose k) denote the set of subsets of V of size k (analogous to 2^S for all subsets of S). There is a set V of n variables, each of which can take any value in a constant-sized domain D. Let x_v ∈ D denote the value of variable v in assignment x.

Consider some set of variables I ∈ (V choose k). There may be many constraints over these variables; number them arbitrarily. Define p(I, ℓ, x) to be 1 if the ℓth constraint over I is unsatisfied in assignment x and zero otherwise. For I ∈ (V choose k), we define p_I(x) = (1/η) Σ_ℓ p(I, ℓ, x), where η is a scaling factor to ensure 0 ≤ p_I(x) ≤ 1 (e.g. η = 2^k for MIN-kSAT). For notational simplicity we write p_I as a function of a complete assignment, but p_I(x) only depends on x_u for variables u ∈ I. For I ∉ (V choose k) define p_I(x) = 0.

Definition 4. On input V, p, a minimum constraint satisfaction problem (MIN-kCSP) is the problem of finding an assignment x minimizing Obj(x) = Σ_{I ∈ (V choose k)} p_I(x).

Let R_v^i(x) be the assignment over the variables V that agrees with x for all u ∈ V except for v, where it is i; i.e. R_v^i(x)_u = i if u = v, and x_u otherwise. We will frequently use the identity R_v^{x_v}(x) = x. Let b(x, v, i) = Σ_{I ∈ (V choose k): v ∈ I} p_I(R_v^i(x)) be the number of unsatisfied constraints v would be in if x_v were set to i (divided by η).

We say the ℓth constraint over I is fragile if p(I, ℓ, R_v^i(x)) + p(I, ℓ, R_v^j(x)) ≥ 1 for all assignments x, variables v and distinct values i and j.

Definition 5. A MIN-kCSP is δ-fragile-dense (or simply fragile-dense) for some δ > 0 if b(x, v, i) + b(x, v, j) ≥ δ (n choose k−1) for all assignments x, variables v and distinct values i and j.

Lemma 6. For any δ > 0, an instance where every variable v ∈ V participates in at least δη (n choose k−1) fragile constraints is δ-fragile-dense.

Proof. By definitions:

b(x, v, i) + b(x, v, j)
  = Σ_{I ∈ (V choose k): v ∈ I} (p_I(R_v^i(x)) + p_I(R_v^j(x)))
  = (1/η) Σ_{I ∈ (V choose k): v ∈ I} Σ_ℓ (p(I, ℓ, R_v^i(x)) + p(I, ℓ, R_v^j(x)))
  ≥ (1/η) Σ_{I ∈ (V choose k): v ∈ I} (the number of fragile constraints over I)
  ≥ (1/η) · δη (n choose k−1) = δ (n choose k−1).

We will make no further mention of individual constraints, η or fragility; our algorithms and analysis use pI and the fragile-dense property exclusively.
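As a concrete, purely illustrative rendering of this model, the Python sketch below stores the normalized penalties p_I as functions keyed by their scope and evaluates Obj(x) and b(x, v, i) directly from the definitions above; the GB-game constructor and all names are our own assumptions, not code from the paper.

```python
def gb_min2csp(M):
    """Illustrative sketch: the GB game as a dense fragile MIN-2CSP. Variables
    ('r', i) and ('c', j) take values in {-1, +1}; the constraint on scope
    (('r', i), ('c', j)) is violated exactly when bulb (i, j) stays lit,
    i.e. M[i][j] != x_ri * x_cj."""
    p = {}
    for i in range(len(M)):
        for j in range(len(M)):
            scope = (("r", i), ("c", j))
            p[scope] = (lambda mij: lambda vals: 1.0 if mij * vals[0] * vals[1] < 0 else 0.0)(M[i][j])
    return p

def objective(p, x):
    """Obj(x) = sum of p_I(x) over all constraint scopes I."""
    return sum(p_I(tuple(x[u] for u in I)) for I, p_I in p.items())

def b(p, x, v, i):
    """b(x, v, i): total penalty of the constraints containing v when x_v is reset to i."""
    total = 0.0
    for I, p_I in p.items():
        if v in I:
            total += p_I(tuple(i if u == v else x[u] for u in I))
    return total
```

For the GB game, objective(p, x) coincides with d(M, xy^T) from the first equivalent formulation in the introduction, and the fragile-dense condition of Definition 5 holds with k = 2 and δ = 1/2, since every variable appears in exactly m = n/2 fragile constraints.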

Algorithm 1 Our 1 + ε approximation for δ-fragile-dense MIN-kCSP with variables taking values from domain D.

1: Run a (ε/(1+ε)) · δ² (n choose k)/(72k) additive approximation algorithm.
2: if Obj(answer) ≥ (n choose k) δ²/(72k) then
3:   Return answer.
4: else
5:   Let s = 18 log(480|D|k/δ)/δ²
6:   Draw S_1, S_2, ..., S_s randomly from (V choose k−1) with replacement.
7:   for each assignment x̂* of the variables in ∪_{j=1}^s S_j do
8:     For all v ∈ V and i ∈ D let b̂(v, i) = ((n choose k−1)/s) Σ_{j=1}^s p_{S_j ∪ {v}}(R_v^i(x̂*))
9:     For all v ∈ V let x_v^(1) = arg min_i b̂(v, i)
10:    For all v ∈ V let x_v^(2) = arg min_i b(x^(1), v, i)
11:    Let C = {v ∈ V : b(x^(1), v, x_v^(2)) < b(x^(1), v, j) − δ(n choose k−1)/6 for all j ≠ x_v^(2)}.
12:    Find x^(3) of cost at most ε|V \ C|δ(n choose k)/(3n) + min [Obj(x)] using an additive approximation algorithm, where the minimum ranges over x such that x_v = x_v^(2) for all v ∈ C. See Section 3.1 for details.
13:  end for
14:  Return the best assignment x^(3) found.
15: end if

2.3 Algorithm

Recall that b(x*, v, i) can be written as a sum, over S ∈ (V choose k−1), of p_{S ∪ {v}}(R_v^i(x*)), which is a function of x*_u for u ∈ S. We estimate b(x*, v, i) with a random sample of s = 18 log(480|D|k/δ)/δ² of its terms. Let S_1, S_2, ..., S_s be independent random samples of k − 1 variables from V. Later (see Lemma 13) we show that

b̂(v, i) = ((n choose k−1)/s) Σ_{j=1}^s p_{S_j ∪ {v}}(R_v^i(x̂*))

is an unbiased estimator of b(x*, v, i).

We now describe our linear-time Algorithm 1 for fragile-dense MIN-kCSPs. We first (lines 1–3) dispose of the easy case OPT = Ω((n choose k)) by running an additive error algorithm and returning the output if it is sufficiently costly. Second (lines 5–7) we take a random sample of variables. The optimal assignment x* is of course unknown to us, so we simply try every possible assignment of the sampled variables (line 7). We then compute b̂ (line 8) and make a preliminary assignment x^(1) by setting each variable greedily relative to b̂ (line 9). To reduce noise we do a second greedy step, yielding assignment x^(2) (line 10). While constructing x^(2) we make a note when the best value for a variable is far better than the second best; we fix such clear-cut variables to their values in assignment x^(2) (line 11). We finally run an additive error algorithm (line 12) to set the remaining variables.

3. ANALYSIS OF ALGORITHM 1

We use one of the known additive error approximation algorithms for MAX-kCSP problems.

Theorem 7. [20] For any MAX-kCSP (or MIN-kCSP) and any ε′ > 0 there is a randomized algorithm which returns an assignment of cost Y such that |Y − OPT| ≤ ε′ n^k in runtime O(n^k) + 2^{O(1/ε′²)}.

Throughout the rest of the paper let x* denote an optimal assignment. First consider Algorithm 1 when the "then" branch of the "if" is taken. Choose constants appropriately so that the additive error algorithm fails with probability at most 1/10 and


assume it succeeds. Let x^a denote the additive-error solution. We know Obj(x^a) ≤ Obj(x*) + (ε/(1+ε))P and Obj(x^a) ≥ P, where P = (n choose k) δ²/(72k). Therefore Obj(x*) ≥ P(1 − ε/(1+ε)) = P/(1+ε), and hence Obj(x^a) ≤ Obj(x*) + (ε/(1+ε))(1+ε)Obj(x*) = (1 + ε)Obj(x*). Therefore if the additive approximation is returned it is a 1 + ε-approximation.

The remainder of this section considers the case when Algorithm 1 takes the "else" branch. Define γ so that Obj(x*) = γ (n choose k). We have Obj(x*) ≤ Obj(x^a) < (n choose k) δ²/(72k), so γ ≤ δ²/(72k). We analyze the x̂* where we guess x*, that is when x̂*_v = x*_v for all v ∈ ∪_{i=1}^s S_i. Clearly the overall cost is at most the cost of x^(3) during the iteration when we guess correctly.

Lemma 8. b(x*, v, x*_v) ≤ b(x*, v, j) for all j ∈ D.

Proof. Immediate from the definition of b and optimality of x*.

Lemma 9. For any assignment x,

Obj(x) = (1/k) Σ_{v ∈ V} b(x, v, x_v).

Proof. By definitions,

b(x, v, x_v) = Σ_{I ∈ (V choose k): v ∈ I} p_I(R_v^{x_v}(x)) = Σ_{I ∈ (V choose k): v ∈ I} p_I(x).

Write Obj(x) = Σ_{I ∈ (V choose k)} p_I(x) = Σ_{I ∈ (V choose k)} [(1/k) Σ_{v ∈ I} p_I(x)] and reorder summations.

Definition 10. We say variable v in assignment x is corrupted if x_v ≠ x*_v.

Definition 11. Variable v is clear if b(x*, v, x*_v) < b(x*, v, j) − δ(n choose k−1)/3 for all j ≠ x*_v. A variable is unclear if it is not clear.

Clearness is the analysis analog of the algorithmic notion of clear-cut vertices sketched in Section 2.1. Comparing the definition of clearness to Lemma 8 further motivates the terminology "clear."

Lemma 12. The number of unclear variables t satisfies t ≤ 3(n−k+1)γ/δ ≤ δn/(24k).

Proof. Let v be unclear and choose j ≠ x*_v minimizing b(x*, v, j). By unclearness, b(x*, v, x*_v) ≥ b(x*, v, j) − (1/3) δ (n choose k−1). By fragile-density, b(x*, v, x*_v) + b(x*, v, j) ≥ δ (n choose k−1). Adding these inequalities we see

b(x*, v, x*_v) ≥ ((1 − 1/3)/2) δ (n choose k−1) = (δ/3) (n choose k−1).   (1)

By Lemma 9 and (1),

γ (n choose k) = OPT = (1/k) Σ_u b(x*, u, x*_u) ≥ (1/k) Σ_{u unclear} (δ/3) (n choose k−1) = (δ/(3k)) (n choose k−1) t.

Therefore t ≤ γ (n choose k) · 3k/(δ (n choose k−1)) = (3γ/δ)(n−k+1). For the second bound observe 3nγ/δ ≤ 3n · δ/(72k) = δn/(24k).

Lemma 13. The probability of a fixed clear variable v being corrupted in x^(1) is bounded above by δ/(240k).

Proof. First we show that b̂(v, i) is in fact an unbiased estimator of b(x*, v, i) for all i. By definitions, and in particular by the assumption that p_I = 0 when |I| < k, we have for any 1 ≤ j ≤ s:

E[p_{S_j ∪ {v}}(R_v^i(x*))] = (1/(n choose k−1)) Σ_{J ∈ (V choose k−1)} p_{J ∪ {v}}(R_v^i(x*)) = (1/(n choose k−1)) Σ_{I ∈ (V choose k): v ∈ I} p_I(R_v^i(x*)) = (1/(n choose k−1)) b(x*, v, i).

Therefore

E[b̂(v, i)] = s · ((n choose k−1)/s) · E[p_{S_1 ∪ {v}}(R_v^i(x*))] = b(x*, v, i).

Recall that 0 ≤ p_I(x) ≤ 1 by definition of p, so by Azuma-Hoeffding,

Pr[ |Σ_{j=1}^s p_{S_j ∪ {v}}(R_v^i(x*)) − (s/(n choose k−1)) b(x*, v, i)| ≥ λs ] ≤ 2e^{−2λ²s}

for any λ > 0, hence

Pr[ |b̂(v, i) − b(x*, v, i)| ≥ λ (n choose k−1) ] ≤ 2e^{−2λ²s}.

Choose λ = δ/6 and recall s = 18 log(480|D|k/δ)/δ², yielding

Pr[ |b̂(v, i) − b(x*, v, i)| ≥ (δ/6)(n choose k−1) ] ≤ δ/(240|D|k).

By clearness we have b(x*, v, j) > b(x*, v, x*_v) + δ(n choose k−1)/3 for all j ≠ x*_v. Therefore, the probability that b̂(v, x*_v) is not the smallest b̂(v, j) is bounded by |D| times the probability that a particular b̂(v, j) differs from its mean by at least δ(n choose k−1)/6. Therefore Pr[x_v^(1) ≠ x*_v] ≤ |D| · δ/(240|D|k) = δ/(240k).
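As a quick check of these constants (our own verification of the step just used): with λ = δ/6 and s = 18 log(480|D|k/δ)/δ²,

```latex
2\lambda^{2}s
  = 2\cdot\frac{\delta^{2}}{36}\cdot\frac{18\log(480|D|k/\delta)}{\delta^{2}}
  = \log\frac{480|D|k}{\delta},
\qquad\text{hence}\qquad
2e^{-2\lambda^{2}s} = \frac{2\delta}{480|D|k} = \frac{\delta}{240|D|k}.
```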

Let E1 denote the event that the assignment x^(1) has at most δn/(12k) corrupted variables.

Lemma 14. Event E1 occurs with probability at least 1 − 1/10.

Proof. We consider the corrupted clear and unclear variables separately. By Lemma 12, the number of unclear variables, and hence the number of corrupted unclear variables, is bounded by δn/(24k). The expected number of clear corrupted variables can be bounded by δn/(240k) using Lemma 13, so by the Markov bound the number of clear corrupted variables is less than δn/(24k) with probability at least 1 − 1/10. Therefore the total number of corrupted variables is bounded by δn/(24k) + δn/(24k) = δn/(12k) with probability at least 9/10.

We henceforth assume E1 occurs. The remainder of the analysis is deterministic.

Lemma 15. For assignments y and y′ that differ in the assignment of at most t variables, for all variables v and values i, |b(y, v, i) − b(y′, v, i)| ≤ t (n choose k−2).

Proof. Clearly p_I(R_v^i(y)) is a function only of the variables in I excluding v, so if I − {v} consists only of variables u where y_u = y′_u, then p_I(R_v^i(y)) − p_I(R_v^i(y′)) = 0. Therefore b(y, v, i) − b(y′, v, i) equals the sum, over I ∈ (V choose k) containing v and at least one variable u other than v where y_u ≠ y′_u, of [p_I(R_v^i(y)) − p_I(R_v^i(y′))]. For any I, |p_I(R_v^i(y)) − p_I(R_v^i(y′))| ≤ 1, so by the triangle inequality a bound on the number of such sets suffices to bound |b(y, v, i) − b(y′, v, i)|. The number of such sets can trivially be bounded above by t (n choose k−2).

Lemma 16. Let C = {v ∈ V : b(x^(1), v, x_v^(2)) < b(x^(1), v, j) − δ(n choose k−1)/6 for all j ≠ x_v^(2)} as defined in Algorithm 1. If E1 occurs then:

1. x_v^(2) = x*_v for all v ∈ C.
2. |V \ C| ≤ 3nγ/δ.

Proof. Assume E1 occurred. Using the definitions of corrupted and event E1 together with Lemma 15, we have for any v, i (for sufficiently large n so that (n−k+1)/(k−1) ≥ n/k):

|b(x^(1), v, i) − b(x*, v, i)| ≤ (δn/(12k)) (n choose k−2) ≤ (δ/12) (n choose k−1).   (2)

To prove the first statement of the Lemma, let v be a clear-cut variable, i.e. v ∈ C. Apply (2), yielding:

b(x*, v, x_v^(2)) ≤ b(x^(1), v, x_v^(2)) + (δ/12)(n choose k−1)
  < b(x^(1), v, j) − (δ/6)(n choose k−1) + (δ/12)(n choose k−1)
  ≤ b(x*, v, j) + (−δ/6 + 2·δ/12)(n choose k−1) = b(x*, v, j).

So by Lemma 8, x*_v = x_v^(2).

To prove the second statement, let u be clear and use (2) again:

b(x^(1), u, x*_u) ≤ b(x*, u, x*_u) + (δ/12)(n choose k−1)
  < b(x*, u, j) − (δ/3)(n choose k−1) + (δ/12)(n choose k−1)
  ≤ b(x^(1), u, j) + (−δ/3 + 2·δ/12)(n choose k−1)
  = b(x^(1), u, j) − (δ/6)(n choose k−1),

so u ∈ C by definition of C. Therefore the second statement follows from the bound on the number of unclear variables from Lemma 12.

3.1 Computation of x^(3)

This subsection has been overhauled since the proceedings version. The previous reduction had excessive runtime. Now we give the details of the computation of x^(3). Let T = V \ C. We call C the clear-cut variables and T the tricky variables. If |T| ≥ n/2 the desired approximation factor can be achieved using a direct application of any additive error algorithm, so suppose |T| < n/2 (alternatively, Lemma 16 implies |T| ≤ δn/(24k) ≤ n/2).

It is natural to substitute the values of the clear-cut variables into the constraints, but doing this leaves constraints with arity less than k. To remedy this we create placeholder variables as follows. We divide C into |T| sets C_1, C_2, ..., C_{|T|}, all of size at most 2|C|/|T|. We create new placeholder variables P = {ν_1, ..., ν_{|T|}} to represent the variables in C_1, C_2, ..., C_{|T|} respectively. We create a new CSP over variables T ∪ P by replacing each clear-cut variable in C with its representative in P. Constraints with insufficient arity are padded with arbitrary placeholder variables from P. Constraints that formerly depended only on variables in C can safely be discarded.

Formally, for any H ⊆ C let µ(H) denote the variables representing H, i.e. µ(H) is an arbitrary subset of P of size |H| satisfying H ⊆ ∪_{ν_i ∈ µ(H)} C_i. For any assignment y of T ∪ P, M ⊆ T and L ⊆ P with |M| + |L| = k, define

q_{M,L}(y) = 1(|L| ≤ k − 1) Σ_{H ∈ (C choose |L|): µ(H) = L} p_{M ∪ H}(R_T^y(x^(2))),

where R_T^y(x^(2)) denotes the assignment that agrees with y on T and with x^(2) on V \ T. The following Lemma follows easily from the definitions:


Lemma 17. For any y ∈ D^{2|T|} (an assignment of T ∪ P) we have

Obj(R_T^y(x^(2))) = Σ_{K ∈ (T∪P choose k)} q_K(y) + Σ_{I ∈ (C choose k)} p_I(x^(2)).

Lemma 18. 0 ≤ q_K(y) ≤ O((|C|/|T|)^{k−1}).

Proof. Let M = K ∩ T and L = K ∩ P. If |L| = k the Lemma is trivial, so the interesting case is |L| ≤ k − 1. Recall that µ(H) = L implies |H| = |L| ≤ k − 1, so any H contributing to the sum is a subset of ∪_{ν_i ∈ L} C_i, which has size at most 2|L||C|/|T|. Therefore the number of such H is at most (2|L||C|/|T| choose k−1) = O((|C|/|T|)^{k−1}).

After using Lemma 18 to normalize the weights, Theorem 7 with an error parameter of ε′ = Θ(ε) yields an additive error of O(ε |T|^k (|C|/|T|)^{k−1}) = O(ε (|T|/n) n^k) for the problem of minimizing Σ_{K ∈ (T∪P choose k)} q_K(y). By Lemma 17 this is also an additive error of O(ε (|T|/n) n^k) for Obj(R_T^y(x^(2))), as stated in the algorithm. Using Lemma 16 we further bound the additive error O(ε (|T|/n) n^k) by O(ε γ n^k). Lemma 16 implies that x* = R_T^y(x^(2)) for some y, so this yields an additive error O(ε γ n^k) = ε OPT for our original problem of minimizing Obj(x) over all assignments x.
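The bookkeeping in this reduction is mechanical; the following Python sketch (our own illustration for the case k = 2, with hypothetical names such as placeholder_csp) groups the clear-cut variables into |T| buckets and folds their fixed values into the constraints, so that every remaining constraint involves only tricky variables and placeholders.

```python
def placeholder_csp(p, x2, T):
    """Illustrative sketch: reduce a MIN-2CSP over T ∪ C to one over T plus |T|
    placeholder variables.
    p  : dict mapping an ordered pair (u, v) to a cost function of (x_u, x_v)
    x2 : dict of fixed values for the clear-cut variables C (its keys)
    T  : list of the remaining 'tricky' variables"""
    C = list(x2)
    # Partition C into |T| buckets, each of size at most ceil(|C|/|T|) <= 2|C|/|T|.
    bucket = {u: ("ph", idx % len(T)) for idx, u in enumerate(C)}
    q = {}  # new constraints: (a, b) -> list of cost functions of (x_a, x_b)
    for (u, v), cost in p.items():
        if u in x2 and v in x2:
            continue  # depends only on C: contributes a constant, so it is discarded
        if u in x2:   # fold u's fixed value in; pad with u's placeholder (value ignored)
            a, b = bucket[u], v
            folded = (lambda c, val: lambda _ph, xv: c(val, xv))(cost, x2[u])
        elif v in x2:
            a, b = u, bucket[v]
            folded = (lambda c, val: lambda xu, _ph: c(xu, val))(cost, x2[v])
        else:
            a, b, folded = u, v, cost
        q.setdefault((a, b), []).append(folded)
    return q, bucket
```

Each folded cost is independent of its placeholder argument, mirroring the padding described above, and each pair carries at most O(|C|/|T|) folded costs, which is the normalization used in Lemma 18 for k = 2.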

3.2 Runtime

In this subsection we heavily exploit our choice to hide δ, |D| and k within O(·). Note s = O(1), hence the number of iterations of the loop is |D|^{|∪_{j=1}^s S_j|} ≤ |D|^{(k−1)s} = O(1), which we may safely ignore when analyzing runtime. The two calls to the additive error algorithm take, by Theorem 7, O(n^k) + 2^{O(1/ε²)} time. Observe that b(x, v, i) can be easily computed in time O(n^{k−1}) directly from its definition, hence b(x^(1), v, i) can be computed for all v, i in |D| · n · O(n^{k−1}) = O(n^k) time. Computing the cost of a particular assignment can also be done in O(n^k) time. It is easy to see that the tasks not specifically mentioned take O(n) time, so the overall runtime is O(n^k) + 2^{O(1/ε²)} as claimed.


4. CLUSTERING ALGORITHM

4.1 Intuition

As we noted previously in Section 1, correlation clustering constraints are not fragile for d > 2. Indeed, the constraint corresponding to a pair of vertices that are connected by a "−" edge can be satisfied by any coloring of the endpoints as long as the endpoints are colored differently. Fortunately there is a key observation in [17] that allows for the construction of a PTAS. Consider the cost-zero clustering shown on the left of Figure 2. Note that moving a vertex from a small cluster to another small one increases the cost very little, but moving a vertex from a large cluster to anywhere else increases the cost a lot. Fortunately most vertices are in big clusters so, as in [17], we can postpone processing the vertices in small clusters. We use the above ideas, which are due to [17], the fragile-dense ideas sketched above, plus some additional ideas, to analyze our correlation clustering algorithm.

To handle hierarchical clustering (c.f. [1]) we need a few more ideas. Firstly we abstract the arguments of the previous paragraph to a CSP property called rigidity. Secondly, we note that the number of trees with d leaves is a constant and therefore we can safely try them all. We remark that all fragile-dense problems are also rigid.

Figure 2: An illustration of correlation clustering and the rigidity property. Only "+" edges are shown.

4.2 Reduction to Rigid MIN-2CSP

We now define hierarchical clustering formally (following [1]). For integer M ≥ 1, an M-level hierarchical clustering of n objects V is a rooted tree with the elements of V as the leaves and every leaf at depth (distance to root) exactly M + 1. For M = 1, a hierarchical clustering has one node at the root, some "cluster" nodes in the middle level and all of V in the bottom level. The nodes in the middle level can be identified with clusters of V. We call the subtree induced by the internal nodes of an M-level hierarchical clustering the trunk. We call the leaves of the trunk clusters. A hierarchical clustering is completely specified by its trunk and the parent cluster of each leaf. For a fixed hierarchical clustering and clusters i and j, let f(i, j) be the distance from i (or j) to the least common ancestor of i and j. For example, when M = 1, f(i, j) = 1(i ≠ j). We are given a function F from pairs of vertices to {0, 1, ..., M}. ([1] chose {1, 2, ..., M + 1} instead; the difference is merely notational.) The objective of hierarchical clustering is to output an M-level hierarchical clustering minimizing Σ_{u,v} (1/M)|F(u, v) − f(parent(u), parent(v))|. Hierarchical clustering with d clusters is the same except that we restrict the number of clusters (recall that this equals the number of nodes whose children are leaves) to at most d. It is easy to see that the special case of hierarchical clustering with M = 1 is equivalent to the correlation clustering problem described in the introduction.

Lemma 19. The number of possible trunks is at most d^{(M−1)d}.

Proof. The trunk can be specified by giving the parent of all non-root nodes. There are at most d nodes on each of the M − 1 non-root levels, so the lemma follows.

We now show how to reduce hierarchical clustering with a constant number of clusters to the solution of a constant number of MIN-2CSPs with cardinality of D equal to d. We use a variant of the notation used in Sections 2 and 3 that is specialized for MIN-2CSPs. For vertices u, v and values i, j, let p_{u,v}(i, j) be the cost of putting u in cluster i and v in cluster j. This is the same concept as p_I for the fragile case, but this notation is more convenient here. Define b(x, v, i) = Σ_{u ∈ V: u ≠ v} p_{u,v}(x_u, i), which is identical to the b of the fragile-dense analysis but expressed using different notation.

Definition 20. A MIN-2CSP is rigid if for some δ > 0, all v ∈ V and all j ≠ x*_v,

b(x*, v, x*_v) + b(x*, v, j) ≥ δ |{u ∈ V \ {v} : x*_u = x*_v}|.

Observe that |{u ∈ V \ {v} : x*_u = x*_v}| ≤ |V| = (|V| choose 2−1), hence any fragile-dense CSP is also rigid.

Lemma 21. If the trunk is fixed, hierarchical clustering can be expressed as a 1/M-rigid MIN-2CSP with |D| = d.

Proof. (C.f. Figure 2.) Let D be the leaves of the trunk (clusters). It is easy to see that choosing

p_{u,v}(i, j) = (1/M) |f(i, j) − F(u, v)|

yields the correct objective function. To show rigidity, fix vertex v, define i = x*_v and C_i = {u ∈ V : x*_u = i}. Fix j ≠ i and u ∈ C_i \ {v}. Clearly |f(i, i) − f(i, j)| ≥ 1, hence by the triangle inequality |F(u, v) − f(i, i)| + |F(u, v) − f(i, j)| ≥ 1, hence p_{u,v}(i, i) + p_{u,v}(i, j) ≥ 1/M. Summing over u ∈ C_i \ {v} we see

b(x*, v, x*_v) + b(x*, v, j) ≥ (1/M) |C_i \ {v}| = δ |{u ∈ V \ {v} : x*_u = x*_v}|

for δ = 1/M.

Lemmas 21 and 19 suggest a technique for solving hierarchical clustering: guess the trunk and then solve the rigid MIN-2CSP. We now give our algorithm for solving rigid MIN-2CSPs.
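As a concrete illustration of the costs used in this reduction (our own sketch; the trunk is encoded as a hypothetical parent map, and F is assumed keyed by sorted vertex pairs), the following Python computes p_{u,v}(i, j) = (1/M)|f(i, j) − F(u, v)| for a fixed trunk; with M = 1, where f(i, j) = 1 exactly when i ≠ j, it reduces to the correlation clustering objective from the introduction.

```python
def f_distance(parent, i, j):
    """f(i, j): distance from cluster i (or j) up to the least common ancestor of
    i and j in the trunk; parent maps every non-root trunk node to its parent."""
    def chain(node):
        out = [node]
        while node in parent:
            node = parent[node]
            out.append(node)
        return out
    ci, cj = chain(i), chain(j)
    common = set(ci) & set(cj)
    return next(d for d, node in enumerate(ci) if node in common)

def pair_cost(parent, M, F_uv, i, j):
    """p_{u,v}(i, j) = (1/M) * |f(i, j) - F(u, v)| as in the proof of Lemma 21."""
    return abs(f_distance(parent, i, j) - F_uv) / M

def clustering_cost(parent, M, F, x):
    """Objective: sum of p_{u,v}(x_u, x_v) over unordered pairs, where x maps
    each vertex to its cluster (a leaf of the trunk)."""
    vs = sorted(x)
    return sum(pair_cost(parent, M, F[(u, v)], x[u], x[v])
               for a, u in enumerate(vs) for v in vs[a + 1:])
```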

4.3 Algorithm for Rigid MIN-2CSP

Algorithm 2 solves rigid MIN-2CSPs by identifying clear-cut variables, fixing them to optimal values, and then recursing on the remaining "tricky" variables T. The recursion terminates when the remaining subproblem is sufficiently expensive for an additive approximation to suffice.

Algorithm 2 Approximation Algorithm for Rigid MIN-2CSPs.

Return CC(V, blank assignment, 0)

CC(tricky variables T, assignment y of V \ T, recursion depth depth):
1: Find an assignment of cost at most (ε/(1+ε)) · δ³|T|²/(6·72²|D|³) + min_{x: x_v = y_v ∀v ∈ V\T} [Obj(x)] using an additive approximation algorithm.
2: if Obj(answer) ≥ δ³|T|²/(6·72²|D|³) or depth ≥ |D| + 1 then
3:   Return answer.
4: else
5:   Let s = 432²|D|⁴ log(1440|D|³/δ) / (2δ⁴)
6:   Draw v_1, v_2, ..., v_s randomly from T with replacement.
7:   for each assignment x̂* of the variables {v_1, v_2, ..., v_s} do
8:     For all v ∈ T and i let b̂(v, i) = (|T|/s) Σ_{j=1}^s p_{v_j, v}(x̂*_{v_j}, i) + Σ_{u ∈ V\T} p_{u,v}(y_u, i)
9:     For all v ∈ V let x_v^(1) = y_v if v ∈ V \ T, and arg min_i b̂(v, i) otherwise
10:    For all v ∈ T let x_v^(2) = arg min_i b(x^(1), v, i)
11:    Let C = {v ∈ T : b(x^(1), v, x_v^(2)) < b(x^(1), v, j) − δ|T|/(12|D|) for all j ≠ x_v^(2)}.
12:    Let T′ = T \ C
13:    Define assignment y′ by y′_v = y_v if v ∈ V \ T; x_v^(2) if v ∈ C; undefined if v ∈ T \ C
14:    If CC(T′, y′, depth + 1) is the best clustering so far, update best.
15:  end for
16:  Return the best clustering found.
17: end if

5. ANALYSIS OF ALGORITHM 2

5.1 Runtime

Theorem 22. For any T, y, an assignment of cost at most ε′|T|² + min_{x: x_v = y_v ∀v ∈ V\T} [Obj(x)] can be found in time n² · 2^{O((log |D|)/ε′²)}.

Proof. The problem is essentially a CSP on the variables of T but with an additional linear cost term for each variable. It is fairly easy to see that Algorithm 1 in [20] has error proportional to the misestimation of b and hence is unaffected by arbitrarily large linear cost terms. On the other hand, the more efficient Algorithm 2 in [20] needs to estimate the objective value from a constant-sized sample as well and hence does not seem to work for this type of problem.

In this subsection O(·) hides only absolute constants and Õ(·) hides one factor of log |D|. Algorithm 2 has recursion depth at most |D| + 1 and branching factor |D|^s, so the number of recursive calls is at most (|D|^s)^{|D|+1} = 2^{s(|D|+1) log |D|} = 2^{Õ(|D|^5/δ^4)}. Each call spends O(|D|n²) time on miscellaneous tasks such as computing the objective value, plus the time required to run the additive error algorithm, which is n² · 2^{Õ(|D|^6/(ε²δ^6))} by Theorem 22. Therefore the runtime of Algorithm 2 is n² · 2^{Õ(|D|^6/(ε²δ^6))}, where the 2^{Õ(|D|^5/δ^4)} from the size of the recursion tree got absorbed into the 2^{Õ(|D|^6/(ε²δ^6))} from Theorem 22. For hierarchical clustering, δ = 1/M yields a running time of n² · 2^{Õ(|D|^6 M^6/ε²)} · |D|^{(M−1)|D|} = n² · 2^{O((log |D|)|D|^6 M^6/ε²)}.

As noted in the introduction this improves on the runtime n^{O(9^{|D|}/ε²)} of [17] for correlation clustering in two ways: the degree of the polynomial is independent of ε and |D|, and the dependence on |D| is singly rather than doubly exponential.

5.2 Approximation

We fix an optimal assignment x*. We analyze the path through the recursion tree where we always guess x̂* correctly, i.e. x̂*_v = x*_v for all v ∈ {v_1, v_2, ..., v_s}. We call this the principal path. We will need the following definitions.

Definition 23. Variable v is m-clear if b(x*, v, x*_v) < b(x*, v, j) − m for all j ≠ x*_v. We say a variable is clear if it is m-clear for m obvious from context. A variable is unclear if it is not clear.

Definition 24. A variable is obvious if it is in cluster C in OPT and it is δ|C|/3-clear.

Definition 25. A cluster C of OPT is finished w.r.t. T if T ∩ C contains no obvious variables.

Lemma 26. With probability at least 8/10, for any triple (T, y, depth) encountered on the principal path,

1. y_v = x*_v for all v ∈ V \ T, and
2. the number of finished clusters w.r.t. T is at least depth.

Before proving Lemma 26, let us see why it implies that Algorithm 2 has the correct approximation factor.

Proof. Study the final call on the principal path, which returns the additive approximation clustering. The second part of Lemma 26 implies that depth ≤ |D|, hence we must have terminated because Obj(answer) ≥ δ³|T|²/(6·72²|D|³). By the first part of Lemma 26 the additive approximation returns a clustering of cost at most (ε/(1+ε)) · δ³|T|²/(6·72²|D|³) + OPT, so the approximation factor follows from an easy calculation.

Now we prove Lemma 26 by induction. Our base case is the root, which vacuously satisfies the inductive hypothesis since V \ T = {} and depth = 0. We show that if a node (T, y, depth) (in the recursion tree) satisfies the invariant then its child (T′, y′, depth + 1) does as well. We hereafter analyze a particular (T, y, depth) and assume the inductive hypothesis holds for them. There is only something to prove if a child exists, so we hereafter assume the additive error answer is not returned from this node and hence

OPT ≤ Obj(answer) < δ³|T|²/(6·72²|D|³).   (3)

There are two types of bad events: the additive error algorithm failing and our own random samples failing. We choose constants so that each of these events has probability at most 1/(10|D|). The principal path in the recursion tree has length at most |D|, so the overall probability of a bad event is at most 2/10. We hereafter assume no bad events occur.

Lemma 27. With probability at least 1 − 1/(10|D|), the number of corrupted δ²|T|/(216|D|²)-clear variables in x^(1) is at most δ|T|/(72|D|).

Proof. An argument analogous to the proof of Lemma 13 (using the sample v_1, ..., v_s and the value of s chosen in Algorithm 2) shows that b̂(v, i) is an unbiased estimator of b(x*, v, i) and that a particular b̂(v, j) differs from its mean by at least δ²|T|/(432|D|²) with probability at most δ/(720|D|³). By clearness we have b(x*, v, j) > b(x*, v, x*_v) + δ²|T|/(216|D|²) for all j ≠ x*_v. Therefore, the probability that b̂(v, x*_v) is not the smallest b̂(v, j) is bounded by |D| times the probability that a particular b̂(v, j) differs from its mean by at least δ²|T|/(432|D|²). Therefore Pr[x_v^(1) ≠ x*_v] ≤ |D| · δ/(720|D|³) = δ/(720|D|²). Therefore, by the Markov bound, with probability 1 − 1/(10|D|) the number of corrupted δ²|T|/(216|D|²)-clear variables is at most δ|T|/(72|D|).

Lemma 28. The number of δc/3-unclear variables in clusters of size at least c + 1 is at most 6·OPT/(δc).

Proof. Let a confusing variable refer to a δc/3-unclear variable in a cluster of size at least c + 1. Let v be such a variable, in cluster C in OPT. By unclearness, for an appropriate j ≠ x*_v,

b(x*, v, x*_v) ≥ b(x*, v, j) − δc/3,

and by rigidity,

b(x*, v, x*_v) + b(x*, v, j) ≥ δ |C \ {v}| ≥ δc.

Adding these inequalities we see b(x*, v, x*_v) ≥ δc/3. Therefore

OPT = (1/2) Σ_v b(x*, v, x*_v) ≥ (1/2) Σ_{v confusing} δc/3 = |{v ∈ T : v confusing}| · δc/6,

so |{v ∈ T : v confusing}| ≤ 6·OPT/(δc).

Lemma 29. For all v, i, |b(x^(1), v, i) − b(x*, v, i)| ≤ δ|T|/(24|D|).

Proof. First we show bounds on three classes of corrupted variables:

1. Lemma 27 bounds the number of δ²|T|/(216|D|²)-clear corrupted variables by δ|T|/(72|D|).
2. Applying Lemma 28 with c = δ|T|/(72|D|²) and using (3), the number of δ²|T|/(216|D|²)-unclear variables in clusters of size at least δ|T|/(72|D|²) + 1 is at most 6·OPT · 72|D|²/(δ²|T|) < δ|T|/(72|D|).
3. The at most |D| clusters of size at most δ|T|/(72|D|²) contain at most δ|T|/(72|D|) variables in total.

Every corrupted variable of T falls into one of these classes, so x^(1) and x* differ on at most 3 · δ|T|/(72|D|) = δ|T|/(24|D|) variables (they agree on V \ T by the inductive hypothesis). Since changing a single variable changes b(·, v, i) by at most 1 (as in Lemma 15 with k = 2), the claim follows.

Lemma 30. There is an obvious variable in T whose cluster in OPT has size at least |T|/(2|D|).

Proof. We say a variable v is confusing′ if it is non-obvious and its cluster in OPT has size at least |T|/(2|D|). By Lemma 28 and (3),

|{v ∈ T : v confusing′}| ≤ 12|D|·OPT/(δ|T|) < (12|D|/(δ|T|)) · δ³|T|²/(6·72²|D|³) < |T|/2.

Simple counting shows there are at most |T|/2 variables of T in clusters of size less than |T|/(2|D|), hence there must be an obvious variable in a big cluster.

Lemma 31. The number of finished clusters w.r.t. T′ strictly exceeds the number of finished clusters w.r.t. T.

Proof. Let v be the variable promised by Lemma 30 and C_i its cluster in OPT. For any obvious variable u in C_i, note that u is δ|C_i|/3 ≥ δ|T|/(6|D|)-clear, so Lemma 29 implies

b(x^(1), u, i) ≤ b(x*, u, i) + δ|T|/(24|D|)
  < b(x*, u, j) + δ|T|/(24|D|) − δ|T|/(6|D|)
  ≤ b(x^(1), u, j) + 2·δ|T|/(24|D|) − δ|T|/(6|D|) = b(x^(1), u, j) − δ|T|/(12|D|),

hence u is in the set C of clear variables defined in Algorithm 2. Therefore, no obvious variables in C_i are in T′, so C_i is finished w.r.t. T′. The existence of v implies C_i is not finished w.r.t. T, so C_i is newly finished. To complete the proof note that T′ ⊆ T, so finished is a monotonic property.

Lemma 32. (T′, y′) satisfy the invariant v ∈ V \ T′ → y′_v = x*_v.

Proof. Fix v ∈ V \ T′. If v ∈ V \ T the conclusion follows from the inductive hypothesis. If v ∈ T \ T′ = C we need to show y′_v = x*_v. (To simplify the exposition we assume that δ|T|/(72|D|²) is an integer. Sketch of the general case: if δ|T|/(72|D|²) is large, fiddle with the constants; if it is small, then T is small, so one can afford to guess all possible assignments of the variables in T.) Let i = y′_v. For any j ≠ i, use Lemma 29 to obtain

b(x*, v, i) ≤ b(x^(1), v, i) + δ|T|/(24|D|)
  < b(x^(1), v, j) − δ|T|/(12|D|) + δ|T|/(24|D|)
  ≤ b(x*, v, j) + 2·δ|T|/(24|D|) − δ|T|/(12|D|) = b(x*, v, j),

so by optimality of x* we have the Lemma.

Lemmas 31 and 32 complete the inductive proof of Lemma 26.

Acknowledgements

We would like to thank Claire Mathieu and Joel Spencer for raising a question on the approximability status of the Gale-Berlekamp game and Alex Samorodnitsky for interesting discussions.

6. REFERENCES

[1] N. Ailon and M. Charikar. Fitting tree metrics: Hierarchical clustering and phylogeny. In Proc. 46th IEEE FOCS, pages 73–82, 2005.
[2] N. Alon, W. Fernandez de la Vega, R. Kannan, and M. Karpinski. Random Sampling and Approximation of MAX-CSP Problems. In Proc. 34th ACM STOC, pages 232–239, 2002. Journal version in J. Comput. System Sciences, 67(2):212–243, 2003.
[3] N. Alon, R. Panigrahy, and S. Yekhanin. Deterministic Approximation Algorithms for the Nearest Codeword Problem. Technical report, Elec. Coll. on Comp. Compl., ECCC TR08-065, 2008.
[4] S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The Hardness of Approximate Optima in Lattices, Codes, and Systems of Linear Equations. In Proc. 34th IEEE FOCS, pages 724–733, Nov 1993. Journal version in J. Comput. System Sciences, 54(2):317–331, 1997.
[5] S. Arora, A. Frieze, and H. Kaplan. A New Rounding Procedure for the Assignment Problem with Applications to Dense Graph Arrangement Problems. In Proc. 37th IEEE FOCS, pages 21–30, Oct 1996. Journal version in Mathematical Programming, 92(1):1–36, 2002.
[6] S. Arora, D. Karger, and M. Karpinski. Polynomial Time Approximation Schemes for Dense Instances of NP-Hard Problems. In Proc. 27th ACM STOC, pages 284–293, 1995. Journal version in J. Comput. System Sciences, 58(1):193–210, 1999.
[7] C. Bazgan, W. Fernandez de la Vega, and M. Karpinski. Polynomial Time Approximation Schemes for Dense Instances of Minimum Constraint Satisfaction. Random Structures and Algorithms, 23(1):73–91, 2003.

[8] P. Berman and M. Karpinski. Approximating Minimum Unsatisfiability of Linear Equations. In Proc. 13th ACM-SIAM SODA, pages 514–516, 2002.
[9] P. Berman and M. Karpinski. Approximation Hardness of Bounded Degree MIN-CSP and MIN-BISECTION. In Proc. 29th ICALP, LNCS 2380, pages 623–632. Springer, 2002.
[10] J. Carlson and D. Stolarski. The Correct Solution to Berlekamp's Switching Game. Discrete Mathematics, 287(1–3):145–150, 2004.
[11] I. Dinur, G. Kindler, R. Raz, and S. Safra. Approximating CVP to Within Almost-Polynomial Factors is NP-Hard. Combinatorica, 23(2):205–243, 2003.
[12] U. Feige and L. Lovasz. Two prover one round proof systems: Their power and their problems. In Proc. 24th ACM STOC, pages 733–741, 1992.
[13] W. Fernandez de la Vega. MAX-CUT has a Randomized Approximation Scheme in Dense Graphs. Random Struct. Algorithms, 8(3):187–198, 1996.
[14] W. Fernandez de la Vega, R. Kannan, and M. Karpinski. Approximation of Global MAX-CSP Problems. Technical Report TR06-124, Electronic Colloquium on Computational Complexity, 2006.
[15] P. C. Fishburn and N. J. Sloane. The Solution to Berlekamp's Switching Game. Discrete Math., 74(3):263–290, 1989.
[16] A. M. Frieze and R. Kannan. Quick Approximation to Matrices and Applications. Combinatorica, 19(2):175–220, 1999.
[17] I. Giotis and V. Guruswami. Correlation clustering with a fixed number of clusters. Theory of Computing, 2(1):249–266, 2006.
[18] A. Gupta and K. Talwar. Approximating unique games. In Proc. 17th ACM-SIAM SODA, pages 99–106, 2006.
[19] S. V. Lokam. Spectral Methods for Matrix Rigidity with Applications to Size-Depth Tradeoffs and Communication Complexity. In Proc. 36th IEEE FOCS, pages 6–15, 1995.
[20] C. Mathieu and W. Schudy. Yet Another Algorithm for Dense Max Cut: Go Greedy. In Proc. 19th ACM-SIAM SODA, pages 176–182, 2008.
[21] R. Roth and K. Viswanathan. On the Hardness of Decoding the Gale-Berlekamp Code. IEEE Transactions on Information Theory, 54(3):1050–1060, March 2008.
[22] M. Rudelson and R. Vershynin. Sampling from large matrices: An approach through geometric functional analysis. J. ACM, 54(4):21, 2007.
[23] J. Spencer. Ten Lectures on the Probabilistic Method. SIAM, Regional Conference Series, second edition, 1994.