Story
Hardness of High-Dimensional Linear Classification
Key takeaway
Researchers showed that finding the best linear classifier for high-dimensional data is computationally intractable, setting limits on how efficiently optimal linear models can be found for real-world problems like image recognition.
Quick Explainer
The core idea of this work is to establish the computational hardness of the fundamental problem of high-dimensional linear classification. The authors show that finding the optimal halfspace separating two sets of points (the "red" and "blue" points) requires time that grows exponentially with the dimension, even when approximate solutions are allowed. These hardness results stem from reductions from two well-known conjecturally hard problems: the k-Sum problem and the Affine Degeneracy problem. The reductions demonstrate that any efficient algorithm for linear classification could also be used to solve these hard problems, implying that linear classification itself is computationally intractable in high dimensions. This suggests that further advances in high-dimensional machine learning may require fundamentally different approaches beyond the standard linear classification framework.
Deep Dive
Technical Deep Dive: Hardness of High-Dimensional Linear Classification
Overview
This work establishes new exponential lower bounds for the Maximum Halfspace Discrepancy problem, which models the fundamental problem of linear classification in high dimensions. The authors show that both the exact and approximate versions of this problem are computationally hard, assuming widely believed hardness conjectures.
Problem & Context
Linear classification is a fundamental task in machine learning, where the goal is to find a hyperplane that best separates two sets of points (the "red" and "blue" points) with minimal mislabeling. The authors formulate this as the Maximum Halfspace Discrepancy problem:
- MaxHalfspace: Given two point sets R and B in ℝ^d, find the halfspace h that maximizes the discrepancy |R ∩ h| − |B ∩ h|, i.e., the number of red points h contains minus the number of blue points it contains.
- ε-MaxHalfspace: Given two point sets R and B in ℝ^d and a parameter ε > 0, find a halfspace h whose discrepancy is within εn of that of the optimal halfspace h^*, where n = |R ∪ B|.
While efficient algorithms are known for the low-dimensional case, the authors show that in high dimensions, these problems are computationally hard.
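For intuition about the low-dimensional case, here is a minimal brute-force sketch of MaxHalfspace in the plane (all names are illustrative, not from the paper). It relies on a standard rotation argument: some optimal closed halfspace has a boundary passing through one input point, with a normal that is an infinitesimal rotation of a direction perpendicular to the difference of two input points. Enumerating those candidates with a small explicit rotation gives an O(n^3) algorithm:

```python
from itertools import permutations
import math

def halfspace_discrepancy_2d(red, blue, eps=1e-7):
    """Brute-force 2-D MaxHalfspace: maximize |R in h| - |B in h| over
    closed halfspaces h. Candidates: boundary anchored at a point p,
    normal an eps-rotation of a direction perpendicular to q - p."""
    pts = red + blue

    def disc(n, off):
        # discrepancy of the closed halfspace {x : n . x >= off}
        inside = lambda t: n[0]*t[0] + n[1]*t[1] >= off - 1e-12
        return sum(map(inside, red)) - sum(map(inside, blue))

    best = max(len(red) - len(blue), 0)   # whole plane / empty halfspace
    for p, q in permutations(pts, 2):
        base = math.atan2(q[1] - p[1], q[0] - p[0])
        for rot in (math.pi/2 - eps, math.pi/2 + eps,
                    -math.pi/2 - eps, -math.pi/2 + eps):
            n = (math.cos(base + rot), math.sin(base + rot))
            best = max(best, disc(n, n[0]*p[0] + n[1]*p[1]))
    return best
```

For example, with red = [(0,0), (1,0), (0,1)] and blue = [(2,2), (3,3)], a halfspace of the form x + y ≤ 1 captures every red point and no blue point, so the routine returns 3. The exponential blow-up the paper studies appears when this enumeration is generalized to d dimensions: candidate boundaries are then determined by d points, giving roughly n^d candidates.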
Methodology
The authors prove hardness results for the MaxHalfspace and ε-MaxHalfspace problems by reducing from two well-known hard problems in computational geometry:
- k-Sum: Given n integers, decide if there exists a subset of k of them that sums to 0.
- AffineDegeneracy: Given n points in ℝ^d, decide if there exists a set of d+1 points that lie on a common affine hyperplane.
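Both source problems are simple to state; naive reference implementations (illustrative names, not from the paper) make them concrete. For k-Sum, exhaustive search takes O(n^k) time, while the hardness conjecture says that roughly O(n^(⌈k/2⌉)), achieved by meet-in-the-middle, is essentially optimal. For AffineDegeneracy, d+1 points in ℝ^d lie on a common affine hyperplane exactly when the (d+1)×(d+1) matrix whose rows are the points extended by a 1 is singular:

```python
from itertools import combinations

def k_sum(nums, k):
    """Naive k-Sum check in O(n^k): does some k-subset sum to 0?"""
    return any(sum(c) == 0 for c in combinations(nums, k))

def det(m):
    """Exact cofactor-expansion determinant for small integer matrices."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j+1:] for row in m[1:]])
               for j in range(len(m)))

def affinely_degenerate(points, d):
    """Naive AffineDegeneracy check: some d+1 points lie on a common
    affine hyperplane iff det of their homogeneous coordinates is 0."""
    return any(det([list(p) + [1] for p in subset]) == 0
               for subset in combinations(points, d + 1))
```

For instance, `affinely_degenerate([(0,0), (1,1), (2,2), (5,0)], 2)` detects the collinear triple (0,0), (1,1), (2,2), while a triangle such as (0,0), (1,0), (0,1) is non-degenerate.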
They show that any algorithm that solves MaxHalfspace or ε-MaxHalfspace efficiently can also be used to solve these hard problems efficiently, implying that the classification problems are themselves hard.
The key ideas in the reductions are:
- For k-Sum, they construct points in ℝ^(k-1) that encode the input integers, such that a k-subset summing to 0 corresponds to a halfspace with high discrepancy between the red and blue points.
- For AffineDegeneracy, they simply shift the input points slightly along one dimension, such that a set of d+1 points on a common hyperplane corresponds to a halfspace containing at least d+1 more red points than blue points (or vice versa).
Results
The authors prove the following conditional hardness results:
k-Sum Hardness:
- Any algorithm that solves MaxHalfspace in time O(n^(⌈(d+1)/2⌉-c)) also solves k-Sum in time O(n^(⌈k/2⌉-c)).
- Any algorithm that solves ε-MaxHalfspace in time O((1/(2(d+1)ε))^(⌈(d+1)/2⌉-c)) also solves k-Sum in time O((1/ε)^(⌈k/2⌉-c)).
Affine Degeneracy Hardness:
- Assuming there is no O(n^(d-c)) algorithm for AffineDegeneracy, there is no O(n^(d-c)) algorithm for MaxHalfspace or O((1/ε)^(d-c)) algorithm for ε-MaxHalfspace.
- Any algorithm that uses only "sidedness queries" (i.e., tests if a point lies above, below, or on a hyperplane) requires Ω(n^d) queries for MaxHalfspace and Ω((1/ε)^d) queries for ε-MaxHalfspace.
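A sidedness query, the only primitive such algorithms may use, is just a sign test against a hyperplane. A minimal illustration (hypothetical helper, not from the paper):

```python
def sidedness(point, normal, offset, tol=1e-9):
    """Sidedness query: +1 if the point lies above the hyperplane
    {x : normal . x = offset}, -1 if below, 0 if on it."""
    s = sum(n * x for n, x in zip(normal, point)) - offset
    return 0 if abs(s) <= tol else (1 if s > 0 else -1)
```

The lower bound says that, counting only such queries, Ω(n^d) of them are needed in the worst case, no matter how much other computation the algorithm performs.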
Interpretation
These results show that the linear classification problem is computationally hard in high dimensions, even when allowing for approximate solutions. The authors argue that this stands in contrast to the many "no-dimensional" results in computational geometry and machine learning, which show that some high-dimensional problems can be solved efficiently.
The hardness results imply that exponential dependence on the dimension may be unavoidable for linear classification, even with approximation. This suggests that further advances in high-dimensional machine learning may need to explore fundamentally different approaches beyond the standard linear classification framework.
Limitations & Uncertainties
- The reductions only hold in the real RAM model of computation. The AffineDegeneracy hardness result for the Turing model requires the dimension d to be constant.
- The hardness results are conditional on widely-believed conjectures about the k-Sum and AffineDegeneracy problems. While these conjectures are widely accepted, they have not been proven.
- The paper does not consider other notions of linear classification beyond the MaxHalfspace and ε-MaxHalfspace formulations, such as ones that use convex loss functions. The hardness of these variants remains an open question.
What Comes Next
The authors highlight that their work helps identify when high-dimensional geometric problems relevant for data analysis can, and cannot, be solved efficiently. Future work could explore:
- Unconditional hardness results for linear classification beyond the MaxHalfspace and ε-MaxHalfspace formulations.
- Identifying alternative approaches to high-dimensional linear classification that do not suffer from the exponential dependence on dimension.
- Understanding the practical implications of these hardness results for modern machine learning applications.