- Mixture of Experts Soften the Curse of Dimensionality in Operator Learning In this paper, we construct a mixture of neural operators (MoNOs) between function spaces whose complexity is distributed over a network of expert neural operators (NOs), with each NO satisfying parameter scaling restrictions. Our main result is a distributed universal approximation theorem guaranteeing that any Lipschitz non-linear operator between L^2([0,1]^d) spaces can be approximated uniformly over the Sobolev unit ball therein, to any given varepsilon>0 accuracy, by an MoNO while satisfying the constraint that: each expert NO has a depth, width, and rank of O(varepsilon^{-1}). Naturally, our result implies that the required number of experts must be large, however, each NO is guaranteed to be small enough to be loadable into the active memory of most computers for reasonable accuracies varepsilon. During our analysis, we also obtain new quantitative expression rates for classical NOs approximating uniformly continuous non-linear operators uniformly on compact subsets of L^2([0,1]^d). 5 authors · Apr 13, 2024
- On Coresets for Clustering in Small Dimensional Euclidean Spaces We consider the problem of constructing small coresets for k-Median in Euclidean spaces. Given a large set of data points Psubset R^d, a coreset is a much smaller set Ssubset R^d, so that the k-Median costs of any k centers w.r.t. P and S are close. Existing literature mainly focuses on the high-dimension case and there has been great success in obtaining dimension-independent bounds, whereas the case for small d is largely unexplored. Considering many applications of Euclidean clustering algorithms are in small dimensions and the lack of systematic studies in the current literature, this paper investigates coresets for k-Median in small dimensions. For small d, a natural question is whether existing near-optimal dimension-independent bounds can be significantly improved. We provide affirmative answers to this question for a range of parameters. Moreover, new lower bound results are also proved, which are the highest for small d. In particular, we completely settle the coreset size bound for 1-d k-Median (up to log factors). Interestingly, our results imply a strong separation between 1-d 1-Median and 1-d 2-Median. As far as we know, this is the first such separation between k=1 and k=2 in any dimension. 4 authors · Feb 27, 2023
- Classifying Clustering Schemes Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality. Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the data set, for example by adding points or by applying functions to it. We show that within this framework, one can prove a theorems analogous to one of J. Kleinberg, in which for example one obtains an existence and uniqueness theorem instead of a non-existence result. We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes it is possible to construct richer families of clustering schemes that exhibit sensitivity to density. 2 authors · Nov 23, 2010
- Towards a statistical theory of data selection under weak supervision Given a sample of size N, it is often useful to select a subsample of smaller size n<N to be used for statistical estimation or learning. Such a data selection step is useful to reduce the requirements of data labeling and the computational complexity of learning. We assume to be given N unlabeled samples {{boldsymbol x}_i}_{ile N}, and to be given access to a `surrogate model' that can predict labels y_i better than random guessing. Our goal is to select a subset of the samples, to be denoted by {{boldsymbol x}_i}_{iin G}, of size |G|=n<N. We then acquire labels for this set and we use them to train a model via regularized empirical risk minimization. By using a mixture of numerical experiments on real and synthetic data, and mathematical derivations under low- and high- dimensional asymptotics, we show that: (i)~Data selection can be very effective, in particular beating training on the full sample in some cases; (ii)~Certain popular choices in data selection methods (e.g. unbiased reweighted subsampling, or influence function-based subsampling) can be substantially suboptimal. 3 authors · Sep 25, 2023
- Parameterized covering in semi-ladder-free hypergraphs In this article, we study the parameterized complexity of the Set Cover problem restricted to semi-ladder-free hypergraphs, a class defined by Fabianski et al. [Proceedings of STACS 2019]. We observe that two algorithms introduced by Langerman and Morin [Discrete & Computational Geometry 2005] in the context of geometric covering problems can be adapted to this setting, yielding simple FPT and kernelization algorithms for Set Cover in semi-ladder-free hypergraphs. We complement our algorithmic results with a compression lower bound for the problem, which proves the tightness of our kernelization under standard complexity-theoretic assumptions. 1 authors · Nov 1, 2023
- Reverse mathematics and a Ramsey-type König's Lemma In this paper, we propose a weak regularity principle which is similar to both weak K\"onig's lemma and Ramsey's theorem. We begin by studying the computational strength of this principle in the context of reverse mathematics. We then analyze different ways of generalizing this principle. 1 authors · Nov 10, 2011
- Near-Optimal Quantum Coreset Construction Algorithms for Clustering k-Clustering in R^d (e.g., k-median and k-means) is a fundamental machine learning problem. While near-linear time approximation algorithms were known in the classical setting for a dataset with cardinality n, it remains open to find sublinear-time quantum algorithms. We give quantum algorithms that find coresets for k-clustering in R^d with O(nkd^{3/2}) query complexity. Our coreset reduces the input size from n to poly(kepsilon^{-1}d), so that existing alpha-approximation algorithms for clustering can run on top of it and yield (1 + epsilon)alpha-approximation. This eventually yields a quadratic speedup for various k-clustering approximation algorithms. We complement our algorithm with a nearly matching lower bound, that any quantum algorithm must make Omega(nk) queries in order to achieve even O(1)-approximation for k-clustering. 4 authors · Jun 5, 2023
- AutoCoreset: An Automatic Practical Coreset Construction Framework A coreset is a tiny weighted subset of an input set, that closely resembles the loss function, with respect to a certain set of queries. Coresets became prevalent in machine learning as they have shown to be advantageous for many applications. While coreset research is an active research area, unfortunately, coresets are constructed in a problem-dependent manner, where for each problem, a new coreset construction algorithm is usually suggested, a process that may take time or may be hard for new researchers in the field. Even the generic frameworks require additional (problem-dependent) computations or proofs to be done by the user. Besides, many problems do not have (provable) small coresets, limiting their applicability. To this end, we suggest an automatic practical framework for constructing coresets, which requires (only) the input data and the desired cost function from the user, without the need for any other task-related computation to be done by the user. To do so, we reduce the problem of approximating a loss function to an instance of vector summation approximation, where the vectors we aim to sum are loss vectors of a specific subset of the queries, such that we aim to approximate the image of the function on this subset. We show that while this set is limited, the coreset is quite general. An extensive experimental study on various machine learning applications is also conducted. Finally, we provide a ``plug and play" style implementation, proposing a user-friendly system that can be easily used to apply coresets for many problems. Full open source code can be found at https://github.com/alaamaalouf/AutoCoreset{https://github.com/alaamaalouf/AutoCoreset}. We believe that these contributions enable future research and easier use and applications of coresets. 4 authors · May 19, 2023
- The Virtual Large Cardinal Hierarchy We continue the study of the virtual large cardinal hierarchy by analysing virtual versions of superstrong, Woodin, and Berkeley cardinals. Gitman and Schindler showed that virtualizations of strong and supercompact cardinals yield the same large cardinal notion. We provide various equivalent characterizations of virtually Woodin cardinals, including showing that On is virtually Woodin if and only if for every class A, there is a proper class of virtually A-extendible cardinals. We introduce the virtual Vopenka principle for finite languages and show that it is not equivalent to the virtual Vopenka principle (although the two principles are equiconsistent), but is equivalent to the assertion that On is virtually pre-Woodin, a weakening of virtually Woodin, which is equivalent to having for every class A, a weakly virtually A-extendible cardinal. We show that if there are no virtually Berkeley cardinals, then On is virtually Woodin if and only if On is virtually pre-Woodin (if and only if the virtual Vopenka principle for finite languages holds). In particular, if the virtual Vopenka principle holds and On is not Mahlo, then On is not virtually Woodin, and hence there is a virtually Berkeley cardinal. 3 authors · Sep 13, 2021
1 Efficient Latency-Aware CNN Depth Compression via Two-Stage Dynamic Programming Recent works on neural network pruning advocate that reducing the depth of the network is more effective in reducing run-time memory usage and accelerating inference latency than reducing the width of the network through channel pruning. In this regard, some recent works propose depth compression algorithms that merge convolution layers. However, the existing algorithms have a constricted search space and rely on human-engineered heuristics. In this paper, we propose a novel depth compression algorithm which targets general convolution operations. We propose a subset selection problem that replaces inefficient activation layers with identity functions and optimally merges consecutive convolution operations into shallow equivalent convolution operations for efficient end-to-end inference latency. Since the proposed subset selection problem is NP-hard, we formulate a surrogate optimization problem that can be solved exactly via two-stage dynamic programming within a few seconds. We evaluate our methods and baselines by TensorRT for a fair inference latency comparison. Our method outperforms the baseline method with higher accuracy and faster inference speed in MobileNetV2 on the ImageNet dataset. Specifically, we achieve 1.41times speed-up with 0.11\%p accuracy gain in MobileNetV2-1.0 on the ImageNet. 4 authors · Jan 28, 2023
- Approximating the Convex Hull via Metric Space Magnitude Magnitude of a finite metric space and the related notion of magnitude functions on metric spaces is an active area of research in algebraic topology. Magnitude originally arose in the context of biology, where it represents the number of effective species in an environment; when applied to a one-parameter family of metric spaces tX with scale parameter t, the magnitude captures much of the underlying geometry of the space. Prior work has mostly focussed on properties of magnitude in a global sense; in this paper we restrict the sets to finite subsets of Euclidean space and investigate its individual components. We give an explicit formula for the corrected inclusion-exclusion principle, and define a quantity associated with each point, called the moment which gives an intrinsic ordering to the points. We exploit this in order to form an algorithm which approximates the convex hull. 3 authors · Aug 7, 2019
- Enhancing Neural Subset Selection: Integrating Background Information into Set Representations Learning neural subset selection tasks, such as compound selection in AI-aided drug discovery, have become increasingly pivotal across diverse applications. The existing methodologies in the field primarily concentrate on constructing models that capture the relationship between utility function values and subsets within their respective supersets. However, these approaches tend to overlook the valuable information contained within the superset when utilizing neural networks to model set functions. In this work, we address this oversight by adopting a probabilistic perspective. Our theoretical findings demonstrate that when the target value is conditioned on both the input set and subset, it is essential to incorporate an invariant sufficient statistic of the superset into the subset of interest for effective learning. This ensures that the output value remains invariant to permutations of the subset and its corresponding superset, enabling identification of the specific superset from which the subset originated. Motivated by these insights, we propose a simple yet effective information aggregation module designed to merge the representations of subsets and supersets from a permutation invariance perspective. Comprehensive empirical evaluations across diverse tasks and datasets validate the enhanced efficacy of our approach over conventional methods, underscoring the practicality and potency of our proposed strategies in real-world contexts. 8 authors · Feb 5, 2024
- Shortcut Partitions in Minor-Free Graphs: Steiner Point Removal, Distance Oracles, Tree Covers, and More The notion of shortcut partition, introduced recently by Chang, Conroy, Le, Milenkovi\'c, Solomon, and Than [CCLMST23], is a new type of graph partition into low-diameter clusters. Roughly speaking, the shortcut partition guarantees that for every two vertices u and v in the graph, there exists a path between u and v that intersects only a few clusters. They proved that any planar graph admits a shortcut partition and gave several applications, including a construction of tree cover for arbitrary planar graphs with stretch 1+varepsilon and O(1) many trees for any fixed varepsilon in (0,1). However, the construction heavily exploits planarity in multiple steps, and is thus inherently limited to planar graphs. In this work, we breach the "planarity barrier" to construct a shortcut partition for K_r-minor-free graphs for any r. To this end, we take a completely different approach -- our key contribution is a novel deterministic variant of the cop decomposition in minor-free graphs [And86, AGG14]. Our shortcut partition for K_r-minor-free graphs yields several direct applications. Most notably, we construct the first optimal distance oracle for K_r-minor-free graphs, with 1+varepsilon stretch, linear space, and constant query time for any fixed varepsilon in (0,1). The previous best distance oracle [AG06] uses O(nlog n) space and O(log n) query time, and its construction relies on Robertson-Seymour structural theorem and other sophisticated tools. We also obtain the first tree cover of O(1) size for minor-free graphs with stretch 1+varepsilon, while the previous best (1+varepsilon)-tree cover has size O(log^2 n) [BFN19]. 6 authors · Jul 31, 2023
- Probabilistic Partitive Partitioning (PPP) Clustering is a NP-hard problem. Thus, no optimal algorithm exists, heuristics are applied to cluster the data. Heuristics can be very resource-intensive, if not applied properly. For substantially large data sets computational efficiencies can be achieved by reducing the input space if a minimal loss of information can be achieved. Clustering algorithms, in general, face two common problems: 1) these converge to different settings with different initial conditions and; 2) the number of clusters has to be arbitrarily decided beforehand. This problem has become critical in the realm of big data. Recently, clustering algorithms have emerged which can speedup computations using parallel processing over the grid but face the aforementioned problems. Goals: Our goals are to find methods to cluster data which: 1) guarantee convergence to the same settings irrespective of the initial conditions; 2) eliminate the need to establish the number of clusters beforehand, and 3) can be applied to cluster large datasets. Methods: We introduce a method that combines probabilistic and combinatorial clustering methods to produce repeatable and compact clusters that are not sensitive to initial conditions. This method harnesses the power of k-means (a combinatorial clustering method) to cluster/partition very large dimensional datasets and uses the Gaussian Mixture Model (a probabilistic clustering method) to validate the k-means partitions. Results: We show that this method produces very compact clusters that are not sensitive to initial conditions. This method can be used to identify the most 'separable' set in a dataset which increases the 'clusterability' of a dataset. This method also eliminates the need to specify the number of clusters in advance. 1 authors · Mar 9, 2020
- On resolvability, connectedness and pseudocompactness We prove that: I. If L is a T_1 space, |L|>1 and d(L) leq kappa geq omega, then there is a submaximal dense subspace X of L^{2^kappa} such that |X|=Delta(X)=kappa; II. If cleqkappa=kappa^omega<lambda and 2^kappa=2^lambda, then there is a Tychonoff pseudocompact globally and locally connected space X such that |X|=Delta(X)=lambda and X is not kappa^+-resolvable; III. If omega_1leqkappa<lambda and 2^kappa=2^lambda, then there is a regular space X such that |X|=Delta(X)=lambda, all continuous real-valued functions on X are constant (so X is pseudocompact and connected) and X is not kappa^+-resolvable. 1 authors · Aug 2, 2023
- On affine spaces of alternating matrices with constant rank Let F be a field, and n geq r>0 be integers, with r even. Denote by A_n(F) the space of all n-by-n alternating matrices with entries in F. We consider the problem of determining the greatest possible dimension for an affine subspace of A_n(F) in which every matrix has rank equal to r (or rank at least r). Recently Rubei has solved this problem over the field of real numbers. We extend her result to all fields with large enough cardinality. Provided that n geq r+3 and |F|geq minbigl(r-1,r{2}+2bigr), we also determine the affine subspaces of rank r matrices in A_n(F) that have the greatest possible dimension, and we point to difficulties for the corresponding problem in the case nleq r+2. 1 authors · Jul 19, 2023
- Games and Ramsey-like cardinals We generalise the alpha-Ramsey cardinals introduced in Holy and Schlicht (2018) for cardinals alpha to arbitrary ordinals alpha, and answer several questions posed in that paper. In particular, we show that alpha-Ramseys are downwards absolute to the core model K for all alpha of uncountable cofinality, that strategic omega-Ramsey cardinals are equiconsistent with remarkable cardinals and that strategic alpha-Ramsey cardinals are equiconsistent with measurable cardinals for all alpha>omega. We also show that the n-Ramseys satisfy indescribability properties and use them to provide a game-theoretic characterisation of completely ineffable cardinals, as well as establishing further connections between the alpha-Ramsey cardinals and the Ramsey-like cardinals introduced in Gitman (2011), Feng (1990) and Sharpe and Welch (2011). 2 authors · Apr 27, 2018
- Submodular Order Functions and Assortment Optimization We define a new class of set functions that in addition to being monotone and subadditive, also admit a very limited form of submodularity defined over a permutation of the ground set. We refer to this permutation as a submodular order. This class of functions includes monotone submodular functions as a sub-family. To understand the importance of this structure in optimization problems we consider the problem of maximizing function value under various types of constraints. To demonstrate the modeling power of submodular order functions we show applications in two different settings. First, we apply our results to the extensively studied problem of assortment optimization. While the objectives in assortment optimization are known to be non-submodular (and non-monotone) even for simple choice models, we show that they are compatible with the notion of submodular order. Consequently, we obtain new and in some cases the first constant factor guarantee for constrained assortment optimization in fundamental choice models. As a second application of submodular order functions, we show an intriguing connection to the maximization of monotone submodular functions in the streaming model. We recover some best known guarantees for this problem as a corollary of our results. 1 authors · Jul 6, 2021
1 Infinite Feature Selection: A Graph-based Feature Filtering Approach We propose a filtering feature selection framework that considers subsets of features as paths in a graph, where a node is a feature and an edge indicates pairwise (customizable) relations among features, dealing with relevance and redundancy principles. By two different interpretations (exploiting properties of power series of matrices and relying on Markov chains fundamentals) we can evaluate the values of paths (i.e., feature subsets) of arbitrary lengths, eventually go to infinite, from which we dub our framework Infinite Feature Selection (Inf-FS). Going to infinite allows to constrain the computational complexity of the selection process, and to rank the features in an elegant way, that is, considering the value of any path (subset) containing a particular feature. We also propose a simple unsupervised strategy to cut the ranking, so providing the subset of features to keep. In the experiments, we analyze diverse settings with heterogeneous features, for a total of 11 benchmarks, comparing against 18 widely-known comparative approaches. The results show that Inf-FS behaves better in almost any situation, that is, when the number of features to keep are fixed a priori, or when the decision of the subset cardinality is part of the process. 5 authors · Jun 15, 2020
- CompAct: Compressed Activations for Memory-Efficient LLM Training We introduce CompAct, a technique that reduces peak memory utilization on GPU by 25-30% for pretraining and 50% for fine-tuning of LLMs. Peak device memory is a major limiting factor in training LLMs, with various recent works aiming to reduce model memory. However most works don't target the largest component of allocated memory during training: the model's compute graph, which is stored for the backward pass. By storing low-rank, compressed activations to be used in the backward pass we greatly reduce the required memory, unlike previous methods which only reduce optimizer overheads or the number of trained parameters. Our compression uses random projection matrices, thus avoiding additional memory overheads. Comparisons with previous techniques for either pretraining or fine-tuning show that CompAct substantially improves existing compute-performance tradeoffs. We expect CompAct's savings to scale even higher for larger models. 4 authors · Oct 20, 2024
- Intensional Inheritance Between Concepts: An Information-Theoretic Interpretation This paper addresses the problem of formalizing and quantifying the concept of "intensional inheritance" between two concepts. We begin by conceiving the intensional inheritance of W from F as the amount of information the proposition "x is F " provides about the proposition "x is W. To flesh this out, we consider concepts F and W defined by sets of properties left{F_{1}, F_{2}, ldots, F_{n}right} and left{W_{1}, W_{2}, ldots, W_{m}right} with associated degrees left{d_{1}, d_{2}, ldots, d_{n}right} and left{e_{1}, e_{2}, ldots, e_{m}right}, respectively, where the properties may overlap. We then derive formulas for the intensional inheritance using both Shannon information theory and algorithmic information theory, incorporating interaction information among properties. We examine a special case where all properties are mutually exclusive and calculate the intensional inheritance in this case in both frameworks. We also derive expressions for P(W mid F) based on the mutual information formula. Finally we consider the relationship between intensional inheritance and conventional set-theoretic "extensional" inheritance, concluding that in our information-theoretic framework, extensional inheritance emerges as a special case of intensional inheritance. 1 authors · Jan 28
- Approximation Algorithms for Fair Range Clustering This paper studies the fair range clustering problem in which the data points are from different demographic groups and the goal is to pick k centers with the minimum clustering cost such that each group is at least minimally represented in the centers set and no group dominates the centers set. More precisely, given a set of n points in a metric space (P,d) where each point belongs to one of the ell different demographics (i.e., P = P_1 uplus P_2 uplus cdots uplus P_ell) and a set of ell intervals [alpha_1, beta_1], cdots, [alpha_ell, beta_ell] on desired number of centers from each group, the goal is to pick a set of k centers C with minimum ell_p-clustering cost (i.e., (sum_{vin P} d(v,C)^p)^{1/p}) such that for each group iin ell, |Ccap P_i| in [alpha_i, beta_i]. In particular, the fair range ell_p-clustering captures fair range k-center, k-median and k-means as its special cases. In this work, we provide efficient constant factor approximation algorithms for fair range ell_p-clustering for all values of pin [1,infty). 3 authors · Jun 11, 2023
- Optimal LP Rounding and Linear-Time Approximation Algorithms for Clustering Edge-Colored Hypergraphs We study the approximability of an existing framework for clustering edge-colored hypergraphs, which is closely related to chromatic correlation clustering and is motivated by machine learning and data mining applications where the goal is to cluster a set of objects based on multiway interactions of different categories or types. We present improved approximation guarantees based on linear programming, and show they are tight by proving a matching integrality gap. Our results also include new approximation hardness results, a combinatorial 2-approximation whose runtime is linear in the hypergraph size, and several new connections to well-studied objectives such as vertex cover and hypergraph multiway cut. 1 authors · Aug 12, 2022
- Fast, Stable and Efficient Approximation of Multi-parameter Persistence Modules with MMA In this article, we introduce a new parameterized family of topological invariants, taking the form of candidate decompositions, for multi-parameter persistence modules. We prove that our candidate decompositions are controllable approximations: when restricting to modules that can be decomposed into interval summands, we establish theoretical results about the approximation error between our candidate decompositions and the true underlying module in terms of the standard interleaving and bottleneck distances. Moreover, even when the underlying module does not admit such a decomposition, our candidate decompositions are nonetheless stable invariants; small perturbations in the underlying module lead to small perturbations in the candidate decomposition. Then, we introduce MMA (Multipersistence Module Approximation): an algorithm for computing stable instances of such invariants, which is based on fibered barcodes and exact matchings, two constructions that stem from the theory of single-parameter persistence. By design, MMA can handle an arbitrary number of filtrations, and has bounded complexity and running time. Finally, we present empirical evidence validating the generalization capabilities and running time speed-ups of MMA on several data sets. 3 authors · Jun 4, 2022
1 Geometry of Sample Spaces In statistics, independent, identically distributed random samples do not carry a natural ordering, and their statistics are typically invariant with respect to permutations of their order. Thus, an n-sample in a space M can be considered as an element of the quotient space of M^n modulo the permutation group. The present paper takes this definition of sample space and the related concept of orbit types as a starting point for developing a geometric perspective on statistics. We aim at deriving a general mathematical setting for studying the behavior of empirical and population means in spaces ranging from smooth Riemannian manifolds to general stratified spaces. We fully describe the orbifold and path-metric structure of the sample space when M is a manifold or path-metric space, respectively. These results are non-trivial even when M is Euclidean. We show that the infinite sample space exists in a Gromov-Hausdorff type sense and coincides with the Wasserstein space of probability distributions on M. We exhibit Fr\'echet means and k-means as metric projections onto 1-skeleta or k-skeleta in Wasserstein space, and we define a new and more general notion of polymeans. This geometric characterization via metric projections applies equally to sample and population means, and we use it to establish asymptotic properties of polymeans such as consistency and asymptotic normality. 4 authors · Oct 15, 2020
- Networks bijective to permutations We study the set of networks, which consist of sources, sinks and neutral points, bijective to the permutations. The set of directed edges, which characterizes a network, is constructed from a polyomino or a Rothe diagram of a permutation through a Dyck tiling on a ribbon. We introduce a new combinatorial object similar to a tree-like tableau, which we call a forest. A forest is shown to give a permutation, and be bijective to a network corresponding to the inverse of the permutation. We show that the poset of networks is a finite graded lattice and admits an EL-labeling. By use of this EL-labeling, we show the lattice is supersolvable and compute the M\"obius function of an interval of the poset. 1 authors · Feb 8, 2024
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation. It involves embedding each set element into a latent space with dimension L, followed by a sum pooling to obtain a whole-set embedding, and finally mapping the whole-set embedding to the output. In this work, we investigate the impact of the dimension L on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to be one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in L that grows exponentially with the set size N and feature dimension D. To investigate the minimal value of L that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE). We demonstrate that L being poly(N, D) is sufficient for set representation using both embedding layers. We also provide a lower bound of L for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and the complex field. 5 authors · Jul 8, 2023
- Construction of simplicial complexes with prescribed degree-size sequences We study the realizability of simplicial complexes with a given pair of integer sequences, representing the node degree distribution and the facet size distribution, respectively. While the s-uniform variant of the problem is NP-complete when s geq 3, we identify two populations of input sequences, most of which can be solved in polynomial time using a recursive algorithm that we contribute. Combining with a sampler for the simplicial configuration model [J.-G. Young et al., Phys. Rev. E 96, 032312 (2017)], we facilitate the efficient sampling of simplicial ensembles from arbitrary degree and size distributions. We find that, contrary to expectations based on dyadic networks, increasing the nodes' degrees reduces the number of loops in simplicial complexes. Our work unveils a fundamental constraint on the degree-size sequences and sheds light on further analysis of higher-order phenomena based on local structures. 1 authors · May 31, 2021
- A Probability Monad as the Colimit of Spaces of Finite Samples We define and study a probability monad on the category of complete metric spaces and short maps. It assigns to each space the space of Radon probability measures on it with finite first moment, equipped with the Kantorovich-Wasserstein distance. This monad is analogous to the Giry monad on the category of Polish spaces, and it extends a construction due to van Breugel for compact and for 1-bounded complete metric spaces. We prove that this Kantorovich monad arises from a colimit construction on finite power-like constructions, which formalizes the intuition that probability measures are limits of finite samples. The proof relies on a criterion for when an ordinary left Kan extension of lax monoidal functors is a monoidal Kan extension. The colimit characterization allows the development of integration theory and the treatment of measures on spaces of measures, without measure theory. We also show that the category of algebras of the Kantorovich monad is equivalent to the category of closed convex subsets of Banach spaces with short affine maps as morphisms. 2 authors · Dec 14, 2017
- Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest ``quality'' subset. Score functions from the multiwinner voting literature have been used to aggregate rankings into quality scores for subsets. We study this setting of subset selection problems when, in addition, rankings may contain systemic or unconscious biases toward a group of items. For a general model of input rankings and biases, we show that requiring the selected subset to satisfy group fairness constraints can improve the quality of the selection with respect to unbiased rankings. Importantly, we show that for fairness constraints to be effective, different multiwinner score functions may require a drastically different number of rankings: While for some functions, fairness constraints need an exponential number of rankings to recover a close-to-optimal solution, for others, this dependency is only polynomial. This result relies on a novel notion of ``smoothness'' of submodular functions in this setting that quantifies how well a function can ``correctly'' assess the quality of items in the presence of bias. The results in this paper can be used to guide the choice of multiwinner score functions for the subset selection setting considered here; we additionally provide a tool to empirically enable this. 5 authors · Jun 16, 2023
- Constrained Monotonic Neural Networks Wider adoption of neural networks in many critical domains such as finance and healthcare is being hindered by the need to explain their predictions and to impose additional constraints on them. Monotonicity constraint is one of the most requested properties in real-world scenarios and is the focus of this paper. One of the oldest ways to construct a monotonic fully connected neural network is to constrain signs on its weights. Unfortunately, this construction does not work with popular non-saturated activation functions as it can only approximate convex functions. We show this shortcoming can be fixed by constructing two additional activation functions from a typical unsaturated monotonic activation function and employing each of them on the part of neurons. Our experiments show this approach of building monotonic neural networks has better accuracy when compared to other state-of-the-art methods, while being the simplest one in the sense of having the least number of parameters, and not requiring any modifications to the learning procedure or post-learning steps. Finally, we prove it can approximate any continuous monotone function on a compact subset of R^n. 2 authors · May 24, 2022
- Which Invariance Should We Transfer? A Causal Minimax Learning Approach A major barrier to deploying current machine learning models lies in their non-reliability to dataset shifts. To resolve this problem, most existing studies attempted to transfer stable information to unseen environments. Particularly, independent causal mechanisms-based methods proposed to remove mutable causal mechanisms via the do-operator. Compared to previous methods, the obtained stable predictors are more effective in identifying stable information. However, a key question remains: which subset of this whole stable information should the model transfer, in order to achieve optimal generalization ability? To answer this question, we present a comprehensive minimax analysis from a causal perspective. Specifically, we first provide a graphical condition for the whole stable set to be optimal. When this condition fails, we surprisingly find with an example that this whole stable set, although can fully exploit stable information, is not the optimal one to transfer. To identify the optimal subset under this case, we propose to estimate the worst-case risk with a novel optimization scheme over the intervention functions on mutable causal mechanisms. We then propose an efficient algorithm to search for the subset with minimal worst-case risk, based on a newly defined equivalence relation between stable subsets. Compared to the exponential cost of exhaustively searching over all subsets, our searching strategy enjoys a polynomial complexity. The effectiveness and efficiency of our methods are demonstrated on synthetic data and the diagnosis of Alzheimer's disease. 5 authors · Jul 5, 2021
- The Test of Tests: A Framework For Differentially Private Hypothesis Testing We present a generic framework for creating differentially private versions of any hypothesis test in a black-box way. We analyze the resulting tests analytically and experimentally. Most crucially, we show good practical performance for small data sets, showing that at epsilon = 1 we only need 5-6 times as much data as in the fully public setting. We compare our work to the one existing framework of this type, as well as to several individually-designed private hypothesis tests. Our framework is higher power than other generic solutions and at least competitive with (and often better than) individually-designed tests. 4 authors · Feb 8, 2023
4 MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users. These oversights contribute to discrepancies between research benchmark evaluations and real-world use cases. We introduce MINT, a benchmark that evaluates LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework where LLMs can access tools by executing Python code and receive users' natural language feedback simulated by GPT-4. We repurpose a diverse set of established evaluation datasets focusing on reasoning, coding, and decision-making and carefully curate them into a compact subset for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback. (b) Better single-turn performance does not guarantee better multi-turn performance. (c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base. 7 authors · Sep 19, 2023
- Adversarially Robust PAC Learnability of Real-Valued Functions We study robustness to test-time adversarial attacks in the regression setting with ell_p losses and arbitrary perturbation sets. We address the question of which function classes are PAC learnable in this setting. We show that classes of finite fat-shattering dimension are learnable in both realizable and agnostic settings. Moreover, for convex function classes, they are even properly learnable. In contrast, some non-convex function classes provably require improper learning algorithms. Our main technique is based on a construction of an adversarially robust sample compression scheme of a size determined by the fat-shattering dimension. Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest. 2 authors · Jun 26, 2022
- Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume a dataset of size n is given which includes i.i.d. samples drawn from a uniform distribution over an unknown simplex in R^K, where samples are assumed to be corrupted by a multi-variate additive Gaussian noise of an arbitrary magnitude. We prove the existence of an algorithm that with high probability outputs a simplex having a ell_2 distance of at most varepsilon from the true simplex (for any varepsilon>0). Also, we theoretically show that in order to achieve this bound, it is sufficient to have ngeleft(K^2/varepsilon^2right)e^{Omegaleft(K/SNR^2right)} samples, where SNR stands for the signal-to-noise ratio. This result solves an important open problem and shows as long as SNRgeOmegaleft(K^{1/2}right), the sample complexity of the noisy regime has the same order to that of the noiseless case. Our proofs are a combination of the so-called sample compression technique in ashtiani2018nearly, mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we have proposed a general Fourier-based technique for recovery of a more general class of distribution families from additive Gaussian noise, which can be further used in a variety of other related problems. 4 authors · Sep 9, 2022
- Regularity of shadows and the geometry of the singular set associated to a Monge-Ampere equation Illuminating the surface of a convex body with parallel beams of light in a given direction generates a shadow region. We prove sharp regularity results for the boundary of this shadow in every direction of illumination. Moreover, techniques are developed for investigating the regularity of the region generated by orthogonally projecting a convex set onto another. As an application we study the geometry and Hausdorff dimension of the singular set corresponding to a Monge-Ampere equation. 2 authors · Nov 22, 2013
- Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time Maximizing a monotone submodular function under cardinality constraint k is a core problem in machine learning and database with many basic applications, including video and data summarization, recommendation systems, feature extraction, exemplar clustering, and coverage problems. We study this classic problem in the fully dynamic model where a stream of insertions and deletions of elements of an underlying ground set is given and the goal is to maintain an approximate solution using a fast update time. A recent paper at NeurIPS'20 by Lattanzi, Mitrovic, Norouzi{-}Fard, Tarnawski, Zadimoghaddam claims to obtain a dynamic algorithm for this problem with a 1{2} -epsilon approximation ratio and a query complexity bounded by poly(log(n),log(k),epsilon^{-1}). However, as we explain in this paper, the analysis has some important gaps. Having a dynamic algorithm for the problem with polylogarithmic update time is even more important in light of a recent result by Chen and Peng at STOC'22 who show a matching lower bound for the problem -- any randomized algorithm with a 1{2}+epsilon approximation ratio must have an amortized query complexity that is polynomial in n. In this paper, we develop a simpler algorithm for the problem that maintains a (1{2}-epsilon)-approximate solution for submodular maximization under cardinality constraint k using a polylogarithmic amortized update time. 6 authors · May 24, 2023
2 All Weight Systems for Calabi-Yau Fourfolds from Reflexive Polyhedra For any given dimension d, all reflexive d-polytopes can be found (in principle) as subpolytopes of a number of maximal polyhedra that are defined in terms of (d+1)-tuples of integers (weights), or combinations of k-tuples of weights with k<d+1. We present the results of a complete classification of sextuples of weights pertaining to the construction of all reflexive polytopes in five dimensions. We find 322 383 760 930 such weight systems. 185 269 499 015 of them give rise directly to reflexive polytopes and thereby to mirror pairs of Calabi-Yau fourfolds. These lead to 532 600 483 distinct sets of Hodge numbers. 2 authors · Aug 7, 2018
- Sharper Bounds for ell_p Sensitivity Sampling In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension d and the total sensitivity mathfrak S in remarkably general settings. However, guarantees going beyond this general bound of mathfrak S d are known in perhaps only one setting, for ell_2 subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for ell_p subspace embeddings for pneq 2 that improve over the general mathfrak S d bound, achieving a bound of roughly mathfrak S^{2/p} for 1leq p<2 and mathfrak S^{2-2/p} for 2<p<infty. For 1leq p<2, we show that this bound is tight, in the sense that there exist matrices for which mathfrak S^{2/p} samples is necessary. Furthermore, our techniques yield further new results in the study of sampling algorithms, showing that the root leverage score sampling algorithm achieves a bound of roughly d for 1leq p<2, and that a combination of leverage score and sensitivity sampling achieves an improved bound of roughly d^{2/p}mathfrak S^{2-4/p} for 2<p<infty. Our sensitivity sampling results yield the best known sample complexity for a wide class of structured matrices that have small ell_p sensitivity. 2 authors · Jun 1, 2023
- The Power of Few: Accelerating and Enhancing Data Reweighting with Coreset Selection As machine learning tasks continue to evolve, the trend has been to gather larger datasets and train increasingly larger models. While this has led to advancements in accuracy, it has also escalated computational costs to unsustainable levels. Addressing this, our work aims to strike a delicate balance between computational efficiency and model accuracy, a persisting challenge in the field. We introduce a novel method that employs core subset selection for reweighting, effectively optimizing both computational time and model performance. By focusing on a strategically selected coreset, our approach offers a robust representation, as it efficiently minimizes the influence of outliers. The re-calibrated weights are then mapped back to and propagated across the entire dataset. Our experimental results substantiate the effectiveness of this approach, underscoring its potential as a scalable and precise solution for model training. 4 authors · Mar 18, 2024
- ARC Sort: Enhanced and Time Efficient Sorting Algorithm This paper discusses about a sorting algorithm which uses the concept of buckets where each bucket represents a certain number of digits. A two dimensional data structure is used where one dimension represents buckets i. e; number of digits and each bucket's corresponding dimensions represents the input numbers that belong to that bucket. Each bucket is then individually sorted. Since every preceding bucket elements will always be smaller than the succeeding buckets no comparison between them is required. By doing this we can significantly reduced the time complexity of any sorting algorithm used to sort the given set of inputs. 4 authors · Jun 6, 2014
1 Deep Sets We study the problem of designing models for machine learning tasks defined on sets. In contrast to traditional approach of operating on fixed dimensional vectors, we consider objective functions defined on sets that are invariant to permutations. Such problems are widespread, ranging from estimation of population statistics poczos13aistats, to anomaly detection in piezometer data of embankment dams Jung15Exploration, to cosmology Ntampaka16Dynamical,Ravanbakhsh16ICML1. Our main theorem characterizes the permutation invariant functions and provides a family of functions to which any permutation invariant objective function must belong. This family of functions has a special structure which enables us to design a deep network architecture that can operate on sets and which can be deployed on a variety of scenarios including both unsupervised and supervised learning tasks. We also derive the necessary and sufficient conditions for permutation equivariance in deep models. We demonstrate the applicability of our method on population statistic estimation, point cloud classification, set expansion, and outlier detection. 6 authors · Mar 10, 2017 1
- Scalable Set Encoding with Universal Mini-Batch Consistency and Unbiased Full Set Gradient Approximation Recent work on mini-batch consistency (MBC) for set functions has brought attention to the need for sequentially processing and aggregating chunks of a partitioned set while guaranteeing the same output for all partitions. However, existing constraints on MBC architectures lead to models with limited expressive power. Additionally, prior work has not addressed how to deal with large sets during training when the full set gradient is required. To address these issues, we propose a Universally MBC (UMBC) class of set functions which can be used in conjunction with arbitrary non-MBC components while still satisfying MBC, enabling a wider range of function classes to be used in MBC settings. Furthermore, we propose an efficient MBC training algorithm which gives an unbiased approximation of the full set gradient and has a constant memory overhead for any set size for both train- and test-time. We conduct extensive experiments including image completion, text classification, unsupervised clustering, and cancer detection on high-resolution images to verify the efficiency and efficacy of our scalable set encoding framework. Our code is available at github.com/jeffwillette/umbc 6 authors · Aug 25, 2022
2 Understanding networks and their behaviors using sheaf theory Many complicated network problems can be easily understood on small networks. Difficulties arise when small networks are combined into larger ones. Fortunately, the mathematical theory of sheaves was constructed to address just this kind of situation; it extends locally-defined structures to globally valid inferences by way of consistency relations. This paper exhibits examples in network monitoring and filter hardware where sheaves have useful descriptive power. 1 authors · Aug 21, 2013
- Proof-irrelevant model of CC with predicative induction and judgmental equality We present a set-theoretic, proof-irrelevant model for Calculus of Constructions (CC) with predicative induction and judgmental equality in Zermelo-Fraenkel set theory with an axiom for countably many inaccessible cardinals. We use Aczel's trace encoding which is universally defined for any function type, regardless of being impredicative. Direct and concrete interpretations of simultaneous induction and mutually recursive functions are also provided by extending Dybjer's interpretations on the basis of Aczel's rule sets. Our model can be regarded as a higher-order generalization of the truth-table methods. We provide a relatively simple consistency proof of type theory, which can be used as the basis for a theorem prover. 2 authors · Nov 1, 2011
- Multisize Dataset Condensation While dataset condensation effectively enhances training efficiency, its application in on-device scenarios brings unique challenges. 1) Due to the fluctuating computational resources of these devices, there's a demand for a flexible dataset size that diverges from a predefined size. 2) The limited computational power on devices often prevents additional condensation operations. These two challenges connect to the "subset degradation problem" in traditional dataset condensation: a subset from a larger condensed dataset is often unrepresentative compared to directly condensing the whole dataset to that smaller size. In this paper, we propose Multisize Dataset Condensation (MDC) by compressing N condensation processes into a single condensation process to obtain datasets with multiple sizes. Specifically, we introduce an "adaptive subset loss" on top of the basic condensation loss to mitigate the "subset degradation problem". Our MDC method offers several benefits: 1) No additional condensation process is required; 2) reduced storage requirement by reusing condensed images. Experiments validate our findings on networks including ConvNet, ResNet and DenseNet, and datasets including SVHN, CIFAR-10, CIFAR-100 and ImageNet. For example, we achieved 6.40% average accuracy gains on condensing CIFAR-10 to ten images per class. Code is available at: https://github.com/he-y/Multisize-Dataset-Condensation. 4 authors · Mar 9, 2024
- PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench. 8 authors · Jul 15, 2024