stereoplegic's Collections
Approximation
Linear Self-Attention Approximation via Trainable Feedforward Kernel
Paper
•
2211.04076
•
Published
•
1
Greenformer: Factorization Toolkit for Efficient Deep Neural Networks
Paper
•
2109.06762
•
Published
•
1
COMCAT: Towards Efficient Compression and Customization of Attention-Based Vision Models
Paper
•
2305.17235
•
Published
•
2
Exploring Low Rank Training of Deep Neural Networks
Paper
•
2209.13569
•
Published
•
1
Fourier Transformer: Fast Long Range Modeling by Removing Sequence Redundancy with FFT Operator
Paper
•
2305.15099
•
Published
•
1
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
Paper
•
2010.03688
•
Published
•
1
Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition
Paper
•
2107.11442
•
Published
•
1
L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
Paper
•
2210.17357
•
Published
•
1
Paper
•
2312.17244
•
Published
•
9
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Paper
•
2312.07046
•
Published
•
13
LORD: Low Rank Decomposition Of Monolingual Code LLMs For One-Shot Compression
Paper
•
2309.14021
•
Published
•
1
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Paper
•
2310.10700
•
Published
•
1
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Paper
•
2312.13558
•
Published
•
5
NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning
Paper
•
2307.08941
•
Published
•
1
Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations
Paper
•
2205.13571
•
Published
•
1
Trained Rank Pruning for Efficient Deep Neural Networks
Paper
•
1812.02402
•
Published
•
1
TRP: Trained Rank Pruning for Efficient Deep Neural Networks
Paper
•
2004.14566
•
Published
•
1
Factorization Vision Transformer: Modeling Long Range Dependency with Local Window Cost
Paper
•
2312.08614
•
Published
•
1
Learning Low-Rank Representations for Model Compression
Paper
•
2211.11397
•
Published
•
1
Latent Space Factorisation and Manipulation via Matrix Subspace Projection
Paper
•
1907.12385
•
Published
•
1
Rethinking Attention with Performers
Paper
•
2009.14794
•
Published
•
1
Softmax-free Linear Transformers
Paper
•
2207.03341
•
Published
•
1
Generalization Bounds for Magnitude-Based Pruning via Sparse Matrix Sketching
Paper
•
2305.18789
•
Published
•
1
Efficient Storage of Fine-Tuned Models via Low-Rank Approximation of Weight Residuals
Paper
•
2305.18425
•
Published
•
1
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Paper
•
2112.00029
•
Published
•
1
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Paper
•
2310.01334
•
Published
•
3
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Paper
•
2405.16287
•
Published
•
10
Effectively Compress KV Heads for LLM
Paper
•
2406.07056
•
Published
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Paper
•
2403.07378
•
Published
•
2
On the Benefits of Rank in Attention Layers
Paper
•
2407.16153
•
Published