anujga's Collections

Architecture

updated Dec 9, 2024

  • UT5: Pretraining Non autoregressive T5 with unrolled denoising

    Paper • 2311.08552 • Published Nov 14, 2023 • 8

  • Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers

    Paper • 2311.10642 • Published Nov 17, 2023 • 26

  • Densing Law of LLMs

    Paper • 2412.04315 • Published Dec 5, 2024 • 19