floom's Collections

Efficient Serving/Inference

updated Jul 13, 2024

  • MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool

    Paper • 2406.17565 • Published Jun 25, 2024 • 4

  • Inference Performance Optimization for Large Language Models on CPUs

    Paper • 2407.07304 • Published Jul 10, 2024 • 54