<!doctype html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width" />
    <title>VPTQ Online Demo</title>
    <link rel="stylesheet" href="style.css" />
    <script
      type="module"
      src="https://gradio.s3-us-west-2.amazonaws.com/4.36.1/gradio.js"
    ></script>
    <style>
      /* Default to the dark palette; the media queries below override the
         custom properties according to the user's color-scheme preference. */
      body.theme-dependent {
        --background-color: #0d1117;
        --text-color: #c9d1d9;
        background-color: var(--background-color);
        color: var(--text-color);
        font-family: Arial, sans-serif;
      }

      .description h2 {
        color: #58a6ff;
      }

      .link-styled {
        color: #58a6ff;
        text-decoration: none;
      }

      .link-styled:hover {
        text-decoration: underline;
      }

      .link-styled:visited {
        color: #8b949e;
      }

      @media (prefers-color-scheme: dark) {
        body.theme-dependent {
          --background-color: #0d1117;
          --text-color: #c9d1d9;
        }
      }

      @media (prefers-color-scheme: light) {
        body.theme-dependent {
          --background-color: #ffffff;
          --text-color: #000000;
        }
      }
    </style>
  </head>
  <body class="theme-dependent">
    <div class="description">
      <h2>VPTQ Online Demo</h2>
      <p>
        <b>VPTQ (Vector Post-Training Quantization)</b> is a compression
        technique that dramatically reduces the size of large language models
        such as the 70B and 405B Llama models. VPTQ compresses these models to
        1-2 bits per weight within a few hours, enabling them to run on GPUs
        with limited memory. For more information, see the following links:
      </p>
      <ul>
        <li>
          <a href="https://arxiv.org/abs/2409.17066" target="_blank" rel="noopener noreferrer" class="link-styled">
            <img src="arxiv-logo.png" alt="arXiv" width="20" height="20" />
            <b>Paper on arXiv</b>
          </a>
        </li>
        <li>
          <a href="https://github.com/microsoft/VPTQ" target="_blank" rel="noopener noreferrer" class="link-styled">
            <img src="github-mark.png" alt="GitHub" width="20" height="20" />
            <b>GitHub Repository</b>
          </a>
        </li>
        <li>
          <a href="https://huggingface.co/VPTQ-community" target="_blank" rel="noopener noreferrer" class="link-styled">
            <img src="hf-logo.png" alt="Hugging Face" width="20" height="20" />
            <b>Hugging Face Community</b>
          </a>
        </li>
      </ul>
    </div>
    <gradio-app src="https://opensourceronin-vptq-demo.hf.space"></gradio-app>
  </body>
</html>