Spaces: ortal1602/ARvsFM

ortal1602 committed on Jun 11

Commit f92efcb · verified · 1 Parent(s): 1959be2

Update index.html

Files changed (1)

index.html +43 -12

@@ -34,9 +34,9 @@
   <img src="figures/ARvsFM.png" alt="AR vs FM" style="width: 100%; border-radius: 20px; box-shadow: 0 4px 16px rgba(0,0,0,0.2); margin-bottom: 20px;">
   <h1>AR vs FM: A Comparative Study on Audio Modeling Paradigms</h1>
   <p>
-    <a href="https://scholar.google.com/citations?user=OrTalScholarID" target="_blank">Or Tal</a> ·
-    <a href="https://scholar.google.com/citations?user=FelixKreukID" target="_blank">Felix Kreuk</a> ·
-    <a href="https://scholar.google.com/citations?user=YossiAdiID" target="_blank">Yossi Adi</a>
   </p>
 </div>
@@ -60,21 +60,52 @@
   </div>
 </div>
 <script>
   const highlights = [
-    "🎯 AR achieves better text-to-music fidelity and is more robust to frame-rate changes.",
-    "🧠 AR follows temporally-aligned control signals (chords, melody, drums) more accurately.",
-    "🪄 FM (supervised) produces the smoothest transitions in inpainting; AR has lowest FAD but audible seams.",
-    "🚀 FM can be faster, but only at the cost of quality (needs fewer steps). AR scales better with batch size.",
-    "🧪 FM achieves near-topline performance with smaller batches; AR improves steadily with more training steps.",
-    "🎧 Both AR and FM lose fidelity when conditioned with strict temporal controls—highlighting a trade-off between control and quality."
   ];
   let highlightIndex = 0;
-  const highlightText = document.getElementById('highlight-text');
   function showHighlight(index) {
-    highlightText.textContent = highlights[index];
   }
   function prevHighlight() {

   <img src="figures/ARvsFM.png" alt="AR vs FM" style="width: 100%; border-radius: 20px; box-shadow: 0 4px 16px rgba(0,0,0,0.2); margin-bottom: 20px;">
   <h1>AR vs FM: A Comparative Study on Audio Modeling Paradigms</h1>
   <p>
+    <a href="https://scholar.google.com/citations?user=QK3_J9IAAAAJ" target="_blank">Or Tal</a> ·
+    <a href="https://scholar.google.com/citations?user=UiERcYsAAAAJ" target="_blank">Felix Kreuk</a> ·
+    <a href="https://scholar.google.com/citations?user=ryMtc7sAAAAJ" target="_blank">Yossi Adi</a>
   </p>
 </div>
   </div>
 </div>
+<!-- Interactive Highlight Slider -->
+<div class="container">
+  <h2>Paper Highlights</h2>
+  <div id="highlight-box" style="text-align: center; padding: 30px; border: 1px solid #ddd; border-radius: 10px; background: #fafafa;">
+    <p id="highlight-text" style="font-size: 1.2rem; font-style: italic;"></p>
+    <img id="highlight-image" src="" alt="Highlight figure" style="max-width: 100%; max-height: 400px; margin-top: 20px; border-radius: 12px; box-shadow: 0 2px 12px rgba(0,0,0,0.1);">
+  </div>
+  <div class="text-center mt-3">
+    <button onclick="prevHighlight()" class="btn btn-outline-primary">← Prev</button>
+    <button onclick="nextHighlight()" class="btn btn-outline-primary">Next →</button>
+  </div>
+</div>
 <script>
   const highlights = [
+    {
+      text: "🎼 AR vs FM across 5 axes — fidelity, control, editing, speed, and training. No single winner. Every strength is a trade-off.",
+      image: "figures/highlights/table.png"
+    },
+    {
+      text: "Both modeling paradigms (EnCodec-based latent) show comparable performance, with a slight edge for AR, which also proves more robust to the latent representation’s sample rate. FM performance degrades as the number of inference steps decreases. To maintain performance comparable to AR, FM requires a large number of inference steps.",
+      image: "figures/highlights/fidelity.png"
+    },
+    {
+      text: "AR follows temporally-aligned conditioning more accurately than FM, but both paradigms lose perceptual quality under strict controls, illustrating a controllability–fidelity trade-off.",
+      image: "figures/highlights/control.png"
+    },
+    {
+      text: "Supervised flow matching is the most robust inpainting method: it yields the smoothest and most coherent edits. Zero-shot flow matching is attractive for rapid, prompt-driven edits but needs a small per-sample hyperparameter search or a better sampling strategy to produce more stable outputs.",
+      image: "figures/highlights/inpainting.png"
+    },
+    {
+      text: "AR scales better with batch size thanks to KV caching; FM can become faster by reducing the number of inference steps, but this comes at the cost of degraded generation quality. Selecting a modeling paradigm therefore hinges on how much quality one is willing to trade for latency.",
+      image: "figures/highlights/speed_vs_quality.png"
+    },
+    {
+      text: "When the number of update steps is capped, FM reaches almost the same FAD, PQ, and CE as the one-million-step topline using much smaller batches, though its CLAP score keeps improving with scale. The AR model needs a larger token budget per step to match its topline performance and benefits more from large-scale training.",
+      image: "figures/highlights/training_sensitivity.png"
+    }
   ];
   let highlightIndex = 0;
   function showHighlight(index) {
+    document.getElementById('highlight-text').textContent = highlights[index].text;
+    document.getElementById('highlight-image').src = highlights[index].image;
   }
   function prevHighlight() {
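
The hunk ends at the opening line of `prevHighlight()`, so the bodies of the navigation functions are not visible in this diff. As a rough illustration only, the sketch below shows one way a slider over an array of `{ text, image }` objects could step forward and backward with wraparound. The names `wrapIndex` and `createSlider`, the wraparound behaviour itself, and the placeholder items are assumptions made for this example and are not taken from the commit.

```javascript
// Hypothetical helper (not from the commit): keep an index inside [0, length).
// JavaScript's % can return a negative value, so the remainder is shifted
// and reduced a second time.
function wrapIndex(index, length) {
  return ((index % length) + length) % length;
}

// Hypothetical factory (not from the commit): tracks the current position and
// hands the selected item to a caller-supplied render callback.
function createSlider(items, render) {
  let current = 0;
  const show = (index) => {
    current = wrapIndex(index, items.length);
    render(items[current]);
    return current;
  };
  return {
    show,
    next: () => show(current + 1),
    prev: () => show(current - 1),
  };
}

// Demo with placeholder data and a render callback that records the text
// instead of touching the DOM, so the snippet also runs outside a browser.
const rendered = [];
const demoItems = [
  { text: "first", image: "placeholder-a.png" },
  { text: "second", image: "placeholder-b.png" },
  { text: "third", image: "placeholder-c.png" },
];
const demoSlider = createSlider(demoItems, (item) => rendered.push(item.text));

demoSlider.show(0); // records "first"
demoSlider.prev();  // wraps to the last item, records "third"
demoSlider.next();  // wraps back to the start, records "first"
```

In the page itself, the render callback would presumably do what `showHighlight` does in the diff above: set `textContent` on the `highlight-text` element and `src` on the `highlight-image` element. Keeping the index arithmetic separate from the DOM updates makes the wraparound easy to check in isolation.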