Commit 4a52fb2 (1 parent: be27aeb), committed by ngxson (HF staff)

add examples

app.py CHANGED
@@ -157,7 +157,7 @@ API_NAME = 'tts'
 
 head = f'''
 <script>
-  document.addEventListener('load', () => {{
+  document.addEventListener('DOMContentLoaded', () => {{
     console.log('DOM content loaded');
     if (!localStorage.getItem('debug') && !window.location.href.match(/debug=1/)) {{
       console.log('Attaching frontend app');
front/src/components/ScriptMaker.tsx CHANGED
@@ -2,6 +2,7 @@ import { useEffect, useState } from 'react';
 import { CONFIG } from '../config';
 import { getPromptGeneratePodcastScript } from '../utils/prompts';
 import { getSSEStreamAsync } from '../utils/utils';
+import { EXAMPLES } from '../examples';
 
 interface SplitContent {
   thought: string;
@@ -93,6 +94,29 @@ export const ScriptMaker = ({
       <div className="card bg-base-100 w-full shadow-xl">
         <div className="card-body">
           <h2 className="card-title">Step 1: Input information</h2>
+
+          <select
+            className="select select-bordered w-full"
+            disabled={isGenerating || busy}
+            onChange={(e) => {
+              const idx = parseInt(e.target.value);
+              const ex = EXAMPLES[idx];
+              if (ex) {
+                setInput(ex.input);
+                setNote(ex.note);
+              }
+            }}
+          >
+            <option selected disabled value={-1}>
+              Try one of these examples!!
+            </option>
+            {EXAMPLES.map((example, index) => (
+              <option key={index} value={index}>
+                {example.name}
+              </option>
+            ))}
+          </select>
+
           <textarea
             className="textarea textarea-bordered w-full h-72 p-2"
             placeholder="Type your input information here (an article, a document, etc)..."
front/src/examples.ts ADDED
@@ -0,0 +1,177 @@
+export const EXAMPLES = [
+  {
+    name: 'Example 1: History and facts about croissants',
+    note: 'Be informative, but keep it engaging',
+    input: `
+Where are croissants from?
+
+There are several stories that claim to explain the origins of how croissants came to be. Whilst experts have done their best to debunk most of these apochryphal tales, one thing that the historical evidence shows more definitively is that the story of the croissant begins in Austria, not in France.
+
+It is widely understood that the croissant of today is a descendent of the 'kipferl' (or kipfel) - an Austrian, crescent-shaped pastry that resembles a thinner, denser croissant made with a generous amount of butter and often served topped with sugar and almonds.
+
+## From Vienna To Paris
+
+Regardless of the kipferl’s origin, how then did this Viennese delicacy become France’s most beloved patisserie item? As you may now have come to expect, there are a number of tall tales.
+
+One legend attributes the kipferl’s introduction to Paris, having come from Austrian-born Marie Antoinette bringing the delicacy over when she married King Louis XVI in 1770. This is yet another myth!
+
+The real birth of the croissant is more accurately attributed to August Zang, an Austrian entrepreneur who opened a Viennese-style boulangerie in Paris in 1838.
+
+It was here, known locally as simply “Zang’s”, that Parisians first encountered what would become the croissant. Though Zang’s was only open for two years, his unique Vienneserie products and marketing acumen (using newspaper advertising and lavish window displays to entice customers) made his kipferl a sensation; so much so that by the time Zang’s closed in 1840, there were already a dozen imitators baking his beloved crescent-shaped delicacy.
+
+It wasn’t long before these pastries were firmly cemented in Parisian culture, with the French word ‘croissant’ (meaning ‘crescent’) replacing the original Austrian name.
+
+## Today, croissants have conquered the world
+
+Undoubtedly, August Zang would never have predicted the global impact his Parisian-Viennese invention would have. Fast forward from 1838 to today, and the popularity of croissants worldwide is truly staggering.
+
+A recent study valued the size of the global croissant market at 6663.1 million USD, with a projected growth expected to reach 8574.66 million USD by 2027¹.
+
+Spanning over an 800-year period from 13th Century Austria through to global significance in the modern day, the story of the croissant is truly a remarkable one. Even more remarkably is that, though we’ve been enjoying these incredibly popular and palatable pastries for hundreds of years, there are still new and exciting takes on the recipe being created.
+
+# Facts
+
+
+## Marie Antoinette did not popularise the croissant
+
+Legend credits the French queen Marie Antoinette—homesick for a taste of her native Vienna—with introducing the kipferl, and thus the croissant, to France. But Jim Chevallier mentioned above sees no evidence to support this notion.
+
+## A relatively recent recipe
+
+A croissant recipe showing the use of yeasted puff pastry dough instead of brioche that was used previously first appeared in 1905 in the book Colombie’s “NOUVELLE ENCYCLOPEDIE CULINAIRE. Cuisine et Pâtisserie Bourgeoises conserves de ménage.”
+
+## A breakfast staple
+
+The croissant was already a breakfast staple by the late 1860s and Charles Dickens referred to the “dainty croissant on the boudoir table” in All the Year Round in 1872.
+
+Croissant history
+
+## Why some croissants are curved and others are straight
+
+Croissants that are straight are those made with butter (croissants au beurre) and the curved ones are made with margarine (croissants ordinaires).
+
+Croissant au beurre croissant ordinaire
+
+## Lamination
+
+Those crispy, airy, crunchy layers we associate with the croissant are thanks to a process called lamination. What is it? The dough is folded several times with alternating layers of fat – butter or margarine, before then being rolled and cut into triangles.
+`.trim(),
+  },
+  {
+    name: 'Example 2: Intro to Hugging Face Transformers (from HF blog)',
+    note: 'Make it a bit fun and engaging',
+    input: `
+Welcome to "A Total Noob’s Introduction to Hugging Face Transformers," a guide designed specifically for those looking to understand the bare basics of using open-source ML. Our goal is to demystify what Hugging Face Transformers is and how it works, not to turn you into a machine learning practitioner, but to enable better understanding of and collaboration with those who are. That being said, the best way to learn is by doing, so we'll walk through a simple worked example of running Microsoft’s Phi-2 LLM in a notebook on a Hugging Face space.
+
+You might wonder, with the abundance of tutorials on Hugging Face already available, why create another? The answer lies in accessibility: most existing resources assume some technical background, including Python proficiency, which can prevent non-technical individuals from grasping ML fundamentals. As someone who came from the business side of AI, I recognize that the learning curve presents a barrier and wanted to offer a more approachable path for like-minded learners.
+
+Therefore, this guide is tailored for a non-technical audience keen to better understand open-source machine learning without having to learn Python from scratch. We assume no prior knowledge and will explain concepts from the ground up to ensure clarity. If you're an engineer, you’ll find this guide a bit basic, but for beginners, it's an ideal starting point.
+
+Let’s get stuck in… but first some context.
+What is Hugging Face Transformers?
+
+Hugging Face Transformers is an open-source Python library that provides access to thousands of pre-trained Transformers models for natural language processing (NLP), computer vision, audio tasks, and more. It simplifies the process of implementing Transformer models by abstracting away the complexity of training or deploying models in lower level ML frameworks like PyTorch, TensorFlow and JAX.
+What is a library?
+
+A library is just a collection of reusable pieces of code that can be integrated into projects to implement functionality more efficiently without the need to write your own code from scratch.
+
+Notably, the Transformers library provides re-usable code for implementing models in common frameworks like PyTorch, TensorFlow and JAX. This re-usable code can be accessed by calling upon functions (also known as methods) within the library.
+What is the Hugging Face Hub?
+
+The Hugging Face Hub is a collaboration platform that hosts a huge collection of open-source models and datasets for machine learning, think of it being like Github for ML. The hub facilitates sharing and collaborating by making it easy for you to discover, learn, and interact with useful ML assets from the open-source community. The hub integrates with, and is used in conjunction with the Transformers library, as models deployed using the Transformers library are downloaded from the hub.
+What are Hugging Face Spaces?
+
+Spaces from Hugging Face is a service available on the Hugging Face Hub that provides an easy to use GUI for building and deploying web hosted ML demos and apps. The service allows you to quickly build ML demos, upload your own apps to be hosted, or even select a number of pre-configured ML applications to deploy instantly.
+
+In the tutorial we’ll be deploying one of the pre-configured ML applications, a JupyterLab notebook, by selecting the corresponding docker container.
+What is a notebook?
+
+Notebooks are interactive applications that allow you to write and share live executable code interwoven with complementary narrative text. Notebooks are especially useful for Data Scientists and Machine Learning Engineers as they allow you to experiment with code in realtime and easily review and share the results.
+
+Why do I need to select a GPU Space Hardware?
+
+By default, our Space comes with a complimentary CPU, which is fine for some applications. However, the many computations required by LLMs benefit significantly from being run in parallel to improve speed, which is something GPUs are great at.
+
+It's also important to choose a GPU with enough memory to store the model and providing spare working memory. In our case, an A10G Small with 24GB is enough for Phi-2.
+
+
+If we are using Transformers, why do we need Pytorch too?
+
+Hugging Face is a library that is built on top of other frameworks like Pytorch, Tensorflow and JAX. In this case we are using Transformers with Pytorch and so need to install it to access it’s functionality.
+
+Why do I need to import the Class again after installing Transformers?
+
+Although Transformers is already installed, the specific Classes within Transformers are not automatically available for use in your environment. Python requires us to explicitly import individual Classes as it helps avoid naming conflicts and ensures that only the necessary parts of a library are loaded into your current working context.
+
+What is a tokenizer?
+
+A tokenizer is a tool that splits sentences into smaller pieces of text (tokens) and assigns each token a numeric value called an input id. This is needed because our model only understands numbers, so we first must convert (a.k.a encode) the text into a format the model can understand. Each model has it’s own tokenizer vocabulary, it’s important to use the same tokenizer that the model was trained on or it will misinterpret the text.
+
+Why do I need to decode?
+
+Models only understand numbers, so when we provided our input_ids as vectors it returned an output in the same format. To return those outputs to text we need to reverse the initial encoding we did using the tokenizer.
+Why does the output read like a story?
+
+Remember that Phi-2 is a base model that hasn't been instruction tuned for conversational uses, as such it's effectively a massive auto-complete model. Based on your input it is predicting what it thinks is most likely to come next based on all the web pages, books and other content it has seen previously.
+
+Congratulations, you've run inference on your very first LLM!
+
+I hope that working through this example helped you to better understand the world of open-source ML. If you want to continue your ML learning journey, I recommend the recent Hugging Face course we released in partnership with DeepLearning AI.
+`,
+  },
+  {
+    name: 'Example 3: What is DeepSeek (from CNN)',
+    note: 'Pay more attention to the fact that R1 is open-source',
+    input: `
+# What is DeepSeek, the Chinese AI startup that shook the tech world?
+
+(From CNN) - A surprisingly efficient and powerful Chinese AI model has taken the technology industry by storm. It’s called DeepSeek R1, and it’s rattling nerves on Wall Street.
+
+The new AI model was developed by DeepSeek, a startup that was born just a year ago and has somehow managed a breakthrough that famed tech investor Marc Andreessen has called “AI’s Sputnik moment”: R1 can nearly match the capabilities of its far more famous rivals, including OpenAI’s GPT-4, Meta’s Llama and Google’s Gemini — but at a fraction of the cost.
+
+The company said it had spent just $5.6 million powering its base AI model, compared with the hundreds of millions, if not billions of dollars US companies spend on their AI technologies. That’s even more shocking when considering that the United States has worked for years to restrict the supply of high-power AI chips to China, citing national security concerns. That means DeepSeek was supposedly able to achieve its low-cost model on relatively under-powered AI chips.
+What is DeepSeek?
+
+The company, founded in late 2023 by Chinese hedge fund manager Liang Wenfeng, is one of scores of startups that have popped up in recent years seeking big investment to ride the massive AI wave that has taken the tech industry to new heights.
+
+Liang has become the Sam Altman of China — an evangelist for AI technology and investment in new research. His hedge fund, High-Flyer, focuses on AI development.
+
+Like other AI startups, including Anthropic and Perplexity, DeepSeek released various competitive AI models over the past year that have captured some industry attention. Its V3 model raised some awareness about the company, although its content restrictions around sensitive topics about the Chinese government and its leadership sparked doubts about its viability as an industry competitor, the Wall Street Journal reported.
+
+But R1, which came out of nowhere when it was revealed late last year, launched last week and gained significant attention this week when the company revealed to the Journal its shockingly low cost of operation. And it is open-source, which means other companies can test and build upon the model to improve it.
+
+The DeepSeek app has surged on the app store charts, surpassing ChatGPT Monday, and it has been downloaded nearly 2 million times.
+Why is DeepSeek such a big deal?
+
+AI is a power-hungry and cost-intensive technology — so much so that America’s most powerful tech leaders are buying up nuclear power companies to provide the necessary electricity for their AI models.
+
+Meta last week said it would spend upward of $65 billion this year on AI development. Sam Altman, CEO of OpenAI, last year said the AI industry would need trillions of dollars in investment to support the development of high-in-demand chips needed to power the electricity-hungry data centers that run the sector’s complex models.
+
+So the notion that similar capabilities as America’s most powerful AI models can be achieved for such a small fraction of the cost — and on less capable chips — represents a sea change in the industry’s understanding of how much investment is needed in AI. The technology has many skeptics and opponents, but its advocates promise a bright future: AI will advance the global economy into a new era, they argue, making work more efficient and opening up new capabilities across multiple industries that will pave the way for new research and developments.
+
+Andreessen, a Trump supporter and co-founder of Silicon Valley venture capital firm Andreessen Horowitz, called DeepSeek “one of the most amazing and impressive breakthroughs I’ve ever seen,” in a post on X.
+
+If that potentially world-changing power can be achieved at a significantly reduced cost, it opens up new possibilities — and threats — to the planet.
+What does this mean for America?
+
+The United States thought it could sanction its way to dominance in a key technology it believes will help bolster its national security. Just a week before leaving office, former President Joe Biden doubled down on export restrictions on AI computer chips to prevent rivals like China from accessing the advanced technology.
+
+But DeepSeek has called into question that notion, and threatened the aura of invincibility surrounding America’s technology industry. America may have bought itself time with restrictions on chip exports, but its AI lead just shrank dramatically despite those actions.
+
+DeepSeek may show that turning off access to a key technology doesn’t necessarily mean the United States will win. That’s an important message to President Donald Trump as he pursues his isolationist “America First” policy.
+
+Wall Street was alarmed by the development. US stocks were set for a steep selloff Monday morning. Nvidia (NVDA), the leading supplier of AI chips, whose stock more than doubled in each of the past two years, fell 12% in premarket trading. Meta (META) and Alphabet (GOOGL), Google’s parent company, were also down sharply, as were Marvell, Broadcom, Palantir, Oracle and many other tech giants.
+Are we really sure this is a big deal?
+
+The industry is taking the company at its word that the cost was so low. No one is really disputing it, but the market freak-out hinges on the truthfulness of a single and relatively unknown company. The company notably didn’t say how much it cost to train its model, leaving out potentially expensive research and development costs. (Still, it probably didn’t spend billions of dollars.)
+
+It’s also far too early to count out American tech innovation and leadership. One achievement, albeit a gobsmacking one, may not be enough to counter years of progress in American AI leadership. And a massive customer shift to a Chinese startup is unlikely.
+
+“The DeepSeek model rollout is leading investors to question the lead that US companies have and how much is being spent and whether that spending will lead to profits (or overspending),” said Keith Lerner, analyst at Truist. “Ultimately, our view, is the required spend for data and such in AI will be significant, and US companies remain leaders.”
+
+Although the cost-saving achievement may be significant, the R1 model is a ChatGPT competitor — a consumer-focused large-language model. It hasn’t yet proven it can handle some of the massively ambitious AI capabilities for industries that — for now — still require tremendous infrastructure investments.
+
+“Thanks to its rich talent and capital base, the US remains the most promising ‘home turf’ from which we expect to see the emergence of the first self-improving AI,” said Giuseppe Sette, president of AI market research firm Reflexivity.
+`,
+  },
+];