Commit 3f5b139 · update model card README.md
1 Parent(s): 89ce038

README.md CHANGED
@@ -1,6 +1,5 @@
 ---
 license: mit
-language: es
 tags:
 - generated_from_trainer
 model-index:
@@ -8,74 +7,26 @@ model-index:
   results: []
 ---
 
-This model is a fine-tuned version of [flax-community/spanish-t5-small](https://huggingface.co/flax-community/spanish-t5-small) on the [Spanish Poetry Dataset](https://www.kaggle.com/andreamorgar/spanish-poetry-dataset/version/1).
-
-The model was created during the [First Spanish Hackathon](https://somosnlp.org/hackathon) organized by [Somos NLP](https://somosnlp.org/).
-
-The team consisted of:
-
-- 🇨🇺 [Alberto Carmona Barthelemy](https://huggingface.co/milyiyo)
-- 🇪🇸 [Andrea Morales Garzón](https://huggingface.co/andreamorgar)
-- 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
-- 🇮🇳 [Drishti Sharma](https://huggingface.co/DrishtiSharma)
 
 It achieves the following results on the evaluation set:
-- Loss: 2.
-- Perplexity: 17.43
 
 ## Model description
 
-Example prompt:
-
-```
-poema:
-estilo: Pablo Neruda &&
-sentimiento: positivo &&
-palabras: cielo, luna, mar &&
-texto: Todos fueron a verle pasar
-```
 
-You can use this model directly for text generation:
-
-```python
-from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
-
-model_name = 'hackathon-pln-es/poem-gen-spanish-t5-small'
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
-
-# Build the prompt in the format shown above; fields are separated by '&&'.
-author, sentiment, word, start_text = 'Pablo Neruda', 'positivo', 'cielo', 'Todos fueron a la plaza'
-input_text = f"poema: estilo: {author} && sentimiento: {sentiment} && palabras: {word} && texto: {start_text}"
-inputs = tokenizer(input_text, return_tensors="pt")
-
-# Sample a continuation of the seed text.
-outputs = model.generate(inputs["input_ids"],
-                         do_sample=True,
-                         max_length=30,
-                         repetition_penalty=20.0,
-                         top_k=50,
-                         top_p=0.92)
-detok_outputs = [tokenizer.decode(x, skip_special_tokens=True) for x in outputs]
-res = detok_outputs[0]
-```
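
Since `do_sample=True` makes each call stochastic, a natural extension is to draw several candidates per prompt and pick among them. A minimal sketch, assuming the `model`, `tokenizer`, and `inputs` objects defined above (`num_return_sequences` is a standard argument of `generate`):

```python
# Sketch: sample several candidate continuations in one call and decode each.
outputs = model.generate(inputs["input_ids"],
                         do_sample=True,
                         max_length=30,
                         repetition_penalty=20.0,
                         top_k=50,
                         top_p=0.92,
                         num_return_sequences=3)  # three samples per prompt
candidates = [tokenizer.decode(x, skip_special_tokens=True) for x in outputs]
```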
 
 ## Training and evaluation data
 
-For each poem we generate new training examples:
-
-- content: *line_i*, generated: *line_i+1*
-- content: *concatenate(line_i, line_i+1)*, generated: *line_i+2*
-- content: *concatenate(line_i, line_i+1, line_i+2)*, generated: *line_i+3*
-
-The resulting dataset has the columns `author`, `content`, `title` and `generated`.
-
-For each example we compute the sentiment of the `generated` column and extract its nouns. For sentiment we used the model `mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis`; for noun extraction we used spaCy.
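
To make this scheme concrete, here is a minimal sketch of how such `(content, generated)` pairs could be built. It assumes an illustrative `poem_lines` list and uses the sentiment model named above; `es_core_news_sm` is an assumed spaCy model name, since the card does not say which Spanish pipeline was used:

```python
import spacy
from transformers import pipeline

# Assumed example input: the ordered lines of one poem.
poem_lines = [
    "Puedo escribir los versos más tristes esta noche.",
    "Escribir, por ejemplo: la noche está estrellada,",
    "y tiritan, azules, los astros, a lo lejos.",
]

# Sentiment model named in the card; the spaCy model name is an assumption.
sentiment = pipeline(
    "sentiment-analysis",
    model="mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis",
)
nlp = spacy.load("es_core_news_sm")

examples = []
for i in range(len(poem_lines)):
    for k in (1, 2, 3):  # contexts of one, two and three consecutive lines
        if i + k >= len(poem_lines):
            break
        content = "\n".join(poem_lines[i : i + k])
        generated = poem_lines[i + k]
        examples.append({
            "content": content,
            "generated": generated,
            "sentiment": sentiment(generated)[0]["label"],
            "nouns": [t.text for t in nlp(generated) if t.pos_ == "NOUN"],
        })
```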
 
 ## Training procedure
@@ -94,14 +45,14 @@ The following hyperparameters were used during training:
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:-----:|:------:|:---------------:|
-| 2.
-| 2.
-| 2.
-| 2.
-| 2.
-| 2.
-| 2.
 
 
 ### Framework versions

README.md (after):

 ---
 license: mit
 tags:
 - generated_from_trainer
 model-index:
   results: []
 ---
 
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
 
+# poem-gen-spanish-t5-small
 
+This model is a fine-tuned version of [hackathon-pln-es/poem-gen-spanish-t5-small](https://huggingface.co/hackathon-pln-es/poem-gen-spanish-t5-small) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 2.8723
 
 ## Model description
 
+More information needed
 
+## Intended uses & limitations
 
+More information needed
 
 ## Training and evaluation data
 
+More information needed
 
 ## Training procedure
 
 | Training Loss | Epoch | Step   | Validation Loss |
 |:-------------:|:-----:|:------:|:---------------:|
+| 2.7082        | 0.73  | 30000  | 2.8878          |
+| 2.6251        | 1.46  | 60000  | 2.8940          |
+| 2.5796        | 2.19  | 90000  | 2.8853          |
+| 2.5556        | 2.93  | 120000 | 2.8749          |
+| 2.527         | 3.66  | 150000 | 2.8850          |
+| 2.5024        | 4.39  | 180000 | 2.8760          |
+| 2.4887        | 5.12  | 210000 | 2.8749          |
+| 2.4808        | 5.85  | 240000 | 2.8707          |
 
 
 ### Framework versions