emanuelaboros committed
Commit bcd84b9 · 1 Parent(s): 3e16f65

first old model
README.md CHANGED
@@ -1,3 +1,217 @@
- ---
- license: agpl-3.0
- ---
+ ---
+ library_name: transformers
+ language:
+ - multilingual
+ - af
+ - am
+ - ar
+ - as
+ - az
+ - be
+ - bg
+ - bm
+ - bn
+ - br
+ - bs
+ - ca
+ - cs
+ - cy
+ - da
+ - de
+ - el
+ - en
+ - eo
+ - es
+ - et
+ - eu
+ - fa
+ - ff
+ - fi
+ - fr
+ - fy
+ - ga
+ - gd
+ - gl
+ - gn
+ - gu
+ - ha
+ - he
+ - hi
+ - hr
+ - ht
+ - hu
+ - hy
+ - id
+ - ig
+ - is
+ - it
+ - ja
+ - jv
+ - ka
+ - kg
+ - kk
+ - km
+ - kn
+ - ko
+ - ku
+ - ky
+ - la
+ - lg
+ - ln
+ - lo
+ - lt
+ - lv
+ - mg
+ - mk
+ - ml
+ - mn
+ - mr
+ - ms
+ - my
+ - ne
+ - nl
+ - no
+ - om
+ - or
+ - pa
+ - pl
+ - ps
+ - pt
+ - qu
+ - ro
+ - ru
+ - sa
+ - sd
+ - si
+ - sk
+ - sl
+ - so
+ - sq
+ - sr
+ - ss
+ - su
+ - sv
+ - sw
+ - ta
+ - te
+ - th
+ - ti
+ - tl
+ - tn
+ - tr
+ - uk
+ - ur
+ - uz
+ - vi
+ - wo
+ - xh
+ - yo
+ - zh
+
+ license: agpl-3.0
+ tags:
+ - retrieval
+ - entity-retrieval
+ - named-entity-disambiguation
+ - entity-disambiguation
+ - named-entity-linking
+ - entity-linking
+ - text2text-generation
+ ---
+
+ # Model Card for `impresso-project/nel-mgenre-multilingual`
+
+ The **Impresso multilingual named entity linking (NEL)** model is based on **mGENRE** (multilingual Generative ENtity REtrieval) proposed by [De Cao et al.](https://arxiv.org/abs/2103.12528), a sequence-to-sequence architecture for entity disambiguation built on [mBART](https://arxiv.org/abs/2001.08210). It uses **constrained generation** to output entity names that are mapped to Wikidata QIDs.
+
+ This model was adapted for historical texts and fine-tuned on the [HIPE-2022 dataset](https://github.com/hipe-eval/HIPE-2022-data), which includes a variety of historical document types and languages.
+
+ ## Model Details
+
+ ### Model Description
+
+ - **Developed by:** EPFL, as part of the [Impresso project](https://impresso-project.ch), an interdisciplinary project focused on historical media analysis across languages, time, and modalities. Funded by the Swiss National Science Foundation ([CRSII5_173719](http://p3.snf.ch/project-173719), [CRSII5_213585](https://data.snf.ch/grants/grant/213585)) and the Luxembourg National Research Fund (grant No. 17498891).
+ - **Model type:** mBART-based sequence-to-sequence model with constrained beam search for named entity linking
+ - **Languages:** Multilingual (100+ languages, optimized for French, German, and English)
+ - **License:** [AGPL v3+](https://github.com/impresso/impresso-pyindexation/blob/master/LICENSE)
+ - **Finetuned from:** [`facebook/mgenre-wiki`](https://huggingface.co/facebook/mgenre-wiki)
+
+ ### Model Architecture
+
+ - **Architecture:** mBART-based seq2seq with constrained beam search
+
+ ## Training Details
+
+ ### Training Data
+
+ The model was trained on the following datasets:
+
+ | Dataset alias | README | Document type | Languages | Suitable for | Project | License |
+ |---------|---------|---------------|-----------|---------------|---------------|---------------|
+ | ajmc | [link](documentation/README-ajmc.md) | classical commentaries | de, fr, en | NERC-Coarse, NERC-Fine, EL | [AjMC](https://mromanello.github.io/ajax-multi-commentary/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |
+ | hipe2020 | [link](documentation/README-hipe2020.md) | historical newspapers | de, fr, en | NERC-Coarse, NERC-Fine, EL | [CLEF-HIPE-2020](https://impresso.github.io/CLEF-HIPE-2020) | [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
+ | topres19th | [link](documentation/README-topres19th.md) | historical newspapers | en | NERC-Coarse, EL | [Living with Machines](https://livingwithmachines.ac.uk/) | [![License: CC BY-NC-SA 4.0](https://img.shields.io/badge/License-CC_BY--NC--SA_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc-sa/4.0/) |
+ | newseye | [link](documentation/README-newseye.md) | historical newspapers | de, fi, fr, sv | NERC-Coarse, NERC-Fine, EL | [NewsEye](https://www.newseye.eu/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |
+ | sonar | [link](documentation/README-sonar.md) | historical newspapers | de | NERC-Coarse, EL | [SoNAR](https://sonar.fh-potsdam.de/) | [![License: CC BY 4.0](https://img.shields.io/badge/License-CC_BY_4.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/) |
+
+ ## How to Use
+
+ Mark the entity to disambiguate with `[START]` and `[END]` and pass the sentence to the pipeline:
+
+ ```python
+ from transformers import AutoTokenizer, pipeline
+
+ NEL_MODEL_NAME = "impresso-project/nel-mgenre-multilingual"
+ nel_tokenizer = AutoTokenizer.from_pretrained(NEL_MODEL_NAME)
+
+ nel_pipeline = pipeline("generic-nel", model=NEL_MODEL_NAME,
+                         tokenizer=nel_tokenizer,
+                         trust_remote_code=True,
+                         device='cpu')
+
+ sentence = "Le 0ctobre 1894, [START] Dreyfvs [END] est arrêté à Paris, accusé d'espionnage pour l'Allemagne — un événement qui déch1ra la société fr4nçaise pendant des années."
+ print(nel_pipeline(sentence))
+ ```
+
+ The example sentence deliberately contains OCR-like noise to illustrate the model's robustness on historical text.
+
+ ### Output Format
+
+ ```python
+ [
+     {
+         'surface': 'Dreyfvs',
+         'wkd_id': 'Q171826',
+         'wkpedia_pagename': 'Alfred Dreyfus',
+         'wkpedia_url': 'https://fr.wikipedia.org/wiki/Alfred_Dreyfus',
+         'type': 'UNK',
+         'confidence_nel': 99.98,
+         'lOffset': 24,
+         'rOffset': 33
+     }
+ ]
+ ```
+
+ The `type` of the entity is `UNK` because the model was not trained to predict entity types. The `confidence_nel` score indicates the model's confidence in the prediction.
+
+ ## Use Cases
+
+ - Entity disambiguation in noisy OCR settings
+ - Linking historical names to modern Wikidata entities
+ - Assisting downstream event extraction and biography generation from historical archives
+
+ ## Limitations
+
+ - Sensitive to tokenisation and malformed spans
+ - Accuracy degrades on non-Wikidata entities or in highly ambiguous contexts
+ - Focused on historical entity mentions; performance may vary on modern texts
+
+ ## Environmental Impact
+
+ - **Hardware:** 1x A100 (80GB) for finetuning
+ - **Training time:** ~12 hours
+ - **Estimated CO₂ Emissions:** ~2.3 kg CO₂eq
+
+ ## Contact
+
+ - Website: [https://impresso-project.ch](https://impresso-project.ch)
+
+ <p align="center">
+   <img src="https://github.com/impresso/impresso.github.io/blob/master/assets/images/3x1--Yellow-Impresso-Black-on-White--transparent.png?raw=true" width="300" alt="Impresso Logo"/>
+ </p>
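The pipeline links exactly one mention per call, namely the span wrapped in `[START]`/`[END]` markers (see `generic_nel.py` below). A small helper can add the markers around a known character span before calling the pipeline. This is an illustrative sketch only; the `mark_mention` helper and the example sentence are not part of the repository.

```python
from transformers import AutoTokenizer, pipeline

NEL_MODEL_NAME = "impresso-project/nel-mgenre-multilingual"
nel_tokenizer = AutoTokenizer.from_pretrained(NEL_MODEL_NAME)
nel_pipeline = pipeline("generic-nel", model=NEL_MODEL_NAME,
                        tokenizer=nel_tokenizer,
                        trust_remote_code=True,
                        device="cpu")


def mark_mention(text: str, start: int, end: int) -> str:
    """Wrap the character span [start, end) with the [START]/[END] markers the pipeline expects."""
    return f"{text[:start]}[START] {text[start:end]} [END]{text[end:]}"


sentence = "Einstein lehrte an der ETH Zürich."
# Link the mention "ETH Zürich" (characters 23-33).
print(nel_pipeline(mark_mention(sentence, 23, 33)))
```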
config.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "_name_or_path": "facebook/mgenre-wiki",
+   "activation_dropout": 0.0,
+   "activation_function": "gelu",
+   "architectures": [
+     "MBartForConditionalGeneration"
+   ],
+   "custom_pipelines": {
+     "generic-nel": {
+       "impl": "generic_nel.NelPipeline",
+       "pt": [
+         "MBartForConditionalGeneration"
+       ],
+       "tf": []
+     }
+   },
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "classifier_dropout": 0.0,
+   "d_model": 1024,
+   "decoder_attention_heads": 16,
+   "decoder_ffn_dim": 4096,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 12,
+   "decoder_start_token_id": 2,
+   "dropout": 0.1,
+   "encoder_attention_heads": 16,
+   "encoder_ffn_dim": 4096,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 12,
+   "eos_token_id": 2,
+   "forced_eos_token_id": 2,
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "max_position_embeddings": 1024,
+   "model_type": "mbart",
+   "num_hidden_layers": 12,
+   "pad_token_id": 1,
+   "scale_embedding": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.31.0",
+   "use_cache": true,
+   "vocab_size": 256001
+ }
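The `custom_pipelines` block above is what lets `pipeline("generic-nel", ..., trust_remote_code=True)` resolve the task to `generic_nel.NelPipeline`. As a quick sanity check (a sketch, not part of the repository), the registration can be inspected from the hub config without executing any remote code:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("impresso-project/nel-mgenre-multilingual")
# Expected to contain: {'generic-nel': {'impl': 'generic_nel.NelPipeline', 'pt': ['MBartForConditionalGeneration'], 'tf': []}}
print(config.custom_pipelines)
```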
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "bos_token_id": 0,
+   "decoder_start_token_id": 2,
+   "eos_token_id": 2,
+   "forced_eos_token_id": 2,
+   "pad_token_id": 1,
+   "transformers_version": "4.46.0.dev0"
+ }
generic_nel.py ADDED
@@ -0,0 +1,192 @@
+ from transformers import Pipeline
+ import nltk
+ import requests
+ import torch
+
+ nltk.download("averaged_perceptron_tagger")
+ nltk.download("averaged_perceptron_tagger_eng")
+
+ NEL_MODEL = "nel-mgenre-multilingual"
+
+
+ def get_wikipedia_page_props(input_str: str):
+     """
+     Retrieve the Wikidata QID for a Wikipedia page name on the given language Wikipedia.
+
+     Args:
+         input_str (str): The input string in the format "page_name >> language".
+
+     Returns:
+         tuple: (QID or "NIL" if no QID is found, language code).
+     """
+     if ">>" not in input_str:
+         page_name = input_str
+         language = "en"
+         print(
+             f">> was not found in {input_str}, falling back to: Page name: {page_name}, Language: {language}"
+         )
+     else:
+         # Preprocess the input string
+         try:
+             page_name, language = input_str.split(">>")
+             page_name = page_name.strip()
+             language = language.strip()
+         except ValueError:
+             page_name = input_str
+             language = "en"
+             print(
+                 f"Could not split {input_str} on '>>', falling back to: Page name: {page_name}, Language: {language}"
+             )
+     wikipedia_url = f"https://{language}.wikipedia.org/w/api.php"
+     wikipedia_params = {
+         "action": "query",
+         "prop": "pageprops",
+         "format": "json",
+         "titles": page_name,
+     }
+
+     qid = "NIL"
+     try:
+         # Attempt to fetch the page props from the Wikipedia API
+         response = requests.get(wikipedia_url, params=wikipedia_params)
+         response.raise_for_status()
+         data = response.json()
+
+         if "pages" in data["query"]:
+             page_id = list(data["query"]["pages"].keys())[0]
+
+             if "pageprops" in data["query"]["pages"][page_id]:
+                 page_props = data["query"]["pages"][page_id]["pageprops"]
+
+                 if "wikibase_item" in page_props:
+                     return page_props["wikibase_item"], language
+                 else:
+                     return qid, language
+             else:
+                 return qid, language
+         else:
+             return qid, language
+     except Exception:
+         return qid, language
+
+
+ def get_wikipedia_title(qid, language="en"):
+     """Resolve a QID to the (title, URL) of its Wikipedia page in the given language."""
+     url = "https://www.wikidata.org/w/api.php"
+     params = {
+         "action": "wbgetentities",
+         "format": "json",
+         "ids": qid,
+         "props": "sitelinks/urls",
+         "sitefilter": f"{language}wiki",
+     }
+
+     response = requests.get(url, params=params)
+     try:
+         response.raise_for_status()  # Raise an HTTPError if the response was not 2xx
+         data = response.json()
+     except requests.exceptions.RequestException as e:
+         print(f"HTTP error: {e}")
+         return "NIL", "None"
+     except ValueError:  # Catch JSON decode errors
+         print(f"Invalid JSON response: {response.text}")
+         return "NIL", "None"
+
+     try:
+         title = data["entities"][qid]["sitelinks"][f"{language}wiki"]["title"]
+         url = data["entities"][qid]["sitelinks"][f"{language}wiki"]["url"]
+         return title, url
+     except KeyError:
+         return "NIL", "None"
+
+
+ class NelPipeline(Pipeline):
+
+     def _sanitize_parameters(self, **kwargs):
+         preprocess_kwargs = {}
+         if "text" in kwargs:
+             preprocess_kwargs["text"] = kwargs["text"]
+
+         return preprocess_kwargs, {}, {}
+
+     def preprocess(self, text, **kwargs):
+         # Extract the entity between [START] and [END]
+         start_token = "[START]"
+         end_token = "[END]"
+
+         if start_token in text and end_token in text:
+             start_idx = text.index(start_token) + len(start_token)
+             end_idx = text.index(end_token)
+             enclosed_entity = text[start_idx:end_idx].strip()
+             lOffset = start_idx  # left offset (start of the entity)
+             rOffset = end_idx  # right offset (end of the entity)
+         else:
+             enclosed_entity = None
+             lOffset = None
+             rOffset = None
+
+         # Generate predictions using the model
+         outputs = self.model.generate(
+             **self.tokenizer([text], return_tensors="pt").to(self.device),
+             num_beams=1,
+             num_return_sequences=1,
+             max_new_tokens=30,
+             return_dict_in_generate=True,
+             output_scores=True,
+         )
+         # Decode the predictions into readable text
+         wikipedia_prediction = self.tokenizer.batch_decode(
+             outputs.sequences, skip_special_tokens=True
+         )[0]
+
+         # Process the scores for each generated token
+         transition_scores = self.model.compute_transition_scores(
+             outputs.sequences, outputs.scores, normalize_logits=True
+         )
+         log_prob_sum = sum(transition_scores[0])
+
+         # Probability of the whole sequence: exponentiate the sum of log probabilities
+         sequence_confidence = torch.exp(log_prob_sum)
+         percentage = sequence_confidence.cpu().numpy() * 100.0
+
+         # Return the prediction along with the extracted entity, lOffset, and rOffset
+         return wikipedia_prediction, enclosed_entity, lOffset, rOffset, percentage
+
+     def _forward(self, inputs):
+         return inputs
+
+     def postprocess(self, outputs, **kwargs):
+         """
+         Postprocess the outputs of the model into the final list of entity dictionaries.
+         """
+         wikipedia_prediction, enclosed_entity, lOffset, rOffset, percentage = outputs
+         qid, language = get_wikipedia_page_props(wikipedia_prediction)
+         title, url = get_wikipedia_title(qid, language=language)
+
+         percentage = round(percentage, 2)
+
+         results = [
+             {
+                 "surface": enclosed_entity,
+                 "wkd_id": qid,
+                 "wkpedia_pagename": title,
+                 "wkpedia_url": url,
+                 "type": "UNK",
+                 "confidence_nel": percentage,
+                 "lOffset": lOffset,
+                 "rOffset": rOffset,
+             }
+         ]
+         return results
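For reference, the following sketch shows what `NelPipeline.preprocess` and `postprocess` do under the hood when the custom pipeline is bypassed and the raw seq2seq model is used directly. The example input is hypothetical; the `"Title >> language"` output convention is the one parsed by `get_wikipedia_page_props` above.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo = "impresso-project/nel-mgenre-multilingual"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSeq2SeqLM.from_pretrained(repo)

text = "[START] Dreyfvs [END] est arrêté à Paris."
inputs = tokenizer([text], return_tensors="pt")
sequences = model.generate(**inputs, num_beams=1, max_new_tokens=30)
prediction = tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]
print(prediction)  # expected to look like "Alfred Dreyfus >> fr"

# The helpers above then resolve the prediction to a QID and a page URL:
# qid, lang = get_wikipedia_page_props(prediction)
# title, url = get_wikipedia_title(qid, language=lang)
```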
handler.py ADDED
@@ -0,0 +1,125 @@
+ import torch
+ from transformers import AutoTokenizer
+ from typing import List, Dict, Any
+ import requests
+ import nltk
+ from transformers import pipeline
+
+ # Download required NLTK models
+ nltk.download("averaged_perceptron_tagger")
+ nltk.download("averaged_perceptron_tagger_eng")
+
+ # Define the model name
+ NEL_MODEL = "nel-mgenre-multilingual"
+
+
+ def get_wikipedia_page_props(input_str: str):
+     """Return (QID, language) parsed from a "page_name >> language" prediction, or ("NIL", language)."""
+     if ">>" not in input_str:
+         page_name = input_str
+         language = "en"
+     else:
+         try:
+             page_name, language = input_str.split(">>")
+             page_name = page_name.strip()
+             language = language.strip()
+         except ValueError:
+             page_name = input_str
+             language = "en"
+     wikipedia_url = f"https://{language}.wikipedia.org/w/api.php"
+     wikipedia_params = {
+         "action": "query",
+         "prop": "pageprops",
+         "format": "json",
+         "titles": page_name,
+     }
+
+     qid = "NIL"
+     try:
+         response = requests.get(wikipedia_url, params=wikipedia_params)
+         response.raise_for_status()
+         data = response.json()
+
+         if "pages" in data["query"]:
+             page_id = list(data["query"]["pages"].keys())[0]
+
+             if "pageprops" in data["query"]["pages"][page_id]:
+                 page_props = data["query"]["pages"][page_id]["pageprops"]
+
+                 if "wikibase_item" in page_props:
+                     return page_props["wikibase_item"], language
+                 else:
+                     return qid, language
+             else:
+                 return qid, language
+         else:
+             return qid, language
+     except Exception:
+         return qid, language
+
+
+ def get_wikipedia_title(qid, language="en"):
+     """Resolve a QID to the (title, URL) of its Wikipedia page in the given language."""
+     url = "https://www.wikidata.org/w/api.php"
+     params = {
+         "action": "wbgetentities",
+         "format": "json",
+         "ids": qid,
+         "props": "sitelinks/urls",
+         "sitefilter": f"{language}wiki",
+     }
+
+     response = requests.get(url, params=params)
+     try:
+         response.raise_for_status()
+         data = response.json()
+     except (requests.exceptions.RequestException, ValueError):
+         return "NIL", "None"
+
+     try:
+         title = data["entities"][qid]["sitelinks"][f"{language}wiki"]["title"]
+         url = data["entities"][qid]["sitelinks"][f"{language}wiki"]["url"]
+         return title, url
+     except KeyError:
+         return "NIL", "None"
+
+
+ class NelPipeline:
+     def __init__(self, model_dir: str = "."):
+         self.model_name = NEL_MODEL
+         print(f"Loading {model_dir}")
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         self.tokenizer = AutoTokenizer.from_pretrained(model_dir)
+         self.model = pipeline("generic-nel", model="impresso-project/nel-mgenre-multilingual",
+                               tokenizer=self.tokenizer,
+                               trust_remote_code=True,
+                               device=self.device)
+
+     def preprocess(self, text: str):
+         linked_entity = self.model(text)
+         return linked_entity
+
+     def postprocess(self, outputs):
+         linked_entity = outputs
+         return linked_entity
+
+
+ class EndpointHandler:
+     def __init__(self, path: str = None):
+         # Initialize the NelPipeline with the specified model
+         self.pipeline = NelPipeline("impresso-project/nel-mgenre-multilingual")
+
+     def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
+         # Process incoming data
+         inputs = data.get("inputs", "")
+         if not isinstance(inputs, str):
+             raise ValueError("Input must be a string.")
+
+         # Preprocess, forward, and postprocess
+         preprocessed = self.pipeline.preprocess(inputs)
+         results = self.pipeline.postprocess(preprocessed)
+
+         return results
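The handler expects the standard Inference Endpoints payload, a JSON object with an `inputs` string. A hypothetical local smoke test (assuming the repository files are on the Python path):

```python
from handler import EndpointHandler

handler = EndpointHandler()
payload = {"inputs": "Le 0ctobre 1894, [START] Dreyfvs [END] est arrêté à Paris."}
# Returns the list of entity dictionaries shown in the README's "Output Format" section.
print(handler(payload))
```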
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f8cfb5bf9aa521336b586ae37eecac31ed7e86327a1be1802d32551472988633
+ size 2468961388
requirements.txt ADDED
@@ -0,0 +1,5 @@
+ nltk
+ torch
+ transformers
+ requests
+ typing
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bc00424e1006d8552c992d3bde1acf8d4282909093c4e18a2112a6e6b087b217
+ size 1064
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6ee4dc054a17c18fe81f76c0b1cda00e9fc1cfd9e0f1a16cb6d77009e2076653
+ size 4870365
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "256001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }
trainer_state.json ADDED
@@ -0,0 +1,207 @@
+ {
+   "best_metric": null,
+   "best_model_checkpoint": null,
+   "epoch": 10.0,
+   "eval_steps": 500,
+   "global_step": 6480,
+   "is_hyper_param_search": false,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 0.7716049382716049,
+       "grad_norm": 1.0553990602493286,
+       "learning_rate": 1.846913580246914e-05,
+       "loss": 0.9346,
+       "step": 500
+     },
+     {
+       "epoch": 1.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.0959,
+       "eval_loss": 0.15309424698352814,
+       "eval_runtime": 7.8759,
+       "eval_samples_per_second": 154.903,
+       "eval_steps_per_second": 2.539,
+       "step": 648
+     },
+     {
+       "epoch": 1.5432098765432098,
+       "grad_norm": 0.7297214269638062,
+       "learning_rate": 1.6925925925925926e-05,
+       "loss": 0.0763,
+       "step": 1000
+     },
+     {
+       "epoch": 2.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.159,
+       "eval_loss": 0.16104426980018616,
+       "eval_runtime": 7.6405,
+       "eval_samples_per_second": 159.674,
+       "eval_steps_per_second": 2.618,
+       "step": 1296
+     },
+     {
+       "epoch": 2.314814814814815,
+       "grad_norm": 0.44237253069877625,
+       "learning_rate": 1.5382716049382717e-05,
+       "loss": 0.0446,
+       "step": 1500
+     },
+     {
+       "epoch": 3.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.0426,
+       "eval_loss": 0.17489495873451233,
+       "eval_runtime": 7.8385,
+       "eval_samples_per_second": 155.642,
+       "eval_steps_per_second": 2.552,
+       "step": 1944
+     },
+     {
+       "epoch": 3.0864197530864197,
+       "grad_norm": 0.3801327049732208,
+       "learning_rate": 1.3839506172839507e-05,
+       "loss": 0.0275,
+       "step": 2000
+     },
+     {
+       "epoch": 3.8580246913580245,
+       "grad_norm": 0.29495081305503845,
+       "learning_rate": 1.2296296296296298e-05,
+       "loss": 0.0162,
+       "step": 2500
+     },
+     {
+       "epoch": 4.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.1139,
+       "eval_loss": 0.1843736320734024,
+       "eval_runtime": 7.7649,
+       "eval_samples_per_second": 157.118,
+       "eval_steps_per_second": 2.576,
+       "step": 2592
+     },
+     {
+       "epoch": 4.62962962962963,
+       "grad_norm": 0.29735738039016724,
+       "learning_rate": 1.0753086419753086e-05,
+       "loss": 0.0106,
+       "step": 3000
+     },
+     {
+       "epoch": 5.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 9.9508,
+       "eval_loss": 0.19341909885406494,
+       "eval_runtime": 7.6995,
+       "eval_samples_per_second": 158.452,
+       "eval_steps_per_second": 2.598,
+       "step": 3240
+     },
+     {
+       "epoch": 5.401234567901234,
+       "grad_norm": 0.07027166336774826,
+       "learning_rate": 9.209876543209877e-06,
+       "loss": 0.0076,
+       "step": 3500
+     },
+     {
+       "epoch": 6.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 9.9377,
+       "eval_loss": 0.20017552375793457,
+       "eval_runtime": 7.6996,
+       "eval_samples_per_second": 158.45,
+       "eval_steps_per_second": 2.598,
+       "step": 3888
+     },
+     {
+       "epoch": 6.172839506172839,
+       "grad_norm": 0.1504916250705719,
+       "learning_rate": 7.666666666666667e-06,
+       "loss": 0.0059,
+       "step": 4000
+     },
+     {
+       "epoch": 6.944444444444445,
+       "grad_norm": 0.24264627695083618,
+       "learning_rate": 6.123456790123458e-06,
+       "loss": 0.0043,
+       "step": 4500
+     },
+     {
+       "epoch": 7.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.0279,
+       "eval_loss": 0.20386986434459686,
+       "eval_runtime": 7.7944,
+       "eval_samples_per_second": 156.523,
+       "eval_steps_per_second": 2.566,
+       "step": 4536
+     },
+     {
+       "epoch": 7.716049382716049,
+       "grad_norm": 0.08363181352615356,
+       "learning_rate": 4.580246913580247e-06,
+       "loss": 0.0035,
+       "step": 5000
+     },
+     {
+       "epoch": 8.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.1566,
+       "eval_loss": 0.20531675219535828,
+       "eval_runtime": 7.6989,
+       "eval_samples_per_second": 158.465,
+       "eval_steps_per_second": 2.598,
+       "step": 5184
+     },
+     {
+       "epoch": 8.487654320987655,
+       "grad_norm": 0.13225023448467255,
+       "learning_rate": 3.0370370370370372e-06,
+       "loss": 0.0029,
+       "step": 5500
+     },
+     {
+       "epoch": 9.0,
+       "eval_bleu": 0.0,
+       "eval_gen_len": 10.0689,
+       "eval_loss": 0.20702147483825684,
+       "eval_runtime": 7.6619,
+       "eval_samples_per_second": 159.23,
+       "eval_steps_per_second": 2.61,
+       "step": 5832
+     },
+     {
+       "epoch": 9.25925925925926,
+       "grad_norm": 0.022540247067809105,
+       "learning_rate": 1.4938271604938272e-06,
+       "loss": 0.003,
+       "step": 6000
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 6480,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 10,
+   "save_steps": 1000,
+   "stateful_callbacks": {
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 4.288315707563704e+17,
+   "train_batch_size": 64,
+   "trial_name": null,
+   "trial_params": null
+ }
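For orientation, a back-of-the-envelope reading of the trainer state above (assuming no gradient accumulation, which the file does not record):

```python
steps_per_epoch = 6480 // 10               # global_step / num_train_epochs = 648
examples_per_epoch = steps_per_epoch * 64  # times train_batch_size = 41,472 examples
print(steps_per_epoch, examples_per_epoch)
```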