Translation Small T5

Trained with a 2048-token context length. It can translate Malay, English, Javanese, Banjarese and Indonesian into a target language. It also preserves the structure of the input and translates only the parts that need translating; for example, programming code is left as-is.

This release adds more code-translation data and applies heavy post-filtering.
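Inputs longer than the model comfortably handles (it was trained with a 2048-token context, and the how-to below caps generation at `max_length = 512`) can be split before translation. The following is a sketch under stated assumptions, not part of this model's tooling: `chunk_for_translation` is a hypothetical helper that splits text on blank lines, keeps fenced code blocks whole, and greedily packs paragraphs under a token budget supplied through a `count_tokens` callable.

```python
from typing import Callable, List

FENCE = "`" * 3  # marker that opens/closes a fenced code block


def chunk_for_translation(
    text: str,
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Split text on blank lines into chunks of at most max_tokens tokens.

    Fenced code blocks are never split. A single paragraph that alone
    exceeds the budget becomes its own (oversized) chunk.
    """
    # 1. Group lines into paragraphs; a whole fenced block is one paragraph.
    paragraphs: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_fence = not in_fence
        if not line.strip() and not in_fence:
            if current:
                paragraphs.append("\n".join(current))
                current = []
            continue
        current.append(line)
    if current:
        paragraphs.append("\n".join(current))

    # 2. Greedily pack paragraphs into chunks that fit the token budget.
    chunks: List[str] = []
    buffer: List[str] = []
    for para in paragraphs:
        candidate = "\n\n".join(buffer + [para])
        if buffer and count_tokens(candidate) > max_tokens:
            chunks.append("\n\n".join(buffer))
            buffer = [para]
        else:
            buffer.append(para)
    if buffer:
        chunks.append("\n\n".join(buffer))
    return chunks
```

With the tokenizer from the how-to, `count_tokens` could be `lambda s: len(tokenizer.encode(s))`; each chunk would then be translated separately and the results joined with blank lines. Pick a budget that leaves room for the `terjemah ke ...:` prefix and for the `max_length` used in generation.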

how-to

from transformers import T5ForConditionalGeneration, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(
    'mesolitica/translation-t5-small-standard-bahasa-cased-code',
    use_fast=False
)
model = T5ForConditionalGeneration.from_pretrained(
    'mesolitica/translation-t5-small-standard-bahasa-cased-code'
).cuda()  # move the model to the GPU, since the input IDs below are created with .cuda()

answer = """
First, let's start with implementing the `is_number` function, which checks whether the given Variant is number type or not. It checks the type of the Variant and returns whether it is an integer or a real number.

```cpp
#include <cmath>
#include <string>

namespace godot {

// ... (other Variant function declarations here)

class Variant {

// ... (other Variant function definitions here)

public:
  bool is_number() const {
    switch (get_type()) {
      case INT:
      case REAL:
        return true;
      default:
        return false;
    }
  }

// ... (other Variant function definitions here)

};

} // namespace godot

Next, we can create the apply_operation function, which takes a Variant object, an operation (either "+" or "-"), and a Variant operand as the input, applying the given operation to the original Variant object and the operand if the original Variant object is a number. If the Variant object is not a number, the function should throw an exception.

#include <stdexcept>
#include <string>

// ... (other Variant and function declarations here)

Variant apply_operation(Variant a, const std::string& operation, Variant b) {
  if (!a.is_number()) {
    throw std::invalid_argument("The first Variant is not a number.");
  }

  if (operation != "+" && operation != "-") {
    throw std::invalid_argument("Invalid operation. Supported operations are '+' and '-'.");
  }

  if (a.get_type() == INT) {
    if (b.is_number()) {
      if (b.get_type() == INT) {
        a = a.operator int() + b.operator int();
      } else { // REAL
        a = a.operator int() + b.operator double();
      }
    } else {
      throw std::invalid_argument("The second Variant is not a number.");
    }
  } else { // REAL
    if (b.is_number()) {
      if (b.get_type() == INT) {
        a = a.operator double() + b.operator int();
      } else { // REAL
        a = a.operator double() + b.operator double();
      }
    } else {
      throw std::invalid_argument("The second Variant is not a number.");
    }
  }

  if (operation == "-") {
    a = -a;
  }

  return a;
}

"""

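# Prefix the text with the task instruction ("translate to Malay"), then generate.
input_ids = tokenizer.encode(f'terjemah ke Melayu: {answer.strip()}', return_tensors='pt').cuda()
outputs = model.generate(input_ids, max_length=512)
# Drop token IDs 0, 1 and 2 (<pad>, </s> and <unk> in the standard T5 vocabulary).
outputs = [o for o in outputs[0] if o not in [0, 1, 2]]
print(tokenizer.decode(outputs, spaces_between_special_tokens=False, skip_special_tokens=False))

Output: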


Pertama, mari kita mulakan dengan melaksanakan fungsi is_number, yang memeriksa sama ada Variant yang diberikan adalah jenis nombor atau tidak. Ia memeriksa jenis Variant dan mengembalikan sama ada ia adalah integer atau nombor sebenar.

#include <cmath>
#include <string>

namespace godot {

//... (deklarasi fungsi Variant lain di sini)

class Variant {

//... (definisi fungsi Variant lain di sini)

public:
  bool is_number() const {
    switch (get_type()) {
      case INT:
      case REAL:
        return true;
      default:
        return false;
    }
  }

//... (definisi fungsi Variant lain di sini)

};

} // namespace godot

Seterusnya, kita boleh membuat fungsi apply_operation, yang mengambil objek Variant, operasi (sama ada "+" atau "-"), dan operand Variant sebagai input, menerapkan operasi yang diberikan ke objek Variant asal dan operand jika objek Variant asal adalah nombor. Jika objek Variant bukan nombor, fungsi harus melemparkan pengecualian.

#include <stdexcept>
#include <string>

// 
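The prompt prefix and the special-ID filter used in the how-to can be wrapped in two small helpers. This is a hedged sketch: `build_prompt` and `strip_special_ids` are hypothetical names, only the `Melayu` target appears in this card (other target names such as `Inggeris` are an untested assumption), and IDs 0, 1 and 2 are assumed to be `<pad>`, `</s>` and `<unk>` as in the standard T5 vocabulary.

```python
from typing import AbstractSet, Iterable, List

# Special-token IDs dropped in the how-to. In the standard T5 SentencePiece
# vocabulary these are <pad> (0), </s> (1) and <unk> (2); assumed to hold
# for this model's tokenizer as well.
T5_SPECIAL_IDS = frozenset({0, 1, 2})


def build_prompt(text: str, target: str = "Melayu") -> str:
    """Prefix text with the task instruction used in the how-to.

    Only "Melayu" is shown in this card; other target names are assumptions.
    """
    return f"terjemah ke {target}: {text.strip()}"


def strip_special_ids(
    ids: Iterable[int], special_ids: AbstractSet[int] = T5_SPECIAL_IDS
) -> List[int]:
    """Return token IDs with special IDs removed, as plain ints.

    int() also accepts 0-d tensors, so a row of model.generate output works.
    """
    return [int(i) for i in ids if int(i) not in special_ids]
```

With `tokenizer` and `model` loaded as in the how-to, the encode step becomes `tokenizer.encode(build_prompt(answer), return_tensors='pt').cuda()`, and `strip_special_ids(outputs[0])` replaces the list comprehension before calling `tokenizer.decode` with the same keyword arguments as above.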
Model size: 60.5M params (Safetensors, tensor type BF16)