## facebook/tart-full-flan-t5-xl

`facebook/tart-full-flan-t5-xl` is a multi-task cross-encoder model trained via instruction-tuning on approximately 40 retrieval tasks, initialized with [google/flan-t5-xl](https://huggingface.co/google/flan-t5-xl).

### Installation
```
git clone https://github.com/facebookresearch/tart
pip install -r requirements.txt
cd tart/TART
```

TART-full can be loaded through our customized EncT5 model. 
```python
from src.modeling_enc_t5 import EncT5ForSequenceClassification
from src.tokenization_enc_t5 import EncT5Tokenizer
import torch
import torch.nn.functional as F

# load TART full and tokenizer
model = EncT5ForSequenceClassification.from_pretrained("tart_full_flan_t5_xl")
tokenizer =  EncT5Tokenizer.from_pretrained("tart_full_flan_t5_xl")
model.eval()

q = "What is the population of Tokyo?"
in_answer = "retrieve a passage that answers this question from Wikipedia"

p_1 = "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million at the start of 2022."
p_2 = "Tokyo, officially the Tokyo Metropolis (東京都, Tōkyō-to), is the capital and largest city of Japan."

# 1. TART-full can identify more relevant paragraph. 
features = tokenizer(['{0} [SEP] {1}'.format(in_answer, q), '{0} [SEP] {1}'.format(in_answer, q)], [p_1, p_2], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**features).logits
    normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]
print([p_1, p_2]np.argmax(normalized_scores)) # "The population of Japan's capital, Tokyo, dropped by about 48,600 people to just under 14 million."

# 2. TART-full can identify the document that is more relevant AND follows instructions.
in_sim = "You need to find duplicated questions in Wiki forum. Could you find a question that is similar to this question"
q_1 = "How many people live in Tokyo?"
features = tokenizer(['{0} [SEP] {1}'.format(in_sim, q), '{0} [SEP] {1}'.format(in_sim, q)], [p, q_1], padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    scores = model(**features).logits
    normalized_scores = [float(score[1]) for score in F.softmax(scores, dim=1)]

print([p, q_1]np.argmax(normalized_scores)) #  "How many people live in Tokyo?"
```