---
license: llama2
library_name: peft
---
# RepLLaMA-7B-Document
[Fine-Tuning LLaMA for Multi-Stage Text Retrieval](https://arxiv.org/abs/2310.08319).
Xueguang Ma, Liang Wang, Nan Yang, Furu Wei, Jimmy Lin, arXiv 2023
This model is fine-tuned from LLaMA-2-7B using LoRA. It produces 4096-dimensional embeddings and accepts inputs of up to 2048 tokens.
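Documents longer than 2048 tokens have to be cut down before encoding. The usage example writes `</s>` into the input text and reads the embedding from the final position, so plain right-side truncation could drop that trailing `</s>`. Below is a minimal sketch, assuming you truncate the token ids yourself. The helper `truncate_keep_eos` is hypothetical and not part of this repository. It keeps the leading tokens and guarantees the sequence still ends with EOS (id 2 in the Llama-2 tokenizer).

```python
from typing import List, Sequence


def truncate_keep_eos(token_ids: Sequence[int], max_length: int, eos_id: int) -> List[int]:
    """Truncate token ids to at most max_length while guaranteeing a final EOS.

    Hypothetical helper (an assumption, not the authors' preprocessing):
    RepLLaMA pools the hidden state of the last token, so the input should
    still end with </s> after truncation.
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    # Strip an existing trailing EOS so it is not duplicated when re-appended.
    body = list(token_ids[:-1]) if token_ids and token_ids[-1] == eos_id else list(token_ids)
    # Reserve one slot for EOS, keep the leading tokens, then append EOS.
    return body[: max_length - 1] + [eos_id]


# Toy ids: 1 = BOS and 2 = EOS in the Llama-2 tokenizer; 100-104 are placeholders.
ids = [1, 100, 101, 102, 103, 104, 2]
print(truncate_keep_eos(ids, max_length=4, eos_id=2))  # → [1, 100, 101, 2]
```

In practice, `token_ids` would come from `tokenizer(text)['input_ids']` and the result would be wrapped in a tensor before the forward pass. With `max_length=2048`, the input stays within the model's limit.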
## Training Data
The model is fine-tuned on the training split of the [MS MARCO Document Ranking](https://microsoft.github.io/msmarco/Datasets) dataset for 1 epoch.
Please check our paper for details.
## Usage
Below is an example that encodes a query and a document, then computes their similarity from their embeddings.
```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel, PeftConfig

def get_model(peft_model_name):
    # Load the base LLaMA-2 model, attach the LoRA adapter, and merge it into the weights
    config = PeftConfig.from_pretrained(peft_model_name)
    base_model = AutoModel.from_pretrained(config.base_model_name_or_path)
    model = PeftModel.from_pretrained(base_model, peft_model_name)
    model = model.merge_and_unload()
    model.eval()
    return model

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')
model = get_model('castorini/repllama-v1-7b-lora-doc')

# Define query and document inputs
query = "What is llama?"
title = "Llama"
url = "https://en.wikipedia.org/wiki/Llama"
document = "The llama is a domesticated South American camelid, widely used as a meat and pack animal by Andean cultures since the pre-Columbian era."
query_input = tokenizer(f'query: {query}</s>', return_tensors='pt')
document_input = tokenizer(f'passage: {url} {title} {document}</s>', return_tensors='pt')

# Run the model forward to compute embeddings and query-document similarity score
with torch.no_grad():
    # compute query embedding: final hidden state of the last token (</s>), L2-normalized
    query_outputs = model(**query_input)
    query_embedding = query_outputs.last_hidden_state[0][-1]
    query_embedding = torch.nn.functional.normalize(query_embedding, p=2, dim=0)

    # compute document embedding the same way
    document_outputs = model(**document_input)
    document_embedding = document_outputs.last_hidden_state[0][-1]
    document_embedding = torch.nn.functional.normalize(document_embedding, p=2, dim=0)

    # compute similarity score (dot product of unit vectors = cosine similarity)
    score = torch.dot(query_embedding, document_embedding)
    print(score)
```
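The example scores a single pair with `last_hidden_state[0][-1]`, which assumes a batch of one with no padding. When several documents are encoded in one padded batch, the position of the last real token differs per row. The Llama-2 tokenizer does not appear to define a pad token by default, so one would have to be configured first. The following is a minimal pure-Python sketch under those assumptions. Nested lists stand in for the model's `last_hidden_state` and the tokenizer's `attention_mask`, and the helper names are hypothetical. The sketch picks each row's last real token and ranks the documents by cosine similarity to the query.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def last_token_index(attention_mask: Sequence[int]) -> int:
    """Index of the final real (mask == 1) token; works for left or right padding."""
    for i in range(len(attention_mask) - 1, -1, -1):
        if attention_mask[i] == 1:
            return i
    raise ValueError("attention mask contains no real tokens")


def pool_last_token(hidden_states: Sequence[Sequence[Vector]],
                    attention_masks: Sequence[Sequence[int]]) -> List[Vector]:
    """Pick one embedding per row: the hidden state at its last real token."""
    return [list(states[last_token_index(mask)])
            for states, mask in zip(hidden_states, attention_masks)]


def l2_normalize(vec: Sequence[float]) -> Vector:
    """Scale a (non-zero) vector to unit length, like normalize(p=2) in the example."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_documents(query_emb: Sequence[float],
                   doc_embs: Sequence[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (document index, cosine score) pairs, best match first."""
    q = l2_normalize(query_emb)
    scores = []
    for idx, doc in enumerate(doc_embs):
        d = l2_normalize(doc)
        scores.append((idx, sum(a * b for a, b in zip(q, d))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy batch: 2 documents, 3 positions, 2-dim hidden states (RepLLaMA uses 4096 dims).
hidden = [
    [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0]],  # last position is padding
    [[0.0, 1.0], [0.0, 3.0], [5.0, 5.0]],  # last position is padding
]
masks = [[1, 1, 0], [1, 1, 0]]
doc_embs = pool_last_token(hidden, masks)
print(rank_documents([0.0, 2.0], doc_embs))  # → [(1, 1.0), (0, 0.0)]
```

Scanning the mask for the last `1` rather than assuming a fixed padding side keeps the pooling correct whether the tokenizer pads on the left or the right.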
## Citation
If you find our paper or models helpful, please consider citing them as follows:
```
@article{rankllama,
title={Fine-Tuning LLaMA for Multi-Stage Text Retrieval},
author={Xueguang Ma and Liang Wang and Nan Yang and Furu Wei and Jimmy Lin},
year={2023},
journal={arXiv:2310.08319},
}
```