This is a sentence-transformers model based on bert-base-cased, fine-tuned on the English MNLI training data.
We used the training script provided by the sentence-transformers library: https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py
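That script trains with a MultipleNegativesRankingLoss objective over NLI triplets (premise, entailed hypothesis, contradicted hypothesis). The snippet below is a minimal, illustrative sketch of such a setup; the example triplet, batch size, and other hyperparameters are placeholder assumptions, not the exact configuration used.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Build a sentence-transformer from bert-base-cased with mean pooling
word_embedding_model = models.Transformer('bert-base-cased', max_seq_length=75)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode='mean')
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# NLI-style triplets: (anchor, entailed hypothesis, contradicted hypothesis) -- placeholder data
train_examples = [
    InputExample(texts=["A man is playing a guitar.",
                        "Someone is making music.",
                        "The man is asleep."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# MultipleNegativesRankingLoss uses the other in-batch examples as negatives
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)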
Usage (Sentence-Transformers)
Using this model is straightforward once you have sentence-transformers installed:
pip install -U sentence-transformers
Then you can use the model like this:
from sentence_transformers import SentenceTransformer, util
sentences = ["Around 9 million people live in London.", "London is known for its financial district."]
model = SentenceTransformer('kathaem/bert-base-cased-sentence-transformer-mnli-en')
embeddings = model.encode(sentences)
print(embeddings)
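Since util is imported above, you can, for example, score the two example sentences against each other; util.cos_sim also accepts the NumPy arrays returned by encode (a minimal illustrative follow-up, not part of the original snippet):

# Cosine similarity between the two example sentences
print(util.cos_sim(embeddings[0], embeddings[1]))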
Usage (HuggingFace Transformers)
Without the sentence-transformers library, you can use the model as follows: first pass your input through the transformer model, then apply a pooling operation on top of the contextualized word embeddings.
from transformers import AutoTokenizer, AutoModel
import torch
# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
sentences = ["Around 9 million people live in London.", "London is known for its financial district."]
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('kathaem/bert-base-cased-sentence-transformer-mnli-en')
model = AutoModel.from_pretrained('kathaem/bert-base-cased-sentence-transformer-mnli-en')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)
# Perform pooling to get sentence embeddings
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print(sentence_embeddings)
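If you need similarity scores rather than raw embeddings, one common follow-up (an illustrative sketch, not part of the original card) is to L2-normalize the pooled vectors and take their dot product, which equals cosine similarity:

import torch.nn.functional as F

# L2-normalize so that the dot product equals cosine similarity
normalized = F.normalize(sentence_embeddings, p=2, dim=1)
print((normalized[0] @ normalized[1]).item())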
Citation
If you find this model useful in your work, please cite our paper:
@inproceedings{haemmerl-etal-2023-speaking,
title = "Speaking Multiple Languages Affects the Moral Bias of Language Models",
author = {H{\"a}mmerl, Katharina and
Deiseroth, Bjoern and
Schramowski, Patrick and
Libovick{\'y}, Jind{\v{r}}ich and
Rothkopf, Constantin and
Fraser, Alexander and
Kersting, Kristian},
editor = "Rogers, Anna and
Boyd-Graber, Jordan and
Okazaki, Naoaki",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.findings-acl.134/",
doi = "10.18653/v1/2023.findings-acl.134",
pages = "2137--2156",
}