---
license: apache-2.0
tags:
- LucaOne
- Biological Foundation Model
- Unified Nucleic Acid and Protein Language Model
- Biology
- AI4Science
- AI4Biology
- Bio
language:
- en
---

# LucaGPLM

LucaGPLM is the LUCA general-purpose language model: a unified foundation model for nucleic acid and protein sequences.

## Installation

Install the package and its pinned dependencies with pip:

```bash
pip install tokenizers==0.19.1
pip install transformers==4.41.2
pip install lucagplm
```
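
Because the commands above pin `tokenizers` and `transformers` to exact versions, it can help to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; `check_pins` and `PINS` are illustrative names and are not part of `lucagplm`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above.
PINS = {"tokenizers": "0.19.1", "transformers": "4.41.2"}


def check_pins(pins):
    """Return {package: (expected, installed)}; installed is None if the package is missing."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINS).items():
        status = "ok" if installed == expected else "mismatch"
        print(f"{name}: expected {expected}, found {installed} ({status})")
```

A `mismatch` line suggests reinstalling that package with the pinned version shown in the install commands.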

## Usage

```python
from lucagplm import LucaGPLMModel, LucaGPLMTokenizer

# Load the pretrained model and its tokenizer
model = LucaGPLMModel.from_pretrained("LucaGroup/LucaOne-default-step17.6M")
tokenizer = LucaGPLMTokenizer.from_pretrained("LucaGroup/LucaOne-default-step17.6M")

# Nucleic acid example: seq_type="gene"
seq = "ATCG"
inputs = tokenizer(seq, seq_type="gene", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)

# Protein example: seq_type="prot"
seq = "NSQTA"
inputs = tokenizer(seq, seq_type="prot", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
```
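
In the example above, the tokenizer is called with `seq_type="gene"` for the nucleotide string and `seq_type="prot"` for the protein string. When sequences arrive from mixed sources, a small helper can pick the value. The sketch below is a heuristic and not part of `lucagplm`: `guess_seq_type` is a hypothetical name, and it assumes that any sequence made up only of `A`, `C`, `G`, `T`, `U`, and `N` is a nucleic acid.

```python
# Letters treated as nucleotide codes (assumption: DNA/RNA bases plus N for unknown).
NUCLEOTIDE_LETTERS = frozenset("ACGTUN")


def guess_seq_type(seq: str) -> str:
    """Heuristically return "gene" or "prot" for a raw sequence string."""
    letters = set(seq.strip().upper())
    if not letters:
        raise ValueError("empty sequence")
    # Every letter is a nucleotide code -> "gene"; anything else -> "prot".
    return "gene" if letters <= NUCLEOTIDE_LETTERS else "prot"
```

The result can be passed straight to the tokenizer, for example `tokenizer(seq, seq_type=guess_seq_type(seq), return_tensors="pt")`. A short peptide built only from those six letters would be misclassified as `"gene"`, so set `seq_type` explicitly whenever the source of the sequence is known.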