EgorKim/llm-course-hw1

---
license: mit
datasets:
- IgorVolochay/russian_jokes
language:
- ru
pipeline_tag: text-generation
---
A small LLM for generating unfunny jokes (for now). Trained on the [RussianJokes](https://huggingface.co/datasets/IgorVolochay/russian_jokes) dataset. Built as part of a VK Education course project.

# Architecture
10.55M parameters, SwiGLU feed-forward layers, grouped-query attention (GQA), ALiBi positional biases, byte-level BPE tokenizer
- n_layer=6
- n_head=6
- n_kv_head=3
- hidden_dim=384
- intermediate_dim=1024
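
The hyperparameters above imply a few derived quantities. The sketch below is not the repository's code: `ModelConfig` and the helper names are hypothetical, the linear layers are assumed to be bias-free, consecutive query heads are assumed to share a key/value head, and the ALiBi slopes follow the recipe from the ALiBi reference implementation, which may differ from what this model uses.

```python
import math
from dataclasses import dataclass


@dataclass
class ModelConfig:
    # Values copied from the list above; the class itself is hypothetical.
    n_layer: int = 6
    n_head: int = 6
    n_kv_head: int = 3
    hidden_dim: int = 384
    intermediate_dim: int = 1024


def head_dim(cfg):
    return cfg.hidden_dim // cfg.n_head


def kv_head_for_query_head(cfg):
    # GQA (assumed grouping): consecutive query heads share one key/value head.
    group = cfg.n_head // cfg.n_kv_head
    return [q // group for q in range(cfg.n_head)]


def block_weight_count(cfg):
    # Assumption: bias-free linear layers; normalization weights are ignored.
    d, kv = cfg.hidden_dim, cfg.n_kv_head * head_dim(cfg)
    attention = d * d + 2 * d * kv + d * d   # Q, K, V and output projections
    swiglu = 3 * d * cfg.intermediate_dim    # gate, up and down projections
    return attention + swiglu


def alibi_slopes(n_head):
    # Slope recipe from the ALiBi reference code, including its interleaving
    # rule for head counts that are not powers of two.
    def power_of_two_slopes(n):
        start = 2.0 ** (-8.0 / n)
        return [start ** (i + 1) for i in range(n)]

    if n_head & (n_head - 1) == 0:
        return power_of_two_slopes(n_head)
    closest = 2 ** math.floor(math.log2(n_head))
    extra = power_of_two_slopes(2 * closest)[0::2][: n_head - closest]
    return power_of_two_slopes(closest) + extra


cfg = ModelConfig()
print(head_dim(cfg))                          # 64
print(kv_head_for_query_head(cfg))            # [0, 0, 1, 1, 2, 2]
print(cfg.n_layer * block_weight_count(cfg))  # 9732096
print(alibi_slopes(cfg.n_head))
```

Under these assumptions the six transformer blocks hold about 9.73M of the reported 10.55M parameters. The remainder would be embeddings, normalization weights, and any biases; the vocabulary size is not stated here, so the total cannot be reproduced exactly.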

# How to use
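```python
import torch

# ByteLevelBPETokenizer and TransformerForCausalLM are not imported here; they are
# assumed to be the custom classes from the course homework code.
REPO_NAME = "EgorKim/llm-course-hw1"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = ByteLevelBPETokenizer.from_pretrained(REPO_NAME)
check_model = TransformerForCausalLM.from_pretrained(REPO_NAME)
check_model = check_model.to(device).eval()

text = "Штирлиц пришел домой"
input_ids = torch.tensor(tokenizer.encode(text), device=device)
# Add a batch dimension and sample up to 200 new tokens with top-k sampling.
model_output = check_model.generate(
    input_ids[None, :],
    max_new_tokens=200,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=10,
)
print(tokenizer.decode(model_output[0].tolist()))
```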
Output:
```
Штирлиц пришел домой к врачу и видит, что он пришел с ней.
```
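
The usage snippet passes `do_sample=True, top_k=10` to `generate`. The model's `generate` method is not shown in this card, so the following is only a general, dependency-free illustration of top-k sampling for a single step; `sample_top_k` and the example logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, k, rng=random):
    # Keep the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits only (shifted by the max for stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    # Draw one token id in proportion to those weights.
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 0.5, -1.0, 3.0, 0.0]  # hypothetical next-token scores
rng = random.Random(0)
draws = [sample_top_k(logits, k=2, rng=rng) for _ in range(1000)]
print(sorted(set(draws)))  # only ids 0 and 3 (the two largest logits) can appear
```

A smaller `top_k` restricts sampling to the most likely tokens and makes output more conservative; a larger one allows more varied continuations.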