
	---
	base_model: BEE-spoke-data/smol_llama-220M-GQA
	datasets:
	- JeanKaddour/minipile
	- pszemraj/simple_wikipedia_LM
	- mattymchen/refinedweb-3m
	- BEE-spoke-data/knowledge-inoc-concat-v1
	inference: false
	language:
	- en
	license: apache-2.0
	model_creator: BEE-spoke-data
	model_name: smol_llama-220M-GQA
	pipeline_tag: text-generation
	quantized_by: afrideva
	tags:
	- smol_llama
	- llama2
	- gguf
	- ggml
	- quantized
	- q2_k
	- q3_k_m
	- q4_k_m
	- q5_k_m
	- q6_k
	- q8_0
	widget:
	- example_title: El Microondas
	  text: My name is El Microondas the Wise, and
	- example_title: Kennesaw State University
	  text: Kennesaw State University is a public
	- example_title: Bungie
	  text: Bungie Studios is an American video game developer. They are most famous for
	    developing the award winning Halo series of video games. They also made Destiny.
	    The studio was founded
	- example_title: Mona Lisa
	  text: The Mona Lisa is a world-renowned painting created by
	- example_title: Harry Potter Series
	  text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
	- example_title: Riddle
	  text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
	    have water, but no fish. What am I?

	    Answer:'
	- example_title: Photosynthesis
	  text: The process of photosynthesis involves the conversion of
	- example_title: Story Continuation
	  text: Jane went to the store to buy some groceries. She picked up apples, oranges,
	    and a loaf of bread. When she got home, she realized she forgot
	- example_title: Math Problem
	  text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
	    and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
	    they meet if the distance between the stations is 300 miles?

	    To determine'
	- example_title: Algorithm Definition
	  text: In the context of computer programming, an algorithm is
	---
	# afrideva/smol_llama-220M-GQA-GGUF

	Quantized GGUF model files for [smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) from [BEE-spoke-data](https://huggingface.co/BEE-spoke-data).


	| Name | Quant method | Size |
	| ---- | ---- | ---- |
	| [smol_llama-220m-gqa.fp16.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.fp16.gguf) | fp16 | 436.50 MB |
	| [smol_llama-220m-gqa.q2_k.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q2_k.gguf) | q2_k | 102.60 MB |
	| [smol_llama-220m-gqa.q3_k_m.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q3_k_m.gguf) | q3_k_m | 115.70 MB |
	| [smol_llama-220m-gqa.q4_k_m.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q4_k_m.gguf) | q4_k_m | 137.58 MB |
	| [smol_llama-220m-gqa.q5_k_m.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q5_k_m.gguf) | q5_k_m | 157.91 MB |
	| [smol_llama-220m-gqa.q6_k.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q6_k.gguf) | q6_k | 179.52 MB |
	| [smol_llama-220m-gqa.q8_0.gguf](https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF/resolve/main/smol_llama-220m-gqa.q8_0.gguf) | q8_0 | 232.28 MB |
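
As a rough aid for choosing a file from the table, here is a minimal sketch that picks the largest quant fitting a size budget and builds its download URL. The helper names `pick_quant` and `file_url` are hypothetical, and "largest file that fits" is only a heuristic, on the assumption that larger files are less aggressively quantized. The sizes and the URL pattern are copied from the table.

```python
# File sizes in MB, copied from the table above.
QUANT_SIZES_MB = {
    "fp16": 436.50,
    "q2_k": 102.60,
    "q3_k_m": 115.70,
    "q4_k_m": 137.58,
    "q5_k_m": 157.91,
    "q6_k": 179.52,
    "q8_0": 232.28,
}

REPO_URL = "https://huggingface.co/afrideva/smol_llama-220M-GQA-GGUF"


def pick_quant(budget_mb):
    """Return the name of the largest file that fits in budget_mb, or None.

    Heuristic only: assumes a larger file means a less lossy quantization.
    """
    fitting = {name: size for name, size in QUANT_SIZES_MB.items() if size <= budget_mb}
    if not fitting:
        return None
    return max(fitting, key=fitting.get)


def file_url(quant):
    """Build the download URL, following the naming pattern used in the table."""
    if quant not in QUANT_SIZES_MB:
        raise KeyError(f"unknown quant: {quant}")
    return f"{REPO_URL}/resolve/main/smol_llama-220m-gqa.{quant}.gguf"


if __name__ == "__main__":
    choice = pick_quant(150)  # a 150 MB budget selects q4_k_m (137.58 MB)
    print(choice, file_url(choice))
```

The resulting URL can be fetched with any HTTP client, and the file can then be loaded by a GGUF-compatible runtime.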



	## Original Model Card:
	# smol_llama: 220M GQA

	> model card WIP, more details to come


	A small decoder model with 220M parameters in total. This is the first version of the model.

	- 1024 hidden size, 10 layers
	- GQA (32 query heads, 8 key-value heads), context length 2048
	- trained from scratch on one GPU :)
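
To make the GQA numbers above concrete, the sketch below works out the per-head dimension, the number of query heads sharing each key-value head, and the KV cache size at full context. It assumes the usual Llama convention that the head dimension is the hidden size divided by the number of heads, and an fp16 cache (2 bytes per value); neither is stated in the card, and the function name `kv_cache_bytes` is hypothetical.

```python
# Architecture values taken from the bullet list above.
HIDDEN_SIZE = 1024
NUM_LAYERS = 10
NUM_HEADS = 32        # query heads
NUM_KV_HEADS = 8      # key-value heads (GQA)
CONTEXT_LENGTH = 2048

# Assumption: head_dim = hidden_size / num_heads, as in standard Llama configs.
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS

# Number of query heads that share one key-value head.
QUERIES_PER_KV_HEAD = NUM_HEADS // NUM_KV_HEADS


def kv_cache_bytes(num_kv_heads, bytes_per_value=2):
    """KV cache size for one sequence at full context.

    The leading factor 2 counts keys and values; bytes_per_value=2 assumes fp16.
    """
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * CONTEXT_LENGTH * bytes_per_value


if __name__ == "__main__":
    gqa = kv_cache_bytes(NUM_KV_HEADS)
    mha = kv_cache_bytes(NUM_HEADS)  # hypothetical variant with one KV head per query head
    print(f"head_dim={HEAD_DIM}, queries per KV head={QUERIES_PER_KV_HEAD}")
    print(f"KV cache with GQA: {gqa / 2**20:.0f} MiB, with full MHA: {mha / 2**20:.0f} MiB")
```

Under these assumptions, sharing 8 key-value heads across 32 query heads cuts the cache to a quarter of what full multi-head attention would need.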


	---