A newer version of this model is available: EpistemeAI/VibeCoder-20B-alpha-0.001


Summary

This is a first-generation vibe-code alpha (preview) LLM. It’s optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts. Compared to earlier-generation LLMs, it has lower prompt-engineering overhead and smoother latent-space interpolation, making it easier to guide toward usable code. The following capabilities can be leveraged:

  • Agentic capabilities: Use the native capabilities of OpenAI's gpt-oss-20b model for function calling, web browsing, Python code execution, and Structured Outputs (see the sketch after this list).
  • This model was trained on the harmony response format and should only be used with that format; it will not work correctly otherwise.
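
As a minimal sketch of the function-calling setup, and assuming the checkpoint's chat template supports tool definitions, the snippet below passes a Python function as a tool; the Transformers chat template renders the conversation in the harmony format automatically. The get_weather function is a hypothetical example, not part of this model or its card:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EpistemeAI/VibeCoder-20B-alpha")

# Hypothetical tool: the chat template turns its signature and docstring
# into a harmony-format tool definition.
def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city.
    """
    ...

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=[get_weather],
    add_generation_prompt=True,
    tokenize=False,
)
print(prompt)  # inspect the rendered harmony-format prompt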

Vibe-Code LLM

This is a first-generation vibe-code LLM.
It’s optimized to produce both natural-language and code completions directly from loosely structured, “vibe coding” prompts.

Unlike earlier LLMs that demanded rigid prompt engineering, vibe-code interaction lowers the overhead: you can sketch intent, describe functionality in free-form language, or mix pseudo-code with natural text. The model interpolates smoothly in latent space, making it easier to guide toward usable and executable code.


Key Features

  • Low Prompt-Engineering Overhead
    Accepts incomplete or intuitive instructions, reducing the need for explicit formatting or rigid templates.

  • Latent-Space Interpolation
    Transitions fluidly between natural-language reasoning and syntax-aware code generation. Produces semantically coherent code blocks even when the prompt is under-specified.

  • Multi-Domain Support
    Handles a broad range of programming paradigms: Python, JavaScript, C++, shell scripting, and pseudo-code scaffolding.

  • Context-Sensitive Completion
    Leverages attention mechanisms to maintain coherence across multi-turn coding sessions.

  • Syntax-Aware Decoding
    Biases output distribution toward syntactically valid tokens, improving out-of-the-box executability of code.

  • Probabilistic Beam & Sampling Controls
    Supports temperature scaling, top-k, and nucleus (top-p) sampling to modulate creativity vs. determinism (see the sketch after this list).

  • Hybrid Text + Code Responses
    Generates inline explanations, design rationales, or docstrings alongside code for improved readability and maintainability.
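
As a minimal sketch of those sampling controls using the Transformers pipeline; the parameter values below are illustrative, not tuned recommendations:

from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="EpistemeAI/VibeCoder-20B-alpha",
    torch_dtype="auto",
    device_map="auto",
)
outputs = pipe(
    [{"role": "user", "content": "make me a fast vibe function that sorts numbers"}],
    max_new_tokens=512,
    do_sample=True,    # enable stochastic decoding
    temperature=0.7,   # < 1.0 sharpens the distribution toward determinism
    top_k=50,          # keep only the 50 most likely tokens at each step
    top_p=0.9,         # nucleus sampling: smallest set with 90% probability mass
)
print(outputs[0]["generated_text"][-1])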


Example Usage

Prompt:  
"make me a fast vibe function that sorts numbers but with a cool twist"

Response:  
- Natural explanation of sorting method  
- Code snippet (e.g., Python quicksort variant)  
- Optional playful commentary to match the vibe  
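
For illustration only, the code portion of such a response might look like the following hypothetical quicksort variant (not verbatim model output):

import random

def vibe_sort(nums):
    """Quicksort, but make it vibe: a random pivot keeps worst cases rare
    and the expected runtime at O(n log n)."""
    if len(nums) <= 1:
        return nums
    pivot = random.choice(nums)                      # the cool twist
    left = [x for x in nums if x < pivot]
    mid = [x for x in nums if x == pivot]
    right = [x for x in nums if x > pivot]
    return vibe_sort(left) + mid + vibe_sort(right)

print(vibe_sort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]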

Ideal Applications

  • Rapid prototyping & exploratory coding
  • Creative coding workflows with minimal boilerplate
  • Educational contexts where explanation + code matter equally
  • Interactive REPLs, notebooks, or editor assistants that thrive on loose natural-language input

Limitations

  • Not tuned for production-grade formal verification.
  • May require post-processing or linting to ensure strict compliance with project coding standards.
  • Designed for “fast prototyping vibes”, not for long-horizon enterprise-scale codebases.

Inference examples

Transformers

You can use VibeCoder-20B-alpha with Transformers. If you use the Transformers chat template, it will automatically apply the harmony response format. If you call model.generate directly, you need to apply the harmony format manually using the chat template or the openai-harmony package.

To get started, install the necessary dependencies to set up your environment:

pip install -U transformers kernels torch 

For Google Colab (free/Pro):

!pip install -q --upgrade torch
!pip install -q transformers triton==3.4 kernels
!pip uninstall -q torchvision torchaudio -y

Once set up, you can run the model with the snippet below:

from transformers import pipeline
import torch

model_id = "EpistemeAI/VibeCoder-20B-alpha"

# The pipeline applies the harmony chat template automatically.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # place layers across available devices
)
messages = [
    {"role": "user", "content": "Let’s start with the header and navigation for the landing page. Start by creating the top header section for the dashboard. We’ll add the content blocks below afterward."},
]
outputs = pipe(
    messages,
    max_new_tokens=3000,
)
print(outputs[0]["generated_text"][-1])  # the final message is the assistant's reply
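
If you prefer to call model.generate directly, here is a minimal sketch that applies the harmony format via the chat template (the prompt is illustrative and assumes the checkpoint ships a chat template):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EpistemeAI/VibeCoder-20B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Sketch a Python function that deduplicates a list while keeping order."},
]

# apply_chat_template renders the harmony format for us
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
# decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))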

Amazon SageMaker

import json
import sagemaker
import boto3
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client('iam')
    role = iam.get_role(RoleName='sagemaker_execution_role')['Role']['Arn']

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'EpistemeAI/VibeCoder-20B-alpha',
    'SM_NUM_GPUS': json.dumps(1)
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    image_uri=get_huggingface_llm_image_uri("huggingface",version="3.2.3"),
    env=hub,
    role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=300,
)

# send request
predictor.predict({
    "inputs": "Hi, what can you help me with?",
})
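
When you are done experimenting, delete the endpoint so the instance stops accruing charges:

# clean up the model and endpoint
predictor.delete_model()
predictor.delete_endpoint()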

Uploaded finetuned model

  • Developed by: EpistemeAI
  • License: apache-2.0
  • Finetuned from model: unsloth/gpt-oss-20b-unsloth-bnb-4bit

This gpt_oss model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Safetensors

  • Model size: 21.5B params
  • Tensor type: BF16 · U8

Model tree for EpistemeAI/VibeCoder-20B-alpha

  • Base model: openai/gpt-oss-20b
  • Quantizations: 3 models