NSFW-API/NSFW_Wan_14b

nsfw

Not-For-All-Audiences

NSFW Wan 14B T2V - Uncensored Text-to-Video Model

Model Description

NSFW Wan 14B T2V is a 14-billion-parameter text-to-video generation model fine-tuned specifically for generating Not Safe For Work (NSFW) content. It was trained with a unified methodology intended to cover the full range of NSFW content and to generate videos with coherent motion natively.

The primary goal of this model is to provide a research and creative tool capable of generating thematically relevant short video clips based on text prompts within the adult content domain. It aims to understand and render a wide array of NSFW scenarios, aesthetics, and actions described in natural language with high fidelity and temporal consistency.

Model Details

Architecture: Text-to-video transformer
Parameters: 14 billion (14.3B as reported on the model page)
Type: Text-to-Video (T2V)
Specialization: NSFW Content Generation

Training Methodology

Unlike previous multi-phase approaches, the 14B fine-tune was run with a single, unified configuration from start to finish, with the aim of stable quality from the very first epoch.

Mixed Dataset: The model was trained on 30k video clips and 20k still images at the same time, rather than in separate phases. The still images provide constant spatial regularization, which helps prevent the anatomical drift and quality collapse that can occur in phased training, so the model learns aesthetics and motion in parallel.
Stable Configuration: The entire 15-epoch run used a stable learning rate with an initial warmup, batch sizes optimized for the 14B architecture, and a training schedule designed for steady, progressive learning.
Training Specifications: Training used 17-frame video clips at 480p resolution.
Outcome: The run produced 15 checkpoints, one per epoch. Compared with older training methods, the model shows improved spatial quality, more stable motion, and more reliable NSFW fidelity, without their legacy artifacts.
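As a rough illustration of the mixed-dataset idea above, the sketch below builds one shuffled epoch that interleaves video clips and still images. The sample counts and the 17-frame clip length come from this card; treating a still image as a 1-frame clip, the function name `mixed_epoch`, and the uniform shuffle are assumptions made for illustration, not a description of the actual training code.

```python
import random
from collections import Counter

NUM_VIDEO_CLIPS = 30_000   # from the model card
NUM_STILL_IMAGES = 20_000  # from the model card
FRAMES_PER_CLIP = 17       # from the model card


def mixed_epoch(seed=0):
    """Return one shuffled epoch of (kind, index, num_frames) samples.

    Assumption: still images are represented as 1-frame clips so that
    images and videos can be drawn from the same shuffled pool.
    """
    samples = [("video", i, FRAMES_PER_CLIP) for i in range(NUM_VIDEO_CLIPS)]
    samples += [("image", i, 1) for i in range(NUM_STILL_IMAGES)]
    random.Random(seed).shuffle(samples)  # seeded for reproducibility
    return samples


epoch = mixed_epoch()
counts = Counter(kind for kind, _, _ in epoch)
video_share = counts["video"] / len(epoch)
print(f"{len(epoch)} samples per epoch, {video_share:.0%} video")
# prints "50000 samples per epoch, 60% video"
```

With this mix, roughly two of every five samples the model sees are still images, which is one way to read the "constant spatial regularization" described above.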

We strongly recommend using wan_14B_e15.safetensors for all general use cases and LoRA training. This final checkpoint represents the most refined state of the model.

Training Data

The model was trained on a dataset comprising the top 1,000 posts from approximately 1,250 distinct NSFW subreddits. This dataset was carefully curated to capture a broad spectrum of adult themes, visual styles, character archetypes, specific kinks, and actions prevalent in these online communities. The video portion of the dataset was sourced from similar communities.

The captions associated with the training data leveraged the language and tagging conventions found within these subreddits. For insights into effective prompting strategies for specific styles or content, please refer to the prompting-guide.json file included in this repository.

Note: Due to the nature of the source material, the training dataset inherently contains explicit adult content.

Files Included

wan_14B_e1.safetensors
... (and all intermediate epochs)
wan_14B_e15.safetensors
prompting-guide.json: A JSON file containing an analysis of common keywords, phrases, and descriptive language associated with the content from the source subreddits. It is designed to help users craft more effective prompts.
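The card does not document the internal layout of prompting-guide.json, so a reasonable first step is to inspect its top-level shape before relying on any particular key. The helper below is a minimal sketch using only the standard library; the function name `summarize_json` and the local file path are assumptions, and nothing here presumes a specific schema.

```python
import json
from pathlib import Path


def summarize_json(path, max_items=5):
    """Describe the top-level shape of a JSON file without assuming a schema."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    summary = {"type": type(data).__name__}
    if isinstance(data, dict):
        summary["num_keys"] = len(data)
        summary["sample_keys"] = list(data)[:max_items]
    elif isinstance(data, list):
        summary["num_items"] = len(data)
    return summary


if __name__ == "__main__":
    guide = Path("prompting-guide.json")  # assumed local download location
    if guide.exists():
        print(summarize_json(guide))
```

Once the structure is known, the relevant entries can be looked up directly when writing prompts.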

How to Use

This model is intended for generating short video clips (typically a few seconds) from descriptive text prompts.

Select a Checkpoint: We recommend the final wan_14B_e15.safetensors checkpoint, which is the most fully trained and most refined state of the model.
No Helper LoRA Needed: The model generates motion natively. You do not need to use any external motion LoRAs.
Craft Your Prompt: Utilize natural language to describe the desired scene, subjects, actions, and style.
Consult prompting-guide.json: For best results, especially when targeting specific sub-community styles or niche fetishes, refer to prompting-guide.json. It provides insight into the terminology and phrasing most likely to elicit the desired output.
Generate: Use your preferred inference pipeline compatible with this model architecture.
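Before wiring a checkpoint into an inference pipeline, it can help to confirm that the downloaded file is intact and roughly the expected size. The sketch below reads only the header of a .safetensors file (an 8-byte little-endian length followed by a JSON header, per the safetensors format) and counts parameters from the tensor shapes. The helper names and the local path are assumptions, and this does not load any weights or run inference.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        return json.loads(f.read(header_len))


def count_parameters(header):
    """Sum the element counts of all tensors listed in the header."""
    total = 0
    for name, entry in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        n = 1
        for dim in entry["shape"]:
            n *= dim
        total += n
    return total


if __name__ == "__main__":
    path = "wan_14B_e15.safetensors"  # assumed local download location
    if os.path.exists(path):
        header = read_safetensors_header(path)
        print(f"{count_parameters(header) / 1e9:.1f}B parameters")
```

A count far from the roughly 14 billion parameters stated on this card would suggest a truncated download or the wrong file.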

The Ideal Base for LoRA Fine-Tuning

While NSFW Wan 14B T2V is a capable standalone model, its greatest strength is as a base for training specialized LoRAs (Low-Rank Adaptations).

We highly recommend using wan_14B_e15.safetensors as the base for all LoRA training.

Its robust, unified training provides a strong and stable understanding of:

Core NSFW Anatomy & Aesthetics: The mixed-data training provides a strong, coherent grasp of anatomy and visual styles from the start.
Coherent Motion & Actions: The video component provides foundational knowledge of common sexual acts and temporal consistency.

Because the base model has a strong, coherent understanding of anatomy and motion from the outset, you can focus your LoRA training dataset exclusively on the specific niche concept, character, artistic style, or unique action you want to master. This leads to more efficient LoRA training and superior results.
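To give a sense of why LoRA training on a strong base is efficient, the sketch below compares the size of one full weight matrix with the two low-rank factors a LoRA adds for it. The layer dimensions and rank are hypothetical values chosen for illustration; they are not taken from this card or from the model's configuration.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


D_IN = D_OUT = 5120  # hypothetical layer size, not from the model card
RANK = 16            # hypothetical LoRA rank

full = D_IN * D_OUT
added = lora_param_count(D_IN, D_OUT, RANK)
ratio = added / full
print(f"full: {full:,}  lora: {added:,}  ratio: {ratio:.3%}")
# prints "full: 26,214,400  lora: 163,840  ratio: 0.625%"
```

Under these assumed sizes, the adapter trains well under 1% of the parameters of the layer it modifies, so a small, focused dataset can be enough when the base already handles anatomy and motion.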

Community & Support

Join our Discord server!

Connect with other users, share your creations, get help with prompting, discuss fine-tuning, and contribute to the community:

https://discord.gg/mjnStFuCYh

We encourage active participation and feedback to help improve future iterations and resources!

Limitations and Bias

NSFW Focus: The model's knowledge is heavily biased towards the content prevalent in the NSFW subreddits it was trained on. It will likely perform poorly on SFW (Safe For Work) prompts.
Specificity & Artifacts: While the model demonstrates high quality, it may still produce visual artifacts, anatomical inaccuracies, or fail to perfectly capture highly complex or nuanced prompts. Video generation is an evolving field.
Bias: The training data reflects the content, biases, preferences, and potentially problematic depictions present in the source NSFW communities. The model may generate content that perpetuates these biases.
Safety: This model does not have built-in safety filters. Users are responsible for the ethical application of the model.
Temporal Coherence: Coherence is significantly improved. However, very long or complex actions might still exhibit some temporal inconsistencies.

Ethical Considerations & Responsible AI

This model is intended for adult users (18+/21+ depending on local regulations) only.

Consent and Harm: This model generates fictional, synthetic media. It must not be used to create non-consensual depictions of real individuals, to impersonate, defame, harass, or generate content that could cause harm.
Legal Use: Users are solely responsible for ensuring that their use of this model and the content they generate complies with all applicable local, national, and international laws and regulations.
Distribution: Exercise extreme caution and responsibility if distributing content generated by this model. Be mindful of platform terms of service and legal restrictions regarding adult content.
No Endorsement: The creators of this model do not endorse or condone the creation or distribution of illegal, unethical, or harmful content.

We strongly recommend users familiarize themselves with responsible AI practices and the potential societal impacts of generative NSFW media.

License

Steal this model!

Disclaimer

The outputs of this model are entirely synthetic and computer-generated. They do not depict real people or events unless explicitly prompted to do so with user-provided data (which is not the intended use of this pre-trained model). The developers of this model are not responsible for the outputs created by users.

Downloads last month: 6,709

GGUF
Model size: 14.3B params
Architecture: wan

Model tree for NSFW-API/NSFW_Wan_14b

Base model: Wan-AI/Wan2.1-T2V-14B
Quantized (5): this model