---
license: mit
datasets:
- CAS-SIAT-XinHai/CPsyCoun
language:
- zh
base_model:
- internlm/internlm2_5-7b-chat
tags:
- psychology
---

# Model Details

## Model Description

- **Developed by:** AITA
- **Model type:** Full-precision text generation LLM (FP16, GGUF format)
- **Original Model:** https://huggingface.co/CAS-SIAT-XinHai/CPsyCounX
- **Precision:** FP16 (non-quantized, full original precision)

## Repository

- **GGUF Converter:** [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Huggingface Hub:** https://huggingface.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

# Usage

## Method 1: llama.cpp Backend Server + Chatbox

**Step 1: Start the [llama.cpp](https://github.com/ggml-org/llama.cpp) server**

```bash
./llama-server \
  -m /path/to/model.gguf \
  -c 2048 \
  --host 0.0.0.0 \
  --port 8080 \
  --n-gpu-layers 35
# -c 2048            context length
# --host 0.0.0.0     allow remote connections
# --port 8080        server port
# --n-gpu-layers 35  GPU acceleration (if available)
```

**Step 2: Connect via Chatbox**

1. Download [Chatbox](https://github.com/Bin-Huang/chatbox)
2. Configure the API endpoint:
   ```
   API URL: http://localhost:8080
   Model: (leave empty)
   API Type: llama.cpp
   ```
3. Set generation parameters:
   ```json
   {
     "temperature": 0.7,
     "max_tokens": 512,
     "top_p": 0.9
   }
   ```

## Method 2: LM Studio

1. Download [LM Studio](https://lmstudio.ai/)
2. Load the GGUF file:
   - Launch LM Studio
   - Search for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
3. Configure settings:
   ```yaml
   Context Length: 2048
   GPU Offload: Recommended (enable if available)
   Batch Size: 512
   ```
4. Start chatting through the built-in UI

# Precision Details

| Filename       | Precision | Size    | Characteristics               |
|----------------|-----------|---------|-------------------------------|
| CPsyCounX.gguf | FP16      | 15.5 GB | Full original model precision |

# Hardware Requirements

**Minimum:**

- 24 GB RAM (for the 7B model)
- CPU with AVX/AVX2 instruction set support

**Recommended:**

- 32 GB RAM
- CUDA-capable GPU (for acceleration)
- Fast SSD storage (due to the large model size)

# Key Notes

1. Requires a recent llama.cpp build (v3+ recommended)
2. Use `--n-gpu-layers 35` for GPU acceleration (requires a CUDA-enabled build)
3. Initial loading takes longer (2-5 minutes)
4. Requires more memory and storage than quantized versions
5. Use `--mlock` to keep the model in RAM and prevent swapping

# Advantages

- Preserves the original model precision
- Ideal for precision-sensitive applications
- No quantization loss
- Suitable for continued fine-tuning
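
# Example: Querying the Server Programmatically

Recent llama.cpp server builds also expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the model started in Method 1 can be called from scripts as well as from Chatbox. The snippet below is a minimal sketch, assuming the server listens on `localhost:8080` as configured above; the prompts are illustrative placeholders, and the sampling parameters mirror the values suggested for Chatbox.

```python
# Minimal sketch: query the llama.cpp server's OpenAI-compatible endpoint.
# Assumes the server from Method 1 is running on localhost:8080; the system
# and user prompts below are hypothetical examples, not part of the model card.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "你是一位专业的心理咨询师。"},  # assumed system prompt
            {"role": "user", "content": "最近我总是感到焦虑，该怎么办？"},  # example user message
        ],
        # Same sampling parameters as suggested for Chatbox above
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512,
    },
    timeout=120,
)
response.raise_for_status()

# The response follows the OpenAI chat-completions format
print(response.json()["choices"][0]["message"]["content"])
```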