---
license: mit
datasets:
- CAS-SIAT-XinHai/CPsyCoun
language:
- zh
base_model:
- internlm/internlm2_5-7b-chat
tags:
- psychology
---

# Model Details

## Model Description

- **Developed by:** AITA
- **Model type:** Full-precision text generation LLM (FP16, GGUF format)
- **Original Model:** https://huggingface.co/CAS-SIAT-XinHai/CPsyCounX
- **Precision:** FP16 (non-quantized, full original precision)

## Repository

- **GGUF Converter:** [llama.cpp](https://github.com/ggerganov/llama.cpp)
- **Huggingface Hub:** https://huggingface.co/Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16

# Usage

## Method 1: llama.cpp Backend Server + Chatbox

**Step 1: Start the [llama.cpp](https://github.com/ggml-org/llama.cpp) server**

```bash
./llama-server \
  -m /path/to/model.gguf \
  -c 2048 \
  --host 0.0.0.0 \
  --port 8080 \
  --n-gpu-layers 35
# -c 2048            context length
# --host 0.0.0.0     allow remote connections
# --port 8080        server port
# --n-gpu-layers 35  GPU acceleration (if available)
```

**Step 2: Connect via Chatbox**

1. Download [Chatbox](https://github.com/Bin-Huang/chatbox)
2. Configure the API endpoint:
   ```
   API URL: http://localhost:8080
   Model: (leave empty)
   API Type: llama.cpp
   ```
3. Set generation parameters:
   ```json
   {
     "temperature": 0.7,
     "max_tokens": 512,
     "top_p": 0.9
   }
   ```

## Method 2: LM Studio

1. Download [LM Studio](https://lmstudio.ai/)
2. Load the GGUF file:
   - Launch LM Studio
   - Search for Slipstream-Max/CPsyCounX-InternLM2-Chat-7B-GGUF-fp16
3. Configure settings:
   ```yaml
   Context Length: 2048
   GPU Offload: Recommended (enable if available)
   Batch Size: 512
   ```
4. Start chatting through the built-in UI

# Precision Details

| Filename       | Precision | Size    | Characteristics               |
|----------------|-----------|---------|-------------------------------|
| CPsyCounX.gguf | FP16      | 15.5 GB | Full original model precision |

# Hardware Requirements

**Minimum:**

- 24 GB RAM (for the 7B model)
- CPU with AVX/AVX2 instruction set support

**Recommended:**

- 32 GB RAM
- CUDA-capable GPU (for acceleration)
- Fast SSD storage (due to the large model size)

# Key Notes

1. Requires a recent llama.cpp build (v3+ recommended)
2. Use `--n-gpu-layers 35` for GPU acceleration (requires a CUDA-enabled build)
3. Initial loading takes longer (2-5 minutes)
4. Requires more memory and storage than quantized versions
5. Use `--mlock` to keep the model in RAM and prevent swapping

# Advantages

- Preserves the original model precision
- Ideal for precision-sensitive applications
- No quantization loss
- Suitable for continued fine-tuning
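
# Example: Querying the Server Programmatically

Recent llama.cpp server builds also expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the model started in Method 1 can be called from scripts as well as from Chatbox. The snippet below is a minimal sketch, assuming the server listens on `localhost:8080` as configured above; the prompts are illustrative placeholders, and the sampling parameters mirror the values suggested for Chatbox.

```python
# Minimal sketch: query the llama.cpp server's OpenAI-compatible endpoint.
# Assumes the server from Method 1 is running on localhost:8080; the system
# and user prompts below are hypothetical examples, not part of the model card.
import requests

response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "system", "content": "你是一位专业的心理咨询师。"},  # assumed system prompt
            {"role": "user", "content": "最近我总是感到焦虑，该怎么办？"},  # example user message
        ],
        # Same sampling parameters as suggested for Chatbox above
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512,
    },
    timeout=120,
)
response.raise_for_status()

# The response follows the OpenAI chat-completions format
print(response.json()["choices"][0]["message"]["content"])
```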