Ollama cannot recognize images

#1
by mj23333 - opened

Running the default s1mini in Ollama fails to recognize images; the model only sees the image placeholder [img-0].
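For reference, the same test can be reproduced outside the CLI by calling Ollama's /api/chat endpoint with a base64-encoded image, as in the sketch below (assumptions: a local server on the default port, and the hypothetical model tag "intern-s1-mini"; substitute whatever tag was actually pulled):

import base64
import requests

# Read the test image and base64-encode it; the Ollama chat API expects
# images as base64 strings in the message's "images" field.
with open("image.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://127.0.0.1:11434/api/chat",
    json={
        "model": "intern-s1-mini",  # hypothetical tag; use the local one
        "messages": [
            {
                "role": "user",
                "content": "Please describe this image.",
                "images": [image_b64],
            }
        ],
        "stream": False,  # return one JSON object instead of a stream
    },
)
print(resp.json()["message"]["content"])

With a working vision model the reply should describe the image; here the model behaves as text-only and responds as if it only received the [img-0] placeholder.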

Intern Large Models org

@mj23333 Hi, how did you test it? Please provide sample code and data to reproduce.

Here is the run log:
time=2025-09-03T16:08:15.699+08:00 level=INFO source=routes.go:1331 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:1h0m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:F:\ollama OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:true OLLAMA_NEW_ESTIMATES:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_SCHED_SPREAD:false ROCR_VISIBLE_DEVICES:]"
time=2025-09-03T16:08:15.763+08:00 level=INFO source=images.go:477 msg="total blobs: 105"
time=2025-09-03T16:08:15.775+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2025-09-03T16:08:15.782+08:00 level=INFO source=routes.go:1384 msg="Listening on 127.0.0.1:11434 (version 0.11.8)"
time=2025-09-03T16:08:15.782+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-09-03T16:08:15.782+08:00 level=INFO source=gpu_windows.go:167 msg=packages count=2
time=2025-09-03T16:08:15.783+08:00 level=INFO source=gpu_windows.go:214 msg="" package=0 cores=24 efficiency=0 threads=24
time=2025-09-03T16:08:15.783+08:00 level=INFO source=gpu_windows.go:214 msg="" package=1 cores=24 efficiency=0 threads=24
time=2025-09-03T16:08:16.131+08:00 level=INFO source=gpu.go:321 msg="detected OS VRAM overhead" id=GPU-60edb41c-b874-e19f-154d-2a20d55a228e library=cuda compute=7.5 driver=12.8 name="NVIDIA GeForce RTX 2080 Ti" overhead="96.3 MiB"
time=2025-09-03T16:08:16.333+08:00 level=INFO source=gpu.go:321 msg="detected OS VRAM overhead" id=GPU-aafd59aa-59b7-2231-065b-f21c126d060b library=cuda compute=7.5 driver=12.8 name="NVIDIA GeForce RTX 2080 Ti" overhead="426.9 MiB"
time=2025-09-03T16:08:16.344+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-60edb41c-b874-e19f-154d-2a20d55a228e library=cuda variant=v12 compute=7.5 driver=12.8 name="NVIDIA GeForce RTX 2080 Ti" total="22.0 GiB" available="20.8 GiB"
time=2025-09-03T16:08:16.345+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-aafd59aa-59b7-2231-065b-f21c126d060b library=cuda variant=v12 compute=7.5 driver=12.8 name="NVIDIA GeForce RTX 2080 Ti" total="22.0 GiB" available="20.8 GiB"
[GIN] 2025/09/03 - 16:08:42 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2025/09/03 - 16:08:42 | 200 | 144.9071ms | 127.0.0.1 | POST "/api/show"
time=2025-09-03T16:08:43.094+08:00 level=INFO source=server.go:388 msg="starting runner" cmd="C:\Users\mj\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model F:\ollama\blobs\sha256-88543e7b3d9bafcf0688beb5060d92a75f51e9d6f7847cc532a6e6dff5c7af9c --port 55085"
time=2025-09-03T16:08:43.337+08:00 level=INFO source=server.go:493 msg="system memory" total="447.7 GiB" free="421.4 GiB" free_swap="454.7 GiB"
time=2025-09-03T16:08:43.338+08:00 level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=F:\ollama\blobs\sha256-88543e7b3d9bafcf0688beb5060d92a75f51e9d6f7847cc532a6e6dff5c7af9c library=cuda parallel=1 required="15.9 GiB" gpus=1
time=2025-09-03T16:08:43.340+08:00 level=INFO source=server.go:533 msg=offload library=cuda layers.requested=-1 layers.model=37 layers.offload=37 layers.split=[37] memory.available="[20.8 GiB]" memory.gpu_overhead="0 B" memory.required.full="15.9 GiB" memory.required.partial="15.9 GiB" memory.required.kv="576.0 MiB" memory.required.allocations="[15.9 GiB]" memory.weights.total="14.1 GiB" memory.weights.repeating="12.9 GiB" memory.weights.nonrepeating="1.2 GiB" memory.graph.full="384.0 MiB" memory.graph.partial="384.0 MiB"
time=2025-09-03T16:08:43.420+08:00 level=INFO source=runner.go:1006 msg="starting ollama engine"
time=2025-09-03T16:08:43.464+08:00 level=INFO source=runner.go:1043 msg="Server listening on 127.0.0.1:55085"
time=2025-09-03T16:08:43.478+08:00 level=INFO source=runner.go:925 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:48 GPULayers:37[ID:GPU-aafd59aa-59b7-2231-065b-f21c126d060b Layers:37(0..36)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2025-09-03T16:08:43.566+08:00 level=INFO source=ggml.go:130 msg="" architecture=qwen3 file_type=F16 name=Intern-S1-Mini description="" num_tensors=399 num_key_values=29
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes, ID: GPU-60edb41c-b874-e19f-154d-2a20d55a228e
Device 1: NVIDIA GeForce RTX 2080 Ti, compute capability 7.5, VMM: yes, ID: GPU-aafd59aa-59b7-2231-065b-f21c126d060b
load_backend: loaded CUDA backend from C:\Users\mj\AppData\Local\Programs\Ollama\lib\ollama\ggml-cuda.dll
load_backend: loaded CPU backend from C:\Users\mj\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-skylakex.dll
time=2025-09-03T16:08:43.903+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 CUDA.1.ARCHS=500,600,610,700,750,800,860,870,890,900,1200 CUDA.1.USE_GRAPHS=1 CUDA.1.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2025-09-03T16:08:44.545+08:00 level=INFO source=ggml.go:486 msg="offloading 36 repeating layers to GPU"
time=2025-09-03T16:08:44.545+08:00 level=INFO source=ggml.go:492 msg="offloading output layer to GPU"
time=2025-09-03T16:08:44.545+08:00 level=INFO source=ggml.go:497 msg="offloaded 37/37 layers to GPU"
time=2025-09-03T16:08:44.546+08:00 level=INFO source=backend.go:310 msg="model weights" device=CUDA1 size="14.1 GiB"
time=2025-09-03T16:08:44.547+08:00 level=INFO source=backend.go:315 msg="model weights" device=CPU size="1.2 GiB"
time=2025-09-03T16:08:44.547+08:00 level=INFO source=backend.go:321 msg="kv cache" device=CUDA1 size="576.0 MiB"
time=2025-09-03T16:08:44.547+08:00 level=INFO source=backend.go:332 msg="compute graph" device=CUDA1 size="280.0 MiB"
time=2025-09-03T16:08:44.547+08:00 level=INFO source=backend.go:337 msg="compute graph" device=CPU size="8.0 MiB"
time=2025-09-03T16:08:44.547+08:00 level=INFO source=backend.go:342 msg="total memory" size="16.1 GiB"
time=2025-09-03T16:08:44.548+08:00 level=INFO source=sched.go:473 msg="loaded runners" count=1
time=2025-09-03T16:08:44.548+08:00 level=INFO source=server.go:1236 msg="waiting for llama runner to start responding"
time=2025-09-03T16:08:44.548+08:00 level=INFO source=server.go:1270 msg="waiting for server to become available" status="llm server loading model"
time=2025-09-03T16:08:52.839+08:00 level=INFO source=server.go:1274 msg="llama runner started in 9.74 seconds"
[GIN] 2025/09/03 - 16:08:52 | 200 | 10.1202716s | 127.0.0.1 | POST "/api/generate"
[GIN] 2025/09/03 - 16:10:23 | 200 | 8.2599869s | 127.0.0.1 | POST "/api/chat"
Here is the Q&A with the model:

Please describe this image C:/Users/mj/Desktop/tp/image(1).png
Added image 'C:/Users/mj/Desktop/tp/image(1).png'

Okay, the user is asking me to describe an image, and the image has already been uploaded. First, I need to confirm the image's content. However, based on the hint the user provided, the content is probably related to the previously mentioned "[img-0]". The user may assume I can see the image, but in fact I cannot process or view images. In that case, I should politely tell the user so and ask whether I can help in some other way.

Next, I need to check whether the user has other questions or needs more information. The user may have uploaded the image for me to analyze, but under my rules I cannot perform image-recognition tasks. I should stay friendly and offer alternatives, such as describing textual content or explaining related concepts.

Also, the user may not be fully aware of my limitations, so I need to state clearly that I cannot view images but can assist with other matters, while keeping the reply concise and clear so the user is not confused.

Finally, maintain a professional and friendly tone, make sure the user understands the scope of my abilities, and remain willing to offer other help. That way the problem is addressed and the user experience stays positive.

I cannot view or analyze image content. If you have any other questions or need help, feel free to let me know!
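Note that the load request in the log above shows an empty ProjectorPath and architecture=qwen3, which suggests the GGUF was loaded without a vision projector; that would explain why only the [img-0] placeholder reaches the model. One way to check whether the local model advertises vision support is to query /api/show, as in this sketch (it assumes recent Ollama versions that report a "capabilities" list; the model tag is again hypothetical):

import requests

info = requests.post(
    "http://127.0.0.1:11434/api/show",
    json={"model": "intern-s1-mini"},  # hypothetical tag; use the local one
).json()

# Multimodal models should list "vision" here; if it is absent, the CLI
# still accepts the image path but the model only sees a placeholder
# such as [img-0].
print(info.get("capabilities", []))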
