A look into the future: Wishlist for GLM-5
Hello! Thank you for releasing these great models. Z.Ai has made great progress over the last few months with its GLM models. To really take them to the next level, here is my wishlist.
- More model sizes
GLM4.5 Air and the regular one are hard to run currently, unless you are one of the few with really good PCs and 64GB RAM+. To cover a wide range of use cases, I propose the following model sizes:
GLM-5-4B-Dense (Phones)
GLM-5-30B-A6B (low spec PCs)
GLM-5-70B-A13B (medium spec PCs)
GLM-5-120B-A32B (high spec PCs)
GLM-5-480B-A50B (Datacenter)
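To make the "hard to run" point concrete, here is a rough back-of-envelope estimate of the weight memory each proposed size would need at ~4-bit quantization. The ~4.5 bits-per-weight figure (typical for Q4-class GGUF quants including overhead) and the parameter counts are illustrative assumptions, not official numbers:

```python
# Rough weight-memory estimate for the proposed sizes at ~4-bit quantization.
# The 4.5 bits/weight average is an assumption typical of Q4-class quants.
BYTES_PER_PARAM = 4.5 / 8

proposed = {
    "GLM-5-4B-Dense": 4e9,
    "GLM-5-30B-A6B": 30e9,
    "GLM-5-70B-A13B": 70e9,
    "GLM-5-120B-A32B": 120e9,
    "GLM-5-480B-A50B": 480e9,
}

for name, total_params in proposed.items():
    gb = total_params * BYTES_PER_PARAM / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")
```

By this estimate the 120B tier needs roughly 67.5 GB for weights alone (before KV cache), which is exactly the range where 64GB-RAM machines start to struggle.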
- Native multimodality
Currently, there are separate models for vision and text. However, I think it would make more sense to pretrain the models on a wide variety of data types, including video, text, audio, and images. Gemma 3, for example, is natively multimodal too, not to mention the closed-source LLMs, and it really helps. Multimodality should not hurt text performance; it should enhance it.
- Better context handling
Right now the model handles context fairly well, but it should support 1M tokens of context or more and stay accurate at long context; above 16K tokens, GLM models can get quite inaccurate. To reduce memory usage, I think it would also make sense to implement a better attention mechanism, mainly MLA (Multi-head Latent Attention), which is already used in DeepSeek models and greatly reduces KV-cache memory pressure.
- Working with llama.cpp to provide day-1 support for your models
llama.cpp is by far the most popular inference engine. If you submit PRs to it to support the model and its features (like MTP, which is not yet implemented there), most popular inference backends will support your model to its fullest potential, which reflects positively on your reputation.
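The MLA memory argument above can be sketched with a back-of-envelope comparison of a standard multi-head KV cache against a DeepSeek-style compressed latent cache. All dimensions here (layer count, heads, latent width) are illustrative assumptions, not GLM's actual configuration:

```python
# Back-of-envelope KV-cache comparison: standard multi-head attention cache
# vs an MLA-style compressed latent cache. Dimensions are assumptions.
def kv_cache_bytes(layers, tokens, per_token_dim, bytes_per_elem=2):
    # bytes_per_elem=2 assumes fp16/bf16 cache entries
    return layers * tokens * per_token_dim * bytes_per_elem

layers, tokens = 60, 128_000
n_heads, head_dim = 32, 128
latent_dim = 512  # compressed per-token KV latent (assumption)

standard = kv_cache_bytes(layers, tokens, 2 * n_heads * head_dim)  # K and V
mla = kv_cache_bytes(layers, tokens, latent_dim)  # one shared latent

print(f"standard cache: {standard / 1e9:.1f} GB")
print(f"MLA latent cache: {mla / 1e9:.1f} GB")
print(f"reduction: {standard / mla:.0f}x")
```

With these assumed dimensions the standard cache is ~126 GB at 128K context while the latent cache is ~7.9 GB, a 16x reduction; the exact factor depends entirely on the chosen latent width.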
These are my suggestions. Have a great day!
I also vote for a model with more active parameters but fewer total. 32B active is good, but >300B total makes it hard to run fast. ~12B active is just not enough, and you end up with the intelligence of a 30B dense model.
Plus a tiny request: please no more mirroring the user's reply. Creative writing doesn't need an acknowledgement restating the original message.
BAD (Do not echo like this):
{{user}}: "I made lasagna, finished my drawing, and went to bed."
{{AI}}: "Wait, hold on. You’re telling me you made lasagna, then finished your drawing? Then you went to bed after? That’s crazy!"
GOOD (React with a new thought to the implication):
{{user}}: "I made lasagna, finished my drawing, and went to bed."
{{AI}}: "Sounds like a peaceful day. You earned the rest."
This is the output from a Boston-brand battery programmer. Could you comment on it?