🌈 GLM-4-32B-0414 Neon v2 🌈
32B RP finetune with personality and variety
Base Model: THUDM/GLM-4-32B-0414
by Auri/Aurorae

Description: RP finetune of GLM-4-32B-0414. Feels nice, lots of personality, lots of variety, if a bit quirky sometimes. Pretty smart, but sometimes plays dumb for a swipe; just let it be itself. Nice prose, not too Claude-ish or Gemini-ish. A bit of structural repetition happens sometimes, but that's how modern LLMs are, so ¯\_(ツ)_/¯. Seems to like JSON-formatted system prompts.
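
Since it responds well to JSON-formatted system prompts, here is a minimal sketch of what one might look like. The field names (`name`, `persona`, `scenario`, `style`) are purely illustrative, not a schema the model requires.

```python
import json

# Illustrative only: the model has no fixed schema, it just seems to handle
# structured JSON system prompts well. Field names here are made up.
system_prompt = json.dumps(
    {
        "name": "Neon",
        "persona": "A sardonic android bartender on a derelict orbital station.",
        "scenario": "The user wanders in after hours looking for a drink and a story.",
        "style": "Third person, past tense, 2-4 paragraphs per reply.",
    },
    indent=2,
)
print(system_prompt)
```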

Use Cases:
• Character roleplay
• Creative writing
• Story generation
• Interactive fiction

Training Details:
• 77M tokens of synthetic RP and short story data
• 1 epoch training
• 28 hours on 4xRTX 3090 (provided by OwenArli)
• QLoRA + CCE with sequence parallelism

Links:
Hugging Face (Full Weights)
GGUF Quantizations

Usage:
Format: GLM4 instruct formatting
Template:
[gMASK]<sop><|system|>
{system_prompt}<|user|>
{prompt}<|assistant|>
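
A minimal sketch of producing that format with the Transformers chat template, assuming the finetune ships the stock GLM-4 template. The repo id below is the base model, used as a placeholder; point it at the Neon v2 weights instead.

```python
from transformers import AutoTokenizer

# Placeholder repo id (base model); swap in the Neon v2 full-weights repo.
tokenizer = AutoTokenizer.from_pretrained("THUDM/GLM-4-32B-0414")

messages = [
    {"role": "system", "content": "You are Neon, a roleplay partner."},
    {"role": "user", "content": "Set the opening scene."},
]

# Renders [gMASK]<sop><|system|>...<|user|>...<|assistant|> as shown above.
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(text)
```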


Recommended Samplers:
• Temperature: 1.0
• Min-P: 0.1
• Repetition Penalty: 1.03
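
A sketch of those sampler values with vLLM's offline API, which accepts `min_p` and `repetition_penalty` in `SamplingParams`. The model path is a placeholder for wherever the Neon v2 weights live.

```python
from vllm import LLM, SamplingParams

# Placeholder model path; substitute the Neon v2 full-weights repo.
llm = LLM(model="THUDM/GLM-4-32B-0414")

params = SamplingParams(
    temperature=1.0,          # recommended
    min_p=0.1,                # recommended
    repetition_penalty=1.03,  # recommended
    max_tokens=512,
)

outputs = llm.chat(
    [
        {"role": "system", "content": "You are a roleplay partner."},
        {"role": "user", "content": "Set the scene in a rain-soaked neon city."},
    ],
    sampling_params=params,
)
print(outputs[0].outputs[0].text)
```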

Backend Notes:
KoboldCPP: Use latest version + `--overridekv glm4.rope.dimension_count=int:64`
vLLM: Works OOTB on vLLM >= 0.8.5
EXL3: Should work out of the box
llama.cpp: Latest versions support GGUFs OOTB
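
All of these backends expose an OpenAI-compatible endpoint, so a request might look like the sketch below. The base URL and model name are placeholders, and the extra sampler keys are passed via `extra_body` where the backend accepts them (exact names can vary, e.g. llama.cpp's server uses `repeat_penalty`).

```python
from openai import OpenAI

# Placeholder endpoint; point at your local KoboldCPP, llama.cpp server,
# or vLLM OpenAI-compatible API.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="glm-4-32b-0414-neon-v2",  # placeholder; use the name your server reports
    messages=[
        {"role": "system", "content": "You are a roleplay partner."},
        {"role": "user", "content": "Pick up where the scene left off."},
    ],
    temperature=1.0,
    extra_body={"min_p": 0.1, "repetition_penalty": 1.03},  # backend-dependent keys
)
print(resp.choices[0].message.content)
```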

Special Thanks:
• OwenArli for compute and tuning help
• ArliAI for collaboration
• Artus for free inference
• BeaverAI community for feedback

