Recommended options for serving an LLM model
```json
{
  "--gpu-memory-utilization": 0.9,
  "--enable-expert-parallel": true,
  "--max-model-len": 430000,
  "--quantization": "compressed-tensors",
  "--enable-auto-tool-choice": true,
  "--tool-call-parser": "pythonic",
  "--chat-template": "examples/tool_chat_template_llama4_pythonic.jinja"
}
```

RoPE
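As a sketch, the options above map onto a single vLLM launch command. The model name below is an assumption for illustration (the original does not name one); the flags themselves are taken from the table, where `--enable-expert-parallel` and `--enable-auto-tool-choice` are boolean switches passed without a value.

```shell
# Hypothetical launch command; model name is an assumption, flags are from the table above.
vllm serve meta-llama/Llama-4-Scout-17B-16E-Instruct \
  --gpu-memory-utilization 0.9 \
  --enable-expert-parallel \
  --max-model-len 430000 \
  --quantization compressed-tensors \
  --enable-auto-tool-choice \
  --tool-call-parser pythonic \
  --chat-template examples/tool_chat_template_llama4_pythonic.jinja
```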