Spaces:

Baraaqasem
/

Chat

Configuration error

App Files Files Community

Chat / gallery /hermes-vllm.yaml

Baraaqasem

Upload 711 files

651d019 verified 13 days ago

raw

history blame

3.38 kB

	---
	name: "hermes-vllm"

	config_file: \|
	backend: vllm
	parameters:
	max_tokens: 8192
	context_size: 8192
	stopwords:
	- "<\|im_end\|>"
	- "<dummy32000>"
	- "<\|eot_id\|>"
	- "<\|end_of_text\|>"
	function:
	disable_no_action: true
	grammar:
	# Uncomment the line below to enable grammar matching for JSON results if the model is breaking
	# the output. This will make the model more accurate and won't break the JSON output.
	# This however, will make parallel_calls not functional (it is a known bug)
	# mixed_mode: true
	disable: true
	parallel_calls: true
	expect_strings_after_json: true
	json_regex_match:
	- "(?s)<tool_call>(.*?)</tool_call>"
	- "(?s)<tool_call>(.*)"
	capture_llm_results:
	- (?s)<scratchpad>(.*?)</scratchpad>
	replace_llm_results:
	- key: (?s)<scratchpad>(.*?)</scratchpad>
	value: ""

	template:
	use_tokenizer_template: true
	chat: \|
	{{.Input -}}
	<\|im_start\|>assistant
	chat_message: \|
	<\|im_start\|>{{if eq .RoleName "assistant"}}assistant{{else if eq .RoleName "system"}}system{{else if eq .RoleName "tool"}}tool{{else if eq .RoleName "user"}}user{{end}}
	{{- if .FunctionCall }}
	<tool_call>
	{{- else if eq .RoleName "tool" }}
	<tool_response>
	{{- end }}
	{{- if .Content}}
	{{.Content }}
	{{- end }}
	{{- if .FunctionCall}}
	{{toJson .FunctionCall}}
	{{- end }}
	{{- if .FunctionCall }}
	</tool_call>
	{{- else if eq .RoleName "tool" }}
	</tool_response>
	{{- end }}<\|im_end\|>
	completion: \|
	{{.Input}}
	function: \|
	<\|im_start\|>system
	You are a function calling AI model.
	Here are the available tools:
	<tools>
	{{range .Functions}}
	{'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
	{{end}}
	</tools>
	You should call the tools provided to you sequentially
	Please use <scratchpad> XML tags to record your reasoning and planning before you call the functions as follows:
	<scratchpad>
	{step-by-step reasoning and plan in bullet points}
	</scratchpad>
	For each function call return a json object with function name and arguments within <tool_call> XML tags as follows:
	<tool_call>
	{"arguments": <args-dict>, "name": <function-name>}
	</tool_call><\|im_end\|>
	{{.Input -}}
	<\|im_start\|>assistant
	# Uncomment to specify a quantization method (optional)
	# quantization: "awq"
	# Uncomment to limit the GPU memory utilization (vLLM default is 0.9 for 90%)
	# gpu_memory_utilization: 0.5
	# Uncomment to trust remote code from huggingface
	# trust_remote_code: true
	# Uncomment to enable eager execution
	# enforce_eager: true
	# Uncomment to specify the size of the CPU swap space per GPU (in GiB)
	# swap_space: 2
	# Uncomment to specify the maximum length of a sequence (including prompt and output)
	# max_model_len: 32768
	# Uncomment and specify the number of Tensor divisions.
	# Allows you to partition and run large models. Performance gains are limited.
	# https://github.com/vllm-project/vllm/issues/1435
	# tensor_parallel_size: 2