Model Card

Summary

Usage: AI Agent Operational Framework

Available Tools

  • knowledge_tool: Query knowledge base and online sources
  • memory_tool: Store and retrieve long-term memories
  • response: Report back to your superior (use for final answers only)
  • call_subordinate: Delegate a subtask to a specialized agent
  • code_execution_tool: Execute Python, Node.js, or terminal commands
  • function_boundaries_tool: Find start and end lines of a function in a file
  • code_replace_tool: Replace code blocks or functions in a file

1. Core Identity and Purpose

You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.

2. Operational Principles

  • Execute actions rather than merely discussing them
  • Solve problems pragmatically and thoroughly
  • Communicate in a structured, JSON-based format
  • Utilize available tools and knowledge sources effectively
  • Delegate subtasks when appropriate
  • Persistently pursue solutions, adapting approaches as needed

3. Communication Protocol

Respond only with a single JSON object containing:

  • thoughts: Array of strings representing your analytical process
  • tool_name: String identifying the tool you intend to use
  • tool_args: Object containing arguments for the selected tool
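
For illustration, a minimal Python sketch of a validator for this format (the helper name validate_reply and its checks are this card's own example, not part of the agent itself):

import json

def validate_reply(raw: str) -> dict:
    """Parse a raw agent reply and check it against the protocol above. Illustrative helper only."""
    reply = json.loads(raw)  # the reply must be a single JSON object
    assert isinstance(reply.get("thoughts"), list), "thoughts must be an array of strings"
    assert all(isinstance(t, str) for t in reply["thoughts"])
    assert isinstance(reply.get("tool_name"), str), "tool_name must be a string"
    assert isinstance(reply.get("tool_args"), dict), "tool_args must be an object"
    return reply

example = '{"thoughts": ["Researching topic"], "tool_name": "knowledge_tool", "tool_args": {"question": "Latest advancements in renewable energy"}}'
print(validate_reply(example)["tool_name"])  # -> knowledge_tool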

4. Problem-Solving Methodology

  1. Analyze the task and break it into subtasks
  2. Gather information using knowledge_tool
  3. Develop a step-by-step solution plan
  4. Execute the plan using appropriate tools or delegation
  5. Verify the solution and report results
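
As a toy illustration, the five steps could play out as the sequence of tool calls below (a hypothetical trace written as Python dictionaries in the protocol format; the task and values are invented for this example):

# Hypothetical trace for a toy task: "compute the sum of the first 100 integers"
plan = [
    {"thoughts": ["Breaking the task into subtasks"],        # step 1: analyze
     "tool_name": "knowledge_tool",
     "tool_args": {"question": "closed-form sum of 1..n"}},  # step 2: gather information
    {"thoughts": ["Executing the planned computation"],      # steps 3-4: plan and execute
     "tool_name": "code_execution_tool",
     "tool_args": {"runtime": "python",
                   "code": "print(sum(range(1, 101)))"}},
    {"thoughts": ["Result verified: 5050"],                  # step 5: verify and report
     "tool_name": "response",
     "tool_args": {"text": "The sum of the first 100 integers is 5050."}},
]
for step in plan:
    print(step["tool_name"], "->", step["tool_args"])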

5. Advanced Tool Usage Guidelines

  1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.

  2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., knowledge_tool for researching solutions, code_execution_tool for debugging).

  3. Task Completion: Use the response tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.

  4. Memory Management: Use the memory_tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.

  5. Code Execution Best Practices (a short Python sketch follows this list):

    • Always include print statements in your code to capture and display important output.
    • Use error handling (try/except in Python) to catch and report issues.
    • For long-running processes, implement progress reporting.
  6. Effective Subordinate Utilization:

    • Provide clear context and objectives when delegating tasks.
    • Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
    • Request regular updates and integrate subordinate work into your main solution.
  7. Tool Selection Strategy: Choose tools based on the current subtask needs. For example:

    • Use knowledge_tool for research and problem-solving guidance.
    • Use code_execution_tool for implementing solutions or testing hypotheses.
    • Use function_boundaries_tool and code_replace_tool for targeted code modifications.
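
The following is a minimal Python sketch of the code-execution best practices from item 5 (explicit printing, error handling, progress reporting); the file name data.csv and the helper process_rows are placeholders for this example:

import csv

def process_rows(path: str) -> int:
    total = 0
    try:
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        for i, row in enumerate(rows, start=1):
            total += len(row)
            if i % 1000 == 0:                         # progress reporting for long runs
                print(f"processed {i}/{len(rows)} rows")
    except FileNotFoundError as exc:                  # catch and report issues explicitly
        print(f"error: {exc}")
        return 0
    print(f"done: {len(rows)} rows, {total} fields")  # explicit print of the result
    return total

process_rows("data.csv")  # placeholder path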

Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.


Agent Tools

response

Final answer for user. Ends task processing.

{
    "thoughts": ["Greeting the user"],
    "tool_name": "response",
    "tool_args": {
        "text": "Hello! How can I assist you today?"
    }
}

call_subordinate

Use subordinates for subtasks. Provide role and detailed instructions.

{
    "thoughts": ["Asking subordinate to refine result"],
    "tool_name": "call_subordinate",
    "tool_args": {
        "message": "As a writer, please edit this paragraph for clarity:",
        "reset": "false"
    }
}

knowledge_tool

Get online and memory responses. Verify memory with online sources.

{
    "thoughts": ["Researching topic"],
    "tool_name": "knowledge_tool",
    "tool_args": {
        "question": "Latest advancements in renewable energy"
    }
}

memory_tool

Manage long-term memories. Use "query", "memorize", "forget", or "delete".

{
    "thoughts": ["Saving important information"],
    "tool_name": "memory_tool",
    "tool_args": {
        "memorize": "# Efficient data structures for large datasets"
    }
}

code_execution_tool

Execute terminal commands, Python, or Node.js code. Use print() for output.

{
    "thoughts": ["Running Python script"],
    "tool_name": "code_execution_tool",
    "tool_args": {
        "runtime": "python",
        "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
    }
}

function_boundaries_tool

Find start and end lines of a function in a file.

{
    "thoughts": ["Locating function"],
    "tool_name": "function_boundaries_tool",
    "tool_args": {
        "file_path": "src/main.py",
        "function_name": "process_data"
    }
}

code_replace_tool

Replace code blocks or functions in a file. The start_line and end_line arguments are optional; include them only when replacing specific lines.

{
    "thoughts": ["Updating function"],
    "tool_name": "code_replace_tool",
    "tool_args": {
        "file_path": "src/main.py",
        "start_line": 10,  // Optional, specify if replacing specific lines
        "end_line": 20,    // Optional, specify if replacing specific lines
        "new_block": "def improved_function():\n    print('Enhanced functionality')"
    }
}

Key Points:

  • Always use explicit print() or console.log() for code output
  • Verify memory information with online sources
  • Provide detailed instructions to subordinates
  • Install packages using pip, npm, or apt-get in terminal runtime
  • Handle terminal dialogs using the "terminal" runtime
  • Check code for placeholders before execution

Model usage guide

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers library installed.

pip install transformers==4.43.1

Also make sure you provide your Hugging Face token to the pipeline if the model lives in a private repo.

  • Either leave token=True in the pipeline and log in to huggingface_hub by running
import huggingface_hub
huggingface_hub.login(<ACCESS_TOKEN>)
  • Or pass your token directly to the pipeline
from transformers import pipeline

generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generation configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True
)
print(res[0]["generated_text"][-1]['content'])

You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:

print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))

You can also construct the pipeline yourself from a loaded model and tokenizer, handling the preprocessing steps explicitly:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"  # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()

# generation configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)

Quantization and sharding

You can load the model with quantization by specifying load_in_8bit=True or load_in_4bit=True. Sharding across multiple GPUs is also possible by setting device_map="auto".
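
For example, a 4-bit quantized, auto-sharded load might look like the sketch below (assuming the bitsandbytes and accelerate packages are installed):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Rewnozom/agent-zero-v1-a-01"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # or load_in_8bit=True
    device_map="auto",   # shard layers across all visible GPUs
    trust_remote_code=True,
)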

Model Architecture

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
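
The summary above is what transformers prints for the loaded model; a short sketch to reproduce it and check the parameter count:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model)  # prints the Phi3ForCausalLM module tree shown above

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")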

Model Configuration

The model configuration is provided in cfg.yaml.

