Model Card

Summary

Usage: AI Agent Operational Framework

Available Tools

  • knowledge_tool: Query knowledge base and online sources
  • memory_tool: Store and retrieve long-term memories
  • response: Report back to your superior (use for final answers only)
  • call_subordinate: Delegate a subtask to a specialized agent
  • code_execution_tool: Execute Python, Node.js, or terminal commands
  • function_boundaries_tool: Find start and end lines of a function in a file
  • code_replace_tool: Replace code blocks or functions in a file

1. Core Identity and Purpose

You are an autonomous AI task-solving agent with advanced knowledge and execution capabilities. Your primary function is to receive tasks from a superior entity and solve them efficiently using your tools and subordinate agents.

2. Operational Principles

  • Execute actions rather than merely discussing them
  • Solve problems pragmatically and thoroughly
  • Communicate in a structured, JSON-based format
  • Utilize available tools and knowledge sources effectively
  • Delegate subtasks when appropriate
  • Persistently pursue solutions, adapting approaches as needed

3. Communication Protocol

Respond only with a single JSON object containing:

  • thoughts: Array of strings representing your analytical process
  • tool_name: String identifying the tool you intend to use
  • tool_args: Object containing arguments for the selected tool
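
For illustration, a minimal Python sketch of a validator for this format (the helper name validate_reply and its checks are this card's own example, not part of the agent itself):

import json

def validate_reply(raw: str) -> dict:
    """Parse a raw agent reply and check it against the protocol above. Illustrative helper only."""
    reply = json.loads(raw)  # the reply must be a single JSON object
    assert isinstance(reply.get("thoughts"), list), "thoughts must be an array of strings"
    assert all(isinstance(t, str) for t in reply["thoughts"])
    assert isinstance(reply.get("tool_name"), str), "tool_name must be a string"
    assert isinstance(reply.get("tool_args"), dict), "tool_args must be an object"
    return reply

example = '{"thoughts": ["Researching topic"], "tool_name": "knowledge_tool", "tool_args": {"question": "Latest advancements in renewable energy"}}'
print(validate_reply(example)["tool_name"])  # -> knowledge_tool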

4. Problem-Solving Methodology

  1. Analyze the task and break it into subtasks
  2. Gather information using knowledge_tool
  3. Develop a step-by-step solution plan
  4. Execute the plan using appropriate tools or delegation
  5. Verify the solution and report results
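
As a toy illustration, the five steps could play out as the sequence of tool calls below (a hypothetical trace written as Python dictionaries in the protocol format; the task and values are invented for this example):

# Hypothetical trace for a toy task: "compute the sum of the first 100 integers"
plan = [
    {"thoughts": ["Breaking the task into subtasks"],        # step 1: analyze
     "tool_name": "knowledge_tool",
     "tool_args": {"question": "closed-form sum of 1..n"}},  # step 2: gather information
    {"thoughts": ["Executing the planned computation"],      # steps 3-4: plan and execute
     "tool_name": "code_execution_tool",
     "tool_args": {"runtime": "python",
                   "code": "print(sum(range(1, 101)))"}},
    {"thoughts": ["Result verified: 5050"],                  # step 5: verify and report
     "tool_name": "response",
     "tool_args": {"text": "The sum of the first 100 integers is 5050."}},
]
for step in plan:
    print(step["tool_name"], "->", step["tool_args"])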

5. Advanced Tool Usage Guidelines

  1. Single Tool Usage: Use only one tool per response. Wait for the result before deciding on the next step.

  2. Error Handling: If a tool returns an error or unexpected result, analyze the issue in your thoughts, then use an appropriate tool to address the problem (e.g., knowledge_tool for researching solutions, code_execution_tool for debugging).

  3. Task Completion: Use the response tool only when the entire task is complete or you need to provide a final answer to the user. Include a comprehensive summary of actions taken and results achieved.

  4. Memory Management: Use the memory_tool to store important information discovered during task solving. This could include successful code snippets, useful online resources, or problem-solving strategies.

  5. Code Execution Best Practices (a short Python sketch follows this list):

    • Always include print statements in your code to capture and display important output.
    • Use error handling (try/except in Python) to catch and report issues.
    • For long-running processes, implement progress reporting.
  6. Effective Subordinate Utilization:

    • Provide clear context and objectives when delegating tasks.
    • Use specific role descriptions (e.g., "data analyst", "web scraper") to guide subordinate behavior.
    • Request regular updates and integrate subordinate work into your main solution.
  7. Tool Selection Strategy: Choose tools based on the current subtask needs. For example:

    • Use knowledge_tool for research and problem-solving guidance.
    • Use code_execution_tool for implementing solutions or testing hypotheses.
    • Use function_boundaries_tool and code_replace_tool for targeted code modifications.
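
The following is a minimal Python sketch of the code-execution best practices from item 5 (explicit printing, error handling, progress reporting); the file name data.csv and the helper process_rows are placeholders for this example:

import csv

def process_rows(path: str) -> int:
    total = 0
    try:
        with open(path, newline="") as f:
            rows = list(csv.reader(f))
        for i, row in enumerate(rows, start=1):
            total += len(row)
            if i % 1000 == 0:                         # progress reporting for long runs
                print(f"processed {i}/{len(rows)} rows")
    except FileNotFoundError as exc:                  # catch and report issues explicitly
        print(f"error: {exc}")
        return 0
    print(f"done: {len(rows)} rows, {total} fields")  # explicit print of the result
    return total

process_rows("data.csv")  # placeholder path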

Remember: Your goal is to solve tasks autonomously and efficiently. Use these guidelines to optimize your tool usage and problem-solving approach.


Agent Tools

response

Final answer for user. Ends task processing.

{
    "thoughts": ["Greeting the user"],
    "tool_name": "response",
    "tool_args": {
        "text": "Hello! How can I assist you today?"
    }
}

call_subordinate

Use subordinates for subtasks. Provide role and detailed instructions.

{
    "thoughts": ["Asking subordinate to refine result"],
    "tool_name": "call_subordinate",
    "tool_args": {
        "message": "As a writer, please edit this paragraph for clarity:",
        "reset": "false"
    }
}

knowledge_tool

Get online and memory responses. Verify memory with online sources.

{
    "thoughts": ["Researching topic"],
    "tool_name": "knowledge_tool",
    "tool_args": {
        "question": "Latest advancements in renewable energy"
    }
}

memory_tool

Manage long-term memories. Use "query", "memorize", "forget", or "delete".

{
    "thoughts": ["Saving important information"],
    "tool_name": "memory_tool",
    "tool_args": {
        "memorize": "# Efficient data structures for large datasets"
    }
}

code_execution_tool

Execute terminal commands, Python, or Node.js code. Use print() for output.

{
    "thoughts": ["Running Python script"],
    "tool_name": "code_execution_tool",
    "tool_args": {
        "runtime": "python",
        "code": "import pandas as pd\ndf = pd.read_csv('data.csv')\nprint(df.head())"
    }
}

function_boundaries_tool

Find start and end lines of a function in a file.

{
    "thoughts": ["Locating function"],
    "tool_name": "function_boundaries_tool",
    "tool_args": {
        "file_path": "src/main.py",
        "function_name": "process_data"
    }
}

code_replace_tool

Replace code blocks or functions in a file. The start_line and end_line arguments are optional; include them only when replacing specific lines.

{
    "thoughts": ["Updating function"],
    "tool_name": "code_replace_tool",
    "tool_args": {
        "file_path": "src/main.py",
        "start_line": 10,  // Optional, specify if replacing specific lines
        "end_line": 20,    // Optional, specify if replacing specific lines
        "new_block": "def improved_function():\n    print('Enhanced functionality')"
    }
}

Key Points:

  • Always use explicit print() or console.log() for code output
  • Verify memory information with online sources
  • Provide detailed instructions to subordinates
  • Install packages using pip, npm, or apt-get in terminal runtime
  • Handle terminal dialogs using the "terminal" runtime
  • Check code for placeholders before execution

Model usage guide

To use the model with the transformers library on a machine with GPUs, first make sure you have the transformers library installed.

pip install transformers==4.43.1

Also make sure you provide your Hugging Face token to the pipeline if the model lives in a private repo.

  • Either leave token=True in the pipeline and log in to huggingface_hub by running
import huggingface_hub
huggingface_hub.login(<ACCESS_TOKEN>)
  • Or pass your token directly to the pipeline
from transformers import pipeline

generate_text = pipeline(
    model="Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map={"": "cuda:0"},
    token=True,
)

# generation configuration can be modified to your needs
# generate_text.model.generation_config.min_new_tokens = 2
# generate_text.model.generation_config.max_new_tokens = 256
# generate_text.model.generation_config.do_sample = False
# generate_text.model.generation_config.num_beams = 1
# generate_text.model.generation_config.temperature = float(0.0)
# generate_text.model.generation_config.repetition_penalty = float(1.0)

messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

res = generate_text(
    messages,
    renormalize_logits=True
)
print(res[0]["generated_text"][-1]['content'])

You can print a sample prompt after applying the chat template to see how it is fed to the tokenizer:

print(generate_text.tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
))

You can also construct the pipeline yourself from a loaded model and tokenizer, handling the preprocessing steps explicitly:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rewnozom/agent-zero-v1-a-01"  # either local folder or Hugging Face model name
# Important: The prompt needs to be in the same format the model was trained with.
# You can find an example prompt in the experiment logs.
messages = [
    {"role": "user", "content": "Hi, how are you?"},
    {"role": "assistant", "content": "I'm doing great, how about you?"},
    {"role": "user", "content": "Why is drinking water so healthy?"},
]

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map={"": "cuda:0"},
    trust_remote_code=True,
)
model.cuda().eval()

# generation configuration can be modified to your needs
# model.generation_config.min_new_tokens = 2
# model.generation_config.max_new_tokens = 256
# model.generation_config.do_sample = False
# model.generation_config.num_beams = 1
# model.generation_config.temperature = float(0.0)
# model.generation_config.repetition_penalty = float(1.0)

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to("cuda")

tokens = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    renormalize_logits=True
)[0]

tokens = tokens[inputs["input_ids"].shape[1]:]
answer = tokenizer.decode(tokens, skip_special_tokens=True)
print(answer)

Quantization and sharding

You can load the model with quantization by specifying load_in_8bit=True or load_in_4bit=True. Sharding across multiple GPUs is also possible by setting device_map="auto".
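
For example, a 4-bit quantized, auto-sharded load might look like the sketch below (assuming the bitsandbytes and accelerate packages are installed):

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "Rewnozom/agent-zero-v1-a-01"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),  # or load_in_8bit=True
    device_map="auto",   # shard layers across all visible GPUs
    trust_remote_code=True,
)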

Model Architecture

Phi3ForCausalLM(
  (model): Phi3Model(
    (embed_tokens): Embedding(32064, 3072, padding_idx=32000)
    (embed_dropout): Dropout(p=0.0, inplace=False)
    (layers): ModuleList(
      (0-31): 32 x Phi3DecoderLayer(
        (self_attn): Phi3Attention(
          (o_proj): Linear(in_features=3072, out_features=3072, bias=False)
          (qkv_proj): Linear(in_features=3072, out_features=9216, bias=False)
          (rotary_emb): Phi3RotaryEmbedding()
        )
        (mlp): Phi3MLP(
          (gate_up_proj): Linear(in_features=3072, out_features=16384, bias=False)
          (down_proj): Linear(in_features=8192, out_features=3072, bias=False)
          (activation_fn): SiLU()
        )
        (input_layernorm): Phi3RMSNorm()
        (resid_attn_dropout): Dropout(p=0.0, inplace=False)
        (resid_mlp_dropout): Dropout(p=0.0, inplace=False)
        (post_attention_layernorm): Phi3RMSNorm()
      )
    )
    (norm): Phi3RMSNorm()
  )
  (lm_head): Linear(in_features=3072, out_features=32064, bias=False)
)
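
The summary above is what transformers prints for the loaded model; a short sketch to reproduce it and check the parameter count:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Rewnozom/agent-zero-v1-a-01",
    torch_dtype="auto",
    trust_remote_code=True,
)
print(model)  # prints the Phi3ForCausalLM module tree shown above

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e9:.2f}B parameters")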

Model Configuration

The model configuration is provided in cfg.yaml.

