From Prompt to Action: Building Smarter AI Agents

Patric
8 min read · Mar 26, 2025


Introduction

Large language models like LLaMA 3 have evolved far beyond simple question-answering systems. With the right prompting strategy, they can behave more like intelligent agents — capable of planning, reasoning, and interacting with real-world tools via structured outputs. But turning a chat model into a goal-oriented agent requires more than clever instructions: it demands a carefully crafted prompt structure that guides both behavior and output format.

In this article, we’ll walk through how to transform Meta’s LLaMA 3 (8B) into a task-driven AI agent using function calling. By leveraging Replicate’s hosted model and a structured prompt format, you’ll see how to build agents that:

  • Decide when to use tools or answer directly
  • Break down user requests into actionable steps
  • Return machine-readable JSON to trigger external APIs

We’ll break down each part of the prompt — system message, user input, and expected assistant response — and explain how to shape model behavior through schemas, examples, and precise role definitions. Whether you’re building a simple automation assistant or a multi-step reasoning engine, this guide gives you a concrete foundation to start building smarter agents — from prompt to action.

Capabilities of Our Function-Calling Agent

Using an advanced prompt structure tailored for LLaMA 3 (8B) via Replicate, we enable the assistant to behave more like a lightweight agent — capable of:

  • 🧠 Deciding whether to respond directly or use external tools
  • 🗺️ Planning a sequence of actions based on user intent
  • 🧾 Returning structured JSON output for seamless tool execution
  • 🧩 Acting with embedded reasoning — even without long-term memory

We’re using the hosted version of LLaMA 3 8B Instruct provided by Replicate, which follows Meta’s official chat-based prompt format. This format introduces special tokens (like <|begin_of_text|>, <|start_header_id|>, etc.) to structure interactions between the system, user, and assistant. These tokens are critical for guiding model behavior and ensuring it adheres to the designed schema for tool-calling agents.

💰 Cost Efficiency with Replicate
Replicate charges based on token usage:

  • Input: $0.05 per million tokens (or 20 million tokens per $1)
  • Output: $0.25 per million tokens (or 4 million tokens per $1)

In our use case — where inputs and outputs are short and structured — the cost per inference is typically well under $0.01. This makes it a highly affordable option for experimentation and small-scale deployment of AI agents using LLaMA 3.
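
To make this concrete, here is a quick back-of-the-envelope estimate in TypeScript. The token counts are illustrative assumptions, not measured values:

// Rough cost for one agent call, assuming ~1,500 input tokens (the JSON
// system prompt with tools, schema, and examples) and ~300 output tokens
// (the JSON plan). Token counts are illustrative, not measured.
const INPUT_PRICE = 0.05 / 1_000_000;  // $ per input token
const OUTPUT_PRICE = 0.25 / 1_000_000; // $ per output token

const cost = 1_500 * INPUT_PRICE + 300 * OUTPUT_PRICE;
console.log(`~$${cost.toFixed(6)} per call`); // ~$0.000150 per call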

The Anatomy of a Smart Agent Prompt

Prompt Template

We’re using a structured prompt template designed for LLaMA 3’s special token format, which is particularly important for models hosted via Replicate or any other system that uses Meta’s chat-based fine-tuning format:

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{system_prompt} ← JSON instructions about tools, schema, etc.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
{prompt} ← User's natural language query
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
← Model completes here by returning JSON output

<|begin_of_text|>: Marks the very beginning of the prompt. It signals the start of a structured chat block and should always appear first.

<|start_header_id|>system<|end_header_id|>: Specifies the role for the following message, i.e. “system”. Everything between this and the next <|eot_id|> is considered the system’s instruction.

In our case, the system message is the JSON instruction set:

{
  "role": "AI Assistant",
  "instructions": [...],
  "tools": [ ... ],
  "response_format": { ... }
}

Why it matters:

  • The model sees this and adjusts its internal behavior.
  • Instructs the model how to behave before seeing the user’s question.

<|eot_id|>: “End of turn” — tells the model this message is complete. Think of it like a newline plus role switch.

<|start_header_id|>user<|end_header_id|>: Begins the user’s message. This is the message our assistant receives from the user, and it’s where the real magic begins.

What’s the weather in China’s capital, convert 500 euro to the currency used in China, and summarize Wikipedia’s article on China?

<|start_header_id|>assistant<|end_header_id|>: Begins the assistant’s message block, but no content follows yet. This tells the model: start completing here.

Following this prompt, LLaMA 3 completes it by generating the assistant message, and signals the end of that message by emitting <|eot_id|>.

Prompt Example

Replicate API input for the LLaMA 3 8B model:

{
  system_prompt: "...", // 🧾 JSON-encoded instructions, tools, response format, examples
  prompt: "...",        // 🧑 User's task/question
  max_tokens: 10000,
  prompt_template: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
}
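
For reference, a minimal sketch of this call with Replicate's official JavaScript client might look as follows; the model identifier and the chunked string output reflect Replicate's Node SDK, and the system prompt object here is a placeholder for the full instruction set described below:

import Replicate from "replicate";

// Minimal sketch using Replicate's official Node client.
// Assumes REPLICATE_API_TOKEN is set in the environment.
const replicate = new Replicate();

// Placeholder for the full JSON instruction set described in Part 1.
const systemPrompt = {
  role: "AI Assistant",
  // instructions, tools, response_format, examples ...
};

const output = await replicate.run("meta/meta-llama-3-8b-instruct", {
  input: {
    system_prompt: JSON.stringify(systemPrompt),
    prompt:
      "What's the weather in China's capital, convert 500 euro to the " +
      "currency used in China, and summarize Wikipedia's article on China?",
    max_tokens: 10000,
    prompt_template:
      "<|begin_of_text|><|start_header_id|>system<|end_header_id|>{system_prompt}<|eot_id|>" +
      "<|start_header_id|>user<|end_header_id|>{prompt}<|eot_id|>" +
      "<|start_header_id|>assistant<|end_header_id|>",
  },
});

// Language models on Replicate return their output as an array of text chunks.
const raw = (output as string[]).join("");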

🧩 Part 1: system_prompt

This is a JSON-formatted instruction set that tells the model:

  • "role"– defines the identity of the assistant — in our case, it's set to "AI Assistant" to signal how the model should behave from the start
  • "instructions"– rules the assistant must follow
  • "tools"– the list of available tools, including parameter schemas
  • "response_format"– the shape of the expected response
  • "examples"– few-shot examples that teach the model how to behave
{
  "role": "AI Assistant",
  "instructions": [...],
  "tools": [
    {
      "name": "...",
      "description": "...",
      "parameters": {
        "param1": { "type": "...", "description": "..." },
        ...
      }
    },
    ...
  ],
  "response_format": {
    "type": "json",
    "schema": {
      "requires_tools": { "type": "boolean", "description": "..." },
      "direct_response": { "type": "string", "description": "...", "optional": true },
      "thought": { "type": "string", "description": "...", "optional": true },
      "plan": {
        "type": "array",
        "items": { "type": "string" },
        "description": "...",
        "optional": true
      },
      "tool_calls": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "tool": { "type": "string", "description": "..." },
            "args": { "type": "object", "description": "..." }
          }
        },
        "description": "...",
        "optional": true
      }
    },
    "examples": [
      {
        "query": "...",
        "response": { ... }
      },
      ...
    ]
  }
}

🔹 role: "AI Assistant"

Sets the role of the assistant.

This is simple but important: it tells the model to think and act like an assistant — not a chatbot, a programmer, or a researcher.

🔹 instructions: [...]

These are core behavioral rules the model must follow.

  • "Only use tools when necessary"Prevents unnecessary tool use (e.g. for trivial queries)
  • "If the answer can be provided directly, do not use a tool"Tells model to directly answer basic queries
  • "Plan the steps needed if tool usage is required"Activates chain-of-thought reasoning and plan field
  • "Only respond the JSON and nothing else"Ensures safe parsing, prevents chatter or text

This behavior is perfect for post-processing (e.g., calling tools, chaining results).
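
As a small illustration of that post-processing step, here is one way to defensively extract the JSON from the raw completion (a hypothetical helper, not the repo's exact code):

// Defensive parsing of the model's reply. Because the instructions demand
// "JSON only", the raw output should parse directly; the slicing below is
// just a safety net in case stray text ever sneaks in.
function parseModelJson(raw: string): unknown {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end === -1) {
    throw new Error(`No JSON object found in model output: ${raw}`);
  }
  return JSON.parse(raw.slice(start, end + 1));
}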

🔹 tools: [...]

Each tool is defined with:

  • name: Internal identifier
  • description: Natural-language purpose
  • parameters: Input schema with type and description

We defined three tools:

1. getWeather

{
  "name": "getWeather",
  "description": "Gets the current weather for a location.",
  "parameters": {
    "location": {
      "type": "string",
      "description": "Name of the location (e.g., 'Paris', 'New York')"
    }
  }
}

One simple input: "location" — e.g., "Beijing"
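
As one possible backing implementation (an assumption for illustration; the article's repo may wire this differently), the free Open-Meteo geocoding and forecast endpoints can serve this tool without an API key:

// Hypothetical getWeather implementation backed by the free Open-Meteo API.
async function getWeather(args: { location: string }): Promise<string> {
  // 1. Resolve the location name to coordinates.
  const geoRes = await fetch(
    `https://geocoding-api.open-meteo.com/v1/search?name=${encodeURIComponent(args.location)}&count=1`
  );
  const geo = await geoRes.json();
  if (!geo.results?.length) throw new Error(`Unknown location: ${args.location}`);
  const { latitude, longitude } = geo.results[0];

  // 2. Fetch the current weather for those coordinates.
  const wxRes = await fetch(
    `https://api.open-meteo.com/v1/forecast?latitude=${latitude}&longitude=${longitude}&current_weather=true`
  );
  const wx = await wxRes.json();
  return `${args.location}: ${wx.current_weather.temperature}°C, wind ${wx.current_weather.windspeed} km/h`;
}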

2. convertCurrency

{
  "name": "convertCurrency",
  "description": "Converts currency using the latest exchange rates.",
  "parameters": {
    "amount": { "type": "number" },
    "from_currency": { "type": "string" },
    "to_currency": { "type": "string" }
  }
}

Multi-input: our example task is converting 500 EUR to Chinese currency, which the LLM will map to:

"args": {
"amount": 500,
"from_currency": "EUR",
"to_currency": "CNY"
}
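
A hypothetical implementation could back this with the free Frankfurter exchange-rate API (my choice for illustration, not necessarily what the repo uses):

// Hypothetical convertCurrency implementation backed by the free Frankfurter
// API, which returns the converted amount keyed by the target currency.
async function convertCurrency(args: {
  amount: number;
  from_currency: string;
  to_currency: string;
}): Promise<string> {
  const res = await fetch(
    `https://api.frankfurter.app/latest?amount=${args.amount}` +
      `&from=${args.from_currency}&to=${args.to_currency}`
  );
  const data = await res.json();
  const converted = data.rates[args.to_currency];
  return `${args.amount} ${args.from_currency} = ${converted} ${args.to_currency}`;
}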

3. fetchWikipediaSummary

{
  "name": "fetchWikipediaSummary",
  "description": "Fetches a summary from Wikipedia for a given topic.",
  "parameters": {
    "topic": { "type": "string" }
  }
}

Great for extracting knowledge from a known topic like “China”.
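
One way to implement it (again hypothetical, for illustration) is Wikipedia's public REST summary endpoint, which returns a plain-text extract for a page:

// Hypothetical fetchWikipediaSummary implementation using Wikipedia's public
// REST API, which returns a plain-text "extract" for a page.
async function fetchWikipediaSummary(args: { topic: string }): Promise<string> {
  const res = await fetch(
    `https://en.wikipedia.org/api/rest_v1/page/summary/${encodeURIComponent(args.topic)}`
  );
  if (!res.ok) throw new Error(`No Wikipedia summary found for: ${args.topic}`);
  const data = await res.json();
  return data.extract;
}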

🔹 response_format

Defines the only valid output: the JSON structure we expect as a response. The optional fields give the model flexibility to adapt to tool-based vs. direct-response tasks.
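
Transcribed into a TypeScript type (my own rendering of the schema, for illustration), the contract looks like this:

// TypeScript mirror of the response_format schema: exactly one "shape" of
// answer is valid, with optional fields covering the two behaviors.
interface ToolCall {
  tool: string;                  // must match a tool "name" from the system prompt
  args: Record<string, unknown>; // arguments matching that tool's parameter schema
}

interface AgentResponse {
  requires_tools: boolean;  // true → tool_calls present; false → direct_response present
  direct_response?: string; // the answer itself, when no tool is needed
  thought?: string;         // short chain-of-thought: why tools are (or aren't) needed
  plan?: string[];          // ordered high-level steps
  tool_calls?: ToolCall[];  // ready-to-execute calls, in order
}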

🔹 examples: [...]

We include two:

🧪 Example 1: Tool use

{
  "query": "What's the current weather in Tokyo?",
  "response": {
    "requires_tools": true,
    "thought": "...",
    "plan": ["...", "..."],
    "tool_calls": [
      {
        "tool": "getWeather",
        "args": { "location": "Tokyo" }
      }
    ]
  }
}

Teaches the model when and how to use a tool.

🧪 Example 2: Direct response

{
  "query": "Tell me a fun fact.",
  "response": {
    "requires_tools": false,
    "direct_response": "..."
  }
}

Teaches the model how to avoid tools when not needed.

These examples are crucial for steering the model’s behavior.

🗣️ Final Prompt Construction

When rendered and sent to Replicate, our actual prompt looks like:

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
{
  "role": "...",
  "instructions": [...],
  ...
}
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
What's the weather in China's capital, convert 500 euro to the currency used in China, and summarize Wikipedia's article on China?
<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

And the model now completes the assistant’s message in JSON.

Now that we’ve defined how the system prompt works, let’s see how the user prompt fits into this structure — and how the assistant uses it to plan tool usage.

🧩 Part 2: User Prompt (prompt)

🔹 User message:

What's the weather in China's capital, convert 500 euro to the currency used in China, and summarize Wikipedia's article on China?

What the model sees:

This is the user’s turn in the structured chat prompt, inserted after the <|start_header_id|>user<|end_header_id|> token.

Role of this section in the prompt template:

It’s not just natural language — it’s interpreted within the context of the system instructions and the response schema. The assistant’s task is to:

  1. Parse the compound query
  2. Decide what should be answered directly and what needs tools
  3. Generate JSON according to the defined schema

Model interpretation (step-by-step):

1. “What’s the weather in China’s capital?”

  • This requires geographical knowledge: capital = Beijing
  • Tool to use: getWeather
  • Input: { "location": "Beijing" }

2. “convert 500 euro to the currency used in China”

  • Implicit knowledge: Chinese currency = CNY
  • Tool to use: convertCurrency
  • Input: { "amount": 500, "from_currency": "EUR", "to_currency": "CNY" }

3. “summarize Wikipedia’s article on China”

  • Model matches this with fetchWikipediaSummary
  • Input: { "topic": "China" }

The model’s task is to recognize that all 3 parts require tool usage, then respond only in JSON according to the schema.

🧩 Part 3: Assistant Response (assistant)

{
  "requires_tools": true,
  "thought": "I need to fetch the weather information for Beijing, China.",
  "plan": [
    "Use getWeather tool with location 'Beijing'.",
    "Use convertCurrency tool with amount 500, from_currency EUR, and to_currency CNY.",
    "Use fetchWikipediaSummary tool with topic 'China'.",
    "Return the weather information, currency conversion, and Wikipedia summary."
  ],
  "tool_calls": [
    {
      "tool": "getWeather",
      "args": {
        "location": "Beijing"
      }
    },
    {
      "tool": "convertCurrency",
      "args": {
        "amount": 500,
        "from_currency": "EUR",
        "to_currency": "CNY"
      }
    },
    {
      "tool": "fetchWikipediaSummary",
      "args": {
        "topic": "China"
      }
    }
  ]
}

Field-by-field breakdown:

  • requires_tools: true – confirms that the assistant won't answer directly; it will call tools
  • thought – chain-of-thought explanation of why tool usage is necessary
  • plan – high-level, ordered steps to solve the full query
  • tool_calls – structured, ready-to-execute calls to tools, with parameters extracted from the user's query

Why this works

This response:

  • ✅ Matches the schema from the system prompt
  • ✅ Follows the behavior rules (“respond in JSON only”)
  • ✅ Shows correct reasoning and tool sequencing
  • ✅ Uses tools only when necessary — in this case, all 3 are needed
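
To close the loop, the host program only has to parse this JSON and route each tool_calls entry to a real function. Here is a minimal dispatcher sketch, building on the AgentResponse type and the hypothetical tool implementations from earlier:

// Map tool names from the system prompt to real async implementations
// (getWeather, convertCurrency, fetchWikipediaSummary as sketched above).
type ToolFn = (args: Record<string, unknown>) => Promise<string>;

const toolRegistry: Record<string, ToolFn> = {
  getWeather: (args) => getWeather(args as { location: string }),
  convertCurrency: (args) =>
    convertCurrency(args as { amount: number; from_currency: string; to_currency: string }),
  fetchWikipediaSummary: (args) => fetchWikipediaSummary(args as { topic: string }),
};

async function executePlan(response: AgentResponse): Promise<string[]> {
  // Direct answers need no tool round-trip.
  if (!response.requires_tools) {
    return [response.direct_response ?? ""];
  }
  const results: string[] = [];
  // Execute the tool calls in the order the model planned them.
  for (const call of response.tool_calls ?? []) {
    const fn = toolRegistry[call.tool];
    if (!fn) throw new Error(`Unknown tool requested: ${call.tool}`);
    results.push(await fn(call.args));
  }
  return results;
}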

🧪 Try It Yourself

We’ve now walked through a full example of how to put together a prompt for LLaMA 3 that can reason, plan, and call tools with strict JSON output.

Want to see it in action?

The full implementation of an agent that uses this prompt and system prompt is available on GitHub:
👉 pguso/ai-agents-workshop

You’ll find the agent logic with prompts in src/weather.ts, where you can:

  • Customize tools and schemas
  • Test with your own prompts
  • Extend it with memory, caching or UI integrations

This is a great starting point if you’re exploring multi-step reasoning, tool-calling or want to build your own AI agents that interact with real APIs.
