From Chatbot to Agent — When AI Can Take Actions in the Real World
Lecture, Practice, and Discussion for Week 4
The leap from "text generator" to "agent that does things"
The bridge between natural language and executable code
Solving the core problems we identified in Weeks 2-3
- calculate(2450 * 0.15) → 367.5 (deterministic, verifiable)
- search_arxiv("perovskite 2026") → actual recent papers
- get_stock_price("AAPL") → current price, not memorized 2024 data

What the LLM needs to know about each function
# A tool definition has 3 parts: name, description, and input schema
tool = {
    "name": "get_weather",                     # What to call it
    "description": "Get current weather for a city. "
                   "Returns temperature, condition, and humidity.",  # When to use it
    "input_schema": {                          # What arguments it needs
        "type": "object",
        "properties": {
            "city": {
                "type": "string",
                "description": "City name (e.g., 'Seoul', 'New York')"
            }
        },
        "required": ["city"]
    }
}
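For reference, here is a minimal sketch of sending this definition to the Anthropic Messages API. It is illustrative only, not part of today's practice code; it assumes pip install anthropic, an ANTHROPIC_API_KEY in the environment, and uses a placeholder model name.

# Minimal sketch, not part of the practice files: Anthropic-style tool use.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set; model name is a placeholder.
import anthropic

client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",   # placeholder; use a model you have access to
    max_tokens=1024,
    tools=[tool],                       # the definition above
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
)
print(response.stop_reason)             # "tool_use" if the model decided to call the tool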
It's all about matching the user's intent to tool descriptions
{tool_name, arguments}

The format used by most APIs today
# OpenAI-compatible tool format (used by Gemini, Ollama, LiteLLM, etc.)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {            # Note: "parameters", not "input_schema"
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name (e.g., 'Seoul')"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression safely.",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression (e.g., '2 + 3 * 4')"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]
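When the model decides to call one of these tools, an OpenAI-compatible API returns an assistant message whose tool_calls field carries the chosen function name and JSON-encoded arguments. Roughly (a sketch; the id and values are illustrative):

# Illustrative shape of the assistant message when the model requests a tool call
# (field names follow the OpenAI chat.completions format; id and values are made up)
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",            # opaque id you echo back in the tool-result message
        "type": "function",
        "function": {
            "name": "get_weather",
            "arguments": "{\"city\": \"Seoul\"}"   # arguments arrive as a JSON string
        }
    }]
}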
- OpenAI-compatible format: tools[].function.parameters (JSON Schema)
- Anthropic format: tools[].input_schema (JSON Schema)

The most common agent architecture for tool use
Thought: The user wants to analyze their experiment data.
Action: read_file(path="./data/experiment_1.csv")
Observation: "temp,pressure,yield\n25,1.0,78.5\n30,1.5,82.1\n..."
Thought: I see the data. Let me calculate the average yield.
Action: calculate(expression="(78.5 + 82.1 + 85.3) / 3")
Observation: "81.97"
Thought: Now I can answer with verified data.
Response: "Your average yield is 81.97%. The data shows..."
📚 ReAct: Synergizing Reasoning and Acting — Yao et al. 2023
Every agent follows this same basic pattern
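In code, the pattern is a short loop: call the model, run whatever tools it requests, append the results, and stop when it answers in plain text. A minimal sketch, assuming an OpenAI-compatible client and the run_tool dispatcher built later in this practice:

# Conceptual sketch of the Thought → Action → Observation loop (OpenAI-compatible client assumed)
import json
from tools import run_tool               # dispatcher defined later in this practice (tools.py)

def react_turn(client, model, messages, tools):
    while True:
        response = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = response.choices[0].message
        messages.append(msg)
        if not msg.tool_calls:            # no Action requested: this is the final Response
            return msg.content
        for tc in msg.tool_calls:         # Action: run each requested tool
            result = run_tool(tc.function.name, json.loads(tc.function.arguments))
            messages.append({"role": "tool",              # Observation fed back to the model
                             "tool_call_id": tc.id,
                             "content": result})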
What kinds of functions can you give to an agent?
- search_arxiv(query) — search academic papers
- query_database(sql) — query research databases
- get_weather(city) — real-time data
- web_search(query) — general web search
- calculate(expression) — math calculations
- run_python(code) — execute Python code
- statistical_test(data, test_type) — run statistical analysis
- fit_model(data, model_type) — fit ML models
- read_file(path) — read local files
- write_file(path, content) — save results
- parse_csv(path) — extract tabular data
- generate_plot(data, chart_type) — visualizations
- send_email(to, subject, body) — communication
- create_calendar_event(title, time) — scheduling
- translate(text, target_lang) — translation
- control_instrument(command) — lab equipment

With great power comes great attack surface
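Tools with side effects (write_file, send_email, control_instrument) deserve guardrails before the model ever sees them. A hypothetical sketch of one common pattern, confining a write_file tool to an allow-listed sandbox directory:

# Hypothetical guardrail sketch (not part of the practice code below):
# confine file writes to a single sandbox directory.
from pathlib import Path

SANDBOX = Path("./agent_workspace").resolve()

def write_file(path: str, content: str) -> str:
    target = (SANDBOX / path).resolve()
    if target != SANDBOX and SANDBOX not in target.parents:
        return f"Refused: {path} is outside the sandbox"   # never touch files elsewhere
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {target}"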
Why tool use is often the best solution
| Approach | Pros | Cons |
|---|---|---|
| Prompt Engineering | Easy, no code needed | Limited to LLM's training data |
| RAG (Retrieval) | Access external docs | Read-only, no actions |
| Fine-Tuning | Deep customization | Expensive, hard to maintain |
| Function Calling | Real-time data, actions, deterministic | Requires API setup, security risks |
| Full Agent | Autonomous multi-step | Complex, hard to debug |
Key takeaways
References:
📚 ReAct: Synergizing Reasoning and Acting — Yao et al. 2023
📚 Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al. 2023
📚 OpenAI Function Calling Guide
📚 Google Gemini Function Calling

Custom Tools — Persona Chat with Function Calling (Gemini / Ollama)
A persona chat app with tool-calling capability
- personas.md — persona library (select, edit, add your own)
- tools.py — tool definitions and implementations
- agent.py — main chat loop with model selection

Install dependencies and configure your API
# (Recommended) create a virtual environment
python -m venv .venv
# Windows PowerShell
.\.venv\Scripts\Activate.ps1
# macOS / Linux
# source .venv/bin/activate
# Install the OpenAI-compatible SDK (works with Gemini & Ollama too!)
pip install openai python-dotenv
# If you use Ollama: install from https://ollama.com then pull a model
ollama pull qwen3.5:0.8b
# .env file (DO NOT COMMIT) — set what you use
# Option A: Google Gemini (free tier available)
GOOGLE_API_KEY=your_gemini_key_here
GEMINI_MODEL=gemini-3.1-flash-lite-preview
# Option B: Ollama (runs locally, no API key needed)
OLLAMA_MODEL=qwen3.5:0.8b
# Option C: OpenAI (if you have a key)
OPENAI_API_KEY=your_openai_key_here
OPENAI_MODEL=gpt-4o-mini
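Before running the agent, a quick sanity check that the .env file is actually being picked up can save debugging time. A small optional script (not one of the practice files; it only reports which variables are set):

# check_env.py (optional): report which provider settings were loaded from .env
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("GOOGLE_API_KEY", "OLLAMA_MODEL", "OPENAI_API_KEY"):
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")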
Persona Loader (personas_loader.py)
Load and select personas from personas.md
# personas_loader.py
def load_personas(filepath="personas.md"):
    """Load personas from a markdown file. Each '### Name' heading starts a new persona."""
    personas = {}
    current_name = None
    current_lines = []
    with open(filepath, "r", encoding="utf-8") as f:
        for line in f:
            if line.startswith("### "):          # a new persona heading starts here
                if current_name:                 # save the previous persona first
                    personas[current_name] = "\n".join(current_lines).strip()
                current_name = line[4:].strip()
                current_lines = []
            elif current_name is not None:
                if line.strip() == "---":        # skip the separator between personas
                    continue
                current_lines.append(line.rstrip())
    if current_name:                             # don't forget the last persona
        personas[current_name] = "\n".join(current_lines).strip()
    return personas
def select_persona(personas):
    """Interactive persona selection menu."""
    names = list(personas.keys())
    print("\n🎭 Available Personas:")
    print("-" * 40)
    for i, name in enumerate(names, 1):
        preview = personas[name][:80].replace("\n", " ")
        print(f" {i}. {name}")
        print(f"    {preview}...")
    print(f" {len(names)+1}. ✏️ Enter custom system prompt")
    print()
    while True:
        choice = input("Select persona (number): ").strip()
        if choice.isdigit():
            idx = int(choice) - 1
            if 0 <= idx < len(names):
                print(f"\n✅ Selected: {names[idx]}")
                return names[idx], personas[names[idx]]
            elif idx == len(names):
                custom = input("Enter your system prompt:\n> ")
                return "Custom", custom
        print("Invalid choice. Try again.")
personas.md
The persona library that drives system prompts
Create a file named personas.md in the same folder as agent.py (i.e., practices/week4/).
### Strict Peer Reviewer
Role: You are a senior peer reviewer for a top-tier journal.
Instructions:
- Be direct and critical, but constructive.
- Ask for missing assumptions, baselines, and evaluation details.
Output format:
- Strengths (3 bullets)
- Weaknesses (3 bullets)
- Questions (3 bullets)
---
### Creative Research Brainstormer
Role: You are a wildly creative interdisciplinary researcher.
Instructions:
- Generate 10 unconventional ideas.
- For each idea: risk, feasibility, and one quick experiment.
Tool Definitions (tools.py)
Functions the agent can call during conversation
# tools.py
import json, math

# --- Tool Implementations ---
def get_weather(city: str) -> str:
    """Simulated weather data."""
    data = {"Seoul": "15°C, Cloudy", "Tokyo": "18°C, Sunny",
            "New York": "12°C, Rainy", "Daejeon": "13°C, Clear"}
    return data.get(city, f"No weather data for {city}")

def calculate(expression: str) -> str:
    """Safely evaluate a math expression."""
    safe_builtins = {"abs": abs, "round": round, "min": min,
                     "max": max, "sum": sum, "pow": pow,
                     "sqrt": math.sqrt, "log": math.log, "pi": math.pi}
    try:
        result = eval(expression, {"__builtins__": {}}, safe_builtins)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def search_papers(query: str) -> str:
    """Simulated paper search."""
    return json.dumps([
        {"title": f"Recent advances in {query}", "year": 2025},
        {"title": f"A survey of {query} methods", "year": 2024}
    ])

# --- Tool Schema (OpenAI-compatible format) ---
TOOLS = [
    {"type": "function", "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {"type": "object",
                       "properties": {"city": {"type": "string", "description": "City name"}},
                       "required": ["city"]}}},
    {"type": "function", "function": {
        "name": "calculate",
        "description": "Evaluate a math expression. Supports sqrt, log, pi.",
        "parameters": {"type": "object",
                       "properties": {"expression": {"type": "string",
                                                     "description": "Math expression (e.g., 'sqrt(144) + pi')"}},
                       "required": ["expression"]}}},
    {"type": "function", "function": {
        "name": "search_papers",
        "description": "Search for academic papers by topic.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string", "description": "Search topic"}},
                       "required": ["query"]}}},
]

# --- Tool Dispatcher ---
TOOL_FUNCTIONS = {
    "get_weather": lambda args: get_weather(args["city"]),
    "calculate": lambda args: calculate(args["expression"]),
    "search_papers": lambda args: search_papers(args["query"]),
}

def run_tool(name: str, args: dict) -> str:
    if name in TOOL_FUNCTIONS:
        return TOOL_FUNCTIONS[name](args)
    return f"Unknown tool: {name}"
Provider Client (client.py)
One interface for Gemini, Ollama, and OpenAI
# client.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

def get_client(provider):
    """Create an OpenAI-compatible client based on .env settings."""
    if provider == "gemini":
        return OpenAI(
            api_key=os.getenv("GOOGLE_API_KEY"),
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
        ), os.getenv("GEMINI_MODEL", "gemini-3.1-flash-lite-preview")
    elif provider == "ollama":
        return OpenAI(
            base_url="http://localhost:11434/v1",
            api_key="ollama"  # required but unused
        ), os.getenv("OLLAMA_MODEL", "qwen3.5:0.8b")
    elif provider == "openai":
        return OpenAI(
            api_key=os.getenv("OPENAI_API_KEY"),
        ), os.getenv("OPENAI_MODEL", "gpt-4o-mini")
    else:
        raise ValueError(f"Unknown provider: {provider}")
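A short smoke test of the client, assuming Ollama is running locally and the model from the setup step has been pulled:

# Smoke test (assumes `ollama serve` is running and the model has been pulled)
from client import get_client

client, model = get_client("ollama")
resp = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(resp.choices[0].message.content)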
Main Agent (agent.py)
The ReAct loop that ties everything together
# agent.py
import json
from client import get_client
from tools import TOOLS, run_tool
from personas_loader import load_personas, select_persona

def get_provider():
    provider = input("Enter the number of the API provider: 1. Ollama, 2. Gemini, 3. OpenAI: ")
    if provider == "1":
        return "ollama"
    elif provider == "2":
        return "gemini"
    elif provider == "3":
        return "openai"
    else:
        print("Invalid provider")
        return get_provider()

def agent_loop():
    # Setup
    client, model = get_client(get_provider())
    personas = load_personas("personas.md")
    persona_name, system_prompt = select_persona(personas)
    messages = [{"role": "system", "content": system_prompt}]
    print(f"\n🤖 Agent ({model}) as [{persona_name}]")
    print("Type 'quit' to exit, 'switch provider' to change model, 'switch persona' to change persona")
    print("-" * 50)

    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() in ("quit", "exit"):
            break
        if user_input.lower() == "switch provider":
            client, model = get_client(get_provider())
            print(f"✅ Switched to [{model}]")
            continue
        if user_input.lower() == "switch persona":
            persona_name, system_prompt = select_persona(personas)
            messages = [{"role": "system", "content": system_prompt}]
            print(f"✅ Switched to [{persona_name}]")
            continue

        messages.append({"role": "user", "content": user_input})

        # ReAct loop: keep calling the API until no more tool calls
        while True:
            response = client.chat.completions.create(
                model=model,
                messages=messages,
                tools=TOOLS,
            )
            msg = response.choices[0].message
            messages.append(msg)

            # Check for tool calls
            if msg.tool_calls:
                for tc in msg.tool_calls:
                    fn_name = tc.function.name
                    fn_args = json.loads(tc.function.arguments)
                    print(f"  🔧 Calling {fn_name}({fn_args})")
                    result = run_tool(fn_name, fn_args)
                    print(f"  📋 Result: {result}")
                    messages.append({
                        "role": "tool",
                        "tool_call_id": tc.id,
                        "content": result,
                    })
            else:
                # No tool calls — print final response
                if msg.content:
                    print(f"\n🎭 [{persona_name}]: {msg.content}")
                break

if __name__ == "__main__":
    agent_loop()
From persona selection to tool-augmented response
Test with different personas and tools
# Run from the folder that contains agent.py
cd practices/week4
python agent.py
Enter the number of the API provider: 1. Ollama, 2. Gemini, 3. OpenAI: 1
🎭 Available Personas:
----------------------------------------
1. Strict Peer Reviewer
# Role You are a senior peer reviewer for a top-tier journal...
2. Creative Research Brainstormer
# Role You are a wildly creative interdisciplinary researcher...
3. Research Field Advisor
# Role You are a senior research advisor specializing in...
...
12. ✏️ Enter custom system prompt
Select persona (number): 1
🤖 Agent (qwen3.5:0.8b) as [Strict Peer Reviewer]
Type 'quit' to exit, 'switch provider' to change model, 'switch persona' to change persona
--------------------------------------------------
You: My research uses neural networks to predict battery degradation
🎭 [Strict Peer Reviewer]: Weakness 1: "Neural networks" is too
vague — which architecture? LSTM? Transformer? GNN? Each has very
different assumptions about your data structure...
You: What's sqrt(144) + pi?
🔧 Calling calculate({"expression": "sqrt(144) + 3.14159265"})
📋 Result: 15.14159265
🎭 [Strict Peer Reviewer]: The calculation yields 15.14. However,
as your reviewer, I must ask: why is this relevant to your research?
The personas.md file is your persona library
- Open personas.md in any text editor
- Edit an existing persona (e.g., ### Strict Peer Reviewer)
- Replace [YOUR FIELD] with your actual research area
- Add a new persona to personas.md: a --- (separator) followed by ### Your Persona Name (heading) and its content

Extend the agent with a function relevant to YOUR research
# In tools.py — add a new tool implementation
def unit_convert(value: float, from_unit: str, to_unit: str) -> str:
    """Convert between common scientific units."""
    conversions = {
        ("eV", "J"): lambda v: v * 1.602e-19,
        ("J", "eV"): lambda v: v / 1.602e-19,
        ("nm", "A"): lambda v: v * 10,
        ("A", "nm"): lambda v: v / 10,
        ("K", "C"): lambda v: v - 273.15,
        ("C", "K"): lambda v: v + 273.15,
    }
    key = (from_unit, to_unit)
    if key in conversions:
        result = conversions[key](value)
        return f"{value} {from_unit} = {result:.6g} {to_unit}"
    return f"Unknown conversion: {from_unit} → {to_unit}"

# Add to TOOLS list
TOOLS.append({"type": "function", "function": {
    "name": "unit_convert",
    "description": "Convert between scientific units (eV↔J, nm↔A, K↔C).",
    "parameters": {"type": "object",
                   "properties": {
                       "value": {"type": "number", "description": "Numeric value"},
                       "from_unit": {"type": "string", "description": "Source unit"},
                       "to_unit": {"type": "string", "description": "Target unit"}
                   },
                   "required": ["value", "from_unit", "to_unit"]}}})

# Add to TOOL_FUNCTIONS
TOOL_FUNCTIONS["unit_convert"] = lambda a: unit_convert(a["value"], a["from_unit"], a["to_unit"])
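A quick check of the new tool before asking the agent to use it:

# Quick check, e.g. in a Python shell after `from tools import unit_convert`
print(unit_convert(1.0, "eV", "J"))    # 1.0 eV = 1.602e-19 J
print(unit_convert(300, "K", "C"))     # 300 K = 26.85 C
print(unit_convert(5, "nm", "ft"))     # Unknown conversion: nm → ft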
Complete these tasks during the hands-on session
- Set up .env (do not commit keys) with at least ONE provider (Gemini / Ollama / OpenAI)
- Create personas.md and confirm it loads in the menu
- Ask a question that requires math and confirm calculate is called
- Switch providers (switch provider) and compare responses across models
- Switch personas (switch persona) mid-conversation — observe the behavior change
- Edit personas.md → customize [YOUR FIELD] brackets
- Add a new persona to personas.md and test it
- Add a new tool to tools.py relevant to your research

Week 3 Review & The Director's Role — Human's Irreplaceable Contribution
What can AI do, and what should it never do? Three agents debated.
A clear pattern emerged — but with interesting nuances
The strongest consensus across the class
10 minutes — Manuella vs Waad: Is Iron Man dangerous or empowering?
Margareth's insight goes beyond output quality to cognitive influence
5 minutes — Experience cognitive anchoring firsthand
Your field determines how much AI autonomy is acceptable
10 minutes — Create a field-specific AI autonomy policy
- Read-only tools (calculate(), search_papers())
- Tools with side effects (write_file(), send_email())

The question nobody can fully answer yet
10 minutes — Who bears responsibility?
- Scenario: the agent used search_papers() to find references and calculate() to verify statistics
- (Did you check that search_papers() returned real DOI links?)

Today's lecture addresses what you worried about last week
- calculate() is computation; deciding what to calculate is judgment
- The stop_reason == "tool_use" check is literally a human-in-the-loop checkpoint

Four weeks of growing sophistication
From literacy to building real systems
Post your response on the forum this week
1. You now know how to write system prompts (Week 3) AND define tools (Week 4). Design a complete mini-agent for your research: describe the persona (system prompt), 3 custom tools, and one example conversation showing how they work together. Why did you choose these specific tools?
2. Reflect on the Director's Role: after 4 weeks of learning about AI capabilities, where do YOU draw the line? What decisions should remain 100% human, what can be delegated to AI with review, and what can be fully automated? Give specific examples from your research.
3. Margareth raised the anchoring bias concern: AI outputs can narrow your thinking even when you're "in the loop." Design a workflow for your research that mitigates this risk. When should you think BEFORE consulting AI? When is it safe to let AI go first?
4. After completing Phase 1, has your Week 1 position (AI as assistant vs crutch) changed? Write a "letter to your Week 1 self" explaining what you've learned and how your thinking has evolved across all 4 weeks.
Key Papers
📚 ReAct: Synergizing Reasoning and Acting — Yao et al. 2023
📚 Toolformer: Language Models Can Teach Themselves to Use Tools — Schick et al. 2023
📚 Gorilla: Large Language Model Connected with Massive APIs — Patil et al. 2023
Guides & Tutorials
📚 OpenAI Function Calling Guide
📚 Google Gemini Function Calling
📚 Ollama OpenAI Compatibility
📚 LiteLLM — Call 100+ LLMs with the same API
Videos
📚 Function Calling Explained — AI Jason (YouTube)
📚 Building AI Agents — Anthropic (YouTube)

Three things to remember
- personas.md drives the agent's behavior
- One OpenAI-compatible client — same code, multiple backends

Phase 1 complete! Next week begins Phase 2: Building — starting with agent frameworks and production-grade agent architecture.