Designing AI Agent Applications

From Idea to Specification — Building What Your Research Actually Needs

Week 5 of Phase 2: Building Real Systems (Weeks 5-8)

Contents

Lecture, Practice, and Discussion for Week 5

📖 1. Lecture
  • Specifying AI Applications — What to define before you code
  • The 5 questions every AI app must answer
💻 2. Practice
  • Build a Multi-PDF Research Assistant (Streamlit)
  • PDF upload, selective context, preset prompts, custom prompt save/load
🗣️ 3. Discussion
  • Week 4 Review & Midterm Project Briefing
  • Design YOUR app — specification due by midterm

Phase 2 — From Literacy to Building

You understand AI. Now what will you BUILD with it?

📚 Phase 1 Recap (Weeks 1-4)
  • Week 1: What is Agentic AI? (the Research Director metaphor)
  • Week 2: How LLMs work (capabilities, limits, hallucination)
  • Week 3: System prompts (RICE, personas, Chain-of-Thought)
  • Week 4: Function calling (tools, ReAct loop, agent architecture)
🏗️ Phase 2 Goal (Weeks 5-8)
  • Design and build a working AI agent application
  • Not a demo — a tool you can actually use in your research
  • Midterm deliverable: specification document + working prototype + 5-min demo
🎯 Today's Key Insight
  • The first building skill is NOT coding — it's specification
  • What problem? For whom? What features? How does the human interact?
  • A clear spec makes coding 10x easier; a vague spec makes it impossible
"An unspecified AI app is just a chatbot with a custom system prompt. Specification is what turns it into a product."

5 Questions Every AI App Must Answer

Before you write a single line of code

🎯 1. Problem / Purpose
  • What problem does this app solve?
  • Who suffers without it? What is painful about the current workflow?
  • Be specific: "I spend 3 hours comparing papers manually" not "research is hard"
👤 2. Target Users
  • Who will actually use this? (you? your lab? external researchers?)
  • What is their technical skill level?
  • What context do they work in? (lab, office, field, mobile?)
🔧 3. Core Features
  • What does the app actually do? List 3-5 concrete functions
  • Each feature = a verb: "upload", "compare", "generate", "search", "summarize"
  • Prioritize: what is the MVP (minimum viable product)?
🤝 4. Human-AI Interaction Flow
  • When does the human act? When does the AI act?
  • What triggers the AI? (button click, text input, automatic?)
  • What requires human approval before the AI proceeds?
  • This is where Week 4's "Director's Role" discussion becomes concrete
📊 5. Expected Outcomes
  • What does success look like? How do you measure it?
  • "Reduces paper comparison time from 3 hours to 15 minutes"
  • "Generates 5 novel research ideas per session, with 2+ being feasible"

The Specification Document Template

This is exactly what you'll submit for the midterm

# App Name: [Your App Name]

## 1. Problem Statement
What problem does this solve? Why does it matter?
Who currently suffers from this problem, and how?

## 2. Target Users
Who will use this? What is their skill level?
What is their context (research field, tools they already use)?

## 3. Core Features
| Feature | Description | Priority |
|---------|-------------|----------|
| Feature 1 | What it does | Must-have |
| Feature 2 | What it does | Must-have |
| Feature 3 | What it does | Nice-to-have |

## 4. Human-AI Interaction Flow
[Diagram or step-by-step description]
- Step 1: Human does X → Step 2: AI does Y → Step 3: Human reviews → ...

## 5. Technical Approach
- LLM: Gemini / Ollama / OpenAI
- Framework: Streamlit / Gradio
- Key libraries: [list]
- Data: [what data does the app need?]

## 6. Success Criteria
How do you know the app works? What does "good" look like?
💡 This Template = Your Midterm Proposal
  • Fill it in for YOUR research project
  • Draft due Week 7, prototype due Week 8
  • Start thinking about it today during the discussion exercise

Example Spec — Multi-PDF Research Assistant

Let's walk through the 5 questions for a concrete example

🎯 1. Problem
  • Researchers need to analyze and cross-reference multiple papers simultaneously
  • Current tools handle one PDF at a time — switching between 5 papers is painful
  • Manual comparison takes hours; AI can help if given the right context
👤 2. Users
  • Graduate students and researchers doing literature review
  • Moderate tech literacy — comfortable with web apps, not command-line
  • Working with 2-20 PDF papers at a time
🔧 3. Features
  • Upload PDFs to the app (drag & drop)
  • Select which PDFs to include in AI context (checkboxes)
  • Chat with AI about the selected papers
  • Preset prompt buttons (Compare, Summarize, Find Contradictions, etc.)
  • Save/delete custom prompts for reuse
🤝 4. Interaction Flow
  • Human uploads PDFs → human selects which to include → human picks preset or types custom prompt → AI analyzes selected PDFs → human reviews response → iterate
📊 5. Success Criteria
  • Reduces paper comparison time from 3 hours to 30 minutes
  • AI responses cite the specific papers they draw on
  • Saved presets eliminate repetitive typing

Example — Interaction Flow Diagram

How the human and AI collaborate

graph TD
    A["📄 Upload PDFs"] --> B["☑️ Select PDFs<br/>(checkboxes)"]
    B --> C{"Choose Prompt"}
    C -->|Preset| D["📌 Click Preset Button<br/>(Compare / Summarize / ...)"]
    C -->|Custom| E["✏️ Type Custom Prompt"]
    D --> F["🤖 AI Analyzes<br/>Selected PDFs"]
    E --> F
    F --> G["💬 AI Response<br/>with Paper Citations"]
    G --> H{"Satisfied?"}
    H -->|"No"| C
    H -->|"Yes"| I["💾 Save Prompt / Export"]
    style A fill:#e1f5fe,stroke:#0288d1
    style F fill:#fff3e0,stroke:#f57c00
    style G fill:#e8f5e9,stroke:#388e3c
    style I fill:#f3e5f5,stroke:#7b1fa2
"The diagram IS the spec. If you can draw the flow, you can build the app."

What Makes This More Than a Chatbot?

Connecting to everything you learned in Weeks 1-4

📝 System Prompt (Week 3)
  • The app has a built-in system prompt for research paper analysis
  • It instructs the AI to cite papers, use academic language, admit uncertainty
  • This is RICE in action: Role (research assistant), Instructions (structured analysis), Context (paper content), Examples (implicit via presets)
🔧 Tool Use (Week 4)
  • PDF text extraction is essentially a tool — the AI doesn't read PDFs, our code does
  • The code extracts text → sends it as context → AI reasons about it
  • This is the same pattern as calculate() or search_papers() from Week 4
🤝 Human-in-the-Loop (Week 4 Discussion)
  • Human controls context: which PDFs are included (checkboxes)
  • Human controls intent: which prompt to use (preset or custom)
  • AI executes within scope: analyzes only what the human selected
  • This directly implements the boundary YOU defined in Week 4
📋 Preset Prompts = Reusable Tools
  • Each preset is a mini system prompt for a specific task
  • Save/load/delete means the user can customize their own toolbox
  • This is prompt engineering (Week 3) packaged into a UI

Design Principles for AI-Powered Apps

Practical guidelines — not UX theory

🎚️ Progressive Disclosure
  • Start simple: upload PDFs → chat
  • Advanced features (presets, custom saves) available but not required
  • Don't overwhelm users with every option at once
🛡️ Transparency
  • Show which PDFs are in context — the user must know what the AI "sees"
  • Display the prompt being sent — no hidden behavior
  • If the AI doesn't know something, it should say so (system prompt enforces this)
🔄 Iteration Over Perfection
  • Users will refine their questions — make it easy to re-ask and adjust
  • Preset prompts reduce friction for repeated tasks
  • Save/delete cycle means prompts evolve with the user's needs
AI Does Computation, Human Does Judgment
  • AI: extracts, compares, summarizes, generates ideas
  • Human: selects context, chooses what to ask, evaluates quality, makes decisions
  • This is the "Director" principle from Week 1 — now in a real app

Lecture Summary — Specification Before Code

Key takeaways

📋 The 5 Questions
  • Problem → why; Users → who; Features → what; Interaction Flow → how; Outcomes → success criteria
  • Answer these BEFORE writing code — the spec IS the design
🏗️ The Example
  • Multi-PDF Research Assistant: upload → select → prompt → analyze → iterate
  • Maps directly to Weeks 3-4 concepts (system prompts, tools, human-in-the-loop)
🎯 Your Turn
  • Today's practice builds this example; your midterm builds YOUR version
  • Start thinking about what YOUR research needs

Part 2: Practice

Build a Multi-PDF Research Assistant — Streamlit Web App

What We'll Build Today

A real web app that implements the example specification

🎯 The Goal
  • A web app that lets you upload PDFs, select which to include, and chat with AI about them
  • Preset prompt buttons for common research tasks
  • Save and delete custom prompts for reuse
🛠️ Tech Stack
  • Streamlit — web framework (familiar from Week 3)
  • PyPDF2 — PDF text extraction
  • OpenAI client — LLM calls (Gemini / Ollama / OpenAI — your choice, same as Week 4)
📁 Project Structure
  • app.py — Main Streamlit app (UI + chat logic)
  • pdf_utils.py — PDF loading and text extraction
  • llm_client.py — LLM client (reuses Week 4 pattern)
  • prompt_manager.py — Preset and custom prompt management
  • presets.json — Default preset prompts
Upload PDFs → Select Context → Chat / Use Presets → Save Custom Prompts

Step 0 — Setup

Install dependencies and reuse your Week 4 API keys

# Navigate to practice folder
cd practices/week5

# Install dependencies
pip install streamlit PyPDF2 openai python-dotenv
# .env file — reuse from Week 4 (same API keys!)

# Option A: Google Gemini
GOOGLE_API_KEY=your_gemini_key_here
GEMINI_MODEL=gemini-2.0-flash

# Option B: Ollama (local)
OLLAMA_MODEL=qwen3:1.7b

# Option C: OpenAI
OPENAI_API_KEY=your_openai_key_here
OPENAI_MODEL=gpt-4o-mini
💡 Reuse Your Week 4 Setup
  • If you completed Week 4, you already have a .env file with API keys
  • Copy it to practices/week5/ or create a symlink
  • The LLM client code is the same pattern — just a new wrapper
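Optional sanity check — before launching the app, you can confirm the keys actually load. A tiny throwaway script (the filename check_env.py is just a suggestion):

# check_env.py — optional sanity check that .env values are picked up
import os
from dotenv import load_dotenv

load_dotenv()
for key in ("GOOGLE_API_KEY", "OPENAI_API_KEY", "OLLAMA_MODEL"):
    print(f"{key}: {'set' if os.getenv(key) else 'missing'}")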

Step 1 — PDF Text Extraction (pdf_utils.py)

Extract text from uploaded PDFs and combine selected ones

# pdf_utils.py
from PyPDF2 import PdfReader

MAX_CHARS_PER_PDF = 15000  # Truncate to fit context limits

def extract_text_from_pdf(uploaded_file) -> str:
    """Extract text from a Streamlit UploadedFile (PDF)."""
    reader = PdfReader(uploaded_file)
    pages = []
    for page in reader.pages:
        text = page.extract_text()
        if text:
            pages.append(text)
    full_text = "\n".join(pages)
    if len(full_text) > MAX_CHARS_PER_PDF:
        full_text = full_text[:MAX_CHARS_PER_PDF] + "\n\n[... truncated ...]"
    return full_text

def get_combined_context(pdf_texts: dict, selected: list) -> str:
    """Combine text from selected PDFs into a single context string."""
    parts = []
    for name in selected:
        if name in pdf_texts:
            parts.append(f"### {name}\n{pdf_texts[name]}")
    return "\n\n---\n\n".join(parts)
💡 Why Truncation?
  • LLMs have context limits (Ollama: ~4K-8K tokens, Gemini: ~1M tokens)
  • 15,000 chars per PDF ≈ 4,000 tokens — a large-context model like Gemini fits many papers at once; a 4K-8K local model fits only one or two
  • This is a design decision from your spec → real-world constraint
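If you want to check the budget yourself, the usual rule of thumb is roughly 4 characters per English token. A minimal sketch (the ratio is an approximation, not the model's real tokenizer):

# Rough token estimate: ~4 chars per token for English text
# (approximation only — real tokenizers vary by model)
def estimate_tokens(text: str) -> int:
    return len(text) // 4

print(estimate_tokens("x" * 15000))  # ~3750 — one truncated PDF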

Step 2 — LLM Client (llm_client.py)

Same multi-provider pattern from Week 4 + PDF context injection

# llm_client.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

SYSTEM_PROMPT = """You are a research paper analysis assistant.
You help researchers understand, compare, and synthesize academic papers.
When answering, always reference which paper(s) your answer is based on.
If the provided papers do not contain enough information, say so clearly.
Be precise, use academic language, and structure your responses with headings."""

def get_client(provider: str):
    if provider == "Gemini":
        return OpenAI(
            api_key=os.getenv("GOOGLE_API_KEY"),
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
        ), os.getenv("GEMINI_MODEL", "gemini-2.0-flash")
    elif provider == "Ollama":
        return OpenAI(base_url="http://localhost:11434/v1",
                      api_key="ollama"), os.getenv("OLLAMA_MODEL", "qwen3:1.7b")
    else:
        return (OpenAI(api_key=os.getenv("OPENAI_API_KEY")),
                os.getenv("OPENAI_MODEL", "gpt-4o-mini"))

def chat_with_pdfs(client, model, pdf_context, user_message, history):
    """Send a message with PDF context injected into the system prompt."""
    system_msg = SYSTEM_PROMPT
    if pdf_context:
        system_msg += f"\n\n# Selected Papers Content\n\n{pdf_context}"
    messages = [{"role": "system", "content": system_msg}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return client.chat.completions.create(model=model, messages=messages, stream=True)
💡 Spec → Code Mapping
  • System prompt = the "Role" from your spec (research assistant)
  • PDF context injection = the "selective context" feature from your spec
  • Streaming = better UX — the user sees tokens appear in real-time
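You can smoke-test the client from a terminal before building any UI. A minimal sketch, assuming your .env is configured (the context string here is a made-up stand-in for real PDF text):

# smoke_test.py — quick terminal test of llm_client.py (no Streamlit needed)
from llm_client import get_client, chat_with_pdfs

client, model = get_client("Gemini")  # or "Ollama" / "OpenAI"
fake_context = "### demo.pdf\nThis paper proposes a toy method."  # stand-in text
stream = chat_with_pdfs(client, model, fake_context,
                        "Summarize the selected paper.", history=[])
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()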

Step 3 — Prompt Manager (prompt_manager.py)

Load presets, save/delete custom prompts

# prompt_manager.py
import json, os

PRESETS_PATH = os.path.join(os.path.dirname(__file__), "presets.json")
SAVED_PATH = os.path.join(os.path.dirname(__file__), "saved_prompts.json")

def load_presets(path=PRESETS_PATH):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def load_saved(path=SAVED_PATH):
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([], f)
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_prompt(name, template, path=SAVED_PATH):
    prompts = load_saved(path)
    prompts.append({"name": name, "template": template})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)

def delete_prompt(name, path=SAVED_PATH):
    prompts = load_saved(path)
    prompts = [p for p in prompts if p["name"] != name]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)
# presets.json — 5 default presets
[
  {"name": "Compare Papers", "template": "Compare the following papers..."},
  {"name": "Find Contradictions", "template": "Identify contradicting claims..."},
  {"name": "Generate Fusion Ideas", "template": "Propose 3-5 novel ideas..."},
  {"name": "Summarize Each", "template": "Structured summary of each paper..."},
  {"name": "Extract Methods", "template": "List research methods used..."}
]
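To verify the save/load/delete cycle before the UI exists, try a quick round-trip from a Python shell in practices/week5 (the prompt name below is just an example):

# Round-trip test for prompt_manager.py (run from practices/week5)
from prompt_manager import load_saved, save_prompt, delete_prompt

save_prompt("Gap Finder", "List the open problems each paper mentions.")
print([p["name"] for p in load_saved()])   # includes 'Gap Finder'
delete_prompt("Gap Finder")
print([p["name"] for p in load_saved()])   # 'Gap Finder' is gone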

Step 4 — App Layout (app.py — Sidebar)

Provider selection, PDF upload, and PDF selection checkboxes

# app.py (sidebar section)
import streamlit as st
from pdf_utils import extract_text_from_pdf, get_combined_context
from llm_client import get_client, chat_with_pdfs
from prompt_manager import load_presets, load_saved, save_prompt, delete_prompt

st.set_page_config(page_title="Multi-PDF Research Assistant", page_icon="📚", layout="wide")

# Session state
if "messages" not in st.session_state:
    st.session_state.messages = []
if "pdf_texts" not in st.session_state:
    st.session_state.pdf_texts = {}
if "selected_pdfs" not in st.session_state:
    st.session_state.selected_pdfs = []

# --- Sidebar ---
st.sidebar.title("⚙️ Settings")
provider = st.sidebar.radio("LLM Provider", ["Gemini", "Ollama", "OpenAI"])
client, model = get_client(provider)

st.sidebar.subheader("📄 Upload PDFs")
uploaded = st.sidebar.file_uploader("PDFs", type="pdf", accept_multiple_files=True)
if uploaded:
    for f in uploaded:
        if f.name not in st.session_state.pdf_texts:
            st.session_state.pdf_texts[f.name] = extract_text_from_pdf(f)

# Per-PDF checkboxes
if st.session_state.pdf_texts:
    st.sidebar.subheader("☑️ Select PDFs for Context")
    selected = [n for n in st.session_state.pdf_texts
                if st.sidebar.checkbox(n, value=True, key=f"pdf_{n}")]
    st.session_state.selected_pdfs = selected
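The Step 7 mockup shows a 🗑️ Clear Chat button that the sidebar code above doesn't create yet. A minimal sketch to append at the bottom of the sidebar:

# Reset the conversation without re-uploading PDFs
if st.sidebar.button("🗑️ Clear Chat"):
    st.session_state.messages = []
    st.rerun()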

Step 5 — Preset Buttons + Save/Delete (app.py — Main Area)

Click a preset → prompt auto-fills → AI runs immediately

# app.py (main area — preset buttons)
st.title("📚 Multi-PDF Research Assistant")

presets = load_presets()
saved = load_saved()

if st.session_state.pdf_texts:
    st.subheader("⚡ Quick Prompts")

    # Built-in presets
    cols = st.columns(min(len(presets), 5))
    for i, preset in enumerate(presets):
        with cols[i % len(cols)]:
            if st.button(f"📌 {preset['name']}", key=f"preset_{i}",
                         use_container_width=True):
                st.session_state.pending_prompt = preset["template"]
                st.rerun()

    # Saved custom prompts with delete button
    if saved:
        st.caption("Your saved prompts:")
        for i, sp in enumerate(saved):
            c1, c2 = st.columns([5, 1])
            with c1:
                if st.button(f"💾 {sp['name']}", key=f"saved_{i}"):
                    st.session_state.pending_prompt = sp["template"]
                    st.rerun()
            with c2:
                if st.button("✕", key=f"del_{i}"):
                    delete_prompt(sp["name"])
                    st.rerun()

    # Save new custom prompt
    with st.expander("➕ Save a New Custom Prompt"):
        new_name = st.text_input("Prompt name")
        new_template = st.text_area("Prompt template")
        if st.button("💾 Save") and new_name and new_template:
            save_prompt(new_name, new_template)
            st.rerun()

Step 6 — Chat Interface (app.py — Chat)

Message history + streaming LLM response with PDF context

# app.py (chat section)

# Display chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Get prompt (from preset click or chat input)
prompt = st.session_state.pop("pending_prompt", None)
if prompt is None:
    prompt = st.chat_input("Ask about your PDFs...")

if prompt and client:
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Build PDF context from selected PDFs
    pdf_context = get_combined_context(
        st.session_state.pdf_texts, st.session_state.selected_pdfs)

    # Build history for LLM
    history = [{"role": m["role"], "content": m["content"]}
               for m in st.session_state.messages[:-1]]

    # Stream response
    with st.chat_message("assistant"):
        placeholder = st.empty()
        full_response = ""
        stream = chat_with_pdfs(client, model, pdf_context, prompt, history)
        for chunk in stream:
            if chunk.choices[0].delta.content:
                full_response += chunk.choices[0].delta.content
                placeholder.markdown(full_response + "▌")
        placeholder.markdown(full_response)
        st.session_state.messages.append(
            {"role": "assistant", "content": full_response})

How It All Fits Together — Architecture

From PDF upload to AI-powered analysis

sequenceDiagram
    participant U as User (Browser)
    participant S as app.py (Streamlit)
    participant P as pdf_utils.py
    participant PM as prompt_manager.py
    participant L as Gemini / Ollama API
    U->>S: Upload PDFs
    S->>P: Extract text from each PDF
    P-->>S: {filename: text}
    U->>S: Select PDFs via checkboxes
    U->>S: Click "Compare Papers" preset
    S->>PM: Load preset template
    PM-->>S: "Compare the following papers..."
    S->>S: Build system_prompt + PDF context
    S->>L: messages (system + context + prompt)
    L-->>S: Streaming AI response
    S->>U: Display response in chat

Step 7 — Run and Test

Launch the app and verify each feature

# From the practices/week5 directory
cd practices/week5
streamlit run app.py
# If the above command fails, try this:
python -m streamlit run app.py
  You can now view your Streamlit app in your browser.
  Local URL: http://localhost:8501

Expected UI:
┌─────────────────────────────────────────────────────┐
│ ⚙️ Settings (Sidebar)  │  📚 Multi-PDF Research    │
│                         │     Assistant             │
│ Provider: [Gemini ▼]    │                           │
│                         │  ⚡ Quick Prompts          │
│ 📄 Upload PDFs          │  [Compare] [Summarize]    │
│ [Drop files here]       │  [Contradict] [Fuse]      │
│                         │  [Methods]                │
│ ☑️ Select PDFs          │                           │
│ ☑ paper1.pdf            │  💬 Chat                   │
│ ☑ paper2.pdf            │  You: Compare these papers │
│ ☐ paper3.pdf            │  AI: Based on paper1 and  │
│                         │      paper2, the key...   │
│ [🗑️ Clear Chat]         │                           │
│                         │  [Ask about your PDFs...] │
└─────────────────────────────────────────────────────┘

Practice Checklist

Complete these tasks during the hands-on session

📋 Checklist
  • [ ] Set up .env (reuse from Week 4 or create new)
  • [ ] Create all 5 files: app.py, pdf_utils.py, llm_client.py, prompt_manager.py, presets.json
  • [ ] Run streamlit run app.py and verify the UI loads
  • [ ] Upload 2+ PDF papers and verify text extraction works
  • [ ] Select/deselect PDFs and observe context changes in the caption
  • [ ] Test at least 2 preset prompts (e.g., Compare, Summarize)
  • [ ] Type a custom prompt and test it
  • [ ] Save a custom prompt, reload the page, verify it persists
  • [ ] Delete a saved prompt
  • [ ] (Bonus) Try with both Gemini and Ollama — compare response quality
  • [ ] (Bonus) Add a new preset to presets.json relevant to your research

Part 3: Discussion

Week 4 Review & Midterm Project Briefing

Week 4 Review — The Director's Role

If AI uses the tools, what is the human's unique contribution?

🦸 Iron Man — "Visionary Architect"
  • The human is no longer turning wrenches — we're drafting the master blueprint
  • Let AI handle the "algorithmic grunt work" and tedious execution
  • "Stop crying over the loss of manual data entry and start figuring out what magnificent empire to build"
🛡️ Captain America — "Moral Compass"
  • The human's irreplaceable contribution is moral compass and strict accountability
  • Surrendering tools to AI risks letting analytical skills and ethical vigilance atrophy
  • "True integrity simply cannot be automated" — take the slower, harder path
🧪 Hulk — "Ethical Fail-Safe"
  • The human must be the critical fail-safe against unchecked AI power
  • Let algorithms do heavy lifting, but rigorously double-check every single output
  • "If we just step back and let the system direct itself... the fallout could be devastating"

How Did You Vote?

The most nuanced responses yet — almost everyone combined multiple perspectives

📊 Voting Results
  • Hulk dominated again: Waad, Rupam, Lin, Seher, Hyunwoo, Ly, Han, Manuella — human as fail-safe
  • Iron Man + Hulk synthesis was the new trend: Tran, Irfan, Gyeongsu, Minh — automate execution, verify rigorously
  • Iron Man + Captain America appeared: DongYun, Tan, Nazhiefah — shift to director, but keep integrity
  • Margareth's unique view: "We are basically still the programmers, handling AI that handles the tools"
💡 Week 4 vs Previous Weeks
  • The class has moved from "should we trust AI?" to "how do we architect human-AI systems?"
  • Nearly every student now combines multiple perspectives — no one picks just one
  • The debate has matured from philosophical to operational and design-oriented
  • Perfect timing: today's lecture is about translating these principles into app specifications

Key Theme 1 — The Human as Architect AND Fail-Safe

The strongest consensus — combining Iron Man's vision with Hulk's caution

🏗️ The Architect Role
  • "Our primary role is shifting toward acting as the visionary architect" (Tran)
  • "Researchers must define the right problems, set constraints, and make final calls" (DongYun)
  • Define the "Right Problems" and provide the final validation (Minh)
  • "Only we know the exact thing we want AI to do, how we want it done, and whether the outcome is meaningful" (Margareth)
🛡️ The Fail-Safe Role
  • "Our unique contribution is to hold the leash and make the final decisions based on scrutiny" (Hyunwoo)
  • "If a human does not understand their specific area fully, AI can be guided into misleading steps" (Tran)
  • "Keeping AI in a boundary and keeping a logbook of its working" (Rupam)
  • Design a controlled workflow: AI handles exploration, humans define objectives and validate key stages (Irfan)
🎯 The Design Principle
  • Your midterm app should encode BOTH roles:
  • Architect: the human defines what problem to solve, what data to use, what questions to ask
  • Fail-safe: the human reviews AI outputs before they're used, with clear verification points
"The true human role becomes not doing the work, but deciding what work is worth doing in the first place." — Margareth

🗣️ Live Discussion — Architect vs Fail-Safe in YOUR App

10 minutes — Apply this to your midterm project

💡 Discussion Prompt
  • Think about the AI app you want to build for midterm
  • As Architect: What are the 3 most important decisions the human makes in your app? (problem framing, data selection, evaluation criteria)
  • As Fail-Safe: What are the 3 most dangerous things the AI could get wrong? How does your UI let the human catch these errors?
  • Design question: Can you design your interaction flow so the human plays BOTH roles naturally — without the verification feeling like extra work?
  • Connect to today's practice: in the Multi-PDF app, checkboxes = architect (choosing context), reading the response = fail-safe (verifying accuracy)

Key Theme 2 — "AI Is Harder to Debug Than Code"

Margareth's insight from hands-on tool-building experience

🔧 The Unpredictability Problem (Margareth)
  • "As someone who has been trying to design and give tools to AI, I would say they are hard to predict"
  • "Sometimes it surprises you with how well they choose the right tool; sometimes they keep failing a task you think is very clear"
  • Normal programming: "program does exactly what you intend" (given correct syntax)
  • AI: "doing things with AI sometimes feels like you just want them to read your mind"
🧪 The Implications for App Design
  • Traditional software: input → deterministic output → predictable UI
  • AI software: input → probabilistic output → UI must handle variability
  • Your midterm app needs to handle: what happens when the AI gives a bad answer?
  • Do you retry? Show alternatives? Let the user edit? Ask for clarification?
💡 Han's Industry Perspective
  • Han connected the 3 agents to real companies: Iron Man = OpenAI, Captain America = Google (privacy concerns), Hulk = Anthropic/Claude (ethics first)
  • "Utilization of AI must be considered carefully" — even companies with the best AI struggle with boundaries
  • Economic pressures push toward less safety — another reason human oversight matters

🗣️ Live Discussion — Designing for AI Unpredictability

5 minutes — How does your app handle bad AI output?

💡 Quick Exercise
  • Your AI app gives a wrong answer. The user can tell it's wrong.
  • Design 3 recovery options your app could offer:
  • Option A: "Retry" button → same prompt, hope for different output?
  • Option B: "Adjust" → let the user edit the prompt and try again?
  • Option C: "Show reasoning" → Chain-of-Thought so the user can see WHERE it went wrong?
  • Option D: Something else? (Switch model? Change context? Fall back to manual?)
  • Which option(s) will your midterm app include? Why?

Unexpected Insight — AI Should Free Us, Not Add More Work

Nazhiefah's human-centered perspective

❤️ The Human Cost (Nazhiefah)
  • "Back then... we used to collect data by ourselves, search papers one by one... it sometimes leads to unbalance work and life, stress, and unwell condition"
  • "If AI can ease some work, it will be good for doing such thing rather than doing a core thing"
  • BUT: "not because AI helps some work, we can add more work, no. Healthy and balance life are the important one"
🎯 The Design Implication
  • Many AI tools are designed to increase productivity — do more, faster
  • But Nazhiefah's insight: AI should also increase quality of life — do the same, with less stress
  • Your midterm spec: is your app designed to make the human do MORE? Or do the same with less burden?
  • Success criteria should include not just efficiency but researcher wellbeing
"We will be the architect itself, not just user. However, not because AI helps some work, we can add more work. Healthy and balance life are the important one." — Nazhiefah

From 5 Weeks of Discussion → Your Project Spec

Your insights become design requirements

🔗 Lecture → Practice → Your Turn
  • Lecture: The 5 questions every AI app must answer
  • Practice: Saw specification become a working Multi-PDF app
  • Your turn: Apply the same process to YOUR research problem
📐 Your Discussions → Your Design Requirements
  • Week 1: "AI is an assistant" → your app should have clear human control points
  • Week 2: "Treat AI output as hypothesis" → your app should make verification easy
  • Week 3: "Define what AI should never do" → your spec should have a Red Zone
  • Week 4: "Human = architect + fail-safe" → your interaction flow should encode both roles
  • Week 4 bonus: "AI is hard to predict" → your app needs recovery options for bad outputs
  • Week 4 bonus: "Don't just add more work" → your success criteria should include reduced burden
🎯 The Midterm Challenge
  • Design an AI agent app that embodies your own principles from 5 weeks of discussion
  • Not just "what does it do" but "where does the human stay in control?"

Midterm Project — Overview

Your Phase 2 goal — design, build, and present an AI agent application

📄 What
  • Design and build an AI agent application relevant to your research domain
  • Must solve a real problem you actually face in your work
  • Must include meaningful human-AI interaction (not just a chatbot)
📦 Deliverables
  • Specification Document — the 5-question template, filled in for YOUR app
  • Working Prototype — Streamlit or Gradio app, runnable code
  • 5-Minute Demo — live demonstration + explanation of design decisions
📅 Timeline
  • Week 7: Specification document draft due (submit on LMS)
  • Week 8: Working prototype + 5-minute live demo
  • Submit by April 17 (Fri) 24:00 → email to hogeony@ust.ac.kr
  • Start brainstorming today — the live exercise will help
📊 Evaluation Criteria
  • Clarity of specification — are the 5 questions well-answered?
  • Working prototype — does the app run and do what the spec says?
  • Design quality — is the human-AI interaction well-designed?
  • Relevance — does the app solve a real problem in your research?

Project Inspiration — What Could You Build?

Examples from different research domains

📊 Lab Data Analyzer
  • Upload CSV experiment data → AI identifies trends, anomalies, correlations
  • Preset: "Compare control vs treatment groups", "Suggest next experiment"
  • Human: selects datasets, validates AI findings against domain knowledge
📚 Literature Review Assistant
  • Like today's Multi-PDF app but with tagging, annotation, and comparison matrix
  • Preset: "Find methodological gaps", "Generate related work paragraph"
  • Human: curates the paper collection, decides what to include in the review
💻 Code Review Agent
  • Upload Python scripts → AI reviews for bugs, style, efficiency
  • Preset: "Check for security vulnerabilities", "Suggest optimizations"
  • Human: decides which suggestions to accept, maintains code ownership
📝 Research Proposal Drafter
  • Input topic, field, target grant → AI generates structured proposal sections
  • Preset: "Write significance section", "Generate timeline", "Draft budget justification"
  • Human: provides core ideas, edits AI drafts, takes ownership of final proposal

🗣️ Live Exercise — Sketch Your Project Specification

15 minutes — Draft a 1-page spec for YOUR midterm project

💡 Instructions
  • Open a blank document (or paper) and answer the 5 questions:
  • 1. Problem: What problem in YOUR research does this app solve?
  • 2. Users: Who will use it? (Just you? Your lab? Others?)
  • 3. Features: List 3-5 concrete features (verbs: upload, analyze, compare, generate, etc.)
  • 4. Interaction Flow: Draw a simple flow: Human does X → AI does Y → Human reviews → ...
  • 5. Success Criteria: How do you know the app works? What improves?
  • Share with a partner and give each other feedback
  • This draft is the starting point for your midterm spec document

🗣️ Week 5 Discussion Questions (UST LMS)

Post your response on the forum this week

Visit: UST LMS → Class → Discussion

1. Write a complete specification for your midterm project using the 5-question template from today's lecture. What problem does your app solve? Who uses it? What are the 3-5 core features? What is the human-AI interaction flow (include a diagram)? What does success look like?

2. Reflect on today's Multi-PDF Research Assistant practice. What design decisions did you notice (e.g., why checkboxes for PDF selection? why preset prompts? why save/delete?)? How would you modify the design for YOUR research domain?

3. Looking back at all 4 weeks of discussion (AI boundaries → stochastic parrot → what AI should never do → human's role), how do your accumulated insights inform the interaction flow of your midterm project? Where does your app let AI act autonomously, and where does it require human judgment?

Wrap-Up of Week 5

Three things to remember

📖 Lecture
  • 5 questions every AI app must answer: Problem, Users, Features, Interaction Flow, Outcomes; specification before code; the spec IS the design document
💻 Practice
  • Built a Multi-PDF Research Assistant with Streamlit: PDF upload, selective context, preset prompts, custom prompt save/load/delete; same app demonstrates specification → implementation flow
🗣️ Discussion
  • Week 4 review: class converged on "AI computes, human judges" + Margareth's insight that AI is harder to debug than code; midterm announced: spec draft due Week 7, prototype + demo Week 8; start designing YOUR app NOW

Next week: Advanced context management — RAG (Retrieval-Augmented Generation), vector databases, and making your agent smarter with external knowledge.
