From Idea to Specification — Building What Your Research Actually Needs
Lecture, Practice, and Discussion for Week 5
You understand AI. Now what will you BUILD with it?
Before you write a single line of code
This is exactly what you'll submit for the midterm
# App Name: [Your App Name]
## 1. Problem Statement
What problem does this solve? Why does it matter?
Who currently suffers from this problem, and how?
## 2. Target Users
Who will use this? What is their skill level?
What is their context (research field, tools they already use)?
## 3. Core Features
| Feature | Description | Priority |
|---------|-------------|----------|
| Feature 1 | What it does | Must-have |
| Feature 2 | What it does | Must-have |
| Feature 3 | What it does | Nice-to-have |
## 4. Human-AI Interaction Flow
[Diagram or step-by-step description]
- Step 1: Human does X → Step 2: AI does Y → Step 3: Human reviews → ...
## 5. Technical Approach
- LLM: Gemini / Ollama / OpenAI
- Framework: Streamlit / Gradio
- Key libraries: [list]
- Data: [what data does the app need?]
## 6. Success Criteria
How do you know the app works? What does "good" look like?
Let's walk through the 5 questions for a concrete example
How the human and AI collaborate
Connecting to everything you learned in Weeks 1-4
calculate() or search_papers() from Week 4
Practical guidelines — not UX theory
Key takeaways
Build a Multi-PDF Research Assistant — Streamlit Web App
A real web app that implements the example specification
app.py — Main Streamlit app (UI + chat logic)
pdf_utils.py — PDF loading and text extraction
llm_client.py — LLM client (reuses Week 4 pattern)
prompt_manager.py — Preset and custom prompt management
presets.json — Default preset prompts
Install dependencies and reuse your Week 4 API keys
# Navigate to practice folder
cd practices/week5
# Install dependencies
pip install streamlit PyPDF2 openai python-dotenv
# .env file — reuse from Week 4 (same API keys!)
# Option A: Google Gemini
GOOGLE_API_KEY=your_gemini_key_here
GEMINI_MODEL=gemini-2.0-flash
# Option B: Ollama (local)
OLLAMA_MODEL=qwen3:1.7b
# Option C: OpenAI
OPENAI_API_KEY=your_openai_key_here
OPENAI_MODEL=gpt-4o-mini
Copy your .env file with API keys into practices/week5/ or create a symlink
(pdf_utils.py) Extract text from uploaded PDFs and combine selected ones
# pdf_utils.py
from PyPDF2 import PdfReader

MAX_CHARS_PER_PDF = 15000  # Truncate to fit context limits

def extract_text_from_pdf(uploaded_file) -> str:
    """Extract text from a Streamlit UploadedFile (PDF)."""
    reader = PdfReader(uploaded_file)
    pages = []
    for page in reader.pages:
        text = page.extract_text()
        if text:
            pages.append(text)
    full_text = "\n".join(pages)
    if len(full_text) > MAX_CHARS_PER_PDF:
        full_text = full_text[:MAX_CHARS_PER_PDF] + "\n\n[... truncated ...]"
    return full_text

def get_combined_context(pdf_texts: dict, selected: list) -> str:
    """Combine text from selected PDFs into a single context string."""
    parts = []
    for name in selected:
        if name in pdf_texts:
            parts.append(f"### {name}\n{pdf_texts[name]}")
    return "\n\n---\n\n".join(parts)
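You can sanity-check `get_combined_context` without any real PDFs — a quick sketch using hypothetical in-memory texts (the function is re-declared so the snippet is standalone):

```python
# Hypothetical extracted texts, keyed by filename (stand-ins for real PDF output)
pdf_texts = {
    "paper1.pdf": "Transformers improve accuracy on long documents.",
    "paper2.pdf": "CNNs remain faster to train on small datasets.",
}

def get_combined_context(pdf_texts: dict, selected: list) -> str:
    """Same logic as pdf_utils.get_combined_context: join selected texts with separators."""
    parts = []
    for name in selected:
        if name in pdf_texts:
            parts.append(f"### {name}\n{pdf_texts[name]}")
    return "\n\n---\n\n".join(parts)

context = get_combined_context(pdf_texts, ["paper1.pdf", "paper2.pdf"])
# Each selected paper appears under its own "### filename" heading,
# separated by "---" — names the LLM can cite back in its answers.
```

Note the design choice: the `### filename` headings are what let the system prompt's "always reference which paper(s)" instruction actually work.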
(llm_client.py) Same multi-provider pattern from Week 4 + PDF context injection
# llm_client.py
import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

SYSTEM_PROMPT = """You are a research paper analysis assistant.
You help researchers understand, compare, and synthesize academic papers.
When answering, always reference which paper(s) your answer is based on.
If the provided papers do not contain enough information, say so clearly.
Be precise, use academic language, and structure your responses with headings."""

def get_client(provider: str):
    if provider == "Gemini":
        return OpenAI(
            api_key=os.getenv("GOOGLE_API_KEY"),
            base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
        ), os.getenv("GEMINI_MODEL", "gemini-2.0-flash")
    elif provider == "Ollama":
        return OpenAI(
            base_url="http://localhost:11434/v1", api_key="ollama"
        ), os.getenv("OLLAMA_MODEL", "qwen3:1.7b")
    else:
        return OpenAI(
            api_key=os.getenv("OPENAI_API_KEY")
        ), os.getenv("OPENAI_MODEL", "gpt-4o-mini")

def chat_with_pdfs(client, model, pdf_context, user_message, history):
    """Send a message with PDF context injected into the system prompt."""
    system_msg = SYSTEM_PROMPT
    if pdf_context:
        system_msg += f"\n\n# Selected Papers Content\n\n{pdf_context}"
    messages = [{"role": "system", "content": system_msg}]
    messages.extend(history)
    messages.append({"role": "user", "content": user_message})
    return client.chat.completions.create(model=model, messages=messages, stream=True)
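The message list that `chat_with_pdfs` sends can be verified without any API call. Here is a hedged, standalone sketch of the same assembly logic (system prompt shortened, `build_messages` is a hypothetical helper mirroring the function above minus the network call):

```python
SYSTEM_PROMPT = "You are a research paper analysis assistant."  # shortened for the example

def build_messages(pdf_context: str, user_message: str, history: list) -> list:
    """Assemble the message list the same way chat_with_pdfs does, minus the API call."""
    system_msg = SYSTEM_PROMPT
    if pdf_context:
        system_msg += f"\n\n# Selected Papers Content\n\n{pdf_context}"
    messages = [{"role": "system", "content": system_msg}]  # system prompt + PDF context first
    messages.extend(history)                                # then prior conversation turns
    messages.append({"role": "user", "content": user_message})  # new question last
    return messages

history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello!"}]
msgs = build_messages("### paper1.pdf\nSome text", "Compare the papers", history)
# Order: system message, two history turns, then the new user message
```

Injecting the PDFs into the system message (rather than the user turn) keeps the paper content in place across every turn of the conversation.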
(prompt_manager.py) Load presets, save/delete custom prompts
# prompt_manager.py
import json, os

PRESETS_PATH = os.path.join(os.path.dirname(__file__), "presets.json")
SAVED_PATH = os.path.join(os.path.dirname(__file__), "saved_prompts.json")

def load_presets(path=PRESETS_PATH):
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def load_saved(path=SAVED_PATH):
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([], f)
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_prompt(name, template, path=SAVED_PATH):
    prompts = load_saved(path)
    prompts.append({"name": name, "template": template})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)

def delete_prompt(name, path=SAVED_PATH):
    prompts = load_saved(path)
    prompts = [p for p in prompts if p["name"] != name]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)
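A quick round-trip check of the save/delete logic, pointed at a temporary file so your real `saved_prompts.json` is untouched (the functions are re-declared here so the snippet runs on its own):

```python
import json, os, tempfile

def load_saved(path):
    # Create an empty list file on first use, mirroring prompt_manager.load_saved
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            json.dump([], f)
        return []
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_prompt(name, template, path):
    prompts = load_saved(path)
    prompts.append({"name": name, "template": template})
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)

def delete_prompt(name, path):
    prompts = [p for p in load_saved(path) if p["name"] != name]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prompts, f, indent=2, ensure_ascii=False)

# Round trip against a throwaway file
path = os.path.join(tempfile.mkdtemp(), "saved_prompts.json")
save_prompt("My Prompt", "Summarize...", path)
save_prompt("Other", "Compare...", path)
delete_prompt("My Prompt", path)
remaining = load_saved(path)  # only "Other" should survive
```

`ensure_ascii=False` matters here: it keeps non-ASCII prompt text (e.g., Korean or accented characters) readable in the JSON file instead of escaping it.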
// presets.json — 5 default presets
[
{"name": "Compare Papers", "template": "Compare the following papers..."},
{"name": "Find Contradictions", "template": "Identify contradicting claims..."},
{"name": "Generate Fusion Ideas", "template": "Propose 3-5 novel ideas..."},
{"name": "Summarize Each", "template": "Structured summary of each paper..."},
{"name": "Extract Methods", "template": "List research methods used..."}
]
(app.py — Sidebar) Provider selection, PDF upload, and PDF selection checkboxes
# app.py (sidebar section)
import streamlit as st
from pdf_utils import extract_text_from_pdf, get_combined_context
from llm_client import get_client, chat_with_pdfs
from prompt_manager import load_presets, load_saved, save_prompt, delete_prompt

st.set_page_config(page_title="Multi-PDF Research Assistant", page_icon="📚", layout="wide")

# Session state
if "messages" not in st.session_state:
    st.session_state.messages = []
if "pdf_texts" not in st.session_state:
    st.session_state.pdf_texts = {}
if "selected_pdfs" not in st.session_state:
    st.session_state.selected_pdfs = []

# --- Sidebar ---
st.sidebar.title("⚙️ Settings")
provider = st.sidebar.radio("LLM Provider", ["Gemini", "Ollama", "OpenAI"])
client, model = get_client(provider)

st.sidebar.subheader("📄 Upload PDFs")
uploaded = st.sidebar.file_uploader("PDFs", type="pdf", accept_multiple_files=True)
if uploaded:
    for f in uploaded:
        if f.name not in st.session_state.pdf_texts:
            st.session_state.pdf_texts[f.name] = extract_text_from_pdf(f)

# Per-PDF checkboxes
if st.session_state.pdf_texts:
    st.sidebar.subheader("☑️ Select PDFs for Context")
    selected = [n for n in st.session_state.pdf_texts
                if st.sidebar.checkbox(n, value=True, key=f"pdf_{n}")]
    st.session_state.selected_pdfs = selected
(app.py — Main Area) Click a preset → prompt auto-fills → AI runs immediately
# app.py (main area — preset buttons)
st.title("📚 Multi-PDF Research Assistant")
presets = load_presets()
saved = load_saved()

if st.session_state.pdf_texts:
    st.subheader("⚡ Quick Prompts")
    # Built-in presets
    cols = st.columns(min(len(presets), 5))
    for i, preset in enumerate(presets):
        with cols[i % len(cols)]:
            if st.button(f"📌 {preset['name']}", key=f"preset_{i}",
                         use_container_width=True):
                st.session_state.pending_prompt = preset["template"]
                st.rerun()

    # Saved custom prompts with delete button
    if saved:
        st.caption("Your saved prompts:")
        for i, sp in enumerate(saved):
            c1, c2 = st.columns([5, 1])
            with c1:
                if st.button(f"💾 {sp['name']}", key=f"saved_{i}"):
                    st.session_state.pending_prompt = sp["template"]
                    st.rerun()
            with c2:
                if st.button("✕", key=f"del_{i}"):
                    delete_prompt(sp["name"])
                    st.rerun()

    # Save new custom prompt
    with st.expander("➕ Save a New Custom Prompt"):
        new_name = st.text_input("Prompt name")
        new_template = st.text_area("Prompt template")
        if st.button("💾 Save") and new_name and new_template:
            save_prompt(new_name, new_template)
            st.rerun()
(app.py — Chat) Message history + streaming LLM response with PDF context
# app.py (chat section)
# Display chat history
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Get prompt (from preset click or chat input)
prompt = st.session_state.get("pending_prompt")
st.session_state.pending_prompt = None
if prompt is None:
    prompt = st.chat_input("Ask about your PDFs...")

if prompt and client:
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    # Build PDF context from selected PDFs
    pdf_context = get_combined_context(
        st.session_state.pdf_texts, st.session_state.selected_pdfs)

    # Build history for LLM
    history = [{"role": m["role"], "content": m["content"]}
               for m in st.session_state.messages[:-1]]

    # Stream response
    with st.chat_message("assistant"):
        placeholder = st.empty()
        full_response = ""
        stream = chat_with_pdfs(client, model, pdf_context, prompt, history)
        for chunk in stream:
            if chunk.choices[0].delta.content:
                full_response += chunk.choices[0].delta.content
                placeholder.markdown(full_response + "▌")
        placeholder.markdown(full_response)
    st.session_state.messages.append(
        {"role": "assistant", "content": full_response})
From PDF upload to AI-powered analysis
Launch the app and verify each feature
# From the practices/week5 directory
cd practices/week5
streamlit run app.py
# If the above command fails, try this:
python -m streamlit run app.py
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Expected UI:
┌─────────────────────────────────────────────────────┐
│ ⚙️ Settings (Sidebar) │ 📚 Multi-PDF Research │
│ │ Assistant │
│ Provider: [Gemini ▼] │ │
│ │ ⚡ Quick Prompts │
│ 📄 Upload PDFs │ [Compare] [Summarize] │
│ [Drop files here] │ [Contradict] [Fuse] │
│ │ [Methods] │
│ ☑️ Select PDFs │ │
│ ☑ paper1.pdf │ 💬 Chat │
│ ☑ paper2.pdf │ You: Compare these papers │
│ ☐ paper3.pdf │ AI: Based on paper1 and │
│ │ paper2, the key... │
│ [🗑️ Clear Chat] │ │
│ │ [Ask about your PDFs...] │
└─────────────────────────────────────────────────────┘
Complete these tasks during the hands-on session
- Create .env (reuse from Week 4 or create new)
- Create app.py, pdf_utils.py, llm_client.py, prompt_manager.py, presets.json
- Run streamlit run app.py and verify the UI loads
- Customize presets.json with prompts relevant to your research
Week 4 Review & Midterm Project Briefing
If AI uses the tools, what is the human's unique contribution?
The most nuanced responses yet — almost everyone combined multiple perspectives
The strongest consensus — combining Iron Man's vision with Hulk's caution
10 minutes — Apply this to your midterm project
Margareth's insight from hands-on tool-building experience
5 minutes — How does your app handle bad AI output?
Nazhiefah's human-centered perspective
Your insights become design requirements
Your Phase 2 goal — design, build, and present an AI agent application
Examples from different research domains
15 minutes — Draft a 1-page spec for YOUR midterm project
Post your response on the forum this week
1. Write a complete specification for your midterm project using the 5-question template from today's lecture. What problem does your app solve? Who uses it? What are the 3-5 core features? What is the human-AI interaction flow (include a diagram)? What does success look like?
2. Reflect on today's Multi-PDF Research Assistant practice. What design decisions did you notice (e.g., why checkboxes for PDF selection? why preset prompts? why save/delete?)? How would you modify the design for YOUR research domain?
3. Looking back at all 4 weeks of discussion (AI boundaries → stochastic parrot → what AI should never do → human's role), how do your accumulated insights inform the interaction flow of your midterm project? Where does your app let AI act autonomously, and where does it require human judgment?
Streamlit & Web UI
📚 Streamlit Documentation 📚 Streamlit Chat Elements 📚 Gradio — Alternative to Streamlit
AI Application Design
📚 Google PAIR — People + AI Guidebook 📚 Microsoft HAX Toolkit — Human-AI Interaction 📚 Nielsen Norman Group — AI UX Guidelines
PDF Processing
📚 PyPDF2 Documentation 📚 LangChain Document Loaders
Three things to remember
Next week: Advanced context management — RAG (Retrieval-Augmented Generation), vector databases, and making your agent smarter with external knowledge.