ChatGPT
OpenAI's flagship conversational AI. A single orchestrator model with function-calling access to ~7 tools, a four-layer memory system, multi-model routing (GPT-4o, o3, o4-mini), and layered safety infrastructure.
Architecture
Core Loop
ChatGPT operates as a single orchestrator model that decides whether to respond directly or invoke tools. Tool definitions (~7 tools) are injected into the system prompt as function schemas. The model evaluates the user's message against all tool descriptions and generates either a direct response or a structured tool call.
The model is stateless — every request includes the full conversation history within the context window. The system prompt includes tool definitions, saved memories, user preferences, recent conversation summaries, and behavioral instructions.
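The loop above can be sketched as a minimal dispatch function: the orchestrator's output is either plain text (a direct reply) or a JSON tool call that the runtime executes and feeds back. All names here (`dispatch_turn`, the `TOOLS` registry, the call format) are illustrative stand-ins, not OpenAI's actual internals.

```python
import json

# Hypothetical tool registry: name -> callable (illustrative only)
TOOLS = {
    "python": lambda code: f"executed: {code}",
    "web": lambda query: f"results for: {query}",
}

def dispatch_turn(model_output: str):
    """Route one model turn: a structured tool call is executed,
    anything else is treated as a direct response to the user."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return ("direct", model_output)          # plain text -> direct reply
    if not isinstance(call, dict) or "name" not in call:
        return ("direct", model_output)
    return ("tool", TOOLS[call["name"]](call["arguments"]))

print(dispatch_turn('{"name": "web", "arguments": "weather in Lisbon"}'))
```

In the real system the tool result is appended to the context and the model is called again, repeating until it produces a direct response.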
Tool System
| Tool | Function |
|---|---|
| python | Code execution in a sandboxed Jupyter environment |
| web | Web search via Bing index |
| dalle | Image generation (DALL-E 3 or 4o native) |
| bio | Memory read/write for persistent user facts |
| file_search | RAG over uploaded documents |
| container.download | File retrieval from sandbox |
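Each entry in the table is exposed to the model as a function schema in the system prompt. A sketch of what such a schema might look like, loosely following the public function-calling format (the exact field names in ChatGPT's internal prompt are an assumption):

```python
import json

# Illustrative schema for the `web` tool; shape follows the public
# function-calling convention, not a confirmed internal definition
web_tool = {
    "name": "web",
    "description": "Search the web and return cited results.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "required": ["query"],
    },
}

print(json.dumps(web_tool, indent=2))
```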
Code Interpreter
Runs in Docker containers managed by Kubernetes on Azure. Each container runs Debian 12 with a stateful Jupyter session that persists for the conversation's duration. Supports 11 languages (Python, JavaScript, Bash, Ruby, PHP, Go, Java, and more). Package installation via pip and npm through a proxy. No outbound network access from the container.
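The key property of the stateful Jupyter session is that variables defined in one execution persist for later ones in the same conversation. A toy emulation of that behavior, using a shared namespace (the real sandbox is a full kernel, not `exec`):

```python
# Emulate session statefulness: every "cell" executes against the same
# namespace, so earlier definitions remain visible to later code
session_ns: dict = {}

def run_cell(code: str) -> dict:
    exec(code, session_ns)   # shared namespace = stateful session
    return session_ns

run_cell("x = 2 + 2")
result = run_cell("y = x * 10")  # x survives from the previous cell
print(result["y"])               # → 40
```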
Web Search
A text-based browser limited to GET requests. Searches via Bing's index, retrieves 3-10 sources per query, and cites findings with structured footnotes. Enterprise queries are disassociated from user accounts.
File Search (RAG)
Uploaded documents are parsed, chunked, embedded, and stored in a vector database. Retrieval uses both keyword and semantic search with query rewriting and reranking. Each vector store supports up to 10,000 files.
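The chunk/embed/retrieve pipeline can be sketched end to end. This is a deliberately toy version: bag-of-words token overlap stands in for a real embedding model plus cosine similarity, and fixed-width slicing stands in for the actual chunking strategy.

```python
# Toy file-search pipeline: chunk, "embed", hybrid-score, retrieve
def chunk(text: str, size: int = 40) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(s: str) -> set[str]:
    return set(s.lower().split())   # bag-of-words stand-in for embeddings

def search(query: str, store: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    # token overlap stands in for semantic similarity + keyword match
    scored = sorted(store, key=lambda c: len(q & embed(c)), reverse=True)
    return scored[:k]

store = chunk("Vector stores hold embedded chunks. "
              "Retrieval mixes keyword and semantic search.")
print(search("semantic retrieval", store))
```

The production pipeline adds query rewriting before retrieval and a reranking pass over the candidates, as described above.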
Memory System
ChatGPT uses a four-layer memory architecture:
| Layer | Scope | Mechanism |
|---|---|---|
| Saved Memories | Cross-session | Numbered facts with timestamps, stored via the bio tool |
| Chat History Profile | Cross-session | Inferred preferences, notable topics, user insights — updated out-of-band by OpenAI |
| Session Metadata | Single session | Device info, location, account age, usage metrics (~17 data points) |
| Current Context | Single session | Full un-summarized conversation history within the context window |
Saved memories are injected into the system prompt under a "Model Set Context" section. Deleting a chat does not remove its saved memories. Memory retrieval uses RAG under the hood.
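Because the model is stateless, these layers only take effect by being assembled into the system prompt on every request. A sketch of that assembly; the "Model Set Context" header is from the source, while the other section headers and field names are assumptions for illustration.

```python
# Sketch: build the per-request system prompt from the memory layers.
# Only "Model Set Context" is a confirmed section name.
saved_memories = ["1. [2024-05-01] User prefers metric units."]
profile = "Frequently asks about distributed systems."   # chat-history profile
metadata = {"device": "iPhone", "account_age_days": 412}  # session metadata

def build_system_prompt() -> str:
    parts = ["You are ChatGPT."]
    parts.append("Model Set Context\n" + "\n".join(saved_memories))
    parts.append("User Profile\n" + profile)              # hypothetical header
    parts.append("Session Metadata\n" + str(metadata))    # hypothetical header
    return "\n\n".join(parts)

print(build_system_prompt())
```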
For long conversations, compaction compresses older turns: user messages are preserved verbatim while assistant messages, tool calls, and reasoning are replaced with an encrypted summary item.
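The compaction rule above (user turns verbatim, everything else collapsed) can be sketched as follows. Real summaries are model-generated and encrypted; the placeholder string and the `keep_last` window are illustrative.

```python
# Sketch of compaction: keep user messages verbatim, replace older
# assistant/tool items with a single summary placeholder
def compact(history: list[dict], keep_last: int = 2) -> list[dict]:
    old, recent = history[:-keep_last], history[-keep_last:]
    compacted = [m for m in old if m["role"] == "user"]   # preserved verbatim
    dropped = [m for m in old if m["role"] != "user"]
    if dropped:
        compacted.append({"role": "summary",
                          "content": f"<summary of {len(dropped)} items>"})
    return compacted + recent

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "tool", "content": "..."},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "welcome"},
]
print(compact(history))
```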
Safety Infrastructure
Training-Time
- RLHF — reinforcement learning from human feedback for alignment
- Rule-Based Reward Models (RBRMs) — zero-shot classifiers using propositions to evaluate safety aspects without extensive human data
- Deliberative Alignment (o-series) — the model reasons about safety guidelines in real-time using its chain of thought
Deployment-Time
- Moderation API — free classifier checking inputs/outputs against harm categories (CSAM, hate, violence)
- Safety Classifiers — domain-specific classifiers for biology, self-harm, CBRN
- Instruction Hierarchy — trained priority order: Root > System > Developer > User > Tool Output. Lower-privileged instructions are selectively ignored when they conflict with higher ones.
Reasoning Models (o-series)
The o1/o3/o4 models introduce a distinct pattern:
- Generate a hidden internal chain of thought before the visible response
- Trained via reinforcement learning to recognize mistakes, break down complex steps, and try alternative approaches
- Test-time search (o3+): generates hundreds of candidate reasoning paths and selects the best one
- A model-generated summary of the reasoning is shown; raw chains are kept internal
- Can agentically combine all ChatGPT tools during reasoning
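The test-time search step can be sketched as sample-then-select: generate many candidate reasoning paths, score each, keep the best. Scoring here is a seeded random stand-in; real systems use a learned verifier or self-consistency voting, and the exact mechanism in o3 is not public.

```python
import random

# Toy test-time search: sample candidates, pick the highest-scoring one
def solve_with_search(problem: str, n_candidates: int = 100) -> str:
    rng = random.Random(0)   # seeded so the example is deterministic
    candidates = [(rng.random(), f"path-{i} for {problem}")
                  for i in range(n_candidates)]
    score, best = max(candidates)   # stand-in for a learned verifier
    return best

print(solve_with_search("prove X"))
```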
Custom GPTs
Custom GPTs are composed of four elements:
| Component | Function |
|---|---|
| Instructions | System prompt defining behavior and constraints |
| Knowledge | Up to 20 files, auto-chunked and embedded into a vector store |
| Capabilities | Toggle: code interpreter, DALL-E, web search, canvas |
| Actions | External API calls defined via OpenAPI schemas |
The model uses function calling to decide when to invoke actions and generates the JSON input. Authentication supports API keys and OAuth.
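The action-invocation step can be sketched as: the model emits JSON arguments, the client validates them against the OpenAPI-derived schema, then dispatches the HTTP call. The `getWeather` operation and its fields are hypothetical.

```python
import json

# Hypothetical action derived from an OpenAPI schema
action_schema = {
    "operationId": "getWeather",
    "parameters": {"required": ["city"]},
}

def invoke_action(model_json: str) -> dict:
    args = json.loads(model_json)   # model-generated JSON input
    for field in action_schema["parameters"]["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    # A real client would now call the external API with these arguments
    return {"operationId": action_schema["operationId"], "args": args}

print(invoke_action('{"city": "Lisbon"}'))
```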
Patterns Used
| Pattern | How It's Used |
|---|---|
| Router | Single model classifies and dispatches to ~7 tools |
| Tool Router | Function calling for tool selection and invocation |
| RAG | File search with chunking, embedding, and reranking |
| Code Execution | Sandboxed Docker containers for code interpreter |
| Extracted Facts | Bio tool saves structured facts across sessions |
| Guardrails | Moderation API + safety classifiers + instruction hierarchy |
| Citation | Web search results with structured source attribution |
| Conversation Summarization | Compaction for long conversations |
| Structured Output | Custom GPT actions via OpenAPI schemas |