ChatGPT
OpenAI's flagship conversational AI. A single orchestrator model with function-calling access to ~7 tools, a four-layer memory system, multi-model routing (GPT-4o, o3, o4-mini), and layered safety infrastructure.
Architecture
Core Loop
ChatGPT operates as a single orchestrator model that decides whether to respond directly or invoke tools. Tool definitions (~7 tools) are injected into the system prompt as function schemas. The model evaluates the user's message against all tool descriptions and generates either a direct response or a structured tool call.
The model is stateless — every request includes the full conversation history within the context window. The system prompt includes tool definitions, saved memories, user preferences, recent conversation summaries, and behavioral instructions.
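The loop above can be sketched as a minimal dispatch function: the orchestrator's output is either plain text (a direct reply) or a JSON tool call that the runtime executes and feeds back. All names here (`dispatch_turn`, the `TOOLS` registry, the call format) are illustrative stand-ins, not OpenAI's actual internals.

```python
import json

# Hypothetical tool registry: name -> callable (illustrative only)
TOOLS = {
    "python": lambda code: f"executed: {code}",
    "web": lambda query: f"results for: {query}",
}

def dispatch_turn(model_output: str):
    """Route one model turn: a structured tool call is executed,
    anything else is treated as a direct response to the user."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return ("direct", model_output)          # plain text -> direct reply
    if not isinstance(call, dict) or "name" not in call:
        return ("direct", model_output)
    return ("tool", TOOLS[call["name"]](call["arguments"]))

print(dispatch_turn('{"name": "web", "arguments": "weather in Lisbon"}'))
```

In the real system the tool result is appended to the context and the model is called again, repeating until it produces a direct response.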
Tool System
| Tool | Function |
|---|---|
| python | Code execution in a sandboxed Jupyter environment |
| web | Web search via Bing index |
| dalle | Image generation (DALL-E 3 or 4o native) |
| bio | Memory read/write for persistent user facts |
| file_search | RAG over uploaded documents |
| container.download | File retrieval from sandbox |
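Each entry in the table is exposed to the model as a function schema in the system prompt. A sketch of what such a schema might look like, loosely following the public function-calling format (the exact field names in ChatGPT's internal prompt are an assumption):

```python
import json

# Illustrative schema for the `web` tool; shape follows the public
# function-calling convention, not a confirmed internal definition
web_tool = {
    "name": "web",
    "description": "Search the web and return cited results.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "required": ["query"],
    },
}

print(json.dumps(web_tool, indent=2))
```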
Code Interpreter
Runs in Docker containers managed by Kubernetes on Azure. Each container runs Debian 12 with a stateful Jupyter session that persists for the conversation's duration. Supports 11 languages (Python, JavaScript, Bash, Ruby, PHP, Go, Java, and more). Package installation via pip and npm through a proxy. No outbound network access from the container.
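The key property of the stateful Jupyter session is that variables defined in one execution persist for later ones in the same conversation. A toy emulation of that behavior, using a shared namespace (the real sandbox is a full kernel, not `exec`):

```python
# Emulate session statefulness: every "cell" executes against the same
# namespace, so earlier definitions remain visible to later code
session_ns: dict = {}

def run_cell(code: str) -> dict:
    exec(code, session_ns)   # shared namespace = stateful session
    return session_ns

run_cell("x = 2 + 2")
result = run_cell("y = x * 10")  # x survives from the previous cell
print(result["y"])               # → 40
```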
Web Search
A text-based browser limited to GET requests. Searches via Bing's index, retrieves 3-10 sources per query, and cites findings with structured footnotes. Enterprise queries are disassociated from user accounts.
File Search (RAG)
Uploaded documents are parsed, chunked, embedded, and stored in a vector database. Retrieval uses both keyword and semantic search with query rewriting and reranking. Each vector store supports up to 10,000 files.
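The chunk/embed/retrieve pipeline can be sketched end to end. This is a deliberately toy version: bag-of-words token overlap stands in for a real embedding model plus cosine similarity, and fixed-width slicing stands in for the actual chunking strategy.

```python
# Toy file-search pipeline: chunk, "embed", hybrid-score, retrieve
def chunk(text: str, size: int = 40) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(s: str) -> set[str]:
    return set(s.lower().split())   # bag-of-words stand-in for embeddings

def search(query: str, store: list[str], k: int = 2) -> list[str]:
    q = embed(query)
    # token overlap stands in for semantic similarity + keyword match
    scored = sorted(store, key=lambda c: len(q & embed(c)), reverse=True)
    return scored[:k]

store = chunk("Vector stores hold embedded chunks. "
              "Retrieval mixes keyword and semantic search.")
print(search("semantic retrieval", store))
```

The production pipeline adds query rewriting before retrieval and a reranking pass over the candidates, as described above.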
Memory System
ChatGPT uses a four-layer memory architecture:
| Layer | Scope | Mechanism |
|---|---|---|
| Saved Memories | Cross-session | Numbered facts with timestamps, stored via the bio tool |
| Chat History Profile | Cross-session | Inferred preferences, notable topics, user insights — updated out-of-band by OpenAI |
| Session Metadata | Single session | Device info, location, account age, usage metrics (~17 data points) |
| Current Context | Single session | Full un-summarized conversation history within the context window |
Saved memories are injected into the system prompt under a "Model Set Context" section. Deleting a chat does not remove its saved memories. Memory retrieval uses RAG under the hood.
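Because the model is stateless, these layers only take effect by being assembled into the system prompt on every request. A sketch of that assembly; the "Model Set Context" header is from the source, while the other section headers and field names are assumptions for illustration.

```python
# Sketch: build the per-request system prompt from the memory layers.
# Only "Model Set Context" is a confirmed section name.
saved_memories = ["1. [2024-05-01] User prefers metric units."]
profile = "Frequently asks about distributed systems."   # chat-history profile
metadata = {"device": "iPhone", "account_age_days": 412}  # session metadata

def build_system_prompt() -> str:
    parts = ["You are ChatGPT."]
    parts.append("Model Set Context\n" + "\n".join(saved_memories))
    parts.append("User Profile\n" + profile)              # hypothetical header
    parts.append("Session Metadata\n" + str(metadata))    # hypothetical header
    return "\n\n".join(parts)

print(build_system_prompt())
```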
For long conversations, compaction compresses older turns: user messages are preserved verbatim while assistant messages, tool calls, and reasoning are replaced with an encrypted summary item.
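The compaction rule above (user turns verbatim, everything else collapsed) can be sketched as follows. Real summaries are model-generated and encrypted; the placeholder string and the `keep_last` window are illustrative.

```python
# Sketch of compaction: keep user messages verbatim, replace older
# assistant/tool items with a single summary placeholder
def compact(history: list[dict], keep_last: int = 2) -> list[dict]:
    old, recent = history[:-keep_last], history[-keep_last:]
    compacted = [m for m in old if m["role"] == "user"]   # preserved verbatim
    dropped = [m for m in old if m["role"] != "user"]
    if dropped:
        compacted.append({"role": "summary",
                          "content": f"<summary of {len(dropped)} items>"})
    return compacted + recent

history = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "tool", "content": "..."},
    {"role": "user", "content": "thanks"},
    {"role": "assistant", "content": "welcome"},
]
print(compact(history))
```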
Safety Infrastructure
Training-Time
- RLHF — reinforcement learning from human feedback for alignment
- Rule-Based Reward Models (RBRMs) — zero-shot classifiers using propositions to evaluate safety aspects without extensive human data
- Deliberative Alignment (o-series) — the model reasons about safety guidelines in real-time using its chain of thought
Deployment-Time
- Moderation API — free classifier checking inputs/outputs against harm categories (CSAM, hate, violence)
- Safety Classifiers — domain-specific classifiers for biology, self-harm, CBRN
- Instruction Hierarchy — trained priority order: Root > System > Developer > User > Tool Output. Lower-privileged instructions are selectively ignored when they conflict with higher ones.
Reasoning Models (o-series)
The o1/o3/o4 models introduce a distinct pattern:
- Generate a hidden internal chain of thought before the visible response
- Trained via reinforcement learning to recognize mistakes, break down complex steps, and try alternative approaches
- Test-time search (o3+): generates hundreds of candidate reasoning paths and selects the best one
- A model-generated summary of the reasoning is shown; raw chains are kept internal
- Can agentically combine all ChatGPT tools during reasoning
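The test-time search step can be sketched as sample-then-select: generate many candidate reasoning paths, score each, keep the best. Scoring here is a seeded random stand-in; real systems use a learned verifier or self-consistency voting, and the exact mechanism in o3 is not public.

```python
import random

# Toy test-time search: sample candidates, pick the highest-scoring one
def solve_with_search(problem: str, n_candidates: int = 100) -> str:
    rng = random.Random(0)   # seeded so the example is deterministic
    candidates = [(rng.random(), f"path-{i} for {problem}")
                  for i in range(n_candidates)]
    score, best = max(candidates)   # stand-in for a learned verifier
    return best

print(solve_with_search("prove X"))
```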
Custom GPTs
Custom GPTs are composed of four elements:
| Component | Function |
|---|---|
| Instructions | System prompt defining behavior and constraints |
| Knowledge | Up to 20 files, auto-chunked and embedded into a vector store |
| Capabilities | Toggle: code interpreter, DALL-E, web search, canvas |
| Actions | External API calls defined via OpenAPI schemas |
The model uses function calling to decide when to invoke actions and generates the JSON input. Authentication supports API keys and OAuth.
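The action-invocation step can be sketched as: the model emits JSON arguments, the client validates them against the OpenAPI-derived schema, then dispatches the HTTP call. The `getWeather` operation and its fields are hypothetical.

```python
import json

# Hypothetical action derived from an OpenAPI schema
action_schema = {
    "operationId": "getWeather",
    "parameters": {"required": ["city"]},
}

def invoke_action(model_json: str) -> dict:
    args = json.loads(model_json)   # model-generated JSON input
    for field in action_schema["parameters"]["required"]:
        if field not in args:
            raise ValueError(f"missing required field: {field}")
    # A real client would now call the external API with these arguments
    return {"operationId": action_schema["operationId"], "args": args}

print(invoke_action('{"city": "Lisbon"}'))
```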
Patterns Used
| Pattern | How It's Used |
|---|---|
| Router | Single model classifies and dispatches to ~7 tools |
| Tool Router | Function calling for tool selection and invocation |
| RAG | File search with chunking, embedding, and reranking |
| Code Execution | Sandboxed Docker containers for code interpreter |
| Extracted Facts | Bio tool saves structured facts across sessions |
| Guardrails | Moderation API + safety classifiers + instruction hierarchy |
| Citation | Web search results with structured source attribution |
| Conversation Summarization | Compaction for long conversations |
| Structured Output | Custom GPT actions via OpenAPI schemas |