ChatGPT

OpenAI's flagship conversational AI. A single orchestrator model with function-calling access to ~7 tools, a four-layer memory system, multi-model routing (GPT-4o, o3, o4-mini), and layered safety infrastructure.


Architecture


Core Loop

ChatGPT operates as a single orchestrator model that decides whether to respond directly or invoke tools. Tool definitions (~7 tools) are injected into the system prompt as function schemas. The model evaluates the user's message against all tool descriptions and either generates a direct response or a structured tool call.

The model is stateless — every request includes the full conversation history within the context window. The system prompt includes tool definitions, saved memories, user preferences, recent conversation summaries, and behavioral instructions.
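The core loop described above can be sketched as a minimal dispatcher. This is an illustrative toy, not OpenAI's implementation: the `model_decide` stand-in, the tool handlers, and the routing rule are all assumptions.

```python
# Toy sketch of the orchestrator loop: the model either answers directly
# or emits a structured tool call, which the runtime executes.
import json

# Hypothetical tool handlers keyed by the tool names from the table below.
TOOLS = {
    "python": lambda args: f"executed: {args['code']}",
    "web": lambda args: f"searched: {args['query']}",
    "bio": lambda args: f"saved memory: {args['fact']}",
}

def model_decide(message: str) -> dict:
    """Stand-in for the model: return a direct reply or a tool call."""
    if "search" in message:
        return {"tool_call": {"name": "web",
                              "arguments": json.dumps({"query": message})}}
    return {"content": f"Direct answer to: {message}"}

def orchestrate(message: str) -> str:
    decision = model_decide(message)
    if "tool_call" in decision:
        call = decision["tool_call"]
        # In the real loop the tool result is fed back to the model for a
        # final response; here we return the raw result directly.
        return TOOLS[call["name"]](json.loads(call["arguments"]))
    return decision["content"]
```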


Tool System

  • python: Code execution in a sandboxed Jupyter environment
  • web: Web search via the Bing index
  • dalle: Image generation (DALL-E 3 or 4o native)
  • bio: Memory read/write for persistent user facts
  • file_search: RAG over uploaded documents
  • container.download: File retrieval from the sandbox
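A tool definition injected into the system prompt follows the function-schema shape of OpenAI's function-calling format. The sketch below shows what the `bio` tool's schema might look like; the description text and parameter names are assumptions, not the actual schema.

```python
# Hypothetical function schema for the `bio` tool, in the general shape
# used by OpenAI function calling. Field contents are invented.
bio_tool = {
    "type": "function",
    "function": {
        "name": "bio",
        "description": "Persist a fact about the user across sessions.",
        "parameters": {
            "type": "object",
            "properties": {
                "fact": {
                    "type": "string",
                    "description": "The fact to remember.",
                }
            },
            "required": ["fact"],
        },
    },
}
```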

Code Interpreter

Runs in Docker containers managed by Kubernetes on Azure. Each container runs Debian 12 with a stateful Jupyter session that persists for the conversation's duration. Supports 11 languages (Python, JavaScript, Bash, Ruby, PHP, Go, Java, and more). Package installation via pip and npm goes through a proxy; otherwise the container has no outbound network access.

Web Search

A text-based browser limited to GET requests. It searches Bing's index, typically retrieves 3-10 sources, and cites findings with structured footnotes. Enterprise queries are disassociated from user accounts.

File Search (RAG)

Uploaded documents are parsed, chunked, embedded, and stored in a vector database. Retrieval uses both keyword and semantic search with query rewriting and reranking. Each vector store supports up to 10,000 files.
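The keyword-plus-semantic retrieval step can be sketched as below. The scoring functions are toy stand-ins (word overlap and character-bigram overlap), not the actual file_search internals, and the 50/50 score blend is an assumption.

```python
# Toy hybrid retrieval: combine a keyword score and a semantic score,
# then "rerank" by sorting on the blended score.

def keyword_score(query: str, chunk: str) -> float:
    terms = set(query.lower().split())
    words = chunk.lower().split()
    return sum(w in terms for w in words) / max(len(words), 1)

def semantic_score(query: str, chunk: str) -> float:
    # Stand-in for embedding cosine similarity: character-bigram Jaccard.
    def bigrams(s):
        return {s[i:i + 2] for i in range(len(s) - 1)}
    q, c = bigrams(query.lower()), bigrams(chunk.lower())
    return len(q & c) / max(len(q | c), 1)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    scored = [(0.5 * keyword_score(query, c) + 0.5 * semantic_score(query, c), c)
              for c in chunks]
    return [c for _, c in sorted(scored, reverse=True)[:k]]
```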


Memory System

ChatGPT uses a four-layer memory architecture:

  • Saved Memories (cross-session): Numbered facts with timestamps, stored via the bio tool
  • Chat History Profile (cross-session): Inferred preferences, notable topics, and user insights, updated out-of-band by OpenAI
  • Session Metadata (single session): Device info, location, account age, usage metrics (~17 data points)
  • Current Context (single session): Full un-summarized conversation history within the context window

Saved memories are injected into the system prompt under a "Model Set Context" section. Deleting a chat does not remove its saved memories. Memory retrieval uses RAG under the hood.
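Assembling that section of the system prompt might look like the following sketch. The numbered, timestamped format mirrors the description above; the exact wording and layout of the real prompt are assumptions.

```python
# Sketch of injecting saved memories into the system prompt under a
# "Model Set Context" section. Formatting details are assumed.
from datetime import date

def build_system_prompt(instructions: str,
                        memories: list[tuple[date, str]]) -> str:
    lines = [instructions, "", "Model Set Context:", ""]
    for i, (when, fact) in enumerate(memories, start=1):
        # Numbered facts with timestamps, as stored via the bio tool.
        lines.append(f"{i}. [{when.isoformat()}]. {fact}")
    return "\n".join(lines)
```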

For long conversations, compaction compresses older turns: user messages are preserved verbatim while assistant messages, tool calls, and reasoning are replaced with an encrypted summary item.
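The compaction rule above can be illustrated as follows. Real summaries are model-generated and encrypted; this toy simply concatenates the collapsed messages, and the message shape is an assumption.

```python
# Toy compaction: keep user messages verbatim, collapse runs of
# assistant/tool/reasoning messages into a single summary item, and
# leave the most recent turns untouched.

def compact(history: list[dict], keep_recent: int = 2) -> list[dict]:
    old, recent = history[:-keep_recent], history[-keep_recent:]
    compacted, run = [], []

    def flush():
        if run:
            compacted.append({"role": "summary", "content": " | ".join(run)})
            run.clear()

    for msg in old:
        if msg["role"] == "user":
            flush()
            compacted.append(msg)   # user turns preserved verbatim
        else:
            run.append(msg["content"])  # assistant/tool/reasoning collapsed
    flush()
    return compacted + recent
```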


Safety Infrastructure

Training-Time

  • RLHF — reinforcement learning from human feedback for alignment
  • Rule-Based Reward Models (RBRMs) — zero-shot classifiers using propositions to evaluate safety aspects without extensive human data
  • Deliberative Alignment (o-series) — the model reasons about safety guidelines in real-time using its chain of thought

Deployment-Time

  • Moderation API — free classifier checking inputs/outputs against harm categories (CSAM, hate, violence)
  • Safety Classifiers — domain-specific classifiers for biology, self-harm, CBRN
  • Instruction Hierarchy — trained priority order: Root > System > Developer > User > Tool Output. Lower-privileged instructions are selectively ignored when they conflict with higher ones.
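The hierarchy's conflict rule can be sketched as a priority resolver. The priority order matches the one listed above; modeling instructions as key/value pairs is a toy assumption, since the real mechanism is trained behavior, not a lookup table.

```python
# Toy instruction-hierarchy resolution: when two instructions conflict,
# the higher-privileged source wins (lower rank number = higher privilege).

PRIORITY = {"root": 0, "system": 1, "developer": 2, "user": 3, "tool_output": 4}

def resolve(instructions: list[tuple[str, str, str]]) -> dict[str, str]:
    """Each instruction is (source, key, value)."""
    chosen: dict[str, tuple[int, str]] = {}
    for source, key, value in instructions:
        rank = PRIORITY[source]
        if key not in chosen or rank < chosen[key][0]:
            chosen[key] = (rank, value)
    return {k: v for k, (_, v) in chosen.items()}
```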

Reasoning Models (o-series)

The o1/o3/o4 models introduce a distinct pattern:

  • Generate a hidden internal chain of thought before the visible response
  • Trained via reinforcement learning to recognize mistakes, break down complex steps, and try alternative approaches
  • Test-time search (o3+): generates hundreds of candidate reasoning paths and selects the best one
  • A model-generated summary of the reasoning is shown; raw chains are kept internal
  • Can agentically combine all ChatGPT tools during reasoning
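The test-time search idea reduces to sample-then-select. The sketch below is a generic best-of-N loop, not OpenAI's method: real systems score candidates with a learned verifier or the model itself, whereas the scorer here is an arbitrary callable.

```python
# Toy best-of-N test-time search: sample several candidate reasoning
# paths and keep the highest-scoring one.
import random

def solve_with_search(candidate_fn, score_fn, n: int = 8, seed: int = 0):
    rng = random.Random(seed)          # seeded for reproducibility
    candidates = [candidate_fn(rng) for _ in range(n)]
    return max(candidates, key=score_fn)
```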

Custom GPTs

Custom GPTs are composed of four elements:

  • Instructions: System prompt defining behavior and constraints
  • Knowledge: Up to 20 files, auto-chunked and embedded into a vector store
  • Capabilities: Toggles for code interpreter, DALL-E, web search, and canvas
  • Actions: External API calls defined via OpenAPI schemas

The model uses function calling to decide when to invoke actions and generates the JSON input. Authentication supports API keys and OAuth.
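Bridging an OpenAPI operation into a function-calling definition might look like the sketch below. The conversion covers only simple query-style parameters, and the example operation is invented; the real mapping handles request bodies, auth, and more.

```python
# Sketch: convert one OpenAPI operation into a function-calling
# definition, as Custom GPT actions do under the hood.

def operation_to_function(path: str, method: str, op: dict) -> dict:
    params = op.get("parameters", [])
    return {
        "type": "function",
        "function": {
            "name": op["operationId"],
            "description": op.get("summary", f"{method.upper()} {path}"),
            "parameters": {
                "type": "object",
                "properties": {
                    p["name"]: {"type": p["schema"]["type"],
                                "description": p.get("description", "")}
                    for p in params
                },
                "required": [p["name"] for p in params if p.get("required")],
            },
        },
    }
```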


Patterns Used

  • Router: Single model classifies and dispatches to ~7 tools
  • Tool Router: Function calling for tool selection and invocation
  • RAG: File search with chunking, embedding, and reranking
  • Code Execution: Sandboxed Docker containers for the code interpreter
  • Extracted Facts: Bio tool saves structured facts across sessions
  • Guardrails: Moderation API + safety classifiers + instruction hierarchy
  • Citation: Web search results with structured source attribution
  • Conversation Summarization: Compaction for long conversations
  • Structured Output: Custom GPT actions via OpenAPI schemas