# Knowledge file → File Search → Responses API

This guide shows how to upload a knowledge file to OpenAI, stage it as a File
Search vector store, and reference it from a Responses-API chat call. It runs
end to end with the scripts in this repo (`bin/stageKnowledge.js` and
`bin/chat.js -a chatgpt_rest_responses`).

## Prerequisites

- `OPENAI_API_KEY` exported in your environment (or sourced from `.env`).
- A file in a File Search–supported format (markdown, plain text, PDF, HTML,
  several Office formats). Check OpenAI's docs for the current list and the
  per-file size limit.

## Step 1 — Stage the knowledge file

`bin/stageKnowledge.js` runs the four steps below (three API calls, then the
output) and prints the resulting vector store ID on stdout. Progress lines go to
stderr, so the ID can be captured cleanly:

1. `POST /v1/files` with `purpose=assistants` — uploads the raw file.
2. `POST /v1/vector_stores`: creates the vector store with the file attached.
   If `--reuse` is given, `POST /v1/vector_stores/{id}/files` attaches the file
   to the existing store instead and no new store is created.
3. `GET /v1/vector_stores/{id}/files/{file_id}` polled until `status` is
   `completed` or `failed`.
4. Vector store ID written to stdout.

```bash
VS_ID=$(./bin/stageKnowledge.js path/to/notes.md)
echo "$VS_ID"   # e.g. vs_6a1ce825e40c8191a52e8dbf44ee3270
```
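
The three API calls above can be sketched in Node.js as follows. This is a minimal illustration under stated assumptions, not the actual contents of `bin/stageKnowledge.js`: the function name `stageKnowledge`, its options, and the injectable `fetchImpl` are hypothetical, and it assumes Node 18+ for the global `fetch`, `FormData`, and `Blob`.

```javascript
// Sketch of the staging flow: upload, create (or reuse) a vector store, poll.
// All names below are hypothetical; they are not taken from this repo.
const API = 'https://api.openai.com/v1';

async function callJson(fetchImpl, url, init) {
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`${init.method} ${url} -> HTTP ${res.status}`);
  return res.json();
}

async function stageKnowledge({
  apiKey, fileName, fileBytes, storeName, reuseId,
  fetchImpl = fetch, pollMs = 1000, maxPolls = 120,
}) {
  const auth = { Authorization: `Bearer ${apiKey}` };
  const json = { ...auth, 'Content-Type': 'application/json' };

  // 1. Upload the raw file as multipart form data with purpose=assistants.
  const form = new FormData();
  form.append('purpose', 'assistants');
  form.append('file', new Blob([fileBytes]), fileName);
  const file = await callJson(fetchImpl, `${API}/files`,
    { method: 'POST', headers: auth, body: form });

  // 2. Create a new store with the file attached, or attach to an existing one.
  let storeId = reuseId;
  if (reuseId) {
    await callJson(fetchImpl, `${API}/vector_stores/${reuseId}/files`,
      { method: 'POST', headers: json, body: JSON.stringify({ file_id: file.id }) });
  } else {
    const store = await callJson(fetchImpl, `${API}/vector_stores`,
      { method: 'POST', headers: json,
        body: JSON.stringify({ name: storeName, file_ids: [file.id] }) });
    storeId = store.id;
  }

  // 3. Poll the per-file endpoint until indexing settles.
  for (let i = 0; i < maxPolls; i++) {
    const vsFile = await callJson(fetchImpl,
      `${API}/vector_stores/${storeId}/files/${file.id}`,
      { method: 'GET', headers: auth });
    if (vsFile.status === 'completed') return storeId;
    if (vsFile.status === 'failed' || vsFile.status === 'cancelled') {
      throw new Error(`indexing ${vsFile.status} for ${file.id}`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`timed out waiting for ${file.id}`);
}
```

A caller would read the file (for example with `fs.promises.readFile`), pass the bytes in as `fileBytes`, and write the returned ID to stdout. Passing `fetchImpl` explicitly makes the flow easy to exercise against a stub without touching the network.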

### Variants

Custom vector store name (default is the file stem):

```bash
./bin/stageKnowledge.js notes.md --name 'polish-notes-2026-05'
```

Add a second file to an existing vector store rather than creating a new one:

```bash
./bin/stageKnowledge.js extra-notes.md --reuse "$VS_ID"
```

Show usage:

```bash
./bin/stageKnowledge.js -h
```

## Step 2 — Query via the Responses API

`bin/chat.js -a chatgpt_rest_responses` accepts one or more `--vector-store <id>`
flags. The lib function (`lib/chatgpt_rest_responses.js`) translates these into
the request body's `tools: [{ type: 'file_search', vector_store_ids: [...] }]`
field on `POST /v1/responses`. The model decides when to invoke file_search and
grounds its reply in the chunks it retrieves.

```bash
source .env
printf 'Tell me my top 3 Polish pronunciation difficulties.' \
  | ./bin/chat.js -a chatgpt_rest_responses --vector-store "$VS_ID"
```

### Variants

Multiple knowledge bases — pass the flag repeatedly:

```bash
... --vector-store "$VS_POLISH" --vector-store "$VS_GERMAN"
```

Combine with a system message (`--system`, added to `input[]`) or top-level
instructions (`--instructions`, the canonical Responses-API field). The two can
also be used together:

```bash
... --system 'Be terse.' --vector-store "$VS_ID"
... --instructions 'Cite the source line numbers.' --vector-store "$VS_ID"
```

Nudge the model to actually use file_search (it can choose not to on simple
prompts):

```bash
... --instructions 'Use the file_search tool to ground your answer in my notes.' \
    --vector-store "$VS_ID"
```

Override the model (the default is `gpt-4o-mini`; any model that supports
file_search works):

```bash
... --model gpt-4o --vector-store "$VS_ID"
```

## Cleanup / management

`bin/stageKnowledge.js` only creates artifacts; nothing deletes them. Vector
stores and their underlying files persist on your OpenAI account and continue
to incur storage costs until removed. Use the OpenAI platform dashboard at
https://platform.openai.com/storage, or the REST API directly:

```bash
# List vector stores
curl -sS -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/vector_stores

# Inspect one
curl -sS -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/vector_stores/$VS_ID"

# Delete a vector store (detaches files; the underlying /v1/files records remain
# and must be deleted separately if no longer needed)
curl -sS -X DELETE -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/vector_stores/$VS_ID"

# List uploaded files
curl -sS -H "Authorization: Bearer $OPENAI_API_KEY" \
  https://api.openai.com/v1/files

# Delete an uploaded file
curl -sS -X DELETE -H "Authorization: Bearer $OPENAI_API_KEY" \
  "https://api.openai.com/v1/files/$FILE_ID"
```
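
Because deleting a vector store leaves the uploaded files behind, a full teardown has to list the store's files first and delete them afterwards. The following is a minimal Node.js sketch of that order of operations, assuming Node 18+ for the global `fetch`. `deleteStoreAndFiles` and its options are hypothetical names, not part of this repo. It reads only the first page of up to 100 files, and it deletes files even if they are also attached to another vector store, so treat it as a starting point.

```javascript
// Hypothetical teardown helper: remove a vector store and the files attached to it.
const API_BASE = 'https://api.openai.com/v1';

async function deleteStoreAndFiles({ apiKey, storeId, fetchImpl = fetch }) {
  const headers = { Authorization: `Bearer ${apiKey}` };
  const call = async (method, path) => {
    const res = await fetchImpl(`${API_BASE}${path}`, { method, headers });
    if (!res.ok) throw new Error(`${method} ${path} -> HTTP ${res.status}`);
    return res.json();
  };

  // List attached files before the store disappears (first page only).
  const listing = await call('GET', `/vector_stores/${storeId}/files?limit=100`);
  const fileIds = listing.data.map((file) => file.id);

  // Delete the store, then each underlying /v1/files record.
  await call('DELETE', `/vector_stores/${storeId}`);
  for (const fileId of fileIds) {
    await call('DELETE', `/files/${fileId}`);
  }
  return fileIds;
}
```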

## Notes / gotchas

- **Indexing is asynchronous** — `bin/stageKnowledge.js` blocks until the file's
  per-file status reaches `completed` or `failed`. Don't query a vector store
  before then; file_search may find nothing in a file that is still indexing.
- **The aggregate `file_counts` is eventually consistent** — right after creating
  a vector store with `file_ids`, the aggregate counters can briefly report
  `in_progress: 0` before the attachment is registered. That's why the script
  polls the per-file endpoint instead.
- **Retrieval is non-deterministic** — the model may choose not to invoke
  file_search on a given turn. If a query that should hit the knowledge base
  doesn't, add an `--instructions` nudge as shown above.
- **File Search vs ChatGPT Projects** — the ChatGPT UI's "project source files"
  are a separate, UI-only mechanism. They are *not* accessible from
  `/v1/responses` API calls. This workflow is the API-side equivalent.
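
If a prompt nudge is not reliable enough, the Responses API also accepts a `tool_choice` field in the request body, which can name a built-in tool the model should use. The sketch below shows the body shape only. Whether `bin/chat.js` exposes this is not stated here, so `forceFileSearch` is a hypothetical helper rather than part of this repo.

```javascript
// Hypothetical helper: ask the API to use file_search instead of leaving the
// decision to the model. Returns a new body and leaves the original untouched.
function forceFileSearch(body) {
  const hasFileSearch = (body.tools ?? []).some((tool) => tool.type === 'file_search');
  if (!hasFileSearch) {
    throw new Error('body.tools must include a file_search tool entry');
  }
  return { ...body, tool_choice: { type: 'file_search' } };
}

const forcedBody = forceFileSearch({
  model: 'gpt-4o-mini',
  input: [{ role: 'user', content: 'What do my notes say about nasal vowels?' }],
  tools: [{ type: 'file_search', vector_store_ids: ['vs_6a1ce825e40c8191a52e8dbf44ee3270'] }],
});
console.log(JSON.stringify(forcedBody.tool_choice));
```

Forcing the tool trades flexibility for predictability: every turn pays for a retrieval step, including turns that do not need the notes.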
