Schema - Charcoal

Every namespace has an attributes schema that controls how document fields are stored, searched, and filtered. Your schema determines what the search agent can see and how it retrieves your data.

Defining a schema

Pass a schema object when uploading documents. Each key is an attribute name, and the value configures its type and indexing behavior.

curl -X POST https://api.withcharcoal.com/v1/namespaces/support-tickets/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [
      {
        "id": "ticket-4521",
        "title": "Login page returns 500 error",
        "body": "Users report seeing a 500 error when attempting to log in via SSO...",
        "priority": "high",
        "created_at": "2025-03-10T14:22:00Z",
        "tags": ["auth", "sso", "production"]
      }
    ],
    "schema": {
      "title": { "type": "string", "is_searchable": true },
      "body": { "type": "string", "is_searchable": true },
      "priority": { "type": "string", "is_filterable": true },
      "created_at": { "type": "datetime", "is_filterable": true },
      "tags": { "type": "[]string", "is_filterable": true }
    }
  }'

If you omit the schema, Charcoal infers types from the document data. Inference works well for simple cases, but you should always define an explicit schema when you need searchable or filterable attributes — inference will not enable these for you.
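Because inferred attributes are not indexed, it can help to check which document fields your schema leaves out before uploading. The following is a minimal client-side sketch, not part of the Charcoal API: the helper name `attributes_left_to_inference` is hypothetical, and it assumes `id` is the document identifier rather than a schema attribute.

```python
def attributes_left_to_inference(documents, schema, id_field="id"):
    """Return document fields that the schema does not declare.

    Assumption: `id_field` names the document identifier and is not
    expected to appear in the schema.
    """
    seen = set()
    for doc in documents:
        seen.update(doc.keys())
    seen.discard(id_field)
    return sorted(seen - set(schema))


documents = [
    {
        "id": "ticket-4521",
        "title": "Login page returns 500 error",
        "body": "Users report seeing a 500 error when attempting to log in via SSO...",
        "priority": "high",
        "created_at": "2025-03-10T14:22:00Z",
        "tags": ["auth", "sso", "production"],
    }
]

# This schema deliberately omits "tags" to show what the check reports.
schema = {
    "title": {"type": "string", "is_searchable": True},
    "body": {"type": "string", "is_searchable": True},
    "priority": {"type": "string", "is_filterable": True},
    "created_at": {"type": "datetime", "is_filterable": True},
}

undeclared = attributes_left_to_inference(documents, schema)
print(undeclared)  # ['tags']
```

Any field this reports would be typed by inference and stored without search or filter indexing.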

Choosing what to index

Every attribute in your schema falls into one of three categories.

Searchable

The search agent runs BM25 full-text search queries against your searchable fields. It chooses which field to query and which search terms to use, then iterates with different queries until it finds what it needs. Searchable fields are the agent’s primary way of discovering relevant documents.

Make a field searchable when:

It contains natural language text that describes the document’s content
A human would use Ctrl+F on it to find information
The field has enough textual content for keyword matching to be useful

Common searchable fields: title, body, description, content, summary, comments, notes.

Do not make a field searchable when:

It contains structured data (IDs, enums, dates, numbers) — use filterable instead
It contains very short values (single words, codes) — filtering is more precise
It duplicates content from another searchable field — this adds cost without improving recall

Only string and []string attributes can be searchable.

{
  "title": { "type": "string", "is_searchable": true },
  "body": { "type": "string", "is_searchable": true }
}
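The type restriction on searchable attributes is easy to check before sending a schema. This is a minimal local sketch; the function name `invalid_searchable_attributes` is hypothetical, and the check only mirrors the rule stated above rather than reproducing any server-side validation.

```python
# Only these types can be searchable, per the rule above.
SEARCHABLE_TYPES = {"string", "[]string"}


def invalid_searchable_attributes(schema):
    """Return names of attributes marked searchable with a non-string type."""
    return sorted(
        name
        for name, config in schema.items()
        if config.get("is_searchable") and config.get("type") not in SEARCHABLE_TYPES
    )


schema = {
    "title": {"type": "string", "is_searchable": True},
    "tags": {"type": "[]string", "is_searchable": True},
    "created_at": {"type": "datetime", "is_searchable": True},  # not allowed
    "priority": {"type": "string", "is_filterable": True},
}

print(invalid_searchable_attributes(schema))  # ['created_at']
```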

Filterable

Filterable attributes can be used in filter expressions, both by your API callers and by the search agent itself. The agent can autonomously apply filters on these attributes during its retrieval loop to narrow down results.

Make a field filterable when:

It has a bounded set of values (status, priority, category, type)
You want to scope searches by time range, numeric threshold, or tag membership
The agent should be able to narrow its search autonomously (e.g., filtering by status: "open")

Common filterable fields: status, priority, category, created_at, updated_at, tags, type, author.

Do not make a field filterable when:

It contains unique or high-cardinality text (document bodies, descriptions) — use searchable instead
You never need to narrow results by this field’s value

{
  "priority": { "type": "string", "is_filterable": true },
  "created_at": { "type": "datetime", "is_filterable": true },
  "tags": { "type": "[]string", "is_filterable": true }
}
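One way to judge whether a field has a bounded set of values is to count its distinct values across a sample of your documents. The sketch below is only an illustration of that idea: the helper `distinct_value_counts` and the sample documents are hypothetical, and what counts as "bounded" is left to your judgment.

```python
from collections import Counter


def distinct_value_counts(documents, field):
    """Count how often each value of `field` occurs across documents.

    Array values (such as tags) are counted element by element.
    Missing or null values are skipped.
    """
    counts = Counter()
    for doc in documents:
        value = doc.get(field)
        if value is None:
            continue
        counts.update(value if isinstance(value, list) else [value])
    return counts


sample = [
    {"id": "t1", "priority": "high", "tags": ["auth", "sso"]},
    {"id": "t2", "priority": "high", "tags": ["auth"]},
    {"id": "t3", "priority": "low", "tags": []},
]

priority_counts = distinct_value_counts(sample, "priority")
tag_counts = distinct_value_counts(sample, "tags")
print(len(priority_counts), len(tag_counts))  # 2 2
```

A field whose distinct count stays small as the sample grows is a likely filter candidate; one whose count grows with every document is better left searchable or stored only.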

Both searchable and filterable

Some fields benefit from both. A title field might be searched for keywords and also filtered for exact matches. An attribute indexed both ways is billed at 2x its size, double the cost of a single index.

{
  "title": { "type": "string", "is_searchable": true, "is_filterable": true }
}

Stored only

Attributes without is_searchable or is_filterable are stored and returned in results, but the agent cannot query or filter against them. Use this for metadata you want to read but never search on — IDs, URLs, internal references, raw data blobs.

Example: choosing a schema

Consider a knowledge base with support articles:

{
  "schema": {
    "title":       { "type": "string",     "is_searchable": true },
    "body":        { "type": "string",     "is_searchable": true },
    "category":    { "type": "string",     "is_filterable": true },
    "product":     { "type": "string",     "is_filterable": true },
    "tags":        { "type": "[]string",   "is_filterable": true },
    "updated_at":  { "type": "datetime",   "is_filterable": true },
    "author":      { "type": "string" },
    "source_url":  { "type": "string" }
  }
}

title and body are searchable — the agent needs to find articles by their content.
category, product, tags, and updated_at are filterable — the agent and your callers can narrow searches to specific products, categories, or time ranges.
author and source_url are stored only — useful in results but not worth indexing.
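The same grouping can be derived mechanically from the schema, which is handy for reviewing a larger one. This is a small sketch with hypothetical helper names (`index_category`, `group_by_category`); it reads only the `is_searchable` and `is_filterable` flags described above.

```python
def index_category(config):
    """Classify one attribute config by its indexing flags."""
    searchable = bool(config.get("is_searchable"))
    filterable = bool(config.get("is_filterable"))
    if searchable and filterable:
        return "searchable + filterable"
    if searchable:
        return "searchable"
    if filterable:
        return "filterable"
    return "stored only"


def group_by_category(schema):
    """Map each category to the attribute names that fall into it."""
    groups = {}
    for name, config in schema.items():
        groups.setdefault(index_category(config), []).append(name)
    return groups


schema = {
    "title": {"type": "string", "is_searchable": True},
    "body": {"type": "string", "is_searchable": True},
    "category": {"type": "string", "is_filterable": True},
    "product": {"type": "string", "is_filterable": True},
    "tags": {"type": "[]string", "is_filterable": True},
    "updated_at": {"type": "datetime", "is_filterable": True},
    "author": {"type": "string"},
    "source_url": {"type": "string"},
}

groups = group_by_category(schema)
for category, names in groups.items():
    print(f"{category}: {', '.join(names)}")
```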

Attribute types

Type	Description	Example
`string`	Text	`"hello"`
`int`	Signed integer	`42`
`uint`	Unsigned integer	`100`
`float`	Floating point number	`3.14`
`bool`	Boolean	`true`
`uuid`	UUID string	`"550e8400-e29b-41d4-a716-446655440000"`
`datetime`	ISO 8601 datetime	`"2025-01-15T10:30:00Z"`
`[]string`	Array of strings	`["a", "b"]`
`[]int`	Array of integers	`[1, 2, 3]`
`[]uint`	Array of unsigned integers	`[1, 2, 3]`
`[]float`	Array of floats	`[1.5, 2.5]`
`[]bool`	Array of booleans	`[true, false]`
`[]uuid`	Array of UUIDs	`["550e8400-..."]`
`[]datetime`	Array of datetimes	`["2025-01-15T10:30:00Z"]`

All attributes are nullable. Types like uuid, uint, and datetime must be declared explicitly in the schema — they cannot be inferred.
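Since type names follow a regular pattern (seven scalar types, each with a `[]`-prefixed array form), a local check can catch misspelled types such as `string[]` before upload. The sketch below assumes the table above is the complete list; `unknown_types` is a hypothetical helper name.

```python
# Scalar type names from the table above; each also has an array form.
SCALAR_TYPES = ("string", "int", "uint", "float", "bool", "uuid", "datetime")
KNOWN_TYPES = set(SCALAR_TYPES) | {"[]" + name for name in SCALAR_TYPES}


def unknown_types(schema):
    """Return {attribute name: declared type} for types not in the table."""
    return {
        name: config.get("type")
        for name, config in schema.items()
        if config.get("type") not in KNOWN_TYPES
    }


schema = {
    "ticket_id": {"type": "uuid"},
    "created_at": {"type": "datetime", "is_filterable": True},
    "tags": {"type": "string[]"},  # wrong: the array form is "[]string"
}

print(unknown_types(schema))  # {'tags': 'string[]'}
```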

Updating attributes

You can modify the following properties on existing attributes in-place:

is_searchable — enable or disable full-text search indexing
is_filterable — enable or disable filter indexing

You can also add new attributes to an existing schema. New attributes default to null for documents that were uploaded before the attribute was added.

To update a schema, submit the updated schema in a document upload request:

curl -X POST https://api.withcharcoal.com/v1/namespaces/support-tickets/documents \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "documents": [],
    "schema": {
      "priority": { "type": "string", "is_filterable": true, "is_searchable": true }
    }
  }'

After enabling indexing on an existing attribute, the index needs time to build before queries against it return complete results.
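Before submitting an update, you can compare it against the schema you already have to see which attributes are new, which will have indexing changed, and which attempt a type change (types cannot be changed on existing attributes). This is a local sketch under the assumption that you keep a copy of the current schema on the client; `plan_schema_update` is a hypothetical helper, not an API call.

```python
INDEX_FLAGS = ("is_searchable", "is_filterable")


def plan_schema_update(existing, update):
    """Sort the attributes in `update` by what the change would do."""
    plan = {"added": [], "reindexed": [], "rejected": []}
    for name, config in update.items():
        current = existing.get(name)
        if current is None:
            plan["added"].append(name)  # new attribute
        elif config.get("type") != current.get("type"):
            plan["rejected"].append(name)  # type change is not allowed
        elif any(bool(config.get(f)) != bool(current.get(f)) for f in INDEX_FLAGS):
            plan["reindexed"].append(name)  # indexing flag toggled
    return plan


existing = {
    "priority": {"type": "string", "is_filterable": True},
    "created_at": {"type": "datetime", "is_filterable": True},
}
update = {
    "priority": {"type": "string", "is_filterable": True, "is_searchable": True},
    "created_at": {"type": "string"},
    "region": {"type": "string", "is_filterable": True},
}

plan = plan_schema_update(existing, update)
print(plan)
```

Attributes listed under `reindexed` are the ones whose indexes need time to build before queries return complete results.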

Limitations

Types are immutable. Once an attribute’s type is set, it cannot be changed. If you need a different type, create a new attribute.
Attributes cannot be deleted in-place. To remove an attribute, re-upsert all documents without it.
Attribute names must be 128 bytes or fewer.
Max attributes per namespace: 256.
Max attribute value size: 8 MiB (4 KiB for filterable attributes).
Max document size: 64 MiB (across all attributes).
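The name, count, and value-size limits above can be approximated with a client-side pre-check. This sketch makes a loud assumption: it measures a value's size as the UTF-8 byte length of its JSON encoding, which may not match how Charcoal measures it. The helper names are hypothetical.

```python
import json

MAX_NAME_BYTES = 128
MAX_ATTRIBUTES = 256
MAX_VALUE_BYTES = 8 * 1024 * 1024          # 8 MiB
MAX_FILTERABLE_VALUE_BYTES = 4 * 1024      # 4 KiB


def schema_limit_errors(schema):
    """Check the attribute count and attribute name lengths (in bytes)."""
    errors = []
    if len(schema) > MAX_ATTRIBUTES:
        errors.append(f"{len(schema)} attributes exceeds the limit of {MAX_ATTRIBUTES}")
    for name in schema:
        size = len(name.encode("utf-8"))
        if size > MAX_NAME_BYTES:
            errors.append(f"attribute name of {size} bytes exceeds {MAX_NAME_BYTES}")
    return errors


def value_limit_errors(document, schema):
    """Check each value against its limit.

    Assumption: size is estimated from the JSON-encoded value.
    """
    errors = []
    for name, value in document.items():
        filterable = schema.get(name, {}).get("is_filterable")
        limit = MAX_FILTERABLE_VALUE_BYTES if filterable else MAX_VALUE_BYTES
        size = len(json.dumps(value, ensure_ascii=False).encode("utf-8"))
        if size > limit:
            errors.append(f"{name}: about {size} bytes exceeds {limit}")
    return errors


schema = {
    "status": {"type": "string", "is_filterable": True},
    "body": {"type": "string", "is_searchable": True},
}
document = {"status": "x" * 5000, "body": "x" * 5000}
print(value_limit_errors(document, schema))
```

Here only `status` is reported: the same 5,000-character value fits comfortably in a searchable attribute but exceeds the 4 KiB limit for a filterable one.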

Billing

Attribute storage is billed based on indexing:

Configuration	Cost
Unindexed (neither searchable nor filterable)	0.5x attribute size
Filterable only	1x attribute size
Searchable only (full-text search)	1x attribute size
Searchable + filterable	2x attribute size

Only enable is_searchable and is_filterable on attributes you actively need — each index doubles the storage cost of that attribute.
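As a worked example of the table above, the sketch below estimates billed storage for a document from its schema. The attribute sizes are made-up illustration values, and `storage_multiplier` and `billed_bytes` are hypothetical helper names; the multipliers themselves come straight from the table.

```python
def storage_multiplier(config):
    """Return the billing multiplier for one attribute config."""
    searchable = bool(config.get("is_searchable"))
    filterable = bool(config.get("is_filterable"))
    if searchable and filterable:
        return 2.0
    if searchable or filterable:
        return 1.0
    return 0.5


def billed_bytes(schema, attribute_sizes):
    """Sum size x multiplier over attributes; undeclared ones count as unindexed."""
    return sum(
        storage_multiplier(schema.get(name, {})) * size
        for name, size in attribute_sizes.items()
    )


schema = {
    "title": {"type": "string", "is_searchable": True, "is_filterable": True},
    "body": {"type": "string", "is_searchable": True},
    "source_url": {"type": "string"},
}
# Hypothetical attribute sizes in bytes for one document.
sizes = {"title": 1000, "body": 10000, "source_url": 200}

total = billed_bytes(schema, sizes)
print(total)  # 2000 + 10000 + 100 = 12100.0
```

Dropping the filter index on `title` in this example would lower its share from 2,000 to 1,000 billed bytes.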

