pgvector and Search

HiveCFM ships semantic search over feedback records — a respondent types “the login button is confusing” and the app surfaces every related response, even ones that never contain the literal word “login”. The plumbing is built on pgvector plus the Go Hub’s SearchService.

What pgvector is

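pgvector is a Postgres extension that adds a vector(N) column type and distance operators: <=> for cosine distance, <-> for L2 (Euclidean) distance, and <#> for negative inner product. All three return smaller values for closer vectors, so an ascending ORDER BY yields the nearest neighbours first. It lets Postgres act as a vector database without introducing a second data store.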

The schema registers the extension in hivecfm-core/packages/database/schema.prisma:

extensions = [pgvector(map: "vector")]

Embeddings, briefly

An embedding is a fixed-length array of floats that captures a piece of text’s meaning. Two pieces of text whose meanings are close will have embedding vectors that are close in the vector space (cosine similarity near 1.0). We use an embedding model — configurable per tenant — to produce one vector per feedback record.

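Cosine similarity, the measure we use, is the dot product of two vectors divided by the product of their magnitudes. It ranges from -1 to 1: 1.0 means identical direction, 0 means orthogonal, and -1 means opposite direction. pgvector’s <=> operator returns cosine distance, which is 1 minus this value, so smaller distances mean closer matches.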
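As a minimal sketch of the definition above (the function names are illustrative, not taken from the Hub), cosine similarity and the matching cosine distance can be computed like this. Returning 0 for zero-length or zero-magnitude input is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"math"
)

// CosineSimilarity returns dot(a, b) / (|a| * |b|).
// Sketch-only convention: it returns 0 when the lengths differ, the
// vectors are empty, or either vector has zero magnitude.
func CosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// CosineDistance is 1 - similarity, the quantity a cosine-distance
// operator such as <=> orders by (for non-zero vectors).
func CosineDistance(a, b []float64) float64 {
	return 1 - CosineSimilarity(a, b)
}

func main() {
	a := []float64{1, 2, 3}
	same := []float64{2, 4, 6}        // same direction, different length
	opposite := []float64{-1, -2, -3} // opposite direction
	fmt.Printf("%.3f %.3f\n", CosineSimilarity(a, same), CosineSimilarity(a, opposite))
}
```

Note that magnitude does not matter: scaling a vector leaves its cosine similarity to any other vector unchanged.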

Where embeddings live

The canonical embedding storage is in the Hub’s database, not core’s. From hivecfm-hub/migrations/004_add_feedback_records_embedding.sql:

CREATE TABLE embeddings (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  feedback_record_id UUID NOT NULL REFERENCES feedback_records(id) ON DELETE CASCADE,
  embedding halfvec(768) NOT NULL,
  model TEXT NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  UNIQUE (feedback_record_id, model)
);
 
CREATE INDEX idx_embeddings ON embeddings USING hnsw (embedding halfvec_cosine_ops);

Worth noting:

halfvec(768): half-precision floats (2 bytes per dimension instead of 4). Roughly halves vector storage with under 1% impact on recall.
model column: lets us A/B test multiple embedding models by storing one row per (feedback_record, model) pair.
HNSW index: Hierarchical Navigable Small World, an approximate-nearest-neighbour index tuned for fast top-K lookups. Built with the halfvec_cosine_ops operator class, so it serves queries ordered by cosine distance (the <=> operator).
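To make the storage saving concrete, here is a small worked calculation. It assumes the per-value sizes documented by pgvector (4 bytes per dimension for vector, 2 for halfvec, plus an 8-byte header each); the helper names are illustrative, and index and row overhead are ignored.

```go
package main

import "fmt"

// headerBytes is the fixed per-value overhead pgvector documents for
// both vector and halfvec.
const headerBytes = 8

// VectorBytes is the storage for one single-precision vector(dims) value.
func VectorBytes(dims int) int { return 4*dims + headerBytes }

// HalfvecBytes is the storage for one half-precision halfvec(dims) value.
func HalfvecBytes(dims int) int { return 2*dims + headerBytes }

func main() {
	const dims = 768
	full, half := VectorBytes(dims), HalfvecBytes(dims)
	fmt.Printf("vector(%d): %d bytes, halfvec(%d): %d bytes\n", dims, full, dims, half)
	// Scale up to a million feedback records to see the table-level effect.
	const rows = 1_000_000
	fmt.Printf("per million rows: %d MB vs %d MB\n", full*rows/1_000_000, half*rows/1_000_000)
}
```

At 768 dimensions this works out to 3080 bytes versus 1544 bytes per embedding, so the saving is just under 50% once the header is counted.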

How embeddings get written

Every new feedback record enqueues a FeedbackEmbeddingArgs River job. The worker in hivecfm-hub/internal/workers/feedback_embedding.go:

Reads the feedback record by id.
Calls the embedding client to turn the text into a 768-d vector.
Writes it to embeddings with (feedback_record_id, model) as the upsert key.

If the embedding call fails, River retries with backoff.
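The write in step 3 can be sketched as an upsert keyed on the table's UNIQUE (feedback_record_id, model) constraint. The SQL below is an assumption derived from the table definition, not the worker's actual query, and VectorLiteral is a hypothetical helper that renders a vector in pgvector's text input format so it can be passed as an ordinary string parameter.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// upsertEmbeddingSQL is an assumed statement; the real worker may use a
// different query or a typed vector parameter instead of a text cast.
const upsertEmbeddingSQL = `
INSERT INTO embeddings (feedback_record_id, embedding, model)
VALUES ($1, $2::halfvec, $3)
ON CONFLICT (feedback_record_id, model)
DO UPDATE SET embedding = EXCLUDED.embedding, updated_at = NOW()`

// VectorLiteral renders vec in pgvector's text format, e.g. "[1,0.5,-2]".
// NaN and infinite values are not checked here; Postgres would reject them.
func VectorLiteral(vec []float32) string {
	parts := make([]string, len(vec))
	for i, v := range vec {
		// bitSize 32 prints the shortest string that round-trips a float32.
		parts[i] = strconv.FormatFloat(float64(v), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	// Placeholder arguments; a real call would pass the record's UUID and
	// the configured model name to the database driver.
	args := []any{"<feedback-record-uuid>", VectorLiteral([]float32{1, 0.5, -2}), "<model-name>"}
	fmt.Println(upsertEmbeddingSQL)
	fmt.Println(args...)
}
```

Because the statement is idempotent for a given (record, model) pair, a retried job overwrites its own earlier write instead of failing on the unique constraint.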

How search is served

Semantic search goes through hivecfm-hub/internal/service/search_service.go. The entry point is SearchService.SemanticSearch:

func (s *SearchService) SemanticSearch(
    ctx context.Context, query, tenantID string,
    limit int, minScore float64, cursor string,
    filters *models.SearchFilters,
) (SearchResult, error)

The flow:

Embed the query. Call the embedding client on the user’s query string. Cache the result by (query, model) in an LRU so repeat searches skip the embedding call for as long as the entry stays cached.
Run nearest-neighbour. SELECT ... WHERE tenant_id = $tenant ORDER BY embedding <=> $query_vec LIMIT $k. Postgres uses the HNSW index; the scan is roughly O(log N) in practice.
Optional sentiment re-rank. If the service is constructed with a SentimentClassifier, queries classified as positive/negative get a score boost or penalty on results with matching sentiment (capped at ±35%).
Cursor pagination. The last-distance + last-id pair is encoded as an opaque cursor so the next page can resume in the same index scan.
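The cursor step above can be sketched as follows. The Cursor struct, its JSON field names, and the base64 encoding are assumptions for illustration; the Hub's actual cursor format is not shown in this document.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// Cursor holds the sort key of the last row on the previous page.
// (Hypothetical shape; the real service may encode it differently.)
type Cursor struct {
	Distance float64 `json:"d"`
	ID       string  `json:"id"`
}

// EncodeCursor serialises the cursor into an opaque, URL-safe token.
func EncodeCursor(c Cursor) (string, error) {
	raw, err := json.Marshal(c)
	if err != nil {
		return "", fmt.Errorf("encode cursor: %w", err)
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// DecodeCursor reverses EncodeCursor. An empty token means "first page"
// and yields the zero Cursor.
func DecodeCursor(token string) (Cursor, error) {
	var c Cursor
	if token == "" {
		return c, nil
	}
	raw, err := base64.RawURLEncoding.DecodeString(token)
	if err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	if err := json.Unmarshal(raw, &c); err != nil {
		return c, fmt.Errorf("invalid cursor: %w", err)
	}
	return c, nil
}

func main() {
	token, _ := EncodeCursor(Cursor{Distance: 0.25, ID: "abc"})
	next, _ := DecodeCursor(token)
	// A keyset predicate for the next page could then look like (assumed):
	//   AND (distance, id) > ($last_distance, $last_id)
	fmt.Println(token, next.Distance, next.ID)
}
```

Including the id alongside the distance gives a total order, so rows that tie on distance are neither skipped nor repeated across pages.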

Cache misses are deduplicated with singleflight: if 100 requests for the same query hit a cold cache at once, only one embedding round-trip happens, and the other 99 block until it completes and then share its result. This matters when a popular query arrives under heavy load.
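The deduplication idea can be illustrated with a small standard-library-only sketch. This is not the Hub's code and not the golang.org/x/sync/singleflight implementation; Group, Waiters, and EmbedCalls are names invented here, and the embedding function is a fake.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// call tracks one in-flight request and the callers that joined it.
type call struct {
	wg   sync.WaitGroup
	dups int // callers that joined after the leader
	val  []float32
	err  error
}

// Group deduplicates concurrent calls that share a key.
type Group struct {
	mu    sync.Mutex
	calls map[string]*call
}

// Do runs fn at most once at a time per key; concurrent callers with the
// same key block and receive the leader's result.
func (g *Group) Do(key string, fn func() ([]float32, error)) ([]float32, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		c.dups++
		g.mu.Unlock()
		c.wg.Wait() // wait for the leader to finish
		return c.val, c.err
	}
	c := &call{}
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only the leader does the expensive work
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

// Waiters reports how many callers are currently blocked on key.
func (g *Group) Waiters(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.calls[key]; ok {
		return c.dups
	}
	return 0
}

// EmbedCalls fires n concurrent lookups for one query and returns how
// many times the fake embedding function actually ran.
func EmbedCalls(n int) int64 {
	var g Group
	var ran int64
	var wg sync.WaitGroup
	embed := func() ([]float32, error) {
		atomic.AddInt64(&ran, 1)
		// Hold the flight open until every other caller has joined,
		// which keeps the demo deterministic.
		for g.Waiters("q") < n-1 {
			runtime.Gosched()
		}
		return []float32{0.1, 0.2}, nil
	}
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("q", embed)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&ran)
}

func main() {
	fmt.Println(EmbedCalls(100)) // prints 1
}
```

Singleflight only collapses calls that overlap in time; it is the LRU that serves later requests, so the two mechanisms complement each other.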

Tuning knobs

minScore: consumer-side cosine similarity threshold. Results scoring below it are dropped before pagination.
hnsw.ef_search: a Postgres session GUC that controls the HNSW traversal budget (pgvector’s default is 40). Higher = better recall, slower query. Set at connection time when needed, or per transaction with SET LOCAL.
limit: page size. The service also enforces an internal max.
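A minimal sketch of how minScore and limit might be applied to raw nearest-neighbour hits. The Hit type, the default of 20, and the cap of 100 are hypothetical values chosen for illustration; the service's real limits are not stated here.

```go
package main

import "fmt"

const (
	defaultLimit = 20  // hypothetical default page size
	maxLimit     = 100 // hypothetical internal cap
)

// Hit is one nearest-neighbour result as returned by the database.
type Hit struct {
	ID       string
	Distance float64 // cosine distance from the <=> operator
}

// ClampLimit applies the default and the internal maximum to a requested
// page size.
func ClampLimit(limit int) int {
	switch {
	case limit <= 0:
		return defaultLimit
	case limit > maxLimit:
		return maxLimit
	default:
		return limit
	}
}

// FilterByMinScore keeps hits whose cosine similarity (1 - distance) is
// at least minScore, preserving their order.
func FilterByMinScore(hits []Hit, minScore float64) []Hit {
	kept := make([]Hit, 0, len(hits))
	for _, h := range hits {
		if 1-h.Distance >= minScore {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	hits := []Hit{{"a", 0.10}, {"b", 0.30}, {"c", 0.55}}
	page := FilterByMinScore(hits, 0.6)
	fmt.Println(len(page), ClampLimit(0), ClampLimit(500)) // prints 2 20 100
}
```

If more recall is needed for a particular query, the traversal budget can be raised for a single transaction with SET LOCAL hnsw.ef_search = 100 before running the search statement.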

Extending the search surface

Search is exposed through the Hub’s HTTP API, not through Prisma. If you need it from the web app, call the Hub via the internal Hub client; do not add a new SQL query on the web side. This preserves the “Hub owns embeddings” boundary and keeps the LRU cache effective.