API Reference

Every function Quarrel exports, with all parameters and return values.


Text Processing

stripFrontmatter(text)

Removes YAML frontmatter (--- blocks) from the start of a string.

| Param | Type | Description |
| --- | --- | --- |
| text | string | Input text |

Returns: string

js
quarrel.stripFrontmatter("---\ntitle: Hello\n---\nBody");
// => "\nBody"

quarrel.stripFrontmatter("No frontmatter");
// => "No frontmatter"
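
The behavior above can be approximated with a single anchored regular expression. This is a minimal sketch, not Quarrel's source: the name `stripFrontmatterSketch` and the exact pattern are assumptions chosen to reproduce the two documented results.

```javascript
// Hypothetical stand-in for illustration; Quarrel's real implementation may differ.
function stripFrontmatterSketch(text) {
  // "^" (no multiline flag) anchors to the very start of the string, so a
  // "---" rule further down the document is never treated as frontmatter.
  // The lazy [\s\S]*? stops at the first closing "---" on its own line.
  return text.replace(/^---\r?\n[\s\S]*?\r?\n---(?=\r?\n|$)/, "");
}

console.log(JSON.stringify(stripFrontmatterSketch("---\ntitle: Hello\n---\nBody")));
// prints "\nBody"
console.log(JSON.stringify(stripFrontmatterSketch("No frontmatter")));
// prints "No frontmatter"
```

The newline that followed the closing `---` is left in place, which matches the documented `"\nBody"` result; call `trimStart()` on the output if the leading newline is unwanted.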

normalizeMarkdown(text)

Strips all markdown syntax (frontmatter, code blocks, inline code, images, links, blockquotes, headings, and emphasis), leaving plain text. Links are removed whole, link text included, as the example below shows.

| Param | Type | Description |
| --- | --- | --- |
| text | string | Raw markdown |

Returns: string

js
quarrel.normalizeMarkdown("# Title\n\nSome **bold** and [a link](http://x.com).");
// => "Title Some bold and ."

tokenize(text, options?)

Splits text into lowercase words, removing punctuation, short words, and stopwords.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| text | string | | Input text |
| options.minTokenLength | number | 3 | Shortest word to keep |
| options.stopwords | Set<string> | built-in | Words to skip |

Returns: string[]

js
quarrel.tokenize("The quick brown fox jumps over the lazy dog");
// => ["quick", "brown", "fox", "jumps", "over", "lazy", "dog"]

quarrel.tokenize("AI & ML", { minTokenLength: 2 });
// => ["ai", "ml"]
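
The steps in the description (lowercase, split on punctuation, drop short words and stopwords) can be sketched as follows. The function name `tokenizeSketch` and the tiny `SKETCH_STOPWORDS` set are assumptions for illustration; Quarrel's built-in stopword list is not shown in this reference and is certainly larger.

```javascript
// Hypothetical stopword list; stands in for Quarrel's built-in set.
const SKETCH_STOPWORDS = new Set(["the", "a", "an", "and", "of", "to", "in"]);

// Illustrative tokenizer, not Quarrel's source.
function tokenizeSketch(text, { minTokenLength = 3, stopwords = SKETCH_STOPWORDS } = {}) {
  return String(text)
    .toLowerCase()
    // Split on anything that is not an ASCII letter or digit. This drops
    // punctuation but also non-ASCII letters, a limitation of the sketch.
    .split(/[^a-z0-9]+/)
    // Empty strings produced by the split fail the length check.
    .filter((word) => word.length >= minTokenLength && !stopwords.has(word));
}

console.log(tokenizeSketch("The quick brown fox jumps over the lazy dog"));
// prints [ 'quick', 'brown', 'fox', 'jumps', 'over', 'lazy', 'dog' ]
console.log(tokenizeSketch("AI & ML", { minTokenLength: 2 }));
// prints [ 'ai', 'ml' ]
```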

buildEmbeddingText(input, options?)

Merges a title and content into one string for vectorization. Content gets markdown-stripped and trimmed to contentExcerptLength.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| input.title | string | "" | Document title |
| input.content | string | "" | Document body (markdown OK) |
| options.contentExcerptLength | number | 500 | Max content characters to use |

Returns: string

js
quarrel.buildEmbeddingText({
  title: "My Note",
  content: "# Heading\n\nSome content..."
});
// => "My Note Heading Some content..."

fingerprintText(text)

Returns an 8-character hex hash for change detection. Same input always gives the same output.

| Param | Type | Description |
| --- | --- | --- |
| text | string | Input text |

Returns: string (8 hex characters)

js
quarrel.fingerprintText("hello world");
// => "cad44818"

Not for security — just for checking if content changed between runs.
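
As a rough illustration of the change-detection use case, the sketch below keeps a map of fingerprints from the previous run and reports which documents differ. Everything here is an assumption made for the example: `changedDocs` is a hypothetical helper, and `fnv1aHex` is a 32-bit FNV-1a stand-in, not the hash Quarrel uses, so its output will not match the value shown above.

```javascript
// Stand-in 32-bit FNV-1a hash over UTF-16 code units (NOT Quarrel's algorithm).
function fnv1aHex(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical helper: compare this run's fingerprints with the previous run's.
function changedDocs(docs, previous, fingerprint = fnv1aHex) {
  const next = new Map();
  const changed = [];
  for (const doc of docs) {
    const fp = fingerprint(doc.content);
    next.set(doc.id, fp);
    if (previous.get(doc.id) !== fp) changed.push(doc.id); // new or modified
  }
  return { changed, next };
}

const firstRun = changedDocs(
  [{ id: "a", content: "one" }, { id: "b", content: "two" }],
  new Map()
);
const secondRun = changedDocs(
  [{ id: "a", content: "one" }, { id: "b", content: "two, edited" }],
  firstRun.next
);
console.log(firstRun.changed, secondRun.changed);
```

In real use, `quarrel.fingerprintText` would be passed as the third argument and `next` persisted between runs, so that only the documents listed in `changed` need to be re-vectorized.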


Vectorization

buildTfidfVectors(texts, options?)

Takes an array of plain text strings and returns weighted vectors plus the vocabulary used.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| texts | string[] | | Plain text strings |
| options.maxVocab | number | 5000 | Cap on vocabulary size |
| options.minTokenLength | number | 3 | Shortest token to keep |
| options.stopwords | Set<string> | built-in | Words to skip |

Returns: { vectors: number[][], vocab: string[] }

  • vectors: one vector per input text, each with the same length as vocab
  • vocab: the terms, in order (index i in vocab corresponds to position i in every vector)

js
const { vectors, vocab } = quarrel.buildTfidfVectors([
  "javascript closures are useful",
  "python decorators are elegant"
]);
// vectors.length === 2, and each vector has vocab.length entries
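
For readers who want to see what a TF-IDF weighting looks like in code, here is a minimal sketch. It is not Quarrel's implementation: the exact weighting formula, the vocabulary ordering, and the name `tfidfSketch` are assumptions. It uses raw term counts and the common smoothed inverse document frequency `ln((1 + N) / (1 + df)) + 1`, and it skips the stopword and minimum-length filtering to stay short.

```javascript
// Illustrative TF-IDF; formula and ordering are assumptions, not Quarrel's.
function tfidfSketch(texts, { maxVocab = 5000 } = {}) {
  const docs = texts.map((t) => t.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean));

  // Document frequency: in how many documents does each term appear?
  const df = new Map();
  for (const tokens of docs) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  // Keep the most widespread terms first (ties broken alphabetically), capped.
  const vocab = [...df.keys()]
    .sort((a, b) => df.get(b) - df.get(a) || a.localeCompare(b))
    .slice(0, maxVocab);
  const index = new Map(vocab.map((term, i) => [term, i]));

  const n = docs.length;
  const vectors = docs.map((tokens) => {
    const counts = new Array(vocab.length).fill(0);
    for (const term of tokens) {
      const i = index.get(term);
      if (i !== undefined) counts[i] += 1;
    }
    // Weight each count by the smoothed IDF of its term.
    return counts.map((c, i) => c * (Math.log((1 + n) / (1 + df.get(vocab[i]))) + 1));
  });
  return { vectors, vocab };
}

const sketch = tfidfSketch([
  "javascript closures are useful",
  "python decorators are elegant"
]);
console.log(sketch.vocab);
console.log(sketch.vectors[0]);
```

In this sketch the shared word "are" receives a weight of exactly 1 in both documents, while words unique to one document receive a higher weight, which is the point of the IDF term.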

buildHashedTfidfVectors(texts, options?)

Like buildTfidfVectors, but maps words to a fixed-size vector using hashing instead of building a vocabulary. Faster, constant memory, slightly less precise.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| texts | string[] | | Plain text strings |
| options.hashDim | number | 2048 | Size of each vector |
| options.minTokenLength | number | 3 | Shortest token to keep |
| options.stopwords | Set<string> | built-in | Words to skip |

Returns: { vectors: number[][] }

js
const { vectors } = quarrel.buildHashedTfidfVectors(
  ["javascript closures", "python decorators"],
  { hashDim: 512 }
);
// vectors[0].length === 512
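
The hashing approach can be sketched like this: each token is hashed to a bucket index in a fixed-size array, so no vocabulary has to be stored. The hash function (32-bit FNV-1a), the per-bucket IDF weighting, and the names `hashToken` and `hashedVectorsSketch` are all assumptions for illustration; the reference does not say which hash or weighting Quarrel uses.

```javascript
// Stand-in token hash (32-bit FNV-1a); Quarrel's actual hash is not documented here.
function hashToken(token) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Illustrative feature hashing with IDF computed per bucket instead of per term.
function hashedVectorsSketch(texts, { hashDim = 2048 } = {}) {
  const counts = texts.map((text) => {
    const vec = new Array(hashDim).fill(0);
    for (const token of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
      vec[hashToken(token) % hashDim] += 1; // colliding tokens share a bucket
    }
    return vec;
  });

  // Bucket document frequency: how many texts touch each bucket?
  const df = new Array(hashDim).fill(0);
  for (const vec of counts) {
    vec.forEach((c, i) => { if (c > 0) df[i] += 1; });
  }

  const n = texts.length;
  const vectors = counts.map((vec) =>
    vec.map((c, i) => c * (Math.log((1 + n) / (1 + df[i])) + 1))
  );
  return { vectors };
}

const hashed = hashedVectorsSketch(
  ["javascript closures", "python decorators"],
  { hashDim: 512 }
);
console.log(hashed.vectors[0].length); // prints 512
```

The loss of precision mentioned above comes from collisions: two different words can land in the same bucket, and a smaller `hashDim` makes that more likely.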

vectorizeDocuments(docs, options?)

The main entry point. Takes document objects, handles markdown cleanup, and returns vectors. Use this unless you need lower-level control.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| docs | Array<{ id, title?, content }> | | Your documents |
| options.contentExcerptLength | number | 500 | Max content characters |
| options.useHashing | boolean | false | Use feature hashing |
| options.hashDim | number | 2048 | Vector size (hashing only) |
| options.maxVocab | number | 5000 | Vocabulary cap (standard only) |
| options.minTokenLength | number | 3 | Shortest token to keep |
| options.stopwords | Set<string> | built-in | Words to skip |

Returns: { vectors: number[][], vocab?: string[], texts: string[] }

  • vectors: one per document
  • vocab: only present when useHashing is false
  • texts: the cleaned strings that were actually vectorized

js
const { vectors } = quarrel.vectorizeDocuments(
  [
    { id: "a", title: "Intro", content: "# Welcome\n\nHello world." },
    { id: "b", title: "Guide", content: "## Setup\n\nInstall and run." }
  ],
  { useHashing: true }
);
// vectors.length === 2 (one per document); no vocab is returned because useHashing is true

Similarity

cosineSimilarity(vecA, vecB)

Scores how similar two vectors are using the cosine of the angle between them. 1 means they point in the same direction, 0 means they have nothing in common.

| Param | Type | Description |
| --- | --- | --- |
| vecA | number[] | First vector |
| vecB | number[] | Second vector |

Returns: number (0 to 1 for non-negative vectors, such as those produced by the vectorization functions)

Returns 0 for null/empty/mismatched vectors.

js
quarrel.cosineSimilarity([1, 0, 0], [1, 0, 0]); // => 1
quarrel.cosineSimilarity([1, 0, 0], [0, 1, 0]); // => 0
quarrel.cosineSimilarity([1, 1, 0], [1, 0, 0]); // => ~0.707
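
Cosine similarity is the dot product of the two vectors divided by the product of their lengths. A minimal sketch that also follows the documented fallback of returning 0 for null, empty, or mismatched input is shown below; `cosineSketch` is a hypothetical name and this is not Quarrel's source.

```javascript
// Illustrative cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSketch(a, b) {
  // Mirror the documented fallback for null, empty, or mismatched vectors.
  if (!Array.isArray(a) || !Array.isArray(b)) return 0;
  if (a.length === 0 || a.length !== b.length) return 0;

  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; treat it as "nothing in common".
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

console.log(cosineSketch([1, 0, 0], [1, 0, 0])); // prints 1
console.log(cosineSketch([1, 0, 0], [0, 1, 0])); // prints 0
console.log(cosineSketch([1, 1, 0], [1, 0, 0]).toFixed(3)); // prints 0.707
```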

calculateSimilarities(items, options?)

Compares every item to every other item and returns ranked matches.

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| items | Array<{ id, title, embedding }> | | Items with vectors |
| options.maxSimilar | number | 5 | How many matches to return per item |

Returns: Record<string, Array<{ id, title, similarity }>>

A map from each item's ID to its top matches, sorted by similarity with the highest score first. An item is never listed as its own match.

js
const matches = quarrel.calculateSimilarities(
  [
    { id: "a", title: "Note A", embedding: [1, 0, 0] },
    { id: "b", title: "Note B", embedding: [0.9, 0.1, 0] },
    { id: "c", title: "Note C", embedding: [0, 0, 1] }
  ],
  { maxSimilar: 2 }
);

// matches["a"] => [
//   { id: "b", title: "Note B", similarity: 0.994 },
//   { id: "c", title: "Note C", similarity: 0 }
// ]
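
The all-pairs comparison can be sketched as a nested pass over the items: score every other item, sort descending, and keep the top `maxSimilar`. This is an illustrative version under assumptions (the names `cosine` and `rankSimilarSketch` are hypothetical, and Quarrel may order ties or round scores differently).

```javascript
// Local cosine helper so the sketch is self-contained.
function cosine(a, b) {
  if (!Array.isArray(a) || !Array.isArray(b) || a.length === 0 || a.length !== b.length) return 0;
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Illustrative all-pairs ranking, not Quarrel's source.
function rankSimilarSketch(items, { maxSimilar = 5 } = {}) {
  const result = {};
  for (const item of items) {
    result[item.id] = items
      .filter((other) => other.id !== item.id) // never match an item to itself
      .map((other) => ({
        id: other.id,
        title: other.title,
        similarity: cosine(item.embedding, other.embedding)
      }))
      .sort((x, y) => y.similarity - x.similarity) // highest score first
      .slice(0, maxSimilar);
  }
  return result;
}

const ranked = rankSimilarSketch(
  [
    { id: "a", title: "Note A", embedding: [1, 0, 0] },
    { id: "b", title: "Note B", embedding: [0.9, 0.1, 0] },
    { id: "c", title: "Note C", embedding: [0, 0, 1] }
  ],
  { maxSimilar: 2 }
);
console.log(ranked.a.map((m) => `${m.id}:${m.similarity.toFixed(3)}`));
// prints [ 'b:0.994', 'c:0.000' ]
```

Comparing every item with every other item grows quadratically with the number of items, which is worth keeping in mind for large collections.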