AI / LLM Workflows

Autotel provides all the building blocks for comprehensive AI/LLM observability:

Canonical gen_ai.* instrumentation via the autotel-genai package — traceGenAI(), token usage, cost, metric views, content/evaluation events, and an agent governance layer
Automatic LLM instrumentation via OpenLLMetry integration
Workflow orchestration via nested trace() calls
Context propagation via AsyncLocalStorage (correlation IDs, user context, etc.)
Business event tracking via ctx.setAttribute() and track()
Multi-destination events via adapters (PostHog, Mixpanel, etc.)

The `autotel-genai` package

GenAI/LLM instrumentation lives in a dedicated package, autotel-genai — the core autotel package is generic and AI-free. It emits the canonical OpenTelemetry GenAI semantic conventions (gen_ai.*, semconv v1.42.0), so any OTLP backend renders token usage, cost, and model info without custom mapping.
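To see what "canonical" means on the wire, the sketch below assembles a span name and its request attributes by hand. `buildGenAiRequest` is a hypothetical helper for illustration, not an autotel-genai export (traceGenAI() does this for you). It follows the conventions' span-name pattern `{operation} {model}` and emits optional request parameters only when they are set.

```typescript
// Hypothetical helper for illustration; not part of autotel-genai.
type GenAiRequest = {
  operation: string; // gen_ai.operation.name, e.g. 'chat'
  provider: string; // gen_ai.provider.name, e.g. 'openai'
  model: string; // gen_ai.request.model, e.g. 'gpt-4o'
  temperature?: number;
  maxTokens?: number;
};

type AttributeValue = string | number;

function buildGenAiRequest(req: GenAiRequest): {
  spanName: string;
  attributes: Record<string, AttributeValue>;
} {
  const attributes: Record<string, AttributeValue> = {
    'gen_ai.operation.name': req.operation,
    'gen_ai.provider.name': req.provider,
    'gen_ai.request.model': req.model,
  };
  // Optional request parameters are only emitted when the caller sets them.
  if (req.temperature !== undefined) attributes['gen_ai.request.temperature'] = req.temperature;
  if (req.maxTokens !== undefined) attributes['gen_ai.request.max_tokens'] = req.maxTokens;

  // Span name pattern from the GenAI conventions: "{operation} {model}".
  return { spanName: `${req.operation} ${req.model}`, attributes };
}

const { spanName, attributes } = buildGenAiRequest({
  operation: 'chat',
  provider: 'openai',
  model: 'gpt-4o',
});
console.log(spanName); // chat gpt-4o
console.log(attributes);
```

Because every backend keys on the same attribute names, dashboards for token usage and model mix work without per-vendor mapping.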

npm install autotel autotel-genai

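import OpenAI from 'openai';
import { traceGenAI, recordGenAiResponse, recordGenAiUsage } from 'autotel-genai/trace';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment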

// Span named `chat gpt-4o`; canonical gen_ai.* request attributes set up front.
export const chat = traceGenAI({
  provider: 'openai',
  model: 'gpt-4o',
  operation: 'chat',
})((ctx) => async (prompt: string) => {
  const res = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
  });
  recordGenAiResponse(ctx, {
    model: res.model,
    finishReasons: res.choices.map((c) => c.finish_reason),
  });
  // gen_ai.usage.input_tokens / output_tokens + estimated gen_ai.usage.cost.usd
  recordGenAiUsage(ctx, 'gpt-4o', {
    inputTokens: res.usage?.prompt_tokens,
    outputTokens: res.usage?.completion_tokens,
  });
  return res.choices[0].message.content;
});

autotel-genai and OpenLLMetry are complementary: use autotel-genai for first-party, canonical control (and for cost, metric buckets, and agents); enable OpenLLMetry to auto-instrument LLM SDK calls you don’t wrap yourself. Both land in the same trace tree, though OpenLLMetry's attribute keys follow its own conventions and may not match the canonical gen_ai.* names exactly.

The package splits into focused subpath exports, so you import only what you use:

Subpath	Provides
`autotel-genai/trace`	`traceGenAI()`, `recordGenAiResponse()`, `recordGenAiUsage()`
`autotel-genai/cost`	`estimateLLMCost()`, `recordLLMCost()`, `MODEL_PRICING` (cache-read/write aware)
`autotel-genai/metrics`	`genAiMetricViews()` to re-bucket the canonical histograms
`autotel-genai/events`	opt-in content events (input/output gating), `inference.operation.details`, `evaluation.result`, `recordModelWarnings()`
`autotel-genai/guard`	inline cost/token/loop kill-switch: `createGenAiBudget()`, `createGenAiGuard()`, `parseGuardRules()`
`autotel-genai/streaming`	streaming performance: `createStreamTimer()`, `recordStreamTiming()` (TTFC, throughput, inter-chunk distribution)
`autotel-genai/observer`	`autotelTelemetry()` for `registerTelemetry()`, `subscribeAiTelemetry()` for the `ai:telemetry` channel, plus `createGenAiObserver()`, `createLangChainObserver()`, and `observeAiSdkResult()`
`autotel-genai/ai-sdk`	legacy Vercel AI SDK `ai.` to `gen_ai.` mapping plus cost, and `autotelEnrich()` for `@ai-sdk/otel`
`autotel-genai/agent`	agent identity, delegation, policy, audit, privacy, non-repudiation
`autotel-genai/semconv`	`GEN_AI_*` keys, `GEN_AI_OPERATION`, `GEN_AI_PROVIDER`, `genAiSpanName()`

Capturing framework event streams

traceGenAI() wraps code you own. When a framework emits its own telemetry stream, create an observer with createGenAiObserver() and feed every framework event through it. The observer rebuilds the gen_ai.* span tree and force-closes any child span whose terminal event never arrives.
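A minimal sketch of that rebuild-and-force-close behavior, assuming a simplified event stream. The `start`/`end` event shape and the `SpanTreeBuilder` class are hypothetical and do not describe createGenAiObserver()'s actual input format. Each start event attaches a node under its parent; when a parent ends, any descendant still open is closed and flagged as force-closed.

```typescript
// Hypothetical event shape for illustration only.
type ObserverEvent =
  | { type: 'start'; id: string; parentId?: string; name: string }
  | { type: 'end'; id: string };

interface SpanNode {
  id: string;
  name: string;
  children: SpanNode[];
  closed: boolean;
  forceClosed: boolean;
}

class SpanTreeBuilder {
  private readonly nodes = new Map<string, SpanNode>();
  readonly roots: SpanNode[] = [];

  handle(event: ObserverEvent): void {
    if (event.type === 'start') {
      const node: SpanNode = { id: event.id, name: event.name, children: [], closed: false, forceClosed: false };
      this.nodes.set(event.id, node);
      const parent = event.parentId ? this.nodes.get(event.parentId) : undefined;
      // Events whose parent is unknown become roots rather than being dropped.
      (parent ? parent.children : this.roots).push(node);
      return;
    }
    const node = this.nodes.get(event.id);
    if (!node || node.closed) return;
    this.forceCloseOpenDescendants(node);
    node.closed = true;
  }

  // A child cannot outlive its parent: close anything whose end event never came.
  private forceCloseOpenDescendants(node: SpanNode): void {
    for (const child of node.children) {
      this.forceCloseOpenDescendants(child);
      if (!child.closed) {
        child.closed = true;
        child.forceClosed = true;
      }
    }
  }
}

const tree = new SpanTreeBuilder();
tree.handle({ type: 'start', id: 'agent', name: 'invoke_agent support' });
tree.handle({ type: 'start', id: 'llm-1', parentId: 'agent', name: 'chat gpt-4o' });
tree.handle({ type: 'end', id: 'llm-1' });
tree.handle({ type: 'start', id: 'tool-1', parentId: 'agent', name: 'execute_tool search' });
// No end event ever arrives for tool-1 (e.g. the framework swallowed an error).
tree.handle({ type: 'end', id: 'agent' });
```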

Token usage and cost land on leaf chat spans. Aggregate agent and workflow spans carry no gen_ai.usage.*, so a backend that sums usage across a trace counts each call once.
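Why this matters for totals: if an agent span also carried the sum of its children's tokens, a backend that sums gen_ai.usage.input_tokens over every span in the trace would count those tokens twice. The sketch below uses a hypothetical flat list of exported spans to show that keeping usage on leaf chat spans makes a naive trace-wide sum correct.

```typescript
interface ExportedSpan {
  name: string;
  attributes: Record<string, number | string>;
}

// Hypothetical spans from one trace: usage lives on the leaf chat spans only.
const spans: ExportedSpan[] = [
  { name: 'invoke_agent support', attributes: { 'gen_ai.operation.name': 'invoke_agent' } },
  { name: 'chat gpt-4o', attributes: { 'gen_ai.usage.input_tokens': 120, 'gen_ai.usage.output_tokens': 40 } },
  { name: 'chat gpt-4o-mini', attributes: { 'gen_ai.usage.input_tokens': 80, 'gen_ai.usage.output_tokens': 25 } },
];

// What a backend typically does: sum one attribute across every span in the trace.
function sumUsage(trace: ExportedSpan[], key: string): number {
  return trace.reduce((total, span) => {
    const value = span.attributes[key];
    return typeof value === 'number' ? total + value : total;
  }, 0);
}

console.log(sumUsage(spans, 'gen_ai.usage.input_tokens')); // 200
console.log(sumUsage(spans, 'gen_ai.usage.output_tokens')); // 65

// If the agent span also carried a rolled-up total, the same sum would double count.
const withRollup: ExportedSpan[] = [
  { ...spans[0], attributes: { ...spans[0].attributes, 'gen_ai.usage.input_tokens': 200 } },
  ...spans.slice(1),
];
console.log(sumUsage(withRollup, 'gen_ai.usage.input_tokens')); // 400
```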

For the Vercel AI SDK, the primary path is the live Telemetry integration:

import { registerTelemetry } from 'ai';
import { autotelTelemetry, subscribeAiTelemetry } from 'autotel-genai/observer';

// Preferred: live spans + cost + streaming timing
registerTelemetry(autotelTelemetry());

// Fallback (use instead of registerTelemetry, not alongside it): zero-config ai:telemetry channel
// const unsubscribe = subscribeAiTelemetry();

autotelTelemetry() emits canonical gen_ai.* spans as calls run, prices every model call, records streaming timing, and places provider HTTP spans and tool-triggered nested generateText() calls under the correct parent. The channel subscriber gives you the same tree with usage and cost, but without per-call streaming timing.

The event-stream adapters are still useful when you already have a finished AI SDK result or another framework lifecycle stream:

import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
import {
  createGenAiObserver,
  createLangChainObserver,
  observeAiSdkResult,
} from 'autotel-genai/observer';

const observe = createGenAiObserver();

// LangChain / LangGraph: one callback handler turns runId/parentRunId into the tree.
// `graph` is your compiled LangGraph graph and `input` its input state.
await graph.invoke(input, { callbacks: [createLangChainObserver(observe)] });

// Vercel AI SDK legacy/result-walker path: walk a finished generateText/streamText result.
const model = openai('gpt-4o');
observeAiSdkResult(observe, await generateText({ model, prompt: 'Summarize this support ticket' }), {
  id: 'gen-1',
  provider: 'openai',
  model: 'gpt-4o',
});

Prompts, tool arguments, and tool results stay off spans by default. Pass an exportContent callback to opt in and redact what reaches each span.
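The opt-in model amounts to a default-deny gate: nothing is exported unless a callback returns a value for it. The exact exportContent signature is not documented here, so the callback shape below, `(kind, value) => string | undefined`, and the `applyContentPolicy` helper are assumptions for illustration. The point is the gate plus redaction inside the callback.

```typescript
type ContentKind = 'prompt' | 'completion' | 'tool.arguments' | 'tool.result';

// Assumed callback shape: return the (possibly redacted) text to export, or undefined to drop it.
type ExportContent = (kind: ContentKind, value: string) => string | undefined;

// Default-deny: with no callback, content never reaches span attributes.
function applyContentPolicy(
  kind: ContentKind,
  value: string,
  exportContent?: ExportContent,
): string | undefined {
  return exportContent ? exportContent(kind, value) : undefined;
}

const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

// Example policy: keep prompts and completions with emails masked; drop tool payloads.
const redactEmails: ExportContent = (kind, value) => {
  if (kind === 'tool.arguments' || kind === 'tool.result') return undefined;
  return value.replace(EMAIL, '[email]');
};

console.log(applyContentPolicy('prompt', 'Email jane@example.com a refund')); // undefined
console.log(applyContentPolicy('prompt', 'Email jane@example.com a refund', redactEmails)); // Email [email] a refund
console.log(applyContentPolicy('tool.arguments', '{"q":"refund"}', redactEmails)); // undefined
```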

The example-langchain-observer app runs a LangGraph agent on a local Ollama model and prints the captured tree, token counts, and tool arguments.

When to Use OpenLLMetry

Use Case	Recommendation	Why
Using LLM SDKs (OpenAI, Anthropic, etc.)	Enable OpenLLMetry	Automatic capture of prompts, completions, tokens
Custom LLM integrations	Manual `trace()` only	OpenLLMetry won’t detect custom integrations
Workflow orchestration	Always use `trace()`	Critical for tracking workflow steps
Business metrics	Always use `trace()` + `track()`	Domain events require explicit instrumentation
Production applications	Use both together	OpenLLMetry handles LLM internals, `trace()` handles everything else

What Each Approach Provides

OpenLLMetry Automatic Instrumentation

When enabled via init({ openllmetry: { enabled: true } }), OpenLLMetry automatically captures:

// Example: Using Vercel AI SDK
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// OpenLLMetry instruments this call once enabled in init(). Depending on your
// OpenLLMetry and AI SDK versions, AI SDK calls may also need
// experimental_telemetry: { isEnabled: true } before spans are emitted.
const result = await generateText({
  model: openai('gpt-4o'),
  prompt: 'Explain quantum computing',
  temperature: 0.7,
});

// Span attributes captured (illustrative values; key names vary by OpenLLMetry version):
// - llm.request.model: "gpt-4o"
// - llm.provider: "openai"
// - llm.request.temperature: 0.7
// - llm.usage.prompt_tokens: 45
// - llm.usage.completion_tokens: 128
// - llm.usage.total_tokens: 173
// - llm.prompts.0.content: "Explain quantum computing"
// - llm.completions.0.content: "[full response text]"

What you get automatically:

LLM API request/response details (prompts, completions, model parameters)
Token usage tracking (prompt, completion, total)
Timing and latency for each LLM call
Error capture for failed LLM requests
Support for streaming responses
Works with 20+ LLM providers/SDKs (OpenAI, Anthropic, LangChain, LlamaIndex, Vercel AI SDK, etc.)

What you DON’T get:

Business workflow context (which agent? which step? why called?)
Business metrics (escalations, user satisfaction, custom events)
Correlation across workflow steps
Custom attributes for your domain logic

Manual `trace()` Instrumentation

Using autotel’s trace() function provides full control over observability:

import { trace } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

const triageAgent = trace('agent.triage', (ctx) => async (input: string) => {
  // Business context
  ctx.setAttributes({
    'agent.role': 'triage',
    'agent.purpose': 'route_to_specialist',
    'workflow.step': 1,
  });

  // Call LLM (OpenLLMetry will auto-instrument this call)
  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Triage this request: ${input}`,
  });

  // Business metrics
  const requiresEscalation = result.text.includes('ESCALATE');
  ctx.setAttribute('triage.escalation_required', requiresEscalation);

  return { decision: result.text, escalate: requiresEscalation };
});

What you get with trace():

Named workflow steps (clear span names like “agent.triage”)
Business attributes (agent roles, workflow state, custom logic)
Correlation IDs automatically propagated
Parent-child span relationships for complex workflows
Business events via track(), enriched with the active trace and correlation IDs
Works with ANY code (LLM or non-LLM)

Setup

import { init } from 'autotel';

init({
  service: 'my-ai-app',
  endpoint: process.env.OTLP_ENDPOINT,
  openllmetry: {
    enabled: true, // Enable automatic LLM instrumentation
    options: {
      disableBatch: process.env.NODE_ENV !== 'production',
    },
  },
});

Setup Guide

Option 1: OpenLLMetry Only (Not Recommended)

If you only enable OpenLLMetry without using trace(), you’ll get LLM call details but miss business context:

import { init } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

init({
  service: 'my-ai-app',
  openllmetry: { enabled: true },
});

// You'll see LLM spans but no workflow context
const result = await generateText({ model: openai('gpt-4o'), prompt: 'test' });
// No way to know: which agent? which step? which user? why called?

Option 2: Manual trace() Only (Good for Custom Models)

If you’re using custom LLM integrations or direct HTTP calls:

import { traceGenAI, recordGenAiUsage } from 'autotel-genai/trace';

const callCustomLLM = traceGenAI({
  provider: 'self-hosted',
  model: 'my-custom-model-v2',
  operation: 'chat',
})((ctx) => async (prompt: string) => {
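  const response = await fetch('https://my-llm-api.com/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok) {
    // Fail the call instead of recording usage from an error body
    throw new Error(`LLM API returned HTTP ${response.status}`);
  }

  // Response shape is specific to your API; here: { text, usage: { inputTokens, outputTokens } }
  const data = await response.json();
  // Canonical gen_ai.usage.* tokens + cost; supply a pricing map for custom models
  recordGenAiUsage(
    ctx,
    'my-custom-model-v2',
    { inputTokens: data.usage.inputTokens, outputTokens: data.usage.outputTokens },
    { pricing: { 'my-custom-model-v2': { inputPer1M: 0, outputPer1M: 0 } } },
  );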
  return data.text;
});
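Cost estimation from such a pricing map is plain arithmetic over per-million-token rates. Below is a sketch, assuming entries shaped like the `{ inputPer1M, outputPer1M }` map above plus hypothetical `cacheReadPer1M` / `cacheWritePer1M` fields for the cache-aware case. It also assumes cached tokens are reported separately from `inputTokens` (providers differ here). The rates are made-up placeholders, not real model prices, and `estimateCostUsd` is illustrative, not estimateLLMCost() itself.

```typescript
interface ModelPricing {
  inputPer1M: number; // USD per 1M uncached input tokens
  outputPer1M: number; // USD per 1M output tokens
  cacheReadPer1M?: number; // hypothetical field: USD per 1M cache-read tokens
  cacheWritePer1M?: number; // hypothetical field: USD per 1M cache-write tokens
}

interface TokenUsage {
  inputTokens?: number; // assumed to EXCLUDE cached tokens
  outputTokens?: number;
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
}

function estimateCostUsd(usage: TokenUsage, pricing: ModelPricing): number {
  const priced = (tokens: number | undefined, ratePer1M: number | undefined) =>
    ((tokens ?? 0) * (ratePer1M ?? 0)) / 1_000_000;
  return (
    priced(usage.inputTokens, pricing.inputPer1M) +
    priced(usage.outputTokens, pricing.outputPer1M) +
    // Assumption: fall back to the input rate when no cache-specific rate is configured.
    priced(usage.cacheReadTokens, pricing.cacheReadPer1M ?? pricing.inputPer1M) +
    priced(usage.cacheWriteTokens, pricing.cacheWritePer1M ?? pricing.inputPer1M)
  );
}

// Placeholder rates for a self-hosted model (not real prices).
const pricing: Record<string, ModelPricing> = {
  'my-custom-model-v2': { inputPer1M: 2, outputPer1M: 8, cacheReadPer1M: 0.5 },
};

const cost = estimateCostUsd(
  { inputTokens: 1_000, outputTokens: 500, cacheReadTokens: 4_000 },
  pricing['my-custom-model-v2'],
);
console.log(cost.toFixed(6)); // 0.008000
```

Here 1,000 input tokens at $2/1M, 500 output tokens at $8/1M, and 4,000 cache reads at $0.50/1M contribute $0.002 each to $0.004 respectively, for $0.008 in total.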

Option 3: Both Together (Recommended)

For production applications using LLM SDKs:

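import { init, trace } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

init({
  service: 'production-ai-app',
  openllmetry: { enabled: true }, // Auto-instrument LLM SDKs
});

// Your workflow code uses trace() for business logic
const workflow = trace('workflow.main', (ctx) => async (input: string) => {
  // trace() provides workflow context and business metrics
  ctx.setAttribute('workflow.type', 'main');

  // OpenLLMetry auto-instruments LLM calls inside; both appear in the same
  // trace tree, with the LLM span as a child of workflow.main
  const result = await generateText({ model: openai('gpt-4o'), prompt: input });
  return result.text;
});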

Quick Decision Tree

Are you using LLM SDKs (OpenAI, Anthropic, Vercel AI SDK, LangChain)?
├─ Yes
│  └─ Enable OpenLLMetry
│     └─ Do you need business context/metrics?
│        ├─ Yes → Also use trace() (RECOMMENDED)
│        └─ No → OpenLLMetry only (you'll regret this later)
│
└─ No (custom models, direct HTTP)
   └─ Use trace() / traceGenAI() only
      └─ Emit gen_ai.* conventions via autotel-genai (recordGenAiUsage), not by hand

Basic AI Operation

import { traceGenAI, recordGenAiUsage } from 'autotel-genai/trace';

// Span named `chat gpt-4o`, with canonical gen_ai.* request attributes.
const generateResponse = traceGenAI({
  provider: 'openai',
  model: 'gpt-4o',
  operation: 'chat',
})((ctx) => async (prompt: string) => {
  const response = await llm.generate(prompt);
  // gen_ai.usage.input_tokens / output_tokens + estimated gen_ai.usage.cost.usd
  recordGenAiUsage(ctx, 'gpt-4o', {
    inputTokens: response.usage.inputTokens,
    outputTokens: response.usage.outputTokens,
  });
  return response;
});

Core Concepts

Correlation IDs

Correlation IDs automatically propagate through your entire workflow, making it easy to trace requests across multiple agents, services, and LLM calls.

import { trace, track } from 'autotel';

export const processUserRequest = trace(
  'ai.user_request',
  (ctx) => async (userId: string, message: string) => {
    // Correlation ID is automatically available
    console.log('Trace ID:', ctx.traceId);
    console.log('Correlation ID:', ctx.correlationId); // First 16 chars of traceId

    // All nested operations inherit this correlation context
    const analysis = await analyzeIntent(message);
    const response = await generateResponse(analysis);

    // Events automatically include correlation IDs
    track('ai.request_completed', {
      userId,
      intent: analysis.intent,
      // correlationId, traceId, spanId are auto-added!
    });

    return response;
  },
);

What you get automatically:

ctx.traceId - Full OpenTelemetry trace ID
ctx.correlationId - Short correlation ID (first 16 chars)
ctx.spanId - Current span ID
Automatic propagation to all nested trace() calls
Enrichment of all track() events
Inclusion in structured logs (via autotel/logger)
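Context propagation of this kind is typically built on Node's AsyncLocalStorage. The sketch below is not autotel's implementation: `withTrace` and `trackEvent` are hypothetical names, and the trace ID is random hex in the 32-character W3C trace-context format. It shows a context set once at the top of a request remaining visible across nested async calls, and an event helper that adds `traceId` and a 16-character `correlationId` to every payload automatically.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';
import { randomBytes } from 'node:crypto';

// Hypothetical request context; not autotel's internal type.
interface RequestContext {
  traceId: string; // 32 hex chars, the W3C trace-context trace-id format
  correlationId: string; // first 16 chars of traceId
  userId: string;
}

const storage = new AsyncLocalStorage<RequestContext>();
const emitted: Array<Record<string, unknown>> = [];

// Hypothetical: open a root context and run `fn` inside it.
function withTrace<T>(userId: string, fn: () => Promise<T>): Promise<T> {
  const traceId = randomBytes(16).toString('hex');
  return storage.run({ traceId, correlationId: traceId.slice(0, 16), userId }, fn);
}

// Hypothetical event helper: enriches every payload with the active context.
function trackEvent(name: string, props: Record<string, unknown> = {}): void {
  const ctx = storage.getStore();
  emitted.push({ name, ...props, traceId: ctx?.traceId, correlationId: ctx?.correlationId });
}

async function analyzeIntent(message: string): Promise<string> {
  await new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O hop
  trackEvent('intent.analyzed', { length: message.length });
  return message.includes('refund') ? 'billing' : 'general';
}

async function processUserRequest(message: string): Promise<string> {
  const intent = await analyzeIntent(message); // nested call sees the same context
  trackEvent('ai.request_completed', { intent });
  return intent;
}

const done = withTrace('user-123', () => processUserRequest('I want a refund'));
done.then(() => console.log(emitted));
```

Both events carry the same IDs even though neither function receives them as arguments; that is the property that lets events and spans be joined later.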

Multi-Step Workflows

Create parent-child span hierarchies naturally with nested trace() calls. Each step becomes a child span with automatic error handling and lifecycle management.

import { trace } from 'autotel';

export const processDocument = trace(
  'document.processing',
  (ctx) => async (docId: string) => {
    ctx.setAttribute('document.id', docId);
    ctx.setAttribute('workflow.type', 'document_processing');

    // Step 1: Load document (creates child span)
    const document = await trace('document.load', async () => {
      return await loadDocument(docId);
    });

    // Step 2: Analyze with LLM (creates child span, OpenLLMetry auto-instruments LLM call)
    const analysis = await trace('document.analyze', async () => {
      const result = await llm.analyze(document.content);
      return result;
    });

    // Step 3: Store results (creates child span)
    const stored = await trace('document.store', async () => {
      return await storeAnalysis(docId, analysis);
    });

    return stored;
  },
);

Span Hierarchy Created:

document.processing (parent)
├── document.load (child)
├── document.analyze (child)
│   └── openai.chat.completions (child, auto-instrumented by OpenLLMetry)
└── document.store (child)

Domain Events

Track business-level events alongside technical telemetry using ctx.setAttribute() for span attributes and track() for events.

import { trace, track } from 'autotel';

export const handleAgentHandoff = trace(
  'agent.handoff',
  (ctx) => async (task: Task) => {
    const startTime = performance.now();

    // Set domain-specific span attributes
    ctx.setAttributes({
      'agent.from': 'triage',
      'agent.to': 'specialist',
      'task.priority': task.priority,
      'task.category': task.category,
    });

    // Perform handoff
    const result = await specialistAgent.process(task);

    // Track business metric with precise duration
    track('agent.handoff_completed', {
      from: 'triage',
      to: 'specialist',
      duration_ms: Math.round(performance.now() - startTime),
      success: true,
    });

    return result;
  },
);

Multi-Step Workflow

const workflow = trace('ai.workflow', (ctx) => async (input: string) => {
  const analysis = await trace('step1.analyze', async () => {
    return await analyzeInput(input);
  });

  const response = await trace('step2.generate', async () => {
    return await generateResponse(analysis);
  });

  return response;
});

Agent Handoffs

const runAgentWorkflow = trace(
  'workflow.agents',
  (ctx) => async (input: string) => {
    ctx.setAttributes({
      'workflow.type': 'multi_agent',
      'workflow.correlation_id': ctx.correlationId,
    });

    const triageResult = await triageAgent(input);
    ctx.setAttributes({ 'handoff.from': 'triage', 'handoff.to': 'specialist' });

    const specialistResult = await specialistAgent(triageResult);

    return specialistResult;
  },
);

Pattern: Multi-Agent Workflows

Multi-agent systems require tracking “baton passes” between agents with full context propagation.

Triage, Specialist, and QA Escalation

import { trace, track } from 'autotel';
import { generateText, generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';

// Agent 1: Triage
const triageAgent = trace('agent.triage', (ctx) => async (userRequest: string) => {
  ctx.setAttributes({
    'agent.role': 'triage',
    'agent.model': 'gpt-4o-mini',
  });

  const result = await generateText({
    model: openai('gpt-4o-mini'),
    prompt: `Analyze this request and create a plan: ${userRequest}`,
  });

  track('agent.triage_completed', {
    request_length: userRequest.length,
    plan_length: result.text.length,
  });

  return {
    plan: result.text,
    requiresSpecialist: true,
  };
});

// Agent 2: Specialist
const specialistAgent = trace('agent.specialist', (ctx) => async (plan: string) => {
  ctx.setAttributes({
    'agent.role': 'specialist',
    'agent.model': 'gpt-4o',
  });

  ctx.track('specialist_engaged', { plan_length: plan.length });

  const result = await generateText({
    model: openai('gpt-4o'),
    prompt: `Execute this plan: ${plan}`,
  });

  track('agent.specialist_completed', {
    plan_length: plan.length,
    response_length: result.text.length,
  });

  return {
    response: result.text,
    requiresQA: true,
  };
});

// Agent 3: QA
const qaAgent = trace('agent.qa', (ctx) => async (response: string) => {
  ctx.setAttributes({
    'agent.role': 'qa',
    'agent.model': 'gpt-4o',
  });

  const result = await generateObject({
    model: openai('gpt-4o'),
    schema: z.object({
      approved: z.boolean(),
      feedback: z.string().optional(),
      requiresFollowUp: z.boolean(),
    }),
    prompt: `Review this response for quality: ${response}`,
  });

  ctx.setAttribute('qa.approved', result.object.approved);

  track('agent.qa_completed', {
    approved: result.object.approved,
    requires_follow_up: result.object.requiresFollowUp,
  });

  return result.object;
});

// Orchestrator: Workflow coordinator
export const runMultiAgentWorkflow = trace(
  'workflow.multi_agent_escalation',
  (ctx) => async (userRequest: string, userId: string) => {
    ctx.setAttributes({
      'workflow.type': 'multi_agent_escalation',
      'workflow.user_id': userId,
      'workflow.correlation_id': ctx.correlationId,
    });

    // Step 1: Triage
    const triage = await triageAgent(userRequest);
    ctx.track('triage_complete', { requires_specialist: triage.requiresSpecialist });

    // Step 2: Specialist (if needed)
    let response;
    if (triage.requiresSpecialist) {
      response = await specialistAgent(triage.plan);
      ctx.track('specialist_complete', { requires_qa: response.requiresQA });
    }

    // Step 3: QA (if needed)
    let qa;
    if (response?.requiresQA) {
      qa = await qaAgent(response.response);
      ctx.track('qa_complete', { approved: qa.approved });
    }

    // Track workflow completion
    track('workflow.completed', {
      workflow_type: 'multi_agent_escalation',
      user_id: userId,
      agents_involved: qa ? 3 : response ? 2 : 1,
      final_approval: qa?.approved ?? true,
    });

    return {
      plan: triage.plan,
      response: response?.response,
      qa: qa,
    };
  },
);

RAG Pipeline

import { trace, track } from 'autotel';
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

// Step 1: Generate embeddings
const generateEmbeddings = trace('rag.embeddings', (ctx) => async (query: string) => {
  ctx.setAttribute('query.length', query.length);

  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: query,
  });

  ctx.setAttribute('embedding.dimensions', embedding.length);

  return embedding;
});

// Step 2: Vector search
const vectorSearch = trace(
  'rag.search',
  (ctx) => async (embedding: number[], topK: number = 5) => {
    ctx.setAttributes({
      'search.top_k': topK,
      'search.embedding_dimensions': embedding.length,
    });

    const results = await vectorDb.search(embedding, topK);

    ctx.setAttribute('search.results_count', results.length);

    return results;
  },
);

// Step 3: Generate response with context
const generateWithContext = trace(
  'rag.generate',
  (ctx) => async (query: string, context: string[]) => {
    ctx.setAttributes({
      'generation.context_chunks': context.length,
      'generation.model': 'gpt-4o',
    });

    const prompt = `
Context:
${context.join('\n\n')}

Question: ${query}

Answer based on the context above:
    `.trim();

    const result = await generateText({
      model: openai('gpt-4o'),
      prompt,
    });

    ctx.setAttributes({
      'generation.tokens_used': result.usage.totalTokens,
      'generation.response_length': result.text.length,
    });

    return result.text;
  },
);

// Complete RAG Pipeline
export const ragPipeline = trace(
  'rag.pipeline',
  (ctx) => async (query: string, userId: string) => {
    ctx.setAttributes({
      'pipeline.type': 'rag',
      'pipeline.user_id': userId,
      'pipeline.query': query,
    });

    const embedding = await generateEmbeddings(query);
    ctx.track('embeddings_generated');

    const searchResults = await vectorSearch(embedding);
    ctx.track('search_completed', { results_count: searchResults.length });

    const context = searchResults.map((r) => r.content);
    const response = await generateWithContext(query, context);
    ctx.track('generation_completed', { response_length: response.length });

    track('rag.pipeline_completed', {
      user_id: userId,
      query_length: query.length,
      results_retrieved: searchResults.length,
      response_length: response.length,
    });

    return {
      query,
      response,
      sources: searchResults.map((r) => r.metadata),
    };
  },
);

Span Hierarchy:

rag.pipeline (parent)
├── rag.embeddings (child)
│   └── openai.embeddings (auto-instrumented by OpenLLMetry)
├── rag.search (child)
│   └── pinecone.query (auto-instrumented by OpenLLMetry)
└── rag.generate (child)
    └── openai.chat.completions (auto-instrumented by OpenLLMetry)
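`vectorDb.search(embedding, topK)` above stands in for whatever vector store you use. For local testing, a brute-force cosine-similarity search is enough to exercise the rag.search span. `InMemoryVectorDb` below is a hypothetical stand-in, not a library API; production stores typically use approximate nearest-neighbor indexes instead of a linear scan.

```typescript
// Hypothetical in-memory stand-in for `vectorDb`; not a library API.
interface StoredDoc {
  content: string;
  metadata: { source: string };
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('embedding dimensions differ');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat zero vectors as unrelated rather than dividing by zero.
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorDb {
  constructor(private readonly docs: StoredDoc[]) {}

  // Brute-force scan: score every document, sort descending, keep the top K.
  async search(embedding: number[], topK = 5): Promise<Array<StoredDoc & { score: number }>> {
    return this.docs
      .map((doc) => ({ ...doc, score: cosineSimilarity(embedding, doc.embedding) }))
      .sort((x, y) => y.score - x.score)
      .slice(0, topK);
  }
}

// Toy 3-dimensional embeddings; real embedding models return far more dimensions.
const vectorDb = new InMemoryVectorDb([
  { content: 'Refunds are issued to the original card.', metadata: { source: 'faq#refunds' }, embedding: [1, 0, 0] },
  { content: 'Shipping is free over $50.', metadata: { source: 'faq#shipping' }, embedding: [0, 1, 0] },
  { content: 'Refund requests need an order number.', metadata: { source: 'policy#refunds' }, embedding: [0.9, 0.1, 0] },
]);

const results = vectorDb.search([1, 0, 0], 2);
results.then((hits) => console.log(hits.map((hit) => hit.metadata.source)));
```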

Pattern: Streaming Responses

Streaming latency is two numbers, not one: time to first chunk (the wait before anything appears) and throughput (how fast tokens then arrive). A single duration hides both. createStreamTimer from autotel-genai/streaming captures the full picture and records the headline values as canonical gen_ai.response.* attributes.

import { traceGenAI, recordGenAiUsage } from 'autotel-genai/trace';
import { createStreamTimer, recordStreamTiming } from 'autotel-genai/streaming';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';

export const generateStreamingResponse = traceGenAI({
  provider: 'openai',
  model: 'gpt-4o',
  operation: 'chat',
})((ctx) => async (prompt: string) => {
  const timer = createStreamTimer();
  const stream = await streamText({ model: openai('gpt-4o'), prompt });

  const chunks: string[] = [];
  for await (const chunk of stream.textStream) {
    timer.chunk(); // first call also marks time-to-first-chunk
    chunks.push(chunk);
  }

  const usage = await stream.usage;
  // gen_ai.response.time_to_first_chunk / .time_to_finish /
  // .output_tokens_per_second / .time_per_output_chunk (seconds)
  recordStreamTiming(ctx, timer.finish({ outputTokens: usage.outputTokens }));
  recordGenAiUsage(ctx, 'gpt-4o', {
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
  });

  return chunks.join('');
});

computeStreamTiming is the pure function underneath — it also returns the inter-chunk gap distribution { min, p10, median, avg, p90, max } (seconds) for diagnosing token-arrival jitter.
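To make those numbers concrete, here is a sketch of the arithmetic from raw timestamps. `computeTiming` is a hypothetical re-implementation, not computeStreamTiming itself. It assumes millisecond timestamps in and seconds out, measures throughput over the generation window after the first chunk, and uses linear-interpolation percentiles; the real function may choose differently on any of these.

```typescript
interface StreamTiming {
  timeToFirstChunk: number; // seconds from request start to first chunk
  timeToFinish: number; // seconds from request start to last chunk
  outputTokensPerSecond?: number; // assumption: tokens / (last chunk - first chunk)
  interChunk?: { min: number; p10: number; median: number; avg: number; p90: number; max: number };
}

// Linear-interpolation percentile over an ascending array (one common definition).
function percentile(sorted: number[], p: number): number {
  const rank = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(rank);
  const hi = Math.ceil(rank);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo);
}

function computeTiming(startMs: number, chunkMs: number[], outputTokens?: number): StreamTiming {
  if (chunkMs.length === 0) throw new Error('no chunks received');
  const first = chunkMs[0];
  const last = chunkMs[chunkMs.length - 1];
  const timing: StreamTiming = {
    timeToFirstChunk: (first - startMs) / 1000,
    timeToFinish: (last - startMs) / 1000,
  };
  const generationSeconds = (last - first) / 1000;
  if (outputTokens !== undefined && generationSeconds > 0) {
    timing.outputTokensPerSecond = outputTokens / generationSeconds;
  }
  // Gaps between consecutive chunks, in seconds.
  const gaps = chunkMs.slice(1).map((t, i) => (t - chunkMs[i]) / 1000);
  if (gaps.length > 0) {
    const sorted = [...gaps].sort((a, b) => a - b);
    timing.interChunk = {
      min: sorted[0],
      p10: percentile(sorted, 10),
      median: percentile(sorted, 50),
      avg: gaps.reduce((sum, g) => sum + g, 0) / gaps.length,
      p90: percentile(sorted, 90),
      max: sorted[sorted.length - 1],
    };
  }
  return timing;
}

// Request at t=0 ms, first chunk after 400 ms, then a stall before the last chunk.
const timing = computeTiming(0, [400, 450, 500, 600, 1400], 100);
console.log(timing);
```

A single 800 ms stall pulls the average gap to 0.25 s while the median stays at 0.075 s, which is the kind of jitter a lone duration would hide.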

Pattern: Budgets & Guardrails

A guard acts while a run is in progress, not after the bill arrives. Feed it each step (an LLM call, a tool call, a delegation) and it accumulates cost, token, and loop state, then halts the run when a rule crosses its threshold: it aborts its AbortSignal and, by default, throws a GEN_AI_GUARD_STOP structured error. All checks are deterministic, with no LLM in the loop.

import { createGenAiBudget } from 'autotel-genai/guard';
import { estimateLLMCost } from 'autotel-genai/cost';

const budget = createGenAiBudget({ maxCostUsd: 5, warnAtUsd: 4 });

for (const task of tasks) {
  if (budget.stopped) break;
  const res = await model.chat(task);
  budget.record(
    {
      kind: 'llm',
      usage: {
        costUsd: estimateLLMCost('gpt-4o', {
          inputTokens: res.usage.input,
          outputTokens: res.usage.output,
        }),
      },
    },
    ctx, // optional TraceContext from an enclosing trace(); records gen_ai.guard.* + gen_ai.session.* telemetry
  ); // throws once total cost > $5
}
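The budget control flow described above (accumulate, warn once at a soft threshold, stop once at the hard ceiling, abort a signal) fits in a few lines. `SimpleBudget` is a hypothetical sketch, not createGenAiBudget()'s implementation. It uses the standard AbortController for cancellation and a plain Error subclass with a `code` property in place of the library's structured error, whose exact shape is assumed here.

```typescript
// Hypothetical sketch of budget semantics; not createGenAiBudget()'s implementation.
interface BudgetOptions {
  maxCostUsd: number; // hard ceiling: exceeding it stops the run
  warnAtUsd?: number; // soft threshold: reported once
  onWarn?: (spentUsd: number) => void;
}

class BudgetExceededError extends Error {
  readonly code = 'GEN_AI_GUARD_STOP'; // code name from the docs; error shape is assumed
  constructor(readonly spentUsd: number, readonly limitUsd: number) {
    super(`Budget exceeded: $${spentUsd.toFixed(2)} > $${limitUsd}`);
  }
}

class SimpleBudget {
  private readonly controller = new AbortController();
  private spentUsd = 0;
  private warned = false;

  constructor(private readonly options: BudgetOptions) {}

  get signal(): AbortSignal {
    return this.controller.signal; // pass to model/tool calls for cooperative cancellation
  }

  get stopped(): boolean {
    return this.controller.signal.aborted;
  }

  get spent(): number {
    return this.spentUsd;
  }

  record(costUsd: number): void {
    if (this.stopped) return; // the stop fires once; later records are ignored
    this.spentUsd += costUsd;
    const { maxCostUsd, warnAtUsd, onWarn } = this.options;
    if (!this.warned && warnAtUsd !== undefined && this.spentUsd >= warnAtUsd) {
      this.warned = true;
      onWarn?.(this.spentUsd);
    }
    if (this.spentUsd > maxCostUsd) {
      const error = new BudgetExceededError(this.spentUsd, maxCostUsd);
      this.controller.abort(error);
      throw error;
    }
  }
}

const warnings: number[] = [];
const outcomes: string[] = [];
const budget = new SimpleBudget({ maxCostUsd: 5, warnAtUsd: 4, onWarn: (usd) => warnings.push(usd) });

for (const stepCost of [2, 2, 2, 2]) {
  if (budget.stopped) break; // the fourth step never runs
  try {
    budget.record(stepCost);
    outcomes.push('ok');
  } catch (err) {
    outcomes.push((err as BudgetExceededError).code);
  }
}
console.log(outcomes, warnings, budget.spent);
```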

Rules can also come from a shorthand string covering cost/token ceilings, spin-loop detection (N identical calls within a window of M), error loops, tool-call/step caps, wall-clock timeouts, and context-window budgets:

import { createGenAiGuard, parseGuardRules } from 'autotel-genai/guard';

const guard = createGenAiGuard({
  rules: parseGuardRules('budget:$2,loop:3/10,max-tools:50,timeout:5m'),
  onStop: 'abort', // 'throw' (default) | 'abort' (signal only) | 'silent'
});

// Record each tool call; `args` is that call's input, and the signature identifies repeats
guard.record({ kind: 'tool', name: 'search', signature: JSON.stringify(args) });
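The shorthand is compact enough to parse by hand. The sketch below covers only the four rule kinds in the example string. Its output shape (`{ kind, ... }` objects), the `$` prefix handling, and the s/m/h timeout units are assumptions, and `parseRules` is a hypothetical stand-in, not parseGuardRules() itself.

```typescript
// Hypothetical rule shapes for illustration; parseGuardRules() may differ.
type GuardRule =
  | { kind: 'budget'; maxCostUsd: number }
  | { kind: 'loop'; repeats: number; window: number }
  | { kind: 'max-tools'; max: number }
  | { kind: 'timeout'; ms: number };

const UNIT_MS: Record<string, number> = { s: 1_000, m: 60_000, h: 3_600_000 };

function parseRules(spec: string): GuardRule[] {
  return spec
    .split(',')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part): GuardRule => {
      const sep = part.indexOf(':');
      if (sep < 0) throw new Error(`Missing ':' in rule "${part}"`);
      const key = part.slice(0, sep);
      const value = part.slice(sep + 1);
      switch (key) {
        case 'budget': {
          const usd = Number(value.replace(/^\$/, ''));
          if (!Number.isFinite(usd)) throw new Error(`Bad budget "${value}"`);
          return { kind: 'budget', maxCostUsd: usd };
        }
        case 'loop': {
          // "3/10": 3 identical calls within the last 10 recorded calls
          const [repeats, windowSize] = value.split('/').map(Number);
          if (!Number.isInteger(repeats) || !Number.isInteger(windowSize)) throw new Error(`Bad loop "${value}"`);
          return { kind: 'loop', repeats, window: windowSize };
        }
        case 'max-tools': {
          const max = Number(value);
          if (!Number.isInteger(max)) throw new Error(`Bad max-tools "${value}"`);
          return { kind: 'max-tools', max };
        }
        case 'timeout': {
          const match = /^(\d+)([smh])$/.exec(value);
          if (!match) throw new Error(`Bad timeout "${value}"`);
          return { kind: 'timeout', ms: Number(match[1]) * UNIT_MS[match[2]] };
        }
        default:
          throw new Error(`Unknown rule "${key}"`);
      }
    });
}

const rules = parseRules('budget:$2,loop:3/10,max-tools:50,timeout:5m');
console.log(rules);
```

Failing loudly on an unknown or malformed rule is deliberate in this sketch: a guard that silently drops a typo'd rule gives a false sense of protection.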

Each rule fires once. Thread guard.signal into your model/tool calls for cooperative cancellation, or catch GEN_AI_GUARD_STOP to wind the run down.
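Spin-loop detection ("N identical calls in a window of M") reduces to counting repeated signatures in a sliding window. `LoopDetector` is a hypothetical sketch of that rule, including the fire-once behavior described above; how the real guard builds and compares signatures may differ.

```typescript
// Hypothetical sketch of a spin-loop rule; not autotel-genai's implementation.
class LoopDetector {
  private readonly recent: string[] = [];
  private fired = false;

  constructor(private readonly repeats: number, private readonly windowSize: number) {}

  // Returns true exactly once: the first time `repeats` identical signatures
  // appear among the last `windowSize` recorded calls.
  record(signature: string): boolean {
    this.recent.push(signature);
    if (this.recent.length > this.windowSize) this.recent.shift();
    if (this.fired) return false;
    const count = this.recent.filter((s) => s === signature).length;
    if (count >= this.repeats) {
      this.fired = true;
      return true;
    }
    return false;
  }
}

const detector = new LoopDetector(3, 10); // equivalent of "loop:3/10"
const calls = [
  'search:{"q":"a"}',
  'search:{"q":"b"}',
  'search:{"q":"a"}',
  'fetch:{"id":1}',
  'search:{"q":"a"}', // third identical call within the window: fires here
  'search:{"q":"a"}', // already fired: no second stop
];
const firedAt = calls.map((signature) => detector.record(signature));
console.log(firedAt);
```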

Pattern: Evaluation Loops

Implement quality checks and iterative refinement with full observability.

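import { trace, track } from 'autotel';
import { generateText, generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';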

const generateContent = trace(
  'ai.generate_content',
  (ctx) => async (prompt: string, model: string) => {
    ctx.setAttribute('generation.model', model);

    const result = await generateText({
      model: openai(model),
      prompt,
    });

    return result.text;
  },
);

const evaluateQuality = trace('ai.evaluate_quality', (ctx) => async (content: string) => {
  const result = await generateObject({
    model: openai('gpt-4o'),
    schema: z.object({
      score: z.number().min(0).max(100),
      feedback: z.string(),
      passesThreshold: z.boolean(),
    }),
    prompt: `Evaluate this content quality (0-100): ${content}`,
  });

  ctx.setAttributes({
    'evaluation.score': result.object.score,
    'evaluation.passes': result.object.passesThreshold,
  });

  return result.object;
});

export const generateWithQualityCheck = trace(
  'ai.generate_with_qa',
  (ctx) =>
    async (
      prompt: string,
      options: { maxAttempts?: number; qualityThreshold?: number } = {},
    ) => {
      const { maxAttempts = 3, qualityThreshold = 75 } = options;

      ctx.setAttributes({
        'qa.max_attempts': maxAttempts,
        'qa.threshold': qualityThreshold,
      });

      let attempt = 0;
      let content = '';
      let evaluation = { score: 0, feedback: '', passesThreshold: false };
      let passed = false;

      do {
        attempt++;
        ctx.track('generation_attempt', { attempt });

        content = await generateContent(prompt, 'gpt-4o');
        evaluation = await evaluateQuality(content);
        // Gate on the caller's threshold: the evaluator never sees qualityThreshold,
        // so its own passesThreshold flag cannot enforce it.
        passed = evaluation.score >= qualityThreshold;

        if (passed) {
          ctx.track('quality_passed', {
            attempt,
            score: evaluation.score,
          });
          break;
        } else if (attempt < maxAttempts) {
          ctx.track('quality_failed_retrying', {
            attempt,
            score: evaluation.score,
            feedback: evaluation.feedback,
          });
          prompt = `${prompt}\n\nPrevious attempt feedback: ${evaluation.feedback}`;
        }
      } while (attempt < maxAttempts);

      ctx.setAttributes({
        'qa.attempts_used': attempt,
        'qa.final_score': evaluation.score,
        'qa.success': passed,
      });

      track('ai.qa_loop_completed', {
        attempts: attempt,
        final_score: evaluation.score,
        success: passed,
        threshold: qualityThreshold,
      });

      return {
        content,
        evaluation,
        attempts: attempt,
      };
    },
);

Semantic Conventions

Following the OpenTelemetry GenAI semantic conventions (gen_ai.*, semconv v1.42.0) ensures any OTLP backend renders your AI telemetry without custom mapping. Let autotel-genai emit the canonical names rather than inventing llm.* / ai.* attributes by hand.

LLM attributes — use traceGenAI + recordGenAiUsage (canonical gen_ai.*):

import { traceGenAI, recordGenAiResponse, recordGenAiUsage } from 'autotel-genai/trace';

const generate = traceGenAI({
  provider: 'openai', // gen_ai.provider.name (NOT gen_ai.system)
  model: 'gpt-4o', // gen_ai.request.model
  operation: 'chat', // gen_ai.operation.name
  temperature: 0.7, // gen_ai.request.temperature
  maxTokens: 4096, // gen_ai.request.max_tokens
})((ctx) => async (prompt: string) => {
  const res = await llm.generate(prompt);
  recordGenAiResponse(ctx, { model: res.model, finishReasons: res.finishReasons });
  // gen_ai.usage.input_tokens / output_tokens (NOT prompt_tokens / total_tokens)
  // + estimated gen_ai.usage.cost.usd
  recordGenAiUsage(ctx, 'gpt-4o', {
    inputTokens: res.usage.inputTokens,
    outputTokens: res.usage.outputTokens,
  });
  return res.text;
});

If you set attributes directly, build canonical maps instead of literals:

import { genAiRequestAttributes, genAiUsageAttributes } from 'autotel-genai';

ctx.setAttributes({
  ...genAiRequestAttributes({ operation: 'chat', provider: 'openai', model: 'gpt-4o' }),
  ...genAiUsageAttributes({ inputTokens: 100, outputTokens: 250 }),
});
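A common source of non-canonical keys is copying a provider's usage object verbatim. The sketch maps OpenAI Chat Completions-style `prompt_tokens` / `completion_tokens` and Anthropic Messages-style `input_tokens` / `output_tokens` onto the canonical gen_ai.usage.* keys. `toCanonicalUsage` is a hypothetical helper, not an autotel-genai export; it drops `total_tokens` because the canonical set has no total and it is derivable from input plus output.

```typescript
// Usage shapes mirroring the two providers' response fields.
interface OpenAiStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens?: number;
}
interface AnthropicStyleUsage {
  input_tokens: number;
  output_tokens: number;
}
type ProviderUsage = OpenAiStyleUsage | AnthropicStyleUsage;

// Hypothetical helper: normalize either shape to canonical gen_ai.usage.* keys.
function toCanonicalUsage(usage: ProviderUsage): Record<string, number> {
  if ('prompt_tokens' in usage) {
    // total_tokens is intentionally not carried over.
    return {
      'gen_ai.usage.input_tokens': usage.prompt_tokens,
      'gen_ai.usage.output_tokens': usage.completion_tokens,
    };
  }
  return {
    'gen_ai.usage.input_tokens': usage.input_tokens,
    'gen_ai.usage.output_tokens': usage.output_tokens,
  };
}

console.log(toCanonicalUsage({ prompt_tokens: 45, completion_tokens: 128, total_tokens: 173 }));
console.log(toCanonicalUsage({ input_tokens: 10, output_tokens: 5 }));
```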

Agent attributes — for agent identity, delegation, policy, and audit, use the governance layer (autotel-genai/agent), which emits gen_ai.agent.* plus agent.* / delegation.* / policy.*. For a plain agent span, set the canonical agent name/operation:

ctx.setAttributes({
  'gen_ai.operation.name': 'invoke_agent',
  'gen_ai.agent.name': 'specialist',
  'gen_ai.provider.name': 'openai',
});

Workflow attributes:

ctx.setAttributes({
  'workflow.type': 'multi_agent_escalation',
  'workflow.correlation_id': ctx.correlationId,
  'workflow.user_id': userId,
  'workflow.session_id': sessionId,
});

RAG attributes:

ctx.setAttributes({
  'rag.embedding_model': 'text-embedding-3-small',
  'rag.chunks_retrieved': 5,
  'rag.search_top_k': 5,
  'rag.rerank_enabled': true,
});

Evaluation attributes:

ctx.setAttributes({
  'evaluation.score': 85,
  'evaluation.threshold': 75,
  'evaluation.passes': true,
  'evaluation.attempts': 2,
});

Business events:

import { track } from 'autotel';

track('workflow.completed', {
  type: 'multi_agent',
  agents_used: 3,
  // traceId, spanId, correlationId auto-added!
});

Best Practice: Use Both Together

import { init, trace, track } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';

init({
  service: 'customer-support-ai',
  endpoint: process.env.OTLP_ENDPOINT,
  openllmetry: { enabled: true },
});

const handleCustomerQuery = trace(
  'workflow.customer_query',
  (ctx) => async (query: string, userId: string) => {
    ctx.setAttributes({
      'workflow.type': 'customer_support',
      'user.id': userId,
    });

    // Step 1: Triage (OpenLLMetry auto-instruments the LLM call)
    const triage = await trace('step.triage', async () => {
      return await generateText({
        model: openai('gpt-4o-mini'),
        prompt: `Triage: ${query}`,
      });
    });

    const needsEscalation = triage.text.includes('ESCALATE');

    if (needsEscalation) {
      const specialist = await trace('step.specialist', async () => {
        return await generateText({
          model: openai('gpt-4o'),
          prompt: `Expert response needed: ${query}`,
        });
      });

      track('escalation_occurred', {
        category: triage.text,
        userId,
        correlationId: ctx.correlationId,
      });
      return { response: specialist.text, escalated: true };
    }

    return { response: triage.text, escalated: false };
  },
);

What you get with both:

Trace Tree (illustrative values; OpenLLMetry attribute names vary by version):
workflow.customer_query (trace)
├─ user.id: "user123"
├─ workflow.type: "customer_support"
│
├─ step.triage (trace)
│  └─ llm.chat (OpenLLMetry auto-span)
│     ├─ llm.request.model: "gpt-4o-mini"
│     ├─ llm.usage.prompt_tokens: 23
│     ├─ llm.usage.completion_tokens: 45
│     └─ llm.prompts.0.content: "Triage: ..."
│
└─ step.specialist (trace)
   └─ llm.chat (OpenLLMetry auto-span)
      ├─ llm.request.model: "gpt-4o"
      ├─ llm.usage.prompt_tokens: 78
      ├─ llm.usage.completion_tokens: 234
      └─ llm.prompts.0.content: "Expert response needed: ..."

Events:
escalation_occurred
├─ category: "billing_issue"
├─ userId: "user123"
└─ correlationId: "abc-123-def"

Key benefits of combining both:

Zero-effort LLM telemetry: OpenLLMetry captures calls to supported SDKs automatically
Business context: trace() adds workflow meaning and business logic
Shared correlation: spans and track() events carry the same trace and correlation IDs
Complete picture: See both “what the LLM did” (OpenLLMetry) and “why it did it” (your trace spans)
Events integration: Business events automatically correlated with technical traces

Real-World Examples

example-ai-agent — Multi-agent escalation systems (simulated and real LLM with OpenLLMetry), RAG pipelines, and @openai/agents integration.

See apps/example-ai-agent/src/multi-agent-workflow-with-openllmetry.ts for a complete example showing OpenLLMetry enabled in init(), multi-agent workflow using trace() for business context, and real OpenAI SDK calls auto-instrumented by OpenLLMetry.

Compare with apps/example-ai-agent/src/multi-agent-workflow.ts which uses simulated LLM calls (no OpenLLMetry needed).

See apps/example-ai-agent/src/rag-pipeline.ts for a complete RAG pipeline example showing embeddings generation tracking, vector search observability, context assembly monitoring, and end-to-end pipeline metrics.