AI / LLM Workflows

Autotel provides all the building blocks for comprehensive AI/LLM observability:

  • Automatic LLM instrumentation via OpenLLMetry integration
  • Workflow orchestration via nested trace() calls
  • Context propagation via AsyncLocalStorage (correlation IDs, user context, etc.)
  • Business event tracking via ctx.setAttribute() and track()
  • Multi-destination events via adapters (PostHog, Mixpanel, etc.)

| Use Case | Recommendation | Why |
| --- | --- | --- |
| Using LLM SDKs (OpenAI, Anthropic, etc.) | Enable OpenLLMetry | Automatic capture of prompts, completions, tokens |
| Custom LLM integrations | Manual trace() only | OpenLLMetry won't detect custom integrations |
| Workflow orchestration | Always use trace() | Critical for tracking workflow steps |
| Business metrics | Always use trace() + track() | Domain events require explicit instrumentation |
| Production applications | Use both together | OpenLLMetry handles LLM internals, trace() handles everything else |

When enabled via init({ openllmetry: { enabled: true } }), OpenLLMetry automatically captures:

// Example: Using Vercel AI SDK
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
// OpenLLMetry automatically instruments this call - zero code changes needed!
const result = await generateText({
model: openai('gpt-4o'),
prompt: 'Explain quantum computing',
});
// Automatic span attributes captured:
// - llm.request.model: "gpt-4o"
// - llm.provider: "openai"
// - llm.request.temperature: 0.7
// - llm.usage.prompt_tokens: 45
// - llm.usage.completion_tokens: 128
// - llm.usage.total_tokens: 173
// - llm.prompts.0.content: "Explain quantum computing"
// - llm.completions.0.content: "[full response text]"

What you get automatically:

  • LLM API request/response details (prompts, completions, model parameters)
  • Token usage tracking (prompt, completion, total)
  • Timing and latency for each LLM call
  • Error capture for failed LLM requests
  • Support for streaming responses
  • Works with 20+ LLM providers/SDKs (OpenAI, Anthropic, Langchain, LlamaIndex, Vercel AI SDK, etc.)
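
Once these attributes are on your spans, per-request token totals are a simple fold over the spans of a trace. The sketch below is a hypothetical helper, not part of autotel or OpenLLMetry: the `ExportedSpan` shape is an assumption about how you might export spans as plain objects, and the attribute keys are the `llm.usage.*` names shown in the example above.

```typescript
// Hypothetical shape for an exported span; real exporters carry more fields.
interface ExportedSpan {
  name: string;
  attributes: Record<string, string | number | boolean>;
}

interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
}

// Read a numeric attribute, treating missing or non-numeric values as 0.
function numericAttribute(span: ExportedSpan, key: string): number {
  const value = span.attributes[key];
  return typeof value === 'number' && Number.isFinite(value) ? value : 0;
}

// Sum the llm.usage.* attributes across all spans of one trace.
// Spans without token attributes (plain workflow spans) contribute nothing.
function summarizeTokenUsage(spans: ExportedSpan[]): TokenUsage {
  const usage: TokenUsage = { promptTokens: 0, completionTokens: 0, totalTokens: 0 };
  for (const span of spans) {
    usage.promptTokens += numericAttribute(span, 'llm.usage.prompt_tokens');
    usage.completionTokens += numericAttribute(span, 'llm.usage.completion_tokens');
  }
  usage.totalTokens = usage.promptTokens + usage.completionTokens;
  return usage;
}
```

Recomputing the total from its parts, rather than summing `llm.usage.total_tokens`, keeps the result consistent even if a provider omits the total.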

What you DON'T get:

  • Business workflow context (which agent? which step? why called?)
  • Business metrics (escalations, user satisfaction, custom events)
  • Correlation across workflow steps
  • Custom attributes for your domain logic

Using autotel's trace() function provides full control over observability:

import { trace } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
const triageAgent = trace('agent.triage', (ctx) => async (input: string) => {
// Business context
ctx.setAttributes({
'agent.role': 'triage',
'agent.purpose': 'route_to_specialist',
'workflow.step': 1,
});
// Call LLM (OpenLLMetry will auto-instrument this call)
const result = await generateText({
model: openai('gpt-4o-mini'),
prompt: `Triage this request: ${input}`,
});
// Business metrics
const requiresEscalation = result.text.includes('ESCALATE');
ctx.setAttribute('triage.escalation_required', requiresEscalation);
return { decision: result.text, escalate: requiresEscalation };
});

What you get with trace():

  • Named workflow steps (clear span names like "agent.triage")
  • Business attributes (agent roles, workflow state, custom logic)
  • Correlation IDs automatically propagated
  • Parent-child span relationships for complex workflows
  • Integration with business events via track()
  • Works with ANY code (LLM or non-LLM)

To enable OpenLLMetry, pass the openllmetry option to init():

import { init } from 'autotel';
init({
service: 'my-ai-app',
endpoint: process.env.OTLP_ENDPOINT,
openllmetry: {
enabled: true, // Enable automatic LLM instrumentation
options: {
disableBatch: process.env.NODE_ENV !== 'production',
},
},
});

Option 1: OpenLLMetry Only (Not Recommended)

If you only enable OpenLLMetry without using trace(), you'll get LLM call details but miss business context:

import { init } from 'autotel';
init({
service: 'my-ai-app',
openllmetry: { enabled: true },
});
// You'll see LLM spans but no workflow context
const result = await generateText({ model: openai('gpt-4o'), prompt: 'test' });
// No way to know: which agent? which step? which user? why called?

Option 2: Manual trace() Only (Good for Custom Models)

If you're using custom LLM integrations or direct HTTP calls:

import { trace } from 'autotel';
const callCustomLLM = trace('llm.custom_model', (ctx) => async (prompt: string) => {
ctx.setAttributes({
'llm.model': 'my-custom-model-v2',
'llm.provider': 'self-hosted',
'llm.prompt': prompt,
});
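const response = await fetch('https://my-llm-api.com/generate', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ prompt }),
});
// Fail fast so the span records the HTTP error rather than a JSON parse failure
if (!response.ok) {
throw new Error(`LLM request failed with status ${response.status}`);
}
const data = await response.json();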
ctx.setAttributes({
'llm.completion': data.text,
'llm.tokens': data.usage.totalTokens,
});
return data.text;
});

For production applications using LLM SDKs:

import { init, trace } from 'autotel';
init({
service: 'production-ai-app',
openllmetry: { enabled: true }, // Auto-instrument LLM SDKs
});
// Your workflow code uses trace() for business logic
const workflow = trace('workflow.main', (ctx) => async (input: string) => {
// OpenLLMetry will auto-instrument any LLM calls inside
// trace() provides workflow context and business metrics
// Both appear as child spans in the same trace tree
});

Are you using LLM SDKs (OpenAI, Anthropic, Vercel AI SDK, Langchain)?
├─ Yes
│  └─ Enable OpenLLMetry
│     └─ Do you need business context/metrics?
│        ├─ Yes → Also use trace() (RECOMMENDED)
│        └─ No → OpenLLMetry only (you'll regret this later)
└─ No (custom models, direct HTTP)
   └─ Use trace() only
      └─ Add AI semantic conventions manually

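
The same decision tree can be written as a small function, which is handy if you want to encode the rule in a project template or a lint check. This is only an illustration of the logic above; `ProjectProfile` and `recommendInstrumentation` are hypothetical names, not autotel APIs.

```typescript
type Recommendation = 'openllmetry+trace' | 'openllmetry-only' | 'trace-only';

interface ProjectProfile {
  usesLlmSdk: boolean; // OpenAI, Anthropic, Vercel AI SDK, Langchain
  needsBusinessContext: boolean; // workflow steps, business metrics
}

// Mirror the decision tree: SDK users enable OpenLLMetry and usually add trace();
// everyone else instruments manually with trace().
function recommendInstrumentation(profile: ProjectProfile): Recommendation {
  if (!profile.usesLlmSdk) {
    // Custom models or direct HTTP: add AI attributes to your spans yourself.
    return 'trace-only';
  }
  return profile.needsBusinessContext ? 'openllmetry+trace' : 'openllmetry-only';
}
```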
import { trace } from 'autotel';
const generateResponse = trace(
'ai.generate',
(ctx) => async (prompt: string) => {
ctx.setAttributes({
'ai.model': 'gpt-4o',
'ai.provider': 'openai',
});
const response = await llm.generate(prompt);
ctx.setAttribute('ai.tokens', response.usage.totalTokens);
return response;
},
);

Correlation IDs automatically propagate through your entire workflow, making it easy to trace requests across multiple agents, services, and LLM calls.

import { trace, track } from 'autotel';
export const processUserRequest = trace(
'ai.user_request',
(ctx) => async (userId: string, message: string) => {
// Correlation ID is automatically available
console.log('Trace ID:', ctx.traceId);
console.log('Correlation ID:', ctx.correlationId); // First 16 chars of traceId
// All nested operations inherit this correlation context
const analysis = await analyzeIntent(message);
const response = await generateResponse(analysis);
// Events automatically include correlation IDs
track('ai.request_completed', {
userId,
intent: analysis.intent,
// correlationId, traceId, spanId are auto-added!
});
return response;
},
);

What you get automatically:

  • ctx.traceId - Full OpenTelemetry trace ID
  • ctx.correlationId - Short correlation ID (first 16 chars)
  • ctx.spanId - Current span ID
  • Automatic propagation to all nested trace() calls
  • Enrichment of all track() events
  • Inclusion in structured logs (via autotel/logger)
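
To make the propagation less magical, here is a minimal sketch of the mechanism using Node's AsyncLocalStorage, which the overview names as the basis for context propagation. It is an illustration, not autotel's implementation: `TraceContext`, `runWithTrace`, `currentContext`, and `nestedStep` are hypothetical names, and the correlation ID is derived as the first 16 characters of the trace ID, as described above.

```typescript
import { AsyncLocalStorage } from 'node:async_hooks';

// Hypothetical context shape, mirroring ctx.traceId and ctx.correlationId.
interface TraceContext {
  traceId: string;
  correlationId: string;
}

const storage = new AsyncLocalStorage<TraceContext>();

// Derive the short correlation ID: the first 16 characters of the trace ID.
function toCorrelationId(traceId: string): string {
  return traceId.slice(0, 16);
}

// Run fn with a trace context that every nested sync or async call can read.
function runWithTrace<T>(traceId: string, fn: () => T): T {
  return storage.run({ traceId, correlationId: toCorrelationId(traceId) }, fn);
}

// Read the active context, or undefined when called outside runWithTrace.
function currentContext(): TraceContext | undefined {
  return storage.getStore();
}

// A nested step never receives the IDs as parameters, yet can still read them.
async function nestedStep(): Promise<string | undefined> {
  await Promise.resolve(); // cross an async boundary
  return currentContext()?.correlationId;
}
```

With this sketch, `runWithTrace('4bf92f3577b34da6a3ce929d0e0e4736', nestedStep)` resolves to `'4bf92f3577b34da6'`, even though `nestedStep` takes no arguments.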

Create parent-child span hierarchies naturally with nested trace() calls. Each step becomes a child span with automatic error handling and lifecycle management.

import { trace } from 'autotel';
export const processDocument = trace(
'document.processing',
(ctx) => async (docId: string) => {
ctx.setAttribute('document.id', docId);
ctx.setAttribute('workflow.type', 'document_processing');
// Step 1: Load document (creates child span)
const document = await trace('document.load', async () => {
return await loadDocument(docId);
});
// Step 2: Analyze with LLM (creates child span, OpenLLMetry auto-instruments LLM call)
const analysis = await trace('document.analyze', async () => {
const result = await llm.analyze(document.content);
return result;
});
// Step 3: Store results (creates child span)
const stored = await trace('document.store', async () => {
return await storeAnalysis(docId, analysis);
});
return stored;
},
);

Span Hierarchy Created:

document.processing (parent)
├── document.load (child)
├── document.analyze (child)
│   └── openai.chat.completions (child, auto-instrumented by OpenLLMetry)
└── document.store (child)
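
If you export spans and want to inspect a hierarchy like this one in a test or a debug log, the tree can be rebuilt by following parent links. The sketch below assumes a simplified, hypothetical `SpanRecord` shape (real exported spans carry many more fields), and `renderSpanTree` is an illustrative helper, not an autotel API.

```typescript
interface SpanRecord {
  id: string;
  parentId?: string; // undefined for the root span
  name: string;
}

// Render a flat span list as an indented tree, keeping children in input order.
function renderSpanTree(spans: SpanRecord[]): string {
  // Group spans by their parent so each node's children are easy to look up.
  const childrenOf = new Map<string | undefined, SpanRecord[]>();
  for (const span of spans) {
    const siblings = childrenOf.get(span.parentId) ?? [];
    siblings.push(span);
    childrenOf.set(span.parentId, siblings);
  }

  const lines: string[] = [];
  const visit = (span: SpanRecord, prefix: string, isLast: boolean, isRoot: boolean): void => {
    const branch = isRoot ? '' : isLast ? '└── ' : '├── ';
    lines.push(prefix + branch + span.name);
    const children = childrenOf.get(span.id) ?? [];
    // Children of a non-last node keep a vertical rule to connect later siblings.
    const childPrefix = isRoot ? '' : prefix + (isLast ? '    ' : '│   ');
    children.forEach((child, index) =>
      visit(child, childPrefix, index === children.length - 1, false),
    );
  };

  const roots = childrenOf.get(undefined) ?? [];
  roots.forEach((root) => visit(root, '', true, true));
  return lines.join('\n');
}
```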

Track business-level events alongside technical telemetry using ctx.setAttribute() for span attributes and track() for events.

import { trace, track } from 'autotel';
export const handleAgentHandoff = trace(
'agent.handoff',
(ctx) => async (task: Task) => {
const startTime = performance.now();
// Set domain-specific span attributes
ctx.setAttributes({
'agent.from': 'triage',
'agent.to': 'specialist',
'task.priority': task.priority,
'task.category': task.category,
});
// Perform handoff
const result = await specialistAgent.process(task);
// Track business metric with precise duration
track('agent.handoff_completed', {
from: 'triage',
to: 'specialist',
duration_ms: Math.round(performance.now() - startTime),
success: true,
});
return result;
},
);
const workflow = trace('ai.workflow', (ctx) => async (input: string) => {
const analysis = await trace('step1.analyze', async () => {
return await analyzeInput(input);
});
const response = await trace('step2.generate', async () => {
return await generateResponse(analysis);
});
return response;
});
const runAgentWorkflow = trace(
'workflow.agents',
(ctx) => async (input: string) => {
ctx.setAttributes({
'workflow.type': 'multi_agent',
'workflow.correlation_id': ctx.correlationId,
});
const triageResult = await triageAgent(input);
ctx.setAttribute('handoff.from', 'triage');
const specialistResult = await specialistAgent(triageResult);
return specialistResult;
},
);

Multi-agent systems require tracking "baton passes" between agents with full context propagation.

import { trace, track } from 'autotel';
import { generateText, generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';
// Agent 1: Triage
const triageAgent = trace('agent.triage', (ctx) => async (userRequest: string) => {
ctx.setAttributes({
'agent.role': 'triage',
'agent.model': 'gpt-4o-mini',
});
const result = await generateText({
model: openai('gpt-4o-mini'),
prompt: `Analyze this request and create a plan: ${userRequest}`,
});
track('agent.triage_completed', {
request_length: userRequest.length,
plan_length: result.text.length,
});
return {
plan: result.text,
requiresSpecialist: true,
};
});
// Agent 2: Specialist
const specialistAgent = trace('agent.specialist', (ctx) => async (plan: string) => {
ctx.setAttributes({
'agent.role': 'specialist',
'agent.model': 'gpt-4o',
});
ctx.track('specialist_engaged', { plan_length: plan.length });
const result = await generateText({
model: openai('gpt-4o'),
prompt: `Execute this plan: ${plan}`,
});
track('agent.specialist_completed', {
plan_length: plan.length,
response_length: result.text.length,
});
return {
response: result.text,
requiresQA: true,
};
});
// Agent 3: QA
const qaAgent = trace('agent.qa', (ctx) => async (response: string) => {
ctx.setAttributes({
'agent.role': 'qa',
'agent.model': 'gpt-4o',
});
const result = await generateObject({
model: openai('gpt-4o'),
schema: z.object({
approved: z.boolean(),
feedback: z.string().optional(),
requiresFollowUp: z.boolean(),
}),
prompt: `Review this response for quality: ${response}`,
});
ctx.setAttribute('qa.approved', result.object.approved);
track('agent.qa_completed', {
approved: result.object.approved,
requires_follow_up: result.object.requiresFollowUp,
});
return result.object;
});
// Orchestrator: Workflow coordinator
export const runMultiAgentWorkflow = trace(
'workflow.multi_agent_escalation',
(ctx) => async (userRequest: string, userId: string) => {
ctx.setAttributes({
'workflow.type': 'multi_agent_escalation',
'workflow.user_id': userId,
'workflow.correlation_id': ctx.correlationId,
});
// Step 1: Triage
const triage = await triageAgent(userRequest);
ctx.track('triage_complete', { requires_specialist: triage.requiresSpecialist });
// Step 2: Specialist (if needed)
let response;
if (triage.requiresSpecialist) {
response = await specialistAgent(triage.plan);
ctx.track('specialist_complete', { requires_qa: response.requiresQA });
}
// Step 3: QA (if needed)
let qa;
if (response?.requiresQA) {
qa = await qaAgent(response.response);
ctx.track('qa_complete', { approved: qa.approved });
}
// Track workflow completion
track('workflow.completed', {
workflow_type: 'multi_agent_escalation',
user_id: userId,
agents_involved: qa ? 3 : response ? 2 : 1,
final_approval: qa?.approved ?? true,
});
return {
plan: triage.plan,
response: response?.response,
qa: qa,
};
},
);
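import { trace, track } from 'autotel';
import { embed, generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
// `vectorDb` below stands in for your vector store client (for example, Pinecone)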
// Step 1: Generate embeddings
const generateEmbeddings = trace('rag.embeddings', (ctx) => async (query: string) => {
ctx.setAttribute('query.length', query.length);
const { embedding } = await embed({
model: openai.embedding('text-embedding-3-small'),
value: query,
});
ctx.setAttribute('embedding.dimensions', embedding.length);
return embedding;
});
// Step 2: Vector search
const vectorSearch = trace(
'rag.search',
(ctx) => async (embedding: number[], topK: number = 5) => {
ctx.setAttributes({
'search.top_k': topK,
'search.embedding_dimensions': embedding.length,
});
const results = await vectorDb.search(embedding, topK);
ctx.setAttribute('search.results_count', results.length);
return results;
},
);
// Step 3: Generate response with context
const generateWithContext = trace(
'rag.generate',
(ctx) => async (query: string, context: string[]) => {
ctx.setAttributes({
'generation.context_chunks': context.length,
'generation.model': 'gpt-4o',
});
const prompt = `
Context:
${context.join('\n\n')}
Question: ${query}
Answer based on the context above:
`.trim();
const result = await generateText({
model: openai('gpt-4o'),
prompt,
});
ctx.setAttributes({
'generation.tokens_used': result.usage.totalTokens,
'generation.response_length': result.text.length,
});
return result.text;
},
);
// Complete RAG Pipeline
export const ragPipeline = trace(
'rag.pipeline',
(ctx) => async (query: string, userId: string) => {
ctx.setAttributes({
'pipeline.type': 'rag',
'pipeline.user_id': userId,
'pipeline.query': query,
});
const embedding = await generateEmbeddings(query);
ctx.track('embeddings_generated');
const searchResults = await vectorSearch(embedding);
ctx.track('search_completed', { results_count: searchResults.length });
const context = searchResults.map((r) => r.content);
const response = await generateWithContext(query, context);
ctx.track('generation_completed', { response_length: response.length });
track('rag.pipeline_completed', {
user_id: userId,
query_length: query.length,
results_retrieved: searchResults.length,
response_length: response.length,
});
return {
query,
response,
sources: searchResults.map((r) => r.metadata),
};
},
);

Span Hierarchy:

rag.pipeline (parent)
├── rag.embeddings (child)
│   └── openai.embeddings (auto-instrumented by OpenLLMetry)
├── rag.search (child)
│   └── pinecone.query (auto-instrumented by OpenLLMetry)
└── rag.generate (child)
    └── openai.chat.completions (auto-instrumented by OpenLLMetry)

Track streaming LLM responses with progress events and final metrics.

import { trace, track } from 'autotel';
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
export const generateStreamingResponse = trace(
'ai.stream',
(ctx) => async (prompt: string) => {
ctx.setAttributes({
'stream.model': 'gpt-4o',
'stream.prompt_length': prompt.length,
});
const stream = await streamText({
model: openai('gpt-4o'),
prompt,
});
let chunkCount = 0;
let totalLength = 0;
const chunks: string[] = [];
for await (const chunk of stream.textStream) {
chunkCount++;
totalLength += chunk.length;
chunks.push(chunk);
// Add event for significant milestones
if (chunkCount % 10 === 0) {
ctx.track('streaming_progress', {
chunks_received: chunkCount,
total_length: totalLength,
});
}
}
ctx.setAttributes({
'stream.chunks_count': chunkCount,
'stream.total_length': totalLength,
// Guard against an empty stream, which would otherwise record NaN
'stream.avg_chunk_size': chunkCount > 0 ? Math.round(totalLength / chunkCount) : 0,
});
track('ai.stream_completed', {
model: 'gpt-4o',
chunks: chunkCount,
total_length: totalLength,
});
return chunks.join('');
},
);
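
The bookkeeping in the streaming example (chunk count, total length, periodic progress events) can be pulled into a small accumulator so it is testable without a live stream. This is a sketch under assumptions: `createStreamStats` and `StreamSummary` are hypothetical names, and the milestone interval defaults to 10 chunks to match the example above.

```typescript
interface StreamSummary {
  chunks: number;
  totalLength: number;
  avgChunkSize: number;
}

// Create an accumulator that tracks chunk statistics for one streamed response.
function createStreamStats(milestoneEvery: number = 10) {
  let chunks = 0;
  let totalLength = 0;

  return {
    // Record one chunk; returns true when a progress event should be emitted.
    push(chunk: string): boolean {
      chunks += 1;
      totalLength += chunk.length;
      return chunks % milestoneEvery === 0;
    },
    // Final metrics, suitable for span attributes or a completion event.
    summary(): StreamSummary {
      return {
        chunks,
        totalLength,
        // An empty stream would otherwise yield NaN (0 / 0).
        avgChunkSize: chunks === 0 ? 0 : Math.round(totalLength / chunks),
      };
    },
  };
}
```

Inside the streaming loop you would call `push(chunk)` for each chunk and emit a progress event whenever it returns `true`, then attach `summary()` to the span once the stream ends.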

Implement quality checks and iterative refinement with full observability.

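import { trace, track } from 'autotel';
import { generateText, generateObject } from 'ai';
import { openai } from '@ai-sdk/openai';
import { z } from 'zod';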
const generateContent = trace(
'ai.generate_content',
(ctx) => async (prompt: string, model: string) => {
ctx.setAttribute('generation.model', model);
const result = await generateText({
model: openai(model),
prompt,
});
return result.text;
},
);
const evaluateQuality = trace('ai.evaluate_quality', (ctx) => async (content: string) => {
const result = await generateObject({
model: openai('gpt-4o'),
schema: z.object({
score: z.number().min(0).max(100),
feedback: z.string(),
passesThreshold: z.boolean(),
}),
prompt: `Evaluate this content quality (0-100): ${content}`,
});
ctx.setAttributes({
'evaluation.score': result.object.score,
'evaluation.passes': result.object.passesThreshold,
});
return result.object;
});
export const generateWithQualityCheck = trace(
'ai.generate_with_qa',
(ctx) =>
async (
prompt: string,
options: { maxAttempts?: number; qualityThreshold?: number } = {},
) => {
const { maxAttempts = 3, qualityThreshold = 75 } = options;
ctx.setAttributes({
'qa.max_attempts': maxAttempts,
'qa.threshold': qualityThreshold,
});
let attempt = 0;
let content: string;
let evaluation: any;
do {
attempt++;
ctx.track('generation_attempt', { attempt });
content = await generateContent(prompt, 'gpt-4o');
evaluation = await evaluateQuality(content);
if (evaluation.passesThreshold) {
ctx.track('quality_passed', {
attempt,
score: evaluation.score,
});
break;
} else if (attempt < maxAttempts) {
ctx.track('quality_failed_retrying', {
attempt,
score: evaluation.score,
feedback: evaluation.feedback,
});
prompt = `${prompt}\n\nPrevious attempt feedback: ${evaluation.feedback}`;
}
} while (attempt < maxAttempts);
ctx.setAttributes({
'qa.attempts_used': attempt,
'qa.final_score': evaluation.score,
'qa.success': evaluation.passesThreshold,
});
track('ai.qa_loop_completed', {
attempts: attempt,
final_score: evaluation.score,
success: evaluation.passesThreshold,
threshold: qualityThreshold,
});
return {
content,
evaluation,
attempts: attempt,
};
},
);
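
Stripped of telemetry, the control flow of the loop above is a bounded retry that feeds evaluator feedback into the next attempt. The sketch below isolates that logic with injected `generate` and `evaluate` functions so it can be unit-tested without an LLM. All names here (`refineUntilGood`, `Evaluation`, `LoopResult`) are hypothetical. One deliberate variation: it compares the numeric score against the threshold in code rather than relying on a pass/fail flag returned by the evaluator model.

```typescript
interface Evaluation {
  score: number;
  feedback: string;
}

interface LoopResult {
  content: string;
  evaluation: Evaluation;
  attempts: number;
  passed: boolean;
}

// Retry generation until the score reaches the threshold or attempts run out.
async function refineUntilGood(
  prompt: string,
  generate: (prompt: string) => Promise<string>,
  evaluate: (content: string) => Promise<Evaluation>,
  maxAttempts: number = 3,
  threshold: number = 75,
): Promise<LoopResult> {
  if (maxAttempts < 1) throw new RangeError('maxAttempts must be at least 1');
  let currentPrompt = prompt;
  for (let attempt = 1; ; attempt++) {
    const content = await generate(currentPrompt);
    const evaluation = await evaluate(content);
    const passed = evaluation.score >= threshold;
    // Stop on success, or return the last attempt once the budget is spent.
    if (passed || attempt === maxAttempts) {
      return { content, evaluation, attempts: attempt, passed };
    }
    // Append only the latest feedback to the original prompt.
    currentPrompt = `${prompt}\n\nPrevious attempt feedback: ${evaluation.feedback}`;
  }
}
```

Building each retry prompt from the original prompt plus the latest feedback keeps the prompt from growing with every attempt; whether that or accumulating all feedback works better is something to measure for your use case.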

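Consistent attribute names keep telemetry comparable across your AI applications. The examples below show the naming scheme used throughout this guide; note that OpenTelemetry's own generative AI semantic conventions use the gen_ai.* namespace, so align with those where your backend expects them.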

LLM attributes:

ctx.setAttributes({
'llm.model': 'gpt-4o',
'llm.provider': 'openai',
'llm.temperature': 0.7,
'llm.max_tokens': 4096,
'llm.response_tokens': 250,
'llm.prompt_tokens': 100,
'llm.total_tokens': 350,
});

Agent attributes:

ctx.setAttributes({
'agent.role': 'specialist',
'agent.model': 'gpt-4o',
'agent.provider': 'openai',
'agent.temperature': 0.7,
});

Workflow attributes:

ctx.setAttributes({
'workflow.type': 'multi_agent_escalation',
'workflow.correlation_id': ctx.correlationId,
'workflow.user_id': userId,
'workflow.session_id': sessionId,
});

RAG attributes:

ctx.setAttributes({
'rag.embedding_model': 'text-embedding-3-small',
'rag.chunks_retrieved': 5,
'rag.search_top_k': 5,
'rag.rerank_enabled': true,
});

Evaluation attributes:

ctx.setAttributes({
'evaluation.score': 85,
'evaluation.threshold': 75,
'evaluation.passes': true,
'evaluation.attempts': 2,
});
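
One way to keep these names consistent is to build attribute objects in a single helper instead of retyping keys at each call site. The sketch below does this for the LLM attributes; `LlmCallInfo` and `llmAttributes` are hypothetical names, and it assumes `ctx.setAttributes()` accepts a plain object, as in the examples above.

```typescript
interface LlmCallInfo {
  model: string;
  provider: string;
  promptTokens: number;
  responseTokens: number;
  temperature?: number;
}

type Attributes = Record<string, string | number | boolean>;

// Build the llm.* attributes in one place so every span uses the same keys.
function llmAttributes(info: LlmCallInfo): Attributes {
  const attributes: Attributes = {
    'llm.model': info.model,
    'llm.provider': info.provider,
    'llm.prompt_tokens': info.promptTokens,
    'llm.response_tokens': info.responseTokens,
    // Derive the total so it can never disagree with its parts.
    'llm.total_tokens': info.promptTokens + info.responseTokens,
  };
  // Only record optional parameters that were actually set.
  if (info.temperature !== undefined) {
    attributes['llm.temperature'] = info.temperature;
  }
  return attributes;
}
```

A call site would then read `ctx.setAttributes(llmAttributes({ model: 'gpt-4o', provider: 'openai', promptTokens: 100, responseTokens: 250 }))`.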

Business events:

import { track } from 'autotel';
track('workflow.completed', {
type: 'multi_agent',
agents_used: 3,
// traceId, spanId, correlationId auto-added!
});

import { init, trace, track } from 'autotel';
import { generateText } from 'ai';
import { openai } from '@ai-sdk/openai';
init({
service: 'customer-support-ai',
endpoint: process.env.OTLP_ENDPOINT,
openllmetry: { enabled: true },
});
const handleCustomerQuery = trace(
'workflow.customer_query',
(ctx) => async (query: string, userId: string) => {
ctx.setAttributes({
'workflow.type': 'customer_support',
'user.id': userId,
});
// Step 1: Triage (OpenLLMetry auto-instruments the LLM call)
const triage = await trace('step.triage', async () => {
return await generateText({
model: openai('gpt-4o-mini'),
prompt: `Triage: ${query}`,
});
});
const needsEscalation = triage.text.includes('ESCALATE');
if (needsEscalation) {
const specialist = await trace('step.specialist', async () => {
return await generateText({
model: openai('gpt-4o'),
prompt: `Expert response needed: ${query}`,
});
});
track('escalation_occurred', {
category: triage.text,
userId,
correlationId: ctx.correlationId,
});
return { response: specialist.text, escalated: true };
}
return { response: triage.text, escalated: false };
},
);

What you get with both:

Trace Tree:
workflow.customer_query (trace)
├─ user.id: "user123"
├─ workflow.type: "customer_support"
├─ correlation.id: "abc-123-def"
├─ step.triage (trace)
│  ├─ llm.chat (OpenLLMetry auto-span)
│  │  ├─ llm.request.model: "gpt-4o-mini"
│  │  ├─ llm.usage.prompt_tokens: 23
│  │  ├─ llm.usage.completion_tokens: 45
│  │  └─ llm.prompts.0.content: "Triage: ..."
│  └─ triage.category: "billing_issue"
└─ step.specialist (trace)
   ├─ llm.chat (OpenLLMetry auto-span)
   │  ├─ llm.request.model: "gpt-4o"
   │  ├─ llm.usage.prompt_tokens: 78
   │  ├─ llm.usage.completion_tokens: 234
   │  └─ llm.prompts.0.content: "Expert response needed: ..."
   └─ escalated: true

Events:
escalation_occurred
├─ category: "billing_issue"
├─ userId: "user123"
└─ correlationId: "abc-123-def"

Key benefits of combining both:

  1. Zero-effort LLM telemetry: OpenLLMetry captures all SDK calls automatically
  2. Business context: trace() adds workflow meaning and business logic
  3. Perfect correlation: All spans and events share the same correlation ID
  4. Complete picture: See both "what the LLM did" (OpenLLMetry) and "why it did it" (your trace spans)
  5. Events integration: Business events automatically correlated with technical traces

  • example-ai-agent: Multi-agent escalation systems (simulated and real LLM with OpenLLMetry), RAG pipelines, and @openai/agents integration.

See apps/example-ai-agent/src/multi-agent-workflow-with-openllmetry.ts for a complete example showing OpenLLMetry enabled in init(), multi-agent workflow using trace() for business context, and real OpenAI SDK calls auto-instrumented by OpenLLMetry.

Compare with apps/example-ai-agent/src/multi-agent-workflow.ts which uses simulated LLM calls (no OpenLLMetry needed).

See apps/example-ai-agent/src/rag-pipeline.ts for a complete RAG pipeline example showing embeddings generation tracking, vector search observability, context assembly monitoring, and end-to-end pipeline metrics.