OpenTelemetry

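OpenTelemetry is an open-source project that provides a unified set of tools, APIs, and SDKs to simplify adding observability to diverse technology stacks, so that application telemetry is collected, processed, and exported consistently.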

How the OTel Agent Stays Compatible with the OTel SDK


Background

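An agent typically instruments well-known open-source middleware and SDKs automatically, so that users can collect spans, metrics, and other telemetry with little effort. Some users, however, have more advanced needs. Seeing only the spans collected by the OTel agent is not enough for them: they want to instrument their applications comprehensively with both the OTel SDK and the OTel agent. This article briefly explains how the OTel agent keeps the two compatible.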

Key Issues

Using the OTel SDK while the OTel agent is attached raises two core questions.

Problem 1: What if the user's SDK version differs from the SDK inside the agent?

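To keep telemetry consistent, the OTel agent itself also uses the OTel SDK to generate spans and metrics. This raises a question: if the user's code uses OTel SDK version X while the agent uses version Y, their public APIs may not be fully compatible. The agent therefore has to guarantee dependency compatibility: whatever SDK version the user brings, the SDK inside the agent must keep working correctly.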

Problem 2: How do spans from the user's SDK connect with spans produced by the agent?

This breaks down into two sub-problems:

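  1. A user of the OTel SDK may already have configured an endpoint for reporting spans, and that endpoint may change once the OTel agent is attached. For example, the user previously reported to a self-hosted backend and now needs to report to the ARMS backend. How are the spans that the SDK sent to the self-hosted backend linked together in ARMS?
  2. How do spans created with the user's OTel SDK form parent-child relationships with spans created by the agent? Span creation in the user's SDK and in the agent's SDK may be completely separate, so the agent's SDK may not even know that the user's spans exist. Linking the spans is therefore another tricky problem.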

Implementation in the OTel Agent

Solving Problem 1

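The OTel agent isolates the user's SDK from its own SDK through mechanisms such as class loaders; the details are beyond the scope of this article. In short, there are two copies of the SDK, one belonging to the user and one to the agent, and they do not interfere with each other.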
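The isolation idea can be illustrated with plain JDK class loading. The sketch below is not the agent's actual class loader; `IsolationDemo`, `IsolatingLoader`, `FakeSdk`, and the helper methods are hypothetical names. It loads a second copy of the same class through a child-first class loader. The JVM treats the two copies as unrelated classes, each with its own static state, which is the property that lets two SDK versions coexist in one process.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;

public class IsolationDemo {

    /** Hypothetical child-first loader: defines FakeSdk itself instead of asking its parent. */
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals("FakeSdk")) {
                return super.loadClass(name, resolve); // everything else: normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    // Read the same bytecode the parent uses, but define it in this loader
                    try (InputStream in = getParent().getResourceAsStream("FakeSdk.class")) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads an isolated ("agent-side") copy of FakeSdk. */
    static Class<?> loadIsolatedCopy() {
        try {
            return new IsolatingLoader(IsolationDemo.class.getClassLoader()).loadClass("FakeSdk");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Writes the static field "version" of the given copy via reflection. */
    static void setVersion(Class<?> sdk, String version) {
        try {
            Field f = sdk.getDeclaredField("version");
            f.setAccessible(true);
            f.set(null, version); // also triggers static initialization of that copy
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the static field "version" of the given copy via reflection. */
    static String getVersion(Class<?> sdk) {
        try {
            Field f = sdk.getDeclaredField("version");
            f.setAccessible(true);
            return (String) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> agentSdk = loadIsolatedCopy();
        setVersion(agentSdk, "agent-Y");
        System.out.println(agentSdk == FakeSdk.class); // false: same name, different classes
        System.out.println(FakeSdk.version);          // app-X: user-side state untouched
        System.out.println(getVersion(agentSdk));     // agent-Y
    }
}

/** Hypothetical stand-in for an SDK class that keeps global (static) state. */
class FakeSdk {
    static String version = "app-X";
}
```

A real agent needs more than this (its own copies of dependencies, plus bridging so the two sides can still exchange data), but the core guarantee is the one shown: identical class names no longer imply shared classes or shared state.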

Solving Problem 2

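The agent solves Problem 2 by instrumenting the OTel SDK itself. The instrumentation is split into several modules.

If the OTel concepts involved are unfamiliar, the OpenTelemetry documentation is a good place to start.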

First, let's walk through how a span is created with the OTel SDK:

  1. Initialize a TracerProvider and the Propagators.
  2. Build an OpenTelemetry instance from the TracerProvider and Propagators, and obtain a Tracer from it.
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.common.Attributes;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator;
import io.opentelemetry.context.propagation.ContextPropagators;
import io.opentelemetry.exporter.otlp.http.trace.OtlpHttpSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.resources.Resource;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;
import io.opentelemetry.semconv.resource.attributes.ResourceAttributes;

public class OpenTelemetrySupport {

    private static Tracer tracer;

    static {
        // Describe this service; fill in the service name, version, and deployment environment
        Resource resource = Resource.getDefault()
                .merge(Resource.create(Attributes.of(
                        ResourceAttributes.SERVICE_NAME, "",
                        ResourceAttributes.SERVICE_VERSION, "",
                        ResourceAttributes.DEPLOYMENT_ENVIRONMENT, "",
                        ResourceAttributes.HOST_NAME, "${host-name}" // replace ${host-name} with your host name
                )));

        // Step 1: a TracerProvider that batches finished spans and exports them over OTLP/HTTP
        SdkTracerProvider sdkTracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(OtlpHttpSpanExporter.builder()
                        .setEndpoint("http://tracing-analysis-dc-hz-internal.aliyuncs.com/adapt_ggxw4lnjuz@7323a5caae30263_ggxw4lnjuz@53df7ad2afe8301/api/otlp/traces")
                        .build()).build())
                .setResource(resource)
                .build();

        // Register the TracerProvider and the W3C Trace Context propagator globally
        OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
                .setTracerProvider(sdkTracerProvider)
                .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
                .buildAndRegisterGlobal();

        // Step 2: obtain a Tracer (instrumentation scope name and version)
        tracer = openTelemetry.getTracer("OpenTelemetry Tracer", "1.0.0");
    }

    public static Tracer getTracer() {
        return tracer;
    }
}
  3. Use the Tracer to build spans: start each span with startSpan() and finish it with end(), after which it is exported.
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.StatusCode;
import io.opentelemetry.context.Scope;

public class Main {

    public static void parentMethod() {
        Span span = OpenTelemetrySupport.getTracer().spanBuilder("parent span").startSpan();
        // makeCurrent() makes this span the parent of spans started inside the scope
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("good", "job");
            childMethod();
        } catch (Throwable t) {
            span.setStatus(StatusCode.ERROR, "handle parent span error");
        } finally {
            // end() finishes the span and hands it to the span processor for export
            span.end();
        }
    }

    public static void childMethod() {
        // Started while "parent span" is current, so it becomes its child
        Span span = OpenTelemetrySupport.getTracer().spanBuilder("child span").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("hello", "world");
        } catch (Throwable t) {
            span.setStatus(StatusCode.ERROR, "handle child span error");
        } finally {
            span.end();
        }
    }

    public static void main(String[] args) {
        parentMethod();
    }
}
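The parent/child link in this example comes from `makeCurrent()`: while a span is current, any span started on the same thread picks it up as its parent. The dependency-free sketch below models that mechanism with a `ThreadLocal`. `MiniTracing`, `MiniSpan`, and `Scope` are hypothetical stand-ins, not the OTel API, and the real OTel `Context` is an immutable key-value map rather than a single span slot. The model also shows why Problem 2 arises: if the user's SDK and the agent each kept a separate current-span slot, neither side would see the other's spans as parents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, dependency-free model of "current span" propagation (not the OTel API). */
public final class MiniTracing {

    /** Per-thread slot holding the current span, loosely analogous to the current Context. */
    private static final ThreadLocal<MiniSpan> CURRENT = new ThreadLocal<>();

    /** Finished spans recorded as "name <- parent" strings, in end() order. */
    public static final List<String> finished = new ArrayList<>();

    /** Closing a Scope restores whatever was current before makeCurrent(). */
    public interface Scope extends AutoCloseable {
        @Override
        void close(); // narrowed: throws no checked exception
    }

    public static final class MiniSpan {
        public final String name;
        public final MiniSpan parent; // null for a root span

        MiniSpan(String name, MiniSpan parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Like Span.makeCurrent(): installs this span and returns a Scope that undoes it. */
        public Scope makeCurrent() {
            MiniSpan previous = CURRENT.get();
            CURRENT.set(this);
            return () -> CURRENT.set(previous);
        }

        public void end() {
            finished.add(name + " <- " + (parent == null ? "(root)" : parent.name));
        }
    }

    /** Like spanBuilder(name).startSpan(): the parent defaults to the current span. */
    public static MiniSpan startSpan(String name) {
        return new MiniSpan(name, CURRENT.get());
    }

    public static void main(String[] args) {
        MiniSpan parent = startSpan("parent span");
        try (Scope outer = parent.makeCurrent()) {
            MiniSpan child = startSpan("child span");
            try (Scope inner = child.makeCurrent()) {
                // the body of childMethod() would run here
            } finally {
                child.end();
            }
        } finally {
            parent.end();
        }
        finished.forEach(System.out::println);
        // child span <- parent span
        // parent span <- (root)
    }
}
```

In the agent excerpt later in this article, `ApplicationSpan.Builder.setParent` converts the user's context with `AgentContextStorage.getAgentContext(...)`, which is how a user-side parent becomes visible to the agent-side span builder.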

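The approach to compatibility is straightforward: every key step of the user's span-creation flow above is proxied through wrapper classes. Taking span creation as an example (this instrumentation targets PropagatedSpan.create, the static factory behind Span.wrap(SpanContext)), the instrumentation code is: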

/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.javaagent.instrumentation.opentelemetryapi;

import static net.bytebuddy.matcher.ElementMatchers.isMethod;
import static net.bytebuddy.matcher.ElementMatchers.isStatic;
import static net.bytebuddy.matcher.ElementMatchers.named;

import application.io.opentelemetry.api.trace.Span;
import application.io.opentelemetry.api.trace.SpanContext;
import io.opentelemetry.javaagent.extension.instrumentation.TypeInstrumentation;
import io.opentelemetry.javaagent.extension.instrumentation.TypeTransformer;
import io.opentelemetry.javaagent.instrumentation.opentelemetryapi.trace.Bridging;
import net.bytebuddy.asm.Advice;
import net.bytebuddy.description.type.TypeDescription;
import net.bytebuddy.matcher.ElementMatcher;

// "application.io.opentelemetry.*" names the user's (application's) copy of the OTel API
public class SpanInstrumentation implements TypeInstrumentation {

    @Override
    public ElementMatcher<TypeDescription> typeMatcher() {
        // Target the user-side PropagatedSpan class
        return named("application.io.opentelemetry.api.trace.PropagatedSpan");
    }

    @Override
    public void transform(TypeTransformer transformer) {
        // Apply CreateAdvice to the static factory PropagatedSpan.create(SpanContext)
        transformer.applyAdviceToMethod(
                isMethod().and(isStatic()).and(named("create")),
                SpanInstrumentation.class.getName() + "$CreateAdvice");
    }

    @SuppressWarnings("unused")
    public static class CreateAdvice {

        // We replace the return value completely so don't need to call the method.
        // Returning false (the default value) tells ByteBuddy to skip the original method body.
        @Advice.OnMethodEnter(skipOn = Advice.OnDefaultValue.class)
        public static boolean methodEnter() {
            return false;
        }

        // Convert the user's SpanContext to the agent's, wrap it in an agent span,
        // and hand that span back to the user's code through Bridging.toApplication.
        @Advice.OnMethodExit
        public static void methodExit(
                @Advice.Argument(0) SpanContext applicationSpanContext,
                @Advice.Return(readOnly = false) Span applicationSpan) {
            applicationSpan =
                    Bridging.toApplication(
                            io.opentelemetry.api.trace.Span.wrap(Bridging.toAgent(applicationSpanContext)));
        }
    }
}

It first converts the SpanContext from the user's OTel SDK into a SpanContext of the agent's SDK, preserving whether the context is remote:

// Bridging.toAgent: user-side SpanContext -> agent-side SpanContext
public static io.opentelemetry.api.trace.SpanContext toAgent(SpanContext applicationContext) {
    if (applicationContext.isRemote()) {
        // Context extracted from an incoming request: keep it marked as remote
        return io.opentelemetry.api.trace.SpanContext.createFromRemoteParent(
                applicationContext.getTraceId(),
                applicationContext.getSpanId(),
                BridgedTraceFlags.toAgent(applicationContext.getTraceFlags()),
                toAgent(applicationContext.getTraceState()));
    } else {
        return io.opentelemetry.api.trace.SpanContext.create(
                applicationContext.getTraceId(),
                applicationContext.getSpanId(),
                BridgedTraceFlags.toAgent(applicationContext.getTraceFlags()),
                toAgent(applicationContext.getTraceState()));
    }
}
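The conversion works because a SpanContext boils down to plain values (hex ID strings, a flags byte, trace state, and the remote bit) that mean the same thing in every API version. The sketch below mirrors the shape of `toAgent` using two hypothetical, dependency-free classes, `AppSpanContext` and `AgentSpanContext`, as stand-ins for the user-side and agent-side copies; trace state is simplified to a `Map`. Only copied values cross the boundary, never an object from the other side.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class ContextBridgeSketch {

    /** Hypothetical user-side (application) SpanContext. */
    public static final class AppSpanContext {
        public final String traceId;                  // 32 lowercase hex chars
        public final String spanId;                   // 16 lowercase hex chars
        public final byte traceFlags;                 // e.g. 0x01 = sampled
        public final Map<String, String> traceState;  // simplified trace state
        public final boolean remote;                  // true if extracted from an incoming request

        public AppSpanContext(String traceId, String spanId, byte traceFlags,
                              Map<String, String> traceState, boolean remote) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.traceFlags = traceFlags;
            this.traceState = traceState;
            this.remote = remote;
        }
    }

    /** Hypothetical agent-side SpanContext, built only through two factories. */
    public static final class AgentSpanContext {
        public final String traceId;
        public final String spanId;
        public final byte traceFlags;
        public final Map<String, String> traceState;
        public final boolean remote;

        private AgentSpanContext(String traceId, String spanId, byte traceFlags,
                                 Map<String, String> traceState, boolean remote) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.traceFlags = traceFlags;
            this.traceState = traceState;
            this.remote = remote;
        }

        static AgentSpanContext create(String traceId, String spanId, byte flags, Map<String, String> state) {
            return new AgentSpanContext(traceId, spanId, flags, state, false);
        }

        static AgentSpanContext createFromRemoteParent(String traceId, String spanId, byte flags,
                                                       Map<String, String> state) {
            return new AgentSpanContext(traceId, spanId, flags, state, true);
        }
    }

    /** Same structure as Bridging.toAgent: copy the values, keep the remote/local distinction. */
    public static AgentSpanContext toAgent(AppSpanContext app) {
        Map<String, String> state = new LinkedHashMap<>(app.traceState); // copy, nothing shared
        if (app.remote) {
            return AgentSpanContext.createFromRemoteParent(app.traceId, app.spanId, app.traceFlags, state);
        }
        return AgentSpanContext.create(app.traceId, app.spanId, app.traceFlags, state);
    }

    public static void main(String[] args) {
        AppSpanContext app = new AppSpanContext(
                "4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", (byte) 0x01,
                Map.of("vendor", "value"), true);
        AgentSpanContext agent = toAgent(app);
        System.out.println(agent.traceId + " " + agent.spanId + " remote=" + agent.remote);
        // 4bf92f3577b34da6a3ce929d0e0e4736 00f067aa0ba902b7 remote=true
    }
}
```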

It then creates an agent-SDK span from this agent-side context and wraps that span in a proxy that implements the Span interface of the user's SDK:

// Bridging.toApplication: wrap an agent-side span so the user's code can use it
public static Span toApplication(io.opentelemetry.api.trace.Span agentSpan) {
    if (!agentSpan.getSpanContext().isValid()) {
        // no need to wrap
        return Span.getInvalid();
    } else {
        return new ApplicationSpan(agentSpan);
    }
}
class ApplicationSpan implements Span {

    private final io.opentelemetry.api.trace.Span agentSpan;

    ApplicationSpan(io.opentelemetry.api.trace.Span agentSpan) {
        this.agentSpan = agentSpan;
    }

    io.opentelemetry.api.trace.Span getAgentSpan() {
        return agentSpan;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setAttribute(String key, String value) {
        agentSpan.setAttribute(key, value);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setAttribute(String key, long value) {
        agentSpan.setAttribute(key, value);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setAttribute(String key, double value) {
        agentSpan.setAttribute(key, value);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setAttribute(String key, boolean value) {
        agentSpan.setAttribute(key, value);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public <T> Span setAttribute(AttributeKey<T> applicationKey, T value) {
        @SuppressWarnings("unchecked")
        io.opentelemetry.api.common.AttributeKey<T> agentKey = Bridging.toAgent(applicationKey);
        if (agentKey != null) {
            agentSpan.setAttribute(agentKey, value);
        }
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span addEvent(String name) {
        agentSpan.addEvent(name);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span addEvent(String name, long timestamp, TimeUnit unit) {
        agentSpan.addEvent(name, timestamp, unit);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span addEvent(String name, Attributes applicationAttributes) {
        agentSpan.addEvent(name, Bridging.toAgent(applicationAttributes));
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span addEvent(
            String name, Attributes applicationAttributes, long timestamp, TimeUnit unit) {
        agentSpan.addEvent(name, Bridging.toAgent(applicationAttributes), timestamp, unit);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setStatus(StatusCode status) {
        agentSpan.setStatus(Bridging.toAgent(status));
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span setStatus(StatusCode status, String description) {
        agentSpan.setStatus(Bridging.toAgent(status), description);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span recordException(Throwable throwable) {
        agentSpan.recordException(throwable);
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span recordException(Throwable throwable, Attributes attributes) {
        agentSpan.recordException(throwable, Bridging.toAgent(attributes));
        return this;
    }

    @Override
    @CanIgnoreReturnValue
    public Span updateName(String name) {
        agentSpan.updateName(name);
        return this;
    }

    @Override
    public void end() {
        agentSpan.end();
    }

    @Override
    public void end(long timestamp, TimeUnit unit) {
        agentSpan.end(timestamp, unit);
    }

    @Override
    public SpanContext getSpanContext() {
        return Bridging.toApplication(agentSpan.getSpanContext());
    }

    @Override
    public boolean isRecording() {
        return agentSpan.isRecording();
    }

    @Override
    public boolean equals(@Nullable Object obj) {
        if (obj == this) {
            return true;
        }
        if (!(obj instanceof ApplicationSpan)) {
            return false;
        }
        ApplicationSpan other = (ApplicationSpan) obj;
        return agentSpan.equals(other.agentSpan);
    }

    @Override
    public String toString() {
        return "ApplicationSpan{agentSpan=" + agentSpan + '}';
    }

    @Override
    public int hashCode() {
        return agentSpan.hashCode();
    }
    static class Builder implements SpanBuilder {

        private final io.opentelemetry.api.trace.SpanBuilder agentBuilder;

        Builder(io.opentelemetry.api.trace.SpanBuilder agentBuilder) {
            this.agentBuilder = agentBuilder;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setParent(Context applicationContext) {
            agentBuilder.setParent(AgentContextStorage.getAgentContext(applicationContext));
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setNoParent() {
            agentBuilder.setNoParent();
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder addLink(SpanContext applicationSpanContext) {
            agentBuilder.addLink(Bridging.toAgent(applicationSpanContext));
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder addLink(
                SpanContext applicationSpanContext, Attributes applicationAttributes) {
            // Forward the link attributes as well, not just the linked SpanContext
            agentBuilder.addLink(
                    Bridging.toAgent(applicationSpanContext), Bridging.toAgent(applicationAttributes));
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setAttribute(String key, String value) {
            agentBuilder.setAttribute(key, value);
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setAttribute(String key, long value) {
            agentBuilder.setAttribute(key, value);
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setAttribute(String key, double value) {
            agentBuilder.setAttribute(key, value);
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setAttribute(String key, boolean value) {
            agentBuilder.setAttribute(key, value);
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public <T> SpanBuilder setAttribute(AttributeKey<T> applicationKey, T value) {
            @SuppressWarnings("unchecked")
            io.opentelemetry.api.common.AttributeKey<T> agentKey = Bridging.toAgent(applicationKey);
            if (agentKey != null) {
                agentBuilder.setAttribute(agentKey, value);
            }
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setSpanKind(SpanKind applicationSpanKind) {
            io.opentelemetry.api.trace.SpanKind agentSpanKind = toAgentOrNull(applicationSpanKind);
            if (agentSpanKind != null) {
                agentBuilder.setSpanKind(agentSpanKind);
            }
            return this;
        }

        @Override
        @CanIgnoreReturnValue
        public SpanBuilder setStartTimestamp(long startTimestamp, TimeUnit unit) {
            agentBuilder.setStartTimestamp(startTimestamp, unit);
            return this;
        }

        @Override
        public Span startSpan() {
            return new ApplicationSpan(agentBuilder.startSpan());
        }
    }
}

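As the code shows, ApplicationSpan implements the Span interface of the user's OTel SDK, and every one of its methods simply forwards the call to the wrapped agent span. ApplicationSpan.Builder does the same for span builders, so its startSpan() also returns an agent span behind the wrapper. In addition, the instrumentation skips the original body of the user SDK's PropagatedSpan.create, so only the agent's logic runs, which keeps the user's SDK and the agent from conflicting.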
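Stripped of the OTel specifics, this proxying is the adapter pattern: an object that implements the user-side interface but holds, and forwards to, an agent-side span. The dependency-free sketch below shows only that shape. `AppSpan`, `AgentSpan`, and the `startSpan`/`toAgent` helpers are hypothetical, and only two methods are shown where the real ApplicationSpan forwards the whole Span interface. The reverse lookup mirrors the `getAgentSpan()` accessor in the excerpt above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class BridgeSketch {

    /** Hypothetical user-side Span interface (the application's copy of the API). */
    public interface AppSpan {
        AppSpan setAttribute(String key, String value);

        void end();
    }

    /** Hypothetical agent-side span: the object that actually records the data. */
    public static final class AgentSpan {
        public final String name;
        public final Map<String, String> attributes = new LinkedHashMap<>();
        public boolean ended;

        AgentSpan(String name) {
            this.name = name;
        }

        void setAttribute(String key, String value) {
            attributes.put(key, value);
        }

        void end() {
            ended = true;
        }
    }

    /** Adapter: implements the user-side interface and forwards every call to the agent span. */
    static final class ApplicationSpan implements AppSpan {
        private final AgentSpan agentSpan;

        ApplicationSpan(AgentSpan agentSpan) {
            this.agentSpan = agentSpan;
        }

        AgentSpan getAgentSpan() {
            return agentSpan;
        }

        @Override
        public AppSpan setAttribute(String key, String value) {
            agentSpan.setAttribute(key, value);
            return this; // return the wrapper so fluent calls stay on the user-side type
        }

        @Override
        public void end() {
            agentSpan.end();
        }
    }

    /** What a bridged startSpan() hands to user code: an agent span behind a wrapper. */
    public static AppSpan startSpan(String name) {
        return new ApplicationSpan(new AgentSpan(name));
    }

    /** Reverse direction: unwrap so agent-side code can use the span, e.g. as a parent. */
    public static AgentSpan toAgent(AppSpan span) {
        if (span instanceof ApplicationSpan) {
            return ((ApplicationSpan) span).getAgentSpan();
        }
        throw new IllegalArgumentException("not a bridged span: " + span);
    }

    public static void main(String[] args) {
        AppSpan span = startSpan("parent span").setAttribute("good", "job");
        span.end();
        AgentSpan recorded = toAgent(span);
        System.out.println(recorded.name + " " + recorded.attributes + " ended=" + recorded.ended);
        // parent span {good=job} ended=true
    }
}
```

The user's code only ever sees the user-side interface, so it keeps compiling against whichever API version it ships with, while every call lands on an object owned by the agent.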

Summary

The OTel agent stays compatible with the user's OTel SDK by instrumenting that SDK. By wrapping key OTel classes in proxies, it cleanly bridges the SDK and the agent.


observability.cn Authors 2024 | Documentation Distributed under CC-BY-4.0
Copyright © 2017-2024, Alibaba. All rights reserved. Alibaba has registered trademarks and uses trademarks.