OpenTelemetry

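OpenTelemetry is an open-source project that provides a unified set of tools, APIs, and SDKs to simplify adding observability across diverse technology stacks, so that application performance data is collected, processed, and exported consistently.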

Python quick start

Getting started

This guide shows how to get started with OpenTelemetry (OTel) in Python. You will learn how to instrument a simple Python application and emit trace, log, and metric data to the console.

Prerequisites

Make sure the required software is already installed.

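Demo

The following walks through attaching the OTel Python agent to a simple Flask application. The OTel Python agent also supports other frameworks such as Django and FastAPI. For the complete list of supported framework libraries, see: OTel Python agent plugin support list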

Environment setup

First, create a new directory and set up a new Python virtual environment:

mkdir otel-getting-started
cd otel-getting-started
python3 -m venv venv
source ./venv/bin/activate

Install Flask with pip:

pip install flask

Create and launch an HTTP server

Create a file named app.py with the following code:

from random import randint
from flask import Flask, request
import logging

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.route("/rolldice")
def roll_dice():
    player = request.args.get('player', default=None, type=str)
    result = str(roll())
    if player:
        logger.warning("%s is rolling the dice: %s", player, result)
    else:
        logger.warning("Anonymous player is rolling the dice: %s", result)
    return result

def roll():
    return randint(1, 6)

Run the application with the following command, then open http://localhost:8080/rolldice in your web browser to confirm that it works.

flask run -p 8080
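
Besides a browser, the endpoint can be exercised from a short script. The sketch below uses only the Python standard library and assumes the server is running on localhost:8080 as in this guide. The helper names rolldice_url and fetch_roll are made up for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080/rolldice"  # address used in this guide


def rolldice_url(player=None):
    """Build the request URL, adding the optional `player` query parameter."""
    if player:
        return f"{BASE_URL}?{urlencode({'player': player})}"
    return BASE_URL


def fetch_roll(player=None, timeout=5):
    """Call the running Flask app and return the rolled value as an int."""
    with urlopen(rolldice_url(player), timeout=timeout) as response:
        return int(response.read().decode("utf-8"))
```

Calling fetch_roll("alice") while the server is up should return a number from 1 to 6 and make the server log a line naming the player.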

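Attaching the OTel Python agent

With the zero-code (non-intrusive) approach, you get complete observability data without changing any application code. For details, see: How zero-code injection works

First, set up three tools: the opentelemetry-distro package and the opentelemetry-bootstrap and opentelemetry-instrument commands. The steps are as follows: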

Step 1. Install opentelemetry-distro

pip install opentelemetry-distro

Step 2. Use the opentelemetry-bootstrap command to install the instrumentation plugins the application needs

opentelemetry-bootstrap -a install

In this example, this installs the instrumentation plugin for Flask.
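
The idea behind this step can be sketched as: look at which frameworks are importable in the current environment and pick the matching plugin packages. The snippet below is an illustration only, not the tool's actual code. The mapping is a hypothetical, abbreviated one, and the helper names are made up.

```python
import importlib.util

# Hypothetical, abbreviated mapping for illustration only; the real tool
# ships its own, much longer list.
FRAMEWORK_TO_PLUGIN = {
    "flask": "opentelemetry-instrumentation-flask",
    "django": "opentelemetry-instrumentation-django",
    "fastapi": "opentelemetry-instrumentation-fastapi",
}


def is_installed(module_name):
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def plugins_to_install(probe=is_installed):
    """List the plugin packages matching the frameworks that are present."""
    return [plugin for module, plugin in FRAMEWORK_TO_PLUGIN.items() if probe(module)]
```

In the virtual environment from this guide, where only Flask is installed, plugins_to_install() would be expected to list just the Flask plugin.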

Step 3. Start the application with the OTel Python agent

export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
opentelemetry-instrument \
    --traces_exporter console \
    --metrics_exporter console \
    --logs_exporter console \
    --service_name dice-server \
    flask run -p 8080
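
Each flag passed to opentelemetry-instrument has an environment-variable counterpart: the flag name upper-cased with an OTEL_ prefix (for example, --traces_exporter corresponds to OTEL_TRACES_EXPORTER). The sketch below builds the equivalent environment for launching the same command from Python. The helper names flags_to_env and build_launch are made up for illustration.

```python
import os

# Flag values from the command above, keyed by flag name without the leading dashes.
FLAGS = {
    "traces_exporter": "console",
    "metrics_exporter": "console",
    "logs_exporter": "console",
    "service_name": "dice-server",
}


def flags_to_env(flags):
    """Map each flag to its OTEL_-prefixed, upper-cased environment variable."""
    return {f"OTEL_{name.upper()}": value for name, value in flags.items()}


def build_launch(flags, app_command):
    """Return (argv, env) suitable for subprocess.run(argv, env=env)."""
    env = dict(os.environ)
    env.update(flags_to_env(flags))
    env["OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED"] = "true"
    return ["opentelemetry-instrument", *app_command], env
```

Setting the variables in the environment instead of on the command line can be convenient in container or service definitions, where the launch command is fixed.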

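Open http://localhost:8080/rolldice in your web browser and refresh the page several times. After a short while, the spans and their log records are printed to the console, as shown below: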

{
"name": "/rolldice",
"context": {
"trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
"span_id": "0x5c2b0f851030d17d",
"trace_state": "[]"
},
"kind": "SpanKind.SERVER",
"parent_id": null,
"start_time": "2023-10-10T08:14:32.630332Z",
"end_time": "2023-10-10T08:14:32.631523Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"http.method": "GET",
"http.server_name": "127.0.0.1",
"http.scheme": "http",
"net.host.port": 8080,
"http.host": "localhost:8080",
"http.target": "/rolldice?rolls=12",
"net.peer.ip": "127.0.0.1",
"http.user_agent": "curl/8.1.2",
"net.peer.port": 58419,
"http.flavor": "1.1",
"http.route": "/rolldice",
"http.status_code": 200
},
"events": [],
"links": [],
"resource": {
"attributes": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.17.0",
"service.name": "dice-server",
"telemetry.auto.version": "0.38b0"
},
"schema_url": ""
}
}
{
"body": "Anonymous player is rolling the dice: 3",
"severity_number": "<SeverityNumber.WARN: 13>",
"severity_text": "WARNING",
"attributes": {
"otelSpanID": "5c2b0f851030d17d",
"otelTraceID": "db1fc322141e64eb84f5bd8a8b1c6d1f",
"otelServiceName": "dice-server"
},
"timestamp": "2023-10-10T08:14:32.631195Z",
"trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
"span_id": "0x5c2b0f851030d17d",
"trace_flags": 1,
"resource": "BoundedAttributes({'telemetry.sdk.language': 'python', 'telemetry.sdk.name': 'opentelemetry', 'telemetry.sdk.version': '1.17.0', 'service.name': 'dice-server', 'telemetry.auto.version': '0.38b0'}, maxlen=None)"
}

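The generated span tracks the lifetime of a request to the /rolldice route. The log record emitted during the request carries the same trace ID and span ID as the span, and both are exported to the console by their exporters.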
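
The correlation between the span and the log record can be checked mechanically. The sketch below uses abbreviated copies of the two records printed above. The helper names are made up for illustration, and the only assumption is that IDs may or may not carry a leading 0x.

```python
# Abbreviated copies of the span and the log record printed above.
span = {
    "name": "/rolldice",
    "context": {
        "trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
        "span_id": "0x5c2b0f851030d17d",
    },
}
log_record = {
    "body": "Anonymous player is rolling the dice: 3",
    "trace_id": "0xdb1fc322141e64eb84f5bd8a8b1c6d1f",
    "span_id": "0x5c2b0f851030d17d",
    "attributes": {
        "otelTraceID": "db1fc322141e64eb84f5bd8a8b1c6d1f",
        "otelSpanID": "5c2b0f851030d17d",
    },
}


def strip_hex_prefix(value):
    """Drop a leading '0x' so both ID notations compare equal."""
    return value[2:] if value.startswith("0x") else value


def belongs_to_span(record, span):
    """True when the log record carries the span's trace ID and span ID."""
    ctx = span["context"]
    return (
        strip_hex_prefix(record["trace_id"]) == strip_hex_prefix(ctx["trace_id"])
        and strip_hex_prefix(record["span_id"]) == strip_hex_prefix(ctx["span_id"])
    )
```

This shared pair of IDs is what lets a backend jump from a log line to the request that produced it.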

Send a few requests to the endpoint, then wait a moment or terminate the application. Metrics like the following appear in the console output:

{
"resource_metrics": [
{
"resource": {
"attributes": {
"service.name": "unknown_service",
"telemetry.auto.version": "0.34b0",
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.13.0"
},
"schema_url": ""
},
"schema_url": "",
"scope_metrics": [
{
"metrics": [
{
"data": {
"aggregation_temporality": 2,
"data_points": [
{
"attributes": {
"http.flavor": "1.1",
"http.host": "localhost:5000",
"http.method": "GET",
"http.scheme": "http",
"http.server_name": "127.0.0.1"
},
"start_time_unix_nano": 1666077040061693305,
"time_unix_nano": 1666077098181107419,
"value": 0
}
],
"is_monotonic": false
},
"description": "measures the number of concurrent HTTP requests that are currently in-flight",
"name": "http.server.active_requests",
"unit": "requests"
},
{
"data": {
"aggregation_temporality": 2,
"data_points": [
{
"attributes": {
"http.flavor": "1.1",
"http.host": "localhost:5000",
"http.method": "GET",
"http.scheme": "http",
"http.server_name": "127.0.0.1",
"http.status_code": 200,
"net.host.port": 5000
},
"bucket_counts": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"count": 1,
"explicit_bounds": [
0, 5, 10, 25, 50, 75, 100, 250, 500, 1000
],
"max": 1,
"min": 1,
"start_time_unix_nano": 1666077040063027610,
"sum": 1,
"time_unix_nano": 1666077098181107419
}
]
},
"description": "measures the duration of the inbound HTTP request",
"name": "http.server.duration",
"unit": "ms"
}
],
"schema_url": "",
"scope": {
"name": "opentelemetry.instrumentation.flask",
"schema_url": "",
"version": "0.34b0"
}
}
]
}
]
}
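
In the http.server.duration data point above, bucket_counts has one more entry than explicit_bounds, because the boundaries split the number line into len(bounds) + 1 buckets. The sketch below shows how a measured value maps to a bucket, assuming each bucket includes its upper bound (which is how the OpenTelemetry explicit-bucket histogram is specified, to the best of my knowledge). The helper names are made up for illustration.

```python
from bisect import bisect_left

# Boundaries and counts copied from the http.server.duration data point above.
EXPLICIT_BOUNDS = [0, 5, 10, 25, 50, 75, 100, 250, 500, 1000]
BUCKET_COUNTS = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]


def bucket_index(value, bounds):
    """Index of the bucket holding `value`, assuming inclusive upper bounds."""
    return bisect_left(bounds, value)


def bucket_label(index, bounds):
    """Human-readable range for a bucket index."""
    lower = "-inf" if index == 0 else str(bounds[index - 1])
    upper = "+inf" if index == len(bounds) else str(bounds[index])
    closing = ")" if index == len(bounds) else "]"
    return f"({lower}, {upper}{closing}"
```

The single recorded request took 1 ms (min, max, and sum are all 1), so it lands in bucket index 1, the range (0, 5], which is exactly where the one non-zero count sits.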

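Combining manual and automatic instrumentation

Automatic instrumentation collects telemetry from commonly used components, such as HTTP request information, but it cannot observe application business data on its own. To collect business-level telemetry, you need manual instrumentation. The following examples combine manual and automatic instrumentation: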

Collecting traces manually

First, modify the app.py code from above: acquire a tracer and use it to create a new span.

from random import randint
from flask import Flask
from opentelemetry import trace

# Acquire a tracer
tracer = trace.get_tracer("diceroller.tracer")

app = Flask(__name__)

@app.route("/rolldice")
def roll_dice():
    return str(roll())

def roll():
    # This creates a new span that's the child of the current one
    with tracer.start_as_current_span("roll") as rollspan:
        res = randint(1, 6)
        rollspan.set_attribute("roll.value", res)
        return res

Restart the application with the OTel Python agent:

export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
opentelemetry-instrument \
    --traces_exporter console \
    --metrics_exporter console \
    --logs_exporter console \
    --service_name dice-server \
    flask run -p 8080

Now, when you send a request to the Flask web server, you see two spans, as shown below:

{
"name": "roll",
"context": {
"trace_id": "0x6f781c83394ed2f33120370a11fced47",
"span_id": "0x623321c35b8fa837",
"trace_state": "[]"
},
"kind": "SpanKind.INTERNAL",
"parent_id": "0x09abe52faf1d80d5",
"start_time": "2023-10-10T08:18:28.679261Z",
"end_time": "2023-10-10T08:18:28.679560Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"roll.value": "6"
},
"events": [],
"links": [],
"resource": {
"attributes": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.17.0",
"service.name": "dice-server",
"telemetry.auto.version": "0.38b0"
},
"schema_url": ""
}
}
{
"name": "/rolldice",
"context": {
"trace_id": "0x6f781c83394ed2f33120370a11fced47",
"span_id": "0x09abe52faf1d80d5",
"trace_state": "[]"
},
"kind": "SpanKind.SERVER",
"parent_id": null,
"start_time": "2023-10-10T08:18:28.678348Z",
"end_time": "2023-10-10T08:18:28.679677Z",
"status": {
"status_code": "UNSET"
},
"attributes": {
"http.method": "GET",
"http.server_name": "127.0.0.1",
"http.scheme": "http",
"net.host.port": 8080,
"http.host": "localhost:8080",
"http.target": "/rolldice?rolls=12",
"net.peer.ip": "127.0.0.1",
"http.user_agent": "curl/8.1.2",
"net.peer.port": 58485,
"http.flavor": "1.1",
"http.route": "/rolldice",
"http.status_code": 200
},
"events": [],
"links": [],
"resource": {
"attributes": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.17.0",
"service.name": "dice-server",
"telemetry.auto.version": "0.38b0"
},
"schema_url": ""
}
}

The parent_id of the roll span is the same as the span_id of the /rolldice span, which shows their parent-child relationship.
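
The same relationship can be reconstructed from the exported records. The sketch below takes the IDs of the two spans printed above and renders them as an indented tree. The function name render is made up for illustration.

```python
# IDs copied from the two spans printed above.
spans = [
    {"name": "roll", "span_id": "0x623321c35b8fa837", "parent_id": "0x09abe52faf1d80d5"},
    {"name": "/rolldice", "span_id": "0x09abe52faf1d80d5", "parent_id": None},
]


def render(spans, parent_id=None, depth=0):
    """Return indented lines, one per span, with children under their parent."""
    lines = []
    for span in spans:
        if span["parent_id"] == parent_id:
            lines.append("  " * depth + span["name"])
            lines.extend(render(spans, span["span_id"], depth + 1))
    return lines
```

Trace backends perform essentially this grouping, on a larger scale, when they draw a trace as a tree or waterfall.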

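Collecting metrics manually

Modify app.py to initialize a meter, then use the meter to create a Counter instrument that counts the number of rolls for each possible roll value.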

# These are the necessary import declarations
from opentelemetry import trace
from opentelemetry import metrics
from random import randint
from flask import Flask, request
import logging

# Acquire a tracer
tracer = trace.get_tracer("diceroller.tracer")
# Acquire a meter.
meter = metrics.get_meter("diceroller.meter")

# Now create a counter instrument to make measurements with
roll_counter = meter.create_counter(
    "dice.rolls",
    description="The number of rolls by roll value",
)

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

@app.route("/rolldice")
def roll_dice():
    # This creates a new span that's the child of the current one
    with tracer.start_as_current_span("roll") as roll_span:
        player = request.args.get('player', default=None, type=str)
        result = str(roll())
        roll_span.set_attribute("roll.value", result)
        # This adds 1 to the counter for the given roll value
        roll_counter.add(1, {"roll.value": result})
        # logging uses %-style placeholders; logger.warn is a deprecated alias
        if player:
            logger.warning("%s is rolling the dice: %s", player, result)
        else:
            logger.warning("Anonymous player is rolling the dice: %s", result)
        return result

def roll():
    return randint(1, 6)

Restart the application:

export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
opentelemetry-instrument \
    --traces_exporter console \
    --metrics_exporter console \
    --logs_exporter console \
    --service_name dice-server \
    flask run -p 8080

When you send requests to the server, the roll counter metric is printed to the console, with a separate count for each roll value:

{
"resource_metrics": [
{
"resource": {
"attributes": {
"telemetry.sdk.language": "python",
"telemetry.sdk.name": "opentelemetry",
"telemetry.sdk.version": "1.17.0",
"service.name": "dice-server",
"telemetry.auto.version": "0.38b0"
},
"schema_url": ""
},
"scope_metrics": [
{
"scope": {
"name": "opentelemetry.instrumentation.flask",
"version": "0.38b0",
"schema_url": ""
},
"metrics": [
{
"name": "http.server.active_requests",
"description": "measures the number of concurrent HTTP requests that are currently in-flight",
"unit": "requests",
"data": {
"data_points": [
{
"attributes": {
"http.method": "GET",
"http.host": "localhost:8080",
"http.scheme": "http",
"http.flavor": "1.1",
"http.server_name": "127.0.0.1"
},
"start_time_unix_nano": 1696926005694857000,
"time_unix_nano": 1696926063549782000,
"value": 0
}
],
"aggregation_temporality": 2,
"is_monotonic": false
}
},
{
"name": "http.server.duration",
"description": "measures the duration of the inbound HTTP request",
"unit": "ms",
"data": {
"data_points": [
{
"attributes": {
"http.method": "GET",
"http.host": "localhost:8080",
"http.scheme": "http",
"http.flavor": "1.1",
"http.server_name": "127.0.0.1",
"net.host.port": 8080,
"http.status_code": 200
},
"start_time_unix_nano": 1696926005695798000,
"time_unix_nano": 1696926063549782000,
"count": 7,
"sum": 6,
"bucket_counts": [
1, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
],
"explicit_bounds": [
0.0, 5.0, 10.0, 25.0, 50.0, 75.0, 100.0, 250.0, 500.0,
750.0, 1000.0, 2500.0, 5000.0, 7500.0, 10000.0
],
"min": 0,
"max": 1
}
],
"aggregation_temporality": 2
}
}
],
"schema_url": ""
},
{
"scope": {
"name": "diceroller.meter",
"version": "",
"schema_url": ""
},
"metrics": [
{
"name": "dice.rolls",
"description": "The number of rolls by roll value",
"unit": "",
"data": {
"data_points": [
{
"attributes": {
"roll.value": "5"
},
"start_time_unix_nano": 1696926005695491000,
"time_unix_nano": 1696926063549782000,
"value": 3
},
{
"attributes": {
"roll.value": "6"
},
"start_time_unix_nano": 1696926005695491000,
"time_unix_nano": 1696926063549782000,
"value": 1
},
{
"attributes": {
"roll.value": "1"
},
"start_time_unix_nano": 1696926005695491000,
"time_unix_nano": 1696926063549782000,
"value": 1
},
{
"attributes": {
"roll.value": "3"
},
"start_time_unix_nano": 1696926005695491000,
"time_unix_nano": 1696926063549782000,
"value": 1
},
{
"attributes": {
"roll.value": "4"
},
"start_time_unix_nano": 1696926005695491000,
"time_unix_nano": 1696926063549782000,
"value": 1
}
],
"aggregation_temporality": 2,
"is_monotonic": true
}
}
],
"schema_url": ""
}
],
"schema_url": ""
}
]
}
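
A counter keeps one running total per distinct attribute set, which is why dice.rolls has one data point per roll value. The sketch below mimics that behaviour with a plain collections.Counter and checks it against the numbers printed above. It is a stand-in for illustration, not the SDK's aggregation code, and the request order in the usage note is invented.

```python
from collections import Counter

# (roll.value, count) pairs copied from the dice.rolls data points above.
DICE_ROLLS = {"5": 3, "6": 1, "1": 1, "3": 1, "4": 1}
HTTP_REQUEST_COUNT = 7  # "count" of the http.server.duration data point


def simulate_counter(results):
    """Mimic roll_counter.add(1, {"roll.value": result}) with a plain Counter."""
    totals = Counter()
    for result in results:
        totals[result] += 1  # one running total per distinct attribute value
    return dict(totals)
```

For instance, simulate_counter(["5", "6", "5", "1", "3", "5", "4"]) reproduces the five data points, and their values add up to 7, the same as the number of HTTP requests recorded by the automatic Flask instrumentation.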

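Sending telemetry to the OTel Collector

The OTel Collector is a key component in most production deployments. Some of the advantages of using it:

● A single telemetry collector shared by multiple services, which reduces the overhead of switching exporters

● A central place to process traces before they are sent to the backend, which avoids duplicated processing

● Traces can be aggregated across multiple services running on multiple hosts

Unless you have only a single service or are just testing, using a Collector is recommended for production deployments.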

Configure and start a local OTel Collector

First, save the following OTel Collector configuration to a file in the /tmp/ directory:

/tmp/otel-collector-config.yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
exporters:
  # NOTE: Prior to v0.86.0 use `logging` instead of `debug`.
  debug:
    verbosity: detailed
processors:
  batch:
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
      processors: [batch]
    metrics:
      receivers: [otlp]
      exporters: [debug]
      processors: [batch]
    logs:
      receivers: [otlp]
      exporters: [debug]
      processors: [batch]

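The configuration above receives data over the OTLP protocol, meaning the OTel Python agent and the OTel Collector communicate via OTLP, and prints the received data to the OTel Collector console. You can also export the data to Prometheus or Jaeger. For detailed configuration, see: OTel Collector configuration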
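
Every name listed in a pipeline must refer to a component declared in the matching top-level section. The sketch below expresses the configuration above as a Python dict (hand-transcribed, since the standard library has no YAML parser) and checks that wiring. The function undefined_components is a made-up helper, not part of the Collector.

```python
# The configuration above as a Python dict (hand-transcribed; no YAML parser used).
CONFIG = {
    "receivers": {"otlp": {"protocols": {
        "grpc": {"endpoint": "0.0.0.0:4317"},
        "http": {"endpoint": "0.0.0.0:4318"},
    }}},
    "exporters": {"debug": {"verbosity": "detailed"}},
    "processors": {"batch": None},
    "service": {"pipelines": {
        signal: {"receivers": ["otlp"], "exporters": ["debug"], "processors": ["batch"]}
        for signal in ("traces", "metrics", "logs")
    }},
}


def undefined_components(config):
    """Return (pipeline, kind, name) for every reference to an undeclared component."""
    problems = []
    for pipeline, spec in config["service"]["pipelines"].items():
        for kind in ("receivers", "exporters", "processors"):
            for name in spec.get(kind, []):
                if name not in config.get(kind, {}):
                    problems.append((pipeline, kind, name))
    return problems
```

For the configuration in this guide the function returns an empty list. If, say, the exporters section were removed, each of the three pipelines would be reported as referencing an undeclared debug exporter.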

Then run the following Docker command to pull and run the OTel Collector with this configuration:

docker run -p 4317:4317 \
    -v /tmp/otel-collector-config.yaml:/etc/otel-collector-config.yaml \
    otel/opentelemetry-collector:latest \
    --config=/etc/otel-collector-config.yaml

You now have an OTel Collector instance running locally, listening on port 4317.
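
Before pointing the application at the Collector, it can be useful to confirm that something is accepting connections on that port. The sketch below is a plain TCP reachability check using the standard library. It only shows that the port is open, not that the listener actually speaks OTLP, and the function name is_port_open is made up.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

With the container from the previous step running, is_port_open("localhost", 4317) would be expected to return True.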

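Modify the OTel Python agent launch command to report traces and metrics via OTLP

The next step is to modify the command so that traces and metrics are sent to the OTel Collector via OTLP instead of being printed to the console.

First, install the OTLP exporter: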

pip install opentelemetry-exporter-otlp

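opentelemetry-instrument detects the package you just installed and defaults to OTLP export the next time it runs.

Start the application

Run the application as before, but without exporting to the console: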

export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
opentelemetry-instrument --logs_exporter otlp flask run -p 8080

By default, opentelemetry-instrument exports traces and metrics over OTLP/gRPC and sends them to localhost:4317, which is where the OTel Collector started above is listening.
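
If the Collector runs somewhere other than localhost, the target can be changed with the standard OTEL_EXPORTER_OTLP_ENDPOINT environment variable. The sketch below is a simplified illustration of that precedence, not the SDK's own code: it ignores the signal-specific variables, and the function name resolve_otlp_endpoint is made up.

```python
import os

DEFAULT_OTLP_GRPC_ENDPOINT = "http://localhost:4317"


def resolve_otlp_endpoint(environ=None):
    """Pick the endpoint from OTEL_EXPORTER_OTLP_ENDPOINT, else the local default."""
    environ = os.environ if environ is None else environ
    return environ.get("OTEL_EXPORTER_OTLP_ENDPOINT") or DEFAULT_OTLP_GRPC_ENDPOINT
```

With no variable set, the result is the local default used in this guide. Exporting OTEL_EXPORTER_OTLP_ENDPOINT before running opentelemetry-instrument redirects the data to another Collector.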

Now, when you access the /rolldice route, the output appears in the OTel Collector process instead of the Flask process:

2022-06-09T20:43:39.915Z DEBUG debugexporter/debug_exporter.go:51 ResourceSpans #0
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.12.0rc1)
-> telemetry.auto.version: STRING(0.31b0)
-> service.name: STRING(unknown_service)
InstrumentationLibrarySpans #0
InstrumentationLibrary app
Span #0
Trace ID : 7d4047189ac3d5f96d590f974bbec20a
Parent ID : 0b21630539446c31
ID : 4d18cee9463a79ba
Name : roll
Kind : SPAN_KIND_INTERNAL
Start time : 2022-06-09 20:43:37.390134089 +0000 UTC
End time : 2022-06-09 20:43:37.390327687 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> roll.value: INT(5)
InstrumentationLibrarySpans #1
InstrumentationLibrary opentelemetry.instrumentation.flask 0.31b0
Span #0
Trace ID : 7d4047189ac3d5f96d590f974bbec20a
Parent ID :
ID : 0b21630539446c31
Name : /rolldice
Kind : SPAN_KIND_SERVER
Start time : 2022-06-09 20:43:37.388733595 +0000 UTC
End time : 2022-06-09 20:43:37.390723792 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Attributes:
-> http.method: STRING(GET)
-> http.server_name: STRING(127.0.0.1)
-> http.scheme: STRING(http)
-> net.host.port: INT(5000)
-> http.host: STRING(localhost:5000)
-> http.target: STRING(/rolldice)
-> net.peer.ip: STRING(127.0.0.1)
-> http.user_agent: STRING(curl/7.82.0)
-> net.peer.port: INT(53878)
-> http.flavor: STRING(1.1)
-> http.route: STRING(/rolldice)
-> http.status_code: INT(200)
2022-06-09T20:43:40.025Z INFO debugexporter/debug_exporter.go:56 MetricsExporter {"#metrics": 1}
2022-06-09T20:43:40.025Z DEBUG debugexporter/debug_exporter.go:66 ResourceMetrics #0
Resource labels:
-> telemetry.sdk.language: STRING(python)
-> telemetry.sdk.name: STRING(opentelemetry)
-> telemetry.sdk.version: STRING(1.12.0rc1)
-> telemetry.auto.version: STRING(0.31b0)
-> service.name: STRING(unknown_service)
InstrumentationLibraryMetrics #0
InstrumentationLibrary app
Metric #0
Descriptor:
-> Name: roll_counter
-> Description: The number of rolls by roll value
-> Unit:
-> DataType: Sum
-> IsMonotonic: true
-> AggregationTemporality: AGGREGATION_TEMPORALITY_CUMULATIVE
NumberDataPoints #0
Data point attributes:
-> roll.value: INT(5)
StartTimestamp: 2022-06-09 20:43:37.390226915 +0000 UTC
Timestamp: 2022-06-09 20:43:39.848587966 +0000 UTC
Value: 1

If the OTel Collector is configured to export to Jaeger, the corresponding trace records appear in Jaeger.


observability.cn Authors 2024 | Documentation Distributed under CC-BY-4.0