feat(apps): export logs to open telemetry endpoint #1617

Open: wants to merge 17 commits into base: main
66 changes: 62 additions & 4 deletions README.md
@@ -126,11 +126,12 @@ These health checks are integrated with Azure Container Apps' health probe syste

## Observability with OpenTelemetry

This project uses OpenTelemetry for distributed tracing and metrics collection. The setup includes:
This project uses OpenTelemetry for distributed tracing, metrics collection, and logging. The setup includes:

### Core Features
- Distributed tracing across services
- Runtime and application metrics
- Log aggregation and correlation
- Integration with Azure Monitor/Application Insights
- Support for both OTLP and Azure Monitor exporters
- Automatic instrumentation for:
@@ -157,15 +158,72 @@ OpenTelemetry is configured through environment variables that are automatically
### Local Development

For local development, the project includes a docker-compose setup with:
- OpenTelemetry Collector
- Grafana
- Other supporting services
- OpenTelemetry Collector (ports 4317/4318 for OTLP receivers)
- Grafana (port 3000)
- Jaeger (port 16686)
- Loki (port 3100)
- Prometheus (port 9090)

To run the local observability stack:
```bash
podman compose -f docker-compose-otel.yml up
```

### Accessing Observability Tools

Once the local stack is running, you can access the following tools:

#### Distributed Tracing with Jaeger
- URL: http://localhost:16686
- Features:
  - View distributed traces across services
  - Search by service, operation, or trace ID
  - Analyze timing and dependencies
  - Debug request flows and errors

#### Metrics with Prometheus
- URL: http://localhost:9090
- Features:
  - Query raw metrics data
  - View metric targets and service discovery
  - Debug metric collection

#### Log Aggregation with Loki
- Direct URL: http://localhost:3100
- Grafana Integration: http://localhost:3000 (preferred interface)
- Features:
  - Search and filter logs across all services
  - Correlate logs with traces using trace IDs
  - Create log-based alerts and dashboards
  - Use LogQL to query logs:
    ```logql
    # Example: Find all error logs
    {container="web-api"} |= "error"

    # Example: Find logs that contain any trace ID
    # (replace the pattern with a concrete 32-character hex ID to match one trace)
    {container=~"web-api|graphql"} |~ "trace_id=([a-f0-9]{32})"
    ```
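
The LogQL examples above can also be run from a shell against Loki's HTTP API. The following is a minimal sketch, not part of this PR: it assumes the `container` label from the README examples is present on the log streams and that Loki is reachable on `localhost:3100`. The function names `logql_for_trace` and `query_loki_for_trace` are hypothetical helpers introduced for illustration.

```bash
#!/usr/bin/env bash
# Build a LogQL query that matches log lines containing one specific trace ID.
# Assumption: streams carry the `container` label used in the README examples.
logql_for_trace() {
  local container_regex="$1" trace_id="$2"
  # Trace IDs are expected to be 32 lowercase hex characters.
  if ! printf '%s' "$trace_id" | grep -Eq '^[a-f0-9]{32}$'; then
    echo "invalid trace id: $trace_id" >&2
    return 1
  fi
  printf '{container=~"%s"} |= "%s"\n' "$container_regex" "$trace_id"
}

# Send the query to Loki's query_range endpoint (hypothetical wrapper).
# Without explicit start/end parameters Loki uses its default time range.
query_loki_for_trace() {
  local query
  query="$(logql_for_trace "$1" "$2")" || return 1
  curl -sG "http://localhost:3100/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=100"
}
```

For example, `query_loki_for_trace 'web-api|graphql' <trace-id>` would return the matching log lines as JSON once the local stack is running.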

#### Metrics and Dashboards in Grafana
- URL: http://localhost:3000
- Features:
  - Pre-configured dashboards for:
    - Application metrics
    - Runtime metrics
    - HTTP request metrics
  - Data sources:
    - Prometheus (metrics)
    - Loki (logs)
    - Jaeger (traces)
  - Create custom dashboards
  - Set up alerts

#### OpenTelemetry Collector Endpoints
- OTLP gRPC receiver: localhost:4317
- OTLP HTTP receiver: localhost:4318
- Collector's own metrics (Prometheus format): localhost:8888
- Prometheus exporter metrics: localhost:8889
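
To check that the OTLP HTTP receiver listed above accepts logs, a hand-built log record can be posted to it. This is a rough sketch under stated assumptions: the receiver accepts OTLP/JSON on the standard `/v1/logs` path, and the service name and message are illustrative values. `otlp_log_payload` and `send_test_log` are hypothetical helper names, not part of the project.

```bash
#!/usr/bin/env bash
# Build a minimal OTLP/JSON logs payload holding a single log record.
# Note: values are inserted verbatim, so they must not contain quotes or backslashes.
otlp_log_payload() {
  local service="$1" message="$2"
  # timeUnixNano is a string of nanoseconds since the epoch (second precision here).
  local now_ns
  now_ns="$(date +%s)000000000"
  printf '{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"%s"}}]},' "$service"
  printf '"scopeLogs":[{"logRecords":[{"timeUnixNano":"%s","severityText":"INFO",' "$now_ns"
  printf '"body":{"stringValue":"%s"}}]}]}]}\n' "$message"
}

# Post the payload to the collector's OTLP HTTP receiver on port 4318.
send_test_log() {
  otlp_log_payload "$1" "$2" |
    curl -s -X POST "http://localhost:4318/v1/logs" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

Running `send_test_log demo-service "hello from curl"` against the local stack should make the record appear in the collector's debug output and, with the logs pipeline from this PR, in Loki.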

### Request Filtering

The telemetry setup includes smart filtering to:
23 changes: 23 additions & 0 deletions docker-compose-otel.yml
@@ -22,6 +22,12 @@ services:
      - "14250:14250" # Model used by collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:16686"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Prometheus for metrics
  prometheus:
@@ -31,6 +37,21 @@ services:
    ports:
      - "9090:9090"

  # Loki for log aggregation
  loki:
    image: grafana/loki:3.2.2
    ports:
      - "3100:3100"
    volumes:
      - ./local-otel-configuration/loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    healthcheck:
      test: ["CMD-SHELL", "wget -q --tries=1 -O- http://localhost:3100/ready"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Grafana for metrics visualization
  grafana:
    image: grafana/grafana:11.4.0
@@ -43,3 +64,5 @@ services:
      - ./local-otel-configuration/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - ./local-otel-configuration/grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml
      - ./local-otel-configuration/dashboards:/etc/grafana/provisioning/dashboards
    depends_on:
      - loki
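
The healthcheck settings above (`interval: 3s`, `retries: 10`) amount to a bounded retry loop. A script that needs to wait for Loki from the host could mirror them roughly as follows. This is a sketch under the assumption that port 3100 is published on localhost as in the compose file; `retry` and `loki_ready` are hypothetical helpers, not part of this PR.

```bash
#!/usr/bin/env bash
# Retry a command until it succeeds or the attempts run out.
# Arguments: <retries> <interval_seconds> <command...>
retry() {
  local retries="$1" interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only when another attempt will follow.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Probe Loki's readiness endpoint, the same URL the compose healthcheck uses.
loki_ready() {
  wget -q --tries=1 -O- "http://localhost:3100/ready" >/dev/null
}
```

Usage would look like `retry 10 3 loki_ready && echo "Loki is ready"`.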
50 changes: 23 additions & 27 deletions local-otel-configuration/dashboards/runtime-metrics.json
@@ -85,13 +85,19 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_gc_heap_size_bytes",
"legendFormat": "Heap Size",
"refId": "A"
},
{
"expr": "dialogporten_process_runtime_dotnet_gc_committed_memory_size_bytes",
"legendFormat": "Committed Memory",
"refId": "B"
},
{
"expr": "dialogporten_dotnet_process_memory_working_set_bytes",
"legendFormat": "Working Set",
"refId": "C"
}
]
},
@@ -171,13 +177,14 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_gc_collections_count_total[5m])",
"legendFormat": "Gen {{generation}}",
"legendFormat": "Collections/sec",
"refId": "A"
},
{
"expr": "rate(dialogporten_process_runtime_dotnet_gc_duration_nanoseconds_total[5m])",
"legendFormat": "GC Duration/sec",
"refId": "B"
}
]
},
@@ -257,22 +264,19 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_thread_pool_queue_length",
"legendFormat": "Queue Length",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_thread_pool_threads_count",
"legendFormat": "Thread Count",
"refId": "B"
},
{
"expr": "rate(dialogporten_process_runtime_dotnet_thread_pool_completed_items_count_total[5m])",
"legendFormat": "Completed Items/sec",
"refId": "C"
}
]
},
@@ -352,20 +356,12 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_exceptions_count_total[$__rate_interval])",
"expr": "rate(dialogporten_process_runtime_dotnet_exceptions_count_total[5m])",
"legendFormat": "Exceptions/sec",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_monitor_lock_contention_count_total[$__rate_interval])",
"expr": "rate(dialogporten_process_runtime_dotnet_monitor_lock_contention_count_total[5m])",
"legendFormat": "Lock Contentions/sec",
"refId": "B"
}
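
The panels in this file use `rate(...[5m])`, which reports the average per-second increase of a counter over a fixed 300-second window. The arithmetic can be illustrated with a deliberately simplified sketch: unlike Prometheus's real `rate()`, it ignores counter resets and extrapolation. `simple_rate` is a hypothetical helper, not something the dashboard or this PR defines.

```bash
#!/usr/bin/env bash
# Average per-second increase of a counter between two samples.
# Arguments: <first_value> <last_value> <window_seconds>
simple_rate() {
  awk -v first="$1" -v last="$2" -v window="$3" \
    'BEGIN { printf "%g\n", (last - first) / window }'
}
```

For instance, a counter that grows from 100 to 400 over a 5-minute window gives `simple_rate 100 400 300`, which is 1 per second.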
9 changes: 8 additions & 1 deletion local-otel-configuration/grafana-datasources.yml
@@ -5,4 +5,11 @@ datasources:
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
45 changes: 45 additions & 0 deletions local-otel-configuration/loki-config.yaml
@@ -0,0 +1,45 @@
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki

compactor:
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /tmp/loki/tsdb-index
    cache_location: /tmp/loki/tsdb-cache
    cache_ttl: 24h
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  allow_structured_metadata: true
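
With `reject_old_samples: true` and `reject_old_samples_max_age: 168h`, log entries older than seven days (604800 seconds) are refused at ingestion. The check can be approximated as below. This is a simplified illustration of the limit, not Loki's actual implementation, and `is_too_old` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Succeed (exit 0) when a log timestamp is older than the allowed maximum age.
# Arguments: <log_epoch_seconds> <now_epoch_seconds> <max_age_hours>
is_too_old() {
  local log_ts="$1" now="$2" max_age_hours="$3"
  local max_age_seconds=$((max_age_hours * 3600))
  [ $((now - log_ts)) -gt "$max_age_seconds" ]
}
```

A replayed log line could be screened with `is_too_old "$log_ts" "$(date +%s)" 168` before it is sent to the local stack.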
4 changes: 3 additions & 1 deletion local-otel-configuration/otel-collector-config.yaml
@@ -25,6 +25,8 @@ exporters:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200
  otlphttp:
    endpoint: "http://loki:3100/otlp"

extensions:
  health_check:
@@ -49,4 +51,4 @@ service:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
      exporters: [otlphttp, debug]