
kubernetes-operations


Updated: March 2, 2026
SKILL.md Frontmatter
name: kubernetes-operations
description: Kubernetes cluster operations on minikube including observability (Grafana, Prometheus, Alertmanager, Loki, Tempo), debugging (kubectl debug, ephemeral containers), and cluster management (ArgoCD). Use when working with cluster/manifests/, Kubernetes workloads, pods, deployments, operators, controllers, or cluster components.
context: fork
keywords: kubernetes, k8s, minikube, grafana, prometheus, loki, tempo, argocd, pod, クラスタ (cluster), 監視 (monitoring), kubectl, deployment
  • Access Grafana at https://grafana.minikube.127.0.0.1.nip.io
  • Disable ArgoCD selfHeal before making manual changes, and re-enable it afterward

Debugging

| Investigating | Tool | Entry Point |
|---|---|---|
| Traces, metrics, logs, profiles | Grafana | https://grafana.minikube.127.0.0.1.nip.io |
| Pod startup failures, events | kubectl | `kubectl get events -n <namespace>` |
| In-container processes/network | kubectl debug | Ephemeral container |

Observability Investigation (via Grafana)

Use Grafana for all observability signals. Do NOT query backends (Tempo, Mimir, Loki) directly.

  1. Get query parameters: check cluster/manifests/<app>/ for the namespace, labels, and OTEL_SERVICE_NAME (see Queries for where each parameter is defined)
  2. Open Grafana: go to https://grafana.minikube.127.0.0.1.nip.io in a browser
  3. Select the datasource and query the appropriate signal:
| Signal | Backend | Datasource | Query Language | Use Case |
|---|---|---|---|---|
| Traces | Tempo | Tempo | TraceQL | Request flow, latency, outbound call destinations |
| Metrics | Mimir (Prometheus) | Mimir | PromQL | Resource usage, HTTP rates, alerting |
| Logs | Loki | Loki | LogQL | Error investigation, audit |
| Profiles | Pyroscope | Pyroscope | Flamegraph UI | CPU/memory hotspots |
| Probes | Blackbox Exporter | Mimir | PromQL | Endpoint reachability |
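Step 1 can be partly scripted. The function below is a minimal sketch. `query_params` and its optional second argument (the manifests root, defaulting to cluster/manifests) are hypothetical. It assumes plain YAML manifests in which the environment variable appears as `- name: OTEL_SERVICE_NAME` followed by a `value:` line. Labels are left out because their keys vary per app, and the Queries reference remains the authority on where each parameter lives.

```shell
# Hypothetical helper: print the namespace(s) and OTEL_SERVICE_NAME value(s)
# found under cluster/manifests/<app>/. Assumes plain YAML where the env var
# is written as "- name: OTEL_SERVICE_NAME" with "value:" on the next line.
query_params() {
  local dir="${2:-cluster/manifests}/$1"
  if [ ! -d "$dir" ]; then
    echo "no manifests for $1 in $dir" >&2
    return 1
  fi
  # Every "namespace: <value>" line, deduplicated.
  grep -rhoE '^[[:space:]]*namespace:[[:space:]]*[^[:space:]]+' "$dir" \
    | awk '{print "namespace=" $2}' | sort -u
  # The line after each OTEL_SERVICE_NAME entry carries its value (quotes stripped).
  grep -rh -A1 'name: OTEL_SERVICE_NAME' "$dir" \
    | grep -oE 'value:[[:space:]]*"?[^"[:space:]]+' \
    | sed -E 's/value:[[:space:]]*"?/service=/' | sort -u
}
```

Run it from the repository root as `query_params <app>`. The printed namespace and service name are the values to plug into the Loki and Tempo queries in Grafana.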
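| Symptom | Signal | Query Approach |
|---|---|---|
| Errors in logs | Loki → Tempo | Extract the traceid from the logs, then open the trace in Tempo |
| Latency / 5xx | Tempo | Search traces with `{ status = error }` for 5xx; for latency, filter by duration (e.g., `{ duration > 1s }`) |
| Unknown outbound dependencies | Tempo | Search traces and inspect spans for outbound calls |
| Resource saturation | Mimir | Query CPU/memory metrics |
| High CPU/memory | Pyroscope | Check flamegraphs to locate hot code paths |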
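For the Loki → Tempo row, the trace ID has to be lifted out of a log line before it can be searched in Tempo. This is a minimal sketch. It assumes the logs label the field traceid or trace_id (as key=value or as a JSON key) and carry W3C-style trace IDs of 32 hex characters. `extract_trace_ids` is a hypothetical name. It works on log lines copied out of Grafana, so it does not query Loki directly.

```shell
# Hypothetical helper: read log lines on stdin and print the unique trace IDs.
# Assumes the field is labelled "traceid" or "trace_id" (key=value or JSON key)
# and that IDs are W3C trace-context IDs: 32 hex characters.
extract_trace_ids() {
  grep -oiE '"?trace_?id"?[[:space:]]*[=:][[:space:]]*"?[0-9a-f]{32}' \
    | grep -oiE '[0-9a-f]{32}$' \
    | sort -u
}
```

Paste the lines into a file and run `extract_trace_ids < lines.txt`. Then search each ID with the Tempo datasource in Grafana.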

Pod Direct Investigation (via kubectl)

Use kubectl when Grafana cannot answer the question (e.g., pod not starting, container-level inspection).

| Symptom | Action |
|---|---|
| Pod not starting | `kubectl get events -n <namespace>` |
| CrashLoopBackOff | `kubectl logs <pod> -n <namespace> --previous` |
| Network connectivity | `kubectl debug` with an ephemeral container |
| Process inspection | `kubectl debug` with an ephemeral container |
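The first two rows can be combined into one read-only pass over a failing pod. This is a minimal sketch, and `triage_pod` is a hypothetical helper. The field selector narrows the events to the named pod instead of listing the whole namespace.

```shell
# Hypothetical helper: events for one pod, then logs of its previous
# (crashed) container instance. Read-only, so selfHeal can stay enabled.
triage_pod() {
  local pod="$1" ns="$2"
  echo "== events for $pod =="
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
  echo "== previous container logs =="
  # Prints a notice instead if there is no previous (terminated) instance.
  kubectl logs "$pod" -n "$ns" --previous || echo "(no previous logs)"
}
```

Call it as `triage_pod <pod> <namespace>`. If the events show the pod was never scheduled or its image never pulled, the logs step will simply report that nothing is available.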

Ephemeral Container

kubectl debug <pod-name> -n <namespace> \
  --profile=restricted \
  --image=ghcr.io/hippocampus-dev/hippocampus/ephemeral-container:main \
  --target=<container-name> \
  -- <command>

Note: Do not pass the -it flags when running commands this way; they cause output streaming issues.
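Without -it, kubectl debug starts the ephemeral container and returns without attaching. The command's output therefore has to be read separately with `kubectl logs <pod> -c <debug-container>`. The sketch below wraps the command above on that basis. `debug_run`, the `DEBUG_CONTAINER_NAME` override, and the retry count are hypothetical choices. `--container` only gives the ephemeral container a known name.

```shell
# Hypothetical wrapper: run <command> in an ephemeral container (no -it),
# then fetch its output via kubectl logs on the named debug container.
debug_run() {
  local pod="$1" ns="$2" target="$3"
  shift 3
  local name="${DEBUG_CONTAINER_NAME:-debugger-$(date +%s)}"
  kubectl debug "$pod" -n "$ns" \
    --profile=restricted \
    --image=ghcr.io/hippocampus-dev/hippocampus/ephemeral-container:main \
    --target="$target" \
    --container="$name" \
    -- "$@" || return 1
  # The container may still be starting; retry until logs are readable.
  # Prints whatever output exists at that moment.
  local attempt
  for attempt in 1 2 3 4 5 6 7 8 9 10; do
    kubectl logs "$pod" -n "$ns" -c "$name" 2>/dev/null && return 0
    sleep 2
  done
  echo "no logs from $name yet; try: kubectl logs $pod -n $ns -c $name" >&2
  return 1
}
```

For example, `debug_run <pod-name> <namespace> <container-name> ss -tnp` lists TCP sockets, if that tool is present in the debug image. For long-running commands, rerun the printed `kubectl logs` command later.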

Manual Changes

Follow this procedure whenever you modify live cluster resources directly (e.g., kubectl apply, kubectl patch, kubectl delete) outside the GitOps workflow.

| Operation | Requires Manual-Change Procedure |
|---|---|
| `kubectl apply` / `patch` / `delete` | Yes |
| Debugging (read-only: logs, events, debug) | No |
| Grafana queries | No |
| Editing manifests in the repo | No (ArgoCD syncs automatically) |

ArgoCD selfHeal Control

ArgoCD reverts manual changes to live resources unless selfHeal is disabled first. Always re-enable selfHeal once the work is complete.

# Disable selfHeal (the field lives under spec.syncPolicy.automated)
kubectl patch application <app-name> -n argocd --type=merge \
  -p '{"spec":{"syncPolicy":{"automated":{"selfHeal":false}}}}'

# Re-enable selfHeal (after work is complete)
kubectl patch application <app-name> -n argocd --type=merge \
  -p '{"spec":{"syncPolicy":{"automated":{"selfHeal":true}}}}'
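Forgetting the re-enable step is the main risk of this procedure. The minimal sketch below makes it automatic. `with_selfheal_disabled` and `selfheal_patch` are hypothetical helpers that apply merge patches to selfHeal, which sits under spec.syncPolicy.automated in the ArgoCD Application spec. They re-enable selfHeal from an EXIT trap inside a subshell, so the re-enable runs even when the wrapped command fails.

```shell
# Hypothetical helper: set selfHeal on an ArgoCD Application.
# $1: application name, $2: true|false
selfheal_patch() {
  kubectl patch application "$1" -n argocd --type=merge \
    -p "{\"spec\":{\"syncPolicy\":{\"automated\":{\"selfHeal\":$2}}}}"
}

# Hypothetical wrapper: disable selfHeal, run the command, re-enable selfHeal.
with_selfheal_disabled() {
  local app="$1"
  shift
  (
    selfheal_patch "$app" false || exit 1
    # Armed only after a successful disable; fires when this subshell exits,
    # whether the wrapped command succeeded or failed.
    trap 'selfheal_patch "$app" true' EXIT
    "$@"
  )
}
```

Usage: `with_selfheal_disabled <app-name> kubectl apply -f <file>`. The trap only fires when the subshell exits normally or via an error. If the shell is killed outright (e.g., SIGKILL), run the re-enable patch by hand.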

External Service Access

Services exposed via Istio Gateway follow a tiered host naming convention:

| Host | Authentication | Access |
|---|---|---|
| `{service}.minikube.127.0.0.1.nip.io` | None | Local development |
| `{service}.kaidotio.dev` | OAuth2 (ext-authz) | Browser access |
| `{service}-public.kaidotio.dev` | Custom (JWT, method restriction, etc.) | Programmatic access excluded from OAuth2 |

The -public variant is used when OAuth2 is technically impossible (e.g., requests the browser itself generates for the W3C Reporting API) or when the service enforces its own AuthorizationPolicy (e.g., GitHub Actions OIDC token validation).

To access a service's data locally, use https://{service}.minikube.127.0.0.1.nip.io.
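The naming convention is mechanical enough to encode. This is a minimal sketch. `service_host` and the tier labels local, oauth, and public are hypothetical names for the three rows of the table. The function prints hosts only, because the scheme (https) is stated here only for the local tier.

```shell
# Hypothetical helper: map a service name and access tier to its host,
# following the three-tier naming convention above.
service_host() {
  local service="$1" tier="${2:-local}"
  case "$tier" in
    local)  echo "${service}.minikube.127.0.0.1.nip.io" ;;  # no auth, local dev
    oauth)  echo "${service}.kaidotio.dev" ;;               # OAuth2 via ext-authz
    public) echo "${service}-public.kaidotio.dev" ;;        # service-specific auth
    *)      echo "unknown tier: $tier (use local, oauth, or public)" >&2; return 1 ;;
  esac
}
```

For example, `service_host grafana` prints grafana.minikube.127.0.0.1.nip.io, the Grafana host used throughout this skill.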

Observability Stack Manifests

| Component | Path |
|---|---|
| Grafana | cluster/manifests/grafana/ |
| Tempo | cluster/manifests/tempo/ |
| Mimir | cluster/manifests/mimir/ |
| Loki | cluster/manifests/loki/ |
| Pyroscope | cluster/manifests/pyroscope/ |
| Prometheus | cluster/manifests/prometheus/ |
| Fluentd | cluster/manifests/fluentd/ |
| OpenTelemetry | cluster/manifests/otel-agent/, cluster/manifests/otel-collector/ |

Reference

When writing observability queries, see Queries.