Diagnostic Commands

Before troubleshooting, gather diagnostic information:
# Check operator pod status
kubectl get pods -n kestrel-ai -l app=kestrel-operator

# View operator logs
kubectl logs -n kestrel-ai -l app=kestrel-operator --tail=100

# Describe pod for events
kubectl describe pod -n kestrel-ai -l app=kestrel-operator

# Check service account permissions
kubectl auth can-i --list --as=system:serviceaccount:kestrel-ai:kestrel-operator

Common Issues

Operator Not Starting

Pod in CrashLoopBackOff

Symptoms:
  • Pod repeatedly restarts
  • Status shows CrashLoopBackOff
Check logs:
kubectl logs -n kestrel-ai -l app=kestrel-operator --previous
Common causes and solutions:
Error message:
Error: authentication failed: invalid token
Solution:
  1. Verify token is correctly set in values file
  2. Ensure token hasn’t been truncated
  3. Generate a new token from the dashboard if needed
# Check current token (first few characters)
kubectl get secret -n kestrel-ai kestrel-operator -o jsonpath='{.data.token}' | base64 -d | head -c 20
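One quick way to rule out truncation is to decode the full token and check it for stray whitespace; a newline or space in the middle usually means it was mangled during copy/paste. A small sketch (the token value below is a placeholder, not a real token format):

```shell
# Hypothetical sanity check for a decoded token (example value below).
token="kst_example_token_value"   # substitute the full decoded token from the secret

# A valid token should be a single line with no embedded whitespace.
if printf '%s' "$token" | grep -q '[[:space:]]'; then
  echo "token contains whitespace: re-copy it from the dashboard"
else
  echo "token format looks ok (${#token} characters)"
fi
```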
Error message:
Failed to connect to server: connection refused
Solution:
  1. Check network policies or firewalls
  2. Verify egress is allowed to grpc.platform.usekestrel.ai:443
  3. Test connectivity:
kubectl run test-connection --rm -i --tty --image=busybox -- sh
# Inside the pod:
nc -zv grpc.platform.usekestrel.ai 443
Error message:
Error creating informer: forbidden: User "system:serviceaccount:kestrel-ai:kestrel-operator" cannot list resource
Solution:
  1. Verify RBAC is properly configured
  2. Check ClusterRole and ClusterRoleBinding:
kubectl get clusterrole kestrel-operator
kubectl get clusterrolebinding kestrel-operator

Pod in Pending State

Symptoms:
  • Pod stays in Pending status
  • Not scheduled to any node
Solutions:
  1. Check resource availability:
kubectl describe nodes | grep -A 5 "Allocated resources"
  2. Review pod events:
kubectl describe pod -n kestrel-ai -l app=kestrel-operator | grep -A 10 Events
  3. Adjust resource requests if needed:
resources:
  requests:
    cpu: 100m      # Lower CPU request
    memory: 256Mi  # Lower memory request

Connection Issues

Operator Shows as Offline

Symptoms:
  • Dashboard shows the cluster as offline, but the pod is running
Diagnostic steps:
  1. Check operator logs for connection errors:
kubectl logs -n kestrel-ai -l app=kestrel-operator | grep -i "error\|failed\|connection"
  2. Verify gRPC stream health:
kubectl logs -n kestrel-ai -l app=kestrel-operator | grep "stream"
  3. Check liveness probe:
kubectl get pod -n kestrel-ai -l app=kestrel-operator -o json | jq '.items[0].status.conditions'
Common solutions:
  • Restart the operator pod:
kubectl rollout restart deployment/kestrel-operator -n kestrel-ai
  • Verify network connectivity to Kestrel platform
  • Check for proxy or firewall configurations
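If egress goes through a corporate proxy, the operator's container typically needs the standard proxy environment variables set. The values key below (`extraEnv`) is an assumption, not confirmed by the chart; check the chart's values schema for the actual mechanism:

```yaml
# Hypothetical values fragment -- `extraEnv` and the proxy host are illustrative.
operator:
  extraEnv:
    - name: HTTPS_PROXY
      value: "http://proxy.internal.example:3128"
    - name: NO_PROXY
      value: "10.0.0.0/8,.svc,.cluster.local"
```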

Flow Collection Issues

No Flows from Cilium

Symptoms:
  • Operator is connected but no network flows appear
  • Topology map shows no flows
Diagnostic steps:
  1. Verify Hubble is enabled:
cilium hubble status
  2. Check Hubble Relay is running:
kubectl get pods -n kube-system -l k8s-app=hubble-relay
  3. Test Hubble connectivity:
kubectl exec -n kestrel-ai deployment/kestrel-operator -- nc -zv hubble-relay.kube-system.svc.cluster.local 4245
Solutions:
  • Enable Hubble in Cilium:
cilium hubble enable --relay
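If Hubble is enabled but flows still do not appear, confirm the operator is pointed at the correct relay address. The key name below is a hypothetical sketch (verify against the chart's values schema); the address matches the connectivity test above:

```yaml
# Hypothetical values fragment -- the exact key for the relay address
# depends on the chart; the default Hubble Relay address is shown.
operator:
  cilium:
    enabled: true
    hubbleRelayAddress: "hubble-relay.kube-system.svc.cluster.local:4245"
```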

No Flows from Istio

Symptoms:
  • Istio is enabled but no L7 flows appear on the Topology map
  • ALS endpoint not receiving data
Diagnostic steps:
  1. Verify Istio telemetry configuration:
kubectl get telemetry -A
  2. Check Envoy configuration:
kubectl exec -n <namespace> <pod> -c istio-proxy -- curl -s localhost:15000/config_dump | grep access_log
  3. Verify ALS port is accessible:
kubectl get svc -n kestrel-ai kestrel-operator -o yaml | grep -A 5 ports
Solutions:
  • Configure Istio telemetry to send logs to operator:
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: kestrel-als
  namespace: istio-system
spec:
  accessLogging:
  - providers:
    - name: kestrel
  • Ensure operator service exposes ALS port:
operator:
  istio:
    enabled: true
    alsPort: 8080
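Note that the Telemetry resource above references an access-log provider named kestrel; Istio only recognizes that name if it is registered as an extension provider in MeshConfig. A sketch using Istio's `envoyHttpAls` provider type, assuming the default operator Service name and the alsPort from the values above:

```yaml
# MeshConfig fragment (e.g. via IstioOperator or the istio ConfigMap).
# The service name assumes the default kestrel-operator Service in kestrel-ai.
meshConfig:
  extensionProviders:
    - name: kestrel
      envoyHttpAls:
        service: kestrel-operator.kestrel-ai.svc.cluster.local
        port: 8080
```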

Safe-Apply Issues

Permissions Denied

Symptoms:
  • Safe-apply enabled but cannot apply resources
  • Error: “cannot create/update/delete resource”
Solutions:
  1. Verify safe-apply is enabled in both places:
    • Dashboard: Check cluster safe-apply toggle
    • Helm values: operator.safeApply.enabled: true
  2. Check RBAC permissions:
# List operator permissions
kubectl auth can-i create networkpolicies --as=system:serviceaccount:kestrel-ai:kestrel-operator

# Check ClusterRole
kubectl get clusterrole kestrel-operator -o yaml | grep -A 10 rules
  3. Re-deploy with correct permissions:
helm upgrade kestrel-operator \
  oci://ghcr.io/kestrelai/charts/kestrel-operator \
  --namespace kestrel-ai \
  --set operator.safeApply.enabled=true \
  -f values.yaml
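For reference, the ClusterRole must grant write verbs on every resource type safe-apply manages. A minimal illustrative rule for NetworkPolicy (the shipped chart may grant more; extend for other resource types the operator applies):

```yaml
# Illustrative ClusterRole rule -- not the complete shipped role.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kestrel-operator
rules:
  - apiGroups: ["networking.k8s.io"]
    resources: ["networkpolicies"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
```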

Getting Help

If you can’t resolve an issue:
  1. Check documentation: Review configuration and onboarding guides
  2. Collect diagnostics: Run the support bundle script
  3. Contact support: Email hello@usekestrel.ai with:
    • Cluster name and ID
    • Support bundle
    • Description of the issue
    • Steps to reproduce

FAQ

Can I run more than one operator in a cluster?
No, only one Kestrel operator should run per cluster. Multiple operators would cause conflicts and duplicate data.
How do I change the cluster name?
The cluster name is tied to the authentication token. To change it:
  1. Generate a new token with the desired name
  2. Update your values file with the new token
  3. Upgrade the helm release
What network ports does the operator use?
Outbound:
  • 443/tcp to grpc.platform.usekestrel.ai (gRPC)
  • 4245/tcp to Hubble Relay (if using Cilium)
Inbound:
  • 8080/tcp from Envoy proxies (if using Istio)
  • 8081/tcp for health checks (internal)
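If the cluster enforces NetworkPolicy itself, the ports above translate into a policy along these lines. This is a sketch: it allows the listed ports broadly, and a default-deny egress policy would also need to permit DNS (53/udp) for the operator to resolve the platform endpoint:

```yaml
# Illustrative NetworkPolicy covering the ports listed above.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kestrel-operator
  namespace: kestrel-ai
spec:
  podSelector:
    matchLabels:
      app: kestrel-operator
  policyTypes: ["Ingress", "Egress"]
  ingress:
    - ports:
        - protocol: TCP
          port: 8080   # Envoy ALS (Istio)
        - protocol: TCP
          port: 8081   # health checks
  egress:
    - ports:
        - protocol: TCP
          port: 443    # gRPC to the Kestrel platform
        - protocol: TCP
          port: 4245   # Hubble Relay (Cilium)
```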
Can I disable flow collection without removing the operator?
Yes, you can disable flows without removing the operator:
operator:
  cilium:
    disableFlows: true
Then upgrade the helm release.

Next Steps