Diagnostic Commands
Before troubleshooting, gather diagnostic information.
Common Issues
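Before working through the issues below, a first diagnostic pass might look like the following sketch. It assumes the operator runs in a namespace called kestrel, which is a placeholder and not a name confirmed by this guide. The function name is also illustrative.

```shell
# Hypothetical default: replace with the namespace the operator is installed in.
KESTREL_NAMESPACE="${KESTREL_NAMESPACE:-kestrel}"

# Print pod status, recent events, and the tail of each pod's logs.
# The ( ) body runs in a subshell, so the variables stay local to the function.
kestrel_diagnostics() (
  ns="${1:-$KESTREL_NAMESPACE}"
  echo "== Pods in namespace $ns =="
  kubectl get pods -n "$ns" -o wide
  echo "== Recent events =="
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
  echo "== Logs (last 100 lines per pod) =="
  for pod in $(kubectl get pods -n "$ns" -o name); do
    echo "-- $pod --"
    kubectl logs -n "$ns" "$pod" --all-containers --tail=100
  done
)

# Usage:
#   kestrel_diagnostics                # uses $KESTREL_NAMESPACE
#   kestrel_diagnostics my-namespace   # explicit namespace
```

Redirecting the output to a file gives you something to attach when contacting support.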
Operator Not Starting
Pod in CrashLoopBackOff
Symptoms:
- Pod repeatedly restarts
- Status shows CrashLoopBackOff
Invalid or missing token
Solution:
- Verify token is correctly set in values file
- Ensure token hasn’t been truncated
- Generate a new token from the dashboard if needed
Cannot connect to server
Solution:
- Check network policies or firewalls
- Verify egress is allowed to grpc.platform.usekestrel.ai:443
- Test connectivity to that endpoint
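One way to test egress from inside the cluster is to start a short-lived pod and attempt a TLS connection to the endpoint. The sketch below is an assumption-laden example: the pod name, the curlimages/curl image, and the function name are illustrative choices, and only the endpoint itself comes from this guide.

```shell
# Endpoint taken from this guide; everything else here is illustrative.
KESTREL_ENDPOINT="grpc.platform.usekestrel.ai:443"

# Launch a throwaway curl pod in the given namespace and try to connect.
# The pod is deleted automatically when the command finishes (--rm).
kestrel_test_egress() (
  ns="${1:-default}"
  kubectl run kestrel-egress-test -n "$ns" --rm -i --restart=Never \
    --image=curlimages/curl --command -- \
    curl -sv --connect-timeout 10 -o /dev/null "https://$KESTREL_ENDPOINT"
)

# Usage: kestrel_test_egress kestrel   # "kestrel" is a placeholder namespace
```

Because the endpoint speaks gRPC, curl may report an HTTP-level error even when the network path is fine. A completed TLS handshake in the verbose output suggests egress works, while a timeout or refused connection points to a network policy, firewall, or proxy. Note that a test pod with different labels may not be subject to the same network policies as the operator pod.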
Insufficient permissions
Solution:
- Verify RBAC is properly configured
- Check the ClusterRole and ClusterRoleBinding
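The RBAC check could be sketched as follows. The object names created by the chart are not stated in this guide, so the example searches for names containing "kestrel", which is a guess, and the function names are hypothetical.

```shell
# List cluster-scoped RBAC objects whose names mention the given pattern.
# The default pattern is an assumption: the real object names may differ.
kestrel_find_rbac() (
  pattern="${1:-kestrel}"
  kubectl get clusterrole,clusterrolebinding -o name | grep -i -- "$pattern"
)

# Show the rules granted by a ClusterRole and the subjects bound to it.
kestrel_show_rbac() (
  role="$1"; binding="$2"
  kubectl describe clusterrole "$role"
  kubectl describe clusterrolebinding "$binding"
)

# Usage:
#   kestrel_find_rbac
#   kestrel_show_rbac <clusterrole-name> <clusterrolebinding-name>
```

In the describe output, confirm that the binding's subject is the service account the operator pod actually runs as.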
Pod in Pending State
Symptoms:
- Pod stays in Pending status
- Not scheduled to any node
Solution:
- Check resource availability
- Review pod events
- Adjust resource requests if needed
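The checks above can be sketched as a single helper. The namespace and pod name are arguments you supply, and the function name is illustrative. The Helm keys for the operator's resource requests are not given in this guide, so the sketch only reports what the pod currently requests.

```shell
# Report why a pod is Pending: scheduler events, the pod's own requests,
# and how much each node has already committed.
kestrel_pending_report() (
  ns="$1"; pod="$2"
  echo "== Events for $pod =="
  # Scheduler messages such as "Insufficient cpu" show up here.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
  echo "== Resource requests per container =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .spec.containers[*]}{.name}{": "}{.resources.requests}{"\n"}{end}'
  echo "== Allocated resources per node =="
  kubectl describe nodes | grep -A 8 "Allocated resources"
)

# Usage: kestrel_pending_report <namespace> <pod-name>
```

If the requests exceed what any node has free, lowering them in your values file or adding capacity are the usual options.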
Connection Issues
Operator Shows as Offline
The dashboard shows the cluster as offline, but the pod is running.
Diagnostic steps:
- Check operator logs for connection errors
- Verify gRPC stream health
- Check the liveness probe
Solution:
- Restart the operator pod
- Verify network connectivity to the Kestrel platform
- Check for proxy or firewall configurations
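A rough sketch of these steps follows. The Deployment name is not stated in this guide, so it is passed as an argument. The log keywords are generic gRPC and network error terms, not the operator's actual messages, and the function names are hypothetical.

```shell
# Filter the last hour of operator logs for likely connection problems.
# The keyword list is a generic guess, not the operator's real log format.
kestrel_connection_errors() (
  ns="$1"; deploy="$2"
  kubectl logs -n "$ns" "deployment/$deploy" --since=1h \
    | grep -i -E 'error|unavailable|deadline|refused|timeout'
)

# Show restart counts and any failed liveness or readiness probes.
kestrel_probe_status() (
  ns="$1"
  kubectl get pods -n "$ns"
  # The kubelet records failed probes as events with reason "Unhealthy".
  kubectl get events -n "$ns" --field-selector reason=Unhealthy
)

# Restart the operator and wait for the new pod to become ready.
kestrel_restart() (
  ns="$1"; deploy="$2"
  kubectl rollout restart "deployment/$deploy" -n "$ns"
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=120s
)

# Usage: kestrel_connection_errors <namespace> <deployment-name>
```

If the restart does not bring the cluster back online, the logs from the fresh pod are usually the most useful input for the network checks.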
Flow Collection Issues
No Flows from Cilium
Symptoms:
- Operator is connected but no network flows appear
- Topology map shows no flows
Diagnostic steps:
- Verify Hubble is enabled
- Check Hubble Relay is running
- Test Hubble connectivity
Solution:
- Enable Hubble in Cilium
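The Hubble checks might be sketched as below, assuming Cilium is installed in kube-system with its default object names (the cilium-config ConfigMap and the hubble-relay Service and pod label). The enable step assumes the Cilium CLI is installed. Function names are illustrative.

```shell
# Assumed default: Cilium's usual install namespace.
CILIUM_NAMESPACE="${CILIUM_NAMESPACE:-kube-system}"

# Check whether Hubble is enabled and whether Hubble Relay is running.
hubble_check() (
  ns="${1:-$CILIUM_NAMESPACE}"
  echo "== Hubble setting in cilium-config =="
  kubectl get configmap cilium-config -n "$ns" -o yaml | grep 'enable-hubble:' \
    || echo "enable-hubble not set in cilium-config"
  echo "== Hubble Relay pods =="
  kubectl get pods -n "$ns" -l k8s-app=hubble-relay
  echo "== Hubble Relay service =="
  kubectl get svc hubble-relay -n "$ns"
)

# Enable Hubble with the Cilium CLI, then wait for Cilium to settle.
hubble_enable() (
  cilium hubble enable
  cilium status --wait
)

# Usage:
#   hubble_check
#   hubble_enable
```

If the Hubble CLI is available, running cilium hubble port-forward in one terminal and hubble status or hubble observe in another is a common way to confirm that flows are reaching the relay. If Cilium was installed with Helm, enabling Hubble through the chart values instead of the CLI keeps the release consistent.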
No Flows from Istio
Symptoms:
- Istio is enabled but no L7 flows appear on the Topology map
- ALS endpoint not receiving data
Diagnostic steps:
- Verify Istio telemetry configuration
- Check Envoy configuration
- Verify ALS port is accessible
Solution:
- Configure Istio telemetry to send logs to the operator
- Ensure the operator service exposes the ALS port
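A minimal sketch of the inspection steps is shown below. It assumes Istio's control plane lives in istio-system with the default istio ConfigMap, and that the ALS port is 8080 as listed under the port requirements in the FAQ. The operator Service name is not given in this guide, so it is an argument, and the function name is hypothetical.

```shell
# Inspect Istio access log settings and the operator Service's ports.
istio_als_check() (
  op_ns="$1"; op_svc="$2"
  echo "== Telemetry resources =="
  kubectl get telemetries.telemetry.istio.io --all-namespaces
  echo "== Access log settings in mesh config =="
  kubectl get configmap istio -n istio-system -o jsonpath='{.data.mesh}' \
    | grep -i -E 'extensionProviders|envoyAccessLogService|accessLog' \
    || echo "no access log settings found in mesh config"
  echo "== Operator Service ports =="
  ports="$(kubectl get svc "$op_svc" -n "$op_ns" -o jsonpath='{.spec.ports[*].port}')"
  # Pad with spaces so that 8080 does not match inside 18080.
  case " $ports " in
    *" 8080 "*) echo "ALS port 8080 is exposed" ;;
    *) echo "ALS port 8080 is NOT exposed (found: $ports)" ;;
  esac
)

# Usage: istio_als_check <operator-namespace> <operator-service-name>
```

This only confirms that configuration exists. Whether Envoy sidecars actually deliver logs still depends on the provider address in the mesh config pointing at the operator Service.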
Safe-Apply Issues
Permissions Denied
Symptoms:
- Safe-apply enabled but cannot apply resources
- Error: “cannot create/update/delete resource”
Solution:
- Verify safe-apply is enabled in both places:
  - Dashboard: Check cluster safe-apply toggle
  - Helm values: operator.safeApply.enabled: true
- Check RBAC permissions
- Re-deploy with correct permissions
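The Helm and RBAC checks could look like the sketch below. The release name, chart reference, namespace, and service account name are all placeholders you supply, since this guide does not state them. Only the operator.safeApply.enabled key comes from the text above. The resource checked defaults to deployments purely as an example.

```shell
# Confirm the Helm value and ask the API server what the operator's
# service account is allowed to do.
safe_apply_check() (
  release="$1"; ns="$2"; sa="$3"; resource="${4:-deployments}"
  echo "== safeApply in deployed values =="
  helm get values "$release" -n "$ns" --all | grep -A 2 'safeApply' \
    || echo "safeApply not found in the release values"
  echo "== Permissions for service account $sa =="
  for verb in create update delete; do
    printf '%s %s: ' "$verb" "$resource"
    kubectl auth can-i "$verb" "$resource" \
      --as="system:serviceaccount:$ns:$sa" --all-namespaces
  done
)

# Re-deploy with safe-apply enabled, keeping all other values as they are.
safe_apply_enable() (
  release="$1"; chart="$2"; ns="$3"
  helm upgrade "$release" "$chart" -n "$ns" \
    --reuse-values --set operator.safeApply.enabled=true
)

# Usage:
#   safe_apply_check <release> <namespace> <service-account>
#   safe_apply_enable <release> <chart> <namespace>
```

A "no" from kubectl auth can-i for a verb that safe-apply needs indicates that the ClusterRole is missing the corresponding rule.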
Getting Help
If you can’t resolve an issue:
- Check documentation: Review configuration and onboarding guides
- Collect diagnostics: Run the support bundle script
- Contact support: Email hello@usekestrel.ai with:
  - Cluster name and ID
  - Support bundle
  - Description of the issue
  - Steps to reproduce
FAQ
Can I run multiple operators in one cluster?
No, only one Kestrel operator should run per cluster. Multiple operators would cause conflicts and duplicate data.
How do I change the cluster name after deployment?
The cluster name is tied to the authentication token. To change it:
- Generate a new token with the desired name
- Update your values file with the new token
- Upgrade the Helm release
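The upgrade step might look like the sketch below once the values file holds the new token. The release name, chart reference, and namespace are placeholders, as this guide does not state them, and the function name is illustrative.

```shell
# Apply an updated values file to an existing release and show the pods.
kestrel_apply_values() (
  release="$1"; chart="$2"; ns="$3"; values="$4"
  # --wait blocks until the release's resources report ready.
  helm upgrade "$release" "$chart" -n "$ns" -f "$values" --wait
  kubectl get pods -n "$ns"
)

# Usage: kestrel_apply_values <release> <chart> <namespace> values.yaml
```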
What ports need to be open for the operator?
Outbound:
- 443/tcp to grpc.platform.usekestrel.ai (gRPC)
- 4245/tcp to Hubble Relay (if using Cilium)
Inbound:
- 8080/tcp from Envoy proxies (if using Istio)
- 8081/tcp for health checks (internal)
Can I pause flow collection temporarily?
Yes, you can disable flows without removing the operator. Turn off flow collection in your Helm values, then upgrade the Helm release.