Confluence Unable to Locate Hazelcast Members After Adding Outbound HTTP & HTTPS Proxy to Kubernetes Deployment
Platform Notice: Data Center - This article applies to Atlassian products on the Data Center platform.
Note that this knowledge base article was created for the Data Center version of the product. Data Center knowledge base articles for non-Data Center-specific features may also work for Server versions of the product, however they have not been tested. Support for Server* products ended on February 15th 2024. If you are running a Server product, you can visit the Atlassian Server end of support announcement to review your migration options.
*Except Fisheye and Crucible
Summary
Confluence cannot locate Hazelcast members after an outbound HTTP and HTTPS proxy is added to a Kubernetes deployment. The issue occurs even in a single-node cluster. When the proxy is configured in the Confluence pod through additionalJvmArgs in the YAML file, Confluence fails to start and the logs show the error "Failure in executing REST call":
2024-04-15 08:05:34,074 WARN [Catalina-utility-1] [com.hazelcast.kubernetes.RetryUtils] log Couldn't discover Hazelcast members using Kubernetes API, [1] retrying in 1 seconds...
2024-04-15 08:05:35,589 WARN [Catalina-utility-1] [com.hazelcast.kubernetes.RetryUtils] log Couldn't discover Hazelcast members using Kubernetes API, [2] retrying in 2 seconds...
2024-04-15 08:05:37,854 WARN [Catalina-utility-1] [com.hazelcast.kubernetes.RetryUtils] log Couldn't discover Hazelcast members using Kubernetes API, [3] retrying in 3 seconds...
2024-04-15 08:05:41,247 WARN [Catalina-utility-1] [spi.discovery.integration.DiscoveryService] log [x.x.x.x]:5701 [confluence] [3.12.14-atlassian-2] Cannot fetch the current zone, ZONE_AWARE feature is disabled
2024-04-15 08:05:55,613 ERROR [Catalina-utility-1] [internal.cluster.impl.DiscoveryJoiner] log [x.x.x.x]:5701 [confluence] [3.12.14-atlassian-2] Failure in executing REST call
com.hazelcast.kubernetes.RestClientException: Failure in executing REST call
...
Caused by: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
...
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Environment
Kubernetes
Cause
Hazelcast discovers cluster nodes on Kubernetes by calling the Kubernetes API, by default through the "kubernetes.default.svc" DNS entry. Inside a cluster, this entry resolves to the internal IP address of the Kubernetes API server, and it is how applications running in pods reach the Kubernetes API.
However, when an outbound HTTP and HTTPS proxy is configured for Confluence through additionalJvmArgs, the proxy settings apply to the JVM as a whole, so the internal Kubernetes API calls used for cluster node discovery are sent to the proxy as well unless the API host is excluded. Confluence then cannot locate the Hazelcast members. The Confluence pod logs confirm this: Hazelcast's Kubernetes API discovery retries several times and then fails with an SSL handshake (PKIX) error.
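The routing decision behind this failure can be reproduced outside Confluence. The following is a minimal Java sketch, not Confluence or Hazelcast code: it assumes the HTTP client in use consults the JVM's default ProxySelector, and the class name ProxyRoutingDemo and the helper routeFor are illustrative names. It sets the same HTTPS proxy properties as the additionalJvmArgs entries, without any exclusion, and asks the JVM how it would reach the Kubernetes API host.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;

public class ProxyRoutingDemo {

    /** Returns the first route the JVM's default ProxySelector picks for the given URL. */
    static Proxy routeFor(String url) {
        return ProxySelector.getDefault().select(URI.create(url)).get(0);
    }

    public static void main(String[] args) {
        // Same proxy flags as in additionalJvmArgs, but with no http.nonProxyHosts entry.
        System.setProperty("https.proxyHost", "www.myproxy.com");
        System.setProperty("https.proxyPort", "8080");

        // With only the proxy flags set, the in-cluster API host is not treated specially.
        Proxy route = routeFor("https://kubernetes.default.svc");
        System.out.println("kubernetes.default.svc -> " + route.type());
    }
}
```

A route of type HTTP means the request would be handed to the proxy rather than sent directly to the API server inside the cluster.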
Solution
To resolve the issue, exclude the Kubernetes API endpoint kubernetes.default.svc from the outbound proxy. The Confluence base URL should generally be excluded as well.
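1. Update the Helm chart values as follows. The http.nonProxyHosts argument lists the Confluence base URL (domain name) and the kubernetes.default.svc DNS entry, both of which are excluded from the outbound HTTP and HTTPS proxy. Java applies http.nonProxyHosts to HTTPS connections as well, so no separate HTTPS exclusion is needed.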
confluence:
  additionalJvmArgs:
    - "-Dhttp.proxyHost=www.myproxy.com"
    - "-Dhttp.proxyPort=8080"
    - "-Dhttps.proxyHost=www.myproxy.com"
    - "-Dhttps.proxyPort=8080"
    - "-Dhttp.nonProxyHosts='<Confluence-Base-URL>|kubernetes.default.svc'"
2. After this change, the Confluence Kubernetes cluster is ready to deploy.
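Before deploying, the exclusion list can be sanity-checked with a small standalone Java program. This is a hedged sketch rather than part of the Helm chart or Confluence: it assumes the JVM's default ProxySelector is in use, confluence.example.com is a hypothetical stand-in for <Confluence-Base-URL>, and the class name NonProxyHostsCheck and helper isDirect are illustrative. The property value is written as the JVM would see it, without the surrounding single quotes.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;

public class NonProxyHostsCheck {

    /** True if the JVM's default ProxySelector would connect to the URL without a proxy. */
    static boolean isDirect(String url) {
        Proxy first = ProxySelector.getDefault().select(URI.create(url)).get(0);
        return first.type() == Proxy.Type.DIRECT;
    }

    public static void main(String[] args) {
        // Proxy flags from the Helm values above.
        System.setProperty("http.proxyHost", "www.myproxy.com");
        System.setProperty("http.proxyPort", "8080");
        System.setProperty("https.proxyHost", "www.myproxy.com");
        System.setProperty("https.proxyPort", "8080");
        // confluence.example.com is a hypothetical placeholder for the real base URL host.
        System.setProperty("http.nonProxyHosts",
                "confluence.example.com|kubernetes.default.svc");

        String[] urls = {
            "https://kubernetes.default.svc",               // listed, expected to bypass the proxy
            "https://confluence.example.com",               // listed, expected to bypass the proxy
            "https://kubernetes.default.svc.cluster.local", // not listed literally
            "https://www.atlassian.com"                     // external host
        };
        for (String url : urls) {
            System.out.println(url + " -> " + (isDirect(url) ? "direct" : "via proxy"));
        }
    }
}
```

Entries in http.nonProxyHosts are matched against the whole host name, and a pattern may start or end with * as a wildcard. If the deployment addresses the API server by a longer name such as kubernetes.default.svc.cluster.local (an assumption that depends on the discovery configuration), that name needs its own entry or a wildcard pattern.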