Unique Constraint Violation on Synchrony SNAPSHOTS Table
Summary
Synchrony logs show multiple errors reporting that a duplicate key value violates the unique constraint SNAPSHOTS_pkey.
Environment
Confluence Data Center with Synchrony running in its own cluster (with at least 2 nodes).
Diagnosis
The atlassian-synchrony.log files are crowded with entries like the following:
{"synchrony":{"message":"synchrony.data [warn] error persisting snapshots","ns":"synchrony.data","level":"warn","throwable":"clojure.lang.ExceptionInfo: duplicate key {:type :duplicate-key, :key \"snapshot|0.confluence$content$123456789.2|/Synchrony-4dc1234-1234-12e3-1234-a1b23d40e00a/confluence-123456789\"}
...
Caused by: com.mysema.query.QueryException: Caught PSQLException for insert into \"SNAPSHOTS\" (\"key\", \"value\") values (?, ?)\r\n\tat com.mysema.query.sql.DefaultSQLExceptionTranslator.translate(DefaultSQLExceptionTranslator.java:38)
...
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint \"SNAPSHOTS_pkey\"\n Detail: Key (key)=(snapshot|0.confluence$content$123456789.2|/Synchrony-1dc23456-1234-12e3-1234-a1b23d40e00a/confluence-123456789) already exists.
Aside from the exceptions above, the following entries indicating a network issue can also be observed:
{"synchrony":{"member-id":"12cb3a4a-1f23-1234-123f-1d2345de6a7f","ns":"synchrony.aleph-cluster","level":"info","status":"open","message":"synchrony.aleph-cluster [info] connection transition"},"message":"synchrony.aleph-cluster [info] connection transition"}
...
[10.1.123.456]:5701 [Confluence-Synchrony] [3.7.4] Connection[id=3, /10.1.123.456:56436->/10.1.123.457:5701, endpoint=[10.1.123.457]:5701, alive=false, type=MEMBER] closed. Reason: Connection closed by the other side
...
[10.1.123.456]:5701 [Confluence-Synchrony] [3.7.4] Connecting to /10.1.123.457:5701, timeout: 0, bind-any: true
A network issue between the Synchrony nodes can also result in clojure.lang.ExceptionInfo: no such sequence and left-merge revision not found errors in the Synchrony logs, as shown below:
{"timestamp":"2020-11-19T21:43:59,583Z","level":"WARN","thread":"async-dispatch-7","logger":"synchrony.sync.hub","message":{"synchrony":{"message":"synchrony.sync.hub [warn] error in hub process","entity":"/Synchrony-1c9a3ac6-dfa2-3fcd-9576-b201658ad52d/confluence-660842486","ns":"synchrony.sync.hub","throwable":"clojure.lang.ExceptionInfo: no such sequence {:message \"no such sequence\", :type :no-such-sequence, :from #synchrony.history.rev{:origin \"fjSNjMCkMBYvol8ejCnJcpM\", :sequence 1, :partition 2}, :to #synchrony.history.rev{:origin \"q5fSYsRl_dm1c4AxkTBmng\", :sequence 0, :partition 3}}\n\tat clojure.core$ex_info.invokeStatic(core.clj:4725)\n\tat ginga.core$throwable.invokeStatic(core.cljc:326)\n\tat ginga.core$throw_map.invokeStatic(core.cljc:331)\n\tat synchrony.sync.hub$init_in_state_from_rev$fn__42493.invoke(hub.clj:365)\n\tat synchrony.sync.hub.(take?)(hub.clj:396)\n\tat synchrony.sync.hub$init$fn__42595.invoke(hub.clj:386)\n\tat synchrony.sync.hub.(take?)(hub.clj:409)\n\tat synchrony.sync.hub$fn__42640$fn__42770.invoke(hub.clj:400)\n\tat synchrony.sync.hub.(take?)(hub.clj:762)\n\tat synchrony.sync.hub$process_message$fn__44225.invoke(hub.clj:754)\n\tat clojure.lang.AFn.run(AFn.java:22)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n","level":"warn"}},"location":{"class":"synchrony.logging$eval69$fn__73","method":"invoke","line":"0"}}
{"timestamp":"2020-11-13T00:43:15,401Z","level":"WARN","thread":"async-dispatch-17","logger":"synchrony.http.entity-api","message":{"synchrony":{"message":"synchrony.http.entity-api [warn] Error in put-entity","entity":"/Synchrony-1c9a3ac6-dfa2-3fcd-9576-b201658ad52d/confluence-665492179","id":"Rr4jC6RLca-tywqIvGGPpQ","ns":"synchrony.http.entity-api","throwable":"clojure.lang.ExceptionInfo: left-merge revision not found {:type :server-error, :source :server}\n\tat clojure.core$ex_info.invokeStatic(core.clj:4725)\n\tat synchrony.sync.messages$ex_info_from_error_message.invokeStatic(messages.cljc:29)\n\tat synchrony.sync.connection$request_BANG_$fn__31266.invoke(connection.cljc:92)\n\tat synchrony.http.entity-api.(take?)(entity_api.clj:493)\n\tat synchrony.http.entity_api$content_reconciliation$fn__48540.invoke(entity_api.clj:472)\n\tat synchrony.http.entity-api.(take?)(entity_api.clj:536)\n\tat synchrony.http.entity_api$put_revision_handler$fn__48739.invoke(entity_api.clj:518)\n\tat clojure.lang.AFn.run(AFn.java:22)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n","level":"warn"}},"location":{"class":"synchrony.logging$eval69$fn__73","method":"invoke","line":"0"}}
Restarting a Synchrony node reconfigures the cluster. If there are network issues, an error like the following will be present in the Synchrony logs:
{"timestamp":"2020-12-02T00:50:04,681Z","level":"INFO","thread":"hz._hzInstance_1_Confluence-Synchrony.cached.thread-1","logger":"com.hazelcast.nio.tcp.InitConnectionTask","message":"[10.133.40.181]:5701 [Confluence-Synchrony] [3.7.4] Could not connect to: /20.131.13.10:5701. Reason: SocketException[Connection timed out to address /20.131.13.10:5701]","location":{"class":"com.hazelcast.logging.Log4jFactory$Log4jLogger","method":"log","line":"50"}}
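To gauge how often each of these signatures appears, the log can be scanned with a short script. The following is a minimal sketch, not an Atlassian tool: the function name count_signatures and the label names are hypothetical, and it assumes that plain substring matching on the messages quoted above is enough to identify the relevant entries.

```python
from collections import Counter
from typing import Iterable

# Substrings copied from the log excerpts in this article.
# The labels on the left are hypothetical names chosen for this sketch.
SIGNATURES = {
    "duplicate_key": "duplicate key value violates unique constraint",
    "no_such_sequence": "no such sequence",
    "left_merge_not_found": "left-merge revision not found",
    "connection_closed": "Connection closed by the other side",
    "could_not_connect": "Could not connect to",
}


def count_signatures(lines: Iterable[str]) -> Counter:
    """Count how many lines contain each known error signature.

    A line is counted at most once per signature, even if the
    substring occurs several times within it.
    """
    counts: Counter = Counter()
    for line in lines:
        for label, needle in SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts
```

Passing an open file handle for atlassian-synchrony.log to count_signatures returns a Counter keyed by the labels above. A high count of the connection entries alongside the duplicate key entries is consistent with the network issue described in this article, but it does not prove it on its own.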
Cause
This problem happens because the Synchrony nodes are unable to talk to each other over the network.
Workaround
If no network issue is identified, the workaround is to restart all Synchrony nodes in an attempt to make them communicate properly again.
Resolution
We need to ensure that all Synchrony nodes are communicating properly with one another.
A good validation step is to telnet from one node to another on port 5701 (Hazelcast Synchrony port) and port 25500 (Synchrony Aleph port). If connections cannot be established, the log entries above will crowd your log files. More details on the port requirements are available in the Confluence Data Center documentation.
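Where telnet is not installed, the same check can be scripted. The following is a minimal sketch under stated assumptions: the function names port_is_open and check_peers are hypothetical, the ports are the two named in this article, and a successful TCP connect is treated as equivalent to a successful telnet session. Run it from each Synchrony node against the other nodes.

```python
import socket
from typing import Dict, Iterable, Tuple

# Ports named in this article: 5701 (Hazelcast) and 25500 (Synchrony Aleph).
SYNCHRONY_PORTS = (5701, 25500)


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peers(
    peers: Iterable[str],
    ports: Iterable[int] = SYNCHRONY_PORTS,
    timeout: float = 3.0,
) -> Dict[Tuple[str, int], bool]:
    """Check every (peer, port) pair and report whether it is reachable."""
    port_list = list(ports)
    return {
        (host, port): port_is_open(host, port, timeout)
        for host in peers
        for port in port_list
    }
```

For example, calling check_peers(["synchrony-node-2"]) from the first node (the hostname is a placeholder) returns a dictionary mapping each (host, port) pair to True or False. Any False entry points to a firewall or routing problem between the nodes that should be resolved before restarting Synchrony.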