Feature Request: Implement a Self-Healing Sync Queue (Journal) for transient connection failures #4226

@realrellek

Description

Is your enhancement related to a problem? Please describe.

Hello,

Thank you for your feedback on issue #4224.

Following up on that, I'd like to suggest a feature to make the plugin more robust when the ES server is temporarily unreachable. This would solve the root problem of #4224 and replace the manual snippet suggested there. That snippet feels like a workaround rather than a robust solution, and it becomes tedious to run manually across many sites.

Here's my idea: ElasticPress should keep a persistent journal (or queue) for write operations that fail due to connection issues.

This can happen for many reasons:

  • A planned reboot of the ES/OpenSearch server for updates.
  • A brief, intermittent network interruption.
  • Transient DNS lookup failures or routing problems on the WordPress server's side.

I am aware that ES/OpenSearch is designed for high availability via clusters. However, this doesn't account for the client-side network (the WordPress server). As you can imagine, that network's reliability is an unknown, and intermittent issues are often the hardest to debug.

My proposal is to give write operations the same kind of resilience that reads already get from the existing fallback to MySQL search:

  1. When a post is inserted, updated, or deleted, EP attempts the corresponding operation on the ES server.
  2. If the connection to the ES server fails (e.g., timeout, connection refused, HTTP 503 response), EP should not discard the operation.
  3. Instead, it should add the operation (e.g., {action: 'delete', post_id: 123}) to a persistent journal (e.g., a custom table or a persistent option).
  4. A recurring WP-Cron job then checks (e.g., every 5 minutes) whether the ES server is reachable and whether the journal contains pending jobs.
  5. If both are true, it processes the jobs from the journal, retrying them until they succeed, effectively making the sync "self-healing."
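
To make the steps above concrete, here is a minimal, language-neutral sketch in Python rather than the plugin's PHP. Nothing in it is ElasticPress API: `SyncJournal`, `sync`, `drain`, and the `send` callback are hypothetical names, SQLite stands in for the custom table, and the set of "transient" errors is an assumption. In the plugin, `drain` would be the callback of the recurring WP-Cron job.

```python
from __future__ import annotations

import json
import sqlite3
from typing import Callable

# Assumption: these exception types stand in for "timeout, connection
# refused, 503". A real client would map its own errors onto this set.
TRANSIENT_ERRORS = (ConnectionError, TimeoutError)

Operation = dict
Sender = Callable[[Operation], None]


class SyncJournal:
    """Persistent FIFO of write operations that could not be delivered."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS journal ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT NOT NULL)"
        )
        self.db.commit()

    def add(self, op: Operation) -> None:
        self.db.execute("INSERT INTO journal (op) VALUES (?)", (json.dumps(op),))
        self.db.commit()

    def pending(self) -> list[tuple[int, Operation]]:
        rows = self.db.execute("SELECT id, op FROM journal ORDER BY id").fetchall()
        return [(row_id, json.loads(op)) for row_id, op in rows]

    def remove(self, row_id: int) -> None:
        self.db.execute("DELETE FROM journal WHERE id = ?", (row_id,))
        self.db.commit()


def sync(op: Operation, send: Sender, journal: SyncJournal) -> bool:
    """Steps 1-3: try the write; on a transient failure, journal it."""
    try:
        send(op)
        return True
    except TRANSIENT_ERRORS:
        journal.add(op)
        return False


def drain(journal: SyncJournal, send: Sender) -> int:
    """Steps 4-5: replay journaled writes in order; return how many succeeded.

    The run stops at the first transient failure, so the remaining entries
    stay queued, in their original order, for the next scheduled run.
    """
    delivered = 0
    for row_id, op in journal.pending():
        try:
            send(op)
        except TRANSIENT_ERRORS:
            break
        journal.remove(row_id)
        delivered += 1
    return delivered
```

Two design choices in this sketch are worth noting. Entries are replayed strictly in insertion order, so an update followed by a delete of the same post is applied in that order. The sketch also has no separate reachability probe: the first failed `send` ends the run, which has the same effect as checking that the server is up before processing.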

This would make the plugin significantly more robust and provide a much better quality-of-life experience for all users, self-hosters and elasticpress.io customers alike.

While your own hosting is no doubt highly available, you cannot control the reliability of your customers' WordPress server networks. This feature would transparently handle those intermittent issues, likely reducing support tickets related to "out-of-sync" indexes caused by temporary network blips.

Thank you for considering this proposal and for your great work on the plugin!

Designs

No response

Describe alternatives you've considered

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
