
Multi-writer support with Apache Hudi

6 min read · Jun 25, 2023

Apache Hudi has had multi-writer support since 0.8.0. Essentially, two writers can concurrently write to Hudi and successfully commit, as long as they are updating different sets of data. This is critical for advanced users who wish to speed up the throughput of the system, or who ingest from multiple sources where each source writes to a different set of data within the same Hudi table. I thought I would do a short write-up on multi-writer support in Apache Hudi.

Concurrency Control

OCC (Optimistic Concurrency Control) and MVCC (Multi-Version Concurrency Control) are well-known concurrency control mechanisms. With OCC, each writer optimistically proceeds assuming it will succeed and, just before committing, goes through a conflict resolution step to determine whether to commit or abort. So, with OCC, if two writers are concurrently writing to the same Hudi table and updating the same set of data, one of them will succeed, while the other will abort in the end after detecting that a concurrent operation has already completed successfully.
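To make the OCC flow a bit more concrete, here is a tiny conceptual sketch in plain Scala (not Hudi code): each writer tracks the file groups it touched, and at commit time it aborts if any write that completed after it started overlaps with those file groups.

    // Conceptual sketch of OCC, not Hudi's implementation.
    // A transaction remembers when it started and which file groups it touched.
    case class Txn(startTs: Long, touchedFileGroups: Set[String])

    // At commit time, compare against transactions that completed after we started.
    // Any overlap in touched file groups means we must abort (and typically retry).
    def canCommit(current: Txn, completedSinceStart: Seq[Txn]): Boolean =
      completedSinceStart.forall { other =>
        (current.touchedFileGroups intersect other.touchedFileGroups).isEmpty
      }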

Even before adding multi-writer support for regular writers in 0.8.0, Hudi had employed MVCC to achieve async execution of table services. We should definitely discuss the async MVCC design in another blog; here, let's just focus on multi-writer support.

Multi-writer support with Apache Hudi

Here is the RFC for the multi-writer support in Hudi. Feel free to give it a read when you can, as it covers a lot more breadth on multi-writing. Hudi has a Conflict Resolution (CR) component to assist in carrying out multiple concurrent writes/table services. CR takes care of deducing conflicting operations and employs a conflict resolution strategy to resolve them; depending on the strategy implementation, the current transaction is either committed or aborted. The default conflict resolution strategy, SimpleConcurrentFileWritesConflictResolutionStrategy, operates at the file group level. If two concurrent writers write to two different file groups (in the same or different partitions), both will succeed. But if both write to overlapping file groups, the one reaching completion first will succeed and the later one will be aborted.
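To make this concrete, here is a minimal sketch of what enabling OCC on a Spark datasource writer could look like. The option keys are Hudi's documented concurrency and lock configs, but the table name, record/precombine fields, Zookeeper host, base path and the df DataFrame are all placeholders, and you should verify the exact class names against the Hudi version you run.

    // Minimal sketch: a Spark datasource write with OCC enabled (placeholders throughout).
    df.write.format("hudi").
      option("hoodie.table.name", "my_hudi_table").
      option("hoodie.datasource.write.recordkey.field", "uuid").
      option("hoodie.datasource.write.precombine.field", "ts").
      // turn on optimistic concurrency control
      option("hoodie.write.concurrency.mode", "optimistic_concurrency_control").
      // clean up failed writes lazily so a crashed writer does not block others
      option("hoodie.cleaner.policy.failed.writes", "LAZY").
      // the default file-group-level strategy, shown explicitly here
      option("hoodie.write.lock.conflict.resolution.strategy",
        "org.apache.hudi.client.transaction.SimpleConcurrentFileWritesConflictResolutionStrategy").
      // a lock provider is required; Zookeeper is just one example (see the Lock Providers section)
      option("hoodie.write.lock.provider",
        "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider").
      option("hoodie.write.lock.zookeeper.url", "zk-host").
      option("hoodie.write.lock.zookeeper.port", "2181").
      option("hoodie.write.lock.zookeeper.lock_key", "my_hudi_table").
      option("hoodie.write.lock.zookeeper.base_path", "/hudi/locks").
      mode("append").
      save("s3://bucket/path/to/my_hudi_table")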

If this is a little abstract, let me explain in terms of write operations.

  • If two concurrent writers update data from two different file groups, both writes will succeed.
  • If two concurrent writers update data from overlapping file groups in Hudi, one of them will succeed while the other will fail.
  • If two concurrent writers insert new data, each of them might succeed or fail. So, there is no guarantee that two concurrent writers ingesting the same exact batch of new data will end up producing a single copy of it on a snapshot read once both succeed.
  • If two concurrent writers insert different sets of data, each of them might succeed or fail. Due to small file handling, both writers could end up writing new data to the same file group, in which case one of them will be aborted. If there are no small files, if small file handling is disabled, or if the two writers are routed to different file groups, both will succeed.
  • Deletes are similar to updates. Conflict resolution depends on whether the concurrent writers delete the same set of data or different sets.

Note: something to call out w.r.t. inserts. Because Hudi enables small file handling by default, concurrent inserts can be routed into existing file groups and conflict. Some of the behaviors above change if you disable small file handling.
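For example, if you want concurrent inserts to land in brand new file groups rather than being packed into existing small files, you can disable small file handling. A minimal sketch, assuming a copy-on-write table and the same placeholders as above:

    // Sketch: disable small file handling for this writer (copy-on-write tables).
    // With a limit of 0, inserts go to new file groups instead of existing small files,
    // at the cost of losing the automatic small file compaction benefit.
    df.write.format("hudi").
      option("hoodie.parquet.small.file.limit", "0").
      // ... same OCC and lock provider options as in the earlier sketch ...
      mode("append").
      save("s3://bucket/path/to/my_hudi_table")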

Again, to reiterate, the CR strategy operates at the file group level. So there is a chance that, within the same file group, writer 1 updates one subset of records and writer 2 concurrently updates a different subset, but the CR strategy will still abort one of them since the granularity is at the file level. If you really want record-level granularity, you can always implement a new record-level conflict resolution strategy. But as you could imagine, this comes at a very high cost, since we might have to compare every record written by all concurrent writers during the conflict resolution stage.
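If you did go down that route, the custom strategy would be plugged in through the same conflict resolution strategy config used for the default one. The class name below is purely hypothetical:

    // Hypothetical: wiring in a custom record-level conflict resolution strategy.
    df.write.format("hudi").
      option("hoodie.write.lock.conflict.resolution.strategy",
        "com.example.hudi.RecordLevelConflictResolutionStrategy").  // your own implementation
      // ... rest of the OCC and lock provider options as in the earlier sketch ...
      mode("append").
      save("s3://bucket/path/to/my_hudi_table")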

In summary, if you have a use-case where different writers ingest different sets of partitions, it's easy to reason about the result of concurrent writes. Or, if your regular writer touches only new data whereas your backfill job ingests old data, it will work. But if you have different writers writing to the same partition with no guarantee of non-overlapping writes, you could see write failures. On retry, they are bound to succeed, since the previously conflicting write has likely completed and there is no conflict on the re-attempt. I have also put up a WIP patch to add automatic retries to Spark datasource writes. If there is interest from community users, we can probably look into adding it to the next release.
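Until something like that lands in the datasource itself, a simple client-side wrapper can do the retrying. The sketch below assumes the conflict abort surfaces as a HoodieWriteConflictException (possibly wrapped by Spark), so adapt the matching to whatever your Hudi/Spark versions actually throw.

    import scala.util.{Failure, Success, Try}

    // Sketch: retry a write a few times when it is aborted due to a write conflict.
    def writeWithRetry(maxAttempts: Int)(write: () => Unit): Unit = {
      def attempt(n: Int): Unit = Try(write()) match {
        case Success(_) => ()
        case Failure(e) if n < maxAttempts && e.toString.contains("WriteConflict") =>
          // the competing writer has already completed, so a re-attempt usually succeeds
          attempt(n + 1)
        case Failure(e) => throw e
      }
      attempt(1)
    }

    writeWithRetry(maxAttempts = 3) { () =>
      df.write.format("hudi").
        // same OCC and lock provider options as in the earlier sketch
        mode("append").
        save("s3://bucket/path/to/my_hudi_table")
    }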

Lock Providers

We need to configure a lock provider for all writers trying to write concurrently to Hudi.

Lock provider reliability and guarantees are very critical to this entire multi-writer support. A good lock provider should have the following properties:

  • A lock should be granted to only one accessor at any point in time, even in highly concurrent scenarios.
  • Once the lock is released, the next accessor waiting to acquire it should be able to do so reliably. Again, if there are multiple accessors waiting, only one of them should be granted the lock.
  • Locks should be released if the acquirer dies/crashes, either immediately or after some short timeout. If not, it could lead to deadlocks, and manual intervention would be required, which is an operational burden.
  • The effort of setting up and managing the lock provider should also be considered when choosing the right one.

So, do your due diligence by reading more about the lock provider you want to choose. Also, test it out with different failure scenarios and ensure the lock provider is stable, reliable and works as expected. If it is feasible, you can also avoid multi-writing scenarios altogether if you find it hard to manage a lock provider.

Hudi supports numerous lock providers, namely the HMS-based lock provider, the Zookeeper-based lock provider, the DynamoDB-based lock provider, the FileSystem-based lock provider and so on. We have also added an InProcessLockProvider to assist with locking within the same process but across different threads.
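For reference, here is roughly what a few of these providers look like as writer options. The class and key names follow Hudi's documentation, but double-check them against the version you run; only one provider is used per table, and every writer and table service on that table must point to the same one.

    // Hive Metastore based lock
    val hmsLock = Map(
      "hoodie.write.lock.provider" -> "org.apache.hudi.hive.transaction.lock.HiveMetastoreBasedLockProvider",
      "hoodie.write.lock.hivemetastore.database" -> "default",
      "hoodie.write.lock.hivemetastore.table" -> "my_hudi_table")

    // DynamoDB based lock
    val dynamoLock = Map(
      "hoodie.write.lock.provider" -> "org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider",
      "hoodie.write.lock.dynamodb.table" -> "hudi_locks",
      "hoodie.write.lock.dynamodb.partition_key" -> "my_hudi_table",
      "hoodie.write.lock.dynamodb.region" -> "us-east-1")

    // In-process lock: only for multiple writer threads inside a single JVM process
    val inProcessLock = Map(
      "hoodie.write.lock.provider" -> "org.apache.hudi.client.transaction.lock.InProcessLockProvider")

    // Pass whichever map fits your setup to every writer, e.g.:
    // df.write.format("hudi").options(hmsLock). ... .save(basePath)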

Lock providers are used by all participating entities to obtain a lock as and when required. All the different writers and table services should be configured to use the same lock provider. This does not mean that the entire write operation executes under a lock. It means that any service trying to acquire the lock will succeed only if no other entity has already acquired it. In general, most of the locks obtained are short-lived; it is just that during the entire lifecycle of a write, there could be many short-lived locks. All writes to the metadata table are also done under the lock. For example, here is an illustration of two writers and a table service running concurrently for a given Hudi table.

As you can see, every process (ingestion or table service) acquires a very short-lived lock at different points in time during its entire lifecycle. We can also see that if the lock is already held by some other entity, the requestor waits, then proceeds to do the work under the critical section. Once done, the lock is released.

Conclusion

Lakehouse tech in general has been evolving very rapidly in recent years, so multi-writer support is no longer an advanced use-case :) We have a lot more interesting tech to build on multi-writers and concurrency control in this space. With 1.x, we are expecting to see more exciting features around this in Hudi.
