Description
Requirements
Counting the records in a bucket or a type can be slow in ArcadeDB because it doesn't keep track of how many records are stored. As a result, every count scans all the records of the bucket (or of every bucket belonging to the type).
The main reason we don't store metadata about the number of records is that counting is considered a rare operation: it is mainly used by a DBA and rarely at run time.
There are some use cases where having a fast count could be useful, such as:
- Sequences: ArcadeDB doesn't support sequences, but it would be easy to create a sequence-like script if the count were immediate. Example:
  insert into X set id = (select count(*) + 1 as newId from X)[0].newId
- Studio: it would be nice to show a count column next to each type, plus the total number of records in the database.
Implementation
The easiest and fastest implementation off the top of my head is to save the counts in a new JSON file named database_count.json under the database directory, like this:
{
"Invoice_238278273": 100000,
"Invoice_238278999": 30000
}
This file is saved at database shutdown and loaded at startup. Once loaded, the file must be deleted, so that if the database is not closed properly, the counts are recomputed by scanning the buckets at the next startup.
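A minimal sketch of the save/load/delete cycle, assuming the file holds a flat JSON object mapping bucket names to counts. The class BucketCountFile and its methods are hypothetical, and the hand-rolled JSON handling only covers this flat format (bucket names containing quotes would break it); a real implementation would more likely use a proper JSON library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that persists bucket counts to database_count.json.
final class BucketCountFile {
  static final String FILE_NAME = "database_count.json";
  // Matches one "bucketName": count pair of the flat JSON object
  private static final Pattern ENTRY = Pattern.compile("\"([^\"]+)\"\\s*:\\s*(\\d+)");

  // Called at clean shutdown: write every counter to disk.
  static void save(Path dbDir, Map<String, Long> counts) throws IOException {
    StringJoiner json = new StringJoiner(",\n", "{\n", "\n}");
    for (Map.Entry<String, Long> e : counts.entrySet())
      json.add("\"" + e.getKey() + "\": " + e.getValue());
    Files.writeString(dbDir.resolve(FILE_NAME), json.toString(), StandardCharsets.UTF_8);
  }

  // Called at startup: read the counters, then delete the file so that a later
  // crash cannot leave stale counts behind. Returns an empty map when the file is
  // missing (e.g. after an unclean shutdown): every bucket must then be scanned.
  static Map<String, Long> loadAndDelete(Path dbDir) throws IOException {
    Path file = dbDir.resolve(FILE_NAME);
    Map<String, Long> counts = new HashMap<>();
    if (!Files.exists(file))
      return counts;
    Matcher m = ENTRY.matcher(Files.readString(file, StandardCharsets.UTF_8));
    while (m.find())
      counts.put(m.group(1), Long.parseLong(m.group(2)));
    Files.delete(file);
    return counts;
  }
}
```

Deleting the file right after reading it is what makes a crash safe here: the absence of the file is the signal that the in-memory counters were never flushed.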
In RAM, the EmbeddedDatabase instance will hold a ConcurrentHashMap<String,Long> mapping bucket names (strings) to record counts (longs). This map will be updated only at transaction commit, inside the exclusive lock, to prevent concurrent updates.
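A minimal sketch of the in-memory side, assuming each transaction accumulates per-bucket deltas (records created minus records deleted) and applies them at commit. The class BucketCounters, the deltas map, and the use of a ReentrantReadWriteLock as a stand-in for the exclusive lock are all assumptions for illustration, not ArcadeDB's actual commit path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for the per-bucket record counters of a database instance.
final class BucketCounters {
  private final ConcurrentHashMap<String, Long> counts = new ConcurrentHashMap<>();
  // Stand-in for the database's exclusive commit lock (assumption).
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  // Lock-free read: returns the cached count, or null if the bucket was never counted.
  Long get(String bucketName) {
    return counts.get(bucketName);
  }

  // Stores a count obtained by a full scan.
  void set(String bucketName, long total) {
    counts.put(bucketName, total);
  }

  // Applies a committed transaction's deltas while holding the exclusive lock,
  // so two commits never update the map at the same time.
  void applyCommit(Map<String, Long> deltas) {
    lock.writeLock().lock();
    try {
      for (Map.Entry<String, Long> e : deltas.entrySet())
        // computeIfPresent: a bucket without a cached count stays uncached,
        // because adding a delta to "unknown" would not yield a real total.
        counts.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
    } finally {
      lock.writeLock().unlock();
    }
  }
}
```

Readers never take the lock, which keeps count() cheap; only commits serialize. One subtlety this sketch does not solve: a scan that runs concurrently with a commit may store a total that misses that commit's changes, so a real implementation would need to coordinate the scan with the commit lock.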
Pseudo algorithm
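// Bucket.count(): return the cached counter when available, otherwise scan
public long count() {
  // Fast path: the database already knows how many records this bucket holds
  final Long cachedCount = database.getBucketCount(name);
  if (cachedCount != null)
    return cachedCount;

  // Slow path: scan the bucket and count the records
  final long total = countRecords();
  // Cache the result so the next call does not scan again
  database.updateBucketCount(name, total);
  return total;
}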
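To see the fast and slow paths of the pseudo algorithm in action, here is a self-contained sketch in which the database and the bucket storage are replaced by hypothetical in-memory stand-ins (FakeDatabase, and a List<Object> of records). getBucketCount and updateBucketCount mirror the names used above, but their signatures here are assumptions; the scans field exists only to show that the second call is served from the cache.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for EmbeddedDatabase's count cache.
final class FakeDatabase {
  private final ConcurrentHashMap<String, Long> bucketCounts = new ConcurrentHashMap<>();

  Long getBucketCount(String bucketName) {
    return bucketCounts.get(bucketName);
  }

  void updateBucketCount(String bucketName, long total) {
    bucketCounts.put(bucketName, total);
  }
}

// Hypothetical bucket backed by an in-memory list instead of pages on disk.
final class Bucket {
  private final FakeDatabase database;
  private final String name;
  private final List<Object> records = new ArrayList<>();
  int scans = 0; // number of full scans performed so far

  Bucket(FakeDatabase database, String name) {
    this.database = database;
    this.name = name;
  }

  void add(Object record) {
    records.add(record);
  }

  long count() {
    // Fast path: cached counter
    Long cachedCount = database.getBucketCount(name);
    if (cachedCount != null)
      return cachedCount;
    // Slow path: scan, then cache the result
    long total = countRecords();
    database.updateBucketCount(name, total);
    return total;
  }

  // Simulates a full scan by visiting every record.
  private long countRecords() {
    scans++;
    long total = 0;
    for (Object ignored : records)
      total++;
    return total;
  }
}
```

Note that add() here does not touch the cache, so a count cached before new records are added would go stale; in the proposal, keeping the map current is exactly the job of the update at transaction commit.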