I have a collection with 1.7M records, and unfortunately some automated processes are frequently creating exact duplicate records. I have two Spring Batch jobs: #1 reports on the duplicates, and #2 actually removes the "newest" duplicate record.
Recently, though, MongoDB has been throwing a TransactionExceededLifetimeLimitSeconds error. Is it because I'm paging/slicing the output? I don't want to process the results all at once, since there could be on the order of 100k records. I've tried slice sizes from 10 up to 512, to no avail.
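For context, job #2's delete step boils down to the sketch below: for each duplicate group it keeps the oldest _id and deletes the newer ones, on the assumption that ObjectIds sort chronologically. The class, method, and collection names here are illustrative placeholders, not my exact code; the dupeIds parameter is the dupeid set that the aggregation further down produces for one group.

import java.util.List;
import java.util.stream.Collectors;
import org.bson.types.ObjectId;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

public class DuplicateRemover {

    private final MongoTemplate mongoTemplate;

    public DuplicateRemover(MongoTemplate mongoTemplate) {
        this.mongoTemplate = mongoTemplate;
    }

    // Keeps the oldest _id in the group (ObjectIds embed a timestamp, so min() is
    // the oldest) and deletes the rest. "collection" is a placeholder collection name.
    public void removeNewest(List<ObjectId> dupeIds) {
        ObjectId oldest = dupeIds.stream().min(ObjectId::compareTo).orElseThrow();
        List<ObjectId> toDelete = dupeIds.stream()
                .filter(id -> !id.equals(oldest))
                .collect(Collectors.toList());
        mongoTemplate.remove(new Query(Criteria.where("_id").in(toDelete)), "collection");
    }
}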
In CollectionRepository:
@Aggregation(pipeline = {
        "{ $group: { _id: '$record', dupeid: { $addToSet: '$_id' }, count: { $sum: 1 } } }",
        "{ $match: { count: { $gt: 1 } } }" })
@Meta(allowDiskUse = true)
Slice<Collection> findDuplicates(Pageable page);
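For reference, my understanding is that each page request re-runs the whole pipeline with $skip/$limit appended. Written out with MongoTemplate for clarity (the collection name "collection" is a placeholder), each page is roughly equivalent to:

import org.bson.Document;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.AggregationOptions;
import org.springframework.data.mongodb.core.aggregation.AggregationResults;
import org.springframework.data.mongodb.core.query.Criteria;

public class DuplicatePipelineSketch {

    public AggregationResults<Document> findDuplicatesPage(
            MongoTemplate mongoTemplate, int pageNumber, int pageSize) {

        Aggregation aggregation = Aggregation.newAggregation(
                // { $group: { _id: '$record', dupeid: { $addToSet: '$_id' }, count: { $sum: 1 } } }
                Aggregation.group("record")
                        .addToSet("_id").as("dupeid")
                        .count().as("count"),
                // { $match: { count: { $gt: 1 } } }
                Aggregation.match(Criteria.where("count").gt(1)),
                // paging appends these two stages on every page request (as I understand it)
                Aggregation.skip((long) pageNumber * pageSize),
                Aggregation.limit(pageSize))
            .withOptions(AggregationOptions.builder().allowDiskUse(true).build());

        return mongoTemplate.aggregate(aggregation, "collection", Document.class);
    }
}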
In Tasklet:
// First page of duplicate groups (page size 64 here; I've tried 10 up to 512)
Slice<Collection> slice = collectionRepository.findDuplicates(PageRequest.of(0, 64));
List<Collection> dupeList = slice.getContent();
dupeList.forEach(dupe -> log.info("{}", dupe));
long vol = dupeList.size();

while (slice.hasNext()) {
    // each call re-runs the aggregation for the next page
    slice = collectionRepository.findDuplicates(slice.nextPageable());
    dupeList = slice.getContent();
    dupeList.forEach(dupe -> log.info("{}", dupe));
    vol += dupeList.size();
}
Error:
Command failed with error 290 TransactionExceededLifetimeLimitSeconds PlanExecutor
error during aggregation :: caused by :: operation was interrupted because the
transaction exceeded the configured 'transactionLifetimeLimitSeconds' on mongoserver