I’m experiencing some connection issues with RedisCluster.
I’m using Redis version 7.0 and connecting to a Redis Cluster (GCP Memorystore, with IAM authentication and TLS disabled) with the ioredis package in a Node.js environment.
In my development environment, I notice frequent connection closures. I suspect this is due to inactivity, so I’ve set keepAlive to 600000.
In production, some pods occasionally encounter this error: "WRONGPASS invalid username-password pair or user is disabled."
Additionally, in Cloud Functions, some instances report this error: "[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache."
Do you think the issue could be related to the Redis Cluster configuration? Any suggestions on how to resolve this?
Thanks
Code implementation:
new Redis.Cluster(hosts, {
  scaleReads: 'all', // Send write queries to masters and read queries to masters or slaves randomly.
  redisOptions: {
    password: token,
    keepAlive: 600000, // 10 minutes in milliseconds
    reconnectOnError: (err) => {
      console.error('Reconnect on error:', err);
      return true;
    },
    maxRetriesPerRequest: null // Infinite retries: commands wait until the connection is alive again.
  },
  slotsRefreshTimeout: 5000,
  clusterRetryStrategy: (times) => this.exponentialBackoffWithJitter(times)
})
The IAM auth token is a short-lived token; in the GCP context it is valid for one hour only.
This means that if one of your connections disconnects for any reason after an hour, you need to generate a new access token and use it as the password.
Even if you didn't have a disconnection, an authenticated connection is valid for 12 hours only and must be re-authenticated.
In addition, a GCP Redis cluster closes an idle connection after 10 minutes of inactivity.
So a few things are happening here.
Solutions:
For the idle issue, just send a PING every few minutes (see the sketch below).
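A minimal sketch of that keepalive ping, assuming clusterClient is the Redis.Cluster instance from the question; the 4-minute interval is an arbitrary choice, just comfortably under the 10-minute idle limit:

// Send a PING every few minutes so the cluster doesn't close the connection as idle.
const PING_INTERVAL_MS = 4 * 60 * 1000; // 4 minutes

setInterval(async () => {
  try {
    await clusterClient.ping();
  } catch (err) {
    console.error('Keepalive ping failed:', err);
  }
}, PING_INTERVAL_MS);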
ioredis doesn't support out-of-the-box credential providers, so you will need to set the new tokens on the client connection object manually. The best solution for all the issues above is to schedule a password replacement every 50 minutes and a re-authentication every 10 hours (less than the actual limits, for safety):
function renewToken() {
  // Logic to generate or retrieve the new token
  return 'yourNewToken'; // Replace with your actual token logic
}

async function updatePassword() {
  const newToken = renewToken();
  try {
    // Update the password the client will use for new connections
    clusterClient.options.redisOptions.password = newToken;
    console.log('Password updated successfully');
  } catch (error) {
    console.error('Error updating password:', error);
  }
}

async function authenticate() {
  const newToken = renewToken();
  try {
    // Re-authenticate the existing connection with the new token
    await clusterClient.auth(newToken);
    console.log('Authenticated successfully with new token');
  } catch (error) {
    console.error('Error during authentication:', error);
  }
}

function schedulePasswordUpdates() {
  // Initial password update immediately
  updatePassword();

  // Update the password every 50 minutes
  setInterval(() => {
    updatePassword();
  }, 3000000); // 50 minutes (3000000 milliseconds)

  // Every 10 hours: update the password and re-authenticate
  setInterval(async () => {
    await updatePassword();
    await authenticate(); // Run authentication after updating the password
  }, 36000000); // 10 hours (36000000 milliseconds)
}

// Start the periodic password update and authentication
schedulePasswordUpdates();
See more on automating token renewal in the GCP docs: https://cloud.google.com/memorystore/docs/cluster/manage-iam-auth#automate_access_token_retrieval.
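For illustration only, here is one way renewToken could look, assuming the google-auth-library package and that the IAM access token is accepted as the password as described in the GCP doc above (the cloud-platform scope is an assumption; verify it against that doc):

const { GoogleAuth } = require('google-auth-library');

// Assumption: the cloud-platform scope is sufficient for Memorystore IAM auth.
const auth = new GoogleAuth({
  scopes: ['https://www.googleapis.com/auth/cloud-platform']
});

async function renewToken() {
  // Returns a short-lived (about one hour) IAM access token to use as the Redis password.
  const client = await auth.getClient();
  const { token } = await client.getAccessToken();
  return token;
}

Note that this version is async, so updatePassword and authenticate above would need to await renewToken().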
Other options:
What to consider when choosing from the above:
So in general, I recommend the first option (the scheduled token renewal above).
For the "[ioredis] Unhandled error event: ClusterAllFailedError: Failed to refresh slots cache." error, just increase slotsRefreshTimeout so Cloud Functions has enough time to complete the slots refresh (see the sketch below).
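As a sketch, reusing hosts and token from the question's snippet (20000 ms is just an example value, not a GCP recommendation):

const clusterClient = new Redis.Cluster(hosts, {
  scaleReads: 'all',
  redisOptions: {
    password: token,
    keepAlive: 600000
  },
  // Give Cloud Functions instances more time to fetch the cluster topology on startup.
  slotsRefreshTimeout: 20000 // raised from 5000; tune to your environment
});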
Disclosure:
I'm from AWS ElastiCache, not from GCP, and I don't use Memorystore.
My knowledge of GCP Memorystore and IAM comes from working with GCP engineers on valkey-glide, and from currently designing out-of-the-box IAM integration for valkey-glide, which will do all of the above for both GCP and AWS without the user needing to set it up themselves.
It also comes from the similarities between ElastiCache IAM usage and Memorystore IAM usage.
I might be missing something unique to GCP, but I don't think so; my current design work covers integration with both, and includes a fair amount of research on GCP IAM auth.
See GCP pointing to valkey-glide as the future client for Valkey/Redis OSS.
uncaughtException: WRONGPASS invalid username-password pair or user is disabled. ReplyError: WRONGPASS invalid username-password pair or user is disabled. at parseError (/app/node_modules/redis-parser/lib/parser.js:179:12) at parseType (/app/node_modules/redis-parser/lib/parser.js:302:14)
The pods get the error after some time; all pods have the same permissions and are in the same VPC. About the slots refresh: I increased the number and I don't get that error anymore. – Shahar Ben Ezra Commented Feb 3 at 11:42