python - random.sample() generating same sequence every time it is run - Stack Overflow

Posted by admin on 2025-04-17

I run this code on two machines:

import random

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from pytz import utc  # assumed source of utc; the original snippet does not show this import

# This part is simplified. It only shows how the scheduler is initialized (for context).
scheduler = AsyncIOScheduler(timezone=utc)
scheduler.start()

# This is the real code (except for the list).
@scheduler.scheduled_job('interval', minutes=1, misfire_grace_time=None)
async def do_dada_news():
    pages = [...]  # shortened for readability; the real list has more than 20 elements
    print("---")
    for page in random.sample(pages, min(len(pages), 20)):
        print(page)

The two machines produce different output, which is strange:

Local Docker container: I get 20 different lines every time do_dada_news() runs.
Kubernetes cluster: I get the exact same 20 lines every time it runs.

I expect both machines to behave the same way. How can the behavior differ this much?

As a temporary fix, I now call random.seed(time.time()*10000) inside do_dada_news(). But that does not feel right.

Asked Jan 30 at 22:04 by FEZ. Edited Feb 1 at 20:27 by Péter Szilvási.

Seeding the RNG from the time is the normal way to get a different random sequence on each run. – Barmar Commented Jan 30 at 22:11
But why do consecutive calls to random.sample() produce different results on one system and always the same results on the other? Consecutive calls to any random function usually produce different results without reseeding in between. – FEZ Commented Jan 30 at 22:20
Of course you get different results on consecutive calls; it wouldn't be random if you didn't. Seeding just sets the starting point. As for why the two systems behave differently, it could be a difference between Docker and Kubernetes. – Barmar Commented Jan 30 at 23:23

1 Answer

If no seed is provided, Python's built-in random module seeds itself from os.urandom(). Crucially, when the operating system provides a source of randomness (Linux and Windows both do), Python uses that source instead of the system time.
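As a small illustration of what seeding does (the page list and the seed value here are made up for the example): two generators started from the same seed produce the same sample, while consecutive calls on one generator do not. Seeing the exact same 20 lines on every run therefore suggests that the generator is somehow back in the same state before each call, though the snippet in the question does not show why that would happen on the cluster.

```python
import random

# Hypothetical stand-in for the real list of pages.
pages = [f"page-{i}" for i in range(50)]


def sample_with_seed(seed):
    """Draw 20 pages from a private generator started at the given seed."""
    rng = random.Random(seed)  # independent of the module-level generator
    return rng.sample(pages, min(len(pages), 20))


first = sample_with_seed(1234)
second = sample_with_seed(1234)
print(first == second)  # True: same seed, same starting state, same sample

# Consecutive draws from ONE generator advance its state between calls.
rng = random.Random(1234)
a = rng.sample(pages, 20)
b = rng.sample(pages, 20)
print(a == b)
```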

While you could adjust the Linux configuration, it would be much easier to seed the generator explicitly, for example with random.seed(int(time.time())**20%999979).
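A different option from the time-based seed, offered only as a sketch: random.SystemRandom draws from os.urandom() on every call and keeps no reproducible internal state, so there is nothing to reseed. The helper name pick_pages and its limit parameter are hypothetical, and this assumes reproducibility of the sample is not needed.

```python
import random

# SystemRandom is backed by os.urandom(); seed() has no effect on it.
_sysrand = random.SystemRandom()


def pick_pages(pages, limit=20):
    """Return up to `limit` distinct pages chosen with OS-provided randomness."""
    return _sysrand.sample(pages, min(len(pages), limit))


# Usage with a made-up list of page names.
chosen = pick_pages([f"page-{i}" for i in range(100)])
print(len(chosen))  # 20
```

Inside the scheduled job, the loop would then iterate over pick_pages(pages) instead of calling random.sample() directly.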

Linux in particular uses an entropy pool as its source of randomness, and there is a suggestion that the issue might be mitigated by upgrading to 5.6 (presumably the Linux kernel version). In general, though, the entropy pool needs a short delay to gather the randomness required.

If I were very concerned about avoiding this issue in the future, I would set up a queue and write a function that, when called, returns the number at the front of the queue, dequeues it, and then appends a new number to the back of the queue, computed as the modular product of the numbers still in it. That way you would at least be guaranteed a source of randomness that you control.
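One possible reading of that queue idea, as a rough sketch only. The class name, the method name, and the reuse of 999979 as the modulus are assumptions made for illustration. The result is a fully deterministic recurrence with no tested statistical quality, so it should not be treated as a replacement for the random module.

```python
from collections import deque

MODULUS = 999979  # borrowed from the seed expression above; an arbitrary choice here


class QueueGenerator:
    """Queue-based number source: pop the front, append a modular product."""

    def __init__(self, seeds):
        # Reduce the seeds modulo MODULUS and replace zeros, which would
        # otherwise force every later product to zero.
        self._queue = deque((int(s) % MODULUS) or 1 for s in seeds)
        if len(self._queue) < 2:
            raise ValueError("need at least two seed values")

    def next_value(self):
        value = self._queue.popleft()  # number at the front of the queue
        product = 1
        for n in self._queue:  # product of the numbers still in the queue
            product = (product * n) % MODULUS
        self._queue.append(product or 1)
        return value


gen = QueueGenerator([2, 3, 5])
values = [gen.next_value() for _ in range(5)]
print(values)  # [2, 3, 5, 15, 75]
```

The first few outputs are just the seeds themselves, so in practice the seeds would have to come from somewhere unpredictable, which brings back the original question of where the randomness comes from.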

When reposting, please credit the original URL: http://anycun.com/QandA/1744888562a89047.html
