I run this code on two machines:
    import random

    from apscheduler.schedulers.asyncio import AsyncIOScheduler

    # This part is simplified. It only shows how the scheduler is initialized (for context).
    scheduler = AsyncIOScheduler(timezone=utc)
    scheduler.start()

    # This is the real code (except for the list).
    @scheduler.scheduled_job('interval', minutes=1, misfire_grace_time=None)
    async def do_dada_news():
        pages = [...]  # shortened for readability; the real list has more than 20 elements
        print("---")
        for page in random.sample(pages, min(len(pages), 20)):
            print(page)
The two machines produce strangely different output across do_dada_news() runs. I expect both machines to behave the same way. How can the behavior differ this much?
As a temporary fix, I now call random.seed(time.time()*10000) inside do_dada_news(), but that does not feel right.
If no seed is provided, Python's built-in random module seeds itself from os.urandom(). Crucially, if the operating system has a built-in source of randomness (Linux and Windows both do), Python defaults to that instead of the system time.
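A minimal sketch of that default behavior, using only the standard library. Two generators given the same explicit seed reproduce the same sample, while unseeded generators each draw their own seed from the OS and are expected to differ. The `pages` list and the `draw` helper are hypothetical stand-ins, not the code from the question.

```python
import random

# Hypothetical stand-in for the question's list (more than 20 elements).
pages = [f"page-{i}" for i in range(50)]


def draw(rng):
    """Pick up to 20 pages with the given generator."""
    return rng.sample(pages, min(len(pages), 20))


# Same explicit seed: both generators produce the identical sample.
same_a = draw(random.Random(1234))
same_b = draw(random.Random(1234))

# No seed: each instance is seeded from os.urandom() when it is available,
# so the two samples are expected to differ.
fresh_a = draw(random.Random())
fresh_b = draw(random.Random())

print(same_a == same_b)
print(fresh_a == fresh_b)
```

If a machine keeps printing the same sample on every run, something is most likely fixing the seed somewhere, since the unseeded case above should not repeat.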
While you could mess with the Linux configuration settings, it would be much easier to seed the generator explicitly with random.seed(int(time.time())**20%999979).
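As a rough sketch of explicit seeding that leaves the global generator alone, the sample can be drawn from a dedicated generator object. Two options are shown: a `random.Random` seeded from the clock, and `random.SystemRandom`, which reads OS randomness on every call and ignores seeding. The second option is a swapped-in alternative, not something the answer above proposes, and `pick_pages` and `pages` are hypothetical names.

```python
import random
import time

# Hypothetical stand-in for the question's list.
pages = [f"page-{i}" for i in range(50)]


def pick_pages(items, rng, k=20):
    """Return up to k distinct items chosen by the given generator."""
    return rng.sample(items, min(len(items), k))


# Option 1: a private generator seeded from the clock.
# time.time_ns() gives nanosecond resolution as an integer seed.
clock_rng = random.Random(time.time_ns())

# Option 2: a generator backed by the OS randomness source.
# It has no internal state to repeat, and seed() has no effect on it.
os_rng = random.SystemRandom()

for page in pick_pages(pages, clock_rng):
    print(page)
```

Inside the scheduled job, either generator could be created once at module level and passed to `pick_pages`, so nothing else in the process can reset it through `random.seed()`.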
Linux in particular uses an entropy pool as its source of randomness, and there's a suggestion here that the issue might be ameliorated by upgrading to kernel 5.6. In general, though, the entropy pool needs a short delay to gather the randomness required.
If I were very concerned about not having this issue in the future, I would set up a queue and create a function that, when called, returns the top number from the queue, dequeues it, and then adds a new number to the bottom of the queue based on the mod-product of the numbers still in it. That way you should at least be guaranteed a source of randomness that you control.
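The queue idea could be sketched roughly as follows. The class name `QueueRandom`, the choice of starting values, and the reuse of 999979 as the modulus are all assumptions made for illustration. The statistical quality of this scheme is untested, so treat it as a demonstration of the mechanism rather than a replacement for the random module.

```python
from collections import deque

MODULUS = 999979  # assumption: borrowed from the seed expression above


class QueueRandom:
    """Hypothetical queue-based generator following the description above."""

    def __init__(self, seeds, modulus=MODULUS):
        if len(seeds) < 2:
            raise ValueError("need at least two starting values")
        self.modulus = modulus
        # Replace zeros with 1 so a product can never get stuck at zero.
        self.queue = deque((s % modulus) or 1 for s in seeds)

    def next(self):
        # Return the top number and remove it from the queue.
        value = self.queue.popleft()
        # Multiply the numbers still in the queue, reducing as we go.
        product = 1
        for n in self.queue:
            product = (product * n) % self.modulus
        # Append the new number to the bottom of the queue.
        self.queue.append(product or 1)
        return value


gen = QueueRandom([2, 3, 5])
print([gen.next() for _ in range(5)])  # → [2, 3, 5, 15, 75]
```

The output is fully determined by the starting values, so two machines given the same values will produce the same sequence. The starting values would need to come from somewhere that varies, such as the clock.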
Does random.sample() create different results on one system and always the same on the other? Consecutive calls to any random function usually create different results without seeding in between. – FEZ Commented Jan 30 at 22:20