r - Should memfrac in terra be adjusted for the number of workers in parallel? - Stack Overflow


I have some terra code that I run in parallel with the future/future.apply packages, but I am running into memory issues (Error in eval(expr, p) : std::bad_alloc). I'm using the future package's "multisession" plan and leaving terra's memfrac at its default (0.6). Should memfrac be adjusted based on the number of workers? Additionally, I've seen that future.callr::callr may be useful, but I'm not sure whether it would help in my scenario or whether memfrac would be handled differently under that plan. A small example similar to the code I am running is below.

library(terra)
#> terra 1.7.78
library(future.apply)
#> Loading required package: future
library(stringr)

# Slope function
slopefun<- function(x){
  fn<- paste0(str_remove(x, "\\.tif$"),"_slope.tif")
  terrain(rast(x), v="slope", filename=fn, overwrite=TRUE)
  return(fn)
}

# Create a dataset
r<- rast(volcano, 
         extent= ext(2667400, 2667400 + ncol(volcano)*10, 
                     6478700, 6478700 + nrow(volcano)*10), 
         crs = "EPSG:27200")

# List of file names
r_list<- list("test1.tif", "test2.tif", "test3.tif", "test4.tif")

# Write to those file names
writeRaster(r, filename = r_list[[1]], overwrite=TRUE)
writeRaster(r*2, filename = r_list[[2]], overwrite=TRUE)
writeRaster(r*3, filename = r_list[[3]], overwrite=TRUE)
writeRaster(r*4, filename = r_list[[4]], overwrite=TRUE)

nworkers<- 4

plan(strategy = "multisession", workers= nworkers) #Set up parallel
res_list<- future_lapply(r_list, FUN = slopefun)
plan(strategy = "sequential")

# This is more than 1. Do I need to divide memfrac by nworkers?
# For example, in this case should memfrac be reduced to 0.25 or less since there are 4 workers?
terraOptions()$memfrac*nworkers
#> memfrac   : 0.6
#> tolerance : 0.1
#> verbose   : FALSE
#> todisk    : FALSE
#> tempdir   : C:/Users/socce/AppData/Local/Temp/Rtmp2puqqO
#> datatype  : FLT4S
#> memmin    : 1
#> progress  : 3
#> [1] 2.4

Created on 2025-01-22 with reprex v2.1.1

Asked Jan 22 at 20:29 by ailich; edited Jan 26 at 16:57 by marc_s.

1 Answer

It seems to me that your question, in more general terms, is this: if you have n parallel processes with similar memory requirements that share the same physical RAM x, do you need to account for n when setting the maximum amount of RAM that each process can use?

I would say that each process should not use more RAM than x/n.

So in your example, with four processes and total RAM use capped at 60%, I would use memfrac=0.15 (that is, 0.6/4) for each process, or achieve something similar with memmax=.
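As a rough sketch of the memmax= route: the per-worker cap can be derived from the same x/n reasoning. The 16 GB figure below is a made-up machine size, not something from the question, and this assumes memmax is expressed in GB, as terra's documentation describes it.

```r
library(terra)

total_ram_gb <- 16    # hypothetical machine RAM; replace with your own value
nworkers     <- 4     # number of parallel workers, as in the question
budget_frac  <- 0.6   # fraction of RAM all workers together may use

# Even split of the shared budget across workers (assumes similar workloads)
memmax_per_worker <- budget_frac * total_ram_gb / nworkers

# This would need to run inside each worker's R session to affect that worker
terraOptions(memmax = memmax_per_worker)
```

An absolute cap like this does not depend on how much RAM happens to look free at the moment a worker starts, which may make it more predictable than a fraction.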

Either can be set with terraOptions or passed as an additional argument to most raster processing methods. If the additional arguments are already used for another purpose, you can use wopt=list(memfrac=0.15) instead.
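A minimal sketch of how this could look with the question's setup. It assumes that "multisession" workers are separate R sessions, so options set in the main session would not carry over and the limit is set inside the worker function instead. The name slopefun_capped and the temporary file paths are made up for illustration.

```r
library(terra)
library(future.apply)

# Up to 4 workers as in the question, fewer if the machine has fewer cores
nworkers <- min(4, future::availableCores())
# Split the default 0.6 budget evenly across workers (0.15 with 4 workers)
worker_memfrac <- 0.6 / nworkers

# Hypothetical variant of slopefun that sets the limit inside the worker
slopefun_capped <- function(x, memfrac) {
  terra::terraOptions(memfrac = memfrac)  # applies to this worker's session only
  fn <- sub("\\.tif$", "_slope.tif", x)
  terra::terrain(terra::rast(x), v = "slope", filename = fn, overwrite = TRUE)
  fn
}

# Small test rasters written to the session's temporary directory
r <- rast(volcano,
          extent = ext(2667400, 2667400 + ncol(volcano) * 10,
                       6478700, 6478700 + nrow(volcano) * 10),
          crs = "EPSG:27200")
r_files <- file.path(tempdir(), paste0("test", 1:4, ".tif"))
for (i in seq_along(r_files)) writeRaster(r * i, r_files[i], overwrite = TRUE)

plan(multisession, workers = nworkers)
res_list <- future_lapply(r_files, slopefun_capped, memfrac = worker_memfrac)
plan(sequential)
```

Passing memfrac = worker_memfrac directly in the terrain() call should be an alternative to calling terraOptions, following the description above.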

memfrac sets a limit to the fraction of available (free) RAM that may be used. The problem with parallel processes is that if they all start at the same time, there may be a lot of RAM that seems available, but won't be, as it needs to be shared between the processes.
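The arithmetic behind this can be illustrated with made-up numbers. The sketch below assumes every worker sees the same 16 GB of free RAM at startup (a hypothetical figure) and applies memfrac to it independently.

```r
free_gb  <- 16    # hypothetical free RAM that each worker sees when it starts
memfrac  <- 0.6   # terra's default
nworkers <- 4

# Each worker believes it may use memfrac * free RAM
per_worker_gb <- memfrac * free_gb
# Together the workers may try to claim far more than is actually free
combined_gb <- nworkers * per_worker_gb
oversubscribed <- combined_gb > free_gb

# Dividing memfrac by the number of workers keeps the combined claim
# within the intended 60% of free RAM
scaled_combined_gb <- nworkers * (memfrac / nworkers) * free_gb
```

With these numbers the unscaled workers could jointly claim 38.4 GB against 16 GB free, while the scaled setting keeps the total at 9.6 GB.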

You can use terra::mem_info to investigate what happens.

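library(terra)
# r1 is a global raster without cell values
r1 <- rast(res=1/60)
mem_info(r1)
# r2 and r3 hold cell values; repeat mem_info(r1) after each one
# to compare the reported available memory
r2 <- rast(res=1/60, vals=1)
mem_info(r1)
r3 <- rast(res=1/60, vals=1)
mem_info(r1)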