julia - Is there a way to improve the parallel process described below to solve an ODE equation? - Stack Overflow


I am trying to solve coupled ODEs, but I present only a small part of the problem here. I am trying to solve for R over the entire grid. The grid is too big for conventional methods, so I am trying to parallelise the computation in Julia, but I am still unable to run the code on my Mac (M1, 8 GB RAM), even after trying multiple approaches. Given the code below, could you please tell me whether I am doing anything wrong?


function f(R, v)
    numerator = R * exp(1 / (3 - 50 * R))
    denominator = (2 + (200 * R) / 3)^(4 / 9) * (2 - (100 * R) / 3)^(5 / 9)
    return v - (numerator / denominator)
end


function df(R, v)
    # Approximate the derivative using central difference method
    h = 1e-6
    return (f(R + h, v) - f(R - h, v)) / (2 * h)
end

# Newton's method implementation
function newton(f, df, R0, v; tol=1e-6, maxiter=1000)
    R = R0
    for i in 1:maxiter
        R_new = R - f(R, v) / df(R, v)
        if abs(R_new - R) < tol
            return R_new
        end
        R = R_new
    end
    error("Newton's method did not converge")
end

# Array to store the final converged values of R
# (these are the values along the initial hypersurface)
R_0 = Float64[]
v = LinRange(0.0000, 150, 110001)
# Initial conditions
let R_initial = 0.025  # Initial guess for R
    for v in v
        R_solution = newton(f, df, R_initial, v)
        R_initial = R_solution  # update the guess to the last converged value of R
        push!(R_0, R_solution)
    end
end
using PlotlyJS
plot(R_0, v)

function Initial_Function(R)
    H_0 = R^2 * (0.0525 - R)^2
end

H_initial = Float64[]
for R in R_0
    if R <= 0.0525
        H = Initial_Function(R)
        push!(H_initial, H)
    end
    if R > 0.0525
        H = 0
        push!(H_initial, H)
    end
end
using PlotlyJS
plot(R_0, H_initial)
# H, R, v are defined
# We run the RK4 scheme to calculate the value of R across the grid.

u = range(-100, 10, length = 110001)
v_range = range(0,150,length = 110001)
using Distributed
addprocs(4)
@everywhere using SharedArrays
R_grid = SharedArray{Float64}(110001,110001)
R_grid[1,:] = R_0

function dR_du(u, R)
    g = (R^2/6)*(u*R + (u^2 * R^2)/9)
    return (- 0.5 * g)
end

function rk4(u, R, du)
    k1 = dR_du(u, R)
    k2 = dR_du(u + 0.5*du, R + 0.5*du*k1)
    k3 = dR_du(u + 0.5*du, R + 0.5*du*k2)
    k4 = dR_du(u + du, R + du*k3)
    return R + (du/6) * (k1 + 2*k2 + 2*k3 + k4)
end
Threads.@threads for j in 1:110001 
    for i in 2:110001
        u_vals = u[i-1]
        du = u[i] - u_vals
        R_grid[j, i] = rk4(u_vals, R_grid[j, i-1], du)
    end
end

I expect to find the value of R over the entire (u, v) grid. You can see that I have calculated R_0, which lies along u = -100 and v = 0, by using the Newton scheme to solve a certain equation; a Runge-Kutta scheme is then used to solve for the rest of the grid points.

I have tried to set up a parallel process, but my laptop crashes each time.

  • Your memory error is definitely because of the 110001 x 110001 array you are trying to initialize. A single array of 110001 floating point numbers is 0.88 MB, so a 110001 x 110001 array will require 96.8 GB of memory...far beyond what most personal computers have. Parallel processes will not help with this, as all of the processes on your computer share the same memory. You need to re-think your code -- for example, in RK4 you usually do not need to keep track of every step, you just need to add the value to the previous step. This requires keeping just a few floating point numbers in memory (see the sketch after these comments). – kirklong Commented Jan 8 at 20:15
  • @kirklong I appreciate your comment, and thanks for letting me know that even parallel processing will not be able to help here. The 110k grid points are required for the refinement and accuracy these equations need. I understand these equations were not set up for numerical work, as they were written in the 1970s. I will try to think of some other way to write the code. – Rahul Rai Commented Jan 18 at 12:38
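
A minimal sketch of that idea, reusing rk4, u, and R_0 from the question: the whole v-slice is marched forward in u and overwritten at every step, so only the current slice (110001 Float64s, about 0.88 MB) is ever held in memory and the full grid never exists at once.

# Sketch: keep only the current u-slice instead of the full 110001 x 110001 grid
R_current = copy(R_0)                  # R at u[1] = -100 for every v
for i in 2:length(u)
    du = u[i] - u[i-1]
    for j in eachindex(R_current)
        R_current[j] = rk4(u[i-1], R_current[j], du)
    end
end
# R_current now holds R at u = 10 for all v; intermediate slices were discarded.

If the intermediate slices are needed, each one can be written to disk before it is overwritten, which is essentially the blocking approach sketched in the answer below.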

1 Answer

As mentioned in the comments, your matrix is way too big.

You should look into mechanisms for out-of-core computation, which are designed for exactly this.

Classical ones include:

Blocking: instead of allocating the entire matrix, have each thread allocate a row, or whatever block size you can afford so that block_size * num_cores is less than your RAM. On completion each block is written to disk and its memory is freed. This is easy if the rows can be computed independently, and much harder if all of them need access to the whole matrix at the same time.
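
A minimal sketch of the blocking idea, reusing rk4, u, and R_0 from the question; the block size of 100 rows and the file name R_grid.bin are arbitrary choices, and it assumes each row (one fixed v) marches independently in u starting from its value in R_0, which appears to be the intent of the loop in the question:

using Base.Threads

n = 110001
block_rows = 100                       # tune so block_rows * n * 8 bytes fits in RAM (~88 MB here)
open("R_grid.bin", "w") do io
    for block_start in 1:block_rows:n
        rows = block_start:min(block_start + block_rows - 1, n)
        buffer = Matrix{Float64}(undef, length(rows), n)   # only one block lives in memory
        @threads for k in 1:length(rows)
            buffer[k, 1] = R_0[rows[k]]                    # initial value for this row
            for i in 2:n
                du = u[i] - u[i-1]
                buffer[k, i] = rk4(u[i-1], buffer[k, i-1], du)
            end
        end
        write(io, buffer)              # flush the finished block to disk
    end                                # buffer is freed before the next block starts
end

Note that the file then contains the blocks one after another, each stored in Julia's column-major order, so whatever reads the file back has to account for that layout.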

The second is memory-mapping (mmap) the file. This is an operating-system mechanism (available on Linux and macOS) that gives you something that looks like a block of memory, which you can map onto your array; when you write to a[i, j], the OS caches the page for a while and writes it out to disk when memory fills up. This approach can be complicated: the mmap'd file is shared, so you need synchronization (locks) to ensure threads don't step on each other when reading and writing. With that, the algorithm should definitely work, but if your access pattern is very random and you have a hard drive, this is a well-known way to tank your performance, since HDDs do not handle random I/O well.
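
In Julia, the standard library's Mmap module wraps this mechanism, so a sketch could look like the following (the ~97 GB backing file R_grid.bin is an arbitrary name, and the initial values are placed in the first column because that is what the marching loop reads; the question assigns the first row instead, which looks like a mismatch):

using Mmap

n = 110001
io = open("R_grid.bin", "w+")                       # backing file, roughly 97 GB on disk
R_grid = Mmap.mmap(io, Matrix{Float64}, (n, n))     # behaves like an ordinary array

R_grid[:, 1] .= R_0          # initial hypersurface stored along the first column
for j in 1:n                 # each row marches independently in u
    for i in 2:n
        du = u[i] - u[i-1]
        R_grid[j, i] = rk4(u[i-1], R_grid[j, i-1], du)
    end
end

Mmap.sync!(R_grid)           # flush dirty pages back to the file
close(io)

Because Julia arrays are column-major, the inner loop above jumps to a different column (and hence a different region of the file) at every step, which is exactly the scattered access pattern described above; storing the grid transposed, so that each march in u walks down a single column, keeps the writes sequential.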

Other methods exist with different tradeoffs or requirements, and they would not require any change to the algorithm.
