Hi all. I ran a Stan program on a cluster running CentOS Linux 7 (Core), but Stan terminated without any warning or error message.
The stan code is:
functions {
vector NRTL(vector x, vector T, vector p12, vector p21, real a, matrix map_tij, matrix map_tij_dT) {
int N = rows(x);
vector[N] t12 = map_tij * p12;
vector[N] t21 = map_tij * p21;
vector[N] dt12_dT = map_tij_dT * p12;
vector[N] dt21_dT = map_tij_dT * p21;
vector[N] at12 = a * t12;
vector[N] at21 = a * t21;
vector[N] G12 = exp(-at12);
vector[N] G21 = exp(-at21);
vector[N] term1 = ( ( (1-x) .* G12 .* (1 - at12) + x .* square(G12) ) ./ square((1-x) + x .* G12) ) .* dt12_dT;
vector[N] term2 = ( ( x .* G21 .* (1 - at21) + (1-x) .* square(G21) ) ./ square(x + (1-x) .* G21) ) .* dt21_dT;
return -8.314 * square(T) .* x .* (1-x) .* ( term1 + term2 );
}
real ps_like(array[] int N_slice, int start, int end, vector y, vector x, vector T, array[] matrix U_raw,
array[] matrix V_raw, vector v_ARD, vector v, vector scaling, real a, real error, array[] int N_points,
array[,] int Idx_known, array[] matrix mapping, vector var_data) {
real all_target = 0;
for (i in start:end) {
vector[4] p12_raw;
vector[4] p21_raw;
vector[N_points[i]] y_std = sqrt(var_data[sum(N_points[:i-1])+1:sum(N_points[:i])]+v[i]);
vector[N_points[i]] y_means;
for (j in 1:4) {
p12_raw[j] = dot_product(U_raw[j,:,Idx_known[i,1]] .* v_ARD, V_raw[j,:,Idx_known[i,2]]);
p21_raw[j] = dot_product(U_raw[j,:,Idx_known[i,2]] .* v_ARD, V_raw[j,:,Idx_known[i,1]]);
}
y_means = NRTL(x[sum(N_points[:i-1])+1:sum(N_points[:i])],
T[sum(N_points[:i-1])+1:sum(N_points[:i])],
p12_raw, p21_raw, a,
mapping[1][sum(N_points[:i-1])+1:sum(N_points[:i]),:],
mapping[2][sum(N_points[:i-1])+1:sum(N_points[:i]),:]);
all_target += normal_lpdf(y[sum(N_points[:i-1])+1:sum(N_points[:i])] | y_means, y_std);
}
return all_target;
}
}
data {
int N_known; // number of known data points
array[N_known] int N_points; // number of data points in each known data set
vector[sum(N_points)] x; // mole fraction
vector[sum(N_points)] T; // temperature
vector[sum(N_points)] y; // excess enthalpy
vector[4] scaling; // scaling factor for NRTL parameter
real a; // alpha value for NRTL model
int grainsize; // grainsize for parallelization
int N; // number of compounds
int D; // number of features
array[N_known,2] int Idx_known; // indices of known data points
vector<lower=0>[N_known] v; // known data-model variance
}
transformed data {
real error = 0.01; // error in the data (fraction of experimental data)
vector[sum(N_points)] var_data = square(error*y); // variance of the data
array[2] matrix[sum(N_points),4] mapping; // temperature mapping
array[N_known] int N_slice; // slice indices for parallelization
for (i in 1:N_known) {
N_slice[i] = i;
}
mapping[1] = append_col(append_col(append_col(rep_vector(1.0, sum(N_points)), T),
1.0 ./ T), log(T)); // mapping for tij
mapping[1] = mapping[1] .* rep_matrix(scaling', sum(N_points)); // scaling the mapping
mapping[2] = append_col(append_col(append_col(rep_vector(0.0, sum(N_points)), rep_vector(1.0, sum(N_points))),
-1.0 ./ square(T)), 1.0 ./ T); // mapping for dtij_dT
mapping[2] = mapping[2] .* rep_matrix(scaling', sum(N_points)); // scaling the mapping
}
parameters {
array[4] matrix[D,N] U_raw; // feature matrices U
array[4] matrix[D,N] V_raw; // feature matrices V
real<lower=0> scale; // scale dictating the strength of ARD effect
vector<lower=0>[D] v_ARD; // ARD variances arranged in increasing order with lower bound zero
}
model {
// Gamma Prior for scale
profile("Scale Prior"){
scale ~ gamma(1e-9, 1e-9);
}
// ARD Exponential prior
profile("ARD Prior"){
v_ARD ~ exponential(scale);
}
// Priors for feature matrices
profile("Feature Matrices"){
for (i in 1:4) {
to_vector(U_raw[i]) ~ std_normal();
to_vector(V_raw[i]) ~ std_normal();
}
}
// Likelihood function
profile("Likelihood"){
target += reduce_sum(ps_like, N_slice, grainsize, y, x, T, U_raw,
V_raw, v_ARD, v, scaling, a, error, N_points,
Idx_known, mapping, var_data);
}
}
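As a cross-check on the indexing in `ps_like`, the repeated expression `sum(N_points[:i-1])+1:sum(N_points[:i])` can be reproduced outside Stan. The following is only a minimal Python sketch: the helper name `slice_bounds` and the example counts are hypothetical and are not taken from the attached data. It shows the 1-based, inclusive row range that each data set `i` would select from `x`, `T`, `y`, and the `mapping` matrices.

```python
from itertools import accumulate


def slice_bounds(n_points):
    """Return 1-based inclusive (start, end) row ranges, one per data set.

    Mirrors the Stan expression
        sum(N_points[:i-1]) + 1 : sum(N_points[:i])
    for i = 1..len(n_points).
    """
    ends = list(accumulate(n_points))                     # sum(N_points[:i])
    starts = [e - n + 1 for e, n in zip(ends, n_points)]  # sum(N_points[:i-1]) + 1
    return list(zip(starts, ends))


# Hypothetical counts, for illustration only.
n_points = [3, 2, 4]
bounds = slice_bounds(n_points)
print(bounds)  # [(1, 3), (4, 5), (6, 9)]
```

Running the same check on the real `N_points` and confirming that the last end index equals the length of `x` is one way to rule out an out-of-range slice in the data.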
The model compiled and even completed the preliminary gradient evaluations. The relevant Python sampling code is:
print('Step1: Sampling sort chain using random initialization')
fit = model.sample(data=f'{path}/data.json', output_dir=output_dir1,
refresh=1, iter_warmup=5000,
iter_sampling=1000, chains=chains, parallel_chains=chains,
threads_per_chain=threads_per_chain, max_treedepth=5,
metric='dense_e', save_profile=True, sig_figs=18,
show_console=True)
The output from the stan `.txt` file displays:
method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 5000
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000 (Default)
delta = 0.800000 (Default)
kappa = 0.750000 (Default)
t0 = 10.000000 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
save_metric = 0 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 5
metric = dense_e
metric_file = (Default)
stepsize = 1.000000 (Default)
stepsize_jitter = 0.000000 (Default)
num_chains = 8
id = 1 (Default)
data
file = Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/data.json
init = 2 (Default)
random
seed = 96157
output
file = /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809.csv
diagnostic_file = (Default)
refresh = 1
sig_figs = 18
profile_file = /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809-profile.csv
save_cmdstan_config = 0 (Default)
num_threads = 24 (Default)
Gradient evaluation took 0.009228 seconds
1000 transitions using 10 leapfrog steps per transition would take 92.28 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.00178 seconds
1000 transitions using 10 leapfrog steps per transition would take 17.8 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001501 seconds
1000 transitions using 10 leapfrog steps per transition would take 15.01 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001735 seconds
1000 transitions using 10 leapfrog steps per transition would take 17.35 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001593 seconds
1000 transitions using 10 leapfrog steps per transition would take 15.93 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001275 seconds
1000 transitions using 10 leapfrog steps per transition would take 12.75 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001483 seconds
1000 transitions using 10 leapfrog steps per transition would take 14.83 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.00134 seconds
1000 transitions using 10 leapfrog steps per transition would take 13.4 seconds.
Adjust your expectations accordingly!
And the output from `show_console=True` is:
Evaluating the following conditions for the Hybrid Model:
Include clusters: False
Variance known: True
Lower rank of feature matrices: 1
Step1: Sampling sort chain using random initialization
method = sample (Default)
sample
num_samples = 1000 (Default)
num_warmup = 5000
save_warmup = 0 (Default)
thin = 1 (Default)
adapt
engaged = 1 (Default)
gamma = 0.050000 (Default)
delta = 0.800000 (Default)
kappa = 0.750000 (Default)
t0 = 10.000000 (Default)
init_buffer = 75 (Default)
term_buffer = 50 (Default)
window = 25 (Default)
save_metric = 0 (Default)
algorithm = hmc (Default)
hmc
engine = nuts (Default)
nuts
max_depth = 5
metric = dense_e
metric_file = (Default)
stepsize = 1.000000 (Default)
stepsize_jitter = 0.000000 (Default)
num_chains = 8
id = 1 (Default)
data
file = Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/data.json
init = 2 (Default)
random
seed = 96157
output
file = /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809.csv
diagnostic_file = (Default)
refresh = 1
sig_figs = 18
profile_file = /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809-profile.csv
save_cmdstan_config = 0 (Default)
num_threads = 24 (Default)
Gradient evaluation took 0.009228 seconds
1000 transitions using 10 leapfrog steps per transition would take 92.28 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.00178 seconds
1000 transitions using 10 leapfrog steps per transition would take 17.8 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001501 seconds
1000 transitions using 10 leapfrog steps per transition would take 15.01 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001735 seconds
1000 transitions using 10 leapfrog steps per transition would take 17.35 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001593 seconds
1000 transitions using 10 leapfrog steps per transition would take 15.93 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001275 seconds
1000 transitions using 10 leapfrog steps per transition would take 12.75 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.001483 seconds
1000 transitions using 10 leapfrog steps per transition would take 14.83 seconds.
Adjust your expectations accordingly!
Gradient evaluation took 0.00134 seconds
1000 transitions using 10 leapfrog steps per transition would take 13.4 seconds.
Adjust your expectations accordingly!
The standard error file displays:
21:47:31 - cmdstanpy - INFO - compiling stan file /home/ghermanus/lustre/tmphhck335m/tmpvaov4lbc.stan to exe file /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Hybrid_PMF
21:48:09 - cmdstanpy - INFO - compiled model executable: /mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Hybrid_PMF
21:48:09 - cmdstanpy - INFO - CmdStan start processing
21:48:10 - cmdstanpy - INFO - CmdStan done processing
21:48:10 - cmdstanpy - ERROR - CmdStan error: terminated by signal 11 Unknown error -11
Traceback (most recent call last):
File "/mnt/lustre/users/ghermanus/Hybrid PMF/Hybrid_PMF.py", line 147, in <module>
fit = model.sample(data=f'{path}/data.json', output_dir=output_dir1,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ghermanus/cmdstan_condaforge/lib/python3.12/site-packages/cmdstanpy/model.py", line 1136, in sample
raise RuntimeError(msg)
RuntimeError: Error during sampling:
Command and output files:
RunSet: chains=8, chain_ids=[1, 2, 3, 4, 5, 6, 7, 8], num_processes=1
cmd (chain 1):
['/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Hybrid_PMF', 'id=1', 'random', 'seed=96157', 'data', 'file=Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/data.json', 'output', 'file=/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809.csv', 'profile_file=/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809-profile.csv', 'refresh=1', 'sig_figs=18', 'method=sample', 'num_samples=1000', 'num_warmup=5000', 'algorithm=hmc', 'engine=nuts', 'max_depth=5', 'metric=dense_e', 'adapt', 'engaged=1', 'num_chains=8']
retcodes=[-11]
per-chain output files (showing chain 1 only):
csv_file:
/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809_1.csv
profile_file:
/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809-profile_1.csv
console_msgs (if any):
/mnt/lustre/users/ghermanus/Hybrid PMF/Subsets/Alkane_Primary alcohol/Include_clusters_False/Variance_known_True/rank_1/Step1/Hybrid_PMF-20240521214809-stdout.txt
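For reference, a negative return code such as the `retcodes=[-11]` reported above follows the convention of Python's `subprocess` module, where `-N` means the child process was killed by signal `N`. The sketch below uses only the standard-library `signal` module to translate that number into a name; the helper `describe_retcode` is a hypothetical name introduced for illustration and is not part of cmdstanpy.

```python
import signal


def describe_retcode(retcode):
    """Translate a subprocess-style return code into a readable message.

    A negative value -N means the process was terminated by signal N.
    """
    if retcode < 0:
        return f"killed by {signal.Signals(-retcode).name}"
    return f"exited with status {retcode}"


print(describe_retcode(-11))  # killed by SIGSEGV
```

So the traceback does carry one piece of information: the executable received a segmentation fault (SIGSEGV) rather than being stopped by the scheduler.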
Nothing indicates that PBS terminated the job either. I currently have the same code running on the server with a different value of D; the case above used D=1. The command `qstat -fx <JOBID>` yields the comment
comment = Job run at Tue May 21 at 21:47 on (cnode0897:ncpus=24:mem=1572864
0kb) and finished
This indicates that neither the admins nor I terminated the job.
Running the job with the same data (and the same seed that failed) does not reproduce the error on my own machine. I have attached a JSON file with the data used: data.json
Current Version:
cluster:
cmdstan 2.34.0 hff4ab46_0 conda-forge
cmdstanpy 1.2.2 pyhd8ed1ab_0 conda-forge