My top-level design is a Network-on-Chip (NoC) with a grid of 15*15 nodes (Router+PE).
I have been trying to simulate it on different machines and configurations, but it keeps stopping at different points in the simulation. I can't see what causes the problem, so I am stuck.
1) Running only the SystemC / C++ executable (the compiler output) on an LSF cluster. The simulation runs normally with the expected output, then stops at around cycle 55,000 (cycle-accurate model) with this error message:
noc_exe: ../../../../src/sysc/kernel/sc_cor_qt.cpp:107: virtual void sc_core::sc_cor_qt::stack_protect(bool): Assertion `ret == 0' failed.
/home/#######/.lsbatch/1438183854.772657: line 8: 15547 Aborted (core dumped) ./noc_exe
2) Running with the Cadence irun command on an LSF cluster. The simulation was many times slower but reached around 100,000 cycles before producing this error message:
Simulation interrupted at 1025080 NS + 0
ncsim> ncsim: *W,NCTERM: Simulation received SIGTERM signal from process 22268, user id 0 (/env/seki/app/lsf/8.0/etc/sbatchd).
make: *** [run] Error 15
I looked up the NCTERM error with nchelp and got:
nchelp: 14.20-s010: (c) Copyright 1995-2015 Cadence Design Systems, Inc.
ncsim/NCTERM = A SIGTERM signal was received by the running simulation. This signal may have been issued due to various reasons:
* sent by the user using the kill command
* machine on which the job was running went down
* sent by LSF (Load Sharing Facility) to enforce certain user specified job control limits (memory, CPU, swap, etc.)
I suspected that the stack size might be too small for my threads, but the errors in 1) and 2) still occurred after I increased it.
In 2), it was enough to add the
-SC_THREAD_STACKSIZE 0x80000
switch to the irun command.
In 1), I had to go to every thread registration in my constructors and append another line to it:
SC_THREAD(controller_thread);
set_stack_size(NOC_THREAD_STACK_SIZE); // 0x80000
I'd appreciate a prompt reply.
When I run the same test with a smaller number of nodes, the issue does not occur.