Hi again. I am writing this post to describe one of the problems we faced during our application development.
We have a client-server application in which multiple clients connect to a single server. The operating system we use is Red Hat Linux or Ubuntu.
The problem was that whenever a user tried to kill a particular client, the client application would hang on a strange futex call. Because of this the server also hung, and we had to restart our application over and over again. The problem showed up about once or twice a day.
When we attached strace to the hung process (strace -p process-id), we found that it was stuck on the call below.
futex(0x5ac9df, FUTEX_WAIT, ....
The hang did not occur every time we killed a client; it showed up only intermittently.
But ours is a real-time application, and we wanted to solve the problem as soon as possible because it was very annoying. It had bitten us in the past and gone away after we reinstalled the system, but this time it persisted even after a reinstall.
So after a long time googling the issue we came across this link:
http://www.fedoraforum.org/forum/showthread.php?t=187375
It says that only a few functions can be called from a signal handler, and it gives the list of safe functions, meaning the functions and system calls that you can safely use in your signal handler.
Here is one more link with a discussion of the same topic.
http://stackoverflow.com/questions/2013202/i-need-a-list-of-async-signal-safe-functions-from-glibc
It explains which functions you can safely call from a signal handler and why you should not call any others.
The reason is: "If a signal interrupts the execution of an unsafe function, and the handler calls an unsafe function, then the behaviour is undefined."
It is time to show how we reproduced the error and what workaround we used.
The program below may not hang the first few times, or it may hang on the very first attempt. The reason is that we are calling ctime. Think of our main code as a thread: when it enters the while loop and calls ctime, it acquires a lock inside ctime. If we kill the program with kill -INT at that moment, the handler is invoked, and it too calls ctime and tries to acquire the same lock. But the lock is already held by the interrupted main code, which cannot continue until the handler returns, so the program ends up waiting for itself to release the lock (which will never happen).
If instead the main loop had already returned from ctime and reached the printf statement, it would have released the lock, and a kill -INT at that point would let the program exit normally. Hence the intermittent hang.
Sample code that causes the issue:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
volatile char *r;
/* Handler body reconstructed from the description above: it calls ctime, which is not async-signal-safe. */
void handler(int sig)
{
    time_t t;
    time(&t);
    r = ctime(&t);
    exit(0);
}
int main()
{
    struct sigaction sa;
    time_t t;
    int counter = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL);
    while(1) {
        counter++;
        time(&t);
        r = ctime(&t);
        printf("Loop %d\n",counter);
    }
    return 0;
}
Now here is the workaround for the problem. POSIX says you must not call unsafe functions from signal handlers, so there is no way to call them and rest easy. The simple solution is to use a flag: the main code keeps running while the flag is false, and the signal handler sets it to true to indicate that a kill signal has been received. The main code then exits after doing its housekeeping, meaning freeing memory, closing file pointers, etc.
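Here is the way I tried to do it:
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
volatile char *r;
volatile sig_atomic_t safe_check = 0; /* set to 1 by the handler */
void handler(int sig) { safe_check = 1; }
/* Placeholder cleanup: free memory, close file pointers, etc., then exit. */
void house_cleaning_function(void) { exit(0); }
int main()
{
    struct sigaction sa;
    time_t t;
    int counter = 0;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigaction(SIGINT, &sa, NULL);
    while(1) {
        if (safe_check == 0) {
            counter++;
            time(&t);
            r = ctime(&t);
            printf("Loop %d safe_check %d\n",counter,(int)safe_check);
        }
        else {
            house_cleaning_function();
        }
    }
    return 0;
}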
Here the integer variable safe_check is used as a flag to check whether the program has received a kill signal; if it has, I call house_cleaning_function.
If you run the above code, you would expect safe_check to always print 0 inside the while loop, but sometimes that is not the case. Can you see why?
Hope this post was useful to you.