First off, Oracle has the ability to automagically configure everything without any configuration file. The defaults should usually suffice, but if you want to make some changes it can get a bit confusing.
When the Oracle service starts (and, more importantly, the listener), it reports where it is logging and where it is reading the listener parameters from (for example $ORACLE_HOME/network/admin/listener.ora). Now, if for any reason Oracle can't find or read this file, it will happily continue with its automatic configuration, and it does so silently, while still printing the configuration file name!
This is where it stumped me. I made some changes to that file (I changed the tracing level), but they had no effect. Later I found out that the file was owned by root, while the listener was running as oracle and had no read permission on it. It took a while to figure that out.
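To avoid getting stumped the same way, a quick sanity check is to try opening the file as the user the listener runs as. Below is a minimal Python sketch, not anything Oracle ships: `listener_ora_path` and `check_readable` are hypothetical helper names, and the path logic assumes $TNS_ADMIN wins if it is set, otherwise the default location under $ORACLE_HOME. Run it as oracle, not root: root can read the file regardless of its permissions, so the check would always pass.

```python
import os


def listener_ora_path() -> str:
    """Guess where the listener reads its parameters from.

    Assumption: $TNS_ADMIN wins if set, otherwise the default
    $ORACLE_HOME/network/admin/listener.ora is used.
    """
    tns_admin = os.environ.get("TNS_ADMIN")
    if tns_admin:
        return os.path.join(tns_admin, "listener.ora")
    oracle_home = os.environ.get("ORACLE_HOME", "")
    return os.path.join(oracle_home, "network", "admin", "listener.ora")


def check_readable(path: str) -> bool:
    """Return True if the *current* user can actually open path for reading.

    Opening the file (rather than only checking that it exists) catches
    the root-owned, no-read-permission case the listener silently ignores.
    """
    try:
        with open(path, "rb"):
            return True
    except OSError:
        return False
```

Usage: from a shell running as oracle, `check_readable(listener_ora_path())` returning False means your edits to listener.ora are being ignored.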
Anyway, back to the problems I was trying to solve, namely the occasional hangs at Oracle startup/shutdown. I found out that Oracle (again, actually the listener) always tries to resolve the machine's hostname (probably via gethostname(), dunno) and connect to the resulting address (in netstat I saw connections to ports 199 and 1521). If the hostname resolves to something bogus (e.g. an unreachable address in /etc/hosts), the listener process hangs for the machine's configured connect timeout (usually 2 or 5 minutes).
It seems this happens regardless of what you put in (HOST = x.x.x) in the listener.ora file. I tried 0.0.0.0, localhost and 127.0.0.1, and in all cases it still tried to resolve the hostname.
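To find out whether your box is affected before the listener finds out the hard way, you can mimic what the listener appears to do: resolve the hostname and open a TCP connection to the result. This is a minimal Python sketch, not what Oracle actually runs; `probe_host` is a hypothetical helper, and port 1521 (the usual listener port) and the 3-second timeout are assumptions. A "refused" result is fine, since it means the address answered; a "timeout" is the symptom behind the hang.

```python
import socket
import time
from typing import Optional


def probe_host(host: Optional[str] = None, port: int = 1521,
               timeout: float = 3.0) -> str:
    """Resolve host (default: this machine's hostname) and try a TCP connect.

    Returns a one-line verdict. "refused" still means the address is
    reachable (something answered with a reset); "timeout" means nothing
    answered, which is what makes the listener sit there for minutes.
    """
    name = host or socket.gethostname()
    try:
        addr = socket.gethostbyname(name)  # IPv4 lookup via the system resolver
    except socket.gaierror:
        return f"{name}: does not resolve"
    start = time.monotonic()
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            verdict = "connected"
    except ConnectionRefusedError:
        verdict = "refused (reachable, nothing listening)"
    except socket.timeout:
        verdict = "timeout (unreachable)"
    except OSError as exc:  # e.g. no route to host
        verdict = f"error: {exc}"
    elapsed = time.monotonic() - start
    return f"{name} -> {addr}:{port}: {verdict} after {elapsed:.1f}s"
```

Calling `probe_host()` with no arguments checks the machine's own hostname, which is the case that matters here.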
The lesson here, boys and girls, is that you must make sure that your hostname always resolves to something reachable. E.g. in /etc/hosts you should have:
127.0.0.1 myhost
or
0.0.0.0 myhost
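If you'd rather check /etc/hosts directly instead of going through the resolver, here is a minimal Python sketch that looks up the address mapped to the current hostname. `hosts_entry` and `check_hosts` are hypothetical helpers, and the parser only handles the plain `address name [aliases...]` format with `#` comments.

```python
import socket
from typing import Optional


def hosts_entry(hostname: str, hosts_text: str) -> Optional[str]:
    """Return the first address mapped to hostname in hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        addr, names = fields[0], fields[1:]
        if hostname in names:
            return addr
    return None


def check_hosts(path: str = "/etc/hosts") -> str:
    """Report how this machine's hostname is mapped in the hosts file."""
    name = socket.gethostname()
    with open(path) as f:
        addr = hosts_entry(name, f.read())
    if addr is None:
        # Lookup then typically falls through to the next source in
        # /etc/nsswitch.conf (usually DNS), which may be the bogus part.
        return f"{name} not in {path}"
    return f"{name} -> {addr}"
```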
Fair enough. But there are a few additional things that can be done. First, the old Oracle init script (very old, probably taken from Red Hat 9) calls lsnrctl before calling dbstart and dbshut.
I noticed that dbstart and dbshut already handle lsnrctl themselves, so I removed the extra calls from the init script. The hang is gone! Well, actually the hang still happens, but since the Oracle scripts start lsnrctl in the background, it no longer blocks the init script.
Also, a warning message in listener.log after the timeout expired ("subscription to the node down event is still pending") drew my suspicion to ONS. I found a couple of solutions online; one is to add SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name} = OFF to listener.ora. That took care of the warning message and some of the connections (the one to 1521 is still there), but it still hangs.
Finally, I also commented out the cfgHostname call in the init script, since all that crap is not really needed. It's easier to set HOST in listener.ora to something constant such as 0.0.0.0 or localhost than to edit the file and move directories around every time the hostname/IP changes.
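If you manage several listeners, a small helper can add the parameter, or flip it to OFF if it is already there. This is a minimal Python sketch that works on the text of listener.ora; `ensure_node_down_off` is a hypothetical name, and it assumes the default listener name LISTENER unless told otherwise. Write the result back yourself, and keep the file readable by oracle, or you're back at the permission problem from earlier.

```python
import re


def ensure_node_down_off(text: str, listener_name: str = "LISTENER") -> str:
    """Return listener.ora text with SUBSCRIBE_FOR_NODE_DOWN_EVENT_<name> = OFF.

    An existing line for that listener is rewritten in place; otherwise the
    parameter is appended at the end. Running it twice changes nothing.
    """
    param = f"SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name}"
    line = f"{param} = OFF"
    # [ \t]* instead of \s* so the match never swallows preceding blank lines.
    pattern = re.compile(rf"^[ \t]*{re.escape(param)}[ \t]*=.*$",
                         re.MULTILINE | re.IGNORECASE)
    if pattern.search(text):
        return pattern.sub(lambda m: line, text)
    if text and not text.endswith("\n"):
        text += "\n"
    return text + line + "\n"
```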
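Pinning HOST to a constant can be scripted too. A minimal Python sketch, assuming `(HOST = value)` appears on a single line; `pin_host` is a hypothetical helper, and 0.0.0.0 as the default simply follows the suggestion in the paragraph above.

```python
import re

# Matches "(HOST = something)" and keeps the surrounding spacing intact.
HOST_RE = re.compile(r"(\(\s*HOST\s*=\s*)([^)\s]+)(\s*\))", re.IGNORECASE)


def pin_host(text: str, host: str = "0.0.0.0") -> str:
    """Replace every HOST value in listener.ora text with a fixed host."""
    return HOST_RE.sub(lambda m: m.group(1) + host + m.group(3), text)
```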