Tuesday, May 19, 2009

Unpack just what you need

The other day I thought it would be interesting to emerge mtree. mtree is an application that generates a detailed file listing, but that's not the interesting part of today's post. I typed emerge mtree and saw that 18 MB needed to be downloaded. That sounded funny for what should be a small utility. Then I saw it was downloading a big pkgsrc archive from BSD. Okay, that figures: they distribute everything in one big archive.

So I let it download, now it's in the unpack phase... and still unpacking... and still unpacking. Hey, what's going on? It's already unpacked 500 MB (kudos bzip2!). I know disk space is cheap today and maybe blackbird is showing its age, but unpacking hundreds of megs for this utility seems really inefficient.

I figured I could modify the ebuild to unpack only the relevant directories of the archive. I was right - a small change to the ebuild (well, fix a typo and then try again) and it works... Hardly any disk space used and I also shaved some time off the emerge.
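For the curious, the idea of unpacking only part of an archive can be sketched in a few lines of Python. This is not the ebuild change itself, just an illustration of selective extraction; the function name extract_subtree and the directory used in the usage note are made up for the sketch.

```python
import tarfile


def extract_subtree(archive_path, prefix, dest):
    """Extract only the members under `prefix` from a tar archive into `dest`.

    Returns the list of member names that were extracted.
    """
    prefix = prefix.rstrip("/")
    # "r:*" lets tarfile detect the compression (bzip2, gzip, none) itself.
    with tarfile.open(archive_path, "r:*") as tar:
        wanted = [
            member
            for member in tar.getmembers()
            # Match the directory itself or anything below it, but not
            # sibling directories that merely share the same name prefix.
            if member.name == prefix or member.name.startswith(prefix + "/")
        ]
        tar.extractall(path=dest, members=wanted)
    return [member.name for member in wanted]
```

A call might look like extract_subtree("pkgsrc.tar.bz2", "pkgsrc/pkgtools/mtree", "work"), where both the archive name and the directory are assumptions on my part. The compressed stream still has to be read from start to end, but only the selected files are written to disk, which is where the space saving comes from.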

I don't know if this would be accepted as a bug but I decided to share my optimized ebuild anyway in this bug.

Wednesday, May 6, 2009

Magic SysRq Enhancement (3)

It would be nice to take the sysrq-visual and add it as a kernel patch. I thought it would be nice to see the LEDs blinking while the emergency sync is running, because many times you can't know for sure when it's done.

Looking at the code, there is a call to emergency_sync(), which is performed asynchronously. It prints a debug message when it's done, but there doesn't seem to be a nice and clean way to be notified of the completion. I need to figure out which is better: calling a different (synchronous) sync method, or adding a notification mechanism to the sync operation (patching the kernel code in a pretty critical area).

Oracle Listener Configuration Revisited

Configuring the Oracle 10.2 listener correctly might not be as easy as it seems.
First off, Oracle has the ability to automagically configure everything without any configuration file. The defaults should usually suffice, but if you want to make some changes it can get a bit confusing.

When the Oracle service starts (and more importantly, the listener) it reports where it is logging and where it is taking the listener parameters from (for example $ORACLE_HOME/network/admin/listener.ora). Now, if for any reason Oracle can't find or read this file, it will happily continue with its automatic configuration, but it will do so silently, while still printing the configuration file name!

This is what stumped me. I made some changes to that file (changed the tracing level) but they didn't have any effect. Later I found the reason: the file was owned by root, while the listener was running as oracle and had no read permission on the file. It took a while to figure that out.
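A check along these lines would have saved me some time. It is only a rough sketch: readable_by is a name I made up, it looks at the classic owner/group/other permission bits only, and it ignores ACLs and the permissions of the parent directories.

```python
import os
import stat


def readable_by(path, uid, gids):
    """Return True if a user with `uid` and group ids `gids` may read `path`.

    Simplified model: only the owner/group/other read bits are checked.
    """
    info = os.stat(path)
    if uid == 0:
        # root can read the file regardless of its mode bits.
        return True
    if info.st_uid == uid:
        return bool(info.st_mode & stat.S_IRUSR)
    if info.st_gid in gids:
        return bool(info.st_mode & stat.S_IRGRP)
    return bool(info.st_mode & stat.S_IROTH)
```

To check the listener.ora case you would pass the uid and group ids of the oracle user, which can be looked up with pwd.getpwnam("oracle") on a machine where that account exists.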

Anyway, back to the problems I was trying to solve, namely the occasional hangs at Oracle startup/shutdown. I found out that Oracle (again, actually the listener) always tries to resolve the machine hostname (probably from gethostname(), dunno) and connect to the result (in netstat I saw connections to ports 199 and 1521). If there is a bogus resolution (put something unreachable in /etc/hosts), the listener process will hang for the machine's defined connect timeout (usually 2 or 5 minutes).
It seems this happens regardless of what you put in the (HOST = x.x.x) entry in the listener.ora file. I tried 0.0.0.0, localhost and 127.0.0.1, but in all cases it still tried resolving the hostname.
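A quick way to see what the hostname resolves to, and whether anything answers there, is a small script like the one below. It is a sketch of my own and not what the listener does internally; the helper names resolve_hostname and can_connect are invented for illustration.

```python
import socket


def resolve_hostname(name=None):
    """Resolve `name` (default: this machine's hostname) to an IPv4 address.

    Returns (name, address), with address set to None if resolution fails.
    """
    name = name or socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:
        # Covers socket.gaierror and socket.herror.
        return name, None


def can_connect(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds in time."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False
```

Running resolve_hostname() with no arguments and then can_connect(address, 1521) with a short timeout shows in a few seconds what would otherwise cost a multi-minute hang.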

The lesson here, boys and girls, is that you must make sure that your hostname always resolves to something reachable. E.g. in /etc/hosts you should have:

127.0.0.1 myhost

or

0.0.0.0 myhost

Fair enough. But there are a few additional things that can be done. First, the old Oracle init script (very old, probably taken from RedHat 9) calls lsnrctl before calling dbstart and dbshut.
I noticed that dbstart and dbshut already handle lsnrctl themselves, so I removed the additional calls from the init script. The hang is gone! Actually the hang is still happening, but since the Oracle scripts start lsnrctl in the background, it doesn't block the init script anymore.

Also, a warning message in the listener.log after the timeout expires drew my suspicion to the ONS ("subscription to the node down event is still pending"). I found a couple of solutions online; one is to add SUBSCRIBE_FOR_NODE_DOWN_EVENT_{listener_name} = OFF. That took care of the warning message and some of the connections (the one to 1521 is still there), but it still hangs.

Finally, I also commented out the cfgHostname call in the init script since all that crap is not really needed. It's easier to set the HOST in listener.ora to something constant such as 0.0.0.0 or localhost than to change the file and move directories around every time the hostname/IP changes.