
Wednesday, February 4, 2009

The Linux/Unix SysAdmin Covert File Storage Method Number 57

Hey there,

Today's post is the 57th installment of a two or three part series of posts that refuses to play by the rules. Look for Volume 3 and Issue 437 in the near future ;)

This post's trick (actually it's more of a gimmick - or a way any one of us has probably screwed up at some point in time ;) is fairly simple and, as is generally the case, inversely proportional in complexity to the work I'm currently getting paid to do so that my wife, kids and 5 animals don't go hungry. I'm somewhat obsessive-compulsive and tend to forget to eat more often than I remember to. Metabolism, of course, is just another one of life's cruel jokes. I'm not gigantic, but my waistline implies a lavish and sedentary lifestyle I don't enjoy. Actually, my fitness-oriented friends tell me that I'd lose the spare-tire if I just ate more regularly. While this makes perfect sense, I generally don't ;) ...and then, every once in a while, I digress...

This little goof is pretty simple to pull off (assuming you're a sysadmin and/or have the access, privilege and opportunity to do it) and can be a life-saver. Technically, you shouldn't ever need to do this, but sometimes convenience trumps sanity...

You may recall a post we did a long long time ago, in a galaxy just down the block, regarding finding space hogs on multiple overlay-mounted filesystems. This little way to hide bits of information works relatively along the same lines. The one serious limitation it has is that, while you'll be secretly storing your information, you won't be hiding the actual disk space it consumes, so this method of packing away all the stuff you're not supposed to have on the company's production web server has its limitations.

For today, we'll use a /usr/local mount point that we have on a Solaris machine (independent of the /usr mount point) to demonstrate.

Step 1: Get the lay of the land. In order for this to work, you need to have enough space to stow away what you need to and, hopefully, enough room that your addition is barely noticeable. Our setup isn't bad, especially since the "actual" filesystem that's going to be impacted will be the /usr filesystem underneath /usr/local (if it were /usr/local itself, the change might be noticed, since that filesystem is so "empty").

host # ls /usr/local
PKG-Get etc lib lost+found pkgs share
bin info libexec man sbin
host # df -k /usr/local
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s5 51650439 167184 50966751 1% /usr/local
host # df -k /usr
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s3 5233086 3550979 1629777 69% /usr


Step 2: Peel back the carpet. This is where you have to be quick, and is the essence of our little shell-game. First, we'll unmount the /usr/local filesystem, leaving us with this (df -k for /usr/local now shows that it's just a simple directory on /usr):

host # umount /usr/local
host # df -k /usr/local
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s3 5233086 3550979 1629777 69% /usr
host # ls /usr/local


Under normal circumstances, this directory should now be empty (unless someone else is doing the same thing as you, or just forgot to clean up before they created the separate /usr/local overlay mount).

Step 3: Sweep your non-work-related-stuff under the rug, or into the /usr/local directory, as it were:

host # mv non-work-related-stuff /usr/local/
host # ls /usr/local
non-work-related-stuff


Step 4: Make sure things look normal. See if your addition makes a noticeable difference in your df output (It probably won't unless you're going to try and sack-away your mp3 collection ;)

host # df -k /usr
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s3 5233086 3550981 1629775 69% /usr


Step 5: Pretend nothing happened. Once you're satisfied (which should be as soon as possible), remount /usr/local and verify that everything looks the same (excepting your modification of the /usr filesystem):

host # mount /usr/local
host # ls /usr/local
PKG-Get etc lib lost+found pkgs share
bin info libexec man sbin
host # df -k /usr/local
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s5 51650439 167184 50966751 1% /usr/local
host # df -k /usr
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s3 5233086 3550981 1629775 69% /usr


And you're all set. You can get your stuff back eventually (even sooner if you don't care what people think ;)
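
If you find yourself doing this more than once (no judgment ;), the whole shuffle boils down to a few lines of shell. This is strictly a rough sketch - FS and STASH are placeholders for your own overlay mount and contraband:

#!/bin/sh
# rough sketch of the whole shuffle -- adjust FS and STASH for your own setup
FS=/usr/local
STASH=$HOME/non-work-related-stuff

umount $FS || exit 1     # bail out if the unmount fails (filesystem busy, etc.)
mv "$STASH" $FS/         # tuck it away under the mount point
mount $FS                # put the carpet back
df -k $FS                # sanity check: the overlay is back on top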

Of course, that's an old trick, but one we've never covered here. As this blog gets larger, we're going to try and devote a little less time to being original. Of course, we mean that in a good way :) Since this is essentially a knowledge-dumping-ground, meant for users and admins of all skill levels, every post can't be about some crazy way to do something you'd have to be insane to want to do in the first place!

Come on in; the mediocrity's fine ;)

Cheers,

, Mike








Wednesday, December 10, 2008

Why DU And DF Display Different Values On Linux And Unix

Hey there,

Today we're going to look at a little something that is a fairly hot-running-water issue on most of the Linux and Unix boards lately (actually, probably always has been, but our research staff quit on us ;) This post will be similar in focus to our previous post on the differences between sar and vmstat with regards to free memory/swap reporting.

Today's post is similar in spirit, but not a replay of that previous memory/swap reporting issue. Today, we're going to take a look at two other commands that, seemingly, measure and display the same information, although with (sometimes) huge discrepancies in output. Those two commands are du and df.

The question you'll most often see (or, perhaps, have :) is something to the effect of "Why do my outputs from du and df differ? One says I'm using more disk space than the other. Which of them is correct?" Generally you'll find that df shows more disk space used than du does, but the case can sometimes be the opposite. It's very rare (unless you don't use your computer, and it doesn't use itself, at all ;) that the output from the two commands match. It's actually rare that they ever come close to matching. Generally, the longer a machine is up, the greater the rift between figures becomes.

The bad news: This is confusing and sometimes hard to communicate to others, even when you know why the situation exists :(

The good news: This is normal and can be explained; perhaps even simply :)

To set up the "typical" situation, we used a Solaris 10 box (although this issue is common on all proprietary Unix and Linux distro's). Below, the output from four commands executed in the /opt directory (NOTE: For du, the "-s" flag is used to print summary information for the entire object, rather than a listing of all of its parts (i.e. the whole directory rather than all the files and subdirectories)):

host # cd /opt <-- For df, look at the "used" column and for du, look at the only output there is ;)

host # df -h .
Filesystem size used avail capacity Mounted on
/dev/dsk/c0t0d0s6 3.9G 490M 3.4G 13% /opt
host # du -sh
486M .
host # df -k .
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s6 4130238 502180 3586756 13% /opt
host # du -sk
498068 .


Now, in two parts, the complicated reasons, explanation and summation of why this disparity exists, followed by the more customer/user-friendly version of the exact same thing :)

1. The convoluted, complicated and hard to pass-along explanation:

Reasons: While du and df report "approximately" the same information, they don't report "exactly" the same information. While they both report "about" the "same things" (some of the time), they don't measure those "things" using equal methods or metrics.

Explanation: These differences can occur for a number of reasons, including (but not limited to):

a) Overlay mounts, which can skew df output when it's run from a higher-level directory (e.g. df -k /opt would not report the total space used under /opt if there were overlay mounts of /opt/something and /opt/anotherthing on the system - du would). If you actually need the total space used by /opt and everything underneath it, including the overlay mounts, df can do it, but it takes some plate-spinning (there's a rough sketch of that juggling right after the summation below ;).

b) Sometimes (This actually should never happen on any "recent" OS) hidden files and subdirectories could skew du's output.

c) Unlinked inodes can cause unexpected statistics with both df and du (for instance, in the situation where your filesystem shows almost 100% free space but all the inodes are in use).

d) You're using the "-h" flag instead of the "-k" flag for one or the other command (or both). The "-h" (human readable) flag can sometimes make things seem worse (or more divergent) than they are. Since it tries to make the output more easily digestible, it will round the numbers you see, so you can't "really" be sure whether "5.1 GB" is closer to 5 GB or 5.2 GB. "-k" is slightly more likely to produce relatively equal results, as it reports the size in kb. It still does rounding, so that you only get straight-up integers and no floating-point results, but it's generally the better one to check with if your output from "-h" is very close, or even the same (since it might not really be). If your implementation of df and/or du supports the "-v" flag (or something similar), that's even better, since it reports in multiples of your system blocksize and is even more exact.

e) The fundamental way in which each of the commands work:

i) df:

df reports only on the mount point, or filesystem, level. So a df on /opt would produce the same results as a df on /opt/csw (assuming, as noted above, that they're both on the same partition):

host # df -k /opt
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s6 4130238 502180 3586756 13% /opt
host # df -k /opt/csw
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c0t0d0s6 4130238 502180 3586756 13% /opt


df gets most of its information from a filesystem's primary superblock (except in the odd instance that an alternate superblock is being used - although this would only happen in an "fsck" situation. After that, the information from the alternate superblock would be copied back to the primary, and all other superblocks). It takes this information at face value, which is to say that it does not question the information provided to it by the primary superblock. In this respect, df is a very fast tool for getting disk usage information (at the cost of reliability).

df will include open files (in memory, but not on disk), data/index files (used for data management - sometimes using approximately 2 to 5% of each filesystem) and unnamed files in its size calculation. This is one reason why, sometimes (although not very often), df can show a larger amount of disk used than du does.

df, as per above, will explicitly trust any errors in space calculation that may have occurred over time, since it trusts the primary superblock entirely. This means that if you've fsck'ed a filesystem (and not resynced or rebooted since), have experienced any hard or soft errors on the device/disk housing the filesystem you're measuring (again, assuming you haven't rebooted since they happened), or have any corruption or inconsistencies in your filesystem-state records (like /etc/mnttab), your output will be commensurately incorrect.

df sometimes reports file sizes incorrectly, as it works on what we like to call the whole-enchilada-principle ;) Basically, if you take your filesystem's block size (for the /opt filesystem here - different filesystems may have different block sizes), which you can find by executing either of the following commands, both of them, or whatever works on your OS:

host # df -g /opt|grep "block size"|awk '{print $4}'
8192
host # fstyp -v /dev/rdsk/c0t0d0s6|grep "^bsize"|awk '{print $2}'
8192


...you'll be able to do this experiment on your own (we'll leave out the grisly details to save on space :). In our instance, we have a default block size of 8192 bytes, or 8 kb. Now, here's where it gets somewhat interesting ;) If you create a new file that's 1 kb in size and it writes to a new (or - depending - not fully used) block, df will report that file as being 8 kb in size, even though it's actually only 1 kb!
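
If you'd like to see that in action, here's a rough sketch of the experiment (the exact numbers you get back will depend entirely on your filesystem, its fragment size and how full it is):

host # BEFORE=`df -k /opt | tail -1 | awk '{print $3}'` <-- df's "used" column before
host # dd if=/dev/zero of=/opt/tinyfile bs=1024 count=1 <-- create a 1 kb file
host # AFTER=`df -k /opt | tail -1 | awk '{print $3}'` <-- df's "used" column after
host # du -k /opt/tinyfile <-- du's opinion of the new file
host # echo "df charged us `expr $AFTER - $BEFORE` kb for it" <-- typically a full block's worth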

ii) du:

du reports at the "object" level rather than at the filesystem/mountpoint level, as df does. So, to repeat the example from above, if you run du on /opt and /opt/csw, you'll get different results. I find it easier to think of du as handling its measurement via a simple "object" model. The main partition would be the meta-object, while any subdirectory you may be running du against would be considered a sub-object of the filesystem meta-object (you'll note that the du size output for /opt/csw is, naturally, smaller than that for the entirety of /opt):

host # du -sk /opt
498068 /opt
host # du -sk /opt/csw
64065 /opt/csw


du gets its information at the time you execute it (unless you run it repeatedly in succession, where you'll notice a slight performance improvement). To test this, run du on a partition, then wait 5 minutes and run it again. It should take just as long as the first time (unless you've added lots of files since then). In this respect, du can be a very slow tool for getting disk usage information (with the benefit that your information will be more accurate). It should be noted that, because of the way it takes measure of most (see below) filesystem objects, it takes much longer for it to report the size of a billion 1 kb files than it does to report the size of one file of 1 billion kb size.

du does not count data/index files or open files (in memory, but not on disk).

du does not take into account any information "supplied" by the system (meaning the information, like from the superblock, as listed under the df section) and gets its information independent of whatever the system thinks is correct.

du does not rely on block size (see the whole-enchilada-principle in df's section above) to determine file size. So, if you have a default 8 kb block size on your filesystem and you create a new 1 kb file that writes to an empty 8 kb block, du will report that file as being 1 kb in size and "not" assume a minimum of the filesystem block size. This is worth remembering, because it can cause a great deal of difference in the filesystem "usage" size between du and df (which would consider that 1 kb file an 8 kb file - 8 times larger than it actually is)!

du is more reliable if you want to know the state of your filesystem "right now," and, as noted, it doesn't count any data/index blocks.

Summation: If you are interested in knowing exactly how much of your filesystem is actually being used, du is a much more accurate tool for collecting and displaying this information. Note, however, that - since du does "on demand" filesystem size reporting, it is much slower than df. Also, du does not play as well with some system internal files and processes since it essentially ignores information reported by the primary superblock, system mount information tables, etc. Ultimately, the purpose for which you need to determine your filesystem's size (coupled with an understanding of the "Explanation" section for both utilities) is the best way to decide which utility to use in any given situation.
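
As promised back in item a), here's roughly what the df "plate-spinning" for overlay mounts looks like. /opt/something and /opt/anotherthing are, of course, the hypothetical overlay mounts from that example - substitute whatever is really mounted under the directory you care about:

host # df -k /opt /opt/something /opt/anotherthing | awk 'NR > 1 {total += $3} END {print total " kb used, overlay mounts included"}'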

2. (Did you forget this was coming, too? ;) The simple, and easy to convey, explanation:

Reasons: df and du rely on different information to determine how much disk space is used on a filesystem.

Explanation: df and du report filesystem information differently, for very basic reasons:

df and du don't use the same yardsticks to measure filesystem size.

df:

df relies mostly on system information, supplied by various files and built-in reporting mechanisms that may, or may not, be correct at any given time.

du:

du relies on what it can "see" at the particular moment in time that you run it.

Summation: du is the better tool to use if you are interested in knowing how much space is actually being used on your filesystem "right now." df is great for "ballpark estimates" and is preferred if you need to know how big df thinks your filesystem is (so it will agree with other incorrect system statistics).

3. The really easy, and simple to blurt-out, explanation:

If du and df don't agree on what size your filesystem is, du is more correct than df is.

See; it's all very simple ;)

Cheers,

, Mike





Tuesday, November 25, 2008

Quick And Easy Local Filesystem Troubleshooting For SUSE Linux

Hey There,

Today we're going to take a look at some quick and easy ways to determine if you have a problem with your local filesystem on SUSE Linux (tested on 8.x and 9.x). Of course, we're assuming that you have some sort of an i/o wait issue and the users are blaming it on the local disk. While it's not always the case (i/o wait can occur because of CPU, memory and even network latency), it never hurts to be able to put out a fire when you need to. And, when the mob's pounding on your door with lit torches, that analogy is never more appropriate ;)

Just as in previous "quick troubleshooting" posts, like the week before last's on basic VCS troubleshooting, we'll be running through this with quick bullets. This won't be too in-depth, but it should cover the basics.

1. Figure out where you are and what OS you're on:

Generally, something as simple as:

host # uname -a

will get you the info you need. For instance, with SUSE Linux (and most others), you'll get output like:

Linux "hostname" kernel-version blah..blah Date Architecture... yada yada

The kernel version in that string is your best indicator. Generally, a kernel-version starting with 2.4.x will be for SUSE 8.x and 2.6.x will be for SUSE 9.x. Of course, also avail yourselves of the, possibly available, /etc/release, /etc/issue, /etc/motd, /etc/issue.net files and others like them. It's important that you know what you're working with when you get started. Even if it doesn't matter now, it might later :)
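
A lazy way to check most of those files in one shot (the 2>/dev/null just hides the complaints about whichever ones don't exist on your particular box):

host # cat /etc/*release* /etc/issue /etc/motd 2>/dev/null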

2. Figure out how many local disks and/or volume groups you have active and running on your system:

Determine your server model, number of disks and volume groups. Since you're on SUSE, you may as well use the "hwinfo" command. I never know how much I'm going to need to know about the system when I first tackle a problem, so I'll generally dump it all into a file and then extract it from there as needed. See our really old post for a script that Lists out hardware information on SUSE Linux in a more pleasant format:

host # hwinfo >/var/tmp/hwinfo.out
host # grep system.product /var/tmp/hwinfo.out
system.product = 'ProLiant DL380 G4'


Now, I know what I'm working with. If a grep this specific doesn't work for you, try "grep -i product" - you'll get a lot more information than you need, but your machine's model and number will be in there, and much easier to find than if you looked through the entire output file.

Then, go ahead and check out /proc/partitions. This will give you the layout of your disk:

host # cat /proc/partitions
major minor #blocks name

104 0 35561280 cciss/c0d0
104 1 265041 cciss/c0d0p1
104 2 35294805 cciss/c0d0p2
104 16 35561280 cciss/c0d1
104 17 35559846 cciss/c0d1p1
253 0 6291456 dm-0
253 1 6291456 dm-1
253 2 2097152 dm-2
253 3 6291456 dm-3
253 4 10485760 dm-4
253 5 3145728 dm-5
253 6 2097152 dm-6



"cciss/c0d0" and "cciss/c0d1" show you that you have two disks (most probably mirrored, which we can infer from the dm-x output). Depending upon how your local disk is managed, you may see lines that indicate, clearly, that LVM is being used to manage the disk (because the lines contain hints like "lvma," "lvmb" and so forth ;)

58 0 6291456 lvma 0 0 0 0 0 0 0 0 0 0 0
58 1 6291456 lvmb 0 0 0 0 0 0 0 0 0 0 0


3. Check out your local filesystems and fix anything you find that's broken:

Although it's boring, and manual, it's a good idea to take the output of:

host # df -l

and compare that with the contents of your /etc/fstab. This will clear up any obvious errors, like mounts that are supposed to be up but aren't, or mounts that aren't supposed to be up but are, etc... You can whittle down your output from /etc/fstab to show (mostly) only local filesystems by doing a reverse grep on the colon character (:) - it's generally found in remote mounts and almost never in local filesystem listings.

host # grep -v ":" /etc/fstab

4. Keep hammering away at the obvious:

Check the Use% column in the output of your "df -l" command. If any filesystems are at 100%, some cleanup is in order. It may seem silly, but usually the simplest problems get missed when one too many managers begin breathing down your neck ;) Also, check the inode usage ("df -li") and ensure that those aren't all being used up either.
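
A couple of lazy one-liners for that check (assuming your device names are short enough that df doesn't wrap its output lines - the "0 + $5" bit just strips the % sign so awk can compare the number):

host # df -l | awk 'NR > 1 && 0 + $5 >= 95' <-- local filesystems at, or near, capacity
host # df -li | awk 'NR > 1 && 0 + $5 >= 95' <-- the same check for inode usage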

Mount any filesystems that are supposed to be mounted but aren't, and unmount any filesystems that are mounted but (according to /etc/fstab) shouldn't be. Someone will complain about the latter at some point (almost guaranteed), which will put you in a perfect position to request that it either be put in the /etc/fstab file or not mounted at all.

You're most likely to have an issue here with mounting the unmounted filesystem that's supposed to be mounted. If you try to mount and get an error that indicates the mountpoint can't be found in /etc/fstab or /etc/mtab, the mount probably isn't listed in /etc/fstab or there is an issue with the syntax of that particular line (it could even be a "ghost" control character). You should also check to make sure the mount point being referenced actually exists, although you should get an entirely different (and very self-explanatory) error message in the event that you have that problem.

If you still can't mount after correcting any of these errors (of course, you could always avoid the previous step and mount from the command line using logical device names instead of the paths from /etc/fstab, but it's always nice to know that what you fix will probably stay fixed for a while ;), you may need to "fix" the disk. This will range in complexity from the very simple to the moderately un-simple ;) The simple (Note: If you're running ReiserFS, use reiserfsck instead of plain fsck for all the following examples. I'm just trying to save myself some typing):

host # umount /uselessFileSystem
host # fsck -y /uselessFileSystem
....
host # mount /uselessFileSystem


which, you may note, would be impossible to do (or, I should say, I'd highly recommend you DON'T do) on used-and-mounted filesystems or any special filesystems, like root "/" - In cases like that, if you need to fsck the filesystem, you should optimally do it when booted up off of a cdrom or, at the very least, in single user mode (although you still run a risk if you run fsck against a mounted root filesystem).

For the moderately un-simple, we'll assume a "managed file system," like one under LVM control. In this case you could check a volume that refuses to mount (assuming you tried most of the other stuff above and it didn't do you any good) by first scanning all of them (just in case):

host # vgscan
Reading all physical volumes. This may take a while...
Found volume group "usvol" using metadata type lvm2
Found volume group "themvol" using metadata type lvm2


If "usvol" (or any of them) is showing up as inactive, or is completely missing from your output, you can try the following:

host # vgchange -a y

to use the brute-force method of trying to activate all volume groups that are either missing or inactive. If this command gives you errors, or it doesn't and vgscan still gives you errors, you most likely have a hardware related problem. Time to walk over to the server room and check out the situation more closely. Look for amber lights on most servers. I've yet to work on one where "green" meant trouble ;)

If doing the above sorts you out and fixes you up, you just need to scan for logical volumes within the volume group, like so:

host # lvscan
ACTIVE '/dev/usvol/usfs02' [32.00 GB] inherit
....


And (is this starting to sound familiar or am I just repeating myself ;), if this gives you errors, try:

host # lvchange -a y /dev/usvol/usfs02 <-- substituting whichever logical volume is giving you grief

If the logical volume throws you into an error loop, or it doesn't complain but a repeated run of "lvscan" fails, you've got a problem outside the scope of this post. But, at least you know pretty much where it is!

If you manage to make it through the logical volume scan, and everything seems okay, you just need to remount the filesystem as you normally would. Of course, that could also fail... (Does the misery never end? ;)

At that point, give fsck (or reiserfsck) another shot and, if it doesn't do any good, you'll have to dig deeper and look at possible filesystem corruption so awful you may as well restore the server from a backup or server image (ghost).

And, that's that! Hopefully it wasn't too much or too little, and helps you out in one way or another :)

Cheers,

, Mike





Thursday, October 23, 2008

Removing Local Zone File Systems On Solaris 10 Unix

Hey There,

Today's post is a follow-up to our very recent posts on modifying existing local zone filesystems and creating new file systems in a local zone on Solaris 10. Today, we're moving to completion with this simple how-to on removing filesystems on local zones. We'll follow up with another simple post on how to create and destroy (all in one) local zones on Solaris 10. Even though I can write 500 words on why I can't write fewer than 500 words, that post should be short ;) It almost "has" to be!

NOTE: This note is the same as in the last two posts. Nothing has changed ;) For those of you who haven't read those (or find this post all on its lonesome ;) here it is: When removing existing filesystems on a Solaris 10 local zone, all changes must be made to the filesystem from the global zone (assuming that you originally mounted it from there)!

1. First things first; you can't unmount a filesystem from within a local zone if it was created in the global zone. As per our notation in the post on creating filesystems on local zones, this does not apply to NFS, NAS or any other remote-or-network-mounted filesystems. These are mounted directly from the local zone and, as such, can be unmounted directly from the local zone.

One interesting thing, if you have your filesystem set up under the local zone's root, is that it won't show up on the global zone's "df -k" or "df -h," etc output. If you want to see the mount from the global zone (assuming you forgot where you put it or you just want to make sure), you can find out what filesystems are mounted in your local zone, from the global zone, (at least, this is one way) by running:

host # zoneadm list -vc <-- To list out the zones on your system from the global zone (DING is our example zone).
host # zonecfg -z DING info <-- This will list out more information than you need. Basically, look for the line at the top titled "zonepath:" You can then look under that zone path to see your local zone's root filesystem mounts. Note that you could just use "grep" to find all the lines in /etc/vfstab that include your local zone's name.
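
Put together, a rough version of that lookup might go something like this (DING is still our example zone, and the grep/awk are quick-and-dirty rather than bulletproof):

host # zonecfg -z DING info zonepath | awk '{print $2}' <-- e.g. /zones/DING
host # df -k | grep /zones/DING <-- what's actually mounted under the zone's root right now
host # grep DING /etc/vfstab <-- what the global zone's vfstab thinks should be there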

2. Back to cold harsh (but mercifully swift ;) reality. From the global zone, simply unmount your local zone filesystem, like so:

host # umount /zones/DING/root/DONG

3. Then, all that's left to do is to remove the filesystem from the zone's configuration (we're assuming that you've unmounted all of "your" local zone's filesystems before continuing and executing the steps here :) Again, this can all be done very simply by using the zonecfg command:

host # zonecfg -z DING
zonecfg:DING> remove fs dir=/DONG
zonecfg:DING> commit
zonecfg:DING> exit


4. Okay; step 3 included a white lie ;) If you don't want to get error messages or, worse, cause your system not to boot properly, it's recommended that you remove the local zone filesystem entries from /etc/vfstab in the global zone, as well.

And now (I kind of promise ;) you should be all set :) We'll post that creation/destruction post very soon, if not tomorrow. After that, we'll get back to Linux, AIX or any Operating System other than Solaris 10. We try to keep a fair spread, given the sheer number of distros the term "Linux and Unix" actually encompasses. On the bright side, this blog's subject matter could easily be much more vague ;)

Cheers,

, Mike





Wednesday, October 22, 2008

Modifying Existing Local Zone File Systems On Solaris 10 Unix

Hey There,

Today's post is a follow-up to yesterday's post on creating new file systems in a local zone on Solaris 10. Today, we're moving on to a simple how-to on modifying existing local zones. As some of you may have noticed, with yesterday's post, I managed to blithely bypass the creation of the local zone we were working on. It's generally required that you create (or make) a local zone before you can add a filesystem to it ;) Although this topic is covered within our previous post on setting up Branded Linux zones in Solaris 10, we'll attack zone creation and initial setup at the end of this short series of posts, just so it has its own place and isn't presented in a, possibly, unfamiliar context.

Today's bit of work is a nifty trick because, although it seems that there should be some "special way" to modify an existing filesystem in a Solaris 10 zone, it's really about as simple, or as difficult, as you make it ( For this example, we'll be consistent with yesterday's post and have the "DING" local zone's "DONG" filesystem be of the type VxFS ).

NOTE: This caveat carries over from our previous post, as well. When modifying existing filesystems on a Solaris 10 local zone, all changes must be made to the filesystem from the global zone!

1. Log into your global zone again. This is generally just the "host" or what we used to think of (before zones or containers) as "the box" or "the computer." Then, check out the filesystem you want to modify, with "df -k|grep DONG" (or something similar), like so:

host # df -k|grep DONG
/dev/vx/dsk/DINGdg/DONGvol 6547824 24356 6523468 1% /DONG


2. Now, you can simply resize (shrink, grow, etc -- this part is heavily dependent on your filesystem and its capabilities) as you would normally. We've attacked the following procedure in much more depth in our previous post on adding and managing storage with Veritas Volume Manager, so we'll leave this example boiled down to the essentials. For our example, we want to add 10 Gb of space to the "DONG" filesystem. And, we can do so, just as simply as if we were modifying any other VxFS filesystem, like this:

host # vxresize -g DINGdg DONGvol +10g
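
As a quick sanity check, before or after the fact, vxassist can also tell you how much headroom the volume has left in its disk group (assuming the "maxgrow" keyword behaves on your VxVM release the way it does on the ones we've used):

host # vxassist -g DINGdg maxgrow DONGvol <-- reports how far the volume could still be extended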

3. And that's it! Now all we have to do is verify that the filesystem actually grew, by issuing the exact same command we did in step 1. It should be noted that we used "vxresize" specifically (rather than "vxassist growby") so that we wouldn't have to run two commands to increase the capacity of the volume and grow the filesystem on it. Depending on your situation, you may need to do both separately, but you'll still be following standard protocol for whatever disk management tool you prefer to use, rather than modifying your method to suit Solaris:

host # df -k|grep DONG
/dev/vx/dsk/DINGdg/DONGvol 17033584 24356 17009228 1% /DONG


And, there you go. Simple as pie, and the work week is just about half over :)

Cheers,

, Mike





Tuesday, October 21, 2008

Creating New File Systems In Local Zones On Solaris 10

Hey There,

A while ago (back in May, 2008, I believe), we took a look at working with storage pools using ZFS on Solaris 10. As we know, ZFS stands for the Zettabyte File System and not the Zone File System (which some folks think it does. Not criticizing. It makes more sense than a lot of other things you might get from those initials ;). The point being, we never stopped to take a look at zones and lay down some simple procedures for working with them. Since this blog attempts to serve as some sort of a reference (hopefully folks come here and find some help or usable information/advice), today I thought we'd start taking a look at Solaris zones. Today, we'll start looking at Solaris 10 zones by looking at "local zones." There are many important distinctions to be made between local and global zones in Solaris, but, for now, let's just leave the meaning as vague as possible. Global = All Zones. Local = Specific Zones. Perhaps I could make it simpler to understand than that. Suggestions are, of course, always welcome. But, enough dwelling on that...

Here's one of (I'm sure) many ways to add a new filesystem to a Solaris 10 local zone. Note that all of this work needs to be done "from" the global zone.

1. Once you've logged onto your global zone (usually just the main host name, like you would log in to any other zone-less version of Solaris), create the volume and filesystem "on the global zone" using standard filesystem creation commands. For this instance, we'll assume VxFS (the Veritas File System), but you could use any filesystem Solaris supports. If you are using VxFS, Solaris Volume Manager or however you prefer to do your disk partitioning and management, it's always a good idea to try to include the name of your local zone in the disk group, etc. This way, in the future, you'll never have to wonder what belongs to which when you have 15 zones on the same box!
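
If you're going the VxFS route, a bare-bones sketch of that creation might look something like this (the c1t1d0s2 disk and the 6 GB size are made up for the example - your devices and sizes will, of course, differ):

host # vxdg init DINGdg DINGdg01=c1t1d0s2 <-- disk group named after the zone, per the naming advice above
host # vxassist -g DINGdg make DONGvol 6g <-- the volume that will back /DONG
host # mkfs -F vxfs /dev/vx/rdsk/DINGdg/DONGvol <-- lay a VxFS filesystem down on the new volume (older VxFS releases may want an explicit size at the end)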

2. Add an entry for your new local zone filesystem in the /etc/vfstab file on the global zone. Best practice is to make the mount point under /zones/MyNewZone/root. So, if you wanted to add a filesystem named "DONG" to the "DING" local zone, your mount point would be:

/zones/DING/root/DONG
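
The matching /etc/vfstab line in the global zone would then look something like the following (the device paths assume the VxFS volume used in the next step; whether you set the mount-at-boot field to yes or no is a judgment call, since the zone configuration below will also know about this filesystem):

/dev/vx/dsk/DINGdg/DONGvol /dev/vx/rdsk/DINGdg/DONGvol /zones/DING/root/DONG vxfs 2 no -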

3. Now, still on the global zone, modify the "DING" local zone:

host # zonecfg -z DING
zonecfg:DING> add fs
zonecfg:DING:fs> set dir=/DONG
zonecfg:DING:fs> set special=/dev/vx/dsk/DINGdg/DONGvol
zonecfg:DING:fs> set raw=/dev/vx/rdsk/DINGdg/DONGvol
zonecfg:DING:fs> set type=vxfs
zonecfg:DING:fs> end
zonecfg:DING> commit
zonecfg:DING> exit


4. Next, mount your new "DONG" filesystem while still in the global zone:

host # mount /zones/DING/root/DONG

5. Now, hop on out of the global zone (or log out, whichever works for you ;). Then, login to the "DING" local zone and you should be able to see your new filesystem:

/dev/vx/dsk/DINGdg/DONGvol 6547824 24356 6523468 1% /DONG


NOTE: The only exceptions to the "create your local zone filesystems in the global zone ONLY" rule have to do with NFS and other remote-or-network-mounted filesystems. When you are adding a filesystem of this type to your local zone, you need to add it while "in" the local zone.

Tomorrow, we'll take a look at maintaining/modifying your new "DING" local zone. Until then, enjoy the peace and quiet :)

Cheers,

, Mike





Monday, June 9, 2008

Finding The Number Of Open File Descriptors Per Process On Linux And Unix

Hey There,

Today, we're going to take a look at a fairly simple process (no pun intended), but one that (perhaps) doesn't come up enough in our workaday environments that the answer comes to mind as obviously as it should. How does one find the number of open file descriptors being used by any given process?

The question is a bit of a trick, in and of itself, since some folks define "open file descriptors" as the number of files any given process has open at any given time. For our purposes, we'll be very strict, and make the (usually fairly large) distinction between "files open" and "open file descriptors."

Generally, the two easiest ways to find out how many "open files" a process has, at any given point in time, are to use the same utilities you'd use to find a process that's using a network port. On most Linux flavours, you can do this easily with lsof, and on most Unix flavours you can find it with a proc command, such as pfiles for Solaris.

This is where the difference in definitions makes a huge difference in outcome. Both pfiles and lsof report on information for "open files," rather than "open file descriptors," exclusively. So, if, for instance, we were running lsof on Linux against a simple shell process we might see output like this (all output dummied-up to a certain degree, to protect the innocent ;)

host # lsof -p 2034
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
process 2034 user1 cwd DIR 3,5 4096 49430 /tmp/r (deleted)
process 2034 user1 rtd DIR 3,7 1024 2 /
process 2034 user1 txt REG 3,5 201840 49439 /tmp/r/process (deleted)
process 2034 user1 mem REG 3,7 340771 40255 /lib/ld-2.1.3.so
process 2034 user1 mem REG 3,7 4101836 40258 /lib/libc-2.1.3.so
process 2034 user1 0u CHR 136,9 29484 /dev/pts/9
process 2034 user1 1u CHR 136,9 29484 /dev/pts/9
process 2034 user1 2u CHR 136,9 29484 /dev/pts/9
process 2034 user1 4r CHR 5,0 29477 /dev/tty


However, if we check this same output by interrogating the /proc filesystem, we get much different results:

host # ls -l /proc/2034/fd/
total 0
lrwx------ 1 user1 user1 64 Jul 30 15:16 0 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 1 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 2 -> /dev/pts/9
lrwx------ 1 user1 user1 64 Jul 30 15:16 4 -> /dev/tty


So, we see that, although this one particular process has more than 4 "open files," it actually only has 4 "open file descriptors."

An easy way to iterate through each process's open file descriptors is to just run a simple shell loop, substituting your particular version of ps's arguments, like:

host # for x in `ps -ef | awk 'NR > 1 { print $2 }'`;do ls /proc/$x/fd;done

If you're only interested in the number of open file descriptors per process, you can shorten that output up even more:

host # for x in `ps -ef | awk 'NR > 1 { print $2 }'`;do ls /proc/$x/fd|wc -l;done
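
And, if you'd like the counts labeled with their PIDs (plus a little protection against processes you can't read, or ones that exit in the middle of the loop), something like this does the trick:

host # for x in `ps -ef | awk 'NR > 1 { print $2 }'`;do [ -d /proc/$x/fd ] && echo "$x: `ls /proc/$x/fd 2>/dev/null | wc -l`";done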

Here's to being able to over-answer that seemingly simple question in the future ;)

, Mike

Saturday, April 12, 2008

Patching Solaris 10 Zones - Global And Local Issues

Hello Again,

In a long overdue continuation of our coverage of Solaris 10 (as in our previous post on migrating between local and global zones) today, we're going to take a look at patching on Solaris 10, insofar as it relates to zones. There are a lot of questions going around about how to patch appropriately and what's permissible. This is completely understandable since, if you do it wrong, the consequences can be disastrous and irreversible (and none of us wants to stay up all night... working ;)

We'll start off with the concept of patching and zones. As far as patching is concerned, each zone actually has its own patch (and package) database. This makes it possible for you to, theoretically, patch single zones on a host individually or patch all of them at once by patching the "global" zone.

There are a few things to keep in mind though, before you go ahead and apply your patches, either way...

1. Even if you have an "umpteen" number of zones on a single machine, they all run off of the same kernel, so if you have to patch your kernel (or anything kernel-related) you need to do that from the "global" zone. If you apply a kernel patch on a local zone, and bring its kernel patch revision to a different level than the "global" zone's (and other zones'), calamity will ensue (Let's see how many different ways I can write that something bad is going to happen ;)

2. "patchadd" will eventually drive you nuts, anyway, so the following proviso's should be no surprise ;).

a. If you run patchadd with -G in a "global" zone, any packages that have SUNW_PKG_ALLZONES set to true will cause the entire patch operation to fail.

b. If you run patchadd with -G in a "global" zone and "no" packages have SUNW_PKG_ALLZONES set to true, you should be able to install the patch to all zones from the "global" zone.

c. If you run patchadd without -G in a "global" zone, regardless of the setting of SUNW_PKG_ALLZONES, you can install the appropriate patches to any individual zones or the global zone (which will patch all the zones, by default).

d. If you run patchadd in a "local" zone, with or without -G specified, any packages that have SUNW_PKG_ALLZONES set to true will fail and not install, and if none have SUNW_PKG_ALLZONES set to true, everything should work in each "local" zone you apply the patch to.

3. Any software that can be installed at the "local" zone level, can also be patched independently at the "local" zone level, on whatever zones it was installed. This is true regardless of your zone type (whole root or sparse root).

4. The "-G" option to patchadd doesn't stand for "global zone," rather it stands for "the current zone." I can't think of a good mnemonic for this so "G"ood Luck ;) Of course, if you use this flag in the "global" zone, you can pretend it makes sense and stands for that. I do...

5. The "-t" option to patchadd is available for people who got used to the old patch error code numbers and know them by heart (like me. Well... most of them). Even on a system with zones, they make it so only a return code of "0" indicates absolute success. Any other number indicates a problem.

6. The "-R" option to patchadd (with zones enabled) cannot be used to reference the root filesystem in any zone other than the "global" zone. If you choose to ignore this warning and use it on a "local" zone anyway, side effects may include damage to the "local" zone's filesystem, damage to the "global" zone's filesystem, security problems with the "global" zone's filesystem, nauseau, fatigue, dry-eye, constipation, arthritis and random "night terrors." Contact your doctor if you have any trouble breathing calmly while patching "global" zones, as this may be the sign of a rare, but serious, side effect ;)

Happy patching. Here's to eventually being allowed to do it on the weekdays :)

, Mike




Saturday, December 8, 2007

Adding Storage using Volume Manager in a Veritas Cluster

Today, we're going to take a look at adding extra disk to a Veritas Cluster using Veritas Volume Manager. We'll assume, for the purposes of this post, that you've been asked to add 35 GB of space to a Veritas volume (filesystem: /veritas/a - volume name: veritas_a). Now we'll walk, step by step, through determining whether we have the disk available to add, and adding it if we do.

The first thing you'll want to do is determine what disk group(s) the filesystem/volume belongs to. Generally, you'll be told this when you're asked to add disk to an existing system, but we'll just assume the very worst ;) In any event, even if you are told, it's good practice to verify that the filesystem/volume does, in fact, belong to the disk group(s) the request is asking to have you augment.

Now, since we're dealing with a Veritas Cluster, if the disk group is shared between more than one server node, you can run:

vxdctl -c mode

and that will show you what the master node is. This server is where you'll want to execute all of your Veritas Volume Manager commands. If the cluster is inactive, vxdctl will tell you that, and you'll need to execute your commands on the server you're already on.

Now, either on the Cluster's master node, or the machine you're on if the cluster is inactive, get a list of all the disk groups, like so:

vxdg list
NAME STATE ID
rootdg enabled 1199919191.1025.host1
mydg enabled 8373777737.1314.host2


Since the user requested more filespace under /veritas/a, we'll need to see what disk group this filesystem/volume belongs to:

df -kl | grep veritas
/dev/vx/dsk/mydg/veritas_a 34598912 24632328 9903320 72% /veritas/a
/dev/vx/dsk/mydg/veritas_b 34598912 649112 33687984 2% /veritas/b
/dev/vx/dsk/mydg/veritas_c 35350080 17696 35056360 1% /veritas/c


We can see that, under /dev/vx/dsk, mydg is showing as the disk group for our /veritas/a filesystem, and veritas_a is showing as the volume. In cases like this, it's a time saver if you just set the default group once on the command line, so you don't have to use the "-g" option for every command you run:

vxdctl defaultdg mydg

Now you'll run some standard commands to make sure that your system can see all the disk available to it and make sure that the data is fresh:

devfsadm -Cv (note that the cluster will only update the volume manager configuration for cluster nodes that use the mydg diskgroup -- we don't have to specify that here, because we defined it as the default above)
vxdctl enable (this is basically the second part of the same update. The devfsadm command is Solaris specific while vxdctl is Veritas specific)

Now we need to check for free space, to see if we can accommodate the request with what we have available to us at present:

vxdg -g mydg free <-- Again, "-g mydg" is optional because it's the default.
DISK DEVICE TAG OFFSET LENGTH FLAGS
DISK1 c5t27d27s2 c5t27d27 69198720 1501440 n
DISK2 c5t27d30s2 c5t27d30 69198720 1501440 n
DISK3 c5t27d28s2 c5t27d28 69198720 1501440 n


This output is, admittedly, a pain to read, because it expects you to be able to do the offset/length math (in sectors) in your head. For this output, if the offset were zero, the length were 70700160 and the flags were '-', then that DISK would have about 35 GB of free space available to us. Here, though, each disk only has a small free region left, so there is not enough free space in the disk group to satisfy the request. We're out of luck, it would seem.

So, now we're going to have to look into Veritas' disk configuration and see if there isn't a spare disk (hope) that we can pick up and add to the mydg diskgroup so that we can add the space (one thing to note in the output below: with "-o alldgs", every online disk shows its disk group membership, and a group name in parentheses means that group isn't currently imported on this host):

vxdisk -o alldgs list <--- (abbreviated to keep this post bearable ;)
DEVICE TYPE DISK GROUP STATUS
c0t0d0s2 sliced - - error
c0t2d0s2 sliced disk02 rootdg online
c1t2d0s2 sliced disk01 rootdg online
c5t27d40s2 sliced - (mydg200) online
c5t27d41s2 sliced - (mydg2) online
c5t27d300s2 sliced - (mydg300) online
c5t20d99s2 sliced - - error


Luckily, it appears that disk c5t20d99s2 is available and doesn't belong to any existing disk groups! We'll use the command line to initialize this free disk and add it to our disk group (mydg):

vxdiskadd c5t20d99s2

You will be asked if you want to initialize the disk (or re-initialize it, if it's already initialized). Answer yes to either option, to keep things simple. Then it will ask you what disk group you would like to add the disk to. Answer: mydg. It will ask you if you want to use a default disk name. Depending on your situation, this may be okay, but we'll say no and add it as "DISK4" to keep with our naming standard. Then you will be prompted if you want to add the disk as a spare for the mydg disk group. Answer no to this, or you won't be able to use it for the additional storage you want! Answer no again, when it asks you if you want to encapsulate the disk, and then answer yes when it asks you if you want to initialize it. You can then choose to perform a surface analysis before adding it (it says it's highly recommended, but my experience is that it's generally not necessary and takes a very, very, very long time). Phew... that wasn't too much info to have to feed to a single command ;)

Now, we'll want to grow the mydg disk group's volume and filesystem size. We've added the new disk, so we have more space, but the user can't see it or use it at this point. Now, common wisdom is to use the vxassist command. It's common, because it can be a great command to use (very simple), but I prefer to use "vxresize" because, unless your filesystem is vxfs, running vxassist to grow the volume (veritas_a) can cause serious issues when growing the filesystem (if, for instance, you're using the Veritas Volume Manager but have your volume's filesystem type set to Solaris "ufs")!

Calculate the maximum length of your new volume using vxassist (this part is okay - forget about what I just said above - that only applies to using vxassist to do the actual growing):

vxassist -g mydg (optional, if the default is already set) maxgrow veritas_a <--- This reports how much the volume can be extended and what its new total length would be; that new length is the NEW_LENGTH argument for the next command. (Plain "maxsize" tells you the largest brand-new volume you could create from the remaining free space, which isn't quite the number we want here.)

vxresize -F (vxfs or ufs, usually) -g mydg (optional, still) -x (to make sure that the operation fails if the volume doesn't increase in length) veritas_a NEW_LENGTH <-- The last time I checked - and another reason I prefer this method - this command grows your filesystem along with the volume, so you don't need to run any more steps after this.
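
By the way, if you'd rather skip the length arithmetic entirely, vxresize will also take a relative size (the same "+" syntax it understands for relative growth), which matches the original 35 GB request neatly. A rough sketch, assuming a vxfs filesystem on the volume:

vxresize -F vxfs -g mydg -x veritas_a +35g <-- grow the volume, and the filesystem on it, by 35 GB in one shot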

Of course, you can keep tabs on all this, after you're done typing, using the vxtask command. At no point during this entire operation will you need to unmount your filesystem. I believe it's "recommended" (because something could always go wrong), but I've never found it to be necessary (technically, it's not supposed to fail, but Veritas has to cover itself, just in case).

Once the task is finished, your "df -k /veritas/a" output should show your new disk available and ready for use!

Cheers,

, Mike