I have a script which connects to a database and gets all records which satisfy the query. Each record is the name of a file present on a server, so I now have a text file which lists all the file names.

I want a script which would know:

  1. What is the size of each file in the output.txt file?
  2. What is the total size of all the files present in that text file?

Update: I would like to know how I can achieve my task using the Perl programming language; any input would be highly appreciated.

Note: I do not have any specific language constraint, it could be either Perl or Python scripting language which I can run from the Unix prompt. Currently I am using the bash shell and have sh and py script. How can this be done?

My scripts:

#!/usr/bin/ksh
export ORACLE_HOME=database specific details
export PATH=$ORACLE_HOME/bin:path information
sqlplus database server information<<EOF
SET HEADING OFF
SET ECHO OFF
SET PAGESIZE 0
SET LINESIZE 1000
SPOOL output.txt
select * from my table_name;
SPOOL OFF
EOF

I know du -h is the command I should be using, but I am not sure how my script should look. I have tried something in Python; I am totally new to Python and this is my first attempt.

Here it is:

import os

folderpath='folder_path'
file=open('output file which has all listing of query result','r')

for line in file:
 filename=line.strip()
 filename=filename.replace(' ', '\ ')
 fullpath=folderpath+filename
# print (fullpath)
 os.system('du -h '+fullpath)

File names in the output text file for example are like: 007_009_Bond Is Here_009_Yippie.doc

Any guidance would be highly appreciated.

Update:

  1. How can I move all the files which are present in output.txt file to some other folder location using Perl ?
  2. After doing step1, how can I delete all the files which are present in output.txt file ?

Any suggestions would be highly appreciated.

A: 

You can do it in your shell script itself.

You have all the file names in your spooled file output.txt; all you have to add at the end of the existing script is:

< output.txt  du -h

It will give size of each file and also a total at the end.

codaddict
@codaddict: I did not understand how the `<` ahead of the output.txt du -h command works; can you explain more?
Rachel
It's the same as `du -h < output.txt`
codaddict
or `cat output.txt | du -h`
codaddict
where should I add this, beside spool command in first script ?
Rachel
After the line `EOF`
codaddict
But from the SQL I am only getting the file name, not the actual folder where the file is; I pass that folder in the Python script, so I am not sure how this would work?
Rachel
This doesn't work - `du` does not take piped arguments. There may be a way to make it work, but as written, it's non-functional.
RickF
A: 

You can use the Python skeleton that you've sketched out and add os.path.getsize(fullpath) to get the size of each individual file.

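For example, if you wanted a dictionary with the file name and size you could write (stripping the trailing newline from each line and joining it to the folder path first):

dict((name, os.path.getsize(os.path.join(folderpath, name))) for name in (line.strip() for line in file))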

Keep in mind that the result from os.path.getsize(...) is in bytes so you'll have to convert it to get other units if you want.

In general os.path is a key module for manipulating files and paths.
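
As a minimal sketch of that idea in Python 3, assuming the listing file holds one bare file name per line and that all the files live under a single folder. The function name `sizes_from_listing` and both paths in the usage line are placeholders, not something from the question:

```python
import os

def sizes_from_listing(listing_path, folder):
    """Return ({name: size_in_bytes}, total_bytes) for the files named in listing_path."""
    sizes = {}
    with open(listing_path) as listing:
        for line in listing:
            name = line.strip()          # drop the newline and any padding
            if not name:                 # skip blank lines
                continue
            full_path = os.path.join(folder, name)
            if os.path.isfile(full_path):
                sizes[name] = os.path.getsize(full_path)  # apparent size in bytes
            else:
                print("not found: " + full_path)
    return sizes, sum(sizes.values())
```

It could be called as `sizes, total = sizes_from_listing('output.txt', '/path/to/files')`. Because the path never passes through a shell, spaces in file names need no escaping.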

dtlussier
I have added the script in the update to the question above, but I am still getting errors on standard error.
Rachel
I have updated the question with the latest script according to your suggestions and the issues I am facing with it.
Rachel
I have updated the question with my response to your answer and the error I am getting with that approach.
Rachel
The list comprehension is superfluous in this case; a simple generator expression does fine and is more efficient. Just leave out the square brackets.
lunaryorn
Can you elaborate on this, also I have update my questions with specific response to this answer, can you share your comments on it.
Rachel
@lunaryorn - thanks. I've updated the response w/o the square brackets.
dtlussier
@Rachel - I don't see any updates in the question, are you still getting errors using `os.path.getsize(...)`?
dtlussier
+1  A: 

Eyeballing, you can make YOUR script work this way:

1) Delete the line filename=filename.replace(' ', '\ ') Escaping is more complicated than that, and you should just quote the full path or use a Python library to escape it based on the specific OS;

2) You are probably missing a delimiter between the path and the file name;

3) You need single quotes around the full path in the call to os.system.

This works for me:

#!/usr/bin/python
import os

folderpath='/Users/andrew/bin'
file=open('ft.txt','r')

for line in file:
    filename=line.strip()
    fullpath=folderpath+"/"+filename
    os.system('du -h '+"'"+fullpath+"'")

The file "ft.txt" has file names with no path and the path part is '/Users/andrew/bin'. Some of the files have names that would need to be escaped, but that is taken care of with the single quotes around the file name.
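
An alternative to hand-quoting, sketched here in Python 3 under the assumption that `du` is on the PATH and accepts `-h`: passing the command as an argument list to subprocess.run bypasses the shell entirely, so even file names containing single quotes need no escaping. The helper name `du_human` is hypothetical.

```python
import subprocess

def du_human(path):
    """Run `du -h` on one path without a shell and return its output line."""
    # The list form hands `path` to du as a single argument, whatever
    # characters it contains (spaces, quotes, and so on).
    result = subprocess.run(["du", "-h", path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True,
                            check=True)   # raise if du exits with an error
    return result.stdout.strip()
```

Inside the loop above, `print(du_human(fullpath))` would then take the place of the os.system call.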

That will run du -h on each file in the .txt file, but does not give you the total. This is fairly easy in Perl or Python.

Here is a Python script (based on yours) to do that:

#!/usr/bin/python
import os

folderpath='/Users/andrew/bin/testdir'
file=open('/Users/andrew/bin/testdir/ft.txt','r')

blocks=0
i=0
template='%d total files in %d blocks using %d KB\n'

for line in file:
    i+=1
    filename=line.strip()
    fullpath=folderpath+"/"+filename
    if(os.path.exists(fullpath)):
        info=os.stat(fullpath)
        blocks+=info.st_blocks
        print `info.st_blocks`+"\t"+fullpath
    else:
        print '"'+fullpath+'"'+" not found"

print `blocks`+"\tTotal"
print " "+template % (i,blocks,blocks*512/1024)

Notice that you do not have to quote or escape the file name this time: os.stat receives the path directly, with no shell in between. This calculates file sizes using allocation blocks, the same way that du does. If I run du -ahc against the same files that I have listed in ft.txt I get the same number (well, kind of; du reports it as 25M and I get 24324 KB), and it reports the same number of blocks. (Side note: st_blocks is counted in 512-byte units on most Unix systems, even though the actual filesystem block size is usually larger.)
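
To make the difference between the two measurements concrete, here is a small Python 3 sketch. The helper name `apparent_and_allocated` is hypothetical, and it assumes a Unix-like system where os.stat exposes st_blocks in 512-byte units (the Linux convention):

```python
import os

def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for one file."""
    info = os.stat(path)
    # st_size is the byte length of the contents (what `ls -l` shows).
    # st_blocks counts allocated 512-byte units on Linux and most Unix
    # systems; the attribute is not available on Windows.
    return info.st_size, info.st_blocks * 512
```

The first number is what os.path.getsize reports; the second is closer to what du counts, and for small files it can be considerably larger than the first because space is allocated in whole filesystem blocks.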

Finally, you may want to consider changing your script so that it takes the list files as command line arguments rather than hard coding the file and the path in the script. Consider:

#!/usr/bin/python
import os, sys

total_blocks=0
total_files=0
template='%d total files in %d blocks using %d KB\n'

print
for arg in sys.argv[1:]: 
    print "processing: "+arg
    blocks=0
    i=0
    file=open(arg,'r')
    for line in file:
        abspath=os.path.abspath(arg)
        folderpath=os.path.dirname(abspath)
        i+=1
        filename=line.strip()
        fullpath=folderpath+"/"+filename
        if(os.path.exists(fullpath)):
           info=os.stat(fullpath)
           blocks+=info.st_blocks
           print `info.st_blocks`+"\t"+fullpath
        else:
           print '"'+fullpath+'"'+" not found"

    print "\t"+template % (i,blocks,blocks*512/1024)
    total_blocks+=blocks
    total_files+=i

print template % (total_files,total_blocks,total_blocks*512/1024)

You can then execute the script (after chmod +x [script_name].py) with ./script.py ft.txt; it will assume that the files listed in ft.txt live in the same directory as ft.txt itself. You can pass several list files at once as well.

drewk
@drewk: I tried your approach, when I try to add files then I get values like `315904L`, not sure what `L` stands for ? Also, if I run first script then it gives me size as `86K` and `259K`, total of which if I do on Calc gives me `345K` and so not sure but we are getting different numbers on summation of two numbers in different ways, any thoughts on this ?
Rachel
Because your files are really big, huh? Let me change the script to use blocks instead of bytes, the way du does....
drewk
@drewk: I do not see any changes in the script, what do we mean by blocks here ?
Rachel
It has been changed now...
drewk
Suggestion: I am confused whose answer should I accept, RickF's or drewk's as both have solved my problem, any suggestions ?
Rachel
+1  A: 

In perl, the -s filetest operator is probably what you want.

use strict;
use warnings;
use File::Copy;

my $folderpath = 'the_path';
my $destination = 'path/to/destination/directory';
open my $IN, '<', 'path/to/infile' or die "Cannot open infile: $!";
my $total = 0;
while (<$IN>) {
    chomp;
    my $size = -s "$folderpath/$_";
    print "$_ => $size\n";
    $total += $size;
    move("$folderpath/$_", "$destination/$_") or die "Error when moving: $!";
}
print "Total => $total\n";

Note that -s gives size in bytes not blocks like du.

On further investigation, perl's -s is equivalent to du -b. You should probably read the man pages on your specific du to make sure that you are actually measuring what you intend to measure.

If you really want the du values, change the assignment to $size above to:

my ($size) = split(' ', `du "$folderpath/$_"`);
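
Since the question allows either Perl or Python, roughly the same loop can be sketched in Python 3. This is only an illustration under the same assumptions as the Perl version (one file name per line, all files in one folder); the function name `move_listed_files` and its parameters are hypothetical:

```python
import os
import shutil

def move_listed_files(listing_path, folder, destination):
    """Move each file named in listing_path from folder to destination.

    Prints each file's size in bytes and returns the total bytes moved.
    """
    total = 0
    with open(listing_path) as listing:
        for line in listing:
            name = line.strip()
            if not name:                      # skip blank lines
                continue
            source = os.path.join(folder, name)
            size = os.path.getsize(source)    # raises OSError if the file is missing
            shutil.move(source, os.path.join(destination, name))
            print("%s => %d" % (name, size))
            total += size
    print("Total => %d" % total)
    return total
```

In both versions a move already removes the file from its original location, so no separate delete step is needed afterwards.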
RickF
@RickF: Is this giving size of each and every file in the folder plus total size of the file in the folder ?
Rachel
@RickF: Is there a way I can get blocks size instead of byte size which I get using `du` ?
Rachel
On my Linux box, the block size is 1024 bytes (1kb), so you would just divide `$size` by that.
RickF
Actually, on further testing, `du -h` returns a minimum of '4.0k' for any file on my system, even where `ls` or perl `-s` shows a size smaller than that.
RickF
@RickF: So would it be wise to say that `perl -s` gives more accurate statistics as compared to `du -h`
Rachel
Actually, `du` always uses an assumption of 512 bytes for a block even if your block size is different. Part of the Single UNIX Specification http://en.wikipedia.org/wiki/Du_(Unix)
drewk
Suggestion: I am confused whose answer should I accept, RickF's or drewk's as both have solved my problem, any suggestions ?
Rachel
`du` will result in `KB` or `bytes` ?
Rachel
@RickF: With the modification you suggested, Total gives me a number which is far smaller than what I used to get without `du`, which leads me to think that `du` is reporting in `kb`. I know this could be wrong, so please correct me if I have misunderstood.
Rachel
@Rachel: if you use the `-h` argument to `du`, the magnitude of the output is automatically scaled from bytes to KB, MB, TB, etc. to keep about 2 digits. So a file that is 300 bytes will report that; a 3,890,000 byte file will report as 3.7M because each order of magnitude is 1024. Even so, `du` is off on small files because an assumption of fixed 512 byte blocks is made.
drewk
@drewk: The Single Unix Spec is not how my Ubuntu box is configured by default. My default block size is 4kb. You can check yours with `tune2fs -l /dev/sda1 | grep Block`
RickF
@Rachel: `du` response will depend on your system specifics. On my Ubuntu system, plain `du` returns kb rounded up to the nearest 4k. `du -h` returns the same value converted to KB/MB/TB as appropriate. `du -b` returns the file size in bytes ignoring the 4k block size.
RickF
@RickF: Now how can I move all the files present in outlog.txt to another location(some folder) and then delete the files which are in current location ?
Rachel
@RickF: How would script look like in case we need to move all files in output.txt file to some other location and then delete files present in output.txt, I have updated question for reference.
Rachel
http://perldoc.perl.org/File/Copy.html (Edited answer to include move code)
RickF
@RickF: Some of my file names contain special characters, and I am not able to move those files. Is there a way in Perl to handle special characters? I cannot even write the file name here, because an encoded value appears in its place. Is there a way around this?
Rachel
Do the special characters show up in the printed output file? Are they Unicode? If so, http://perldoc.perl.org/perlunitut.html is a good place to start. Also, is there an actual error message?
RickF
@RickF: It is not that your block size is actually 512 B; it is that some Unix utilities report block use as if the block size were 512 B unless the environment variable BLOCKSIZE has been set. Type `echo $BLOCKSIZE` to check; on most systems it is not set. Try `du -ac` or `ls -s` on a file whose size is > 512 bytes but much smaller than 4kb, then `du -ach` on the same file. A 621 B file is reported as 4.0K (correct given the block size) and 8 blocks (incorrect with 4kb blocks, but it assumes 512 B blocks). Type `man ls` and read about `-s`. If $BLOCKSIZE eq "", $blocks=$blocks_reported*512/$block_size
drewk
RickF
@RickF: I was just quoting my man page on 512 vs 1024. If you follow the BSD link, it states 512. My point is that file size based on blocks need to be adjusted. You stated that your block size from `tune2fs -l /dev/sda1 | grep Block` is 4k. If the file actually was using 4 blocks, it would be using 4 blocks * 4096 block size=16,384 bytes. Your 648B file fits in 1 block, so uses 4,096 bytes. You are correct that it is not safe to assume 512B assumption, but neither is it accurate to add up all the byte in a series of file for total disk use. It needs to be adjusted for actual block use.
drewk
@drewk: my `tune2fs` output led me to the conclusion that my `du` is not giving a count of blocks used, but the count of disk used in kb. Block size is 4k, `du` output for a tiny file is '4' or '4.0k', not '1'. GNU `du` defaults to 1024b block size, but seems aware of the system block size.
RickF
@RickF: Well that is definitely not what the documents say. I believe your conclusion is incorrect. The '4' you are getting from `du` just happens to be interchangeable in this case because the assumed block size is the same as kilobytes, but that will not always be true...
drewk
@drewk: What OS are you using? BSD systems may not be using the GNU version of `du`. I'm quite sure that my conclusions are correct for *my* situation: Ubuntu 9.10 w/ GNU `du`. Block size is a function of the file system, and Ext3 defaults to 4kb blocks.
RickF
@RickF: I have both OS X and Ubuntu 10.04. On Ubuntu, there is a feature of `du` that shows what I am talking about. On a file where the size > block size and size << 2 blocks, run these commands: `du [file]; du --block-size=1024 [file]; du --block-size=512 [file]; du --block-size=1 [file]` What do you conclude?
drewk
@RickF: I have tried the `du` command which you suggested, but it gives me some number; how can I interpret it, is it `kb, mb, gb or by`? Also, my OS version is very old and so I do not have the `du -h` option. Is there a way I can get the storage value in `MB` from du using the command `my ($size) = split(' ', `du "$folderpath/$_"`);`?
Rachel
@drewk: On a file of 5655b (according to ls -l), `du` and `du --block-size=1024` give "8", as expected since the docs say `du` defaults to a block size of 1024. 512 => 8, 1 => 8192. This all seems consistent with measuring disk usage (4k blocks). Do your results differ? http://www.gnu.org/software/coreutils/manual/html_node/du-invocation.html
RickF
@Rachel: It depends on your OS version - what are you running? As I've been discussing here with drewk, it depends. You'll need to figure out what your specific `du` is returning. Or specify it - `du --block-size=1024 "$folderpath/$_"` and you'll know that it's in kbytes. The block size is given in bytes.
RickF
@RickF: Am getting `/usr/bin/du: illegal option -- -usage: du [-a][-d][-k][-r][-o|-s][-L] [file ...]` error message, also am using `OS Information uname :SunOS uname -v :Generic_117350-39`
Rachel
@RickF: yes, on Ubuntu, I get the same. I was understanding though that you said "du is not giving a count of blocks used, but a count of disk used in kb" This is what I was saying is incorrect. du gives blocks unless you use a switch to get a human readable form.
drewk
@Rachel: According to http://docs.sun.com/app/docs/doc/816-0210/6m6nb7m7t?a=view your version of `du` reports the count in 512b blocks by default. It doesn't support `--block-size=X` but it does support `-k`, which will give output in kb rather than blocks. Or you can take the default output and divide it by 2 (in the script) to get kb. 2 blocks = 1 kb.
RickF
@Rachel: just use Perl to get block size and blocks used in a call to stat. You will not have any shell issues. brian d foy gave you the code.
drewk