Hi,

How can I write a regular expression pattern that returns the filename from any one of these lines? (I will search one line at a time.)

drwxrwxrwx  4 apache      apache       4096 Oct 14 09:40 .
drwxrwxrwx 11 apache      apache       4096 Oct 13 11:33 ..
-rwxrwxrwx  1 apache      apache      16507 Oct 17 10:16 .bash_history
-rwxrwxrwx  1 apache      apache         33 Sep  1 09:36 .bash_logout
-rwxrwxrwx  1 apache      apache        176 Sep  1 09:36 .bash_profile
-rwxrwxrwx  1 apache      apache        124 Sep  1 09:36 .bashrc
-rwxrwxrwx  1 apache      apache        515 Sep  1 09:36 .emacs
-rw-------  1 christoffer christoffer 11993 Sep 18 10:00 .mysql_history
drwxrwxrwx  3 apache      apache       4096 Sep  1 09:48 .subversion
-rwxrwxrwx  1 christoffer christoffer  9204 Oct 14 09:40 .viminfo
drwxrwxrwx 14 apache      apache       4096 Oct 12 07:39 www

The search is done using PHP, but I guess that doesn't really make a difference. :)

EDIT: The file listing is retrieved over an SSH connection, which is why I don't use a built-in PHP function. I need the full listing to see whether or not an entry is actually a directory.

+5  A: 

The main question is... Why? Use readdir and stat instead.

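<?php

$directory = './';
$dh = opendir($directory);

// readdir() returns false once every entry has been read
while (($file = readdir($dh)) !== false)
{
    // stat() returns size, mode, timestamps, etc. for the entry
    $stat = stat($directory.$file);
    echo '<b>'.$directory.$file.':</b><br/>';
    var_dump($stat);
}

closedir($dh);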
Matthew Scharley
The file listing is retrieved over an SSH connection, which is why I don't use a built-in PHP function. I need the full listing to see whether or not an entry is actually a directory.
Christoffer
A: 
\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+(\S+)

Each line consists of 9 fields separated by whitespace. You are looking for the 9th field.
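
A minimal PHP sketch of this field-counting idea, with one change that is an assumption rather than part of the answer: the final group is `(.+)` instead of `(\S+)`, so a name containing spaces is kept whole. The function name `parseLsLine` is hypothetical, and the sketch assumes the nine-column `ls -la` layout shown in the question.

```php
<?php
// Hypothetical helper: split one `ls -la` line into name and directory flag.
// Assumes exactly eight whitespace-separated fields precede the name.
function parseLsLine(string $line): ?array
{
    // Group 1: the mode string; group 2: everything after the 8th field.
    $pattern = '/^(\S+)(?:\s+\S+){7}\s(.+)$/';
    if (!preg_match($pattern, rtrim($line, "\r\n"), $m)) {
        return null; // lines with fewer fields, e.g. a "total 48" header
    }
    return [
        'name'   => $m[2],
        'is_dir' => $m[1][0] === 'd', // first mode character marks directories
    ];
}

$entry = parseLsLine('drwxrwxrwx 14 apache      apache       4096 Oct 12 07:39 www');
echo $entry['name'], $entry['is_dir'] ? ' (directory)' : ' (file)', "\n";
```

For a symlink line the captured name would still include the ` -> target` suffix, so that case needs extra handling.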

Igor Oks
9th part, which may have whitespaces in it.
Michael Krelin - hacker
A: 

Use glob('*') instead?

Ollie Saunders
I can understand the downvote for not answering the question, but half the answers here don't. What makes this one any worse?
Michael Krelin - hacker
It doesn't return hidden files.
Ollie Saunders
+2  A: 

I wouldn't use regex

Given a line, you could explode and pop the last element from the array

if (preg_match('/^d/', $line)) {
    $name = array_pop(explode(' ', $line));
}

EDIT: None of your examples have embedded spaces, but a later comment suggests that it IS important to handle filenames that contain them.
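
One way to keep a split-based approach while tolerating spaces in names is to cap the number of pieces. The sketch below is a variant under stated assumptions, not the code above: it uses `preg_split` with a limit of 9 so that the ninth piece is the remainder of the line, and it stores the result in a variable before indexing it. `lsName` is a hypothetical name, and the nine-column layout from the question is assumed.

```php
<?php
// Hypothetical helper: return the name column of one `ls -la` line.
function lsName(string $line): ?string
{
    // Limit 9: the first eight pieces are single fields, the ninth is
    // whatever remains, including any embedded spaces.
    $parts = preg_split('/\s+/', rtrim($line, "\r\n"), 9);
    return count($parts) === 9 ? $parts[8] : null;
}

echo lsName('-rw-------  1 christoffer christoffer 11993 Sep 18 10:00 .mysql_history'), "\n";
```

Unlike the snippet above, this returns the name for every entry, not only for directories.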

pavium
Better use `split` or `preg_split`.
Gumbo
But if you want it to work with provided example only, why not do the `array('.','..','.bash_history', …)` instead? ;-)
Michael Krelin - hacker
Gumbo, split is deprecated and preg_split is slower.
Michael Krelin - hacker
BTW, IIRC, you can't `array_pop()` from non-variable since the `array_pop()` wants reference.
Michael Krelin - hacker
And it doesn't distinguish between files and directories, but the OP asked how to return *the filename from any one of these lines*, so it seemed like a general approach was needed.
pavium
@hacker: `return 4`?
Matthew Scharley
Why use preg_match to see if the first character is a 'd'? (!empty($line)) -)
Michael Krelin - hacker
Maybe subconsciously I was trying to satisfy the OP's request for a regex. It was typed in haste ...
pavium
;-)) sounds like a good excuse.
Michael Krelin - hacker
+1  A: 

There's a nicer way to do this in PHP 5 using the SPL's DirectoryIterator:

$dir = '.';
foreach (new DirectoryIterator($dir) as $fileInfo) {
    echo $fileInfo->getFilename() . "<br>\n";
}
linead
Why would you ignore dot files when the OP asks about `ls -la` which explicitly shows them?
Matthew Scharley
Good point. Removed.
linead
I could never understand the use of `$dir` variable in cases like this ;-)
Michael Krelin - hacker
+1  A: 

Adding to what Matthew said, there are plenty of reasons not to parse ls output. File names may contain spaces, or even delete characters. The date column is formatted differently for older files (ls shows the year instead of the time), and very large file sizes can break the column layout.

If you must use a regex, and you really have no spaces in file names, then just anchor to the end of the line and capture the non-space characters you find there:

(\S+)$
martin clayton
I wouldn't assume there are no whitespaces in filenames.
Michael Krelin - hacker
A: 

Instead of trying to parse difficult output, how about generating more helpful output in the first place? For example:

ssh user@machine 'cd /etc; for a in *; do [ -f "$a" ] && echo "$a"; done'

will generate a list of the regular files (not directories) in /etc on the remote machine. This should be much easier for you to parse.

Tim
+3  A: 

Try ls -a1F instead. That lists all entries (-a), one per line (-1), with an indicator of the file type appended to the name (-F).

You will then probably get something like this for your directory:

./
../
.bash_history
.bash_logout
.bash_profile
.bashrc
.emacs
.mysql_history
.subversion/
.viminfo
www/

The directories have a slash / at the end.
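
Reading that output in PHP could look like the sketch below. It assumes the indicator characters match the `-F` description quoted in the comments (`/`, `*`, `@`, `=`, `%`, `|`), which may vary by platform; the function name `classifyEntry` is hypothetical. A plain file whose name really ends in one of these characters would be misclassified, so treat this as a heuristic.

```php
<?php
// Hypothetical helper: split one `ls -a1F` line into name and type.
function classifyEntry(string $line): array
{
    // Indicator characters as described for `ls -F`; an assumption here.
    $types = [
        '/' => 'directory',
        '*' => 'executable',
        '@' => 'symlink',
        '=' => 'socket',
        '%' => 'whiteout',
        '|' => 'fifo',
    ];
    $line = rtrim($line, "\r\n");
    $last = substr($line, -1);
    if ($line !== '' && isset($types[$last])) {
        // Strip the indicator so only the real name remains.
        return ['name' => substr($line, 0, -1), 'type' => $types[$last]];
    }
    return ['name' => $line, 'type' => 'file'];
}

foreach (['.subversion/', '.bashrc', 'www/'] as $line) {
    $entry = classifyEntry($line);
    echo $entry['name'], ': ', $entry['type'], "\n";
}
```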

Gumbo
If you do use this approach, just be careful that you are aware of when `ls -F` will put *other* characters on the end of files, such as executables and so on.
Tim
I found that it can put * at the end... Is there a ref of what kind of chars it can put there?
Christoffer
-F Display a slash (`/') immediately after each pathname that is a directory, an asterisk (`*') after each that is executable, an at sign (`@') after each symbolic link, an equals sign (`=') after each socket, a percent sign (`%') after each whiteout, and a vertical bar (`|') after each that is a FIFO.
Michael Krelin - hacker
You can also run this via `grep '/$'` on the server side to save a bit of bandwidth and effort in your PHP page.
Matthew Scharley
Matthew, then `ls -da1 */` sounds like an even better way ;-)
Michael Krelin - hacker
@hacker: only problem is that it ignores dot folders (the `*/` is expanded by the shell). A better version is `ls -da1 */ .*/`, but you might get errors from that if there are no dot folders.
Matthew Scharley
Matthew, oops, indeed ;-)
Michael Krelin - hacker
+3  A: 

If you are looking for directories, rather than parsing ls output, just use find.

find -maxdepth 1 -mindepth 1 -type d

This will list the directories like this:

./Documents
./.gnupg
./Download

You no longer have to parse the data to determine what is a directory and what isn't.

If you actually want the files rather than the directories, use -type f instead.

Your parsing of the ls output may very well break on symlinks...
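
If the `find` output is read back in PHP, each line only needs its leading `./` removed. A small sketch, assuming the output arrives as one newline-separated string; `namesFromFind` is a hypothetical name.

```php
<?php
// Hypothetical helper: turn `find -maxdepth 1 ...` output into bare names.
function namesFromFind(string $output): array
{
    $names = [];
    foreach (explode("\n", $output) as $line) {
        $line = rtrim($line, "\r");
        if ($line === '') {
            continue; // skip the empty piece after the final newline
        }
        // Drop the "./" prefix that find adds relative to the start directory.
        $names[] = strncmp($line, './', 2) === 0 ? substr($line, 2) : $line;
    }
    return $names;
}

print_r(namesFromFind("./Documents\n./.gnupg\n./Download\n"));
```

Names with spaces survive intact here, but a name containing a newline would still be split in two.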

retracile
A: 

Displays hidden files too, try it if you don't believe me.

 glob('{,.}*', GLOB_BRACE);
Ollie Saunders
A: 

Given your constraint of using the full directory listing I would do it this way:

ls -l | egrep '^d' | awk '{print $NF}'

The egrep command matches lines that begin with the letter "d". Awk uses whitespace as the field separator by default, and $NF gives you the last field. The only edge case I can think of where this wouldn't work is a file name with spaces in it.

I would suggest using the find command:

find . -maxdepth 1 -type d | awk -F '/' '{print $NF}'

The find command above returns only the directories in your current directory (because of the -maxdepth 1 argument). The awk command splits each line on '/' and keeps only the last token ($NF).

Because the awk command

awk -F '/' '{print $NF}'

gets you the last element, you can essentially use:

find . -maxdepth x -type d

where x is any number >= 1, and you'll still get what you want: the bare file and/or directory name.

ilustreous