I am in the process of creating a guitar tab to rtttl (Ring Tone Text Transfer Language) converter in PHP. To prepare a guitar tab for rtttl conversion, I first strip out all comments (which start with #- and end with -#). Next come a few lines that set the tempo, note the tuning and define the instruments (Tempo 120\nDefine Guitar 1\nDefine Bass 1, etc.); these are stripped out of the tab and set aside for later use.

Now I essentially have nothing left except the guitar tabs. Each tab is prefixed with its instrument name, matching one of the instrument names defined earlier.

Sometimes we have tabs for 2 separate instruments that are linked because they are to be played together, i.e. a Guitar and a Bass Guitar playing together.

Example 1, Standard Guitar Tab:

 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|

Example 2, Conjunction Tab:

 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
 |
 |
 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

I have considered other methods of identifying the tabs with no solid results. I am hoping that someone who does regular expressions could help me find a way to identify a single guitar tab and if possible also be able to match a tab with multiple instruments linked together.

Once the tabs are in an array I will go through them one line at a time and convert them into rtttl lines (exploded at each new line "\n").

I do not want to separate the guitar tabs in the document via explode "\n\n" or something similar because it does not identify the guitar tab, rather, it is identifying the space between the tabs - not on the tabs themselves.

I have been messing with this for about a week now and this is the only major hold up I have. Everything else is fairly simple.

So far, I have tried many variations of the regex pattern. Here is one of the most recent test samples:

<?php
$t = "
 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|

 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
 |
 |
 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

";

preg_match_all("/^.*?(\\|).*?(\\|)/is",$t,$p);
print_r($p);

?>

It is also worth noting that inside the tabs, where the dashes and numbers are, you may also have any variation of letters, numbers and punctuation. The beginning of each line marks the tuning of each string with one of the following (case insensitive): a, a#, b, c, c#, d, d#, e, f, f#, g or g#.

Thanks in advance for help with this most difficult problem.

A: 

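The ^ in your regex will prevent the /s switch from doing what you want. Without the /m switch, ^ matches only at the very start of the string, so the pattern can match at most once; /s only lets . match newlines.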

Also, preg_match_all is going to return a lot of duplicate "matches" because you are using ( ) grouping. If you plan to use preg_match_all() on a file with multiple tabs, isolating real matches might be difficult with those duplicates.
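
To make both points concrete, here is a minimal sketch. The two-line sample string and the variable names are made up for illustration and are not taken from the question's file. It compares the match count with and without the m modifier, and shows how each ( ) group adds another sub-array to the result.

```php
<?php
// Illustrative two-line sample; not the asker's full tab.
$sample = "e|--3--|\nB|--0--|\n";

// Without /m, ^ matches only at the very start of the subject,
// so the pattern can match at most once, whatever /s does.
$withoutM = preg_match_all('/^.*?\|.*?\|/s', $sample, $a);

// With /m, ^ also matches after each newline, giving one match per line.
$withM = preg_match_all('/^.*?\|.*?\|/m', $sample, $b);

// Each ( ) group adds one more sub-array to the result.
preg_match_all('/^.*?(\|).*?(\|)/m', $sample, $c);

echo $withoutM, ' ', $withM, ' ', count($b), ' ', count($c), "\n";
```

If the groups are not needed, dropping the parentheses (or using non-capturing (?: ) groups) keeps the result down to the full matches only.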

simplemotives
Yes, I think that was one of the issues I was having without realizing it. I only understand the most basic of regex at the moment, but I am catching on. I have been able to match strings and whatnot in the past, but the escaped chars really threw me off! Thank you for clarifying!
John
+1  A: 
<?php
$t = <<<EOD
 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|

 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
 |
 |
 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

EOD;

$t = preg_replace('/\r\n?/', "\n", $t); //normalize line endings

$te = explode("\n", $t);

$out = array();
$cur_inst = "";
$trim = false;
$lastlines = array();
$i = 0;
foreach ($te as $line) {
    if (preg_match("/^\\s\\|(\\w+ \\d+)\$/", $line, $matches)) {
        if ($matches[1] == $cur_inst) {
            $trim = true;
        }
        else {
            $out[$i++] = $line;
            $trim = false;
            $lastlines = array(); // reset the per-string line index for the new instrument
            $cur_inst = $matches[1];
        }
    }
    elseif (empty($line) || preg_match("/^\\s\\|\$/", $line)) {
        if (!preg_match("/^\\s\\|\$/", end($out)))
            $out[$i++] = $line;
    }
    elseif (preg_match("/^([a-zA-Z])\\|(.*)\$/", $line, $matches)) {
        if ($trim) {
            if (array_key_exists($matches[1], $lastlines)) {
                $oldi= $lastlines[$matches[1]];
                $out[$oldi] = rtrim($out[$oldi], "|") . $matches[2];
            }
            else {
                die("unexpected line: $line");
            }
        }
        else {
            $lastlines[$matches[1]] = $i;
            $out[$i++] = $matches[0];
        }
    }
    else {
        die("unexpected line: $line");
    }
}

$t = implode(PHP_EOL, $out);

echo $t;

gives

 |Guitar 1
e|--------------3-------------------3--------------------------3-------------------3------------|
B|------------3---3---------------3---3----------------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0------------------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0--------------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----------2---------------2---2---------------2----|
E|----3-------------------3-------------------3------3-------------------3-------------------3--|

 |
 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

If you prefer, you can iterate over the $out array.

Artefacto
Right, this is what I will essentially be doing once I get each "section" of tab separated from the document. The problem is that there will more than likely be some variations between each tab "section", and separating the sections based on what is between them may not be reliable, i.e. if someone has lyrics in between tabs it might break. (OK, bad example, but I think you understand what I am trying to point out.) Thank you for the input. If I have no other success I may have to fall back on this.
John
@John Then change the last `else` to, instead of dying, ignoring the lines it doesn't recognize.
Artefacto
Oh, I see now. I didn't notice the scroll bar before. This appears to be valid. I'll be testing it further when I get a chance. Thank you for all your help!
John
+1  A: 

I'm not entirely sure what exactly you mean, but if you want to separate tabs by instrument, try this:

^[^|\r\n]+\|([^|\r\n]+)$\r?\n  # match the line that contains the instrument name
                               # and capture this in backreference 1
(                              # capture the block of lines that follows
 (?:                           # repeat this for each line
  ^[^|\r\n]+                   # everything up to the first |
  \|                           # |
  [^|\r\n]+                    # everything up to the next |
  \|                           # |
  \r?\n                        # newline
 )+                            # at least once
)                              # end capture

In PHP:

preg_match_all('/^[^|\r\n]+\|([^|\r\n]+)$\r?\n((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    # Matched text = $result[0][$i];
}

Each match will be of the form

 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

and everything else between those blocks will be ignored.
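
One caveat, assuming the input might use Windows (CRLF) line endings: with the m modifier and PCRE's default newline setting, $ matches only immediately before \n, so $\r?\n cannot match where a line ends in \r\n. The sketch below compares the pattern above with a variant that simply drops the redundant $; the $subject sample is hypothetical.

```php
<?php
// Hypothetical CRLF sample; real input would come from the tab file.
$subject = " |Bass 1\r\nG|--0--|\r\nD|--2--|\r\n";

// Pattern from the answer, unchanged.
$original = '/^[^|\r\n]+\|([^|\r\n]+)$\r?\n((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im';
// Same pattern without the $ before \r?\n.
$variant  = '/^[^|\r\n]+\|([^|\r\n]+)\r?\n((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im';

$n1 = preg_match_all($original, $subject, $r1, PREG_PATTERN_ORDER);
$n2 = preg_match_all($variant, $subject, $r2, PREG_PATTERN_ORDER);

// Number of matches for each pattern, then the captured instrument name.
echo $n1, ' ', $n2, ' ', $r2[1][0], "\n";
```

On input that uses plain \n line endings the two patterns should behave the same, since \r?\n already pins the end of the line.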

Tim Pietzcker
This looks perfect! Thank you so much! I will test it out and report any success / failure. I am willing to share my code with anyone else who is interested as well once I finish. Thanks again!
John
Upon initial testing I found no results. I removed the first part of the regex string you gave me and it matches the actual tabs themselves perfectly: /((?:^[^|\r\n]+\|[^|\r\n]+\|\r?\n)+)/im But the part before that (the part that identifies the instrument) seems to have been the culprit. When I ran the full regex string I received 3 empty arrays as results. I have not tested in depth as of yet, but the other PHP example / comment below seems to be working! Thank you for all your help!
John
+2  A: 

I really like this question :-P. I had fun figuring this one out.
Here's what I got:

<?php
$t = <<<EOD
 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|

 |Guitar 1
e|--------------3-------------------3------------|
B|------------3---3---------------3---3----------|
G|----------0-------0-----------0-------0--------|
D|--------0-----------0-------0-----------0------|
A|------2---------------2---2---------------2----|
E|----3-------------------3-------------------3--|
 |
 |
 |Bass 1
G|----------0-------0-----------0-------0--------|
D|--------2-----------2-------2-----------2------|
A|------3---------------3---3---------------3----|
E|----3-------------------3-------------------3--|

EOD;


GetTabs($t);

function GetTabs($tabString) {
    $tabs = array();
    $tabcount = 0;
    $instrumentcount = 0;
    $tabline = 0;

    $tabStringArray = explode("\n", $tabString);

    foreach ($tabStringArray as $tabStringRow) {

        if (preg_match  ('/^(?<snaretuningprefix>[bgdaeBGDAE#])+\|(?<tabline>[0-9-]+)\|/', $tabStringRow)) {
            //Matches a tab line
            //The tabline group can be expanded with characters for hammer on's, pull off's and whatnot
            $tabs[$tabcount][$instrumentcount-1][$tabline] = $tabStringRow;
            $tabline++;
            continue;
        }

        if (preg_match  ('/^\s\|\s*$/', $tabStringRow, $matches)) {
            //Matches ' |' (with or without trailing whitespace such as "\r")
            //Continuation of tab do nothing
            continue;
        }

        if (preg_match  ('/^\s\|(?<instrument>[A-Za-z0-9 ]+)/', $tabStringRow, $matches)) {
            //Matches an instrument line ' |Guitar 1'
            //[A-Za-z] rather than [A-z], which would also match [ \ ] ^ _ and `

            $tabs[$tabcount][$instrumentcount]['instrumentname'] = $matches['instrument'];
            $instrumentcount++;
            $tabline = 0;
            continue;
        }

        if (preg_match  ('/^\s*$/', $tabStringRow)) {
            //Matches empty line
            //new tab

            $tabcount++;
            $instrumentcount = 0;

            continue;
        }

    }

    print_r($tabs);
}


?>

The function is commented somewhat; it's not that hard to read, I think.
This outputs:

Array
(
    [0] => Array
        (
            [0] => Array
                (
                    [instrumentname] => Guitar 1
                    [0] => e|--------------3-------------------3------------|
                    [1] => B|------------3---3---------------3---3----------|
                    [2] => G|----------0-------0-----------0-------0--------|
                    [3] => D|--------0-----------0-------0-----------0------|
                    [4] => A|------2---------------2---2---------------2----|
                    [5] => E|----3-------------------3-------------------3--|
                )

        )

    [1] => Array
        (
            [0] => Array
                (
                    [instrumentname] => Guitar 1
                    [0] => e|--------------3-------------------3------------|
                    [1] => B|------------3---3---------------3---3----------|
                    [2] => G|----------0-------0-----------0-------0--------|
                    [3] => D|--------0-----------0-------0-----------0------|
                    [4] => A|------2---------------2---2---------------2----|
                    [5] => E|----3-------------------3-------------------3--|
                )

            [1] => Array
                (
                    [instrumentname] => Bass 1
                    [0] => G|----------0-------0-----------0-------0--------|
                    [1] => D|--------2-----------2-------2-----------2------|
                    [2] => A|------3---------------3---3---------------3----|
                    [3] => E|----3-------------------3-------------------3--|
                )

        )

)
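
As the code comment notes, the tabline group can be widened. Here is a rough sketch, assuming the tuning prefix is a single letter a to g with an optional # and that anything other than | may appear between the bars. The helper name isTabLine is hypothetical and not part of the function above.

```php
<?php
// Hypothetical helper: true when $row looks like a tab line such as "e|--3h5--|".
function isTabLine(string $row): bool
{
    // Tuning prefix: one letter a-g (any case), optionally followed by #,
    // then |, then one or more measures of non-| characters each closed by |.
    // Trailing whitespace (for example a stray "\r") is tolerated.
    return preg_match('/^[a-g]#?\|(?:[^|\r\n]+\|)+\s*$/i', $row) === 1;
}

var_dump(isTabLine('e|--3h5p3--|'));  // tab line with hammer-on/pull-off letters
var_dump(isTabLine('F#|--0--2--|'));  // sharp tuning
var_dump(isTabLine(' |Guitar 1'));    // instrument header, not a tab line
```

A check like this could stand in for the first preg_match in GetTabs, though whether the wider character set is safe depends on what else appears in the document.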
klennepette
Out of the box this is perfect! Though I don't understand the dynamic code quite yet :D The output result is absolutely what I need! Thank you for helping!!
John