I have text such as

http://pastebin.com/H8zTbG54

We can say this text is a set of rules separated by "OR" at the end of each line.

I need to put sets of lines (rules) into buckets (Bash array members), but each array member has a character limit of 1024.

So each array member should contain a set of rules, but its character count cannot exceed 1024.

Suppose the rule text is like a OR b OR c OR d OR e OR f OR g OR h

The output should be array member 1 = a OR b

array member 2 = c OR d OR e

array member 3 = f OR g

array member 4 = h

Can anybody help me do that?

I am working on a Solaris 10 server.

+1  A: 

This is not entirely trivial and would require a bit more clarification, but basically you split the text initially by OR/AND (and maybe some other patterns, depending on your needs) and then recursively split those chunks that are still larger than 1024 characters.

P.S. This seems to be one of those cases where a fully fledged scripting language such as Perl, Python, PHP or any other would achieve the result more conveniently.

E.g. a basic version in PHP (not sure if it is completely correct, I haven't done PHP in a while) could go like this:

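function splitByOr($input, $delims = array(" OR ", " AND "))
{
  // No delimiters left to try: return the chunk as-is.
  if (count($delims) == 0)
    return array($input);
  $result = array();
  foreach (explode($delims[0], $input) as $t) {
    if (strlen($t) > 1024)
      // Still too long: split again on the next delimiter.
      $result = array_merge($result, splitByOr($t, array_slice($delims, 1)));
    else
      $result[] = $t;
  }
  return $result;
}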
Gnudiff
This is not actually what I am looking for :(
soField
A: 

None of the individual rules in the samplerule file exceed 148 characters in length - far less than the 1024 character limit. You don't say what should be done with the rules if they do exceed that limit.

This is a very simple Bash script that will split your sample on the literal "\n" into an array called "rules". It skips lines that exceed 1024 characters and prints an error message:

#!/bin/bash
while read -r line
do
    (( count++ ))
    if (( ${#line} > 1024 ))
    then
        echo "Line length limit of 1024 characters exceeded: Length: ${#line} Line no.: $count"
        echo "$line"
        continue
    fi
    # Quoted so that a rule containing spaces stays a single array member
    rules+=("$line")
done < <(echo -e "$(<samplerule)")

This variation will truncate the line length without regard to the consequences:

#!/bin/bash
while read -r line
do
    rules+=("${line:0:1024}")
done < <(echo -e "$(<samplerule)")

If the literal "\n" is not actually in the file and you need to use Bash arrays rather than coding this entirely in AWK, change the line in either version above that says this:

done < <(echo -e "$(<samplerule)")

to say this:

done < <(awk 'BEGIN {RS="OR"} {print $0,"OR"}' samplerule)
if [[ "${rules[${#rules[@]}-1]}" == "OR" ]]
then
    unset "rules[${#rules[@]}-1]"
fi

which will split the lines on the "OR".
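Neither script above actually packs several rules into one array member, which is what the question asks for. A minimal sketch of greedy packing might look like the following. It assumes the rules have already been split into separate strings with the trailing "OR" removed; the names pack_rules, limit and buckets are illustrative and not from the original post. A single rule that is longer than the limit still ends up alone in an oversized bucket, since the question does not say how that case should be handled.

```shell
#!/bin/bash
# Greedy packing sketch (hypothetical helper, not from the original thread).
# Usage: pack_rules LIMIT rule1 rule2 ...
# Fills the global array "buckets" with rules joined by " OR ",
# keeping each member at or below LIMIT characters where possible.
pack_rules() {
    local limit=$1; shift
    local current="" rule candidate
    buckets=()
    for rule in "$@"; do
        if [[ -z $current ]]; then
            candidate=$rule
        else
            candidate="$current OR $rule"
        fi
        if (( ${#candidate} <= limit )); then
            # The rule still fits in the current bucket.
            current=$candidate
        else
            # Adding this rule would overflow: close the bucket, start a new one.
            if [[ -n $current ]]; then
                buckets[${#buckets[@]}]=$current
            fi
            current=$rule
        fi
    done
    # Flush the last, partially filled bucket.
    if [[ -n $current ]]; then
        buckets[${#buckets[@]}]=$current
    fi
    return 0
}

# Small demonstration with a limit of 11 characters instead of 1024.
pack_rules 11 a b c d e f g h
printf '%s\n' "${buckets[@]}"
```

With the real data you would call it as pack_rules 1024 "${rules[@]}". The sketch appends with an indexed assignment rather than += so that it does not depend on Bash 3.1 or later.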

Edit: Added a command to remove an extra "OR" at the end.

Dennis Williamson