I'm looking to optimize a wordlist for the English language using sed or a similar Linux tool. To do this I need to:

Remove lines containing anything except a-z, 0-9, or special characters

Remove URLs - maybe by detecting the "/" character

Remove lines longer than 16 characters or shorter than 5 characters (keep 5-16 chars)

Preferably in sed =)

Thanks!

A: 
perl -ne 'print if /^[a-zA-Z0-9{other allowed characters here}]{5,16}$/'
Anon.
Is there any way to do this in sed?
MKv4
A: 

Based on Anon.:

egrep '^[a-zA-Z0-9{other allowed characters here}]{5,16}$'
Emilio Silva
Is there any way to do this in sed?
MKv4
A: 
sed -nr '/^[[:alnum:]]{5,16}$/p' words
  • -n means don't print lines by default
  • -r means use 'extended' regular expressions

The sed command is:

  • /.../ when we have something that matches
    • ^...$ a whole line consisting of only
      • [...] the character class of:
        • [:alnum:] alphanumeric characters
      • {5,16} between 5 and 16 times
  • p then we print it
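
To try the breakdown above on a small input, here is a minimal sketch. The sample words and the extra characters `_ . ! -` are only assumptions for illustration; swap in whatever set you actually want to allow. It uses `-E`, which GNU sed accepts as a synonym for `-r`, and `LC_ALL=C` so that `[:alnum:]` covers ASCII letters and digits only.

```shell
# Hypothetical sample lines, fed on stdin instead of a real wordlist file.
filtered=$(printf '%s\n' \
    'hello' \
    'abc' \
    'pass_word-1' \
    'http://a.io' \
    'averyveryverylongword' |
  # Extra allowed characters go inside the same bracket expression.
  # A literal '-' is placed last so it is not read as a range.
  LC_ALL=C sed -nE '/^[[:alnum:]_.!-]{5,16}$/p')

printf '%s\n' "$filtered"
# prints:
# hello
# pass_word-1
```

Since this is a whitelist, lines that look like URLs drop out on their own as long as ':' and '/' are not in the allowed set, so a separate URL check may not be needed.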
Porges
Is there a way to include custom characters as well?
MKv4
Porges