ansaurus

Question

Not autolinking all-numeric Twitter hashtags in Perl?

Answer 1

A:

Your regexp wouldn't capture the complete hashtag when its letters are separated by digits, e.g. #a0a. Something like this works:

my %seen;
my @anchors = grep { !$seen{$_}++ } ($tweet =~ m/#(\w+)/g);   # each hashtag once
foreach my $anchor (@anchors)
{
    next unless $anchor =~ m/[a-z]/i;   # skip all-numeric tags such as #123
    $tweet =~ s{#$anchor\b}{<a href="http://twitter.com/search?q=%23$anchor">#$anchor</a>}g;
}

For example, consider my $tweet = "hello #123 hello #abc1a hello #a0a";

Your code produces: hello #123 hello <a href="http://twitter.com/search?q=%23abc1">#abc1</a>a hello <a href="http://twitter.com/search?q=%23a0">#a0</a>a

and mine produces: hello #123 hello <a href="http://twitter.com/search?q=%23abc1a">#abc1a</a> hello <a href="http://twitter.com/search?q=%23a0a">#a0a</a>
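A possible alternative to the loop, offered as a sketch rather than as Ether's code: do the whole job in one s///ge pass, so each hashtag is examined exactly once and a tag that appears twice is never wrapped in a second, nested link. The helper name autolink_hashtags is hypothetical; it keeps the same rule as above (link a tag only if it contains at least one letter).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): link every hashtag
# that contains at least one ASCII letter and leave all-numeric tags alone.
# A single s///ge pass visits each tag once, so repeated tags never end up
# wrapped in nested links.
sub autolink_hashtags {
    my ($tweet) = @_;
    $tweet =~ s{#(\w+)}{
        my $tag = $1;
        $tag =~ /[a-z]/i
            ? qq{<a href="http://twitter.com/search?q=%23$tag">#$tag</a>}
            : "#$tag";
    }ge;
    return $tweet;
}

print autolink_hashtags("hello #123 hello #abc1a hello #a0a"), "\n";
```

Because the substitution runs once over the string, a tweet such as "#abc #abc" gets two separate links instead of one link nested inside another.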

Ether 2010-04-22 16:37:37

Answer 2

A:

I didn't realize how complex Twitter's text handling is! http://engineering.twitter.com/2010/02/introducing-open-source-twitter-text.html

I found these hashtag-related lines in the Ruby library linked from that blog post. I don't know much Ruby, so there may be more...

# Latin accented characters (subtracted 0xD7 from the range, it's a confusable multiplication sign. Looks like "x")
LATIN_ACCENTS = [(0xc0..0xd6).to_a, (0xd8..0xf6).to_a, (0xf8..0xff).to_a].flatten.pack('U*').freeze
REGEXEN[:latin_accents] = /[#{LATIN_ACCENTS}]+/o

# Characters considered valid in a hashtag but not at the beginning, where only a-z and 0-9 are valid.
HASHTAG_CHARACTERS = /[a-z0-9_#{LATIN_ACCENTS}]/io
REGEXEN[:auto_link_hashtags] = /(^|[^0-9A-Z&\/]+)(#|＃)([0-9A-Z_]*[A-Z_]+#{HASHTAG_CHARACTERS}*)/io
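For comparison, here is a hypothetical Perl translation of the REGEXEN[:auto_link_hashtags] pattern quoted above. It is a sketch, not part of twitter-text, and the names $hashtag_char, $auto_link_hashtags and autolink_twitter_style are made up for illustration. It keeps the leading (^|[^0-9A-Z&\/]+) group, so a # glued to a preceding letter or digit is not linked, and it accepts the full-width ＃ (U+FF03). The /m flag is added on the assumption that Ruby's ^ (which matches at the start of every line) should behave the same way here.

```perl
use strict;
use warnings;

# Hypothetical translation of the twitter-text Ruby regex; not an official port.
# The ranges mirror LATIN_ACCENTS: U+00C0-U+00D6, U+00D8-U+00F6, U+00F8-U+00FF.
my $hashtag_char = qr/[a-z0-9_\x{c0}-\x{d6}\x{d8}-\x{f6}\x{f8}-\x{ff}]/i;

# /m makes ^ match after every newline, as ^ does in Ruby.
my $auto_link_hashtags =
    qr/(^|[^0-9A-Z&\/]+)(#|\x{FF03})([0-9A-Z_]*[A-Z_]+$hashtag_char*)/im;

sub autolink_twitter_style {
    my ($text) = @_;
    # $1 = preceding text (kept as-is), $2 = '#' or full-width U+FF03,
    # $3 = the tag; an all-numeric tag cannot satisfy [A-Z_]+ and is skipped.
    $text =~ s{$auto_link_hashtags}
              {$1<a href="http://twitter.com/search?q=%23$3">$2$3</a>}g;
    return $text;
}

print autolink_twitter_style("hello #123 hello #abc1a"), "\n";
```

With this pattern "foo#bar" is left alone, while "hello #abc1a" is linked the same way as in the examples earlier in the thread.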

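I can't see a reason for handling `LATIN_ACCENTS' separately. In Perl, as long as the tweet is a decoded character string matched under Unicode rules, the \w shortcut should catch all those accented characters. Maybe it's different in Ruby... Maybe they had other reasons; for instance, under Unicode rules \w also matches letters far outside Latin-1, which their class does not.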
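A quick way to test the \w claim, as a sketch assuming Perl 5.14 or later (for the /u modifier) and the core Encode module: list the code points in U+00C0..U+00FF that fail to match \w, and compare a decoded string with its raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# With /u (Perl 5.14+), \w follows Unicode rules even for code points
# below 256. Collect the code points in U+00C0..U+00FF that are NOT \w.
my @not_word = grep { chr($_) !~ /\w/u } 0xC0 .. 0xFF;
printf "non-word chars in U+00C0..U+00FF: %s\n",
    join ' ', map { sprintf 'U+%04X', $_ } @not_word;

# \w is also much broader than LATIN_ACCENTS: a CJK ideograph matches too.
my $cjk_is_word = ("\x{65E5}" =~ /\w/u) ? 1 : 0;

# Decoding matters: in undecoded UTF-8 bytes, the second byte of the
# e-acute (0xA9) is not a word character, so the whole-word match fails.
my $decoded    = "caf\x{E9}";
my $bytes      = encode('UTF-8', $decoded);
my $decoded_ok = ($decoded =~ /^\w+$/u) ? 1 : 0;
my $bytes_ok   = ($bytes   =~ /^\w+$/u) ? 1 : 0;
print "cjk=$cjk_is_word decoded=$decoded_ok bytes=$bytes_ok\n";
```

This reports only U+00D7 (×) and U+00F7 (÷) as non-word characters, which are exactly the two code points the LATIN_ACCENTS ranges leave out, and prints "cjk=1 decoded=1 bytes=0".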

For now, I'm settling for something that looks like this:

$tweet =~ s{#([0-9A-Z_]*[A-Z_]+\w*)}{<a href="http://twitter.com/search?q=%23$1">#$1</a>}gi;
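One detail worth checking in a pattern like this is the final quantifier. The Ruby original ends in #{HASHTAG_CHARACTERS}* (zero or more); if the Perl version ends in \w+ instead, a tag needs at least one more character after its last required letter, so #a or #2a would not be linked. A small sketch comparing the two endings (the %patterns table and sample tags are only for illustration):

```perl
use strict;
use warnings;

# Two variants of the pattern, differing only in the last quantifier.
my %patterns = (
    'plus (\w+)' => qr/#([0-9A-Z_]*[A-Z_]+\w+)/i,
    'star (\w*)' => qr/#([0-9A-Z_]*[A-Z_]+\w*)/i,
);

# For each variant, record the captured tag for every sample, or '-' if
# the variant would leave that hashtag unlinked.
my @tags = ('#a', '#2a', '#ab', '#123', '#a0a');
my %linked;
for my $name (sort keys %patterns) {
    my $re = $patterns{$name};
    $linked{$name} = join ' ', map { $_ =~ $re ? $1 : '-' } @tags;
    print "$name: $linked{$name}\n";
}
```

This prints "plus (\w+): - - ab - a0a" and "star (\w*): a 2a ab - a0a": both variants skip #123, but only the \w* ending links the short tags.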

Can't say that it's solved yet...

all_numeric_no_hash 2010-04-27 11:23:16

Answer 3

A:

Ether: Thanks a lot for the code; it really helped. There may be a bug in it, though: I'm not sure it would handle this tweet correctly: http://twitter.com/KateLeiter/status/12874298805 :-)

all_numeric_no_hash 2010-04-27 11:28:16
