views:

849

answers:

2

I'm building a little Twitter thing in PHP and I'm trying to parse URLs, @replies and #hashtags and make them into clickable links.

I've found a class for parsing URLs and I'm wondering if this could also be used to parse @replies and #hashtags as well:

// http://josephscott.org/archives/2008/11/makeitlink-detecting-urls-in-text-and-making-them-links/    
class MakeItLink {
protected function _link_www( $matches ) {
    $url = $matches[2];
    $url = MakeItLink::cleanURL( $url );
    if( empty( $url ) ) {
        return $matches[0];
    }

    return "{$matches[1]}<a href='{$url}'>{$url}</a>";
}

public function cleanURL( $url ) {
    if( $url == '' ) {
        return $url;
    }

    $url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()x80-xff]|i", '', $url );
    $url = str_replace( array( "%0d", "%0a" ), '', $url );
    $url = str_replace( ";//", "://", $url );

    /* If the URL doesn't appear to contain a scheme, we
     * presume it needs http:// appended (unless a relative
     * link starting with / or a php file).
     */
    if(
        strpos( $url, ":" ) === false
        && substr( $url, 0, 1 ) != "/"
        && !preg_match( "|^[a-z0-9-]+?.php|i", $url )
    ) {
        $url = "http://{$url}";
    }

    // Replace ampersans and single quotes
    $url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
    $url = str_replace( "'", "&#039;", $url );

    return $url;
}

public function transform( $text ) {
    $text = " {$text}";

    $text = preg_replace_callback(
        '#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
        array( 'MakeItLink', '_link_www' ),
        $text
    );

    $text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
    $text = trim( $text );

    return $text;
}
}
+7  A: 

I think what you're looking to do is essentially what I've included below. You'd add these two statements in transform method, just before the return statement.

$text = preg_replace('#@([\\d\\w]+)#', '<a href="http://twitter.com/$1"&gt;$0&lt;/a&gt;', $text);
$text = preg_replace('/#([\\d\\w]+)/', '<a href="http://twitter.com/#search?q=%23$1"&gt;$0&lt;/a&gt;', $text);

Is that what you're looking for?

SoaperGEM
That's working loverly, thanks :)
Tom
A: 

Twitter recently released to open source both java and ruby (gem) implementations of the code they use for finding user names, hash tags, lists and urls.

It is very regular expression oriented.

Evan