views:

2803

answers:

7

I have an input string, say "Please go to http://stackoverflow.com". Many browsers, IDEs, and applications detect the URL part of such a string and automatically wrap it in an anchor (<a href=""></a>), so it becomes "Please go to <a href='http://stackoverflow.com'>http://stackoverflow.com</a>".

I need to do the same using Java.

Thanks

+1  A: 

You could do something like this (adjust the regex to suit your needs):

String originalString = "Please go to http://www.stackoverflow.com";
String newString = originalString.replaceAll("http://.+?(com|net|org)/{0,1}", "<a href=\"$0\">$0</a>");
Jason Coco
A: 

Primitive:

String msg = "Please go to http://stackoverflow.com";
String withURL = msg.replaceAll("(?:https?|ftps?)://[\\w/%.-]+", "<a href='$0'>$0</a>");
System.out.println(withURL);

This needs refinement to match proper URLs, particularly ones with GET parameters (?foo=bar&x=25).
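One possible refinement, sketched under assumptions: widen the character class so that query strings and fragments (`?`, `=`, `&`, `#`) are kept, and use `Matcher.appendReplacement` instead of `replaceAll` so each URL can be HTML-escaped before it goes into the `href`. The class name `QueryAwareLinker` and the exact character set are illustrative choices of mine, not a complete URL grammar.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: wrap http(s)/ftp(s) URLs in anchors, keeping query strings and fragments.
final class QueryAwareLinker {
    // Scheme, then a run of characters common in URLs, now including the
    // query/fragment characters ? = & # (plus : ~ + for ports and paths).
    // Still an approximation, not a full RFC 3986 parser.
    private static final Pattern URL_PATTERN =
            Pattern.compile("(?:https?|ftps?)://[\\w/%.\\-?=&#:~+]+");

    static String linkify(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = escapeHtml(m.group());
            // quoteReplacement keeps '$' and '\' in the URL from being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(
                    "<a href=\"" + url + "\">" + url + "</a>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    // '&' in a query string must become &amp; inside HTML
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please go to http://example.com/search?foo=bar&x=25"));
    }
}
```

Because `.` is in the class, a sentence-ending period right after a URL is still swallowed into the link; that is one of the cases a real implementation would need to trim.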

PhiLho
A: 

You are asking two separate questions.

  1. What is the best way to identify URLs in strings? See this thread.
  2. How do you code that solution in Java? Other responses illustrating String.replaceAll usage have addressed this.
ykaganovich
+6  A: 

While it's not Java-specific, Jeff Atwood recently posted an article about the pitfalls you might run into when trying to locate and match URLs in arbitrary text:

The Problem With URLs

It gives a good regex, along with the snippet of code you need to handle parentheses properly (more or less).

The regex:

\(?\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]

The paren cleanup (C#):

if (s.StartsWith("(") && s.EndsWith(")"))
{
    return s.Substring(1, s.Length - 2);
}
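Since the question asks for Java, here is a minimal sketch that ports the regex and the C# paren cleanup above. `AtwoodLinker` is a hypothetical name; when the cleanup strips the wrapping parentheses, they are written back outside the anchor so the surrounding text is preserved. Like the other snippets in this thread, it does not HTML-escape the URL.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AtwoodLinker {
    // Jeff Atwood's regex, with Java string escaping
    private static final Pattern URL_PATTERN = Pattern.compile(
            "\\(?\\bhttp://[-A-Za-z0-9+&@#/%?=~_()|!:,.;]*[-A-Za-z0-9+&@#/%=~_()|]");

    static String linkify(String text) {
        Matcher m = URL_PATTERN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String s = m.group();
            String before = "";
            String after = "";
            // The paren cleanup: if the match is wrapped in parentheses,
            // keep them as plain text outside the link
            if (s.startsWith("(") && s.endsWith(")")) {
                s = s.substring(1, s.length() - 1);
                before = "(";
                after = ")";
            }
            m.appendReplacement(out, Matcher.quoteReplacement(
                    before + "<a href=\"" + s + "\">" + s + "</a>" + after));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("Go to (http://example.com/a) now"));
    }
}
```

A URL with balanced inner parentheses, such as `http://en.wikipedia.org/wiki/Foo_(bar)`, does not start with `(`, so it is linked whole, which is the point of allowing `()` in the character classes.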
Michael Burr
This paren cleanup doesn't handle a case like "go to (http://example.com web site)".
Vilmantas Baranauskas
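One way to cover the case raised in that comment, offered as a sketch rather than a tested recipe: for "go to (http://example.com web site)" the regex matches "(http://example.com", which the cleanup leaves alone. The hypothetical helper below always treats a leading `(` as text, and drops a trailing `)` only when the URL contains more `)` than `(`, so balanced Wikipedia-style parentheses survive.

```java
final class ParenTrimmer {
    // Splits a regex match into {textBefore, url, textAfter}.
    static String[] split(String match) {
        String before = "";
        String after = "";
        String url = match;
        // The optional \(? at the start of the regex never belongs to the URL
        if (url.startsWith("(")) {
            before = "(";
            url = url.substring(1);
        }
        // Drop a trailing ')' only if it is unbalanced within the URL, so
        // Wikipedia-style URLs such as .../Foo_(bar) keep their closing paren
        if (url.endsWith(")") && count(url, ')') > count(url, '(')) {
            after = ")";
            url = url.substring(0, url.length() - 1);
        }
        return new String[] {before, url, after};
    }

    private static long count(String s, char c) {
        return s.chars().filter(ch -> ch == c).count();
    }

    public static void main(String[] args) {
        // The case from the comment: the regex match is "(http://example.com"
        String[] parts = split("(http://example.com");
        System.out.println(parts[0] + "<a href=\"" + parts[1] + "\">" + parts[1] + "</a>" + parts[2]);
    }
}
```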
+7  A: 

Use java.net.URL for that!!

Hey, why not use the core Java class for this, "java.net.URL", and let it validate the URL?

While the following code violates the golden principle "Use exceptions only for exceptional conditions", it does not make sense to me to reinvent the wheel for something that is veeery mature on the Java platform.

Here's the code:

import java.net.MalformedURLException;
import java.net.URL;

// Replaces URLs in the input with HTML anchor (<a href>) tags
public class URLInString {
    public static void main(String[] args) {
        String s = args[0];
        // Separate the input on whitespace (URLs don't contain spaces)
        String[] parts = s.split("\\s");

        // Attempt to convert each item into a URL
        for (String item : parts) {
            try {
                URL url = new URL(item);
                // If that succeeds, replace the item with an anchor...
                System.out.print("<a href=\"" + url + "\">" + url + "</a> ");
            } catch (MalformedURLException e) {
                // ...otherwise it was not a URL, so print it unchanged
                System.out.print(item + " ");
            }
        }

        System.out.println();
    }
}

Using the following input:

"Please go to http://stackoverflow.com and then mailto:[email protected] to download a file from    ftp://user:pass@someserver/someFile.txt"

Produces the following output:

Please go to <a href="http://stackoverflow.com">http://stackoverflow.com</a> and then <a href="mailto:[email protected]">mailto:[email protected]</a> to download a file from    <a href="ftp://user:pass@someserver/someFile.txt">ftp://user:pass@someserver/someFile.txt</a>

Of course, different protocols could be handled in different ways. You can get all the information from the getters of the URL class, for instance:

 url.getProtocol();

Or the rest of the attributes: host, port, file, query, ref, etc.
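As an illustration of handling protocols differently, here is a small sketch that switches on `url.getProtocol()` and uses `getHost()` and `getPath()` for the link text. The class name `ProtocolAwareAnchor` and the per-protocol display choices are assumptions made for the example; note that the `URL(String)` constructor still works but has been deprecated since Java 20.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class ProtocolAwareAnchor {
    // Turns one whitespace-separated item into an anchor whose link text
    // depends on url.getProtocol(); non-URLs are returned unchanged.
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated since Java 20
    static String toAnchor(String item) {
        URL url;
        try {
            url = new URL(item);
        } catch (MalformedURLException e) {
            return item; // not a URL
        }
        String open = "<a href=\"" + url + "\">";
        switch (url.getProtocol()) {
            case "http":
            case "https":
                // Web links: show only the host name, e.g. stackoverflow.com
                return open + url.getHost() + "</a>";
            case "ftp":
                // FTP links: show only the path, so user:pass is not in the link text
                return open + url.getPath() + "</a>";
            default:
                // Anything else (mailto:, file:, ...): show the full URL
                return open + url + "</a>";
        }
    }

    public static void main(String[] args) {
        for (String item : "Please go to http://stackoverflow.com/questions".split("\\s")) {
            System.out.print(toAnchor(item) + " ");
        }
        System.out.println();
    }
}
```

The `href` itself is always the full URL; only the visible text changes per protocol.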

http://java.sun.com/javase/6/docs/api/java/net/URL.html

It handles all the protocols (at least all of those the Java platform is aware of), and as an extra benefit, if there is any URL that Java does not currently recognize and that eventually gets incorporated into the URL class (through a library update), you'll get it transparently!

OscarRyz
I prefer this over Jeff Atwood's article, specifically for Java, as you don't have to deal with the regex at all. But his article *does* make good points about URLs often being embedded in things like parentheses. A combination would work really well.
TREE
You should at least split on characters not found in URLs: you often see URLs between angle brackets (precisely because they don't have the inconveniences of parentheses). But good idea.
PhiLho
I have just learned that the URL class is broken: http://www.youtube.com/watch?v=wDN_EYUvUq0
OscarRyz
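A rough combination of the comment suggestions above, as a sketch only: validate candidates with `java.net.URL` as in the answer, but also split on angle brackets and double quotes (characters that cannot appear unescaped in a URL), keeping those delimiters as their own tokens. Non-URL text is HTML-escaped so a literal `<` does not break the output. `BracketAwareLinker` and the exact delimiter set are hypothetical choices.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class BracketAwareLinker {
    // Zero-width split before and after whitespace, '<', '>' and '"',
    // so the delimiters themselves come back as separate tokens.
    private static final String DELIMS = "(?<=[\\s<>\"])|(?=[\\s<>\"])";

    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated since Java 20
    static String linkify(String text) {
        StringBuilder out = new StringBuilder();
        for (String token : text.split(DELIMS)) {
            try {
                // Let java.net.URL decide whether the token is a URL
                String url = escapeHtml(new URL(token).toString());
                out.append("<a href=\"").append(url).append("\">")
                   .append(url).append("</a>");
            } catch (MalformedURLException e) {
                out.append(escapeHtml(token)); // not a URL: copy it, escaped
            }
        }
        return out.toString();
    }

    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.println(linkify("Please go to <http://stackoverflow.com> now"));
    }
}
```

Parentheses are deliberately not delimiters here, since they can legitimately appear inside URLs; handling them still needs something like the paren cleanup from the regex answer.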
A: 

Use the code here (a Java version of the first post): http://blog.houen.net/?p=174

Houen
A: 

How would you wrap img src around that href to make an image a clickable link in Java?

Bullsfan
Welcome to Stack Overflow! Hey, sorry to say, but this does not seem to be an answer. This is a question! I am not sure why you posted a question as an answer, but you should be using the `Ask Question` button at the top right to ask a question, not the `Post Your Answer` button at the bottom :) See also http://stackoverflow.com/faq to learn how Stack Overflow works. Good luck!
BalusC