views:

254

answers:

5

Hi,

I was recently asked to submit a solution to a problem for a job.

Problem: Find a sub-string in a string.

Input: "Little star's deep dish pizza sure is fantastic."  
Search: "deep dish pizza"  
Output: "Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic."

Note that the highlighter doesn't have to have the exact same result on this example, since you are defining what a good snippet is and return the the most relevant snippet with the query terms highlighted.

The most important requirement was to write it as I would write a production code.

My solution was not accepted. How could I have improved it? I know, I could have used:

  1. Knuth–Morris–Pratt algorithm
  2. Regex (could I?)

My QUESTION:

  1. What do tech companies take into consideration when they review a code for a job. I submitted the code the same day, does that help in any way?

  2. In one of the comments, it pointed out, it looks like a school code than production code. How? Any suggestions?

My solution:
FindSubString.java

/**
 * FindSubString.java: Find sub-string in a given query
 * 
 * @author zengr
 * @version 1.0
 */

public class FindSubstring {
    private static final String startHighlight = "[[HIGHLIGHT]]";
    private static final String endHighlight = "[[ENDHIGHLIGHT]]";

    /**
     * Find sub-string in a given query
     * 
     * @param inputQuery: A string data type (input Query)
     * @param highlightDoc: A string data type (pattern to match)
     * @return inputQuery: A String data type.
     */
    public String findSubstringInQuery(String inputQuery, String highlightDoc) {
        try {

            highlightDoc = highlightDoc.trim();

            if (inputQuery.toLowerCase().indexOf(highlightDoc.toLowerCase()) >= 0) {
                // update query if exact doc exists
                inputQuery = updateString(inputQuery, highlightDoc);
            }

            else {
                // If exact doc is not in the query then break it up
                String[] docArray = highlightDoc.split(" ");

                for (int i = 0; i < docArray.length; i++) {
                    if (inputQuery.toLowerCase().indexOf(docArray[i].toLowerCase()) > 0) {
                        inputQuery = updateString(inputQuery, docArray[i]);
                    }
                }
            }
        } catch (NullPointerException ex) {
            // Ideally log this exception
            System.out.println("Null pointer exception caught: " + ex.toString());
        }

        return inputQuery;
    }

    /**
     * Update the query with the highlighted doc
     * 
     * @param inputQuery: A String data type (Query to update)
     * @param highlightDoc: A String data type (pattern around which to update)
     * @return inputQuery: A String data type.
     */
    private String updateString(String inputQuery, String highlightDoc) {
        int startIndex = 0;
        int endIndex = 0;

        String lowerCaseDoc = highlightDoc.toLowerCase();
        String lowerCaseQuery = inputQuery.toLowerCase();

        // get index of the words to highlight
        startIndex = lowerCaseQuery.indexOf(lowerCaseDoc);
        endIndex = lowerCaseDoc.length() + startIndex;

        // Get the highlighted doc
        String resultHighlightDoc = highlightString(highlightDoc);

        // Update the original query
        return inputQuery = inputQuery.substring(0, startIndex - 1) + resultHighlightDoc + inputQuery.substring(endIndex, inputQuery.length());
    }

    /**
     * Highlight the doc
     * 
     * @param inputString: A string data type (value to be highlighted)
     * @return highlightedString: A String data type.
     */
    private String highlightString(String inputString) {
        String highlightedString = null;

        highlightedString = " " + startHighlight + inputString + endHighlight;

        return highlightedString;
    }
}

TestClass.java

/**
 * TestClass.java: jUnit test class to test FindSubString.java
 * 
 * @author zengr
 * @version 1.0
 */

import junit.framework.Test;
import junit.framework.TestCase;
import junit.framework.TestSuite;

public class TestClass extends TestCase
{
    private FindSubstring simpleObj = null;
    private String originalQuery = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";

    public TestClass(String name) {
        super(name);
    }

    public void setUp() { 
        simpleObj = new FindSubstring();
    }

    public static Test suite(){

        TestSuite suite = new TestSuite();
        suite.addTest(new TestClass("findSubstringtNameCorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect2Test"));
        suite.addTest(new TestClass("findSubstringtNameCorrect3Test"));
        suite.addTest(new TestClass("findSubstringtNameIncorrect1Test"));
        suite.addTest(new TestClass("findSubstringtNameNullTest"));

        return suite;
    }

    public void findSubstringtNameCorrect1Test() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep [[HIGHLIGHT]]dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "dish pizza"));
    }

    public void findSubstringtNameCorrect2Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's [[HIGHLIGHT]]deep dish pizza[[ENDHIGHLIGHT]] sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "deep dish pizza"));
    }

    public void findSubstringtNameCorrect3Test() throws Exception 
    {
        String expectedOutput = "Hello [[HIGHLIGHT]]how[[ENDHIGHLIGHT]] are [[HIGHLIGHT]]you[[ENDHIGHLIGHT]]r?";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery("Hello how are your?", "how you"));
    }

    public void findSubstringtNameIncorrect1Test() throws Exception 
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, "I love Ruby too"));
    }

    public void findSubstringtNameNullTest() throws Exception
    {
        String expectedOutput = "I like fish. Little star's deep dish pizza sure is fantastic. Dogs are funny.";
        assertEquals(expectedOutput, simpleObj.findSubstringInQuery(originalQuery, null));

    }
}
A: 

Do you mean to be matching on partial words in the case where you break up the query? You also aren't accounting for the possibility that the search string contains the word "HIGHLIGHT".

For example, try this:

Input: "Little star's deep dish pizza sure is fantastic."
Search: "a HIGHLIGHT"
Output: (probably not what you want)

David Gelhar
+1  A: 

I don't know if I'm missing something, but I'd write a simple block of code using indexOf() and other string methods. Depending on the defintion of the problem, I'd probably use a StringBuilder.insert() method to inject the highlighting. I'd not be looking do to tokenising and looping as you have done here. I'd justify it by saying that it's the simplest solution to the problem as specified. KISS is the best approach to open questions like this.

Although they have given you the inputs and outputs, did they specify what was to happen if the inputs changed and there was no match?

I also notice the catching of a null pointer. Good idea if you know where it's likely to occur and intend to do something about it. But as you are just logging, perhaps you should have done a more general catch. For example, can your code trigger a IndexOutOfBoundsException. So I'd be looking to catch Exception or Throwable.

The other thing I'd be asking is a definition of what they consider "Production code". At first glance it sounds simple enough, but my experience is that it can be interpreted in many different ways.

The real problem is that they will be expecting certain things, and you don't know them. So you code what works for you and hope that it matches what they expect.

Derek Clarkson
I/P and O/P could have been anything and I *thought* (rather assumed) using `catch(Exception ex)` is a bad practice. I always use `Exception`, but to make the code look more *professional*, I made it `NullPointerException`
zengr
There will be a bazillion different views on this, but at the top level of an application, for example a `public static void main(String args[])` method, I'll always have an Exception catch. But inside my application I'll always deal with specific exceptions. There are also a number of opinions on the issue of catch and wrapping exceptions and whether you should wrap thrown exceptions and re-throw etc. Perhaps one of the issues might have been not that you caught a NPE, but that the comment didn't explain why. I'm just guessing :-)
Derek Clarkson
@Derek...catching general Exceptions might not be an idea solution since if it has to be a production code, then it has to be maintained...and if you use general Exceptions, then you should have enough documentation in place to let the next guy know what the try catch block was for...adding a check is a good solution, but I would not use General Exceptions.
topgun_ivard
+1 for starting with the simplest solution. And maybe they expected you to ask them some questions to clarify the behavior (exception handling, null input etc...) rather than just deciding yourself...
pgras
I agree with the KISS part. I don't think that catching that NPE is a good idea at all (but catching `Exception` or `Throwable` does not make it any better in my opinion).
Grodriguez
Um, I did say it would be in a main() method. i.e. at the top level of the application and that lower down I would catch specific ones. This is where it comes back to the definition of production code. The solution as supplied by @Zengr would not run by itself. It needs a container of some sort. Thats where you would expect general exception catching. :-)
Derek Clarkson
@Derek: You mentioned the main method in your comments, but not in the text of your answer, which only says that catching the null pointer is a good idea if you know where it's likely to occur. This is the bit I disagree with. I don't see any reason for catching a NPE in his code. If he knows it's "likely to occur", then that should be fixed so that it does not occur.
Grodriguez
@Grodriguez: Your right, there is no reason for catching it in this code. If there was a reason for catching it, then the code is fine. That's what I was trying to say. General exception catches such as Exception or Throwable should only occur in the base containers. Where they can be reported on in logs or consoles. In this case, there is no base container in the example code, so I'm assuming another class with a main() method would be needed.
Derek Clarkson
@Grodriguez: One other thing. It's not always possible to control the code you are dealing with so sometimes you have to catch NPEs. Not in this example, but I've seen plenty of production code where NPEs are being caught.
Derek Clarkson
Sure. I was only addressing this specific case.
Grodriguez
+1  A: 

As they seem to emphasize that you can define what a good output is ... perhaps how do you do the parsing is not what they want to know. Maybe they want that you realize that marking the text in the string with a marker is not a very good solution. If the result is to be used in the program for further processing something like this might be more appropriate.

class MarkedText {
    String highlightDoc;
    String inputQuery;
    List<Range> ranges;
}

class Range {
    int offset;
    int length;
}

public MarkedText findSubstringInQuery(String inputQuery, String highlightDoc) {
    [...]
}
Arne
+3  A: 

If this code was submitted to me for review, this is what I would think:

  • The code is overly verbose and complex where it does not need to be. The whole thing can be done in a small method in ten lines of code, including all sanity checks. This method should probably be static.
  • Your code is doing things that (I assume) were not asked for. You were being asked to search for a substring in a string. If it is not found, then that's fine -- no need to split the substring into words and search for each individual word.
  • Unless they asked you to remove leading and trailing whitespace, I would not have called trim()
  • I would not have included the calls to toLowerCase() unless explicitly asked for, although I would have added a comment stating that they could be added if needed. Anyway even if the search is meant to be case insensitive, there are too many redundant calls to toLowerCase() in your code.
  • You should not need to catch NullPointerException -- instead, you should ensure that this exception is never thrown.
Grodriguez
but would your point 1 would have adhered to "production code"?
zengr
Yes. "Production code" does not necessarily mean "lots of lines of code".
Grodriguez
making the method static would make it susceptible for the JIT to treat the method as inline...so if the method is called multiple times, wouldn't that be bad? also, since it is a production code, this class should be designed in a way that it can be extended. Static methods cannot be extended with subclassing. So is static really good?
topgun_ivard
The JIT probably knows better than you and me :)
Grodriguez
Regarding subclassing: Not everything needs to be extended or overriden. A simple method that searches for a substring in a string might not need to. In any case I would try to stick to the KISS principle and provide the simplest solution that does what is expected, while still maintaining production quality. Avoid overengineering.
Grodriguez
agreed with KIS, but why static?
topgun_ivard
Since this would be a helper method that just takes some arguments and returns a value, without accessing any state variable, it makes sense to explicitly declare it as such -- makes code more readable in my opinion. For a more elaborate response, see this answer: http://stackoverflow.com/questions/3346764/should-java-methods-be-static-by-default/3349987#3349987
Grodriguez
+7  A: 

A few comments;

  • You only highlight the first occurance of the search string.
  • You assume that lower case matching is fine. Unless this was specified as a requirement it might be better to provide two methods, one that respects case and one that ignores case.
  • I would probably check the given parameters and throw a NPE if either of them were null. This would be the first thing my method did. I would clearly document this behaviour in the javadoc.
  • Your method mame is bad; findSubstringInQuery's main task isn't to find, it is to highlight and the inQuery part is superflous. Just call the method highlight or maybe highlightIgnoreCase if you are going to have a highlight that respects case.
  • Your method parameter names are bad. I've looked at your method signature 10 times and still have to look at the method body to remind myself which arg is the search term and which is the text to search. Call them searchTerm and text.
  • Production code doesn't use the default package.
  • Production code doesn't use System.out.println().
  • Your javadoc need improving, it needs to tell the user everything they need to know about the code.
  • I would consider using static methods for a class with no class variables.
  • I would also consider allowing the user to specify their own start and end highlighting markers (I wouldn't use static methods if I did this).
  • I wouldn't trim() unless this was specified as a requirement. If I did, then obviously this behaviour would be documented in the javadoc.

I wouldn't worry about the algorithm used for searching, Knuth-Morris-Pratt looks good but they shouldn't expect you to know about it and implement it unless the job spec specifically asked for experience/expertise in string searching.

Qwerky
Adding to that: • You don't need to mention that all your method parameters and return values are `data type String`. That's obvious from the code. • You should not catch the `NullPointerException` when you don't know what to do with that exception.
Roland Illig
• Your JUnit test uses the old JUnit3 framework. In addition, the names of your test methods should start with `test...`. That way you don't need to construct a test suite manually but can just run the test using any decent IDE.
Roland Illig
• Adding to that: Your code is cheating and wrong. Consider this: `assertEquals("B[[HIGHLIGHT]]oo[[ENDHIGHLIGHT]]m", simpleObj.findSubstringInQuery("Boom", "oo"));`
Roland Illig