I'm building a general purpose data translation tool for internal enterprise use, using Java 5. The various departments use differing formats for coordinate information (latitudes/longitudes), and they want to see the data in their own format. For example, the coordinates of the White House in DMS format are

38° 53' 55.133" N, 77° 02' 15.691" W

But can also be expressed as:

385355.133 / -0770215.691

I want to represent the pattern required by each system as a string, and then use those patterns to parse instance data from the input system, and also use that pattern when formatting a string for consumption by the output system.

So it is not unlike a date/time formatting problem, for which the JDK provides java.text.SimpleDateFormat, which lets you convert among various date/time patterns defined by strings such as "yyyy-MM-dd" or "MM/dd/yy".

My question is, do I have to build this CoordinateFormat thing totally from scratch, or is there a good general tool or well-defined approach I can use to guide me in this endeavor?

A: 

Take a look at JScience, particularly this class

dfa
Thanks, but I am looking for a more general solution to the more general problem, since there are other beasts besides dates and coordinates that will also have differing formats among which this tool must translate.
Kevin Pauli
A: 

#1. I would think defining a common internal format would be helpful. You would convert from the input format to the internal one, and from there to any number of formats as required by the output. #2. RegEx would be my choice to implement the converter.
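To make the "common internal format" idea concrete, here is a minimal sketch using the two White House notations from the question. The class name CoordinateHub, its method names, and the choice of signed thousandths of an arcsecond as the internal value are all assumptions made for illustration, not part of any library.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: the internal format is signed thousandths of an arcsecond. */
final class CoordinateHub {
    // Packed form from the question: optional sign, 2 or 3 degree digits, mmss.sss
    private static final Pattern PACKED =
            Pattern.compile("(-?)(\\d{2,3})(\\d{2})(\\d{2})\\.(\\d{3})");

    /** Packed string -> internal value. */
    static long parsePacked(String text) {
        Matcher m = PACKED.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a packed coordinate: " + text);
        }
        long deg = Long.parseLong(m.group(2));
        long min = Long.parseLong(m.group(3));
        long sec = Long.parseLong(m.group(4));
        long frac = Long.parseLong(m.group(5));
        long value = ((deg * 60 + min) * 60 + sec) * 1000 + frac;
        return m.group(1).length() == 0 ? value : -value;
    }

    /** Internal value -> DMS text; pass "NS" for a latitude or "EW" for a longitude. */
    static String formatDms(long milliArcSeconds, String hemispheres) {
        char hemi = hemispheres.charAt(milliArcSeconds < 0 ? 1 : 0);
        long v = Math.abs(milliArcSeconds);
        long frac = v % 1000;
        long sec = (v / 1000) % 60;
        long min = (v / 60000) % 60;
        long deg = v / 3600000;
        // \u00B0 is the degree sign
        return String.format(Locale.ROOT, "%d\u00B0 %02d' %02d.%03d\" %c",
                deg, min, sec, frac, hemi);
    }

    public static void main(String[] args) {
        System.out.println(formatDms(parsePacked("385355.133"), "NS"));
        System.out.println(formatDms(parsePacked("-0770215.691"), "EW"));
    }
}
```

Using an integer count rather than a double avoids rounding drift when the same value is written out in several formats. Each additional format then needs only one parser and one formatter against the internal value, instead of a converter for every pair of formats.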

spitfire
I'm not sure regex is up to the task... I think of this problem as another case of the date formatting problem. I want a pattern string much like "mm/dd/yyyy" except it will be something like "ddmmss.sss". And I'd rather not code something that is so specific to coordinates, I'm looking for a general tool or approach that solves this problem for all kinds of arbitrary objects that have a string representation. I want a solution to the more general problem, of which Date formatting and Coordinate formatting are but specific examples.
Kevin Pauli
A: 

One solution would be to define a specification system from which both the input regex (or whatever) and the output format string can be derived. If you have a regex system that allows named capture groups and a formatting system that allows non-sequential arguments, this might be as simple as recoding the escaping and indexing of one into the other. I don't know much Java, so I'll leave the details to the reader.
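For what it's worth, a rough Java sketch of deriving both pieces from one specification might look like the following. The {name:width} token syntax and the FieldSpec class are invented here for illustration, and every field is assumed to be a fixed-width, zero-padded integer. It uses numbered groups plus a list of names, since java.util.regex only gained named groups in Java 7.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical spec: literal text plus {name:width} tokens for zero-padded integers. */
final class FieldSpec {
    private static final Pattern TOKEN = Pattern.compile("\\{(\\w+):([1-9]\\d*)\\}");

    private final List<String> names = new ArrayList<String>();
    private final Pattern inputRegex;   // derived from the spec, used for parsing
    private final String outputFormat;  // derived from the spec, used for formatting

    FieldSpec(String spec) {
        StringBuilder re = new StringBuilder();
        StringBuilder fmt = new StringBuilder();
        Matcher m = TOKEN.matcher(spec);
        int last = 0;
        while (m.find()) {
            appendLiteral(spec.substring(last, m.start()), re, fmt);
            names.add(m.group(1));
            re.append("(\\d{").append(m.group(2)).append("})");
            fmt.append("%0").append(m.group(2)).append('d');
            last = m.end();
        }
        appendLiteral(spec.substring(last), re, fmt);
        inputRegex = Pattern.compile(re.toString());
        outputFormat = fmt.toString();
    }

    // Literal text is escaped once for the regex and once for the format string.
    private static void appendLiteral(String text, StringBuilder re, StringBuilder fmt) {
        if (text.length() > 0) {
            re.append(Pattern.quote(text));
            fmt.append(text.replace("%", "%%"));
        }
    }

    Map<String, Integer> parse(String input) {
        Matcher m = inputRegex.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Input does not match spec: " + input);
        }
        Map<String, Integer> fields = new LinkedHashMap<String, Integer>();
        for (int i = 0; i < names.size(); i++) {
            fields.put(names.get(i), Integer.valueOf(m.group(i + 1)));
        }
        return fields;
    }

    String format(Map<String, Integer> fields) {
        Object[] args = new Object[names.size()];
        for (int i = 0; i < args.length; i++) {
            args[i] = fields.get(names.get(i));
            if (args[i] == null) {
                throw new IllegalArgumentException("Missing field: " + names.get(i));
            }
        }
        return String.format(Locale.ROOT, outputFormat, args);
    }
}
```

With this sketch, new FieldSpec("{deg:2}{min:2}{sec:2}.{frac:3}").parse("385355.133") yields a map that a second spec such as "{deg:2} {min:2} {sec:2}.{frac:3}" can format as "38 53 55.133". Because fields are looked up by name, the output spec can list them in a different order than the input spec.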

BCS
A: 

To me, it looks like you are looking at a larger framework for your solution.

The main problem I see is that you're looking for a silver bullet to knock out any type of data. But as far as Java goes, the most consistent way is to wrap regex. Each object type is going to need a list of strings defining the accepted formats. So date could have many, coordinates have 2, etc.

These strings can either be regex (painful but consistent and accepted) or you can write your own conversion library to go something like this:

Converter c = new Converter();
FormatString format = new FormatString("ddmmss.sss");
format.AddRegexEquivalent("d","\\d");
format.AddRegexEquivalent("m","\\d");
format.AddRegexEquivalent("s","\\d");
c.AddFormatString(format);

if( c.ConvertString("385355.133") )
{
  System.out.println( c.GetData("d") );
  System.out.println( c.GetData("m") );
  System.out.println( c.GetData("s") );
}


output:
38
53
55.133

It'll be tough, but I think that's closer to what you're looking for. The converter has to translate the given letters into their regex equivalents (as a start, you can just do a mass replace) and then concatenate all the values for each letter. I would return a String from GetData and then use a Parse*** method from there, which is easier to handle.
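As a rough illustration of the translation step, the sketch below turns a letter pattern into a regex and collects the text captured for each letter. It is not the Converter API shown above. LetterPattern is a made-up name, every field letter is assumed to stand for one digit, and a literal that sits between two runs of the same letter (the "." in "ss.sss") is treated as part of that field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: compiles a pattern such as "ddmmss.sss" into a regex. */
final class LetterPattern {
    private final List<Character> fields = new ArrayList<Character>();
    private final Pattern regex;

    LetterPattern(String pattern, String fieldLetters) {
        StringBuilder re = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i);
            if (fieldLetters.indexOf(c) < 0) {
                // Not a field letter: match it literally.
                re.append(Pattern.quote(String.valueOf(c)));
                i++;
                continue;
            }
            // A field spans from here to the last occurrence of its letter.
            int end = pattern.lastIndexOf(c);
            re.append('(');
            for (int j = i; j <= end; j++) {
                char p = pattern.charAt(j);
                if (p == c) {
                    re.append("\\d");
                } else if (fieldLetters.indexOf(p) >= 0) {
                    throw new IllegalArgumentException("Interleaved fields: " + pattern);
                } else {
                    re.append(Pattern.quote(String.valueOf(p)));
                }
            }
            re.append(')');
            fields.add(c);
            i = end + 1;
        }
        regex = Pattern.compile(re.toString());
    }

    /** Returns the captured text per field letter, or null if the input does not match. */
    Map<Character, String> parse(String input) {
        Matcher m = regex.matcher(input);
        if (!m.matches()) {
            return null;
        }
        Map<Character, String> data = new LinkedHashMap<Character, String>();
        for (int i = 0; i < fields.size(); i++) {
            data.put(fields.get(i), m.group(i + 1));
        }
        return data;
    }

    public static void main(String[] args) {
        LetterPattern p = new LetterPattern("ddmmss.sss", "dms");
        System.out.println(p.parse("385355.133")); // {d=38, m=53, s=55.133}
    }
}
```

Returning strings keeps the sketch type-agnostic: the caller decides whether "55.133" becomes a double, a BigDecimal, or stays as text.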

CodePartizan
A: 

The TextTemplate class in Wicket generates a string by interpolating a "template" string with a map of key-value pairs. You could use the output pattern string as a basis, with a variable to interpolate from the map for each value (longitude degrees, minutes, whatever). This won't do two-way conversion, but you might take a look at it and see if it helps you.

http://wicketstuff.org/wicket13doc/org/apache/wicket/util/template/TextTemplate.html

Here's the source, from their svn:

http://svn.apache.org/repos/asf/wicket/trunk/wicket/src/main/java/org/apache/wicket/util/template/TextTemplate.java

RMorrisey
+1  A: 

If I read it right, you're talking about the problem addressed by the Interpreter pattern, but sort of going in both directions.

There are some easy ways to get nice generic interfaces, so you can get the rest of the thing running. My recommendation on that is something like:

public interface Interpreter<OutputType> {
    void setCode(String coding);
    OutputType decode(String formattedData);
    String encode(OutputType rawData);
}

However, there are a couple of hurdles with concrete implementations. For your date example, you might need to deal with "9/9/09", "9 SEP 09", "September 9th, 2009". The first "kind" of date is straightforward - numbers and set divider symbols, but either of the other two is pretty nasty. Honestly, doing something totally generic (which could already be canned) probably isn't reasonable, so I recommend the following.

I'd attack it on two levels, the first of which is pretty straightforward with regex and format string: chomping up the data string into the things that are going to become raw data. You'd supply something like "D*/M*/YY" (or "M*/D*") for the first one, "D* MMM YY" for the second, and "Mm+ D*e*, YYYY" for the last, where you've defined in your data some reserved symbols (D, M, Y, obvious interpretations) and for all data types (* multiple characters possible, + "full" output, e defined extraneous characters) - these symbols obviously being specific to your application. Then your regex stuff would chomp the string up, feeding everything associated with each reserved character to the individual data fields, and saving the decoration part (commas, etc) in some formatting string.

This first level can all be fairly generic - each data type (e.g., date, coordinate, address) has reserved symbols (which don't overlap with any formatting characters), and all data types have some shared symbols. Perhaps the Interpreter interface would also have public List<Character> reservedSymbols() and public void splitCode(List<String> splitcodes) methods, or perhaps guaranteed fields, so that you can make the divider an external class and pass in the results.

The second level is less easy, because it gets at the part that can't be generic. Based on the format of the reserved symbols, the individual fields need to know how to present themselves. In the date example, MM would tell the month to print as (01, 02, ... 12), M* as (1, 2, ... 12), MMM as (JAN, FEB, ... DEC), Mmm as (Jan, Feb, ... Dec), etc. If your company has been somewhat consistent or doesn't venture too far from standard representations, then hand-coding each of these shouldn't be too bad (and in fact, there are probably smart ways within each data type to reduce duplicated code). But I don't think it's practical to generify all of this. You would have to represent, generically, things that can be presented as either numbers or characters (like months), whole data that can be inferred from partial data (e.g., century from year), and how to get truncated representations (e.g., a year truncates to its last two digits, whereas most numbers truncate to their two leading digits). That is probably going to take as long as handwriting those cases, though I can imagine applications where the trade-off might be worth it. Date is a really tricky example, but I can certainly see equally tricky things coming up for other sorts of data.
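To show what the hand-coded second level might look like for a single field, here is a small sketch for the month codes mentioned above (MM, M*, MMM, Mmm). MonthField and render are hypothetical names, and the English three-letter abbreviations are an assumption.

```java
import java.util.Locale;

/** Hypothetical field renderer for the month codes MM, M*, MMM and Mmm. */
final class MonthField {
    private static final String[] NAMES = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    /** Renders month 1..12 according to the given format code. */
    static String render(int month, String code) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("Month out of range: " + month);
        }
        if (code.equals("MM")) {
            return (month < 10 ? "0" : "") + month;            // 01, 02, ... 12
        }
        if (code.equals("M*")) {
            return String.valueOf(month);                       // 1, 2, ... 12
        }
        if (code.equals("Mmm")) {
            return NAMES[month - 1];                            // Jan, Feb, ... Dec
        }
        if (code.equals("MMM")) {
            return NAMES[month - 1].toUpperCase(Locale.ROOT);   // JAN, FEB, ... DEC
        }
        throw new IllegalArgumentException("Unknown month code: " + code);
    }

    public static void main(String[] args) {
        System.out.println(render(9, "MM") + " " + render(9, "MMM"));
    }
}
```

Each data type would need a handful of small renderers like this one, plus the matching parsers, which is the tedious but straightforward part.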

Summary:

- There's an easy generic face you can put on your problem, so the rest of your app can be coded around it.

- There's a fairly easy and generic first-pass parsing, by having universal reserved symbols, and then reserved symbols for each data type; make sure these don't collide with symbols that will appear in formatting.

- There's a somewhat tedious final coding stage for individual data bits.

Carl
Very thorough answer. Not far from what I ended up with. See comment above.
Kevin Pauli