
I am saving all my notes in a log file. Each line is a note, suffixed by tags, and prefixed by a date and time marker, which currently looks like this: [12.20.09:22.22] (i.e. [date:time]).

I am planning for this to be a long-lived format. Notes will be logged willy-nilly in this format 20-30 times a day for years to come. I foresee numerous kinds of parsing for analytics, filtering, searching, and so on.

I am worried about the [ ]s, though. Could they trip up some parsing code (someone else's, if not mine)? What would be the most non-confrontational marker?

A: 

Using '[]' as the markers would be OK, provided that your format (in effect a small DSL) lets you escape those characters. This is typical of text that needs parsing.

As an example, look at typical regular expression syntax, which uses '/' as the separator while letting the user escape it with a character such as '\'. You may get more ideas from Unix tools such as awk, sed and grep.
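To illustrate the escaping point: in a regular expression, [ and ] are themselves metacharacters, so a pattern that matches the existing marker has to escape them. This is a minimal Python sketch, assuming the marker is [MM.DD.YY:HH.MM] (the field order is a guess; the question doesn't say) and that the note text and tags simply follow the marker. The "#errands" tag syntax in the example is hypothetical.

```python
import re

# Hypothetical pattern for a line like "[12.20.09:22.22] note text #tag".
# "[" and "]" are escaped with "\" because they are regex metacharacters.
MARKER = re.compile(r"^\[(\d{2})\.(\d{2})\.(\d{2}):(\d{2})\.(\d{2})\]\s*(.*)$")

def parse_line(line):
    """Split a log line into its marker fields and the rest of the note."""
    m = MARKER.match(line)
    if m is None:
        return None  # no marker at the start of the line
    month, day, year, hour, minute, rest = m.groups()  # assumed MM.DD.YY order
    return {"month": month, "day": day, "year": year,
            "hour": hour, "minute": minute, "rest": rest}

print(parse_line("[12.20.09:22.22] Buy milk #errands"))
```

Note that the pattern is anchored at the start of the line, so brackets appearing later inside the note text do not confuse it.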

Grant Sayer
+1  A: 

This depends on your data. However, if you escape them with a special character of some sort (e.g. \]) and code your parser to check the previous character whenever it finds a "[" or "]", you should have no problem.
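A minimal Python sketch of the escaping idea. One assumption goes slightly beyond the answer: the backslash itself is also escaped, and the scan runs left to right instead of only peeking at the previous character, so a note ending in a literal backslash (for example a Windows path) can't accidentally escape the closing "]". The function names are hypothetical.

```python
ESCAPE = "\\"
SPECIAL = "[]" + ESCAPE

def escape(text):
    """Prefix every '[', ']' and backslash in free text with a backslash."""
    return "".join(ESCAPE + ch if ch in SPECIAL else ch for ch in text)

def find_unescaped(text, target):
    """Return the index of the first unescaped `target`, or -1.

    Scanning left to right means an escaped backslash does not
    accidentally escape the character that follows it.
    """
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            i += 2  # skip the escape character and the one it protects
            continue
        if text[i] == target:
            return i
        i += 1
    return -1

def unescape(text):
    """Drop the backslash in front of each escaped character."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ESCAPE and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

body = escape("see [1] in C:\\tmp\\")
line = "[" + body + "] rest"
end = find_unescaped(line[1:], "]")
print(unescape(line[1:1 + end]))  # the original text, brackets and all
```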

Also, if you're open to a new format, I'm a fan of JSON, as it's lightweight and very useful.

Dan Beam
+3  A: 

I'd consider using either XML or JSON as the format for the file.

In particular your date/time marker is ambiguous. Is it mm/dd/yy or dd/mm/yy? Or even yy/mm/dd? And in what timezone is the date and time?

XML defines a way to represent dates that is culture- and timezone-independent, JSON has well-established conventions for it, and (best of all) there's masses of tooling available for both formats.

The XML datetime format is defined by the XML Schema dateTime type, for example, 2000-01-12T12:13:14Z.

JSON itself has no date type; one convention used by some serializers encodes a date as the number of milliseconds since Jan 1, 1970 (UTC), so it's a bit uglier: { "currentDate": "@1163531522089@" }
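To show that both representations carry the same kind of unambiguous information, here is a small Python sketch that turns each into a timezone-aware UTC datetime. It assumes the "@...@" wrapper holds milliseconds since the epoch, as in the example above; the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_xml_datetime(text):
    """Parse an XML Schema-style dateTime such as '2000-01-12T12:13:14Z'."""
    # %z accepts a literal 'Z' for UTC on Python 3.7+.
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S%z")

def from_epoch_marker(text):
    """Parse '@<milliseconds since 1970-01-01 UTC>@' into an aware datetime."""
    millis = int(text.strip("@"))
    # timedelta arithmetic keeps the milliseconds exact (no float rounding).
    return EPOCH + timedelta(milliseconds=millis)

print(from_xml_datetime("2000-01-12T12:13:14Z").isoformat())
print(from_epoch_marker("@1163531522089@").isoformat(timespec="milliseconds"))
```

Both functions return aware datetimes, so values from either source can be compared or sorted directly.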

Jeremy McGee
A: 

I would tend to think a standardized format is the way to go, with JSON being my personal choice because of its simplicity. Not only does that help you avoid parsing issues, since others have already thought about them, but it also gives you a lot more tools to work with over the life of the project.
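A minimal Python sketch of what one note per line could look like in JSON (a layout often called JSON Lines). The field names "time", "text" and "tags" are assumptions for illustration, not anything the question defines.

```python
import json
from datetime import datetime, timezone

def note_to_json_line(text, tags, when=None):
    """Serialise one note as a single-line JSON object."""
    when = when or datetime.now(timezone.utc)
    record = {
        "time": when.isoformat(timespec="seconds"),  # ISO 8601 string
        "text": text,
        "tags": list(tags),
    }
    # json.dumps escapes quotes, backslashes and newlines inside the text,
    # so brackets or other punctuation in a note need no special handling.
    return json.dumps(record, ensure_ascii=False)

line = note_to_json_line("Check [draft] v2", ["work", "todo"],
                         datetime(2009, 12, 20, 22, 22, tzinfo=timezone.utc))
print(line)
print(json.loads(line)["text"])
```

Because each record stays on one line, line-oriented tools like grep still work on the file, while any JSON library can parse each line.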

chills42
+1  A: 

If you want everything to last in a long-lived format, then the metadata needs to be as explicit as possible. If it's meant to be long-lived, then many others will need to read it, and as easily as possible.

I agree with Jeremy McGee: XML is an excellent choice. Even if nothing else about the data survives, a note in this format:

<note>
   <datetime>
      <year>
         2009
      </year>
      <month>
         12
      </month>
  . . .
   </datetime>
   <message>
      Foo bar baz quox
   </message>
</note>

cannot be misunderstood.
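A sketch of generating such a note with Python's standard xml.etree.ElementTree. The <year> and <month> elements come from the answer; <day>, <hour> and <minute> are hypothetical stand-ins for the elided part. The output is compact rather than indented like the example above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def note_to_xml(when, message):
    """Build a <note> element in the explicit style shown above."""
    note = ET.Element("note")
    dt = ET.SubElement(note, "datetime")
    # "day", "hour" and "minute" are assumed names for the elided fields.
    for name, value in (("year", when.year), ("month", when.month),
                        ("day", when.day), ("hour", when.hour),
                        ("minute", when.minute)):
        ET.SubElement(dt, name).text = str(value)
    ET.SubElement(note, "message").text = message
    return ET.tostring(note, encoding="unicode")

xml_text = note_to_xml(datetime(2009, 12, 20, 22, 22), "Foo bar baz quox")
print(xml_text)
# ElementTree escapes <, > and & in the message, so any text round-trips.
print(ET.fromstring(xml_text).findtext("message"))
```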

Chip Uni
Yeah that's explicit, but it's also painfully verbose. I would think on average that you would have more bytes devoted to XML tags than usable data.
Whisty
+3  A: 

If you end up going with your own format, may I recommend ISO 8601 for your date and time format?

In summary, the basic format is:

yyyy-mm-dd hh:mm:ss

You can extend this with timezone and fractional-second information if you wish. Including the timezone is recommended; otherwise, assume UTC.

With the date/time in this format there's no confusion over which is the month and the day. And it has the bonus of sorting using a basic string sort.
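A small Python sketch of the sorting property, assuming every marker is first normalized to UTC (string sorting is only chronological when all markers share the same offset). The trailing "Z" to mark UTC and the helper name iso_marker are illustrative choices.

```python
from datetime import datetime, timezone

def iso_marker(when):
    """Format an aware datetime as 'yyyy-mm-dd hh:mm:ssZ' in UTC."""
    return when.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%SZ")

stamps = [
    datetime(2010, 1, 5, 8, 0, tzinfo=timezone.utc),
    datetime(2009, 12, 20, 22, 22, tzinfo=timezone.utc),
    datetime(2009, 2, 3, 9, 15, tzinfo=timezone.utc),
]
markers = [iso_marker(t) for t in stamps]
# Fields run from most to least significant and are zero-padded,
# so a plain string sort gives chronological order.
assert sorted(markers) == [iso_marker(t) for t in sorted(stamps)]
print(sorted(markers))
```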

dave