I've written a working grammar to replace DbUnit in Scala, called ScalaDBTest. The whole program works - it only took 2 days to write. I've still got a lot of polishing to do.

Anyway, the grammar I'm using for the DSL to input data into the database is malleable and I'd like some feedback on it.

The basic syntax looks like this. It's pretty simple:

country:
- country_id: 1, name: "Canada"
- country_id: 2, name: "United States"

This is certainly better than XML or SQL insert statements.

I debated using ":" or "=". The former looks better, but the latter seems automatic for me to type.

There's also a concept where you can "label" a record. In a sense, the records above were anonymous. Labels will be an interesting feature because you can use them in a variety of ways.

country:
    record: Canada -> country_id: 1, name: $label # produces "Canada"
    record: UnitedStates -> country_id: 2, name: $label.uncamel # produces "United States"

I dislike this syntax. It's a little too wordy. Using the "-" doesn't look right, but if I use an actual command word like "record", I need to put in the "->" to separate the label from the columns; otherwise it looks really bad (the "->" isn't necessary for technical reasons).

$label simply repeats the label, so at the bare minimum you can use the label as a way to reuse the string. $label.uncamel adds spaces where the label looks like camel case.
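
To pin down what I mean by the two expressions, here is a rough sketch in Python (not the actual ScalaDBTest code, which is Scala). The function names `uncamel` and `expand` are made up for illustration, and the rule "insert a space before a capital that follows a lowercase letter or digit" is just my assumption about what "looks like camel case" should mean:

```python
import re


def uncamel(label: str) -> str:
    """Insert a space wherever a lowercase letter or digit is followed by a capital."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", label)


def expand(value: str, label: str) -> str:
    """Expand the two label expressions; any other value passes through unchanged."""
    if value == "$label.uncamel":
        return uncamel(label)
    if value == "$label":
        return label
    return value


print(expand("$label", "Canada"))
print(expand("$label.uncamel", "UnitedStates"))
```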

The idea behind labels is to give the APIs a way to access records without having to remember the ids. If you know the country object you want to get is "Canada", then you can just pass the label "Canada" and it will convert it to a unique id and pull it out of the database.
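
As a sketch of the lookup idea only (again in Python, and `LabelRegistry`, `register` and `id_for` are hypothetical names, not the real API), the loader could remember which id each label got, keyed by table:

```python
class LabelRegistry:
    """Maps (table, label) pairs to the primary-key value of the labelled record."""

    def __init__(self):
        self._ids = {}

    def register(self, table, label, record_id):
        key = (table, label)
        if key in self._ids:
            # Assumption: a label must be unique within its table.
            raise ValueError(f"duplicate label {label!r} in table {table!r}")
        self._ids[key] = record_id

    def id_for(self, table, label):
        """Return the id registered for this label; raises KeyError if unknown."""
        return self._ids[(table, label)]


registry = LabelRegistry()
registry.register("country", "Canada", 1)
registry.register("country", "UnitedStates", 2)
print(registry.id_for("country", "Canada"))
```

A test would then ask for `id_for("country", "Canada")` and use the result to fetch the row, instead of hard-coding the id.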

Here's an example where you can specify default parameters:

province:
? country_id: 1, nice_weather: true
- province_id: 1, name: "British Columbia"
- province_id: 2, name: "Manitoba", nice_weather: false
- province_id: 3, name: "New York", country_id: 2

Here's where you see some real power. All 3 of these 'province' records will have 4 columns. Since 2 of the provinces are in Canada, they automatically inherit country_id from the default values. In the 3rd case, we override it with the United States for New York. We can mix and match as required.

In practice, this is going to save a lot of typing and cognitive load, as we often only care about a few values, and the rest can be mere placeholders to get the database to shut up about missing required fields and so on. This really helps with testing polymorphic objects too.
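
The merge rule for defaults is simple enough to sketch in a few lines of Python (illustration only, with a made-up helper name `apply_defaults`; the real implementation is in Scala): every record starts from the "?" line and its own columns win on conflict.

```python
def apply_defaults(defaults, records):
    """Return new records where each one is the defaults overlaid with its own columns."""
    return [{**defaults, **record} for record in records]


defaults = {"country_id": 1, "nice_weather": True}
records = [
    {"province_id": 1, "name": "British Columbia"},
    {"province_id": 2, "name": "Manitoba", "nice_weather": False},
    {"province_id": 3, "name": "New York", "country_id": 2},
]

provinces = apply_defaults(defaults, records)
for province in provinces:
    print(province)
```

Each resulting record has all 4 columns; Manitoba keeps its own nice_weather and New York keeps its own country_id.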

Here's another:

article:
? date_create: $now
- article_id: 1, title: "The Fed Sucks"
- article_id: 2, title: null

This snippet shows that you can actually place null values without doing any tricks like in DbUnit. In DbUnit, you have to first create a transformer that translates a custom string (like "[NULL]") to an actual null value.

In fact, we can be way more expressive and offer a variety of expressions and functions to help generate data. For example, $now returns a properly formatted SQL date for the current date/time. I will expand on these kinds of features to help make writing test data easier.
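
Roughly, value evaluation could work like the Python sketch below. This is an illustration, not the real code: `evaluate` is a hypothetical name, and the "YYYY-MM-DD HH:MM:SS" format is my assumption about what "properly formatted" means for most databases. The clock is passed in so tests stay deterministic.

```python
from datetime import datetime


def evaluate(token, now=None):
    """Turn a raw DSL value into the value to insert.

    "null" becomes None (a real SQL NULL), "$now" becomes a timestamp string,
    and anything else is returned as-is.
    """
    if token == "null":
        return None
    if token == "$now":
        moment = now if now is not None else datetime.now()
        # Assumed format; a real tool might need a per-database setting.
        return moment.strftime("%Y-%m-%d %H:%M:%S")
    return token


fixed = datetime(2010, 1, 2, 3, 4, 5)
print(evaluate("null"))
print(evaluate("$now", now=fixed))
```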

Anyway, I'm looking for help to really clean up the syntax. I can make any change, and since this is fresh, I'd like to make it really snazzy from the start rather than change it later.

Thanks

+1  A: 

I would consider embedding this in Scala rather than creating an external DSL. Features that would help are case classes, default parameters, and compiler-generated copy methods.

retronym
I am using Scala's parser combinator library. It was a piece of cake to do.
egervari
+1  A: 

I would suggest extending the initial syntax for labels in the following ways:

  1. Segment/Group label by column name.
    country: [label:name]
    - country_id: 1, name: "Canada"
    - country_id: 2, name: "United States"
  2. Segment/Group label by column index, to minimize typing.
    country: [label:2]
    - country_id: 1, name: "Canada"
    - country_id: 2, name: "United States"
  3. Single record
    country:
    -[label:2] country_id: 1, name: "Canada"
    -[label:name] country_id: 2, name: "United States"

You can eliminate the index-based labels if you don't want to track the number of columns in the record.
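
A minimal Python sketch of how such a label spec could be resolved (the helper name `resolve_label` is hypothetical, and I am assuming the index is 1-based, so that [label:2] picks "name" in the examples above):

```python
def resolve_label(spec, record):
    """Return the label for a record from a [label:...] spec.

    `spec` is either a column name ("name") or a 1-based column index ("2").
    `record` is a dict whose insertion order matches the column order as written.
    """
    if spec.isdigit():
        columns = list(record)
        return str(record[columns[int(spec) - 1]])
    return str(record[spec])


canada = {"country_id": 1, "name": "Canada"}
print(resolve_label("2", canada))
print(resolve_label("name", canada))
```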

If you wish to extend the tool, you can add other attributes, like concurrency, to the record groups. For instance, creating 5K rows with 5 threads would look something like this:


country: [label:name] [concurrency:5]
-[label:2] country_id: 1, name: "Canada"
-[label:name] country_id: 2, name: "United States"

XecP277
Square brackets look really nice and consistent. Easier to type than angle brackets for sure. Thanks!
egervari
+1  A: 

While it is certainly important to have an easy-to-understand and terse format, that IMHO still doesn't justify writing your own proprietary format without looking at standardized alternatives. Did you consider using lightweight and more readable alternatives to XML, like JSON or HAML? Note that you still have the advantages of tool support and standardization even if you only support a well-defined subset.

Landei
JSON is harder to write and reason about than my format. Database records don't need to be nested like that. It's not at all appropriate to the problem. I don't know much about HAML, but I think that format is still too complicated, and its goals are much larger and different from what I'm setting out to do here. This isn't about presentation - it's about creating test data in a database. For example, how would you do default column definitions? It would be a little weird.
egervari
+1  A: 

This looks a lot like YAML, so I suggest you take a look at that. Here are a couple of valid YAML ways to write what you have:

country:
  - country_id: 1
    name: Canada
  - country_id: 2
    name: United States

# country
---
country_id: 1
name: Canada
---
country_id: 2
name: United States
Daniel