kettle

Where is Pentaho Kettle's architecture?

Where can I find documentation on Pentaho Kettle's architecture? I'm looking for a short wiki, design document, blog post, anything that gives a good overview of how things work. This question is not about specific "how to" starting guides but rather about a good view of the technology and architecture. Specific questions I have are: How does data flow betwe...

Src jar for Pentaho Kettle

Where can I find a src-jar for Kettle? I'm looking for a jar that contains the Java source files and that I can point my IDE to (like, for example, junit-4.6-src.jar). ...

Kettle plugin remains on Idle

I'm writing my first Pentaho Kettle plugin, and when I run it through the Spoon UI, it remains Idle, while the other plugins are active. I connected it to an input, and it just processed. What am I doing wrong? ...

Reading both attributes and nodes at the same time in Kettle / Spoon

I'm using Kettle and trying to load both the attribute and node values from an XML document.
<Colors>
  <Color code="123">blue</Color>
  <Color code="234">black</Color>
  <Color code="456">green</Color>
</Colors>
If I set the loop XPath to Colors I will only get one row, but it will read both the code and the value. Example: Code | C...
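To illustrate the row-per-element behaviour the question is after, here is a minimal sketch in plain Python (not Kettle itself, and not the asker's transformation). It assumes the loop is placed on each Color element rather than on Colors, so that every element yields one row; in XPath terms the attribute is then `@code` and the text is `.` relative to each Color. The names `xml_text`, `read_colors` and `rows` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample document taken from the question.
xml_text = """<Colors>
  <Color code="123">blue</Color>
  <Color code="234">black</Color>
  <Color code="456">green</Color>
</Colors>"""


def read_colors(text):
    """Return one (code, value) tuple per Color element."""
    root = ET.fromstring(text)
    # Iterate over the repeating Color elements, not the single Colors root,
    # so each element produces its own row.
    return [(color.get("code"), color.text) for color in root.findall("Color")]


rows = read_colors(xml_text)
for code, value in rows:
    print(code, value)
```

Looping on the root element can only ever produce one row, which matches the behaviour described in the question; looping on the repeating child is what gives one row per color.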

Rhino ETL opinions vs Kettle and SSIS

I am considering a tool for an ETL solution that has high daily demand and requires heavy business logic processing. I've tried Kettle and SSIS so far, and also want to test Rhino ETL. I don't care for the visual flow structure of either Kettle or SSIS, and creating complex business rules seems really hard using them... Rhino ETL seem...

Recursive calls in Pentaho Data Integration

Is it possible for a step or transformation in Pentaho Data Integration to call itself, passing the results of the previous call as parameters/variables? My first thought was to create a loop in a transformation, but they don't seem to be allowed... ...

Waiting for Transformations in a Job

I am working with Pentaho Data Integration (aka Kettle) and I have several Transformations, let's call them A, B, C, D, E. B depends on A, D depends on C and E depends on B and D. In a job I'd like to run A, B and C, D in parallel:
      -> A -> B _
Start<            \
      -> C -> D----> E
where A and C run in parallel...

Is it better to do a select count then if before doing a delete or just blindly call delete?

I'm looking for a best practice / thoughts on whether it is better to do a select count and check that the result is > 0 before calling a delete, or to just blindly fire a delete statement at the database even if the data doesn't exist. In our case most of the time data will NOT exist. So what is better: Option 1: c...
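The two approaches described in the question can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module with an in-memory database; the table `items` and the functions `delete_blind` and `delete_checked` are hypothetical, and the relative cost on the asker's actual database is not something this sketch measures.

```python
import sqlite3

# In-memory database with a hypothetical table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO items (id, name) VALUES (1, 'a')")


def delete_blind(connection, item_id):
    """Fire the DELETE directly; return how many rows it removed."""
    cursor = connection.execute("DELETE FROM items WHERE id = ?", (item_id,))
    return cursor.rowcount


def delete_checked(connection, item_id):
    """Count first, and only issue the DELETE when a matching row exists."""
    (count,) = connection.execute(
        "SELECT COUNT(*) FROM items WHERE id = ?", (item_id,)
    ).fetchone()
    if count > 0:
        return delete_blind(connection, item_id)
    return 0


removed_missing = delete_blind(conn, 99)   # no such row: nothing is deleted
removed_checked = delete_checked(conn, 99) # same outcome, but one extra query
removed_existing = delete_blind(conn, 1)   # the row exists and is deleted
```

A DELETE that matches no rows is not an error; it simply reports zero affected rows, so the checked variant reaches the same end state while issuing an additional statement.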

Pentaho: Best way to convert a date into a drill-down-able cube dimension?

My data warehouse table just contains a single date SQL column, but I want to be able to drill down using the usual year/quarter/month/day levels. I could manually create new columns using Pentaho Kettle, and then create the levels one-by-one in Pentaho Schema Workbench. But this is such a common task (I guess everybody creating sales-re...

RegEx to Remove Unwanted text

I'm still kind of new to RegEx in general. I'm trying to retrieve the names from a field so I can split them for further use (using Pentaho Data Integration/Kettle for the data extraction). Here's an example of the string I'm given: CN=Name One/OU=Site/O=Domain;CN=Name Two/OU=Site/O=Domain;CN=Name Three/OU=Site/O=Domain I would like...
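As a rough sketch of one way to pull the names out of the sample string, the pattern below captures whatever follows `CN=` up to the next `/` or `;`. It is written in plain Python rather than as a Kettle step, and it assumes the goal is the list of CN values, since the rest of the question is cut off. The names `raw`, `CN_PATTERN` and `extract_names` are hypothetical.

```python
import re

# Sample string taken from the question.
raw = (
    "CN=Name One/OU=Site/O=Domain;"
    "CN=Name Two/OU=Site/O=Domain;"
    "CN=Name Three/OU=Site/O=Domain"
)

# Capture the text after "CN=" up to (not including) the next "/" or ";".
CN_PATTERN = re.compile(r"CN=([^/;]+)")


def extract_names(value):
    """Return every CN value found in the input, in order of appearance."""
    return CN_PATTERN.findall(value)


names = extract_names(raw)
print(names)
```

The character class `[^/;]+` keeps spaces inside a name intact while stopping at the separators used in the sample, so the unwanted `OU=` and `O=` parts are never matched.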

Does anybody know the list of Pentaho Data Integration (Kettle) connectors?

Hi all, I am doing a comparison between three open source ETL tools: Talend, Kettle and CloverETL. I could find Talend's and CloverETL's connector lists with no problem, but I cannot find the one for Kettle. Does anyone know them, or where I can find them? Thanks a lot, ...

Putting multiple DB-resultrows into one stream row

I have a database table, let's call it headers with an id and a String-field called "header". Another table in the database, called subheaders has two fields, headerId and the String field "subheader". There are 0, 1 or 2 subheaders per header. I now want to use Kettle/Pentaho Data Integration to generate an Excel output with the followi...