etl

ETL framework for loading data into Rails app

I need to load data for my Rails application from multiple providers (REST/SOAP based XML feeds) into the database on a recurring basis. I have written a set of Rake tasks which are kicked off by whenever-generated cron jobs. Each task hits the partner feed endpoint, parses the feed and loads it into the database. Instead of writing Rak...

When is data erased from the OLAP DB?

I am new to OLAP. I understand the table structure and ETL process. I don't understand when data is supposed to be deleted from the fact table. Say I'm creating a reporting application for events. Each event has the duration it took to complete, the exit code and total bytes read. There are several dimensions, e.g. time and location. ...

Why isn't my Data Mining Model Training destination accepting numeric input inside SSIS?

I'm trying to create a mining model for forecasting against some DW data. I'm using SSIS for my ETL, and trying to use the Data Mining Model Training destination. Unfortunately I'm receiving an error whenever the column I'm trying to predict is in numeric or decimal format. I don't get the error when I create the model by hand in SSMS, a...

What's the best way to read a tab-delimited text file in C#?

We have a text file with about 100,000 rows and about 50 columns per row; most of the data is pretty small (5 to 10 characters or numbers). This is a pretty simple task, but I'm just wondering what the best way would be to import this data into a C# data structure (for example, a DataTable)? ...

In PowerShell, what's the most efficient way to split a large text file by record type?

I am using PowerShell for some ETL work, reading compressed text files in and splitting them out depending on the first three characters of each line. If I were just filtering the input file, I could pipe the filtered stream to Out-File and be done with it. But I need to redirect the output to more than one destination, and as far as I...

Module or tool for web-based data import and ETL?

I'm going to be adding a feature to a web application that allows users to import data. I don't want to reinvent the wheel, so I am looking for any module I could integrate that would handle this. The interface should be similar to that of importing a file into Excel or Access, plus some more complex mapping and type conversion functions ...

Does SQL Server Integration Services (SSIS) re-compile C# code every time it's run?

We have a process that is getting data in real time and adding records to a database. We're using SQL Server 2008 Integration Services to run our Extract Transform Load (ETL) process. We download about 50 files from an FTP site, process them and then archive the files. The problem is that the processing is taking about 17s per file even...

Video tutorials for the Ab Initio ETL data warehousing tool

Hi all, please tell me where I can find video tutorials for the Ab Initio ETL data warehousing tool. I searched on Google but did not find any materials. Thanks in advance. ...

SQL Server ETL process transaction logs

Hi, is it OK to set the recovery model to simple in a staging DB for an ETL process? The customer is not even doing a regular backup, so what's the point in keeping the transaction logs? I propose to organize a daily backup after the bulk import and that's it. Anything against this plan? Also, the transaction logs were at 80 GB after 3 we...

Sybase: how can I remove non-printable characters from CHAR or VARCHAR fields with SQL?

I'm working with a Sybase database that seems to have non-printable characters in some of the string fields and this is throwing off some of our processing code. At first glance, it seemed to only be newlines and carriage returns, but we also have an ASCII code 27 in there - an ESC character, some accented characters, and some other odd...

Reusable SQL Server stored procedures; nesting; global variables

I want to make some reusable, somewhat dynamic T-SQL code that can be called within many other stored procs, but I'm struggling with how to implement this with SQL Server. The environment has many distributed source-system databases, each of which will have its own wrapper stored procedure, which will call a few of these modular stored pr...

Field specific errors for ETL

I am creating an ETL process in MS SQL Server and I would like to have errors specific to a particular column of a particular row. For example, the data is initially loaded from Excel files into a table (we'll call it the Initial table) where all columns are varchar(2000), and then I stage the data to another table (the DataTypedTable) that co...

Grouping ETL Staging Tables With User Schemas?

I was thinking of putting staging tables, and the stored procedures that update those tables, into their own schema, such that when importing data from SomeTable to the data warehouse, I would run an Initial.StageSomeTable procedure which would insert the data into the Initial.SomeTable table. This way all the procs and tables dealing with the...

SQLAlchemy: checking a Unicode string's validity for a given type of DB column

Hi folks! I am developing an extract-transform-load script with SQLAlchemy. The scenario is as follows: take a 30+ mln text file (CSV, tab-delimited or any other...), parse it and generate a file suitable for the MySQL 'LOAD DATA INFILE' import command (as described at http://dev.mysql.com/doc/refman/5.0/en/load-data.html ). From within the script, disabl...

CDC and ETL help/recommendations

Here's the background. We have a few different customers, each with a different backend source database. We want to be able to pick up real-time changes to the backend database, then transform the data to a target schema in our target database. After that, we broadcast a message to other apps alerting them to the change. To do this we need CDC s...

Please explain the Ab Initio recovery file (.rec). When should we roll back the file?

Hi all, please explain the concept of the Ab Initio recovery file. When an Ab Initio graph fails during execution, in which cases should we roll back the recovery file and in which cases should we not? Please provide links to any Ab Initio materials. Thanks. ...

What is the best 3rd party component for importing flat files using C#?

I'm just looking for a component that can be programmatically called in a fairly simple way to import a flat file of data. The data is typically 100,000-500,000 rows; each row contains about 200 fields of text, anywhere from about 5 to 250 characters long. Data could be CSV, tab-delimited, etc. There is some budget for this, but would like...

How to prevent CAST errors in SSIS?

Hello. The question: Is it possible to ask SSIS to cast a value and return NULL if the cast is not allowed, instead of throwing an error? My environment: I'm using Visual Studio 2005 and SQL Server 2005 on Windows Server 2003. The general context: Just in case you're curious, here is my use case. I have to store data coming from ...

Straight Java/Groovy versus ETL tool (Talend/etc) - what libraries would you use?

Assume you have a small project which on the surface looks like a good match for an ETL tool like Talend. But assume further that you have never used Talend, that you do not trust "visual programming" tools in general, and that you would rather code everything the old-fashioned way (text in a nice IDE!) with the help of an appropriate...

Empty data problem - data layer or DAL?

I am designing a new app now and have been giving the following question a lot of thought. I consume a lot of data from the warehouse, and the entities have a lot of dictionary-based values (currency, country, tax-whatever data) - dimensions. I cannot be assured, though, that there won't be nulls. So I am thinking: create an empty value in each of ...