etl

Ideas on Populating the Fact Table in a Data Mart

Hi, I am looking for ideas to populate a fact table in a data mart. Let's say I have the following dimensions: Physician, Patient, date, geo_location, patient_demography, test. I have used two ETL tools to populate the dimension tables: Pentaho and Oracle Warehouse Builder. The date, patient demography and geo locations do not pul...

Distributed ETL question

Looking for any recommendations for an ETL system for 200+ distributed systems (Windows, AS400, Linux etc). We collect data each month from all of our customers (regardless of system type), bring it back, process it all together and send the aggregate solutions back to them. I'm tasked with automating this system - any suggestions on h...

Best way to produce automated exports in tab-delimited form from Teradata?

I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl), in text form (default conversions to text for dates, numbers, etc. are fine), tab-delimited, with NULLs being converted to an empty field (i.e. a NULL column would have no space between tab characters, with appr...
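The output format described above (tab-delimited, NULL as an empty field) can be sketched in Python. This is only an illustration of the formatting rule, not a Teradata export method: `write_tab_delimited` is a hypothetical helper, the rows are assumed to have already been fetched by some database driver that returns NULL as `None`, and values are assumed to contain no embedded tabs or newlines.

```python
import io


def write_tab_delimited(rows, out):
    """Write each row as one tab-delimited line.

    Assumption: a SQL NULL arrives as Python None and becomes an empty
    field, so two adjacent tabs mark a NULL column. Other values use
    their default str() conversion.
    """
    for row in rows:
        fields = ["" if value is None else str(value) for value in row]
        out.write("\t".join(fields) + "\n")


if __name__ == "__main__":
    # Hypothetical sample rows standing in for a fetched result set.
    sample = [(1, None, "x"), (None, None, 2.5)]
    buffer = io.StringIO()
    write_tab_delimited(sample, buffer)
    print(repr(buffer.getvalue()))
```

In a real export, `out` would be a file opened for writing and `rows` would come from iterating over a cursor, so the whole table never has to sit in memory.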

Best Practice to populate Fact and Dimension Tables from Transactional Flat DB

Hi! I want to populate a star schema / cube in SSIS / SSAS. I prepared all my dimension tables and my fact table, primary keys etc. The source is a 'flat' (item level) table and my problem is now how to split it up and get it from one into the respective tables. I did a fair bit of googling but couldn't find a satisfying solution to...

PostgreSQL to Data-Warehouse: Best approach for near-real-time ETL / extraction of data

Background: I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP. I need to extract data from it on a semi real-time basis (someone is bound to ask what semi real-time means, and the answer is as frequently as I reasonably can, but I will be pragmatic; as a benchmark, let's say we are hoping for every 15 min) and feed it...

ETL Operation - Return Primary Key

I am using Talend to populate a data warehouse. My job is writing customer data to a dimension table and transaction data to the fact table. The surrogate key (p_key) on the fact table is auto-incrementing. When I insert a new customer, I need my fact table to reflect the id of the related customer. As I mentioned my p_key is auto auto...

ETL mechanisms for MySQL to SQL Server over WAN

I’m looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 with an external host down to an internal SQL Server 05 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection (qu...

Python - CSV: Large file with rows of different lengths

In short, I have a 20,000,000 line CSV file that has different row lengths. This is due to archaic data loggers and proprietary formats. We get the end result as a CSV file in the following format. My goal is to insert this file into a Postgres database. How can I do the following: Keep the first 8 columns and my last 2 columns, to hav...
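The column-trimming step mentioned above (keep the first 8 and the last 2 columns of each row) could be sketched as a streaming generator, so that a 20,000,000 line file is never loaded into memory at once. `trim_rows` is a hypothetical helper name, and skipping rows with fewer than 10 fields is an assumption; the question excerpt does not say how such rows should be handled.

```python
import csv
import io


def trim_rows(reader, head=8, tail=2):
    """Yield rows reduced to their first `head` and last `tail` fields.

    Assumption: rows shorter than head + tail fields are skipped,
    because the head and tail slices would overlap for them.
    """
    for row in reader:
        if len(row) < head + tail:
            continue
        yield row[:head] + row[-tail:]


if __name__ == "__main__":
    # Hypothetical sample data with rows of different lengths.
    source = io.StringIO(
        "a,b,c,d,e,f,g,h,x,y,z,p,q\n"
        "1,2,3,4,5,6,7,8,9,10\n"
        "short,row\n"
    )
    for trimmed in trim_rows(csv.reader(source)):
        print(trimmed)
```

Every yielded row has exactly 10 fields, so the result can be written back out with `csv.writer` and bulk-loaded into a fixed-width table definition.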

Version-control in a large SSIS ETL project

We're about to perform a data transformation from one system to another using SSIS. We are four people who will continuously be working on this for two years and therefore we need some sort of versioning system. We cannot use Team Foundation. We're currently configuring an SVN server, but digging into it I've seen some big risks. It s...

Loop Control within a DataflowTask in ETL

Hi, Being fairly new to SSIS and the ETL process, I was wondering if there is any way to loop through a record set within a DataFlowTask and pass each row (deriving parameters from the row) into a Stored Procedure (the next step in the ETL phase). Once I have passed the row into the stored procedure, I want the results from each iteratio...

C# Importing Large Volume of Data from CSV to Database

What's the most efficient method to load large volumes of data from CSV (3 million+ rows) to a database? The data needs to be formatted (e.g. the name column needs to be split into first name and last name, etc.). I need to do this as efficiently as possible, i.e. under time constraints. I am siding with the option of reading, transforming an...

Transforming binary data using SSIS and SQL Server 2008

Hello All - I have a task to import/transform and extract zipped binary files that contain both text data and embedded binary data. Some of the data is relational in nature and needs to be processed into a defined database structure. Currently I have a C# single-threaded app that essentially grabs all the files from th...

Differences between Pentaho and JasperSoft - 2010

What are the main differences between the Pentaho BI Suite and JasperSoft BI Suite (as they are currently packaged in 2010)? ...

Transforming large XML for SQL Server insertion

Hi, Our state government has opened its transport timetable data. The data is in the XML-based TransXchange standard format. The problem is that the data files are huge. The sample data file itself is 300 MB. The good thing is that most of the data is redundant and I don't need it for my application. I am wondering what options do I have of insert...

What is the best FREE solution to implement an ETL project in MySQL

Hi, What is the best FREE solution to implement an ETL project in MySQL? I need to extract a large amount of data for analysis, and put the results in other tables. Regards, Pedro ...

Handling primary key duplicates in a data warehouse load

I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID. I've encountered a problem with transactions being reversed - In the transac...

SQL Server 2008: Disable index on one particular table partition

I am working with a big table (~100.000.000 rows) in SQL Server 2008. Frequently, I need to add and remove batches of ~30.000.000 rows to and from this table. Currently, before loading a large batch into the table, I disable indexes, I insert the data, then I rebuild the index. I have measured this to be the fastest approach. Since rece...

A bit confused: does SSAS store ETL results?

I'm a bit confused here: does SSAS store the ETL results, or does SSAS connect directly to the production databases? ...

Can you safely rely upon Yahoo Pipes to offload ETL for your application?

Yahoo Pipes are a very intriguing choice for sort of a poor-man's server-free ETL solution, but would it be a good idea to build an application around one or many Pipes? I've really only used them for toy things here and there, with the only thing I've used longer than a week or two being one amalgamated and filtered RSS feed that I've ...

SSIS 2005 - How to Import a Fixed Width Flat File?

I have a flat file that looks something like this: junk I don't care about \n \n columns names\n val1         val2         val3\n val1         val2         val3\n columns names \n val1         val2         val3\n I only care about the lines with values. These value lines are all in fixed-width format and have the same line length. The other junk lines and column names c...