Hi
I am looking for ideas to populate a fact table in a data mart. Let's say I have the following dimensions:
Physician
Patient
date
geo_location
patient_demography
test
I have used two ETL tools to populate the dimension tables: Pentaho and Oracle Warehouse Builder. The date, patient demography and geo locations do not pul...
Looking for any recommendations for an ETL system for 200+ distributed systems (Windows, AS400, Linux, etc.).
We collect data each month from all of our customers (regardless of system type), bring it back, process it all together and send the aggregate solutions back to them. I'm tasked with automating this system - any suggestions on h...
I would like to be able to produce a file by running a command or batch which basically exports a table or view (SELECT * FROM tbl), in text form (default conversions to text for dates, numbers, etc. are fine), tab-delimited, with NULLs being converted to an empty field (i.e. a NULL column would have no space between tab characters, with appr...
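A minimal sketch of one way to get this output, assuming Python and any DB-API cursor. The question does not name the database, so sqlite3 is used here purely as a stand-in; the function name `export_tab_delimited` and the sample row are hypothetical.

```python
import csv
import io
import sqlite3


def export_tab_delimited(rows, out):
    """Write rows as tab-delimited text; NULL (None) becomes an empty field."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        # Non-string values fall back to their default str() conversion.
        writer.writerow(["" if value is None else value for value in row])


# Stand-in database with one row containing a NULL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT, created TEXT)")
conn.execute("INSERT INTO tbl VALUES (1, NULL, '2010-01-01')")

buf = io.StringIO()
export_tab_delimited(conn.execute("SELECT * FROM tbl"), buf)
# buf.getvalue() == "1\t\t2010-01-01\n"
```

To write to disk instead, pass a file opened with `open(path, "w", newline="")` in place of `buf`. Note that the csv module will quote a field that itself contains a tab or a quote character, which may or may not be wanted here.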
Hi!
I want to populate a star schema / cube in SSIS / SSAS.
I prepared all my dimension tables and my fact table, primary keys etc.
The source is a 'flat' (item level) table and my problem is now how to split it
up and get the data from that one table into the respective tables.
I did a fair bit of googling but couldn't find a satisfying solution to...
Background:
I have a PostgreSQL (v8.3) database that is heavily optimized for OLTP.
I need to extract data from it on a semi real-time basis (someone is bound to ask what semi real-time means, and the answer is as frequently as I reasonably can, but I will be pragmatic; as a benchmark, let's say we are hoping for every 15 minutes) and feed it...
I am using Talend to populate a data warehouse. My job is writing customer data to a dimension table and transaction data to the fact table. The surrogate key (p_key) on the fact table is auto-incrementing. When I insert a new customer, I need my fact table to reflect the id of the related customer.
As I mentioned, my p_key is auto...
I’m looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 on an external host down to an internal SQL Server 2005 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection (qu...
In short, I have a 20,000,000-line CSV file that has different row lengths. This is due to archaic data loggers and proprietary formats. We get the end result as a CSV file in the following format. My goal is to insert this file into a PostgreSQL database. How can I do the following:
Keep the first 8 columns and my last 2 columns, to hav...
We're about to do a data transformation from one system to another using SSIS. We are four people who will be working on this continuously for two years, and therefore we need some sort of versioning system. We cannot use Team Foundation. We're currently configuring an SVN server, but digging into it I've seen some big risks.
It s...
Hi,
Being fairly new to SSIS and the ETL process, I was wondering if there is any way to loop through a record set within a DataFlowTask and pass each row (deriving parameters from the row) into a stored procedure (the next step in the ETL phase). Once I have passed the row into the stored procedure, I want the results from each iteratio...
What's the most efficient method to load large volumes of data from CSV (3 million+ rows) into a database?
The data needs to be formatted (e.g. the name column needs to be split into first name and last name, etc.).
I need to do this as efficiently as possible, i.e. there are time constraints.
I am siding with the option of reading, transforming an...
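A rough sketch of the read-transform-write idea in Python, streaming one row at a time so the whole file never has to sit in memory. The position of the name column, the rule of splitting on the first space, and the helper names `split_name` and `transform_rows` are all assumptions made for illustration.

```python
import csv
import io


def split_name(full_name):
    """Split on the first space; a single-word name gets an empty last name."""
    first, _, last = full_name.strip().partition(" ")
    return first, last.strip()


def transform_rows(reader, name_index=0):
    """Lazily yield rows with the name column replaced by first and last name."""
    for row in reader:
        first, last = split_name(row[name_index])
        yield row[:name_index] + [first, last] + row[name_index + 1:]


# Tiny in-memory stand-in for the real CSV file (hypothetical data).
source = io.StringIO("John Smith,42\nCher,7\n")
rows = list(transform_rows(csv.reader(source)))
# rows == [["John", "Smith", "42"], ["Cher", "", "7"]]
```

In a real load the generator would be consumed in batches by whatever bulk-insert mechanism the target database offers, rather than collected into a list.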
Hello All - I have a task to import, transform and extract zipped binary files that contain both text data and embedded binary data. Some of the data is relational in nature and needs to be processed into a defined database structure. Currently I have a single-threaded C# app that essentially grabs all the files from th...
What are the main differences between the Pentaho BI Suite and JasperSoft BI Suite (as they are currently packaged in 2010)?
...
Hi,
Our state government has opened its transport timetable data. The data is in the XML-based TransXchange standard format.
The problem is the data files are huge. The sample data file itself is 300 MB.
The good thing is most of the data is redundant and I don't need it for my application. I am wondering what options do I have of insert...
Hi,
What is the best FREE solution to implement an ETL project in MySQL?
I need to extract a large amount of data for analysis and put the results in other tables.
Regards,
Pedro
...
I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID.
I've encountered a problem with transactions being reversed - In the transac...
I am working with a big table (~100,000,000 rows) in SQL Server 2008. Frequently, I need to add and remove batches of ~30,000,000 rows to and from this table. Currently, before loading a large batch into the table, I disable the indexes, insert the data, then rebuild the indexes. I have measured this to be the fastest approach.
Since rece...
I'm a bit confused here: does SSAS store the ETL result, or does SSAS connect directly to the production databases?
...
Yahoo Pipes are a very intriguing choice for sort of a poor-man's server-free ETL solution, but would it be a good idea to build an application around one or many Pipes? I've really only used them for toy things here and there, with the only thing I've used longer than a week or two being one amalgamated and filtered RSS feed that I've ...
I have a flat file that looks something like this:
junk I don't care about \n
\n
columns names\n
val1 val2 val3\n
val1 val2 val3\n
columns names \n
val1 val2 val3\n
I only care about the lines with values. These value lines are all in fixed-width format and have the same line length. The other junk lines and column names c...