Hi, suppose I have the following source table (called S):
Table S:
name   Gender Code
Bob    0
Nancy  1
Ruth   1
David  0
And let's assume I also have a lookup values table (called S_gender_values):
Gender_Code  Gender_value
0            Male
1            Female
My goal is to create a target tabl...
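The excerpt is cut off before the target table is described, so the following is only a minimal sketch under one assumption: that each gender code in S should be resolved to its value from S_gender_values. The names `source_rows`, `gender_values` and `resolve_gender` are hypothetical and only illustrate the lookup step.

```python
# Rows of table S as (name, gender code) pairs, copied from the question.
source_rows = [("Bob", 0), ("Nancy", 1), ("Ruth", 1), ("David", 0)]

# Lookup table S_gender_values as a code -> value mapping.
gender_values = {0: "Male", 1: "Female"}


def resolve_gender(rows, lookup):
    """Replace each gender code with its looked-up value (assumed goal)."""
    return [(name, lookup[code]) for name, code in rows]


target_rows = resolve_gender(source_rows, gender_values)
for name, gender in target_rows:
    print(name, gender)
```

In SQL or SSIS the same step would typically be a join or a Lookup transformation on the code column; the dictionary here just stands in for that.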
I am using SQL Server Integration Services (SSIS) to perform an incremental data load, comparing a hash of the to-be-imported row data with a hash of the existing row data. I am using this:
http://ssismhash.codeplex.com/
to create the SHA512 hash for comparison. When trying to compare data import hash and existing hash from database using a Conditional Split task (expres...
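The general idea of hash-based change detection can be sketched outside SSIS. This is not the Multiple Hash component or a Conditional Split expression; it is a plain Python illustration that assumes each row has a business key and that the previously stored hashes are available as hex strings. `row_hash` and `classify` are hypothetical names.

```python
import hashlib


def row_hash(values):
    """SHA-512 hex digest of a row's column values.

    A unit-separator character is placed between values so that
    ("ab", "c") and ("a", "bc") do not produce the same input string.
    """
    joined = "\x1f".join("" if v is None else str(v) for v in values)
    return hashlib.sha512(joined.encode("utf-8")).hexdigest()


def classify(incoming, existing_hashes):
    """Split incoming rows into inserts, updates and unchanged rows.

    incoming:        dict mapping business key -> tuple of column values
    existing_hashes: dict mapping business key -> stored hex digest
    """
    result = {"insert": [], "update": [], "unchanged": []}
    for key, values in incoming.items():
        old_hash = existing_hashes.get(key)
        if old_hash is None:
            result["insert"].append(key)
        elif old_hash != row_hash(values):
            result["update"].append(key)
        else:
            result["unchanged"].append(key)
    return result
```

Both sides of the comparison have to be in the same representation (for example, both hex strings or both raw bytes), otherwise every row looks changed.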
I've tried reading the Wikipedia article for "extract, transform, load", but that just leaves me more confused...
Can someone explain what ETL is, and how it is actually done?
...
In the next few weeks, my company will be engaging multiple vendors to establish a choice for a common global ETL tool - not necessarily one that can't be broken from, but just where our license investment will go to consolidate those costs. Two of the major players are Talend and Informatica, with others that are unimportant for the sak...
I have a serious problem running an ETL process in a production environment. While my ETL is running, the OLAP server becomes extremely slow. I think this is because the ETL is updating several existing rows in the fact table and adding new ones. I tried to avoid this problem by having a full database replica, so that the ETL writes to DB1 and O...
I need to be able to extract and transform data from a data source on a client machine and ship it off via a web service call to be loaded into our data store. I would love to be able to leverage SSIS, but the SQL Server licensing agreement is preventing me from installing Integration Services on a client machine. Can I just provide the clie...
I have an OLTP database, and am currently creating a data warehouse. There is a dimension table in the DW (DimStudents) that contains student data such as address details, email, notification settings.
In the OLTP database, this data is spread across several tables (as it is a standard OLTP database in 3rd normal form).
There are curr...
We have a non-normalized table that contains foreign key information as free text inside a column.
I would like to create a view that will transform and normalize that table.
E.g. a column that contains the following text:
"REFID:12345, REFID2:67890"
I want to create a view that will have REFID1 and REFID2 as 2 separate integer colu...
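The parsing logic such a view needs can be sketched in Python. This is only an illustration of the extraction step, not a database view, and it assumes the text always uses the `NAME:digits` form shown in the example. `split_refs` is a hypothetical helper, and treating the unnumbered `REFID` label as the first reference is an assumption.

```python
import re

# Matches labels such as "REFID" or "REFID2" followed by a colon and digits.
PAIR = re.compile(r"\b(REFID\d*)\s*:\s*(\d+)")


def split_refs(text):
    """Return (refid1, refid2) as integers, or None where a value is missing."""
    found = dict(PAIR.findall(text))
    # Assumption: an unnumbered "REFID" label is the first reference.
    first = found.get("REFID", found.get("REFID1"))
    second = found.get("REFID2")
    return (
        int(first) if first is not None else None,
        int(second) if second is not None else None,
    )


print(split_refs("REFID:12345, REFID2:67890"))
```

Inside the database, the same extraction would have to be expressed with the string functions the engine provides.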
I need to do a lot of processing on a table that has 26+ million rows:
Determine correct size of each column based on said column's data
Identify and remove duplicate rows.
Create a primary key (auto incrementing id)
Create a natural key (unique constraint)
Add and remove columns
Please list your tips on how to speed this process up ...
We have an SSIS package that runs nightly. It takes backups of a couple of production databases and restores them to a staged database, sensitive information is removed, and then a backup of this staged database is restored on another server so that the Hyperion guys can run their jobs. The whole process used to take around 4 and half ho...
I'm looking into replacing a bunch of Python ETL scripts that perform a nightly / hourly data summary and statistics gathering on a massive amount of data.
What I'd like to achieve is
Robustness - a failing job / step should be automatically restarted. In some cases I'd like to execute a recovery step instead.
The framework must be ab...
For some reason my MDF file is 154 gigs; however, I only loaded 7 gigs' worth of data from flat files. Why is the MDF file so much larger than the actual source data?
More info:
Only a few tables with ~25 million rows. No large varchar fields (biggest is 300, most are less than varchar(50)). Not very wide tables, < 20 columns. Also, no...
After searching stackoverflow.com I found several questions asking how to remove duplicates, but none of them addressed speed.
In my case I have a table with 10 columns that contains 5 million exact row duplicates. In addition, I have at least a million other rows with duplicates in 9 of the 10 columns. My current technique is taking ...
Is the following workflow possible with Informatica PowerCenter?
AS400 -> Xml(in memory) -> Oracle 10g stored procedure (pass xml as param)
Specifically, I need to take a result set, e.g. 100 rows, convert those rows into a single XML document as a string in memory, and then pass that as a parameter to an Oracle stored procedure that is ...
I was reading a report on the internet saying that data warehousing is a lucrative and highly paid IT career. I am talking about technologies like Ab Initio, ETL, DataStage and Teradata. I work in ASP.NET and SQL Server 05. Is it a good idea to move from web programming to data warehousing technologies? Since I would have no experience with data war...
Can I unit test Informatica PowerCenter workflows?
EDIT:
More specifically, can I mock sources and targets and test the steps in between? E.g. if I have a workflow with an Oracle source and a text file target, can I test it without Oracle and a text file?
...
I am using the EzAPI to create a dataflow with a FlatFile source:
public class EzOleDbToFilePackage : EzSrcDestPackage<EzFlatFileSource, EzFlatFileCM, EzOleDbDestination, EzSqlOleDbCM>
Using the example from http://blogs.msdn.com/b/mattm/archive/2008/12/30/ezapi-alternative-package-creation-api.aspx I am trying to use a flat file source....
The Problem
I'm trying to import data into a table using SQL Server Management Studio's Import Data task. It only brings in 26 rows out of the original 49,325. (Edit: That's where 99.9% comes from: (1-26/49325)*100 = 99.9%)
Using DTS in Enterprise Manager correctly brings all 49,325 rows.
Why is SSMS not importing all rows, reporting ...
Hi all,
I am new to Business Intelligence.
I just got hired by a company to complete their web solution by implementing a BI module. After a lot of reading, I think I have an idea of what a BI process looks like; you'll find enclosed my idea of a BI process.
Can you please tell me if this is a correct vision of the all workflo...
Hi all,
I am doing a comparison between three open source ETL tools: Talend, Kettle and CloverETL.
I could find Talend's and CloverETL's connector lists with no problem.
But I cannot find the one for Kettle.
Does anyone know them, or where I can find them?
Thanks a lot,
...