data-warehouse

How to index a table with a Type 2 slowly changing dimension for optimal performance

Suppose you have a table with a Type 2 slowly changing dimension. Let's say the table has the following columns:
* [Key]
* [Value1]
* ...
* [ValueN]
* [StartDate]
* [ExpiryDate]
In this example, let's suppose that [StartDate] is effectively the date on which the values for a given [Key] become known to the system. ...

What to name a column in a database table that holds a version number

I'm trying to figure out what to call the column in my database table that holds an INT specifying the "record version". I'm currently using "RecordOrder", but I don't like that, because people think higher=newer, but the way I'm using it, lower=newer (with "1" being the current record, "2" being the second most current, "3" older still, an...

Netezza, Teradata, DB2 Parallel/Enterprise, ... versus Hadoop or others?

I'm looking at building some data warehousing/querying infrastructure, right now on top of Map/Reduce solutions like Hadoop. However, it strikes me that all the M/R work is just repeating what the RDBMS guys have solved for the last 20 years with parallel SQL databases. Parallel SQL implementations scale reads and writes across nodes, j...

Continuous Integration with Oracle Products

Hi, I'm currently working on a data warehouse project using an Oracle Database, Oracle Data Integrator, Oracle Warehouse Builder and some Jython thrown in for good measure. All of which is held within TFS. My background is .NET, and prior to this project I was seeing a lot of promise in CI. I'm not suggesting that the testing element of CI...

When is data erased from the OLAP DB?

I am new to OLAP. I understand the table structure and ETL process. I don't understand when data is supposed to be deleted from the fact table. Say I'm creating a reporting application for events. Each event has the duration it took to complete, the exit code and total bytes read. There are several dimensions, e.g. time and location. ...

Stored procedures vs JDO for data warehousing project

Hi there, In the old days we used to access the database through stored procedures. They were seen as "the better" way of managing the data. We keep the data in the database, and any language/platform can access it through JDBC/ODBC/etc. However, in recent years run-time reflection/meta-data based storage retrieval mechanisms such as...

Size of SQL Server tempDB for Data Warehouse

Is there an expected size of tempDB for a Data Warehouse application? Is 10 gigs excessive? It's hit by frequent large queries. The Data Warehouse itself is 50 gigs. I'm using SQL Server 2000 ...

How do you implement Data Quality & Validation rules in a data warehouse?

I'm developing a data warehouse to be part of my company's enterprise application suite. So I've been learning a lot about DW concepts, but the rules engine seems difficult and I can't find much information about the various ways to implement it. The focus of the rules is to validate data quality, and also alert when certain business metrics ar...

Video tutorials for the Ab Initio ETL data warehousing tool

Hi all, please tell me where I can find video tutorials for the Ab Initio ETL data warehousing tool. I searched on Google but did not find any materials. Thanks in advance. ...

Strategy for Reporting on JMS Activity

The answer to this question may be obvious to someone with more experience in data-warehousing and BI, but I am looking for some guidance. I'm building a system that uses multiple JMS queues to process millions of messages per day. I need visibility into the activity of these queues, so that I can create reports like..."Yesterday at 11...

What is a manhattan database?

A friend of mine was interviewing for a data warehouse and Business Object role, and he was asked about the Manhattan database. I have Googled "Manhattan database" and even searched for it on Bing and Yahoo but have found no relevant information. Any help would be greatly appreciated! ...

Please explain the Ab Initio recovery file (.rec). When should we roll back the file?

Hi all, please explain the concept of the Ab Initio recovery file. When an Ab Initio graph fails during execution, in which cases should we roll back the recovery file and in which cases shouldn't we? Please provide links to any Ab Initio materials. Thanks. ...

Partition a table across multiple physical nodes

Hello, so I'm currently working on a project that involves the collection and storing of some huge datasets (at least compared to what I'm used to working with). The data essentially consists of meta information, and then actual values (where the values are trended over time). The meta information itself is relatively large, but nothing huge, I w...

True or False: Good design calls for every table to have a primary key, if nothing else, a running integer

Consider a grocery store scenario (I'm making this up) where you have FACT records that represent a sale transaction, where the columns of the Fact table include
    SaleItemFact Table
    ------------------
    CustomerID
    ProductID
    Price
    DistributorID
    DateOfSale
    Etc Etc Etc
Even if there are duplicates in the table when you consi...

Fact table with multiple facts

I have a dimension (SiteItem) that has two important facts: perUserClicks and perBrowserClicks. However, within this dimension, I have groups of values based on an attribute column (let's call the groups AboveFoldItems, LeftNavItems, OnTheFlyItems, etc.), each of which has more facts that are specific to that group: AboveFoldItems: eyeTime, loadTime...

How do you verify the correct data is in a data mart?

I'm working on a data warehouse and I'm trying to figure out how to best verify that data from our data cleansing (normalized) database makes it into our data marts correctly. I've done some searches, but the results so far talk more about ensuring things like constraints are in place and that you need to do data validation during the E...

Free data warehouse - Infobright, Hadoop/Hive or what?

I need to store a large number of small data objects (millions of rows per month). Once they're saved they won't change. I need to:
* store them securely
* use them for analysis (mostly time-oriented)
* retrieve some raw data occasionally
It would be nice if it could be used with JasperReports or BIRT. My first shot was Infobright Community - ...

Empty data problem - data layer or DAL?

I'm designing a new app now and giving the following question a lot of thought. I consume a lot of data from the warehouse, and the entities have a lot of dictionary-based values (currency, country, tax-whatever data), i.e. dimensions. I cannot be assured, though, that there won't be nulls. So I am thinking: create an empty value in each of ...

Ideas on Populating the Fact Table in a Data Mart

Hi, I am looking for ideas to populate a fact table in a data mart. Let's say I have the following dimensions:
* Physician
* Patient
* date
* geo_location
* patient_demography
* test
I have used two ETL tools to populate the dimension tables: Pentaho and Oracle Warehouse Builder. The date, patient demography and geo locations do not pul...

Insert into a star-schema

I've read a lot about star schemas, about fact/dimension tables, and about select statements to quickly report data; however, the matter of data entry into a star schema still eludes me. How does one "theoretically" enter data into a star-schema db while maintaining the fact table? Is a series of INSERT INTO statements within a giant stored proc ...