data-warehouse

Star-Schema Design

Is a Star-Schema design essential to a data warehouse? Or can you do data warehousing with another design pattern? ...

Looking for tools to analyze email data

I made a write-up about how to analyze your Gmail account data with Ruby. Then, on Slashdot, someone told me about mail-trends. Does anyone have other tools to suggest? ...

Typical Kimball Star-schema Data Warehouse - Model Views Feasible? and How to Code Gen

I have a data warehouse containing typical star schemas, and a whole bunch of code which does stuff like this (obviously a lot bigger, but this is illustrative): SELECT cdim.x ,SUM(fact.y) AS y ,dim.z FROM fact INNER JOIN conformed_dim AS cdim ON cdim.cdim_dim_id = fact.cdim_dim_id INNER JOIN nonconformed_dim AS dim ON d...

Can I have non-measure codes mixed with measures in my fact table?

We're doing a complex bit of data accumulation. Our customer sends us some stuff that includes two dimensions (time and a business unit). Time is mostly year-month. The business unit dimension has just a few attributes: a name, and a few categories to which BU's can belong for reporting and analysis purposes. The stuff they send us i...

How to design a fact table for delivery data

I'm building a data warehouse that includes delivery information for restaurants. The data is stored in SQL Server 2005 and is then put into a SQL Server Analysis Services 2005 cube. The Deliveries information consists of the following tables: FactDeliveres BranchKey DeliveryDateKey ProductKey InvoiceNumber (DD: degenerate dimension)...

Are there any data warehouse frameworks?

I've got a lot of MySQL data that I need to generate reports from. It's mostly historic data so it won't be changing much, but it weighs in at 20-30 gigabytes easily and is expected to grow. I currently have a collection of PHP scripts that will do some complex queries and output CSV and Excel files. I also use phpMyAdmin with bookmarked...

Map data dimension generated in OWB to database column

Hello all, I am using Oracle Warehouse Builder. I have created maps and a time dimension. Now I need to map my time dimension to the database time column. Any help ASAP is appreciated. Thanks in advance. Regards, S.Gyazuddin. ...

Merge Facts from Different Sources? Or Load Separately?

We've got data with two different origins: some comes from a customer, some comes from different vendors. Currently, we physically "merge" this data into a massive table with almost a hundred columns, tens of thousands of rows and no formal separation of the two dimensions. Consequently, we can't actually use this table for much. I...

Tuning/Best Practices Inetsoft Style Report BI Tool?

Is anyone using the business intelligence tool Inetsoft Style Report? I'm stuck with it and was wondering if anyone has advice on tuning and/or best practices for server admin. We are running on a fast Solaris box using Tomcat with a DB2 database. ...

Recommendation for a large-scale data warehousing system

I have a large amount of data that I need to store and generate reports on, with each record representing an event on a website (we're talking over 50 per second, so clearly older data will need to be aggregated). I'm evaluating approaches to implementing this; obviously it needs to be reliable, and should be as easy to scale as possibl...

Can you recommend a good source for Teradata Best Practices?

Looks like my data warehouse project is moving to Teradata next year (from SQL Server 2005). I'm looking for resources about best practices on Teradata - from limitations of its SQL dialect to idioms and conventions for getting queries to perform well - particularly if they highlight things which are significantly different from SQL Ser...

20 Billion Rows/Month - HBase / Hive / Greenplum / What?

Hi, I'd like to use your wisdom for picking up the right solution for a data-warehouse system. Here are some details to better understand the problem: Data is organized in a star schema structure with one BIG fact and ~15 dimensions. 20B fact rows per month 10 dimensions with hundred rows (somewhat hierarchy) 5 dimensions with ...

Datamart vs. reporting Cube, what are the differences?

The terms are used all over the place, and I don't know of crisp definitions. I'm pretty sure I know what a data mart is. And I've created reporting cubes with tools like Business Objects and Cognos. I've also had folks tell me that a datamart is more than just a collection of cubes. I've also had people tell me that a datamart is a ...

What are some sample questions the professor could ask on my "ETL to EDW" datawarehousing project?

I am doing a project in MySQL. What kind of questions could the professor ask me at the presentation? If you know the answers, please post those too. Thanks. ...

Setting up Dim and Fact tables for a Data Warehouse

I'm tasked with creating a datawarehouse for a client. The tables involved don't really follow the traditional examples out there (product/orders), so I need some help getting started. The client is essentially a processing center for cases (similar to a legal case). Each day, new cases are entered into the DB under the "cases" table....

Fact/Dim Table Time Value

I'm setting up Fact and Dim tables and trying to figure out the best way to set up my time values. AdventureworksDW uses a timekey (UID) for each time entry in the DimTime table. I'm wondering if there's any reason I shouldn't just use a time value instead, e.g. 0106090800 (my granularity is hourly)? ...

Reversing (or undoing) a large load to a warehouse fact table

Currently, we plan to record a "batch id" for each batch of facts we load. That way, we can back out the load in case we find problems. Should we consider tracking the batch id on the dimension rows, also? It seems like dimension rows have different rules. If we treat them as slowly-changing, and use one of the SCD algorithms that ...
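The batch-id idea in this question can be illustrated with a minimal sketch. It assumes a hypothetical `fact_sales` table with a `batch_id` column and uses an in-memory SQLite database purely for demonstration; the table, column, and function names are not from the question, and a real warehouse would use its own loader and schema. It covers only the fact rows, not the dimension rows the question goes on to ask about.

```python
import sqlite3

def load_batch(conn, batch_id, rows):
    """Insert fact rows, stamping each one with the batch that loaded it."""
    conn.executemany(
        "INSERT INTO fact_sales (date_key, amount, batch_id) VALUES (?, ?, ?)",
        [(date_key, amount, batch_id) for date_key, amount in rows],
    )
    conn.commit()

def back_out_batch(conn, batch_id):
    """Delete every fact row loaded by one batch; return how many were removed."""
    cur = conn.execute("DELETE FROM fact_sales WHERE batch_id = ?", (batch_id,))
    conn.commit()
    return cur.rowcount

# Hypothetical fact table, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (date_key INTEGER, amount REAL, batch_id INTEGER)"
)

load_batch(conn, 1, [(20090105, 10.0), (20090106, 12.5)])
load_batch(conn, 2, [(20090107, 99.0)])  # suppose this load turns out to be bad

removed = back_out_batch(conn, 2)
remaining = conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
```

Because every fact row carries the id of the load that produced it, backing out a load is a single keyed delete, and rows from other batches are untouched.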

SQL Cube Processing Window

I've got Dim Tables, Fact Tables, ETL and a cube. I'm now looking to make sure my cube only holds the previous 2 months' worth of data. Should this be done by forcing my fact table to hold only 2 months of data and doing a "full process", or is there a way to trim outdated data from my cube? ...

MDX Calculating Time Between Events

I have a Cube which draws its data from 4 fact/dim tables. FactCaseEvents (EventID,CaseID,TimeID) DimEvents (EventID, EventName) DimCases (CaseID,StateID,ClientID) DimTime (TimeID,FullDate) Events would be: CaseReceived,CaseOpened,CaseClientContacted,CaseClosed DimTime holds an entry for every hour. I would like to write an ...

Moving a SQL 2000 cube to SQL 2005

I have been tasked with moving a cube which is in SQL 2000 to SQL 2005. It would appear that the Dimensions and Measures are all coming from one table. Is this possible in SQL 2005 or do I need to restructure the data into multiple Dim/Fact tables, and is there any way to easily move a SQL 2000 cube into SQL 2005? ...