views:

863

answers:

5

I was talking with a co-worker yesterday regarding a situation where he used SSIS (or something like that) to do some really cool thing with an SSIS Package where he passed in a name like "Dr. Reginald Williams, PhD." and based on some weighting scheme the system was smart enough to figure out how to tokenize it and store it in the database as "Salutation- First Name - Last Name - Suffix". He threw out some buzzwords like BI, and SSIS, ETL, and Data mining. I really wanted more information, but didn't even know where to begin to ask.

I'm a .Net developer and thoroughly versed in C#, Vb.Net, WPF, etc..., but I have no idea what these technologies are, how to add them to my skill set, and whether or not it's something that I really should be focusing on. Any and all direction would be helpful.

A: 

Ironic question. You're datamining on datamining.

(Yes, I know 'ironic' is probably not the word I should have used, let's please keep the Morissette to a minimum).

More to the point: from wikipedia, SSIS seems to be a tool for Business Intelligence tasks (and there you have BI) in Microsoft's SQL Server, deeply connected with other BI tools to, e.g. create SSIS modules easily (whatever they're used for).

Adriano Varoli Piazza
Why the downmod? I believe it's good form to at least leave a comment on why did you think it was worth downmodding...
Adriano Varoli Piazza
+2  A: 

SSIS is SQL Server Integration Services and is useful for doing the ETL (Extract, Transform, and Load) that are the front end of many data warehousing/business intelligence solutions that integrate data into easy to use dimensional models. SSIS is also useful for smaller projects as a convenient way to load legacy data or data from other repositories or files.

Data mining usually implies using the data from the integrated sources to infer information that would not be obvious from transactional data (via the integration of multiple sources giving more "dimensions" to the data.

BI is a huge topic so it may not be something to focus on unless you want to get into that field, but SSIS can be useful on smaller projects and is worth learning about in any event.

Turnkey
+9  A: 

SSIS == SQL Server Integration Services and it is an Extract Transform and Load (ETL) tool, it is a far superior implementation of what was Data Transformation Services or DTS in SQL7, SQL2K era. It is a great tool for expressing workflow processes wherein data is moved from point A to point B (and c and d etc) and undergoes changes through that process such as consolidation to a denormalized design or data cleansing.

BI or Business Intelligence is a moniker for a entire category in the tech world and it is a great place to be right now. BI skills are very valued and hard to come by, one of the reasons this is the case is that it is hard to recreate a true BI case in a lab so teaching is almost always done in a real world situation.

From a high level, BI projects usually involve an end point of reporting. Often times as developers we are used to transactional report writing such as the details of a PO but BI can get into very broad reports that cover product sales trends over decades and deal with hundreds of millions of records. The way we design databases for applications is not ideal for this kind of reporting so other tools and technologies were invented and are used in the BI space. These are things like Cubes which you often hear called OLAP cubes. OLAP cubes usually originate from a data warehouse which is nothing more than another database - but typical warehouses contain data that came from more than one, and often dozens of other application databases. Your inventory app, purchasing app, HR app and a whole bunch of others all contain bits and pieces of data that create a complete picture of the business, a BI architect will use something like SSIS to pull the data from all these systems, massage it and store it in the data warehouse which is designed with a different kind of design better for reporting. Once it is in the warehouse he will use Analysis services to create cubes on that data and something like Reporting Services to show you reports over that data.

Edit: sorry, forgot Data Mining, it is another non-specific term that describes and concept or a process and not so much a tool. In a simple example, it is a methodical approach to identifying patterns in data. In the past a good business analysy would look through data for trends but with modern databases you are talking about datasets way too large to manually comb through - Data mining allows you to instruct the computer to comb through that data and identify patterns that are of interest.

Hope that helps

keithwarren7
+2  A: 

What your coworker did might be better described as "intelligent parsing" of a string. That could be done at many levels of sophistication -- for example, using statistical models to give you the likelihood that "Dr." is a salutation and not a first name. Or it could just use a simple lookup list of common salutations, in which case it's just regular procedural code, nothing more.

SSIS is short for SQL Server Integration Services. It's basically DTS on steroids; some people love it, and some people hate it. It'd be tricky to use that by itself to do the kind of thing you're talking about; it's mainly just for taking data from various sources and combining it, transforming it, and loading it somewhere else. It can do some nifty things, many of which tend to be data-mining like, but ultimately it's a production tool for cramming data one direction or another. It isn't particularly well respected in the data mining community.

Data Mining is an entire academic discipline, focused on using some (typically large) quantity of data to either predict future answers or better understand patterns in existing data. It's definitely a great area to get into, but not something you can just pick up and do without some intensive study of math and algorithms. A good book on the subject is this one.

"Business Intelligence" is really more of a buzzword than a specific technology, and can mean different things to different people. At base, the idea suggests doing less dumb stuff with business data, and generally it refers to analysis of trends over time, often using OLAP. It may also include data mining or AI algorithms, but since there's no rigorous definition, just about anybody who wants to sell you something will tell you it offers "Business Intelligence", and hope you don't dig any further.

Ian Varley
A: 

Since this is a Data Mining context, don't you think your friend meant SSAS? (SQL Server Analysis Services, which include Data Mining tools).

ybakos