Data Warehouse Issues
There are certain issues surrounding data warehouses that companies need to be prepared for. A failure to prepare for these issues is one of the key reasons why many data warehouse projects are unsuccessful. One of the first issues companies need to confront is that they are going to spend a great deal of time loading and cleaning data.
Some experts have said that the typical data warehouse project will require companies to spend 80% of their time doing this. While the percentage may or may not be as high as 80%, one thing that you must realize is most vendors will understate the amount of time you will have to spend doing it. While cleaning the data can be complicated, extracting it can be even more challenging.
No matter how well a company prepares its project management, it must face the fact that the scope of the project will probably be larger than estimated. While most projects will begin with specific requirements, they will conclude with data. Once the end users see what they can do with the completed data warehouse, it is very likely that they will place higher demands on it. While there is nothing wrong with this, it is best to find out what the users of the data warehouse will need next rather than only what they want right now. Another issue that companies will have to face is problems with the source systems that place information in the data warehouse.
When a company enters this stage for the first time, it will find that problems that have been hidden for years suddenly appear. Once this happens, the business managers will have to decide whether the problem should be fixed in the transaction processing system or in the data warehouse, which is read-only. It should also be noted that a company will often be responsible for storing data that has not been collected by its existing systems. This can be a headache for developers who run into the problem, and the only way to solve it is to start storing that data in the system. Many companies will also find that some of their data is not being validated by the transaction processing programs.
In a situation like this, the data will need to be validated. When data is placed in a warehouse, a number of inconsistencies will appear within fields. Many of these fields hold descriptive information. One of the most common issues arises when no controls are placed on the names of customers. This will cause headaches for the warehouse user who wants to run an ad hoc query that selects a specific customer by name. The developers of the data warehouse may find themselves having to alter the transaction processing systems. In addition to this, they may also be required to purchase certain forms of technology.
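To illustrate the customer name problem described above, here is a minimal sketch of how uncontrolled name entries might be reduced to a common matching key before loading. The function names, the suffix list, and the sample names are assumptions made for illustration. Real name matching usually needs more than this.

```python
import re
import unicodedata

# Hypothetical list of legal-form suffixes to ignore when matching names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation",
            "co", "company", "ltd", "llc"}


def normalize_name(raw: str) -> str:
    """Return a canonical matching key for a free-text customer name."""
    # Decompose accented characters, drop the accent marks, and fold case.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    # Replace punctuation with spaces, then split on whitespace.
    tokens = re.sub(r"[^a-z0-9]+", " ", text).split()
    # Drop trailing legal-form suffixes such as "inc" or "corp".
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def group_by_customer(names):
    """Group raw name spellings under their shared matching key."""
    groups = {}
    for name in names:
        groups.setdefault(normalize_name(name), []).append(name)
    return groups


if __name__ == "__main__":
    raw_names = ["ACME Corp.", "  Acme, Inc ", "Acmé Corporation", "Bolt Ltd"]
    for key, spellings in group_by_customer(raw_names).items():
        print(key, spellings)
```

With a key like this, an ad hoc query for one customer can match on the normalized key instead of on the free-text spelling. The original spelling can still be kept alongside it for display.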
One of the most critical problems a company may face is a transaction processing system that feeds information into the data warehouse with too little detail. This may occur frequently in a data warehouse that is tailored towards products or customers. Some developers refer to this as a granularity issue. Regardless, it is a problem you will want to avoid at all costs. It is important to make sure that the information placed in the data warehouse is rich in detail.
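The reason detail matters can be shown with a small sketch. Detail-level rows can be rolled up to any coarser level later, but a feed that arrives already summarized cannot be broken back down. The rows and the `roll_up` helper below are hypothetical and exist only to illustrate the point.

```python
from collections import defaultdict

# Hypothetical detail-level feed: one row per sale
# (date, customer id, product, amount).
detail_rows = [
    ("2024-01-03", "C1", "widget", 20.0),
    ("2024-01-03", "C2", "widget", 35.0),
    ("2024-01-17", "C1", "gadget", 50.0),
    ("2024-02-02", "C1", "widget", 20.0),
]


def roll_up(rows, key):
    """Sum the amount column for each value produced by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Any of these summaries can be derived from the detail rows.
by_month = roll_up(detail_rows, key=lambda r: r[0][:7])
by_customer = roll_up(detail_rows, key=lambda r: r[1])
by_month_product = roll_up(detail_rows, key=lambda r: (r[0][:7], r[2]))

if __name__ == "__main__":
    print(by_month)
    print(by_customer)
    print(by_month_product)
```

If the feeder system delivered only the monthly totals, the per-customer and per-product views could not be reconstructed from them.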
Many companies also make the mistake of not budgeting enough for the resources connected to the feeder system structure. To deal with this, companies will want to build a portion of the cleaning logic on the feeder system platform.
This is especially important if the platform happens to be a mainframe. During the cleaning process, you will be expected to do a great deal of sorting. The good news is that mainframe utilities are often proficient in this area. Some users choose to construct aggregates on the mainframe as well, since aggregation also requires a lot of sorting. It should also be noted that many end users will not use the training that they receive for the data warehouse. However, it is important that they be taught the fundamentals of using it, especially if the company wants them to use the data warehouse frequently.
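The link between sorting and aggregation mentioned above can be sketched briefly. Once records are sorted by key, each group is contiguous and can be totaled in a single pass. This is a Python illustration of the general idea, not a description of any particular mainframe utility. The record layout and function name are assumptions.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(records):
    """Sum amounts per key by sorting first, then scanning once.

    records is an iterable of (key, amount) pairs.
    """
    # Sorting brings all records with the same key next to each other.
    ordered = sorted(records, key=itemgetter(0))
    # groupby only merges adjacent equal keys, which is why the sort is needed.
    return [
        (key, sum(amount for _, amount in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("west", 1), ("north", 3)]
    print(aggregate_sorted(sales))
```

Because the sort does most of the work, a platform that sorts efficiently is also a natural place to build aggregates.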