Data warehousing checklist

By S Lakshmi Narasimhan | 19 Jul 1999

1

This checklist strives to be comprehensive, but it may not be complete, since technology in this area has been advancing rapidly. You will need to update yourself regularly. We shall try to keep you updated about new developments in this area. For the present, here’s what we can suggest:

  • Be clear why you want a data warehouse. That will clarify how much of the available data needs to be integrated into the warehouse. It will help if you ask the question: what information needs do we not need the warehouse to cater to. That will help you limit the scope of the project. Remember, there is no merit in size for the sake of size. A large size that is not required can become cumbersome and inefficient. Also, the more data you seek to integrate, the more it will cost you, and the more complex it will be for people to use.
  • Managers in your organisation can only give a rough idea as to the sort of queries that they would normally make. Only after the system is up and running, can they get an idea as to the real capabilities of the data warehouse. Therefore, carefully analyze the types of queries that they would normally ask from the data warehouse. The existing method of MIS can be used as a starting point for configuring these queries.
  • How will users query the data? Will they use a temporary tool or structured query language or through a screen built with the help of a program? Will it be combinations of these?
  • Having decided what you want, go about the project in phases, unless it is very small, and can be implemented in one shot. That means you must be clear about the priorities. It will also focus the organisation’s effort to gather and collate all the data that is required to run the system when it is up.
  • The data warehousing consultants will advise you on how to link the various blocks of data resident in various departments, divisions and branches, and you expedite the process by having a complete inventory of what kind of data is available and necessary for the desired decisions to be taken.
  • In short, you need data about your data. That’s called metadata. Metadata gives you information about what will go into the data warehouse, where each item of data comes from, the conversions required for it to be stored in the warehouse, the definition and meaning of each data item, and so on. Without metadata, there will be confusion.
  • Don’t worry about the complexity and tediousness of this task. are available, which will relieve you of this burden.
  • One of the trickiest and most common problem companies face is that of diverse definitions. To give you a simple example, the marketing department may call your buyer a client, and the finance department may call the same party a customer. This may be fine when you sit in a management committee meeting and discuss the subject. Computers are not so smart. For the computer you need to ensure that across the organisation all database files define the customer either as a customer or as a client – not both.
  • These problems are worse where companies have computerised different departments or branches at different times using different computer systems and consultants. It’s the worst when two organisations merge, and standardisation of file structures and definitions become a nightmare.
  • If you think that can be solved simply by doing a kind of "find-and-replace" operation across the board, forget it. You might find that for one purpose sales have been defined as a gross value including excise duty, and for another as net of excise, and for a third, as a value minus the dealer commission. You will have to sort this out -- what the consultants will probably call .
  • Simultaneously, you will have to plan the computer system that you will need to run the data warehouse. Click on the links if you want to know more about the jargon the consultants will throw at you when you are doing this evaluation - , ,
  • While taking the purchase decision, it might be worth your while to check with others who have implemented a data warehouse using the tools that have been recommended to you. Don't depend entirely on vendor/dealer suggestions. Also, try and avoid a new product that is not time-tested.
  • Ensure that the products that you buy will be with one another. And that they will be compatible with future products you are likely to buy. There should be seamless integration of all the platforms you use.
  • Very important: build the system testing time and the learning curve of the staff into your project plan.
  • From here, you can go about integrating, migrating, and cleaning the data. ( about it)
  • Optimise your database. This process is called . Use CASE tools wherever necessary for designing the data store and the main data warehouse databases.
  • Ensure that you manage disk space and memory well.
  • Work out your ‘loading strategy’. This will determine how often you are going to transfer data from the operational database to the data warehouse database. (to find out more about this.
  • Based on the proportion of critical data that you have, develop a and recovery strategy.
  • At this stage, run thorough tests on the environment. All the stages that have been mentioned above should be tested and should work efficiently when real-time querying is started. Utilities such as uninterrupted power supply, modems, etc., should be working efficiently.
  • Be aware that data warehouse design is always iterative. There is no single solution that will be effective always. Try different techniques while designing and developing the tables. Never rely on gut feel. Create benchmarks. If there are some queries that are posing problems, or take a long time to run, separate them for further optimisation.
  • Install a data mining package if you consider necessary. Data mining is the process of converting data into information. Information is defined as processed data. There are several data mining tools such as SAS, Knowledge Seeker from Angoss, Intelligent Miner from IBM and Darwin from Thinking Machines. These programs ascertain hitherto unknown facts, correlations and relationships that lie hidden in the data. In India, the use of data mining tools is just about beginning.



 

Business History Videos

History of hovercraft Part 3...

Today I shall talk a bit more about the military plans for ...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of hovercraft Part 2...

In this episode of our history of hovercraft, we shall exam...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of Hovercraft Part 1...

If you’ve been a James Bond movie fan, you may recall seein...

By Kiron Kasbekar | Presenter: Kiron Kasbekar

History of Trams in India | ...

The video I am presenting to you is based on a script writt...

By Aniket Gupta | Presenter: Sheetal Gaikwad

view more