Weblog

Process Control & Delta Detection in Oracle Warehouse Builder

Some weeks ago we have implemented delta detection from the source systems by using Oracle functions and procedures in Oracle Warehouse Builder (OWB). The situation is as follows: every run you only want the new or changed records to be processed in your mapping (most likely a staging area mapping), and you want OWB to figure out from which point on your next delta selection should be. For this example you should already have the following operators (objects) in your mapping: a source and target table and a filter operator in between. This mapping would simply load records from one table to another depending on the WHERE clause you enter in the filter. And more importantly: this solution requires the existence of an attribute in the source system which can tell you if and when records have been changed or newly entered (mutation datetime attribute).

To add the process control as described above we have to do the following:

  • Create a process control table with the process name, the start date, the end date and optionally a sequence ID.
  • Create a function which looks up the next start date (eg the datetime from which the delta should be selecterd in the source system).
  • Create a procedure which updates the process control table if the mapping has been succesfull. The record in this table for the next run (the one which the above mentioned function uses) will have to be inserted here.
  • Add two constant operators, an expression operator and a post-session operator to your mapping.

The first constant and the expression should be placed before your filter operator. In this constant you can define the name of your mapping. Using this name, you can connect to the expression as an input there for the lookup function.

This function (of course developed in OWB) has an input paramter MAPPING_NAME defined as a varchar:

--initialize variables here
CURSOR LoadDate IS (
        SELECT          MAX(START_DATE) as Start_Date
        FROM             PROCESS_CONTROL_TABLE
        WHERE           Mapping_Name=MAPPING_NAME
        GROUP BY     Mapping_Name);

v_start_date date;

-- main body
BEGIN
   v_start_date := TO_DATE('18500101', 'YYYYMMDD');
   < <recloop>>
FOR RecSet IN LoadDate

LOOP
        v_start_date := RecSet.start_date;
        EXIT recloop;
END LOOP;
        RETURN v_start_date;
END;

As you can see, it selects the highest date for the next run. In your expression operator call this function using the mapping name supplied by your constant. The result will be a date/time attribute which you can drag to your filter expression where you can set that source mutation datetime attribute should be greater than your lookup date time (start_date).

The second part of this process is the post-session operator. Enter your mappingname in your second constant operator, obviously equal to the name in the first constant. Beside that enter a datetime attribute end_date with the expression SYSDATE-1. These two values can be dragged to the post-session operator which should do two things:

first, close off the current process control record, by setting the end date to the sysdate-1
second, insert a new record in which the start_date is sysdate-1

The -1 is a safe margin because during the run of the mapping new record in the source system could be inserted or altered, and you don’t want to miss out on those. This works well, as long as your mapping keeps its run time within a day. But then again; if not you have other problems than process control. Alternatively you can query the OWB repository and select the mapping starttime here.

The procedure that updates the process control table (you will need to create the two input parameters in OWB):

v_end_date                    DATE;
v_mapping_name	  VARCHAR2(30);

BEGIN

v_end_date                    := END_DATE;
v_mapping_name	  := MAPPING_NAME;

UPDATE PROCESS_CONTROL_TABLE SET END_DATE= v_end_date WHERE END_DATE IS NULL AND MAPPING_NAME = v_mapping_name;
INSERT INTO PROCESS CONTROL_TABLE VALUES (v_mapping_name, v_end_date, NULL);

END;

The NULL value in the second statement means: the end_date for the next run is empty. Of course; this will be updated the next time you run this mapping with the datetime from the delta selection.

Lastly two things to keep in mind: you may want to set the post-session operator to run only upon succes (configuration of this operator), because if the mapping fails you probably want to rerun it with the same values. And if you want to have a technical ID in your process control table I would suggest to set it up as described above, create a sequence and a trigger (unfortunately not yet doable in OWB) which insert the NEXTVAL in the ID attribute every time the new record is inserted.

Firefox 1.5

As you probably know by now, Firefox 1.5 has been released. Even the mainstream media is reporting on this new version. The only thing i want to add here is that it includes a fix for the bug that was causing a problem with the menu’s on the IT-eye website. Good to see that fixed. Other improvements include: improved popup blocker, automatic download of updates and support for svg.

Shale the next Struts version?

Today, Shale caught my attentention on the Web. Shale is a new framework from Apache to build web applications in a MVC fashion. Shale is a Struts sub-project and it is a framework that is build on top of the new standard in the web-tier: JSF.

Craig McClanahan initiated Shale with the main idea to let it become Struts 2.0. However, when I read several blogs, eg. David Geary’s blog and read more about the Shale framework, I am convinced that it is not really Struts 2.0., but a total new framework that is build on top of JSF. In my opinion, JSF just puts Struts in a new better ‘jacket’ (For this I agree with David Geary).

I wonder how Shale’s first release will be introduced. When you look now at the Shale homepage, you can extract from the URL that it is still a Struts sub-project.. However, it is good to see that Apache is now, including with MyFaces, heading into the direction of JSF. Because, I consider JSF as the new standard WAF for the next couple of years.

Creating your standard Oracle Warehouse Builder Time Dimension for your Datawarehouse

The following is a nifty trick which is used in many datawarehouses around. In any case: a time dimension is one of the most common objects of a datawarehouse and one with which you can quickly give some new data insights to your customer. If you have some easy to collect facts (e.g. financial figures or amounts) and an easy company hierarchy you can quickly show some trends in time.

To build a time dimension, use the table function supplied by the Oracle DBMS. You can’t develop these from Oracle Warehouse Builder so you need to do that in your SQL client. You can however call these functions from your mapping (surprisingly called ‘table function operator’). What you need to do is to create a function like this one with two input fields (start date and end date) and as output however you want your time dimension to be. For a table function to run, you also need to add the uses database types. In this example its the TIME_DIMENSION_DATA_TABLE type and the TIME_DIMENSION_DATA record type. The code for this is:

CREATE OR REPLACE
TYPE TIME_DIMENSION_DATA AS OBJECT
( Periode_Key  NUMBER      ,
  DAG_NUM      VARCHAR2(15),
  DAG_OMS	   VARCHAR2(15),
  WEEK_NR      NUMBER,
  MAAND_NR     NUMBER(2),
  MAAND_DAGNR  NUMBER,
  MAAND_NAAM   VARCHAR2(15),
  JAAR_DAGNR   NUMBER(3),
  JAAR_JAAR    NUMBER(4)
);

For the record type

And

CREATE OR REPLACE
TYPE TIME_DIMENSION_DATA_TABLE AS TABLE OF TIME_DIMENSION_DATA

For the table type.

The datatypes in the record type and the table function should obviously match and need to be present in order for the table function to run. But then again, you only need to do this once.

The code for the table function itself is as follows:

CREATE OR REPLACE FUNCTION Time_Dimension_Filler
(   start_date  IN NUMBER     -- start date (as number format YYYYYMMDD )
,   eind_date   IN NUMBER     -- end   date (as number format YYYYYMMDD )
)  RETURN TIME_DIMENSION_DATA_TABLE PIPELINED IS
   out_rec TIME_DIMENSION_DATA := TIME_DIMENSION_DATA(NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL);

l_start_date            date;
l_eind_date            date;
l_dummy_date     date;
l_dag_in_week     number(1);
l_weekend              varchar2(1);
l_verwerken          boolean;

BEGIN
   l_start_date := to_date(p_start_date, 'YYYYMMDD');
   l_eind_date  := to_date(p_eind_date , 'YYYYMMDD');

   FOR l_datum_nr  in p_start_date..p_eind_date LOOP

     l_verwerken := TRUE;
     BEGIN
	  l_dummy_date := to_date(to_char(l_datum_nr,'99999999'),'YYYYMMDD');

	 exception
	 when others then
	       l_verwerken := FALSE;
	       l_dummy_dat := null;
	 end;

   if l_verwerken then
	    l_dag_in_week := to_number(to_char(l_dummy_date,'D'),'9');
        if l_dag_in_week in (6,7) then
	      l_weekend := 'J';
		else
		  l_weekend := 'J';
		end if;

        out_rec.Periode_Key            := l_datum_nr;
        out_rec.DAG_NUM                := to_char(l_dummy_date,'DD/MM/YYYY');
        out_rec.DAG_OMS                 := to_char(l_dummy_date,'DAY');
        out_rec.WEEK_NR                 := to_number(to_char(l_dummy_date,'WW'),'99');
        out_rec.MAAND_NR            := to_number(to_char(l_dummy_date,'MM'),'99');
        out_rec.MAAND_DAGNR 	:= to_number(to_char(l_dummy_date,'DD'),'99');
        out_rec.MAAND_NAAM     := to_char(l_dummy_date,'MONTH');
        out_rec.JAAR_DAGNR 	:= to_number(to_char(l_dummy_date,'DDD'),'999');
        out_rec.JAAR_JAAR            := to_number(to_char(l_dummy_date,'YYYY'),'9999');

        PIPE ROW(out_rec);
      END IF;
    END LOOP;  

  RETURN;
END;
/

All that remains is to set the input/ouput field and datatypes correctly in your Table Function Operator, this listens very closely. And you’re done!

The to-be-determined-how-many commandmends for developing in Oracle Warehouse Builder

This post is meant to be upgrade every now and then with new insights. First of all when setting up your own environment for Oracle Business Intelligence, be sure to install Oracle Warehouse Builder (OWB) patch 10.1.0.4 to be able to use OWB with the Oracle 10.2 database. If you don’t you will get an error message saying that your sys as sysdba user doesn’t have enough privileges. Installing OWB on Oracle 10.1 should not cause any problems.

  1. Use the OWB runtime assistant for creating the to-be-used database schema’s for the the targets. The use of the repository assistant for your metadata repository is obligatory.
  2. Think long and hard over your OWB project name and location names, editing these later on causes untold numbers of errors.
  3. Use OWB for development of every PL/SQL function or procedure instead of creating it in SQLPlus or Toad and importing it into OWB. You can not edit the procedures after that so you’re obliged to keep editing them in the other client
  4. Try to keep your mappings (data transformation processes) limited to a single target. You can (of course) have multiple sources but try to design your proces so that only one logical target is used. You can have two instances for the same target if you need a seperate insert and update flow for instance.
  5. A target in your mappings means the end of your process flow. Try to resist the urge to continue from that point on but make a new mapping instead which takes your target as a source for the next mapping. Continuing data tranfsormation processes though a number of targets makes debugging unnecessarily difficult.
  6. Try to be as explicit as possible, it’s very tempting to make solutions in PL/SQL but try to make the most of the functions available in OWB. It may seem like it’s not very efficient but in a good mapping design one can graps the meaning more quickly if your solution is spread over a number of operators (transformations), rather than a dense PL/SQL code.
  7. Make datatype transformations explicit! It may work well for instance to connect a numeric attribute to a varchar attribute technically but it pays to use a (OWB standard) TO_CHAR in every case.
  8. If you made it to the final datamart of star-schema fact table mapping. Be sure to spend some time on your pre and post session procedures. You definitely want to drop all the constraints every update run on your datawarehouse.
  9. The same goes for indexing, use pre and post session commands to drop or create indexes and recalculate statistics.
  10. When using Oracle Workflow in combination with Oracle Warehouse Builder; its wiser to implement point 8 and 9 in the workflow so everyone can watch and know the different sequential steps in the whole workflow. You need to create a Oracle Workflow repository for this from the database installer.

More free express tools

Visual Studio Express Logo

Microsoft just anounced that if you download their express tools before november 7th 2006, you can use them for free. Included are visual web developer, visual basic, visual c#, visual c++, visual j++, and sql server. More info can be found on msdn: Announcing the release of Visual Studio 2005 Express Editions. But wait, there’s more! Sun has also made their Java Studio Creator IDE free.

It’s good to see so many free tools, i’ll be easier for developers to pick the right tool for the job. I’ve tried Java Studio creator in the past, but i think JDeveloper is a better IDE for building JSF applications. I’m curious to see how Visual studio compares to JDeveloper.

Technology
Ben jij slim genoeg voor IT-eye