Category Archives: ORACLE

A SQLLOADER example – preserving parent-child relationships

 

A colleague sent me a problem he was having with loading data into ORACLE while preserving parent-child relationships..  Here was my response:

From what I understand,  the incoming data has no sequencing or relational values embedded into it?  E.g. a field which could designate the parent-child relationship?

Embedded relational fields

If there are relational fields, I would just load the data into temp tables, and then post process and assign the sequence numbers (which _you_ want) based on the previous data set’s relational values.

No embedded relational fields

If there are _not_ relational fields, then I am assuming that  the ‘proximity’ or ‘sequencing’ in the file is actually designating the parent-child relationship.  In that case, your data might  look something like:

01MFGDEPT

22JOE   SMITH

22SAM   SPADE

01SHPDEPT

22JILL  BROWN

22MARTY SPRAT

22CRAIG JONES

01FNCDEPT

22RICK  PRICE

Where 01 designates a dept record and 22 designates an employee who belongs in that department.

Because Joe Smith is immediately following the MFG DEPT record, the parent child relationship is assumed.  Is that correct?

(you specified type 2 and type 3, which I am assuming that you have one further level of embedded hierarchy)

My assumption is that you never know how many ‘child’ records you will see?  In this case, it could be 0-N DEPT_EMP records for one DEPT record.

If you always knew that there were a fixed number of child records, you COULD do something miserable such as concatenating multiple physical records into one logical record. This would keep ‘continuity’ from the parent to the child by slapping all those into one long record.  You would then proceed to post-process that combined table, splitting each record into their destination tables using PL/SQL (and using PL/SQL’s mechanism to retain the last sequence number, to ensure the parent’s unique sequence number is applied to the child records).

ASSEMBLING LOGICAL RECORDS FROM PHYSICAL RECORDS

http://download.oracle.com/docs/cd/E11882_01/server.112/e16536/ldr_control_file.htm#i1005509

If you can’t guarantee that, you could use sqlloader’s sequence facility to give each record a unique number which provides the order in which the records were loaded.  You could then post process this by running through the temp load tables and using the sequence number to find which child records were just after the parent record.

http://download.oracle.com/docs/cd/E11882_01/server.112/e16536/ldr_field_list.htm#i1008320

For example, the data above would be loaded as:

DEPT

100 MFGDEPT

103 SHPDEPT

107 FNCDEPT

DEPT_EMP

101 JOE SMITH

102 SAM SPADE

104 JILL BROWN

105 MARTY SPRAT

106 CRAIG JONES

108 RICK PRICE

The one point to note here is that if a row in the INFILE is rejected, that sequence number is skipped.  E.g. if SHPDEPT’s DEPT record was rejected, you would not see 103 anywhere in either table.  This may throw off your logic, if you are not careful.  E.g. you could think ‘I will just process the DEPT table, find each record, and find all child records between it and the next record in DEPT’.  If a dept record got rejected, then you would erroneously allocate the DEPT_EMP records (either to the wrong dept, or not allocate them at all). 

Basically, that code would look like a giant loop which would run through all the sequence numbers, and thereby create the parent-child relationships.  You would retain the ‘last parent I encountered’ so you could assign child records to belong to the parent record.  Again, you would need to look through the SQLLOADER logs to make sure no records were rejected, otherwise that loop would incorrectly assign parent-child relationships without more intricate ‘am I missing a sequence number’ logic.

Advertisements

Leave a comment

Filed under Data Management, ORACLE