A colleague sent me a problem he was having with loading data into ORACLE while preserving parent-child relationships.. Here was my response:
From what I understand, the incoming data has no sequencing or relational values embedded into it? E.g. a field which could designate the parent-child relationship?
Embedded relational fields
If there are relational fields, I would just load the data into temp tables, and then post process and assign the sequence numbers (which _you_ want) based on the previous data set’s relational values.
No embedded relational fields
If there are _not_ relational fields, then I am assuming that the ‘proximity’ or ‘sequencing’ in the file is actually designating the parent-child relationship. In that case, your data might look something like:
Where 01 designates a dept record and 22 designates an employee who belongs in that department.
Because Joe Smith is immediately following the MFG DEPT record, the parent child relationship is assumed. Is that correct?
(you specified type 2 and type 3, which I am assuming that you have one further level of embedded hierarchy)
My assumption is that you never know how many ‘child’ records you will see? In this case, it could be 0-N DEPT_EMP records for one DEPT record.
If you always knew that there were a fixed number of child records, you COULD do something miserable such as concatenating multiple physical records into one logical record. This would keep ‘continuity’ from the parent to the child by slapping all those into one long record. You would then proceed to post-process that combined table, splitting each record into their destination tables using PL/SQL (and using PL/SQL’s mechanism to retain the last sequence number, to ensure the parent’s unique sequence number is applied to the child records).
ASSEMBLING LOGICAL RECORDS FROM PHYSICAL RECORDS
If you can’t guarantee that, you could use sqlloader’s sequence facility to give each record a unique number which provides the order in which the records were loaded. You could then post process this by running through the temp load tables and using the sequence number to find which child records were just after the parent record.
For example, the data above would be loaded as:
101 JOE SMITH
102 SAM SPADE
104 JILL BROWN
105 MARTY SPRAT
106 CRAIG JONES
108 RICK PRICE
The one point to note here is that if a row in the INFILE is rejected, that sequence number is skipped. E.g. if SHPDEPT’s DEPT record was rejected, you would not see 103 anywhere in either table. This may throw off your logic, if you are not careful. E.g. you could think ‘I will just process the DEPT table, find each record, and find all child records between it and the next record in DEPT’. If a dept record got rejected, then you would erroneously allocate the DEPT_EMP records (either to the wrong dept, or not allocate them at all).
Basically, that code would look like a giant loop which would run through all the sequence numbers, and thereby create the parent-child relationships. You would retain the ‘last parent I encountered’ so you could assign child records to belong to the parent record. Again, you would need to look through the SQLLOADER logs to make sure no records were rejected, otherwise that loop would incorrectly assign parent-child relationships without more intricate ‘am I missing a sequence number’ logic.