Thursday, 3 March 2016

Apply SCD without using SCD component and by just utilizing tmap on any database in talend

In talend we generally face problem while implementing scd on the database for which we don't have specific SCD component . but this we can can achieve by only using tmap and other component and using variables i will discuss how we can achieve that. As i am using Amezon Redshift as database which is hosted on cloud.

Below image shows overall design of the job:
basic design
as you can see on the picture i have taken input from the stage using SQL and handled all columns for nulls so that we dont get any null values causing null pointer exception.
added all lookup from all dimensions to get the dimension keys and have also added the dimension which we are going to populate in the lookup to compare the existing keys with incoming keys.

then you can see i have using two redshiftoutputbulkexec to  the output from tmap and bith will create two temp tables one will will containing all new incoming rows  with new business keys and other will contain updated rows with old business keys.

lets go to tmap now

tmap
as you can see above image in tmap where i have created a left outer join on stage dimension table with the stage input on business keys in which while doing left outer join if new business key is in input then it will create null values in right table for that row and that value will will utilize to find new and old records.

after join will create a variable which i have created with name ETL_UPD_IND  which i am showing in below image as hoe i generate value for that variable



as you can see in expression builder i check for as after doing left join with dimension table on the business key is the value in null then I else U as variable value and as also you can see i have two inputs one is insertNew and other is updated(which is not visible in image). for all new records  i have mapped stage input wuth insertNew and have filtered for veriable ETL_UPD_IND as I so all new  recors will go in that output and will create  a temp table with those rows. similarly i have mapped output "Updated" with stage input  and have filtered with condition as veriable ETL_UPD_IND as U so all matching keys with rows will go in that that output and will create another temp table with updated rows

then as you seen in the first picture i have added row component after each redshiftoutputbulkexec

1.first row component RR_Insert_new_rows:- there you can directly write SQL as insert into dimension select start from temp that is for new rows insertions

2. second row component you can utilise for implementing scd 1 or scd2 . if implementing scd 1 then directly write a update statement to update all fields except business keys and surrogate keys and current indicator flags by inner joining temp table and dimension on business keys or if you want to implement SCD2 then first write a update statement by inner joing dimension and temp on business keys which updated indicator flags and closes the record and and the write an iunsert statement which inserts all records in temp into the dimension table with indicater flag as active and you have achieved the scd 1 and 2 using tmap.

post your comment and feedback about this post and also comment your questions if any.
thanks.

6 comments:

  1. Hi Bipin,

    Do you happen to have this job, I am trying to explore talend and came across your blog post. I am not able to replicate what you have done.

    Can you help me please.

    ReplyDelete
  2. can anyone please send me the all scd's job zip files in real time

    ReplyDelete
  3. could you send the above job archive to nhasibuan@gmail.com

    ReplyDelete
  4. Could you please share the above the job to my mail id abhilash.tiwari@the3i.com

    I am trying this but not able to fully implement the design.

    Thanks

    ReplyDelete
  5. Please share two scenarios on tflowtoiterate and titeratetoflow.

    ReplyDelete
  6. Can it be possible make it generic, we are having existing Abintio code on which multiple tables are loaded with SCD2 logic from single graph (graph is a generic one)

    ReplyDelete