Assume you need to extract data from a source flat file, transform the data by filtering and transforming certain columns, and then load the transformed data into a target database table.

 

1. Extract Data:

  • Source Data: You have a flat file (CSV) with customer data, customers.csv, which has columns like Customer_ID, Customer_Name, Country, and Amount.

    Example customers.csv:

Customer_ID, Customer_Name, Country, Amount
1, John Doe, USA, 500
2, Jane Smith, UK, 700
3, Emma Brown, USA, 600
4, Liam Johnson, Canada, 800
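
In the job built below, this file is read by a Sequential File stage. As a plain-Python picture of the same extract step (the helper name extract_rows and the use of the csv module are our illustration, not anything DataStage exposes):

import csv

def extract_rows(path="customers.csv"):
    """Read the flat file into a list of dicts, one per customer."""
    with open(path, newline="") as f:
        # skipinitialspace strips the blank after each comma in the sample file
        return list(csv.DictReader(f, skipinitialspace=True))

rows = extract_rows()
print(rows[0])  # {'Customer_ID': '1', 'Customer_Name': 'John Doe', ...}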

2. Create a DataStage Job:

     Open DataStage Designer:

  • Create a new job: In DataStage Designer, create a new job of type “Parallel” or “Server” (depending on your version and needs).

  • Drag and Drop Stages: DataStage works by dragging and dropping stages into the job design canvas.

    • Source Stage: Drag a "Sequential File" stage and configure it to point to the customers.csv file.

    • Transformer Stage: Drag a "Transformer" stage to transform the data.

    • Target Stage: Drag a "Database" stage (for example, an Oracle or SQL Server database stage) to load data into the target table.

     Configure the Stages (rough Python sketches of the three stages follow this list):
  • Source Stage:

    • Set the file path of the customers.csv.

    • Define the schema (columns: Customer_ID, Customer_Name, Country, Amount).

  • Transformer Stage:

    • Add a few simple transformations like:

      • Filter customers based on the country (e.g., only USA customers).

      • Modify the Amount column by adding a 10% increase for the rows that pass the filter.

        Example in the Transformer Stage (the filter belongs in the output link constraint, and the 10% increase in the Amount column derivation):

Constraint (output link):  Country = 'USA'
Derivation (Amount):       Amount * 1.10

         
  • Target Stage:

    • Configure the target database connection (e.g., a SQL Server database with a target table transformed_customers).

    • Define the target schema to match the source columns after transformation.
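
For the source stage's column definitions, here is a rough Python analogue of the schema, purely as a sketch (the Customer dataclass and apply_schema are our own names, and the types are a sensible guess rather than anything DataStage prescribes):

from dataclasses import dataclass

@dataclass
class Customer:
    Customer_ID: int
    Customer_Name: str
    Country: str
    Amount: float

def apply_schema(raw):
    """Convert the string values read from the CSV into typed columns."""
    return Customer(
        Customer_ID=int(raw["Customer_ID"]),
        Customer_Name=raw["Customer_Name"],
        Country=raw["Country"],
        Amount=float(raw["Amount"]),
    )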
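
The Transformer logic (constraint plus derivation) reads like this in plain Python; transform_rows is our own helper, and the rounding to whole amounts mirrors the example output further down:

def transform_rows(customers):
    """Keep USA customers (the constraint) and uplift Amount by 10% (the derivation)."""
    return [
        {
            "Customer_ID": c.Customer_ID,
            "Customer_Name": c.Customer_Name,
            "Country": c.Country,
            "Amount": round(c.Amount * 1.10),
        }
        for c in customers
        if c.Country == "USA"  # constraint: Country = 'USA'
    ]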
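
On the load side, the real job would use the Oracle or SQL Server connector configured above; as a self-contained stand-in, this sketch writes the same rows to a local SQLite file (load_rows and the file name etl_demo.db are our own inventions):

import sqlite3

def load_rows(transformed, db_path="etl_demo.db"):
    """Create transformed_customers if needed and insert the transformed rows."""
    with sqlite3.connect(db_path) as conn:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS transformed_customers (
                   Customer_ID   INTEGER PRIMARY KEY,
                   Customer_Name TEXT,
                   Country       TEXT,
                   Amount        REAL
               )"""
        )
        conn.executemany(
            "INSERT INTO transformed_customers "
            "VALUES (:Customer_ID, :Customer_Name, :Country, :Amount)",
            transformed,
        )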

3. Data Flow in the Job:

  1. The Source Stage reads the data from the flat file.

  2. The Transformer Stage processes the data:

    • Filters customers from the USA.

    • Increases their Amount by 10%.

  3. The Target Stage writes the transformed data to the database.
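
Composed from the three hypothetical helpers sketched above, the whole flow is a straight extract, transform, load pipeline:

def run_etl():
    raw = extract_rows("customers.csv")        # 1. extract from the flat file
    typed = [apply_schema(r) for r in raw]     #    apply the source schema
    transformed = transform_rows(typed)        # 2. filter USA rows, add 10%
    load_rows(transformed)                     # 3. write to transformed_customers

run_etl()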

4. Run the Job:

Once the job design is complete, compile and run it. DataStage will:

  • Extract data from customers.csv.

  • Apply the transformations (filtering USA customers and adjusting the Amount).

  • Load the transformed data into the transformed_customers table in the target database.

5. Example Output (Database Table):

After running the job, the transformed_customers table will look like:

Customer_ID   Customer_Name   Country   Amount

1             John Doe        USA       550
3             Emma Brown      USA       660
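
The amounts follow from the 10% uplift (500 × 1.10 = 550 and 600 × 1.10 = 660), which a two-line Python check confirms:

for original, expected in [(500, 550), (600, 660)]:
    assert round(original * 1.10) == expected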

Lessons Learned:

This simple DataStage job demonstrates the core ETL process: extracting data from a source, transforming it with a filter and calculation, and loading it into a target system. DataStage jobs can get more complex with additional stages like Lookups, Aggregators, Joins, etc., for more advanced processing.
