MisterTootor M.S., B.S., A.S., A.S.B
Assume you need to extract data from a source flat file, transform it by filtering rows and modifying certain columns, and then load the result into a target database table.
1. Extract Data
- Source Data: You have a flat file (CSV) with customer data, customers.csv, which has the columns Customer_ID, Customer_Name, Country, and Amount.
Example customers.csv:
Customer_ID, Customer_Name, Country, Amount
1, John Doe, USA, 500
2, Jane Smith, UK, 700
3, Emma Brown, USA, 600
4, Liam Johnson, Canada, 800
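Before building the job, the extract step can be sketched in a few lines of Python (a hedged illustration outside DataStage; the sample data is inlined here so the snippet is self-contained):

```python
import csv
import io

# Sample data matching the customers.csv example above.
CSV_TEXT = """Customer_ID,Customer_Name,Country,Amount
1,John Doe,USA,500
2,Jane Smith,UK,700
3,Emma Brown,USA,600
4,Liam Johnson,Canada,800
"""

def extract(fileobj):
    """Read customer rows into dictionaries, converting Amount to a number."""
    rows = []
    for row in csv.DictReader(fileobj):
        row["Amount"] = float(row["Amount"])
        rows.append(row)
    return rows

rows = extract(io.StringIO(CSV_TEXT))
print(len(rows))  # 4 rows extracted
```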
2. Create a DataStage Job:
Open DataStage Designer:
- Create a new job: In DataStage Designer, create a new job of type “Parallel” or “Server” (depending on your version and needs).
- Drag and Drop Stages: DataStage works by dragging and dropping stages onto the job design canvas.
- Source Stage: Drag a "Sequential File" stage and configure it to point to the customers.csv file.
- Transformer Stage: Drag a "Transformer" stage to transform the data.
- Target Stage: Drag a database stage (for example, an Oracle or SQL Server stage) to load data into the target table.
Configure the Stages:
- Source Stage:
  - Set the file path of the customers.csv file.
  - Define the schema (columns: Customer_ID, Customer_Name, Country, Amount).
- Transformer Stage:
  - Add a few simple transformations, such as:
    - Filter customers based on the country (e.g., only USA customers).
    - Modify the Amount column by adding a 10% increase for all rows.
  - Example Expression in Transformer Stage:
    If Country = 'USA' Then Amount * 1.10 Else Amount
- Target Stage:
  - Configure the target database connection (e.g., an SQL Server table transformed_customers).
  - Define the target schema to match the source columns after transformation.
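The Transformer logic above (the conditional 10% uplift on Amount) can be mirrored outside DataStage; this Python sketch illustrates the expression's behavior only, not DataStage syntax, and rounding to two decimals is added here as an assumption to keep currency values tidy:

```python
def transform_amount(country: str, amount: float) -> float:
    """Mirror of the Transformer expression:
    If Country = 'USA' Then Amount * 1.10 Else Amount
    (rounded to 2 decimals here; the original expression does not round).
    """
    return round(amount * 1.10, 2) if country == "USA" else amount

print(transform_amount("USA", 500))  # 550.0
print(transform_amount("UK", 700))   # 700 (unchanged)
```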
3. Data Flow in the Job:
- The Source Stage reads the data from the flat file.
- The Transformer Stage processes the data:
  - Filters customers from the USA.
  - Increases their Amount by 10%.
- The Target Stage writes the transformed data to the database.
4. Run the Job:
Once the job design is completed, compile and run it. DataStage will:
- Extract data from customers.csv.
- Apply the transformations (filtering USA customers and adjusting the Amount).
- Load the transformed data into the transformed_customers table in the target database.
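As a sanity check, the whole job can be approximated end to end in plain Python; sqlite3 stands in for the target database here (an assumption chosen for portability), while the table name transformed_customers and the CSV layout come from the example above:

```python
import csv
import io
import sqlite3

# Sample source data matching the customers.csv example.
CSV_TEXT = """Customer_ID,Customer_Name,Country,Amount
1,John Doe,USA,500
2,Jane Smith,UK,700
3,Emma Brown,USA,600
4,Liam Johnson,Canada,800
"""

def run_job(csv_file, conn):
    """Extract rows, keep USA customers, add 10%, load into transformed_customers."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transformed_customers "
        "(Customer_ID INTEGER, Customer_Name TEXT, Country TEXT, Amount REAL)"
    )
    for row in csv.DictReader(csv_file):
        if row["Country"] != "USA":                      # Transformer filter
            continue
        amount = round(float(row["Amount"]) * 1.10, 2)   # 10% increase
        conn.execute(
            "INSERT INTO transformed_customers VALUES (?, ?, ?, ?)",
            (int(row["Customer_ID"]), row["Customer_Name"], row["Country"], amount),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
run_job(io.StringIO(CSV_TEXT), conn)
for r in conn.execute("SELECT * FROM transformed_customers ORDER BY Customer_ID"):
    print(r)
# (1, 'John Doe', 'USA', 550.0)
# (3, 'Emma Brown', 'USA', 660.0)
```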
5. Example Output (Database Table):
After running the job, the transformed_customers table will look like:
Customer_ID  Customer_Name  Country  Amount
1            John Doe       USA      550
3            Emma Brown     USA      660
Lessons Learned:
This simple DataStage job demonstrates the core ETL process: extracting data from a source, transforming it with a filter and calculation, and loading it into a target system. DataStage jobs can get more complex with additional stages like Lookups, Aggregators, Joins, etc., for more advanced processing.