SAP BODS: Data Transfer Transform


•  The Data_Transfer transform writes data from a source, or the output of another transform, into a transfer object and subsequently reads the data back from that object.

•  The transfer type can be a relational database table or a file.

•  When the transfer type is a database table, we can use the Data_Transfer transform to push operations down to the database server.

•  Resource-consuming operations such as joins, GROUP BY, and sorts can be pushed down with this transform.

•  The Data_Transfer transform is used extensively to optimize job performance by pushing operations down to the database, operating system, or cache level.

•  The transform ensures that operations are pushed down: if the transfer type is a database table, the operations are performed by the underlying database instead of the Data Services engine.

•  The transform creates a temporary database table that is used as a staging table for the operation, thus ensuring the push-down.

•  By pushing operations down to the source database, Data Services reduces the number of rows and operations that the engine must retrieve and process, which improves performance.

•  A Data_Transfer transform automatically divides the data flow into sub data flows, which execute serially.

The Data_Transfer transform is purely an optimization tool: it helps you push resource-consuming operations and transformations such as JOIN and GROUP BY down to the database level.

After the transfer, both sources reside in the same datastore, so the operations are pushed down to the database server automatically.
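The staging idea can be pictured outside of BODS as well. Below is a minimal sketch in Python using SQLite as a stand-in for the transfer datastore; the table and column names are illustrative, not from SAP:

```python
import sqlite3

# An in-memory SQLite database stands in for the transfer datastore.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The Data_Transfer transform writes the upstream rows into a staging table...
cur.execute("CREATE TABLE stage_expenses (customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO stage_expenses VALUES (?, ?)",
                [(1, 100.0), (1, 250.0), (2, 75.0)])

# ...so the resource-consuming GROUP BY runs inside the database
# rather than in the Data Services engine.
cur.execute(
    "SELECT customer_id, SUM(amount) FROM stage_expenses "
    "GROUP BY customer_id ORDER BY customer_id"
)
result = cur.fetchall()
print(result)  # [(1, 350.0), (2, 75.0)]
```

Once the rows sit in the staging table, the aggregation is a single SQL statement executed by the database, which is exactly the effect the push-down achieves.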

Example:

Let us take two simple tables and perform a SUM( ) operation with GROUP BY.

Source Tables:

The first table resides in SQL Server.

Database Name: SQL_KABIL_DB
Table Name: CUSTOMER_NAME


The second table also resides in SQL Server, but in another database.
Database Name: DS_DB
Table Name: EXPENSES




Using the above two tables, I'm going to demonstrate the Data_Transfer transform. Let us see it in detail:
Step 1: Create a new batch job from the Project Area in SAP BODS. Here I named it Data_Transfer_JOB_1.
Step 2: Drag and drop a workflow. Here I named it WF_Data_transfer.
Step 3: Drag and drop a data flow. Here I named it DF_Data_Transfer_Transform.
Step 4: Now, drag and drop the source tables (CUSTOMER_NAME, EXPENSES) and mark them as sources.

Step 5: Drag and drop the Data_Transfer transform from the Transforms tab: click Data Integrator and choose Data_Transfer.

Step 6: Double-click the Data_Transfer transform; a pop-up window will appear. Provide the details as shown below:

GENERAL:
Enable Transfer: Enables or disables the execution of the Data_Transfer transform. It is selected by default. You might want to disable this transform if you are tuning performance and you want to see the effect of the Data_Transfer transform.
Note
When you run the job in debug mode, Data Services automatically disables all Data_Transfer transforms.
Transfer type:
Choose one of the following transfer types to temporarily store the data of each sub data flow:
1.      Table: Database table from an existing datastore. If you choose this type, specify the Table options below (Table name, Database type, and Array fetch size).
2.      File: A flat file. If you choose this type, specify the File options below (Root directory and File name).
3.      Automatic: The Data Services optimizer chooses the transfer type from:
·         Datastores for which the Enable automatic data transfer check box is selected, or
·         The pageable cache directory that you specify in the Server Manager.
The Data Services optimizer chooses the transfer type and location that could provide the optimal performance, based on subsequent operations that the data flow contains.
For example, if an ORDER BY follows the Data_Transfer transform, the optimizer might pick the database datastore that contains the data so that the ORDER BY can be pushed down to the database.
If the data flow does not contain an ORDER BY, GROUP BY, DISTINCT, join, or any expression that can be pushed down, the Optimizer chooses the pageable cache directory. If multiple files are available (one on each job server in a server group), the optimizer chooses the directory that is local to the data flow process.  
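When the transfer type is File, the intermediate data is simply written to a flat file under the root directory by one sub data flow and read back by the next. A rough illustration in Python; the directory and file name are placeholders, not the actual format Data Services uses:

```python
import csv
import os
import tempfile

# The root directory and file name stand in for the File options below.
root_dir = tempfile.mkdtemp()
transfer_file = os.path.join(root_dir, "transfer.csv")

rows = [["1", "100.0"], ["1", "250.0"], ["2", "75.0"]]

# First sub data flow: write the intermediate rows to the transfer file.
with open(transfer_file, "w", newline="") as f:
    csv.writer(f).writerows(rows)

# Second sub data flow: read the rows back and continue processing.
with open(transfer_file, newline="") as f:
    staged = list(csv.reader(f))

print(staged == rows)  # True
```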
Join rank: Indicates the rank of the output data set relative to other tables and files joined in a data flow. The software joins sources with higher join ranks before joining sources with lower join ranks.
Join rank specified in the Query transform editor FROM tab overrides any join rank specified in a source. For new jobs, specify the join rank only in the Query transform editor.
Must be a non-negative integer. Default value is 0.
FILE OPTIONS:
Root Directory: The name of the root directory that will contain the file to use for transfer. If your default Job Server and Designer reside on the same computer, you can use the browse button (ellipses) to find the Root directory. If your default Job Server does not reside on your local computer, you must manually enter the path to your Root directory.
You can use a global variable or parameter for the pathname.
File Name: Name of the flat file that you want to use as transfer for sub data flows. The file does not need to exist.
Table Options:
Table Name: Name of the database table that you want to use as transfer for sub data flows. Specify the table name with the following format:
<datastorename.ownername.tablename>
You can click the browse button (ellipses) to display your datastores. Select a table name from the list or type in the name of a new table.
Database Type: The type of database that contains the transfer table.
Fetch Size: Indicates the number of rows retrieved in a single request to the source database. The default value is 1000. Higher numbers reduce the number of requests, lowering network traffic and possibly improving performance. The maximum value is 5000.
This option is available for source tables from DB2, Informix, ODBC, Oracle, and SQL Server datastores.
When retrieving a column with an Oracle long data type, Data Services automatically sets Array Fetch Size to 1. If a column has an Oracle long data type, Data Services can only retrieve one row at a time.
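The effect of the fetch size can be pictured as batching rows per round trip. A sketch using the DB-API `fetchmany` call against SQLite (SQLite is only a stand-in here; the 1000-row figure mirrors the default above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (n INTEGER)")
cur.executemany("INSERT INTO src VALUES (?)", [(i,) for i in range(2500)])

cur.execute("SELECT n FROM src")
fetch_size = 1000          # analogous to the Array fetch size option
requests = 0
total = 0
while True:
    batch = cur.fetchmany(fetch_size)  # one "request" per batch
    if not batch:
        break
    requests += 1
    total += len(batch)

print(requests, total)  # 3 requests to retrieve 2500 rows
```

With a fetch size of 1, every row would cost its own round trip, which is why Data Services dropping to 1 for Oracle long columns can slow retrieval noticeably.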
Step 7: Now drag and drop a Query transform and join the CUSTOMER_NAME source table with the Data_Transfer output.

Step 8: Double-click the Query transform; in the editor that appears, map the output columns and provide the required logic as shown below:


Here I perform SUM( ) on the EXPENSES column with a GROUP BY operation, as mentioned above.
Step 9: Drag and drop a template table.
Step 10: Finally, connect the Query transform to the template table.
Step 11: Now validate and execute the job.
Result:
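The logic of the whole job can be sketched in Python, with two SQLite databases standing in for SQL_KABIL_DB and DS_DB; the sample rows and column names are illustrative, since the original source data is not reproduced here:

```python
import sqlite3

# Two separate databases stand in for SQL_KABIL_DB and DS_DB.
src = sqlite3.connect(":memory:")    # holds CUSTOMER_NAME
src.execute("CREATE TABLE CUSTOMER_NAME (id INTEGER, name TEXT)")
src.executemany("INSERT INTO CUSTOMER_NAME VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob")])

other = sqlite3.connect(":memory:")  # holds EXPENSES
other.execute("CREATE TABLE EXPENSES (customer_id INTEGER, amount REAL)")
other.executemany("INSERT INTO EXPENSES VALUES (?, ?)",
                  [(1, 100.0), (1, 250.0), (2, 75.0)])

# Data_Transfer step: stage EXPENSES into the customer database so the
# join and GROUP BY can run inside a single database.
src.execute("CREATE TABLE DT_EXPENSES (customer_id INTEGER, amount REAL)")
src.executemany(
    "INSERT INTO DT_EXPENSES VALUES (?, ?)",
    other.execute("SELECT customer_id, amount FROM EXPENSES").fetchall(),
)

# Query transform step: join + SUM( ) with GROUP BY, now fully pushed down.
result = src.execute(
    """SELECT c.name, SUM(e.amount)
       FROM CUSTOMER_NAME c JOIN DT_EXPENSES e ON c.id = e.customer_id
       GROUP BY c.name ORDER BY c.name"""
).fetchall()
print(result)  # [('Alice', 350.0), ('Bob', 75.0)]
```

Without the staging step, the join would span two databases and the engine would have to fetch and aggregate the rows itself; with it, the database returns one summed row per customer.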