Article | Vectors, Strings and Transform Functions

Hardly, one would find blogs/articles/tutorials on Ab Initio transformation functions. I introduce you to another set of functions which will make you understands vectors in a different way.

Scenario

Suppose, there is data file whose record format has 5 fields, namely A, B, C and D. Let us give them useful meaning, A – student_id, B – student_nm, C – sem_num, D – subject_name, E – marks.

There will be a possibility that combination of A, B remains the same across several records in the data, whereas remaining columns may change. For instance, student Mr. X has 3 subjects in semester 1, whereas Mr. Y has 4 subjects in semester 2 and Mr. Z also having 4 subjects, but in semester 2.

What if there’s a need to consolidate so much information in a single record and present it as a report. This report must have Student Name, Student Id, Semesters covered so far (comma-delimited), Courses taken (pipe-delimited) and Average Marks.

Transformation Function

The following transformation rule uses string_join() and vector_sort_dedup_first() to solve the purpose, i.e. to get an output string of pipe-delimited vector elements.

out.target_field :: string_join(vector_sort_dedup_first(in), "|");

One can create user-defined function which wraps the above functionality, which would be generic enough to be used in any transformation where vectors are involved.

out :: get_delim_vec_ele(input_vec, delim)
begin
   out :: string_join(vector_sort_dedup_first(input_vec), delim);
end;

This function takes 2 arguments, input vector and delimiter to form delimited string of vector elements as output. So, your actual transformation can now look like;

out.target_field :: get_delim_vec_ele(in, "|");

Hope this helps 🙂

Advertisements

taT4AI | Using Implicit Vectors Through String Functions

Heard about Ab Initio? It is an ETL tool for Data Warehousing Projects, or probably much more than that. To know more about the software, please refer the official website – http://www.abinitio.com/

Recently, while coding an XFR, i.e. business transformation logic for some project, I was asked that there’s a field in the input record format, which contains words separated by single space. However, only the second word holds significance. So, how can one extract the exact information?

As I have been dealing with vectors in writing XFR’s for sometime now, I proposed following solution;

out.fld_name :: string_split(in.fld_name_having_space_separated_words, " ")[1];

As the string_split() returns the result in form of a vector, which one can relate to arrays in many programming languages. Thus, this trick makes use of implicit vectors, and assigns second element denoted by [1].

Hope this trick works for you as well. If not, explore more to know more.