
Heard about Ab Initio? It is an ETL tool for data warehousing projects, and probably much more than that. To know more about the software, please refer to the official website: http://www.abinitio.com/

Recently, while coding an XFR (i.e. the business transformation logic) for a project, I was asked about a field in the input record format that contains words separated by a single space. Only the second word holds significance, so how can one extract exactly that word?

As I have been dealing with vectors while writing XFRs for some time now, I proposed the following solution:

out.fld_name :: string_split(in.fld_name_having_space_separated_words, " ")[1];

string_split() returns its result in the form of a vector, which one can relate to arrays in many programming languages. Thus, this trick makes use of an implicit vector and assigns its second element, denoted by [1] because indexing starts at 0.
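For readers without an Ab Initio environment, the same zero-based indexing can be tried out in bash. This is only an analogy and not Ab Initio DML; the variable names and the sample value are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical input: words separated by single spaces.
fld="alpha beta gamma"

# Split on spaces into an array. Like the vector returned by
# string_split(), a bash array is indexed from 0.
IFS=' ' read -r -a words <<< "$fld"

second_word="${words[1]}"   # index 1 is the second element
echo "$second_word"
```

Index 0 would give the first word, which is the usual source of off-by-one mistakes with this kind of trick.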

Hope this trick works for you as well. If not, explore more to know more.

Recently, somebody asked whether it’s possible to replace the Nth delimited item, with the delimiter being the pipe symbol. Suppose the text contains:

TXT="abc|def|ghi|jlk|mnop|qrst|uvw|xyz|123|456|7890"

Now, if you look at the text, the 4th delimited item, i.e. jlk, is out of order. To correct it to jkl (i.e. N = 4), I first tried solving this using sed and found a rather complex solution:

## \5 already begins with the delimiter, so none is added after the replacement
echo "$TXT" | sed -r 's/((([^|]*)\|){3})([^|]*)(.*)$/\1jkl\5/'
## General Solution
typeset -i N=$1; (( N-- ))
sed -r 's/((([^<delimiter>]*)<delimiter-with-or-without-esc-seq>){'$N'})([^<delimiter>]*)(.*)$/\1<replace-pattern>\5/' "$2"

A few days later, my colleague suggested the following solution, a much easier one:

echo "$TXT" | sed -r 's/[^|]*/jkl/4'
## General Solution
typeset -i N=$1
sed -r 's/[^<delimiter>]*/<replace-pattern>/'$N "$2"

Both of them worked, but the latter was easier to read and explain.
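As a cross-check on the sed answers, here is a field-oriented sketch that swaps in awk instead of sed. The function name replace_nth_field is hypothetical, and the sketch assumes a single-character delimiter and a replacement without backslashes (awk interprets escape sequences in -v values).

```shell
#!/usr/bin/env bash
# replace_nth_field N REPLACEMENT DELIM  (hypothetical helper)
# Reads lines on stdin and overwrites field N with REPLACEMENT.
# Assumes DELIM is a single character and REPLACEMENT has no backslashes.
replace_nth_field() {
    local n=$1 repl=$2 delim=$3
    # Assigning to $n makes awk rebuild the line using OFS,
    # so the input delimiter is reused as the output delimiter.
    awk -F"$delim" -v OFS="$delim" -v n="$n" -v repl="$repl" \
        '{ $n = repl; print }'
}

TXT="abc|def|ghi|jlk|mnop|qrst|uvw|xyz|123|456|7890"
fixed=$(printf '%s\n' "$TXT" | replace_nth_field 4 jkl '|')
echo "$fixed"
```

Since awk addresses the field by number, there is no regular expression to get wrong, which makes it a handy way to verify whatever the sed one-liner produced.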

Recently, I joined the forums at http://unix.stackexchange.com/, where I came across the following question:

http://unix.stackexchange.com/questions/11448/how-to-convert-a-href-http-xy-comxy-a-to-http-xy-com-xy

In simple words, this is what was required:

<a href="something">sometext</a>
# transforms into the following wiki markup
[something sometext]

Here’s what I feel should be done using GNU sed:

sed -r 's,<a\s+href=\"([^>]*)\">([^<]*)</a>,[\1 \2],g'

The parentheses group parts of the matched text, which the back-references (e.g. \1, \2, etc.) then refer to.
So, in our scenario, we could identify two logical groups:

  1. Data between '<a href="' and '">'
  2. Data between '">' and '</a>'

Once the groups are identified, one can use sed back-references to format the text the way the particular wiki markup requires.
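The two groups described above can be tried out with a small wrapper. This is a sketch under a few assumptions: the function name a_to_wiki is hypothetical, the href value is taken up to the closing double quote, and [[:space:]] is used in place of \s so that the expression does not rely on a GNU extension.

```shell
#!/usr/bin/env bash
# a_to_wiki (hypothetical helper): converts <a href="URL">TEXT</a>
# on stdin into the wiki form [URL TEXT].
# Group 1 captures the URL, group 2 captures the link text.
a_to_wiki() {
    sed -E 's,<a[[:space:]]+href="([^"]*)">([^<]*)</a>,[\1 \2],g'
}

html='See <a href="http://xy.com">xy</a> for details'
wiki=$(printf '%s\n' "$html" | a_to_wiki)
echo "$wiki"
```

Links carrying additional attributes, or spanning more than one line, would not be matched by this expression; a proper HTML parser is the safer choice for anything beyond simple input.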

Suppose there’s a file with the following content:

$ cat > A
## This is how you echo newline
echo "\n"
## This is how to keep line break in-between words..
echo "Varun\nNischal"

Now, you want to pick out the lines having “\n” within the text, so that they can be written into some other file.

So, I would do the following:

$ cat A | grep "\\\\n" | while read line; do echo $line; done

However, you don’t see the desired result, because read (without the -r option) treats the backslash as an escape character and drops it:

echo "n"
echo "VarunnNischal"

Here’s what you MUST do:

Alternative #1 (Tested in Ubuntu 9.10 using Virtual Box)

$ sed -i 's/\\/\\\\/g; s/\"/\\"/g' A
$ cat A | grep "\\\\n" | while read line; do echo $line; done
echo "\n"
echo "Varun\nNischal"

Alternative #2 (Tested in Unix system at workplace)

$ cat A | grep "\\\\n" | while read line; do echo -E $line; done
echo "\n"
echo "Varun\nNischal"
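A third option, sketched here under the assumption of a POSIX-style shell, is to stop read from interpreting backslashes in the first place with -r, and to print with printf instead of echo. The temporary file stands in for file A, so the original is left untouched.

```shell
#!/usr/bin/env bash
# Recreate the sample file in a temporary location (stand-in for A).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
## This is how you echo newline
echo "\n"
## This is how to keep line break in-between words..
echo "Varun\nNischal"
EOF

# grep -F matches the two characters backslash and n literally.
# IFS= keeps leading/trailing blanks, -r keeps backslashes as they are,
# and printf '%s\n' prints the line without interpreting escapes.
matches=$(grep -F '\n' "$tmp" |
    while IFS= read -r line; do printf '%s\n' "$line"; done)
printf '%s\n' "$matches"

rm -f "$tmp"
```

The loop is kept only to mirror the commands above; grep alone already prints the matching lines, so redirecting its output is enough when nothing else has to be done per line.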

Explore more to know more ;)

While working on DW-BI projects, chances are you will come across Ab Initio GDE. If you do, you may find this newly started blog series useful.

To run graphs built with Ab Initio GDE on the server where Ab Initio’s Co>Operating System is installed, one must deploy the graph as a “script”, if the GDE has not already done so automatically.

Blog Series

I’m starting this series based on my experience of the past few months, during which I’ve created several utilities that help reduce development effort to a certain extent. A brief overview of the utilities:

  1. Finding out various components used in a graph.
  2. Finding out various components used within 2 given phases.
  3. Finding out components that transform a particular field in a graph.
  4. Finding out components that use artifacts (like dml, xfr, etc.) in a graph.

This is just the beginning; many more utilities could be created by analyzing the deployed script for various purposes.
