taT4Nix | Convert EBCDIC to ASCII and Vice Versa

So far, I have run into this on data warehousing projects, though it probably happens in other domains too. EBCDIC files usually come from mainframe systems, and one often wants to convert them to ASCII in order to edit them with text editors on UNIX servers such as AIX.

I have used the following command several times to convert a file from EBCDIC to ASCII or vice versa. This is how it's done:

dd if=<ebcdic-file> of=<ascii-file> conv=ascii

Now you can modify the ASCII version and, once done, convert it back to EBCDIC for use by your application:

dd if=<ascii-file> of=<ebcdic-file> conv=ebcdic

If you are only replacing a given number of bytes with the same number of bytes containing different characters, the round trip should be smooth and the application reading the file should not have any issues.

However, I ran into problems after adding or deleting records in the ASCII version: once it was converted back to EBCDIC, the file was unreadable by the application, and reverting was difficult because I had not kept a backup of the EBCDIC version. So keep a copy of the original EBCDIC file before you start editing.
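To see what the two dd commands above actually do, here is a minimal round-trip sketch on a small, made-up sample file (all file names are hypothetical placeholders). It converts ASCII text to EBCDIC, dumps the first bytes with od, converts back, and compares the result with the original using cmp. It assumes a dd that implements the POSIX conversion tables, such as GNU coreutils dd.

```shell
#!/bin/sh
# Minimal round-trip sketch; file names are hypothetical placeholders.
set -e

printf 'ABC 123\n' > sample_ascii.txt

# ASCII -> EBCDIC (dd prints transfer statistics on stderr; discard them)
dd if=sample_ascii.txt of=sample_ebcdic.dat conv=ebcdic 2>/dev/null

# Show the first three EBCDIC bytes: 'A' 'B' 'C' become c1 c2 c3
od -An -tx1 -N3 sample_ebcdic.dat

# EBCDIC -> ASCII
dd if=sample_ebcdic.dat of=sample_roundtrip.txt conv=ascii 2>/dev/null

# Byte-for-byte comparison with the original ASCII file
cmp sample_ascii.txt sample_roundtrip.txt && echo "round trip OK"

# Before editing a real file, keep a backup of the EBCDIC original
cp sample_ebcdic.dat sample_ebcdic.dat.bak
```

If your mainframe file holds fixed-length records with no newline characters, dd's cbs= operand combined with conv=unblock (e.g. `conv=ascii,unblock cbs=<record-length>`) or conv=block (e.g. `conv=ebcdic,block cbs=<record-length>`) converts between fixed-length records and newline-terminated lines; check your platform's dd manual page before relying on it.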

Hope this helps :)

taT4Nix | Redirect STDERR to File or File Descriptor

$> cd /
$> find "$PWD" -type f

If you change to the root directory on any UNIX server and run the find command above, chances are you will see a long list of error messages mixed in with the search results.

Most of these errors will be "Permission denied". Since they don't help achieve the desired result, you may want to suppress them. In that scenario, do the following:

$> cd /
$> find "$PWD" -type f 2> /dev/null

There you go; simple, isn't it?

2> redirects STDERR to the file or file descriptor on the right-hand side of the redirection operator.

In our case, that is the special file /dev/null, which discards everything written to it. However, there may be scenarios where your command's output is redirected to a file, yet the error messages still show up on the terminal (where STDERR goes by default).

To handle that, do the following:

$> cd /
$> find "$PWD" -type f > "$HOME/sample_logfile.log" 2>&1

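Using 2>&1 redirects STDERR to wherever file descriptor 1 (STDOUT, i.e. &1) currently points. Since STDOUT has already been redirected to $HOME/sample_logfile.log, STDERR and STDOUT effectively end up in the same log file. The order matters: redirections are applied left to right, so putting 2>&1 before > $HOME/sample_logfile.log would leave STDERR on the terminal.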
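The three redirections can be tried without touching the root directory. This is a minimal sketch: emit_both is a hypothetical helper that writes one line to each stream, and the log file is placed in the current directory instead of $HOME so the example stays self-contained.

```shell
#!/bin/sh
# emit_both is a hypothetical helper: one line to each output stream.
emit_both() {
    echo "result line"        # written to STDOUT (fd 1)
    echo "error line" >&2     # written to STDERR (fd 2)
}

# 1) Discard STDERR: only "result line" reaches the terminal
emit_both 2> /dev/null

# 2) Both streams into one log file (fd 1 first, then fd 2 copies it)
LOG=./sample_logfile.log
emit_both > "$LOG" 2>&1
cat "$LOG"                    # shows both lines

# 3) Reversed order: fd 2 copies fd 1 while fd 1 is still the terminal,
#    so "error line" is printed and only "result line" lands in the file
emit_both 2>&1 > "$LOG"
```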

That’s all for today, keep scripting :)

taT4AI | Using Implicit Vectors Through String Functions

Heard of Ab Initio? It is an ETL tool for data warehousing projects, and arguably much more than that. To learn more about the software, refer to the official website: http://www.abinitio.com/

Recently, while coding an XFR (i.e. business transformation logic) for a project, I was asked about a field in the input record format that contains words separated by a single space. Only the second word holds significance, so how can one extract exactly that word?

As I have been working with vectors in XFRs for some time now, I proposed the following solution:

out.fld_name :: string_split(in.fld_name_having_space_separated_words, " ")[1];

string_split() returns its result as a vector, which is comparable to an array in many programming languages. The trick uses that implicit vector directly: indexing the call with [1] assigns the second element, since vector indices start at 0.
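For readers without Ab Initio at hand, the same "split, then take the second word" idea can be sketched in shell. This is only an analogue, not DML; second_word is a hypothetical helper name, and note that cut numbers fields from 1, so field 2 corresponds to the DML index [1].

```shell
#!/bin/sh
# Shell analogue (not Ab Initio DML) of:
#   string_split(in.fld_name_having_space_separated_words, " ")[1]
# second_word is a hypothetical helper name.
second_word() {
    # cut fields start at 1, so -f 2 is the second word (DML index [1]).
    # Without -s, a value containing no space is printed unchanged.
    printf '%s\n' "$1" | cut -d ' ' -f 2
}

second_word "alpha beta gamma"    # prints: beta
```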

Hope this trick works for you as well. If not, keep exploring.

taT4Nix | Search and Replace Nth Delimited Item

Recently, somebody asked whether it's possible to replace the Nth delimited item, with the pipe symbol as the delimiter. Suppose the text contains:

TXT="abc|def|ghi|jlk|mnop|qrst|uvw|xyz|123|456|7890"

Looking at the text, the 4th delimited item, jlk, is out of order. To correct it to jkl (i.e. N = 4), I first tried sed and ended up with a rather complex solution:

echo "$TXT" | sed -r 's/((([^|]*)\|){3})([^|]*)(.*)$/\1jkl\5/'
## General Solution
typeset -i N=$1; (( N-- ))
sed -r 's/((([^<delimiter>]*)<delimiter-with-or-without-esc-seq>){'$N'})([^<delimiter>]*)(.*)$/\1<replace-pattern>\5/' "$2"

A few days later, a colleague suggested a much simpler solution:

echo "$TXT" | sed -r 's/[^|]*/jkl/4'
## General Solution
typeset -i N=$1
sed -r 's/[^<delimiter>]*/<replace-pattern>/'"$N" "$2"

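Both worked, but the latter is easier to read and explain: the numeric flag at the end of the s command tells sed to replace only the Nth match of [^|]*, i.e. the Nth delimited item.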
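The colleague's approach wraps nicely into a reusable filter. This is a minimal sketch: replace_nth is a hypothetical helper name, the delimiter is fixed to the pipe symbol, and the replacement text is assumed not to contain '/', '&' or '\', which are special inside sed's s command.

```shell
#!/bin/sh
# replace_nth N REPLACEMENT: replace the Nth pipe-delimited item on each line.
# Hypothetical helper; REPLACEMENT must not contain '/', '&' or '\'.
replace_nth() {
    n=$1 replacement=$2
    # The numeric flag makes sed substitute only the Nth match of [^|]*
    sed "s/[^|]*/$replacement/$n"
}

TXT="abc|def|ghi|jlk|mnop|qrst|uvw|xyz|123|456|7890"
printf '%s\n' "$TXT" | replace_nth 4 jkl
# prints: abc|def|ghi|jkl|mnop|qrst|uvw|xyz|123|456|7890
```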

taT4Nix | Transform HTML to WIKI using GNU sed

Recently, I joined the forums at http://unix.stackexchange.com/, where I came across the following question:

http://unix.stackexchange.com/questions/11448/how-to-convert-a-href-http-xy-comxy-a-to-http-xy-com-xy

In simple terms, this is what was required:

<a href="something">sometext</a>
# transforms into the following wiki markup
[something sometext]

Here's what I think should be done using GNU sed:

sed -r 's,<a\s+href="([^"]*)">([^<]*)</a>,[\1 \2],g'

The parentheses group the matched data, which is then referenced by back-references (e.g. \1, \2, etc.).
In our scenario, we can identify two logical groups:

  1. Data between '<a href="' and '">'
  2. Data between '">' and '</a>'

Once the groups are identified, sed back-references let you format the output however the particular wiki markup requires.
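The expression can be wrapped in a small filter and tried on a line containing more than one link. This is a minimal sketch: html_links_to_wiki is a hypothetical name, -E replaces -r, and [[:space:]] stands in for GNU's \s so the same expression should also work with other seds that support -E. Like the original, it only handles links whose only attribute is href="...".

```shell
#!/bin/sh
# html_links_to_wiki is a hypothetical filter name.
html_links_to_wiki() {
    # \1 = href value, \2 = link text; the g flag converts every link on a line
    sed -E 's,<a[[:space:]]+href="([^"]*)">([^<]*)</a>,[\1 \2],g'
}

echo 'See <a href="http://xy.com">xy</a> and <a href="http://ab.com">ab</a>.' | html_links_to_wiki
# prints: See [http://xy.com xy] and [http://ab.com ab].
```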