taT4Nix | Transform HTML to WIKI using GNU sed

Recently, I have joined the forums – http://unix.stackexchange.com/ where I came across the following question;

http://unix.stackexchange.com/questions/11448/how-to-convert-a-href-http-xy-comxy-a-to-http-xy-com-xy

In simple words, this is what was required;

<a href="something">sometext</a>
# transforms into the following wiki markup
[something sometext]

Here’s what I feel should be done using GNU sed;

sed -r 's,<a\s+href=\"([^>]*)\">([^<]*)</a>,[\1 \2],g'

The curve-brackets helps in grouping pattern-matching data, which is then used-by back-references (eg. \1, \2, etc.)
So, in our scenario, we could identify two logical groups;

  1. Data between ‘<a href=”‘ and ‘”>’
  2. Data between ‘”>’ and ‘</a>’

Once groups are identified, using sed back-references, one can format it the way particular wiki markup requires.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s