taT4Nix | Transform HTML to WIKI using GNU sed

Recently, I have joined the forums – http://unix.stackexchange.com/ where I came across the following question;

http://unix.stackexchange.com/questions/11448/how-to-convert-a-href-http-xy-comxy-a-to-http-xy-com-xy

In simple words, this is what was required;

<a href="something">sometext</a>
# transforms into the following wiki markup
[something sometext]

Here’s what I feel should be done using GNU sed;

sed -r 's,<a\s+href=\"([^>]*)\">([^<]*)</a>,[\1 \2],g'

The curve-brackets helps in grouping pattern-matching data, which is then used-by back-references (eg. \1, \2, etc.)
So, in our scenario, we could identify two logical groups;

  1. Data between ‘<a href=”‘ and ‘”>’
  2. Data between ‘”>’ and ‘</a>’

Once groups are identified, using sed back-references, one can format it the way particular wiki markup requires.