taT4Nix | Transform HTML to WIKI using GNU sed

Recently, I joined the forum at http://unix.stackexchange.com/, where I came across a question about converting HTML links into wiki markup.


In simple terms, this is what was required:

<a href="something">sometext</a>
# transforms into the following wiki markup
[something sometext]

Here’s how I think it can be done using GNU sed:

sed -r 's,<a\s+href=\"([^>]*)\">([^<]*)</a>,[\1 \2],g'

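The parentheses group parts of the matched pattern, and each group can then be referred to with a back-reference (e.g. \1, \2, etc.). The -r option enables extended regular expressions, so the parentheses do not need to be escaped.
So, in our scenario, we can identify two logical groups: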

  1. Data between '<a href="' and '">'
  2. Data between '">' and '</a>'

Once the groups are identified, sed back-references let you rearrange them into whatever form a particular wiki markup requires.
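
To see the two groups in action, here is a minimal sketch that wraps the same substitution in a shell function and feeds it a line with two links. The function name html_links_to_wiki and the sample URLs are my own assumptions for illustration, and the sketch relies on GNU sed (for -r and \s).

```shell
#!/bin/sh
# html_links_to_wiki is a hypothetical helper name chosen for this example.
# It reads HTML on stdin and rewrites simple <a href="...">text</a> anchors
# into [url text] wiki links. Requires GNU sed for -r and \s.
html_links_to_wiki() {
    # Group 1 (\1): everything between <a href=" and ">
    # Group 2 (\2): everything between "> and </a>
    sed -r 's,<a\s+href="([^>]*)">([^<]*)</a>,[\1 \2],g'
}

# Sample input with two links on one line; the g flag converts both.
printf '%s\n' 'See <a href="http://example.com/">the example site</a> and <a href="/faq">the FAQ</a>.' \
    | html_links_to_wiki
```

This prints "See [http://example.com/ the example site] and [/faq the FAQ]." Note that the pattern only handles anchors whose sole attribute is href and whose link text contains no nested tags; anything more complex is probably better served by a real HTML parser.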

