[UPDATE: 09/28/2014]
I have mainly used Python for parsing, validating, and transforming text. Had I done the same work in a shell script, I would have ended up writing a variety of regular expressions anyway.
Getting Started
Well, Python is no different: to cook up regular expressions, you must first import the re module.
import re
So far, I have been able to use patterns in much the same way as I would with grep or sed. As a script evolves, I usually end up writing multiple search patterns. In Python, I find it intuitive to keep them in a dictionary of compiled search patterns (RegexObject). Here is the one I wrote to style unified differences:
regexDict = {
    # Hunk header, e.g. "@@ -1,3 +1,4 @@"
    'HEADER': re.compile(r"^@@ ([+-][0-9]+(,[0-9]+)? ?){1,2} @@$"),
    # Added line
    'ADD': re.compile(r"^\+"),
    # Deleted line
    'DEL': re.compile(r"^\-"),
}
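As a quick sanity check of the patterns above, here is a minimal sketch that runs them against a few made-up diff lines. The helper name classify and the sample lines are my own assumptions for illustration, and the code is written for Python 3 with raw strings.

```python
import re

# Same patterns as in the post, written as raw strings.
regexDict = {
    'HEADER': re.compile(r"^@@ ([+-][0-9]+(,[0-9]+)? ?){1,2} @@$"),
    'ADD': re.compile(r"^\+"),
    'DEL': re.compile(r"^\-"),
}


def classify(line):
    """Return the name of the first pattern that matches, or None (hypothetical helper)."""
    for name, pattern in regexDict.items():
        if pattern.search(line):
            return name
    return None


# Made-up sample lines: a hunk header, an addition, a deletion, and a context line.
samples = ["@@ -1,3 +1,4 @@", "+added line", "-removed line", " unchanged line"]
labels = [classify(s) for s in samples]
print(labels)  # ['HEADER', 'ADD', 'DEL', None]
```

A context line matches none of the three patterns, which is why it comes back as None.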
Search Recursively
The dictionary above has only three key-value pairs, so writing an if-else construct would be easy. But let us say such a dictionary is created dynamically and can hold any number of key-value pairs.
All you need to figure out is whether the data matches a particular search pattern. If it does, print the data, transform it, and so on. In this post, we will go one step further and redesign the if-else construct used to style unified differences, as follows.
with open(inputFile, 'r') as fileObj:
    ### Using Slice To Ignore First 2 Lines
    for line in fileObj.read().splitlines()[2:]:
        # Pass a fresh list of keys each time; the function consumes it with pop()
        fn__recurSearch(list(regexDict.keys()), line)
The for statement invokes the function with two arguments: a list of the dictionary keys (named iterator in the code) and the current line as a string. Let us look at the function definition below:
def fn__recurSearch(iterator, data):
    # No patterns left to try: the line matched none of them
    if not iterator:
        print(data)
        return
    # Take the next pattern name off the front of the list
    key = iterator.pop(0)
    matchObj = regexDict[key].search(data)
    # No match: recurse with the remaining patterns
    if not matchObj:
        fn__recurSearch(iterator, data)
        return
    # The line matched the pattern stored under key
    print(data)
The function looks straightforward; however, it simply dumps the file as is to STDOUT. If you have paid attention, you will have noticed that we have not added the logic to style unified differences! Well, that’s an exercise left for you; otherwise, I will try to cover it next time.
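To see which branch the recursion takes for each line, here is a minimal Python 3 sketch of the same idea that returns the matching key instead of printing. The name recur_search and the sample lines are assumptions of mine, not part of the original script. It also slices the key list rather than calling pop(0), so the caller's list is left untouched.

```python
import re

regexDict = {
    'HEADER': re.compile(r"^@@ ([+-][0-9]+(,[0-9]+)? ?){1,2} @@$"),
    'ADD': re.compile(r"^\+"),
    'DEL': re.compile(r"^\-"),
}


def recur_search(keys, data):
    """Return the first key in keys whose pattern matches data, else None."""
    if not keys:
        return None  # every pattern was tried and none matched
    key, rest = keys[0], keys[1:]
    if regexDict[key].search(data):
        return key
    # No match for this key: recurse on the remaining keys.
    return recur_search(rest, data)


# Made-up lines from the body of a unified diff.
lines = ["@@ -10,2 +10,3 @@", " context", "-old", "+new"]
matches = [recur_search(list(regexDict), line) for line in lines]
print(matches)  # ['HEADER', None, 'DEL', 'ADD']
```

Returning the key makes it easy to plug in a per-key action later, which is roughly what the styling logic needs.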
[UPDATE: 09/28/2014]
Let us add the logic to style unified differences, as follows:
def fn__recurSearch(iterator, data):
    # codeChunkList, trHTMLCode and fn__applyStyle are defined elsewhere in the script
    global codeChunkList
    # No patterns left to try: treat the line as context and style it
    if not iterator:
        ##print data
        codeChunkList.append(fn__applyStyle(data))
        return
    key = iterator.pop(0)
    matchObj = regexDict[key].search(data)
    # No match: recurse with the remaining patterns
    if not matchObj:
        fn__recurSearch(iterator, data)
        return
    # A new hunk header: emit the lines collected so far, then start over
    if key == 'HEADER':
        if codeChunkList:
            print(trHTMLCode % '<br />\n'.join(codeChunkList))
            codeChunkList = []
        return
    ##print data
    codeChunkList.append(fn__applyStyle(data))
Indeed, the recursive function keeps the logic simple and easy to follow. The new design produces the same output as the old one.
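The post does not show trHTMLCode, fn__applyStyle, or how the last chunk gets printed, so the following is only a self-contained Python 3 sketch with hypothetical stand-ins for all three. The names apply_style, style_diff, and styleDict, the HTML template, and the colours are my assumptions. It passes the chunk list as an argument instead of using a global, and flushes the final hunk after the loop because no later header would trigger it.

```python
import html
import re

regexDict = {
    'HEADER': re.compile(r"^@@ ([+-][0-9]+(,[0-9]+)? ?){1,2} @@$"),
    'ADD': re.compile(r"^\+"),
    'DEL': re.compile(r"^\-"),
}

# Hypothetical stand-ins: the original template and styles are not shown in the post.
trHTMLCode = "<tr><td>%s</td></tr>"
styleDict = {'ADD': 'color: green', 'DEL': 'color: red'}


def apply_style(data):
    """Wrap a line in a span, coloured by the first matching pattern (assumed styling)."""
    for key, style in styleDict.items():
        if regexDict[key].search(data):
            return '<span style="%s">%s</span>' % (style, html.escape(data))
    return '<span>%s</span>' % html.escape(data)


def recur_search(keys, data, chunks, rows):
    """Recursive search that collects styled lines per hunk."""
    if not keys:
        chunks.append(apply_style(data))  # matched nothing: context line
        return
    key, rest = keys[0], keys[1:]
    if not regexDict[key].search(data):
        recur_search(rest, data, chunks, rows)
        return
    if key == 'HEADER':
        if chunks:
            # A new hunk starts: emit the previous one as a table row.
            rows.append(trHTMLCode % '<br />\n'.join(chunks))
            del chunks[:]
        return
    chunks.append(apply_style(data))


def style_diff(lines):
    """Return one HTML table row per hunk for the given diff body lines."""
    chunks, rows = [], []
    for line in lines:
        recur_search(list(regexDict), line, chunks, rows)
    if chunks:  # flush the last hunk (an addition of this sketch)
        rows.append(trHTMLCode % '<br />\n'.join(chunks))
    return rows


diff_body = ["@@ -1,2 +1,2 @@", " keep", "-old", "+new", "@@ -10 +10 @@", "+tail"]
rows = style_diff(diff_body)
print("\n".join(rows))
```

Each hunk ends up as one table row, with its lines joined by line breaks, which mirrors the structure of the function above.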
WAIT, there’s more..
How about styling context differences? This approach can have multiple applications, depending on your problem. Give it a try, and feel free to share your thoughts.
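As a possible starting point for context differences, here is a minimal Python 3 sketch that only classifies lines from the body of diff -c output. The dictionary contextRegexDict, its key names, and the sample lines are assumptions of mine; the patterns cover the plain format only (no function names after the separator), and the styling itself is left out.

```python
import re

# Assumed patterns for the body of a context diff (after the two file header lines).
contextRegexDict = {
    'SEPARATOR': re.compile(r"^\*{15}$"),                            # hunk separator
    'OLD_RANGE': re.compile(r"^\*\*\* [0-9]+(,[0-9]+)? \*\*\*\*$"),  # e.g. "*** 1,3 ****"
    'NEW_RANGE': re.compile(r"^--- [0-9]+(,[0-9]+)? ----$"),         # e.g. "--- 1,3 ----"
    'CHANGE': re.compile(r"^! "),
    'ADD': re.compile(r"^\+ "),
    'DEL': re.compile(r"^- "),
}


def recur_search(keys, data):
    """Return the first key whose pattern matches data, else None."""
    if not keys:
        return None
    if contextRegexDict[keys[0]].search(data):
        return keys[0]
    return recur_search(keys[1:], data)


# Made-up hunk in context format.
sample = [
    "***************",
    "*** 1,3 ****",
    "  unchanged",
    "! old text",
    "- removed",
    "--- 1,3 ----",
    "  unchanged",
    "! new text",
    "+ added",
]
context_labels = [recur_search(list(contextRegexDict), line) for line in sample]
print(context_labels)
```

Only the dictionary changed; the recursive function is the same as before, which is the appeal of keeping the patterns in one place.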