A Handy Little Tool You Might Not Know About

I had never touched Unix in my life before going to college, but it was the development environment of choice for most of the upper-level coursework at school (the lower-level courses used Windows + Emacs, which, incidentally, is not a combination I would recommend given the excellent Java IDEs that exist now), so I’ve gotten very used to having my good friends ls, grep, wc, and gawk available on my machine. I use Cygwin to get them on my Windows box because, while Linux is a lovely operating system in many ways, I rather enjoy having iTunes and games available.

Anyhow, I had a task yesterday. I keep a local copy of my website, and all of its pages share a very similar structure because the navigational elements are embedded in static HTML. NVU doesn’t support any official templating feature, so updating the navigational elements means updating approximately twenty pages by hand. There was one particular snippet of HTML that I wanted to replace with another, globally.

Normally, I would do this in a gawk script (you can do just about anything in a gawk script), but it gets a little messy:

BEGIN {
    startTag = "<a href=\"guarantee.htm\">Money-back Guarantee"
    replacementTag = "<a href=\"guarantee.htm\"><img src=\"images/money-back-guarantee.htm\" alt=\"Money-back Guarantee\" />"
}

# Lines without the snippet are printed unchanged
($0 !~ startTag) { print $0 }

# Lines with the snippet get the substitution, then are printed
($0 ~ startTag) {
    gsub(startTag, replacementTag, $0)
    print $0
}

Anyhow, that is the way to do it in gawk: define the regular expression to search for, print every line that doesn’t match it, and do some surgery on the ones that do. Note that you’ll have to run this on every file in your web directory, most easily with a bash script. However, if you redirect the output straight back into the input file, the file ends up empty, because the shell truncates it before gawk ever gets to read it. So you’ll want a quick shell loop that writes the output to a temp file and then moves that over the original. Something like, oh,

for file in *.htm; do
  # Write to a temp file first, and replace the original only if gawk succeeded
  gawk -f myscript "$file" > temp.htm && mv temp.htm "$file"
done
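Redirecting output straight back into the input file empties it because the shell opens the > target, truncating it to zero bytes, before the command starts, so the command reads an already-empty file. A minimal sketch to see this for yourself, using a throwaway file named demo.txt (purely illustrative):

```shell
#!/bin/sh
# demo.txt is a throwaway file used only for this illustration.
printf 'some text\n' > demo.txt

# The shell opens demo.txt for writing (truncating it to zero bytes)
# before sed starts, so sed reads an empty file and writes nothing back.
sed 's/some/other/' demo.txt > demo.txt
```

Afterwards demo.txt is empty rather than containing “other text”, which is why the loop writes to temp.htm and only then moves it over the original.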

But that is two scripts, and two scripts is too complicated! Enter sed, a handy little stream-processing utility. Sed is perfect when you just need a quick substitution on a regular expression:

sed -e "s/myregularexpression/myreplacement/g"

There, you’re done. In my case, I had to finagle the HTML snippet into a quoted regular expression, which took quite a few escape characters, but it still only took about 30 seconds. This saves about half the typing of the gawk version, and you can’t accidentally bork the matching logic, which is easy to do in gawk: forget the ! before the ~, or put the (string ~ regex) block before the (string !~ regex) block (the rewritten line no longer matches, so the second block prints it a second time), and you’re screwed.
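Applied to the snippet from the gawk example, the whole job might collapse into something like the following sketch. It assumes GNU sed (the version Cygwin ships), whose -i option rewrites each file in place, so neither the temp file nor the loop is needed; the function name replace_guarantee_link is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: swap the guarantee link for the image link in every
# file given as an argument. Assumes GNU sed, where -i edits in place.
# "|" is the s/// delimiter so the "/" in "/>" needs no escaping, and the
# single quotes keep the shell away from the HTML's double quotes.
replace_guarantee_link() {
  sed -i 's|<a href="guarantee\.htm">Money-back Guarantee|<a href="guarantee.htm"><img src="images/money-back-guarantee.htm" alt="Money-back Guarantee" />|g' "$@"
}
```

Usage would be replace_guarantee_link *.htm from the web directory. The g flag mirrors gawk’s gsub by replacing every occurrence on a line, not just the first.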

Of course, you could always write it in Perl.  That would probably take a single line and look like your cat had just tap danced across your keyboard.  But if you want a simple, intuitive way to do fairly powerful data processing, you’ll see sed is your man…  err, I mean, you should see man sed.


9 Comments on “A Handy Little Tool You Might Not Know About”

  1. Ellen Says:

    I may be missing something here, but are you doing anything that a good text editor, like Boxer or UltraEdit, wouldn’t do for you? You can just drag all your pages into the editor window, pick “replace in all open files”, and “save all”. Both Boxer and UE handle as complicated regular expressions as I’ve ever needed to try, as well as case differences and anything else I’ve wanted.

    You convinced me to get serious about launchers; let me try to return the favor. In my opinion, a good text editor will help you more than any other single program.


  2. “…Note that you’ll have to run this on every file in your web directory, with the easiest way probably being a bash script…”

    Patrick,

    Ahh, unix is so much fun. A handy way to run a command on all files in a directory structure is via the ‘find’ command:

    ex: find . -name "*.html" -exec gawk -f script.awk {} \;

    where {} is replaced by the html file name. I use this command quite often – hope it helps.
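    The same find technique pairs naturally with sed’s in-place mode to cover subdirectories as well. A minimal sketch, assuming GNU find and GNU sed; the function name replace_in_tree and its arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: replace_in_tree DIR PATTERN REPLACEMENT
# Applies one sed substitution, in place, to every *.htm file under DIR,
# recursively. Assumes GNU sed (-i). "{} +" hands many file names to a
# single sed call instead of starting one sed per file.
# PATTERN and REPLACEMENT must not contain the "|" delimiter.
replace_in_tree() {
  find "$1" -type f -name '*.htm' -exec sed -i "s|$2|$3|g" {} +
}
```

    For example, replace_in_tree . 'old snippet' 'new snippet' would rewrite every .htm file below the current directory.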

  3. bitsdujour Says:

    Am I missing something? What does this do that a good text editor – Boxer or UltraEdit, just to name my favorite two – won’t do for you?

    (Sorry if this is a duplicate; I tried posting it earlier this morning, but may have missed the button).

  4. Frederik Slijkerman Says:

    Of course, you could also just do a global Find/Replace in Dreamweaver 🙂


  5. You shouldn’t have twenty copies of the same navigational code in twenty static files in the first place. Use SSI (server side includes) in the static files and include ONE file with the navigational code. Then you’ll have to change the code only once in one place.
    The include statement in SSI notation:
    The including files are regular HTML files, but they usually need a .shtml extension (depending on the server configuration).


  6. Sorry, the web form swallowed the SSI notation (leave out the backslashes):

  7. rhubarb Says:

    I haven’t used sed since my unix days, but I was looking for a solution to this problem myself a few years ago and came across a perl script called prep (perl replace).
    It’s a very simple script (I’ve modified it myself), short enough to post here if you can’t find it on Google (let me know).

    Anyway prep is nice because it runs in an interactive mode asking for the arguments at the command line (ah, remember the 80s!)

    This is kind of lame if you were to use it every day, but in reality you hardly ever need these tools, and when you do, you don’t want to have to learn them all over again.

    Here’s a typical session with prep:
    [c:\]prep
    Multi Line Perl Execution Program V2.0
    Use -h for help.
    Start directory? [.]
    File pattern:*.htm
    Recurse subdirectories? [n]y
    Save undo information to file [none]?undo.txt
    Verbose mode [y]:
    Keep original file times? [n]y
    Command to execute for each line:
    s/myregularexpression/myreplacement/

    The defaults are in []; you just hit Enter to accept them.
    So instead of learning all the command-line args (though you can run it that way), it asks you for the directory, whether you want to recurse, whether you want to change the files but keep the file times, etc.

    Note the undo option: I specified undo.txt, so it will create a file of that name and fill it with diff information on all the changes it makes in all the files. That means I can run prep again with -u undo.txt and it will, obviously, undo all my changes. Cool eh?

    Also note that it’s not limited to replacing. I entered the s/…/…/ command to do a replace, but I could have entered any Perl command to be run on every line of the files.

    Okay, found a link:
    http://peter.verhas.com/progs/perl/prep/index.html

  8. Mike Blonder Says:

    Good to hear of your experience with SED. We publish thousands of static html pages for ecommerce customers with SED, AWK, VIM and a Python APP (check out the Walk plugin for VIM, it is great!).

    Cheers for SED

    Mike Blonder

  9. Peter Says:

    Nice to hear that some use my little tool 🙂

