Sunday, October 21, 2007

Have you ever wanted to do more than use the basic find-and-replace functions in Word? Wildcard characters and regular expressions can make those oper

By Colin Wilcox,
Graham Mayor, and Klaus Linke

Applies to
Microsoft Word 97, 2000, and 2002

Have you ever had to make a large number of repetitive changes to a document by hand? For example, have you ever had to find and remove duplicate rows from a large table, or transpose a list of names (change them from "Colin Wilcox" to "Wilcox, Colin")? That type of repetitive find-and-replace work gets old in a big hurry, doesn't it?
You can automate many of those find-and-replace tasks. Microsoft Word provides a set of wildcard characters that you can use to build regular expressions, combinations of literal text and wildcard characters. You can use regular expressions to find text that matches a given pattern and then replace those matches with new text.
If this all sounds complex, don't worry. We'll introduce it in easy steps, explain things as we go, and provide several working examples. You can use the information in this column with Word 97, 2000, and 2002. The user interfaces vary slightly between the versions, but you can accomplish the tasks described here with each version.
A quick spin through the jargon
To start, let's define a couple of terms:
A wildcard character is a keyboard character that you can use to represent one or many characters. For example, the asterisk (*) typically represents any string of characters, and the question mark (?) typically represents a single character.
In our case, a regular expression is a combination of literal and wildcard characters that you use to find and replace patterns of text. The literal text characters indicate text that must exist in the target string of text. The wildcard characters indicate the text that can vary in the target string.
That may seem a bit abstract, but you've seen (and most likely used) wildcard characters and regular expressions since you first began computing. For example, the Open dialog box (on the File menu, click the Open command) uses the asterisk wildcard character extensively. And if you ever used the MS-DOS operating system, you probably used a command and a simple regular expression to copy files:
copy *.doc a:
That command uses the asterisk wildcard character and the .doc literal text string to copy a set of Word documents to the disk in drive A (traditionally a floppy disk drive). If you look around a bit, you'll see that Microsoft Windows® and the Microsoft Office applications use wildcard characters everywhere.
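To see the same kind of matching outside MS-DOS, here is a minimal Python sketch that uses the standard library's fnmatch module. The file names are made up for illustration, and the select helper is a hypothetical name, not part of Word or MS-DOS.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
file_names = ["report.doc", "notes.txt", "letter.doc", "budget.xls"]


def select(pattern, names):
    """Return the names that match a wildcard pattern such as '*.doc'."""
    return [name for name in names if fnmatchcase(name, pattern)]


# The asterisk stands for any run of characters; ".doc" is literal text.
word_documents = select("*.doc", file_names)

# The question mark stands for exactly one character.
one_character = select("?otes.txt", file_names)

print(word_documents)
print(one_character)
```

The pattern "*.doc" keeps only the two names that end in .doc, which is the same selection the copy command makes.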
Try it!
The steps in this section explain how to use a regular expression that transposes names. Keep in mind that you always use the Find and Replace dialog box to run your regular expressions. Also, remember that if an expression doesn't work as expected, you can always press CTRL+Z to undo your changes, and then try another expression.
To transpose names
Start Word and open a new, blank document.
Copy this table and paste it into the document.
Josh Barnhill
Doris Hartwig
Tamara Johnston
Daniel Shimshoni
Press CTRL+F to open the Find and Replace dialog box.
If you don't see the Use wildcards check box, click More, and then select the check box. If you don't select the check box, Word treats the wildcard characters as text.
Click the Replace tab, and then enter the following characters in the Find what box. Make sure you include the space between the two sets of parentheses: (<*>) (<*>)
In the Replace with box, enter the following characters. Make sure you include the space between the comma and the second backslash: \2, \1
Select the table, and then click Replace All. Word transposes the names and separates them with a comma, like so:
Barnhill, Josh
Hartwig, Doris
Johnston, Tamara
Shimshoni, Daniel
At this point, you may wonder what to do if some or all of your names contain middle initials. See the first example in "Putting regular expressions to work in Word" for more information.
The next section explains how those regular expressions work.
What makes the expression tick
From here on, keep this principle in mind: The content of a document controls most (but not all) of the design of your regular expressions. For example, in the sample table you used earlier, each cell contained two words. If the cell contained two words and a middle initial, you'd use a different expression.
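As a rough illustration of that principle, the following Python sketch uses one pattern for two-word names and a different one for names with a middle initial. It relies on Python's re module rather than Word's wildcard syntax, and the patterns, the transpose helper, and the name "Josh A. Barnhill" are assumptions made up for this example.

```python
import re

# Two words only, for example "Josh Barnhill".
TWO_WORDS = re.compile(r"^(\w+) (\w+)$")

# First name, a one-letter initial followed by a period, then last name.
WITH_INITIAL = re.compile(r"^(\w+) (\w\.) (\w+)$")


def transpose(name):
    """Return 'Last, First' or 'Last, First M.' depending on the name's shape."""
    match = WITH_INITIAL.match(name)
    if match:
        first, initial, last = match.groups()
        return f"{last}, {first} {initial}"
    match = TWO_WORDS.match(name)
    if match:
        return f"{match.group(2)}, {match.group(1)}"
    # Leave anything that fits neither shape unchanged.
    return name


print(transpose("Josh Barnhill"))
print(transpose("Josh A. Barnhill"))
```

The two-word pattern cannot match the second name at all, so the content of the text decides which pattern applies.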
Let's examine each expression from the inside out:
In the first expression, (<*>) (<*>):
The asterisk (*) returns all the text in the word.
The less than and greater than symbols (< >) mark the start and end of each word, respectively. They ensure that the search returns a single word.
The parentheses and the space between them divide the words into distinct groups: (first word) (second word). The parentheses also indicate the order in which you want search to evaluate each expression.
In other words, the expression says: "Find both words."
Note: Searching on this expression, (*) (*>), produces the same results. However, the expression in the example is easier to describe, and you should use restricting characters whenever you can, because doing so ensures greater accuracy in your results.
In the second expression, \2, \1:
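The backslash (\) works with the numbers to serve as placeholders for the groups in the first expression: \1 stands for the first group and \2 for the second. (You can also use the backslash to find characters that otherwise act as wildcards. See the next section for more information.)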
The comma after the first placeholder inserts the correct punctuation between the transposed names.
In other words, the expression says: "Write the second word, add a comma, write the first word."
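For readers who know a programming language, the same find-and-replace can be approximated with Python's re module. This is only a sketch of the idea, not Word's own engine: \b stands in for the < and > word markers, \w+ stands in for the asterisk inside a single word, and the transpose function name is made up for this example.

```python
import re

names = ["Josh Barnhill", "Doris Hartwig", "Tamara Johnston", "Daniel Shimshoni"]

# Rough analogue of Word's (<*>) (<*>): two word groups separated by a space.
PATTERN = re.compile(r"(\b\w+\b) (\b\w+\b)")


def transpose(name):
    """Write the second group, a comma and a space, then the first group."""
    # \2 and \1 are placeholders for the second and first groups,
    # just as in Word's Replace with box.
    return PATTERN.sub(r"\2, \1", name)


transposed = [transpose(name) for name in names]

for line in transposed:
    print(line)
```

The replacement string is character for character the one entered in the Replace with box, which shows that the numbered placeholders carry over between the two tools.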