(defun translate-region-into-German (start end)
  (interactive "r")
  ...
The r option to interactive fills in the two arguments start and end when the function is called interactively. If the function is called from other Lisp code, however, both arguments must be supplied explicitly. The usual way to do this is:
(translate-region-into-German (point) (mark))
But you need not call it in this way. If you wanted to use this function to write another function called translate-buffer-into-German, you would only need to write the following as a "wrapper":
(defun translate-buffer-into-German ()
  (translate-region-into-German (point-min) (point-max)))
In fact, it is best to avoid using point and mark within Lisp code unless doing so is really necessary; use local variables instead. Try not to write Lisp functions as lists of commands a user would invoke; that sort of behavior is better suited to macros (see Chapter 6).
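The region-plus-wrapper pattern can be tried out with a runnable stand-in. The sketch below is only an illustration: my-shout-region and my-shout-buffer are hypothetical names, and upcasing takes the place of the elided translation code. Note that the positions are held in local variables rather than read from point and mark.

```elisp
;; Hypothetical stand-in for translate-region-into-German:
;; it upcases the region instead of translating it.
(defun my-shout-region (start end)
  "Upcase the text between START and END."
  (interactive "r")
  ;; Local variables hold the bounds, so the function works even if
  ;; a Lisp caller passes the two positions in reverse order.
  (let ((beg (min start end))
        (fin (max start end)))
    (save-excursion
      (goto-char beg)
      ;; Remove the old text and insert its upcased form in its place.
      (insert (upcase (delete-and-extract-region beg fin))))))

(defun my-shout-buffer ()
  "Upcase the whole buffer by wrapping `my-shout-region'."
  (interactive)
  (my-shout-region (point-min) (point-max)))
```

Called interactively, my-shout-region receives the region boundaries from the "r" code; called from my-shout-buffer, it receives whatever positions the wrapper chooses to pass.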
11.3.2 Regular Expressions
Regular expressions (regexps) provide much more powerful ways of dealing with text. Although most beginning Emacs users tend to avoid commands that use regexps, like replace-regexp and re-search-forward, regular expressions are widely used within Lisp code. Such modes as Dired and the programming language modes would be unthinkable without them. Regular expressions require time and patience to become comfortable with, but doing so is well worth the effort for Lisp programmers, because they are one of the most powerful features of Emacs, and many things are not practical to implement in any other way.
One trick that can be useful when you are experimenting with regular expressions is to type some text that corresponds to what you're trying to match into a scratch buffer, and then use isearch-forward-regexp (C-M-s) to build up the regular expression. The interactive, immediate feedback of an incremental search can show you the pieces of the regular expression in action in a way that is unique to Emacs.
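A similar experiment can be run from Lisp, which is handy once a regular expression is headed for a program anyway. The following is a minimal sketch, not a standard Emacs facility: my-collect-matches is a hypothetical helper that puts sample text in a temporary buffer and returns every substring the regular expression matches there.

```elisp
;; Hypothetical helper for experimenting with a regexp from Lisp.
(defun my-collect-matches (regexp text)
  "Return a list of every substring of TEXT that REGEXP matches."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((matches '()))
      (while (and (not (eobp))
                  (re-search-forward regexp nil t))
        (push (match-string 0) matches)
        ;; An empty match leaves point where it was; step forward
        ;; by hand so the loop cannot run forever.
        (when (and (= (match-beginning 0) (match-end 0))
                   (not (eobp)))
          (forward-char 1)))
      (nreverse matches))))

(my-collect-matches "prog[a-z]*" "the programmer wrote a program")
;; returns ("programmer" "program")
```

Evaluating the last form in the scratch buffer with C-j shows the result immediately, so the expression can be adjusted and retried in the same spirit as the incremental search.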
We introduce the various features of regular expressions by way of a few examples of search-and-replace situations; such examples are easy to explain without introducing lots of extraneous details. Afterward, we describe Lisp functions that go beyond simple search-and-replace capabilities with regular expressions. The following are examples of searching and replacing tasks that the normal search/replace commands can't handle or handle poorly:
1. You are developing code in C, and you want to combine the functionality of the functions read and readfile into a new function called get. You want to replace all references to these functions with references to the new one.
2. You are writing a troff document using outline mode, as described in Chapter 7. In outline mode, headers of document sections have lines that start with one or more asterisks. You want to write a function called remove-outline-marks to get rid of these asterisks so that you can run troff on your file.
3. You want to change all occurrences of program in a document, including programs and program's, to module/modules/module's, without changing programming to moduleming or programmer to modulemer.
4. You are working on documentation for some C software that is being rewritten in Java. You want to change all the filenames in the documentation from <filename>.c to <filename>.java, since .java is the extension the javac compiler uses.
5. You just installed a new C++ compiler that prints error messages in German. You want to modify the Emacs compile package so that it can parse the error messages correctly (see the end of Chapter 9).
We will soon show how to use regular expressions to deal with these examples, which we refer to by number. Note that this discussion of regular expressions, although more comprehensive than that in Chapter 3, does not cover every feature; those that it doesn't cover are redundant with other features or relate to concepts that are beyond the scope of this book. It is also important to note that the regular expression syntax described here is for use with Lisp strings only; there is an important difference between the regexp syntax for Lisp strings and the regexp syntax for user commands (like replace-regexp), as we will see.
11.3.2.1 Basic operators
Regular expressions began as an idea in theoretical computer science, but they have found their way into many nooks and crannies of everyday, practical computing. The syntax used to represent them may vary, but the concepts are much the same everywhere. You probably already know a subset of regular expression notation: the wildcard characters used by the Unix shell or Windows command prompt to match filenames. The Emacs notation is a bit different; it is similar to the notations used by the language Perl, by editors like ed and vi, and by Unix software tools like lex and grep. So let's start with the Emacs regular expression operators that resemble Unix shell wildcard characters, which are listed in Table 11-5.
Table 11-5. Basic regular expression operators
| Emacs operator | Shell equivalent | Function |
|---|---|---|
| . | ? | Matches any character. |
| .* | * | Matches any string. |
| [abc] | [abc] | Matches a, b, or c. |
| [a-z] | [a-z] | Matches any lowercase letter. |
For example, to match all filenames beginning with program in the Unix shell, you would specify program*. In Emacs, you would say program.*. To match all filenames beginning with a through e in the shell, you would use [a-e]* or [abcde]*; in Emacs, it's [a-e].* or [abcde].*. In other words, the dash within the brackets specifies a range of characters.[78] We will provide more on ranges and bracketed character sets shortly.
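These patterns can be checked from Lisp with string-match, which returns the index at which a match starts, or nil if there is none. One difference from the shell is worth seeing in action: a shell wildcard must account for the whole filename, whereas an Emacs regular expression can match anywhere in the string. The sketch below therefore uses a hypothetical helper, my-matches-from-start-p, that accepts a match only when it begins at the first character.

```elisp
;; Hypothetical helper: non-nil only when the match starts at index 0.
(defun my-matches-from-start-p (regexp name)
  "Return t if REGEXP matches NAME starting at its first character."
  (eq (string-match regexp name) 0))

(my-matches-from-start-p "program.*" "program.c")    ; t
(my-matches-from-start-p "program.*" "myprogram.c")  ; nil: the match starts at index 2
(my-matches-from-start-p "[a-e].*" "delta.txt")      ; t
(my-matches-from-start-p "[abcde].*" "delta.txt")    ; t, the same set written out
(my-matches-from-start-p "[a-e].*" "zulu.txt")       ; nil: no character from a through e
```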
To match literally a character that normally serves as a regular expression operator, you need to precede it with a double backslash, as in \\* to match an asterisk. Why a double backslash? The reason has to do with the way Emacs Lisp reads and decodes strings. When Emacs reads a string in a Lisp program, it decodes the backslash-escaped characters and thus turns double backslashes into single backslashes. If the string is being used as a regular expression (that is, if it is being passed to a function that expects a regular expression argument), that function uses the single backslash as part of the regular expression syntax. For example, given the following line of Lisp:
[78] Emacs uses ASCII codes (on most machines) to build ranges, but you shouldn't depend on this fact; it is better to stick to dependable things, like all-lowercase or all-uppercase alphabet subsets or [0-9] for digits, and avoid potentially nonportable items, like [A-z] and ranges involving punctuation characters.