;; GNU-style error messages.
;; This used to reject spaces and dashes in file names,
;; but they are valid now; so I made it more strict about the error
;; message that follows.
("\\(\\([a-zA-Z]:\\)?[^:(\t\n]+\\)(\\([0-9]+\\)) \
: \\(error\\|warning\\) C[0-9]+:" 1 3)
;; Caml compiler:
;; File "foobar.ml", lines 5-8, characters 20-155: blah blah
("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \
\\([0-9]+\\)" 1 2 3)
;; Cray C compiler error messages
("\\(cc\\| cft\\)-[0-9]+ c\\(c\\|f77\\): ERROR \\([^,\n]+, \\)* File = \
\\([^,\n]+\\), Line = \\([0-9]+\\)" 4 5)
;; Perl -w:
;; syntax error at automake line 922, near "':'"
;; Perl debugging traces
;; store::odrecall('File_A', 'x2') called at store.pm line 90
(".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" 1 2)
;; See http://ant.apache.org/faq.html
;; Ant Java: works for jikes
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):\\([0-9]+\\):[0-9]+:[0-9]\
+:" 1 2 3)
;; Ant Java: works for javac
("^\\s-*\\[[^]]*\\]\\s-*\\(.+\\):\\([0-9]+\\):" 1 2)
)
This is a list of elements that have at least three parts each: a regular expression and two numbers. The regular expression matches error messages in the format used by a particular compiler or tool. The first number tells Emacs which of the matched subexpressions contains the filename in the error message; the second number designates which of the subexpressions contains the line number. (There can also be additional parts at the end: a third number giving the position of the column number of the error, if any, and any number of format strings used to generate the true filename from the piece found in the error message, if needed. For more details about these, look at the actual file, as described below.)
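To make the layout of an element concrete, here is a minimal sketch that takes one element apart and applies it to a line of compiler output. The function name my-apply-error-element is hypothetical (it is not part of Emacs), and the sketch only handles the regular expression plus the two or three numbers; any trailing format strings are ignored.

```elisp
;; Hypothetical helper, for illustration only: not part of compile.el.
(defun my-apply-error-element (element text)
  "Return (FILE LINE COLUMN) found in TEXT using ELEMENT, or nil if no match.
ELEMENT has the form (REGEXP FILE-INDEX LINE-INDEX [COLUMN-INDEX])."
  (let ((regexp (nth 0 element))
        (file-index (nth 1 element))
        (line-index (nth 2 element))
        (column-index (nth 3 element))) ; nil when there is no third number
    (when (string-match regexp text)
      (list (match-string file-index text)
            (string-to-number (match-string line-index text))
            (and (integerp column-index)
                 (string-to-number (match-string column-index text)))))))

;; The Caml element from the list above, written on one line.
(my-apply-error-element
 '("^File \"\\([^,\" \n\t]+\\)\", lines? \\([0-9]+\\)[-0-9]*, characters? \\([0-9]+\\)" 1 2 3)
 "File \"foobar.ml\", lines 5-8, characters 20-155: blah blah")
;; => ("foobar.ml" 5 20)
```

The three numbers 1 2 3 simply select which parenthesized group supplies the filename, the line, and the column.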
For example, the element in the list dealing with Perl contains the regular expression:
".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]"
followed by 1 and 2, meaning that the first parenthesized subexpression contains the filename and the second contains the line number. So if you have Perl's warnings turned on—you always do, of course—you might get an error message such as this:
syntax error at monthly_orders.pl line 1822, near "$"
The regular expression skips everything up to " at ". It then finds monthly_orders.pl, the filename, as the match to the first subexpression "[^ \n]+" (one or more characters that are neither spaces nor newlines), and 1822, the line number, as the match to the second subexpression "[0-9]+" (one or more digits).
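The walk-through above can be checked directly with string-match and match-string. This is only a sketch: the function name my-perl-error-location is made up for the illustration, and the second sample is the debugging trace quoted in the list's comments with a newline appended, since the regular expression needs a comma, period, or newline after the line number.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-perl-error-location (text)
  "Return (FILE LINE) from a Perl-style message in TEXT, or nil."
  (when (string-match ".* at \\([^ \n]+\\) line \\([0-9]+\\)[,.\n]" text)
    (list (match-string 1 text)                       ; filename
          (string-to-number (match-string 2 text))))) ; line number

(my-perl-error-location
 "syntax error at monthly_orders.pl line 1822, near \"$\"")
;; => ("monthly_orders.pl" 1822)

(my-perl-error-location
 "store::odrecall('File_A', 'x2') called at store.pm line 90\n")
;; => ("store.pm" 90)
```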
For the most part, these regular expressions are documented pretty well in their definitions. Understanding them in depth can still be a challenge, and writing them even more so! Suppose we want to tackle Example 5 by adding an element to this list for our new C++ compiler that prints error messages in German. In particular, it prints error messages like this:
Fehler auf Zeile linenum in filename: text of error message
Here is the element we would add to compilation-error-regexp-alist:
("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
In this case, the second parenthesized subexpression matches the filename, and the first matches the line number.
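Before installing a new element, it can help to try the regular expression on a sample message. The sketch below does that; the function name my-german-error-location and the sample message (main.cpp, line 42) are invented for the illustration and do not come from any real compiler.

```elisp
;; Hypothetical helper, for illustration only.
(defun my-german-error-location (text)
  "Return (FILE LINE) from a \"Fehler auf Zeile\" message in TEXT, or nil."
  (when (string-match "Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" text)
    ;; Group 2 is the filename and group 1 is the line number,
    ;; which is why the element ends in 2 1 rather than 1 2.
    (list (match-string 2 text)
          (string-to-number (match-string 1 text)))))

(my-german-error-location
 "Fehler auf Zeile 42 in main.cpp: unerwartetes Symbol")
;; => ("main.cpp" 42)
```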
To add this to compilation-error-regexp-alist, we need to put these lines in .emacs:
(setq compilation-error-regexp-alist
      (cons '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)
            compilation-error-regexp-alist))
Notice how this example resembles our example (from Chapter 9) of adding support for a new language mode to auto-mode-alist.
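One caveat: consing onto compilation-error-regexp-alist assumes the variable already has a value when .emacs runs. If compile.el has not been loaded at that point, the variable may still be unbound. A possible alternative, sketched here on the assumption that your Emacs provides with-eval-after-load (Emacs 24.4 and later), defers the change until the compile library is loaded:

```elisp
;; A sketch, not the only way to do this: wait for compile.el, then
;; add the element. add-to-list skips the addition if an equal element
;; is already present, so reloading .emacs does not create duplicates.
(with-eval-after-load 'compile
  (add-to-list 'compilation-error-regexp-alist
               '("Fehler auf Zeile \\([0-9]+\\) in \\([^: \t]+\\):" 2 1)))
```

By default add-to-list puts the new element at the front of the list, just as the cons form does.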
11.3.2.5 Regular expression operator summary
Table 11-6 concludes our discussion of regular expression operators with a reference list of all the operators covered.
Table 11-6. Regular expression operators
| Operator | Function |
|---|---|
| . | Match any character (except newline). |
| * | Match 0 or more occurrences of preceding char or group. |
| + | Match 1 or more occurrences of preceding char or group. |
| ? | Match 0 or 1 occurrences of preceding char or group. |
| [...] | Set of characters; see below. |
| \\( | Begin a group. |
| \\) | End a group. |
| \\| | Match the subexpression before or after \\|. |
| ^ | At beginning of regexp, match beginning of line or string. |
| $ | At end of regexp, match end of line or string. |
| \n | Match Newline within a regexp. |
| \t | Match Tab within a regexp. |
| \\< | Match beginning of word. |
| \\> | Match end of word. |
| The following operators are meaningful within character sets: | |
| ^ | At beginning of set, treat set as chars not to match. |
| - (dash) | Specify range of characters. |
| The following is also meaningful in regexp replace strings: | |
| \\n (n is a digit) | Substitute portion of match within the nth \\( and \\), counting from left \\( to right, starting with 1. |
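As a quick check of several operators from the table, the expressions below can be evaluated one at a time (for example, with C-x C-e in the *scratch* buffer). The sample strings are made up for the illustration; string-match returns the index where the match starts, counting from 0.

```elisp
;; Range inside a character set: the first run of letters a through f.
(string-match "[a-f]+" "xyz-bad-xyz")                ; => 4

;; ^ at the start of a set negates it: the first non-lowercase character.
(string-match "[^a-z]" "abc7def")                    ; => 3

;; \\| chooses between alternatives inside a group.
(string-match "\\(error\\|warning\\)" "foo.c: warning: unused") ; => 7

;; \\< and \\> anchor to word boundaries, so the "cat" inside
;; "concatenate" is skipped.
(string-match "\\<cat\\>" "concatenate cat")         ; => 12

;; \\1 and \\2 in the replacement refer to the first and second groups.
(replace-regexp-in-string "\\([a-z]+\\) \\([a-z]+\\)" "\\2 \\1"
                          "hello world")             ; => "world hello"
```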