(define-binary-type iso-8859-1-char ()
(:reader (in)
(let ((code (read-byte in)))
(or (code-char code)
(error "Character code ~d not supported" code))))
(:writer (out char)
(let ((code (char-code char)))
(if (<= 0 code #xff)
(write-byte code out)
(error "Illegal character for iso-8859-1 encoding: character: ~c with code: ~d" char code)))))
Now defining the ISO 8859-1 string types is trivial using the short form of define-binary-type as follows:
(define-binary-type iso-8859-1-string (length)
(generic-string :length length :character-type 'iso-8859-1-char))
(define-binary-type iso-8859-1-terminated-string (terminator)
(generic-terminated-string :terminator terminator :character-type 'iso-8859-1-char))
Reading UCS-2 strings is only slightly more complex. The complexity arises because you can encode a UCS-2 code point in two ways: most significant byte first (big-endian) or least significant byte first (little-endian). UCS-2 strings therefore start with two extra bytes, called the byte order mark, made up of the numeric value #xfeff encoded in either big-endian form or little-endian form. When reading a UCS-2 string, you read the byte order mark and then, depending on its value, read either big-endian or little-endian characters. Thus, you'll need two different UCS-2 character types. But you need only one version of the sanity-checking code, so you can define a parameterized binary type like this:
(define-binary-type ucs-2-char (swap)
(:reader (in)
(let ((code (read-value 'u2 in)))
(when swap (setf code (swap-bytes code)))
(or (code-char code) (error "Character code ~d not supported" code))))
(:writer (out char)
(let ((code (char-code char)))
(unless (<= 0 code #xffff)
(error "Illegal character for ucs-2 encoding: ~c with char-code: ~d" char code))
(when swap (setf code (swap-bytes code)))
(write-value 'u2 out code))))
where the swap-bytes function can be defined as follows, taking advantage of LDB being SETFable and thus ROTATEFable:
(defun swap-bytes (code)
(assert (<= code #xffff))
(rotatef (ldb (byte 8 0) code) (ldb (byte 8 8) code))
code)
Using ucs-2-char, you can define two character types that will be used as the character-type arguments to the generic string functions.
(define-binary-type ucs-2-char-big-endian () (ucs-2-char :swap nil))
(define-binary-type ucs-2-char-little-endian () (ucs-2-char :swap t))
Then you need a function that returns the name of the character type to use based on the value of the byte order mark.
(defun ucs-2-char-type (byte-order-mark)
(ecase byte-order-mark
(#xfeff 'ucs-2-char-big-endian)
(#xfffe 'ucs-2-char-little-endian)))
Now you can define length- and terminator-delimited string types for UCS-2-encoded strings that read the byte order mark and use it to determine which variant of UCS-2 character to pass as the character-type argument to read-value and write-value. The only other wrinkle is that you need to translate the length argument, which is a number of bytes, to the number of characters to read, accounting for the byte order mark.
(define-binary-type ucs-2-string (length)
(:reader (in)
(let ((byte-order-mark (read-value 'u2 in))
(characters (1- (/ length 2))))
(read-value
'generic-string in
:length characters
:character-type (ucs-2-char-type byte-order-mark))))
(:writer (out string)
(write-value 'u2 out #xfeff)
(write-value
'generic-string out string
:length (length string)
:character-type (ucs-2-char-type #xfeff))))
(define-binary-type ucs-2-terminated-string (terminator)
(:reader (in)
(let ((byte-order-mark (read-value 'u2 in)))
(read-value
'generic-terminated-string in
:terminator terminator
:character-type (ucs-2-char-type byte-order-mark))))
(:writer (out string)
(write-value 'u2 out #xfeff)
(write-value
'generic-terminated-string out string
:terminator terminator
:character-type (ucs-2-char-type #xfeff))))
ID3 Tag Header
With the basic primitive types done, you're ready to switch to a high-level view and start defining binary classes to represent first the ID3 tag as a whole and then the individual frames.
If you turn first to the ID3v2.2 specification, you'll see that the basic structure of the tag is this header:
ID3/file identifier "ID3"
ID3 version $02 00
ID3 flags %xx000000
ID3 size 4 * %0xxxxxxx
followed by frame data and padding. Since you've already defined binary types to read and write all the fields in the header, defining a class that can read the header of an ID3 tag is just a matter of putting them together.
(define-binary-class id3-tag ()
((identifier (iso-8859-1-string :length 3))
(major-version u1)
(revision u1)
(flags u1)
(size id3-tag-size)))
If you have some MP3 files lying around, you can test this much of the code and also see what version of ID3 tags your MP3s contain. First you can write a function that reads an id3-tag, as just defined, from the beginning of a file. Be aware, however, that ID3 tags aren't required to appear at the beginning of a file, though these days they almost always do. To find an ID3 tag elsewhere in a file, you can scan the file looking for the sequence of bytes 73, 68, 51 (in other words, the string "ID3").[275] For now you can probably get away with assuming the tags are the first thing in the file.
(defun read-id3 (file)
(with-open-file (in file :element-type '(unsigned-byte 8))
(read-value 'id3-tag in)))
On top of this function you can build a function that takes a filename and prints the information in the tag header along with the name of the file.
(defun show-tag-header (file)
(with-slots (identifier major-version revision flags size) (read-id3 file)
(format t "~a ~d.~d ~8,'0b ~d bytes — ~a~%"
identifier major-version revision flags size (enough-namestring file))))
It prints output that looks like this:
ID3V2> (show-tag-header "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3")
ID3 2.0 00000000 2165 bytes — Kitka/Wintersongs/02 Byla Cesta.mp3
275
The 2.4 version of the ID3 format also supports placing a footer at the end of a tag, which makes it easier to find a tag appended to the end of a file.