Читать онлайн "Practical Common Lisp" - Siebel Peter - RuLit

(define-binary-type iso-8859-1-char ()

(:reader (in)

(let ((code (read-byte in)))

(or (code-char code)

(error "Character code ~d not supported" code))))

(:writer (out char)

(let ((code (char-code char)))

(if (<= 0 code #xff)

(write-byte code out)

(error "Illegal character for iso-8859-1 encoding: character: ~c with code: ~d" char code)))))

Now defining the ISO 8859-1 string types is trivial using the short form of define-binary-type as follows:

(define-binary-type iso-8859-1-string (length)

(generic-string :length length :character-type 'iso-8859-1-char))

(define-binary-type iso-8859-1-terminated-string (terminator)

(generic-terminated-string :terminator terminator :character-type 'iso-8859-1-char))

Reading UCS-2 strings is only slightly more complex. The complexity arises because you can encode a UCS-2 code point in two ways: most significant byte first (big-endian) or least significant byte first (little-endian). UCS-2 strings therefore start with two extra bytes, called the byte order mark, made up of the numeric value #xfeff encoded in either big-endian form or little-endian form. When reading a UCS-2 string, you read the byte order mark and then, depending on its value, read either big-endian or little-endian characters. Thus, you'll need two different UCS-2 character types. But you need only one version of the sanity-checking code, so you can define a parameterized binary type like this:

(define-binary-type ucs-2-char (swap)

(:reader (in)

(let ((code (read-value 'u2 in)))

(when swap (setf code (swap-bytes code)))

(or (code-char code) (error "Character code ~d not supported" code))))

(:writer (out char)

(let ((code (char-code char)))

(unless (<= 0 code #xffff)

(error "Illegal character for ucs-2 encoding: ~c with char-code: ~d" char code))

(when swap (setf code (swap-bytes code)))

(write-value 'u2 out code))))

where the swap-bytes function can be defined as follows, taking advantage of LDB being SETFable and thus ROTATEFable:

(defun swap-bytes (code)

(assert (<= code #xffff))

(rotatef (ldb (byte 8 0) code) (ldb (byte 8 8) code))

code)

Using ucs-2-char, you can define two character types that will be used as the character-type arguments to the generic string functions.

(define-binary-type ucs-2-char-big-endian () (ucs-2-char :swap nil))

(define-binary-type ucs-2-char-little-endian () (ucs-2-char :swap t))

Then you need a function that returns the name of the character type to use based on the value of the byte order mark.

(defun ucs-2-char-type (byte-order-mark)

(ecase byte-order-mark

(#xfeff 'ucs-2-char-big-endian)

(#xfffe 'ucs-2-char-little-endian)))

Now you can define length- and terminator-delimited string types for UCS-2-encoded strings that read the byte order mark and use it to determine which variant of UCS-2 character to pass as the character-type argument to read-value and write-value. The only other wrinkle is that you need to translate the length argument, which is a number of bytes, to the number of characters to read, accounting for the byte order mark.

(define-binary-type ucs-2-string (length)

(:reader (in)

(let ((byte-order-mark (read-value 'u2 in))

(characters (1- (/ length 2))))

(read-value

'generic-string in

:length characters

:character-type (ucs-2-char-type byte-order-mark))))

(:writer (out string)

(write-value 'u2 out #xfeff)

(write-value

'generic-string out string

:length (length string)

:character-type (ucs-2-char-type #xfeff))))

(define-binary-type ucs-2-terminated-string (terminator)

(:reader (in)

(let ((byte-order-mark (read-value 'u2 in)))

(read-value

'generic-terminated-string in

:terminator terminator

:character-type (ucs-2-char-type byte-order-mark))))

(:writer (out string)

(write-value 'u2 out #xfeff)

(write-value

'generic-terminated-string out string

:terminator terminator

:character-type (ucs-2-char-type #xfeff))))

ID3 Tag Header

With the basic primitive types done, you're ready to switch to a high-level view and start defining binary classes to represent first the ID3 tag as a whole and then the individual frames.

If you turn first to the ID3v2.2 specification, you'll see that the basic structure of the tag is this header:

ID3/file identifier "ID3"

ID3 version $02 00

ID3 flags %xx000000

ID3 size 4 * %0xxxxxxx

followed by frame data and padding. Since you've already defined binary types to read and write all the fields in the header, defining a class that can read the header of an ID3 tag is just a matter of putting them together.

(define-binary-class id3-tag ()

((identifier (iso-8859-1-string :length 3))

(major-version u1)

(revision u1)

(flags u1)

(size id3-tag-size)))

If you have some MP3 files lying around, you can test this much of the code and also see what version of ID3 tags your MP3s contain. First you can write a function that reads an id3-tag, as just defined, from the beginning of a file. Be aware, however, that ID3 tags aren't required to appear at the beginning of a file, though these days they almost always do. To find an ID3 tag elsewhere in a file, you can scan the file looking for the sequence of bytes 73, 68, 51 (in other words, the string "ID3").[275] For now you can probably get away with assuming the tags are the first thing in the file.

(defun read-id3 (file)

(with-open-file (in file :element-type '(unsigned-byte 8))

(read-value 'id3-tag in)))

On top of this function you can build a function that takes a filename and prints the information in the tag header along with the name of the file.

(defun show-tag-header (file)

(with-slots (identifier major-version revision flags size) (read-id3 file)

(format t "~a ~d.~d ~8,'0b ~d bytes — ~a~%"

identifier major-version revision flags size (enough-namestring file))))

It prints output that looks like this:

ID3V2> (show-tag-header "/usr2/mp3/Kitka/Wintersongs/02 Byla Cesta.mp3")

ID3 2.0 00000000 2165 bytes — Kitka/Wintersongs/02 Byla Cesta.mp3

вернуться

275

The 2.4 version of the ID3 format also supports placing a footer at the end of a tag, which makes it easier to find a tag appended to the end of a file.