Читать онлайн "Practical Common Lisp" - Siebel Peter - RuLit

(define-binary-type u1 () (unsigned-integer :bytes 1))

(define-binary-type u2 () (unsigned-integer :bytes 2))

(define-binary-type u3 () (unsigned-integer :bytes 3))

(define-binary-type u4 () (unsigned-integer :bytes 4))

Another type you'll need to be able to read and write is the 28-bit value used in the header. This size is encoded using 28 bits rather than a multiple of 8, such as 32 bits, because an ID3 tag can't contain the byte #xff followed by a byte with the top 3 bits on because that pattern has a special meaning to MP3 decoders. None of the other fields in the ID3 header could possibly contain such a byte sequence, but if you encoded the tag size as a regular unsigned-integer, it might. To avoid that possibility, the size is encoded using only the bottom seven bits of each byte, with the top bit always zero.[273]

Thus, it can be read and written a lot like an unsigned-integer except the size of the byte specifier you pass to LDB should be seven rather than eight. This similarity suggests that if you add a parameter, bits-per-byte, to the existing unsigned-integer binary type, you could then define a new type, id3-tag-size, using a short-form define-binary-type. The new version of unsigned-integer is just like the old version except with bits-per-byte used everywhere the old version hardwired the number eight. It looks like this:

(define-binary-type unsigned-integer (bytes bits-per-byte)

(:reader (in)

(loop with value = 0

for low-bit downfrom (* bits-per-byte (1- bytes)) to 0 by bits-per-byte do

(setf (ldb (byte bits-per-byte low-bit) value) (read-byte in))

finally (return value)))

(:writer (out value)

(loop for low-bit downfrom (* bits-per-byte (1- bytes)) to 0 by bits-per-byte

do (write-byte (ldb (byte bits-per-byte low-bit) value) out))))

The definition of id3-tag-size is then trivial.

(define-binary-type id3-tag-size () (unsigned-integer :bytes 4 :bits-per-byte 7))

You'll also have to change the definitions of u1 through u4 to specify eight bits per byte like this:

(define-binary-type u1 () (unsigned-integer :bytes 1 :bits-per-byte 8))

(define-binary-type u2 () (unsigned-integer :bytes 2 :bits-per-byte 8))

(define-binary-type u3 () (unsigned-integer :bytes 3 :bits-per-byte 8))

(define-binary-type u4 () (unsigned-integer :bytes 4 :bits-per-byte 8))

String Types

The other kinds of primitive types that are ubiquitous in the ID3 format are strings. In the previous chapter I discussed some of the issues you have to consider when dealing with strings in binary files, such as the difference between character codes and character encodings.

ID3 uses two different character codes, ISO 8859-1 and Unicode. ISO 8859-1, also known as Latin-1, is an eight-bit character code that extends ASCII with characters used by the languages of Western Europe. In other words, the code points from 0-127 map to the same characters in ASCII and ISO 8859-1, but ISO 8859-1 also provides mappings for code points up to 255. Unicode is a character code designed to provide a code point for virtually every character of all the world's languages. Unicode is a superset of ISO 8859-1 in the same way that ISO 8859-1 is a superset of ASCII—the code points from 0-255 map to the same characters in both ISO 8859-1 and Unicode. (Thus, Unicode is also a superset of ASCII.)

Since ISO 8859-1 is an eight-bit character code, it's encoded using one byte per character. For Unicode strings, ID3 uses the UCS-2 encoding with a leading byte order mark.[274] I'll discuss what a byte order mark is in a moment.

Reading and writing these two encodings isn't a problem—it's just a question of reading and writing unsigned integers in various formats, and you just finished writing the code to do that. The trick is how you translate those numeric values to Lisp character objects.

The Lisp implementation you're using probably uses either Unicode or ISO 8859-1 as its internal character code. And since all the values from 0-255 map to the same characters in both ISO 8859-1 and Unicode, you can use Lisp's CODE-CHAR and CHAR-CODE functions to translate those values in both character codes. However, if your Lisp supports only ISO 8859-1, then you'll be able to represent only the first 255 Unicode characters as Lisp characters. In other words, in such a Lisp implementation, if you try to process an ID3 tag that uses Unicode strings and if any of those strings contain characters with code points higher than 255, you'll get an error when you try to translate the code point to a Lisp character. For now I'll assume either you're using a Unicode-based Lisp or you won't process any files containing characters outside the ISO 8859-1 range.

The other issue with encoding strings is how to know how many bytes to interpret as character data. ID3 uses two strategies I mentioned in the previous chapter—some strings are terminated with a null character, while other strings occur in positions where you can determine the number of bytes to read, either because the string at that position is always the same length or because the string is at the end of a composite structure whose overall size you know. Note, however, that the number of bytes isn't necessarily the same as the number of characters in the string.

Putting all these variations together, the ID3 format uses four ways to read and write strings—two characters crossed with two ways of delimiting the string data.

Obviously, much of the logic of reading and writing strings will be quite similar. So, you can start by defining two binary types, one for reading strings of a specific length (in characters) and another for reading terminated strings. Both types take advantage of that the type argument to read-value and write-value is just another piece of data; you can make the type of character to read a parameter of these types. This is a technique you'll use quite a few times in this chapter.

(define-binary-type generic-string (length character-type)

(:reader (in)

(let ((string (make-string length)))

(dotimes (i length)

(setf (char string i) (read-value character-type in)))

string))

(:writer (out string)

(dotimes (i length)

(write-value character-type out (char string i)))))

(define-binary-type generic-terminated-string (terminator character-type)

(:reader (in)

(with-output-to-string (s)

(loop for char = (read-value character-type in)

until (char= char terminator) do (write-char char s))))

(:writer (out string)

(loop for char across string

do (write-value character-type out char)

finally (write-value character-type out terminator))))

With these types available, there's not much to reading ISO 8859-1 strings. Because the character-type argument you pass to read-value and write-value of a generic-string must be the name of a binary type, you need to define an iso-8859-1-char binary type. This also gives you a good place to put a bit of sanity checking on the code points of characters you read and write.

вернуться

273

The frame data following the ID3 header could also potentially contain the illegal sequence. That's prevented using a different scheme that's turned on via one of the flags in the tag header. The code in this chapter doesn't account for the possibility that this flag might be set; in practice it's rarely used.

вернуться

274

In ID3v2.4, UCS-2 is replaced by the virtually identical UTF-16, and UTF-16BE and UTF-8 are added as additional encodings.