It has come to light that Apple has embedded personal details into music files bought from its iTunes music store.
Ars Technica, one of the first websites to unveil the hidden information, said: “Everyone should be aware that while DRM-free files may lift a lot of restrictions on our personal usage habits, it doesn’t mean we can just start sharing the love, so to speak. Sharer beware.”
Personal data, including the names and e-mail addresses of purchasers, are inserted into the AAC files that Apple uses to distribute music tracks. The information is also included in tracks sold under Apple’s iTunes Plus system, launched this week, where users pay a premium for music that is free from the controversial digital rights management (DRM) intended to protect against piracy.
The Electronic Frontier Foundation said it was possible that the data could be used to “watermark” tracks so that the original purchaser could be tracked down if a track appeared on a file-sharing network, although experts said that it would be relatively easy to “spoof” such data.
A couple of recent posts on Ars Technica and TUAW pointed out that Apple is embedding personal information, such as the name and email address of the purchaser, in all of their AAC files (including the DRM-free ones). We got curious, and wondered whether Apple might also be watermarking the underlying audio data in these tracks.
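One rough way to check a file of your own for this kind of embedded data is to scan its raw bytes for anything that looks like an e-mail address. The sketch below assumes the address is stored as plain ASCII text, which the posts suggest but which is not verified here; the pattern and the function names are illustrative, not part of any Apple or iTunes tooling.

```python
import re

# Loose pattern for ASCII e-mail addresses. This is an assumption made for
# illustration, not a complete address grammar.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(data):
    """Return (offset, address) for every e-mail-like ASCII string in data."""
    return [(m.start(), m.group().decode("ascii")) for m in EMAIL_RE.finditer(data)]


def find_emails_in_file(path):
    """Read a file as raw bytes and scan it for e-mail-like strings."""
    with open(path, "rb") as f:
        return find_emails(f.read())
```

Calling `find_emails_in_file` on a purchased track (the path is whatever your own file is called) would list the byte offset of each match. Compressed audio data can occasionally produce a false match, so any hit is worth checking by eye.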
We've found that there isn't a watermark in the compressed audio signal itself, but there are surprisingly large differences between the encoded files: much bigger than different tags, or even different signed/encrypted tags, would account for.
We compared two DRM-free copies of the track Daftendirekt by Daft Punk. When decoded to PCM/WAV data, both copies produced an identical audio signal (the MD5sum is e40b006497f9b417760ca5015c3fa937). So there is no audio watermark. But one of the .m4a files is almost 360K larger than the other!
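The comparison described above can be sketched as follows, assuming both .m4a files have already been decoded to WAV by some external tool. The post does not say whether its MD5 covered the whole WAV file or only the samples; this sketch hashes only the PCM frames, so header differences are ignored. The function names are hypothetical.

```python
import hashlib
import wave


def pcm_md5(wav_path):
    """Return the MD5 hex digest of the raw PCM frames in a WAV file."""
    digest = hashlib.md5()
    with wave.open(wav_path, "rb") as wav:
        while True:
            frames = wav.readframes(65536)  # read in chunks to limit memory use
            if not frames:
                break
            digest.update(frames)
    return digest.hexdigest()


def same_audio(wav_a, wav_b):
    """True if both WAV files contain byte-identical PCM data."""
    return pcm_md5(wav_a) == pcm_md5(wav_b)
```

If `same_audio` returns `True` for two purchasers' copies, the decoded signal carries no per-buyer watermark, which is the conclusion the post draws. It says nothing about what differs in the encoded files themselves.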
We haven't finished examining these differences yet, and we don't have in-house expertise on MPEG codecs, but some of them have an intriguing amount of structure. There's a region (see around offset 0x11470 in the Daft Punk track for example) where the files contain what look like tables with sequential indices but different data in the table.
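A simple way to locate regions like the one described above is to walk both files byte by byte and record the offset ranges where they disagree. This is a minimal sketch, not the method used in the post, and `differing_regions` is a hypothetical helper. It compares bytes at the same offset only, so once one file has extra data inserted, everything after that point will show up as different.

```python
def differing_regions(a, b):
    """Return (start, end) offset pairs where bytes a and b differ.

    Only the common prefix length is compared; end offsets are exclusive.
    """
    regions = []
    start = None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            regions.append((start, i))  # the run ended at the previous byte
            start = None
    if start is not None:
        regions.append((start, n))  # run extends to the end of the shorter input
    return regions
```

Printing each pair with `hex()` gives offsets in the same form as the 0x11470 quoted above. For files of very different sizes, an alignment-aware diff would be needed to separate inserted data from changed data.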
We'll post again if we learn more about what's going on here. In the meantime, some pure speculation: it may be that large amounts of iTunes library data are present in each file. It's also possible that Apple has found a way to watermark the AAC encoding itself, such that users would need to either crack the watermark or transcode the audio signal in order to produce a file that does not identify them as the source.