Thursday, August 14, 2008

Processing SWF Files With Perl - Part I

Flash is a very popular technology, at the heart of what we call Rich Internet Applications. Most commonly we interact with it through the SWF file format (ShockWave Flash files such as games, web page intros, etc.) or Flash Video (FLV, used by, what else, YouTube). The technology is not from Mars, however: it started in 1997 and gained wide popularity thanks to its integration with IE 5 back in 1999. Read here for more.

Flash is really everywhere, and there is a tremendous amount of information in these files: music, text, video, etc. However, for the most part all of this is invisible to search engines. Some progress was made when Adobe released an SDK for processing SWF files, specifically for converting SWF files to HTML. I got it soon after release (you had to register or something), but now I cannot find it anymore. The closest I could get was this page, in which Adobe people explain how wonderfully they work with Google people to enhance the searchability of Flash content.

Anyway, there is not really an SDK, just an exe file that does the conversion. I do not know if search people have access to much greater functionality, but for now this is all there is. Maybe you have noticed some weird HTML code when you Google for SWF files. Well, this is how bad this tool is. But don't worry! You can always use the official SWF spec.

Using Perl for this task requires patience and attention. I ran into several problems when writing the code. In this part I will cover only the things to watch out for (which could save you hours of work) and give a short description of the SWF header format. In the next post, I will post the code.

Things to take care of:
  • Perl loves text, and using it in binary mode might get dirty. For example, read by default gives you characters from the file, not numbers. You have to binmode the filehandle, but even after doing that I didn't get what I expected on a Windows machine: I had to ord the byte I read to get the actual byte value.
  • Shift operators like >> or << work on native integers and overflow on large values; for those, use a CPAN module. For example, if you want to read 10 bytes as one value, you need to avoid shifting because an overflow is sure to occur.
  • SWF files frequently come compressed. This means you will have to use the Compress::Zlib module (which is actually an interface to the respective IO module).
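The three points above can be tried out in a few lines. This is only a minimal sketch under some assumptions: the "file" is an in-memory string standing in for a real SWF, Math::BigInt is my own pick for the big-number module (the list does not say which one was used), and the compressed body is made-up data. It does rely on one fact from the SWF spec: in a compressed file (signature CWS) everything after the first 8 bytes is zlib data.

```perl
use strict;
use warnings;
use Compress::Zlib;   # exports compress() and uncompress()
use Math::BigInt;     # assumption: any big-number module would do here

# 1. Binary reads: binmode the handle, then convert bytes to numbers.
my $file = 'FWS' . chr(9);                    # stand-in for a real SWF file
open(my $fh, '<', \$file) or die "Cannot open: $!";
binmode($fh);                                 # on Windows this disables CRLF translation
read($fh, my $signature, 3) == 3 or die 'Short read';
read($fh, my $byte, 1) == 1 or die 'Short read';
close($fh);
my $version = ord($byte);                     # read() returns a 1-character string, not a number

# 2. Wide values: build a big integer instead of shifting native integers.
my $ten_bytes = ("\x00" x 9) . "\x01";        # little-endian, so this is 2**72
my $hex       = unpack('H*', scalar(reverse($ten_bytes)));   # most significant byte first
my $big       = Math::BigInt->new("0x$hex");

# 3. Compressed files: inflate everything after the 8-byte prefix.
my $body     = 'pretend this is the rest of the SWF ' x 3;   # made-up payload
my $cws      = 'CWS' . chr(9) . pack('V', 8 + length($body)) . compress($body);
my $inflated = uncompress(substr($cws, 8));
defined $inflated or die 'Could not inflate the SWF body';

print "signature=$signature version=$version big=$big inflated=",
      length($inflated), " bytes\n";
```

Note that unpack('C', $byte) would do the same job as ord here, and unpack also handles multi-byte little-endian values ('v' for 2 bytes, 'V' for 4) without any shifting.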
In the next post we will get down to the code. If you want to give it a try on your own, the SWF header consists of:
  • 3 bytes for signature
  • 1 byte for Flash Version
  • 4 bytes for file length
  • a RECT structure for frame size
  • 2 bytes for frame rate
  • 2 bytes for frame count
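For the impatient, the header layout above could be read roughly like this. It is a sketch, not the code from the next post, and it assumes an uncompressed (FWS) file already loaded into a byte string. The details not listed above come from my reading of the SWF spec: multi-byte integers are little-endian, the RECT starts with a 5-bit field width followed by four signed fields of that width (Xmin, Xmax, Ymin, Ymax, in twips) padded to a whole byte, and the frame rate is 8.8 fixed point with the fractional byte first. The names parse_swf_header and signed_bits are made up for this example.

```perl
use strict;
use warnings;

# Turn a string of '0'/'1' characters into a signed (two's complement) integer.
sub signed_bits {
    my ($field) = @_;
    return 0 unless length $field;
    my $value = oct("0b$field");
    $value -= 2 ** length($field) if substr($field, 0, 1) eq '1';
    return $value;
}

# Parse the header of an uncompressed ("FWS") SWF held in a byte string.
sub parse_swf_header {
    my ($swf) = @_;
    my ($signature, $version, $file_length) = unpack('a3 C V', $swf);

    # RECT starts at byte 8: a 5-bit field width, then Xmin, Xmax, Ymin, Ymax.
    my $bits  = unpack('B*', substr($swf, 8, 17));   # 17 bytes covers the widest RECT
    my $nbits = oct('0b' . substr($bits, 0, 5));
    my ($xmin, $xmax, $ymin, $ymax) =
        map { signed_bits(substr($bits, 5 + $_ * $nbits, $nbits)) } 0 .. 3;

    # The RECT is padded to a whole number of bytes.
    my $rect_bytes = int((5 + 4 * $nbits + 7) / 8);
    my ($rate_fraction, $rate_integer, $frame_count) =
        unpack('C C v', substr($swf, 8 + $rect_bytes, 4));

    return {
        signature   => $signature,
        version     => $version,
        file_length => $file_length,
        frame_size  => [ $xmin, $xmax, $ymin, $ymax ],   # twips (1/20 of a pixel)
        frame_rate  => $rate_integer + $rate_fraction / 256,
        frame_count => $frame_count,
    };
}

# Hand-built sample header: version 9, 550x400 pixel stage, 24 fps, 1 frame.
my $rect_bits = sprintf('%05b', 15)
              . join('', map { sprintf('%015b', $_) } (0, 11000, 0, 8000));
my $sample = 'FWS' . pack('C V', 9, 21) . pack('B*', $rect_bits) . pack('C C v', 0, 24, 1);

my $header = parse_swf_header($sample);
printf "%s v%d, %d bytes, %dx%d twips, %s fps, %d frame(s)\n",
    @{$header}{qw(signature version file_length)},
    $header->{frame_size}[1], $header->{frame_size}[3],
    @{$header}{qw(frame_rate frame_count)};
```

Working on a string of bits is slow but easy to follow, and it sidesteps the shift operators entirely. For a compressed file the same routine should apply once the bytes after the first 8 have been inflated and glued back onto the 8-byte prefix.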
Discovery of the Day: Flash version 255 will be the last one!! :)