From: Sam Watkins
To: debian-devel@lists.debian.org
Subject: tpkg, a package re-compressor

I am writing a program "tpkg" to recompress packages (e.g. debs) for more efficient storage and transport. It extracts archives and unzips files recursively, and makes one big .tar of them all. You can compress this tar with, for example, gzip -9 --rsyncable, bzip2 or 7zip (in experimental).

You can turn a "tpkg" back into the original package (more or less) - currently you do this by untarring it and running the "tunpkg" script that was included in it. The recreated package may end up being compressed a bit differently. Currently the order of files in archives is not preserved, but I have an idea of how to do this.

It's very "alpha" at the moment. If you'd like to take a look, it's at:

  http://nipl.net/hacks/tpkg

Here's an example:

freenet_latest.tgz

  original                            4821736
  compressed with 7zip instead        4389460
  extracted with tpkg, then tarred   21606400
  tpkg, gzip -9 --rsyncable           4714416
  tpkg, bzip2                         3265403
  tpkg, 7zip                          2424262
  converted back to original form     4788467

In this example, just recompressing the original tar with 7z instead of gz does not give much improvement (~10%) but using "tpkg" to extract the files in it recursively (specifically, freenet.jar, freenet-ext.jar and seednodes.ref.bz2) then recompressing with 7zip cuts it down to about half the size.

Another example:

openoffice.org-help-en_1.1+20040420-2_all.deb

  original                           12027932
  extracted with tpkg (tar)          37437440
  tpkg, gzip -9 --rsyncable          11001492
  tpkg, 7zip                          8610297
  converted back to original form    12043740

And another:

kernel-image-2.6.9-1-386_2.6.9-3_i386.deb

  original                           14274238
  extracted with tpkg (tar)          41482240
  tpkg, gzip -9 --rsyncable          14398435
  tpkg, bzip2                        12861900
  tpkg, 7zip                         10346173

It currently extracts gzip, bzip2, tar, zip/jar, deb, vmlinuz ([b]zImage) and dictzip formats.
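To illustrate the recursive extraction idea, here is a minimal Python sketch. It is not tpkg's actual code; the names sniff_format and unpack are made up for this example. It guesses a format from well-known magic bytes and keeps unwrapping until it reaches plain files. Only gzip, bzip2, zip/jar and tar are unpacked here; ar archives (the container format of a .deb) are recognised but left as they are.

```python
import bz2
import gzip
import io
import tarfile
import zipfile


def sniff_format(data: bytes) -> str:
    """Guess the format of a blob from its magic bytes (heuristic only)."""
    if data[:2] == b"\x1f\x8b":
        return "gzip"
    if data[:3] == b"BZh":
        return "bzip2"
    if data[:4] == b"PK\x03\x04":
        return "zip"  # also covers .jar files
    if data[:8] == b"!<arch>\n":
        return "ar"  # .deb packages are ar archives
    if data[257:262] == b"ustar":
        return "tar"
    return "unknown"


def unpack(name: str, data: bytes, out: dict) -> None:
    """Recursively unwrap data, storing leaf files in out as path -> bytes."""
    kind = sniff_format(data)
    if kind == "gzip":
        unpack(name, gzip.decompress(data), out)
    elif kind == "bzip2":
        unpack(name, bz2.decompress(data), out)
    elif kind == "zip":
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    unpack(f"{name}/{info.filename}", zf.read(info), out)
    elif kind == "tar":
        with tarfile.open(fileobj=io.BytesIO(data)) as tf:
            for member in tf:
                if member.isfile():
                    content = tf.extractfile(member).read()
                    unpack(f"{name}/{member.name}", content, out)
    else:
        # Unknown or unhandled format (including ar): keep the bytes as a leaf.
        out[name] = data
```

A file that merely starts with one of these magic numbers but is not a valid stream will raise an exception in this sketch; a real tool would want to catch that and store the file unchanged. It also records nothing about how each layer was originally compressed, which is what would be needed to rebuild the package.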
If you're aware of any other compression or archive formats that are used in Debian, please let me know. I am working on adding support for other types of compressed file / archive that are used in Debian packages.

I wrote a hacky script to extract and recompress zimages which just looks for the gzip header - does anyone know if there is a more reliable way to work out where the gzipped data starts in a zimage?

I'm working on "tpkg" because I have a crazy idea to start another Debian-based distro, and I can't afford to waste space and bandwidth on my server!! I want to be able to do binary package updates using rsync.

I got some inspiration for tpkg from reading about Ubuntu people doing something like this to OpenOffice. I don't know if there is another program that does this already (maybe they wrote one?).
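On the zimage question, one way to make the header scan a bit less fragile is to check each candidate by actually decompressing from it. The Python sketch below is an illustration rather than the script mentioned above, and the function name find_gzip_payload is made up. It looks for the three bytes that start a deflate-compressed gzip member and accepts the first offset from which zlib can decode a complete stream. It is still a heuristic and knows nothing about the real layout of a kernel image.

```python
import zlib

# ID1, ID2 and CM=8 (deflate): the first three bytes of a gzip member.
GZIP_MAGIC = b"\x1f\x8b\x08"


def find_gzip_payload(image: bytes):
    """Return (offset, decompressed_bytes) for the first offset at which a
    complete gzip stream can be decoded, or None if there is none."""
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 tells zlib to expect a gzip header and trailer.
        decoder = zlib.decompressobj(wbits=31)
        try:
            payload = decoder.decompress(image[pos:])
        except zlib.error:
            payload = None  # false positive: the magic bytes were a coincidence
        # eof is True only if the stream ended cleanly, trailer included.
        if payload is not None and decoder.eof:
            return pos, payload
        pos = image.find(GZIP_MAGIC, pos + 1)
    return None
```

Any bytes after the end of the gzip stream are ignored here (zlib leaves them in decoder.unused_data), so a tool that wants to rebuild the image would need to keep both the bytes before the offset and those trailing bytes.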