Compression of HTML files

Ian Bicking ianb at colorstudy.com
Wed Jul 25 15:48:39 EDT 2007


Samuel Klein wrote:
>> Arael8 (Petr?) posted a zip file of HTML files,
>> http://dictionary.110mb.com/files/short-wiki.zip
>>
>> The files are pretty small, almost all from 500 bytes to 1.5K.
>>
>> Here's the compressed sizes:
>>
>> 4.6M    raw
>> 1.3M    short-wiki.zip
>> 704K    short-wiki.tar.gz
>> 496K    short-wiki.tar.bz2
>> 3.8M    gzipped
>> 3.8M    gzipped-1
>> 3.8M    bzipped
> 
> are the gzipped and bzipped files really the same size?

Actually, don't trust these sizes at all.  I realized du is being smart 
and showing the actual disk space used on my ext3 filesystem, which is 
not what I want.  It takes into account, I believe, that I have a block 
size of 4K (or maybe 2K), which means any file smaller than 4K takes up 
4K anyway.  As a result, compressing the small files individually can't 
do much good by that measure.
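
To make the rounding effect concrete, here is a rough sketch of the 
arithmetic.  It assumes the filesystem hands out space in fixed 4K 
allocation units (blocks) and ignores all other metadata; the function 
name and the three sample sizes are made up for illustration, not taken 
from the real files.

```python
import math

def apparent_and_allocated(sizes, block_size=4096):
    """Return (apparent_bytes, allocated_bytes) for files of the given sizes.

    apparent  = sum of the byte lengths (what du --apparent-size reports)
    allocated = each file rounded up to a whole number of blocks
                (an assumed, simplified model of what plain du reports)
    """
    apparent = sum(sizes)
    allocated = sum(math.ceil(s / block_size) * block_size for s in sizes)
    return apparent, allocated

# Three hypothetical small files, in the 500 byte to 1.5K range
sizes = [500, 1000, 1500]
apparent, allocated = apparent_and_allocated(sizes)
print(apparent, allocated)  # 3000 12288
```

Under this model each of the three files costs a full 4K block no matter 
how well it compresses, which is why the per-file totals all came out 
the same.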

I think du --apparent-size is going to be more accurate (sizes in K):

2729	raw
1265	short-wiki.zip
490	short-wiki.tar.bz2
697	short-wiki.tar.gz
1214	gzipped
1273	gzipped-1
1277	bzipped

It still doesn't take into account the overheads of JFFS2 (which I'm 
guessing are less than ext3, but still exist), but I'm emailing the dev 
list to ask about that.
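
As for why the individually gzipped files (1214) come out so much 
larger than the single tar.gz (697): my assumption is that each small 
file pays its own gzip header and can't share repeated markup with its 
neighbours.  A minimal sketch of that effect follows; the pages are 
synthetic stand-ins invented here, not the files from short-wiki.zip, 
and plain concatenation stands in for tar.

```python
import gzip

def make_pages(count=200):
    """Build hypothetical small HTML pages (stand-ins for the real files)."""
    template = ("<html><head><title>{w}</title></head>"
                "<body><h1>{w}</h1><p>Definition of {w}.</p></body></html>")
    return [template.format(w="word%d" % i).encode("utf-8")
            for i in range(count)]

def separate_vs_combined(pages):
    """Return (bytes when gzipping each page alone, summed,
               bytes when gzipping all pages as one stream)."""
    separate = sum(len(gzip.compress(p)) for p in pages)
    combined = len(gzip.compress(b"".join(pages)))
    return separate, combined

separate, combined = separate_vs_combined(make_pages())
print("separate:", separate, "combined:", combined)
```

The exact numbers depend on the content, so treat this only as an 
illustration of the direction of the difference.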


-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org


