an interesting filesystem challenge: static pull of wiki.laptop.org

Martin Langhoff martin.langhoff at gmail.com
Wed Nov 12 15:46:49 EST 2008


On Wed, Nov 12, 2008 at 3:23 PM, C. Scott Ananian <cscott at cscott.net> wrote:
> apache seems to perform reasonably well serving files from such huge
> directories.  Should I be concerned?  Can anyone suggest:
...
>  b) whether reformatting with reiserfs or some other filesystem is
> worth the trouble?  ext3 already has btree-structured directories, so
> reiserfs isn't quite the obvious win it used to be.

According to Linus, Ted Ts'o and friends, when we hit a similar problem
with git (before it hashed its object directories), The Answer Was XFS.
This problem (lookups in large directories) also affects Maildir-based
IMAP servers, and XFS is a favourite in that segment too.

XFS is troublesome in other respects, but large directories it handles
well. So I wouldn't use it for the whole system disk - just a dedicated
partition for the mirror.
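The other way out is the hashed-dirs trick git itself adopted: spread the
files over a two-level tree keyed on a hash, so no single directory grows
into the millions of entries. A minimal sketch (the helper name and layout
are mine, not anything git or mediawiki actually ship):

```python
import hashlib
from pathlib import Path

def sharded_path(root: str, name: str) -> Path:
    """Map a flat filename into a two-level directory tree,
    much like git shards objects/ by the leading hex digits of
    a hash, keeping every directory small."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # e.g. cache/ab/cd/Some_Page.html
    return Path(root) / digest[:2] / digest[2:4] / name
```

Two hex digits per level gives 256 subdirectories per level, which keeps
each leaf directory to a few dozen entries even for a multi-million-file
mirror.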

>  c) patched wget or other tool that will actually honor robot
> exclusion directives in <meta> tags in page headers?  wget seems to
> honor 'nofollow', but mediawiki uses <meta name="robots"
> content="noindex,nofollow" /> in the <head> of edit and printable
> pages, which isn't sufficient to convince wget to delete the file it
> just downloaded.

ISTR lwp-rget being reasonably good at this.
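Failing that, a post-download filter can make the check wget skips: parse
each fetched page for the robots meta tag and drop it if it says noindex.
A minimal sketch using only the Python stdlib (the function names are
hypothetical, not an existing tool):

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect the directives from <meta name="robots" ...> tags."""
    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            for d in a.get("content", "").split(","):
                self.directives.add(d.strip().lower())

def should_discard(html: str) -> bool:
    """True if the page asks robots not to index it,
    e.g. mediawiki's edit and printable pages."""
    p = RobotsMetaParser()
    p.feed(html)
    return "noindex" in p.directives
```

Run that over each file after the crawl and unlink the ones it flags.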

cheers,



m
-- 
 martin.langhoff at gmail.com
 martin at laptop.org -- School Server Architect
 - ask interesting questions
 - don't get distracted with shiny stuff
 - working code first
 - http://wiki.laptop.org/go/User:Martinlanghoff
