#7278 HIGH Update.: Browse autocompletion should match on "beginnings" of URIs

Sat Jun 14 19:38:34 EDT 2008

#7278: Browse autocompletion should match on "beginnings" of URIs
-----------------------------+----------------------------------------------
 Reporter:  Eben             |       Owner:  marco                            
     Type:  defect           |      Status:  new                              
 Priority:  high             |   Milestone:  Update.2 (8.2.0)                 
Component:  browse-activity  |     Version:  Development build as of this date
 Keywords:                   |    Verified:  0                                
 Blocking:                   |   Blockedby:                                   
-----------------------------+----------------------------------------------
 I tested out the autocompletion in Browse.  I really like that matches are
 found anywhere within the page titles.  I found it somewhat confusing that
 the same was true for URIs: I frequently got many false positives in the
 way, including ones which had nothing at all to do with the domain I was
 typing.  More importantly, one usually types a URI from left to right,
 general to specific, so it makes sense to match only on the beginnings of
 strings, such that every suggested option begins with the portion of the
 URI which has been typed so far.

 That said, there's a trick to getting this right.  We need to isolate the
 URI scheme and the subdomain from the rest, so that we can do smart
 matching:

 {{{
  [scheme:(//)][subdomain.][[domain][/path/to/resource][?query][#fragment]]
 }}}
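
 As a rough illustration of this split, here is a minimal Python sketch.
 The helper name split_uri is hypothetical, and the rule that the last two
 host labels form the domain is a simplifying assumption (it is wrong for
 suffixes such as .co.uk). Nothing here is taken from the Browse code.

```python
import re

# A scheme is a letter followed by letters, digits, '+', '.' or '-',
# then ':' and an optional '//'. For matching purposes the '//' is
# treated as part of the scheme.
# Caveat: a scheme-less "host:port" would be misread as a scheme here.
_SCHEME_RE = re.compile(r'^([A-Za-z][A-Za-z0-9+.-]*):(?://)?')


def split_uri(uri):
    """Split a URI into (scheme, subdomain, rest).

    'rest' starts at the domain and includes path, query and fragment.
    Missing parts are returned as empty strings.
    """
    scheme = ''
    m = _SCHEME_RE.match(uri)
    if m:
        scheme = m.group(1)
        uri = uri[m.end():]

    # The host ends at the first '/', '?' or '#', whichever comes first.
    host_end = len(uri)
    for sep in '/?#':
        idx = uri.find(sep)
        if idx != -1:
            host_end = min(host_end, idx)
    host, tail = uri[:host_end], uri[host_end:]

    # Assumption: the last two labels are the domain, the others
    # (if any) are the subdomain.
    labels = host.split('.')
    subdomain = '.'.join(labels[:-2])
    domain = '.'.join(labels[-2:])
    return scheme, subdomain, domain + tail
```

 For example, split_uri('http://wiki.laptop.org/go/Home') would give the
 scheme 'http', the subdomain 'wiki' and the rest 'laptop.org/go/Home'.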

 In this manner, we check for matches at the "beginning" only, where the
 beginning could be the beginning of: the scheme, the subdomain, or the
 "rest" (beginning with the domain).

  * If a match is made on the scheme, only entries having that scheme
 should be matched.
  * If a match is made on the subdomain, only entries having that subdomain
 should be matched, but all schemes for that subdomain are considered
 matches.
  * If a match is made on the domain, only entries having that domain
 ''and'' no subdomain should be matched, but all schemes for that domain
 are considered matches.
  * As an exception, we should always match 'www.domain' and 'domain'
 equally (that is, 'www.domain' should match on entries for 'domain'
 without 'www.', and vice versa).
  * The '//' (which is technically part of the hierarchical part; in other
 words, the first two chars of either the domain or the subdomain) is
 considered as part of the scheme for our matching purposes, so that we
 match on "http://domain..." and "domain..." equally.  No one ever types
 the '//' without specifying the scheme as well; in fact, you can't,
 because the browser will look on localhost instead.

 In all cases, we only match on entries which have a perfect string match
 for the typed portion of the URI ''from the index at which a match
 (scheme, subdomain, or domain) was found''.
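
 One possible reading of the rules above, as a minimal Python sketch: for
 each history entry, build the strings that start at each allowed
 "beginning", then do a plain prefix test against the typed text. The
 names candidate_starts and complete are hypothetical, history entries are
 assumed to look like scheme://host..., and the domain is assumed to be
 the last two host labels (wrong for suffixes such as .co.uk).

```python
def candidate_starts(uri):
    """Return the strings that typed text may be a prefix of."""
    # Assumption: entries look like 'scheme://host/...'. The '//' is
    # kept with the scheme, as the rules above suggest.
    scheme, sep, rest = uri.partition('://')
    if not sep:
        scheme, rest = '', uri
    prefix = scheme + sep

    # The host ends at the first '/', '?' or '#'.
    cuts = [i for i in (rest.find(c) for c in '/?#') if i != -1]
    cut = min(cuts) if cuts else len(rest)
    host, tail = rest[:cut], rest[cut:]

    # Assumption: the last two labels form the domain.
    labels = host.split('.')
    subdomain = '.'.join(labels[:-2])
    bare = '.'.join(labels[-2:]) + tail

    if subdomain in ('', 'www'):
        # 'www.domain' and 'domain' are treated as interchangeable.
        hosts = [bare, 'www.' + bare]
    else:
        # An entry with a real subdomain is reachable only from the
        # subdomain, never from the bare domain.
        hosts = [subdomain + '.' + bare]

    # Each host form matches with or without the scheme in front, so a
    # typed scheme restricts matches to entries having that scheme.
    starts = list(hosts)
    if prefix:
        starts += [prefix + h for h in hosts]
    return starts


def complete(typed, history):
    """Return the history entries for which typed matches a beginning."""
    return [uri for uri in history
            if any(s.startswith(typed) for s in candidate_starts(uri))]
```

 With a history holding http://laptop.org/en/, http://wiki.laptop.org/go/Home
 and https://www.laptop.org/vision, typing "laptop" would suggest the first
 and third entries, "wiki" only the second, and "org" none at all. Matching
 here is case-sensitive; whether it should be is left open.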

 I'm pretty sure this is less complicated than I just made it sound.  Feel
 free to suggest better options, or poke holes in my design.  For further
 info on URIs, see Wikipedia [1].

 [1] http://en.wikipedia.org/wiki/URI_scheme

-- 
Ticket URL: <http://dev.laptop.org/ticket/7278>
One Laptop Per Child <http://laptop.org/>
OLPC bug tracking system