#2423 HIGH Trial-3: Journal does not find substrings
Zarro Boogs per Child
bugtracker at laptop.org
Sat Aug 4 20:01:37 EDT 2007
#2423: Journal does not find substrings
-------------------------------+--------------------------------------------
Reporter: bert | Owner: bcsaller
Type: defect | Status: new
Priority: high | Milestone: Trial-3
Component: interface-design | Version:
Resolution: | Keywords:
Verified: 0 |
-------------------------------+--------------------------------------------
Changes (by Eben):
* cc: krstic, marco, tomeu (added)
* owner: Eben => bcsaller
Comment:
Well, there's a lot to consider. We have 3 main things to search: titles,
tags (+ metadata), and text.
'''Case-sensitivity:''' I think that it's a safe bet to ignore case
entirely in all aspects of the search. Its use as a means of
distinguishing proper nouns from arbitrary tags is small compared to the
vast number of misses that it would likely cause. Google has been doing
pretty well ignoring case for years.
'''Boolean logic:''' We'd like to default to OR logic because this will
always return a superset of the AND results, and we feel it's better to
provide excess matches than few or none. Refining search terms and
filters is much more pleasant than removing them to broaden a search. We
would, however, also like to support boolean search terms and
parenthetical grouping when they are explicitly entered. It might be nice
to allow use of '&', '|', and '!' as well as the localized strings for
'and', 'or' and 'not'.
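As a rough illustration of the default-OR behaviour with optional explicit operators, here is a minimal sketch. The names `tokenize` and `matches` are hypothetical, only the English words 'and', 'or' and 'not' are handled (localized strings are assumed to be mapped in the same way), and an entry is reduced to a plain set of lowercase words.

```python
import re

# Operators are single characters; everything else is a search word.
TOKEN = re.compile(r"[()&|!]|[^\s()&|!]+")
ALIASES = {"and": "&", "or": "|", "not": "!"}  # English only (assumption)

def tokenize(query):
    return [ALIASES.get(t.lower(), t.lower()) for t in TOKEN.findall(query)]

def matches(query, words):
    """Return True if the set of lowercase `words` satisfies `query`."""
    tokens = tokenize(query)
    if not tokens:
        return True  # assumption: an empty query matches everything
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def parse_or():
        nonlocal pos
        result = parse_and()
        # Adjacent terms with no operator between them default to OR.
        while peek() is not None and peek() != ")":
            if peek() == "|":
                pos += 1
            rhs = parse_and()  # always parse, so tokens are consumed
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_not()
        while peek() == "&":
            pos += 1
            rhs = parse_not()
            result = result and rhs
        return result

    def parse_not():
        nonlocal pos
        if peek() == "!":
            pos += 1
            return not parse_not()
        return parse_atom()

    def parse_atom():
        nonlocal pos
        tok = peek()
        if tok is None or tok in {")", "&", "|"}:
            raise ValueError("malformed query")
        pos += 1
        if tok == "(":
            result = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return result
        return tok in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unbalanced closing parenthesis")
    return result
```

With an entry containing the words "cat" and "dog", `matches("cat bird", ...)` succeeds under the default OR, while `matches("cat & bird", ...)` does not.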
'''Fuzzy search:''' I opened ticket #2645 regarding fuzzy matches for
search strings. Since the kids using these machines are going to be both
a) learning how to spell and b) learning how to type, we should expect
inaccuracies in their search queries, which some amount of fuzziness could
overcome. As mentioned there, I think titles and tags could use fuzzy
search, but it's not needed for full text.
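One possible way to get this kind of fuzziness, sketched with the `SequenceMatcher` class from Python's standard `difflib` module. The function name `fuzzy_match` and the 0.75 threshold are assumptions chosen for illustration, not values taken from this ticket or from #2645.

```python
from difflib import SequenceMatcher

def fuzzy_match(term, candidate, threshold=0.75):
    """Return True if `term` is close enough to `candidate`, ignoring case.

    The threshold is an arbitrary illustrative value and would need tuning
    against real queries typed by kids.
    """
    ratio = SequenceMatcher(None, term.casefold(), candidate.casefold()).ratio()
    return ratio >= threshold
```

A misspelling such as "elefant" would then still find a title or tag containing "elephant", while unrelated words are rejected.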
'''Partial matching:''' I think that partial matching is absolutely
essential for titles, and I think it's probably good to use for tags
as well. Since activity name, participants, etc. should all be stored
within the metadata, this would allow me to search for "Writ" and find all
Write documents, or "Walt" and find all instances of activities I did with
Walter. Again, like fuzzy search, I think partial matching of full text
is unnecessary and would provide far too many results. If we really
wanted to, we could allow explicit use of wildcards to force partial
matches on text, but I think that's an edge case optimization. Note that
this partial match is not bidirectional: a search for "ca" should match
"cat", but a search for "cathedral" should not match "cat". Also, phrases
within single or double quotes should require a full match (though again,
perhaps fuzzy).
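A minimal sketch of the matching rule described above, assuming that "partial" means the search term occurs anywhere inside a word (the ticket title mentions substrings; a prefix-only rule would use `startswith` instead). The function name `term_matches` is hypothetical, and fuzziness is left out.

```python
def term_matches(term, text):
    """Case-insensitive match of one search term against a title or tag.

    A bare term matches when it is contained in any word of `text`.
    A term wrapped in single or double quotes must match whole words,
    in order, as a phrase.
    """
    words = text.casefold().split()
    if len(term) >= 2 and term[0] == term[-1] and term[0] in "'\"":
        phrase = term[1:-1].casefold().split()
        n = len(phrase)
        return n > 0 and any(
            words[i:i + n] == phrase for i in range(len(words) - n + 1)
        )
    needle = term.casefold()
    # One direction only: the term must fit inside the word, never the reverse.
    return any(needle in word for word in words)
```

So "ca" finds "My Cat", "cathedral" does not, and the quoted phrase "white hous" does not find "The White House tour" even though the unquoted term "hous" would.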
'''Tags:''' Tags are explicitly entered by the user. We'd like to make
their input as format agnostic as possible, so that we aren't forcing any
given system upon the kids. To do so, we'd like to lay down a base rule,
stating that all tags are space delimited. This, of course, means that
you can't tag something as "white house" but only with the weaker pair of
tags "white" and "house." Again, we'd like to offer a few accommodations
so that, while not required, advanced users can enter more accurate tags.
For this reason, we can support 3 ways of entering a tag with spaces:
1. Washington, white house, capitol
2. washington "white house" capitol
3. washington white_house capitol
The first method assumes that, when commas are present, they are meant for
delimiting a list. Commas within quotes, if they exist, would be ignored.
The second method uses quotes - single or double - to group two or more
words into a single tag. Single quotes inside double quotes should be
ignored. The last method replaces the space with an underscore in order to
tie the words together. Of course, since we allow partial matches,
searching for "white" would return the entry regardless of how the tag was
entered. On the other hand, searching for "white house" or white_house
would return only those with both terms in the tag, title, or text.
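The three input styles could be normalized to the same tag list along these lines. This is only a sketch: `parse_tags` is a hypothetical name, underscores are assumed to be stored as spaces, tags are assumed to be lowercased, and a comma inside quotes is assumed to stay in the tag as literal text rather than act as a delimiter.

```python
def parse_tags(raw):
    """Split a raw tag field into a list of normalized tags."""
    segments = [[]]   # one list of words per comma-separated segment
    current = []      # characters of the word being built
    quote = None      # the open quote character, if any
    saw_comma = False

    def flush():
        if current:
            segments[-1].append("".join(current))
            current.clear()

    for ch in raw:
        if quote:
            if ch == quote:
                quote = None          # closing quote
            else:
                current.append(ch)    # spaces, commas, other quotes kept
        elif ch in "'\"" and not current:
            quote = ch                # a quote only opens at a word start
        elif ch == ",":
            flush()
            segments.append([])
            saw_comma = True
        elif ch.isspace():
            flush()
        else:
            current.append(ch)
    flush()

    if saw_comma:
        # Commas delimit the list, so spaces inside a segment join words.
        tags = [" ".join(segment) for segment in segments]
    else:
        tags = [word for segment in segments for word in segment]

    normalized = (" ".join(t.replace("_", " ").casefold().split()) for t in tags)
    # Drop empty tags and duplicates while keeping the entry order.
    return list(dict.fromkeys(t for t in normalized if t))
```

All three example inputs above then yield the tags "washington", "white house" and "capitol".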
'''Metadata:''' We associate any amount of metadata with the objects in
the Journal. Most of the metadata is provided by the activity or by
Sugar, but we also want to allow more advanced children to create their
own. Presently, the format for this is simply typing "key:value" pairs
within the tag field. Likewise, a search for a key:value pair should
return results only for entries which have the same key:value pair (with
fuzziness, perhaps). To be clear, searching for a key:value pair should
search all metadata, including that which wasn't created by the user
within the tag field.
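To make the key:value behaviour concrete, here is a small sketch. The helper names `split_metadata` and `matches_pair`, the dictionary layout, and the example keys are assumptions for illustration; they are not the datastore's actual format. Fuzziness is omitted.

```python
def split_metadata(tags):
    """Separate user-entered key:value pairs from plain tags."""
    metadata, plain = {}, []
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and key and value:
            metadata[key.casefold()] = value
        else:
            plain.append(tag)
    return metadata, plain

def matches_pair(query, entry_metadata):
    """Return True if the entry carries the queried key:value pair.

    `entry_metadata` is assumed to hold all metadata for the entry, both
    activity- or system-provided and user-entered, with lowercase keys.
    """
    key, sep, value = query.partition(":")
    if not (sep and key and value):
        return False
    stored = entry_metadata.get(key.casefold())
    return stored is not None and stored.casefold() == value.casefold()

# Hypothetical entry: one system-provided pair plus one typed by the child.
user_metadata, plain_tags = split_metadata(["mood:Happy", "school"])
entry_metadata = {"activity": "Write", **user_metadata}
```

Here both `matches_pair("activity:write", entry_metadata)` and `matches_pair("mood:happy", entry_metadata)` succeed, regardless of which side supplied the pair.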
There is some more background and additional thoughts on this in the
Journal section of the HIG:
http://wiki.laptop.org/go/OLPC_Human_Interface_Guidelines/The_Laptop_Experience/The_Journal#The_Power_of_Metadata
As an aside, it occurred to me that present Journal designs don't offer
any means of showing results based upon relevance. On the one hand, I
think that the time based view is quite important, and can even be so
within the results list. On the other hand, I wonder if we need to allow
the option, even though temporal sort will remain the default. Can we
improve the speed of the search if we don't attempt to provide relevance
rankings at all?
Oh, and I assume this is already the case, but we should be sure to apply
any selected filters before considering the query string, so that we limit
the number of entries we have to check for matches.
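The ordering suggested here might look like the following sketch, where `search`, the filter dictionary, and the sample entries are all hypothetical. The `checked` list exists only to show that the query matcher is never run on entries that the filters already excluded.

```python
def search(entries, filters, query_match):
    """Apply exact-match filters first, then run the query on the survivors."""
    candidates = [
        entry for entry in entries
        if all(entry.get(key) == value for key, value in filters.items())
    ]
    return [entry for entry in candidates if query_match(entry)]

entries = [
    {"title": "Cat story", "activity": "Write"},
    {"title": "Cat drawing", "activity": "Paint"},
    {"title": "Dog story", "activity": "Write"},
]
checked = []  # titles the query matcher was actually asked about

def title_has_cat(entry):
    checked.append(entry["title"])
    return "cat" in entry["title"].casefold()

results = search(entries, {"activity": "Write"}, title_has_cat)
```

With the "Write" filter selected, only two of the three entries are ever compared against the query string.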
--
Ticket URL: <http://dev.laptop.org/ticket/2423#comment:4>
One Laptop Per Child <http://laptop.org/>