[Trac #1466] datastore chokes on non-ascii file

Zarro Boogs per Child bugtracker at laptop.org
Sun May 13 11:09:26 EDT 2007


#1466: datastore chokes on non-ascii file
-----------------------+----------------------------------------------------
 Reporter:  tomeu      |       Owner:  bcsaller 
     Type:  defect     |      Status:  new      
 Priority:  normal     |   Milestone:  Untriaged
Component:  datastore  |     Version:           
 Keywords:             |  
-----------------------+----------------------------------------------------
 When adding a pdf file to the datastore, I get this exception:
 {{{
     Traceback (most recent call last):
       File "/home/tomeu/sugar-jhbuild/build/lib/python2.4/doctest.py", line 1248, in __run
         compileflags, 1) in test.globs
       File "<doctest sugar_demo_may17.txt[28]>", line 1, in ?
         ds.update(uid, dict(title="Same entry with some content in pdf"), 'test.pdf')
       File "/home/tomeu/sugar-jhbuild/build/lib/python2.4/site-packages/olpc/datastore/datastore.py", line 195, in update
         self.querymanager.update(uid, props, filelike)
       File "/home/tomeu/sugar-jhbuild/build/lib/python2.4/site-packages/olpc/datastore/query.py", line 107, in update
         if file: self.fulltext_index(content, file)
       File "/home/tomeu/sugar-jhbuild/build/lib/python2.4/site-packages/olpc/datastore/query.py", line 435, in fulltext_index
         self._ft_index(content.id, fp, piece)
       File "/home/tomeu/sugar-jhbuild/build/lib/python2.4/site-packages/olpc/datastore/query.py", line 438, in _ft_index
         doc = [piece(p) for p in fp]
       File "build/bdist.linux-i686/egg/lemur/xapian/sei.py", line 677, in __init__
     UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
 }}}
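
 The same class of failure can be reproduced in isolation. This is only a
 minimal sketch in modern Python, not the datastore code: the byte string
 and the helper name try_ascii are made up for illustration, and the actual
 contents of test.pdf are not known from this report.

```python
# Hypothetical input: 'a' followed by the UTF-8 encoding of 'é' (0xc3 0xa9).
# The real bytes read from test.pdf are not shown in the ticket.
raw = b"a\xc3\xa9"


def try_ascii(data):
    """Decode bytes as ASCII; return (text, None) or (None, error message)."""
    try:
        return data.decode("ascii"), None
    except UnicodeDecodeError as exc:
        # Any byte >= 0x80 is outside the ASCII range and triggers this.
        return None, str(exc)


text, error = try_ascii(raw)
print(error)
```

 Any byte above 0x7f is enough to trigger the error, so binary formats such
 as pdf are likely to hit it whenever their content is treated as ASCII text.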

 Adding this to tests/sugar_demo_may17.txt triggers the exception:
 {{{
 @@ -49,4 +55,7 @@ Check content:
  'some other content\n'
  >>> fp.close()

 +Set content as pdf:
 +>>> ds.update(uid, dict(title="Same entry with some content in pdf"), 'test.pdf')
 +
  >>> del ds
 }}}

 Note that even if we don't have a decoder for a file type, the DS should
 *not* raise an exception and thereby reject the object. We should be able to
 store files that we cannot index.
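
 One way to get that behaviour is to store the object first and treat
 indexing as best effort. The following is a rough sketch under stated
 assumptions, not the olpc.datastore implementation: SketchStore,
 ascii_only_indexer and the unindexed set are hypothetical names invented
 for this example.

```python
import logging

logger = logging.getLogger("datastore.sketch")


class SketchStore:
    """Hypothetical stand-in for a datastore; not the olpc.datastore API."""

    def __init__(self, indexer):
        self.indexer = indexer      # callable(uid, data); may raise
        self.files = {}             # uid -> (props, raw bytes)
        self.unindexed = set()      # uids stored without a full-text index

    def update(self, uid, props, data):
        # Store first, so an indexing failure can never reject the object.
        self.files[uid] = (dict(props), data)
        try:
            self.indexer(uid, data)
            self.unindexed.discard(uid)
        except UnicodeError as exc:
            # UnicodeDecodeError is a subclass of UnicodeError.
            logger.warning("could not index %s: %s", uid, exc)
            self.unindexed.add(uid)
        return uid


def ascii_only_indexer(uid, data):
    """Hypothetical indexer that fails on non-ASCII bytes, as in the report."""
    data.decode("ascii")


store = SketchStore(ascii_only_indexer)
store.update("u1", {"title": "Same entry with some content in pdf"}, b"%PDF\xc3\xa9")
store.update("u2", {"title": "plain text"}, b"some other content\n")
```

 Here the entry with non-ASCII bytes is still stored and is only marked as
 unindexed, so it could be indexed later once a suitable decoder exists.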

-- 
Ticket URL: <http://dev.laptop.org/ticket/1466>
One Laptop Per Child <http://laptop.org/>


