[Server-devel] [XSCE] Re: Search engines and local git repos

Tim Moody tim at timmoody.com
Tue Feb 17 14:12:01 EST 2015


OK. That looks promising.  May want to leave out the zims.  iiab can take 
several days to index a single zim.



-----Original Message----- 
From: Anish Mangal
Sent: Tuesday, February 17, 2015 2:04 PM
To: xsce-devel at googlegroups.com ; server-devel
Subject: Re: [XSCE] Re: Search engines and local git repos

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

So in the settings sections I see a number of options

Web (which I guess would be html)
Database (SQL, Mongo)
Files
REST - not quite sure what this would do

Also there is this
https://github.com/opensearchserver/oss-text-extractor
(didn't check if it came bundled with the rpm - there's a separate rpm
from a different source as well). From the documentation:

- ----

An open source RESTFul Web Service for text extraction and analysis.
oss-text-extractor supports various binary formats.

    Word processor (doc, docx, odt, rtf)
    Spreadsheet (xls, xlsx, ods)
    Presentation (ppt, pptx, odp)
    Publishing (pdf, pub)
    Web (rss, html/xhtml)
    Medias (audio, images)
    Others (vsd, text, markdown)

- ----

Seems quite useful.
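
To get a rough feel for how much of a content tree falls under the formats
listed above, something like the following could be run over a content
directory. This is only a minimal sketch: the function names, the idea of
matching on file extension, and the "txt"/"md" extensions for "text" and
"markdown" are my assumptions, not anything taken from oss-text-extractor
itself. The "Medias" group is left out because the list names no extensions
for it.

```python
from collections import Counter
from pathlib import Path

# Extension groups copied from the documentation list above. Matching by
# extension is an illustrative assumption, not how oss-text-extractor works.
FORMATS = {
    "word processor": {"doc", "docx", "odt", "rtf"},
    "spreadsheet": {"xls", "xlsx", "ods"},
    "presentation": {"ppt", "pptx", "odp"},
    "publishing": {"pdf", "pub"},
    "web": {"rss", "html", "xhtml"},
    # "text" and "markdown" mapped to txt/md: an assumption
    "others": {"vsd", "txt", "md"},
}


def category_for(filename):
    """Return the listed format group for a file name, or 'unlisted'."""
    ext = Path(filename).suffix.lower().lstrip(".")
    for category, extensions in FORMATS.items():
        if ext in extensions:
            return category
    return "unlisted"


def tally(root):
    """Count the files under root per format group."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[category_for(path.name)] += 1
    return counts
```

Anything that lands in "unlisted" (zim files, for example) would need some
other indexing path.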

Also, I think the usefulness of a search engine would go up when there
is more student/teacher generated content.

Cheers,
Anish



On Wednesday 18 February 2015 12:27 AM, Tim Moody wrote:
> The problem with indexing is that it's a lot easier with text
> files (like html) than binary files like pdf, doc, zim, etc.  iiab
> and kiwix can both index zims, which is how we search wikis, but a
> lot of our content is in binary files.  A quick look at
> opensearchserver makes me think they mainly do html.
>
>
> -----Original Message-----
> From: Anish Mangal
> Sent: Tuesday, February 17, 2015 1:37 PM
> To: server-devel ; xsce-devel
> Subject: [XSCE] Re: Search engines and local git repos
>
> FWIW, I haven't checked for the existence of ARM packages. For
> OpenSearchServer, the major dependency seemed to be java, so it
> might not be quite so difficult there. (I hope)
>
> Not sure of gitlab.
>
> On Tuesday 17 February 2015 11:41 PM, Anish Mangal wrote:
>> Hi,
>
>> So I've been playing around with various things which might be
>> added to the XSCE. The two I came across that seem quite
>> straightforward to set up (basically installing an rpm package
>> and 1-2 small config steps) are:
>
>> 1. OpenSearchServer - An offline search engine for content on
>> the XSCE
>
>> http://www.opensearchserver.com/
>
>> Use case: There may be tons of content stored under many
>> different web services on the xsce. Instead of going through each
>> service and manually searching/browsing, one may simply want to
>> .. 'google' :)
>
>> Caveat(s):
>> * Crawling seems to be a CPU-intensive process, but with proper
>>   scheduling it could be handled
>> * Haven't tested with IIAB files yet. If anyone has an IIAB
>>   dataset online, please let me know
>
>> My experiments:
>
>> * I basically installed the rpm on an F21 VM and it works out of
>> the box! It has a detailed admin interface which controls how
>> webpages, databases, mailboxes etc. are searched and indexed,
>> and presents the result to the user as a simple search box
>
>> Next step:
>
>> * Playbook
>
>
>
>> 2. gitlab - github for the xsce
>
>> https://about.gitlab.com/
>
>> Does what it says on the cover .. limited use only if some kids
>> want to develop code
>
>> Possible integration with other projects like gitenberg (Seth
>> Woodworth) in the future... needs lots of exploration
>
>> My experiments:
>
>> * Installed and tested the provided rpm packages. Instructions
>> worked out of the box
>
>> Next step:
>
>> * Playbook
>
>
>> Thoughts, Anish
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJU45DRAAoJEBoxUdDHDZVpD7oIAIBXi+oe9IGUOnKvoIE4hITe
k2nlgnVWoR3KlprH2KtFNhV6O/7+k8lvNkZJ4a/FwrGQcXmY060vqj2JFldpUHVw
wqsJS63PqL1rLxz+uQXT5juXyS6IZ+gwBXLPwV0+65M7cIucQBHRu2u+sLU2R+Pt
KM3CyUnaArUDxMUkJao9PchC7LtSmhcjaO0cljUIq/x3wKeMenmLOtZj/eYsn/7a
TX/PuQb2M9J8swydYu6ex3U9Nb6koJNSXInIxIOOmzCQvfaBSRxIGoU0ZhlaUIu0
pBSVNoQMuj15u4DAS1s/+HtVnNdoA0+nnNVrmqNOdFSECklZlgppaIK1G4DcD38=
=YOFB
-----END PGP SIGNATURE----- 


