[Sugar-devel] The quest for data

Sameer Verma sverma at sfsu.edu
Mon Jan 6 15:04:01 EST 2014

On Mon, Jan 6, 2014 at 12:28 AM, Martin Dluhos <martin at gnu.org> wrote:
> On 3.1.2014 04:09, Sameer Verma wrote:
>> Happy new year! May 2014 bring good deeds and cheer :-)
>> Here's a blog post on the different approaches (that I know of) to data
>> gathering across different projects. Do let me know if I missed anything.
>> cheers,
>> Sameer
>> http://www.olpcsf.org/node/204
> Thanks for putting together the summary, Sameer. Here is more information about
> my xo-stats project:
> The project's objective is to determine how XOs are used in Nepalese
> classrooms, but I intend the implementation to be general enough that
> it can be reused by other deployments as well. As with other projects
> you've mentioned, I have separated the project into four stages:
> 1) collecting data from the XO Journal backups on the schoolserver
> 2) extracting the data from the backups and storing it in an appropriate format
> for analysis and visualization
> 3) statistically analyzing and visualizing the captured data
> 4) formulating recommendations for improving the program based on the analysis.
> Stage 1 is already implemented on both the server side and the client
> side, so I first focused on the next step of extracting the data. Initially, I
> wanted to reuse an existing script, but I eventually found that none of them
> were general enough to meet my criteria. One of my goals is to make the script
> work on any version of Sugar.
> Thus, I have been working on process_journal_stats.py, which takes a '/users'
> directory with XO Journal backups as input, pulls out the Journal metadata and
> writes it to a CSV or JSON file.
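The "one row per Journal record" output can be illustrated with a minimal Python sketch. It is not taken from process_journal_stats.py: it assumes the Journal metadata has already been parsed into one dict per record (keys such as activity, title and mtime are used only for illustration), and the function names records_to_csv and records_to_json are hypothetical.

```python
import csv
import io
import json

# Hypothetical helpers, not the API of process_journal_stats.py.
# Each record is assumed to be a dict of Journal metadata.


def records_to_csv(records):
    """Return CSV text with one row per Journal record.

    The header is the sorted union of all metadata keys, because
    different records (and Sugar versions) may carry different keys;
    missing values are written as empty cells.
    """
    fieldnames = sorted({key for rec in records for key in rec})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
    writer.writeheader()
    for rec in records:
        writer.writerow(rec)
    return buf.getvalue()


def records_to_json(records):
    """Return the same records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)
```

Taking the union of keys keeps records with different metadata in a single table, which is convenient when the input mixes backups from more than one Sugar version.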
> Journal backups can be in a variety of formats depending on the version
> of Sugar. The script currently supports the backup format used in Sugar versions
> 0.82 - 0.88, since the laptops distributed in Nepal are XO-1s running Sugar
> 0.82. I am planning to add support for later versions of Sugar in the next
> version of the script.
> The script currently supports two ways to output statistical data. To output
> all statistical data from the Journal, one row per Journal record, use:
>     process_journal_stats.py all
> To extract statistical data about the use of activities on the system, use:
>     process_journal_stats.py activity
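A per-activity summary of the kind the "activity" mode produces might be sketched as below. This is an assumption-laden illustration, not the script's implementation: it assumes each record's "activity" key holds a bundle id such as org.laptop.WebActivity, and the helpers activity_counts and format_counts are hypothetical. Note that it counts Journal entries, which may not equal the number of times an activity was launched.

```python
from collections import Counter

# Hypothetical helpers; the "activity" metadata key is assumed to hold
# the activity's bundle id.


def activity_counts(records):
    """Count Journal entries per activity bundle id.

    Entries with a missing or empty 'activity' value are counted
    under 'unknown' so that they remain visible in the totals.
    """
    return Counter(rec.get("activity") or "unknown" for rec in records)


def format_counts(counts):
    """Render counts as tab-separated lines, most frequent first."""
    return "\n".join(f"{name}\t{n}" for name, n in counts.most_common())
```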
> Full documentation of all the options is in the README at:
> https://github.com/martasd/xo-stats
> One challenge of the project has been determining how much data processing to do
> in the Python script and what to leave for the data analysis and visualization
> tools later in the workflow. For now, I have stopped adding features to the script
> and am evaluating the most appropriate tools for visualizing the data.
> Here are some of the questions I am intending to answer with the visualizations
> and analysis:
> * How many times do installed activities get used? How does activity use
> change over time?
> * Which activities are children using to create files? What kind of files are
> being created?
> * Which activities are being launched in share-mode and how often?
> * During which part of the day do children play with the activities?
> * How does the set of activities used evolve as children age?
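Several of these questions reduce to simple aggregations over the extracted metadata. The sketch below is illustrative only and rests on loudly stated assumptions: that "timestamp" holds Unix epoch seconds, that "share-scope" is "private" for unshared sessions (a missing key is treated as private), and that the morning/afternoon/evening/night boundaries are arbitrary choices. All function names are hypothetical. Times are converted to Nepal Standard Time (UTC+05:45) rather than the analyst's local zone, so that "part of the day" refers to the classroom clock.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Nepal Standard Time is UTC+05:45; other deployments should pass
# their own offset.
NEPAL_TZ = timezone(timedelta(hours=5, minutes=45))


def part_of_day(hour):
    """Map an hour (0-23) to a coarse, illustrative bucket."""
    if 5 <= hour < 12:
        return "morning"
    if 12 <= hour < 17:
        return "afternoon"
    if 17 <= hour < 21:
        return "evening"
    return "night"


def _local_time(rec, tz):
    """Return the record's local datetime, or None if there is no timestamp.

    Assumes 'timestamp' holds Unix epoch seconds (number or string).
    """
    ts = rec.get("timestamp")
    if ts in (None, ""):
        return None
    return datetime.fromtimestamp(float(ts), tz)


def usage_by_part_of_day(records, tz=NEPAL_TZ):
    """Count entries per part-of-day bucket."""
    counts = Counter()
    for rec in records:
        when = _local_time(rec, tz)
        if when is not None:
            counts[part_of_day(when.hour)] += 1
    return counts


def usage_by_month(records, tz=NEPAL_TZ):
    """Count entries per (YYYY-MM, activity) pair to show change over time."""
    counts = Counter()
    for rec in records:
        when = _local_time(rec, tz)
        if when is not None:
            activity = rec.get("activity") or "unknown"
            counts[(when.strftime("%Y-%m"), activity)] += 1
    return counts


def shared_entries(records):
    """Count entries per activity whose 'share-scope' is not 'private'.

    Treating every non-private scope as shared, and a missing key as
    private, are assumptions to check against real backups.
    """
    return Counter(
        rec.get("activity") or "unknown"
        for rec in records
        if rec.get("share-scope", "private") != "private"
    )
```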
> I am also going to look at how the answers to these questions vary from class to
> class, school to school, and region to region.
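Comparing classes, schools or regions requires knowing where each record came from. One way to sketch this, under two hypothetical assumptions, is to have the extractor tag each record with a "serial" field (for example from the per-laptop directory under '/users') and to keep a hand-maintained mapping from serial number to a class, school or region label. Neither the field nor the mapping is described in the post; both, and the helper name, are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical: records carry a "serial" field added during extraction,
# and serial_to_group is a lookup maintained by the deployment.


def activity_counts_by_group(records, serial_to_group):
    """Return {group label: Counter(activity -> entry count)}.

    Records whose serial is not in the mapping are grouped under
    'unassigned' rather than silently dropped.
    """
    result = defaultdict(Counter)
    for rec in records:
        group = serial_to_group.get(rec.get("serial"), "unassigned")
        result[group][rec.get("activity") or "unknown"] += 1
    return dict(result)
```

The same function works at any level of aggregation: pass a serial-to-class mapping for class comparisons or a serial-to-region mapping for regional ones.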
> As Martin Abente and Sameer mentioned above, our work needs to be informed by
> discussions with the stakeholders: children, educators, parents, school
> administrators, etc. We do have educational experts among the staff at OLE, who
> have worked with more than 50 schools altogether, and I will be talking to them
> as I look beyond answering the obvious questions.

We should start a list on the wiki to collate this information. I'll
get someone from Jamaica to provide some feedback as well.

> For visualization, I have explored using LibreOffice and SOFA, but neither of
> those was flexible enough to allow customization of the output beyond a few
> rudimentary options, so I started looking at various JavaScript libraries, which
> are much more powerful. Currently, I am experimenting with Google Charts, which
> I found the easiest to get started with. If I run into limitations with Google
> Charts in the future, others on my list are the InfoVis Toolkit
> (http://philogb.github.io/jit) and Highcharts (http://highcharts.com). Then,
> there is also D3.js, but that's a bigger animal.
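One way to connect the Python extraction step to Google Charts is to emit data in the JSON literal form that the google.visualization.DataTable constructor accepts: a "cols" list of column descriptions and a "rows" list in which each row has a "c" list of cells with "v" values. The sketch below assumes counts arrive as a plain mapping of label to number (for example an activity summary); the function name and default column labels are hypothetical.

```python
import json

# Hypothetical helper: serialize {label: count} into a DataTable JSON
# literal ({"cols": [...], "rows": [{"c": [{"v": ...}, ...]}, ...]}).


def counts_to_datatable(counts, label="Activity", value_label="Entries"):
    """Return a JSON string usable as a Google Charts DataTable literal.

    Rows are ordered by descending count; ties keep input order.
    """
    ordered = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return json.dumps({
        "cols": [
            {"label": label, "type": "string"},
            {"label": value_label, "type": "number"},
        ],
        "rows": [{"c": [{"v": name}, {"v": n}]} for name, n in ordered],
    })
```

On the page, the parsed object would then be passed to new google.visualization.DataTable(...) before drawing a chart, which keeps all aggregation in Python and leaves only rendering to JavaScript.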

Keep in mind that if you want to visualize at the school's local
XS[CE], you may have to rely on a local JS method instead of an online one.
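As an illustration of the offline constraint, a chart can also be generated as a standalone HTML page with an inline SVG, so that nothing is fetched from the Internet when the page is opened from the school server. This is a swapped-in technique, not something proposed in the thread, and the function name, dimensions and colours below are hypothetical choices.

```python
from html import escape

# Hypothetical helper: builds a self-contained page with no external
# JavaScript or CSS, suitable for serving from an offline school server.


def bar_chart_html(pairs, bar_height=20, px_per_unit=5, label_width=220):
    """Return an HTML page with a horizontal SVG bar chart.

    pairs is a list of (label, value) tuples, drawn top to bottom.
    Labels are HTML-escaped so activity or file names cannot break
    the markup.
    """
    max_value = max((value for _, value in pairs), default=0)
    width = label_width + max_value * px_per_unit + 10
    height = bar_height * len(pairs)
    parts = [
        "<!DOCTYPE html>",
        "<html><head><meta charset='utf-8'><title>Activity use</title></head><body>",
        f"<svg xmlns='http://www.w3.org/2000/svg' width='{width}' height='{height}'>",
    ]
    for i, (label, value) in enumerate(pairs):
        y = i * bar_height
        parts.append(
            f"<text x='0' y='{y + bar_height - 5}' font-size='12'>{escape(str(label))}</text>"
        )
        parts.append(
            f"<rect x='{label_width}' y='{y + 2}' width='{value * px_per_unit}' "
            f"height='{bar_height - 4}' fill='steelblue'/>"
        )
    parts.append("</svg></body></html>")
    return "\n".join(parts)
```

The trade-off is interactivity: a static SVG has no tooltips or zooming, but it renders in any browser on the XO without network access.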

> Alternatively or perhaps in parallel, I am also willing to join efforts to
> improve the OLPC Dashboard, which is trying to answer very similar questions to
> mine.

I'll ping Leotis (cc'd) to push his dashboard code to GitHub, so we
don't reinvent.


> I am looking forward to collaborating with everyone who is interested in
> exploring ways to analyze and visualize OLPC/Sugar data in an interesting and
> meaningful way.
> Cheers,
> Martin
