[Sugar-devel] The quest for data

Mon Jan 6 15:08:33 EST 2014

On Mon, Jan 6, 2014 at 12:04 PM, Sameer Verma <sverma at sfsu.edu> wrote:
> On Mon, Jan 6, 2014 at 12:28 AM, Martin Dluhos <martin at gnu.org> wrote:
>> On 3.1.2014 04:09, Sameer Verma wrote:
>>> Happy new year! May 2014 bring good deeds and cheer :-)
>>>
>>> Here's a blog post on the different approaches (that I know of) to data
>>> gathering across different projects. Do let me know if I missed anything.
>>>
>>> cheers,
>>> Sameer
>>>
>>> http://www.olpcsf.org/node/204
>>
>> Thanks for putting together the summary, Sameer. Here is more information about
>> my xo-stats project:
>>
>> The project's objective is to determine how XOs are used in Nepalese
>> classrooms, but I intend the implementation to be general enough that it
>> can be reused by other deployments as well. As in the other projects
>> you've mentioned, I separated the project into four stages:
>>
>> 1) collecting data from the XO Journal backups on the schoolserver
>> 2) extracting the data from the backups and storing it in an appropriate format
>> for analysis and visualization
>> 3) statistically analyzing and visualizing the captured data
>> 4) formulating recommendations for improving the program based on the analysis.
>>
>> Stage 1 is already implemented on both the server side and the client
>> side, so I first focused on the next step of extracting the data. Initially, I
>> wanted to reuse an existing script, but I eventually found that none of them
>> was general enough to meet my criteria. One of my goals is to make the script
>> work on any version of Sugar.
>>
>> Thus, I have been working on process_journal_stats.py, which takes a '/users'
>> directory with XO Journal backups as input, pulls out the Journal metadata, and
>> writes it to a CSV or JSON file.
>>
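A minimal sketch of the output step described above, not the actual code in
process_journal_stats.py. It assumes the Journal metadata has already been
parsed into one dictionary per record; the function names and the field names
("activity", "mtime", "title") and sample values are illustrative assumptions.

```python
import csv
import io
import json


def records_to_csv(records, fieldnames):
    """Serialize metadata records to CSV text, one row per Journal record."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        # Keys missing from a record are written as empty cells.
        writer.writerow(rec)
    return buf.getvalue()


def records_to_json(records):
    """Serialize the same records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)


# Hypothetical sample records; real backups carry more fields.
sample = [
    {"activity": "org.laptop.WebActivity",
     "mtime": "2013-11-02T09:15:00", "title": "Browse"},
    {"activity": "org.laptop.AbiWordActivity",
     "mtime": "2013-11-02T14:40:00"},
]

csv_text = records_to_csv(sample, ["activity", "mtime", "title"])
json_text = records_to_json(sample)
```

Keeping the serialization separate from the parsing means the same records can
feed either output format.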
>> Journal backups can be in a variety of formats depending on the version
>> of Sugar. The script currently supports the backup format present in Sugar versions
>> 0.82 - 0.88 since the laptops distributed in Nepal are XO-1s running Sugar
>> 0.82. I am planning to add support for later versions of Sugar in the next
>> version of the script.
>>
>> The script currently supports two ways to output statistical data. To produce
>> all statistical data from the Journal, one row per Journal record:
>>
>>     process_journal_stats.py all
>>
>> To extract statistical data about the use of activities on the system, use:
>>
>>     process_journal_stats.py activity
>>
>> The full documentation, with all the options, is in the README at:
>>
>> https://github.com/martasd/xo-stats
>>
>> One challenge of the project has been determining how much data processing to do
>> in the Python script and what to leave for the data analysis and visualization
>> tools later in the workflow. For now, I have stopped adding features to the
>> script and am evaluating the most appropriate tools for visualizing the data.
>>
>> Here are some of the questions I am intending to answer with the visualizations
>> and analysis:
>>
>> * How many times do installed activities get used? How does activity use
>> change over time?
>> * Which activities are children using to create files? What kind of files are
>> being created?
>> * Which activities are being launched in share-mode and how often?
>> * Which part of the day do children play with the activities?
>> * How does the set of activities used evolve as children age?
>>
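As a rough illustration of how the first and fourth questions above could be
answered from extracted metadata, here is a small sketch. It assumes each
record is a dictionary with an "activity" bundle id and an ISO 8601 "mtime"
string; those field names, the function names, and the sample data are
assumptions, not taken from xo-stats. Note that it counts Journal entries,
which only approximates how often an activity was used.

```python
from collections import Counter
from datetime import datetime


def entries_per_activity(records):
    """Count Journal entries per activity bundle id."""
    return Counter(rec["activity"] for rec in records if rec.get("activity"))


def entries_per_hour(records):
    """Count Journal entries per hour of day (0-23), based on 'mtime'."""
    hours = Counter()
    for rec in records:
        stamp = rec.get("mtime")
        if not stamp:
            continue  # skip records without a timestamp
        hours[datetime.fromisoformat(stamp).hour] += 1
    return hours


# Hypothetical sample records.
sample = [
    {"activity": "org.laptop.WebActivity", "mtime": "2013-11-02T09:15:00"},
    {"activity": "org.laptop.WebActivity", "mtime": "2013-11-03T09:50:00"},
    {"activity": "org.laptop.AbiWordActivity", "mtime": "2013-11-02T14:40:00"},
    {"mtime": "2013-11-02T14:45:00"},  # entry with no activity recorded
]

by_activity = entries_per_activity(sample)
by_hour = entries_per_hour(sample)
```

The resulting counters can be written out as rows for whichever charting tool
ends up being used.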
>> I am also going to be looking at how answers to these questions vary from
>> class to class, school to school, and region to region.
>>
>> As Martin Abente and Sameer mentioned above, our work needs to be informed by
>> discussions with the stakeholders: children, educators, parents, school
>> administrators, etc. We do have educational experts among the staff at OLE, who
>> have worked with more than 50 schools altogether, and I will be talking to them
>> as I look beyond answering the obvious questions.
>>
>
> We should start a list on the wiki to collate this information. I'll
> get someone from Jamaica to provide some feedback as well.
>
>> For visualization, I have explored using LibreOffice and SOFA, but neither
>> was flexible enough to allow customization of the output beyond a few
>> rudimentary options, so I started looking at various JavaScript libraries, which
>> are much more powerful. Currently, I am experimenting with Google Charts, which
>> I found the easiest to get started with. If I run into limitations with Google
>> Charts in the future, others on my list are InfoVIS Toolkit
>> (http://philogb.github.io/jit) and HighCharts (http://highcharts.com). Then,
>> there is also D3.js, but that's a bigger animal.
>
> Keep in mind that if you want to visualize at the school's local
> XS[CE] you may have to rely on a local js method instead of an online
> library.
>
>>
>> Alternatively or perhaps in parallel, I am also willing to join efforts to
>> improve the OLPC Dashboard, which is trying to answer very similar questions to
>> mine.
>
> I'll ping Leotis (cc'd) to push his dashboard code to github, so we
> don't reinvent.
>

For those who haven't seen the prototype that Leotis has (demo'd at the
OLPCSF summit), here it is using mock data. It's a prototype, so be
gentle :-)

http://108.171.173.65:8000/

cheers,
Sameer

> cheers,
> Sameer
>
>>
>> I am looking forward to collaborating with everyone who is interested in
>> exploring ways to analyze and visualize OLPC/Sugar data in an interesting and
>> meaningful way.
>>
>> Cheers,
>> Martin