[Sugar-devel] The quest for data

Sameer Verma sverma at sfsu.edu
Fri Jan 10 15:35:20 EST 2014


On Thu, Jan 9, 2014 at 10:10 PM, Anish Mangal <anish at activitycentral.com> wrote:

> Sorry for being late to the party. Clearly the "quest for data" is a
> commonly shared one, with many different approaches, questions, and
> reporting/results.
>
> One of the already-mentioned solutions is the sugar-stats package,
> originally developed by Aleksey, which has now been part of dextrose-sugar
> builds for over a year, along with the server side (XSCE).
>
> http://wiki.sugarlabs.org/go/Platform_Team/Usage_Statistics
>
> The approach we followed was to collect as much data as possible without
> interfering with the Sugar APIs or code. The project has made slow progress
> on the visualization front, but the data collection side has already been
> field-tested.
>
>
> I for one think there are a few technical trade-offs, which lead to larger
> strategy decisions:
> * Context vs. universality ... Ideally we'd like to collect (activity)
> context-specific data, but that requires tinkering with the Sugar API
> itself and with each activity. The flip side is that we might be ignoring
> the other types of data a server might be collecting ... internet usage
> and the various other logfiles in /var/log.
>
> * Static vs. dynamic ... Analyzing Journal backups is great, but they are
> ultimately limited in time resolution by the datastore's design itself.
> So the key question is "what's valuable?" ... a) Frequency counts of
> activities? b) Data at up-to-the-minute resolution: which activities
> are running, which activity is active (visible, and when), collaborators
> over time ... etc.
>
> In my humble opinion, the next steps could be:
> 1. Get better on the visualization front.
> 2. Search for more context. Maybe arm the sugar-datastore to collect
> higher-resolution data.
>
>
>
1 and 2 can be done in parallel. As long as the architecture is independent
of the data source, the source can be either the sugar-datastore or
sugar-stats.
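
To make the "static" side concrete, here's a rough sketch of pulling
activity frequency counts straight out of a Journal/datastore backup. It
assumes the usual on-disk Sugar datastore layout (one directory per entry,
with a metadata/ subdirectory holding one small file per key), so adjust
the paths to whatever your backups actually look like:

import os
import sys
from collections import Counter

def activity_counts(datastore_root):
    """Count Journal entries per activity bundle id in a datastore backup."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(datastore_root):
        # Each entry keeps its metadata as one small file per key
        # inside a "metadata" directory.
        if os.path.basename(dirpath) != 'metadata':
            continue
        activity_file = os.path.join(dirpath, 'activity')
        if os.path.exists(activity_file):
            with open(activity_file) as f:
                bundle_id = f.read().strip() or 'unknown'
            counts[bundle_id] += 1
    return counts

if __name__ == '__main__':
    root = sys.argv[1] if len(sys.argv) > 1 else \
        os.path.expanduser('~/.sugar/default/datastore')
    for bundle_id, n in activity_counts(root).most_common():
        print('%6d  %s' % (n, bundle_id))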

BTW, Leotis has pushed his OLPC Dashboard code to github:
https://github.com/Leotis/olpc-datavisualization-

cheers,
Sameer


>
> On Tue, Jan 7, 2014 at 12:24 PM, Christophe Guéret <
> christophe.gueret at dans.knaw.nl> wrote:
>
>> Dear Sameer, all,
>>
>> That's a very interesting blog post and discussion. I agree that
>> collecting data is important, but knowing what questions that data is
>> meant to answer is even more so. If you need help with that last bit, I
>> could propose using the Journal data as a use case for the KnowEscape
>> project ( http://knowescape.org/ ). This project is about getting
>> insights out of large knowledge spaces via visualisation. There is a wide
>> (European) community of experts behind it, coming from different research
>> fields (humanities, physics, computer science, ...). Something useful
>> could maybe come out of it...
>>
>> I would also like to refer you to the ERS project, which we have now
>> almost finished. It is an extension of the ideas behind SemanticXO, which
>> some of you may remember. We developed a decentralised entity registry
>> system with the XO as the primary platform for coding and testing. There
>> is a description of the implementation and links to the code at
>> http://ers-devs.github.io/ers/ . We also had a poster at OLPC SF (thanks
>> for that!).
>>
>> In a nutshell, ERS creates global and shared knowledge spaces through
>> series of statements. For instance, "Amsterdam is in the Netherlands" is a
>> statement made about the entity "Amsterdam" relating it to the entity "the
>> Netherlands". Every user of ERS may want to either de-reference an entity
>> (e.g., asking for all pieces of information about "Amsterdam") or
>> contribute to the content of the shared space by adding new statements.
>> This is made possible via "Contributors" nodes, one of the three types of
>> node defined in our system. Contributors can interact freely with the
>> knowledge base. They themselves take care of publishing their own
>> statements but cannot edit third-party statements. Every set of statements
>> about a given entity contributed by one single author is wrapped into a
>> document in CouchDB to avoid conflicts and enable provenance tracking.
>> Every single XO is a Contributor. Two Contributors in a closed P2P network
>> can freely create and share Linked Open Data. In order for them to share
>> data with another closed group of Contributors, we have "Bridges". A
>> Bridge is a relay between two closed networks using the internet or any
>> other form of direct connection to share data. Two closed communities, for
>> example two schools, willing to share data can each setup one Bridge and
>> connect these two nodes to each other. The Bridges will then collect and
>> exchange data coming from the Contributors. These bridges are not
>> Contributors themselves, they are just used to ship data (named graphs)
>> around and can be shut down or replaced without any data loss. Lastly, the
>> third component we define in our architecture is the "Aggregator". This is
>> a special node every Bridge may push content to and get updated content
>> from. As its name suggests, an Aggregator is used to aggregate entity
>> descriptions that are otherwise scattered among all the Contributors. When
>> deployed, an aggregator can be used to access and expose the global content
>> of the knowledge space or a subset thereof.
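
To make the "one document per entity and per author" idea concrete, here is
a rough sketch of how a Contributor could wrap its statements and push them
to a local CouchDB over plain HTTP. The document layout, field names, and
database name are illustrative guesses, not the actual ERS schema (see the
ers-devs code for that); it assumes the Python requests package and a
CouchDB on localhost with the database already created:

import requests
from urllib.parse import quote

COUCH = 'http://localhost:5984'
DB = 'ers_public'          # assumed database name

def put_statements(entity, author, properties):
    """Store one author's statements about one entity as a single document,
    so Contributors never have to edit each other's documents."""
    doc_id = quote('%s|%s' % (entity, author), safe='')  # illustrative id scheme
    url = '%s/%s/%s' % (COUCH, DB, doc_id)
    doc = {'@id': entity, '@author': author}
    doc.update(properties)
    current = requests.get(url)
    if current.status_code == 200:
        # Re-use the current revision so repeated writes don't conflict.
        doc['_rev'] = current.json()['_rev']
    resp = requests.put(url, json=doc)
    resp.raise_for_status()
    return resp.json()

# e.g. the statement "Amsterdam is in the Netherlands":
put_statements('urn:ers:Amsterdam', 'urn:ers:contributor:xo-1234',
               {'isIn': 'urn:ers:Netherlands'})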
>>
>> One could use ERS to store (part of) the content of the Journal on an XO
>> (Contributor), cluster information at the school level (a Bridge on the
>> XS) and provide higher-level analysis (Aggregator). The best things about
>> ERS, I think, are that:
>> * It can store and share any data that consists of property/value pairs
>> about a given thing identified by a unique identifier
>> * It is "off-line by default": all the upper-level components are
>> optional, and so is the connectivity to them
>> * It is conservative in terms of bandwidth used
>>
>> Graphs could be created at every level to get some statistics on the XO,
>> on the XS and at a more global level, all of them potentially using the
>> same code, since the data is always stored using the same model (a variant
>> of JSON-LD).
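
As an illustration of that last point, a Journal entry could be described
with the same property/value model at every level. The field names below
are made up for the example rather than anything ERS prescribes:

# One Journal entry, expressed as property/value pairs about a single,
# uniquely identified thing (all field names are illustrative):
journal_entry = {
    '@id': 'urn:ers:xo-1234:journal:entry-42',
    '@type': 'JournalEntry',
    'activity': 'org.laptop.AbiWordActivity',
    'title': 'My story',
    'mtime': '2014-01-08T14:22:31',
    'share-scope': 'private',
}

# The same dict could be kept on a Contributor, relayed through a Bridge,
# or counted on an Aggregator without changing its shape.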
>>
>> We are now finalising a small social-networking activity to demo and test
>> ERS. You can easily play with it using the virtual images we put on the
>> site. Here is a video showing it running: https://vimeo.com/81796228
>>
>> Please have a look and let us know what you think of it :-) The project
>> is still funded for a bit less than three months, and we would really like
>> it to be useful for the OLPC community (that's why we targeted the XO), so
>> don't hesitate to ask for missing features!
>>
>> Cheers,
>> Christophe
>>
>> On 6 January 2014 02:03, Andreas Gros <andigros72 at gmail.com> wrote:
>>
>>> Great utilization of CouchDB and its views feature! That's definitely
>>> something we can build on. But more importantly, to make this meaningful,
>>> we need more data.
>>> It's good to know which activities are used most, so one can come up
>>> with a priority list for improvements and/or focus developer attention.
>>> CouchDB makes it possible to pull data together from different instances,
>>> which should make aggregation and comparisons between projects possible.
>>> And for projects that are not online, the data could be transferred to a
>>> USB stick quite easily and then uploaded to any other DB instance.
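
For reference, pulling one instance's data into another is a single call to
CouchDB's _replicate endpoint; the host and database names below are only
placeholders:

import requests

def replicate(source, target):
    """Ask the local CouchDB to copy everything from source into target.
    Either side can be a local database name or the URL of a remote one."""
    resp = requests.post('http://localhost:5984/_replicate',
                         json={'source': source,
                               'target': target,
                               'create_target': True})
    resp.raise_for_status()
    return resp.json()

# e.g. pull a school server's stats database into a central instance,
# or point 'source' at a database restored from a USB stick:
replicate('http://schoolserver.local:5984/journal_stats', 'journal_stats_all')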
>>>
>>> Is there a task/todo list somewhere?
>>>
>>> Andi
>>>
>>> On Fri, Jan 3, 2014 at 11:16 AM, Sameer Verma <sverma at sfsu.edu> wrote:
>>>
>>>>  On Fri, Jan 3, 2014 at 4:15 AM, Martin Abente
>>>> <martin.abente.lahaye at gmail.com> wrote:
>>>> > Hello Sameer,
>>>> >
>>>> > I totally agree we should join efforts for a visualization solution,
>>>> > but, personally, my main concern is still a basic one: what are the
>>>> > important questions we should be asking? And how can we answer these
>>>> > questions reliably? Even though most of us have experience in
>>>> > deployments and their needs, we are engineers, not educators, nor
>>>> > decision makers.
>>>> >
>>>>
>>>> Agreed. It would be helpful to have a conversation on what the various
>>>> constituencies need (different from want) to see at their level: the
>>>> child, the parents/guardians, the teacher, the principal/administrator,
>>>> and the educational bureaucracy. We should also consider the needs of
>>>> those of us who have to fundraise by showing the progress of ongoing
>>>> efforts.
>>>>
>>>> > I am sure that most of our collection approaches cover pretty much the
>>>> > trivial stuff: what are they using, when are they using it, how often
>>>> > they use it, and all kinds of things that derive directly from Journal
>>>> > metadata. Plus the extra insight that comes when considering different
>>>> > demographics.
>>>>
>>>> True. Basic frequency counts such as frequency of use of activities,
>>>> usage by time of day, day of week, and scope of collaboration are a few
>>>> simple ones. Comparing one metric against another will need more
>>>> thought. That's where we should talk to the constituents.
>>>>
>>>> >
>>>> > But if we could also work together on that (including the trivial
>>>> > questions), it would be a good step forward. Once we identify these
>>>> > questions and figure out how to answer them, it would be a lot easier
>>>> > to think about visualization techniques, etc.
>>>>
>>>> If the visualization subsystem (the underlying tech pieces) is common
>>>> and flexible, then we can start with a few basic templates and make it
>>>> extensible, so we can all aggregate, collate, and correlate as needed.
>>>> I'll use an example that I'm familiar with. We looked at CouchDB for
>>>> two reasons: 1) it allows for sync over intermittent/on-off
>>>> connections to the Internet, and 2) its "views" feature provides
>>>> selective subsets of the data, while its "reduce" feature computes
>>>> aggregates. The actual visuals are done in JavaScript. Here's the
>>>> example Leotis had at the OLPC SF summit
>>>> (http://108.171.173.65:8000/).
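
As a concrete (and purely illustrative) example of that views/reduce split,
a design document along these lines would give per-activity usage counts.
The database name and the doc.activity field are assumptions about how the
Journal metadata might be stored, not Leotis's actual code:

import requests

COUCH = 'http://localhost:5984'
DB = 'journal_stats'   # assumed database of per-entry Journal metadata

# The map emits one row per record, keyed by activity; the built-in
# _count reduce turns those rows into per-activity totals.
design_doc = {
    '_id': '_design/usage',
    'views': {
        'by_activity': {
            'map': "function(doc) { if (doc.activity) emit(doc.activity, 1); }",
            'reduce': '_count',
        }
    }
}

resp = requests.put('%s/%s/%s' % (COUCH, DB, design_doc['_id']),
                    json=design_doc)
if resp.status_code not in (201, 409):   # 409: design doc already exists
    resp.raise_for_status()

# group=true collapses the reduce to one total per distinct activity key.
rows = requests.get('%s/%s/_design/usage/_view/by_activity' % (COUCH, DB),
                    params={'group': 'true'}).json()['rows']
for row in sorted(rows, key=lambda r: r['value'], reverse=True):
    print('%6d  %s' % (row['value'], row['key']))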
>>>> >
>>>> > What do you guys think?
>>>> >
>>>>
>>>> A great start for a great year ahead!
>>>>
>>>> > Regards,
>>>> > tch.
>>>>
>>>> cheers,
>>>> Sameer
>>>
>>>
>>
>>
>> --
>> Researcher
>> +31(0)6 14576494
>> christophe.gueret at dans.knaw.nl
>>
>> *Data Archiving and Networked Services (DANS)*
>>
>> DANS promotes sustained access to digital research data. See
>> www.dans.knaw.nl for more information. DANS is an institute of KNAW and
>> NWO.
>>
>>
>> Please note that as of 1 January we have a new address:
>>
>> DANS | Anna van Saksenlaan 51 | 2593 HW Den Haag | Postbus 93067 | 2509
>> AB Den Haag | +31 70 349 44 50 | info at dans.knaw.nl |
>> www.dans.knaw.nl
>>
>>
>> *Let's build a World Wide Semantic Web!*
>> http://worldwidesemanticweb.org/
>>
>> *e-Humanities Group (KNAW)*
>> http://www.ehumanities.nl/
>>
>>
>
>
>

