Wednesday, December 21, 2016

A short time spent on LibreOffice accessibility

LibreOffice and Orca

In the last months I have a short time period fixing accessibility issues mainly on Linux. LibreOffice has a bunch of this kind of issues (fdo#36549). This metabug is about those bugs which makes it difficult for Orca screen reader to make LO usable for visually impaired users. As I see Orca has a few workarounds in it's LO related code to handle these issues (e.g. ignoring false or duplicated events), but some times there is no such solution and we need to add improvements on LO side.

Small fixes

So I did some bug fixing on this area. Most of the bugs were about missing accessibility events, which are needed for Orca to handle events which are visible on the screen and so users should notice these changes. For example when the selection is changed on a Calc sheet (fdo#93825) or when the cursor moves inside a text portion (fdo#99687, fdo#71435). These issues can be frustrating for users who used to get feedback about the effect of their keyboard key pressing.

Fixing these issues needed small code changes, which shows LO accessibility code has a good structure in general, but as the code changes in time, some parts of this code just becomes broken, without maintanance.

Spellcheck dialog

A bit bigger change, I added, was related to the spellcheck dialog (fdo#93430). Spellcheck dialog shows the errors spellcheck algorithms find and shows some options to handle these errors (e.g. suggestions for correction). The problem with this dialog was with the text entry which shows the misspelled word. This text entry contains a small part of the text and highlights the misspelled word with red text color. Orca tried to get this small text part and find out which word is the erroneous one, but LO did not return the right text attributes and so Orca did not have the necessary information to handle this situation.

Now, after fixing this issue text color and also the boundaries of misspelled word are accessible for Orca. Great to see that Orca's developer, Joanmarie Diggs already adapted the code to handle these new information and so reading of this spellcheck dialog will be better in the next versions of the two softwares.

BeLin

I added these accessibility improvements working for IT Foundation for the Visually Impaired. One project of the Foundation is an Ubuntu based operating system for visually impaired users called BeLin ("Beszélő Linux", which is "Speaking Linux" in English). Since it's an Ubuntu based distribution it has LibreOffice as default office suite and uses Orca as screen reader.

Hopefully these change will make more comfortable to use LibreOffice and Orca both on BeLin and on other Linux distributions.

Friday, November 25, 2016

New pivot table function in Calc: Median

After have some work with a pivot table related performance issue, I've got a request to implement a new function for pivot tables. It seemed a useful feature to have and also an easy thing to implement at the first sight.

Pivot table functions

Pivot tables are used to analyse a larger amount of data using different statistic functions for that. Both LibreOffice Calc and Microsoft Office Excel have the same function palette with 11 functions like average, sum, count and so on. These aggregate functions can be used for data fields and for row/column fields. Data fields determines which source field would be summarized and which function would be used for that. Row/column fields don't have a function by default, but with setting one user can calculate subtotals too.

Median

Why median? In psychology research pivot tables can be a useful thing to analyze data of the participants. It depends on the type of the data which functions can be used for aggregation. When we have interval variables we can use average, but for ordinal variables we would need median, which was missing from the function list.

This was the starting point, but I've also found some user posts about missing this feature from MS Excel (see links below). It seems Excel users had to face the very same problem again and again over the last ten years. Well, it has some advantages if a software is open source, I guess.

Good news

In LibreOffice 5.3, median is available for pivot tables:

Thanks

... to my professor, Attila Krajcsi (Department of Cognitive Psychology, Eötvös Loránd University) for supporting the LibreOffice project, with the idea to have this new function and also with some course credits!

Thursday, October 20, 2016

Improve pivot table import performance (tdf#102694)

After a short break I got back to Collabora and I continue working on LibreOffice. This time I handled a performance issue in pivot table's import code, as part of our work for SUSE. In some cases importing an XLSX document containing more pivot tables took such a long time, that user could interpret it as a freeze.

After some testing we found that pivot table grouping is one of the points which makes import slow. We means me and Kohei Yoshida, who is an expert in this area of the code and helped me with understanding it. So grouping was in our focus this time.

Pivot table groups

In case of pivot tables we can make groups for columns of the source data. This groups can be name groups (general groups), number groups (e.g. number ranges) and date groups (e.g. grouped by year, month, day, etc.). Since this groups are related to the source, they rather part of the source than the pivot table layout. This is true both for MS Excel and LO Calc implementation. Effectively this means that when we have more pivot tables using the same source they will have the same groups too. In case of LO Calc this groups are stored in the pivot cache linked with the corresponding source range.

Performance

The performance issue here was that however these kind of pivot tables (having the same source) were linked to each other by the pivot cache, XLSX import ignored that and worked expecting pivot tables are fully independent. Which means that same groups were imported so many times as many tables referenced them. What makes it even slower, this kind of tables are linked to each other on a way that when one table's grouping is changed other tables are also affected.

This difference between internal handling of pivot tables an the XLSX import code came from that XLSX import code was written before groups became part of the pivot cache. So I actually needed to update the import code to follow the changes of pivot table internal implementation. With that, pivot table groups' import time became quite good. For example the test document I uploaded to bugzilla (tdf#102694) took more than 20 minutes before and now it takes less than a half of a minute. This document contains a small data table and 20 simple pivot tables. So it's not something which should take so much time to load and now it doesn't.