Discussion:
[frameworks-baloo] [Bug 400704] New: Baloo indexing I/O introduces serious noticable delays
Axel Braun
2018-11-05 15:22:30 UTC
https://bugs.kde.org/show_bug.cgi?id=400704

Bug ID: 400704
Summary: Baloo indexing I/O introduces serious noticable delays
Product: frameworks-baloo
Version: 5.45.0
Platform: Other
OS: Linux
Status: REPORTED
Severity: major
Priority: NOR
Component: Baloo File Daemon
Assignee: baloo-bugs-***@kde.org
Reporter: ***@gmx.de
Target Milestone: ---

As suggested in https://bugs.kde.org/show_bug.cgi?id=333655#c73 , let's open a
new bug for baloo 5:

I'm running baloo 5.45.0 on openSUSE Leap 15, and notice that my complete
desktop freezes regularly for 1-2 minutes(!). During that time the CPU monitor
reports 100% load on both cores, but top does not show any process with a
considerable CPU load. The problem seems to be the combination of baloo and
akonadi, as iotop shows:

Total DISK READ :      10.37 M/s | Total DISK WRITE :    1060.53 K/s
Actual DISK READ:      10.37 M/s | Actual DISK WRITE:     197.36 K/s
  TID  PRIO  USER    DISK READ   DISK WRITE  SWAPIN      IO>  COMMAND
 4497  idle  axel     9.73 M/s     0.00 B/s  0.00 %  99.52 %  baloo_file_extractor
 2847  idle  axel   651.54 K/s  1058.15 K/s  0.00 %  97.97 %  akonadi_indexing_agent --identifier akonadi_indexing_agent
   23  be/4  root     0.00 B/s     0.00 B/s  0.00 %   0.10 %  [kworker/1:1]
 2479  be/4  axel     0.00 B/s     0.00 B/s  0.00 %   0.08 %  plasmashell
  849  be/4  root     4.76 K/s     0.00 B/s  0.00 %   0.00 %  [xfsaild/sda2]

(interesting percentage calculation of iotop by the way)

System disk is an SSD, data disk is a hybrid 1TB disk with 8G cache.
I have configured the search not to index file content. That's why the heavy
I/O surprises me even more.
--
You are receiving this mail because:
You are watching all bug changes.
Zane Tu
2018-11-05 15:38:02 UTC

Zane Tu <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com
--
Nate Graham
2018-11-05 17:05:02 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@kde.org,
| |***@rwth-aachen.de
--
Stefan Brüns
2018-11-05 17:52:06 UTC

--- Comment #1 from Stefan Brüns <***@rwth-aachen.de> ---
Unfortunately even the two most fundamental databases in baloo, the Terms and
the FileNameTerms DBs, show O(M^2) behaviour on updates. Every time e.g. a "pdf"
file is changed, the associated value (i.e. the IDs of all matching documents)
for the "pdf" term is updated.

An update may happen in two cases:
1. an existing file is appended to, tagged, renamed ...
2. an existing file is replaced by an updated one (i.e. the application creates
a temporary file on saving and atomically replaces the old one).

For (1.), the update can be minimized, i.e. only updating the terms which have
actually changed. I have some experimental patches for this.

For (2.), the database schema has to be changed significantly.
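
As a rough illustration of why this leads to quadratic behaviour, the following
toy model counts the bytes written when documents sharing one term are indexed
one at a time. It is only a sketch under assumptions that are not stated in this
thread: the function name, the 8-byte document ID size, and the premise that
each update rewrites the term's entire value are hypothetical and are not taken
from Baloo's source.

```python
# Toy model, NOT Baloo's real storage code: one term maps to the full list
# of IDs of the documents containing it, and every update rewrites that list.

def bytes_written_full_rewrite(num_docs: int, id_size: int = 8) -> int:
    """Total bytes written to one term's value while num_docs documents
    containing that term are indexed one after another.

    id_size is an assumed size of one document ID in bytes.
    """
    total = 0
    posting_len = 0
    for _ in range(num_docs):
        posting_len += 1                 # the new document ID joins the list
        total += posting_len * id_size   # assumption: whole value is rewritten
    return total


if __name__ == "__main__":
    # Doubling the number of documents roughly quadruples the bytes written.
    for m in (1_000, 2_000, 4_000):
        print(m, bytes_written_full_rewrite(m))
```

In this model the total is id_size * M * (M + 1) / 2, so it grows with the
square of the number of matching documents M, which is consistent with the
O(M^2) behaviour described above.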
--
Axel Braun
2018-11-06 08:06:56 UTC

--- Comment #2 from Axel Braun <***@gmx.de> ---
Thanks for your explanation, Stefan. However, I don't know how I can influence
the behaviour. If I start the computer the next day I would not expect heavy
re-indexing.
Are the database stores for akonadi (~/.local/share/akonadi) excluded from
baloo indexing by default?
--
Mayeul Cantan
2018-11-10 22:00:53 UTC

Mayeul Cantan <***@live.fr> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@live.fr

--- Comment #3 from Mayeul Cantan <***@live.fr> ---
I came in to report the same problem. The system frequently freezes, with the
mouse not moving for a couple of seconds, or the screen not being refreshed.

Regardless of what is causing high IO usage within baloo and akonadi, I
consider them background tasks (most of the time), and I would like to see them
prioritized as such.

Could baloorunner be run with the equivalent of ionice -c 3 by default? (and
maybe nice as well). My CPU is quite beefy, but I suffer from I/O contention:

Arch Linux
Ryzen 7 2700X
8 GiB DDR4 2666
4TiB HDD system drive (WDC WD40EZRZ)

I will probably upgrade to an SSD at some point, but this is no excuse for a
background task to consume all of the available disk IO bandwidth ;)
--
Stefan Brüns
2018-11-10 22:05:57 UTC

--- Comment #4 from Stefan Brüns <***@rwth-aachen.de> ---
(In reply to Mayeul Cantan from comment #3)
> Could baloorunner be ran with the equivalent of ionice -c 3 by default? (and

baloo_file/baloo_file_extractor, which are the indexing tasks (i.e. the ones
causing write accesses), are already running with lowest priority. baloorunner
is not relevant here.

Even with low priority, the kernel eventually has to flush the write buffers,
causing the high I/O latency for other tasks.
--
Axel Braun
2018-11-11 16:21:24 UTC

--- Comment #5 from Axel Braun <***@gmx.de> ---
(In reply to Stefan Brüns from comment #4)
> Even with low priority, the kernel eventually has to flush the write
> buffers, causing the high I/O latency for other tasks.

Shouldn't the I/O traffic from higher-priority tasks be processed first as
well? I mean, if baloo does not get any CPU time, how can it create such high
traffic? Looking at iotop, it is mostly a factor of 100 to 1000 higher than
other tasks...
--
Mayeul Cantan
2018-11-12 14:13:20 UTC

--- Comment #6 from Mayeul Cantan <***@live.fr> ---
(In reply to Axel Braun from comment #5)
> (In reply to Stefan Brüns from comment #4)
> > Even with low priority, the kernel eventually has to flush the write
> > buffers, causing the high I/O latency for other tasks.
>
> Should the I/O traffic from higher prioritized tasks not processed before as
> well? I mean, if baloo does not get any CPU time, how can it create such a
> high traffic? Looking at iotop, it is mostly a factor 100 to 1000 higher
> than other tasks....

From this link, it seems to be the case (though a link to the kernel source
would have been nicer):
https://unix.stackexchange.com/questions/153505/how-disk-io-priority-is-related-with-process-priority
> io_priority = (cpu_nice + 20) / 5

In my case, though, it was always baloorunner showing at 99.99 % I/O in iotop.
baloo_file_extractor would also run sometimes, but with a lesser subjective
impact on performance.
Setting baloorunner to a lower priority using ionice seemed to improve things
quite a bit, although I would have to confirm it.

I get the point about needing to flush the cache at some point. Unfortunately,
I am at a loss as to why my mouse freezes because of it. I am on an 8-core
(16 threads with SMT) CPU, and only a couple of cores are used by the kernel.
CPU <-> RAM bandwidth should not be the limiting factor, and other threads
should be able to get through while CPU <-> SATA controller transfers are being
waited on. Maybe it has to do with interrupts coming in from the SATA
controller?
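
To make the quoted formula concrete, here is a small sketch of the mapping from
a CPU nice value to a default I/O priority level. The function name is
hypothetical, and integer division is assumed, since I/O priority levels are
whole numbers from 0 (highest) to 7 (lowest). As far as I understand, this
default applies only to processes that never set an I/O priority explicitly,
and whether it is honoured depends on the I/O scheduler in use.

```python
def default_io_priority(cpu_nice: int) -> int:
    """Derive a best-effort I/O priority level (0..7) from a CPU nice value.

    Implements the formula quoted above, io_priority = (cpu_nice + 20) / 5,
    using integer division (an assumption of this sketch).
    """
    if not -20 <= cpu_nice <= 19:
        raise ValueError("nice value must be between -20 and 19")
    return (cpu_nice + 20) // 5


if __name__ == "__main__":
    for nice in (-20, 0, 19):
        print(nice, default_io_priority(nice))  # prints 0, 4 and 7 respectively
```

Note that the idle class requested by ionice -c 3 is a separate scheduling
class and is not one of these eight levels.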
--
Jack
2018-11-17 18:22:20 UTC

Jack <***@users.sourceforge.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@users.sourceforge.net

--- Comment #7 from Jack <***@users.sourceforge.net> ---
Same problem with baloo 5.52.0 (on Artix Linux). GUI is almost completely
unresponsive. Switching to text console and back updates the screen, but it
mostly stays frozen. Sometimes clicking to switch between applications updates
things when I click, but otherwise it stays frozen.

iotop shows baloo_file_extractor and one [kworker...] job at 99.99% (sometimes
alternating with a lower value still above 50%). Systemsettings/search does
not have any setting to turn indexing off, although no plugin is checked.
balooctl does seem to show everything disabled and stopped, so I have no idea
why it is still running.

For me, this seems to have started relatively recently, but it's on a laptop I
don't use constantly, so I'm really not sure what update triggered it. Is
there anything else I can check, or any other data I can provide? It makes the
laptop essentially unusable. (I'm posting this from a different PC (Gentoo),
although baloo here is 5.50.0 - I'll try updating.)
--
Jack
2018-11-17 22:42:52 UTC

--- Comment #8 from Jack <***@users.sourceforge.net> ---
After several reboots, I finally had systemsettings5 show me file search, and
turning that off, and another reboot, seems to have stopped the indexer from
running.

The odd thing was that despite earlier doing balooctl suspend, balooctl stop,
and balooctl disable, and balooctl showing disabled, it was still running. Not
really sure what finally stopped it. Hopefully it won't just start up again by
itself.
--
Nate Graham
2018-11-26 17:19:18 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #9 from Nate Graham <***@kde.org> ---
*** Bug 400932 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 17:19:30 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@hotmail.com

--- Comment #10 from Nate Graham <***@kde.org> ---
*** Bug 401279 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 20:24:59 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #11 from Nate Graham <***@kde.org> ---
*** Bug 384234 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 20:27:31 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #12 from Nate Graham <***@kde.org> ---
*** Bug 379011 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 20:27:42 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@ilr.tu-berlin.de

--- Comment #13 from Nate Graham <***@kde.org> ---
*** Bug 376446 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 21:20:33 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
Status|REPORTED |CONFIRMED

--- Comment #14 from Nate Graham <***@kde.org> ---
There's a proposed patch in Bug 356357 that sparked a serious discussion about
the frequency with which the DB should be written to, but unfortunately it went
nowhere.
--
Nate Graham
2018-11-26 21:20:39 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.kde.org/show_bug.cgi?id=356357
--
Nate Graham
2018-11-26 21:32:09 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@zhigalin.tk

--- Comment #15 from Nate Graham <***@kde.org> ---
*** Bug 359119 has been marked as a duplicate of this bug. ***
--
Nate Graham
2018-11-26 21:59:50 UTC

Nate Graham <***@kde.org> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@gmail.com

--- Comment #16 from Nate Graham <***@kde.org> ---
*** Bug 393465 has been marked as a duplicate of this bug. ***
--
Alberto Salvia Novella
2018-11-27 01:11:12 UTC

Alberto Salvia Novella <***@gmail.com> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC|***@gmail.com |
--
Alberto Salvia Novella
2018-11-27 01:12:25 UTC

--- Comment #17 from Alberto Salvia Novella <***@gmail.com> ---
Since I'm not using Plasma right now I'm unsubscribing from this bug, but feel
free to re-subscribe me if you need any help from me.
--
Kevin Colyer
2018-12-06 12:36:59 UTC

Kevin Colyer <***@thecolyers.net> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |***@thecolyers.net

--- Comment #18 from Kevin Colyer <***@thecolyers.net> ---
(In reply to Nate Graham from comment #14)
> There's a proposed patch in Bug 356357 that sparked a serious discussion
> about the frequency with which the DB should be written to, but
> unfortunately it went nowhere.

I am still suffering from this problem. Yesterday nextcloud decided to refresh
my files and downloaded about 10G of files. Baloo started indexing and my
desktop stalled. Chrome can't start and I can do no work!!!!

I do hope we can get a solution soon - this is a long-standing problem. Finding
things with baloo saves me time... but not as much as I am losing whilst
waiting for the indexer!!!!!

Please can we have a solution -

I like the idea of throttling database updates - perhaps some sort of
exponential back-off approach, but inverted, so that a high number of files
indexed per minute changes the update batch size to 80, 160, 320 ... limit?
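
One possible reading of this proposal, sketched as a batch-size schedule that
doubles until it reaches a cap. This is purely illustrative: the function name
and the cap of 1280 are hypothetical, and nothing here reflects how Baloo
actually schedules its commits.

```python
from itertools import islice
from typing import Iterator


def batch_sizes(start: int = 80, limit: int = 1280) -> Iterator[int]:
    """Yield commit batch sizes that double each step until capped at limit.

    start and limit are illustrative values, not Baloo settings.
    """
    size = min(start, limit)
    while True:
        yield size
        size = min(size * 2, limit)


if __name__ == "__main__":
    # First six batch sizes: 80, 160, 320, 640, 1280, 1280
    print(list(islice(batch_sizes(), 6)))
```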
--
Stefan Brüns
2018-12-06 14:18:43 UTC

--- Comment #19 from Stefan Brüns <***@rwth-aachen.de> ---
An exponential backoff would only help if baloo indexed the same files
repeatedly.

If you add new documents to your indexed folders, baloo will process these. It
will not get better when you commit changesets double the size; the stalls will
be even longer.

This is *not* a trivial problem which can be solved by adjusting a single knob.

Baloo's data structures currently impose a changeset size which is approximately
proportional to the size of the database. Adding/changing a single small
document can cause a DB update of several hundred MBytes.
--
Kevin Colyer
2018-12-06 15:55:35 UTC

--- Comment #20 from Kevin Colyer <***@thecolyers.net> ---
(In reply to Stefan Brüns from comment #19)
> An exponential backoff would only help if baloo would index the same files
> recurrently.
> If you add new documents to your indexed folders, baloo will process these.
> It will not get better when you commit changesets double the size, the
> stalls will be even longer.
> This is *not* a trivial problem which can be solved by adjusting a single
> knob.
> Baloos datastructures currently impose a changeset size which is
> approximately proportional to the size of the database. Adding/changing a
> single small document can cause a DB update of several 100 MBytes.

Thanks for the prompt feedback. Currently I have to do a manual exponential
backoff of switching off baloo and turning it on overnight to do its
indexing!!!

Given that a "single small document can cause a DB update of several 100
MBytes.", might there need to be a fresh look at the underlying data
structure? That seems sub-optimal to me as a user who is struggling with the
indexing process's unintended side-effects.
--
Stefan Brüns
2018-12-06 16:38:01 UTC

--- Comment #21 from Stefan Brüns <***@rwth-aachen.de> ---
It would save a lot of developer time if not everyone added their "me too"
comments.

Changes to the database are planned, but this is not trivial. One structure may
work well for a number of cases and cause huge problems for others. These
changes have to be evaluated, for performance and for correctness.

The baloo codebase has been enhanced with additional unit tests recently,
increasing code coverage and reducing the chance for regressions. This is an
ongoing effort likely taking several more months until completed.

Baloo is currently developed mostly by volunteers doing it in their spare time.
Development will not go faster by adding some more exclamation marks ...
--
Kevin Colyer
2018-12-06 16:57:06 UTC

--- Comment #22 from Kevin Colyer <***@thecolyers.net> ---
(In reply to Stefan Brüns from comment #21)
> It would save a lot of developer time if not everyone would add their "me
> too" comments.
> Changes to the database are planned, but this is not trivial. One structure
> may work well for a number of cases and cause huge problems for others.
> These changes have to be evaluated, for performance and for correctness.
> The baloo codebase has been enhanced with additional unit tests recently,
> increasing code coverage and reducing the chance for regressions. This is an
> ongoing effort likely taking several more months until completed.
> Baloo is currently developed mostly by volunteers doing it in their spare
> time. Development will not go faster by adding some more exclamations marks
> ...

Dear Developers,

I am supremely grateful for all the work and efforts that have gone into the
indexing services for KDE. If I had the skills I would join you. I just glanced
at the Git repo and realised how unskilled I am to contribute; I couldn't even
find the schema. Baloo has improved greatly.

However, I do wish to say: please don't discourage well-intentioned feedback.
Without feedback from users about the actual problems they encounter, future
priorities may not be as readily identified. As a long-term KDE user,
enthusiast and advocate, I consider feedback one of my most important
contributions. This thread follows from
https://bugs.kde.org/show_bug.cgi?id=333655#c73 which was started in 2014. I am
only making my first comment now. The performance issues have been a problem
for me all this time, and I went for a long season with baloo permanently off!

Do let me know if there is anything concrete I can contribute more than what I
offer in these comments.