Discussion:
Xapian/Cyrus/Thunderbird
Albert Shih
2018-05-04 12:48:49 UTC
Hi,

So I've installed cyrus-imapd 3.0.5 with xapian.

I configured and launched Xapian according to

https://www.cyrusimap.org/imap/developer/install-xapian.html

and I am currently running, in a console,

/usr/local/cyrus/sbin/squatter

to index all the mail I imported (via imapsync).

The problem concerns two large user mailboxes: the first has a
/xapian-index/user/firstuser/*.glass index and the second has none (it is not
indexed yet). When I run a search from Thunderbird on those two mailboxes, I
don't see any difference in response time.

So I'm wondering whether Xapian is working at all.

Is there any way to test whether Xapian is working or not?

Regards
--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Fri May 4 14:28:28 CEST 2018
Albert Shih
2018-05-04 12:53:44 UTC
On 04/05/2018 at 14:48:49+0200, Albert Shih wrote
Post by Albert Shih
Hi,
So I've installed cyrus-imapd 3.0.5 with xapian.
I configured and launched Xapian according to
https://www.cyrusimap.org/imap/developer/install-xapian.html
and I am currently running, in a console,
/usr/local/cyrus/sbin/squatter
to index all the mail I imported (via imapsync).
The problem concerns two large user mailboxes: the first has a
/xapian-index/user/firstuser/*.glass index and the second has none (it is not
indexed yet). When I run a search from Thunderbird on those two mailboxes, I
don't see any difference in response time.
Just to be very clear, I selected the option in Thunderbird to run the
search on the server.

I also tried those searches through a webmail client (SOGO4), to be absolutely
sure the search is not performed on the client.

Regards

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Fri May 4 14:52:13 CEST 2018
Robert Stepanek
2018-05-05 14:01:31 UTC
Hi,
Post by Albert Shih
Is there any way to test whether Xapian is working or not?
Cyrus comes with two binaries that will tell you its build and runtime configuration:

E.g. running

$ /usr/cyrus/sbin/cyr_buildinfo

on my development server yields JSON-formatted output describing the Cyrus build configuration. Look for something like

"search": {
"squat": true,
"xapian": true,
"xapian_flavor": "cyruslibs"
},

to check if Xapian is enabled in the build (presumably, yes, since you have *.glass databases in your directories).
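
If you would rather script this check than read the JSON by eye, here is a minimal Python sketch. It assumes only what is shown above: that cyr_buildinfo prints JSON containing a "search" object with a boolean "xapian" member. Where that object sits in the full output is an assumption, so the helper looks through nested objects for it. The function names and the sample string are made up for illustration.

```python
import json


def find_search_section(node):
    """Return the first dict stored under a "search" key, looking into nested objects."""
    if isinstance(node, dict):
        section = node.get("search")
        if isinstance(section, dict):
            return section
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_search_section(child)
        if found is not None:
            return found
    return None


def xapian_built_in(buildinfo_text):
    """True if the build info reports "xapian": true in its search section."""
    section = find_search_section(json.loads(buildinfo_text))
    return bool(section) and section.get("xapian") is True


# Illustrative sample modelled on the excerpt above, not real full output.
BUILDINFO_SAMPLE = """
{"search": {"squat": true, "xapian": true, "xapian_flavor": "cyruslibs"}}
"""

print("Xapian compiled in:", xapian_built_in(BUILDINFO_SAMPLE))
```

On a real server the text would be the standard output of cyr_buildinfo, for instance captured with subprocess.run(..., capture_output=True, text=True).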

Next, cyr_info will tell you whether Xapian is actually enabled at runtime:

$ /usr/cyrus/sbin/cyr_info -C <your-config-dir>/imapd.conf conf-all | grep search

should yield something like

[...]
search_engine: xapian
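
The same check can be scripted. The following Python sketch assumes the conf-all output consists of "key: value" lines like the one shown above and ignores anything else. The helper names are hypothetical and the sample values are only illustrative.

```python
def parse_conf(text):
    """Parse "key: value" lines into a dict, skipping lines without a colon."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            settings[key.strip()] = value.strip()
    return settings


def uses_xapian(conf_text):
    """True if the runtime configuration selects xapian as search_engine."""
    return parse_conf(conf_text).get("search_engine") == "xapian"


# Illustrative sample, not output captured from a real server.
CONF_SAMPLE = """[...]
search_batchsize: 8192
search_engine: xapian
search_index_headers: no
"""

print("search_engine is xapian:", uses_xapian(CONF_SAMPLE))
```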

Next, make sure that the IMAP commands submitted by your clients are using SEARCH FUZZY. You might want to inspect the IMAP telemetry to check this.
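
To make the telemetry inspection less tedious, a small script can count how many SEARCH commands carry the FUZZY key. This is only a sketch: the exact layout of Cyrus telemetry lines is an assumption here (the code only expects a tag, an optional UID, then SEARCH somewhere in the line), and the sample lines, tags and timestamps are invented.

```python
import re
from collections import Counter

# A tag, an optional UID prefix, then SEARCH and its arguments.
SEARCH_CMD = re.compile(r"(\S+)\s+(?:UID\s+)?SEARCH\s+(.*)", re.IGNORECASE)
# IMAP quoted strings, removed so that search *terms* cannot look like keys.
QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
FUZZY_KEY = re.compile(r"\bFUZZY\b", re.IGNORECASE)


def classify_line(line):
    """Return "fuzzy", "classic", or None if the line is not a client SEARCH."""
    match = SEARCH_CMD.search(line)
    if match is None:
        return None
    tag, args = match.groups()
    if tag.endswith("*"):
        # Untagged server response such as "* SEARCH 12 15", not a command.
        return None
    args_without_strings = QUOTED.sub('""', args)
    return "fuzzy" if FUZZY_KEY.search(args_without_strings) else "classic"


def summarize(lines):
    """Count fuzzy and classic SEARCH commands in an iterable of log lines."""
    return Counter(kind for kind in map(classify_line, lines) if kind)


# Invented sample lines; the "<timestamp<" prefix is an assumed format.
TELEMETRY_SAMPLE = [
    '<1525852378<A5 UID SEARCH FUZZY BODY "xapian"',
    '<1525852379<A6 SEARCH BODY "fuzzy logic"',
    ">1525852379>* SEARCH 12 15",
    "<1525852380<A7 FETCH 1 (FLAGS)",
]

print(dict(summarize(TELEMETRY_SAMPLE)))
```

If the count of "fuzzy" stays at zero for a given client, that client is not asking for SEARCH FUZZY.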

If all these checks passed, there shouldn't be any reason why Cyrus should not use Xapian during search. One might want to enable verbose logging for search then, but unfortunately, that's currently not a runtime-option.

Cheers,
Robert
Albert Shih
2018-05-09 07:45:20 UTC
On 05/05/2018 at 16:01:31+0200, Robert Stepanek wrote
Hi,
Post by Robert Stepanek
Post by Albert Shih
Is there any way to test whether Xapian is working or not?
Cyrus comes with two binaries that will tell you its build and
E.g. running
$ /usr/cyrus/sbin/cyr_buildinfo
on my development server yields a JSON-formatted output about Cyrus
build config. Look for something like
"search": {
"squat": true, "xapian": true, "xapian_flavor": "cyruslibs"
},
So I get

"search": {
"squat": true,
"sphinx": false,
"xapian": true,
"xapian_flavor": "vanilla"
},

I don't know whether vanilla is a valid value for xapian_flavor. I would say
yes, because vanilla is ... a flavor, and a flavor of Xapian.
Post by Robert Stepanek
to check if Xapian is enabled in the build (presumably, yes, since
you have *.glass databases in your directories).
Next, cyr_info will tell you, if xapian got enabled indeed at
$ /usr/cyrus/sbin/cyr_info -C <your-config-dir>/imapd.conf conf-all | grep search
should yield something like
[...] search_engine: xapian
I got that too

search_batchsize: 8192
search_engine: xapian
search_index_headers: no
Post by Robert Stepanek
Next, make sure that the IMAP commands submitted by your clients
are using SEARCH FUZZY. You might want to inspect the IMAP telemetry
to check this.
I will check that.
Post by Robert Stepanek
If all these checks passed, there shouldn't be any reason why Cyrus
should not use Xapian during search. One might want to enable verbose
logging for search then, but unfortunately, that's currently not
a runtime-option.
Ok.

Just one question: does Xapian handle all searches, including body searches?

Thanks a lot

Regards

JAS
--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Wed May 9 09:32:58 CEST 2018
Robert Stepanek
2018-05-09 07:53:23 UTC
Hi,
Post by Albert Shih
So I get
"search": {
"squat": true,
"sphinx": false,
"xapian": true,
"xapian_flavor": "vanilla"
},
I don't know whether vanilla is a valid value for xapian_flavor. I would say
yes, because vanilla is ... a flavor, and a flavor of Xapian.
That's OK. It means that you are using the upstream version of Xapian, rather than the Xapian fork of the Cyrus project. You are missing out on a few features that haven't been pulled into the upstream release yet, namely improved Chinese/Japanese search and improved snippet generation. Both are optional.

The cyruslibs fork is at https://github.com/cyrusimap/xapian

The complete set of forked libraries, including a build script for convenience, is located at https://github.com/cyrusimap/cyruslibs
Post by Albert Shih
search_index_headers: no
The rest of your config looks fine, but you might want to change search_index_headers to yes, in case you are doing a lot of searches on the From:, To:, etc headers. If disabled, searching these headers still will work, but will be slower.
Post by Albert Shih
Just one question: does Xapian handle all searches, including body searches?
Actually, Xapian is mainly useful for body search, i.e. searching text within a somewhat large corpus.

Cheers,
Robert
Sebastian Hagedorn
2018-05-11 11:12:56 UTC
Post by Robert Stepanek
Post by Albert Shih
search_index_headers: no
The rest of your config looks fine, but you might want to change
search_index_headers to yes, in case you are doing a lot of searches on
the From:, To:, etc headers. If disabled, searching these headers still
will work, but will be slower.
If that is the case, the documentation should be changed:

<https://cyrusimap.org/imap/developer/install-xapian.html?highlight=search_index_header>
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
Sebastian Hagedorn
2018-05-11 11:16:33 UTC
--On 11 May 2018 at 13:12:57 +0200 Sebastian Hagedorn
--On 9 May 2018 at 09:53:23 +0200 Robert Stepanek
Post by Robert Stepanek
Post by Albert Shih
search_index_headers: no
The rest of your config looks fine, but you might want to change
search_index_headers to yes, in case you are doing a lot of searches on
the From:, To:, etc headers. If disabled, searching these headers still
will work, but will be slower.
<https://cyrusimap.org/imap/developer/install-xapian.html?highlight=search_index_header>
Sorry, I should've checked the manpage first. It states:

Whether to index headers other than From, To, Cc, Bcc, and Subject.
Experiment shows that some headers such as Received and DKIM-Signature
can contribute up to 2/3rds of the index size but almost nothing to the
utility of searching. Note that if header indexing is disabled, headers
can still be searched, the searches will just be slower.

If the manpage is correct, standard headers are always indexed, so the
recommendation on the web page is actually good the way it is.
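
Read literally, the manpage text above amounts to the rule sketched below in Python. This only restates the quoted paragraph and is not taken from the Cyrus source; the function name and the boolean parameter are hypothetical.

```python
# Headers the quoted manpage text lists as indexed regardless of the option.
ALWAYS_INDEXED = frozenset({"from", "to", "cc", "bcc", "subject"})


def header_is_indexed(header, search_index_headers):
    """Apply the manpage wording: the five standard headers are always
    indexed, any other header only when search_index_headers is enabled."""
    name = header.strip().rstrip(":").lower()
    return search_index_headers or name in ALWAYS_INDEXED


for header in ("From:", "Subject", "Received", "DKIM-Signature"):
    print(header, header_is_indexed(header, search_index_headers=False))
```

Under that reading, leaving search_index_headers at "no" only slows down searches on the less common headers.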
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
Sebastian Hagedorn
2018-05-11 12:25:19 UTC
For my understanding: does that mean the Xapian index is only used for
clients that support RFC 6203? If that is the case, how are
"traditional" IMAP searches handled?
For non-FUZZY text SEARCH, Cyrus attempts to match the string on its own
[1].
That sounds strange to me, because Cyrus 2.4 and earlier don't support
FUZZY, and there the SQUAT index was used, if present. Only messages that
were added after the last squatter run were searched directly. Why would
that have changed?
Off the top of my head, I don't think Cyrus falls back to the
search-engine index for large corpus text matches. Using FUZZY always is
the better choice, but if there's a real-world issue, please let me know.
At this point I can't really say, because our production server still runs
2.4. I have no idea which clients use FUZZY if it's available, and which
don't.
It might be possible to switch to Xapian also for non-FUZZY search.
I think if Xapian only does fuzzy, some searches may be *slower* than using
a SQUAT index. That seems counterintuitive, at the least. Do you have
internal search benchmarks? Without metrics it's hard to say if any of this
actually matters ...
--
.:.Sebastian Hagedorn - Weyertal 121 (Gebäude 133), Zimmer 2.02.:.
.:.Regionales Rechenzentrum (RRZK).:.
.:.Universität zu Köln / Cologne University - ✆ +49-221-470-89578.:.
Robert Stepanek
2018-05-14 12:10:06 UTC
Post by Sebastian Hagedorn
For non-FUZZY text SEARCH, Cyrus attempts to match the string on its own
[1].
That sounds strange to me, because Cyrus 2.4 and earlier don't support
FUZZY, and there the SQUAT index was used, if present. Only messages that
were added after the last squatter run were searched directly. Why would
that have changed?
Right, it hasn't. SQUAT is still the backend for non-FUZZY text search.
Post by Sebastian Hagedorn
I think if Xapian only does fuzzy, some searches may be *slower* than using
a SQUAT index. That seems counterintuitive, at the least. Do you have
internal search benchmarks? Without metrics it's hard to say if any of this
actually matters ...
I don't know of any benchmarks for this. It hasn't popped up as a performance issue.

Cheers,
Robert
Sebastian Hagedorn
2018-05-14 12:35:21 UTC
Post by Robert Stepanek
--On 11 May 2018 at 13:32:29 +0200 Robert Stepanek
For non-FUZZY text SEARCH, Cyrus attempts to match the string on its
own [1].
That sounds strange to me, because Cyrus 2.4 and earlier don't support
FUZZY, and there the SQUAT index was used, if present. Only messages
that were added after the last squatter run were searched directly. Why
would that have changed?
Right, it hasn't. SQUAT is still the backend for non-FUZZY text search.
But search_engine is either squat or xapian. That would mean that one would
have to run squatter with two separate configurations in order to cover
both search types. That's at the very least counterintuitive ...

Should I open a GitHub issue for that topic? I see more than one approach
to solve that.
Post by Robert Stepanek
I think if Xapian only does fuzzy, some searches may be *slower* than
using a SQUAT index. That seems counterintuitive, at the least. Do you
have internal search benchmarks? Without metrics it's hard to say if
any of this actually matters ...
I don't know of any benchmarks for this. It hasn't popped up as a performance issue.
I thought Xapian was added to improve all IMAP BODY searches, but I guess
the only reason was to enable IMAP FUZZY searches. In my opinion this needs
to be stated explicitly in the documentation, if that's the way it will be
going forward.

Cheers
Sebastian
--
Sebastian Hagedorn - Weyertal 121, Zimmer 2.02
Regionales Rechenzentrum (RRZK)
Universität zu Köln / Cologne University - Tel. +49-221-470-89578
Albert Shih
2018-05-17 20:19:50 UTC
On 14/05/2018 at 14:35:21+0200, Sebastian Hagedorn wrote
Post by Sebastian Hagedorn
Post by Robert Stepanek
--On 11 May 2018 at 13:32:29 +0200 Robert Stepanek
For non-FUZZY text SEARCH, Cyrus attempts to match the string on its
own [1].
That sounds strange to me, because Cyrus 2.4 and earlier don't support
FUZZY, and there the SQUAT index was used, if present. Only messages
that were added after the last squatter run were searched directly. Why
would that have changed?
Right, it hasn't. SQUAT is still the backend for non-FUZZY text search.
But search_engine is either squat or xapian. That would mean that one would
have to run squatter with two separate configurations in order to cover
both search types. That's at the very least counterintuitive ...
Sorry to ask a stupid question, but if I choose xapian with only one
configuration, does that mean the search uses ... what, neither SQUAT nor
Xapian, if the client doesn't use SEARCH FUZZY?
Post by Sebastian Hagedorn
I thought Xapian was added to improve all IMAP BODY searches, but I guess
the only reason was to enable IMAP FUZZY searches. In my opinion this needs
to be stated explicitly in the documentation, if that's the way it will be
I agree ;-)

Regards.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Thu May 17 22:11:13 CEST 2018
Robert Stepanek
2018-05-21 11:51:44 UTC
Hi Albert,
After turning on telemetry, it seems Thunderbird just does a "classic" search.
So that means searches from Thunderbird are very slow, and I built the
Xapian index for almost nothing, because almost 90% of my users use
Thunderbird as their MUA.
Do you want me to create an issue on GitHub?
Yes, please create an issue for this on GitHub so we can track this. I don't know why it's implemented like it currently is, and I haven't assessed what's involved to change this.
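
For anyone who wants numbers to attach to such an issue, here is a rough Python sketch that times the same body search with and without the FUZZY key (RFC 6203), using only the standard imaplib module. It is an illustration under assumptions, not a benchmark tool: the helper names are made up, and the connection (host, login, selected mailbox) is left to the caller.

```python
import imaplib
import time


def quote(term):
    """Wrap a search term as an IMAP quoted string."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def body_criteria(term, fuzzy):
    """Build SEARCH criteria for a body search, optionally prefixed by FUZZY."""
    criteria = ["BODY", quote(term)]
    return ["FUZZY"] + criteria if fuzzy else criteria


def supports_fuzzy(capabilities):
    """True if the capability list advertises SEARCH=FUZZY (RFC 6203)."""
    return "SEARCH=FUZZY" in {str(cap).upper() for cap in capabilities}


def time_search(conn, term, fuzzy):
    """Run one SEARCH on an already selected mailbox.

    Returns (elapsed_seconds, number_of_hits)."""
    start = time.perf_counter()
    status, data = conn.search(None, *body_criteria(term, fuzzy))
    elapsed = time.perf_counter() - start
    hits = len(data[0].split()) if status == "OK" and data and data[0] else 0
    return elapsed, hits


# Usage sketch (hypothetical host and credentials, not executed here):
#   conn = imaplib.IMAP4_SSL("imap.example.org")
#   conn.login("firstuser", "secret")
#   conn.select("INBOX", readonly=True)
#   if supports_fuzzy(conn.capabilities):
#       print(time_search(conn, "xapian", fuzzy=True))
#   print(time_search(conn, "xapian", fuzzy=False))
```

Comparing the two timings on an indexed and a non-indexed mailbox would show whether the index makes a difference for each kind of search.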

If you find time, please feel free to join our weekly Cyrus developer hangout, next meeting is on Monday, May 28 at 11am UTC where I will bring this up:

https://plus.google.com/hangouts/_/g4xnqjjb5zvomzeb4kqvja3fz4a

The hangout is open for anyone interested in Cyrus development, and we always look forward to meeting new people :)

Cheers,
Robert
