Discussion:
sieve and utf-8 MIME/base64 content
Eugene M. Zheganin
2018-02-16 09:17:44 UTC
Hi,


I'm using Sieve with Cyrus to sort incoming mail, and it works perfectly
with Latin characters. But what if I need to sort mail that has all
sorts of UTF-8 symbols in it, like MIME-encoded headers and base64-encoded
bodies? I've read in the RFC that implementations should support
this; what about the cyrus-imapd Sieve implementation? I'm using
version 2.5, so if anyone has some information about this, please let me
know. The official documentation seems to lack this.


Thanks.

Eugene.
Robert Stepanek
2018-02-16 09:25:08 UTC
Post by Eugene M. Zheganin
I'm using Sieve with Cyrus to sort incoming mail, and it works perfectly
with Latin characters. But what if I need to sort mail that has all
sorts of UTF-8 symbols in it, like MIME-encoded headers and base64-encoded
bodies?
The main developer of Sieve support (and 2.5) in Cyrus IMAP might not be able to respond in the next few days. That being said, I'm not sure I understand the use case you are trying to address.

Cheers,
Robert
Vladislav Kurz
2018-02-16 10:07:19 UTC
Post by Robert Stepanek
Post by Eugene M. Zheganin
I'm using Sieve with Cyrus to sort incoming mail, and it works perfectly
with Latin characters. But what if I need to sort mail that has all
sorts of UTF-8 symbols in it, like MIME-encoded headers and base64-encoded
bodies?
The main developer of Sieve support (and 2.5) in Cyrus IMAP might not be able to respond in the next few days. That being said, I'm not sure I understand the use case you are trying to address.
I'm not sure whether it works either, but just to clarify: I think what
Eugene wants is something like this:

if header :contains "from" "Štěpánek" { ... do whatever ... }

The problem is that headers with non-ASCII characters are encoded in a form
like this:

Subject: =?UTF-8?B?UmU6I.....gVG9v?=

And the body is sometimes entirely in base64, even though it is
just plain text or HTML in UTF-8. It depends on the sender's mail client.

So the question is whether Cyrus decodes this stuff before checking and
applying the sieve rules.
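To make the question concrete, here is a minimal sketch of what "decoding before matching" means, using Python's standard email package as a stand-in for the server (an assumption for illustration only; this is not how Cyrus is implemented). The sample message, the address robert@example.org and the helper decode_rfc2047 are hypothetical. A rule such as header :contains "from" "Štěpánek" can only succeed if the filter compares against the decoded text shown at the end, not against the raw =?UTF-8?B?...?= word or the base64 body.

```python
"""Sketch: what a MIME-aware filter sees once headers and bodies are decoded.

Assumption: Python's standard email package stands in for the server here;
this only illustrates RFC 2047 and base64 decoding, not Cyrus internals.
"""
import base64
from email import message_from_bytes, policy
from email.header import decode_header, make_header


def decode_rfc2047(raw: str) -> str:
    """Hypothetical helper: turn '=?UTF-8?B?...?=' words into Unicode text."""
    return str(make_header(decode_header(raw)))


# Hypothetical sample: encoded From: display name plus a base64 UTF-8 body.
name_b64 = base64.b64encode("Štěpánek".encode("utf-8")).decode("ascii")
body_b64 = base64.b64encode("Ahoj, tady Štěpánek.\n".encode("utf-8")).decode("ascii")
raw_message = (
    f"From: =?UTF-8?B?{name_b64}?= <robert@example.org>\r\n"
    "Subject: test\r\n"
    "MIME-Version: 1.0\r\n"
    'Content-Type: text/plain; charset="utf-8"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    "\r\n"
    f"{body_b64}\r\n"
).encode("ascii")

msg = message_from_bytes(raw_message, policy=policy.default)

encoded_word = f"=?UTF-8?B?{name_b64}?="
from_header = str(msg["From"])  # policy.default decodes RFC 2047 words
body_text = msg.get_content()   # undoes base64 and the utf-8 charset

print(encoded_word, "->", decode_rfc2047(encoded_word))
print("Štěpánek" in from_header, "Štěpánek" in body_text)  # True True
```

The raw bytes never contain "Štěpánek" literally; only after both decoding steps does a plain substring test find it.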
--
Best Regards
Vladislav Kurz

----
Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https
Robert Stepanek
2018-02-16 10:37:14 UTC
Thanks for making this more clear to me :)
Post by Vladislav Kurz
The problem is that headers with non-ASCII characters are encoded in a form
Subject: =?UTF-8?B?UmU6I.....gVG9v?=
And the body is sometimes entirely in base64, even though it is
just plain text or HTML in UTF-8. It depends on the sender's mail client.
AFAIK the Sieve implementation in Cyrus 2.5 fully implements RFC 5228, including the string comparison requirements of section 2.7 [1]. That is, it implements the ascii-casemap and octet collations. It decodes MIME headers to UTF-8 before matching (e.g. see [2] and [3]). The RFC 5173 Sieve body extension is also supported [4].
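A practical consequence of those two comparators for non-Latin text: RFC 5228 makes i;ascii-casemap the default, and it folds only the ASCII letters a-z, so case-insensitive matching does not extend to Cyrillic or accented letters. The following is a simplified model of the RFC 4790 semantics for :contains (an illustration, not Cyrus's code; the function names are hypothetical).

```python
"""Simplified model of the two Sieve comparators named in RFC 5228 2.7.

Assumption: this illustrates RFC 4790 semantics for substring (:contains)
matching only; it is not Cyrus's implementation.
"""


def ascii_casemap(text: str) -> bytes:
    """i;ascii-casemap: fold ASCII a-z to A-Z, leave every other octet alone."""
    return bytes(b - 0x20 if 0x61 <= b <= 0x7A else b for b in text.encode("utf-8"))


def octet(text: str) -> bytes:
    """i;octet: compare the raw UTF-8 octets with no folding at all."""
    return text.encode("utf-8")


COMPARATORS = {"i;ascii-casemap": ascii_casemap, "i;octet": octet}


def sieve_contains(value: str, key: str, comparator: str = "i;ascii-casemap") -> bool:
    """Model of header :contains; i;ascii-casemap is the RFC 5228 default."""
    canon = COMPARATORS[comparator]
    return canon(key) in canon(value)


print(sieve_contains("Robert Stepanek", "STEPANEK"))   # True: ASCII letters fold
print(sieve_contains("Štěpánek", "ŠTĚPÁNEK"))          # False: ě/Ě, á/Á differ
print(sieve_contains("Спам", "спам"))                  # False: Cyrillic not folded
print(sieve_contains("Štěpánek", "Štěp", "i;octet"))   # True: exact octets
```

So a rule written with the exact spelling (same case) of a non-ASCII word matches once headers are decoded, but relying on case-insensitivity for such words will not work with these two comparators.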

Eugene, does that work for you?

Cheers,
Robert

[1] https://tools.ietf.org/html/rfc5228#section-2.7
[2] https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-2.5/sieve/script.c#L261
[3] https://github.com/cyrusimap/cyrus-imapd/blob/cyrus-imapd-2.5/sieve/bc_eval.c#L809
[4] https://cyrusimap.org/imap/rfc-support.html
Eugene M. Zheganin
2018-02-19 13:33:57 UTC
Hi,
Post by Robert Stepanek
Thanks for making this more clear to me :)
Post by Vladislav Kurz
The problem is that headers with non-ASCII characters are encoded in a form
Subject: =?UTF-8?B?UmU6I.....gVG9v?=
And the body is sometimes entirely in base64, even though it is
just plain text or HTML in UTF-8. It depends on the sender's mail client.
AFAIK the Sieve implementation in Cyrus 2.5 fully implements RFC 5228, including the string comparison requirements of section 2.7 [1]. That is, it implements the ascii-casemap and octet collations. It decodes MIME headers to UTF-8 before matching (e.g. see [2] and [3]). The RFC 5173 Sieve body extension is also supported [4].
Eugene, does that work for you?
Yes, I experimented and found that it works fully. Thanks a
lot (I'm truly impressed).

But my users keep demanding more and more :). Now they want not only
header/body matching but also filing messages into folders whose names
contain localized characters. It works fine when a folder is
named 'Junk e-mail', but when it's named 'Спам', it stops working. Of
course, the workaround is to determine the on-disk name of the folder, which
can be done by ls'ing the folder on disk, but only a few users
can ssh to the server, so the rest have a choice of either using Latin names
for folders or asking their system engineer. So I wanted to ask: is
there a more elegant solution?


Eugene.

Robert Stepanek
2018-02-19 13:42:44 UTC
Post by Eugene M. Zheganin
But my users keep demanding more and more :). Now they want not only
header/body matching but also filing messages into folders whose names
contain localized characters. It works fine when a folder is
named 'Junk e-mail', but when it's named 'Спам', it stops working. Of
course, the workaround is to determine the on-disk name of the folder, which
can be done by ls'ing the folder on disk, but only a few users
can ssh to the server, so the rest have a choice of either using Latin names
for folders or asking their system engineer. So I wanted to ask: is
there a more elegant solution?
The folder name needs to be in either UTF-8 or modified UTF-7 (aka IMAP UTF-7) encoding. If you want to use UTF-8, you'll need to enable the sieve_utf8fileinto option in imapd.conf (see https://www.cyrusimap.org/imap/reference/manpages/configs/imapd.conf.html).
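If sieve_utf8fileinto stays disabled, the modified UTF-7 form can be computed without shell access to the server. Below is a minimal sketch of the encoding defined in RFC 3501, section 5.1.3 (Python's standard library has no codec for this variant, and encode_imap_utf7 is a hypothetical helper name): printable ASCII passes through, "&" becomes "&-", and any other run of characters is written as "&" + base64 of its UTF-16BE form (with "," instead of "/" and no "=" padding) + "-".

```python
"""Encode a mailbox name as IMAP modified UTF-7 (RFC 3501, section 5.1.3).

A minimal sketch; Python's standard library has no codec for this variant.
"""
import base64


def encode_imap_utf7(name: str) -> str:
    out: list[str] = []
    pending: list[str] = []  # run of characters that must be base64-encoded

    def flush() -> None:
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
            out.append("&" + b64.replace("/", ",") + "-")  # ',' replaces '/'
            pending.clear()

    for ch in name:
        if ch == "&":
            flush()
            out.append("&-")  # a literal '&' is written as "&-"
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)  # printable ASCII passes through unchanged
        else:
            pending.append(ch)
    flush()
    return "".join(out)


print(encode_imap_utf7("Junk e-mail"))  # Junk e-mail
print(encode_imap_utf7("Спам"))         # &BCEEPwQwBDw-
```

The encoded string is what a script would pass to fileinto in place of the raw 'Спам'; the rest of the mailbox path (prefix and hierarchy separator) depends on the server's configuration and is left out here.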

Cheers,
Robert