Discussion:
sos on cyrusimapd
Albert Shih
2018-06-27 14:09:25 UTC
Hi everyone,

So today I switched all my users from my old server to the new one, running
cyrus-imapd 3.0.7 on FreeBSD 11.1-p11.

The server has 192 GB of RAM.

Currently I'm getting a lot of disconnections from my MUA (mutt): the client
keeps the connection for, let's say, a few minutes, and then I lose it
(nothing to do with the network).

If I reconnect just after the disconnection, I can get:

the mailbox back pretty quickly,
the mailbox back very slowly, or
a connection error (a strange SSL I/O error on the client).

On the server I don't see many log messages, only this strange one:

sonewconn: pcb 0xfffff8276c40e570: Listen queue overflow: 49 already in queue awaiting acceptance (209 occurrences)

Note that I'm not able to find the pcb 0xfffff8276c40e570 with netstat or
lsof, even though it does not change over time.
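
For reference, the limit behind that message and the per-socket listen
queues can be inspected like this (a sketch using standard FreeBSD tools):

# global limit on the accept (listen) queue; alias of kern.ipc.somaxconn
sysctl kern.ipc.soacceptqueue

# per-socket listen queue usage: qlen/incqlen/maxqlen
netstat -Lan

# socket list with protocol control block (pcb) addresses, to try to match
# the address printed by sonewconn
netstat -Aan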

I'm guessing I misconfigured something... but what?

I know the server is under heavy load because all my users' clients are
re-synchronizing their mailboxes. But still, it's not very good...

Is anybody running cyrus-imapd on FreeBSD tuning some special kernel
variables through sysctl?

Regards

--
Albert SHIH
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Wed Jun 27 16:03:10 CEST 2018
Eric W. Bates
2018-06-27 15:27:48 UTC
Yah. You need to crank up some buffer sizes.

This is an excellent document (written for FreeBSD 10):
https://calomel.org/freebsd_network_tuning.html

I think the one you're bumping your head on is:
kern.ipc.maxsockbuf

but you shouldn't crank it up by itself.
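
If you do raise it, raise the TCP send/receive buffer maxima along with it.
A minimal /etc/sysctl.conf sketch (4194304 is simply the value we happen to
run here):

# socket buffer ceiling, and the TCP buffer maxima that should move with it
kern.ipc.maxsockbuf=4194304
net.inet.tcp.sendbuf_max=4194304
net.inet.tcp.recvbuf_max=4194304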
Albert Shih
2018-06-27 15:44:53 UTC
On 27/06/2018 at 11:27:48 -0400, Eric W. Bates wrote:
Post by Eric W. Bates
Yah. You need to crank up some buffer sizes.
https://calomel.org/freebsd_network_tuning.html
Yes, I had already found that page.
Post by Eric W. Bates
kern.ipc.maxsockbuf
but the problem is that it's already high, according to what netstat -m says.

I got

[***@zenobe /usr/home]# sysctl -a kern.ipc.maxsockbuf
kern.ipc.maxsockbuf: 2097152

[***@zenobe /usr/home]# netstat -m
35091/25314/60405 mbufs in use (current/cache/total)
33072/11170/44242/12180860 mbuf clusters in use (current/cache/total/max)
33072/7914 mbuf+clusters out of packet secondary zone in use (current/cache)
1254/4242/5496/6090429 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/1804571 9k jumbo clusters in use (current/cache/total/max)
0/0/0/1015071 16k jumbo clusters in use (current/cache/total/max)
79932K/45636K/125569K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 sendfile syscalls

and both the mbuf denied/delayed counters are at 0.
Post by Eric W. Bates
but you shouldn't crank it up by itself.
Do you think I should still increase that maxsockbuf?

Do you have any special tweaks in your kernel?

Regards.
--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
5 Place Jules Janssen
92195 Meudon Cedex
France
☏ +33 1 45 07 76 26/+33 6 86 69 95 71
xmpp: ***@obspm.fr
Heure local/Local time:
Wed Jun 27 17:41:04 CEST 2018
Eric W. Bates
2018-06-27 16:58:22 UTC
Post by Albert Shih
On 27/06/2018 at 11:27:48 -0400, Eric W. Bates wrote:
Post by Eric W. Bates
Yah. You need to crank up some buffer sizes.
https://calomel.org/freebsd_network_tuning.html
Yes, I had already found that page.
Post by Eric W. Bates
kern.ipc.maxsockbuf
but the problem is that it's already high, according to what netstat -m says.
I got
kern.ipc.maxsockbuf: 2097152
Ours is:
kern.ipc.maxsockbuf: 4194304
[netstat -m output snipped — no denied or delayed mbuf requests]
Yup. That looks pretty healthy.
Post by Albert Shih
Post by Eric W. Bates
but you shouldn't crank it up by itself.
Do you think I should still increase that maxsockbuf?
Do you have any special tweaks in your kernel?
Well, everything I did I got from the Calomel document, but this is
what I set:

kern.ipc.maxsockbuf=4194304 # (default 2097152)
net.inet.tcp.sendbuf_max=4194304 # (default 2097152)
net.inet.tcp.recvbuf_max=4194304 # (default 2097152)
net.inet.tcp.mssdflt=1460 # (default 536)
net.inet.tcp.minmss=1300 # (default 216)
net.inet.tcp.cc.algorithm=htcp # (default newreno)
net.inet.tcp.cc.htcp.adaptive_backoff=1 # (default 0 ; disabled)
net.inet.tcp.cc.htcp.rtt_scaling=1 # (default 0 ; disabled)
net.inet.tcp.syncache.rexmtlimit=0 # (default 3)
net.inet.ip.rtexpire=10 # (default 3600)
net.inet.tcp.syncookies=0 # (default 1)
net.inet.tcp.tso=0 # (default 1)

kern.ipc.soacceptqueue=32768 # (default 128 ; same as kern.ipc.somaxconn)
net.inet.tcp.delayed_ack=0 # (default 1 ; enabled)

Been quite a while since we set all that up and it's been performing well.
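
In case it saves someone a step, here is the usual way to apply them; a
sketch (the cc_htcp note only matters if H-TCP is not compiled into your
kernel):

# set a value at runtime, no reboot needed
sysctl kern.ipc.soacceptqueue=32768

# persist: put the settings in /etc/sysctl.conf, then re-apply with
service sysctl restart

# net.inet.tcp.cc.algorithm=htcp needs the H-TCP congestion-control module:
kldload cc_htcp    # or cc_htcp_load="YES" in /boot/loader.conf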
Regards.
Albert Shih
2018-06-28 20:45:24 UTC
On 27/06/2018 at 12:58:22 -0400, Eric W. Bates wrote:

Hi,

Thank you *very* much for your answer.

I will try that. But I'm now pretty sure the problem doesn't come from the
network but from disk I/O.

I managed to configure mutt (my old MUA) to keep the connection open; it was
losing the connection because, by default, it keeps the connection alive for
only 15 seconds. After changing that to 600 seconds I never get disconnected.
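
For anyone else on mutt, these are the muttrc settings I believe are
involved (a sketch; which option was defaulting to 15 seconds on my setup
is my assumption, so check muttrc(5)):

# ~/.muttrc — keep the IMAP connection alive (values illustrative)
set imap_keepalive = 600   # poll the server at least this often, in seconds
set timeout = 600          # how long mutt waits for input before housekeeping
set imap_idle = yes        # use IMAP IDLE when the server supports it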

But I still get some hangs (10-30 s), and I think it's the I/O, because the
server hangs too, or at least slows down a lot (when I'm connected by ssh),
and if I use atop I see the disks are busy.

Another point: during those periods (~1500 connections for ~500 unique
users), the ZFS ARC goes down to 20 GB (it is configured to get 96 GB).
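
I'm watching it roughly like this; a sketch (zfs-stats comes from the
sysutils/zfs-stats port and may not be installed):

# current ARC size and its configured bounds, in bytes
sysctl kstat.zfs.misc.arcstats.size
sysctl vfs.zfs.arc_max vfs.zfs.arc_min

# a friendlier summary, if sysutils/zfs-stats is installed
zfs-stats -A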

I will try the ZFS tweaking.
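
By that I mean the loader tunables; a sketch of /boot/loader.conf (the
values are illustrative, not a recommendation):

# /boot/loader.conf — ZFS ARC bounds, applied at boot
vfs.zfs.arc_max="96G"   # ceiling, matching my configured 96 GB target
vfs.zfs.arc_min="32G"   # floor (illustrative), so the ARC cannot collapse entirely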
Post by Eric W. Bates
[sysctl settings snipped]
Been quite a while since we set all that up and it's been performing well.
Well, from Thunderbird's point of view everything seems fine, only some
*normal* slowdowns; users don't complain.

I complain because it's a big server with a lot of disks (24 disks, 2 SSDs
for /var/imap, 192 GB of RAM), so... I'd like to make it very efficient.

And again, thank you for your help.

Regards.




--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Thu Jun 28 22:35:43 CEST 2018
Eric W. Bates
2018-06-28 21:31:41 UTC
We have an SSD dedicated to the cache (L2ARC) and another dedicated to the log (ZIL/SLOG).

They seem to help.
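
In zpool terms that looks like the following; a sketch with a hypothetical
pool name and device names:

# pool "tank" and the gpt labels are hypothetical
zpool add tank cache gpt/ssd-cache   # L2ARC: second-level read cache
zpool add tank log gpt/ssd-slog      # SLOG: separate ZFS intent-log device
zpool status tank                    # verify the cache and log vdevs appear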
--
Clark 159a, MS 46
508/289-3112