Discussion:
Cyrus crashed on redundant platform - need better availability?
Luca Olivetti
2004-09-13 07:58:47 UTC
I'm not sure why the box crashed; there was nothing in the logs, there
was nothing on the screen when we came there, it just booted up again.
Of course I'm interested if anyone has any thoughts on this.
Maybe it has nothing to do with your problem, but there is a timing
issue with some Intel Xeon and P4 processors. Look at this HP advisory:

http://tinyurl.com/63dxe

Even though it says that no field issues have been identified, I
experienced real random lock-ups before updating the BIOS.
Check whether a BIOS update is available from Dell.

Bye
--
Luca Olivetti
Wetron Automatización S.A. http://www.wetron.es/
Tel. +34 93 5883004 Fax +34 93 5883007
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Paul Dekkers
2004-09-11 12:36:55 UTC
David Lang
2004-09-10 22:20:07 UTC
Date: Fri, 10 Sep 2004 13:15:05 -0600
Subject: Re: Cyrus crashed on redundant platform - need better availability?
The theory only translates if you're using a JOURNALED file system. Linux
ext3, reiserfs.... AIX JFS, Sun/others veritas are all examples of this.
AFAIK FreeBSD hasn't any journaling file systems; I could be wrong, though,
since I haven't really looked for one (my FreeBSD boxes just run...and
run...and run...). That said, the machine shouldn't have crashed in the
first place, but you are running 5.x, which is clearly labeled as *NOT*
production (4.10 for that)... All of my production boxen are 4.x based (of
the FreeBSD herd).
However, even a journaled filesystem won't protect you completely from
corruption. Even the filesystems you list can lose data when there is a
crash, and if one system goes haywire and starts scribbling on the shared
disk it will trash any filesystem.

David Lang
--On Friday, September 10, 2004 13:24 +0200 Paul Dekkers
Hi,
We're implementing a new mail platform running on two Dell 2650 servers (2
Xeon CPUs at 3 GHz each, HTT and 3 GB of memory) with a 4 TB disk array
connected through an Adaptec 39160 SCSI controller for storage. We
installed FreeBSD 5.2.1 on it, and - of course - cyrus 2.2.8 (from the
ports) as IMAP server. Our MTA is postfix.
There are two machines for redundancy. If one fails, the other one should
take over: mount the disks from the array, and move on.
Unfortunately, the primary server has crashed twice already. The first time
was while synchronising two IMAP spools from the old server to the new
one. There was not much data on it back then. The second time was worse:
around 10 GB of mail was stored on the disks. We discovered that the fsck
took about 30 minutes, so although we have two machines for redundancy it
still takes quite some time before the mail is available again. (And we
still have about 90 GB of mail to migrate, so when all users are migrated
it will take much longer.)
I have mounted the filesystems synchronously now: although it slows down the
system, I hope it speeds up the fsck a bit when there is another crash.
The second crash happened while removing a lot of mailboxes (dm) while some of
them were being removed at the same time using a webmail app (squirrelmail).
I'm not sure why the box crashed; there was nothing in the logs, there
was nothing on the screen when we came there, it just booted up again. Of
course I'm interested if anyone has any thoughts on this.
Although many on the list claim that this (having 2 boxes with 1
disk array) is a nice way to get redundancy, I now doubt whether this is
true. It still takes 30 minutes before everything is back again! It seems to
me that if there were a "live" version of cyrus available with a
synchronised mail spool, there would be no outage noticeable to users
(except maybe a lost connection). Am I right?
Maybe it's time to continue the "High availability ...
again" discussion we had a while ago. If the cyrus developers are able to
implement this with some funding, there are still some questions left for
me: how much time would it take before a "stable" solution is ready? How
much funding is expected? I still have to talk to management about this,
but I would really support this development and I'm certainly willing to
convince some managers.
Regards,
Paul
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
Michael Loftis
2004-09-10 19:15:05 UTC
The theory only translates if you're using a JOURNALED file system. Linux
ext3, reiserfs.... AIX JFS, Sun/others veritas are all examples of this.
AFAIK FreeBSD hasn't any journaling file systems; I could be wrong, though,
since I haven't really looked for one (my FreeBSD boxes just run...and
run...and run...). That said, the machine shouldn't have crashed in the
first place, but you are running 5.x, which is clearly labeled as *NOT*
production (4.10 for that)... All of my production boxen are 4.x based (of
the FreeBSD herd).



--
Undocumented Features quote of the moment...
"It's not the one bullet with your name on it that you
have to worry about; it's the twenty thousand-odd rounds
labeled `occupant.'"
--Murphy's Laws of Combat

Michael Loftis
2004-09-10 19:16:10 UTC
BTW -- if you want stable (in case you didn't understand that from my
previous mail), go back to FreeBSD 4.x (say 4.10-STABLE or -SECURE) --
you've probably run into a platform bug, not a bug in Cyrus, since the
whole machine went down.

Ken Murchison
2004-09-15 18:13:46 UTC
Hi,
--On Mittwoch, 15. September 2004 13:38 Uhr +0200 Paul Dekkers
You are not using a clustered filesystem,
right?
No.
I can imagine that would be one of the advantages of RH's clustering,
since you don't have to mount a filesystem in that case for a machine
that just crashed - it would save time...
I'm not sure if Red Hat even supports a clustered FS at this time. It
certainly didn't when we set up the system more than two years ago.
I think that's exactly why they bought Sistina with GFS - and GPL'd it.
Does anybody know how it works with cyrus-imapd?
If you are interested in using a shared filesystem on a SAN for server
redundancy, then you could try using a replicated Murder (Cyrus 2.3).
Such a config is running at a local University using 4 Sun servers and
QFS (Sun's SAN filesystem) on a Hitachi fibre array.

I haven't tested this with GFS, but if it has correct file locking and
memory mapping support, then it might work.

I'm fairly confident that SGI's XFS would work, although I haven't tried it.
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
Simon Matter
2004-09-15 15:40:49 UTC
Hi,
--On Mittwoch, 15. September 2004 13:38 Uhr +0200 Paul Dekkers
You are not using a clustered filesystem,
right?
No.
I can imagine that would be one of the advantages of RH's clustering,
since you don't have to mount a filesystem in that case for a machine
that just crashed - it would save time...
I'm not sure if Red Hat even supports a clustered FS at this time. It
certainly didn't when we set up the system more than two years ago.
I think that's exactly why they bought Sistina with GFS - and GPL'd it.
Does anybody know how it works with cyrus-imapd?

Simon


Sebastian Hagedorn
2004-09-15 15:04:29 UTC
Hi,

--On Mittwoch, 15. September 2004 13:38 Uhr +0200 Paul Dekkers
You are not using a clustered filesystem,
right?
No.
I can imagine that would be one of the advantages of RH's clustering,
since you don't have to mount a filesystem in that case for a machine
that just crashed - it would save time...
I'm not sure if Red Hat even supports a clustered FS at this time. It
certainly didn't when we set up the system more than two years ago.
But I suppose RH's cluster manager takes care of mounting the partitions
and checking them if there are any errors.
Right. The unmounting/mounting of partitions usually works fine, but there
have been problems at times. The worst one was causing alternating crashes
of both nodes:

sd(8,73)): ext3_free_blocks: Freeing blocks not in datazone - block =
225139276, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1919637002, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 894788200, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1883792719, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1347113037, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 829312330, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 893538370, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1450341715, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 909390198, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1366706293, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 846548333, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 1630746450, count = 1
EXT3-fs error (device sd(8,73)): ext3_free_blocks: Freeing blocks not in
datazone - block = 860649837, count = 1
EXT3-fs error (device sd(8,73)

leading to this:

Assertion failure in journal_forget_Rsmp_094dfde7() at transaction.c:1226:
"!jh->b_committed_data"
------------[ cut here ]------------
kernel BUG at transaction.c:1226!
invalid operand: 0000
Kernel 2.4.9-e.38enterprise
CPU: 3
EIP: 0010:[<f885b636>] Not tainted
EFLAGS: 00010282
EIP is at journal_forget_Rsmp_094dfde7 [jbd] 0xd6
eax: 00000025 ebx: ce6e8c10 ecx: c02f7f84 edx: 0008dad9
esi: cd95f3e0 edi: cd7a3094 ebp: cd7a3000 esp: cb947d70
ds: 0018 es: 0018 ss: 0018
Process ctl_cyrusdb (pid: 4500, stackpage=cb947000)
Stack: f8863f30 000004ca e7b08b20 cd95f3e0 cd7a3000 0000000b cd95f3e0
f885ee69
ce14ac40 cd95f3e0 cd95f3e0 cd95f3e0 cab35900 ce14ac40 f886bc8c
ce14ac40
00020000 cd95f3e0 cd95f3e0 cd6ad000 cd6ae000 cdd93000 cd95f3e0
00020000
Call Trace: [<f8863f30>] .LC7 [jbd] 0x0 (0xcb947d70)
[<f885ee69>] journal_revoke_Rsmp_56fa5ece [jbd] 0xf9 (0xcb947d8c)
[<f886bc8c>] ext3_forget [ext3] 0x7c (0xcb947da8)
[<f886df3a>] ext3_free_branches [ext3] 0xda (0xcb947dd8)
[<f886df2c>] ext3_free_branches [ext3] 0xcc (0xcb947e30)
[<f886e2ec>] ext3_truncate [ext3] 0x2bc (0xcb947e74)
[<f885a285>] start_this_handle [jbd] 0x125 (0xcb947eac)
[<f885a38f>] journal_start_Rsmp_ec53be73 [jbd] 0xbf (0xcb947ec4)
[<f886bd5e>] start_transaction [ext3] 0x4e (0xcb947ee4)
[<f886bee7>] ext3_delete_inode [ext3] 0xe7 (0xcb947f08)
[<f887a080>] ext3_sops [ext3] 0x0 (0xcb947f28)
[<c015dd1c>] iput_free [kernel] 0x14c (0xcb947f2c)
[<f886f9c3>] ext3_lookup [ext3] 0x73 (0xcb947f40)
[<c015addb>] dentry_iput [kernel] 0x4b (0xcb947f50)
[<c01541ab>] vfs_unlink [kernel] 0x1eb (0xcb947f60)
[<c0152c41>] lookup_hash [kernel] 0x91 (0xcb947f6c)
[<c015427a>] sys_unlink [kernel] 0x9a (0xcb947f88)
[<c01181c0>] do_page_fault [kernel] 0x0 (0xcb947fb0)
[<c01073e3>] system_call [kernel] 0x33 (0xcb947fc0)


Code: 0f 0b 59 58 53 e8 40 03 00 00 8b 43 24 c7 43 14 00 00 00 00
<0>Kernel panic: not continuing

I had to intercept the boot process manually before the cluster software
starts and fsck the partition. Not good. But this problem has been fixed in
a kernel update.
It's good but not perfect. We recently installed a huge SAN and are
now in the process of moving over the mail data to reside there.
Fibrechannel seems to be much more error tolerant than SCSI.
Were you working with a "multi-initiator environment" (as RH calls it) or
"single initiator" (e.g. with 2 machines on exactly the same SCSI bus, or
two separate interfaces on your array's SCSI controller)?
I think with a multi-initiator environment (as we have it) there is a very
limited chance of failures.
I'm not sure about the terminology, but we have two separate SCSI busses on
the RAID, one for each host. I thought that was "single initiator"? The
problem that regularly occurred is the following: the cluster software
requires a raw partition that's mounted by both nodes, called the "quorum
partition". Each node regularly writes a timestamp on the quorum partition
to prove it's alive. This is in addition to heartbeat channels over serial
lines and ethernet. When one of the nodes doesn't write to the quorum
partition for more than an adjustable period of time, the other node
"shoots it in the head". That happened several times, even though the slow
node hadn't actually crashed.
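The quorum-partition liveness check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Red Hat's implementation: the `QuorumSlot` record, the `nodes_to_fence` function, the node names and the timeout values are all hypothetical and chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class QuorumSlot:
    """Hypothetical record: the last timestamp a node wrote to the quorum partition."""
    node: str
    last_write: float  # seconds since some shared epoch


def nodes_to_fence(slots, observer, now, timeout):
    """Return the peers whose last timestamp is older than `timeout` seconds.

    A peer that is merely slow (for example, stalled on I/O) looks exactly
    like a dead one here, which is how a live node can end up being shot.
    """
    return [
        slot.node
        for slot in slots
        if slot.node != observer and now - slot.last_write > timeout
    ]


slots = [
    QuorumSlot("node-a", last_write=100.0),
    QuorumSlot("node-b", last_write=88.0),  # last wrote 12 seconds before `now`
]
# With a 10-second timeout node-a decides to fence node-b;
# with a 15-second timeout node-b is left alone.
print(nodes_to_fence(slots, observer="node-a", now=100.0, timeout=10.0))
print(nodes_to_fence(slots, observer="node-a", now=100.0, timeout=15.0))
```

The sketch suggests why the adjustable period matters: set it too short and a node that is only slow gets fenced, set it too long and failover is delayed.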
Hmm, I don't expect the problems to be SCSI-related. Maybe it has to
do...
That's not what I was talking about. We have a similar setup, yet
still there were instances when Red Hat's cluster software failed to
write to the shared storage. I guess this was caused by the slow-downs
connected to the memory management, but Red Hat support indicated that
shared storage connected via FibreChannel would not have been as
susceptible to these problems.
Do you think using RH's cluster software is a valuable consideration for
this kind of clustering setup?
Yes, I do.
Using FreeBSD there are not that many
clustering solutions for now, and if it's advisable to at least consider
using RH here (although I have no experience with RH) we can certainly
look at it. (Any idea how fast RH would "recover services"?)
That depends on how you configure it, but usually within a minute.

Cheers, Sebastian Hagedorn
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
David Lang
2004-09-15 15:56:52 UTC
Also take a look at the heartbeat package at linux-ha.org. This works on
Linux, *BSD, and Solaris (there were people working on an AIX port, but
they apparently dropped it shortly before finishing).

David Lang

Jure Pečar
2004-09-15 15:07:20 UTC
On Wed, 15 Sep 2004 13:38:43 +0200
But I suppose RH's cluster manager takes care of mounting the partitions
and checking them if there are any errors.
Not really, at least not by itself. See
http://people.redhat.com/jrfuller/cms/ for detailed documentation of what is
included with RH AS 2.1 (it's some $500 extra for AS 3).
I had to write some pretty paranoid scripts that take care of assembling
software RAIDs, checking the fs and mounting it, while keeping an eye on the
other machine to prevent problems.

Of course all this would be much easier with some kind of clustered fs, but
a clustered fs brings a new problem: locking. Almost all I've seen so far have
an external 'locking manager' on a separate box, which brings Ethernet
latency into every lock operation, which I'm sure is very noticeable in
lock-heavy usage patterns such as mail. But this is just my feeling; I haven't
benchmarked any yet :)
Do you think using RH's cluster software is a valuable consideration for
this kind of clustering setup? Using FreeBSD there are not that many
clustering solutions for now, and if it's advisable to at least consider
using RH here (although I have no experience with RH) we can certainly
look at it. (Any idea how fast RH would "recover services"?)
This RH cluster software is nothing fancy; I'm sure equivalents exist for
the BSDs. See the documentation link above. Actually it is just Kimberlite
(http://oss.missioncriticallinux.com/projects/kimberlite/), sold with Red Hat
support.
"Speed" of recovery is almost completely out of the cluster's control. The
only thing that matters for the cluster is what your cyrus init script
returns when called with the 'status' parameter. Everything else is up to your
init scripts.
Of course, if one box dies completely, the other takes over within the
configurable time.
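As a rough illustration of the 'status' check mentioned above, here is a minimal Python sketch of how a monitor might interpret an init script's exit status. It assumes the common convention that exit status 0 means "running"; the function name `service_is_up` and the script path in the usage comment are hypothetical, not taken from the Red Hat cluster software.

```python
import subprocess


def service_is_up(status_cmd, timeout=30):
    """Run a status command and treat exit status 0 as "service is running".

    Assumption: the init script follows the usual convention of returning 0
    from `status` when the service is up and non-zero otherwise.
    """
    try:
        result = subprocess.run(
            status_cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing script or a hung status call counts as "down".
        return False
    return result.returncode == 0


# Hypothetical usage; the path depends entirely on the local installation:
#   service_is_up(["/etc/init.d/cyrus-imapd", "status"])
```

Anything the init script does not report through its exit status is invisible to a check like this, which is why the quality of the init scripts decides how quickly a failure is noticed.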
--
Jure Pečar
Michael Loftis
2004-09-20 00:00:42 UTC
--On Monday, September 20, 2004 00:43 +0200 Jure Pečar
On Sun, 19 Sep 2004 00:52:08 -0700 (PDT)
Nice review of replication ABC :)
1. Active->Slave replication with manual failover
This is really the simplest way to do it. Rsync (and friends) does 90% of
the required job here; the only thing it's lacking is the concept of the
"mailbox" as a unit. It would be nice if our daemon here would do its job
in an atomic way.
A few days ago someone was asking for an event notification system that
would be able to call some program when a certain action happened on a
mailbox. Something like this would come in handy here, I think :)
We were doing this, but really, rsync does not scale well. When you get
lots of small files it takes longer to figure out what to transfer than
it would take to just transfer almost everything over (assuming a small
768 kbit to about 1.5 Mbit link and an average-sized-messages mailstore).
And unless you break it up into smaller chunks, it'll gobble up wads of RAM
during the process: insane amounts, like well over a gig or so for our
mailstore with about 51 GB of mail. I'm not exactly sure of the number of
files off the top of my head, though it could be figured out if wanted.
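One way to break the transfer into smaller chunks is to run one rsync per top-level directory of the spool instead of one rsync over the whole tree. The Python sketch below only builds the command lines and does not run them; the function name, the spool layout and the destination string are assumptions made for illustration, and whether per-directory runs reduce memory use enough will depend on the rsync version and the shape of the mailstore.

```python
from pathlib import Path


def chunked_rsync_commands(spool_root, dest_root):
    """Build one rsync command per top-level subdirectory of the spool.

    Each command mirrors a single subdirectory, so rsync only has to hold
    the file list for that subtree rather than for the whole mailstore.
    """
    dest_root = dest_root.rstrip("/")
    commands = []
    for sub in sorted(p for p in Path(spool_root).iterdir() if p.is_dir()):
        commands.append(
            ["rsync", "-a", "--delete", f"{sub}/", f"{dest_root}/{sub.name}/"]
        )
    return commands


# Hypothetical usage (paths and host name are examples only):
#   for cmd in chunked_rsync_commands("/var/spool/imap/user", "backup:/var/spool/imap/user"):
#       subprocess.run(cmd, check=True)
```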

Jure Pečar
2004-09-19 22:43:34 UTC
On Sun, 19 Sep 2004 00:52:08 -0700 (PDT)
David Lang <***@digitalinsight.com> wrote:

Nice review of replication ABC :)
1. Active->Slave replication with manual failover
This is really the simplest way to do it. Rsync (and friends) does 90% of
the required job here; the only thing it's lacking is the concept of the
"mailbox" as a unit. It would be nice if our daemon here would do its job in
an atomic way.
A few days ago someone was asking for an event notification system that
would be able to call some program when a certain action happened on a
mailbox. Something like this would come in handy here, I think :)
2. Active->Slave replication with automatic failover
2 is really just 1 + your heartbeat package of choice and some scripts to
tie it all together.
3. Active->Slave replication with Slave able to accept client connections
I think it would be good here to start thinking about the app itself and define
"connections" better. Cyrus has three kinds of "connections" that modify a
mailbox: lmtp, which puts new mails into a mailbox; pop, which (generally)
retrieves (and deletes) them; and imap, which does both plus some other things
(folder ops and moving mails around).
Now if you decide that it does not hurt you if the slave is "a bit" out of date
when it accepts a connection (but I guess most of us would find this
unacceptable), you can ditch some of the complexity; but you'd want the
changes that were made on the slave in that connection to propagate up to
the master. I don't really like this, because the concepts of master and
slave get blurred here and things can easily end in a mess.
Once you have mailstores that are synchronizing each other in a way that is
not very well defined, you'll end up with conflicts sooner or later. There
are some unpredictable factors like network latency that can easily lead you
to unexpected situations.
4. #3 with automatic failover
Another level of mess over 3 :)
5. Active/Active
Designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing the
replication needed for #1, modify all functions that need to update these
critical pieces of data to update them on the master and let the master
update the other box.
Exactly. This is the atomicity I was mentioning above. I'd say this is going
to be the larger part of the job.
6. active/active/active/...
This is what most of us would want.
while #6 is the ideal option to have it can get very complex
Despite everything you've said, I still think this *can* be done in a
relatively simple way. See my previous mail, where I was dreaming about the
whole HA concept in a RAID way.
There I assumed murder as the only agent through which clients would be able
to access their mailboxes. If you think of murder handling all of the jobs
of your daemon in 1-4, one thing that you gain immediately is much simpler
synchronization of actions between the mailstore machines. If you start
empty or with exactly the same data on two machines, all that murder needs
to do is take care that both receive the same commands and data in the same
order.
Also, if you put all the logic into one place, the backend mailstores need not
be taught any special tricks and can remain pretty much as they are today.
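The "same commands in the same order" idea can be sketched as a single sequencer in front of the backends. This is a toy Python model and not how Cyrus Murder works today; the `Sequencer` and `Backend` classes and the command strings are hypothetical.

```python
class Backend:
    """Toy mailstore that applies commands strictly in sequence order."""

    def __init__(self):
        self.log = []

    def apply(self, seq, command):
        # Refuse gaps or reordering: the next command must carry the next number.
        if seq != len(self.log):
            raise ValueError(f"expected sequence {len(self.log)}, got {seq}")
        self.log.append(command)


class Sequencer:
    """Toy front end that numbers each command and sends it to every backend."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.next_seq = 0

    def submit(self, command):
        seq = self.next_seq
        self.next_seq += 1
        for backend in self.backends:
            backend.apply(seq, command)
        return seq


store_a, store_b = Backend(), Backend()
front = Sequencer([store_a, store_b])
for cmd in ["APPEND user.paul msg1", "APPEND user.paul msg2", "DELETE user.old"]:
    front.submit(cmd)
# Both stores started empty and saw the same stream, so their logs are identical.
print(store_a.log == store_b.log)
```

The sketch leaves out the hard parts: what happens when one backend is unreachable in the middle of `submit`, and how a recovering backend catches up.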

Or am I missing something?
personally I would like to see #1 (with a sample daemon or two to provide
basic functionality and leave the doors open for more creative uses)
followed by #3 while people try and figure out all the problems with #5
and #6
And I would like to see us come to a conclusion here about what kind of HA
setup would be best for all, and focus our energy on only one implementation.
I have enough old hardware here (and I'm getting some more in about a month)
that I can set up a nice little test environment. Right now it also looks
like I'll have plenty of time in February - June 2005, so I can volunteer
to be a tester.
there are a lot of scenarios that are possible with #1 or #3 that are not
possible with #5
One, I think, is a slave of a slave of a slave (...) kind of setup. Does anybody
really need such a setup for mail? I understand it for LDAP, for example, and
there are even some cases where it is useful for an SQL database, but I see
no reason to have it for a mail server.
--
Jure Pečar
http://jure.pecar.org/

David Lang
2004-09-20 17:07:23 UTC
5. Active/Active
Designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing
the replication needed for #1, modify all functions that need to update
these critical pieces of data to update them on the master and let the
master update the other box.
We may be talking at cross purposes (and it's entirely likely that I've
got the wrong end of the stick!), but I consider active-active to be
the case where there is no primary: users can make changes to either
system, and if the two systems lose touch with each other they have
to resolve their differences when contact is reestablished.
Since this is a setup where there is no primary at all, I suppose this is
quite a different design than the #1-4 solutions. And because of that, I
would think that it's rather useless to have these steps done in order to get
#5 right, but I might well be wrong.
Actually, I think most of the work necessary for #1 is also needed for
#5-6.

For #1 you need the ability for a system to report all its changes
to a daemon and the ability for a system to read in changes and implement
them. #5 needs the same abilities plus the ability to resolve conflicts.

The HA steps of #2 and #4 don't gain that much, but they can also be done
external to cyrus, so it's not a problem to skip them.

#3 involves changes to the update code to have cyrus take special actions
with some types of updates. There would need to be changes in the same
area for #5, but they would be different.

David Lang
Paul Dekkers
2004-09-20 08:51:28 UTC
5. Active/Active
designate one of the boxes as primary and identify all items in the
datastore that absolutly must not be subject to race conditions
between the two boxes (message UUID for example). In addition to
implementing the replication needed for #1 modify all functions that
need to update these critical pieces of data to update them on the
master and let the master update the other box.
We may be talking at cross purposes (and its entirely likely that I've
got the wrong end of the stick!), but I consider active-active to be
the case where there is no primary: users can make changes to either
system, and if the two systems lose touch with each other they have
to resolve their differences when contact is reestablished.
I'd go for #5 as well:
Since this is a setup where there is no primary at all, I suppose this
is quite some different design then the #1-4 solutions. And because of
that, I would think that it's rather useless to have these steps done in
order to get #5 right, but I might as well be wrong.

I would be most happy when the work would start on #5. Personally I
don't care that much at this moment for #6, but I can imagine that this
is different for others. But well; if the design is that every machine
tracks changes and they have them propagated (actively or passively) to
n hosts (it's not so hard to keep track of that, "all hosts had this
change; remove it") there is no risk of missing things or not recovering
I guess. (It's only possible that a slave is out of sync for a very
short time, and well - why would that be so wrong? And if that is so
wrong, then maybe fix that later since this would make the work easier?)
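To make the "all hosts had this change; remove it" bookkeeping concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not anything from Cyrus or from David's patch: the ChangeLog class, its method names and the peer names are all made up for the example.

```python
class ChangeLog:
    """Changes recorded on one host, each kept until every peer has acked it.

    Hypothetical sketch: names and structure are assumptions, not Cyrus code.
    """

    def __init__(self, peers):
        self.peers = set(peers)
        self.pending = {}      # change id -> (description, peers still waiting)
        self.next_id = 1

    def record(self, description):
        """Log a local change that still has to reach every peer."""
        change_id = self.next_id
        self.next_id += 1
        self.pending[change_id] = (description, set(self.peers))
        return change_id

    def ack(self, change_id, peer):
        """A peer confirms it applied the change; forget it once all have."""
        entry = self.pending.get(change_id)
        if entry is None:
            return
        waiting = entry[1]
        waiting.discard(peer)
        if not waiting:
            # "all hosts had this change; remove it"
            del self.pending[change_id]

    def backlog_for(self, peer):
        """Change ids a peer still needs, e.g. after it was unreachable."""
        return sorted(cid for cid, (_, waiting) in self.pending.items()
                      if peer in waiting)
```

A host that comes back after an outage would ask for backlog_for(its name) and replay those changes in order, which is the "out of sync for a very short time" case described above.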

This could be the task of the cyrus daemon, but it can as well be the
work of murder as Jure suggests. (Or both?) I'm not entirely sure that
that is what we want, but it could be done if that fits nicely (and it
can be assured that there is always a murder to talk to).

If there is a problem with UID selection, I don't see a problem in that
one of the servers is responsible for that task. We don't even need an
election system for that, you could define a sequence for the servers;
if a server with the highest preference is down, then take over its job.
It's just that for the users the machines should appear all active. (And
that in case of failover the remaining machines remain active, and not
readonly or only active after manual intervention.)
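The fixed preference sequence can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name and the server names are invented, and "alive" stands for whatever health check is used to decide that a server is up.

```python
def uid_allocator(preference, alive):
    """Return the first server in the fixed preference order that is up.

    preference: list of server names, highest preference first (assumed).
    alive: set of server names currently believed to be reachable.
    """
    for server in preference:
        if server in alive:
            return server
    return None  # nobody is up, so nobody allocates UIDs
```

With preference ["mail1", "mail2", "mail3"] and mail1 down, mail2 takes over the job without any election. Note that this sketch says nothing about the case where the servers are all up but cannot see each other; each side would then pick a different allocator, which is the split-brain problem discussed elsewhere in this thread.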

Paul


David Carter
2004-09-20 09:02:46 UTC
Permalink
here is the problem.
you have a new message created on both servers at the same time. how do you
allocate the UID without any possibility of stepping on each other?
With a new UIDvalidity you can choose any ordering you like. Of course one
of the two servers has to make that choice, and the potential for race
conditions here and elsewhere in an active-active solution is amusing.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
David Carter
2004-09-20 17:12:28 UTC
Permalink
Thanks, this is exactly the type of feedback that I was hoping to get.
so you are saying that #5 is more like $50k-100k and #6 goes up from
there
If anyone could implement Active-Active for Cyrus from scratch in 100 to
150 hours it would be Ken, but I think that it's a tall order. Sorry.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
David Lang
2004-09-20 17:00:53 UTC
Permalink
assuming that the simplest method would cost ~$3000 to code I would make a
wild guess that the ballpark figures would be
1. active/passive without automatic failover $3k
2. active/passive with automatic failover (limited to two nodes or within
a murder cluster) $4k
3. active/passive with updates pushed to the master $5k
4. #3 with auto failover (failover not limited to two nodes or a single
murder cluster) $7k
5. active/active (limited to a single geographic location) $10k
6. active/active/active (no limits) $30k
in addition, automatically re-merging things after a split-brain has happened
would probably be another $5k
I think that you are missing a zero (or at least a fairly substantial
multiplier!) from 5. 1 -> 4 can be done without substantial changes to the
Cyrus core code, and Ken would be able to use my code as a reference
implementation, even if he wanted to recode everything from scratch. 5 and 6
would require a much more substantial redesign and I suspect quite a lot of
trial and error as this is unexplored territory for IMAP servers.
Thanks, this is exactly the type of feedback that I was hoping to get. so
you are saying that #5 is more like $50k-100k and #6 goes up from there

Ok folks, how much are you really willing to pay for this and since the
amount of work involved translates fairly directly into both cost and time
how long are you willing to go with nothing?

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
David Carter
2004-09-20 10:21:58 UTC
Permalink
assuming that the simplest method would cost ~$3000 to code I would make a
wild guess that the ballpark figures would be
1. active/passive without automatic failover $3k
2. active/passive with automatic failover (limited to two nodes or within a
murder cluster) $4k
3. active/passive with updates pushed to the master $5k
4. #3 with auto failover (failover not limited to two nodes or a single
murder cluster) $7k
5. active/active (limited to a single geographic location) $10k
6. active/active/active (no limits) $30k
in addition, automatically re-merging things after a split-brain has happened
would probably be another $5k
I think that you are missing a zero (or at least a fairly substantial
multiplier!) from 5. 1 -> 4 can be done without substantial changes to the
Cyrus core code, and Ken would be able to use my code as a reference
implementation, even if he wanted to recode everything from scratch. 5 and
6 would require a much more substantial redesign and I suspect quite a lot
of trial and error as this is unexplored territory for IMAP servers.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
David Lang
2004-09-19 23:54:50 UTC
Permalink
please don't misunderstand my posts. it's not that I don't think that
active/active/active is possible, it's just that I think it's far more
complicated.

assuming that the simplest method would cost ~$3000 to code I would make a
wild guess that the ballpark figures would be

1. active/passive without automatic failover $3k

2. active/passive with automatic failover (limited to two nodes or within
a murder cluster) $4k

3. active/passive with updates pushed to the master $5k

4. #3 with auto failover (failover not limited to two nodes or a single
murder cluster) $7k

5. active/active (limited to a single geographic location) $10k

6. active/active/active (no limits) $30k

in addition, automatically re-merging things after a split-brain has happened
would probably be another $5k

now this doesn't mean that all of these must be done in this funded project. I
believe that people would end up going from #1 or #3 to #2 or #4 by
individuals coding the required pieces and sharing them (#4 has as much of
a jump over #3 because of all the different scenarios that are involved,
each one is individually simple) however #5 and #6 are significantly more
difficult and I would not expect them to just happen (they are also much
more intrusive to the code so there is some possibility of them not
getting merged into the core code quickly)

David Lang

-- There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare

P.S. #1-4 all could qualify as the first way, #5 and #6 are both
complicated enough to start with that it is really hard to keep them out
of the second way
David Lang
2004-09-19 22:48:49 UTC
Permalink
5. Active/Active
designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing the
replication needed for #1 modify all functions that need to update these
critical pieces of data to update them on the master and let the master
update the other box.
We may be talking at cross purposes (and it's entirely likely that I've
got the wrong end of the stick!), but I consider active-active to be
the case where there is no primary: users can make changes to either
system, and if the two systems lose touch with each other they have
to resolve their differences when contact is reestablished.
UUIDs aren't a problem (each machine in a cluster owns its own fraction of
the address space). Message UIDs are a big problem. I guess in the case of
conflict, you could bump the UIDvalidity value on a mailbox and reassign UIDs
for all the messages, using timestamps to determine the eventual ordering of
messages. Now that I think about it, maybe that's not a totally absurd idea.
It would involve a lot of work though.
the problem is that when they are both up you have to have one of them
allocate the message UID's or you have to change the UIDVALIDITY for every
new message that arrives.

here is the problem.
you have a new message created on both servers at the same time. how do
you allocate the UID without any possibility of stepping on each other?

the only way to do this is to have some sort of locking so that only one
machine at a time can allocate UID's. you can shuffle this responsibility
back and forth between machines, but there's a significant amount of
overhead in doing this so the usual answer is just to have one machine
issue the numbers and the other ask the first for a number when it needs
it.
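A minimal Python sketch of that "one machine issues the numbers" arrangement follows. It is an assumption-laden toy, not Cyrus code: UidIssuer and Replica are invented names, and the direct method call stands in for what would really be a network request to the issuing machine.

```python
import threading


class UidIssuer:
    """The one machine that hands out message UIDs for a mailbox (hypothetical)."""

    def __init__(self, last_uid=0):
        self._last_uid = last_uid
        self._lock = threading.Lock()   # only one allocation at a time

    def next_uid(self):
        with self._lock:
            self._last_uid += 1
            return self._last_uid


class Replica:
    """A server that accepts deliveries but asks the issuer for every UID."""

    def __init__(self, name, issuer):
        self.name = name
        self.issuer = issuer    # stand-in for a network call to the issuer
        self.messages = {}      # uid -> message body

    def deliver(self, body):
        uid = self.issuer.next_uid()
        self.messages[uid] = body
        return uid
```

Two replicas sharing one issuer can take deliveries at the same moment and still never hand out the same UID, at the cost of one round trip to the issuer per new message.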

changing UIDVALIDITY while recovering from a split-brain is probably
going to be needed.

but as you say it's a lot of work (which is why I'm advocating the simpler
options get released first :-)
best use of available hardware as the load is split almost evenly
between the boxes.
best availability because if there is a failure half of the clients won't
see it at all
Actually this is what I do right now by having two live mailstores. Half the
mailboxes on each system are active, the remainder are passive.
right, but what this would allow is sharing the load on individual
mailboxes

usually this won't matter, but I could see it for shared mailboxes

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
David Carter
2004-09-19 10:37:58 UTC
Permalink
5. Active/Active
designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing
the replication needed for #1 modify all functions that need to update
these critical pieces of data to update them on the master and let the
master update the other box.
We may be talking at cross purposes (and it's entirely likely that I've
got the wrong end of the stick!), but I consider active-active to be
the case where there is no primary: users can make changes to either
system, and if the two systems lose touch with each other they have
to resolve their differences when contact is reestablished.

UUIDs aren't a problem (each machine in a cluster owns its own fraction of
the address space). Message UIDs are a big problem. I guess in the case of
conflict, you could bump the UIDvalidity value on a mailbox and reassign
UIDs for all the messages, using timestamps to determine the eventual
ordering of messages. Now that I think about it, maybe that's not a
totally absurd idea. It would involve a lot of work though.
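The bump-and-renumber idea can be sketched roughly as below. This is a hypothetical illustration in Python, not a description of how any Cyrus code works: the function name is invented, and each message is reduced to a (timestamp, message id) pair, where the message id plays the role of the cluster-wide UUID mentioned above.

```python
def merge_after_split(uidvalidity, side_a, side_b):
    """Merge two diverged copies of a mailbox and renumber every message.

    side_a, side_b: lists of (timestamp, message_id) held by each server.
    Returns (new_uidvalidity, {uid: message_id}).
    """
    # Messages both sides already hold are kept once, at their earliest
    # known timestamp.
    earliest = {}
    for timestamp, message_id in side_a + side_b:
        if message_id not in earliest or timestamp < earliest[message_id]:
            earliest[message_id] = timestamp

    # Order by timestamp, with the message id as a tie-breaker so that both
    # servers compute exactly the same ordering.
    ordered = sorted(earliest.items(), key=lambda item: (item[1], item[0]))

    uids = {uid: message_id
            for uid, (message_id, _) in enumerate(ordered, start=1)}
    # A larger UIDvalidity tells clients that their cached UIDs are no
    # longer valid for this mailbox.
    return uidvalidity + 1, uids
```

The cost is on the client side: every client has to throw away its cached view of the mailbox and resynchronise, which is part of why this would want to be reserved for actual conflicts.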
best use of available hardware as the load is split almost evenly between
the boxes.
best availability because if there is a failure half of the clients won't
see it at all
Actually this is what I do right now by having two live mailstores. Half
the mailboxes on each system are active, the remainder are passive.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
David Lang
2004-09-19 07:52:08 UTC
Permalink
There are many ways of doing High Availability. This is an attempt to
outline the various methods with the advantages and disadvantages. Ken and
David (and anyone else who has thoughts on this) please feel free to add to
this. I'm attempting to outline them roughly in order of complexity.

1. Active->Slave replication with manual failover

This is where you can configure one machine to output all changes to a
local daemon and another machine to implement the changes that are read
from a local daemon.

Pro:
simplest implementation, since it makes no assumptions about how you
are going to use it, it also sets no limits on how it is used.

This is the basic functionality that all other variations will need so
it's not wasted work no matter what is done later

allows for multiple slaves from a single master

allows for the propagation traffic pattern to be defined by the
sysadmin (either master directly to all slaves or a tree-like propagation
to save on WAN bandwidth when multiple slaves are co-located)

by involving a local daemon at each server there is a lot of
flexibility in exactly how the replication takes place.
for example you could
use netcat as your daemon for instant transmission of the
messages
have a daemon that caches the messages so that if the link
drops the messages are saved
have a daemon that gets an acknowledgement from the far side that
the message got through
have a daemon that batches the messages up and compresses them for
more efficient transport
have a daemon that delays all messages by a given time period to
give you a way to recover from logical corruption without having to go to
a backup
have a daemon that filters the messages (say one that updates
everything except it won't delete any messages so you have a known safe
archive of all messages)
etc

Con:
since it makes no assumptions about how you are going to use it, it
also gives you no help in using it in any particular way
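As an illustration of how small such a relay daemon could be, here is a Python sketch of two of the examples listed under #1: a filter that refuses to pass on deletes, and a batcher that compresses messages for transport. The message format (a dict with an "op" field) is purely an assumption made for the example; the real format would be whatever the replication hooks emit.

```python
import json
import zlib


def filter_no_deletes(messages):
    """Pass everything through except deletes, so the far side becomes an
    archive that never loses a message. The "op" field is hypothetical."""
    return [m for m in messages if m.get("op") != "delete"]


def pack_batch(messages):
    """Serialise a batch of messages and compress it for transport."""
    return zlib.compress(json.dumps(messages).encode("utf-8"))


def unpack_batch(blob):
    """Inverse of pack_batch, run by the daemon on the receiving side."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Because each piece is just a function from messages to messages, they can be chained in whatever order a site needs, which is the flexibility the local daemon approach is meant to preserve.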


2. Active->Slave replication with automatic failover

This takes #1, limits it to a pair of boxes and through changes to
murder or other parts of cyrus will swap the active/slave status of the
two boxes

Pro:
makes setting up of a HA pair of boxes easier

increases availability by decreasing downtime

Con:
this functionality can be duplicated without changes to cyrus by the
use of an external HA/cluster software package.

Since this now assumes a particular mode of operation it starts to
limit other uses (for example, if this is implemented as part of murder
then it won't help much if you are trying to replicate to a DR datacenter
several thousand miles away).

Split-brain conditions are the responsibility of cyrus to prevent or
solve. These are fundamentally hard problems to get right in all cases


3. Active->Slave replication with Slave able to accept client connections

This takes #1 and then further modifies the slave so that requests that
would change the contents of things get relayed to the active box and then
the results of the change get propagated back down before they are visible
to the client.

Pro:
simulates active/active operation although it does cause longer delays
when clients issue some commands.

use of slaves for local access can reduce the load on the master
resulting in higher performance.

can be cascaded to multiple slaves and multiple tiers of slaves as
needed

in case of problems on the master the slaves can continue to operate as
read-only servers providing degraded service while the master is fixed.
depending on the problem with the master this may be very preferable to
having to re-sync the master or recover from a split-brain situation

Con:
more extensive modifications needed to trap all changes and propagate
them up to the master

how does the slave know when the master has implemented the change (so
that it can give the result to the client)

raises questions about the requirement to get confirmation of all
updates before the slave can respond to the client (for example, if a
slave decides to read a message that is flagged as new should the slave
wait until the master confirms that it knows the message has been read
before it gives it to the client, or should it give the message to the
client and not worry if the update fails on the master)

since the slave needs to send updates to the master the latency of the
link between them can become a limiting factor in the performance that
clients see when connecting to the slave
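The data flow of #3 can be sketched with a toy model in Python. Everything here is an assumption for illustration: the Master and Slave classes, the flag-only data model, and the direct method calls that stand in for network traffic in both directions.

```python
class Master:
    """Holds the authoritative state and pushes every change to its slaves."""

    def __init__(self):
        self.flags = {}     # (mailbox, uid) -> set of flags
        self.slaves = []

    def apply(self, mailbox, uid, flag):
        self.flags.setdefault((mailbox, uid), set()).add(flag)
        for slave in self.slaves:
            slave.receive(mailbox, uid, flag)   # propagate back down


class Slave:
    """Serves reads locally; relays anything that changes state to the master."""

    def __init__(self, master):
        self.master = master
        self.flags = {}
        master.slaves.append(self)

    def receive(self, mailbox, uid, flag):
        self.flags.setdefault((mailbox, uid), set()).add(flag)

    def read_flags(self, mailbox, uid):
        return set(self.flags.get((mailbox, uid), set()))

    def set_flag(self, mailbox, uid, flag):
        # Not applied locally: the change goes up to the master and only
        # becomes visible here once the master has pushed it back down.
        self.master.apply(mailbox, uid, flag)
        return self.read_flags(mailbox, uid)
```

In this toy the relay is a synchronous call, so the client simply waits for the round trip; that wait is the link-latency cost listed in the Con section, and the open question of whether to wait at all is the one raised there about confirmations.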

4. #3 with automatic failover

Since #3 supports multiple slaves the number of failover scenarios grows
significantly. you have multiple machines that could be the new master and
you have the split-brain scenario to watch out for.

Pro:
increased availability by decreasing failover time

potentially easier to set up than with external clustering software

Con:
increased complexity

runs the risk of breaking some deployment scenarios in an attempt to
simplify others

5. Active/Active

designate one of the boxes as primary and identify all items in the
datastore that absolutely must not be subject to race conditions between
the two boxes (message UUID for example). In addition to implementing the
replication needed for #1 modify all functions that need to update these
critical pieces of data to update them on the master and let the master
update the other box.

Pro:
best use of available hardware as the load is split almost evenly
between the boxes.

best availability because if there is a failure half of the clients
won't see it at all

Con:
significantly more complex than the other options.

behavior during a failure is less obvious

split-brain recovery is not straightforward and if automatic failover
is active the sysadmin will have no option to have things degraded
slightly while a problem is fixed

depending on the implementation this may be very sensitive to network
latency between the machines and could be very suitable for working with
machines in the same datacenter, but worthless for machines thousands of
miles apart.

6. active/active/active/...

Take #5 and extend the idea to more than a pair of boxes. this makes the
updates more complex to propagate (they now need to be sent to every other
machine in the cluster)

Pro:
better load balancing than #5

allows for the ability to have a HA pair in a primary location and a
backup in a remote location (i.e. your main HQ has two boxes, but your
disaster recovery center has one as well)

Con:
the complexity goes up significantly when you shift from 2 to n boxes
in a cluster.

the bandwidth required for updates increases by a factor of roughly n!

significantly more split-brain scenarios become possible and need to be
accounted for.
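One way to get a feel for how the propagation cost grows with cluster size is the small Python calculation below. It assumes the simplest possible scheme, where every update is sent directly to every other machine; a tree-like propagation as described under #1 would give different numbers, and the function names are made up for the example.

```python
def transmissions_per_update(n):
    """Copies of one update that must be sent when every other machine in
    an n-machine cluster receives it directly (assumed full mesh)."""
    return n - 1


def full_mesh_links(n):
    """Distinct machine-to-machine links that have to be kept working."""
    return n * (n - 1) // 2
```

Under that assumption a pair of boxes needs one transmission per update over one link, while five boxes need four transmissions per update over ten links, each of which is also a place where the cluster can partition.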



-------------------------------------------------------------------------

while #6 is the ideal option to have it can get very complex

personally I would like to see #1 (with a sample daemon or two to provide
basic functionality and leave the doors open for more creative uses)
followed by #3 while people try and figure out all the problems with #5
and #6

there are a lot of scenarios that are possible with #1 or #3 that are not
possible with #5 and very little of the work needed to release #1 and #3
as supported options is not work that needs to be done towards #5/6 anyway
(the pieces need to be identified in the code and hooks put in place in
the code at those locations. the details of the hooks will differ slightly

David Lang

--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
Paul Dekkers
2004-09-17 06:55:38 UTC
Permalink
Hi,
I think this would cause performance to suffer greatly. I think what
we want is "lazy" replication, where the client gets instant results
from the machine its connected to, and the replication is done in the
background. I believe this is what David's implementation does.
Yes, but if I understood it well it is per action, and not long after
the action was performed on one of the machines. (It should at least not
take long, but get in queue/backlog or something for the background
process? I'm not sure how it's done in David's patch, and neither if
that is really what we should go for, but that's up to you developers :-))
I would say not at an interval but as soon as there is an action
performed on one mailbox, the other one would be pushed to do
something. I believe that is called rolling replication.
I would not be really happy with an interval synchronisation. It would
make it harder to use both platforms at the same time, and that is
what I want as well. So there is a little-bit of load-balancing
involved, but more and more _availability_.
It plays a role that in our situation there is also spamassassin running
on the servers: if that could be distributed because one mail can be
delivered to one box and another one to the other that would already
mean quite some load-balancing: and then we have not taken the load of
cyrus into account :-)
Being able to use both platforms at the same time maybe implies that
there is either no master/slave role or that this is auto-elected
between the two and that this role is floating...
I'm not sure about that, btw: I'm no good programmer, but I can imagine
that this is something you want.

If one server is down it should mean that all tasks can be performed at
the other one. I'm curious how this would look if both servers are
still running but cannot reach each other. If there is indeed a UUID:
what if there are doubles... but I guess that has been taken into account.

Paul

Fabian Fagerholm
2004-09-17 05:18:10 UTC
Permalink
I think this would cause performance to suffer greatly. I think what we
want is "lazy" replication, where the client gets instant results from
the machine its connected to, and the replication is done in the
background. I believe this is what David's implementation does.
Question: Are people looking at this as both redundancy and
performance, or just redundance?
There has to be some balance between the two, of course. What exactly
would that balance be? A while back I had some ideas of lazy replication
between geographically separate nodes in a mail cluster, to solve a
problem that a customer was having. I think I posted something on this
very list back then. There was some research, but the costs involved in
actually implementing the thing were too big, and the time to do it was
too short.

The idea was to get rid of the single-master structure of Murder, and
have an asymmetric structure where each node in the mail cluster can act
as "primary" for one or several domains, and as "secondary" for one or
several domains, at the same time. Synchronization could flow in either
direction. Each domain would have one primary server and some number of
secondary servers -- redundancy could be increased by adding slaves and
performance could be increased by placing them close to users in the
network topology. Placing slaves in a geographically remote location
would act as a sort of hot backup -- if one server breaks then you just
replace it and let it synchronize with an available replica. Basically,
think DNS here, and add the ability to inject messages at any node.

Let's say you have five servers and three offices (customers) -- you'd
set up one server in your own facilities, one server in a co-location
facility, and one server in each of your customers' facilities.

You configure the server in your network -- which acts as a kind of
administration point -- and in the co-location network to handle "all
domains" and each server in the customers' facilities to handle mail
only for their domain(s). You then create domains and mailboxes on the
server close to you in the network topology. The mailboxes will be lazy-
replicated to the correct locations. Using suitable DNS records, you can
have mail delivered directly to each customer's server, and it would
lazy-replicate to your servers. Your servers would act as MX backups
when the customer's network is down, and the mail would be lazy-
replicated to them when they reappear on the network. Also, you could
support dial-up users by having them connect to the co-located server
instead of having to open firewalls etc. to the customer's network,
which is potentially behind a much slower link.

So to answer your question, I believe that by selecting a suitable
structure, you could actually address both performance and redundancy at
the same time. (Although I realize I've broadened the terms beyond what
you probably meant originally.)

In any case, I'd be willing to join the fundraising, but before that I'd
like to see an exact specification of what is actually being
implemented. I imagine that the specification could be drafted here on
this list, put somewhere on the web along with the fundraising details,
and we'd go from there.

Cheers,
--
Fabian Fagerholm <***@paniq.net>
Earl R Shannon
2004-09-17 12:39:13 UTC
Permalink
Hello,

All that you say is true. But for performance one either
buys bigger and better or multiple machines to spread the
load. Murder allows one to buy multiple machines.

All I am saying is that improving performance may already
be done. I believe redundancy in the application is more
important at this point.

Regards,
Earl Shannon
--On Thursday, September 16, 2004 22:14 -0400 Earl Shannon
Hello,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
My $0.02 worth. Performance gains can be found the traditional way, ie,
faster hardware, etc. Our biggest need is for redundance. If something
goes wrong on one machine we still need to be able to let users use
email.
Cyrus already has this solved via MURDER, but FWIW, more smaller boxes
isolate failures more effectively than one big box, also
price/performance is still better at a certain size for any platform,
and going up higher on the performance curve has HUGE price jumps.
There's also the cost of administering multiple separate boxes to think
about but carefully planned, this can be managed rather easily.
The whole 'throw bigger and bigger boxen' at it method of 'scaling'
doesn't scale. You hit the wall. One box can only do so much, granted
you can spend LOTS of money and get pretty big boxes, but at some point
it becomes ludicrous -- who would use a Sun E10k/E15k and a whole
Symmetrix DMX for just mail? (and I'm excluding companies like AOL and
IBM who actually can afford it and would maybe have a reason to scale to
that size)...
Price/Performance has a curve associated with it, most of us can't
afford to always stay at the top end of the curve, and have to be at the
middle. Further, does it make sense to re-invest in equipment every year
to maintain growth? No, you should be able to expand, add another box,
or two, and that scales fairly well. Better than the single big box
approach.
--
Systems Programmer ,Information Technology Division
NC State University.
http://www.earl.ncsu.edu

Anonymous child "Some people can tell the time by looking at the sun,
but I have trouble seeing the numbers."
Michael Loftis
2004-09-17 02:37:31 UTC
Permalink
--On Thursday, September 16, 2004 22:14 -0400 Earl Shannon
Post by Earl R Shannon
Hello,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
My $0.02 worth. Performance gains can be found the traditional way, ie,
faster hardware, etc. Our biggest need is for redundance. If something
goes wrong on one machine we still need to be able to let users use email.
Cyrus already has this solved via MURDER, but FWIW, more smaller boxes
isolate failures more effectively than one big box, also price/performance
is still better at a certain size for any platform, and going up higher on
the performance curve has HUGE price jumps.

There's also the cost of administering multiple separate boxes to think
about but carefully planned, this can be managed rather easily.

The whole 'throw bigger and bigger boxen' at it method of 'scaling' doesn't
scale. You hit the wall. One box can only do so much, granted you can
spend LOTS of money and get pretty big boxes, but at some point it becomes
ludicrous -- who would use a Sun E10k/E15k and a whole Symmetrix DMX for
just mail? (and I'm excluding companies like AOL and IBM who actually can
afford it and would maybe have a reason to scale to that size)...

Price/Performance has a curve associated with it, most of us can't afford
to always stay at the top end of the curve, and have to be at the middle.
Further, does it make sense to re-invest in equipment every year to
maintain growth? No, you should be able to expand, add another box, or
two, and that scales fairly well. Better than the single big box approach.


Earl Shannon
2004-09-17 02:14:09 UTC
Permalink
Hello,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
My $0.02 worth. Performance gains can be found the traditional way, ie,
faster hardware, etc. Our biggest need is for redundance. If something
goes wrong on one machine we still need to be able to let users use email.


John Andrews
2004-09-17 13:48:45 UTC
Permalink
I'm not in a position to donate, but I would like to throw in a vote for
the raid style implementation. We have a murder with 4 backend servers
and that would definitely be a feature that I would take advantage of.
My only question is how well that would scale: when a server is added you
would have to redistribute the backup mailboxes on the first, last, and new
servers.
--
John Andrews
Systems Administrator
NPG Cable, Inc.
(816) 273-0337
***@npgco.com

David Carter
2004-09-17 10:31:29 UTC
Permalink
So how does this "cyrus in a raid view" sound? It should probably be
called "raims" for redundant array of inexpensive mail servers anyway ;)
We call it RAIN: Redundant Array of Inexpensive Nodes.

Really cheap Intel servers in our case :)
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
Jure Pečar
2004-09-17 10:11:00 UTC
Permalink
On Fri, 17 Sep 2004 08:25:26 +0200
I would say not at an interval but as soon as there is an action
performed on one mailbox, the other one would be pushed to do something.
I believe that is called rolling replication.
I would not be really happy with a interval synchronisation. It would
make it harder to use both platforms at the same time, and that is what
I want as well. So there is a little-bit of load-balancing involved, but
more and more _availability_.
Being able to use both platforms at the same time maybe implies that
there is either no master/slave role or that this is auto-elected
between the two and that this role is floating...
Paul
I'm jumping back into this thread a bit late ...

My feeling is that most Cyrus installations run one or a few domains with
many users; at least that is my case. That's why I'd base any kind of
replication we come up with on the mailbox as the base unit. Just as RAID
uses the disk block as its unit, we would use the mailbox (with all its
subfolders). Whole domains could then be handled at a higher level, if
needed.

Today we have the option of using murder (or perdition, with some added
logic) when more than one backend machine is needed. This brings us a kind
of "raid linear" (linux md speak) or concatenation of space into a single
mailstore, with all the 'features' of such a setup: if you lose one
machine (disk), all users (data) on that machine (disk) are unavailable.

So what I'm thinking is that we need a kind of raid1, or mirroring, of
mailboxes. Imagine user1 having its mailbox on server1 and server2, user2 on
server2 and server3, user3 on server3 and server1 ... for example. Murder is
already a central point that knows where a given mailbox lives and how to
proxy pop, imap and lmtp to it, so as I see it, it would be best to teach it
how to handle this 'mirroring' too.
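
To make the placement concrete, here is a minimal Python sketch of the ring
layout described above. It is an illustration only, not murder or mupdate
code; the names mirror_pairs and reachable_server are made up for this
example.

```python
def mirror_pairs(users, servers):
    """Give each user a (primary, secondary) pair of neighbouring servers."""
    if len(servers) < 2:
        raise ValueError("mirroring needs at least two servers")
    placement = {}
    for i, user in enumerate(users):
        primary = servers[i % len(servers)]
        secondary = servers[(i + 1) % len(servers)]  # next server on the ring
        placement[user] = (primary, secondary)
    return placement


def reachable_server(placement, user, down):
    """Return the first copy of the mailbox that is not on a failed server."""
    for server in placement[user]:
        if server not in down:
            return server
    return None  # both copies are on failed machines


placement = mirror_pairs(
    ["user1", "user2", "user3"], ["server1", "server2", "server3"]
)
```

With three servers, losing any single machine still leaves one copy of every
mailbox reachable; losing two neighbours takes out the users mirrored across
exactly that pair.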

Let's say one of the two mailboxes is primary and the other is secondary;
murder connects to the primary, lets the client do whatever it wants and
then replays the exact same actions to the secondary mailbox. Whether this
is done after the primary disconnects or while the client is still talking
to the primary is an implementation detail.

Performance bonus: connect to both mailboxes at once and pronounce as
primary the one that responds faster :)

Murder would have to know how to record and play back the whole
client-server dialogue. Considering that there's already a system in Cyrus
that lets the admin see the 'telemetry' of the imap conversation, I guess
this could be extended and tied into murder.

So far this is just how clients would talk to our system.

What else would we need?

Certainly a mechanism to manually move mailboxes between servers in a way
that murder knows about the changes. Thinking about it, the mupdate protocol
already knows how to push metadata around; why not extend it so it can also
move mailboxes? Or should a perl mupdate module be born, with some scripts
then written using it and imap?

Then maybe some mechanism for murder to decide which servers to put
newly created mailboxes on. Ideally this would be plugin based with
different policies (load, disk space, responsiveness, a combination of
those, something else), but a simple round robin would do for a start.
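
As a rough sketch of what such pluggable placement policies could look like
(Python, purely illustrative): every policy is a callable that takes a dict
of per-server statistics and returns a server name. The stats layout and the
"free_gb" key are assumptions for this example, not anything murder provides.

```python
import itertools


def round_robin_policy(servers):
    """Build a policy that hands out servers in rotation, ignoring stats."""
    rotation = itertools.cycle(servers)
    return lambda stats: next(rotation)


def most_free_space_policy(stats):
    """Pick the server that reports the most free disk space."""
    return max(stats, key=lambda name: stats[name]["free_gb"])


choose = round_robin_policy(["server1", "server2", "server3"])
first_four = [choose({}) for _ in range(4)]
```

Because both policies share the same call shape, the round robin can be
swapped for a smarter policy later without touching the caller.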

For those who do not want to have mailboxes in sync, a mechanism to delay
updates to the secondary mailbox. (In this case, which mailbox is primary
and which is secondary should not change.) Also a way of handling a huge
backlog in case one of the machines is down for a longer period of time.
Maybe a mechanism to sync the mailbox from the other server and discard
the backlog would be handy in such a case, plus a way to manually trigger
such a resync on a specific mailbox.

Probably something else I can't think of right now.


So how does this "cyrus in a raid view" sound? It should probably be called
"raims" for redundant array of inexpensive mail servers anyway ;)

This way all the logic is done in one place and you only have to take good
care (in an HA sense) of the mupdate master machine. The others can remain
cheap and relatively dumb, and can be pulled offline at will. Given fast
enough and reliable links, this could also work in a geographically
distributed manner.


Ken, is something like this reasonable?


Oh, i'd like to know what fastmail.fm folks think about all this HA thing.
I'm sure they have some interesting insights :)
--
Jure Pečar

Attila Nagy
2004-09-20 11:38:19 UTC
Permalink
currently we have murder which will spread the load across multiple
machines.
currently we have many tools available to detect a server failure and
run local scripts to reconfigure machines (HACMP on AIX, heartbeat for
Linux, *BSD, Solaris, etc)
what we currently do not have is any ability to have one mailstore
updated to match changes in another one.
I don't think that it would be a good idea to just solve the
replication, without the failover function.

What is in my mind is a setup, which has mailboxes defined to a given
machine (eg. with murder in front of them) and one or more additional
servers which also have those mailboxes' replicas.

The best would be that a mailbox could be set master or slave, so an
IMAP backend could function as the backup of the other one(s), without
adding hardware that is used only twice a year.
I also would not be really satisfied with interval synchronisation as
the only choice.
I guess in the enterprise era high availability doesn't mean that if one
of your mail backends goes down you can serve customers their messages
from yesterday, so this periodic sync won't really help.
One could do this already with good scripting capabilities...
--
Attila Nagy e-mail: ***@fsn.hu
Free Software Network (FSN.HU) phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download cell.: +3630 306 6758
David Carter
2004-09-17 09:20:54 UTC
Permalink
Isn't it possible to have equal roles? If all changes are put in some
backlog, and a synchroniser process runs on both machines and pushes the
backlog (as soon as there is any) to another machine... then you can
have the same process on both (equal) servers... Of course there needs
to be some more intelligence, but that's basically what I would expect.
We have 16 servers: half the accounts on each system are master copies and
half are replicas. Each machine has a small database (a CDB lookup file)
to tell it whether a given account is master or slave. The replication
engine (which runs independently from the normal master spawned jobs)
bails out rapidly if the replica copy of an account is updated: it would
proceed to transform the master into a copy of the replica, but that's
probably not what you wanted :). I have a tool which allows me to switch
the master and replica copy for any (inactive) account without having to
shut anything down. This tool also lets me migrate data off onto a third
system and immediately create a replica of that. This makes upgrading
operating systems a much less fraught task.
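
A minimal Python sketch of the role check described here, for illustration
only. A plain dict stands in for the per-machine CDB lookup file, and the
names should_push, switch_roles and ReplicaModified are invented for this
example; this is not the actual replication engine.

```python
class ReplicaModified(Exception):
    """Raised when the replica copy was changed independently."""


def should_push(roles, account, replica_changed):
    """Decide whether this machine may push updates for an account.

    roles maps account -> "master" or "replica" for this machine.
    """
    if roles.get(account) != "master":
        return False  # this machine holds the replica (or nothing)
    if replica_changed:
        # Stop at once rather than let one copy overwrite the other.
        raise ReplicaModified(account)
    return True


def switch_roles(roles_a, roles_b, account, active):
    """Swap master and replica between two machines for an inactive account."""
    if active:
        raise RuntimeError("refusing to switch an active account")
    roles_a[account], roles_b[account] = roles_b[account], roles_a[account]


machine_a = {"alice": "master"}
machine_b = {"alice": "replica"}
```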
In my sketch above (really not sure if it works of course) where both
have something like a backlog you can like "tail" that backlog and push
the update as soon as possible to the second machine. You solve the
thing you mention with delays while pushing updates to two servers at
the same time.
Yes, that's exactly how my code works. Asynchronous replication (which Ken
called lazy replication) is fairly easy to do in Cyrus. Synchronous
replication, where you only get a response to an IMAP/POP/LMTP command
when the data is safely committed to the replica, would involve a much
more substantial rewrite of the Cyrus code.

That's where block based replication schemes like DRBD have a big
advantage: the state that they have to track is much less involved.

I'm currently running with a replication cycle of one second on my live
servers for "rolling" replication (that's just a name I made up, it's not
an official term), so on average we would lose half a second of update
traffic for 1/16th of our user base if a single system failed. Further
safeguards are possible by keeping copies of incoming mail for a short
time on the MTA systems, but that's not really a Cyrus concern.
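
The arithmetic behind those two figures can be written out as a small Python
sketch. It assumes that a failure is equally likely at any point in the
replication cycle and that master copies are spread evenly across the
servers; both are simplifications, and the function names are invented for
this example.

```python
from fractions import Fraction


def expected_loss_window(cycle_seconds):
    """Average amount of unreplicated traffic, in seconds, at failure time."""
    return Fraction(cycle_seconds) / 2


def affected_fraction(server_count):
    """Share of users whose master copy lives on any one server."""
    return Fraction(1, server_count)


loss = expected_loss_window(1)   # one-second replication cycle
share = affected_fraction(16)    # sixteen servers
```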

We also replicate to a tape backup spooling engine overnight. The
replication engine is rather useful for fast incremental updates.
Post by Paul Dekkers
If one server is down it should mean that all tasks can be performed
at the other one. I'm curious how this would look if both servers are
still running but cannot reach each other. If there is indeed a UUID:
what if there are doubles... but I guess that has been taken into
account.
UUIDs are just a convenient representation of message text, so that you
can pass messages by reference rather than value. Duplicates don't matter
(though I don't believe that they actually occur given my allocation scheme)
so long as the message text is the same. I maintain databases of MD5
checksums for messages and cache text just to be on the safe side.

UUIDs were originally just Mailbox UniqueID + Message UID. Unfortunately,
UniqueID isn't very Unique: it's just a simple hash of the mailbox name. I
ended up allocating UUIDs in large chunks from the master process on each
machine. If a process runs out of UUIDs (which would take some going as
they are allocated in chunks of 2**24), it falls back to call by value.
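
The chunked allocation with a call-by-value fallback can be sketched in
Python as follows. This illustrates the idea only and is not the real code:
the class names and the tuple tags "by-reference" and "by-value" are
invented, and the real UUID format is not shown.

```python
CHUNK_SIZE = 2 ** 24  # chunk size mentioned above


class MasterAllocator:
    """Stands in for the master process handing out ID ranges."""

    def __init__(self):
        self.next_start = 0

    def grant_chunk(self, size=CHUNK_SIZE):
        start = self.next_start
        self.next_start += size
        return start, start + size  # half-open range [start, end)


class Worker:
    """A process that consumes IDs from the one chunk it was given."""

    def __init__(self, chunk):
        self.next_id, self.end = chunk

    def reference_for(self, message_text):
        if self.next_id < self.end:
            uuid = self.next_id
            self.next_id += 1
            return ("by-reference", uuid)
        # Chunk exhausted: ship the message text itself instead.
        return ("by-value", message_text)


master = MasterAllocator()
worker = Worker(master.grant_chunk(size=2))  # tiny chunk to show the fallback
results = [worker.reference_for("msg") for _ in range(3)]
```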
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
Paul Dekkers
2004-09-17 08:05:45 UTC
Permalink
Question: Are people looking at this as both redundancy and
performance, or just redundance?
Cyrus performs pretty well already. Background redundancy would be
awesome. Especially if we had control over when the syncing process
occurred either via time interval or date/time.
I would say not at an interval but as soon as there is an action
performed on one mailbox, the other one would be pushed to do
something. I believe that is called rolling replication.
I would not be really happy with a interval synchronisation. It would
make it harder to use both platforms at the same time, and that is
what I want as well. So there is a little-bit of load-balancing
involved, but more and more _availability_.
Being able to use both platforms at the same time maybe implies that
there is either no master/slave role or that this is auto-elected
between the two and that this role is floating...
right, but there are already tools freely available on most platforms
to do the election and changing of the role (by switching between
config files and restarting the master) what is currently lacking is
any ability to do the master/slave role. once we have that it's just a
little scripting to tie just about any failover software in to make it
automatic.
There are indeed tools available for that, but they don't always
work as they're supposed to and are often very OS-limited. With
FreeBSD I had no luck with heartbeat (wouldn't compile under FreeBSD-5),
(U)CARP was not available and FreeVRRP was buggy (at least in my case,
sometimes I had two masters).

Also, I wouldn't like it if restarting the cyrus process with a
different config file were necessary (a separate synchronising process
that needs restarting would be better). A restart would still kill
connections to that cyrus process; I'd rather see a software switch
between the roles.

Isn't it possible to have equal roles? If all changes are put in some
backlog, and a synchroniser process runs on both machines and pushes the
backlog (as soon as there is any) to another machine... then you can
have the same process on both (equal) servers... Of course there needs
to be some more intelligence, but that's basically what I would expect.
one thing we need to watch out for here is that we don't set an
impossible/unreasonable goal.
I agree that we'll have to define properly what we expect and what is
reasonable, but I think that at this moment Ken (as developer) has the
best overview in this. We offer our wishlist, and I suppose he
translates that to code in his head ;-)
I suppose that's why he came up with the question about performance
versus redundancy/availability.
don't try to solve every problem and add every availability feature you
can imagine all at once. instead let's look at the building blocks
that are needed and identify what's currently not available.
I don't agree there completely: I don't want to depend on yet another
tool that defines what the master or slave is. Sometimes they don't work
at all, work only at the same LAN, ... I'm not sure if you can count on
that.
(Hmm, you're the first that mentions the clustering software for
defining roles, and I didn't read about this on your website either.
This is new to me.)
currently we have murder which will spread the load across multiple
machines.
Yes, that's indeed something we don't need looking at :-)
(Although there is a possibility now to spread load as well of course,
with two machines available at the same time...)
currently we have many tools available to detect a server failure and
run local scripts to reconfigure machines (HACMP on AIX, heartbeat for
Linux, *BSD, Solaris, etc)
what we currently do not have is any ability to have one mailstore
updated to match changes in another one.
I would combine these two, and I think that can be done by just
well-designing the last thing you mention.
I also would not be really satisfied with interval synchronisation as
the only choice.
In my sketch above (really not sure if it works of course) where both
have something like a backlog you can like "tail" that backlog and push
the update as soon as possible to the second machine. You solve the
thing you mention with delays while pushing updates to two servers at
the same time.
I think we need something where the primary mailstore pushes a record
of its changes to the secondary mailstore
Why not also vice versa?!
We want the two servers to be accessible at the same time, right?
If one server is down it should mean that all tasks can be performed at the
other one. I'm curious how this would look if both servers are still running
but cannot reach each other. If there is indeed a UUID: what if there are
doubles... but I guess that has been taken into account.
In cluster terminology this situation is known as being 'split-brained'
and is generally viewed as a 'VERY BAD THING' that each cluster software
solves in a slightly different way, from having an odd number of machines
in the cluster (so that only one half of the cluster can actually have
enough machines to function) to physically disconnecting power from a
machine deemed to have failed (if both boxes attempt to power each other
down one will generally win and avoid being shut off itself, but even if
they do manage to power each other down at least you avoided the
split-brain situation)
leave this up to the cluster software. don't try to put this in cyrus
initially.
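
The "odd number of machines" rule mentioned above boils down to a majority
test. A minimal Python sketch, with an invented function name and not taken
from any particular cluster package:

```python
def has_quorum(reachable, cluster_size):
    """True if this partition holds a strict majority of the cluster.

    reachable counts this node plus every peer it can still talk to.
    """
    return reachable > cluster_size // 2


# Three nodes split 2 / 1: only the larger side keeps running.
three_node_split = (has_quorum(2, 3), has_quorum(1, 3))
# Two nodes split 1 / 1: neither side has a majority.
two_node_split = (has_quorum(1, 2), has_quorum(1, 2))
```

The two-node case shows why a pair of servers needs some other tie-breaker,
such as the power-fencing approach described above.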
I still don't see why we need clustering software here?! I only see
application replication, no clustering software at all - am I wrong?

If we indeed need a mechanism for UUIDs for the messages, maybe one can
define that on one server the messages are odd and on the other even, or
that there is a different range on one server than on the other. (Not
sure if this is really necessary, but in fact I really don't want to
depend on clustering software.) I don't know, I suppose you already
handled that with your patches?
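
The odd/even idea can be sketched in a few lines of Python. The function name
id_stream is made up for this illustration. Note that it only guarantees that
the two servers never hand out the same number; it does not by itself give
the strictly ascending per-mailbox ordering that IMAP expects of UIDs.

```python
import itertools


def id_stream(node_index, node_count, start=1):
    """Yield IDs that only this node can produce: interleaved by node count."""
    return itertools.count(start + node_index, node_count)


server_a = list(itertools.islice(id_stream(0, 2), 3))  # odd IDs
server_b = list(itertools.islice(id_stream(1, 2), 3))  # even IDs
```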

Paul

David Lang
2004-09-17 07:12:07 UTC
Permalink
Date: Fri, 17 Sep 2004 08:25:26 +0200
Subject: Re: Funding Cyrus High Availability
Hi,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
Cyrus performs pretty well already. Background redundancy would be
awesome. Especially if we had control over when the syncing process
occurred either via time interval or date/time.
I would say not at an interval but as soon as there is an action performed on
one mailbox, the other one would be pushed to do something. I believe that is
called rolling replication.
I would not be really happy with a interval synchronisation. It would make it
harder to use both platforms at the same time, and that is what I want as
well. So there is a little-bit of load-balancing involved, but more and more
_availability_.
Being able to use both platforms at the same time maybe implies that there is
either no master/slave role or that this is auto-elected between the two and
that this role is floating...
right, but there are already tools freely available on most platforms to
do the election and changing of the role (by switching between config
files and restarting the master) what is currently lacking is any ability
to do the master/slave role. once we have that it's just a little
scripting to tie just about any failover software in to make it automatic.

one thing we need to watch out for here is that we don't set an
impossible/unreasonable goal. don't try to solve every problem and add
every availability feature you can imagine all at once. instead let's look
at the building blocks that are needed and identify what's currently not
available.

currently we have murder which will spread the load across multiple
machines.

currently we have many tools available to detect a server failure and run
local scripts to reconfigure machines (HACMP on AIX, heartbeat for Linux,
*BSD, Solaris, etc)

what we currently do not have is any ability to have one mailstore updated
to match changes in another one.

once we have that ability there are many things that can be built by
gluing together existing code. once we have a bit of experience with
people actually using these features it will then be obvious which
features need better integration with Cyrus and which make sense to remain
separate.

I also would not be really satisfied with interval synchronisation as the
only choice.

I think we need something where the primary mailstore pushes a record of
its changes to the secondary mailstore

This can then be tweaked in several directions.

1. locking can be added so that the primary doesn't complete its command
until the secondary says it has a permanent record of the change
(two-phase commit or a reasonable facsimile of such)

2. batch up the changes until they hit some threshold (size or time or
combination) and then send a batch of changes all at once

3. recognise its own changes to gain the ability to push updates in both
directions at the same time (true two-phase commit with bi-directional
replication, some horrible pathological performance cases, but attractive
in some cases)

or other variants

but these all share the same common need

the ability for the master to output all its changes and the ability for
a slave to read such changes and update itself to match
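
that common core, together with the batching variant from point 2, can be
sketched in Python roughly as follows. this is an illustration under made-up
names (Change, Master, Slave) with an in-memory dict standing in for a
mailstore; it is not Cyrus code and ignores failure handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    mailbox: str
    action: str      # "append" or "expunge"
    uid: int
    body: str = ""


class Slave:
    """Reads change records and updates its own copy to match."""

    def __init__(self):
        self.mailboxes = {}

    def apply(self, change):
        box = self.mailboxes.setdefault(change.mailbox, {})
        if change.action == "append":
            box[change.uid] = change.body
        elif change.action == "expunge":
            box.pop(change.uid, None)
        else:
            raise ValueError(f"unknown action: {change.action}")


class Master:
    """Records every change and pushes them once a batch threshold is hit."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.pending = []

    def record(self, change, slave):
        self.pending.append(change)
        if len(self.pending) >= self.batch_size:
            self.flush(slave)

    def flush(self, slave):
        batch, self.pending = self.pending, []
        for change in batch:
            slave.apply(change)
```

a batch_size of 1 gives push-on-every-change; larger values trade a wider
loss window for fewer round trips.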

the nice thing is that with IMAP much of the data needed is already
output, you could do a first approximation of this with a client that
opened a separate connection to every folder on the primary server and
just sat watching for server messages and whenever it saw an update sent
the matching command to the slave (fetching the full data as needed to get
all the info). this obviously won't scale to any reasonable size, but it
means that most of what's needed is already identified so the core could
be just a common output of the existing messages with a little more data
(mailbox and folder in most cases, message contents in a few)

let's get these small, but critical pieces done and then we can grow and
experiment from there.

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
Paul Dekkers
2004-09-17 06:25:26 UTC
Permalink
Hi,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
Cyrus performs pretty well already. Background redundancy would be
awesome. Especially if we had control over when the syncing process
occurred either via time interval or date/time.
I would say not at an interval but as soon as there is an action
performed on one mailbox, the other one would be pushed to do something.
I believe that is called rolling replication.

I would not be really happy with an interval synchronisation. It would
make it harder to use both platforms at the same time, and that is what
I want as well. So there is a little bit of load-balancing involved, but
more and more _availability_.

Being able to use both platforms at the same time maybe implies that
there is either no master/slave role or that this is auto-elected
between the two and that this role is floating...

Paul

Eric S. Pulley
2004-09-17 00:40:28 UTC
Permalink
--On Thursday, September 16, 2004 6:56 PM -0400 Ken Murchison
<***@oceana.com> wrote:
[SNIP]
Question: Are people looking at this as both redundancy and
performance, or just redundance?
Cyrus performs pretty well already. Background redundancy would be awesome.
Especially if we had control over when the syncing process occurred either
via time interval or date/time.
--
ESP
Derrick J Brashear
2004-09-19 01:20:24 UTC
Permalink
I'm not sure that IMAP is amenable to active-active: the prevalence of
UIDs in the protocol means that it would be very hard to resolve the
inconsistencies that would occur if a pair of machines ever lost touch.
Right, I was assuming that active-passive is what we would probably get, I
was just taking the pulse of the community.
In the past I have attempted to steer things internally at work toward
active-active solutions, but I think expecting that here will result in an
unrealistically complex solution to deploy, if implemented.
Ken Murchison
2004-09-18 12:20:42 UTC
Permalink
Actually what I was really asking, is are people looking for an
active-active config and an active-passive config?
I'm not sure that IMAP is amenable to active-active: the prevalence of
UIDs in the protocol means that it would be very hard to resolve the
inconsistencies that would occur if a pair of machines ever lost touch.
Right, I was assuming that active-passive is what we would probably get,
I was just taking the pulse of the community.
I would be happy to be proved wrong: active-active is clearly better
from a system administrator perspective :).
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
David Carter
2004-09-18 11:11:01 UTC
Permalink
Actually what I was really asking, is are people looking for an
active-active config and an active-passive config?
I'm not sure that IMAP is amenable to active-active: the prevalence of
UIDs in the protocol means that it would be very hard to resolve the
inconsistencies that would occur if a pair of machines ever lost touch.

I would be happy to be proved wrong: active-active is clearly better from
a system administrator perspective :).
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
Paul Dekkers
2004-09-18 07:16:22 UTC
Permalink
Hi,
Question: Are people looking at this as both redundancy and
performance, or just redundance?
for performance we already have murder, what we currently lack is
redundancy. once we have redundancy then the next enhancement is
going to be to teach murder about it so that it can failover to the
backup box(s) as needed, but for now simply having the full data at
the backup location would be so far ahead of where we are now that
the need to reconfigure murder for a failover is relatively trivial
by comparison.
Actually what I was really asking, is are people looking for an
active-active config and an active-passive config?
My vote is certainly for active-active...

And if feasible, I would also choose to have an equal role for both
servers. I think at this stage (although maybe not if David's patch is
copied entirely) this would not be much extra work, whereas adding it
later seems like much more work to me. (It's just a matter of
design I suppose: having two backlogs and synchronising them to the
other host. This is also what you want with an active-active situation,
it shouldn't matter who you're talking to.)
In my sketch above (really not sure if it works of course) where both
have something like a backlog you can like "tail" that backlog and
push the update as soon as possible to the second machine. You solve
the thing you mention with delays while pushing updates to two
servers at the same time.
Yes, that's exactly how my code works. Asynchronous replication (which
Ken called lazy replication) is fairly easy to do in Cyrus.
Synchronous replication, where you only get a response to an
IMAP/POP/LMTP command when the data is safely committed to the
replica, would involve a much more substantial rewrite of the Cyrus code.
I don't know the exact benefits of that solution, but I can also imagine
that this raises problems if one server is down. (You have to use a
backlog then anyway.) I think I care more about having two servers
active (with the option of active-down) and a good recovery mechanism
than about whether the synchronisation is lazy or not ;-) (and I think
that it might be easier to recover (when e.g. both servers crash) with a
backlog, but that's really up to the programmers.)
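
The backlog idea discussed above can be sketched roughly as follows. This is only an illustration in Python, not David Carter's code and not how Cyrus is implemented; the class names, the `deliver` and `push_backlog` methods and the `link_up` flag are all made up for the example. The point it shows is that the client gets its answer as soon as the local commit is done, and the replica catches up whenever the link allows.

```python
from collections import deque


class Replica:
    """Hypothetical stand-in for the second server: mailbox name -> messages."""

    def __init__(self):
        self.mailboxes = {}

    def apply(self, update):
        mailbox, message = update
        self.mailboxes.setdefault(mailbox, []).append(message)


class Master(Replica):
    """Applies each update locally, logs it, and answers the client at once."""

    def __init__(self):
        super().__init__()
        self.backlog = deque()  # updates not yet pushed to the replica

    def deliver(self, mailbox, message):
        update = (mailbox, message)
        self.apply(update)           # commit locally
        self.backlog.append(update)  # remember it for the replica
        return "OK"                  # the client is not kept waiting

    def push_backlog(self, replica, link_up=True):
        """'Tail' the backlog: send pending updates in order while the link is up."""
        sent = 0
        while link_up and self.backlog:
            replica.apply(self.backlog[0])
            self.backlog.popleft()   # drop only after the replica has applied it
            sent += 1
        return sent


master, replica = Master(), Replica()
master.deliver("user.paul", "msg-1")
master.deliver("user.paul", "msg-2")
master.push_backlog(replica, link_up=False)  # replica unreachable: backlog is kept
pending_while_down = len(master.backlog)
master.push_backlog(replica)                 # link restored: replica catches up
```

A synchronous variant would have `deliver` wait for the replica before returning "OK", which is where the delay and the "what if one server is down" problem come from.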

Bye,
Paul

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
David Lang
2004-09-19 07:57:18 UTC
Permalink
Mike, one of the problems with this is that different databases have
different interfaces and capabilities.
if you design it to work on Oracle then if you try to make it work on
MySQL there are going to be quite a few things you need to change.
--snip
another issue in all this is the maintenance of the resulting code. If
this code can be used in many different situations then more people will
use it (probably including CMU) and it will be maintained as a side effect
of any other changes. however if it's tailored towards a very narrow
situation then only the people who have that particular problem will use
it and it's likely to have issues with new changes.
I'd actually figured something like ODBC would be used, with prepared
statements. /shrug. Abstract the whole interface issue.
unfortunately there are a few problems with this

to start with ODBC is not readily available on all platforms.

secondly ODBC can't cover up the fact that different database engines have
vastly differing capabilities. if you don't use any of these capabilities
then you don't run into this pitfall, but if you want to you will.

I really wish that ODBC did live up to its hype, but in practice only the
most trivial database users can transparently switch from database to
database by changing the ODBC config.
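
To make the point about differing capabilities concrete, here is a small sketch in Python. The `quota` table and the `upsert_statement` helper are hypothetical, and the SQL is the "insert or update" syntax as found in current versions of these engines (not necessarily what was available at the time of this thread). A common driver layer can pass any of these strings along, but it cannot choose or translate them for you.

```python
# Hypothetical "store or update a mailbox quota" operation, written once per engine.
UPSERT_SQL = {
    "mysql": (
        "INSERT INTO quota (mailbox, used) VALUES (?, ?) "
        "ON DUPLICATE KEY UPDATE used = VALUES(used)"
    ),
    "postgresql": (
        "INSERT INTO quota (mailbox, used) VALUES (?, ?) "
        "ON CONFLICT (mailbox) DO UPDATE SET used = EXCLUDED.used"
    ),
    "oracle": (
        "MERGE INTO quota q USING (SELECT ? AS mailbox, ? AS used FROM dual) s "
        "ON (q.mailbox = s.mailbox) "
        "WHEN MATCHED THEN UPDATE SET q.used = s.used "
        "WHEN NOT MATCHED THEN INSERT (mailbox, used) VALUES (s.mailbox, s.used)"
    ),
}


def upsert_statement(engine):
    """Return the engine-specific SQL; a shared driver API cannot pick this for us."""
    try:
        return UPSERT_SQL[engine]
    except KeyError:
        raise ValueError(f"no upsert statement written for {engine!r}") from None


# One logical operation, three different statements to write and maintain.
distinct_statements = {upsert_statement(e) for e in UPSERT_SQL}
```

Every engine added means another entry to write and test, which is the maintenance cost being described.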

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
s***@sziisoft.com
2004-09-17 21:43:22 UTC
Permalink
-----Original Message-----
Sent: Friday, September 17, 2004 2:25 PM
Subject: RE: Funding Cyrus High Availability
Mike, one of the problems with this is that different databases have
different interfaces and capabilities.
if you design it to work on Oracle then if you try to make it work on
MySQL there are going to be quite a few things you need to change.
--snip
another issue in all this is the maintenance of the resulting code. If
this code can be used in many different situations then more people will
use it (probably including CMU) and it will be maintained as a side effect
of any other changes. however if it's tailored towards a very narrow
situation then only the people who have that particular problem will use
it and it's likely to have issues with new changes.
I'd actually figured something like ODBC would be used, with prepared
statements. /shrug. Abstract the whole interface issue.

Just some thoughts. =)

-Mike/Szii
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
David Lang
2004-09-17 21:24:48 UTC
Permalink
My biggest question here is, simply, why recreate what's already
out there?
There are a number of projects (LVM, PVFS) which do this kind of
replication/distribution/virtualization for filesystems.
There are a number of databases which have active/active clustering
(mysql, DB2, Oracle, et al) and master/slave.
Personally, I would LOVE to see a full RDBMS-backed system. You
define your database(s) in the config file ... and that is all.
All configuration options are stored on the central RDBMS. All
mailboxes are stored there. You can then rely 100% on the RDBMS
systems for clustering/failover/scalability/backing up ... all
data storage domain problems which they have already addressed/solved.
<SNIP>
The other advantages would be very nice integration with other
applications which can query against databases. (ex: postfix directly
supports mysql lookups.)
But then, I can't afford to really help with this myself so take
my thoughts with a big "hope" pill. =D
Mike, one of the problems with this is that different databases have
different interfaces and capabilities.

if you design it to work on Oracle then if you try to make it work on
MySQL there are going to be quite a few things you need to change.

if you start on MySQL and then port to Oracle then you either ignore a
large chunk of Oracle functionality that you could use or you end up
having to re-write a bunch of stuff to take advantage of it.

I also would love this option (I would choose postgres as the back-end)
but this is significantly more complicated than a master->slave
replication modification to Cyrus.

As such it would cost more to get written and you would have fewer people
willing to pay for any particular version.

another issue in all this is the maintenance of the resulting code. If
this code can be used in many different situations then more people will
use it (probably including CMU) and it will be maintained as a side effect
of any other changes. however if it's tailored towards a very narrow
situation then only the people who have that particular problem will use
it and it's likely to have issues with new changes.

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Jure Pe_ar
2004-09-17 21:12:43 UTC
Permalink
On Fri, 17 Sep 2004 13:28:08 -0700
My biggest question here is, simply, why recreate what's already
out there?
Because none of the existing solutions fits our needs well enough.
There are a number of projects (LVM, PVFS) which do this kind of
replication/distribution/virtualization for filesystems.
We're discussing replication on the application level. Block level
replication is nice for many things, but doesn't really take care of
consistency, which cyrus relies on heavily.
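
A toy illustration of the consistency point, in Python. Nothing here is Cyrus code: the `Store` class, the two-step "file then index" write and the function names are assumptions invented for the example. It only shows why shipping individual low-level writes can leave a replica half-updated when the link fails, while shipping whole operations leaves it either fully updated or untouched.

```python
class Store:
    """Toy mailbox store: message files plus an index that must list every file."""

    def __init__(self):
        self.files = {}   # uid -> message body
        self.index = []   # uids the server knows about

    def consistent(self):
        return sorted(self.files) == sorted(self.index)


def append_writes(uid, body):
    """The low-level writes one logical APPEND turns into, in order."""
    return [("file", uid, body), ("index", uid, None)]


def apply_write(store, write):
    kind, uid, body = write
    if kind == "file":
        store.files[uid] = body
    else:
        store.index.append(uid)


def block_level_copy(replica, writes, link_fails_after):
    """Ship writes one at a time; the link may die between any two of them."""
    for write in writes[:link_fails_after]:
        apply_write(replica, write)


def application_level_copy(replica, writes, delivered):
    """Ship the whole operation; the replica applies all of it or none of it."""
    if delivered:
        for write in writes:
            apply_write(replica, write)


writes = append_writes(1, "Subject: hello")
block_replica, app_replica = Store(), Store()
block_level_copy(block_replica, writes, link_fails_after=1)   # file arrived, index did not
application_level_copy(app_replica, writes, delivered=False)  # nothing arrived
```

After the simulated link failure the block-level replica holds a message its index does not know about, while the application-level replica is merely behind.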
There are a number of databases which have active/active clustering
(mysql, DB2, Oracle, et al) and master/slave.
Personally, I would LOVE to see a full RDBMS-backed system. You
define your database(s) in the config file ... and that is all.
You can go with dbmail and one of the existing well established databases
anytime. This can solve the issue we're having here, but brings lots of
other problems that cyrus is keeping away. Just ask any Exchange admin :)
The other advantages would be very nice integration with other
applications which can query against databases. (ex: postfix directly
supports mysql lookups.)
For mail routing & auth, yes ... many of us are already doing this. However,
storing mail in a db gives you about 20% of db overhead (straight from the
Oracle sales rep) and i/o is already a very valuable resource ...
But then, I can't afford to really help with this myself so take
my thoughts with a big "hope" pill. =D
Yup :)
--
Jure Pečar

---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
s***@sziisoft.com
2004-09-17 20:28:08 UTC
Permalink
My biggest question here is, simply, why recreate what's already
out there?

There are a number of projects (LVM, PVFS) which do this kind of
replication/distribution/virtualization for filesystems.

There are a number of databases which have active/active clustering
(mysql, DB2, Oracle, et al) and master/slave.

Personally, I would LOVE to see a full RDBMS-backed system. You
define your database(s) in the config file ... and that is all.

All configuration options are stored on the central RDBMS. All
mailboxes are stored there. You can then rely 100% on the RDBMS
systems for clustering/failover/scalability/backing up ... all
data storage domain problems which they have already addressed/solved.

If you want to scale out it's a matter of
1) Install the cyrus software
2) Point the config file at the database server
3) Add an entry in the database server/cluster to allow the new frontend/proxy.
4) Fire up the daemons
5) Enjoy.

The other advantages would be very nice integration with other
applications which can query against databases. (ex: postfix directly
supports mysql lookups.)

But then, I can't afford to really help with this myself so take
my thoughts with a big "hope" pill. =D

-Mike/Szii
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
David Lang
2004-09-17 18:49:59 UTC
Permalink
Question: Are people looking at this as both redundancy and
performance, or just redundancy?
for performance we already have murder, what we currently lack is
redundancy. once we have redundancy then the next enhancement is going to
be to teach murder about it so that it can failover to the backup box(es)
as needed, but for now simply having the full data at the backup location
would be so far ahead of where we are now that the need to reconfigure
murder for a failover is relatively trivial by comparison.
Actually what I was really asking, is are people looking for an active-active
config and an active-passive config?
I think that everyone would love to have the active-active option, the
problem I have with this is that the active-passive config will solve many
people's problems and I believe that it will be far simpler to do, so I
don't want the ideal goal of active-active to end up sidetracking the
huge progress that would be achieved by active-passive.

active-active also requires significantly different choices if the nodes
are separated by significant distances. I'd hate to end up with an
active-active solution that works only with the machines all local and
still have no solution to the disaster recovery scenario.

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Lee
2004-09-17 18:15:25 UTC
Permalink
My vote would be for active/active, it's usually more reliable and of
course it builds in better scalability. I imagine the main
question for everyone will be how the choice of active/active or
active/passive will affect cost/time of implementation.

L
Question: Are people looking at this as both redundancy and
performance, or just redundancy?
for performance we already have murder, what we currently lack is
redundancy. once we have redundancy then the next enhancement is
going to be to teach murder about it so that it can failover to the
backup box(es) as needed, but for now simply having the full data at
the backup location would be so far ahead of where we are now that
the need to reconfigure murder for a failover is relatively trivial
by comparison.
Actually what I was really asking, is are people looking for an
active-active config and an active-passive config?
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Ken Murchison
2004-09-17 17:16:11 UTC
Permalink
Question: Are people looking at this as both redundancy and
performance, or just redundancy?
for performance we already have murder, what we currently lack is
redundancy. once we have redundancy then the next enhancement is going
to be to teach murder about it so that it can failover to the backup
box(es) as needed, but for now simply having the full data at the backup
location would be so far ahead of where we are now that the need to
reconfigure murder for a failover is relatively trivial by comparison.
Actually what I was really asking, is are people looking for an
active-active config and an active-passive config?
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
David Lang
2004-09-16 23:34:39 UTC
Permalink
Question: Are people looking at this as both redundancy and performance, or
just redundancy?
for performance we already have murder, what we currently lack is
redundancy. once we have redundancy then the next enhancement is going to
be to teach murder about it so that it can failover to the backup box(es)
as needed, but for now simply having the full data at the backup location
would be so far ahead of where we are now that the need to reconfigure
murder for a failover is relatively trivial by comparison.

David Lang
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Ken Murchison
2004-09-16 22:56:58 UTC
Permalink
I imagine for a big project like this, refunds could be given. I think
it's more a matter of finding someone to deal with this. I'd be happy to
do it, but I think it would be best if Ken or another core developer
that everyone knows and already trusts is in charge of holding the cash.
Any Ideas Ken?
I wouldn't expect anyone to give money until someone (me?) decides to
move forward with the project. As of right now it looks like we have
about $1k pledged in small increments, so it definitely looks like there
is interest. If some of the "heavy hitters" come in, this project might
take off sooner rather than later.
I would bet that if a "Fund Cyrus Replication" link were made
prominently on the cyrus homepage, 3-5k could be raised in less than a
month.
I can't control this. That would be up to Derrick or someone else at
CMU. I can do whatever I want with the HTTP server at my location, if
somebody can point me at some HTML/PHP/Java/Perl/etc code which I can
put in place quickly.
P.S. Ken, not sure if this would be easier or more complex, but another
alternative here might be to write a mysql backend to cyrus, which would
eliminate the need to worry about redundancy given mysql's multimaster
functionality (this might also provide better searching/sort/access and
enormous scalability to the cyrus backends).
I think this would cause performance to suffer greatly. I think what we
want is "lazy" replication, where the client gets instant results from
the machine it's connected to, and the replication is done in the
background. I believe this is what David's implementation does.

Question: Are people looking at this as both redundancy and
performance, or just redundancy?
Hello All,
I would be willing to pay for this function. Though I am just a startup, and
have very little capital. Most I could prolly do is $100 to $200. Not much.
My fear, which may be the fear of others, is the risk of putting money in, but
there not being enough support by others to reach the cash goal. Thus the
project never is done. What happens in that case?
Thanks,
http://www.horde.org/bounties/
Basically people can make paypal donations to fund certain features.
For something like the high availability support, I'm guessing that a lot
of people would donate small to large amounts of cash to see this
functionality implemented (I certainly would).
What do you all think?
L
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Lee
2004-09-17 06:25:28 UTC
Permalink
mysql does not have multi-master functionality, and its replication
is, quite honestly, a joke. You may have mis-spoken and are talking
about the up-and-coming mysql cluster or the mysql max product (both
of which i'm much less familiar with).
Indeed I was talking about mysql cluster (which is now included with
the distro). I'm pretty convinced, having talked with some mysql peeps,
that cluster will eventually (not too distant future) be VERY bullet
proof. I just figured that writing cyrus to use mysql (or SQL SPEC) as
a backend might kill two birds with one stone, and create a better
general platform for growth. Nonetheless, I'd love to see just
replication if a mysql backend is out.

L
<cut>
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Michael Loftis
2004-09-16 22:39:01 UTC
Permalink
--On Thursday, September 16, 2004 18:13 -0400 Lee <***@brown.edu>
wrote:

<cut>
P.S. Ken, not sure if this would be easier or more complex, but another
alternative here might be to write a mysql backend to cyrus, which would
eliminate the need to worry about redundancy given mysql's multimaster
functionality (this might also provide better searching/sort/access and
enormous scalability to the cyrus backends).
mysql does not have multi-master functionality, and its replication is,
quite honestly, a joke. You may have mis-spoken and are talking about the
up-and-coming mysql cluster or the mysql max product (both of which i'm
much less familiar with).

<cut>
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Lee
2004-09-16 22:13:03 UTC
Permalink
I imagine for a big project like this, refunds could be given. I think
it's more a matter of finding someone to deal with this. I'd be happy to
do it, but I think it would be best if Ken or another core developer
that everyone knows and already trusts is in charge of holding the
cash. Any Ideas Ken?

I would bet that if a "Fund Cyrus Replication" link were made
prominently on the cyrus homepage, 3-5k could be raised in less than a
month.

L

P.S. Ken, not sure if this would be easier or more complex, but another
alternative here might be to write a mysql backend to cyrus, which
would eliminate the need to worry about redundancy given mysql's
multimaster functionality (this might also provide better
searching/sort/access and enormous scalability to the cyrus backends).
Hello All,
I would be willing to pay for this function. Though I am just a startup, and
have very little capital. Most I could prolly do is $100 to $200. Not much.
My fear, which may be the fear of others, is the risk of putting money in, but
there not being enough support by others to reach the cash goal. Thus the
project never is done. What happens in that case?
Thanks,
http://www.horde.org/bounties/
Basically people can make paypal donations to fund certain features.
For something like the high availability support, I'm guessing that a lot
of people would donate small to large amounts of cash to see this
functionality implemented (I certainly would).
What do you all think?
L
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
S***@Jentoo.com
2004-09-16 20:58:59 UTC
Permalink
Hello All,

I would be willing to pay for this function. Though I am just a startup, and
have very little capital. Most I could prolly do is $100 to $200. Not much.
My fear, which may be the fear of others, is the risk of putting money in, but
there not being enough support by others to reach the cash goal. Thus the
project never is done. What happens in that case?

Thanks,
http://www.horde.org/bounties/
Basically people can make paypal donations to fund certain features.
For something like the high availability support, I'm guessing that a lot
of people would donate small to large amounts of cash to see this
functionality implemented (I certainly would).
What do you all think?
L
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Ken Murchison
2004-09-16 16:13:21 UTC
Permalink
http://www.horde.org/bounties/
Basically people can make paypal donations to fund certain features. For
something like the high availability support, I'm guessing that a lot of
people would donate small to large amounts of cash to see this
functionality implemented (I certainly would).
What do you all think?
Works for me.
Hi,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob
and Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with
the amount of users that repeatedly
This link appears dead. All I get is "To clipboard".
Oops. There was never supposed to be a link :-)
interest in this feature and with the money we are willing to put in
:-|
I'm willing to work on it if there is money available. You are the
only one that has said that you would commit money. Where are the
rest of the folks? Based on the number of people that stepped up to
pay for virtdomains support (zero), I'm guessing there are fewer out
there willing to spend money than you think. But I could be wrong.
I'm happy to see that there are indeed others interested in this ;-)
Other than the altnamespace project ($5000) that I did for a
(unnamed) company in Texas, Jeremy Howard at Fastmail is the only one
who has consistently paid for features. I'll let him disclose what
he has spent, if he chooses to, but it's safe to say that it's been
more than just pizza and beer.
I expected more than pizza and beer, so that's no surprise :-)
I'd have to look at David's patch again and discuss things with CMU
to get a good time estimate, but I'm guessing that a project like
this would cost a few thousand dollars.
Ok, I'll start a discussion with our management based on your latest
estimation ($3000-$5000) and I'll let you know about the results.
(Might take a while, I think at least not this week. If you have more
details (for instance time estimation) let me know.)
Bye,
Paul
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Lee
2004-09-16 15:00:23 UTC
Permalink
What do people think about a bounty program like horde's:

http://www.horde.org/bounties/

Basically people can make paypal donations to fund certain features.
For something like the high availability support, I'm guessing that a lot
of people would donate small to large amounts of cash to see this
functionality implemented (I certainly would).

What do you all think?

L
Hi,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob
and Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with
the amount of users that repeatedly
This link appears dead. All I get is "To clipboard".
Oops. There was never supposed to be a link :-)
interest in this feature and with the money we are willing to put in
:-|
I'm willing to work on it if there is money available. You are the
only one that has said that you would commit money. Where are the
rest of the folks? Based on the number of people that stepped up to
pay for virtdomains support (zero), I'm guessing there are fewer out
there willing to spend money than you think. But I could be wrong.
I'm happy to see that there are indeed others interested in this ;-)
Other than the altnamespace project ($5000) that I did for a
(unnamed) company in Texas, Jeremy Howard at Fastmail is the only one
who has consistently paid for features. I'll let him disclose what
he has spent, if he chooses to, but it's safe to say that it's been
more than just pizza and beer.
I expected more than pizza and beer, so that's no surprise :-)
I'd have to look at David's patch again and discuss things with CMU
to get a good time estimate, but I'm guessing that a project like
this would cost a few thousand dollars.
Ok, I'll start a discussion with our management based on your latest
estimation ($3000-$5000) and I'll let you know about the results.
(Might take a while, I think at least not this week. If you have more
details (for instance time estimation) let me know.)
Bye,
Paul
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Simon Matter
2004-09-15 16:51:32 UTC
Permalink
On the other hand, if there is a application level redundancy on its
way, it doesn't really matter on what platform the machine runs, so
it would still make me happier and even with FreeBSD. And I would
rather put my money there. Even if it means we'll have to wait for
some months,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob and
Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with the
amount of users that repeatedly
<http://www.interglot.com/toclipboard.php?b=1&d=2&t=herhaaldelijk&s=herhaaldelijk&w=repeatedly>showed
This link appears dead. All I get is "To clipboard".
interest in this feature and with the money we are willing to put in :-|
I'm willing to work on it if there is money available. You are the only
one that has said that you would commit money. Where are the rest of
the folks? Based on the number of people that stepped up to pay for
virtdomains support (zero), I'm guessing there are fewer out there
willing to spend money than you think. But I could be wrong.
Other than the altnamespace project ($5000) that I did for a (unnamed)
company in Texas, Jeremy Howard at Fastmail is the only one who has
consistently paid for features. I'll let him disclose what he has
spent, if he chooses to, but it's safe to say that it's been more than
just pizza and beer.
I'd have to look at David's patch again and discuss things with CMU to
get a good time estimate, but I'm guessing that a project like this
would cost a few thousand dollars.
We are very interested in replicated shared folders. We have different
cyrus-imapd servers in different countries and would like to have common
shared folders. If this could also be implemented I'm sure we would be
able to help sponsor it.
There are also a number of commercial vendors of cyrus-imapd based
solutions who should be very interested in application level replication.

Simon


---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Gary Mills
2004-09-15 19:59:11 UTC
Permalink
how much are you asking for?
Since this is probably as complex, if not more, as altnamespace, I'd say
somewhere between $3000-$5000 as an initial estimate. That's 30-50
hours at a fairly cheap rate.
If people want to start pledging their support, perhaps enough
"incentive" can be pooled. If people don't feel comfortable doing this
in public, then feel free to send me a private email.
I'm certainly interested in adding some redundancy to our Cyrus
installation. I'm about to upgrade the hardware to a single Sun
V480 with 4 1200 MHz CPUs and 16 gigs of memory. The two internal
disks will be mirrored, and contain only the OS files. Everything
else will be on external RAID arrays. The next expansion should
add more IMAP storage and provide redundancy in the case of software
or equipment failure. I'm aware of Murder, but I'm not sure that
it's the best solution for us.

I don't control the funding, but I can recommend something.
--
-Gary Mills- -Unix Support- -U of M Academic Computing and Networking-
---
Cyrus Home Page: http://asg.web.cmu.edu/cyrus
Cyrus Wiki/FAQ: http://cyruswiki.andrew.cmu.edu
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html
Ken Murchison
2004-09-15 18:07:08 UTC
Permalink
how much are you asking for?
Since this is probably as complex, if not more, as altnamespace, I'd say
somewhere between $3000-$5000 as an initial estimate. That's 30-50
hours at a fairly cheap rate.

If people want to start pledging their support, perhaps enough
"incentive" can be pooled. If people don't feel comfortable doing this
in public, then feel free to send me a private email.
Date: Wed, 15 Sep 2004 11:44:45 -0400
Subject: Re: Cyrus crashed on redundant platform - need better
availability?
On the other hand, if there is a application level redundancy on
its way, it doesn't really matter on what platform the machine
runs, so it would still make me happier and even with FreeBSD. And
I would rather put my money there. Even if it means we'll have to
wait for some months,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob
and Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with
the amount of users that repeatedly
<http://www.interglot.com/toclipboard.php?b=1&d=2&t=herhaaldelijk&s=herhaaldelijk&w=repeatedly>showed
This link appears dead. All I get is "To clipboard".
interest in this feature and with the money we are willing to put in :-|
I'm willing to work on it if there is money available. You are the
only one who has said that you would commit money. Where are the
rest of the folks? Based on the number of people that stepped up to
pay for virtdomains support (zero), I'm guessing there are fewer out
there willing to spend money than you think. But I could be wrong.
Other than the altnamespace project ($5000) that I did for an (unnamed)
company in Texas, Jeremy Howard at Fastmail is the only one who has
consistently paid for features. I'll let him disclose what he has
spent, if he chooses to, but it's safe to say that it's been more than
just pizza and beer.
I'd have to look at David's patch again and discuss things with CMU to
get a good time estimate, but I'm guessing that a project like this
would cost a few thousand dollars.
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
David Lang
2004-09-15 16:09:46 UTC
Permalink
how much are you asking for?

David Lang
Date: Wed, 15 Sep 2004 11:44:45 -0400
Subject: Re: Cyrus crashed on redundant platform - need better availability?
On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so that
would still make me happier, even with FreeBSD. And I would rather
put my money there. Even if it means we'll have to wait for some
months,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob and
Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my previous
message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with the
number of users that repeatedly
<http://www.interglot.com/toclipboard.php?b=1&d=2&t=herhaaldelijk&s=herhaaldelijk&w=repeatedly>showed
This link appears dead. All I get is "To clipboard".
interest in this feature and with the money we are willing to put in :-|
I'm willing to work on it if there is money available. You are the only one
who has said that you would commit money. Where are the rest of the folks?
Based on the number of people that stepped up to pay for virtdomains support
(zero), I'm guessing there are fewer out there willing to spend money than
you think. But I could be wrong.
Other than the altnamespace project ($5000) that I did for an (unnamed)
company in Texas, Jeremy Howard at Fastmail is the only one who has
consistently paid for features. I'll let him disclose what he has spent, if
he chooses to, but it's safe to say that it's been more than just pizza and
beer.
I'd have to look at David's patch again and discuss things with CMU to get a
good time estimate, but I'm guessing that a project like this would cost a
few thousand dollars.
--
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
-- C.A.R. Hoare
Ken Murchison
2004-09-15 15:44:45 UTC
Permalink
On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so
that would still make me happier, even with FreeBSD. And I would
rather put my money there. Even if it means we'll have to wait for
some months,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob and
Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with the
number of users that repeatedly
<http://www.interglot.com/toclipboard.php?b=1&d=2&t=herhaaldelijk&s=herhaaldelijk&w=repeatedly>showed
This link appears dead. All I get is "To clipboard".
interest in this feature and with the money we are willing to put in :-|
I'm willing to work on it if there is money available. You are the only
one who has said that you would commit money. Where are the rest of
the folks? Based on the number of people that stepped up to pay for
virtdomains support (zero), I'm guessing there are fewer out there
willing to spend money than you think. But I could be wrong.

Other than the altnamespace project ($5000) that I did for an (unnamed)
company in Texas, Jeremy Howard at Fastmail is the only one who has
consistently paid for features. I'll let him disclose what he has
spent, if he chooses to, but it's safe to say that it's been more than
just pizza and beer.

I'd have to look at David's patch again and discuss things with CMU to
get a good time estimate, but I'm guessing that a project like this
would cost a few thousand dollars.
--
Kenneth Murchison Oceana Matrix Ltd.
Software Engineer 21 Princeton Place
716-662-8973 x26 Orchard Park, NY 14127
--PGP Public Key-- http://www.oceana.com/~ken/ksm.pgp
Paul Dekkers
2004-09-15 15:00:25 UTC
Permalink
On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so
that would still make me happier, even with FreeBSD. And I would
rather put my money there. Even if it means we'll have to wait for
some months,
I wouldn't hold out hope of anything being available in "some months".
I wrote my replication code two years ago, and submitted it to Rob and
Ken about this time last year. Neither I nor they have put any
significant work into the code since then. As I indicated in my
previous message, we all have other priorities right now.
I can imagine, but I hoped that priorities would change a bit with the
number of users that repeatedly
<http://www.interglot.com/toclipboard.php?b=1&d=2&t=herhaaldelijk&s=herhaaldelijk&w=repeatedly>showed
interest in this feature and with the money we are willing to put in :-|

Paul


David Carter
2004-09-15 13:14:09 UTC
Permalink
On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so that
would still make me happier, even with FreeBSD. And I would rather
put my money there. Even if it means we'll have to wait for some months,
I wouldn't hold out hope of anything being available in "some months".

I wrote my replication code two years ago, and submitted it to Rob and Ken
about this time last year. Neither I nor they have put any significant work
into the code since then. As I indicated in my previous message, we all
have other priorities right now.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
Paul Dekkers
2004-09-15 11:38:43 UTC
Permalink
Hi,
You are not using a clustered filesystem,
right?
No.
I can imagine that would be one of the advantages of RH's clustering,
since you don't have to mount a filesystem in that case for a machine
that just crashed - it would save time...
But I suppose RH's cluster manager takes care of mounting the partitions
and checking them if there are any errors.
It's good but not perfect. We recently installed a huge SAN and are
now in the process of moving over the mail data to reside there.
Fibrechannel seems to be much more error tolerant than SCSI.
Were you working with a "multi-initiator environment" (as RH calls it)
or "single initiator" (e.g. with 2 machines on exactly the same SCSI
bus, or two separate interfaces on your array's SCSI controller)?
I think with a multi-initiator environment (as we have it) there is a
very limited chance of failures.
Hmm, I don't expect the problems to be SCSI-related. Maybe it has to
do...
That's not what I was talking about. We have a similar setup, yet
still there were instances when Red Hat's cluster software failed to
write to the shared storage. I guess this was caused by the slow-downs
connected to the memory management, but Red Hat support indicated that
shared storage connected via FibreChannel would not have been as
susceptible to these problems.
Do you think using RH's cluster software is a valuable consideration for
this kind of clustering setup? Using FreeBSD there are not that many
clustering solutions for now, and if it's advisable to at least consider
using RH here (although I have no experience with RH) we can certainly
look at it. (Any idea how fast RH would "recover services"?)

On the other hand, if application-level redundancy is on its
way, it doesn't really matter what platform the machine runs on, so that
would still make me happier, even with FreeBSD. And I would rather
put my money there. Even if it means we'll have to wait for some months,
we would do that and take the risk of running in a "less
automatic-failover-situation" with a worst-case downtime of 30 mins (or
2 mins regularly with sync-mounted filesystems now).
The kernel that shipped with RedHat AS 2.1 was useless for most of the
tasks I tried it with. About three revisions later it became somewhat
more useful for non-Oracle types of use, but I've rolled my own and am
not following the state of it now.
That's fine if you don't have to rely on commercial support. Our
management decided to go the supported path all the way. That doesn't
leave you many options. I have to say that when it works, the cluster
software works extremely well. It's just that it hasn't always worked
in the past ... ;-)
That's a plus for RH (ES|AS) 3

Paul


Sebastian Hagedorn
2004-09-15 09:18:40 UTC
Permalink
Hi,

--On Friday, September 10, 2004 16:27 +0200 Paul Dekkers
Right, works fine for us for the most part. Hasn't always been like
that, but the most recent kernel updates by Red Hat have improved
matters a lot.
What did the kernel improve?
Memory management for the most part. With 8 GB of RAM and lots of it free
there were previously situations where either the cache grew too large,
causing the machine to become extremely slow, or where forks failed (even
though there was oodles of free RAM). Both seem to have been resolved in
2.4.9-e.49enterprise.
You are not using a clustered filesystem,
right?
No.
Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true.
It's good but not perfect. We recently installed a huge SAN and are
now in the process of moving over the mail data to reside there.
Fibrechannel seems to be much more error tolerant than SCSI.
Hmm, I don't expect the problems to be SCSI-related. Maybe it has to do
with GEOM and SMP in FreeBSD 5.2.1, but not the SCSI-bus itself. (There
are two separate controllers for both machines, they never see each other
on the same SCSI bus...)
That's not what I was talking about. We have a similar setup, yet still
there were instances when Red Hat's cluster software failed to write to the
shared storage. I guess this was caused by the slow-downs connected to the
memory management, but Red Hat support indicated that shared storage
connected via FibreChannel would not have been as susceptible to these
problems.

--On Friday, September 10, 2004 21:36 +0200 "Jure Pečar"
The kernel that shipped with RedHat AS 2.1 was useless for most of the
tasks I tried it with. About three revisions later it became somewhat
more useful for non-Oracle types of use, but I've rolled my own and am
not following the state of it now.
That's fine if you don't have to rely on commercial support. Our management
decided to go the supported path all the way. That doesn't leave you many
options. I have to say that when it works, the cluster software works
extremely well. It's just that it hasn't always worked in the past ... ;-)
I haven't had problems with the fiber itself, I've only had lots of fun
with the firmware on the disks themselves and some with the QLogic
drivers.
We've had our share of problems with those as well, but I hear that Red Hat
AS 3.0 ships with QLogic drivers that work out of the box.

Cheers, Sebastian Hagedorn
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
David Carter
2004-09-14 15:22:46 UTC
Permalink
http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html
- the code isn't available on that webpage
No, but the code is available to people who want to play with it on the
understanding that they get no sympathy from me if they try and run it on
a production server right now.
- it changes the mailstore layout, so you cut yourself off if you use that
instead of the mainstream version
The incompatible change is actually just a single 96 bit value per message
in the cyrus index file (a message UUID value, used to replicate the
single instance store). If a future UUID format was agreed and space was
reserved in current index files, the incompatibility would disappear. That
might be a path to more widespread testing.
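To illustrate the "reserve the space now" idea, here is a minimal sketch in Python. The record layout is purely hypothetical (a UID, a size and a 12-byte slot) and is not the real cyrus.index format; it only shows that a reserved 96-bit field keeps record size and offsets identical whether or not a UUID has been written.

```python
import struct

# Hypothetical fixed-width index record, NOT the real cyrus.index layout:
# 32-bit UID, 32-bit message size, 96-bit (12-byte) UUID slot.
RECORD = struct.Struct(">II12s")
EMPTY_UUID = b"\x00" * 12


def pack_record(uid, size, uuid=EMPTY_UUID):
    """Pack one record; an all-zero slot means 'no UUID assigned yet'."""
    if len(uuid) != 12:
        raise ValueError("UUID must be exactly 96 bits (12 bytes)")
    return RECORD.pack(uid, size, uuid)


def unpack_record(raw):
    """Unpack one record, returning None for an unused UUID slot."""
    uid, size, uuid = RECORD.unpack(raw)
    return uid, size, (uuid if any(uuid) else None)


# A non-replicating server writes zeros; a replicating one fills the slot.
plain = pack_record(1, 2048)
replicated = pack_record(1, 2048, bytes(range(12)))
print(len(plain) == len(replicated))  # same record size either way
```

Because both kinds of record have the same length, a server that ignores the slot and one that uses it could read each other's files, which is the compatibility argument made above.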
- I guess it is for an older Cyrus, so you cannot easily upgrade
I passed a patch relative to 2.3 CVS on to Rob a few months back. The
replication code is largely orthogonal to the existing code: it only took
me a couple of hours to generate the patch from my existing 2.1.16 code.
I cannot say anything about its architectural problems, if there are any
at all.
I consider the code to be a prototype of the "obvious" way to do
application level replication in Cyrus. It works fine for us, but would
clearly require a careful audit before going into more widespread use.

Support for a number of things is missing simply because we have no need
for them right now: seen state handling for shared mail folders, quota
roots other than user.<whatever>, and in 2.2+ mailbox annotation and
virtual domains spring to mind. I don't think that any of these things
would be particularly hard to do, it's just a Small Matter of Programming.

I would estimate that I've put in around 3 to 4 months' work on the
current code and that we would be talking about (at least!) several more
man-months' work between myself and Cyrus developers to get something
properly merged. That's a fairly substantial undertaking for all involved,
particularly given that we all have other priorities.
--
David Carter Email: ***@ucs.cam.ac.uk
University Computing Service, Phone: (01223) 334502
New Museums Site, Pembroke Street, Fax: (01223) 334679
Cambridge UK. CB2 3QH.
Attila Nagy
2004-09-14 13:03:24 UTC
Permalink
I wasn't following this entire thread, but if I'm not mistaken David
Carter from the University of Cambridge already implemented what you're
looking for:
http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html
There are some problems with that:
- the code isn't available on that webpage
- it changes the mailstore layout, so you cut yourself off if you use
that instead of the mainstream version
- I guess it is for an older Cyrus, so you cannot easily upgrade

I cannot say anything about its architectural problems, if there are any
at all.
Maybe if you collect some money you could send it to him :)
I don't want to collect money from others and pass it on myself; I just
floated the idea to see whether the developers and the users are
inclined to cooperate in this way.

High availability is a recurring topic on this list, so I guess the
users would be happy to have it in the standard cyrus distribution.
--
Attila Nagy e-mail: ***@fsn.hu
Free Software Network (FSN.HU) phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download cell.: +3630 306 6758
Paul Dekkers
2004-09-14 13:23:30 UTC
Permalink
Hi,
So many users cried for this feature (to provide not just horizontal
scalability with murder, but to have redundant backends which can hold
each others replicas too) that I wonder: if it's so important to us, the
cyrus users, why don't we collect some money and pass it to the developers?
I wasn't following this entire thread, but if I'm not mistaken David
Carter from the University of Cambridge already implemented what you're
looking for:
http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html
Maybe if you collect some money you could send it to him :)
I think there is still some work that needs to be done by the developers
to integrate the code in the cyrus distribution. I have more confidence
in something that is in the cyrus distribution than in a separate patch.
I haven't looked closely enough yet, but it would be good if the cyrus
developers also reviewed this solution and made sure that everything has
been taken into account, wouldn't it?

If it means that in order to get this done in the near future we need to
pay a bit, I think that's certainly a good option.

Paul


David G Mcmurtrie
2004-09-14 11:14:44 UTC
Permalink
So many users cried for this feature (to provide not just horizontal
scalability with murder, but to have redundant backends which can hold
each others replicas too) that I wonder: if it's so important to us, the
cyrus users, why don't we collect some money and pass it to the developers?
I wasn't following this entire thread, but if I'm not mistaken David
Carter from the University of Cambridge already implemented what you're
looking for:

http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/replication.html

Maybe if you collect some money you could send it to him :)

Thanks,

Dave

PGP/GPG Key: http://www.pitt.edu/~dgm/gpgkey.asc.txt
Attila Nagy
2004-09-14 07:57:03 UTC
Permalink
I still think that it would be best to have two filesystems instead of
one, so with mirroring on application level (cyrus)... :-)
I'd rather see murder store a message on two separate machines ... Actually
to have duplicated mailboxes in sync over a pool of backend machines, with
murder taking care of backlogs when one of them would go down.
So many users cried for this feature (to provide not just horizontal
scalability with murder, but to have redundant backends which can hold
each others replicas too) that I wonder: if it's so important to us, the
cyrus users, why don't we collect some money and pass it to the developers?

Maybe it could help to make the implementation real, and the developers
have already demonstrated that they can design and code such things.
--
Attila Nagy e-mail: ***@fsn.hu
Free Software Network (FSN.HU) phone @work: +361 371 3536
ISOs: http://www.fsn.hu/?f=download cell.: +3630 306 6758
Jure Pečar
2004-09-10 19:36:01 UTC
Permalink
On Fri, 10 Sep 2004 16:27:40 +0200
Right, works fine for us for the most part. Hasn't always been like
that, but the most recent kernel updates by Red Hat have improved
matters a lot.
What did the kernel improve? You are not using a clustered filesystem,
right?
The kernel that shipped with RedHat AS 2.1 was useless for most of the tasks
I tried it with. About three revisions later it became somewhat more useful
for non-Oracle types of use, but I've rolled my own and am not following the
state of it now.
It's good but not perfect. We recently installed a huge SAN and are
now in the process of moving over the mail data to reside there.
Fibrechannel seems to be much more error tolerant than SCSI.
I haven't had problems with the fiber itself, I've only had lots of fun with
the firmware on the disks themselves and some with the QLogic drivers.
I still think that it would be best to have two filesystems instead of
one, so with mirroring on application level (cyrus)... :-)
I'd rather see murder store a message on two separate machines ... Actually
to have duplicated mailboxes in sync over a pool of backend machines, with
murder taking care of backlogs when one of them would go down.
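
A rough sketch of that idea in Python, to make the backlog behaviour concrete. Everything here is hypothetical: the `Backend` and `ReplicatingFrontend` classes are invented for illustration and are not part of Cyrus or murder, and a real implementation would need a persistent queue rather than an in-memory one.

```python
from collections import deque


class Backend:
    """Stand-in for one mail store (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.mailboxes = {}  # mailbox name -> list of messages

    def store(self, mailbox, message):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.mailboxes.setdefault(mailbox, []).append(message)


class ReplicatingFrontend:
    """Delivers every message to all backends, queueing for those that are down."""

    def __init__(self, backends):
        self.backends = backends
        self.backlog = {b.name: deque() for b in backends}

    def deliver(self, mailbox, message):
        if not any(b.up for b in self.backends):
            raise RuntimeError("no backend available, refusing delivery")
        for backend in self.backends:
            # Always go through the queue so per-backend ordering is preserved.
            self.backlog[backend.name].append((mailbox, message))
            self.drain(backend)

    def drain(self, backend):
        """Replay queued messages in order; stop at the first failure."""
        queue = self.backlog[backend.name]
        while queue:
            mailbox, message = queue[0]
            try:
                backend.store(mailbox, message)
            except ConnectionError:
                return
            queue.popleft()
```

When a backend comes back, calling `drain()` on it replays whatever accumulated while it was away, so both copies of the mailbox converge again.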
--
Jure Pečar
Michael Loftis
2004-09-10 19:22:45 UTC
Permalink
--On Friday, September 10, 2004 16:27 +0200 Paul Dekkers
What did the kernel improve? You are not using a clustered filesystem,
right?
RH kernels tend to come up with bugs that no one else sees, FYI (this is why
at my employer we're switching to Debian...)
Well, it's UFS2 with softupdates, so yes. I'm afraid the journal was
damaged in my case, there were several complaints while doing the fsck
about softupdate inconsistencies. (The server crashed once more but since
I mounted with -o sync now the fsck was much faster. I'll keep it that
way for now until we know what's really wrong - it was again with a
large mail-folder synchronisation...)
FWIW I can't call soft updates a journal. 9/10 times when I have had a
crash, the soft updates journal either was corrupt, inconsistent, or made
things worse. When running with soft updates many times I'd lose many days
worth of mail on a restart.
Hmm, I don't expect the problems to be SCSI-related. Maybe it has to do
with GEOM and SMP in FreeBSD 5.2.1, but not the SCSI-bus itself. (There
are two separate controllers for both machines, they never see each other
on the same SCSI bus...)
Probably not, more likely something funkish in FBSD 5.2.1
I still think that it would be best to have two filesystems instead of
one, so with mirroring on application level (cyrus)... :-)
I tend to agree....


Paul Dekkers
2004-09-10 14:27:40 UTC
Permalink
Hi,
There are two machines for redundancy. If one fails, the other one should
take over: mount the disks from the array, and move on.
Right, works fine for us for the most part. Hasn't always been like
that, but the most recent kernel updates by Red Hat have improved
matters a lot.
What did the kernel improve? You are not using a clustered filesystem,
right?
Unfortunately, the primary server crashed twice already. The first time it
did while synchronising two IMAP-spools from the old server to the new
one. There was not much data on it back then. The second time was worse,
around 10 GB of mail was stored on the disks. We discovered that the fsck
took about 30 minutes,
Isn't your filesystem journaled? We use ext3 for ours. There *have*
been a few occasions where the journal had been damaged as well
(forcing us to run fsck), but those have been few and far between. In
all other instances the failover is nearly instantaneous.
Well, it's UFS2 with softupdates, so yes. I'm afraid the journal was
damaged in my case, there were several complaints while doing the fsck
about softupdate inconsistencies. (The server crashed once more but
since I mounted with -o sync now the fsck was much faster. I'll keep it
that way for now until we know what's really wrong - it was again with
a large mail-folder synchronisation...)
Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true.
It's good but not perfect. We recently installed a huge SAN and are
now in the process of moving over the mail data to reside there.
Fibrechannel seems to be much more error tolerant than SCSI.
Hmm, I don't expect the problems to be SCSI-related. Maybe it has to do
with GEOM and SMP in FreeBSD 5.2.1, but not the SCSI-bus itself. (There
are two separate controllers for both machines, they never see each
other on the same SCSI bus...)

I still think that it would be best to have two filesystems instead of
one, so with mirroring on application level (cyrus)... :-)

Paul


Sebastian Hagedorn
2004-09-10 13:33:42 UTC
Permalink
Hi,

--On Friday, September 10, 2004 13:24 +0200 Paul Dekkers
We're implementing a new mail platform running on two Dell 2650 servers (two
Xeon CPUs at 3 GHz each, with HTT and 3 GB of memory) with a 4 TB disk array
connected via an Adaptec 39160 SCSI controller for storage. We
installed FreeBSD 5.2.1 on it, and - of course - Cyrus 2.2.8 (from the
ports) as IMAP server. Our MTA is Postfix.
That's similar to our setup, but we are currently running Red Hat Advanced
Server 2.1, Cyrus 2.1.16 and sendmail.
There are two machines for redundancy. If one fails, the other one should
take over: mount the disks from the array, and move on.
Right, works fine for us for the most part. Hasn't always been like that,
but the most recent kernel updates by Red Hat have improved matters a lot.
Unfortunately, the primary server crashed twice already. The first time it
did while synchronising two IMAP-spools from the old server to the new
one. There was not much data on it back then. The second time was worse,
around 10 GB of mail was stored on the disks. We discovered that the fsck
took about 30 minutes,
Isn't your filesystem journaled? We use ext3 for ours. There *have* been a
few occasions where the journal had been damaged as well (forcing us to run
fsck), but those have been few and far between. In all other instances the
failover is nearly instantaneous.
Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true.
It's good but not perfect. We recently installed a huge SAN and are now in
the process of moving over the mail data to reside there. Fibrechannel
seems to be much more error tolerant than SCSI.

Cheers, Sebastian Hagedorn
--
Sebastian Hagedorn M.A. - RZKR-R1 (Gebäude 52), Zimmer 18
Zentrum für angewandte Informatik - Universitätsweiter Service RRZK
Universität zu Köln / Cologne University - Tel. +49-221-478-5587
Jure Pečar
2004-09-10 19:45:21 UTC
Permalink
On Fri, 10 Sep 2004 16:32:33 +0200
Hmm, then your fscks will run faster/with less problems, but there is
still outage that you can prevent if there is failover in another way
and availability/replication on the application level.
If there are replicated spools it doesn't matter if the fsck takes long
or not... although there will be a backlog of course.
Yes, but right now there are no replicated spools on the app level so I'm
doing the best I can as a sysadmin :)
Is it possible to have an fsck running on one partition and have cyrus
started already (so part of the mail-store, e.g. archives, is not
available yet?)
Not that I know ... I guess cyrus would be spewing lots of I/O errors back
at you for the mailboxes that are on that fscking partition ;)
The only high availability I see here is the google way. Cyrus is
offering you that with the 'murder' component.
That's not really availability, but distributed risk.
Exactly ... with murder taking care of keeping duplicated mailboxes in sync
over a pool of backend machines (as I mentioned in the other mail), this
would be perfect for all of us, I guess.
BTW, you're mentioning FreeBSD ... doesn't it have some sort of
background fsck while the filesystem is mounted rw?
It can, but I'm not sure if that's what I prefer. I'm not sure how
mature it is with FreeBSD, and I prefer to have mail integrity over a
"quick restore".
I can't speak about maturity of a certain FreeBSD component as I'm a Linux
guy, but from what I hear it should just work.
--
Jure Pečar
Paul Dekkers
2004-09-10 14:32:33 UTC
Permalink
Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true. It still takes 30 mins before everything is back again! It seems
to me that if there was a "live" version of cyrus available with a
synchronised mail-spool, that there was no outage noticeable for users
(except in losing a connection maybe). Am I right?
Having 2 boxes with one disk array leaves you with a single point of failure
that you wouldn't think of immediately: the filesystem. I learned that the hard
way.
Yes, I agree.
I'm planning to 'redesign' our storage: instead of one big volume that fscks
for hours, I'm going to split it into many mirrors and use them as cyrus
partitions. This way they could all fsck in parallel. I'm going to lose the
'single instance store' capability, but that's a tradeoff that I'm willing to
make.
Hmm, then your fscks will run faster/with less problems, but there is
still outage that you can prevent if there is failover in another way
and availability/replication on the application level.
If there are replicated spools it doesn't matter if the fsck takes long
or not... although there will be a backlog of course.

Is it possible to have an fsck running on one partition and have cyrus
started already (so part of the mail-store, e.g. archives, is not
available yet?)
It happened to me at least once that the machine that crashed corrupted the
filesystem in a way that the machine that took over also crashed within
hours...
Maybe it's time to continue on the "High availability ...
again"-discussion we had a while ago. If the cyrus developers are able
to implement this with some funding there are still some questions left
for me: how much time would it take before a "stable" solution is ready?
How much funding is expected? I still have to talk to management about
this, but I would really support this development and I'm certainly
willing to convince some managers.
The only high availability I see here is the Google way. Cyrus is offering
you that with the 'murder' component.
That's not really availability, but distributed risk.
BTW, you're mentioning FreeBSD ... doesn't it have some sort of background
fsck while the filesystem is mounted rw?
It can, but I'm not sure if that's what I prefer. I'm not sure how
mature it is with FreeBSD, and I prefer to have mail integrity over a
"quick restore".

Paul



Jure Pečar
2004-09-10 13:17:38 UTC
Permalink
On Fri, 10 Sep 2004 13:24:42 +0200
Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true. It still takes 30 mins before everything is back again! It seems
to me that if there were a "live" version of cyrus available with a
synchronised mail-spool, there would be no noticeable outage for users
(except maybe a lost connection). Am I right?
Having 2 boxes with one disk array leaves you with a single point of failure
that you wouldn't think of immediately: the filesystem. I learned that the hard
way.
I'm planning to 'redesign' our storage: instead of one big volume that fscks
for hours, I'm going to split it into many mirrors and use them as cyrus
partitions. This way they could all fsck in parallel. I'm going to lose the
'single instance store' capability, but that's a tradeoff that I'm willing to
make.
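
The plan above to split one big volume into several partitions that fsck in parallel could be sketched roughly like this. This is only an illustration, not a tested recovery script: the device names and per-partition timings are hypothetical, `fsck -p` (preen mode) is just one possible command, and the real speed-up will be limited if all partitions share the same array and controller.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def run_check(device, command=("fsck", "-p")):
    """Run one filesystem check and return its exit code.

    The command is an assumption; pass whatever your platform needs.
    """
    return subprocess.run([*command, device]).returncode


def check_all(devices, check=run_check):
    """Check every device concurrently and return {device: exit_code}."""
    if not devices:
        return {}
    # One worker per device so that all checks start at the same time.
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        codes = list(pool.map(check, devices))
    return dict(zip(devices, codes))


def wall_clock_minutes(per_partition_minutes, parallel):
    """Idealised total time: the slowest check if parallel, else the sum."""
    if not per_partition_minutes:
        return 0
    return max(per_partition_minutes) if parallel else sum(per_partition_minutes)


# Hypothetical timings for three partitions, in minutes.
timings = [20, 25, 30]
print("one after another:", wall_clock_minutes(timings, parallel=False))
print("in parallel:      ", wall_clock_minutes(timings, parallel=True))
```

Calling `check_all(["/dev/da1s1d", "/dev/da2s1d"])` (hypothetical device names) would start both checks at once. Passing your own `check` function makes it possible to try the logic without touching a real disk.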

It happened to me at least once that the machine that crashed corrupted the
filesystem in a way that the machine that took over also crashed within
hours...
Maybe it's time to continue on the "High availability ...
again"-discussion we had a while ago. If the cyrus developers are able
to implement this with some funding there are still some questions left
for me: how much time would it take before a "stable" solution is ready?
How much funding is expected? I still have to talk to management about
this, but I would really support this development and I'm certainly
willing to convince some managers.
The only high availability I see here is the Google way. Cyrus is offering
you that with the 'murder' component.


BTW, you're mentioning FreeBSD ... doesn't it have some sort of background
fsck while the filesystem is mounted rw?
--
Jure Pečar
Paul Dekkers
2004-09-10 11:24:42 UTC
Permalink
Hi,

We're implementing a new mail platform running on two Dell 2650 servers
(2 Xeon CPUs at 3 GHz each, with HTT and 3 GB of memory) and a disk
array of 4 TB connected with an Adaptec 39160 SCSI controller for
storage. We installed FreeBSD 5.2.1 on it, and - of course - cyrus 2.2.8
(from the ports) as IMAP server. Our MTA is postfix.
There are two machines for redundancy. If one fails, the other one
should take over: mount the disks from the array, and move on.

Unfortunately, the primary server has crashed twice already. The first time
was while synchronising two IMAP spools from the old server to the
new one. There was not much data on it back then. The second time was
worse: around 10 GB of mail was stored on the disks. We discovered that
the fsck took about 30 minutes, so although we have two machines for
redundancy it still takes quite some time before the mail is available
again. (And we still have about 90 GB of mail to migrate, so once all
users are migrated it will take much longer.)
I have mounted the filesystems synchronously now: although it slows down the
system, I hope it speeds up the fsck a bit if there is another crash.
The second crash happened while removing a lot of mailboxes (dm) while some
of them were being removed at the same time using a webmail app (squirrelmail).
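
To put a rough number on "much longer", here is a back-of-the-envelope sketch using only the figures from this post (about 30 minutes of fsck with about 10 GB stored, and about 90 GB still to migrate). It assumes fsck time grows linearly with the amount of mail, which is a crude assumption: in practice the time depends more on the number of files and the size of the filesystem than on bytes stored, so treat the result as an order-of-magnitude guess only.

```python
def estimate_fsck_minutes(observed_minutes, observed_gb, target_gb):
    """Linearly scale an observed fsck duration to a larger data volume."""
    if observed_gb <= 0:
        raise ValueError("observed_gb must be positive")
    return observed_minutes * (target_gb / observed_gb)


# Figures from the post; the linear model itself is an assumption.
observed_minutes = 30
observed_gb = 10
total_gb = observed_gb + 90  # mail already stored plus mail still to migrate

estimate = estimate_fsck_minutes(observed_minutes, observed_gb, total_gb)
print(f"Estimated fsck time for {total_gb} GB: {estimate:.0f} minutes "
      f"(~{estimate / 60:.1f} hours)")
```

Under that assumption the full 100 GB would mean something like 5 hours of fsck before the standby machine can serve mail.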

I'm not sure why the box crashed; there was nothing in the logs, there
was nothing on the screen when we came there, it just booted up again.
Of course I'm interested if anyone has any thoughts on this.

Although many on the list claim that this (having 2 boxes with 1
disk-array) is a nice way for redundancy I'm in doubt now if this is
true. It still takes 30 mins before everything is back again! It seems
to me that if there were a "live" version of cyrus available with a
synchronised mail-spool, there would be no noticeable outage for users
(except maybe a lost connection). Am I right?

Maybe it's time to continue on the "High availability ...
again"-discussion we had a while ago. If the cyrus developers are able
to implement this with some funding there are still some questions left
for me: how much time would it take before a "stable" solution is ready?
How much funding is expected? I still have to talk to management about
this, but I would really support this development and I'm certainly
willing to convince some managers.

Regards,
Paul

