Backup methods

Anatoli

2018-05-11 06:05:05 UTC

Andrew,

> For a small system with a few hundred mailboxes, a simple unix
> filesystem backup is sufficient. You can dump the Cyrus mailboxes.db to
> a flat file every hour with cron (keep a few days' worth). Back up
> everything with your regular backup system (tar, rsync, etc.).
>
> If you suffer a complete loss of the system and have to restore from
> the backup, you won't care much about a few database file
> inconsistencies, which can be repaired with Cyrus' reconstruct tool.
> You would recover the whole backup, recover mailboxes.db from the most
> recent flat file export, and then run reconstruct on every mailbox.

Yep, this is how I was (and still am) doing it (hourly), so if one backup has
something unrecoverable, I can check the previous backup (-1 hr) and
hopefully it'll be in better shape. So, on the one hand, this is something
that "works", yes.

On the other hand, I've recently started using the Cyrus xDAV functionality,
which lets you store files, calendars and contacts (BTW, some minor issues
apart, it works great!). All this information is more difficult to deal
with if it's inconsistent; it's more fragile than mail. Also, changes to
this data are more important and happen more frequently (I have an
accounting client where 4 users make a couple of hundred changes per day
to a single xls file over Cyrus WebDAV).

It's in a pre-production state in my deployments right now, but I suspect
that tolerating some inconsistencies, or restoring a backup that is an
hour old, would not be an acceptable policy for this type of data.

Regards,
Anatoli

*From:* Andrew Morgan
*Sent:* Friday, May 11, 2018 02:05
*To:* Anatoli
*Cc:* Info-cyrus
*Subject:* Re: Backup methods

>> There may be an argument that could be made for 2 backup strategies
>
> That's the point. In the context of SME environments (Small and
> Medium-sized Enterprises, i.e. normally from 5 to 50 employees, up to
> 250 in some countries) that we were talking about, replication is
> overkill, IMO. But for large enterprises like MNCs, large universities
> and public mail providers (Fastmail), multiple masters and backups via
> replication are of course the way to go. For large deployments there
> are good backup solutions in Cyrus, but for small-business admins I
> don't know of any to recommend.

Anatoli,

I think you're making this harder than it needs to be...

For a small system with a few hundred mailboxes, a simple unix
filesystem backup is sufficient. You can dump the Cyrus mailboxes.db to
a flat file every hour with cron (keep a few days' worth). Back up
everything with your regular backup system (tar, rsync, etc.).

If you suffer a complete loss of the system and have to restore from the
backup, you won't care much about a few database file inconsistencies,
which can be repaired with Cyrus' reconstruct tool. You would recover
the whole backup, recover mailboxes.db from the most recent flat file
export, and then run reconstruct on every mailbox.
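The hourly mailboxes.db dump described above could look something like the following when run from cron as the cyrus user. This is a minimal sketch, not a tested tool: it assumes `ctl_mboxlist -d` (the Cyrus utility that prints mailboxes.db as flat text) is on the PATH, and the dump directory, file naming and three-day retention are hypothetical choices. Writing to a `.partial` file and renaming it only on success means an interrupted run never leaves a truncated file that could be mistaken for "the most recent flat file export".

```python
import subprocess
from datetime import datetime, timedelta
from pathlib import Path
from typing import Iterable, List, Optional

# Hypothetical choices: adjust the directory, naming and retention.
DUMP_DIR = Path("/var/backups/cyrus-mboxlist")
KEEP = timedelta(days=3)          # "keep a few days' worth"
PREFIX, SUFFIX, STAMP = "mailboxes-", ".txt", "%Y%m%d-%H%M"


def dump_mboxlist(dump_dir: Path = DUMP_DIR,
                  now: Optional[datetime] = None) -> Path:
    """Write mailboxes.db as flat text (run this as the cyrus user)."""
    now = now or datetime.now()
    dump_dir.mkdir(parents=True, exist_ok=True)
    final = dump_dir / f"{PREFIX}{now.strftime(STAMP)}{SUFFIX}"
    partial = final.with_suffix(".partial")
    try:
        with open(partial, "wb") as out:
            # `ctl_mboxlist -d` prints the mailbox list on stdout. On many
            # systems it lives outside $PATH, so use its full path if needed.
            subprocess.run(["ctl_mboxlist", "-d"], stdout=out, check=True)
    except BaseException:
        partial.unlink(missing_ok=True)  # never leave a truncated dump behind
        raise
    partial.rename(final)  # only complete dumps get the final name
    return final


def expired(names: Iterable[str], now: datetime,
            keep: timedelta = KEEP) -> List[str]:
    """Return the dump file names that are older than the retention window."""
    old = []
    for name in names:
        if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
            continue  # ignore unrelated and partial files
        try:
            taken = datetime.strptime(name[len(PREFIX):-len(SUFFIX)], STAMP)
        except ValueError:
            continue  # not one of our timestamps
        if now - taken > keep:
            old.append(name)
    return old


def prune(dump_dir: Path = DUMP_DIR, now: Optional[datetime] = None) -> None:
    """Delete dumps that fall outside the retention window."""
    now = now or datetime.now()
    for name in expired((p.name for p in dump_dir.iterdir()), now):
        (dump_dir / name).unlink()
```

An hourly cron job would call `dump_mboxlist()` followed by `prune()`. For the restore step, `ctl_mboxlist -u` is the matching "undump" option that reads such a list from standard input; check the man page of your Cyrus version before relying on it.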

If you need to recover some messages or mailboxes that were deleted by a
user, then just recover those individual files or directories from your
backup. Run reconstruct -rf on the mailbox.

Naturally, delayed expunge and delayed delete are fantastic ways to
avoid all this work. Purge them only after a few weeks or a month has
passed. It is much easier to restore using those delayed delete/expunge
features.

Thanks,
    Andy

Jason L Tibbitts III

2018-05-10 21:41:49 UTC

A> What about mysqldump > dump.sql, then mysql < dump.sql? Also a wrong
A> way and didn't have to be implemented?

No, that's exactly my point. Thanks for making it for me! The analog
to the way you indicated that you would like it to work would be having
the mysql server stop IO so that you can take a filesystem snapshot
while the database is in a consistent state. But instead, the database
(like cyrus) implements a backup method which you can use to extract the
data. And it also requires disk space to hold the backup until you can
transfer it to your backup medium.

- J<

Albert Shih

2018-05-10 20:50:04 UTC

On 10/05/2018 at 16:08:32-0300, Anatoli wrote:

Hi.

> In both cases, a copy of the master data is made, which requires twice the
> space of real usage (Cyrus Backups tries to apply compression on stored data,
> not sure how well it works).

With ZFS and lz4 (the standard compression on ZFS) you get a 1.18 ratio (3.57 TB
on disk for 4.05 TB of data), so not very good.

I use lz4 because it has the same performance as no compression.

I haven't tried gzip on mail, but gzip can give a very impressive ratio while
eating a lot of CPU.

> What is really needed, IMO, for SME environments is the ability for Cyrus to
> sync to disk all data, so one can take a hot copy of that data with standard
> UNIX tools and then handle it accordingly. Once a recovery is needed, one just
> copies a backup to the Cyrus dir and starts the service. The data would be in
> the exact same state as when the backup took place. This is discussed in the
> GitHub issue mentioned in the previous mail.

I fully agree.

In fact, 7 years ago when we renewed our mail server, I already tried Cyrus and
Dovecot (we were coming from courier-imap), and we chose Dovecot because it's
very easy to back up (and manage) for an old_unix_admin. Just put some rsync in
the crontab and that's all: one mail = one file, etc.

Now we have chosen cyrus-imap over Dovecot (so for the next 7 years) because of
all the features Cyrus has. But yes, if Cyrus had something like mysqldump or
pg_dumpall, that would be super nice.

Regards.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Thu May 10 22:43:47 CEST 2018

Anatoli

2018-05-10 20:53:51 UTC

> For me, if I put a replica in place, it takes on the role of backup.
> Meaning I will put two replicas in place and not make another backup.

A replica is not a replacement for a backup. You may have your specific
needs, but a replica per se mostly serves to cover the master's hardware
failures. A replica doesn't protect you from accidental or intentional
deletions/changes of the data. If a user deletes some of his/her mail and
discovers it after the expunge period, you won't be able to recover it, as
it would also have been deleted on the replica.

> Using ZFS, there's no need to do that

Sure, if you're using ZFS :) The solution I've described works for any
*nix OS and filesystem.

> So if I stop the postfix on the cyrus_server

You just don't need to stop it. If you expect to stop Cyrus frequently,
just configure the retry interval of the Postfix on cyrus_server to
something like 1 min.

*From:* Albert Shih
*Sent:* Thursday, May 10, 2018 17:32
*To:* Anatoli
*Cc:* Info-cyrus
*Subject:* Re: Backup methods

On 10/05/2018 at 10:38:28-0300, Anatoli wrote:

I'm not sure I understand that. Isn't it always true? If you have X TB of
data and you want n backups, you will need X*(n+1) TB?
> The replication as it is designed means that you create an additional (replica)
> instance of Cyrus that will be in sync with the master instance, so when you
> need to make a backup, you turn off the replica, take a backup from its data,
> then turn it on again so it comes back in sync with the master. In this case
> there's no interruption to the service; you just stop a replica. But the
> replica will use the same amount of space as your master, so even without
> making a backup you'll use 2x the space. Plus, you have to understand how
> replication works, then set it up, and make sure that the sync process is
> always working and that the replica has the same information as the master...
> That's a great solution for ISP-level or public mail service operations, but
> IMO absolute overkill for small deployments.

For me, if I put a replica in place, it takes on the role of backup. Meaning I
will put two replicas in place and not make another backup.

> When it comes to making a backup, the best policy IMO is to make incremental
> backups. In this case you only store the new mails + binary indexes. Once in a
> while (e.g. every month) you make a full backup, then, say, once a week a level
> 1 backup (which stores the changes since the previous lower-level backup and is
> reset at each one, i.e. every month), then daily level 2 backups and hourly
> level 3. This way you can restore up to hourly changes without using an
> excessive amount of space. Of course you can compress them too (xz -9 gives a
> pretty good ratio).

Using ZFS, there's no need to do that. Just use zfs snapshot and it will keep
the differences at block level (much better than file level). Same for
compression: just activate compression on the dataset.
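The multi-level incremental scheme quoted above follows the classic dump(8) convention: a level-N backup holds everything changed since the most recent backup of a lower level. The Python sketch below assumes an hourly job run at minute 0 with the monthly/weekly/daily/hourly levels from that example; the schedule and the `level_for`/`restore_chain` names are hypothetical. It illustrates why restoring to any given hour needs at most one backup per level.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# (time the backup was taken, dump level)
Backup = Tuple[datetime, int]


def level_for(ts: datetime) -> int:
    """Dump level for a job that runs hourly at minute 0 (example schedule)."""
    if ts.hour == 0 and ts.day == 1:
        return 0  # monthly full backup
    if ts.hour == 0 and ts.weekday() == 0:
        return 1  # weekly, Monday at midnight
    if ts.hour == 0:
        return 2  # daily, at midnight
    return 3      # hourly


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Backups to apply, oldest first, to rebuild the state as of `target`.

    A level-N backup holds everything changed since the most recent backup
    of any lower level, so only the newest backup of each level counts.
    """
    usable = [b for b in backups if b[0] <= target]
    fulls = [b for b in usable if b[1] == 0]
    if not fulls:
        raise ValueError("no full backup taken before the target time")
    chain = [max(fulls)]
    for level in (1, 2, 3):
        newer = [b for b in usable if b[1] == level and b[0] > chain[-1][0]]
        if newer:
            chain.append(max(newer))
    return chain


# One month of hourly runs, starting with the full backup on May 1st.
start = datetime(2018, 5, 1)
history = [(start + timedelta(hours=h), level_for(start + timedelta(hours=h)))
           for h in range(31 * 24)]
for ts, level in restore_chain(history, datetime(2018, 5, 16, 14, 30)):
    print(level, ts)
# 0 2018-05-01 00:00:00
# 1 2018-05-14 00:00:00
# 2 2018-05-16 00:00:00
# 3 2018-05-16 14:00:00
```

Four restores (full, weekly, daily, hourly) reach any hour of the month, which is what keeps the space usage down. Block-level ZFS snapshots give a comparable effect without managing levels by hand.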

> Uhh, don't do that. Your Postfix has no problem retaining mail while Cyrus is
> not reachable and attempting delivery again later. What I was referring to is
> that, depending on the configuration of your incoming MTA, the next delivery
> attempt may be in, say, 15 minutes, so you postpone incoming mail for that time
> if you turn off Cyrus to take a backup. If you turn off your incoming MTA, the
> source MTA may have trouble delivering at all (you don't control it, you don't
> know how it's configured, when the next delivery attempt will occur, etc.), so
> never turn off your incoming MTA.

That won't be a problem: I've got 2 public incoming MTAs, 4 private ones and the
Postfix on the cyrus-server. So incoming mail, let's say from gmail.com, goes
from the gmail.com MX to our MX, which then sends it to the cyrus-server. So if
I stop the Postfix on the cyrus_server, the incoming mail will stay on our MX.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Thu May 10 22:27:22 CEST 2018

Jason L Tibbitts III

2018-05-10 19:38:22 UTC

A> What you mention is highly related to the replication backup
A> we were talking about in the previous mails.

Well, sort of. It is a method that is actually focused around doing
backups. It happens to make use of the replication protocol because
that is actually the smart way to do it. I did detail the differences
in my message.

A> In both cases, a copy of the master data is made, which requires
A> twice the space of real usage (Cyrus Backups tries to apply
A> compression on stored data, not sure how well it works).

As I mentioned, the documentation discusses this.

A> What is really needed, IMO, for SME environments is the ability for
A> Cyrus to sync to disk all data, so one can take a hot copy of that
A> data with standard UNIX tools and then handle it accordingly. Once a
A> recovery is needed, one just copies a backup to the Cyrus dir and
A> starts the service.

Honestly I believe that's the wrong way to go about it, but it's
certainly one way to do things if you have no backup solution integrated
into the software. But hey, it's your data. I only wanted to mention
that there really is an existing backup solution which wasn't being
discussed.

- J<

Albert Shih

2018-05-09 11:20:45 UTC

On 09/05/2018 at 11:42:35+0200, Niels Dettenbach wrote:

> This is relatively inefficient, but a working option if all of the Cyrus
> data is on that VM, i.e. the complete mail spool and the database files
> (possibly plus sieve files). We do something similar on relatively small
> systems, or only to get "intraday backups".

Okay. I'm totally new to Cyrus... so..

> On larger systems with VMs I take a ZFS or LVM snapshot and mount it
> externally to "fetch" a full (incremental) filesystem backup of the mail spool,
> the IMAP spool and the Cyrus DB on a daily basis. After the backup run I destroy
> the snapshot.

I'm not sure what you mean by « fetch »? And how can you make sure the
databases are consistent? Does Cyrus have something like a « database lock », so
I can be sure the snapshots I take are good? In fact, that's why I was thinking
about shutting down the VM. For example, with pgsql I've got pg_dumpall, but I
don't see anything similar with Cyrus.
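The snapshot approach described above (take a snapshot, mount it elsewhere, copy the files off, destroy the snapshot) can be sketched in Python as follows. This is a minimal sketch assuming an LVM logical volume that holds the whole Cyrus tree (mail spool, database files and sieve scripts); the volume group, volume, mount point and destination are hypothetical, and it only prints the commands unless `dry_run=False`. A snapshot captures the filesystem as it would look after a sudden power loss, so on its own it does not answer the consistency question; pairing it with a flat-file mailboxes.db dump, and running reconstruct after a restore, as Andrew suggests in this thread, covers that. On ZFS the first and last steps would be `zfs snapshot` and `zfs destroy` instead.

```python
import subprocess
from typing import List

# Hypothetical names: substitute your own volume group, volume and paths.
VG, LV, SNAP = "vg0", "cyrus", "cyrus-snap"
MOUNTPOINT = "/mnt/cyrus-snap"
DEST = "backuphost:/srv/backups/cyrus/"


def snapshot_backup(dry_run: bool = True, size: str = "5G") -> List[str]:
    """Snapshot the Cyrus volume, copy it off, then destroy the snapshot.

    With dry_run=True the commands are only printed. Returns the commands
    in the order they were issued.
    """
    issued: List[str] = []

    def run(argv: List[str]) -> None:
        issued.append(" ".join(argv))
        if dry_run:
            print(issued[-1])
        else:
            subprocess.run(argv, check=True)

    snap_dev = f"/dev/{VG}/{SNAP}"
    # Point-in-time (crash-consistent) copy of the live volume.
    run(["lvcreate", "--snapshot", "--size", size, "--name", SNAP,
         f"/dev/{VG}/{LV}"])
    try:
        run(["mount", "-o", "ro", snap_dev, MOUNTPOINT])
        try:
            # "Fetch" the frozen files; -H preserves hard links.
            run(["rsync", "-aH", f"{MOUNTPOINT}/", DEST])
        finally:
            run(["umount", MOUNTPOINT])
    finally:
        # The snapshot consumes more space as the origin changes; drop it.
        run(["lvremove", "-f", snap_dev])
    return issued
```

The nested `try`/`finally` blocks make sure a failed copy still unmounts and removes the snapshot, so a broken run does not leave a growing snapshot behind.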

> Besides this, and depending on your needs, you may take a look at Cyrus

My needs are very simple: since Cyrus has « delayed_expunge », my needs
are basically to protect against a big crash of everything (filesystem
corruption, losing everything, etc.)

Before Cyrus I used (and currently still use) Dovecot, where it's very simple
because everything is a plain file. So I just need to run rsync and that's
all.

> replication features to build a "backup", or just use standard filesystem
> backup tools like tar, dumpfs etc.

What would be the difference? I mean, which one is the easiest to use (as
backup and/or DRP)?

> On a file basis you have to back up the mail spool and the Cyrus database files.
> If you use SIEVE, back up the SIEVE file pool too. You can restore by just
> replacing the files and starting Cyrus. To get the common database files

Well, that's the point: I'm not sure I know very well where all the « common
databases » are. I see

> "interoperable", it may make sense to dump them into a machine-independent
> format for backup if they are in a machine-dependent format.
> If you restore such a filesystem-based backup to a new system which has other
> hardware / arch specs or a newer / incompatible DB subsystem (instead of

No way... If a disaster happens, I will still want the simplest way to
make the service work again...

> skiplist) you may have to "recreate" indexes and database data. reconstruct -f

All my DBs seem to be twoskip.

> may be your friend to "clean" up the transfer / restore.
> There are several strategies for backing up Cyrus; these are just a few.

Yes... that's the problem ;-)

> hth a bit.

Yes. A lot, thanks.

Regards
--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: ***@obspm.fr
Heure local/Local time:
Wed May 9 12:58:16 CEST 2018