As some of you might know, we experienced unscheduled downtime from 10:35 AM GMT on the 6th of June until 12:45 AM GMT on the 7th of June. In this post, I would like to explain what happened and give you more details about the outage.
On the 6th of June, during a data migration for one of our clients, our database administrator accidentally deleted one of our main tables from the production database. The deletion immediately propagated through our master-slave setup, so the only way we could recover the data was to restore it from our backup.
Unfortunately, the restoration was a lengthy process that also forced us to suspend access to our servers. After restoring the backup, we also needed to recover the maps that were created between the time of the backup and the time of the data loss.
We were able to do this because we had a secondary database with all the data. Although we only needed to recover maps that were created or modified within a 4-hour period, this took us several days because we had no quick procedure for extracting the data from the secondary database.
Even though access to our databases is limited to qualified personnel, and the person who performed the migration was fully qualified, it seems that human error is sometimes unavoidable.
Steps we are taking to avoid this in the future:
a) we are working on a new backup method that will make restoration much faster in the event of such a disaster
b) we have updated our procedures so that we can extract data from the secondary database much faster
c) during database migrations, we will assign two people who will check each other’s commands
I am deeply sorry for the inconvenience we caused you, and I can only hope you will choose to stay with us after this unfortunate event.
CEO – Mindomo