This chapter will obviously not cover the entire subject of data security.
Data security is first and foremost a process. A company with 1,000 employees will not think in the same way as an individual who hosts their own data.
But there are basic rules. In a data stream, the compromising agent can be:
- someone you know,
- an individual outside your circle,
with or without malicious intent.
Keep in mind that the leading cause of data compromise is human error. Data compromise has several definitions; we will focus on two aspects:
- data destruction,
- data leakage.
An essential rule: Work with software that you are familiar with.
Information technology has become an empire of consumption, and no one is immune from a bad choice, so it is important to choose the right technology. When choosing the software that will store your data, the first things to check are its data import/export capabilities, then its “Backup/Restore” pair. Your storage service will inevitably fail at some point, so you must be prepared for it. The software should also be supported by a large community, i.e. volunteers or professionals who maintain the chosen solution. Finally, the designer of the solution should be a trusted organization.
In a computer system, we distinguish between system data, closely linked to the operating systems and the services they provide, and user data (office documents, calendars, calculation results, …).
The configuration files of any service are system data, stored in directories whose locations are standardized.
Your CV and your photos are “user data”, stored in a directory you have chosen.
Whatever you do on a computer, you need to know where the data you generate, the data that keeps your organization running smoothly, is located.
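As a rough illustration of this distinction, the sketch below sorts a path into “system data” or “user data”. The prefixes `/etc` and `/home` are assumptions based on common Linux conventions, and `classify` is a hypothetical helper, not part of any tool mentioned in this chapter.

```python
from pathlib import PurePosixPath

# Assumed locations: configuration files under /etc, personal files under /home.
SYSTEM_PREFIXES = (PurePosixPath("/etc"),)
USER_PREFIXES = (PurePosixPath("/home"),)


def classify(path: str) -> str:
    """Return a coarse label for a path based on the directory it lives in."""
    candidate = PurePosixPath(path)
    if any(candidate.is_relative_to(prefix) for prefix in SYSTEM_PREFIXES):
        return "system data"
    if any(candidate.is_relative_to(prefix) for prefix in USER_PREFIXES):
        return "user data"
    return "unclassified"


for example in ("/etc/ssh/sshd_config", "/home/alice/cv.pdf", "/tmp/scratch"):
    print(example, "->", classify(example))
```

Anything that ends up “unclassified” is a good candidate for a closer look: it is data whose location and role you have not yet written down.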
This gives rise to another rule: Write documentation that describes the organization of your data storage. At our level there is no need to write a book; a simple, precise diagram will fully play this role.
Example of a simple diagram (created with Nextcloud/drawio):
You should also pay particular attention to how this data is stored: “clear” storage or “encrypted” storage? In the case of self-hosting we will choose the second option. Imagine a burglary: the thieves will have stolen your equipment but, thanks to the encryption of the data, they will not be able to read its contents without the key. This also applies to the backup system.
A backup is complete only when the restoration process has been validated. If I back up file A and file B, I must be able to verify that I can restore both A and B.
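One way to validate a restoration is to restore the backup into a scratch directory and compare checksums with the original files. The sketch below assumes the backup is a plain directory copy; `backup_is_restorable` and the other function names are hypothetical, and `shutil.copytree` only stands in for whatever restore step your backup tool actually provides.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksums(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its checksum."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup_is_restorable(source: Path, backup: Path) -> bool:
    """Restore the backup to a scratch directory and compare it with the source."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup, restored)  # stand-in for the real restore step
        return checksums(restored) == checksums(source)
```

The comparison fails if a file is missing, extra, or different after restoration, which is exactly the “A and B” guarantee described above.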
The documentation should also define the type of data managed by the software. Most storage software stores raw data, but also generates metadata (remember this term). Metadata is data about the raw information being processed. Example: when I add a file through the storage software, the software creates a record, in a container separate from the raw data, holding information such as the storage time, the user name, the IP address at the origin of the request, etc.
This metadata container is usually managed by software called a “database system” (MySQL, PostgreSQL, Oracle…).
If you rely on this metadata for computer processing, it becomes essential to your organization. It is therefore imperative to back it up as well.
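To make the split between raw data and metadata concrete, here is a minimal sketch. It uses an in-memory SQLite database as a stand-in for MySQL or PostgreSQL and a dictionary as a stand-in for the raw data container; the table name, column names and `add_file` function are all hypothetical and do not reflect how any particular storage software is built.

```python
import sqlite3
from datetime import datetime, timezone

raw_store: dict[str, bytes] = {}       # stand-in for the raw data container
meta_db = sqlite3.connect(":memory:")  # stand-in for a MySQL/PostgreSQL server
meta_db.execute(
    "CREATE TABLE file_metadata ("
    " file_name TEXT PRIMARY KEY,"
    " stored_at TEXT NOT NULL,"
    " user_name TEXT NOT NULL,"
    " origin_ip TEXT NOT NULL)"
)


def add_file(file_name: str, content: bytes, user_name: str, origin_ip: str) -> None:
    """Store the raw bytes in one place and a metadata row in another."""
    raw_store[file_name] = content
    stored_at = datetime.now(timezone.utc).isoformat()
    with meta_db:  # commits the insert, or rolls back on error
        meta_db.execute(
            "INSERT INTO file_metadata VALUES (?, ?, ?, ?)",
            (file_name, stored_at, user_name, origin_ip),
        )


add_file("cv.pdf", b"%PDF-...", "alice", "192.0.2.10")
row = meta_db.execute(
    "SELECT user_name, origin_ip FROM file_metadata WHERE file_name = ?",
    ("cv.pdf",),
).fetchone()
print(row)
```

Note that the file content never appears in the metadata table: losing the database would leave the raw file intact but strip away who stored it, when, and from where.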
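As a hedged illustration of backing up a metadata database, the sketch below copies a SQLite database with Python's built-in online backup method, then reopens the copy to check that it is readable. SQLite is only an assumption chosen to keep the example self-contained (a MySQL or PostgreSQL server would be backed up with its own dump tooling), and `backup_metadata`, `row_count` and the `file_metadata` table are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path


def backup_metadata(source: sqlite3.Connection, target_path: Path) -> None:
    """Copy the whole database behind `source` into a file at target_path."""
    target = sqlite3.connect(str(target_path))
    try:
        source.backup(target)  # SQLite online backup (Python 3.7+)
    finally:
        target.close()


def row_count(db_path: Path) -> int:
    """Reopen a database file and count the rows of the file_metadata table."""
    conn = sqlite3.connect(str(db_path))
    try:
        return conn.execute("SELECT COUNT(*) FROM file_metadata").fetchone()[0]
    finally:
        conn.close()


# Small demonstration with a throwaway database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE file_metadata (file_name TEXT, user_name TEXT)")
source.execute("INSERT INTO file_metadata VALUES ('cv.pdf', 'alice')")
source.commit()

with tempfile.TemporaryDirectory() as tmp:
    copy_path = Path(tmp) / "metadata-backup.db"
    backup_metadata(source, copy_path)
    print("rows in backup:", row_count(copy_path))
```

Reopening the copy and querying it is the database equivalent of validating a restore: a backup file that cannot be read back protects nothing.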
When we talk about data leakage, we usually think of hackers who have broken into a system to steal or damage data. Personally, other questions come to mind: when I put a file on a server, who intercepts it? Who reads it? Where is it stored? Submitting an unencrypted document to “Google Drive”, “Dropbox”, and more generally to any of the GAFAM*, is an immediate and consented data leak. These third parties take possession of the content, and they do so in a completely legal manner, since it is part of their general terms and conditions of use.
The private use of a storage system does not mean “disconnected from the rest of the world”. It is quite possible to use private information systems that are shared with other private systems. This approach is often called “federation”.
All the following chapters are related to my software choices. The operating system of my servers is “Debian”. For storage I chose “Nextcloud”, and for instant messaging, “Matrix”.
They are free, open-source, reliable and stable systems. “Debian” and “Nextcloud” are maintained by large communities and are subject to numerous IT security audits. Matrix is fairly young on the instant messaging market but has been chosen by the French government as an internal messaging system. It can be assumed that these organizations work together to ensure a good level of security. We have been using it for several months, hosted on a Raspberry Pi and a Rock64, and have not encountered any stability problems, even after an update.
(NB) Writing the content of this website takes a lot of time. The first article, in preparation, is about backups; you can find similar articles by browsing the blog.
Data protection is organized in several chapters:
- Setting up backups in the Datacenter
- Restoring a backup in the Datacenter
- Storage encryption
- Datacenter Monitoring
- Data leaks and countermeasures
- Ransomware and countermeasures
(*) GAFAM: Google, Amazon, Facebook, Apple and Microsoft are all companies that collect a lot of data about their users, with or without their consent. Once you are with them, you no longer have any control over how your data is used. Consult some articles related to the fraudulent use of your data by GAFAM.