OpenStack Cloud Security

The importance of logs

Clients often ask me what they should and should not log. My usual answer is: "What would you like to know if an incident or a data leak had just happened?" I think this is the whole point: think through the various scenarios, decide which data you would want to have in each of them, and start collecting it immediately. The same answer applies to the question "For how long should I keep this log?"

Note

Logs matter because they are the only traces that can help you understand exactly what happened and why.

Where to store the logs?

There are many places where you can store the logs, such as:

  • Files on filesystems
  • Files on SAN or other replicated infrastructure
  • Rows in a relational and/or transactional database
  • Records in a NoSQL database

The first option seems very good because hard drives are pretty cheap and all you need to make it work is a server with a lot of hard drives. This option has several downsides, though:

  • Unreliability: How can you be sure that the machine will be up today or tomorrow?
  • Scalability: What will you do when all your drives are full?
  • Read performance: How much time will you need to scan all your logs? (Consider that a data-center-grade hard drive can usually read between 100 MB/s and 200 MB/s.)
  • Usability: How will you find the exact data you need?

The second option solves the first two disadvantages of the first option, but it still has the usability issue and can be very costly.

The third option solves the usability problem but, depending on whether you have one or more nodes, it can suffer from the unreliability and read performance problems. No matter how you design the node or cluster, you will have huge scalability problems, as well as some constraints imposed by the rigid structure of tables.

In my opinion, the last option solves all of these problems. Even though it is technically a very good option, it brings some aspects that need to be considered:

  • You will need someone with NoSQL/Big Data experience
  • You will have a high initial cost, because NoSQL databases usually need more than three nodes to create a cluster

When it comes to OpenStack, the best option for storing logs is the OpenStack Data Processing service (Sahara), which has been part of OpenStack since October 2014.

The more information you log, and the more detailed it is, the harder it is to store and retrieve. If you only store one type of data (for example, the time at which each person logs in to a machine), you will probably have a few megabytes of data every month; therefore, it will be very easy to put it in a relational database (such as MariaDB or PostgreSQL) that you already have in place. This is also possible because there is only one kind of data, so you know exactly how each log entry will be presented to your log system. When you start logging thousands of lines per hour, coming from tens or hundreds of sources and in tens of different formats, NoSQL storage seems to be the only option that works.
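
The simple single-format case can be sketched as follows. This is only an illustration: an in-memory SQLite database stands in for the MariaDB or PostgreSQL server you would actually use, and the table name, column names, and sample entries are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for an existing MariaDB or PostgreSQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE login_events (
        logged_at TEXT NOT NULL,   -- ISO 8601 timestamp
        username  TEXT NOT NULL,
        host      TEXT NOT NULL
    )
    """
)

# Hypothetical sample entries; every row has exactly the same shape.
events = [
    ("2015-03-01T08:15:00", "alice", "compute-01"),
    ("2015-03-01T09:02:11", "bob", "compute-02"),
    ("2015-03-02T08:17:45", "alice", "compute-01"),
]
conn.executemany("INSERT INTO login_events VALUES (?, ?, ?)", events)

# Because the structure is fixed, questions about the data are plain SQL.
rows = conn.execute(
    "SELECT username, COUNT(*) FROM login_events "
    "GROUP BY username ORDER BY username"
).fetchall()
print(rows)  # [('alice', 2), ('bob', 1)]
```

The fixed schema is what makes this easy. Once entries arrive in many different shapes, each new format would need its own table or columns, which is the rigidity that pushes larger setups toward NoSQL storage.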

Evaluate what to log

Although no single answer to the question of what to log works for every company, since every company is different, there are some things that are usually logged:

  • Door access (both entering and exiting)
  • Server access (SSH, Database, and so on)
  • All server logs
  • Data center environmental metrics (temperature, humidity, and so on)

It's really important to make a considered decision here, so that you have all the logs you need without saving a huge amount of logs that you will never use.

Evaluate the number of logs

Another important thing to decide is how long to keep the logs. Some countries have specific laws that set a minimum retention time for some kinds of logs, while others do not. In my opinion, it varies a lot from company to company, but I usually suggest keeping them for at least one year.

A whole year may seem like a long time, but it's not; in my opinion, it's the bare minimum. If you suspect that a person has been behaving strangely lately, you will want to look at the logs for at least one year to confirm a pattern or a change of pattern.
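
If you do settle on a fixed retention period, the check itself is simple. The sketch below is a minimal illustration, not a recommended tool: the function name, the sample entries, and the reference date are hypothetical, and the 365-day window simply encodes the one-year minimum suggested above.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=365)  # the one-year minimum suggested above


def is_expired(logged_at: datetime, now: datetime,
               retention: timedelta = RETENTION) -> bool:
    """Return True when an entry is older than the retention window."""
    return now - logged_at > retention


# Hypothetical reference date and log entry timestamps.
now = datetime(2015, 6, 1)
entries = {
    "recent": datetime(2015, 5, 1),
    "old": datetime(2014, 1, 15),
}

expired = [name for name, ts in entries.items() if is_expired(ts, now)]
print(expired)  # ['old']
```

Anything flagged as expired becomes a candidate for deletion or for moving to cheaper storage, subject to whatever legal minimum applies in your country.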

The best option of all is to keep logs indefinitely, so that you can go back as far as you need and have full information about the past.