Monitoring and managing incidents is serious business. It is impossible to predict what the next problem will be, but here is a series of aids.
Monitor Linux Server Incidents
A warning or alert from your monitoring system, such as Nagios, says that the server is down. What do you do?
- Don’t panic.
- We can get information on what’s going on.
- Connect to the server via SSH.
If the server is out of memory or the processors are overloaded, this could take a long time. If you really can’t connect to the server via SSH, that’s bad: you will need to restart the server or log in through the serial console.
Commands to use after logging in to the server via SSH.
Is anyone already working on the problem?
#:~$ users
or
#:~$ who
To see the running processes, use top; it also reports memory and CPU usage.
#:~$ top
or, if it is installed:
#:~$ htop
It will show you something like:
top - 11:11:25 up 48 days, 14:40, 2 users, load average: 49.67, 48.85, 33.94
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.8%sy, 0.0%ni, 11.5%id, 86.7%wa, 0.0%hi, 0.6%si, 0.1%st
Mem: 8147096k total, 8108144k used, 38952k free, 28712k buffers
Swap: 4194296k total, 164740k used, 4029556k free, 2919400k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30261 root 20 0 5148m 4.6g 5884 S 18 58.6 6931:26 java
4470 n2 20 0 14004 964 396 S 1 0.0 391:12.84 n2txd
3645 root 20 0 0 0 0 D 0 0.0 11:18.61 flush-202:1
22092 root 20 0 9768 604 568 S 0 0.0 0:54.42 tail
22839 guest 20 0 19236 1420 1040 R 0 0.0 0:00.04 top
1 root 20 0 23832 1120 532 S 0 0.0 0:17.27 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0 0.0 0:00.82 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:13.66 ksoftirqd/0
5 root RT 0 0 0 0 S 0 0.0 0:00.94 watchdog/0
6 root RT 0 0 0 0 S 0 0.0 0:00.92 migration/1
The summary lines at the top usually tell you what you need to know. On a web server, the first process in the list should be java.
#:~$ ps aux | grep java
root 30261 79.9 58.5 5260904 4774052 ? Sl May30 6932:43 /usr/lib/jvm/java-6-sun/bin/java -Djava.util.logging.config.file=/usr/local/tomcat7/conf/logging.properties -server
The first number is the process ID (PID). With it we can kill the running process and start it again.
#:~$ sudo kill -9 30261
#:~$ sudo /etc/init.d/tomcat start
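kill -9 sends SIGKILL, which ends the process immediately and gives it no chance to close files or connections. A gentler variant, sketched here on the assumption that the service reacts to a plain SIGTERM, asks first and forces only after a grace period. The function name stop_gracefully and the timeout values are illustrative and not part of the original procedure; to signal a process owned by root, the function has to run as root.

```shell
#!/bin/sh
# stop_gracefully PID [TIMEOUT_SECONDS]
# Hypothetical helper: send SIGTERM, wait up to TIMEOUT_SECONDS (default 10)
# for the process to exit, and only then fall back to SIGKILL.
stop_gracefully() {
    pid=$1
    timeout=${2:-10}

    # SIGTERM lets the process shut down cleanly. If the signal cannot be
    # sent, the PID does not exist or we are not allowed to signal it.
    kill -TERM "$pid" 2>/dev/null || return 1

    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        waited=$((waited + 1))
    done

    # Still running after the grace period: force it, as kill -9 would.
    kill -KILL "$pid" 2>/dev/null
}

# Example with the PID from the ps output above (run as root):
#   stop_gracefully 30261 30 && /etc/init.d/tomcat start
```

The restart only happens once the old process is really gone, which avoids starting a second Tomcat while the first one still holds its ports.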
If a server shows warnings or errors related to CPU load but the sites still open, the server is probably busy doing something heavy.
As a rule, the load should not exceed 1.0 per core. In practice, loads of up to 5.0 per core are still acceptable.
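Because the rule of thumb is per core, the raw load average has to be divided by the number of cores before it can be judged. The sketch below does that division. The helper name load_per_core is made up for illustration, and the core count of 2 is an assumption: the top listing in this article shows migration/0 and migration/1, which suggests at least two cores, but the real number is not stated.

```shell
#!/bin/sh
# load_per_core LOAD CORES
# Prints LOAD divided by CORES, rounded to one decimal place.
load_per_core() {
    LC_ALL=C awk -v load="$1" -v cores="$2" \
        'BEGIN { printf "%.1f\n", load / cores }'
}

# The 1-minute load from the top output in this article on an ASSUMED
# 2-core machine:
load_per_core 49.67 2    # prints 24.8

# On a live Linux system the same figures come from /proc/loadavg and nproc:
#   load_per_core "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

A result of 24.8 per core is far beyond the 1.0 target and the 5.0 tolerance mentioned above.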
top - 11:11:25 up 48 days, 14:40, 2 users, load average: 49.67, 48.85, 33.94
Tasks: 158 total, 1 running, 157 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.8%sy, 0.0%ni, 11.5%id, 86.7%wa, 0.0%hi, 0.6%si, 0.1%st
Mem: 8147096k total, 8108144k used, 38952k free, 28712k buffers
Swap: 4194296k total, 164740k used, 4029556k free, 2919400k cached
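The load is 49.67, which is not good: at this level the server is practically unusable. The 86.7%wa in the Cpu(s) line also shows that the CPUs spend most of their time waiting for disk I/O rather than computing. Let’s look for the cause:
Too many visitors on the website or app?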
#:~$ sudo tail -f /var/log/apache2/access.log
#:~$ sudo tail -f /var/log/apache2/other_vhosts_access.log
The error.log in the same folder will show you whether there are serious errors.
#:~$ sudo tail /var/log/apache2/error.log
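Watching tail -f scroll by gives only a rough feeling for the traffic. To put a number on it, the requests can be counted per minute. This is a minimal sketch: requests_per_minute is a hypothetical helper name, and it assumes the usual Apache timestamp of the form [30/May/2015:03:37:15 +0200] somewhere on each line.

```shell
#!/bin/sh
# requests_per_minute: reads an Apache access log on stdin and prints
# "DD/Mon/YYYY:HH:MM COUNT" for every minute that appears in it.
# (Hypothetical helper name; not part of Apache.)
requests_per_minute() {
    awk '
        # Locate the timestamp, e.g. [30/May/2015:03:37:15, wherever it sits
        # on the line, and keep everything up to the minute.
        match($0, /\[[0-9]+\/[A-Za-z]+\/[0-9]+:[0-9]+:[0-9]+/) {
            minute = substr($0, RSTART + 1, RLENGTH - 1)
            count[minute]++
        }
        END { for (m in count) print m, count[m] }
    ' | sort
}

# Typical use on the server:
#   sudo tail -n 10000 /var/log/apache2/access.log | requests_per_minute
```

The sort is purely textual, so the order is chronological only for entries from the same day. A minute with a count far above the others is the place to start reading the log.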
DDoS, hacking attempts, security scans, brute force: how to monitor them
If someone is hitting the system more than 1000 times per second, it will cause high load. The Apache logs will show odd entries if this is happening.
To see whether anyone is attempting a brute-force attack via SSH or against the Apache web site / app, check the following log files:
/var/log/apache2/error.log
/var/log/apache2/access.log
/var/log/apache2/other_vhosts_access.log
/var/log/auth.log
Attackers usually run popular tools that probe for known vulnerabilities in software such as PHP applications. There is little real danger of a compromise from such a scan as long as the software it looks for is not installed.
However, Apache still has to spend time serving each error. If the tool makes an enormous number of requests per second, the server will become slow.
If you ever feel the need to run a WebGUI / CGI script, protect it with an extra Apache login (http://www.elated.com/articles/password-protecting-your-pages-with-htaccess/).
#:~$ sudo tail /var/log/apache2/access.log
*:80 110.173.1.118 - - [30/May/2015:03:37:15 +0200] "GET //scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:15 +0200] "GET //admin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:16 +0200] "GET //admin/pma/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:17 +0200] "GET //admin/phpmyadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:17 +0200] "GET //db/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:18 +0200] "GET //dbadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:19 +0200] "GET //myadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:20 +0200] "GET //mysql/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:20 +0200] "GET //mysqladmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:21 +0200] "GET //typo3/phpmyadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:22 +0200] "GET //phpadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:23 +0200] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:23 +0200] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:24 +0200] "GET //phpmyadmin1/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:25 +0200] "GET //phpmyadmin2/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:25 +0200] "GET //pma/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:27 +0200] "GET //web/phpMyAdmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:28 +0200] "GET //xampp/phpmyadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:29 +0200] "GET //web/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:29 +0200] "GET //php-my-admin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:30 +0200] "GET //websql/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:31 +0200] "GET //phpmyadmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:32 +0200] "GET //phpMyAdmin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:32 +0200] "GET //phpMyAdmin-2/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:33 +0200] "GET //php-my-admin/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:34 +0200] "GET //phpMyAdmin-2.2.3/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:36 +0200] "GET //phpMyAdmin-2.2.6/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:37 +0200] "GET //phpMyAdmin-2.5.1/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:37 +0200] "GET //phpMyAdmin-2.5.4/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:38 +0200] "GET //phpMyAdmin-2.5.5-rc1/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
*:80 110.173.1.118 - - [30/May/2015:03:37:39 +0200] "GET //phpMyAdmin-2.5.5-rc2/scripts/setup.php HTTP/1.1" 404 1808 "-" "-"
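In a scan like this a single address is responsible for every request. When the log is busier, counting requests per client makes such an address stand out. The following is only a sketch: top_clients is a hypothetical helper, and it assumes that the client IP is the second whitespace-separated field, as in the lines above where the first field is the virtual host. Other log formats need a different field number.

```shell
#!/bin/sh
# top_clients [FIELD] [LIMIT]
# Reads an access log on stdin and prints "COUNT IP", busiest client first.
# FIELD is the column that holds the client IP: 2 for the log lines shown
# in this article (an assumption about the LogFormat in use), 1 for logs
# that start with the client address. LIMIT caps the number of rows (10).
top_clients() {
    field=${1:-2}
    limit=${2:-10}
    awk -v f="$field" '
        { hits[$f]++ }
        END { for (ip in hits) print hits[ip], ip }
    ' | sort -rn | head -n "$limit"
}

# Typical use on the server:
#   sudo cat /var/log/apache2/access.log | top_clients 2 5
```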
#:~$ sudo grep w00t /var/log/apache2/error.log
[May 30 03:14:53 2015] [error] [client 193.200.124.171] client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /w00tw00t.at.ISC.SANS.DFind:)
[May 11 17:47:18 2015] [error] [client 50.57.84.107] client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /w00tw00t.at.ISC.SANS.test0:)
[Jun 02 08:24:29 2015] [error] [client 95.211.37.204] client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /w00tw00t.at.ISC.SANS.DFind:)
[Aug 04 06:51:10 2015] [error] [client 95.211.37.224] client sent HTTP/1.1 request without hostname (see RFC2616 section 14.23): /w00tw00t.at.ISC.SANS.DFind:)
SSH attack monitoring example
#:~$ sudo tail /var/log/auth.log
May 30 07:01:17 serverX sshd[15598]: Failed password for root from 31.6.80.232 port 51975 ssh2
May 30 07:01:17 serverX sshd[15598]: Received disconnect from 31.6.80.232: 11: Bye Bye [preauth]
May 30 07:01:17 serverX sshd[15600]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=31.6.80.232 user=root
May 30 07:01:19 serverX sshd[15600]: Failed password for root from 31.6.80.232 port 52235 ssh2
May 30 07:01:19 serverX sshd[15600]: Received disconnect from 31.6.80.232: 11: Bye Bye [preauth]
May 30 07:01:20 serverX sshd[15602]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=31.6.80.232 user=root
May 30 07:01:22 serverX sshd[15602]: Failed password for root from 31.6.80.232 port 52505 ssh2
May 30 07:01:22 serverX sshd[15602]: Received disconnect from 31.6.80.232: 11: Bye Bye [preauth]
May 30 07:01:22 serverX sshd[15604]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=31.6.80.232 user=root
May 30 07:01:24 serverX sshd[15604]: Failed password for root from 31.6.80.232 port 52767 ssh2
May 30 07:01:24 serverX sshd[15604]: Received disconnect from 31.6.80.232: 11: Bye Bye [preauth]
May 30 07:01:25 serverX sshd[15606]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=31.6.80.232 user=root
May 30 07:01:27 serverX sshd[15606]: Failed password for root from 31.6.80.232 port 53009 ssh2
May 30 07:01:27 serverX sshd[15606]: Received disconnect from 31.6.80.232: 11: Bye Bye [preauth]
May 30 07:01:27 serverX sshd[15609]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=31.6.80.232 user=root
They are trying to log in as root with some common passwords.
Again, there is no risk of them compromising the system this way, since we don’t allow root login
(in /etc/ssh/sshd_config we have PermitRootLogin no).
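tail shows only the last few lines, so it helps to summarise the whole file and see how many failed attempts each address has made. The sketch below assumes the sshd message format shown above ("Failed password for ... from IP port ..."); failed_logins_by_ip is a hypothetical helper name.

```shell
#!/bin/sh
# failed_logins_by_ip: reads auth.log lines on stdin and prints
# "COUNT IP" for every source address with failed password attempts,
# highest count first.
failed_logins_by_ip() {
    # Keep only the address between "from" and "port" on matching lines,
    # then count how often each address occurs.
    sed -n 's/.*Failed password for .* from \([0-9a-fA-F.:]*\) port .*/\1/p' |
        awk '{ n[$1]++ } END { for (ip in n) print n[ip], ip }' |
        sort -rn
}

# Typical use on the server:
#   sudo cat /var/log/auth.log | failed_logins_by_ip
```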
To find out where the IP comes from, type:
#:~$ whois 31.6.80.232
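If the IP or the user does not look familiar, for example because the address belongs to a country or network you never expect logins from, you can block the IP by adding it to /etc/hosts.deny (this file is only consulted by services that support TCP wrappers):
#:~$ sudo vi /etc/hosts.deny
Add the line:
ALL: 31.6.80.232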
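When several addresses have to be blocked, editing the file by hand gets tedious and it is easy to add the same entry twice. A small helper can append the line only if it is missing. This is a sketch: block_ip is a hypothetical function, and the example runs against a scratch file rather than the real /etc/hosts.deny, which can only be written as root.

```shell
#!/bin/sh
# block_ip IP [FILE]
# Appends "ALL: IP" to FILE (default /etc/hosts.deny) unless that exact
# line is already present, so running it twice creates no duplicates.
block_ip() {
    ip=$1
    file=${2:-/etc/hosts.deny}
    line="ALL: $ip"
    # -x matches whole lines only, -F treats the dots literally, -q is quiet.
    grep -qxF "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Dry run against a scratch file instead of the real /etc/hosts.deny:
scratch=$(mktemp)
block_ip 31.6.80.232 "$scratch"
block_ip 31.6.80.232 "$scratch"   # second call changes nothing
cat "$scratch"                    # prints one line: ALL: 31.6.80.232
rm -f "$scratch"
```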
The server is swapping
When a server gets very slow, swapping may be the cause. Here is part of the output from the top command. The Mem row shows that the server is using almost all of its memory, and the Swap row shows that 164740k of swap is in use.
Mem: 8147096k total, 8108144k used, 38952k free, 28712k buffers
Swap: 4194296k total, 164740k used, 4029556k free, 2919400k cached
When a server runs out of memory and has a swap partition configured, it uses the swap partition as backup storage. If no swap space is available when memory runs out, the Linux kernel starts killing processes that use a lot of memory, which on this server usually means java first. If swap space is available, the kernel will not kill anything, but the server may become slow because it is busy reading from and writing to the disk.
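The amount of swap in use can also be read without top. On Linux, /proc/meminfo lists SwapTotal and SwapFree in kB, and the difference is the swap in use. A minimal sketch, with swap_used_kb as a hypothetical helper name and the sample numbers taken from the top output in this article:

```shell
#!/bin/sh
# swap_used_kb: reads /proc/meminfo-style text on stdin and prints the
# swap currently in use, in kB (SwapTotal minus SwapFree).
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free  = $2 }
         END { print total - free }'
}

# Sample input built from the Swap row shown in this article:
swap_used_kb <<'EOF'
SwapTotal:       4194296 kB
SwapFree:        4029556 kB
EOF
# prints 164740

# On a live system:
#   swap_used_kb < /proc/meminfo
```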
Checking the swap partition:
#:~$ cat /etc/fstab
It will show something like:
# /etc/fstab: static file system information.
#
# Use 'blkid -o value -s UUID' to print the universally unique identifier
# for a device; this may be used with UUID= as a more robust way to name
# devices that works even if disks are added and removed. See fstab(5).
#
# proc /proc proc nodev,noexec,nosuid 0 0
# / was on /dev/sda1 during installation
/dev/xvda1 / ext3 errors=remount-ro 0 1
/dev/xvda2 none swap sw 0 0
dev /dev tmpfs rw 0 0
Disable swap
#:~$ sudo /sbin/swapoff /dev/xvda2
Enable swap
#:~$ sudo /sbin/swapon /dev/xvda2
If there is not enough memory available, the server will refuse to deactivate the swap partition.
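Before running swapoff it is possible to estimate whether it can succeed by comparing the swap in use with the memory the kernel reports as available. The sketch below is only an approximation: can_swapoff is a hypothetical helper, and it relies on the MemAvailable field of /proc/meminfo, which exists on kernel 3.14 and later. On older kernels the field is missing and the check simply answers "no".

```shell
#!/bin/sh
# can_swapoff: reads /proc/meminfo-style text on stdin.
# Exit status 0 if MemAvailable is at least as large as the swap in use
# (SwapTotal - SwapFree), 1 otherwise.
can_swapoff() {
    awk '/^MemAvailable:/ { avail = $2 }
         /^SwapTotal:/    { total = $2 }
         /^SwapFree:/     { free  = $2 }
         END { used = total - free; exit (avail >= used ? 0 : 1) }'
}

# Typical use on the server (device name as in the fstab shown above):
#   can_swapoff < /proc/meminfo && sudo /sbin/swapoff /dev/xvda2
```

This is a rough guide rather than a guarantee, since memory use can change between the check and the swapoff call.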