How to fix a network connection error in a Droplet
Last Friday, my Droplet lost the connection to the internet after a reboot. I tried to connect multiple times with SSH and it didn’t work. I checked the monitoring in DigitalOcean but I wasn’t able to see what was going on. Then, after a few tries and a reset of the root password, I was able to log into the droplet using the Recovery Console.
There was no internet connection, the eth0 network interface was gone from the list of network interfaces, and I discovered that some network-related packages were missing.
I am writing this post from a post-mortem perspective. I did not reproduce the problem again to check that every step is perfectly correct, so this is more a “how I solved it, and I hope it helps you” than a step-by-step recipe. But believe it or not, after hours of trying things, the solution was easier than expected.
What was happening? What was the error that you would see?
The droplet lost its connection to the network and was not responding to any request. The public IP was still configured in DigitalOcean, but neither my domains nor the IP address responded. Identifying the scenario wasn’t a big deal; you don’t need to be an expert sysadmin to notice it.
The only way to connect to my droplet was the “Recovery Console”; nothing else worked. So I reset my root password and logged into the droplet. The console only works with a US keyboard layout, it was really slow, and typing, copying and pasting, or editing a file was a pain. Besides that, when I tried to change the keyboard layout, I found out that some programs were missing.
How did it happen?
My best guess is that I accidentally deleted some packages, or they got deleted some other way. Reading about what caused the problem for others, most of them described an apt-get purge that accidentally removed the network tools. Others reported that the issue appeared after a reboot because some packages had been deleted, without ever finding the root cause of the deletion.
I thought about the last things I had done in the droplet and remembered an upgrade of Python from version 3.8 to 3.9 and some clean-up of other versions. I remembered it because I had trouble with Apache, mod_wsgi, pip, and a Django application not being able to find the correct version of Python. That story is for another post; I fixed the installation in the end, but I remember running an apt-get purge. I guess that was the root of my Friday nightmare.
Anyway, the issue only appeared after rebooting the machine; I didn’t notice anything before that.
Why was it so painful? And why did I decide to write it down?
I thought I had lost all the data in the Droplet. There was no easy way to copy the data out without first fixing the network interface. Then I realized that most of the tools that could help me were missing, which made it more difficult. All the posts I was able to find solved the problem only partially, because the final step always required a tool that was missing from my system.
Missing tools: netplan, networking, ifconfig, cloud-init, …
The only tool available to me was ip.
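A quick way to see which of these tools are actually present is to ask the shell about each command. This is a minimal sketch rather than the exact commands used at the time; missing_tools is a hypothetical helper name. Note that “networking” is a service rather than a command, so it needs a different check (for example with systemctl).

```shell
# Print every command name from the argument list that is NOT on PATH.
# missing_tools is a hypothetical helper, not part of any package.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Example: missing_tools netplan ifconfig cloud-init ip
```

Anything the function prints is missing; no output means every command was found.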
The first thing was to discover what was happening. I checked that the network interface eth0 was not showing up, and I found a post in the forums where they described an issue similar to mine.
sudo ip link set eth0 up
Cannot find device "eth0"
dmesg | grep -i eth
It gave me messages similar to these:
virtio_net virtio0 ens3: renamed from eth0
virtio_net virtio0 eth0: renamed from ens3
It was the same thing another person had described in their question.
My system runs Ubuntu Server 20.04 LTS, so the details in that question helped me bring up the eth0 interface, but bringing the interface up did not fix the issue. Later, I compared it with a working setup and found that some details were different.
In every case, the solutions described only worked temporarily, because the changes disappeared after the next reboot.
All the solutions I found had to be translated to the “ip” tool; it looks like “ifconfig” stopped being a default package a long time ago.
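Since the kernel can rename the interface more than once during boot, it helps to know which name it ended up with. The sketch below reads dmesg-style lines and prints the name from the last rename message; last_renamed_iface is a hypothetical helper, and the parsing assumes the message format shown above.

```shell
# Read dmesg-style lines on stdin and print the name the interface
# ended up with after the last "renamed from" message.
# last_renamed_iface is a hypothetical helper name.
last_renamed_iface() {
    awk '
        {
            for (i = 2; i <= NF; i++) {
                if ($i == "renamed" && $(i + 1) == "from") {
                    name = $(i - 1)
                    sub(/:$/, "", name)   # drop the trailing colon
                }
            }
        }
        END { if (name != "") print name }
    '
}

# Example: dmesg | last_renamed_iface
# Cross-check with: ip -br link show
```

If the printed name is not eth0, commands and configuration files that refer to eth0 will not match the interface.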
One thing that I tried and failed:
ip a add <PUBLIC_IP_ADDRESS>/<NETMASK> dev eth0
ip link set dev eth0 up
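One thing these two commands leave out is a default route: an address on the interface alone does not tell the kernel where to send traffic for the rest of the internet. A hedged sketch of a fuller temporary setup follows. The function only prints the commands so they can be reviewed first; print_static_ip_cmds is a hypothetical name, the addresses in the usage line are documentation placeholders, and the gateway has to be the one DigitalOcean shows for the Droplet. Whether this would have been enough in this case is untested, and like the attempt above it does not survive a reboot.

```shell
# Print the iproute2 commands for a temporary static setup.
# Arguments: <address> <prefix-length> <gateway> [device]
# print_static_ip_cmds is a hypothetical helper; it runs nothing itself.
print_static_ip_cmds() {
    addr=$1
    prefix=$2
    gateway=$3
    dev=${4:-eth0}
    echo "ip addr add $addr/$prefix dev $dev"
    echo "ip link set dev $dev up"
    echo "ip route add default via $gateway dev $dev"
}

# Example with placeholder values (203.0.113.0/24 is reserved for documentation):
# print_static_ip_cmds 203.0.113.10 24 203.0.113.1
```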
I also modified the /etc/network/interfaces file, adding the below information:
iface eth0 inet static
but I wasn’t able to execute:
sudo systemctl restart networking.service
so I wasn’t able to apply the configuration. That is when I discovered that “networking” was not installed.
Nothing worked for me.
Strangely, the configuration files /etc/udev/rules.d/70-persistent-net.rules and /etc/netplan/50-cloud-init.yaml were there and correctly configured.
How did I finally solve it?
At some point in the long process, I found something interesting in the DigitalOcean documentation: you can boot your Droplet from a Recovery ISO and then access your Droplet's hard disk. That was the key to finding the solution.
I followed the process to start up the Droplet from the Recovery ISO. Then I connected to the Droplet via SSH and started to work remotely.
I added a nameserver by editing the file below:
sudo vim /etc/resolv.conf
I used one of Google's well-known public DNS servers.
Then I mounted my droplet's hard drive and prepared a chroot, as described on Stack Overflow:
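The exact line that went into the file is not shown here. As an illustration, Google's public resolvers are 8.8.8.8 and 8.8.4.4, and an entry for the first one could be written as in the sketch below. write_nameserver is a hypothetical helper, and which of the two addresses was actually used is not stated, so the default is an assumption.

```shell
# Write a single nameserver entry to a resolv.conf-style file.
# Arguments: <file> [resolver-ip]; the default 8.8.8.8 is an assumption.
# Note: this overwrites the file instead of appending to it.
write_nameserver() {
    file=$1
    resolver=${2:-8.8.8.8}
    printf 'nameserver %s\n' "$resolver" > "$file"
}

# Example (as root, inside the Recovery ISO environment):
# write_nameserver /etc/resolv.conf
```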
sudo mount --bind /dev /<chrootlocation>/dev
sudo mount --bind /proc /<chrootlocation>/proc
sudo mount --bind /sys /<chrootlocation>/sys
sudo cp /etc/resolv.conf /<chrootlocation>/etc/resolv.conf
sudo chroot /<chrootlocation>
In my case <chrootlocation> was “mnt”.
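The bind mounts above assume the Droplet's root filesystem is already mounted at the chroot location. A sketch of the whole sequence, including that first mount and the clean-up afterwards, could look like the following. The device name /dev/vda1 is an assumption (check with lsblk, and skip the first mount if the recovery environment has already mounted the disk), chroot_setup and chroot_teardown are hypothetical names, and with DRY_RUN=1 the functions only print what they would run.

```shell
# Run a command, or just print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the Droplet disk and the pseudo-filesystems the chroot needs.
# Arguments: [device] [target]; /dev/vda1 and /mnt are assumed defaults.
chroot_setup() {
    device=${1:-/dev/vda1}
    target=${2:-/mnt}
    run mount "$device" "$target" &&
    run mount --bind /dev "$target/dev" &&
    run mount --bind /proc "$target/proc" &&
    run mount --bind /sys "$target/sys" &&
    run cp /etc/resolv.conf "$target/etc/resolv.conf"
}

# Undo the mounts in reverse order after leaving the chroot.
chroot_teardown() {
    target=${1:-/mnt}
    run umount "$target/sys" &&
    run umount "$target/proc" &&
    run umount "$target/dev" &&
    run umount "$target"
}

# Example (as root): chroot_setup /dev/vda1 /mnt && chroot /mnt
# After exiting the chroot: chroot_teardown /mnt
```

Unmounting in reverse order before the reboot avoids leaving the disk busy when the Recovery ISO is detached.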
After that, I updated and upgraded the system with apt-get (apt-get update, then apt-get upgrade).
I am not sure I would recommend the latter.
After that, I decided to install all the tools that I had found missing and that could help me fix the issue:
apt-get install netplan cloud-init ufw landscape-common
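Before rebooting, it can be worth confirming that the packages really are installed inside the chroot. A minimal sketch using dpkg-query follows; missing_packages is a hypothetical helper. On Ubuntu 20.04 the netplan tooling is, as far as I know, packaged under the name netplan.io, so that is the name used in the usage line; treat it as an assumption and check with apt-cache search netplan.

```shell
# Print every package from the argument list that is not fully installed.
# A package that was removed but left its config files behind counts as missing.
# missing_packages is a hypothetical helper name.
missing_packages() {
    for pkg in "$@"; do
        status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)
        case $status in
            "install ok installed") ;;
            *) echo "$pkg" ;;
        esac
    done
}

# Example: missing_packages netplan.io cloud-init ufw landscape-common
```

No output means every listed package is installed.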
When I felt confident enough, I stopped the Droplet, removed the Recovery ISO, and set it to boot from my hard drive again.
When the droplet started up, the network was restored and all was working normally. I was able to connect via SSH and my domains were working as before.
My conclusion is that the configuration files were all in place and correct, but the tools that read them were missing. Once the tools were restored, the system started to work again.
Back up your server often. This is a cheap “production” server, where I have my blog and some Python applications running, and my strong recommendation is not to have this kind of setup.
One server with all the stuff is not a good idea. I do it because it is an easy way to play with things, but from time to time these funny stories happen.