Set up caching for apt through a Squid proxy.
Local APT usage with a Proxy
We now have a proxy that can cache, provided any proxy-aware command such as wget is given the http_proxy value, either inline when calling it or by exporting it beforehand. Ubuntu is a Debian-based distribution, and its package manager apt can use the proxy too: apt obeys the http_proxy variable and responds to it the same way wget did.
Optionally we can first install a package. I’m testing with neovim, as it pulls in a few dependencies and does not require a graphical shell in our container.
Beware: know how your package management system behaves. In my case, apt install does not treat downloaded package files the same as apt-get install. apt’s default is to remove *.deb files after a successful install, whereas apt-get keeps *.deb files in the archive. I discovered that the resident /var/cache/apt/archives/*.deb files were due to previous calls to apt-get install, not calls to apt install. apt’s behavior was ideal for me, as I didn’t need to worry about disabling the internal caching mechanism before starting. PS I like this article’s explanation here: removing-packages-and-configurations-with-apt-get.
As a first test we can install normally, direct from the internet. I’d like to first clean the APT cache (APT includes apt, apt-get, apt-cache, apt-key etc.):
On your system you can find where APT caches your .deb files by concatenating the values in the output below: Dir + Dir::Cache + Dir::Cache::archives:
$ sudo apt-config dump | \
grep '^Dir \|Dir::Cache \|Dir::Cache::archives '
Dir "/";
Dir::Cache "var/cache/apt";
Dir::Cache::archives "archives/";
Hence on my system the .deb files are saved at /var/cache/apt/archives/*.deb
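The concatenation above can also be scripted. The following is a minimal sketch, assuming the three values are relative paths laid out as in the dump shown above; apt_archives_dir is a hypothetical helper name of mine, not part of APT.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the archive path from `apt-config dump`-style lines on stdin.
# apt_archives_dir is a hypothetical helper name, not an APT command.
# Assumes Dir::Cache and Dir::Cache::archives are relative, as in the dump above.
apt_archives_dir() {
  awk -F'"' '
    $1 == "Dir "                  { root = $2 }
    $1 == "Dir::Cache "           { cache = $2 }
    $1 == "Dir::Cache::archives " { archives = $2 }
    END {
      sub(/\/+$/, "", cache)      # avoid a doubled slash when joining
      print root cache "/" archives
    }
  '
}

# Demo with the three values shown above; prints /var/cache/apt/archives/
printf '%s\n' 'Dir "/";' 'Dir::Cache "var/cache/apt";' 'Dir::Cache::archives "archives/";' \
  | apt_archives_dir
```

On a real system the input would come from the same command as above: sudo apt-config dump | apt_archives_dir.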
If you’ve previously run apt you’ll likely find some files sitting there:
$ sudo ls /var/cache/apt/archives/*.deb
...
javascript-common_11_all.deb
libluajit-5.1-common_2.1.0~beta3+dfsg-5.1_all.deb
libluajit-5.1-2_2.1.0~beta3+dfsg-5.1_amd64.deb
python-trollius_2.1~b1-5_all.deb
libtermkey1_0.20-3_amd64.deb
libvterm0_0~bzr718-1_amd64.deb
libunibilium4_2.0.0-4_amd64.deb
python-msgpack_0.5.6-1build2_amd64.deb
python3-msgpack_0.5.6-1build2_amd64.deb
libmsgpackc2_3.0.1-3_amd64.deb
...
APT provides a cache cleaner, to see what it will clean:
$ sudo apt clean --dry-run
Del /var/cache/apt/archives/* /var/cache/apt/archives/partial/*
Del /var/lib/apt/lists/partial/*
Del /var/cache/apt/pkgcache.bin /var/cache/apt/srcpkgcache.bin
Running sudo apt clean will render those directories empty.
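To confirm that the archive directory really is empty, counting the *.deb files is less error prone than reading an ls error message. This is a small sketch; count_debs is a hypothetical helper name, and the default path is the one found on my system above, so adjust it if yours differs.

```shell
#!/usr/bin/env bash
# Sketch: count the *.deb files sitting directly in an APT archive directory.
# count_debs is a hypothetical helper; the default path is an assumption
# taken from the apt-config output earlier in this article.
count_debs() {
  local dir=${1:-/var/cache/apt/archives}
  # -maxdepth 1 skips the partial/ sub directory
  find "$dir" -maxdepth 1 -type f -name '*.deb' 2>/dev/null | wc -l
}
```

Calling count_debs straight after sudo apt clean should report 0, and calling it again after an apt-get install should report the number of packages that were downloaded and kept.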
Let’s use apt-get to install neovim, and we will see the cached *.deb files…
$ sudo apt-get install --yes neovim; #lots of output...
#Notes during install:
#“Get” = http GET request
#“Selecting...unselected” = package was not previously installed (selected) on this system
#“Unpacking” = extracting from .deb
#“Setting up” = auto generating .conf files, moving binaries around.
#“Processing triggers” = runs deferred tasks registered by other packages (e.g. man-db). Prevents restart need.
$ sudo ls /var/cache/apt/archives/*.deb
...
/var/cache/apt/archives/javascript-common_11_all.deb
/var/cache/apt/archives/libjs-jquery_3.3.1~dfsg-3_all.deb
/var/cache/apt/archives/libjs-sphinxdoc_1.8.5-3_all.deb
/var/cache/apt/archives/libjs-underscore_1.9.1~dfsg-1_all.deb
...
Now we can uninstall the neovim program. In this case apt and apt-get achieve the same result, so you can use either:
$ sudo apt remove --purge --yes neovim; #purge removes neovim config
$ sudo apt autoremove --yes; #removes dependencies that are no longer required.
$ sudo ls /var/cache/apt/archives/*.deb; #should show no files.
Let’s now do the same but with apt install. First I want to demonstrate that we get no APT cache with a standard apt install:
$ sudo apt install --yes neovim; # you'll see lots of "Get" actions.
#meaning that apt is reaching out to the internet
$ sudo ls /var/cache/apt/archives/*.deb
ls: cannot access '/var/cache/apt/archives/*.deb': No such file or directory
$ sudo apt remove --purge --yes neovim; #dependent binaries remain
$ sudo apt install --yes neovim; # this time only one "Get"
$ sudo apt remove --purge --yes neovim;
$ sudo apt -o APT::Keep-Downloaded-Packages="true" \
install --yes neovim; #you will see a single download
#and Download rate summary:
#Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe
# amd64 neovim amd64 0.3.8-1 [1,263 kB]
#Fetched 1,263 kB in 1s (1,798 kB/s)
$ ls -l /var/cache/apt/archives/neovim_*.deb; #file in APT cache.
-rw-r--r-- 1 root root 1263436 Jul 24 2019 /var/cache/apt/archives/neovim_0.3.8-1_amd64.deb
$ sudo apt remove --purge --yes neovim; #now remove keeping cache.
$ sudo apt install --yes neovim; # this won't cache any files but
# apt *will* use the existing neovim.deb file
# you will notice *no* "Get" statement in the install logs!
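Rather than scanning the install output by eye, the "Get" lines can be counted. A minimal sketch, assuming the install output uses the "Get:N" prefix shown in the comments above; count_gets is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: count the "Get:N ..." download lines in install output read from stdin.
# count_gets is a hypothetical helper name, not an APT command.
count_gets() {
  # grep -c exits non-zero when it counts 0 matches, so mask that status
  grep -c '^Get:[0-9]' || true
}
```

For example, sudo apt install --yes neovim | count_gets would report 0 when the package came from the local APT archive. Note that apt prints a warning on stderr about its unstable CLI when its output is piped; that warning does not affect the count.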
Note: apt’s option directives can be written either way:
sudo apt -o APT::Keep-Downloaded-Packages="true" \
install --yes neovim;
sudo apt -o 'APT::Keep-Downloaded-Packages=true' \
install --yes neovim;
You can override the default config behavior and keep the *.deb files either with an option value at install time, or by adding a new directive to the persistent config:
#optional
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' \
| sudo tee /etc/apt/apt.conf.d/01keep-debs
Great, so now you know exactly how to ensure you haven’t inadvertently used the APT cache, and we can focus on using the Squid proxy. Important notes:
Caching APT downloads only works with the http protocol. https is encrypted and cannot be inspected with Squid or any proxy. This is by design, so that clients are guaranteed secrecy. Hence the apt application requires the http_proxy directive, NOT the https_proxy directive.
Squid cache should be cleaned and inspected.
Squid’s cache cannot be reset on a running instance. Hence Squid is often load balanced to permit high availability. You can however do a hot config reload with squid -k reconfigure. I’ve not tried this, but it would likely be very quick, requiring a new pre-built cache_dir before the hot swap with reconfigure. In my case I will shut down Squid first. A further note: systemctl stop is a blocking command, whereas systemctl start may return before Squid is ready, so polling to wait for Squid to start is required if automating this. I use:
#!/usr/bin/env bash
function process_wait(){
  local proc=$1
  #local status=$2 #active/inactive
  local status='active'
  # poll once a second until systemd reports the unit as active
  while [[ $(sudo systemctl is-active "${proc}") != "${status}" ]]
  do
    sleep 1
    echo 'sleeping'
  done
}
set -x
sudo systemctl stop squid
sudo systemctl status squid
sudo rm -rf /var/spool/squid/*
sudo squid -zSF #reset the index with "z"
sudo squid -k shutdown # I prefer starting with systemctl
sudo systemctl start squid
process_wait 'squid'
sudo systemctl status squid
sudo find /var/spool/squid/ -type f -ls #you should see no files
set +x
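The polling loop above waits forever if Squid never comes up. A variant with a timeout might look like the following sketch; wait_until is a hypothetical helper name, and the 30 second budget in the usage note is an arbitrary choice of mine.

```shell
#!/usr/bin/env bash
# Sketch: retry a command once a second until it succeeds or a timeout expires.
# wait_until is a hypothetical helper name, not part of systemd or Squid.
# Usage: wait_until <timeout_seconds> <command> [args...]
wait_until() {
  local timeout=$1
  shift
  local waited=0
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

In the script above, process_wait 'squid' could then be swapped for wait_until 30 sudo systemctl is-active --quiet squid, which returns non-zero instead of hanging when the service fails to start.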
We are confident that apt remove --purge followed by apt clean empties the APT cache. Do this now, before we start using apt via Squid.
$ sudo apt remove --purge --yes neovim;
$ sudo apt clean; #empty the APT cache
Also, by calling apt without the cache option directive, apt won’t store a *.deb file in the archive. Note again that we use the http, not the https, protocol directive. And because we pass the proxy config to a superuser shell when we call apt, we must stop sudo from stripping the http_proxy variable from the subprocess call; hence we use sudo -E, which pulls through the environment of the parent shell.
http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim
$ sudo tail /var/log/squid/access.log
#filenames truncated for neatness
1586606592.437 334 127.0.0.1 TCP_MISS/200 1263816
GET http://.../neovim_0.3.8-1_amd64.deb -
HIER_DIRECT/202.158.214.106 application/x-troff-man
Importantly, above you can see the TCP_MISS result, which indicates that Squid saw the request but did not find the file in its cache, and so retrieved it externally with an http GET request. I recommend converting the epoch-seconds prefix to a readable timestamp with perl.
$ sudo cat /var/log/squid/access.log | \
perl -p -e 's/^([0-9]*)/"[".localtime($1)."]"/e'
#filenames truncated for neatness
[Sat Apr 11 20:03:12 2020].437 334 127.0.0.1 TCP_MISS/200 1263816
GET http://.../neovim_0.3.8-1_amd64.deb -
HIER_DIRECT/202.158.214.106 application/x-troff-man
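When the log grows beyond a handful of lines, the result codes can be tallied instead of read one by one. This is a minimal sketch, assuming each log entry is a single line in the layout shown above, with the result code and HTTP status in the fourth column (for example TCP_MISS/200); tally_squid_results is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: tally Squid result codes from access.log lines read on stdin.
# tally_squid_results is a hypothetical helper name, not a Squid tool.
# Assumes field 4 looks like "TCP_MISS/200", as in the log lines above.
tally_squid_results() {
  awk 'NF >= 4 { split($4, part, "/"); count[part[1]]++ }
       END { for (code in count) print code, count[code] }' | sort
}
```

Usage would be along the lines of sudo cat /var/log/squid/access.log | tally_squid_results, which prints one line per result code followed by the number of requests that ended with it.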
Now let’s check for the neovim_0.3.8-1_amd64.deb file in the APT cache; it should NOT exist.
$ ls /var/cache/apt/archives/
lock partial/
But we know the file is 1263436 bytes and that Squid made a “Get” request. Hence it should appear in the Squid cache…
$ sudo find /var/spool/squid/ -type f -ls
541991 4 -rw-r----- 1 proxy proxy 144 Apr 12 12:58 /var/spool/squid/swap.state
541990 1236 -rw-r----- 1 proxy proxy 1263880 Apr 12 12:58 /var/spool/squid/00/00/00000000
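The cached object is 1263880 bytes, slightly more than the 1263436-byte package, which would be consistent with Squid storing its own metadata and the HTTP response headers ahead of the body. Assuming that layout (I have not verified it here), a checksum of just the trailing bytes can be compared against the checksum of the original package. cached_body_md5 is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: md5 of the last N bytes of a Squid cache object.
# cached_body_md5 is a hypothetical helper name, not a Squid tool.
# Assumption: the response body sits at the end of the cache file,
# after Squid metadata and HTTP headers.
# Usage: cached_body_md5 <cache_file> <body_size_in_bytes>
cached_body_md5() {
  local cache_file=$1 body_bytes=$2
  tail -c "$body_bytes" "$cache_file" | md5sum | cut -d' ' -f1
}
```

Because the cache files are readable only by the proxy user, the same pipeline would need sudo on a real system, for example sudo tail -c 1263436 /var/spool/squid/00/00/00000000 | md5sum, and the result can be compared with md5sum run on a kept copy of the .deb.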
The Linux file command cannot determine that it is in fact a Debian package file; it simply identifies it as “data”, but it is roughly the same size (one way to confirm this could be to compare md5sum signatures). Now let’s remove the neovim application again and reinstall it through the proxy.
sudo apt remove --purge --yes neovim;
http_proxy=http://127.0.0.1:3128/ sudo -E apt install --yes neovim;
Calling an install again, APT reports during install:
...
Get:1 http://au.archive.ubuntu.com/ubuntu eoan/universe amd64 neovim amd64 0.3.8-1 [1,263 kB]
Fetched 1,263 kB in 0s (63.2 MB/s)
...
Although apt reports that it reached out externally to retrieve the package, inspecting the Squid access.log shows that Squid in fact had a cache “HIT”, finding the file locally and forwarding it to the APT application.
$ sudo tail -n1 /var/log/squid/access.log
1586669427.684 2 127.0.0.1 TCP_HIT/200 1263825
GET http://au.archive.ubuntu.com/ubuntu/pool/universe/n/neovim/neovim_0.3.8-1_amd64.deb -
HIER_NONE/- application/x-troff-man
Nice! We hoped for either TCP_HIT (served from Squid’s cache files) or TCP_MEM_HIT (served from Squid-owned system memory). So Squid has pulled the file directly from its cache, serving it to APT transparently. We can now make this setting permanent with:
cat <<EOF | sudo tee /etc/apt/apt.conf.d/50proxy
Acquire {
HTTP::proxy "http://127.0.0.1:3128";
}
EOF
We’ve now seen how to install, test and debug Squid proxy settings with apt. Next we will introduce the settings to a Docker environment. It is a bit simpler but requires a little networking knowledge.