Optimizing a Python application's Docker image with strip

I was going through the Docker images of some large applications and inspecting their layer sizes with dive. A lot of space was taken up by installed dependencies, both under /usr/local/lib/python3.10/dist-packages/ and in virtual environments. As you would expect, there were plenty of .py and .pyc files, but the largest files were the .so files. In one case, shared library files held 269M of compiled C and C++ code (find /usr/local/lib/python3.10/dist-packages/ -name "*.so" | xargs du -hsc). That seemed like too much, and I suspected that most of them still contained debug information. I don’t expect Python developers to run gdb and read backtraces of C++ code, so I tried removing the debug information with strip. And behold: the total dropped to 119M, saving about 150M.
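Before stripping anything, it can help to check which libraries actually still carry debug sections. The following is a minimal bash sketch, not the author's tooling: the function name so_report is hypothetical, it assumes binutils' readelf is installed (files readelf cannot parse are counted as having no debug info), and it sums apparent file sizes in bytes, which can differ slightly from the block-based numbers du reports.

```shell
# Hypothetical helper (bash): count *.so files under a directory, sum their
# apparent sizes in bytes, and count how many still contain a .debug_info
# section. Assumes binutils' readelf is on PATH.
so_report() {
  local dir="$1" lib count=0 total=0 with_debug=0
  # -print0 together with read -d '' keeps paths containing spaces intact.
  while IFS= read -r -d '' lib; do
    count=$((count + 1))
    total=$((total + $(wc -c < "$lib")))
    # DWARF debug info lives in .debug_* sections; .debug_info is the main one.
    # grep reads all of readelf's output (no -q), so readelf never gets SIGPIPE.
    if readelf -S --wide "$lib" 2>/dev/null | grep -F '.debug_info' > /dev/null; then
      with_debug=$((with_debug + 1))
    fi
  done < <(find "$dir" -type f -name '*.so' -print0)
  echo "files=$count bytes=$total with_debug_info=$with_debug"
}
```

Running so_report /usr/local/lib/python3.10/dist-packages before and after stripping shows both the byte savings and whether any debug sections remain. A library without .debug_info can still shrink, because plain strip also drops the regular symbol table (.symtab); the dynamic symbols a shared library needs for linking are kept.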

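So now I strip the shared libraries right after the pip install command in the Dockerfile, in the same RUN instruction (a separate RUN would only add a new layer on top of the one that still holds the unstripped files):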

find /usr/local/lib/python3.10/dist-packages/ -type f -name "*.so" -exec strip {} +
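To see the savings in the build log, the stripping step can be wrapped in a small bash function that reports the total .so size before and after. This is a sketch under assumptions: the names strip_shared_libs and so_total_bytes are hypothetical, binutils' strip must be installed in the image (slim base images may not ship it), and sizes are apparent byte counts rather than du's disk usage.

```shell
# Hypothetical helpers (bash). Assumes binutils' strip is on PATH.

so_total_bytes() {
  # Concatenate every regular *.so file under $1 and count the bytes.
  # Prints 0 when nothing matches; the arithmetic expansion strips any
  # padding wc might add.
  echo $(( $(find "$1" -type f -name '*.so' -exec cat {} + | wc -c) ))
}

strip_shared_libs() {
  local dir="$1" before after status=0
  before=$(so_total_bytes "$dir")
  # -exec ... {} + batches paths safely (spaces included) and never runs
  # strip when nothing matches; find exits non-zero if any strip call fails.
  find "$dir" -type f -name '*.so' -exec strip {} + || status=$?
  after=$(so_total_bytes "$dir")
  echo "before=$before after=$after saved=$((before - after))"
  return "$status"
}
```

Because the function returns find's failure status, chaining it with && after pip install stops the build if strip chokes on a file. A shell function only exists in the shell that defines it, so in a Dockerfile it would live in a script that is copied into the image and run in the same RUN instruction as pip install.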

Even for simple projects that use SQLAlchemy, it reduces the shared library size from 4.9M to 980K. I think this trick is worth keeping in my toolbelt.