Scientific Computing

Clean up unused files in Linux

Keep at least 10% of drive space free to avoid:

  • SSD wear
  • HDD fragmentation

Determine disk usage on Linux / macOS / Windows Subsystem for Linux with ncdu. ncdu uses an Ncurses terminal interface to quickly show the biggest files and directories in the filesystem tree, which makes it very handy for finding large files or directories that may be unneeded.
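
Where ncdu is not installed, a rough non-interactive approximation is possible with the POSIX tools du, sort and head. The following is only a sketch: the function name largest_entries is made up for illustration, sizes are in KiB, and the shell glob skips hidden (dot) entries.

```shell
# List the N largest entries directly under a directory, biggest first.
# largest_entries is a hypothetical helper name, not part of ncdu.
largest_entries() {
    dir=${1:-.}      # directory to inspect, default: current directory
    count=${2:-10}   # number of entries to show, default: 10
    # du -k -s prints one summarized total per entry, in KiB
    du -k -s "$dir"/* 2>/dev/null | sort -n -r | head -n "$count"
}

# Show the five largest entries of the current directory
largest_entries . 5
```

Unlike ncdu, this does not let you browse into subdirectories; rerun it on a subdirectory to drill down.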

df -h

gives a drive-level summary of disk usage.
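
To compare the df output with the 10% guideline above, the percentage can be extracted in a script. This is a minimal sketch, assuming the POSIX "df -P" output format, where the fifth column of the second line is the used capacity such as "45%". percent_free is a hypothetical helper name, and the result is approximate because filesystems may reserve some blocks.

```shell
# Print the approximate percentage of free space on the filesystem
# that holds the given path (default: current directory).
percent_free() {
    df -P "${1:-.}" | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

free=$(percent_free /)
if [ "$free" -lt 10 ]; then
    echo "Only ${free}% free on /: consider cleaning up"
else
    echo "${free}% free on /"
fi
```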

Package managers cache downloaded package files in case a reinstall is needed. Because the packages can be downloaded again if needed, clearing the cache is a safe way to save disk space. Clear the package cache for APT (common in Debian-based systems):

apt autoclean

or for DNF (Fedora, RHEL, CentOS):

dnf clean dbcache
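
Before clearing a cache, it can be useful to see how much space it occupies. A minimal sketch, assuming the default cache locations /var/cache/apt/archives (APT) and /var/cache/dnf (DNF 4); other versions or configurations may use different directories, and dir_size_kib is a hypothetical helper name.

```shell
# Print the total size of a directory in KiB, or 0 if it does not exist.
dir_size_kib() {
    if [ -d "$1" ]; then
        du -k -s "$1" 2>/dev/null | awk '{ print $1 }'
    else
        echo 0
    fi
}

# Assumed default cache directories; adjust for the system at hand
for cache in /var/cache/apt/archives /var/cache/dnf; do
    echo "$cache: $(dir_size_kib "$cache") KiB"
done
```

Running this before and after the clean command shows how much space was actually recovered.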

Remove unwanted packages

TeX Live documentation can consume a lot of disk space. To clean up the documentation, consider removing packages matching texlive-*doc. This also removes the texlive-full metapackage, but with no detriment to TeX Live working.

Packages removed for texlive-doc to save over 1 GB of disk space.


CMake FindOpenSSL hints

For all CMake find_*() commands and find modules such as FindOpenSSL, the package path can be hinted by setting an appropriate environment variable or CMake variable. This example supposes that the Homebrew package manager has installed OpenSSL, which the user wishes to use in a CMake project. To hint the package path when configuring a CMake project, either specify OpenSSL_ROOT by environment variable:

export OpenSSL_ROOT=$(brew --prefix openssl)

or directly in the CMake configure command:

cmake -B build -DOpenSSL_ROOT=$(brew --prefix openssl)

The example CMakeLists.txt:

cmake_minimum_required(VERSION 3.16)

project(f LANGUAGES NONE)

find_package(OpenSSL REQUIRED)

Use the --debug-find CMake option to see the paths CMake is searching.

To disable various search paths, consider the following CMake variables. These are normally only used for debugging or special cases.

set(CMAKE_FIND_USE_CMAKE_PATH false)
set(CMAKE_FIND_USE_CMAKE_SYSTEM_PATH false)
set(CMAKE_FIND_USE_CMAKE_ENVIRONMENT_PATH false)
set(CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH false)

Homebrew uses subdirectories to separate different versions of packages. CMake does not search recursively, as that would in general have no stopping condition and would at the least significantly slow down the search.

GitHub Actions Apple Silicon CPU

GitHub Actions macOS runners use Apple Silicon CPUs, which is what most Apple users have. Some build steps, including linking, have historically had Apple Silicon-specific issues. Generally it’s good to test on the same CPU architecture as the target platform.

We sometimes find it necessary to select the Xcode version compatible with Homebrew GCC if build errors occur that are not present on a physical Apple Silicon laptop.

jobs:

  mac:
    runs-on: macos-latest

    strategy:
      matrix:
        cxx: [g++-14, clang++]

    env:
      HOMEBREW_NO_AUTO_CLEANUP: 1
      CXX: ${{ matrix.cxx }}

    steps:
    - uses: actions/checkout@v4

    - run: sudo xcode-select --switch /Applications/Xcode_15.1.app

    - run: cmake --workflow --preset debug

    - run: cmake --workflow --preset release

In this example, the CMake workflow presets (which here use Ninja) enable quick testing of builds in both Debug and Release mode, which is important to catch bugs.

macOS WiFi BSSID scan

The undocumented, discontinued macOS command-line utility airport (not to be confused with the AirPort Utility app) gave detailed information about the current WiFi connection and nearby WiFi APs. This utility was located at /System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport.

Since airport was discontinued, obtaining the current BSSID requires using the CoreWLAN framework, as demonstrated in the Python project scan-wifi-python.

Apple provides a list of device WiFi support.

PowerShell tilde expansion

PowerShell tilde expansion was dropped in 7.4.0. Automatic variable $home remains available across operating systems.

ls $home

Tilde expansion was fraught with difficulties, which led the PowerShell maintainers to drop it, at least temporarily, in PowerShell 7.4.0.

Note that automatic variables exist only inside PowerShell itself; they are not environment variables. Thus, automatic PowerShell variables are generally not visible to other programs or scripts unless additional steps are taken to expose them, perhaps as a command line argument or environment variable.

Aspell don't backup

Aspell creates backup files with a .bak extension by default. To turn off the backup files, configure Aspell not to create them. Often there is no user configuration file “aspell.conf” present. Even if there is a config file present, it can be overridden by environment variable ASPELL_CONF:

export ASPELL_CONF="dont-backup;"

or do similarly through Control Panel in Windows.

Confirm the setting has taken effect by:

aspell dump config | more

and look for the line backup false.
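
The manual check above can also be scripted. This sketch assumes, as described above, that "aspell dump config" prints a line starting with "backup false" once the option is active. backup_disabled is a hypothetical helper that reads the dump on standard input.

```shell
# Succeed if the config dump on stdin contains an active "backup false" line.
backup_disabled() {
    grep -q -E '^backup[[:space:]]+false'
}

# Only query Aspell when it is installed.
if command -v aspell >/dev/null 2>&1; then
    if ASPELL_CONF="dont-backup;" aspell dump config | backup_disabled; then
        echo "Aspell backups are disabled"
    else
        echo "Aspell still creates .bak files"
    fi
fi
```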


Related: Aspell dictionary location

Clear temporary scratch files on HPC

Unix-like HPC systems often have shared temporary scratch directories mapped by environment variable $TMPDIR to a directory like “/scratch” or “/tmp”. $TMPDIR may be used for temporary files during build or computation. $TMPDIR is often shared among all users with no expectation of preservation or backup. If user files are left in $TMPDIR, the HPC system may email a periodic alert to the user.

If the user determines that $TMPDIR files aren’t needed after the HPC batch job completes, one can clear $TMPDIR files with a command near the end of the batch job script. Carefully consider whether this is appropriate for the specific use case, as the scratch files will be permanently deleted.

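rm -r -i "${TMPDIR:?}"

Verify that this deletes only the user’s own files, since each user has write permission only for their own files. Once this is established, to use this command in batch scripts replace the “-i” with “-f” to make it non-interactive, and append “2>/dev/null” to suppress errors about other users’ files. The redirection is left out of the interactive command because rm writes its “-i” prompts to standard error, and “${TMPDIR:?}” makes the shell stop with an error if TMPDIR is unset or empty.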
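
Since shared scratch space usually also holds other users’ files, an alternative sketch is to let find select only the entries owned by the current user. This assumes a find that supports -mindepth and -delete (GNU and BSD find do, but they are not in POSIX), and clear_own_scratch is a hypothetical helper name.

```shell
# Delete everything under a scratch directory that the current user owns,
# leaving the directory itself and other users' files in place.
clear_own_scratch() {
    scratch=${1:?usage: clear_own_scratch DIRECTORY}
    # -delete implies depth-first traversal, so files go before their directories
    find "$scratch" -mindepth 1 -user "$(id -u)" -delete 2>/dev/null
}

# Example for the end of a batch script (commented out: deletion is permanent):
# clear_own_scratch "$TMPDIR"
```

As with the rm command, try this on a test directory first, because the deleted files cannot be recovered.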

CMake TARGET_RUNTIME_DLL_DIRS for CTest

Building Windows shared libraries in general creates DLLs whose directory must be on environment variable PATH when the executable target is run. Windows error code -1073741515, corresponding to hex error code 0xc0000135, occurs when the necessary DLLs are not in the program’s working directory or on the PATH environment variable. This makes CTest tests fail with error code 135.
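
The two forms of the error code are the same 32-bit value: Windows documents 0xC0000135 as STATUS_DLL_NOT_FOUND, and reading those bits as a signed 32-bit integer gives -1073741515. A small sketch of the conversion, assuming a shell with at least 64-bit arithmetic; to_status_hex is a hypothetical helper name.

```shell
# Convert a signed 32-bit exit code to its unsigned hexadecimal form.
to_status_hex() {
    # Masking with 0xFFFFFFFF keeps the low 32 bits, i.e. the unsigned view
    printf '0x%08x\n' "$(( $1 & 0xFFFFFFFF ))"
}

to_status_hex -1073741515   # prints 0xc0000135
```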

The CMake generator expression TARGET_RUNTIME_DLL_DIRS along with test property ENVIRONMENT_MODIFICATION can be used to set the Path environment variable for the test, gathering all the directories of the DLLs CMake knows the target needs.

set_property(TEST adder PROPERTY ENVIRONMENT_MODIFICATION PATH=path_list_append:$<TARGET_RUNTIME_DLL_DIRS:main>)

This minimal example CMakeLists.txt uses the properties above so that the test runs correctly.

Limit code language standard

C++17 and C++20 standard code is used throughout projects of all sizes, perhaps with limited-feature fallback to older language standards. Some standards certifications require a specific language standard. High reliability and safety-critical projects may require specific language standards. Examples include FACE and MISRA C / C++.

To enforce a limit on the language standard, consider a check like the following in a header used throughout the project. This example limits the language standard to C++14 or earlier by halting the build if a higher standard is detected:

#if __cplusplus >= 201703L
#error "C++14 or earlier required"
#endif

For C code limited to, say, C99 or earlier, consider the following in a header used throughout the project, which halts the build if a higher standard is detected:

#if __STDC_VERSION__ >= 201112L
#error "C99 or earlier required"
#endif

Related: MSVC __cplusplus macro flag

CMake ignore Anaconda libraries and compilers

Running conda activate in Anaconda Python puts Conda directories first on environment variable PATH. This leads CMake to prefer finding Anaconda binaries (find_library, find_program, …) and Anaconda GCC compilers (if installed) over those in later directories on system PATH. Anaconda libraries and compilers are generally incompatible with the system or desired compiler. For certain libraries like HDF5, Anaconda is particularly prone to interfering with CMake.

Detect Anaconda environment by existence of environment variable CONDA_PREFIX.

Fix by adding code like the following to CMakeLists.txt.

ℹ️ Note

CMAKE_IGNORE_PREFIX_PATH does not take effect if set within Find*.cmake.

cmake_minimum_required(VERSION ...)

# ignore Anaconda compilers, which are typically not compatible with the system
if(DEFINED ENV{CONDA_PREFIX})
  set(CMAKE_IGNORE_PATH $ENV{CONDA_PREFIX}/bin)
endif()

project(example LANGUAGES C)

# Optional next two lines if needing Python in CMake project
unset(CMAKE_IGNORE_PATH)
find_package(Python ...)
# end optional lines

# exclude Anaconda directories from search
if(DEFINED ENV{CONDA_PREFIX})
  list(APPEND CMAKE_IGNORE_PREFIX_PATH $ENV{CONDA_PREFIX})
  list(APPEND CMAKE_IGNORE_PATH $ENV{CONDA_PREFIX}/bin)
  # need CMAKE_IGNORE_PATH to ensure system env var PATH
  # doesn't interfere despite CMAKE_IGNORE_PREFIX_PATH
endif()

To totally omit environment variable PATH from CMake find_* use CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH:

set(CMAKE_FIND_USE_SYSTEM_ENVIRONMENT_PATH false)

However, this can be too aggressive, i.e. it might miss other programs on PATH that are actually wanted.
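
A narrower alternative, sketched here in shell rather than CMake, is to strip only the Conda directories from PATH for the configure step. This is an illustration under assumptions, not something CMake provides: path_without_prefix is a hypothetical helper, and it assumes PATH entries contain no backslashes.

```shell
# Print a PATH-style list ($2) with every entry at or under a prefix ($1) removed.
path_without_prefix() {
    # With an empty prefix (e.g. CONDA_PREFIX unset) return the list unchanged
    [ -n "$1" ] || { printf '%s\n' "$2"; return 0; }
    printf '%s' "$2" | awk -v RS=':' -v prefix="$1" '
        # keep non-empty entries that are neither the prefix nor below it
        $0 != "" && $0 != prefix && index($0, prefix "/") != 1 {
            out = (out == "" ? $0 : out ":" $0)
        }
        END { print out }'
}

path_without_prefix /opt/conda "/opt/conda/bin:/usr/bin:/bin"   # prints /usr/bin:/bin

# Possible use for a single configure step (commented out):
# PATH=$(path_without_prefix "$CONDA_PREFIX" "$PATH") cmake -B build
```

This leaves the rest of PATH available to find_program, at the cost of handling the exclusion outside CMakeLists.txt.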