Scientific Computing

Dynamic libraries and CMake

On Unix-like platforms, the CMake variable CMAKE_DL_LIBS holds the name of the library, if any, that provides functions like “dlopen” and “dladdr”; pass it to target_link_libraries(). Some libdl functions, such as “dladdr”, additionally require defining “_GNU_SOURCE”, like:

add_library(mylib SHARED mylib.c)
target_link_libraries(mylib PRIVATE ${CMAKE_DL_LIBS})
target_compile_definitions(mylib PRIVATE _GNU_SOURCE)

On Windows, different mechanisms can be used to access dynamic libraries. With MSYS2, libdl is available via the package mingw-w64-ucrt-x86_64-dlfcn.

Run path (Rpath)

On Unix-like systems, the run path (Rpath) is the search path a binary uses to find libraries at runtime. Windows has no separate Rpath; only PATH is used, so the necessary .dll files must be on the PATH environment variable when the binary runs. On Unix-like systems life can be easier since the Rpath is compiled into the binary. Optionally, using $ORIGIN in the Rpath allows relocating binary packages.

In CMake, the following can be set unconditionally; no “if()” statements are needed.

include(GNUInstallDirs)

# must be set before any targets are defined
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS true)

Note that we use CMake defaults (we do NOT set values for these) for the following to avoid problems on HPC where library modules may be loaded dynamically. Instead we allow the end user to set these variables in the top level executable.

CMAKE_INSTALL_NAME_DIR
CMAKE_INSTALL_RPATH
CMAKE_INSTALL_RPATH_USE_LINK_PATH

Exporting symbols for MSVC-based compilers is necessary to generate an “example.lib” corresponding to the “example.dll”.

To have the installed CMake binaries work correctly, it’s necessary to set the install prefix (CMAKE_INSTALL_PREFIX) at configure time. That is, the configure-build-install sequence for a shared library project in CMake is like:

cmake -Bbuild -DBUILD_SHARED_LIBS=on --install-prefix=/opt/my_program

cmake --build build

cmake --install build

In general the rpath in a binary can be checked like:

  • Linux: readelf -d /path/to/binary | head -n 25
  • macOS: otool -l /path/to/binary | tail
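The Linux check above can also be scripted. The following is a minimal sketch, assuming the usual “readelf -d” output in which the RPATH or RUNPATH entry appears in square brackets; the function names “parse_readelf_rpath” and “binary_rpath” are hypothetical and not part of any library.

```python
import re
import subprocess

# Matches readelf -d lines such as:
#  0x000000000000001d (RUNPATH)   Library runpath: [$ORIGIN/../lib]
_RPATH_LINE = re.compile(r"\((RPATH|RUNPATH)\).*\[(.*)\]")


def parse_readelf_rpath(text: str) -> list[str]:
    """Return the run path entries found in `readelf -d` output text."""
    paths: list[str] = []
    for line in text.splitlines():
        match = _RPATH_LINE.search(line)
        if match:
            # entries are colon-separated, like PATH on Unix-like systems
            paths.extend(p for p in match.group(2).split(":") if p)
    return paths


def binary_rpath(binary: str) -> list[str]:
    """Run readelf on an ELF binary (Linux) and return its run path entries."""
    result = subprocess.run(
        ["readelf", "-d", binary], capture_output=True, text=True, check=True
    )
    return parse_readelf_rpath(result.stdout)
```

Calling binary_rpath("/path/to/binary") returns an empty list when the binary has no Rpath compiled in.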


ssize_t for Visual Studio

The POSIX C type ssize_t is available on Unix-like systems in <sys/types.h>. Windows Visual Studio BaseTsd.h has SSIZE_T.

However, ssize_t is part of POSIX, not the C standard. In C and C++ it’s possible to define a signed size type “ssize_t” in terms of “ptrdiff_t”. Using ptrdiff_t instead of ssize_t is the practice of major projects like Emacs.

The C and C++ standards guarantee that size_t has a bit width of at least 16.

The C++ standard guarantees that ptrdiff_t has a bit width of at least 17. The C standard required at least 17 bits from C99 through C17 and relaxed this to at least 16 bits in C23.

This example shows how to use ssize_t across computing platforms.


Related: C++ size_type property vs size_t

xarray to_netcdf() file compression

As with HDF5 and h5py, using xarray to_netcdf() to write netCDF files can losslessly compress Datasets and DataArrays, but file compression is off by default. Each data variable must have the compression option set to take effect. We typically only compress variables of 2-D or higher rank.

Notes:

  • Specify format="NETCDF4", engine="netcdf4" to allow a broader range of data types.
  • If “chunksizes” is not set, the data variable will not compress. We arbitrarily made the chunk sizes half of each dimension, but this can be optimized for particular data.
  • “fletcher32” is a checksum that can be used to detect data corruption.
  • Attributes set in “.attrs” of a data variable are written to the netCDF file as well. This is useful to note physical units, for example.

from pathlib import Path
import xarray


def write_netcdf(ds: xarray.Dataset, out_file: Path) -> None:
    enc = {}

    for k in ds.data_vars:
        if ds[k].ndim < 2:
            continue

        enc[k] = {
            "zlib": True,
            "complevel": 3,
            "fletcher32": True,
            # half of each dimension, but never less than 1
            "chunksizes": tuple(max(1, n // 2) for n in ds[k].shape),
        }

    ds.to_netcdf(out_file, format="NETCDF4", engine="netcdf4", encoding=enc)
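When tuning chunk sizes for particular data, it can help to separate the chunk heuristic from the writing code. This is only a sketch using the standard library; “half_chunks” and “chunk_nbytes” are hypothetical helper names, and the clamp to 1 is an added guard for dimensions of length 1.

```python
import math


def half_chunks(shape: tuple[int, ...]) -> tuple[int, ...]:
    """Chunk sizes of half each dimension, never smaller than 1."""
    return tuple(max(1, n // 2) for n in shape)


def chunk_nbytes(chunks: tuple[int, ...], itemsize: int) -> int:
    """Uncompressed size in bytes of one chunk with the given element size."""
    return math.prod(chunks) * itemsize


# Example: a (time, y, x) variable of 8-byte floats
chunks = half_chunks((1, 512, 1024))
print(chunks, chunk_nbytes(chunks, 8))
```

Comparing the chunk byte size against typical read patterns is one way to judge whether halving each dimension is a reasonable choice for a given dataset.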

Read image metadata with Python

The Python imageio package reads and writes numerous image formats and their metadata. The time and location of citizen science images are often critical to their interpretation. Not all cameras have GPS modules, and not all cameras have their clocks (including time zone) set accurately enough.

A typical metadata item of interest is “DateTimeOriginal”. Its definition and accuracy are up to the camera implementation.

We show the reading of image metadata in a few distinct ways.

ImageIO read metadata

Get the image time using imageio.immeta:

import imageio.v3 as iio

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

meta = iio.immeta(fn)

for k in ("DateTimeOriginal", "DateTimeDigitized", "DateTime"):
    print(k, meta.get(k))

Consider that the timezone may need to be corrected.
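A time zone correction might look like the sketch below. It assumes the timestamp string uses the EXIF text layout “YYYY:MM:DD HH:MM:SS” and that the camera clock’s offset from UTC is known by other means; the function name “exif_time_to_utc” and the offset parameter are illustrative, not part of imageio.

```python
from datetime import datetime, timedelta, timezone

# EXIF stores timestamps as text with colons in the date part
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def exif_time_to_utc(stamp: str, utc_offset_hours: float) -> datetime:
    """Interpret an EXIF timestamp as local camera time and convert to UTC.

    utc_offset_hours is the assumed offset of the camera clock from UTC,
    for example -4 for a camera set to a zone four hours behind UTC.
    """
    local = datetime.strptime(stamp, EXIF_FORMAT)
    local = local.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    return local.astimezone(timezone.utc)


print(exif_time_to_utc("2023:06:01 20:30:00", -4))
```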

ExifRead metadata

The ExifRead Python module is powerful for reading EXIF image metadata.

If the camera had a GPS module, the location may be available. An ExifRead example of reading the EXIF GPS location:

import exifread

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

with open(fn, "rb") as f:
    tags = exifread.process_file(f)

# GPS tags are absent if the camera had no GPS fix, so use .get()
latitude = tags.get("GPS GPSLatitude")
longitude = tags.get("GPS GPSLongitude")

print(f"{fn}  latitude, longitude: {latitude}, {longitude}")

Exif metadata

import exif

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

with open(fn, "rb") as f:
    tags = exif.Image(f)

latitude = tags.gps_latitude
longitude = tags.gps_longitude

print(f"{fn}  latitude, longitude: {latitude}, {longitude}")
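EXIF stores latitude and longitude as degrees, minutes, and seconds, plus a separate hemisphere reference (N/S or E/W). A minimal sketch of converting to signed decimal degrees follows, assuming the three components can each be converted to float; “dms_to_decimal” is a hypothetical helper, and how the hemisphere reference is obtained depends on the library used.

```python
def dms_to_decimal(dms, ref: str) -> float:
    """Convert (degrees, minutes, seconds) and a hemisphere letter to decimal degrees.

    South and West are returned as negative values.
    """
    degrees, minutes, seconds = (float(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.strip().upper() in ("S", "W") else value


print(dms_to_decimal((40, 26, 46), "N"))
print(dms_to_decimal((79, 58, 56), "W"))
```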

GNU Make environment variables

These environment variables are a de facto standard across build systems, assuming a compiler like GCC or Clang.

Dynamic library path:

  • Linux: LD_LIBRARY_PATH
  • macOS: LIBRARY_PATH (used by the compiler at link time; the runtime loader uses DYLD_LIBRARY_PATH)
  • Windows: must be on environment variable PATH

Include path (where .h C header files are located):

  • Linux / macOS: CPATH

An example GNU Make invocation from a Unix-like shell:

LD_LIBRARY_PATH=/path/to/lib CPATH=/path/to/include make
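The same per-command environment can be set when invoking Make from a Python script. This is a sketch under the assumption that “make” is on PATH; “with_paths” and “run_make” are hypothetical helper names. New directories are prepended so that existing values are preserved.

```python
import os
import subprocess


def with_paths(env, **paths: str) -> dict[str, str]:
    """Return a copy of env with each given directory prepended to its variable."""
    new_env = dict(env)
    for name, path in paths.items():
        old = new_env.get(name)
        new_env[name] = path + os.pathsep + old if old else path
    return new_env


def run_make(lib_dir: str, include_dir: str) -> subprocess.CompletedProcess:
    """Run make with LD_LIBRARY_PATH and CPATH set for this invocation only."""
    env = with_paths(os.environ, LD_LIBRARY_PATH=lib_dir, CPATH=include_dir)
    return subprocess.run(["make"], env=env, check=True)
```

For example, run_make("/path/to/lib", "/path/to/include") mirrors the shell command above without modifying the environment of the calling process.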

List paths starting with dot

By default, the typical directory listing command “ls” does not show paths that start with a dot. That is, paths like “.ssh” or “.git” are hidden. Most shells will list all paths, including those with a leading dot, by:

ls -a

For PowerShell to list paths with a leading dot:

ls -Fo

“-Fo” is short for “-Force”.
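In a Python script the leading-dot convention is not applied automatically: pathlib returns every entry, so any hiding must be done explicitly. A small sketch follows, where “list_paths” is a hypothetical helper that imitates “ls” versus “ls -a” by name only.

```python
from pathlib import Path


def list_paths(directory, show_hidden: bool = False) -> list[str]:
    """List entry names in a directory, optionally including leading-dot names.

    Unlike "ls -a", the special entries "." and ".." are never included,
    because Path.iterdir() does not yield them.
    """
    names = sorted(p.name for p in Path(directory).iterdir())
    if show_hidden:
        return names
    return [n for n in names if not n.startswith(".")]
```

For example, list_paths(Path.home(), show_hidden=True) includes entries such as “.ssh” when they exist.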

Set cURL user agent

In general, for programs that access the web, whether cURL, Python, etc., web servers may block requests whose HTTP User-Agent header doesn’t match typical graphical web browsers. The server filtering is often trivially overcome by setting a generic Mozilla user agent like “Mozilla/5.0”. For cURL, this is done with the -A option.

curl -A "Mozilla/5.0" https://www.whatsmyua.info/api/v1/ua
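For Python, the equivalent with only the standard library is to pass a User-Agent header to urllib.request.Request. The sketch below uses hypothetical helper names “build_request” and “fetch_text”; the default user agent string is the same generic value used with cURL above.

```python
import urllib.request


def build_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Create a request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_text(url: str, user_agent: str = "Mozilla/5.0", timeout: float = 10.0) -> str:
    """Download a URL as text using the given User-Agent header."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Calling fetch_text() with the URL from the cURL example should report the user agent the server received.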

Matlab dark mode

Both Matlab Online and the locally installed Matlab Desktop can use “dark mode”. For the locally installed Matlab, dark mode is an option of the new desktop.


Related: Alternative Matlab code editors

Reference: obsolete, unofficial dark theme