Scientific Computing

HDF5 command line tools

The HDF5 command line tools h5dump and h5ls are handy for quickly exploring HDF5 files. They are particularly useful on a remote computer such as an HPC system, where the HDF5 files may be very large and would take a while to transfer to a local computer.


h5ls provides a high-level look at objects in an HDF5 file. Typically we start examining HDF5 files by printing the dataset hierarchy:

h5ls --recursive my.h5

Determine the filters used (e.g. was the data compressed):

h5ls --verbose my.h5

h5dump can print the entire contents of an HDF5 file to the screen. This can be overwhelming, so we typically print only the headers to start:

h5dump --header my.h5

Print an individual variable (dataset) like:

h5dump --dataset=myvar my.h5

Determine the filters used (e.g. was the data compressed):

h5dump --properties --header --dataset=myvar my.h5

Related: HDF5 data GUI

CB Radio 11m data telemetry

Mid-range radio control (1 km to 10+ km) and other data telemetry has long been legal in the 27 MHz 11m band across the world. In the USA, FCC Rules Part 95 subpart C addresses 27 MHz data transmissions. 27 MHz is still actively used for data telemetry, with manufacturers claiming up to 15 miles range with a 10 Watt 27.255 MHz data FSK transceiver. Another long-time 27 MHz data telemetry application is 27 MHz paging.

Wireless mice and keyboards of the early 2000s widely used the 27 MHz band. Unfortunately, devices operating on 27.195 MHz “19A” significantly interfered with the popular CB channel 19 on 27.185 MHz, and could be heard even when just driving by a house with a CB radio in the vehicle. Likewise, devices on the other 27 MHz channels bleed across several CB radio channels when in a neighboring house or within roughly 50 meters of a CB radio. This is due to the liberal emissions mask of FCC Part 95.779(a), which allows significant bleedover of unwanted modulation products into adjacent channels. Thankfully, these 27 MHz mice and keyboards have few users these days. They are still sold as inexpensive low-end devices, so they might still be heard in some locales.

The data telemetry or mouse/keyboard transmissions typically use FSK modulation. On an AM receiver, FSK might sound like a quiet transmission with little modulation. Using an FM receiver, FSK typically sounds like a loud buzz or tone.

Detect 10m, 11m, 12m band openings

Band openings in the 10m, 11m, and 12m radio bands can be detected by listening to popular frequencies in these bands. The 10m and 12m bands are licensed amateur radio bands capable of global communications when ionospheric conditions are favorable. The 11m band is license-free and typically has more users, so an amateur radio operator may listen to 11m to determine if 12m and/or 10m are also experiencing enhanced skywave propagation.

Good 10m frequencies to listen to are 28.074 MHz and 28.078 MHz, which are the FT8 and JS8Call suppressed carrier frequencies, respectively, as tuned in upper-sideband (USB) mode. A converted CB radio in AM mode can tune these on 28.075 MHz. An AM mode radio tuned to 28.075 MHz will hear a seemingly random series of tones with a 15 second interval. The tones heard in an AM receiver come from multiple FT8 or JS8Call signals heterodyning.

For 12m, listen to FT8 / JS8Call on USB 24.915 MHz or USB 24.922 MHz. With only a converted AM CB radio, tune 24.915 MHz or 24.925 MHz.

11m DX frequencies to monitor include:

  • AM 27.025 MHz (CB channel 6, high powered calling frequency)
  • AM 27.185 MHz (CB channel 19, road calling channel)
  • AM 27.065 MHz (CB channel 9, Spanish language calling frequency in Central and South America)
  • FM 26.805 MHz (FM 11m DX calling frequency)
  • USB 27.245 MHz (CB channel 25, JS8Call frequency)

C++ size_type property vs size_t

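The C++ Standard Library defines size_type as a member type (property) of containers like std::vector and std::string. Using it is generally recommended over using size_t directly, because it always matches the type returned by the container's own methods such as size() and find().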

Example C++ code snippets using size_type property:

std::vector<int> vec;

std::vector<int>::size_type L = vec.size();

//----------------------------------------------
std::string path = "/usr/bin:/usr/local/bin";
constexpr char pathsep = ':';

std::string::size_type start = 0;
std::string::size_type end = path.find_first_of(pathsep, start);
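One place where size_type matters is the comparison against std::string::npos, which has that same type. As a minimal sketch continuing the path example above, the hypothetical helper split_path() below (a name invented for illustration, not from the Standard Library) walks the string with size_type indices until find_first_of() returns npos:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper for illustration: split `path` at each `sep`.
std::vector<std::string> split_path(const std::string& path, char sep)
{
  std::vector<std::string> parts;

  std::string::size_type start = 0;
  std::string::size_type end = path.find_first_of(sep, start);

  // find_first_of() returns std::string::npos when no separator remains
  while (end != std::string::npos) {
    parts.push_back(path.substr(start, end - start));
    start = end + 1;
    end = path.find_first_of(sep, start);
  }

  // remaining text after the last separator (possibly empty)
  parts.push_back(path.substr(start));

  return parts;
}
```

Calling split_path("/usr/bin:/usr/local/bin", ':') yields the two elements "/usr/bin" and "/usr/local/bin".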

Related: ssize_t for Visual Studio

Install Intel oneAPI C++ and Fortran compiler

Intel oneAPI is a cross-platform toolset that covers several programming languages including C, C++, Fortran and Python. Intel oneAPI replaces Intel Parallel Studio. Intel oneAPI including the C++ “icpx” compiler, Fortran “ifx” compiler, and Intel MPI is free-to-use and no login is required to download oneAPI.

We suggest using the “online installer”, which is a small download. The “online” installer can be copied over SSH to an HPC user directory, for example, and installed from the Terminal.

Windows requires Visual Studio Community to be installed first. Visual Studio IDE integration is optional, and we don’t use it. If VS integration is installed, cmake -G "Visual Studio 17 2022" can be used to generate Visual Studio project files with CMake 3.29 or newer. Otherwise, CMake 3.25.0 or newer is adequate for oneAPI.

Install the oneAPI Base Toolkit with options:

  • Math Kernel Library (oneMKL)
  • (optional) GDB debugger

Install oneAPI HPC toolkit with options:

  • Intel MPI library
  • Intel C++ compiler
  • Intel Fortran compiler

Usage

There are distinct usage patterns to access Intel oneAPI compilers on Windows vs. Linux. Set environment variables CC, CXX, FC via script. oneapi-vars sets the environment variable CMAKE_PREFIX_PATH, so don’t blindly overwrite that environment variable.

Windows

On Windows, a Start menu shortcut for a oneAPI command prompt is installed. PowerShell can also use “oneapi-vars.bat” to set the environment variables as per the oneapi.ps1 in the Gist above.

If the CMake Visual Studio generator is desired, ensure:

  • CMake ≥ 3.29 is used for the -T fortran=ifx option
  • Intel oneAPI Visual Studio integrations are installed
  • use CMake configure options
cmake -Bbuild -G "Visual Studio 17 2022" -T fortran=ifx

Troubleshooting:

If problems with finding packages with oneAPI on Windows and CMake occur, ensure that MSYS2 paths aren’t mixed in with the oneAPI environment. See the project CMakeConfigureLog.yaml and look for unwanted paths in the include commands.

Linux

On Linux, oneAPI requires the GNU GCC toolchain. Some HPC systems have a default GCC version that is too old for Intel oneAPI, which can cause problems with C++ STL linking. If needed, set the environment variable CXXFLAGS to point the Intel compiler at a newer GCC toolchain in a custom “oneapi.sh” like:

export CXXFLAGS=--gcc-toolchain=/opt/rh/gcc-toolset-12/root/usr/

which can be determined like:

scl enable gcc-toolset-12 "which g++"

If using a CMake toolchain file, instead of CXXFLAGS environment variable, one can set

set(CMAKE_CXX_COMPILER_EXTERNAL_TOOLCHAIN "/opt/rh/gcc-toolset-12/root/usr/")

CI runners - stable vs. updated

CI runners across CI services often update software images regularly, perhaps weekly. This can break workflows, but reflects user devices.

GitHub Actions updates the runners weekly or so. A few times a year on average, across projects and operating systems, this may require updating the CI YAML configuration. Apple updates Xcode a few times a year, which can disrupt end users and CI runs.

A version-stable CI image generally requires private on-premises CI such as Jenkins or on-premises GitHub Actions runners. Those on-premises CI runners then need maintenance.

The key issue with such frozen CI runners is they are out of date with what end users have. For example, macOS with Homebrew is probably the majority of scientific computing users besides HPC. Homebrew updates often and breaks occur across projects a few times a year. Better to catch that in CI rather than on end user devices.

CMake detect if project is top level

CMake 3.21 and newer can detect if a project is “top level”, that is, NOT included via FetchContent, using the variables PROJECT_IS_TOP_LEVEL and <PROJECT-NAME>_IS_TOP_LEVEL. For simplicity, we denote these variables in this article as “*_IS_TOP_LEVEL”.

Example use:

if(${PROJECT_NAME}_IS_TOP_LEVEL)
  message(STATUS "${PROJECT_NAME} directly building, not FetchContent")
endif()

For CMake < 3.21:

if(CMAKE_VERSION VERSION_LESS 3.21)
  get_property(not_top DIRECTORY PROPERTY PARENT_DIRECTORY)
  if(not_top)
    set(${PROJECT_NAME}_IS_TOP_LEVEL false)
  else()
    set(${PROJECT_NAME}_IS_TOP_LEVEL true)
  endif()
endif()

Caveats

Directory property PARENT_DIRECTORY and *_IS_TOP_LEVEL are NOT useful for detecting if the child project is being used as an ExternalProject.

These variables are based on the last “project()” command and so are not as universally useful as they first seem. For example, these variables do not work as expected when using ExternalProject. Even setting CMAKE_CACHE_ARGS of ExternalProject does not help, nor do cmake(1) command line options, because the CMake-internal setting of *_IS_TOP_LEVEL overrides any attempt to set it. To work around this, use an arbitrary auxiliary variable to detect if the project is top level.

Example:

Top-level CMakeLists.txt:

ExternalProject_Add(sub1
...
CMAKE_ARGS -DSUB1_IS_TOP:BOOL=false
)

ExternalProject_Add(sub2
...
CMAKE_ARGS -DSUB2_IS_TOP:BOOL=false
)

Subproject CMakeLists.txt:

if(DEFINED SUB1_IS_TOP)
  set(SUB1_IS_TOP_LEVEL ${SUB1_IS_TOP})
endif()

Rather than trying to directly work around all the corner cases of *_IS_TOP_LEVEL, using this auxiliary variable allows the user to clearly force the intended behavior. This is useful when the subprojects and main project can build required ExternalProjects, and you want to build the required ExternalProjects only once.

GCC / Clang header clash on macOS

GCC on macOS, including Homebrew-installed GCC, depends on the macOS SDK. When the macOS SDK is updated, the system headers may become incompatible with GCC versions < 13.3. Specifically, the updated headers can use syntax requiring C23 that GCC < 13.3 cannot handle.

Homebrew GCC 14.1 and newer work just fine, so the solution is to update GCC.

CMake 3.28, 3.29 Clang scandep workaround

CMake 3.28.0 .. 3.29.2 have a bug with Clang > 17 if CMAKE_CXX_STANDARD is set to 20 or higher before project() or enable_language(CXX). Specifically, if CMake policy CMP0155 is set to NEW by cmake_minimum_required(VERSION) or otherwise, then CMake 3.28.0 .. 3.29.2 will scan for C++ modules during initial C++ compiler checks, which is not expected or desired. To work around this issue without otherwise impacting the project or newer CMake versions, do:

set(CMAKE_CXX_STANDARD 20)
# assuming default settings near top of CMakeLists.txt for readability

# <snip>

if(${PROJECT_NAME}_cxx)  # arbitrary user option

  set(CMAKE_CXX_SCAN_FOR_MODULES OFF)   # workaround CMake 3.28.0 .. 3.29.2 with Clang

  enable_language(CXX)

  set(CMAKE_CXX_SCAN_FOR_MODULES ON)  # optional, if project actually uses C++ modules

endif()

Related: CMake C++ standard with fallback

This issue was fixed in CMake 3.29.3.

C++ std::string with char*

C++ std::string is a dynamic, contiguous container for character strings. String data is easily and efficiently passed between a std::string and a C or Fortran function that expects a char* pointer.

The basic algorithm is:

  1. allocate std::string with desired size and fill with \0.
  2. use std::string::data() to get a char* pointer to the string data that is read/write for the C or Fortran function (or C++). The non-const data() requires C++17 or newer.
  3. use std::string::c_str() to get a const char* pointer to the string data that is read-only for the C or Fortran function (or C++). Constructing a new std::string from this pointer trims the string at the first \0 character. Otherwise, the std::string::length() will include all the unwanted trailing \0 characters.

example
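The three steps above can be sketched as follows. This is a minimal sketch, assuming C++17 or newer: fill_name() is a hypothetical stand-in for whatever C or Fortran routine writes a null-terminated string into a caller-provided buffer, and get_name() and the buffer size of 32 are likewise invented for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical stand-in for a C or Fortran routine that writes a
// null-terminated string into a caller-provided buffer.
extern "C" void fill_name(char* buf, std::size_t buf_size)
{
  std::snprintf(buf, buf_size, "%s", "hello");
}

// Hypothetical wrapper showing the allocate / fill / trim pattern.
std::string get_name()
{
  // 1. allocate with the desired size (32 is arbitrary), filled with '\0'
  std::string buf(32, '\0');

  // 2. data() gives a writable char* (non-const overload needs C++17)
  fill_name(buf.data(), buf.size());

  // 3. building a new std::string from c_str() stops at the first '\0',
  //    dropping the unwanted trailing '\0' characters
  return std::string(buf.c_str());
}
```

Here get_name() returns a string of length 5, whereas the intermediate buffer buf keeps length 32 because its trailing \0 characters still count toward length().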