Scientific Computing

Strip Jupyter notebook outputs from Git

Jupyter notebook outputs can be large (plots, images, etc.), which bloats the Git repository history and slows Git operations as that history grows. Notebook outputs can also reveal personal information such as usernames, the Python executable path, directory layout, and data values.

Strip all Jupyter outputs from Git tracking with a client-side Git pre-commit hook. We use a pre-commit hook rather than a Git filter because Git filters can interfere with other programs such as CMake ExternalProject.

Configure Git user-wide to run an IPython output-stripping script as a pre-commit hook:

git config --global hook.lintipython.event pre-commit
git config --global hook.lintipython.command '$HOME/linters/strip-ipython.py'

Use an IPython linter script such as ~/linters/strip-ipython.py. On Unix-like systems, the script must have execute permission:

chmod +x ~/linters/strip-ipython.py
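The contents of strip-ipython.py are not given here. As a hypothetical sketch only, assuming Python 3.9+ and that Git runs the hook from the repository root (as Git does for pre-commit hooks in non-bare repositories), a minimal version could clear the outputs and execution counts of staged notebooks and abort the commit when it changed anything:

```python
#!/usr/bin/env python3
"""Hypothetical minimal strip-ipython.py: clear outputs from staged notebooks."""
import json
import subprocess
from pathlib import Path


def strip_notebook(nb: dict) -> bool:
    """Clear outputs and execution counts of code cells in place.

    Returns True if anything was removed.
    """
    changed = False
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells have no outputs
        if cell.get("outputs"):
            cell["outputs"] = []
            changed = True
        if cell.get("execution_count") is not None:
            cell["execution_count"] = None
            changed = True
    return changed


def staged_notebooks() -> list[Path]:
    """Return staged (added, copied, or modified) .ipynb paths, relative to the repo root."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM", "-z", "--", "*.ipynb"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [Path(p) for p in out.split("\0") if p]


def main() -> int:
    """Strip staged notebooks; return 1 (abort the commit) if any file was changed."""
    changed = []
    for path in staged_notebooks():
        nb = json.loads(path.read_text(encoding="utf-8"))
        if strip_notebook(nb):
            # indent=1 and sorted keys are meant to resemble Jupyter's own layout (assumption)
            text = json.dumps(nb, indent=1, sort_keys=True, ensure_ascii=False)
            path.write_text(text + "\n", encoding="utf-8")
            changed.append(path)
    for path in changed:
        print(f"stripped outputs: {path} (git add it, then commit again)")
    return 1 if changed else 0
```

To run the file as the hook command, end it with raise SystemExit(main()); Git aborts the commit when a pre-commit hook exits non-zero. Aborting, rather than running git add from the hook, avoids silently staging other unstaged edits in the same notebook.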

fpm build CFLAGS

In some environments, Fortran Package Manager commands like fpm build or fpm test can fail when a dependency (for example, HDF5) is resolved through pkg-config and its .pc file includes a system include path such as -I/usr/include.

In that case, set the pkg-config environment variable that keeps system CFLAGS so that fpm works:

PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=1 fpm build

PKG_CONFIG_ALLOW_SYSTEM_CFLAGS=1
tells pkg-config not to strip system include paths from pkg-config --cflags output.

A symptom that this variable may be needed is:

> fpm build
<ERROR> *cmd_build* Model error: Cannot get pkg-config build flags: environment variable error.
STOP 1
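To check whether the variable changes what pkg-config reports, a small Python sketch can run pkg-config --cflags both ways. The package name hdf5 is an assumption: the .pc name differs between systems and HDF5 builds.

```python
"""Compare pkg-config --cflags with and without PKG_CONFIG_ALLOW_SYSTEM_CFLAGS (sketch)."""
import os
import shutil
import subprocess
from typing import Optional


def pkg_config_cflags(package: str, allow_system: bool) -> Optional[list[str]]:
    """Return the --cflags tokens for package, or None if pkg-config is missing or fails."""
    if shutil.which("pkg-config") is None:
        return None
    env = dict(os.environ)
    if allow_system:
        env["PKG_CONFIG_ALLOW_SYSTEM_CFLAGS"] = "1"
    else:
        env.pop("PKG_CONFIG_ALLOW_SYSTEM_CFLAGS", None)
    result = subprocess.run(
        ["pkg-config", "--cflags", package], env=env, capture_output=True, text=True
    )
    if result.returncode != 0:
        return None  # e.g. no .pc file found for this package name
    return result.stdout.split()


def report(package: str = "hdf5") -> None:
    """Print both flag lists; "hdf5" is an assumed .pc name that varies by system."""
    print("default:       ", pkg_config_cflags(package, allow_system=False))
    print("system CFLAGS: ", pkg_config_cflags(package, allow_system=True))
```

Calling report() prints both lists; if -I/usr/include appears only in the second, the .pc file relies on a system include path, which is the case described above.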

Git trailer commit multiple authors

Git does not have a built-in mechanism for multiple authors per Git commit. A coauthor notation convention has become accepted by major services including GitHub and GitLab. Git itself can programmatically parse arbitrary commit trailers but has no built-in notion of coauthors.

Indicate a Git coauthor by placing a plain-text trailer line at the end of the commit message body. The email address cited must match an email registered with the Git service. The email can be a working public email or the “fake” noreply email provided by the Git service. For multiple coauthors, add one such line per coauthor in the same commit message, like:

added foo function to bar.py

Co-authored-by: David <snake@users.noreply.github.com>
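Since the trailer is plain text, it can also be built or read programmatically. This Python sketch (hypothetical helper names) appends one Co-authored-by line per coauthor after a blank line and extracts them again with a regular expression. It assumes the message has no existing trailers; Git's own parser, git interpret-trailers --parse, handles the general case, and recent Git versions can add trailers directly with git commit --trailer.

```python
"""Build and parse Co-authored-by trailers in a commit message (sketch)."""
import re

# One trailer per line: "Co-authored-by: Name <email>"
TRAILER = re.compile(r"^Co-authored-by:[ \t]*(.+?)[ \t]*<([^<>\n]+)>[ \t]*$", re.MULTILINE)


def add_coauthors(message: str, coauthors: list[tuple[str, str]]) -> str:
    """Append one trailer per (name, email) after a blank line.

    Assumes the message does not already end with a trailer block.
    """
    lines = [f"Co-authored-by: {name} <{email}>" for name, email in coauthors]
    return message.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"


def coauthors(message: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs from Co-authored-by lines in message."""
    return TRAILER.findall(message)
```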

On GitHub, a coauthor commit looks like:

GitHub coauthor detail

Currently, the coauthored commit doesn’t count on the user’s GitHub contribution calendar.

The coauthor commits do show up in GitHub search under “Commits”.

Caveats: as with regular Git commits, there is no authentication to prevent someone from masquerading as someone else with Git coauthor commits. Git coauthor commits cannot be GPG-signed by each coauthor; only the primary committer can GPG-sign the commit as usual.

Commands like git rebase can use --trailer to add trailers to the rewritten commit messages, for example to indicate who reviewed the rebase:

git rebase --trailer "Reviewed-by: Nobody <nobody@users.noreply.github.com>"

VS Code Copilot parent repo

When only a subdirectory of a Git repository is opened in Visual Studio Code, repo-root Copilot customizations like .github/copilot-instructions.md are not discovered by default. This can make Copilot ignore repository-wide instructions even though they exist at the top of the current Git repository.

Visual Studio Code has a built-in setting that resolves this issue by enabling parent repository discovery for chat customizations. Set it in settings.json:

{
  "chat.useCustomizationsInParentRepositories": true
}

With this setting enabled, VS Code walks upward from the opened workspace folder until it finds .git. It then discovers chat customizations between the opened folder and the repository root, including:

  • .github/copilot-instructions.md
  • .github/instructions/*.instructions.md
  • prompt files
  • agent files such as AGENTS.md
  • hooks and other chat customizations

This setting is especially useful for monorepos and for workflows that open a focused subdirectory such as content/posts/, src/, or packages/frontend/ instead of the full repository root. Without parent repository discovery, Copilot can miss repository-specific style and validation rules.

A few conditions apply:

  • the opened folder must not itself be a separate Git repository (e.g., a Git submodule)
  • a parent folder must contain .git
  • the parent repository folder must be trusted in VS Code

To verify that the repository instructions are in use, inspect the References list on a Copilot Chat response. If parent discovery is working, the response references typically include the repo-root customization files.

Purging computer temp folder

A Linux computer's temp folder can be purged on a schedule to free up disk space and remove old temporary files. The programs “tmpwatch” or “tmpreaper” can be used to purge the temp folder on a schedule. tmpwatch is available on Red Hat-based Linux distributions, while tmpreaper is available on Debian-based Linux distributions.

To do a “dry run” of the purge command to see what files would be deleted, use the --test flag:

<tmpwatch|tmpreaper> --test --mtime 7d /tmp

Set the temp path explicitly, especially on HPC systems where scratch space may be under system-specific paths.

--mtime 7d
use file modification time and purge files older than 7 days – adjust as desired
/tmp
path to the temp folder – adjust as needed
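Before installing either tool, a comparable dry run can be approximated in Python. This sketch only lists regular files whose modification time (the timestamp --mtime selects) is older than a threshold; it deletes nothing and is not a substitute for tmpwatch or tmpreaper.

```python
"""Dry-run listing of files an mtime-based temp purge would target (sketch)."""
import os
import stat
import time
from pathlib import Path
from typing import Optional


def stale_files(root: Path, days: float, now: Optional[float] = None) -> list[Path]:
    """Return regular files under root whose modification time is older than `days` days.

    Nothing is deleted. Symlinked directories are not followed (os.walk default).
    """
    cutoff = (time.time() if now is None else now) - days * 86400
    stale: list[Path] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()
            except OSError:
                continue  # removed or unreadable since the directory was listed
            if stat.S_ISREG(info.st_mode) and info.st_mtime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, stale_files(Path("/tmp"), 7) lists what a 7-day purge of /tmp would consider; on HPC systems, pass the site-specific scratch path instead.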

Linux cron example

On Linux, a cron job can run the purge command on schedule.

Edit the crontab with:

crontab -e

Add a line to run the purge command daily at midnight:

0 0 * * * /usr/bin/tmpwatch --mtime 7d /tmp

Homebrew CMake-GUI install

Homebrew Cask packages GUI (graphical) programs. Many users install only the CMake CLI tools with:

brew install cmake

This does not install the cmake-gui program.

To install CMake-GUI:

brew install --cask cmake

To use cmake-gui from the terminal, add this to ~/.zprofile:

export PATH=$PATH:/Applications/CMake.app/Contents/bin

Confirm the /Applications path in the cmake-gui line under “Artifacts” in the output of:

brew info --cask cmake

When launching from the terminal, specify -S . and -B build to prefill the source and build directories:

cmake-gui -S . -B build

Configure defaults for Bash, Zsh, or PowerShell

The default shell on each operating system is typically:

  • Linux: Bash
  • macOS: Zsh
  • Windows: PowerShell

Each shell has configuration files to change its default parameters. Here are some useful examples:

Remove duplicate entries in shell history

To keep repeated commands from cluttering shell history, so that pressing “up” after running a command several times recalls the last distinct command, set the following for the respective shell.

Bash: “~/.bashrc”: ignore duplicate lines and omit lines that start with a space.

export HISTCONTROL=ignoredups:ignorespace

Zsh: “~/.zshrc”

setopt hist_ignore_dups
setopt hist_ignore_space

PowerShell: “$profile”: set the history to ignore duplicates.

Set-PSReadlineOption -HistoryNoDuplicates
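The Bash and Zsh options above compare each command only with the previous history entry. As an illustrative model (not the shells' code), assuming the documented behavior of ignoredups/ignorespace and hist_ignore_dups/hist_ignore_space, the effect on a sequence of commands is sketched below; PowerShell's -HistoryNoDuplicates is not modeled.

```python
"""Model of ignore-duplicates / ignore-space history options (sketch, not shell code)."""


def record_history(commands: list[str]) -> list[str]:
    """Return the entries a shell would save for this sequence of commands."""
    history: list[str] = []
    for cmd in commands:
        if cmd.startswith(" "):
            continue  # ignorespace / hist_ignore_space: leading space means "don't save"
        if history and history[-1] == cmd:
            continue  # ignoredups / hist_ignore_dups: compared with the previous entry only
        history.append(cmd)
    return history
```

Non-consecutive repeats are still saved. Bash HISTCONTROL=erasedups and Zsh setopt hist_ignore_all_dups instead remove older duplicates of the command being saved.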

Peak RAM usage of process and its children

Measuring the peak RAM usage of a process and all its children can be done using various tools and techniques. OS-specific tools may be the most accurate, but they can be complex to use. A simpler approach is to periodically sample the RAM usage of the process and its children, like this script for Linux and macOS that uses ps.
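That script is not reproduced here. As a hypothetical Python illustration of the same sampling idea, assuming ps supports -A and the pid, ppid, and rss output keywords (as on Linux procps and macOS) and reports RSS in KiB, the process tree can be summed at each sample and the maximum kept. Sampling can miss spikes shorter than the interval, and summing RSS counts shared pages once per process, so the result is an estimate.

```python
"""Sample the summed RSS of a process tree with ps and keep the peak (sketch)."""
import subprocess
import time


def ps_table() -> dict[int, tuple[int, int]]:
    """Map pid -> (ppid, rss_kib) for all processes, using ps."""
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=", "-o", "ppid=", "-o", "rss="],
        check=True, capture_output=True, text=True,
    ).stdout
    table: dict[int, tuple[int, int]] = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 3:
            pid, ppid, rss = (int(f) for f in fields)
            table[pid] = (ppid, rss)
    return table


def tree_rss(table: dict[int, tuple[int, int]], root: int) -> int:
    """Sum RSS (KiB) of root and all of its descendants in one ps snapshot."""
    children: dict[int, list[int]] = {}
    for pid, (ppid, _rss) in table.items():
        children.setdefault(ppid, []).append(pid)
    total, stack, seen = 0, [root], set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue  # guard against self-parented entries such as pid 0
        seen.add(pid)
        if pid in table:
            total += table[pid][1]
        stack.extend(children.get(pid, []))
    return total


def peak_rss_kib(cmd: list[str], interval: float = 0.1) -> tuple[int, int]:
    """Run cmd, sampling every `interval` seconds; return (exit code, peak RSS in KiB)."""
    proc = subprocess.Popen(cmd)
    peak = 0
    while proc.poll() is None:
        peak = max(peak, tree_rss(ps_table(), proc.pid))
        time.sleep(interval)
    return proc.returncode, peak
```

For example, peak_rss_kib(["mpiexec", "-n", "4", "./app"]) returns the exit code and peak summed RSS in KiB, where ./app is a placeholder program.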

On macOS or Linux it is also possible, though less accurate, to use /usr/bin/time, but this reports only the peak RAM usage of the largest single child process, not the total of all children, so it is unsuitable for multiprocess applications like “mpiexec”.

On Linux, a more accurate method uses cgroup v2 memory accounting, as implemented by cgmemtime. On macOS, the Instruments tool can measure the RAM usage of a process and its children, but it requires a codesigned application and is more complex to set up. For example, with the command-line xctrace tool:

xcrun xctrace record --template "Game Memory" --output bench_game.trace --time-limit 30s --launch -- /path/to/application

open bench_game.trace

Git command aliases

Git command aliases can create shortcuts for common Git commands. For example, to create the aliases git st for git status, git co for git checkout, git br for git branch, and git ci for git commit:

git config --global alias.st status
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.ci commit

Aliases create compact commands for frequently used Git operations, reducing typing (and typos). In some cases, aliases also let older versions of Git accept newer-style syntax, at least for the compatible parts of the command.

CMake trace portion of script

The CMake command cmake_language(TRACE) enables tracing selected, nestable portions of a CMake script. This matters when debugging CMake projects because full trace output is large: CMake's platform independence means numerous checks are performed even for minimal CMake scripts, which makes it difficult to find the relevant portion of the trace output. cmake_language(TRACE) narrows the trace to a specified portion of the script, including nested trace regions, instead of emitting a trace of the entire script.

To trace only part of a script, wrap that region with cmake_language(TRACE) as in this CMakeLists.txt example:

cmake_minimum_required(VERSION 4.2)

project(demo LANGUAGES C)

cmake_language(TRACE ON)
find_package(ZLIB)
cmake_language(TRACE OFF)

find_package(LAPACK)

Observe that only the trace output for the find_package(ZLIB) command is emitted, while the find_package(LAPACK) command and compiler discovery are not traced.