
- 1977
- Authors: Brian Kernighan (35), Dennis Ritchie (36)
- based on m3 (1972) and GPM (1965)
- .m4 (where macros are defined)
- .mc (where macro expansion happens)
- a requirement for autoconf
- general purpose macro pre-processor
- part of the POSIX standard
- see the man and info pages
- used by:
- sendmail (.mc -> .cf)
- autoconf (part of autotools)
- in configure.ac
- Author: David MacKenzie
- 1991
- based on context-free grammars, automata, stacks, and output queues
- aka the rules for rewriting
- have terminal symbols & non-terminal symbols (m4 macros)
cli

- rlwrap compatible
- cli is a "filter", aka takes data and instructions (stdin) and outputs transformed data (stdout)
- -dV - set debug mode to V for full debugging
language
- turing complete
- no floating point support
- no loops, only recursion
- `' makes non-terminal symbols terminal ones
- one level of quotes is removed on each expansion pass
- macro names
- recognized only when surrounded by non-alphanumeric characters
- are alphanumeric, including "_"
- we use different quoting for each side to allow nesting
- without needing extra escaping
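A minimal sketch of how quote levels behave, assuming the default `' delimiters. The macro name greeting is made up for illustration.

```m4
define(`greeting', `hello')dnl
dnl Unquoted: the name is expanded.
greeting
dnl One quote level: the quotes are stripped and the name is left alone.
`greeting'
dnl Two quote levels: only the outer pair is stripped.
``greeting''
dnl Distinct open and close characters let quotes nest without escaping.
`a `nested' quote'
```

Run through m4, this should print four lines: hello, greeting, `greeting', and a `nested' quote.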
variables
| $0 | macro name |
| $1 | first argument, or the empty string if absent |
| $# | number of arguments |
| $@ | all the arguments, each quoted (not expanded again) |
| $* | all the arguments, unquoted (expanded again) |
| shift($@) | all the arguments except the first |
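The references above can be tried with a small sketch. The macro names show, v, star, and at are hypothetical and exist only for this example.

```m4
define(`show', `name=`$0' count=$# first=`$1' rest=shift($@)')dnl
define(`v', `expanded')dnl
define(`star', `$*')dnl
define(`at', `$@')dnl
dnl $0 and $1 are quoted in the body so the results are not expanded again.
show(`a', `b', `c')
dnl $* passes the argument on unquoted, so v is expanded on rescan.
star(`v')
dnl $@ passes the argument on quoted, so v stays as text.
at(`v')
```

This should print name=show count=3 first=a rest=b,c, then expanded, then v.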
commands/macros
- built-in commands = 24
- are themselves macros
- all global, no concept of scope
- but there is a stack of definitions
- can function:
- as global variables when expanded without arguments
- as functions when arguments are provided
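A short sketch of the two roles, using the made-up macro names VERSION and greet:

```m4
define(`VERSION', `1.2')dnl
define(`greet', `Hello, $1!')dnl
dnl Used without arguments, a macro behaves like a global variable.
Release VERSION
dnl Used with arguments, it behaves like a function.
greet(`world')
```

The expected output is Release 1.2 followed by Hello, world!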
dnl                   # deletes everything up to and including the next newline
include(`foo.m4')     # expands to content of foo.m4
sinclude(`foo.m4')    # expands to content of foo.m4, fails silently if it does not exist
changequote([,])      # changes `' to []
changequote           # restores `'
Defining a new one
pushdef(`NAME', 20)
popdef(`NAME')
define(`NAME', 0)     # = popdef + pushdef
undefine(`NAME')
- For each macro definition, m4 creates a stack of definitions
- name
- begins with a letter or underscore, followed by letters, digits, and underscores
- case significant
- value
- always treated as text even if it is numeric
- leading blanks that occur during argument collection are discarded
- argument references ($1) expand immediately regardless of quoting
- can be prevented by breaking up the reference ($`'1)
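The definition stack and the argument-reference rule can be sketched as follows. LEVEL, subst, and literal are hypothetical names chosen for this example.

```m4
define(`LEVEL', `outer')dnl
pushdef(`LEVEL', `inner')dnl
dnl The most recently pushed definition is the visible one.
LEVEL
popdef(`LEVEL')dnl
dnl Popping restores the previous definition.
LEVEL
undefine(`LEVEL')dnl
dnl Once undefined, the name is plain text again.
LEVEL
dnl An argument reference is substituted even inside quotes.
define(`subst', `[$1]')dnl
dnl Splitting the reference with an empty quote pair keeps it literal.
define(`literal', `[$`'1]')dnl
subst(`x')
literal(`x')
```

This should print inner, outer, LEVEL, [x], and [$1], one per line.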
Output Queues
divert(N)     # switches the Output Queue, N ∈ [0,...]
divert(-1)    # -1 (invalid queue), used to throw away output
define(g,19)  # discarded
divert`'      # same as divert(0)
undivert(1)   # pushes output queue 1 to output
divnum        # expands to currently active diversion
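A small sketch of how the queues reorder output. The macro name TOOL and the sample text are illustrative only.

```m4
divert(-1)
define(`TOOL', `m4')
Anything written here is thrown away.
divert(1)dnl
footer, written first
divert(0)dnl
body by TOOL, written second
undivert(1)dnl
```

The body line should come out first (as body by m4, written second), followed by the footer line that was parked in queue 1. Wrapping definitions in divert(-1) is a common way to avoid the stray newlines they would otherwise leave in the output.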
Conditional
ifdef(`foo', b)        # b if foo is defined
ifdef(`foo', b, c)     # b if foo is defined, c if not
# "switch", 3N+1 arguments
ifelse(comment)        # discarded argument
ifelse(a,b,c,d)        # compares a,b ... returns c if match, else d
ifelse(a,b,c,d,e,f,g)  # compares a,b ... returns c if match, else compares d,e
ifelse(a,b,c,          # same as above, more explicit
       ifelse(d,e,f,g))
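The multi-branch form of ifelse can be sketched with a hypothetical sign macro (the name nosuch is likewise made up and assumed to be undefined):

```m4
define(`sign', `ifelse(eval($1 < 0), 1, `negative',
                       eval($1 == 0), 1, `zero',
                       `positive')')dnl
sign(-5)
sign(0)
sign(7)
ifdef(`sign', `yes', `no')
ifdef(`nosuch', `yes', `no')
```

This should print negative, zero, positive, yes, and no. The branch results are quoted so that only the selected one is expanded.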
String Manipulation
len(abdcde)                   # returns 6
substr(abcdef,2)              # returns "cdef" (offsets are 0-based)
substr(abcdef,3,3)            # returns "def"
index(abcdef,c)               # returns 2
index(abcdef,z)               # returns -1
translit(leet,aeio,4310)      # returns "l33t"
translit(leet,aeio)           # returns "lt"
regexp(abc88def,`[0-9]')      # returns 3
regexp(ab77,`[0-9]',`?')      # returns "?"
patsubst()                    # find and replace
format(`%05d', `$#')
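The entries without a sample result can be sketched like this. The example assumes GNU m4, since patsubst, regexp, and format are GNU extensions; the input strings are made up.

```m4
dnl Replace every match of the pattern.
patsubst(`hello world', `o', `0')
dnl With a third argument, regexp expands to the replacement; \& is the matched text.
regexp(`ab77', `[0-9]+', `<\&>')
dnl printf-style formatting.
format(`%05d', 42)
```

Under GNU m4 this should print hell0 w0rld, <77>, and 00042.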
Integer Arithmetic
+ - ** / * %
<< >> ~ & ^ |            # bitwise operators
> >= == != < <= && || !

eval(1 + 1)   # 2
eval(-8>>1)   # -4
eval(~0)      # -1
eval(6&5)     # 4
eval(3^2)     # 1
eval(1|2)     # 3
incr(100)     # 101
incr(i)       # i + 1 (i must expand to an integer)
decr(100)     # 99
decr(i)       # i - 1 (i must expand to an integer)
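Because there are no loops, repeated arithmetic is usually written recursively. A minimal sketch, with fact as a hypothetical macro name:

```m4
define(`fact', `ifelse(eval($1 <= 1), 1, 1,
                       `eval($1 * fact(decr($1)))')')dnl
fact(5)
dnl Division truncates because only integers are supported.
eval(7 / 2)
eval(2 ** 10)
```

This should print 120, 3, and 1024. The recursive branch is quoted so that it is expanded only when the base case does not match.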
OS interaction
syscmd(find . -type f)            # runs without capturing/interpreting output
esyscmd(hostname | tr -d '\n')    # captures/interprets output
sysval                            # last command exit status
maketemp
mkstemp
Debugging
dumpdef(`NAME')            # shows the definition of given macro
dumpdef(`NAME',`upcase')
debugmode(`V')             # V = full debug
defn(`NAME')               # shows macro definition
errprint(`msg')            # to stderr
codebases
- css preprocessor https://github.com/djanowski/hasp
- BASIC to C http://www.basic-converter.org/m4basic/
- z80 forth https://github.com/DW0RKiN/M4_FORTH
- https://github.com/nevali/m4
snippets
dnl shift($@) removes the first argument so the rest can be spliced back
dnl behaviour differs depending on how many arguments it is called with
define(`reverse', `ifelse(`$#',`0', ,`$#',`1',``$1'',
                          `reverse(shift($@)), `$1'')')
define(`upcase', `translit(`$*', `a-z', `A-Z')')
define(`downcase', `translit(`$*', `A-Z', `a-z')')
define(`_capitalize',
       `regexp(`$1', `^\(\w\)\(\w*\)',
               `upcase(`\1')`'downcase(`\2')')')
define(`PlaylistItem',` upcase($1) ')
PlaylistItem(`foo.hml')
self-modifying macro
define(`ACCEPT',`define(`ACCEPT',`Already accepted')Accepted')
ACCEPT # -> Accepted
ACCEPT # -> Already accepted
for loop (from mbreen.com)
define(`for', `ifelse($#,0, ``$0'',
  `ifelse(eval($2<=$3),1,
    `pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
for(`x',1,5,`x,') # 1,2,3,4,5...
for each loop (from mbreen.com)
define(`foreach', `ifelse(eval($#>2),1,
  `pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
foreach(`X',`Open the X. ',`door',`window') # Open the door. Open the window.
while loop (from mbreen.com)
define(`while', `ifelse($#,0,``$0'', eval($1+0), 1, `$2`'$0($@)')')
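A usage sketch for the while macro above, assuming a counter macro named i (a made-up name). The condition is passed quoted so that it is evaluated again on every iteration.

```m4
define(`while', `ifelse($#,0,``$0'', eval($1+0), 1, `$2`'$0($@)')')dnl
define(`i', 0)dnl
dnl The body prints the counter, then redefines it as its successor.
while(`i < 3', `i define(`i', incr(i))')
```

This should print 0 1 2 on one line. The loop ends once the condition evaluates to 0, at which point ifelse expands to nothing.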
trivia
the "xz backdoor"
- https://git.savannah.gnu.org/gitweb/?p=gnulib.git;a=blob;f=m4/build-to-host.m4
https://felipec.wordpress.com/2024/04/04/xz-backdoor-and-autotools-insanity/
AC_CONFIG_COMMANDS([build-to-host], [eval $gl_config_gt | $SHELL 2>/dev/null], [gl_config_gt="eval \$gl_[$1]_config"])
- https://lwn.net/Articles/967205/
- The exploit is in two parts.
- Two "test files" which contain the payload;
- and a modified m4 script (m4/build-to-host.m4) which initiates the process of loading the payload.