optimize reciprocal square root with fast-math (x86)

$ ./clang -v
clang version 3.6.0 (217530)
Target: x86_64-apple-darwin13.3.0
Thread model: posix

$ cat rsqrt.c
#include <math.h>

float reciprocal_square_root(float x) {
  return 1.0f / sqrtf(x);
}

$ ./clang -O2 -ffast-math -S -o - rsqrt.c
...
	sqrtss	%xmm0, %xmm1
	movss	LCPI0_0(%rip), %xmm0
	divss	%xmm1, %xmm0
---------------------------------------------------------------------

This should be optimized to use 'rsqrtss'. ICC 14 does this at -O2:

	rsqrtss	%xmm0, %xmm2
	mulss	%xmm2, %xmm0
	mulss	%xmm2, %xmm0
	movss	L_2il0floatpacket.2(%rip), %xmm1
	mulss	%xmm1, %xmm2
	subss	L_2il0floatpacket.1(%rip), %xmm0
	mulss	%xmm2, %xmm0
	ret
L_2il0floatpacket.1:
	.long	0x40400000
L_2il0floatpacket.2:
	.long	0xbf000000