lj_str_new() had a separate fast-path and slow-path. This was bad
because (a) the fast-path was complex and (b) the fast-path was
actually slower than the slow-path in practice and (c) in practice it
could cause confusing performance problems depending on the memory
alignment of any often-reused string buffers in a program.
This change specifically makes the 'life' benchmark faster and more
robust to memory layout.