"Byte order doesn't matter" holds for your local machine, but byte order still has to be handled when doing I/O.
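A minimal sketch of byte-order-independent I/O, assuming the stream stores 32-bit integers in little-endian order. The value is assembled from individual bytes with shifts, so the same source gives the same result on any host. The function names read_le32 and write_le32 are hypothetical, chosen for illustration.

```c
#include <stdint.h>

/* Decode a 32-bit little-endian integer from a byte stream.
   Only the byte order of the data matters, not the host's. */
uint32_t read_le32(const unsigned char *data)
{
    /* Cast each byte to uint32_t before shifting so that
       data[3] << 24 cannot overflow a signed int. */
    return ((uint32_t)data[0] << 0) |
           ((uint32_t)data[1] << 8) |
           ((uint32_t)data[2] << 16) |
           ((uint32_t)data[3] << 24);
}

/* Encode a 32-bit integer as four little-endian bytes. */
void write_le32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 0);
    out[1] = (unsigned char)(v >> 8);
    out[2] = (unsigned char)(v >> 16);
    out[3] = (unsigned char)(v >> 24);
}
```

Because no pointer is reinterpreted as a wider type, there is no alignment requirement on data; turning this into a single load is left to the compiler.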
The optimization is more interesting: that code block can be optimized by Clang, but not by GCC.
Comments from Chris Lattner:
In general, optimizing the original code to a single 32-bit load on little endian targets is not safe: the pointer may not be aligned. However, even after marking the pointer aligned, clang still misses the optimization.
Clang can successfully forward loads and stores when the accessed object is known.
Finally, Clang also optimizes the opposite: turning this code into a "bswapl" on x86.
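The mirror case, again as a sketch: decoding a big-endian 32-bit value with shifts. On a little-endian x86 target, an optimizing compiler such as Clang may compile this into a single 32-bit load followed by a byte swap; whether it does depends on the compiler version and optimization flags. read_be32 is a hypothetical name.

```c
#include <stdint.h>

/* Decode a 32-bit big-endian (network order) integer from bytes.
   The most significant byte comes first in the stream. */
uint32_t read_be32(const unsigned char *data)
{
    return ((uint32_t)data[0] << 24) |
           ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) |
           ((uint32_t)data[3] << 0);
}
```

To check what a given compiler does with it, compiling with optimizations enabled (for example -O2) and inspecting the generated assembly is the simplest approach.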
You won't miss ntohl/htonl, right?
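A small sketch comparing the two styles, assuming a POSIX system where ntohl is declared in <arpa/inet.h>. The shift-based decode and the memcpy-plus-ntohl version should produce the same value for the same network-order bytes, which is why the explicit conversion call becomes unnecessary. Both function names are hypothetical.

```c
#include <arpa/inet.h> /* ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Portable big-endian decode: no host byte-order assumptions. */
uint32_t be32_by_shifts(const unsigned char *data)
{
    return ((uint32_t)data[0] << 24) |
           ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8) |
           ((uint32_t)data[3] << 0);
}

/* Classic style: copy the raw bytes into a uint32_t (memcpy avoids
   an unaligned access), then convert network order to host order. */
uint32_t be32_by_ntohl(const unsigned char *data)
{
    uint32_t raw;
    memcpy(&raw, data, sizeof raw);
    return ntohl(raw);
}
```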
And there is support for network types, too!