* start to cleanup/unify accelerate and common back-ends
* more progress
* simplify
* add half type and allow infs in simd exp
* unify softmax + quantized, more dispatches to simd quantized mm
* add sin/cos, use simd in vector-scalar ops
* faster CPU vectorize quant
* faster erf/erfinv