What “format-preserving encryption” gives you is a hard-to-predict (pseudorandom) permutation on the N elements. The resulting permutation is not guaranteed to be uniformly distributed on the space of all N! permutations; it just comes from some large space that's still much smaller than N! (depending on the number of random bits used), and it's hard to predict which member of that space the permutation will be.
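To make that concrete, here is a minimal sketch of how such a scheme typically works: a toy balanced Feistel network plus cycle-walking. This is not a real standard like NIST FF1 (those use AES and more rounds); the function names and the SHA-256 round function are just illustrative. The point is that the key (a few hundred bits at most) fully determines the permutation, so only a tiny keyed family of permutations is ever reachable.

```python
import hashlib

def feistel_permute(x, key, n_bits, rounds=4):
    # Toy balanced Feistel network: a keyed bijection on n_bits-bit integers
    # (n_bits even). Any round function gives a bijection; here it is just
    # SHA-256 of (key, round, right half).
    half = n_bits // 2
    mask = (1 << half) - 1
    left, right = x >> half, x & mask
    for r in range(rounds):
        f = int.from_bytes(hashlib.sha256(f"{key}:{r}:{right}".encode()).digest(), "big") & mask
        left, right = right, left ^ f
    return (left << half) | right

def fpe_encrypt(x, key, n):
    # Cycle-walking: run the Feistel bijection on the smallest even-width
    # domain covering [0, n) and re-apply it until the value lands back in
    # [0, n). Restricted to [0, n) this is again a permutation, but which
    # permutation you get is entirely determined by the (finite) key.
    assert 0 <= x < n
    n_bits = max(2, (n - 1).bit_length() + ((n - 1).bit_length() % 2))
    y = feistel_permute(x, key, n_bits)
    while y >= n:
        y = feistel_permute(y, key, n_bits)
    return y

print([fpe_encrypt(i, key="secret", n=10) for i in range(10)])  # some permutation of 0..9
```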
It is simply not possible to sample from a uniform distribution on a set S (here, the set of all N! permutations) in time less than lg|S|. Just think about it: whatever your randomized algorithm, if you only flip a coin M times, there are only 2^M possible outputs of your process. If you want every member of S to be achievable, this requires 2^M ≥ |S|, or M ≥ lg|S|.
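As a tiny concrete instance of that bound (the 52-card deck is just an example I'm adding, computed with Python's exact integers):

```python
import math

# Lower bound M >= lg|S| in a familiar case: shuffling a 52-card deck
# uniformly requires at least lg(52!) fair coin flips.
print(math.log2(math.factorial(52)))  # ≈ 225.58, so at least 226 random bits
```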
Factoring into primes doesn't change this lower bound, either.
Note that for N=2^64, the number N! is about 10^(10^20.5): just the number of digits in N! is about 3.5×10^20, and the number of bits in N! is over 10^21. Every prime up to 2^64 (over 4×10^17 primes) is a prime factor of this number N!; an algorithm that is supposed to work with that many primes is clearly no longer O(K), as K could be small relative to N.
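If you want to check those magnitudes yourself, Stirling's approximation via lgamma does it in a couple of lines (a quick sanity check of the numbers above, nothing more):

```python
import math

N = 2 ** 64
ln_fact = math.lgamma(N + 1)                          # ln(N!) via lgamma
print(f"digits in N!: {ln_fact / math.log(10):.2e}")  # ≈ 3.47e20
print(f"bits in N!:   {ln_fact / math.log(2):.2e}")   # ≈ 1.15e21
```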
An algorithm (like one of the proposed encryption schemes) that produces, say, one of 2^1024 permutations may be “good enough for practical purposes” (as 2^1024 is much larger than the number of atoms in the universe), yet the set of permutations it can produce is a dizzyingly insignificant fraction of all N! possible permutations.
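In log terms (again just back-of-the-envelope, using the same lgamma trick as above):

```python
import math

# lg of the fraction of all N! permutations reachable with a 1024-bit key
log2_n_fact = math.lgamma(2 ** 64 + 1) / math.log(2)  # lg(N!) ≈ 1.15e21
print(f"lg(2^1024 / N!) ≈ {1024 - log2_n_fact:.3e}")   # ≈ -1.15e21
```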