apaszke's comments

PyTorch was never a Chainer fork. The whole codebase consists of C libraries from Lua Torch, plus a bunch of Python code written entirely for this project. Chainer was an inspiration, but no code was ever shared between the two projects.


PyTorch uses NVIDIA NCCL for multi-GPU communication (under BSD license). Gloo is only one of the three backends that can be used for distributed communication out of the box.
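
A minimal sketch of how a backend is selected when initializing torch.distributed; "nccl" is used here, but "gloo" or "mpi" can be substituted. The rendezvous address, rank, and world size are placeholders, and which backends are actually available depends on how PyTorch was built:

    import torch.distributed as dist

    # Pick one of the supported backends: "nccl", "gloo", or "mpi".
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:23456",  # hypothetical rendezvous address
        rank=0,
        world_size=1,
    )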


Thanks!


It's not controlled by Facebook in any way. It's true that a large part of the core team works there, but development is public and guided by community needs first.


It's not like you have to give up a lot - the graphs are simple data structures, and creating them is not the expensive part of training. The computation has to be redone at every step in a static framework too, and that's the part that matters.
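
To make that concrete, here is a minimal sketch (not from the thread) of a training loop in which the graph is rebuilt and discarded every iteration; the model, data, and hyperparameters are hypothetical stand-ins:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                            # hypothetical model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(100):
        x, y = torch.randn(32, 10), torch.randn(32, 1)  # stand-in for real data
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()  # the forward pass records a fresh graph
        loss.backward()                      # the graph is consumed and freed here
        opt.step()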


Gloo is only one of the three currently supported backends. One can easily switch to MPI and pick an implementation that comes with a license you prefer.


It supports numpy, and conversion from PyTorch tensors to numpy arrays is a matter of calling the numpy() method.
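
For example (a small sketch, not from the thread), conversion works in both directions and shares memory for CPU tensors:

    import numpy as np
    import torch

    t = torch.ones(3)
    a = t.numpy()                          # numpy view of the tensor's storage (CPU only)
    b = torch.from_numpy(np.arange(3.0))   # wrap an existing numpy array as a tensor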


These ops are just not needed in PyTorch. while is just a Python while loop, scan is a for loop, and map is a list comprehension that applies modules. No need for anything fancy.
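
A small sketch of what that looks like in practice; the cell, layer sizes, and inputs are hypothetical:

    import torch
    import torch.nn as nn

    # "scan" over a sequence is just a Python for loop over time steps
    cell = nn.RNNCell(8, 16)
    h = torch.zeros(1, 16)
    for x_t in torch.randn(5, 1, 8):   # 5 time steps, batch of 1, 8 features
        h = cell(x_t, h)

    # "map" over inputs is just a list comprehension that applies modules
    layers = [nn.Linear(4, 4) for _ in range(3)]
    inputs = [torch.randn(1, 4) for _ in range(3)]
    outputs = [layer(x) for layer, x in zip(layers, inputs)]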


Sure - but on pytorch they suffer the kernel launch overhead each time through the loop, whereas on tensorflow and theano they do not. Which really impacts the kinds of algorithms that work well on each platform. Does that seem like a reasonable assessment to you?


Currently, not many frameworks do actual kernel fusion (to avoid launching many GPU kernels). If you look underneath a theano.scan or tf.scan, GPU kernels are still launched individually (though likely stream-overlapped where appropriate).

With TF's XLA compiler, they are slowly getting towards kernel fusion, which will then reduce launch overheads.

We have similar things in the works for pytorch: JIT-compiling, at runtime, the dynamic graph that is being executed. More news on this will come when the time is right.


I WANT to use pytorch, but there's no bayesian learning or stochastic nodes like in edward. Any chance there are plans for a compatibility layer with Edward, or to roll your own bayesian stuff?

Also, have you looked at Numba to do the jitting? Probably best not to have yet another separately maintained python JIT.


As core devs, we don't plan to build in something like Edward. However, folks in the community are brewing something:

https://discuss.pytorch.org/t/bayesian-computation-in-pytorc...
https://discuss.pytorch.org/t/distribution-implementations/4...


To not have the kernel launch overhead, you'd need to stop launching GPU kernels, but that's not how things work in any framework ;)


Yes! There's an issue on that, where we'll be coordinating the work: https://github.com/pytorch/pytorch/issues/494


You save the parameters and the code of the model definition.
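
A minimal sketch of that pattern; the module and file name are placeholders, and the code that builds the model has to be available again at load time:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)                 # stand-in for your model class

    # saving: only the parameters go into the file
    torch.save(model.state_dict(), "checkpoint.pt")

    # loading: re-create the same architecture from code, then restore the parameters
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load("checkpoint.pt"))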


torch?

