apaszke's comments

PyTorch was never a Chainer fork. The whole codebase consists of C libraries from Lua Torch, plus a bunch of Python code written entirely for this project. Chainer was an inspiration, but no code was ever shared between the two projects.


PyTorch uses NVIDIA NCCL for multi-GPU communication (under BSD license). Gloo is only one of the three backends that can be used for distributed communication out of the box.
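
A minimal sketch of how a backend is selected when initializing torch.distributed; "nccl" is used here, but "gloo" or "mpi" can be substituted. The rendezvous address, rank, and world size are placeholders, and which backends are actually available depends on how PyTorch was built:

    import torch.distributed as dist

    # Pick one of the supported backends: "nccl", "gloo", or "mpi".
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:23456",  # hypothetical rendezvous address
        rank=0,
        world_size=1,
    )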


Thanks!


It's not controlled by Facebook in any way. It's true that a large part of the core team works there, but development is public and guided by community needs first.


It's not like you have to give up a lot - the graphs are simple data structures, and creating them is not the expensive part of training. The computation has to be redone at every step in a static framework too, and that's the part that matters.
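
To make that concrete, here is a minimal sketch (not from the thread) of a training loop in which the graph is rebuilt and discarded every iteration; the model, data, and hyperparameters are hypothetical stand-ins:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                            # hypothetical model
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(100):
        x, y = torch.randn(32, 10), torch.randn(32, 1)  # stand-in for real data
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()  # the forward pass records a fresh graph
        loss.backward()                      # the graph is consumed and freed here
        opt.step()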


Gloo is only one of the three currently supported backends. One can easily switch to MPI and pick an implementation that comes with a license you prefer.


It supports numpy, and conversion from PyTorch tensors to numpy arrays is a matter of calling the numpy() method.
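
For example (a small sketch, not from the thread), conversion works in both directions and shares memory for CPU tensors:

    import numpy as np
    import torch

    t = torch.ones(3)
    a = t.numpy()                          # numpy view of the tensor's storage (CPU only)
    b = torch.from_numpy(np.arange(3.0))   # wrap an existing numpy array as a tensor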


These ops are just not needed in PyTorch. while is just a Python while loop, scan is a for loop, and map is a list comprehension that applies modules. No need for anything fancy.
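
A small sketch of what that looks like in practice; the cell, layer sizes, and inputs are hypothetical:

    import torch
    import torch.nn as nn

    # "scan" over a sequence is just a Python for loop over time steps
    cell = nn.RNNCell(8, 16)
    h = torch.zeros(1, 16)
    for x_t in torch.randn(5, 1, 8):   # 5 time steps, batch of 1, 8 features
        h = cell(x_t, h)

    # "map" over inputs is just a list comprehension that applies modules
    layers = [nn.Linear(4, 4) for _ in range(3)]
    inputs = [torch.randn(1, 4) for _ in range(3)]
    outputs = [layer(x) for layer, x in zip(layers, inputs)]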


Sure - but on pytorch they suffer the kernel launch overhead each time through the loop, whereas on tensorflow and theano they do not. Which really impacts the kinds of algorithms that work well on each platform. Does that seem like a reasonable assessment to you?


Currently, not many frameworks do actual kernel fusion (to avoid launching many GPU kernels). If you look underneath a theano.scan or tf.scan, GPU kernels are still launched individually (though likely stream-overlapped where appropriate).

With TF's XLA compiler, they are slowly getting towards kernel fusion, which will then reduce launch overheads.

We have similar things in the works for pytorch: JIT-compiling, at runtime, the dynamic graph that is being executed. More news on this will come when the time is right.


I WANT to use pytorch, but there's no bayesian learning or stochastic nodes like in edward. Any chance there are plans for a compatibility layer with Edward, or to roll your own bayesian stuff?

Also, have you looked at Numba to do the jitting? Probably best not to have yet another separately maintained python JIT.


As core devs, we don't plan to build in something like Edward. However, folks in the community are brewing something:

https://discuss.pytorch.org/t/bayesian-computation-in-pytorc...
https://discuss.pytorch.org/t/distribution-implementations/4...


To not have the kernel launch overhead, you'd need to stop launching GPU kernels, but that's not how things work in any framework ;)


Yes! There's an issue on that, where we'll be coordinating the work: https://github.com/pytorch/pytorch/issues/494


You save the parameters and the code of the model definition.
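
A minimal sketch of that pattern; the module and file name are placeholders, and the code that builds the model has to be available again at load time:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)                 # stand-in for your model class

    # saving: only the parameters go into the file
    torch.save(model.state_dict(), "checkpoint.pt")

    # loading: re-create the same architecture from code, then restore the parameters
    restored = nn.Linear(4, 2)
    restored.load_state_dict(torch.load("checkpoint.pt"))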


torch?

