I've been trying to read a paper a day since midsummer. These are a few of the ones I've personally found interesting since then:
Generating Sequences With Recurrent Neural Networks - http://arxiv.org/abs/1308.0850
Older one, but important to understand deeply since other recent ideas have come from it!
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks - http://arxiv.org/abs/1511.06434
Unitary Evolution Recurrent Neural Networks - http://arxiv.org/abs/1511.06464
State of the Art Control of Atari Games Using Shallow Reinforcement Learning - http://arxiv.org/abs/1512.01563
Interesting discussion in section 6.1 on the shortcomings/issues of DeepMind's DQN.
Spectral Representations for Convolutional Neural Networks - http://arxiv.org/abs/1506.03767
Deep Residual Learning for Image Recognition - http://arxiv.org/abs/1512.03385
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs) - http://arxiv.org/abs/1511.07289
I wish they had done more comparisons between identical network architectures with only the units swapped out, e.g. AlexNet with ReLU vs. AlexNet with ELU.
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models - http://arxiv.org/abs/1511.09249
Just a few from my list :)
I can't speak for the parent, but I believe people who read a paper a day don't try to understand each one deeply enough to start implementing whatever it describes. Rather, they read to get an idea of the approach, the kind of results it gives, and the kind of problems it can solve.
For people actively working full-time in the field, some papers have ideas simple but powerful enough that a couple of hours of reading (or a glance at the key diagram or formula) is enough to implement them (see the sketch after the examples below).
For example:
Deep Residual Learning for Image Recognition
Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)
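Both of those really are small ideas at their core. A minimal sketch of each (my own rough NumPy illustration, not the authors' code): the ELU activation and the residual "F(x) + x" trick.

```python
import numpy as np

def elu(x, alpha=1.0):
    # ELU activation: x for x > 0, alpha * (exp(x) - 1) otherwise.
    # np.minimum keeps exp() from overflowing on large positive inputs,
    # whose outputs come from the first branch anyway.
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def residual_block(x, f):
    # Core idea of deep residual learning: learn a residual f(x),
    # then add the input back through a skip connection.
    return f(x) + x

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(elu(x))
print(residual_block(x, lambda v: 0.1 * v))
```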
When I read most papers, it's to do what you say. There are many papers which really aren't worth spending more than 20 minutes perusing, since they just rehash or tweak something that was done previously. Unfortunately, with the publish-or-perish mentality predominant in most of academia, I'd say this is getting far worse and likely to continue. Sometimes I wish there were a "goodreads" or "netflix" for scientists.
Now, a good paper I will read and grok the main idea of in a few hours, to the point that I can implement the basics. But a truly classic paper is like a well-thumbed book and might take years to fully grasp.
"Tackling the Awkward Squad:
monadic input/output, concurrency, exceptions, and
foreign-language calls in Haskell" (http://research.microsoft.com/en-us/um/people/simonpj/papers...) finally made me understand monads. Or rather, why they have such an unreasonable draw on Haskell people. tl;dr: Monads are useful to thread data (state, side effects, ...) through a computation, without modifying all your function signatures (the functions can be lifted to work with the monad). But mostly, it turns out you NEED monads (or something like it) to sequence side-effects (since Haskell is lazy).
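Not Haskell, but here's a rough Python sketch of that "thread state through the computation without touching every signature" idea (my own toy illustration, not Control.Monad.State): each step is a function from state to (value, new_state), and a small bind chains steps so only the plumbing knows about the state.

```python
# Toy 'state monad' sketch: a stateful step is a function state -> (value, new_state).

def unit(value):
    # Wrap a plain value into a step that leaves the state untouched.
    return lambda state: (value, state)

def bind(step, f):
    # Run `step`, feed its value to `f`, and thread the state along automatically.
    def run(state):
        value, new_state = step(state)
        return f(value)(new_state)
    return run

# Example: label items while a counter is threaded through implicitly.
def label(item):
    return lambda counter: (f"{counter}: {item}", counter + 1)

program = bind(label("foo"), lambda a:
          bind(label("bar"), lambda b:
          unit([a, b])))

print(program(0))  # (['0: foo', '1: bar'], 2)
```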
Parse ambiguous context-free grammars in worst-case cubic time and unambiguous grammars in linear time, with an intuitive recursive-descent-ish algorithm. GLL is the future of parsing IMO: more powerful than packrat/PEG parsers, comparatively easy to write by hand, and it handles ambiguities more elegantly than GLR.
word2vec with contexts based on linguistic dependencies instead of the skip-gram approach. A quick explanation: skip-grams give words related to the embedding (ex: Hogwarts -> Dumbledore), while dependencies give words that can be used like the embedding (ex: Hogwarts -> Sunnydale). It's not meant to replace skip-grams but to augment them; skip-gram contexts learn the domain and dependency-based contexts learn the semantic type.
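A toy sketch of how the (word, context) training pairs differ between the two approaches (my own illustration; the dependency edges are hand-written here, whereas in practice they come from a dependency parser):

```python
sentence = ["australian", "scientist", "discovers", "star", "with", "telescope"]

# Skip-gram / window contexts: neighbours within +/- `window` positions.
def window_pairs(tokens, window=2):
    pairs = []
    for i, w in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((w, tokens[j]))
    return pairs

# Dependency-based contexts: relation-labelled (head, modifier) pairs.
dependency_edges = [
    ("scientist", "australian", "amod"),
    ("discovers", "scientist", "nsubj"),
    ("discovers", "star", "dobj"),
    ("discovers", "telescope", "prep_with"),
]

def dependency_pairs(edges):
    pairs = []
    for head, modifier, rel in edges:
        pairs.append((head, f"{modifier}/{rel}"))
        pairs.append((modifier, f"{head}/{rel}-inv"))
    return pairs

print(window_pairs(sentence)[:6])
print(dependency_pairs(dependency_edges)[:4])
```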
One of the things I find particularly nice about GLL is that it's much more friendly to parser combinators[1] than GLR. (LR-family parsers, and bottom-up parsing in general, are notoriously difficult to implement in a way that lets parsers be combined, and the resulting framework would be rather awkward to use.)
A nice summary of vector space models along with three basic matrix layouts: term-document, word-context, and pair-pattern, plus the resulting applications and algorithms.
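For the first two layouts, a quick toy sketch of what the matrices actually contain (my own example, not taken from the survey):

```python
from collections import Counter

# Toy corpus: each document is a list of tokens.
docs = [
    ["cats", "chase", "mice"],
    ["dogs", "chase", "cats"],
    ["mice", "eat", "cheese"],
]

vocab = sorted({w for d in docs for w in d})

# Term-document matrix: rows are terms, columns are documents, cells are counts.
term_doc = {w: [d.count(w) for d in docs] for w in vocab}

# Word-context matrix: co-occurrence counts within the same document.
word_context = Counter()
for d in docs:
    for w in d:
        for c in d:
            if c != w:
                word_context[(w, c)] += 1

print(term_doc["cats"])                 # [1, 1, 0]
print(word_context[("cats", "chase")])  # 2
```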
Emphasis on communication. I liked the fact that the AI is pictured as a research assistant, since I would love to see more dialog-oriented interaction with machines.
Great essay on how the past already had a handle on today's data analysis landscape, just without the enormous computing power and data availability that we have now.
That paper tells us that pain medication is often used in completed suicide (paracetamol, paracetamol combined with opioids, and opioids alone are three of the top five most commonly used meds).
So I have an interest in pain medication from the angle of suicide prevention, which is why these two are interesting.
Efficacy and safety of paracetamol for spinal pain and osteoarthritis: systematic review and meta-analysis of randomised placebo controlled trials: http://www.bmj.com/content/350/bmj.h1225
(Paracetamol probably doesn't help with long-term musculoskeletal pain, and it increases the risk of liver damage.)
It's a bit confusing. For instance around page 80:
Table 5: Male suicide deaths and those aged 45-54 in the general population, by UK country vs Table 7: Patient suicide: male suicide deaths and those aged 45-54, by UK country.
Table 5 shows the rate. Table 7 shows the actual numbers. Why? Even the first key finding speculates that the rise in patient suicides is due to higher numbers of patients. Do they not have this seemingly important statistic? A quick search says a quarter of the population will have a mental illness during the year. If true, we'd expect around 25% of suicides to be patients, right?
Why separate out the APAP/opioid combination in the context of suicide if the APAP wasn't a relevant cause? It seems like respiratory depression and liver poisoning aren't that synergistic, are they? An opiate-naive user taking 10/325 oxy/APAP would almost certainly hit opiate overdose before liver damage became a life-threatening issue.
The study recommends "safe prescribing" but then shows that the majority of opiate suicides aren't with a prescription, and that prescription overdose skews heavily toward older females with a "major physical illness". And there's no comparison of how rx abuse compares with non-mentally-ill patients. Edit: And rx rates, too. I'm guessing older patients generally get far more opiates prescribed than younger ones.
These are great questions. They're normally pretty good at responding if you want more information.
Here "patient" means "under the care of secondary MH services", so doesn't include people who are being treated by their GP rather than by eg a community MH team.
I think the opioid / APAP stuff is based on bits of history. Co-proxamol was for years the most common med used in completed suicide. It was put on more restrictive prescribing, and its use dropped. But then the use of plain paracetamol in completed suicide increased. (And in attempted suicide too: for a while paracetamol overdose accounted for 4% of UK liver transplants, but 25% of the super-urgent transplants.) Rules about paracetamol then tightened, so we've seen reductions in its use. So, from a public health POV, it's useful to see whether plain paracetamol, the combination, or plain opioids are being used more often, because that lets them look at what's driving sales or prescriptions.
About safe prescribing: one source of medication used in completed suicide is your own prescription or a relative's prescription. This is often a preventable cause of death, so it's useful to see whether safe prescribing helps. It ties into things like the "Triangle of Care" and also the "Pills Project" (which I want to try to use outside care homes).
You're right about older people. They also often don't lock their meds in a cupboard (they don't have children in the home anymore, so they don't see a need), and tragically grandchildren come to visit and accidentally overdose.
People are very excited about graph isomorphism being solvable in quasipolynomial time, but there are a few more problems from the seminal Garey and Johnson book whose status is still unknown: they could be in P, be NP-complete, or be neither. One of them is computing the optimal schedule for three machines processing some tasks (jobs), when the tasks all have the same size but there are dependencies among some of them, forcing those to be done in order.
This paper proves that there is a (1+ε)-approximation of this problem in "slightly more than quasipolynomial time" (I love this phrasing).
The technique they use is the Lasserre hierarchy, which is a very exciting tool in theoretical computer science, although there are still only a couple of results where this hierarchy approach brings more to the table than other methods for designing efficient algorithms. This is one more for the list!
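For reference, the open scheduling problem can be stated like this in standard scheduling notation (my paraphrase, not the paper's exact wording):

```latex
% P3 | prec, p_j = 1 | C_max : unit-time jobs, precedence constraints, three identical machines
\begin{align*}
\text{Given: } & \text{jobs } J = \{1, \dots, n\} \text{ with processing times } p_j = 1,\\
               & \text{a partial order } \prec \text{ on } J \text{ (if } i \prec j \text{, job } i \text{ must finish before } j \text{ starts)},\\
               & \text{and } m = 3 \text{ identical machines.}\\
\text{Find: }  & \text{a feasible schedule (at most one job per machine at a time, respecting } \prec\text{)}\\
               & \text{that minimizes the makespan } C_{\max}.
\end{align*}
```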
A couple of years behind the times, but I got really into word2vec and plenty of associated work. I'm on mobile so it's not easy to post links, but if you haven't checked out w2v I highly recommend it.
Not a specific list of papers, but I find Sigcomm to generally have very good papers in the field of networking and communications. Here's the link for this year's conference: