OP here! I started working on this project a year ago in an attempt to bring myself some sanity. Reading the news nowadays is a real struggle for me because of all the sensationalism, non-news, click-bait, and fluff that gets stuffed in to entice readers to click on a story.
I'll get around to open sourcing the whole project and I'll write a proper blog post about the process I went through to build this, but for now I'm looking for feedback on whether people here think this tool is personally useful or not.
edit: Also, the training set for this model is extremely small (1700 examples), which I paid mTurkers to tag. If you see sentences that look like they should not be redacted, add them to the training data set[1].
[1] - https://github.com/getshields/newslist/tree/master/TrainingD...