Homebase Automation | Pixi.js Developers | Remote (UTC-8 to UTC+2) | Part-time

Approximately $450b is spent annually on home improvement in the U.S. We’re tapping into this market by building tools that empower homeowners to manage the complexity of the renovation lifecycle. This market is vastly underserved, with existing offerings targeting industry professionals as opposed to homeowners. Our tools put everyday people back in charge of what is happening under their roof.

We’re looking for Senior Pixi.js Developers to join our fully distributed team and help build a product from the ground up.

Our interview process aims to respect your time: async technical discussion with our CTO about past experiences and engineering philosophy, paid take-home coding exercise representative of our work, final conversation with CEO and offer.

Core technologies: React + Typescript + XState + Pixi.js

Interested? Reach out to [email protected]


Homebase Automation | Full-time or part-time | Remote (UTC-8 to UTC+2)

- Staff frontend developer (React, Typescript, XState, PixiJS)

Approximately $450b is spent annually on home improvement in the U.S. We’re tapping into this market by building tools that empower homeowners to manage the complexity of the renovation lifecycle. This market is vastly underserved, with existing offerings targeting industry professionals as opposed to homeowners. Our tools put everyday people back in charge of what is happening under their roof.

Our interview process aims to respect your time: async technical discussion with our CTO about past experiences and engineering philosophy, paid take-home coding exercise representative of our work, final conversation with CEO and offer.

Interested? Reach out to [email protected]


The email does not resolve, and neither does the most intuitive next-door neighbor, homebase.ai, but it's not clear if that's the same company. Is there any more information about your company?


Apologies, email is now fixed.


That email doesn't seem to work :(


Apologies, email is now fixed.


Your use-case is not what I built the library for (natural language processing, not text consumption), but let's see what we can do...

You can download HTML E-Books using the following command:

  python -m gutenberg.download -vvv --filetypes=html --limit=5mb ./ebooks
This will download 5mb of zipped E-Books that have an HTML version into the ./ebooks directory.

It seems as though the legal disclaimers and copyright notices in the HTML files are all within <pre> tags, so we can easily clean up the files with a small shell script:

  EBOOK_DIR="./ebooks"

  # Unzip every downloaded archive, then delete <pre>...</pre> blocks in place.
  find "${EBOOK_DIR}" -name '*.zip' -type f -exec unzip -d "${EBOOK_DIR}" {} \;
  find "${EBOOK_DIR}" -name '*.html' -type f -exec sed -i '/<[pP][rR][eE]>/,/<\/[pP][rR][eE]>/d' {} \;
This will probably not work for all E-Books, but it'll give you something to work with. Note that removing the copyright notices may or may not be against the Project Gutenberg terms of service.
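
If you'd rather do the clean-up in Python, here's a rough equivalent (an untested sketch, not part of the library; it assumes the zipped HTML files were downloaded into ./ebooks as above and that stripping whole <pre> blocks is acceptable for your use-case):

  import re
  import zipfile
  from pathlib import Path

  EBOOK_DIR = Path("./ebooks")

  # Match <pre>...</pre> blocks case-insensitively, across newlines.
  PRE_BLOCK = re.compile(r"<pre>.*?</pre>", re.IGNORECASE | re.DOTALL)

  # Unzip every archive, then strip the <pre> blocks from the extracted HTML.
  for archive in EBOOK_DIR.glob("*.zip"):
      with zipfile.ZipFile(archive) as zf:
          zf.extractall(EBOOK_DIR)

  for html_file in EBOOK_DIR.rglob("*.html"):
      text = html_file.read_text(errors="ignore")
      html_file.write_text(PRE_BLOCK.sub("", text))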

Downloading E-Books by genre, author, etc. is not currently supported, but it's something I want to implement - so watch this space.


Hey all, OP here.

I built this because I think that Project Gutenberg is a great resource for NLP (e.g. stylometry, tracking writing styles over time, authorship detection, ...). I wanted to use the data on Project Gutenberg a number of times in the past but always ended up using another corpus because there wasn't an easy way to access it. Hopefully this library fixes that.

The project is currently "works on my machine" quality, so please do report any bugs you stumble across.

Also, if you can think of any use-cases for the Project Gutenberg data that aren't easily doable using the functionality that is currently available in the library, please let me know (e.g. by filing a ticket on the Bitbucket repo).


There's a database of RDF files that describe the books (http://www.gutenberg.org/cache/epub/feeds/rdf-files.tar.bz2), but it's a bit of a pain to use and doesn't link the books back to the API that should be used for crawling Project Gutenberg (http://www.gutenberg.org/robot/harvest).
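
For reference, here's roughly how you can pull basic metadata out of one of those RDF files with rdflib (a sketch, not part of the library; it assumes you've extracted the tarball, which puts the per-book files at cache/epub/<id>/pg<id>.rdf, and that rdflib is installed):

  from rdflib import Graph, URIRef
  from rdflib.namespace import DCTERMS

  book_id = 4443
  graph = Graph()
  graph.parse("cache/epub/{0}/pg{0}.rdf".format(book_id), format="xml")

  # The subject for each book is its gutenberg.org ebook URI.
  ebook = URIRef("http://www.gutenberg.org/ebooks/{0}".format(book_id))
  print(graph.value(ebook, DCTERMS.title))
  # Authors are nested pgterms:agent nodes, so they take a bit more digging.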


I think the previous version of the metadata included a path to the FTP server. Splitting the book id (4443 -> 4/4/4/4443) works for _most_ books, but there were somewhere between 800 and 3000 books organized in a different folder structure that I still need to track down.
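
To make the split concrete, here's the mapping I mean (just an illustration, not the library's API; it only handles the common case):

  def ebook_path(book_id):
      """Common-case mirror path for an ebook id, e.g. 4443 -> '4/4/4/4443'.

      Doesn't cover the minority of books stored under a different
      folder structure (the ones I still need to track down).
      """
      digits = str(book_id)
      return "/".join(list(digits[:-1]) + [digits])

  assert ebook_path(4443) == "4/4/4/4443"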

