
Statistical natural language processing is a very prominent example of a field which has developed effective methods for coping with "black swans": the occurrence of improbable events that have never been seen before.

If you take any large corpus of text and start counting how frequently different words occur, you will rapidly notice that a huge fraction of the words occur only once or have never been seen before. Biblical scholars called these words hapax legomena instead of black swans. Methods for estimating the probabilities of these events go back to Alan Turing and his codebreaking work at Bletchley Park. One need not assign zero probability to unseen events a la maximum likelihood.
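A concrete instance of those methods: the Good–Turing estimate, which grew out of that codebreaking work, uses the count of hapax legomena to estimate the total probability mass of words never seen at all. A minimal sketch in Python (my own illustration; the full method also smooths the counts of rare words, which this omits):

```python
from collections import Counter

def unseen_mass(tokens):
    """Good-Turing estimate of the probability that the next token
    is a word never seen in the corpus: N1 / N, where N1 is the
    number of words seen exactly once (the hapax legomena) and N
    is the total number of tokens observed."""
    counts = Counter(tokens)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(tokens)

corpus = "the cat sat on the mat with a hat".split()
print(unseen_mass(corpus))  # 7 hapaxes / 9 tokens ~= 0.78
```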

Taleb's rants against VaR and Mediocristan have always seemed like borderline straw-man bashing to me. Sure, there are many fools who believe in the Gaussian or lognormal returns model, but the best knowledge in the field doesn't make these assumptions. Why doesn't he give authors like Bouchaud and Potters, who have built on Mandelbrot's work, their due? Or does he?


It's the more subtle instances of this that are always the most worrying to me, to be honest.

Imagine you have a data pipeline that spans multiple hosts, and you want telemetry on when an entity passes through each step. Suddenly you have to start caring about clock alignment at an order of magnitude comparable to network transit/computation time (microseconds), and that level of skew happens VERY often. There are certainly ways around this in how you build your system, but it's something that _must_ be considered and I find often isn't.
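To make the failure mode concrete, here is a minimal sketch with invented numbers: once skew rivals transit time, naive cross-host timestamp subtraction can report that an entity arrived before it left.

```python
import time

# Hypothetical numbers, chosen only to illustrate the failure mode.
transit_us = 100   # actual network transit time from host A to host B
skew_us = 300      # host B's wall clock runs 300 us behind host A's

t_sent_on_a = 1_000_000                           # us, read on A's clock
t_recv_on_b = t_sent_on_a + transit_us - skew_us  # us, read on B's clock

# Naive cross-host latency: arrival appears to precede departure.
print(t_recv_on_b - t_sent_on_a)  # -200 us

# Within a single host, a monotonic clock is safe for durations, since
# it never jumps backwards the way an NTP-stepped wall clock can;
# across hosts you need explicit sync or a skew-tolerant design.
start = time.monotonic()
elapsed = time.monotonic() - start  # always >= 0
```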


"Sometimes, a cigar is just a cigar."

Phones are increasingly powerful, prevalent, and for a lot of people, their primary computer. Being able to use the same code, data, and test cases for every platform you want to train or run your model on is good from a software engineering perspective. Don't underestimate the challenges of software eng. when ML is involved -- see, for example, http://icml.cc/2015/invited/LeonBottouICML2015.pdf


This site doesn't describe the internals of Borland's Turbo Pascal compiler. It describes a compiler written in Turbo Pascal that can compile some subset of Turbo Pascal's language.

Borland's Turbo Pascal compiler was written in 16-bit x86 assembly, mostly by Anders Hejlsberg.

There is an explanation of this on the front page of the site, but it's not clear from the headline.

(I used to work at Borland on the Delphi compiler, and had access to the source of tpc.exe.)


Here’s a comment I made elsewhere recently:

* * *

You might start with teasers like Hestenes’s papers [“Reforming the Mathematical Language of Physics”][oersted], [“Grassmann’s Vision”][gvision], etc.

For a summary from a mathy perspective, try Chisolm’s [book-like thingy on arxiv][chisolm]

If you want a whole book of concrete problems to solve at an advanced undergraduate physics student level, try Hestenes’s [New Foundations for Classical Mechanics][nfcm]. This is the best source I’ve seen anywhere about understanding complex rotations in mechanics problems.

If you want to solve some plane geometry problems, this thing looks fairly accessible [Treatise of plane geometry through the geometric algebra][planegeo] (I haven’t looked too closely).

Two websites are [geocalc.clas.asu.edu][asu] and [geometry.mrao.cam.ac.uk][cambridge]. Also see the [link page at The Net Advance of Physics][netadvance], and this [mirror of Lounesto’s site][lounesto] (he passed away a while back). Lounesto liked to publish [collections of counterexamples][counterexamples].

There are a number of journals, conferences, etc. Try a google search for “Clifford Algebra”.

If you want an introductory undergraduate textbook which tries to teach both geometric algebra and traditional matrix algebra, you could look at [MacDonald][]’s [Linear and Geometric Algebra][laga]; I don’t think Hestenes is really on board with this approach, but there’s not too much else pitched at a similar audience. MacDonald also has a book [Vector and Geometric Calculus][vagc] which I suspect will be substantially easier to work through than Hestenes and Sobczyk’s [Clifford Algebra to Geometric Calculus][cagc]. (I haven’t looked at either of MacDonald’s books)

If you’re interested in geometric modeling for robotics, computer graphics, computer vision, or similar, check out [these papers][invkinematics] and then look at the book [Geometric Algebra for Computer Science][gacs]. The “conformal geometric algebra” model proposed there is pretty neat. Also see [these papers][unifalg] about related topics.

If you’re interested in crystallography, check out the papers [“Point Groups and Space Groups in Geometric Algebra”][crystalsymmetry] and [“The Crystallographic Space Groups in Geometric Algebra”][crystalga].

If you’re interested in Lie theory or representation theory, check out this paper [“Lie Groups as Spin Groups”][lgasg].

If you’re a physicist / physics student / electrical engineer / etc., try the book [Geometric Algebra for Physicists][gap] or perhaps [Understanding Geometric Algebra for Electromagnetic Theory][gaet]

For a slightly different perspective, take a look at [Sobczyk][]’s [New Foundations in Mathematics: The Geometric Concept of Number][nfm]. Sobczyk has some interesting papers about representing geometric algebras using (real or complex-valued) matrices, which might be helpful if you want to write fast numerical code on current computers, which have been optimized to do matrix math.

If you’re interested in the history, I recommend Crowe’s 1967 [History of Vector Analysis][hva].

I believe you can find the collected mathematical papers of Clifford online if you do a google search, or buy a used copy of a nice version published by Chelsea in the 1960s; the recent AMS reprint is awful quality.

Grassmann’s two mid-19th century Ausdehnungslehre books have been relatively recently translated into English, as has Peano’s late-19th century book Geometic Calculus, all three by Kannenberg.

[oersted]: http://geocalc.clas.asu.edu/pdf/OerstedMedalLecture.pdf

[gvision]: http://geocalc.clas.asu.edu/pdf/GrassmannsVision.pdf

[chisolm]: https://arxiv.org/abs/1205.5935

[nfcm]: http://geocalc.clas.asu.edu/html/NFCM.html

[planegeo]: http://web.archive.org/web/20011215062737/http://campus.uab....

[asu]: http://geocalc.clas.asu.edu/

[cambridge]: http://geometry.mrao.cam.ac.uk/

[netadvance]: http://web.mit.edu/redingtn/www/netadv/Xgeomealge.html

[lounesto]: https://users.aalto.fi/%7Eppuska/mirror/Lounesto/

[counterexamples]: https://users.aalto.fi/~ppuska/mirror/Lounesto/counterexampl...

[MacDonald]: http://faculty.luther.edu/~macdonal/

[laga]: http://faculty.luther.edu/~macdonal/laga/index.html

[vagc]: http://faculty.luther.edu/~macdonal/vagc/

[cagc]: http://geocalc.clas.asu.edu/html/CA_to_GC.html

[invkinematics]: http://geocalc.clas.asu.edu/html/InvariantKinematics.html

[gacs]: http://www.geometricalgebra.net

[unifalg]: http://geocalc.clas.asu.edu/html/UAFCG.html

[crystalsymmetry]: http://geocalc.clas.asu.edu/pdf/crystalsymmetry.pdf

[crystalga]: http://geocalc.clas.asu.edu/pdf/CrystalGA.pdf

[lgasg]: http://geocalc.clas.asu.edu/pdf/LGasSG.pdf

[gap]: http://geometry.mrao.cam.ac.uk/2007/01/geometric-algebra-for...

[gaet]: http://www.wiley.com/WileyCDA/WileyTitle/productCd-047094163...

[Sobczyk]: http://www.garretstar.com

[nfm]: http://www.springer.com/us/book/9780817683849

[hva]: https://en.wikipedia.org/wiki/A_History_of_Vector_Analysis


I happened upon this presentation just a couple days ago - compile-time geometric algebra in C++ via metaprogramming. https://www.youtube.com/watch?v=W4p-e-g37tg

Not sure how useful it is in day-to-day reality, but at the very least it might make other GA implementations seem simple by comparison.


The problem with your posts is not that they are hard to understand, it's that they are not nice to read. For one, you come off as arrogant, as if you were the only one who has any idea what they are talking about. You present a solution to help disadvantaged students learn better, and I agree that your solution is fine. But it will only work for students that have the will, time and ability to help themselves, and I think that is only a tiny minority of all students, and even tinier when you only consider economically disadvantaged ones.

That your first post didn't mention any of the factors that could make your solution unworkable probably contributes to the downvotes, because you come across as somewhat naive, and not in a cute way. I don't think it deserved to be flagged, but it wasn't a good post either.

Your second post is much too repetitive. Repetition is very memorable (you got "Get the book. Read the book. Do the exercises." stuck in my head quite well), but if you don't enrich it with convincing arguments, it just becomes tiresome to read.

Your third post is actually quite reasonable (but still really long, not sure if you could have made it shorter). You clarify that your solution is only for talented students and you mention a bunch of things that would be good for such students to know.

Your fourth post is really bad. You directly attack someone who was trying to help you with stylistic advice and tell them that they didn't understand your post, deny that there is any problem and proceed to repeat yourself. That you got called out by a moderator should give you a hint how far off the mark your writing is.

That said, consider me nerd-sniped! I quite enjoyed doing your exercises, although I had to look up some definitions here and there. Unfortunately, they don't fit in a single comment, so I have to see how many self-replies I am allowed.


Actually, it means a lot. It tells us how much we understand the project's timeline. Many projects get estimated way too early in the process, when the true level of uncertainty is massive. If the low confidence level means the estimate has an upper end that isn't feasible, there are things we can do to increase confidence. Proofs of Concept/Spikes exist explicitly for taking an unknown feature and getting a better understanding of its complexity, scope, and timeline. (http://www.construx.com/Thought_Leadership/Books/The_Cone_of...)

So then the hard part is communicating to the business the current estimates and confidence level, and that we can do some up-front work to tighten down our estimate and the schedule. This is actually part of the ACM's "Software Engineering Code of Ethics": "3.09. Ensure realistic quantitative estimates of cost, scheduling, personnel, quality and outcomes on any project on which they work or propose to work and provide an uncertainty assessment of these estimates."

As for pharmaceuticals, I have no experience there. I doubt that they really just let scientists go off and do whatever they want with unlimited budgets, and just live with the results.


As a PM, of course, I know this padding is happening and generally will add my own:

You know in your mind it will take you 3 days

You give me, your PM, an estimate of 2 weeks

I report up the chain that it will take a month

When it's all said and done, you end up getting it done in a month by the skin of your teeth :-)


I measure effort, complexity, risk, and dependencies all with one number. If you don't

Things that affect story point values for me (note that some are more about quantifying "risk" than "how long will it take?"):

- how "big" is the change?

- how many parts will be changed?

- are the exit criteria vague or open to interpretation?

- is the change easy to test?

- do we have sample inputs and outputs?

- will we have to design for a tricky deployment?

- will we have to design for a tricky rollback?

- will it be hard to peer review?

- if this feature is impossible or more expensive than we thought, will we know early or late?

- is any of the code being changed extra finicky?

- do I need lots of help? code from other teams? approvals? new software installed? new hardware?

- can I iterate on this code? or does it really need to be perfect the first time?

- will we have developer 'concurrency' or 'parallelism' issues? can anyone chip in and help whenever? or is one distracted expert the only one that can do this?

...for me it's an intuitive guess at a number that flattens all of that into something we can use to prioritize work and decide which work needs to be broken down more. What exactly is on that list will vary, certainly, but I would include anything that could cause bugs, make you wait, or make you underestimate a task.


I heard a saying once, "Deadlines slip by the units they're measured in." If you say three weeks, it will slip by weeks, not days or months. If the estimate is months, etc.

> How do you do this?

Give half-order-of-magnitude estimates as confidence intervals. Avoid using "hours" or "days" as estimates. Story points work really well here.

Be extra clear on priorities and burndowns to make it clear that you're not just blowing them off. Give short but frequent demos (not just reports) of progress. If later they're concerned about progress, you can point back to all the times you reviewed the product, touched base about priorities, and agreed on next steps.

Make risk really clear to them. If the project is 'get it done in three months or bust', then the payoff (ten times the principal?) had better be high enough to account for the risk (30%? 50%? 70%?) that you won't make it on time and on budget. That is, you don't want to wait until failure to discover a bad business plan.

At the end of the day, you need to be willing to be patient with and/or walk away from 'business people' that can't wrap their heads around the fact that 20% profit on a venture with a 50% failure rate is a bad business plan. Making employees stressed or overworked to compensate is not a humane solution to the problem. To that end, don't work crazy hours to meet a deadline. That is bad for you, bad for your team, and bad for management since it enables dysfunctional planning.


45. Still coding, probably until I die. Light years ahead of younger developers in the following:

* Writing less code

* Writing maintainable code

* Re-using existing code (requires reading existing code)

* Knowing how APIs should be written

* Knowing how to properly map associations

* Knowing when and when not to use another library

* Knowing when to tell a product manager to go back and do some more product managing

* Knowing when to say "No"

* Knowing how to determine what a stakeholder actually needs vs. what they think they want

* Understanding that 8 hours today and 8 hours tomorrow is 10x better than 16 hours today.

* Saying "I don't know" when asked "How long will it take" or "When will it be done"

* Fighting against shitty processes

* Fighting FOR processes

* Forcing PMs to use software designed for the purpose of creating software instead of accepting requirements through email/slack/invision/zeplin/google docs/tool of someone else's choice


"Full attack dev?"

I like that - may I use it?

-- 58 year old full-attack dev


It took me a while to learn how to talk to lawyers.

It is always wrong to ask a lawyer 'What do you think?' of a contract; they will always have different ways of saying what it says, covering different contingencies.

The right question is "What rights does this give me and the other guy?" followed by "What is my financial exposure?"

As an entrepreneur you probably want to leverage your work and sell it to multiple clients so a contract clause that prevents derivative works might be bad. If a contract clause asks you to indemnify someone else, you need to understand if they are going to be sued and for what.

A friend of mine once sagely said to me, 'Errors and Omissions insurance is cheaper than a lawyer.' The point was basically: use insurance and good business ethics to keep your financial risk of being sued low.

That being said, you should read all contracts you are asked to sign. If they are full of a lot of distasteful clauses you can always just say 'no thanks.'


"Failure".

I'm about your age. I have a career which, arrogant as it sounds, everyone I know considers "pretty damn good".

I've shipped a fraction of what you have in terms of end-to-end solutions, let alone piloting the ship as a founder would have to. I've learned skill sets deep in narrow areas, but this has left me wanting broader exposure and a more heterogeneous understanding.

Your experiences and _successes_ (you SHIPPED, even if it failed after N years, and that's not even including a fucking _sale_; how many engineers get even close to that far?) give you skill sets that I couldn't easily replicate from my entire peer network. Do not sell yourself short. Even if you HAD failed, and spectacularly (and both times!), that's still a remarkable amount of firsthand experience that, outside of any pathological decisions on your part I don't know about, may not say anything negative at all about your choices and decisions. (And even if it did, experience is experience; you make mistakes and learn from them, and are so much more valuable after. Did you see the HN backlash to the engineer getting punished for deleting prod? And that's a pretty damn overt failure.)

I can ramble on, as the above well demonstrates, largely because I have a deep wellspring of reasons why you're being silly. Please don't take this as a statement meant as an insult; I'm sitting here amazed at your accomplishments as I'm reading, get to your conclusion, and go "wait what why huge failure no stop that".

Actionably, maybe go work for a bigCo/midstage/something more grounded for a few years. Keep yourself stable and sane, see what exists in the world and what other people have done. I hope it will help you see the perspective I have, which paints your accomplishments in a very favorable light. (more importantly, don't take my advice literally, I'm saying broadly, do something to keep yourself afloat, employable, and to give yourself time to unwind and just _do shit_ as you want. Some amount of stability and freedom has done volumes in my own life for regaining mental strength in periods of conflict.) And do this in the knowledge and confidence that this engineer would consider himself lucky to work with and learn from someone who has "failed" as much as you have :)


Absolutely - I guess this is really better broken down into two points:

Sometimes you need to campaign for yourself, sing your own praises and outline your accomplishments. For a humble person, this can be hard. The inclination to let your work speak for itself is strong. Document your successes. Document your worth.

Also, from my experience, there will always be non-technical people who assume they can dictate certain technical decisions. Tread lightly in these situations - you need to make sure that you're speaking your truth and doing your job to the best of your ability while simultaneously managing the egos of the non-technical people who intend to influence decisions without proper knowledge of the decision being made.


Or Atria's transducers library (à la Clojure), which can be applied to collections but also to pushy things (e.g. boost.signals, rx.observables, etc.):

Code: https://github.com/Ableton/atria/tree/master/src/atria/xform...

CppCon 2015 session: http://sinusoid.es/talks/transducers-cppcon15/ https://www.youtube.com/watch?v=vohGJjGxtJQ


I would recommend instead Facebook's folly::gen library, which provides these functional combinators in a high-performance fashion, by expressing them as streaming combinators. You get all the map/filter/take/drop/reduce primitives, but in a much cleaner way.

See https://github.com/facebook/folly/blob/master/folly/gen/test... for an example.
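Both libraries get their efficiency the same way: each stage pulls one item at a time from the stage before it, so no intermediate collection is ever materialized. A rough sketch of that streaming style, in Python generators rather than either library's actual C++ API:

```python
from itertools import islice

def mapped(f, xs):
    for x in xs:
        yield f(x)

def filtered(p, xs):
    for x in xs:
        if p(x):
            yield x

# No intermediate list of a million squares is ever built; items flow
# through map and filter one at a time, on demand.
nums = range(1_000_000)
pipeline = filtered(lambda x: x % 2 == 0, mapped(lambda x: x * x, nums))
print(sum(islice(pipeline, 5)))  # take(5) then reduce: 0+4+16+36+64 = 120
```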


> no one’s going to stop you from spending time reading at work or spending time learning

What? You've lived a truly blessed life, Dan Luu. I've observed the opposite, pretty consistently. I've been working as a programmer for 25 years and I've found, across nine separate employers (and lost-track-of-how-many different supervisors) that spending any appreciable time reading (even a book about Angular when you're supposed to be learning Angular) will become a management issue. Everywhere I've ever worked has expected learning to be on your own time. Don't believe me? Put "read three chapters of a book" on a status report and see how many levels of management show up at your desk to micromanage your time.


> Duckling, our open-sourced probabilistic parser to detect entities like dates and times, numbers, and durations

Are there any benchmarks for how well this compares with something like HeidelTime[1] or SUTime[2]?

[1] https://github.com/HeidelTime

[2] https://nlp.stanford.edu/software/sutime.html


I'm someone who did a few things after reading a research paper attempting to quantify the detrimental effects of climate change.

I no longer own a car, I bike everywhere.

I work remote 100%.

I do not eat beef if I can help it anymore.

I use the train for long distance personal travel. I still fly when it is for my company, though.

I take transit when biking would be too slow.

I signed up for my city's green energy program. I pay more to receive 30% of my energy use from wind farms.

I cancelled Amazon Prime.

I buy items locally as much as possible.

I avoid online shopping if possible.

All of my food comes from local, small producers.

Cutting consumption behavior down has, unsurprisingly, saved me a lot of money and these behaviors are reducing the amount of carbon I emit.


It's always good to hear about IoT companies using CRDTs, because they seem like a perfect fit, yet you don't hear very much about it. It's also interesting to hear you mention that you were using Plumtree for your replication backend.

To elaborate more on your point, CRDTs are super helpful when coordinating a lot of distributed devices, but they only get you so far. While building Lasp[0][1], a distributed data-flow language using CRDTs, we ran into a lot of scalability problems with naive usage of CRDTs. We are aiming to reach 10k-20k nodes in the near future, so we are focusing a lot on reducing network usage.

State-based CRDTs send their full state to all their peers, which works fine when your states are small but introduces a lot of overhead in any other case. Operation-based CRDTs only send the actions performed on them (add 1, or rmv 2, for example), but these are not idempotent and require a lot of guarantees from the underlying distribution backend.

We are focusing on using Delta CRDTs, which combine the low network usage of operation-based structures with the idempotence of state-based approaches.
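To make the trade-off concrete, here is a minimal sketch (mine, in Python rather than the Erlang Lasp is built in, and not Lasp's implementation) of a state-based grow-only counter, with a comment marking the one place a delta-based variant differs:

```python
class GCounter:
    """State-based grow-only counter CRDT: one slot per node,
    merged by taking the per-node maximum."""
    def __init__(self, node_id):
        self.node_id = node_id
        self.counts = {}

    def increment(self, n=1):
        new = self.counts.get(self.node_id, 0) + n
        self.counts[self.node_id] = new
        # A delta CRDT ships just this one changed entry to peers,
        # instead of the whole self.counts map.
        return {self.node_id: new}

    def merge(self, remote_counts):
        # max is associative, commutative, and idempotent, so merges
        # may arrive in any order, any number of times.
        for node, c in remote_counts.items():
            self.counts[node] = max(self.counts.get(node, 0), c)

    def value(self):
        return sum(self.counts.values())

a, b = GCounter("a"), GCounter("b")
a.increment(); a.increment()
delta = b.increment()
a.merge(delta)       # delta sync: tiny payload
b.merge(a.counts)    # full-state sync converges to the same answer
assert a.value() == b.value() == 3
```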

Using Plumtree for your backend makes it resilient to network failures, but using the default full-membership protocol makes it almost unusable when you're dealing with a large number of nodes. Alternative protocols like HyParView greatly reduce the number of protocol messages in your network.

Finally, since Lasp is a data-flow language, we are applying control-flow analysis to select and remove unused or intermediate values in the program, thus also reducing network usage.

[0]: http://lasp-lang.org/
[1]: https://github.com/lasp-lang/lasp


This is a good reply and I'll add to it.

Note that CRDT stands for Conflict-free Replicated Data Type, and there are a few sub-acronym expansions: Convergent Replicated Data Type and Commutative Replicated Data Type.

[EDIT] I wanted to add that CRDTs were mentioned here because, in order to provide the kind of guarantees they do, they must satisfy the laws of associativity (i.e., be a semigroup), commutativity, and idempotency. So, technically, you have to go further than plain semigroups, to commutative, idempotent semigroups (that is, semilattices). [/EDIT]

At Plum we made heavy use of CRDTs on top of an eventually consistent whole-state replication technology called Plumtree (no relation to the company's name, just a coincidence) within the internet-connected dimmer we built, the Lightpad.

The primary design goal was to enable advanced configurations: Arbitrary groups of Lightpads could be controlled from any other one on the same network by binding it to a specific finger gesture. We also did not want this configuration to be dependent upon any single master, they had to be truly master-less.

In general, eventual consistency is pretty scary when you're deploying it to embedded internet-connected devices, where you can't immediately shell onto the device like you can with a server in a data center. I have lots of stories I'll write about sometime of the pains I encountered while developing this solution and how I ended up on CRDTs.

Strong Eventual Consistency (typically, CRDTs deployed on top of an eventually consistent substrate), though, is very safe and provides a lot of guarantees about the concurrent behavior of the algorithm as it acts upon your data.

Using a CRDT (specifically the ORSWOT - Observed Removal Set Without Tombstones) eliminated the majority of our pains and concerns with data-loss, conflicting concurrent writes, and integrity in the face of flaky consumer-grade home networks (Plumtree is highly-available and partition tolerant - perfect for our needs where we have to be able to continue moving forward even if a majority or more of the Lightpads go offline and never come back up).
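For a flavor of the observed-remove idea, here is a much-simplified sketch (mine, not the Lightpad's implementation) of the classic OR-Set: adds carry unique tags, and a remove only cancels the tags it has actually observed, so a concurrent add survives a concurrent remove. Note that this classic version still accumulates removed tags as tombstones; the ORSWOT's whole point is to drop those by tracking version vectors instead.

```python
import uuid

class ORSet:
    """Classic observed-remove set CRDT (with tombstones)."""
    def __init__(self):
        self.adds = set()        # (element, unique tag) pairs
        self.tombstones = set()  # tags cancelled by removes

    def add(self, e):
        self.adds.add((e, uuid.uuid4().hex))  # fresh tag per add

    def remove(self, e):
        # Cancel only the tags this replica has observed for e.
        self.tombstones |= {t for (x, t) in self.adds if x == e}

    def merge(self, other):
        # Set union is associative, commutative, and idempotent.
        self.adds |= other.adds
        self.tombstones |= other.tombstones

    def contains(self, e):
        return any(x == e and t not in self.tombstones
                   for (x, t) in self.adds)

a, b = ORSet(), ORSet()
a.add("scene-1")
b.merge(a)
b.remove("scene-1")   # cancels only the tag b has seen
a.add("scene-1")      # concurrent re-add gets a fresh tag
a.merge(b); b.merge(a)
assert a.contains("scene-1") and b.contains("scene-1")  # add wins
```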


I disagree. I believe the unidirectional links in the social graph + public by default are the key features.

Because Twitter is public by default I know everything I post should not be confidential. No details of my kids, for instance. So it tends towards people presenting their professional persona.

Because it's unidirectional, following someone does not imply any personal relationship (cf. Facebook, where links are bidirectional and you only friend IRL friends). This means I can follow someone without fear of rejection, and likewise I can gain followers without opening myself up to all my followers' content.

With this I can read someone's posts for a while, and comment when I have a feel for them + have something interesting to say.


I disagree somewhat, but I don't have time to write a detailed answer. Instead here's a quick sketch that is likely only intelligible if you have a fair bit of PLT background.

The basic point is this: you don't need to teach "raw" recursion, just like you don't teach control flow using goto. Just as structured programming introduced for/while/do loops to structure control flow, you can structure recursive programs into a few major groups. The most important one is good old structural recursion over algebraic data types. It follows a very set formula, so there isn't much opportunity to go wrong. It helps to have pattern matching in your language to do this.
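A tiny illustration (mine; Python 3.10+ so pattern matching is available) of how structural recursion follows the shape of the type, one case per constructor:

```python
from dataclasses import dataclass

# An algebraic data type: a list of ints is either Nil or Cons(head, tail).
class Nil:
    pass

@dataclass
class Cons:
    head: int
    tail: "Cons | Nil"

def total(xs):
    # Structural recursion: exactly one case per constructor,
    # recursing only on the recursive position (the tail).
    match xs:
        case Nil():
            return 0
        case Cons(head, tail):
            return head + total(tail)

print(total(Cons(1, Cons(2, Cons(3, Nil())))))  # 6
```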

This is the approach taken by How to Design Programs (http://htdp.org/)

Example blog post you might read if you want to know more about the theoretical background: http://blog.sumtypeofway.com/an-introduction-to-recursion-sc...


I believe another important piece is the Fairness Doctrine (https://en.wikipedia.org/wiki/Fairness_Doctrine), which forced news outlets in the US to present balanced news. Reagan killed this in 1987, and it's made it much easier to live in a bubble. Fox News (and possibly most current US TV news; I'm not sure) couldn't exist if this doctrine were still in place. The US has allowed many policies that have divided its people.

Have you read the RethinkDB post mortem? http://www.defstartup.org/2017/01/18/why-rethinkdb-failed.ht... It explains well why time to market is very important.

I used to be in the "correct but slow" camp. Now that I better understand business, I'm more aware of what needs the correct solution and what needs the timely solution, and this has greatly helped my career.


"I guess the moral of this story is that tools are excellent, and you should probably use them."

This is, in my opinion, a rather generous interpretation of events. What I see when reading this is a system (the Linux kernel and surrounding infrastructure like the C language) that is very poorly designed.

For example, resource allocation is a fundamental issue and a constant source of problems. We've known this as a field for a very long time. Why are fundamental things like resource allocation not at least standardised in the kernel? Better would be to enforce safe resource usage by static checking. Rust is an example of what can be done here, but much more is possible. (I'm not trying to suggest the kernel should be rewritten in Rust. I'm just using it as an example that resource usage [in particular, memory allocation in Rust] can be done in a way that prevents errors.)

Burning huge amounts of human effort is a popular approach to software development, but I really hope we can do better in the next few decades. The worst thing about the current situation is that the programmer blames themself ("This one kept me up until 3AM. In hindsight, every bug looks simple, and when I figured it out, I was embarassed that it took me so long") rather than wondering why this problem is even possible in the first place.


> a horde of outsider devs come rushing into the native app space, with little awareness of the nuances of specific platforms and their communities.

The expansion of the industry means that those with little experience vastly outnumber those with plenty, and are highly incentivised to degrade the value of experience. That's why we see the explosion of new "frameworks": if one is only a year old, then someone with one year in the industry has as much "experience" as someone with 30. On paper.

That is why hardware gets faster and more reliable every year but software still gets slower and buggier. The HW side has somehow managed to keep this toxic culture in check, and their engineers, with experience on their side, are actually managing to advance their field, while the SW side keeps reinventing the wheel, a little squarer each time.



