A Little Calculus (brown.edu)
155 points by __rito__ on May 4, 2023 | 42 comments


This links to the 2018 edition of this book. Each edition points to a newer one until the book splits into two books. This section doesn’t seem to have changed much throughout the editions, but I figured people might like to see the most recent version of the content overall.

https://papl.cs.brown.edu/2018/index.html

https://papl.cs.brown.edu/2019/

https://papl.cs.brown.edu/2020/

https://papl.cs.brown.edu/

https://dcic-world.org/2023-02-21/func-as-data.html#%28part....


Since this is mentioning d/dx stuff...

In the past I've read that that notation is more or less just a shorthand for the "formal" definition of a derivative. But I have also seen people solve problems by manipulating the dx's and the dy's, treating them as... well, as a fraction.

Is there some sort of guide to what you're "allowed" to do with dx and dy in general? Or is the thing basically that people are just "being careful"? This, like discussion of set theory, is always tough because I feel like I'm missing the axioms that could justify certain operations


Another commenter has mentioned nonstandard analysis but I find that most people (commonly physicists, engineers) who are manipulating dy and dx are not actually using the framework of nonstandard analysis.

What's usually going on is that you mentally replace dx and dy with Delta x and Delta y = y(x + Delta x) - y(x), where Delta x is some small nonzero quantity. After doing your manipulations, you take the limit as Delta x -> 0. Then you have rigorous statements like Delta y / Delta x -> f'(x) (this is the definition of the derivative), and Delta x/Delta x = 1, and (Delta x)^2/Delta x -> 0 (the ability to neglect second-order terms), etc.
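A tiny worked example in that spirit (my own, with the limit step made explicit): for y = x^2,

    \Delta y = (x + \Delta x)^2 - x^2 = 2x\,\Delta x + (\Delta x)^2
    \frac{\Delta y}{\Delta x} = 2x + \Delta x \to 2x \quad \text{as } \Delta x \to 0

The informal "dy = 2x dx, so dy/dx = 2x" is exactly this computation with the limit left unsaid.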

You do this type of thing so many times that it becomes rote; it gets annoying to mention the limits explicitly, and the reader is trusted to formalize it themselves as an exercise.

The stuff that you are "allowed" to do is taught in a first course in real analysis. After taking such a course, you will be able to justify for yourself which manipulations are valid.


Yes, it is possible to treat d as a differential operator, and then dy and dx are the differentials of functions, and it makes sense to take their ratio. You can learn about this by reading about differential forms:

https://en.wikipedia.org/wiki/Differential_form

Unfortunately I don't know a treatment of this subject that explains how to apply this concept to basic calculus without introducing some other more difficult concepts.


Differential forms look really cool and I really want to learn more about them at some point!

That said, isn't calculus, mainly the chain rule, enough to tell you what operations are allowed?


I definitely didn't understand how to use differentials just by learning calculus, but maybe it is possible.


For single variable calculus (y=f(x)), yes.


I get the impression that this is a relic of how calculus developed. Basically the creators had a vague intuition of an infinitesimal, which was later replaced by the supposedly much more rigorous limit definitions. Then it turned out that just creating a non-standard system in which infinitesimals exist is perfectly fine, just as imaginary numbers are useful.

The equivalent to i = sqrt(-1) might be something like the nilpotent infinitesimal where epsilon^2 = 0 but epsilon =/= 0.
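Concretely, that nilpotent infinitesimal gives the standard dual-number identity (a well-known fact, not something from the article): for any polynomial f,

    f(x + \epsilon) = f(x) + f'(x)\,\epsilon \qquad (\epsilon^2 = 0,\ \epsilon \neq 0)

so the coefficient of epsilon carries the derivative.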


I was told by my Calc 2 instructor that in the past, people treated dx like a variable, but in modern times, mathematicians have identified some edge cases which make that kind of thing problematic if mathematical rigor is your thing.


You can make it rigorous again by studying nonstandard analysis:

https://en.wikipedia.org/wiki/Nonstandard_analysis


For scalar functions of a scalar variable, a less drastic option than nonstandard analysis (and perhaps one that’s more useful for bridging into more advanced stuff) can be as follows:

For any expression y involving x [*], denote by dy the linear part of y|x+t - y|x (that is, y evaluated at x+t less y evaluated at x). For y=f(x), that’s just a fancy way of saying f'(x)t, of course, but the intent is that where y was an “x-dependent scalar”, dy is an “x-dependent linear function” (without a constant term, as is the convention in most settings outside of high school).

The baby version stops at that: by our rules, dx = t, df(x) = f'(x)t = f'(x)dx, f'(x) = df(x)/dx, the not-a-proof for the chain rule becomes an actual proof except everything is assumed differentiable (where a textbook one would only need differentiable inputs), etc. Of course, at this stage it seems slightly miraculous that nothing ends up t-dependent, but it is what it is. (If you want, you can imagine that the symbol t is “private” to the previous paragraph so it’s not allowed to escape to “the user”, but then you need to prove that it actually doesn’t.)
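As a quick illustration of the baby version (my sketch, with everything assumed differentiable as above): for y = f(u) with u = g(x),

    dy = f'(u)\,du = f'(g(x))\,g'(x)\,dx, \qquad \text{so} \qquad \frac{dy}{dx} = f'(g(x))\,g'(x)

and indeed no stray t survives to “the user”.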

The adolescent version unholsters linear algebra:

While of course every one-dimensional (real) vector space (i.e. a line with a chosen zero point) is R in disguise, there are multiple choices for what the disguise is (differing by a multiplication by a constant), and you might not know which one to prefer. (This is what choosing a basis in a one-dimensional space amounts to.) Given such a space and two vectors u (whatever) and v (nonzero), denote by u/v the number such that u = (u/v)v (the coefficient of proportionality, aka the coordinate of u when v is the basis vector). Now you can prove, for example, that u/w = (u/v)(v/w), because it is so when you choose any (single-vector) basis and substitute for each vector its (only) coordinate.

(Can you make sense of uv/vw? Yes, as u⊗v / v⊗w, but tensor products are their own can of worms and probably overkill at this point.)

At each value of x, the space of linear functions (of the private variable t) is one-dimensional, so everything in the previous paragraph applies. When we write dy/dx and so on, we mean the things from there except we do them at each value of x (“pointwise”).

One of the things that this more advanced thinking gets you is that you can imagine how all of it generalizes to multiple variables. (Writing v/e_i for coordinates of v in the basis (e_i) is not common, but it does not not make sense—as long as you remember you can only “divide” by a basis, not by a single vector. Write out the coordinate transformation rules in this notation. The differential version will have you end up with df(x)/dx_i instead of the more common ∂f(x)/∂x_i, but again, that makes sense in context—note that, once again, the partial derivative wrt one coordinate on a plane depends on what the other coordinates on that plane are!)

The grown-up version just says I’ve been talking about the cotangent bundle in wishy-washy language. (An “x-dependent scalar”? What’s that? Does it taste good?) Hopefully it was still of some help.

[*] People do write df/dx where I would require df(x)/dx, and it’s convenient to do that, but I’m trying to avoid additional abuses of notation where possible.


d/dx is not a fraction but an operation; dy/dx is a fraction


Is `dy/dx` ever distinct from `d/dx y`? Generally I've treated `df(x)/dx` as just another way to write the operator `d/dx` applied to `f(x)`, which I don't think of as a fraction.


dy/dt = dy/dx * dx/dt


That's just the chain rule, right?

d/dt y(t) = ( d/dx y(x) )( d/dt x(t) )


What series of operations can lead you to end up with dy/dx though? Like Integral{x^2}{dx} is how you introduce dx into a formula, but it's not like integration just works over division, so I'm a bit stuck on the construction phase


y = x^2

dy = 2x dx

dy/dx = 2x

The chain rule is what makes this formulation not completely trivial.
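For example (my own, not the parent's): with y = sin(x^2),

    dy = \cos(x^2)\,d(x^2) = \cos(x^2) \cdot 2x\,dx \quad\Rightarrow\quad \frac{dy}{dx} = 2x\cos(x^2)

which is the chain rule written as bookkeeping on differentials.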


Someone made a great comment about the matrix approach to teaching the derivative and deleted it, maybe because it wasn't rigorous. Seems like "Differentiation Matrices" is the technical term: https://tobydriscoll.net/fnc-julia/bvp/diffmats.html
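For anyone who doesn't click through: the idea is that if you sample f at grid points x_1, ..., x_n with spacing h, differentiation becomes a matrix-vector product. A crude first-order sketch (mine; the linked notes build this more carefully and to higher order):

    (D\mathbf{f})_i \approx \frac{f(x_{i+1}) - f(x_i)}{h},
    \qquad
    D = \frac{1}{h}\begin{pmatrix} -1 & 1 & & \\ & -1 & 1 & \\ & & \ddots & \ddots \end{pmatrix}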


I really, really want to learn Pyret. It looks like an awesome language. I can't remember the last time I got so excited about a language.

I get the feeling it's not that widely used or popular outside of a pedagogical context though, which is a shame since it has a lot of good ideas!


apparently "a little" calculus is too much for me.

or people are still trying to find good ways to teach it.


> So what is it intended to mean? The intent, clearly, is to represent the function that squares its input, just as 2x is meant to be the function that doubles its input. We have nicer ways of writing those:

> fun square(x :: Number) -> Number: x * x end

> fun double(x :: Number) -> Number: 2 * x end

No, that is not a nicer way of writing those. It certainly helps for understanding, but once you've properly understood what is going on, writing `fun double(x :: Number) -> Number: 2 * x end` instead of `2x` is just not economical.
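For what it's worth, Pyret also has an anonymous-function form that's closer in spirit to the blackboard 2x (syntax from memory, so double-check against the book):

    lam(x): 2 * x end

Still wordier than 2x, but you don't have to name and annotate everything.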


I read it as (mostly) a joke. It's not nicer because it's objectively better in all cases; it's nicer because it's Pyret.


I hadn't considered that it might be a joke...


d/dx F = F' is indeed a concise way to denote that d (\x.F) = (\x.F'), where \x.F is the lambda calculus notation for a function that takes x to F.

The /dx part indicates that the following expression should be considered a function of x.

The equation can also be rewritten as d F = F' dx, reflecting that the change in the function value is proportional to both the derivative and the change in x.


> We’ll implement numerical differentiation, though in principle we could also implement symbolic differentiation

I'd like to argue that, when it comes to computers and differentiation, automatic differentiation is the most useful and most important in practice. But often only symbolic and numerical differentiation are mentioned.


I think this is because automatic differentiation is a manifestation of just one rule of differentiation: the chain rule. With numerical differentiation, you only need to be able to evaluate the function at hand. With automatic differentiation, you need to "seed" your program with the differentials of all existing functions.
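To make the contrast concrete, here's roughly what the numerical version looks like in Pyret, in the spirit of the linked chapter (the name d-dx and the fixed epsilon are illustrative choices here, not necessarily the book's):

    epsilon = 0.001
    # d-dx takes a function and returns (an approximation of) its derivative
    fun d-dx(f :: (Number -> Number)) -> (Number -> Number):
      lam(x :: Number): (f(x + epsilon) - f(x)) / epsilon end
    end
    fun square(x :: Number) -> Number: x * x end
    d-dx(square)(10)  # 20.001, close to the true derivative 20

All it needs from f is the ability to call it, which is the point above; an AD version would instead have to know the derivative of every primitive it encounters.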

What's nice about a discussion of symbolic differentiation is that we can prove a few rules rigorously, and then use those rules to purely mechanically differentiate the algebraic expressions we encounter up through trigonometry.

You're right though, in practice, for complex functions expressed as programs, automatic differentiation is superior.


While in college, I wrote a program that did this. I thought I needed precise derivatives for a curve fitting program, so I wrote one that evaluated an expression and all of its derivatives, all at once. This was typical noobie chutzpah. First, I converted an infix expression into its postfix form using a textbook algorithm. Then I applied all of the differentiation rules to the postfix expression, including L'Hospital's rule.

Was it correct? All I know is that I couldn't kill it.


That is because those are the traditional methods, and are easy to teach. And probably the curriculum hasn't been changed. Also, automatic differentiation is _hard_.

Numerical differentiation comes directly from the definition of derivative and symbolic is what you always did for exercises in your calculus classes.

Automatic differentiation comes in two flavours, forward and backward modes. Forward mode is based on dual numbers [1], which are the

    quotient of a polynomial ring over the real numbers, by the principal ideal generated by the square of the indeterminate
That is, ℝ[X]/⟨X²⟩.

Another way of thinking of this is to have an element ε different from zero such that ε² = 0, and to hand-wave the fact that the dual part of the number carries the derivative.
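For the curious, a minimal forward-mode sketch in Pyret built on that idea (the names and the explicit d-add/d-mul encoding are mine; a real implementation would cover many more operations):

    data Dual:
      | dual(re :: Number, du :: Number)
    end
    fun d-add(a :: Dual, b :: Dual) -> Dual:
      dual(a.re + b.re, a.du + b.du)
    end
    fun d-mul(a :: Dual, b :: Dual) -> Dual:
      # the product rule lives here: (a + a'e)(b + b'e) = ab + (a b' + a' b)e
      dual(a.re * b.re, (a.re * b.du) + (a.du * b.re))
    end
    # derivative of f at x: seed the dual part with 1, read it back off
    fun deriv(f :: (Dual -> Dual), x :: Number) -> Number:
      f(dual(x, 1)).du
    end
    deriv(lam(x): d-mul(x, x) end, 10)  # 20, the derivative of x^2 at 10

Forward mode is essentially this: every primitive operation propagates a derivative alongside the value.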

Backward mode builds a graph of the computation, does symbolic differentiation over that graph, and compiles the derivative down into runnable machine code (it could also be interpreted or compiled down to an IR; neither the form nor the execution environment changes the fundamental algorithm).

Maybe they are not really hard, but they are not easy either. Still I think they should be at least mentioned in modern courses.

[1] https://en.wikipedia.org/wiki/Dual_number


It's been a while since I've watched this talk[0], but iirc the upshot is that if you think of the "differentiation at p" operator as sending f to the pair (f(p), D_p f) instead of the usual D_p f, then you get a nice compositional operator that defines a functor on vector spaces, automatic differentiation more or less pops out, and you get reverse mode by looking at what this does when applied to dual spaces.

So in some sense it's maybe not hard, though any attempt to do it without building up some theory (e.g. abstract vector spaces and dual spaces) first will probably come across as magic. On the other hand, magic tricks would be right at home in an intro differential equations class so maybe this would be a perfect addition. Or it can replace Wronskians or something.

[0] https://youtube.com/watch?v=17gfCTnw6uE


> Still I think they [automatic differentiation methods] should be at least mentioned in modern courses.

That might make more sense in another course, like one on numerical methods, rather than in a programming course that happens to use numerical differentiation as a demonstration of programming techniques and language features.


> That is, ℝ[X]/⟨X²⟩.

> Another way of thinking of this is to have an element ε different from zero such that ε² = 0, and to hand-wave the fact that the dual part of the number carries the derivative.

Aren't these the same way of thinking?


They're different ways of thinking about the same thing.

A more common example of this idea is √(-1): ℝ[x]/⟨x² + 1⟩ vs i² + 1 = 0.


AD is important for training neural networks, or SGD (et al.) more generally. But that's still only one field. Numerical differentiation is still important, e.g. for differential equation solvers. I don't think you can say AD is the most important or useful - maybe for understanding pop culture.


Then we can combine AD and numerical solvers, as is done in modern weather and climate models. I don't quite understand it, but it has something to do with sensitivity analysis and improving data assimilation. (Google "4dvar ecmwf" for more details... e.g.: https://www2.atmos.umd.edu/~dkleist/docs/da/ECMWF_DA_TUTORIA...)

I think the idea is to use the "tangent linear model" to decide how much importance to give to a particular observation of the initial state.


AD has been around since at least 1957 (the oldest reference to the class of technique I could find). When I studied it (from a CAS/PLT angle), it was considered a good middle-ground technique between symbolic methods & giving up. You can trace the result of the AD and recover a polynomial solution near the expected results (like a profile-guided Taylor expansion, I guess?). It allowed us to run Buchberger's on algorithmic objects without analytic forms, and still have a chance at getting a complete basis for the antiderivative.


> Numerical differentiation is still important e.g. for differential equation solvers.

Is this a misunderstanding? You use differential equation solvers when all you know is how to calculate the derivative(s) of the function, and you want to get the function itself as the solution. Where would you need numerical differentiation in this?


Holy shit, thanks for bringing that up, I had never heard of automatic differentiation. So cool!


Wish they covered co-induction. I've read the paper linked in the definition but still don't have a very strong intuition for it.


A little bit of calculus in my life

A little trigonometry by my side

A little Fibonacci's all I need

A little inequality's what I see

A little bit of lambda in the sun

A little bit binary all night long

A little probability, here I am

A little √2 makes me your man


That distribution is Poissonnnnnnn. Poisson!


Mambo No.5!!


Math notation is different from the notation of that computer language. Oh no.



