I don't think there's a satisfying concise answer to this question. You're asking to boil down multiple semesters of engineering and mathematics courses to explain something that isn't simple.

The "why": it's a convenient mathematical tool for simplifying complex problems that deal with sequences/series. It decomposes a sequence of numbers (*) into another sequence of numbers, each of which says how strongly one particular repeating pattern shows up in the original. Add all those patterns back together and you get the original sequence.
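To make the "repeating patterns" idea a bit more concrete, here is a minimal sketch in plain Python. The `dft` helper, the 32-sample cosine, and the variable names are all made up for illustration, and the transform is the slow textbook version rather than a real FFT. It builds a sequence from a single pattern that repeats 3 times and checks which output slot lights up.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustrative, not an FFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical input: one cosine that repeats exactly 3 times over 32 samples.
N = 32
signal = [math.cos(2 * math.pi * 3 * t / N) for t in range(N)]

# Each output value measures how strongly one repeating pattern is present.
magnitudes = [abs(c) for c in dft(signal)]

# Only search the first half: for real input the second half mirrors it.
strongest = max(range(N // 2), key=lambda k: magnitudes[k])
print(strongest)  # prints 3, the number of repeats we put in
```

Looking at 32 raw samples you'd have to squint to see the pattern; in the transformed sequence it's essentially one number.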

The "where": anywhere that knowing which patterns show up in a sequence of numbers is more convenient than looking at the original sequence.

There are other nice properties, too. There's a mathematical operation called convolution that's very useful in domains operating with sequences of numbers (machine learning, control systems, audio and image processing, etc.). It happens that convolution in the original signal's domain is equivalent to element-by-element multiplication in the Fourier domain. That's useful because convolving two length-N sequences directly is an O(N^2) algorithm, but we can compute the Fourier transform in O(N log N) and the multiplication is only O(N), so the whole convolution comes out to O(N log N) (**). We can even do things like deconvolution: take an output signal that we know was produced by convolving two inputs and, if we know one of the inputs, recover the other.
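If you want to see the convolution property with actual numbers, here's a rough sketch in plain Python. Everything in it (the `dft` and `circular_convolve` helpers, the two 4-element sequences) is made up for the example. It also uses a slow O(N^2) transform, so it only demonstrates the equivalence, not the speedup you'd get from a real FFT.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(N^2) DFT; with inverse=True it undoes the forward transform."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

def circular_convolve(a, b):
    """Direct circular convolution: indices wrap around modulo the length."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

# Two arbitrary example sequences of the same length.
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.0, -1.0, 2.0]

# Route 1: convolve directly in the original domain.
direct = circular_convolve(a, b)

# Route 2: transform, multiply element by element, transform back.
A, B = dft(a), dft(b)
via_fourier = [v.real for v in dft([p * q for p, q in zip(A, B)], inverse=True)]

# Deconvolution: knowing the output and b, divide in the Fourier domain to get a.
# This only works here because no entry of B is zero; real data is messier.
Y = dft(direct)
recovered_a = [v.real for v in dft([y / q for y, q in zip(Y, B)], inverse=True)]

print(direct)
print(via_fourier)
print(recovered_a)
```

The first two printed lists should agree up to floating-point rounding, and the third should match `a`.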

Another nice property of the Fourier transform is that most real-world sequences of numbers (audio, images, sensor readings) do not have much energy (roughly, squared magnitude) at "high frequency" (meaning rapidly repeating patterns in the original domain). The FT of such a sequence will naturally compact most of the information into a few places. We can exploit this for lossy compression (***).
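Here's a toy version of that compression idea, again in plain Python with a slow made-up `dft` helper. The smooth test signal and the choice to keep 7 of 32 coefficients are arbitrary assumptions for the demo, and real codecs are far more sophisticated than this.

```python
import cmath
import math

def dft(x, inverse=False):
    """Naive O(N^2) DFT; with inverse=True it undoes the forward transform."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out

# Hypothetical smooth signal: one slowly varying bump over 32 samples.
N = 32
signal = [math.exp(math.cos(2 * math.pi * t / N)) for t in range(N)]

coeffs = dft(signal)

# "Compress" by keeping only the few largest coefficients and zeroing the rest.
keep = 7
largest = sorted(range(N), key=lambda k: abs(coeffs[k]), reverse=True)[:keep]
compressed = [coeffs[k] if k in largest else 0 for k in range(N)]

# Reconstruct from the surviving coefficients and measure what was lost.
reconstructed = [v.real for v in dft(compressed, inverse=True)]
max_error = max(abs(s - r) for s, r in zip(signal, reconstructed))
energy_kept = sum(abs(c) ** 2 for c in compressed) / sum(abs(c) ** 2 for c in coeffs)

print(f"kept {keep}/{N} coefficients")
print(f"max reconstruction error: {max_error:.4f}")
print(f"fraction of energy kept:  {energy_kept:.6f}")
```

For a smooth signal like this one, the coefficients that survive are the low-frequency ones, and throwing away the other 25 barely changes the reconstruction. A noisy or random sequence would not compress nearly as well.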

Ultimately, it's a way of taking information that is hard to grok and transforming it into a domain where it's more meaningful and where operations and analysis are easier, both for humans and for machines.

* And it works on continuous functions too, but let's not get into that.

** Technically this is a variant called "circular" convolution.

*** The FT is not the most convenient tool for this job, so very closely related transforms like the DCT are used instead; they have the same computational advantages as the FT.