Depends on the scope of the project. Would the goal be to come up with a better algorithm for cell classification based on histological images? Or to apply an existing algorithm to a new dataset?
The former would be quite difficult without much background in ML/Computer Vision: you would have to spend some time teaching yourself the basics of ML/Deep Learning and their prerequisites (basic Linear Algebra and Probability).
The latter is doable. I would recommend a very hands-on approach. Pick some computer vision object classification tutorials and code them up (using a high-level library). Make a mind map of the concepts and look each one up whenever it's unclear. Then move on to replicating some well-cited, peer-reviewed papers. Often papers will have their code on GitHub. Try to reproduce their results on their dataset. After this you would have the basic working knowledge to modify the algorithm slightly for your specific use case.
The data in the database comes from a two-dimensional matrix (LMNE) where leucocytes are classified by resistivity on one axis and light absorption (?) on the other. (I wonder how they managed the separation by absorption... indirectly via centrifugation?) So I guess not really histological?
It looks like it's a new model; I have no idea whether they have any ML models yet. There's also some database work.
I'm finishing a Master's degree in Computational Physics, so Linear Algebra and Probability shouldn't be an issue. (We also have an Image Processing and Analysis course.) I guess that's why they contacted us despite the fact that we don't have any ML training?
Yeah, this is basically what I was planning to do, but thank you for your advice!
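To make the two-axis setup above concrete, here is a minimal sketch of classifying points in a 2D plane, assuming each cell is reduced to a (resistivity, absorption) pair. Everything in it is made up for illustration: the class names, the cluster centres, the spread, and the choice of a nearest-centroid rule. It is not the real LMNE data or anyone's actual method, just about the simplest supervised baseline for this kind of scatter plot.

```python
import random


def make_synthetic_cells(n_per_class, seed=0):
    """Generate invented (resistivity, absorption) points for three invented classes."""
    rng = random.Random(seed)
    # Hypothetical cluster centres in arbitrary units; real LMNE values will differ.
    centres = {"A": (20.0, 20.0), "B": (60.0, 30.0), "C": (40.0, 80.0)}
    points, labels = [], []
    for label, (cx, cy) in centres.items():
        for _ in range(n_per_class):
            points.append((rng.gauss(cx, 5.0), rng.gauss(cy, 5.0)))
            labels.append(label)
    return points, labels


def fit_centroids(points, labels):
    """Compute the mean point of each labelled class."""
    sums = {}
    for (x, y), label in zip(points, labels):
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Assign a point to the class whose centroid is closest (squared Euclidean distance)."""
    x, y = point
    return min(
        centroids,
        key=lambda lab: (x - centroids[lab][0]) ** 2 + (y - centroids[lab][1]) ** 2,
    )


def accuracy(centroids, points, labels):
    """Fraction of points whose predicted class matches the true label."""
    hits = sum(predict(centroids, p) == lab for p, lab in zip(points, labels))
    return hits / len(labels)


# Train on one synthetic sample, evaluate on a separately generated one.
train_x, train_y = make_synthetic_cells(100, seed=0)
test_x, test_y = make_synthetic_cells(50, seed=1)
centroids = fit_centroids(train_x, train_y)
print(f"held-out accuracy: {accuracy(centroids, test_x, test_y):.2f}")
```

On real data the clusters will overlap far more than these toy blobs, so this is only a baseline to compare a proper model against.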
Given your background, I think it would be worthwhile for you to pick up ESL [0] and read some relevant sections (supervised/sparse/linear methods). It's a great book and a good starting point for thinking about ML methods for high-dimensional data.
Also, it might be useful to look at the webpages of some researchers in this space and the courses they teach [1,2].
Funny (but I guess expected) to see the Markov Chain Monte Carlo method that we very recently learned in that book's table of contents! (Unless it's another MCMC?)
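For comparison with the physics version, here is a minimal random-walk Metropolis sketch, assuming a one-dimensional target known only up to a normalising constant. The function name, step size, burn-in length, and the standard normal target are all illustrative choices of mine, not anything taken from the book; in statistics the same machinery is typically pointed at a posterior distribution instead of a Boltzmann weight.

```python
import math
import random


def metropolis(log_density, x0, n_steps, step_size=1.0, seed=0):
    """Random-walk Metropolis sampler for an unnormalised 1D log-density."""
    rng = random.Random(seed)
    x = x0
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + rng.gauss(0.0, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        # A rejected move repeats the current state in the chain.
        samples.append(x)
    return samples


# Illustrative target: standard normal, written up to an additive constant.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=20000)
kept = chain[2000:]  # discard an arbitrary burn-in period
mean = sum(kept) / len(kept)
var = sum((s - mean) ** 2 for s in kept) / len(kept)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The sample mean and variance should land near 0 and 1, with the usual caveat that successive samples are correlated.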