I didn't read on, as the article is beyond me, but I believe the initial example of extracting structured facts from a paragraph about Marie Curie is partially incorrect.
The text doesn't say she was born in Poland. It says she was Polish. It also doesn't say her nationality at death was French; it says she was naturalized-French at some point. It also states she conducted _pioneering_ research on radioactivity, which is not captured by the example output.
The example also shows an inference that her job is "researcher." This is a questionable inference. Imagine this conversation between two humans: "He's a hard-surface texturing artist, but he coded sometimes when he needed to." "Oh so his jobs are art and coding?" "No his job is art, but he can code."
As humans, we are thinking about role assignments and expectations vs people committing acts. What ultimately defines a "job"?
The point I'm trying to make is that "She was a researcher." and "She did research." should not result in the same output.
There's obviously a lot of inference required to discern any structure from the text (like assuming "she" refers to Marie Curie), but I believe these inferences should be recognisable -- captured in the output in a way they can be queried and reasoned about.
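To make that concrete, here is a minimal sketch of what such output might look like. Everything in it is hypothetical and not taken from the article: the `Fact` record, the `basis` field, and the predicate names `has_role` and `performed` are just one way to keep "was a researcher" and "did research" apart while marking which statements were inferred.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fact:
    """One extracted statement plus how it was obtained (hypothetical schema)."""
    subject: str
    predicate: str   # "has_role" = assigned role, "performed" = an act
    obj: str
    basis: str       # "explicit" = stated in the text, "inferred" = derived
    evidence: str    # the wording or reasoning the fact rests on


# Resolving "she" to Marie Curie is itself an inference; it is left out
# of the records here to keep the sketch short.
facts: List[Fact] = [
    Fact("Marie Curie", "has_role", "physicist", "explicit",
         "She was a .. physicist and chemist"),
    Fact("Marie Curie", "has_role", "chemist", "explicit",
         "She was a .. physicist and chemist"),
    Fact("Marie Curie", "performed", "pioneering research on radioactivity",
         "explicit", "conducted pioneering research on radioactivity"),
    # The questionable step: turning an act into a role.
    Fact("Marie Curie", "has_role", "researcher", "inferred",
         "derived from: performed research"),
]


def by_basis(items: List[Fact], basis: str) -> List[Fact]:
    """Return only the facts obtained in the given way."""
    return [f for f in items if f.basis == basis]


inferred = by_basis(facts, "inferred")
explicit_roles = [f.obj for f in by_basis(facts, "explicit")
                  if f.predicate == "has_role"]
```

With this shape, a consumer that only trusts stated roles can ignore the inferred "researcher" entry, while one that wants it can still see what it was derived from.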
Much of the knowledge that humans derive from reading text is implicit rather than explicit. The derived knowledge is also context-dependent and probabilistic, i.e. these are not binary facts; we assign each of them a degree of confidence.
In the context of that sentence "She was a .. physicist and chemist who conducted .. research on radioactivity.", I think most people would say a physicist or a chemist who conducts research is a researcher. In other contexts, such as in your example, that would be a questionable inference. What you're describing is why natural language understanding is hard--it's context-dependent and not syntactic.
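A rough sketch of that context dependence, under loud assumptions: the `SUPPORTING_ROLES` table, the `job_confidence` function, and the numbers 0.9 and 0.3 are all made up for illustration, not measured or taken from any real system. The only point is that the same act-to-job inference gets a different confidence depending on the roles stated around it.

```python
from typing import Dict, Set

# Hypothetical table: stated roles under which performing an act
# usually means the act is part of the person's job.
SUPPORTING_ROLES: Dict[str, Set[str]] = {
    "research": {"physicist", "chemist"},
    "coding": {"software engineer"},
}


def job_confidence(act: str, stated_roles: Set[str]) -> float:
    """Illustrative confidence that `act` is part of the person's job."""
    if stated_roles & SUPPORTING_ROLES.get(act, set()):
        return 0.9  # context supports the inference (made-up value)
    return 0.3      # act alone is weak evidence of a job (made-up value)


# A physicist and chemist who conducted research.
curie = job_confidence("research", {"physicist", "chemist"})

# A texturing artist who coded sometimes when he needed to.
artist = job_confidence("coding", {"hard-surface texturing artist"})
```

A real system would presumably learn such weights rather than hard-code them; the sketch only shows where context enters.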