I hope we never find out what happens if the US internet goes down. But with 2020's track record and what 2021 is starting to look like, we might find out!
Outside of a rogue president who convinces the military to engage in illegal activity, or a genuine government overthrow, it won't happen.
No sitting government in the USA will pull the plug, because they know that would end their careers. All their fundraising gone. Wall Street RIP. Riots all over the place.
Last night a friend from India popped up on Signal. I told him "Welcome!" and he said "You finally wore me down, I've left WhatsApp and I'm trying to move my family off of it..."
This is a really controversial pattern for GUIs. In one camp, the GUI is really a skin over the CLI that acts like a virtual user, translating GUI inputs into underlying CLI commands. In the other camp, the GUI is all there is (e.g., Windows) and there is no underlying OS that can be accessed via CLI: in fact, the CLI is a "fake GUI" (Win32 apps written without a window). I can't say which is better, but it is fascinating to see that this was an "original pattern".
Windows CLI apps aren't "fake GUI". There's a flag in the executable header that tells the loader (and from there the rest of the OS) whether to ensure a console is allocated for it, and to wire up stdin/stdout/stderr if they're not already wired up, but that's it.
An executable won't have its own window unless it calls CreateWindowEx one way or another, and any such window won't be very functional until the app starts pumping messages, which is work it needs to actively do with GetMessage and DispatchMessage. Obviously a CLI-only app won't do these things, but it doesn't need to go out of its way to not do them; it doesn't need to fake anything, hide anything, or otherwise resort to subterfuge to conceal GUI elements.
There's a stronger argument to be made in some COM scenarios; e.g., the single-threaded apartment threading model creates a hidden window so it can use the message pump as a communication and serialization mechanism. But even there it's mostly just repurposing existing Windows machinery in ways that work well with existing GUI apps.
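For anyone who hasn't written a Win32 app, here's roughly what that "pumping messages" amounts to, sketched via Python's ctypes purely for illustration (Windows-only; GetMessageW, TranslateMessage and DispatchMessageW are the real user32 calls). The point is that this loop is explicit work a GUI process has to run; a console-only program simply never executes anything like it.

    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32  # Windows-only

    msg = wintypes.MSG()
    # GetMessageW blocks until a message arrives for a window owned by this
    # thread; it returns 0 on WM_QUIT and a negative value on error.
    while user32.GetMessageW(ctypes.byref(msg), None, 0, 0) > 0:
        user32.TranslateMessage(ctypes.byref(msg))   # e.g. turn key presses into WM_CHAR
        user32.DispatchMessageW(ctypes.byref(msg))   # hand the message to the window procedure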
I'm (slowly!) working towards an OS where every user-invoked application is a GUI, most applications are many processes (more like Erlang than Unix), and every interprocess interaction, including GUI programs asking the OS to be drawn, is message-passing of data.
My theory is that if applications are written as servers that do message-passing, one can have a shell language that orchestrates the passing of messages between servers instead of the flow of bytes between CLI programs; the semantics of the shell language still need to be worked out, though. E.g., does it need session types, or can one get reasonable behavior by structuring data specially to indicate "this is only part of a response that's being streamed"?
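A toy sketch of what that shell-level orchestration could look like, with multiprocessing queues standing in for whatever IPC the OS would actually provide; the "partial"/"done" message shapes are just one made-up way of marking a streamed response:

    from multiprocessing import Process, Queue

    def lister(out_q):
        # A "server" that streams results as partial messages, then signals the end.
        for name in ["notes.txt", "todo.md", "report.pdf"]:
            out_q.put({"kind": "partial", "value": name})
        out_q.put({"kind": "done"})

    def filter_md(in_q, out_q):
        # Another server: consumes messages and forwards only the .md entries.
        while True:
            msg = in_q.get()
            if msg["kind"] == "done":
                out_q.put(msg)
                break
            if msg["value"].endswith(".md"):
                out_q.put(msg)

    if __name__ == "__main__":
        q1, q2 = Queue(), Queue()
        # The "shell" just wires servers together, like `lister | filter_md`,
        # except what flows through the pipe is structured messages, not bytes.
        Process(target=lister, args=(q1,)).start()
        Process(target=filter_md, args=(q1, q2)).start()
        while (msg := q2.get())["kind"] != "done":
            print(msg["value"])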
On the GUI side, the idea of describing a UI in pure data (like HTML) seems very reasonable, and seems like it would make it much easier to quickly throw together small GUI programs. So the drawing part of an application would just be a process that sends the screen/compositor process a message describing the state of its window as a tree, and receives messages for events in response.
A big advantage is it makes the semantics of composing GUIs a lot more reasonable: "replace this leaf of my tree with this other process' tree" is a simple-to-implement and simple-to-understand operation, and it seems like it'd make sharing widgets way easier: widgets are just processes that render without the "this is a window" flag set, and you ask the compositor to put them into your window's tree. Events flow back to the widget, and each side can send the other messages easily. An application could also "proxy" for a widget, including over a network link, so you get fairly simple network transparency this way too.
At some point, this would come close to the AppleScript protocol or Symbian OS.
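As a rough, hypothetical sketch of the "window as a tree of data" idea above (the compositor channel here is just a list; in a real system it would be a message-passing endpoint):

    import json

    def render(state):
        # Pure-data description of the window, much like an HTML/DOM tree.
        return {
            "type": "window", "title": "Counter",
            "children": [
                {"type": "label", "text": f"count = {state['count']}"},
                {"type": "button", "id": "inc", "text": "+1"},
            ],
        }

    state = {"count": 0}
    compositor_inbox = []                      # stand-in for a real message channel

    # "Send" the initial tree to the compositor.
    compositor_inbox.append(json.dumps(render(state)))

    # An event message the compositor might send back when the button is clicked.
    event = {"type": "click", "target": "inc"}
    if event["type"] == "click" and event["target"] == "inc":
        state["count"] += 1
        compositor_inbox.append(json.dumps(render(state)))   # re-send the updated tree

Composing GUIs is then just grafting one process' subtree into another's, as described above.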
> So the drawing part of an application would just be a process that sends the screen/compositor process a message describing the state of its window as a tree, and receives messages for events in response.
I've been toying with an interpretation of this here - https://github.com/Imaginea/inai - and kind of having fun with it .. and even built a prototype internal app using it. Super early stage and so stuff won't necessarily make sense at the outset .. or possibly ever. Thoughts welcome though.
> A big advantage is it makes the semantics of composing GUIs a lot more reasonable: "replace this leaf of my tree with this other process' tree" ...
The "dom" service in Inai pretty much feels like that. I felt like an idiot to try and (for lack of a better expression) REST-ify the DOM, but it seemed to work to my surprise.
> An application could also "proxy" for a widget, including over a network link, so you get fairly simple network transparency this way too.
.. yeah due to the "REST" nature, this becomes pretty straightforward.
This is the basis for Composita, i.e. every component has its own message stack, which gives a formal interface to the component that a) enables reference counting and b) allows static analysis for deterministic memory allocation (the maximum stack depth required).
Performance is great, you get managed memory with no GC, multithreaded GUIs are possible, etc.
Composita is a further development of A2. I think it was a real missed chance that A2 wasn't chosen instead of Android, as the ZUI and compiled modules would have been a great fit for mobile.
I'm glad to have this to read. It brings back memories of hanging out at the local RC shop that sold D&D stuff, back in the 80's.
What I've always wanted to read are transcripts of the early players (like Gygax), perhaps at GenCon, working out very difficult encounters. For example: how did the early DMs approach role-playing a supergenius demigod with high-level cleric/MU spells? That's got to be a very hard character to inhabit. I mean the Lolth module, Q1? Come on, there are so many high-level intelligent creatures in that endgame...
This article is dead-on, but I think it is missing a fairly large segment of where ML is actually working well: anomaly detection and industrial defect detection.
While I agree that everyone was shocked, myself included, when we saw how well SSD and YOLO worked, progress on the last-mile problem is stagnating. What I mean is: 7 years ago I wrote an image pipeline for a company using traditional AI methods. It was extremely challenging. When we saw SSDMobileNet do the same job 10x faster with a fraction of the code, our jaws dropped. Which is why the dev ship turned on a dime: there's something big in there.
The industry has stagnated for exactly the reasons brought up: we don't know how to squeeze out the last-mile problem, because NNs are EFFING HARD and the research is very math-heavy: i.e., it can't be hacked by a Zuck type into a half-assed product overnight; it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute-force trial-and-error our code, and homey don't play that game with machine learning.
However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.
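For a sense of how low the barrier is in that space, here's a hedged sketch of an unsupervised detector over simple vibration/audio features, in the spirit of machine-condition-monitoring tasks like ToyADMOS; the feature choice and cutoff are illustrative, not taken from the dataset's baselines:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(0.0, 1.0, size=(500, 8))   # features from healthy machines
    faulty = rng.normal(3.0, 1.5, size=(20, 8))    # features from a failing machine

    # Train only on "normal" operation; anything that scores low later gets flagged.
    model = IsolationForest(random_state=0).fit(normal)
    scores = model.score_samples(np.vstack([normal[:5], faulty[:5]]))
    print(scores)   # lower = more anomalous; pick a cutoff that suits your false-alarm budget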
At Embedded World last year I saw tons of FPGA solutions for rejecting parts on assembly lines. Since every object appears nearly in canonical form (good lighting, centered, homogeneous presentation), NNs are kicking ass big-time in that space.
It is important to remember Self-Driving Car Magic is just the consumer-facing hype machine. ML/NNs are working spectacularly well in some domains.
I recently built out a proof-of-concept industrial defect detection system, with a large focus on modern DNN architectures. We worked with a plant to curate a 30,000+ image multi-class defect dataset, many images with varying lighting and environmental conditions. As you said, modifying and parameter-tuning NNs is not always a hopeful endeavor.
However, you can make significant gains in your models by going back to traditional image filtering/augmentation. Sticking with well-researched object detectors/segmentation algorithms and putting the effort into improving the algorithms that clean up the data takes you far. It's impossible to avoid, because images will always be full of reflections, artifacts, and strange coloration unless you have the perfect lighting-tunnel setup; doable nonetheless.
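A minimal sketch of that "clean up the data before the detector" step, assuming OpenCV; the parameter values are illustrative rather than tuned for any particular line:

    import cv2

    def preprocess(path):
        img = cv2.imread(path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Local contrast equalization tames uneven lighting.
        clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
        eq = clahe.apply(gray)
        # A mild blur suppresses specular highlights and sensor noise.
        return cv2.GaussianBlur(eq, (3, 3), 0)

    # frame = preprocess("part_0001.png")   # hypothetical filename; then feed `frame` to the detector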
Currently doing the image collection for an NN. Created a custom HW rig to speed things up: lighting, turntables, actuators for novel objects, the works. It's really hard and tedious. We're doing liquid detection, and even under IR/UV lights it's still really hard.
We'd love to be able to work with a company for a few days, get the parameters set up right for our case, and then let them take the thousands of images. My company would easily pay $100K+ for such a data set.
I mean in the sense that using ML for a problem often requires just trying a dozen different modeling techniques, then a bunch of hyper-parameter searching, then a bunch of stochastic tuning…
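Something like this, for example, with scikit-learn's random search standing in for the hyper-parameter hunt (the model and ranges are arbitrary stand-ins):

    from scipy.stats import loguniform
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=500, random_state=0)
    search = RandomizedSearchCV(
        GradientBoostingClassifier(random_state=0),
        {"learning_rate": loguniform(1e-3, 1e0), "max_depth": [2, 3, 4, 5]},
        n_iter=20, cv=3, random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)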
Oh. I see what you mean. Yeah, I guess by definition backpropagation is trial-and-error. Huh, I never thought of it that way. Thanks for clarifying, I thought you were being saucy: my apologies for being snarky.
> However, places where it isn't stagnating are things like vibration and anomaly detection. This is a case where https://github.com/YumaKoizumi/ToyADMOS-dataset really shines because it adds something that didn't exist before, and it doesn't have to be 100% perfect: anything is better than nothing.
This is a link to a dataset; unless I'm missing something, it's not about anomaly detection. I looked into this area a few years ago and always try to keep an eye open for breakthroughs... care to share any other links?
>The industry has stagnated for exactly the reasons brought up: we don't know how to squeeze out the last-mile problem, because NNs are EFFING HARD and the research is very math-heavy: i.e., it can't be hacked by a Zuck type into a half-assed product overnight; it needs to be carefully researched for years. This makes programmers sad, because by nature we love to brute-force trial-and-error our code, and homey don't play that game with machine learning.
Uh, what? You can literally fine-tune a Fast.ai model overnight to be borderline SOTA on whatever problem you have data for. Zero math involved; isn't that exactly a hacker's wet dream?
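Roughly, and hedged since the exact fastai API has shifted between versions, the whole thing can be as short as this (paths and epoch count are illustrative):

    from fastai.vision.all import *

    # A folder-per-class image dataset, e.g. data/defects/{ok,scratch,dent}/... (made-up layout)
    dls = ImageDataLoaders.from_folder("data/defects", valid_pct=0.2,
                                       item_tfms=Resize(224))
    learn = vision_learner(dls, resnet34, metrics=error_rate)
    learn.fine_tune(5)   # an evening on a single GPU is often plenty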
My point being that the reason many products end up unusable (ref. the accounts in this thread) is the same reason why science isn't solved and doing ML correctly isn't easy.
"And also goes some way to explaining why, despite TSMC offering a nominally 7nm process, the general consensus has been that Intel’s 10nm design is pretty much analogous. But what’s 3nm between fabs? At that level, probably quite a lot. But if the 7nm node is more of a branding exercise than genuinely denoting the physical properties of that production process then you can understand why there’s supposedly not a lot in it."
Every fab plays fast and loose with the terms, including Intel these days.
More or less TSMC 7nm := Intel 10nm, and TSMC 5nm := Intel 7nm. It's more complex than that, one has denser logic while the other has denser SRAM and what have you, but it's a good baseline.
Since Intel is struggling with 10nm but is shipping, that puts TSMC about a node and a half ahead.
Yeah, that's why I counted it as a half node. Shipping, but awful yields. Meanwhile N5 is doing better than N7 at the same time in lifecycle from a yield perspective.
That only lists the number, not what the number actually means in terms of actual lithography or, more importantly, transistor performance. It used to be the minimum feature size, or just the L of the gate, but with FinFETs it can be an overloaded term. Scaling of .75 in x and .7 in y led to a 10% performance improvement per node at Intel. TSMC hasn't been that clear. And that doesn't even account for the increase in metal layers, the pitch of the layers (or whether they use poly lower layers to get even faster gains), or average track density due to above/below electromigration minimums.
EDIT: All of this stuff is usually stated at ISSCC every time a new process is announced, so it isn't under NDA. I haven't followed this in years, which is why I was asking for a process person to step in.
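For anyone following along, the back-of-the-envelope version of that x/y scaling (illustrative only; real node comparisons also hinge on pitches, track heights, SRAM vs logic density, etc.):

    x_scale, y_scale = 0.75, 0.70
    area_scale = x_scale * y_scale                    # 0.525x the area per transistor
    print(f"density gain ~ {1 / area_scale:.2f}x")    # ~1.90x, roughly the classic full-node shrink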
Good observation. The only thing that kept MS going was the massive inertia of Win & Office. It bought them time to pivot to the cloud. And now we have things like VSCode, an Electron app, which is the first Microsoft product I've adored in two decades. And TypeScript, which has made the world a better place.
Intel innovates in process. Everything else is ruled by backwards compatibility and frenetic management scared to stay the course. (The vast majority of projects are killed if they don't tape out in ~2 years.)
Intel will shift to a TSMC model. They have the best fabs on the planet, and the best fab engineers. I believe it is something like 3 million dollars lost per minute if they are idle. They already started doing this a few years ago; I suspect this will be their final form.
IMHO: the only thing holding them back from the transition is the hundreds of small boondoggle groups staffed by old-timers who are too scared to retire and too scared to do something daring, yet somehow still hang on to their hidey-holes. They lost a ton of key architects to Apple a few years ago, which I also suspect is the reason the M1 is so badass.
...and if you really want to get sentimental, here's an AMD poster we had in our cubes back in ~1991:
>I believe it is something like 3 million dollars lost per minute if they are idle.
I think that's overestimating, although you are right it is damn expensive.
Order of magnitude, I would say it's more like this: a fab has a lifetime of 2-3(?) years and costs $10B to build and amortize. So every minute of idled factory capital = roughly $6,000 in pure cost of the facility.
(although, if you think of it in potential lost revenue terms, then you may be more correct.)
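Spelling out that back-of-the-envelope math:

    fab_cost = 10e9                       # dollars to build and equip
    lifetime_min = 3 * 365 * 24 * 60      # ~1.58 million minutes over ~3 years
    print(f"${fab_cost / lifetime_min:,.0f} per idle minute")   # ~ $6,342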
>> Many Intel fabs are from the early 2000s. Some are from the 90s. What do you mean 2-3 years?
> I think they're referring to each line as a new fab, rather than the complex of buildings they're in, which is pretty fair.
Even so, wouldn't the 2-3 year number only apply to CPUs? Couldn't they keep the lines running making things that don't require top-of-the-line processes, like USB controllers, etc.?
Yes, for chips that need leading edge density like CPUs.
Of course the lines stay open after that for the cases you've laid out, but there are a lot of financial reasons why that needs to be basically gravy-train money at that point, with the lines fully amortized and having paid for themselves many times over.
For those retired lines, don't they sell them off to lower tier manufacturers pretty quickly? Say, within 5 years?
I'm no expert, but I didn't think the site would just keep accumulating its n-1, n-2 generation lines to produce lower-grade stuff within the same fab walls. They take up space the company wants for producing the latest, top-of-the-line output. Or so I thought.
They're not really taking up space per se. For the cost of a fab, a new building (or new wing of a building) is peanuts.
Adding on to that, in a lot of ways the fab is the building, and at a bare minimum you'd be resetting a lot of the yield issues you fixed by moving it to a new space. So then you'd be left with an old node and yield issues, and what's the point of that?
There's also a ton of trade-secret-style IP still in the old nodes that could help a competitor, so why let it leave your property? Intel, for instance, is so intent on preventing trade secret theft around the specifics of their nodes that they (at least used to) manually drill out the camera of every mobile device allowed in their fabs, in addition to the normal "no outside electronics past this point" security stations.
That's correct. A reasonable way to think of it is to calculate the cost of the thing that you're replacing every couple of years, and the output it produces in that time. So the CPU-producing line at the <x>nm process, or whatever unit you choose. The building or site itself ("the fab") isn't the important thing.
Yeah, agreed. The designers / integration will probably get the newest nodes, and the headaches of getting their yields up! I suspect the older high-yield nodes will be filled with tenants pretty quickly. I don't have much knowledge of how this is going, at least from the inside.
"Higher yield nodes" are full, that's why Intel is outsourcing to TSMC. Intel has already sold every 14nm, 22nm, 32nm and 45nm wafer it can make. They have zero capacity, which is an amazing problem for a "dying" company to have.
Even if they axed all process R&D and returned the cash to shareholders, the eye-watering costs of designing at 10nm and below mean I expect there will be a lot of business to keep their fabs turning over for the next decade.
In late 2018, it was rumored they were exiting the business because of low uptake and because of constrained supply of their leading edge process, but it doesn't look like that happened.
This question has raged since the 90s. I worked on the Itanium (Madison and McKinley), and the VLIW architecture was brilliant. This was during the time of the POWER4 and the DEC Alpha, two non-x86 competing architectures that were dominating the "workstation" market (remember that term?). It looked like the server world was going to have three architectural options (Sun was dying, and Motorola's 68000 line wasn't up to the task).
Microsoft even had a version of NT for the Itanic. It seemed we were just about to achieve critical mass in the server world to switch to a new architecture with a huge address space, ECC up the wazoo, and massive integer performance.
Then the PC revolution took off with Win95, and the second war with AMD happened (and NexGen sorta). This couldn't be solved with legal battles. This put all hands on deck because there was SO much money to be made with x86 compatibility. The presence of x86 "up and down" AMD & Intel's roadmap took over the server market as well: it was x86 all over the place.
And that, chil'ren, is why x86 was reborn in the 90's just as it was close to being wiped out.
Now Apple has proven you can seamlessly sneak in a brand-new architecture, get hyooj gainz, and we are none the wiser. This is fantastic news. I think we are truly on the cusp of x86 losing its grip on the consumer space after almost 35 years of dominance.
Lately I've started to wonder if Itanium wasn't a good idea badly executed. I wonder whether, if you went back in time and invested more in compilers and the ecosystem, it could have succeeded. VLIW could really reduce complexity by dumping a lot of the instruction re-ordering machinery and eliminating the need for tons of baroque special-purpose vector instructions.
The biggest thing Intel didn't do with Itanium was release affordable AT/ATX form factor boards. They priced it way, way too high, chasing early margins in "enterprise" without realizing that market share is everything in CPUs. This is the same mistake that Sun, DEC, and HP made with their server/workstation CPUs in the previous era. With a new architecture you've got to push hard for market share, wide support, and scale.
If I'd been in charge I would have priced the first iterations of Itanium only a little above fab cost and invested a lot more in compiler support and software.
Edit: also whatever happened to the Mill? The idea sounded tenable but I have to admit that I am not a CPU engineer so my armchair take is dubious.
Anyway, the ship has sailed. Later research in out-of-order execution has yielded at least similar performance gains, and post-x86 the momentum is behind ARM and RISC-V.
I'm not sure that pushing the complexity to the compiler makes as much sense.
One good side of x86-style instruction sets is that there is a lot you can do in the CPU to further optimize existing programs. While some really advanced compiler optimizations may make use of the internal details of the implementation to choose what sequence to output, those details are not part of the ISA, and thus you can change them without breaking backwards compatibility. Changing them could slow down certain code optimized with those details in mind, but the code will still function. And I'm not even talking just about things like out-of-order execution. Some ISAs leak enough details that just moving from multi-cycle in-order execution to pipelined execution was awkward.
This ability of the implementation to abstract away from the ISA is very handy, and some RISC processors that exposed implementation details like branch-delay slots ended up learning this lesson the hard way. Now, the Itanium ISA does largely avoid leaking implementation details like the number of scalar execution units, but its design does make certain kinds of potential chip-side optimizations more complicated.
In the Itanium ISA the compiler can specify groups of instructions that can run in parallel, specify speculative and advanced loads, and set up loop pipelining. But this is still more limited than what x86 cores can do behind the scenes. For an Itanium-style design, adding new types of optimizations generally requires new instructions and teaching the compilers how to use them, since many potential optimizations could only be added to the chip by adding back the very circuitry you were trying to remove by placing the burden on the compiler.
Even some of the optimizations Itanium compilers can do that mimic what x86 processors do behind the scenes can result in needing to emit additional code, reducing the effectiveness of the instruction cache. This is not surprising. The benefit of static scheduling is that you pre-compute the things that can be pre-computed, like which instructions can run in parallel, where you can speculate, etc., so you don't need to compute that stuff on-die, and don't need to re-compute it each and every time you run a code fragment. But obviously that information still needs to make it to the CPU, so you are trading runtime computation for additional instruction storage cost. (I won't deny that the result could still end up more I-cache efficient than x86, because x86 is by no means the most efficient instruction encoding, especially since some rarely-used opcodes hog prime encoding real estate.)
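As a toy illustration (nothing like real IA-64 bundling rules) of what "pre-compute which instructions can run in parallel" means on the compiler side: greedily pack instructions into fixed-width groups, only admitting an instruction once its inputs were produced by an earlier group.

    def bundle(instrs, width=3):
        """Greedy static scheduling: instrs are (name, dest, sources) tuples."""
        bundles, done = [], set()
        remaining = list(instrs)
        while remaining:
            group, written = [], set()
            for ins in list(remaining):
                name, dst, srcs = ins
                # Admit an instruction only if its inputs come from earlier
                # bundles and nothing in this bundle writes the same destination.
                if all(s in done for s in srcs) and dst not in written and len(group) < width:
                    group.append(ins)
                    written.add(dst)
                    remaining.remove(ins)
            bundles.append(group)
            done |= written
        return bundles

    prog = [("load", "r1", []), ("load", "r2", []),
            ("add", "r3", ["r1", "r2"]), ("mul", "r4", ["r3", "r1"])]
    for i, grp in enumerate(bundle(prog)):
        print(i, [name for name, _, _ in grp])
    # 0 ['load', 'load']   1 ['add']   2 ['mul']

An out-of-order core re-discovers that same grouping at runtime, every time the code runs; the static approach pays for it once, in the binary.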
Basically, I'm not sold on static scheduling for high-performance but general-purpose CPUs, and am especially not sold on the sort of pseudo-static scheduling used by Itanium, where you are scheduling for instructions with unknown latencies that can differ from model to model. The fully static scheduling where you must target the exact CPU you will run on, and thus know all the timings (like the Mill promised), feels better to me. (But I'm not entirely sure about install-time specialization like they mention.)
But I'm also no expert on CPU design, a hobbyist at best.
I thought the majority of their current problems stemmed from having lost the lead in fabs. You can't have the best chips if you don't have the best fab tech.
I mean, their Sunnyvale HQ was a kind of mini mimic of the White House (at least to me, if you use your imagination a tad), except for the garish storm fences that surrounded it.