Always exciting to see a new Toledo post, but this one is especially inspiring, because it talks about the author's errors that you will probably make too if you don't read the post, and tells how to overcome them. And the activity described is the highly practical activity of designing a PCB this year with the best current free software and getting it built and debugged; the fact that it's an ISA board for a Transputer is fun but not central to the problem-solving process.
I wonder why he used the 74LS00 family instead of 74HCT00, even if he really needed the TTL thresholds? I forget if ISA even requires TTL levels. Is that a question of nostalgia, or is there a practical advantage of TTL over TTL-compatible CMOS in this context that I'm unaware of?
The practical advantage is that the 74LS chips are available in several corner stores at 25 or 35 cents each, and I preferred to keep all the logic in the same family.
You cannot choose the source; these come mixed from ST, TI, and other manufacturers. I preferred the laser-engraved ones over the white-ink ones just to have a uniform look.
The crystal oscillator needs something faster, so it requires a 74F04, and the link communication buffer requires a 74F244 or 74AS244. These are more expensive: the 74F chips are 2 dollars each, and the 74AS are 4 dollars each.
Oh! I usually think of availability as an advantage for 74HC (maybe not 74HCT) over 74LS, but maybe that's not the case where you are. Local availability is a real advantage.
The electronics parts stores in my town (Morón), which are several blocks from my house, have a fairly limited part selection, mostly for repair purposes. So part availability is a very significant concern. I was shocked to find last month that one of them didn't even have a TL431! But another one on the same block did.
I thought I'd check Digi-Key, but it seems like 74AS244 is even more than US$4 there in onesies: https://www.digikey.com/en/products/detail/texas-instruments.... However, a CMOS 74AHC244 (nominally something like 5.5ns to the AS244's 6.2ns) is only 27¢: https://www.digikey.com/en/products/detail/texas-instruments...
At Digi-Key, the CMOS parts have better availability in this case, but that says nothing about the availability at the corner store.
Thank you for explaining!
Thanks for the pointers to the autoroute functionality in KiCad! While wiring manually is quite satisfying, this feeling vanishes quickly when changes in the underlying schematics are required!
Oh my, transputers and Occam. SEQ and PAR, CHAN and whatever was there to split/assign arrays. One of my favorite go-to places when seeking peace of mind.
occam (MIMD) is an improvement over CUDA (SIMD)
with latest (eg TSMC) processes, someone could build a regular array of 32-bit FP transputers (T800 equivalent):
- 8000 CPUs in same die area as Apple M2 (16 TIPS) (ie. 36x faster than an M2)
- 40000 CPUs in single reticle (80 TIPS)
- 4.5M CPUs per 300mm wafer (10 PIPS)
the transputer async link (and C001 switch) allows for decoupled clocking, CPU level redundancy and agricultural interconnect
heat would be the biggest issue ... but >>50% of each CPU is low power (local) memory
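As a rough sanity check of those figures, here is a small Go sketch. It assumes every core sustains the same rate, and takes that rate (about 2 GIPS per core) from the comment's own numbers (16 TIPS / 8000 cores); it is not a datasheet figure for any real part. The names `instrPerCorePerSec` and `aggregate` are made up for illustration.

```go
package main

import "fmt"

// Assumption: every core sustains the same instruction rate. The ~2 GIPS
// figure is inferred from the comment's own numbers (16 TIPS / 8000 cores),
// not from any datasheet.
const instrPerCorePerSec = 2e9

// aggregate returns total instructions per second for n identical cores,
// ignoring interconnect, memory, and thermal limits entirely.
func aggregate(cores float64) float64 {
	return cores * instrPerCorePerSec
}

func main() {
	configs := []struct {
		name  string
		cores float64
	}{
		{"M2-sized die", 8000},
		{"single reticle (200 x 200)", 40000},
		{"300mm wafer", 4.5e6},
	}
	for _, c := range configs {
		fmt.Printf("%-28s %9.0f cores -> %6.1f TIPS\n", c.name, c.cores, aggregate(c.cores)/1e12)
	}
}
```

Run as-is, the wafer line comes out at 9000 TIPS, i.e. about 9 PIPS, which the comment rounds to 10.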
Adapting your code would be the biggest issue.
true … that’s why the transputer failed on the first attempt.
nevertheless, coding an array of RISC CPUs in an HLL is far easier, and would have a broader base, than hand-tuned, machine-specific CUDA
Before you know it you'll be going down the compute fabric and 'fleet' rabbit hole. For a long time I thought that was the future (I even worked with Transputers back in the day) but now I'm not so sure. GPUs have gotten awfully powerful and are relatively easy to work with compared to trying to harness a large number of independently operating CPUs. Debugging such a setup is really hard. That said, I still have this hope that maybe one day such an architecture will pay off in a bigger way than what has happened so far. If someone cracks the software nut in a decisive manner then it may well happen.
well - yes ... that's the point of occam[1] ... if it can hang, it will hang deterministically
we have to zoom out from the 1980s, when 4 CPUs were a lot ... now you can build 40,000 CPUs (ie a 200 x 200 array) within the single reticle limit (ie the same as a big NVIDIA chip), so a big MIMD must be coded with algorithmic patterns like map-reduce, pipelining, etc.
but the general-purpose CPUs and HLL coding mean that getting close to theoretical max performance is far easier than with CUDA
[1] or any CSP with both input and output descheduling - ie no queueing
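For anyone who hasn't written occam, here is a rough analogue in Go, whose channels also descend from CSP. It is a swapped-in stand-in, not occam. Unbuffered Go channels give the same rendezvous (no queueing) behaviour the footnote describes, the `go` statements play the role of a PAR, and the three stages form a small map-reduce pipeline. The function names are hypothetical, and `close`/`range` have no direct occam equivalent.

```go
package main

import "fmt"

// produce sends 1..n down out, then closes it. Because out is unbuffered,
// each send blocks until the next stage is ready to receive (a rendezvous,
// like an occam CHAN), so no values queue up between stages.
func produce(n int, out chan<- int) {
	for i := 1; i <= n; i++ {
		out <- i
	}
	close(out)
}

// square is a "map" stage: it reads each value, squares it, and passes it on.
func square(in <-chan int, out chan<- int) {
	for v := range in {
		out <- v * v
	}
	close(out)
}

// sum is a "reduce" stage: it folds everything it receives into one total.
func sum(in <-chan int, result chan<- int) {
	total := 0
	for v := range in {
		total += v
	}
	result <- total
}

// sumOfSquares wires the three stages together and runs them concurrently,
// roughly analogous to an occam PAR of three processes joined by channels.
func sumOfSquares(n int) int {
	a := make(chan int) // unbuffered: capacity 0, no queueing
	b := make(chan int)
	result := make(chan int)
	go produce(n, a)
	go square(a, b)
	go sum(b, result)
	return <-result
}

func main() {
	fmt.Println(sumOfSquares(10)) // 385
}
```

If you delete the `go sum(b, result)` line, `square` blocks on its first send, `produce` blocks behind it, and `main` waits on `result` forever. Go's runtime then aborts with "fatal error: all goroutines are asleep - deadlock!", and it fails the same way on every run.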
we need to bring back the cpu address/data bus pinout on the back of PCs
i want all 64 bits with a write strobe and a 3ghz clock. let me blink leds by bitbanging /dev/mem
That is less useful than you might expect due to issues of timing skew and signal reflections from transmission-line impedance mismatches.
SCSI was already contending with the termination problem in the early 90s; low-voltage differential SCSI was in the SCSI-3 standard from 01995, 30 years ago, in order to hit 80 megabytes per second over the kind of multidrop (bus) parallel interface you're talking about. That's twenty thousand times slower than the main system memory interface on the latest amd64 servers.
At the point where you already had to go to low-voltage differential signaling to get reliable communication, your bus is no longer a useful GPIO interface for blinking LEDs.
But if you have a 50¢ 16-line GPIO expander chip like https://www.digikey.com/en/products/detail/kinetic-technolog... somewhere on the SMBus the board already uses for things like temperature monitoring, perhaps for some other minor interface function, all that's required is to run its GPIOs to test points and document them. It's not as fast as the ISA bus, but it's plenty for simple digital interfacing.
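To make that concrete, here is a Go sketch of the bytes you'd send to such an expander. The linked part's register map isn't given here, so this assumes a PCA9555-style layout (output latches at 0x02/0x03, direction registers at 0x06/0x07, with 1 meaning input). The constant and function names are hypothetical, and actually putting the bytes on the bus is left to whatever I2C/SMBus access your OS provides.

```go
package main

import "fmt"

// Hypothetical register map, modeled on the common PCA9555-style 16-bit
// I2C/SMBus expander; the chip linked above may differ, so check its datasheet.
const (
	regOutput0 = 0x02 // output latch, port 0 (port 1 follows at 0x03)
	regConfig0 = 0x06 // direction, port 0 (1 = input, 0 = output)
)

// writePair builds the bytes for one write transaction: the command
// (register) byte, then the low and high bytes of a 16-bit value. On
// PCA9555-style parts the second data byte lands in the paired register.
func writePair(reg byte, value uint16) []byte {
	return []byte{reg, byte(value), byte(value >> 8)}
}

// configOutputs returns the payload that makes the pins set in mask outputs
// and leaves the rest as inputs.
func configOutputs(mask uint16) []byte {
	return writePair(regConfig0, ^mask)
}

// setOutputs returns the payload that drives the output latches to levels.
func setOutputs(levels uint16) []byte {
	return writePair(regOutput0, levels)
}

func main() {
	// Make pins 0..3 of port 0 outputs, then light an LED on pin 0.
	fmt.Printf("% x\n", configOutputs(0x000F)) // 06 f0 ff
	fmt.Printf("% x\n", setOutputs(0x0001))    // 02 01 00
}
```

A write of a command byte plus two data bytes, low byte first, has the same shape as an SMBus "write word" transaction, which is why a part like this can sit on a PC's SMBus.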
I think Raspberry Pis basically give you that via GPIO?
Ironic that this comment is on an article about the Transputer, where the entire point was to avoid memory-mapped I/O and use only point-to-point message channels, like today's PCIe.