I guess 4-6 secs is what to expect, but is it acceptable? Hmm... I can't see how a digital camera can be ready to shoot in 0.5 seconds while a digital audio recorder sometimes needs up to 15 secs (like the MicroTrack 24/96) to get into record mode. It's basically the same digital information being saved onto digital memory cards.
I do embedded software. I have done consumer electronics (not at the moment)...
Both devices do save digital info onto memory cards, but the underlying systems aren't that similar. The 4-6 second time you're seeing is basically the time to boot up a computer, and that's true for both the camera and the audio device. There's a microcontroller of some sort and a bunch of peripheral devices, which may or may not be on the same chip as the micro. Those peripherals need to be initialized, and depending on what that initialization involves (I have no idea here!), that might take time. Depending on several design choices, some of those steps may or may not happen in parallel. Those choices include the operating system running on the micro (if there is an OS at all!), what kind of interface exists to each of the peripherals, and how much self-test is performed before starting everything up. That last item is always interesting: self-test takes time, but it makes it more likely that a system really can do its job when it looks ready.
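To make the sequential vs. parallel point concrete, here's a toy sketch. Every number and peripheral name in it is made up for illustration; none of it is measured from a real camera or recorder. It just shows that if each init step waits for the previous one, the boot time is the sum of the steps, while if most steps can overlap, it's closer to the slowest single step.

```python
# Hypothetical init times in seconds. These are illustrative guesses,
# not measurements from any real device.
INIT_TIMES = {
    "cpu_and_clocks": 0.2,
    "memory_card": 1.5,
    "audio_codec": 0.8,
    "display": 0.5,
    "self_test": 1.0,
}


def sequential_boot_time(times):
    """Every step waits for the previous one, so the total is the sum."""
    return sum(times.values())


def parallel_boot_time(times, serial_first=("cpu_and_clocks",)):
    """Steps named in serial_first run one after another; the remaining
    steps overlap, so together they cost only as much as the slowest."""
    serial = sum(times[name] for name in serial_first)
    rest = [t for name, t in times.items() if name not in serial_first]
    return serial + (max(rest) if rest else 0.0)


print(f"sequential: {sequential_boot_time(INIT_TIMES):.1f} s")
print(f"parallel:   {parallel_boot_time(INIT_TIMES):.1f} s")
```

Whether a real design can overlap those steps depends on the OS and the hardware interfaces, which is exactly the kind of design choice described above.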
A somewhat flippant answer might be that 4-6 seconds is OK for an audio device because the audio market isn't pushing for a 0.5 second boot time, while the photo market has. There's probably some truth there. The market does seem to have asked for a pre-record capability; lots of devices have that. The PMD-660 at least has a soft power switch, so when it is 'off' it's really in some kind of standby mode. There is usually a tradeoff between how fast a device comes out of standby and how much power it consumes while in standby. The pre-record mode could certainly be considered the "high power standby" mode.
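A rough way to see the standby tradeoff is some back-of-envelope arithmetic. The battery capacity, current draws, and wake times below are all invented for illustration and are not specs for the PMD-660 or any other device; the point is only the shape of the tradeoff.

```python
def standby_hours(battery_mah, standby_ma):
    """Hours the battery lasts if the device does nothing but sit in standby."""
    if standby_ma <= 0:
        return float("inf")  # no draw at all: the battery never runs down
    return battery_mah / standby_ma


# All values are hypothetical.
BATTERY_MAH = 2000.0
MODES = {
    # name: (standby current in mA, time to be ready to record in seconds)
    "deep standby": (1.0, 5.0),
    "light standby": (50.0, 0.5),
    "pre-record": (200.0, 0.0),
}

for name, (current_ma, wake_s) in MODES.items():
    hours = standby_hours(BATTERY_MAH, current_ma)
    print(f"{name}: ready in {wake_s} s, battery lasts about {hours:.0f} h")
```

With numbers like these, the faster the device is ready, the sooner the battery is flat from just sitting there.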
I think Marantz has certainly reacted to many of the complaints I've heard (and made!) about the 660: the 1/8" jacks, no digital in, no higher sampling rates or bit depths, and maybe sound quality (maybe; initial reports are promising). The next generation will do something new; exactly what that "something" is will depend on their marketing and engineering departments.
My TV takes more than 6 seconds to boot. Like the camera and the digital recorder, it's got a microcontroller of some sort talking to a bunch of peripherals: this time video and audio processing, plus a tuner. My computer takes a lot longer.