Classical computers compute using 0s and 1s which refer to something physical like voltage levels of 0v or 3.3v respectively. Quantum computers also compute using 0s and 1s that also refers to something physical, like the spin of an electron which can only be up or down. Although these qubits differ because with a classical bit, there is just one thing to “look at” (called “observables”) if you want to know its value. If I want to know the voltage level is 0 or 1 I can just take out my multimeter and check. There is just one single observable.
With a qubit, there are actually three observables: σx, σy, and σz. You can think of a qubit like a sphere where you can measure it along its x, y, or z axis. These often correspond in real life to real rotations, for example, you can measure electron spin using something called Stern-Gerlach apparatus and you can measure a different axis by physically rotating the whole apparatus.
How can a single 0 or 1 be associated with three different observables? Well, the qubit can only have a single 0 or 1 at a time, so, let’s say, you measure its value on the z-axis, so you measure σz, and you get 0 or 1, then the qubit ceases to have values for σx or σy. They just don’t exist anymore. If you then go measure, let’s say, σx, then you will get something entirely random, and then the value for σz will cease to exist. So it can only hold one bit of information at a time, but measuring it on a different axis will “interfere” with that information.
It’s thus not possible to actually know the values for all the different observables because only one exists at a time, but you can also use them in logic gates where one depends on an axis with no value. For example, if you measure a qubit on the σz axis, you can then pass it through a logic gate where it will flip a second qubit or not flip it because on whether or not σx is 0 or 1. Of course, if you measured σz, then σx has no value, so you can’t say whether or not it will flip the other qubit, but you can say that they would be correlated with one another (if σx is 0 then it will not flip it, if it is 1 then it will, and thus they are related to one another). This is basically what entanglement is.
Because you cannot know the outcome when you have certain interactions like this, you can only model the system probabilistically based on the information you do know, and because measuring qubits on one axis erases its value on all others, then some information you know about the system can interfere with (cancel out) other information you know about it. Waves also can interfere with each other, and so oddly enough, it turns out you can model how your predictions of the system evolve over the computation using a wave function which then can be used to derive a probability distribution of the results.
What is even more interesting is that if you have a system like this where you have to model it using a wave function, it turns out it can in principle execute certain algorithms exponentially faster than classical computers. So they are definitely nowhere near the same as classical computers. Their complexity scales up exponentially when trying to simulate quantum computers on a classical computer. Every additional qubit doubles the complexity, and thus it becomes really difficult to even simulate small numbers of qubits. I built my own simulator in C and it uses 45 gigabytes of RAM to simulate just 16. I think the world record is literally only like 56.
It is only continuous because it is random, so prior to making a measurement, you describe it in terms of a probability distribution called the state vector. The bits 0 and 1 are discrete, but if I said it was random and asked you to describe it, you would assign it a probability between 0 and 1, and thus it suddenly becomes continuous. (Although, in quantum mechanics, probability amplitudes are complex-valued.) The continuous nature of it is really something epistemic and not ontological. We only observe qubits as either 0 or 1, with discrete values, never anything in between the two.