The answer is yes, and the up and down quarks in the proton are already close enough to massless that the proton is hardly affected by making them completely massless. The proton mass would shift by a few MeV in this case, and the proton and neutron would be very close in mass, with the difference in their mass due entirely to the proton's electromagnetic field, so the proton with its electric field would end up slightly more massive than the neutral neutron.
The confinement mechanism with massless fermions is very interesting and complicated compared to QED, which is why it took so long to understand historically. The fermions are massless, so they don't want to bind into anything, while confinement means that they are not allowed to propagate separately, because they carry a QCD string along. This tension is hard to resolve. What QCD does to resolve it is to make a quark condensate in the vacuum. This condensate is actual stuff: it's like a relativistically invariant superfluid filling all space, and it breaks the chiral symmetry of the quarks. So while the quarks are still massless, the vacuum doesn't allow them to propagate while keeping their chirality once you take their mixing with other quarks in the vacuum into account.
This is analogous to a nematic fluid breaking rotational invariance. The nematic molecules, if they were in free space, could rotate with a fixed angular momentum, since their dynamics there is rotationally invariant. But in the nematic, if you try to spin up one of the long molecules, you get oscillations in the nematic order parameter. The only sign that the system was originally rotationally invariant is that the associated nematic orientation waves are long-range Goldstone-like bosons whose correlations fall off as a power.
Because of the condensate's presence, it is completely wrong to treat quarks as isolated particles. The low-energy particles are collective excitations of the quark fluid (and the glue fluid), just like sound waves in a solid or orientation waves in a nematic. All the low-energy particles (with the exception of the broad scalar resonances, the f0(500), formerly listed as the f0(600), and the f0(980)) can be understood as shaking of this low-energy fluid. The pions are the Goldstone bosons of the broken chiral symmetry, the rho is the effective gauge boson of the remaining isospin, and the a1(1260) isotriplet is the gauge boson for chiral isospin. These effective gauge bosons emerge because the QCD string is like a fundamental string, and global symmetries end up effectively gauged. They are massive because they come with a scalar partner, because they are in 5d by AdS/CFT, not in 4d. The rest of the particles come from extending this picture to strangeness, and the nucleon is the skyrmion.
This picture relates the low-energy particles only to the symmetry properties of quark field products. This means that the quarks do not compose these particles the way that a proton and an electron make a hydrogen atom. They make up these particles the way that iron atoms make up a sound wave in iron. Only the symmetry properties matter, and the exact atoms that make up the wave change as the wave propagates.
Similarly, as the pion propagates, the quarks that make it up constantly change, as they flow in and out of the dynamical vacuum. The same is true for rho mesons and for protons and neutrons.
In more qualitative detail
The QCD glue field is random on a scale slightly larger than the proton radius, and this is the origin of confinement. What this means is that if you look at two boxes separated by more than a proton radius, the gauge field is completely uncorrelated in the vacuum in the two boxes (this is true for 4 dimensional boxes in the Euclidean probabilistic version, but the square-root of the Euclidean vacuum probability distribution is the ground state wavefunction— so you should think of the glue correlation in spacelike separated 3d boxes to make this precise).
A random gauge field makes lots of quarks and antiquarks from the vacuum, because it has a flat power spectrum at energies less than the randomness scale. This means that at any energy less than 1 GeV, quarks and antiquarks are freely created and annihilated by the random gauge field. These quarks and antiquarks fill up space with a sea, whose qualitative properties are difficult to predict from the fundamental theory.
This sea is like a conduction electron sea in a metal, in that there is fermionic stuff there, but it is not like a conduction band, because the total number of vacuum quarks and antiquarks is equal, so you don't have a Fermi surface, which would pick out a rest frame. Instead you have a pair order parameter, which is very much like the pair order parameter in a BCS superconductor (except that this pair order parameter is neutral, so you get a paired fermion superfluid instead of a superconductor).
The chiral condensate is a soup of quarks and antiquarks which gives a vacuum expectation value to $\bar{q}q$. This vacuum condensate breaks the chiral symmetry of the quarks, meaning that quarks are not chirally symmetric below the condensation scale, on the order of 1 GeV. The shaking of the condensate consists of light pions, while the vector mesons emerge from the shaking condensate linked by a QCD string, which gives emergent massive gauge bosons because string-theory strings turn flavor symmetries into gauge symmetries in one higher dimension, and QCD strings are not qualitatively too different from string-theory strings.
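The link between the condensate and the light pions can be made quantitative. The answer above does not state it, but the standard leading-order result of chiral perturbation theory is the Gell-Mann–Oakes–Renner relation. As a sketch, assuming the convention $f_\pi \approx 92$ MeV and a common per-flavor condensate $\langle \bar{q}q \rangle = \langle \bar{u}u \rangle = \langle \bar{d}d \rangle < 0$:

$$
m_\pi^2 \, f_\pi^2 = -(m_u + m_d)\,\langle \bar{q}q \rangle + O(m_q^2)
$$

In the limit $m_u, m_d \to 0$ the pion becomes an exactly massless Goldstone boson, while $f_\pi$ and the condensate stay finite, which is consistent with the proton being hardly affected by making the light quarks massless.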
In addition, you can tie a knot in the pion condensate. The knot is the skyrmion, and this knot is the nucleon. The pionic field of a proton is not simply described by high-energy calculations in QCD; it is better described by the classical configuration of the long-range effective condensate theory. The Skyrme model is not very precise: it is only good to something like 30% accuracy (quantitatively, this is very bad) in predicting the field distribution in protons and the structure of small nuclei. But it is at least qualitatively consistent with the known vacuum structure of QCD, unlike the picture of the proton as 3 quarks making a bound state.
The picture that the low-lying excitations of QCD are simple bound states made of quarks is completely dynamically wrong. It has no justification, and it is sold to the public because, until recently, the concept of a dynamical vacuum was too far-out. It was Nambu who introduced the dynamical vacuum in 1960, and this pioneering work was too often buried in the succeeding decade. With Nambu's recognition with a Nobel prize, some of this outrageous recent history can be put right.
Why do a proton's quarks stay together? Because there is a force between them called the strong force, described by the theory called quantum chromodynamics (QCD), about which you can read on Wikipedia. The force is carried by a field whose quanta are bosons called gluons, which act like a glue that mediates the interaction between individual quarks.
And about your second question: is it always protons and neutrons? Not necessarily. The whole story depends on the lifetime of the composites formed by quarks. A simple analogy comes from atoms. Combinations of protons, neutrons and electrons make atoms. But does that mean that any combination is possible? Of course not. Some combinations are stable, and some are not. Uranium, for example, exists but isn't stable, and decays to thorium. Iron, by contrast, would take an extremely long time to decay, if it decays at all.
In other words: you can make many different combinations of quarks to form either mesons (a quark and an antiquark) or baryons (three quarks). Some of them are stable or have a very long lifetime (like protons), and others decay faster (like free neutrons, which have a mean lifetime of around 880 seconds). And some quark composites decay almost instantaneously, like charged kaons, which live for around $10^{-8}$ seconds.
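Since lifetimes are quoted both as mean lifetimes and as half-lives, it helps to see how the two are related. The following is a minimal sketch, assuming simple exponential decay; the function names are illustrative and not taken from any library. For the neutron, the figure of about 880 seconds is the mean lifetime, which corresponds to a half-life of roughly 610 seconds.

```python
import math


def half_life_from_mean_lifetime(tau_seconds: float) -> float:
    """Convert a mean lifetime tau to a half-life: t_half = tau * ln(2)."""
    return tau_seconds * math.log(2)


def surviving_fraction(t_seconds: float, tau_seconds: float) -> float:
    """Fraction of an initial sample still undecayed after time t,
    assuming pure exponential decay with mean lifetime tau."""
    return math.exp(-t_seconds / tau_seconds)


# Approximate mean lifetime of a free neutron, as quoted in the answer.
neutron_tau = 880.0
neutron_half_life = half_life_from_mean_lifetime(neutron_tau)

print(round(neutron_half_life))  # about 610 seconds
print(surviving_fraction(neutron_half_life, neutron_tau))  # half the sample remains
```

The same conversion applies to any exponentially decaying particle, so a kaon lifetime of order $10^{-8}$ seconds corresponds to a half-life about 30% shorter.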
Best Answer
It is more complicated than this. See how the strong interaction is figuratively modeled in terms of quantum field theory in this article
The invariant mass of the hadron is fixed by the sum of the four-vectors of all those virtual particles: it is the Lorentz-invariant length of that total four-momentum.
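As a purely kinematic illustration of that statement, here is a minimal sketch of how an invariant mass is obtained from a set of four-momenta, using $M^2 = (\sum E)^2 - |\sum \vec{p}|^2$ in natural units ($c = 1$). The function name and the sample momenta are hypothetical, and this is not a QCD calculation: the constituents of a real hadron are virtual and cannot be listed this way.

```python
import math


def invariant_mass(four_momenta):
    """Invariant mass of a system of particles.

    four_momenta: sequence of (E, px, py, pz) tuples in natural units (c = 1).
    Sums the four-vectors component by component, then takes the Minkowski norm.
    """
    momenta = list(four_momenta)
    total_e = sum(p[0] for p in momenta)
    total_px = sum(p[1] for p in momenta)
    total_py = sum(p[2] for p in momenta)
    total_pz = sum(p[3] for p in momenta)
    m_squared = total_e**2 - (total_px**2 + total_py**2 + total_pz**2)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(m_squared, 0.0))


# Two massless particles moving back to back, each with energy 1:
# neither has mass on its own, but the system has invariant mass 2.
back_to_back = [(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0)]
print(invariant_mass(back_to_back))
```

The back-to-back example shows why the mass of a composite system need not be the sum of the masses of its parts: the energy of the constituents contributes to the total.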
Because the coupling constant of the strong interaction is large at these energies, the actual QFT cannot be computed perturbatively, so QCD on the lattice is used to model how the three valence quarks and an innumerable number of quark-antiquark pairs and gluons add up to the hadronic bound states.