What we observe in nature exists on several scales, from the distances of stars, galaxies and clusters of galaxies down to the sizes of atoms and elementary particles.
Now we have to define "observe".
Observing at the human scale means what our ears hear, what our eyes see, what our hands feel, what our nose smells and what our mouth tastes. That was the first classification and the first level of "proxy", i.e. an intermediary between fact and our understanding and classification; this level is biological. (The term proxy is widely used in climate research.)
A second level of observing comes when we use proxies such as meters, thermometers, telescopes and microscopes, which register on our biological proxies, and we accumulate knowledge. At this level we can overcome the limits of the human scale and find and study the enormous scales of the galaxies and the tiny scales of bacteria and microbes, a level of microns and millimeters. We observe waves in liquids with wavelengths of that size.
Visible light has wavelengths of the order of thousands of Angstroms (one Angstrom is $10^{-10}$ meters), i.e. a few hundred nanometers. As science progressed, the idea of light being corpuscles (Newton) was overcome by the observation of interference phenomena, which definitely said "waves".
Then came the quantum revolution: the photoelectric effect (particle) and the double slit experiments (wave) showed that light has aspects of a corpuscle and aspects of a wave. We are now at a final level of proxy, called mathematics.
The wave particle duality was understood in the theory of quantum mechanics. In this theory, depending on the observation, a particle will react either as a "particle", i.e. have a defined momentum and location, or as a wave, i.e. have a frequency/wavelength and a geometry defining its presence. BUT, and it is a huge but, this wavelength is not in the matter/energy that defines the particle; it is in the probability of finding that particle at a specific (x,y,z,t) location. If there is no experiment looking for the particle at specific locations, its form is unknown and bounded by the Heisenberg Uncertainty Principle.
What is described in words in the last paragraph is rigorously set out in mathematical equations, and it is not possible to really understand what is going on without acquiring the mathematical tools, just as a native on a primitive island could not understand airplanes. Mathematics is the ultimate proxy for understanding quantum phenomena.
Now light is special in the sense that collectively it displays the wave properties macroscopically, and the specialness comes from the Maxwell Equations which work as well in both systems, the classical and the quantum mechanical, but this also needs mathematics to be comprehended.
So a visualization is misleading, in the sense that the mathematical wave function coming from the quantum mechanical equations is like a "statistical" tool whose square gives us the probability of observing the particle at (x,y,z,t). Suppose that I have a statistical probability function for you: that you may be in New York on 17/10/2012, with probabilities spread all over the east coast of the US. Does that mean that you are nowhere? Does that mean that you are everywhere? It is the same with photons and elementary particles. It is just a mathematical probability coming out of the inherent quantum mechanical nature of the cosmos.
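The point that a probability function does not mean "nowhere" or "everywhere" can be illustrated with a small numerical sketch. It is only an illustration under stated assumptions: the Gaussian shape of the wave function, the one-dimensional grid and the names `gaussian_psi`, `probabilities` and `detect` are all hypothetical choices made here, not anything from the discussion above.

```python
import math
import random

def gaussian_psi(x, x0=0.0, sigma=1.0):
    """Assumed (unnormalized) real Gaussian wave function centred at x0."""
    return math.exp(-((x - x0) ** 2) / (4.0 * sigma ** 2))

def probabilities(xs, psi):
    """Probability on a grid: P_i proportional to |psi(x_i)|^2, normalized to sum to 1."""
    weights = [abs(psi(x)) ** 2 for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

def detect(xs, probs, rng):
    """One measurement: the particle is found at exactly one grid position."""
    return rng.choices(xs, weights=probs, k=1)[0]

xs = [i * 0.1 for i in range(-50, 51)]      # hypothetical grid from -5 to 5
probs = probabilities(xs, gaussian_psi)

rng = random.Random(0)                       # fixed seed so the run is repeatable
hits = [detect(xs, probs, rng) for _ in range(10000)]
mean_hit = sum(hits) / len(hits)

print("first five detections:", hits[:5])
print("mean of all detections:", mean_hit)
```

Each call to `detect` returns a single position; the spread only shows up in the statistics of many repeated detections, which is all the squared wave function claims to describe.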
The demonstration shown in the answer of another respondent, with the time frames showing how the interference pattern builds up over time, is one of the best pieces of evidence we have for the wave particle duality of matter at the quantum scale. An interesting aspect of all these mysteries of nature, about which I would like to express my opinion, is the following:
Let us talk about photons, because they are the most misunderstood objects in quantum mechanics discussions.
Wave or particle?
Photons are particles every day of the week; it is not that on some days they are waves and on other days they are particles. They are as much particles as electrons are. We know that from the distinct clicks we hear in our detectors when sufficiently low intensity light arrives at them. The wave property of the photon, or of any other particle, is the wave function, and I assume we are familiar with the interpretation given to it, as the probability of observing the photon (or any other particle) at some position $x$ at some time $t$. That is to say, there is no way to tell where the photon actually is before we observe it. In the meantime, the wave function occupies the whole of the space that is available for the photon to be in. It is important to understand that photons of the same colour are all identical (they have the same energy).
Two slit experiment: Now let us see what happens when a photon approaches the two slits. The wave function that represents the photon will pass through the slits like waves do. It will split into two waves and recombine to interfere on the array of detectors on the other side. The maxima correspond to high probability, the minima to zero probability. The consequence of this is that the photon is most likely to show up in one of these maxima and will hit only one detector, but we don't know which one. Likewise, we don't know which slit it has gone through. An interesting point to make here is that there is no way that one photon will hit two detectors at a time. Any attempt or trick to determine which slit the photon has gone through destroys the interference pattern, as all wave properties are removed!
Conclusion: When people first did the Young experiment, they saw the interference pattern form instantly because they used high intensity light. We discover the reality only when we use very low intensity light. It is like turning down the water tap: you start getting droplets instead of the continuous flow you had when the tap was fully open. And we know that if we look closer we will see molecules of water.
For a deep discussion of all this, try googling Richard Feynman's lectures at the University of Auckland, New Zealand, First Lecture. Very entertaining too! Try this link: http://vega.org.uk/video/subseries/8
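The build-up described here can be mimicked numerically. The sketch below is a toy model, not a simulation of any real apparatus: it assumes an idealized far-field fringe pattern with detection probability proportional to $\cos^2(\pi x / s)$, ignores the single-slit envelope, and uses hypothetical names and parameters (`fringe_probabilities`, `fire_photons`, 201 detectors, fringe spacing $s = 1$ in arbitrary units).

```python
import math
import random

def fringe_probabilities(n_detectors=201, half_width=3.0, fringe_spacing=1.0):
    """Assumed idealized two-slit pattern: P(x) proportional to cos^2(pi * x / spacing)."""
    xs = [-half_width + 2.0 * half_width * i / (n_detectors - 1)
          for i in range(n_detectors)]
    weights = [math.cos(math.pi * x / fringe_spacing) ** 2 for x in xs]
    total = sum(weights)
    return xs, [w / total for w in weights]

def fire_photons(n_photons, probs, rng):
    """Send photons one at a time; each one clicks in exactly one detector."""
    counts = [0] * len(probs)
    for i in rng.choices(range(len(probs)), weights=probs, k=n_photons):
        counts[i] += 1
    return counts

def bright_and_dark(counts, probs):
    """Total clicks in detectors near the maxima and near the minima."""
    peak = max(probs)
    bright = sum(c for c, p in zip(counts, probs) if p > 0.9 * peak)
    dark = sum(c for c, p in zip(counts, probs) if p < 0.1 * peak)
    return bright, dark

rng = random.Random(1)                 # fixed seed so the run is repeatable
xs, probs = fringe_probabilities()
few = fire_photons(20, probs, rng)     # very low intensity: isolated clicks
many = fire_photons(20000, probs, rng) # many photons: fringes emerge

print("clicks after 20 photons:", sum(few))
print("bright vs dark clicks after 20000 photons:", bright_and_dark(many, probs))
```

With only a handful of photons the clicks look scattered; the fringes become visible only in the accumulated counts, while every individual photon still registers in a single detector.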
Best Answer
The concepts "particle" and "wave" started from classical physics and from the everyday use of the terms, to begin with. A particle of dust got into one's eye, and the sea had huge waves.
Physics came into its reign when mathematics was seriously used to model observations.
For classical physics "particle" means an entity with small mass and a center of mass tracked at coordinates (x,y,z) at time t. Solutions of kinematic differential equations described the trajectory with accuracy determined by experimental errors.
In classical physics, waves are modeled by sinusoidal functions, i.e. functions that are solutions of "wave equations" and can describe the behavior of sea waves, sound waves and, finally, electromagnetic waves. Classically a wave is a variation of a measurable quantity, like energy or electric field, in space at a given time t, and the theoretical models were very successful in describing the observations of periodic energy distributions in bulk matter, and even in empty space (electromagnetic waves).
Then quantum mechanics became necessary. From the discreteness of atoms, the black body radiation spectrum and the photoelectric effect, it was finally understood that there were regions in the measured variables that displayed a quantization of energy.
It so happens that the equations that successfully describe the quantum mechanical state of matter are differential equations with sinusoidal solutions, i.e. wave equations, like the Schrodinger equation. The solutions for the hydrogen atom were able to explain the spectral series assigned ad hoc by the Bohr model, IF the postulate was assumed that the wavefunction squared did not represent the energy of the electron at (x,y,z) at time t, but a probability distribution. That is, if one accumulated a large number of measurements with the same boundary conditions and plotted the distribution of (x,y,z) at time t, one would know how probable it would be to find the electron at a given location.
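Written out, the postulate reads as follows. The symbols $\psi$ for the wave function, $dV$ for a small volume element and $a_0$ for the Bohr radius are standard notation introduced here for illustration; they do not appear in the text above.

$$P\big(\text{electron in } dV \text{ around } (x,y,z) \text{ at time } t\big) = |\psi(x,y,z,t)|^2 \, dV, \qquad \int |\psi(x,y,z,t)|^2 \, dV = 1 .$$

As a worked case, the ground state of the hydrogen atom is

$$\psi_{100}(r) = \frac{1}{\sqrt{\pi a_0^3}} \, e^{-r/a_0}, \qquad P(r)\,dr = 4\pi r^2 \, |\psi_{100}(r)|^2 \, dr = \frac{4 r^2}{a_0^3} \, e^{-2r/a_0} \, dr ,$$

so the most probable distance from the nucleus, where $dP/dr = 0$, is $r = a_0$. The electron is not smeared out over this curve; the curve only says how often repeated measurements would find it at each distance.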
As an example, this is similar to taking a census of the population of a city by age, and gauging how probable it would be that the first person you meet will be 8 years old. The wave function's function is just that, to give probabilities mathematically which are checked experimentally, and have been very accurate.
The "wave" part confused and continues to confuse people, because they think that the quantum mechanical entity, the electron for example, is spread out according to the solution of the Schrodinger equation. This is a misunderstanding, as the double slit interference experiments show with incoming single electrons:
Note the top photo: where the electron impinges on the screen, it is one whole electron. The accumulated probability pattern, though, clearly shows the interference effect that is expected from the sinusoidal form of the wave functions describing the electron when it hits the slits and goes through one or the other.
The "particle" facet of the electron is that it appears as a point at (x,y,z_0) on the screen, and the "wave" facet is the probability distribution displayed in its trajectories.
If one is becoming a physicist it is simple to accept this fact, that the microcosm behaves differently than the macroscopic world we are used to. Bohm was stuck on classical frameworks and tried to derive the quantum mechanical probabilities from an underlying classical description. He succeeded in reproducing the same results as the usual quantum mechanical solutions, but afaik his model is complicated and limited, and cannot be extended into second quantization where the ball game has gone now.