If every observation is on the same scale (i.e. the same performance metric was used each day), then in general I would not recommend normalization, since the scores already have comparable location and spread information.
Since the z-transform of a variable is a linear (affine) transformation with a positive slope, and since Pearson correlation is invariant to such sign-preserving linear transformations, z-transforming the columns and then clustering the columns with correlation distance will give the same result as using the raw scores (see http://www.math.uah.edu/stat/expect/Covariance.html, just after #8). Z-normalizing the rows (within employee), on the other hand, wipes out each employee's average performance, which may be highly inappropriate depending on your goals.
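A minimal numerical check of this invariance, assuming a small made-up employees-by-days score matrix (the data and helper function are mine, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: rows are employees, columns are days.
scores = rng.normal(loc=50, scale=10, size=(5, 30))

def zscore_columns(X):
    """Z-transform each column (each day) separately."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

raw_corr = np.corrcoef(scores, rowvar=False)                  # day-by-day correlation matrix
z_corr   = np.corrcoef(zscore_columns(scores), rowvar=False)  # same, after z-scoring the columns

print(np.allclose(raw_corr, z_corr))  # True: correlation distance (1 - r) is unchanged
```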
If you are clustering days based on employee performance, presumably you would like a cluster of "high performing days" and a separate cluster of "low performing days." If this is the case, DO NOT use correlation distance, since it ignores differences in mean level. For example, correlation distance assigns a very low distance (= high similarity) between day 1 with scores (11,12,13,14,15) and day 2 with scores (2,2,3,4,5), and assigns a much higher distance between day 1 and day 3 with scores (12,13,12,13,12). This is probably not the sort of result you want; you probably want something like Euclidean distance.
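Here is a small sketch of that example using SciPy's built-in distances (my own illustration, not part of the original answer), showing how the two metrics rank the days differently:

```python
import numpy as np
from scipy.spatial.distance import correlation, euclidean

day1 = np.array([11, 12, 13, 14, 15])
day2 = np.array([ 2,  2,  3,  4,  5])
day3 = np.array([12, 13, 12, 13, 12])

# Correlation distance (1 - Pearson r) ignores the ~10-point gap in level:
print(correlation(day1, day2), euclidean(day1, day2))  # ~0.03 vs ~21.9
print(correlation(day1, day3), euclidean(day1, day3))  #  1.0 vs ~3.6
```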
It is imperative that you think carefully about your goals here and select normalization methods and distance metrics (not to mention clustering algorithms) accordingly.
There are not many divisive hierarchical clustering algorithms that I know of. In fact, I know of exactly one: DIANA (DIvisive ANAlysis), and I would not call it "popular"; it is rather exotic and only of historical interest.
A divisive scheme needs to find the best of O(2^n) possible splits at each step - this is very expensive, and even heuristics do not help much to get a good result. Top-down is not the method of choice.
Agglomerative methods are much more popular, but they still scale badly: O(n^2) or worse (the standard HAC is O(n^3) in runtime and O(n^2) in memory). In many cases, any O(n^2) method (in particular, any that needs a full distance or similarity matrix) will be unacceptably expensive, which is why people keep using k-means.
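As a rough illustration of where the quadratic cost comes from (the sizes and parameters below are arbitrary choices of mine), the standard SciPy HAC interface needs the full condensed distance matrix before it can start merging:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

n = 2000                                  # already ~2 million pairwise distances
X = np.random.default_rng(1).normal(size=(n, 4))

dists = pdist(X)                          # condensed matrix: n*(n-1)/2 entries
Z = linkage(dists, method="average")      # agglomerative clustering on the full matrix
labels = fcluster(Z, t=5, criterion="maxclust")

print(dists.shape)                        # (1999000,) -- memory grows quadratically in n
```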
Best Answer
Whereas $k$-means tries to optimize a global objective (the within-cluster variance) and reaches only a local optimum, agglomerative hierarchical clustering aims at finding the best merge at each fusion step (a greedy algorithm); each step is solved exactly, but the overall result may be suboptimal.
One should use hierarchical clustering when the underlying data have a hierarchical structure (like the correlations in financial markets) and you want to recover that hierarchy. You can still apply $k$-means to do that, but you may end up with partitions (from the coarsest one, all data points in a single cluster, to the finest one, each data point its own cluster) that are not nested and therefore do not form a proper hierarchy.
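A small sketch of the nestedness point (the data and the is_nested helper are hypothetical, purely for illustration): cutting one dendrogram at two levels always yields nested partitions, whereas separate $k$-means runs with $k=2$ and $k=3$ need not.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

X = np.random.default_rng(2).normal(size=(200, 2))

def is_nested(fine, coarse):
    # Every fine cluster must lie entirely inside one coarse cluster.
    return all(len(set(coarse[fine == c])) == 1 for c in set(fine))

Z = linkage(X, method="average")
coarse = fcluster(Z, t=2, criterion="maxclust")   # cut the dendrogram into 2 clusters
fine   = fcluster(Z, t=3, criterion="maxclust")   # cut the same dendrogram into 3 clusters

km2 = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
km3 = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print(is_nested(fine, coarse))   # True: dendrogram cuts form a proper hierarchy
print(is_nested(km3, km2))       # often False: the k-means partitions need not be nested
```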
If you want to dig into the finer properties of clustering, you may not want to oppose flat clustering such as $k$-means to hierarchical clustering such as Single, Average, or Complete Linkage. For instance, all of these clusterings are space-conserving, i.e. when you build the clusters you do not distort the space, whereas a hierarchical clustering such as Ward is not space-conserving, i.e. at each merging step it distorts the metric space.
To conclude, the drawbacks of hierarchical clustering algorithms can be very different from one algorithm to another. Some share properties with $k$-means: Ward aims at optimizing variance, but Single Linkage does not. They can also differ among themselves: Ward is space-dilating, whereas Single Linkage is space-conserving, like $k$-means.
-- edit to make the space-conserving and space-dilating properties precise:
Space-conserving: $$D_{ij} \in \left[ \min_{x \in C_i, y \in C_j} d(x,y), \max_{x \in C_i, y \in C_j} d(x,y) \right]$$ where $D_{ij}$ is the distance between clusters $C_i$ and $C_j$ you want to merge, and $d$ is the distance between datapoints.
Space-dilating: $$D(C_i \cup C_j, C_k) \geq \max(D_{ik}, D_{jk}),$$ i.e. by merging $C_i$ and $C_j$ the algorithm pushes the cluster $C_k$ further away.
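A toy check of the space-conserving bound above (my own sketch; the Ward distance below uses one common convention, $\sqrt{2|C_i||C_j|/(|C_i|+|C_j|)}\,\lVert \bar{c}_i - \bar{c}_j \rVert$, which is an assumption rather than something stated in the answer): Single and Complete Linkage stay inside $[\min d, \max d]$, while the Ward distance can exceed the maximum pairwise distance as the clusters grow, consistent with Ward not being space-conserving.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Two toy clusters: ten coincident points at 0 and ten coincident points at 1.
C_i = np.zeros((10, 1))
C_j = np.ones((10, 1))
pairwise = cdist(C_i, C_j)

single   = pairwise.min()    # 1.0 -> inside [min, max]: space-conserving
complete = pairwise.max()    # 1.0 -> inside [min, max]: space-conserving

# Ward distance (one common convention): grows with the cluster sizes.
n_i, n_j = len(C_i), len(C_j)
ward = np.sqrt(2 * n_i * n_j / (n_i + n_j)) * np.linalg.norm(C_i.mean(axis=0) - C_j.mean(axis=0))

print(single, complete, ward)   # 1.0 1.0 ~3.16: Ward leaves the [min, max] interval
```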