One thing to keep in mind before turning to weights is that gender can be considered a "swamping" variable in a two-step cluster analysis. Differences between gender are oftentimes large, and thus overpower weaker, but still substantively interesting heterogeneity in your data.
Instead of down-weighting gender, you could consider looking into a finite mixture regression model. Finite mixture models are a model-based cluster analysis (clusters are usually assumed to be multivariate Gaussian) and a finite mixture regression model essentially combines a cluster analysis with a regression. In your case, you could use gender as a predictor, perform this analysis, and detect clusters while taking into account the predictive power of gender (as well as other variables of interest). More information can be found here and here in the flexmix R package documentation.
It is obvious that using "option 1" dataset you virtually declare that similar respondents are those respondents who bought (or stole) the same items at the same turn or visit. I don't believe this is your research goal. In addition, problematic NA responses arise.
"Option2" dataset is what you should use, but do recode 2 into 1, to make the data binary. You then can take these variables in TwoStep as categorical variables. But here arises another doubt. TwoStep is for nominal categorical variables only; and you probably will not want to treat the binary variables as dichotomous nominal. Treating them as nominal means that, for you, respondents who did not buy the same item are as similar with each other as those who did bought the same item. Rather, you'll want to treat those "0 and 0" respondents as neither similar nor dissimilar, - this suggests using similarity measures such as Jaccard measure. But this in turn precludes using TwoStep and requires using Hierarchical clustering or other clustering methods apt for binary data (those other, unfortunalely, are not found in SPSS).
But you can't do hierarchical clustering of 75000 respondents - its too many, both practically and theoretically, for hierarchical method. You see - you've got trapped.
One way out will be to look for clustering algorithms (outside SPSS) which are for big (large) sparse binary data
. Search this site for this word combination: you'll get a few related questions, to read.
Another way out may be to do hierarchical clustering on random subsets of your data (subsets of size, say, 500 respondents). Clustering of subsamples and cross-validation is beneficial, as it escapes overfitting threat. But, in the context of clustering, it is quite a big work. I recommend you to read papers on cluster analysis by subsamples.
A third and the easiest way will be to do K-means clustering of your data. It solves the problem of big dataset. However, K-means is often seen as theoretically inappropriate for binary as well as count variables. They say that because the method involves computation of floating point geometric centroids, it requires interval, ideally - continuous, variables. That said, people use K-means with binary or count data "all the time". It appears to me that in high-dimensional settings such as yours (1500 variables) the relative contributions of "continuety" and "dimensionality" to the formation of centroids shifts towards dimensionality anyway - even if you had quite fine-grained Likert scale variables. That seems to excuse, to an extent, applying K-means to a wide binary dataset as yours.
If you choose K-means then I recommend you to normalize each row (respondent) in the dataset to unit sum-of-squares (= unit sum, since the data are binary) prior clustering. Why to do it? According to this, when you normalize vector magnitudes (L2 norms) you make the euclidean distance between the vectors directly reflect cosine similarity between them: $d^2=2(1-cos)$. And it is cosine similarity (= Ochiai binary measure) which is a justifiable alternative to Jaccard measure I mentioned in the 2nd paragraph above. Both Jaccard and Ochiai treat "0 and 0" respondents as neither similar nor dissimilar - that is what you need. So, using K-means on such way normalized data is in a sense analogous to using hierarchical clustering on Ochiai measure.
Best Answer
Two-step clustering in SPSS extends beyond just numeric variables. Usually rescaling is something necessary for k-means, but it can be used here to make interpretation (and graphing the solution) easier. Unfortunately there is no way to test the 1-cluster solution in the two-step to determine if clustering is even necessary. Given the poor results based on the Silhouette measure, there probably isn't much heterogeneity to detect among the variables you used, regardless of whether you decide to scale the variables or not.
If you are interested in using the clusters for predictive purposes, another option to pursue here could be a regression tree using the clustering variables as predictors.