Step 1: Filter
For each event $X$, define a filter $F_X$ on data points which keeps only the elements labelled $X$ and sorts their timestamps. The output of this filter is a vector of sorted non-negative reals.
Example: on the data point
[(A,0), (B,1), (C,1), (A,2.2), (B,2.2), (A,2.5), (C,2.7), (A,3.3)]
the filter $F_A$ would yield
[0, 2.2, 2.5, 3.3]
and the filter $F_B$ would yield
[1, 2.2]
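The filter step might be sketched like this in Python (the function name `filter_event` is my own; the sample data is the one from the example above):

```python
def filter_event(data_point, event):
    """Keep only the timestamps of the given event, sorted ascending."""
    return sorted(t for (e, t) in data_point if e == event)

point = [("A", 0), ("B", 1), ("C", 1), ("A", 2.2),
         ("B", 2.2), ("A", 2.5), ("C", 2.7), ("A", 3.3)]

print(filter_event(point, "A"))  # [0, 2.2, 2.5, 3.3]
print(filter_event(point, "B"))  # [1, 2.2]
```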
Step 2: Window
Next, select a positive real "window size" and partition the time axis into a sequence of right half-open intervals of this size. For example, if your size were 1.0, your windows would be the half-open intervals:
[0,1.0), [1.0,2.0), [2.0,3.0), ...
Now, from the output of $F_X$, group the elements which occurred in the same window. So if your window size is 1.0, then the output of $F_A$ from above,
[0, 2.2, 2.5, 3.3]
would be grouped as
([0], [], [2.2,2.5], [3.3])
Step 3: Aggregate
Now perform a COUNT over these groups; continuing the example we obtain
(1, 0, 2, 1)
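The windowing and aggregation steps together turn a filtered timestamp list into the count vector. A minimal sketch, assuming all timestamps fall below a fixed `horizon` (a parameter I've added so that all signatures come out the same length):

```python
import math

def event_signature(timestamps, window, horizon):
    """Count events per right half-open window [k*w, (k+1)*w).

    Assumes every timestamp t satisfies 0 <= t < horizon.
    """
    n_windows = math.ceil(horizon / window)
    counts = [0] * n_windows
    for t in timestamps:
        counts[int(t // window)] += 1
    return counts

print(event_signature([0, 2.2, 2.5, 3.3], 1.0, 4.0))  # [1, 0, 2, 1]
```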
Let's denote this event signature of the data point with respect to the event $X$ and the chosen window size $w$ by $E_{X,w}$.
Now define the similarity measure between two data points $v_1$ and $v_2$ with respect to the event $X$ as $S_X(v_1, v_2) = \|E_{X,w}(v_1) - E_{X,w}(v_2)\|$. The norm could be $L^1$ or $L^2$; see what works for you.
Note that the $L^p$ norms involve a sum over the components of the vector, so you should scale by the dimension of the vectors to normalize.
So now for every event $X$ you have a similarity measure $S_X$. To get a global measure you can just add them up:
$S(v_1,v_2) = \sum_X S_X(v_1,v_2)$
(I'm assuming there are no similarities between these events, so a straight sum is appropriate. I'm also assuming that you have a fixed set of events $X$; if not, you may want to scale by their number to normalize.)
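A sketch of the summed distance, assuming the per-event signatures $E_{X,w}$ have already been computed with a common window size; the dictionaries below hold illustrative values, and the $L^1$ norm is used, scaled by dimension as noted above:

```python
def l1_distance(sig1, sig2):
    """L1 distance between equal-length signatures, scaled by dimension."""
    return sum(abs(a - b) for a, b in zip(sig1, sig2)) / len(sig1)

def total_distance(sigs1, sigs2):
    """Sum the per-event measures S_X over all events X."""
    return sum(l1_distance(sigs1[x], sigs2[x]) for x in sigs1)

# Illustrative precomputed signatures, keyed by event label.
v1 = {"A": [1, 0, 2, 1], "B": [0, 1, 1, 0]}
v2 = {"A": [1, 1, 1, 1], "B": [0, 0, 2, 0]}
print(total_distance(v1, v2))  # 1.0
```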
You need to choose a window size which provides the right degree of separability between closely occurring events. You should take into account your measurement accuracy.
If you want to get fancy, you can do different types of windowing. For example, instead of counting the number of events within a time window, you could ask how long it takes to accumulate a fixed number of events within a count-window. Play around and see what fits your data.
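The count-window variant could be sketched as follows: rather than counting events per time window, measure the time spanned by each consecutive group of $k$ events (the value of $k$ and the timestamps below are illustrative):

```python
def count_window_signature(timestamps, k):
    """Elapsed time spanned by each consecutive group of k events."""
    ts = sorted(timestamps)
    return [ts[i + k - 1] - ts[i] for i in range(0, len(ts) - k + 1, k)]

# Two count-windows of k=2 events each: [0, 1.5] and [2.0, 3.5].
print(count_window_signature([0, 1.5, 2.0, 3.5], 2))  # [1.5, 1.5]
```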
Finally, now that you have a real-valued similarity measure you can use $K$-means or whatever other methods you already know of.
Best Answer
This question about time series clustering is similar. Essentially, your question boils down to determining distances or (dis)similarities between series of timestamps. (Once you have distances, you can use any clustering algorithm - I personally like DBSCAN.)
A couple of possibilities come to mind, depending on whether the number of events should have a higher impact than the timing or vice versa. Should two series of timestamps be "similar" if they both have 20 events, but at wildly different times... or if one has 20 events and the other 5, but these 5 events occur at exactly the same time as 5 out of the 20 in the first series?
You could bucketize your timestamps into smaller time intervals (e.g., two-minute buckets for your car example), count how many timestamps fall into each bucket to get an integer time series, and then calculate correlations over time or Hellinger distances. Depending on how you answer the "number vs. timing" question above, you may or may not want to first normalize each time series by the total number of events. Or, if you want to include additional time-dependent information like car weight in your example, you could sum the weights in each time bucket instead of counting the number of events.
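The bucketize-then-compare idea can be sketched as follows; normalizing each bucketized series by its total event count turns it into a discrete distribution, which the Hellinger distance then compares (the bucket size, bucket count, and timestamps are illustrative):

```python
import math

def bucketize(timestamps, bucket, n_buckets):
    """Count how many timestamps fall into each half-open bucket."""
    counts = [0] * n_buckets
    for t in timestamps:
        counts[int(t // bucket)] += 1
    return counts

def normalize(counts):
    """Turn a count vector into a discrete probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (in [0, 1])."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p, q))) / math.sqrt(2)

p = normalize(bucketize([0.5, 1.1, 1.7, 3.2], bucket=1.0, n_buckets=4))
q = normalize(bucketize([0.4, 1.3, 2.8, 3.5], bucket=1.0, n_buckets=4))
print(hellinger(p, q))
```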