What I wish source systems would tell us and they hardly ever do. Best laid out as an example, look at this data:
πΊπ»π½πΎπΈπ·, πΉ πΆπΆπΆ, πΈπΆπΈπΆ-πΆπΏ-πΈπΆ
This alone does not tell us much, so along with this we need context, commonly in the form of column names:
π²ππππΎπΌπ΄π π½ππΌπ±π΄π, π±π°π»π°π½π²π΄, ππΈπΌπ΄πππ°πΌπΏ
Fine, this is usually all we get. Now, let’s shake things up a bit by introducing a second line of data. Now we have:
πΊπ»π½πΎπΈπ·, π·πΌ πΆπΆπΆ, πΈπΆπΈπΆ-πΆπΏ-πΈπΆ
πΊπ»π½πΎπΈπ·, πΉ πΆπΆπΆ, πΈπΆπΈπΆ-πΆπΏ-πΈπΆ
Confusing, but this happens. Is the timestamp not granular enough and these were actually in succession? Is one a correction of the other? Can customers have different accounts and we are missing the account number?
Even if you can get all that sorted out, we can shake it up further. Put this in a different context:
πΏπ°ππΈπ΄π½π π½ππΌπ±π΄π, ππ°π³πΈπ°ππΈπΎπ½ π³πΎππ΄, ππΈπΌπ΄πππ°πΌπΏ
Now I feel the need to know more. Are these measurements made by different persons and how certain are they? What is the margin of error? If these were in succession, what were their durations? If only one of them is correct, which one is it?
More sources should communicate data as if it was a matter of life and death. This is what Transitional modeling is all about.