

Here is a rough diagram of the situation:


The hard part is understanding how it is done, and that is what I shall try to explain here. It is all based on optics and (linear) algebra. Let's assume you stand in front of a window, looking out. If you stand centered in front of the window, looking out through its center, then we can treat the center of your eye (more precisely, the center of the lens in the pupil of your dominant eye) as the origin in 3D coordinates. Using OP's conventions, the $x$ axis increases upward, the $y$ axis to the right, and the $z$ axis outward through the window. Thus, the center of the window is at $(0, 0, d)$, where $d$ is the distance from the eye to the window. If we know the 3D coordinates, in the above coordinate system, of interesting details outside, 3D projection tells us their coordinates on the surface of the window. These coordinates are what OP needs to draw 3D pictures on a 2D surface.
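
To make the setup concrete, here is a minimal Python sketch of that projection. It assumes only what is described above: the eye at the origin, the window in the plane $z = d$, and the details of interest in front of the eye ($z > 0$). By similar triangles, the ray from the eye to $(x, y, z)$ crosses the window at $(x d / z, \; y d / z, \; d)$. The function name `project_to_window` is made up for this illustration and is not part of OP's code.

```python
def project_to_window(point, d):
    """Project a 3D point onto the window plane z = d.

    Assumes the eye is at the origin, x points up, y points right,
    and z points out through the window (the conventions above).
    Returns the (x, y) coordinates on the window surface.
    """
    x, y, z = point
    if z <= 0:
        # A point at or behind the eye has no image on the window.
        raise ValueError("point must be in front of the eye (z > 0)")
    scale = d / z  # similar triangles: window distance over point distance
    return (x * scale, y * scale)


# A detail twice as far away as the window lands at half its x and y.
print(project_to_window((1.0, 2.0, 4.0), 2.0))  # (0.5, 1.0)
```

Note that a point already lying on the window plane ($z = d$) maps to itself, and more distant points are pulled toward the window's center, which is the familiar perspective effect.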
