Utterance segmentation can be more straightforward in a corpus of spoken data than in a corpus of written data, because much of the utterance segmentation may already have been carried out by the transcriber.
The LCDC has recently been updated to a format in which the transcription file is aligned with the audio, so that utterances are grouped by the transcriber into breath units. The original LCDC transcription format, which was not audio-aligned and which makes up the training corpus for these guidelines, instead relied on the transcriber's own intuitions about punctuation placement to break the running speech into utterances. The utterance segmentation in these guidelines is therefore heavily reliant on the intuitions of the original transcriber.