Break a CSExport XML file into multiple smaller files

I’ve had various stabs over the years at tools that will dump out the whole connector space, or just the pending exports, and convert it into a CSV file for easy analysis. They often fall down on two things: the XML file produced by CSExport can be very large (way too big for Get-Content), and the whole file is all on one line. I’ve now taken the approach of breaking the XML out into multiple files which I can then parse easily.

Step one is to tackle the single-line problem. Because the XML file produced by CSExport is all on a single line, I can’t use a StreamReader to read it line by line. I looked into various other reading options (chunks and characters), but eventually decided to use an XSLT stylesheet to insert a line break after each <cs-object> node.

The stylesheet looks like this (saved as CSExportSplitLines.xslt):

<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//cs-object">
    <xsl:copy-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Next I use PowerShell to create a copy of the CSExport file with the line breaks added:

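# Relative paths are resolved by .NET against the process working directory,
# which can differ from the PowerShell location, so full paths are safer.
$XSLTPath = "CSExportSplitLines.xslt"
$SourceFile = "AD.XML"
$TargetFile = "AD_SplitLines.XML"

# XslCompiledTransform replaces the obsolete XslTransform class
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$xslt.Load($XSLTPath)
$xslt.Transform($SourceFile, $TargetFile)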
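
To check what the stylesheet does before pointing it at a large export, it can be applied to a tiny in-memory sample. This is only a sketch: the two-object sample XML and the variable names ($sampleXml, $stylesheet, $output, $objectLines) are made up for illustration, and it uses XslCompiledTransform (the newer replacement for XslTransform) with string readers and writers rather than files.

```powershell
# Hypothetical sample: two objects on a single line, like CSExport output
$sampleXml = '<cs-objects><cs-object cs-dn="CN=A"><x/></cs-object><cs-object cs-dn="CN=B"><x/></cs-object></cs-objects>'

# Same stylesheet as CSExportSplitLines.xslt, held in a here-string
$stylesheet = @'
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="//cs-object">
    <xsl:copy-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
'@

# Load the stylesheet from the string instead of a file
$xslt = New-Object System.Xml.Xsl.XslCompiledTransform
$styleReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $stylesheet))
$xslt.Load($styleReader)

# Transform the sample into a StringWriter so the result can be inspected
$inputReader = [System.Xml.XmlReader]::Create((New-Object System.IO.StringReader $sampleXml))
$output = New-Object System.IO.StringWriter
$xslt.Transform($inputReader, $null, $output)

# Keep only the lines that carry a cs-object start tag
$objectLines = @($output.ToString() -split "`n" | Where-Object { $_ -match '<cs-object' })
$objectLines
```

Each entry in $objectLines should hold one complete cs-object element. The first line may also carry an XML declaration, which is worth bearing in mind when parsing the first split file.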

Then it’s a simple matter to read the new XML file one line at a time, writing out a temporary file for each line (note that the snippet below uses $TempFolder, which must be defined beforehand):

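$reader = [System.IO.File]::OpenText($TargetFile)
$i = 0
# Test EndOfStream before reading so an empty file produces no output
while (-not $reader.EndOfStream) {
    $line = $reader.ReadLine()
    # Zero-padded names keep the files in their original order when sorted
    $line | Out-File (Join-Path $TempFolder ($i.ToString().PadLeft(10,'0') + ".XML"))
    $i += 1
}

$reader.Close()
Remove-Item $TargetFile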

Depending on the size of your CSExport file this may produce a lot of files! But they’re all small and easy enough to loop through and load with Get-Content.
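
As a rough illustration of that last step, the sketch below loops through the split files and pulls a couple of attributes from each cs-object so they can be sent to a CSV. The function name Get-CsObjectSummary is hypothetical, and the attribute names cs-dn and object-type are an assumption about the CSExport format, so adjust them to match what your files actually contain.

```powershell
# Hypothetical helper: summarise each split file as one row
function Get-CsObjectSummary {
    param([Parameter(Mandatory)][string]$Folder)

    Get-ChildItem -Path $Folder -Filter '*.XML' | Sort-Object Name | ForEach-Object {
        # Each file is small, so reading it whole is fine
        $text = Get-Content -Path $_.FullName -Raw
        if ([string]::IsNullOrWhiteSpace($text)) { return }   # skip blank files

        $doc  = [xml]$text
        $node = $doc.SelectSingleNode('//cs-object')
        if ($null -eq $node) { return }                       # no cs-object in this file

        # Attribute names are assumed; GetAttribute returns '' when one is missing
        [pscustomobject]@{
            File       = $_.Name
            DN         = $node.GetAttribute('cs-dn')
            ObjectType = $node.GetAttribute('object-type')
        }
    }
}
```

It could then be used along the lines of Get-CsObjectSummary -Folder $TempFolder | Export-Csv -Path 'CSObjects.csv' -NoTypeInformation, where CSObjects.csv is just a placeholder name.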
