{"id":2867,"date":"2015-12-07T03:45:19","date_gmt":"2015-12-07T03:45:19","guid":{"rendered":"https:\/\/www.wapshere.com\/missmiis\/?p=2867"},"modified":"2015-12-07T03:45:19","modified_gmt":"2015-12-07T03:45:19","slug":"break-a-csexport-xml-file-into-multiple-smaller-files","status":"publish","type":"post","link":"https:\/\/www.wapshere.com\/missmiis\/break-a-csexport-xml-file-into-multiple-smaller-files","title":{"rendered":"Break a CSExport XML file into multiple smaller files"},"content":{"rendered":"<p>I&#8217;ve had various stabs over the years at tools that will dump out the whole connector space, or just the pending exports, and convert it into a CSV file for easy analysis. They often fall down on two things: the XML file produced by CSExport can be very large (way too big for Get-Content), and the whole file is all on one line. I&#8217;ve now taken the approach of breaking the XML out into multiple files which I can then parse easily.<\/p>\n<p><!--more--><\/p>\n<p>Step one is to tackle the single line problem. Because the XML file produced by CSExport is all on a single line I can&#8217;t use a StreamReader to read it line by line. I looked into various other reading options (chunks and characters), but eventually decided to use an XSLT stylesheet to insert carriage retuns between each &lt;cs-object&gt; node.<\/p>\n<p>The stylesheet looks like this (saved as CSExportSplitLines.xslt):<\/p>\n<blockquote>\n<pre>&lt;?xml version=\"1.0\"?&gt;\r\n\r\n&lt;xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http:\/\/www.w3.org\/1999\/XSL\/Transform\"&gt;\r\n  &lt;xsl:template match=\"\/\/cs-object\"&gt;\r\n    &lt;xsl:copy-of select=\".\"\/&gt;\r\n\t&lt;xsl:text&gt;&amp;#10;&lt;\/xsl:text&gt;\r\n  &lt;\/xsl:template&gt;\r\n&lt;\/xsl:stylesheet&gt;\r\n<\/pre>\n<\/blockquote>\n<p>Next I use PowerShell to create a copy of the CSExport file with the carriage returns added:<\/p>\n<blockquote><p>$XSLTPath = &#8220;CSExportSplitLines.xslt&#8221;<br \/>\n$SourceFile = &#8220;AD.XML&#8221;<br \/>\n$TargetFile = &#8220;AD_SplitLines.XML&#8221;<\/p>\n<p>$xslt = new-object system.xml.xsl.XslTransform<br \/>\n$xslt.load($XSLTPath)<br \/>\n$xslt.Transform($SourceFile,$TargetFile)<\/p><\/blockquote>\n<p>Then it&#8217;s a simple matter to read the new XML file one line at a time, writing out a temporary file for each one (note the snippet below uses $TempFolder which must be defined):<\/p>\n<blockquote>\n<pre>$reader = [System.IO.File]::OpenText($TargetFile)\r\n$i = 0\r\ndo {\r\n    $line = $reader.ReadLine()\r\n    $line | out-file ($TempFolder + \"\\\" + $i.ToString().PadLeft(10,'0') + \".XML\")\r\n    $i += 1\r\n} until ($reader.EndOfStream)\r\n\r\n$reader.Close()\r\nRemove-Item $TargetFile<\/pre>\n<\/blockquote>\n<p>Depending on the size of your CSExport file this may produce a lot of files! But they&#8217;re all small and easy enough to loop through and load with Get-Content.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve had various stabs over the years at tools that will dump out the whole connector space, or just the pending exports, and convert it into a CSV file for easy analysis. They often fall down on two things: the XML file produced by CSExport can be very large (way too big for Get-Content), and&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":"","jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":[]},"categories":[42,60,58,23],"tags":[],"class_list":["post-2867","post","type-post","status-publish","format-standard","hentry","category-fim-2010","category-fim-2010-r2","category-fim-sync-service","category-powershell"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/pkp1o-Kf","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/posts\/2867","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/comments?post=2867"}],"version-history":[{"count":3,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/posts\/2867\/revisions"}],"predecessor-version":[{"id":2870,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/posts\/2867\/revisions\/2870"}],"wp:attachment":[{"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/media?parent=2867"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/categories?post=2867"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.wapshere.com\/missmiis\/wp-json\/wp\/v2\/tags?post=2867"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}