powershell: Retrieving data from Excel

Note June 2011: This post gets a lot of hits but is very old now. I haven’t revisited the powershell-Excel story since writing this post and it may well be a lot easier now.

I would love nothing better than to do one ILM project after another, but it doesn’t always pan out like that, so now I’m back to the bread-and-butter work of server installations, email migrations and security audits. Still, it’s giving me an opportunity to get my powershell skills up to date.

This post, after some blurb, includes a script I wrote to extract data from an Excel spreadsheet. At this point I don’t think I’ll actually use it, but it was an interesting exercise.

Like most people, I find it frustrating learning new methods when the old way would be much quicker. I’ve been sticking firmly to vbscript, but Exchange 2007 has forced me into this alleged shell of power.

Some of my frustration has, I think, been perfectly valid. Just try setting file system ACLs. Set-acl just does not do enough – for instance you can’t block inheritance. You can call cacls from powershell, but just try using the “<” to pipe in the “y” character and suppress the prompting – powershell politely informs you that it doesn’t support “<” redirection. Big raspberry.

Other things, however, it does do very well indeed. I love the handling of CSV files. This command

import-csv file.csv | foreach { ... }

is both elegant and really, really useful.

But on to the subject of this post. Similar to a vbscript I wrote for another customer, I was asked to get some info into AD based on an excel spreadsheet. Time to figure out how to do this in powershell.

First off, I ran into an irritating bug based on the culture settings. If you are using english Excel you have to set your culture to English(US) to be able to interact with Excel from powershell. Another big raspberry to Microsoft for that one!

Next I struggled with the usual lack of documentation on the COM object properties and methods, and the fact that they don’t seem to work the same in powershell as vbscript. For instance in vbscript I could retrieve the top-left cell value by doing this:

objExcel.Cells(1,1).Value

but in powershell I was told that method didn’t exist, and the only way I could get at the data was by instatiating a worksheet object and doing this:

$objSheet.Cells.Item(1,1).Text

Why the difference? I really don’t know.

The script below was more an exercise that anything else. What with the culture bug and some other considerations I think I’ll end up using a CSV approach to the actual task, but I think this is worth posting just to show how it’s done.

# import_excel.ps1
# Retrieves data from a mult-worksheet Excel file
# and writes to the console in csv format.
$strExcelFile = "MySpreadsheet.xls"
$objExcel = New-Object -comobject Excel.Application
$objWorkbook = $objExcel.Workbooks.Open($strExcelFile)
write-host "Numer of worksheets: " $objWorkbook.Sheets.Count
$row = 1
$col = 1
$sheet = 1
while ($sheet -le $objWorkbook.Sheets.Count)
{
  $objSheet = $objWorkbook.Sheets.Item($sheet)
  write-Host "Worksheet: " $objSheet.Name
  #Count number of columns in row 1 until first blank
  $numCols = 0
  while ($objSheet.Cells.Item(1,$col).Text -ne "")
  { $col += 1 ; $numCols += 1 }
  $col = 1
  #Iterate through rows, stopping at the first blank in col 1
  while ($objSheet.Cells.Item($row,1).Text -ne "")
  {
    $strLine = ""
    while ($col -le $numCols)
      {
         if ($col -eq 1) {$strLine = $objSheet.Cells.Item($row,$col).Text}
         else {$strLine = $strLine + "," + $objSheet.Cells.Item($row,$col).Text}
         $col += 1
      }
    write-host $strLine
    $row += 1
    $col = 1
  }

  $sheet += 1
  $row = 1
  $col = 1
  write-host
}