shooter.logo.3

Sunday,August,29,2004

Bulk importing weblog entries into Expression Engine

A commented version of my Perl script that I used to bulk import 3,000 weblog entries into my Expression Engine weblog/CMS tool.

I’m using a weblog to hold more than 3k photos that I’m offering for sale to participants in a rodeo that I shot. There was no way I was going to manually add thousands of entries. So like any good geek, I scripted it.  I’ve posted my script along with copious comments. Please keep in mind that this script worked for me. I’m sure there is a better way to do this. USE THIS SCRIPT AT YOUR OWN RISK. I’m posting it as an example and tutorial rather than a “run it as is” tool.

insert.pl

insert.pl


   1 #!/usr/local/bin/perl -w
   2 #
   3 #
   4 #	 MODFY AND USE THIS SCRIPT AT YOUR OWN RISK!!!
   5 #
   6 #
   7 #	Author: 	Steve Malik
   8 #			Shooter.net
   9 #
  10 #
  11 #	This script is used for bulk importing weblog entries into Expression Engine 
  12 #	Content Management/Blogging system. How to structure your EE templates to
  13 #	use data that is imported via the script is beyond the scope of the documentation
  14 #	for this script. If you aren't familiar enough with Perl and Mysql and EE to 
  15 #	FULLY understand this script, you are likely taking a very large risk with 
  16 #	your database, and should not proceed without qualified assistance.
  17 #
  18 #	I don't do any error checking or data validation in this script,
  19 #	which is very dangerous.
  20 #
  21 #	This entire script steps through a line at a time from my datafile. For each line it
  22 #	parses the line, caluculates dates, and adds the appropriate data to three tables in
  23 #	the mysql database.
  24 #	
  25 #	I'm sure there is a much better way to do this, but this worked for me...
  26 #
  27 # 	Here is a sample of the data file:
  28 #
  29 #	_A011940.JPG    Default 8/21/04 5:22:42 PM      Bay Area Chapter IGRA 2004 Rodeo        Assorted, Pole Bending
  30 #	_A011941.JPG    Default 8/21/04 5:22:48 PM      Bay Area Chapter IGRA 2004 Rodeo        Assorted, Pole Bending
  31 #	_A011942.JPG    Default 8/21/04 5:22:50 PM      Bay Area Chapter IGRA 2004 Rodeo        Assorted, Pole Bending
  32 #	_A012236.JPG    Default 8/22/04 11:26:49 AM     Bay Area Chapter IGRA 2004 Rodeo        Assorted, Mounted Breakaway Roping
  33 #	_A012551.JPG    Default 8/22/04 1:13:13 PM      Bay Area Chapter IGRA 2004 Rodeo        Assorted, Grand Entrance
  34 #	_A012552.JPG    Default 8/22/04 1:13:21 PM      Bay Area Chapter IGRA 2004 Rodeo        Assorted, Grand Entrance
  35 #	_A010706.JPG    Default 8/21/04 9:11:26 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  36 #	_A010707.JPG    Default 8/21/04 9:11:27 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  37 #	_A010708.JPG    Default 8/21/04 9:34:55 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  38 #	_A010709.JPG    Default 8/21/04 9:35:39 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  39 #	_A010710.JPG    Default 8/21/04 9:36:12 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  40 #	_A010711.JPG    Default 8/21/04 9:36:21 AM      Bay Area Chapter IGRA 2004 Rodeo        Assorted
  41 #	_A010715.JPG    Default 8/21/04 9:46:59 AM      Bay Area Chapter IGRA 2004 Rodeo        Calf Roping On Foot
  42 #
  43 # 	FWIW, I exported this text file from iView MediaPro, my asset management program, which is where
  44 #	I sorted and categorized the photos.
  45 #
  46 #	Note: this is a tab delimited file (even if it doesn't show up that way here), and the last field
  47 #	may contain one or more items separated by commas.
  48 #	
  49 #	I perform no data checking, as I know the data file contains all the required fields or placholders. 
  50 #	Problems with the data file will cause the script to crash, trash your database, and impregnate your
  51 #	dog with the spawn of space aliens. You've been warned.
  52 # 	
  53 #	This script requires a Perl module called sdb.pm
  54 #
  55 
  56 #
  57 #Here's what's in the perl module:
  58 #
  59 
  60 #package sdb;
  61 #
  62 #use strict;
  63 #use DBI;
  64 #
  65 #sub connect {
  66 #my $db_name = "dbname";
  67 #my $host_name = "dbhost";
  68 #my $user_name = "dbuser
  69 #my $password = "dbpasswrd";
  70 #my $dsn = "DBI:mysql:host=$host_name;database=$db_name";
  71 #
  72 #return (DBI->connect ($dsn, $user_name, $password, { PrintError => 1, RaiseError => 0}));
  73 #}
  74 #
  75 #1;      # return true
  76 #
  77 
  78 ## END OF sdb.pm
  79 
  80 
  81 #	Now, we're on to the insert.pl script!
  82 ####################################################################################################
  83 
  84 #
  85 # Call a few pragmas and modules
  86 #
  87 
  88 use strict;
  89 use sdb;
  90 use locale;
  91 use Time::localtime;
  92 use Time::Local;
  93 
  94 #
  95 # Define some variable we'll be using
  96 #
  97 
  98 use vars qw(
  99 @thenames $shit $filename $orientation $date_time @date_time $event @categories $cat
 100 $month $day $year $hour $minutes $seconds $ampm $date $time @fields $entry_id
 101 );
 102 
 103 # 
 104 # Open up our connection to the database (using sdb.pm module)
 105 #
 106 
 107 my $dbh = sdb::connect();
 108 
 109 #
 110 # open up our file for reading
 111 #
 112 
 113 open (INFILE,"<rodeo.2004.txt") or die "can't open text.table.from.iview.txt\n";
 114 
 115 #
 116 # Step through each line in the file
 117 #
 118 
 119 while (<INFILE>){
 120 chomp;
 121 
 122 # debugging print of each line from our input file
 123 #print "$_\n";
 124 
 125 
 126 #
 127 # split it out into field. The fields
 128 # are tab delimited. We know the order of the fields, so we assign them
 129 # names that make sense.
 130 #
 131 
 132 @fields=split("\t",$_);
 133 
 134 #
 135 # grab our first field, it's the filename
 136 #
 137 
 138 $filename="$fields[0]";
 139 
 140 #
 141 # the second field is the orientation of the image
 142 #
 143 
 144 $orientation="$fields[1]";
 145 
 146 
 147 #
 148 # grab the date information which is field 3 (remember, the first field is $fields[0]
 149 #
 150 
 151 $date_time="$fields[2]";
 152 
 153 # split out month day year HH MM SS
 154 ($date,$time,$ampm)=split(' ',$fields[2]);
 155 ($hour,$minutes,$seconds)=split(':',$time);
 156 
 157 # deal with the AM PM issues by converting to 24 hour time
 158 
 159 #
 160 # Deal with PM first
 161 #
 162 
 163 if ($ampm =~ "PM" && $hour < 12){
 164 $hour=($hour+'12');
 165 }
 166 
 167 #
 168 # Then deal with AM
 169 #
 170 
 171 if ($ampm =~ "AM" && $hour == 12){
 172 $hour=($hour-'12');
 173 }
 174 
 175 #
 176 # Next we need to convert our local time from our data file into epoch time
 177 #
 178 
 179 my ($month,$day,$year)=split ('/',$date);
 180 
 181 #
 182 # In timelocal module, months are an array, hence januray is month 0 
 183 # So we subtract one from the month value
 184 #
 185 
 186 $month=$month-1;
 187 
 188 #
 189 # We had a two digit year in our data file, let's make that a four digit year by add a "20" prefix
 190 #
 191 
 192 $year="20".$year;
 193 
 194 # 
 195 # Now we get the epoch time (google it if you don't know what it is)
 196 #
 197 
 198 my $TIME = timelocal($seconds, $minutes, $hour, $day, $month,$year);
 199 
 200 #
 201 # Account for the time zone difference by adding 8 hours. Epoch time is in seconds, hence the math.
 202 #
 203 
 204 $TIME = $TIME+(60*60*8);
 205 
 206 #
 207 # grabbing the event that is stored in field 4 in our data file.
 208 #
 209 
 210 $event="$fields[3]";
 211 
 212 #
 213 # The categories field can have multiple values separated by commas. So we
 214 # split that field out into an array
 215 #
 216 @categories=split(',', "$fields[4]");
 217 
 218 #
 219 # when testing and debugging, I print out all the fields that I've pulled outta the
 220 # data file to make sure there are not mistakes. Uncomment these for testing.
 221 #
 222 
 223 #print "Filename is: $filename\n";
 224 #print "Orientation is: $orientation\n";
 225 #print "Date is: $date\n";
 226 #print "Time is: $hour $minutes $seconds\n";
 227 #print "AMPM is: $ampm\n";
 228 #print "EPOCH TIME IS: $TIME\n";
 229 #print "Event is: $event\n";
 230 
 231 # Insert the entry into the exp_weblog_titles table in my database:
 232 
 233 #
 234 # uncoment this for debugging purposes
 235 #
 236 
 237 #print "insert into exp_weblog_titles (
 238 #weblog_id, author_id, pentry_id, ip_address, title, url_title, status, allow_comments, allow_trackbacks,
 239 #sticky, entry_date, year, month, day, expiration_date, edit_date, recent_comment_date, comment_total,
 240 #trackback_total, sent_trackbacks, recent_trackback_date)
 241 #VALUES 
 242 #( 5, 1, 0, \"64.171.182.90\", \"Image id: $filename\", \"$filename\", \"open\", \"n\", \"n\", \"n\",
 243 #\"$TIME\", \"$year\", \"$month\", \"$day\", \"0\", \"$year$month$day$hour$minutes$seconds\", 0, 0, 0, 0, 0)\n";
 244 
 245 # 
 246 # Here's the meat of the script.
 247 #
 248 # A couple of things to note. I'm hardcoding several values here: weblog_id, author_id, pentry_id, IP address etc.
 249 #
 250 # the entry_id field is auto_increment, so I don't have to provide that value. I just
 251 # let mysql provide it.
 252 
 253 
 254 my $count = $dbh->do ("insert into exp_weblog_titles (
 255 weblog_id, author_id, pentry_id, ip_address, title, url_title, status, allow_comments, allow_trackbacks,
 256 sticky, entry_date, year, month, day, expiration_date, edit_date, recent_comment_date, comment_total,
 257 trackback_total, sent_trackbacks, recent_trackback_date)
 258 
 259 VALUES 
 260 
 261 ( 5, 1, 0, \"64.171.182.90\", \"$filename\", \"$filename\", \"open\", \"n\", \"n\", \"n\",
 262 \"$TIME\", \"$year\", \"$month\", \"$day\", \"0\", \"$year$month$day$hour$minutes$seconds\", 0, 0, 0, 0, 0) ");
 263 if ($count)
 264 {
 265        $count +=0;
 266 
 267        #
 268        # this is kinda cool. mysql_isertid will provide me with the entry_id of the last
 269        # entry that I added (meaning this one) to the database. I need the entry_id for
 270        # populating fields and linking categories to weblog entries.
 271        #
 272 
 273        $entry_id=$dbh->{mysql_insertid};
 274        print "$count rows were updated with an entry of of $entry_id\n";
 275 }
 276 
 277 #
 278 # now let's create the field entries. This is where we populate the fields for 
 279 # the weblog entry. Note that we're not using field names, but rather identifying 
 280 # the fields by the field_id. Also note that we're setting the formatting for 
 281 # field id 2 to be "none"
 282 #
 283 # Note that I'm hard coding in the weblog_id, as all these are weblog_id = 5
 284 #
 285 # Note that I'm using the $entry_id that I grabbed above.
 286 #
 287 
 288 my $count = $dbh->do (" insert into exp_weblog_data (
 289 entry_id, weblog_id,field_id_1,field_id_2,field_ft_2) 
 290 VALUES (
 291 $entry_id,5,\"$filename\",\"$filename\", \"none\"wink ");
 292 
 293 # Now I need to create the categories that I'm using. Some of these categories may already exist, and
 294 # some will have been created by previous entries in our data file. I don't want to create duplicate
 295 # field entries as this will break things in EE. So... I modified the exp_categories table to add
 296 # a unique index on three columns. Then I use an insert ignore to add the new category. If it already
 297 # exists for the category group that I'm using, mysql does not insert it, but rather ignores the command.
 298 # Kinda nifty huh?
 299 #
 300 
 301 #
 302 # Step through each category
 303 #
 304 
 305 foreach $cat (@categories){
 306 
 307 #
 308 # Get rid of leading and trailing spaces
 309 #
 310 	$cat =~ s/^\s+//;
 311 	$cat =~ s/\s+$//;
 312 
 313 #
 314 # Another debugging placeholder that prints the entire line including category
 315 #
 316 
 317 #print "$filename $date_time $orientation $event $cat\n";
 318 
 319 # print each category
 320 
 321 #
 322 # attempt to insert into exp_categories field, in case it isn't already there.
 323 # Note that I've hardcoded in the group_id and parent_id.
 324 #
 325 
 326 my $count = $dbh->do ("insert ignore into exp_categories (group_id, parent_id, cat_name) VALUES (3,0,\"$cat\"wink");
 327 if ($count)
 328 {
 329 $count +=0;
 330 
 331 #
 332 # Another debuggin placeholder
 333 #
 334 
 335 #print "$count categories were inserted with an entry of of $seq\n";
 336 
 337 }
 338 
 339 #
 340 # Whether we just added it or it was already in the exp_categories table, our category should
 341 # now be in the database. Let's find out what the category_id is
 342 #
 343 
 344 my $cat_id = $dbh->selectrow_array ("select cat_id from exp_categories where cat_name like \"$cat\" and group_id = 3");
 345 if ($cat_id) {
 346 
 347 #
 348 # Another debugging placeholder
 349 #
 350 
 351 #print "cat_id is $cat_id\n";
 352 
 353 #
 354 # Now that we've got the category_id and we know the entry_id from above, let's link the new entry to the
 355 # category.
 356 #
 357 
 358 my $count = $dbh->do ("insert ignore into exp_category_posts (entry_id, cat_id) VALUES ($entry_id,$cat_id)");
 359 }else{
 360 print "problem with inserting category: $cat $cat_id for entry_id $entry_id\n";
 361 }
 362 
 363 #print "$cat\n";
 364 
 365 }
 366 #print "\n";
 367 }
 368 close INFILE;
 369 
 370 #
 371 # Now we're done
 372 #
 373 
 374 #
 375 # These were a couple of little tests to try to resolve a problem that I had with the epoch
 376 # Timing of entries. I'm just leaving them in here for kicks.
 377 #
 378 
 379 #my ($seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst) = localtime('1092548948');
 380 #print "$seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst\n";
 381 
 382 #my $tm = localtime('1093339068');
 383 #printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n",
 384 #$tm->hour, $tm->min, $tm->sec, $tm->year+1900,
 385 #$tm->mon+1, $tm->mday);
 386 
 387 #my $tm = localtime('1093136085');
 388 #printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n",
 389 #$tm->hour, $tm->min, $tm->sec, $tm->year+1900,
 390 #$tm->mon+1, $tm->mday);

Feedback (0)
Posted by stevem on 08/29 at 11:12,PM • Category: Geek Speak
Permalink

Share this entry with a friend

Page 1 of 1 pages

Next entry: Conservatory of Flowers in San Francisco

Previous entry: Sue Goes Bull Riding

<< Previous Page

Shooter.net Home