Sunday,August,29,2004
Bulk importing weblog entries into Expression Engine
A commented version of my Perl script that I used to bulk import 3,000 weblog entries into my Expression Engine weblog/CMS tool.
I’m using a weblog to hold more than 3k photos that I’m offering for sale to participants in a rodeo that I shot. There was no way I was going to manually add thousands of entries. So like any good geek, I scripted it. I’ve posted my script along with copious comments. Please keep in mind that this script worked for me. I’m sure there is a better way to do this. USE THIS SCRIPT AT YOUR OWN RISK. I’m posting it as an example and tutorial rather than a “run it as is” tool.
insert.pl
1 #!/usr/local/bin/perl -w 2 # 3 # 4 # MODFY AND USE THIS SCRIPT AT YOUR OWN RISK!!! 5 # 6 # 7 # Author: Steve Malik 8 # Shooter.net 9 # 10 # 11 # This script is used for bulk importing weblog entries into Expression Engine 12 # Content Management/Blogging system. How to structure your EE templates to 13 # use data that is imported via the script is beyond the scope of the documentation 14 # for this script. If you aren't familiar enough with Perl and Mysql and EE to 15 # FULLY understand this script, you are likely taking a very large risk with 16 # your database, and should not proceed without qualified assistance. 17 # 18 # I don't do any error checking or data validation in this script, 19 # which is very dangerous. 20 # 21 # This entire script steps through a line at a time from my datafile. For each line it 22 # parses the line, caluculates dates, and adds the appropriate data to three tables in 23 # the mysql database. 24 # 25 # I'm sure there is a much better way to do this, but this worked for me... 26 # 27 # Here is a sample of the data file: 28 # 29 # _A011940.JPG Default 8/21/04 5:22:42 PM Bay Area Chapter IGRA 2004 Rodeo Assorted, Pole Bending 30 # _A011941.JPG Default 8/21/04 5:22:48 PM Bay Area Chapter IGRA 2004 Rodeo Assorted, Pole Bending 31 # _A011942.JPG Default 8/21/04 5:22:50 PM Bay Area Chapter IGRA 2004 Rodeo Assorted, Pole Bending 32 # _A012236.JPG Default 8/22/04 11:26:49 AM Bay Area Chapter IGRA 2004 Rodeo Assorted, Mounted Breakaway Roping 33 # _A012551.JPG Default 8/22/04 1:13:13 PM Bay Area Chapter IGRA 2004 Rodeo Assorted, Grand Entrance 34 # _A012552.JPG Default 8/22/04 1:13:21 PM Bay Area Chapter IGRA 2004 Rodeo Assorted, Grand Entrance 35 # _A010706.JPG Default 8/21/04 9:11:26 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 36 # _A010707.JPG Default 8/21/04 9:11:27 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 37 # _A010708.JPG Default 8/21/04 9:34:55 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 38 # _A010709.JPG Default 8/21/04 9:35:39 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 39 # _A010710.JPG Default 8/21/04 9:36:12 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 40 # _A010711.JPG Default 8/21/04 9:36:21 AM Bay Area Chapter IGRA 2004 Rodeo Assorted 41 # _A010715.JPG Default 8/21/04 9:46:59 AM Bay Area Chapter IGRA 2004 Rodeo Calf Roping On Foot 42 # 43 # FWIW, I exported this text file from iView MediaPro, my asset management program, which is where 44 # I sorted and categorized the photos. 45 # 46 # Note: this is a tab delimited file (even if it doesn't show up that way here), and the last field 47 # may contain one or more items separated by commas. 48 # 49 # I perform no data checking, as I know the data file contains all the required fields or placholders. 50 # Problems with the data file will cause the script to crash, trash your database, and impregnate your 51 # dog with the spawn of space aliens. You've been warned. 52 # 53 # This script requires a Perl module called sdb.pm 54 # 55 56 # 57 #Here's what's in the perl module: 58 # 59 60 #package sdb; 61 # 62 #use strict; 63 #use DBI; 64 # 65 #sub connect { 66 #my $db_name = "dbname"; 67 #my $host_name = "dbhost"; 68 #my $user_name = "dbuser 69 #my $password = "dbpasswrd"; 70 #my $dsn = "DBI:mysql:host=$host_name;database=$db_name"; 71 # 72 #return (DBI->connect ($dsn, $user_name, $password, { PrintError => 1, RaiseError => 0})); 73 #} 74 # 75 #1; # return true 76 # 77 78 ## END OF sdb.pm 79 80 81 # Now, we're on to the insert.pl script! 82 #################################################################################################### 83 84 # 85 # Call a few pragmas and modules 86 # 87 88 use strict; 89 use sdb; 90 use locale; 91 use Time::localtime; 92 use Time::Local; 93 94 # 95 # Define some variable we'll be using 96 # 97 98 use vars qw( 99 @thenames $shit $filename $orientation $date_time @date_time $event @categories $cat 100 $month $day $year $hour $minutes $seconds $ampm $date $time @fields $entry_id 101 ); 102 103 # 104 # Open up our connection to the database (using sdb.pm module) 105 # 106 107 my $dbh = sdb::connect(); 108 109 # 110 # open up our file for reading 111 # 112 113 open (INFILE,"<rodeo.2004.txt") or die "can't open text.table.from.iview.txt\n"; 114 115 # 116 # Step through each line in the file 117 # 118 119 while (<INFILE>){ 120 chomp; 121 122 # debugging print of each line from our input file 123 #print "$_\n"; 124 125 126 # 127 # split it out into field. The fields 128 # are tab delimited. We know the order of the fields, so we assign them 129 # names that make sense. 130 # 131 132 @fields=split("\t",$_); 133 134 # 135 # grab our first field, it's the filename 136 # 137 138 $filename="$fields[0]"; 139 140 # 141 # the second field is the orientation of the image 142 # 143 144 $orientation="$fields[1]"; 145 146 147 # 148 # grab the date information which is field 3 (remember, the first field is $fields[0] 149 # 150 151 $date_time="$fields[2]"; 152 153 # split out month day year HH MM SS 154 ($date,$time,$ampm)=split(' ',$fields[2]); 155 ($hour,$minutes,$seconds)=split(':',$time); 156 157 # deal with the AM PM issues by converting to 24 hour time 158 159 # 160 # Deal with PM first 161 # 162 163 if ($ampm =~ "PM" && $hour < 12){ 164 $hour=($hour+'12'); 165 } 166 167 # 168 # Then deal with AM 169 # 170 171 if ($ampm =~ "AM" && $hour == 12){ 172 $hour=($hour-'12'); 173 } 174 175 # 176 # Next we need to convert our local time from our data file into epoch time 177 # 178 179 my ($month,$day,$year)=split ('/',$date); 180 181 # 182 # In timelocal module, months are an array, hence januray is month 0 183 # So we subtract one from the month value 184 # 185 186 $month=$month-1; 187 188 # 189 # We had a two digit year in our data file, let's make that a four digit year by add a "20" prefix 190 # 191 192 $year="20".$year; 193 194 # 195 # Now we get the epoch time (google it if you don't know what it is) 196 # 197 198 my $TIME = timelocal($seconds, $minutes, $hour, $day, $month,$year); 199 200 # 201 # Account for the time zone difference by adding 8 hours. Epoch time is in seconds, hence the math. 202 # 203 204 $TIME = $TIME+(60*60*8); 205 206 # 207 # grabbing the event that is stored in field 4 in our data file. 208 # 209 210 $event="$fields[3]"; 211 212 # 213 # The categories field can have multiple values separated by commas. So we 214 # split that field out into an array 215 # 216 @categories=split(',', "$fields[4]"); 217 218 # 219 # when testing and debugging, I print out all the fields that I've pulled outta the 220 # data file to make sure there are not mistakes. Uncomment these for testing. 221 # 222 223 #print "Filename is: $filename\n"; 224 #print "Orientation is: $orientation\n"; 225 #print "Date is: $date\n"; 226 #print "Time is: $hour $minutes $seconds\n"; 227 #print "AMPM is: $ampm\n"; 228 #print "EPOCH TIME IS: $TIME\n"; 229 #print "Event is: $event\n"; 230 231 # Insert the entry into the exp_weblog_titles table in my database: 232 233 # 234 # uncoment this for debugging purposes 235 # 236 237 #print "insert into exp_weblog_titles ( 238 #weblog_id, author_id, pentry_id, ip_address, title, url_title, status, allow_comments, allow_trackbacks, 239 #sticky, entry_date, year, month, day, expiration_date, edit_date, recent_comment_date, comment_total, 240 #trackback_total, sent_trackbacks, recent_trackback_date) 241 #VALUES 242 #( 5, 1, 0, \"64.171.182.90\", \"Image id: $filename\", \"$filename\", \"open\", \"n\", \"n\", \"n\", 243 #\"$TIME\", \"$year\", \"$month\", \"$day\", \"0\", \"$year$month$day$hour$minutes$seconds\", 0, 0, 0, 0, 0)\n"; 244 245 # 246 # Here's the meat of the script. 247 # 248 # A couple of things to note. I'm hardcoding several values here: weblog_id, author_id, pentry_id, IP address etc. 249 # 250 # the entry_id field is auto_increment, so I don't have to provide that value. I just 251 # let mysql provide it. 252 253 254 my $count = $dbh->do ("insert into exp_weblog_titles ( 255 weblog_id, author_id, pentry_id, ip_address, title, url_title, status, allow_comments, allow_trackbacks, 256 sticky, entry_date, year, month, day, expiration_date, edit_date, recent_comment_date, comment_total, 257 trackback_total, sent_trackbacks, recent_trackback_date) 258 259 VALUES 260 261 ( 5, 1, 0, \"64.171.182.90\", \"$filename\", \"$filename\", \"open\", \"n\", \"n\", \"n\", 262 \"$TIME\", \"$year\", \"$month\", \"$day\", \"0\", \"$year$month$day$hour$minutes$seconds\", 0, 0, 0, 0, 0) "); 263 if ($count) 264 { 265 $count +=0; 266 267 # 268 # this is kinda cool. mysql_isertid will provide me with the entry_id of the last 269 # entry that I added (meaning this one) to the database. I need the entry_id for 270 # populating fields and linking categories to weblog entries. 271 # 272 273 $entry_id=$dbh->{mysql_insertid}; 274 print "$count rows were updated with an entry of of $entry_id\n"; 275 } 276 277 # 278 # now let's create the field entries. This is where we populate the fields for 279 # the weblog entry. Note that we're not using field names, but rather identifying 280 # the fields by the field_id. Also note that we're setting the formatting for 281 # field id 2 to be "none" 282 # 283 # Note that I'm hard coding in the weblog_id, as all these are weblog_id = 5 284 # 285 # Note that I'm using the $entry_id that I grabbed above. 286 # 287 288 my $count = $dbh->do (" insert into exp_weblog_data ( 289 entry_id, weblog_id,field_id_1,field_id_2,field_ft_2) 290 VALUES ( 291 $entry_id,5,\"$filename\",\"$filename\", \"none\""); 292 293 # Now I need to create the categories that I'm using. Some of these categories may already exist, and 294 # some will have been created by previous entries in our data file. I don't want to create duplicate 295 # field entries as this will break things in EE. So... I modified the exp_categories table to add 296 # a unique index on three columns. Then I use an insert ignore to add the new category. If it already 297 # exists for the category group that I'm using, mysql does not insert it, but rather ignores the command. 298 # Kinda nifty huh? 299 # 300 301 # 302 # Step through each category 303 # 304 305 foreach $cat (@categories){ 306 307 # 308 # Get rid of leading and trailing spaces 309 # 310 $cat =~ s/^\s+//; 311 $cat =~ s/\s+$//; 312 313 # 314 # Another debugging placeholder that prints the entire line including category 315 # 316 317 #print "$filename $date_time $orientation $event $cat\n"; 318 319 # print each category 320 321 # 322 # attempt to insert into exp_categories field, in case it isn't already there. 323 # Note that I've hardcoded in the group_id and parent_id. 324 # 325 326 my $count = $dbh->do ("insert ignore into exp_categories (group_id, parent_id, cat_name) VALUES (3,0,\"$cat\"
"); 327 if ($count) 328 { 329 $count +=0; 330 331 # 332 # Another debuggin placeholder 333 # 334 335 #print "$count categories were inserted with an entry of of $seq\n"; 336 337 } 338 339 # 340 # Whether we just added it or it was already in the exp_categories table, our category should 341 # now be in the database. Let's find out what the category_id is 342 # 343 344 my $cat_id = $dbh->selectrow_array ("select cat_id from exp_categories where cat_name like \"$cat\" and group_id = 3"); 345 if ($cat_id) { 346 347 # 348 # Another debugging placeholder 349 # 350 351 #print "cat_id is $cat_id\n"; 352 353 # 354 # Now that we've got the category_id and we know the entry_id from above, let's link the new entry to the 355 # category. 356 # 357 358 my $count = $dbh->do ("insert ignore into exp_category_posts (entry_id, cat_id) VALUES ($entry_id,$cat_id)"); 359 }else{ 360 print "problem with inserting category: $cat $cat_id for entry_id $entry_id\n"; 361 } 362 363 #print "$cat\n"; 364 365 } 366 #print "\n"; 367 } 368 close INFILE; 369 370 # 371 # Now we're done 372 # 373 374 # 375 # These were a couple of little tests to try to resolve a problem that I had with the epoch 376 # Timing of entries. I'm just leaving them in here for kicks. 377 # 378 379 #my ($seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst) = localtime('1092548948'); 380 #print "$seconds, $minutes, $hours, $day_of_month, $month, $year, $wday, $yday, $isdst\n"; 381 382 #my $tm = localtime('1093339068'); 383 #printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n", 384 #$tm->hour, $tm->min, $tm->sec, $tm->year+1900, 385 #$tm->mon+1, $tm->mday); 386 387 #my $tm = localtime('1093136085'); 388 #printf("Dateline: %02d:%02d:%02d-%04d/%02d/%02d\n", 389 #$tm->hour, $tm->min, $tm->sec, $tm->year+1900, 390 #$tm->mon+1, $tm->mday);
Feedback (0)
Posted by stevem on 08/29 at 11:12,PM
•
Category: Geek Speak
Permalink
Next entry: Conservatory of Flowers in San Francisco
Previous entry: Sue Goes Bull Riding