Rescuing Panoramio Memories: How to Restore Panoramio Photos

Jack Lee (JackOnTheRoad)
Dec 3, 2023 · 11 min read

The recovery process crawls data from the Web Archive (Wayback Machine) and converts it to KML. The Python scripts are on GitHub.

First, let me show you the result of the recovery.

The final result is a KML file containing almost all photo locations from a specific user.

photo_list_u2304697_v2_complete.kml

The photo data I’m using for the demo is from Valentinas Kabasinskas, a global bike-packing traveler. His home pages are on Panoramio (via the Web Archive) and on Google Maps.

One of his 10,000 photos was taken at a lakeside in my hometown in Hebei, China. That is how I found him by chance, and his adventure is amazing.

Next, I will show you the process of recovery.

I’ve uploaded the Python scripts to GitHub.

  1. Retrieve the Panoramio User ID.
  2. Get the URLs of all the pages for the photo list archived on Web Archive for the User.
  3. Crawl the photo list on Web Archive to get the photo detail page URL for every photo.
  4. Get the Geo-Information, Big picture URL, Taken Date, etc for every photo by crawling every detail page URL.
  5. Convert the CSV file with all photo data of a specific user to KML.
  6. Download all photos in the CSV file. (Optional)

Here are the details of each step.

1. Retrieve the Panoramio User ID.

Panoramio was shut down in November 2016. If a Panoramio user was linked to a Google account, all of the user’s photos were transferred to Google Maps. However, the geo-information of the transferred photos is missing on Google Maps, yet we can still see them in the correct locations on Google Earth. So we can easily get the Panoramio User ID.

  • Copy Image Address
  • Open the address in a browser and press Ctrl + S. If the photo is from Panoramio, the default file name for saving looks like panoramio-18561646.jpg. The number is the Photo ID on Panoramio.
  • Hover the cursor over the username, and we can see the home page link for the user.
  • The User ID is in the red box at the bottom of the screen.
  • Copy the link out and extract the User ID.
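For illustration, the two IDs can also be pulled out with a couple of lines of Python instead of by eye (the example strings are the ones from the steps above):

```python
import re

# Example values copied out of the browser
image_filename = "panoramio-18561646.jpg"                  # default save name of the image
user_home_link = "http://www.panoramio.com/user/2304697"   # link behind the username

photo_id = re.search(r"panoramio-(\d+)", image_filename).group(1)
user_id = re.search(r"/user/(\d+)", user_home_link).group(1)

print("Photo ID:", photo_id)  # 18561646
print("User ID:", user_id)    # 2304697
```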

So far, we can get the User ID when the account is linked with a Google account.

If your Panoramio account wasn’t linked with Google, all of your photos were removed permanently, so we cannot see them on Google Earth. But we can still find them on the Web Archive, as long as you clearly remember some locations where you uploaded photos, and other users whose accounts are linked with Google uploaded photos at or near those locations. Their photos show up on Google Earth, and from there we can use Panoramio’s “photos nearby” function.

For example, before Panoramio was closed I had seen a photo taken by a kayaker of an excavator half sunk in water, and I remember it very clearly. Recently I had a kayaking trip on the same lake and wanted to see the picture again, but I couldn’t find it. So I googled for a while and found a forum post in which an elderly man asked how to retrieve his 5,000+ archaeology photos on Panoramio from the Web Archive without clicking through them one by one, more than 5,000 times. From this I learned that the Web Archive backed up all the photos on Panoramio. The man probably didn’t know how to code; after this blog, I’ll give him a reply.

If your Panoramio account wasn’t linked to Google:

  • Randomly choose a photo near the approximate location you remember on Google Earth. (In the screenshot, the red dots are points recovered from the removed user.) I only remembered the approximate location where I saw the photo of the excavator sinking in water.
  • Use the procedure mentioned above to open the detail page of that photo, then click the nearby photos in the red box. You may need to try many times to find your target user or photo.

2. Get the URLs of all the pages for the photo list archived on Web Archive for the User.

  • Below is the URL for retrieving all the photo-list page URLs of a specific user. Open it in a browser.
  • Replace the number “2304697” with the User ID you want.

https://web.archive.org/cdx/search/cdx?url=www.panoramio.com/user/2304697?with_photo_id=*&output=txt&limit=999999

  • Press Ctrl + S and save the page as a txt file. Check the number of lines in the file and compare it with the number of photos on the user’s home page.

As we can see, the two numbers are not very different. The number of cached photos on the Web Archive is 10439. The number of photos shown on the user’s photo list page is 10345.

Next, we will remove the duplicates in the txt file.

Convert the txt file to a structured CSV file.

generate_photo_ids_csv.py

  • (Optional) Use a code editor such as VS Code.
  • Change the input and output file names.
  • Keep the output file name in that format.
  • Replace the number in the file name “photo_ids_u2304697.csv” with the User ID you want.
  • Press the run button in the upper-right corner of VS Code.
  • Check the number of rows in the CSV file.

As we can see here, the number of rows in the CSV file without duplicates is 10288. The first row is the column header, so excluding it the count is 10287. The photo count on the webpage is 10345. The two numbers are very close, so we don’t need to worry about the missing ones.
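Stepping back to the txt-to-CSV conversion: below is a simplified sketch of how it can be done. It assumes the CDX output uses the default space-separated columns (urlkey, timestamp, original URL, ...), and the photo_id column name is my choice for illustration; the actual generate_photo_ids_csv.py on GitHub is the authoritative version.

```python
import csv
import re

input_txt = "cdx_2304697_with_photo_id.txt"   # CDX listing saved from the browser
output_csv = "photo_ids_u2304697.csv"         # keep this naming format

photo_ids = set()                             # a set removes the duplicates
with open(input_txt, encoding="utf-8") as f:
    for line in f:
        fields = line.split()
        if len(fields) < 3:
            continue
        original_url = fields[2]              # third CDX field is the original URL
        match = re.search(r"with_photo_id=(\d+)", original_url)
        if match:
            photo_ids.add(match.group(1))

with open(output_csv, "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["photo_id"])             # header row
    for pid in sorted(photo_ids, key=int):
        writer.writerow([pid])

print(f"{len(photo_ids)} unique photo_ids written to {output_csv}")
```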

3. Crawl the photo list on Web Archive to get the photo detail page URL for every photo.

About 432 requests are enough to get the detail page URLs for all of the photos.

Every URL in the format “https://www.panoramio.com/user/2304697?with_photo_id=xxxxxx” is not a photo detail page but a photo list page.

  • From a photo list page, we can get the photo detail page URLs for all the photos on that page.
  • The geo-information is stored on the photo detail page. That is why we need to retrieve the photos’ detail page URLs.

The Python script does not make a separate network request for each of the 10,000+ photo_ids.

  • The script keeps a done list of photo_ids whose detail page URL has already been retrieved. At first, the done list is empty.
  • The script reads the CSV file with all photo_ids row by row and checks whether each photo_id is in the done list.
  • If the photo_id is in the done list, the script makes no network request and simply skips it.
  • If the photo_id is not in the done list, the script makes one network request, gets the detail page URL for this photo and for all the other photos on the same list page, and puts all of those photo_ids into the done list.
  • As a result, the script makes only about as many network requests as there are list pages. (A minimal sketch of this logic follows.)
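Here is that done-list logic in miniature, with hypothetical helper functions standing in for the real request and parsing code in panoramio_photo_list_processing_v2.py:

```python
import csv

def crawl_photo_list(ids_csv, fetch_list_page, parse_detail_urls):
    """Skeleton of the skip-if-done crawl.

    fetch_list_page(photo_id)  -> HTML of the archived list page (one network request)
    parse_detail_urls(html)    -> {photo_id: detail_page_url} for every photo on that page
    Both helpers are hypothetical placeholders, not the real script's functions.
    """
    done = {}                                    # photo_id -> detail page URL
    with open(ids_csv, encoding="utf-8") as f:
        for row in csv.DictReader(f):
            photo_id = row["photo_id"]
            if photo_id in done:
                continue                         # already covered by an earlier list page
            html = fetch_list_page(photo_id)     # one request per *page*, not per photo
            done.update(parse_detail_urls(html))
    return done
```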

panoramio_photo_list_processing_v2.py

  • Change the input CSV file name csv_file=photo_ids_u2304697.csv to your own.
  • Keep the input CSV file name in the format photo_ids_u2304697.csv; just replace the number “2304697” with the User ID you want.
  • The input file is photo_ids_u2304697.csv.
  • The output file is photo_list_u2304697_v2.csv, and its name is generated automatically.
  • The Web Archive has a request rate limit of about 15 requests per minute.
  • You may want to add a proxy server so the process runs without interruption.
  • You may need to increase the request interval if it keeps crashing or timing out (see the retry sketch after this list).
  • I use two proxy servers: a free Cloudflare VPN and a free Oracle Cloud VPS. The Cloudflare VPN times out about once every 40 requests; the Oracle VPS crashes about once every 20 requests.
  • The script can be restarted from the point of interruption.
  • It will automatically retry if a timeout or crash happens.
  • Run it. It’s a time-consuming process: it took me about 20 minutes to make the roughly 400 requests.
  • When it’s finished, check the number of rows in the output CSV file.
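As noted in the list above, the interval, retry, and resume behavior matter more than the parsing itself. Here is a hedged sketch of such a request loop; the interval, retry count, and proxy settings are assumptions to tune, not the values used by the actual script:

```python
import time
import requests

INTERVAL = 5          # seconds between requests; raise it if the rate limit bites
MAX_RETRIES = 5
PROXIES = None        # e.g. {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}

def fetch_with_retry(url):
    """Fetch an archived page, retrying on timeouts and connection errors."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            resp = requests.get(url, timeout=30, proxies=PROXIES)
            resp.raise_for_status()
            return resp.text
        except requests.RequestException as exc:
            print(f"attempt {attempt} failed for {url}: {exc}")
            time.sleep(INTERVAL * attempt)       # back off a little more each retry
    raise RuntimeError(f"giving up on {url}")
```

Resuming after an interruption then falls out of the done-list idea: whatever is already in the output CSV is loaded back into the done list on startup, so only the missing pages are requested again.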

The number of rows in photo_list_u2304697_v2.csv is 10346.

The number of rows in photo_ids_u2304697.csv is 10288.

The number of photos on the webpage is 10345.

Since the first row of a CSV file is the header, photo_list_u2304697_v2.csv and the webpage list the same number of photos. The count in photo_ids_u2304697.csv is slightly lower; perhaps some URLs in the format “www.panoramio.com/user/{user_id}?with_photo_id={photo_id}” weren’t cached by the Web Archive, while the corresponding “http://www.panoramio.com/photo/{photo_id}” URLs were.

4. Get the Geo-Information, Big picture URL, Taken Date, etc for every photo by crawling every detail page URL.

panoramio_photo_detail_processing_v2.py

  • The input file is photo_list_u2304697_v2.csv
  • The output file is photo_list_u2304697_v2_complete.csv
  • Change csv_file to your own file name.
  • The output file name is generated automatically.
  • Proxy Server and Request Interval have the same policy as Step 3.
  • I have two IP addresses. It took me about 10 hours to get the data of 10,000 photos.
  • Check the number of rows in the complete CSV file.
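The real parsing lives in panoramio_photo_detail_processing_v2.py; the sketch below only illustrates the pattern of pulling coordinates out of a fetched detail page, and the regular expressions are my assumptions rather than the exact archived Panoramio markup:

```python
import re

def parse_detail_page(html):
    """Extract latitude/longitude from the HTML of an archived detail page.

    The patterns below are illustrative assumptions; check the page source
    (or the script on GitHub) for what the archived markup really looks like.
    """
    lat = re.search(r'"latitude"\s*[:=]\s*"?(-?\d+\.\d+)', html)
    lon = re.search(r'"longitude"\s*[:=]\s*"?(-?\d+\.\d+)', html)
    if not (lat and lon):
        return None                              # some photos carry no geo-information
    return {"lat": float(lat.group(1)), "lon": float(lon.group(1))}
```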

5. Convert the CSV file with all photo data of a specific user to KML.

panoramio_csv2kml.py

  • The input file is photo_list_u2304697_v2_complete.csv
  • The output file is photo_list_u2304697_v2_complete.kml
  • It takes several seconds to run.
  • The printed info is the number of photos that have geo-information.
  • The number of generated points in KML is 9931. The number of photos is 10345.
  • Double-click the KML file. It will be opened in Google Earth Desktop.
  • If you want to show the big image in a popup window, modify the code yourself. (A minimal CSV-to-KML sketch follows this list.)
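Here is the promised minimal CSV-to-KML sketch, using the simplekml package. The column names (latitude, longitude, title, image_url) are assumptions based on the data collected in step 4; the actual panoramio_csv2kml.py may name them differently.

```python
import csv
import simplekml   # pip install simplekml

input_csv = "photo_list_u2304697_v2_complete.csv"
output_kml = "photo_list_u2304697_v2_complete.kml"

kml = simplekml.Kml()
count = 0
with open(input_csv, encoding="utf-8") as f:
    for row in csv.DictReader(f):
        lat, lon = row.get("latitude"), row.get("longitude")      # assumed column names
        if not lat or not lon:
            continue                           # skip photos without geo-information
        point = kml.newpoint(name=row.get("title", ""),
                             coords=[(float(lon), float(lat))])   # KML order is lon, lat
        point.description = row.get("image_url", "")              # shown in the popup
        count += 1

kml.save(output_kml)
print(f"{count} placemarks written to {output_kml}")
```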

6. Download all photos in the CSV file. (Optional)

image_downloader.py

  • The input file is photo_list_u2304697_v2_complete.csv
  • Change the input file and image saving folder.
  • The Proxy Server and Request Interval have the same logic as the above.
  • You can choose to save photos or thumbnails.
  • The big image is about 1024x600. The thumbnail is about half the size.
  • No high-resolution pictures are cached in the Web Archive.
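A minimal download loop could look like the sketch below, reusing the fetch-and-wait idea from step 3; the image_url and photo_id column names and the interval are assumptions, and image_downloader.py on GitHub is the real implementation.

```python
import csv
import os
import time
import requests

input_csv = "photo_list_u2304697_v2_complete.csv"
save_dir = "panoramio_images"
os.makedirs(save_dir, exist_ok=True)

with open(input_csv, encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row.get("image_url")            # assumed column name for the big picture URL
        if not url:
            continue
        filename = os.path.join(save_dir, f"panoramio-{row['photo_id']}.jpg")
        if os.path.exists(filename):
            continue                          # makes the loop restartable
        resp = requests.get(url, timeout=60)
        if resp.ok:
            with open(filename, "wb") as img:
                img.write(resp.content)
        time.sleep(4)                         # stay under the Web Archive rate limit
```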

Summary

All of the scripts and files are uploaded to GitHub.

The scripts.

  • generate_photo_ids_csv.py
  • panoramio_photo_list_processing_v2.py
  • panoramio_photo_detail_processing_v2.py
  • panoramio_csv2kml.py
  • image_downloader.py

The files.

  • cdx_2304697_with_photo_id.txt
  • photo_ids_u2304697.csv
  • photo_list_u2304697_v2.csv
  • photo_list_u2304697_v2_complete.csv
  • photo_list_u2304697_v2_complete.kml

There are also scripts without the v2 suffix. You can try them in the following scenario.

They use a different URL format for paging.

https://web.archive.org/web/20161105193832/http://www.panoramio.com/user/5347106?comment_page=1&photo_page=41

http://www.panoramio.com/user/5347106?comment_page=1&photo_page=41

For many Panoramio users, only a few of these page URLs are cached, far from complete. So I don’t recommend this approach as the first choice.

But if the kind of URL we used above, “http://www.panoramio.com/user/2304697?with_photo_id=18538111”, is not cached well for a specific user, you can try this way.

First, you need to check how well a specific user’s pages are cached.

  • For some users, the caching is really good. For example, all of the page URLs are cached for user 263050.
  • web.archive.org/cdx/search/cdx?url=www.panoramio.com%2Fuser%2F263050%3Fcomment_page%3D1%26photo_page%3D*&output=txt&limit=999999
  • The URL “http://www.panoramio.com/user/263050?comment_page=1&photo_page=*” needs to be URL-encoded. Use an online encoder like https://www.urlencoder.org/, or do it in Python as shown after this list.
  • Encoded URL “http%3A%2F%2Fwww.panoramio.com%2Fuser%2F263050%3Fcomment_page%3D1%26photo_page%3D%2A”
  • Open the spliced link: web.archive.org/cdx/search/cdx?url=http%3A%2F%2Fwww.panoramio.com%2Fuser%2F263050%3Fcomment_page%3D1%26photo_page%3D%2A&output=txt&limit=999999
  • Check if all the pages are cached.
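Instead of an online encoder, the same encoding can be done with Python’s standard library:

```python
from urllib.parse import quote

raw = "http://www.panoramio.com/user/263050?comment_page=1&photo_page=*"
encoded = quote(raw, safe="")   # percent-encode every reserved character
cdx_url = ("https://web.archive.org/cdx/search/cdx?url="
           + encoded + "&output=txt&limit=999999")
print(cdx_url)
```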

You may need to use the scripts in the following sequence.

  • panoramio_photo_list_processing.py
  • panoramio_photo_detail_processing.py
  • panoramio_csv2kml.py

Maybe someday in the future I will make a script to convert a user’s geo-points and photos into a video, or into a video-editor project file.

There is no product like Panoramio right now. Back then, people shared really great things, and their journeys are amazing.

This guy is a kayaker, and most of his photos are of him kayaking. He uploaded more than 5,000 photos to Panoramio.

This is not the end. There is one more section, about managing geotagged photos on the desktop the way Panoramio did.

Manage geotagged photos on a map.

I’ve made a modified version of Photoview named photoview_like_desktop_app. Photoview already has the ability to manage photos on a map view. The feature I added is selecting photos and opening them in File Explorer, so it works more like a desktop app. I need this feature for editing travel YouTube videos: I take plenty of photos during a journey, and finding a single one in File Explorer has been a nightmare.

After the Panoramio recovery process, I realized that KML could be a nice format for storing the geo-information and file paths of all my photos. For now, photoview_like_desktop_app is still a self-hosted web application, and the photo-scanning process is tedious. Maybe someday I can make it a real desktop app whose input file is a KML file.


Jack Lee
JackOnTheRoad

A big fan of kayaking and geography, currently traveling around China by kayak. YouTube: JackOnTheRoad_en.