It doesn't seem to work if the hashtag is not in English or has no posts yet #2

jamezun · 2018-11-25T19:13:35Z

I appreciate your hard work! I am trying to get this working, but I have limited programming skills. I have never used Python before, so it is hard for me to figure out what the problem is.

Problems:
1] I get an AttributeError whenever my hashtag is non-English (e.g. Chinese, Korean)
2] I get an IndexError when the hashtag does not exist or the number of posts is 0 (sometimes you might want to test a hashtag that turns out nobody else has used before)

Is there any way to fix it? I edited the code a bit to make it run:

# coding=utf8

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import datetime


driver = webdriver.Chrome()

# Extract description of a post from Instagram link
#driver.get('https://www.instagram.com/p/BiRnjDsFKzl/')
#soup = BeautifulSoup(driver.page_source,"lxml")
#desc = " "

#for item in soup.findAll('a'):
#    desc= desc + " " + str(item.string)

# Extract tag list from Instagram post description
#taglist = desc.split()
#taglist = [x for x in taglist if x.startswith('#')]
#index = 0
#while index < len(taglist):
#    taglist[index] = taglist[index].strip('#')
#    index += 1

# (OR) Copy-paste your tag list manually here
taglist = ['korea', '데일리룩']
print(taglist)


# Define dataframe to store hashtag information
tag_df  = pd.DataFrame(columns = ['Hashtag', 'Number of Posts', 'Posting Freq (mins)'])

# Loop over each hashtag to extract information
for tag in taglist:
    
    driver.get('https://www.instagram.com/explore/tags/'+str(tag))
    soup = BeautifulSoup(driver.page_source,"lxml")
 
    # Extract current hashtag name
    tagname = tag
    # Extract total number of posts in this hashtag
    # NOTE: Class name may change in the website code
    # Get the latest class name by inspecting web code

    try:
        nposts = soup.find('span', {'class': 'g47SY'}).text
        
    
        if nposts !='0':
         
            try: # Extract all post links from 'explore tags' page
                # Needed to extract post frequency of recent posts
                myli = []
                for a in soup.find_all('a', href=True):
                    myli.append(a['href'])

                # Keep link of only 1st and 9th most recent post 
                newmyli = [x for x in myli if x.startswith('/p/')]
                del newmyli[:9]
                del newmyli[9:]
                del newmyli[1:8]

                timediff = []

                # Extract the posting time of 1st and 9th most recent post for a tag
                for j in range(len(newmyli)):
                    driver.get('https://www.instagram.com'+str(newmyli[j]))
                    soup = BeautifulSoup(driver.page_source,"lxml")

                    for i in soup.findAll('time'):
                        if i.has_attr('datetime'):
                            timediff.append(i['datetime'])
                            #print(i['datetime'])

                # Calculate time difference between posts
                # For obtaining posting frequency
                datetimeFormat = '%Y-%m-%dT%H:%M:%S.%fZ'
                print(timediff)
                diff = datetime.datetime.strptime(timediff[0], datetimeFormat)\
                    - datetime.datetime.strptime(timediff[1], datetimeFormat)
                pfreq= int(diff.total_seconds()/(9*60))
                
                # Add hashtag info to dataframe
                tag_df.loc[len(tag_df)] = [tagname, nposts, pfreq]
            
                
             
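            except IndexError:
                # Fewer than two timestamps were found; still record the tag
                pfreq = 'ERROR'
                print("INDEX ERROR")
                tag_df.loc[len(tag_df)] = [tagname, nposts, pfreq]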
        
        else:
            pfreq = 0  
            nposts = 0   
            tag_df.loc[len(tag_df)] = [tagname, nposts, pfreq]
    
    except AttributeError: 
        
        nposts = 0
        pfreq =0    
        print("ATTRIBUTE ERROR")        
        tag_df.loc[len(tag_df)] = [tagname, nposts, pfreq]
       
        
    
driver.quit()

# Check the final dataframe
print(tag_df)

# CSV output for hashtag analysis
tag_df.to_csv('hashtag_list.csv', encoding='utf-8')
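
For reference, here is a minimal sketch of how the two failure cases could be checked explicitly instead of being caught with try/except. `soup.find(...)` returns `None` when no element matches, so calling `.text` on the result raises the AttributeError; indexing `timediff[0]` or `timediff[1]` when fewer than two timestamps were collected raises the IndexError. The helper names `text_or_default` and `posting_freq_minutes` are hypothetical (they are not part of the original script), and this does not explain why the `g47SY` span is missing for non-English tags.

```python
import datetime

# Same timestamp format the script above uses for the <time datetime="..."> values
DATETIME_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def text_or_default(element, default='0'):
    """Return element.text, or `default` when the lookup found nothing (None)."""
    return element.text if element is not None else default


def posting_freq_minutes(timestamps, span=9):
    """Average minutes between posts, or None if fewer than two timestamps.

    `timestamps` is expected to hold the newest post first, as in the script.
    `span` mirrors the divisor of 9 used in the original calculation.
    """
    if len(timestamps) < 2:
        return None
    newest = datetime.datetime.strptime(timestamps[0], DATETIME_FORMAT)
    oldest = datetime.datetime.strptime(timestamps[1], DATETIME_FORMAT)
    return int((newest - oldest).total_seconds() / (span * 60))


# Example: one hour between the two posts, and an empty tag page
sample = ['2018-11-25T19:13:35.000Z', '2018-11-25T18:13:35.000Z']
print(posting_freq_minutes(sample))
print(posting_freq_minutes([]))
print(text_or_default(None))
```

In the loop, `nposts = text_or_default(soup.find('span', {'class': 'g47SY'}))` and `pfreq = posting_freq_minutes(timediff)` would then replace the two nested try blocks, with a `None` frequency marking tags that have too few posts to measure.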
