Python Forum
News Gathering - String Manipulation Help
#1
Hello, I am working on a program that will go to specified news sites, and read through the HTML, picking out all of the "href" links as it goes. 

I am now trying to write each link on a separate line of a file, so I may access them with another program later. 

Any thoughts on how I would go about this? 



--CODE--


import urllib.request
import re

linkOne = urllib.request.urlopen("https://www.nytimes.com/?WT.z_jog=1&hF=t&vS=undefined")
mybytes = linkOne.read()
mystr = mybytes.decode("utf8")
links = str(re.findall('"((http|ftp)s?://.*?)"', mystr))
linkOne.close()

f = open("link.txt","w")
f.write(links)
f.close()
**P.S. As I am new to the forum I cannot put the new york times link into the code on here. If running, please insert it as a string.
Reply
#2
1) I changed your user group, so you should now be able to post links as well as edit your posts.
2) Please use code tags when posting code in the forums.
3) Although you can use the standard library for this, there are better tools than urllib and re:
BeautifulSoup parses the HTML and replaces the regex.
The requests module handles opening the URL to get the HTML content (whether logging in, using a proxy, etc.) and replaces urllib.

Both of these third-party libraries were made for this specific task and make it much quicker and easier to obtain this info, especially if you need to tack on extra tasks later on. All you need to do is pip install them.

4) What are you trying to get? All visible links, or links from a specific element? The nytimes site is filled with links and I'm not sure which ones you want.
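If you do stay with the standard library for now (point 3), here is a minimal sketch of how the regex from the first post could write one link per line. It assumes a small hard-coded HTML string in place of the downloaded page, and the helper name extract_links is made up for illustration. The main change is the non-capturing group (?:...): with two capturing groups, re.findall returns tuples of groups rather than plain URL strings.

```python
import re

# Stand-in for the decoded page source (an assumption for this sketch).
sample_html = '''
<a href="https://www.nytimes.com/section/science">Science</a>
<a href="http://cn.nytimes.com">Chinese</a>
<a href="/relative/path">Relative</a>
'''

def extract_links(html):
    """Return every double-quoted http(s)/ftp(s) URL found in html."""
    # (?:...) is non-capturing, so findall returns plain strings
    # for the single remaining group instead of tuples.
    return re.findall(r'"((?:http|ftp)s?://.*?)"', html)

links = extract_links(sample_html)

# One link per line; the with block closes the file automatically.
with open("link.txt", "w") as f:
    f.write("\n".join(links))
```

Joining with "\n" instead of calling str() on the list is what puts each link on its own line.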
Reply
#3
Great thanks!
Reply
#4
To give you an example of what metulburr is talking about.
If you look in the tutorial section of the forum, I have a two-part tutorial.
import requests
from bs4 import BeautifulSoup
from pprint import pprint

url = "https://www.nytimes.com/?WT.z_jog=1&hF=t&vS=undefined"
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
print(soup.find('title').text)
# Get links; the '' default skips <a> tags that have no href instead of raising TypeError
pprint([link.get('href') for link in soup.find_all('a') if 'http' in link.get('href', '')][:4])
Output:
The New York Times - Breaking News, World News & Multimedia
['http://www.nytimes.com/content/help/site/ie9-support.html',
 'http://cn.nytimes.com',
 'http://www.nytimes.com/es/',
 'http://www.nytimes.com/']
Reply
#5
(Mar-03-2017, 06:21 PM)snippsat Wrote: To give you a example what metulburr talks about. If look in tutorial section of forum,i have a two part tutorial.

Thank you!
Reply
#6
One further question...

Is there a way to save the links in this format to a text file?

Here is the code I have so far.

import requests
from bs4 import BeautifulSoup
from pprint import pprint

nytLinksFile = open("nytLinks.txt","w")

def nycGet():
   url = "https://www.nytimes.com/?WT.z_jog=1&hF=t&vS=undefined"
   url_get = requests.get(url)
   soup = BeautifulSoup(url_get.content, 'html.parser')
   pprint([link.get('href') for link in soup.find_all('a') if 'http' in link.get('href')])
   
nycGet()

nytLinksFile.close()
Output:
['http://www.nytimes.com/content/help/site/ie9-support.html',  'http://cn.nytimes.com',  'http://www.nytimes.com/es/',  'http://www.nytimes.com/',  'http://www.nytimes.com/pages/todayspaper/index.html',  'http://www.nytimes.com/video',  'https://www.nytimes.com/pages/world/index.html',  'https://www.nytimes.com/pages/national/index.html',  'https://www.nytimes.com/pages/politics/index.html',  'https://www.nytimes.com/pages/nyregion/index.html',  'https://www.nytimes.com/pages/business/index.html',  'https://www.nytimes.com/pages/business/international/index.html',  'https://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/pages/opinion/international/index.html',  'https://www.nytimes.com/pages/technology/index.html',  'https://www.nytimes.com/section/science',  'https://www.nytimes.com/pages/health/index.html',  'https://www.nytimes.com/pages/sports/index.html',  'https://www.nytimes.com/pages/sports/international/index.html',  'https://www.nytimes.com/pages/arts/index.html',  'https://www.nytimes.com/pages/arts/international/index.html',  'https://www.nytimes.com/pages/fashion/index.html',  'https://www.nytimes.com/pages/style/international/index.html',  'https://www.nytimes.com/pages/dining/index.html',  'https://www.nytimes.com/pages/dining/international/index.html',  'https://www.nytimes.com/section/travel',  'https://www.nytimes.com/pages/magazine/index.html',  'https://www.nytimes.com/section/t-magazine',  'https://www.nytimes.com/section/realestate',  'http://www.nytimes.com/pages/politics/index.html?src=hpHeader',  'https://www.nytimes.com/2017/03/05/us/politics/trump-seeks-inquiry-into-allegations-that-obama-tapped-his-phones.html',  'https://www.nytimes.com/2017/03/04/us/politics/trump-obama-tap-phones.html',  'https://www.nytimes.com/2017/03/05/us/politics/trump-deregulation-guns-wall-st-climate.html',  'https://www.nytimes.com/2017/03/04/world/asia/north-korea-missile-program-sabotage.html',  
'https://www.nytimes.com/2017/03/04/world/asia/north-korea-missile-program-sabotage.html?hp&target=comments#commentsContainer',  'https://www.nytimes.com/2017/03/04/world/asia/north-korea-missile-program-sabotage-korean.html',  'http://cn.nytimes.com/usa/20170304/north-korea-missile-program-sabotage/',  'https://www.nytimes.com/2017/03/04/world/asia/left-of-launch-missile-defense.html',  'https://www.nytimes.com/2017/03/05/business/dealbook/trump-stocks.html',  'https://www.nytimes.com/2017/03/05/us/politics/koch-brothers-affordable-care-act.html',  'https://www.nytimes.com/2017/03/04/us/politics/us-troops-syria.html',  'https://www.nytimes.com/2017/03/04/world/asia/china-xi-jinping-economic-reform.html',  'https://www.nytimes.com/2017/03/04/world/asia/china-xi-jinping-economic-reform.html',  'https://www.nytimes.com/2017/03/04/business/china-economy-national-peoples-congress.html',  'https://www.nytimes.com/tips',  'https://www.nytimes.com/tips',  'https://www.nytimes.com/2017/03/05/sports/baseball/new-york-yankees-clint-fraziers-hair.html',  'https://www.nytimes.com/2017/03/05/sports/baseball/new-york-yankees-clint-fraziers-hair.html',  'https://www.nytimes.com/2017/03/05/briefing/donald-trump-mosul-china.html',  'https://www.nytimes.com/2017/03/05/briefing/donald-trump-mosul-china.html',  'https://www.nytimes.com/2017/03/05/briefing/north-korea-russia-obamacare-weekend-briefing.html',  'https://www.nytimes.com/2017/03/05/briefing/north-korea-russia-obamacare-weekend-briefing.html',  'https://www.nytimes.com/NativeSponsor',  'http://www.nytimes.com/spotlight/times-tips?contentCollection=smarter-living',  'https://www.nytimes.com/2017/02/28/well/eat/vitamins-gone-gummy.html',  'https://www.nytimes.com/2017/02/28/well/eat/vitamins-gone-gummy.html',  'https://www.nytimes.com/2017/02/28/well/eat/vitamins-gone-gummy.html?hp&target=comments#commentsContainer',  'https://www.nytimes.com/2017/03/03/realestate/design-risks-worth-taking.html',  
'https://www.nytimes.com/2017/03/03/realestate/design-risks-worth-taking.html',  'https://www.nytimes.com/2017/03/04/world/africa/zimbabwe-economy-work-force.html',  'https://www.nytimes.com/2017/03/04/world/africa/zimbabwe-economy-work-force.html',  'https://www.nytimes.com/2017/03/04/us/migrants-facing-old-deportation-orders.html',  'https://www.nytimes.com/2017/03/04/us/migrants-facing-old-deportation-orders.html',  'https://www.nytimes.com/2017/03/04/us/los-angeles-deportation-immigration.html',  'https://www.nytimes.com/2017/03/04/us/afghan-family-detained-los-angeles-visas.html',  'https://www.nytimes.com/2017/03/02/world/middleeast/asli-erdogan-prison-turkey.html',  'https://www.nytimes.com/2017/03/02/world/middleeast/asli-erdogan-prison-turkey.html',  'https://www.nytimes.com/2017/03/03/arts/television/buffy-the-vampire-slayer-20-year-anniversary.html',  'https://www.nytimes.com/2017/03/03/arts/television/buffy-the-vampire-slayer-20-year-anniversary.html',  'https://www.nytimes.com/interactive/2017/arts/television/buffy-fan-fiction.html',  'https://www.nytimes.com/2017/03/04/nyregion/helen-marshall-dead.html',  'https://www.nytimes.com/2017/03/04/nyregion/helen-marshall-dead.html',  'http://www.nytimes.com/video/the-daily-360',  'https://www.nytimes.com/video/world/asia/100000004868768/a-chilly-walk-amid-chinas-ice-art.html',  'https://www.nytimes.com/2017/03/05/movies/logan-pulls-in-85-3-million-as-foxs-bet-pays-off.html',  'https://www.nytimes.com/2017/03/05/world/asia/sikh-shooting-washington-state.html',  'https://www.nytimes.com/2017/03/04/world/europe/northern-ireland-election-sinn-fein.html',  'https://www.nytimes.com/2017/03/04/world/europe/britain-mansion-child-abuse-archbishop-canterbury.html',  'https://www.nytimes.com/2017/03/05/arts/television/saturday-night-live-kate-mckinnon-jeff-sessions.html',  'https://www.nytimes.com/newsletters/the-interpreter',  'https://www.nytimes.com/newsletters/the-interpreter',  
'https://www.nytimes.com/2017/03/04/style/palm-springs-hotels-airbnb-vacation-rental-homes.html',  'https://www.nytimes.com/2017/03/04/style/palm-springs-hotels-airbnb-vacation-rental-homes.html',  'https://www.nytimes.com/section/books/review',  'https://www.nytimes.com/section/books/review',  'https://www.nytimes.com/2017/03/03/arts/dance/merce-cunningham-walker-art-center-mca.html',  'https://www.nytimes.com/2017/03/03/arts/dance/merce-cunningham-walker-art-center-mca.html',  'http://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/is-the-pope-the-anti-trump.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/is-the-pope-the-anti-trump.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/is-the-pope-the-anti-trump.html?hp&target=comments#commentsContainer',  'https://www.nytimes.com/2017/03/04/opinion/sing-o-muse-of-the-mall-of-america.html',  'https://www.nytimes.com/2017/03/04/opinion/sing-o-muse-of-the-mall-of-america.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/president-trumps-island-mentality.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/donald-trump-vs-the-food-snobs.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/travel-abroad-in-your-own-country.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/a-different-bargain-on-race.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/mad-trump-happy-w.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/shes-17-and-needs-birth-control-do-we-turn-our-backs.html',  'https://www.nytimes.com/2017/03/04/opinion/tax-credits-are-no-substitute-for-obamacare.html',  'https://www.nytimes.com/2017/03/03/insider/the-eyes-of-san-francisco-in-a-texas-courtroom-the-sequel.html',  'https://www.nytimes.com/2017/03/02/insider/news-tips-signal-whatsapp.html',  'http://www.nytimes.com/section/insider',  'http://www.nytimes.com/section/insider',  
'https://www.nytimes.com/2017/03/03/insider/the-eyes-of-san-francisco-in-a-texas-courtroom-the-sequel.html',  'http://www.nytimes.com/crosswords',  'http://www.nytimes.com/crosswords',  'http://www.nytimes.com/crosswords',  'http://www.nytimes.com/crosswords',  'http://www.nytimes.com/crosswords',  'http://www.nytimes.com/crosswords',  'http://wordplay.blogs.nytimes.com',  'http://nyt.qualtrics.com/jfe/form/SV_4IT4QI7ZLunvfdH',  'http://www.nytimes.com/video',  'http://www.nytimes.com/video?src=vidm',  'https://www.nytimes.com/section/arts/television',  'https://www.nytimes.com/2017/03/03/arts/television/tv-review-time-after-time-making-history-time-travel.html',  'https://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/2017/03/04/opinion/how-low-can-the-presidential-bar-go.html',  'https://www.nytimes.com/topic/subject/retirement',  'https://www.nytimes.com/2017/03/04/business/retirement/replacing-work-a-new-purpose-can-lift-your-emotional-well-being.html',  'https://www.nytimes.com/section/arts/design',  'https://www.nytimes.com/2017/03/03/arts/design/inside-sara-bermans-closet-at-the-met-museum.html',  'https://www.nytimes.com/pages/opinion/index.html#sundayreview',  'https://www.nytimes.com/2017/03/04/opinion/sunday/i-remember-when-appalachia-wasnt-trump-country.html',  'https://www.nytimes.com/section/insider',  'https://www.nytimes.com/2017/03/03/insider/what-were-reading.html',  'https://www.nytimes.com/section/sports/ncaabasketball',  'https://www.nytimes.com/2017/03/02/sports/ncaabasketball/ivy-league-conference-ncaa-tournament.html',  'https://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/2017/03/04/opinion/sing-o-muse-of-the-mall-of-america.html',  'https://www.nytimes.com/section/nyregion',  'https://www.nytimes.com/2017/03/02/nyregion/marian-javits-dead.html',  'https://www.nytimes.com/section/theater',  'https://www.nytimes.com/2017/03/02/theater/significant-other-review.html',  
'https://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/interactive/2017/03/04/opinion/syria-refugees-america-stay-or-go.html',  'https://www.nytimes.com/section/nyregion',  'https://www.nytimes.com/2017/03/02/nyregion/at-this-staten-island-restaurant-a-kitchen-run-by-grandmas.html',  'http://www.nytimes.com/pages/world/index.html',  'https://www.nytimes.com/2017/03/04/world/africa/war-south-sudan.html',  'https://www.nytimes.com/2017/03/04/world/asia/north-korea-missile-program-sabotage.html',  'https://www.nytimes.com/2017/03/05/world/africa/airport-will-temporarily-shut-disrupting-a-nigerian-lifeline.html',  'http://www.nytimes.com/pages/business/index.html',  'https://www.nytimes.com/2017/03/03/business/retirement/working-longer-may-benefit-your-health.html',  'https://www.nytimes.com/2017/03/04/business/china-economy-national-peoples-congress.html',  'https://www.nytimes.com/2017/03/04/world/asia/china-xi-jinping-economic-reform.html',  'http://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/2017/03/04/opinion/how-low-can-the-presidential-bar-go.html',  'https://www.nytimes.com/2017/03/03/opinion/the-pope-on-panhandling-give-without-worry.html',  'https://www.nytimes.com/2017/03/03/opinion/what-to-do-with-jeff-sessions.html',  'http://www.nytimes.com/section/us',  'https://www.nytimes.com/2017/03/03/us/politics/trump-vehicle-emissions-regulation.html',  'https://www.nytimes.com/2017/03/04/us/los-angeles-deportation-immigration.html',  'https://www.nytimes.com/2017/03/04/us/migrants-facing-old-deportation-orders.html',  'http://www.nytimes.com/pages/technology/index.html',  'https://www.nytimes.com/2017/03/03/technology/uber-greyball-program-evade-authorities.html',  'https://www.nytimes.com/2017/03/03/technology/nintendo-switch-game-console.html',  'https://www.nytimes.com/2017/03/03/technology/uber-said-to-consider-changes-to-employee-stock-compensation.html',  'http://www.nytimes.com/pages/arts/index.html',  
'https://www.nytimes.com/2017/03/02/arts/television/feud-fx-ryan-murphy-jessica-lange-susan-sarandon.html',  'https://www.nytimes.com/2017/03/03/arts/the-battle-over-your-political-bubble.html',  'https://www.nytimes.com/2017/03/05/arts/television/saturday-night-live-kate-mckinnon-jeff-sessions.html',  'http://www.nytimes.com/pages/politics/index.html',  'https://www.nytimes.com/2017/03/05/us/politics/koch-brothers-affordable-care-act.html',  'https://www.nytimes.com/2017/03/05/us/politics/trump-deregulation-guns-wall-st-climate.html',  'https://www.nytimes.com/2017/03/05/us/politics/trump-seeks-inquiry-into-allegations-that-obama-tapped-his-phones.html',  'http://www.nytimes.com/section/fashion',  'https://www.nytimes.com/2017/03/05/fashion/paris-fashion-week-comme-des-garcons.html',  'https://www.nytimes.com/2017/03/05/fashion/claims-of-model-abuse-and-racist-casting-roil-fashion-week.html',  'https://www.nytimes.com/2017/03/04/style/palm-springs-hotels-airbnb-vacation-rental-homes.html',  'http://www.nytimes.com/section/movies',  'https://www.nytimes.com/2017/03/05/movies/logan-pulls-in-85-3-million-as-foxs-bet-pays-off.html',  'https://www.nytimes.com/2017/03/03/movies/interracial-couples-onscreen-loving-get-out.html',  'https://www.nytimes.com/2017/03/02/movies/junction-48-palestinian-hip-hop-film-from-israel.html',  'http://www.nytimes.com/section/nyregion',  'https://www.nytimes.com/2017/02/28/nyregion/lights-camera-color-new-film-school-focuses-on-industrys-diversity.html',  'https://www.nytimes.com/2017/03/02/nyregion/broomball-brooklyn-lefrak.html',  'https://www.nytimes.com/2017/03/02/nyregion/bill-de-blasio-homelessness.html',  'http://www.nytimes.com/pages/sports/index.html',  'https://www.nytimes.com/2017/03/05/sports/baseball/new-york-yankees-clint-fraziers-hair.html',  'https://www.nytimes.com/2017/03/05/sports/tennis/ernesto-escobedo-public-courts-mexican-americans.html',  
'https://www.nytimes.com/2017/03/03/sports/ncaabasketball/samuelson-sisters-uconn-stanford-ncaa-tournament.html',  'http://www.nytimes.com/pages/theater/index.html',  'https://www.nytimes.com/2017/03/04/theater/justin-trudeau-broadway-musical-come-from-away.html',  'https://www.nytimes.com/2017/03/03/theater/gay-histories-close-enough-to-touch-but-dont.html',  'https://www.nytimes.com/2017/03/03/theater/review-little-miss-sunshine-trips-into-a-slasher-film-in-all-the-fine-boys.html',  'http://www.nytimes.com/section/science',  'https://www.nytimes.com/2017/03/02/science/woolly-mammoth-extinct-genetics.html',  'https://www.nytimes.com/2017/03/03/science/supernova-sn1987a-hubble-space-telescope.html',  'https://www.nytimes.com/2017/03/03/science/amazon-rain-forest-plants-domesticate.html',  'http://www.nytimes.com/section/obituaries',  'https://www.nytimes.com/2017/03/03/world/americas/rene-preval-dead-president-of-haiti.html',  'https://www.nytimes.com/2017/03/03/books/paula-fox-dead.html',  'https://www.nytimes.com/2017/03/03/arts/ren-hang-dead-photographer-china.html',  'http://www.nytimes.com/section/arts/television',  'https://www.nytimes.com/2017/03/02/arts/television/feud-fx-ryan-murphy-jessica-lange-susan-sarandon.html',  'https://www.nytimes.com/2017/03/03/arts/television/buffy-the-vampire-slayer-20-year-anniversary.html',  'https://www.nytimes.com/2017/03/05/arts/television/saturday-night-live-kate-mckinnon-jeff-sessions.html',  'http://www.nytimes.com/pages/health/index.html',  'https://www.nytimes.com/2017/03/03/health/utah-obamacare.html',  'https://www.nytimes.com/2017/02/28/well/live/colon-and-rectal-cancers-rising-in-young-people.html',  'https://www.nytimes.com/2017/03/03/well/live/should-i-take-a-vitamin-for-brittle-nails.html',  'http://www.nytimes.com/section/travel',  'https://www.nytimes.com/2017/03/01/travel/bangkok-thailand-city-of-spirits-culture.html',  
'https://www.nytimes.com/interactive/2017/03/02/travel/what-to-do-36-hours-in-fez-morocco.html',  'https://www.nytimes.com/2017/03/03/travel/paiva-river-walkways-in-portugal.html',  'http://www.nytimes.com/section/books',  'https://www.nytimes.com/2017/02/27/books/review/horse-walks-into-a-bar-david-grossman-.html',  'https://www.nytimes.com/2017/02/27/books/review/well-always-have-casablanca-noah-isenberg.html',  'https://www.nytimes.com/2017/03/02/books/review/richard-holmes-by-the-book.html',  'http://www.nytimes.com/section/education',  'https://www.nytimes.com/2017/03/01/us/politics/trump-school-vouchers-campaign-pledge.html',  'https://www.nytimes.com/2017/03/02/us/kansas-supreme-court-school-spending.html',  'https://www.nytimes.com/2017/02/27/sports/ncaabasketball/manhattan-berkeley-college-knights.html',  'http://www.nytimes.com/pages/dining/index.html',  'https://www.nytimes.com/2017/03/02/dining/little-egypt-review-ridgewood-queens-restaurant.html',  'https://www.nytimes.com/2017/03/03/dining/dinner-recipes-cooking-ideas.html',  'https://www.nytimes.com/2017/02/28/dining/black-bean-soup-recipe-video.html',  'http://www.nytimes.com/pages/opinion/index.html#sundayreview',  'https://www.nytimes.com/2017/03/04/opinion/sunday/what-biracial-people-know.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/president-trumps-island-mentality.html',  'https://www.nytimes.com/2017/03/04/opinion/sunday/donald-trump-vs-the-food-snobs.html',  'http://www.nytimes.com/pages/realestate/index.html',  'https://www.nytimes.com/2017/03/03/realestate/the-bronx-is-building.html',  'https://www.nytimes.com/2017/03/03/realestate/design-risks-worth-taking.html',  'https://www.nytimes.com/2017/03/03/realestate/finding-a-roommate-when-youre-in-your-60s.html',  'http://www.nytimes.com/section/upshot',  'https://www.nytimes.com/2017/03/03/upshot/obamacare-got-their-goat-an-illustrated-guide-to-republicans-metaphors.html',  
'https://www.nytimes.com/2017/03/03/upshot/state-ira-plans-are-ready-if-congress-doesnt-interfere.html',  'https://www.nytimes.com/2017/03/02/upshot/arizona-shows-what-can-go-wrong-with-tax-credit-vouchers.html',  'http://www.nytimes.com/section/magazine',  'https://www.nytimes.com/2017/02/28/magazine/jeff-sessions-stephen-bannon-justice-department.html',  'https://www.nytimes.com/2017/03/01/magazine/sand-mining-india-how-to-steal-a-river.html',  'https://www.nytimes.com/2017/03/02/magazine/how-emmanuel-carrere-reinvented-nonfiction.html',  'http://www.nytimes.com/pages/automobiles/index.html',  'https://www.nytimes.com/2017/03/02/automobiles/wheels/self-driving-cars-gps-maps.html',  'https://www.nytimes.com/2017/03/02/automobiles/autoreviews/genesis-g80-review.html',  'https://www.nytimes.com/2017/02/23/business/cadillac-oscars-advertising.html',  'http://www.nytimes.com/section/t-magazine',  'https://www.nytimes.com/slideshow/2017/03/05/t-magazine/fashion/junya-watanabe-comme-des-garcons-paris-fashion-week.html',  'https://www.nytimes.com/2017/03/01/t-magazine/beck-tom-waits-kendrick-lamar.html',  'https://www.nytimes.com/2017/03/04/t-magazine/fashion/nina-ricci-hair-curls-beauty-paris-fashion-week.html',  'http://www.nytimes.com/section/insider',  'https://www.nytimes.com/2017/03/03/insider/the-eyes-of-san-francisco-in-a-texas-courtroom-the-sequel.html',  'https://www.nytimes.com/2017/03/03/insider/events/race-in-america-racial-progress-or-racist-progress.html',  'https://www.nytimes.com/2017/03/03/insider/what-were-reading.html',  'https://www.nytimes.com/section/realestate',  'https://www.nytimes.com/2017/03/03/realestate/downtown-brooklyn-rentals-ashland-hub.html',  'https://www.nytimes.com/2017/03/03/realestate/downtown-brooklyn-rentals-ashland-hub.html',  'http://www.nytimes.com/real-estate/find-a-home',  'http://realestateads.nytimes.com/',  'http://www.nytimes.com/gst/mostemailed.html',  'http://www.nytimes.com/gst/mostpopular.html',  
'http://www.nytimes.com/trending/',  'http://www.nytimes.com/recommendations',  'http://www.nytimes.com/',  'http://www.nytimes.com/',  'https://www.nytimes.com/pages/world/index.html',  'https://www.nytimes.com/pages/national/index.html',  'https://www.nytimes.com/pages/politics/index.html',  'https://www.nytimes.com/pages/nyregion/index.html',  'https://www.nytimes.com/pages/business/index.html',  'https://www.nytimes.com/pages/technology/index.html',  'https://www.nytimes.com/section/science',  'https://www.nytimes.com/pages/health/index.html',  'https://www.nytimes.com/pages/sports/index.html',  'https://www.nytimes.com/pages/education/index.html',  'https://www.nytimes.com/pages/obituaries/index.html',  'https://www.nytimes.com/pages/todayspaper/index.html',  'https://www.nytimes.com/pages/corrections/index.html',  'https://www.nytimes.com/pages/opinion/index.html',  'https://www.nytimes.com/pages/opinion/index.html#columnists',  'https://www.nytimes.com/pages/opinion/index.html#editorials',  'https://www.nytimes.com/pages/opinion/index.html#contributing',  'https://www.nytimes.com/pages/opinion/index.html#op-ed',  'https://www.nytimes.com/pages/opinion/index.html#opinionator',  'https://www.nytimes.com/pages/opinion/index.html#letters',  'https://www.nytimes.com/pages/opinion/index.html#sundayreview',  'https://www.nytimes.com/pages/opinion/index.html#takingNote',  'https://www.nytimes.com/roomfordebate',  'http://topics.nytimes.com/top/opinion/thepubliceditor/index.html',  'https://www.nytimes.com/video/opinion',  'https://www.nytimes.com/pages/arts/index.html',  'https://www.nytimes.com/pages/arts/design/index.html',  'https://www.nytimes.com/pages/books/index.html',  'https://www.nytimes.com/pages/arts/dance/index.html',  'https://www.nytimes.com/pages/movies/index.html',  'https://www.nytimes.com/pages/arts/music/index.html',  'https://www.nytimes.com/events/',  'https://www.nytimes.com/pages/arts/television/index.html',  
'https://www.nytimes.com/pages/theater/index.html',  'https://www.nytimes.com/video/arts',  'https://www.nytimes.com/pages/automobiles/index.html',  'https://www.nytimes.com/crosswords/',  'https://www.nytimes.com/pages/dining/index.html',  'https://www.nytimes.com/pages/education/index.html',  'https://www.nytimes.com/pages/fashion/index.html',  'https://www.nytimes.com/pages/health/index.html',  'https://www.nytimes.com/section/jobs',  'https://www.nytimes.com/pages/magazine/index.html',  'https://www.nytimes.com/events/',  'https://www.nytimes.com/section/realestate',  'https://www.nytimes.com/section/t-magazine',  'https://www.nytimes.com/section/travel',  'https://www.nytimes.com/pages/fashion/weddings/index.html',  'https://www.nytimes.com/ref/classifieds/',  'https://www.nytimes.com/marketing/tools-and-services/',  'https://www.nytimes.com/pages/topics/',  'http://topics.nytimes.com/top/opinion/thepubliceditor/index.html',  'https://www.nytimes.com/events/',  'https://www.nytimes.com/interactive/blogs/directory.html',  'https://www.nytimes.com/pages/multimedia/index.html',  'https://lens.blogs.nytimes.com/',  'https://www.nytimes.com/video',  'https://www.nytimes.com/store/?&t=qry542&utm_source=nytimes&utm_medium=HPB&utm_content=hp_browsetree&utm_campaign=NYT-HP&module=SectionsNav&action=click&region=TopBar&version=BrowseTree&contentCollection=NYT%20Store&contentPlacement=2&pgtype=Homepage',  'https://www.nytimes.com/times-journeys/?utm_source=nytimes&utm_medium=HPLink&utm_content=hp_browsetree&utm_campaign=NYT-HP',  'https://www.nytimes.com/seeallnav',  'https://www.nytimes.com/membercenter',  'http://www.nytimes.com/hdleftnav',  'http://www.nytimes.com/digitalleftnav',  'http://www.nytimes.com/tpnav',  'http://www.nytimes.com/crosswords/index.html',  'http://www.nytimes.com/marketing/newsletters',  'https://myaccount.nytimes.com/mem/tnt.html',  'http://www.nytimes.com/giftleftnav',  'http://www.nytimes.com/corporateleftnav',  
'http://www.nytimes.com/educationleftnav',  'http://www.nytimes.com/services/mobile/index.html',  'http://eedition.nytimes.com/cgi-bin/signup.cgi?cc=37FYY',  'http://www.nytimes.com/content/help/rights/copyright/copyright-notice.html',  'http://www.nytimes.com/ref/membercenter/help/infoservdirectory.html',  'http://www.nytco.com/careers',  'http://nytmediakit.com/',  'http://www.nytimes.com/content/help/rights/privacy/policy/privacy-policy.html#pp',  'http://www.nytimes.com/privacy',  'http://www.nytimes.com/ref/membercenter/help/agree.html',  'http://www.nytimes.com/content/help/rights/sale/terms-of-sale.html',  'http://spiderbites.nytimes.com',  'http://www.nytimes.com/membercenter/sitehelp.html',  'https://myaccount.nytimes.com/membercenter/feedback.html',  'http://www.nytimes.com/subscriptions/Multiproduct/lp5558.html?campaignId=37WXW',  'http://mobile.nytimes.com/']
Thanks in Advance!
Reply
#7
If you modify your function to return a list of links instead of printing them, you can use:
links = nycGet()

with open("nytlinksfile.txt", "w") as outfile:
    outfile.write("\n".join(links))
You do not need to open/close your file separately.
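The output in post #6 contains many repeated links. Here is a small sketch of the same write step with duplicates dropped first. It assumes links is the list returned by your modified nycGet(); a short hard-coded list stands in for it here, and the helper name unique_links is hypothetical.

```python
def unique_links(links):
    """Drop repeated links while keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(links))

# Stand-in for the list returned by nycGet() (an assumption for this sketch).
links = [
    "http://www.nytimes.com/",
    "https://www.nytimes.com/tips",
    "https://www.nytimes.com/tips",
    "http://www.nytimes.com/",
]

with open("nytlinksfile.txt", "w") as outfile:
    outfile.write("\n".join(unique_links(links)))
```

A set would also remove duplicates, but it would not keep the links in page order.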
Reply
#8
This worked great! Thank you :)
Reply
#9
(Mar-05-2017, 08:47 PM)zivoni Wrote: If you modify your function to return list of links instead of printing them, you can use:
links = nycGet()

with open("nytlinksfile.txt", "w") as outfile:
    outfile.write("\n".join(links))
You do not need to open/close your file separately.

Or print()!

links = nycGet()  # get stuff

with open("links.txt", "w") as outfile:
    for link in links:
        print(link, file=outfile)
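Since the goal in the first post was to read the links from another program later, here is a minimal sketch of the reading side. It assumes the file holds one link per line, as written above. The function name read_links is made up for illustration, and the sketch writes a two-line sample file first so that it runs on its own.

```python
def read_links(path):
    """Return the non-empty lines of path as a list of links."""
    with open(path) as infile:
        # strip() removes the trailing newline that print() adds.
        return [line.strip() for line in infile if line.strip()]

# Write a small sample file so the sketch is self-contained.
sample = ["http://www.nytimes.com/", "http://cn.nytimes.com"]
with open("links.txt", "w") as outfile:
    for link in sample:
        print(link, file=outfile)

loaded = read_links("links.txt")
```

Skipping blank lines means the reader works with either writing style in this thread, with or without a trailing newline at the end of the file.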
Reply