Python Forum
How can we transcode encoding file uml url format
#1
How can we transcode files in an encoded format, like xxxxx.tar.gz?
I have downloaded files that display as encoded data, but I want to read the text data for some special reasons.
Error:
�;'Kjl� 7��Ť��!���p�����`��(�D��Y�+F\�t{���һ�Eb>݊���3^N�~�Z\RU+@�� c�!��&+>ݒ��4/�m�;Q���p�$�)m�����Q�a�)�1 �,�P�$��.�k��fT������� ���sG
#2
It is a gzipped tar file. You can read it with the tarfile module from the standard library.
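If you want to check this yourself before extracting anything, here is a minimal sketch. It relies on two facts: a gzip stream always starts with the bytes 0x1f 0x8b, and tarfile.is_tarfile() sees through the compression. The names looks_like_gzip and describe are only illustrative helpers, not part of any library.

```python
import tarfile

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def looks_like_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def describe(path):
    """Return a short description of what kind of file this seems to be."""
    gz = looks_like_gzip(path)
    # is_tarfile() also recognises compressed tar archives.
    if tarfile.is_tarfile(path):
        return "gzip-compressed tar archive" if gz else "tar archive"
    return "gzip file, but not a tar archive" if gz else "neither gzip nor tar"
```

Calling describe('news_sohusite_xml.full.tar.gz') on the downloaded file should tell you whether the unreadable characters are simply compressed data.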
#3
@Gribouillis, can you give me more details on how?
#4
Follow the examples given in this blog page https://pymotw.com/3/tarfile/. Use the mode 'r:gz' to open your compressed archive file.
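As a rough sketch of what that looks like, the two functions below open an archive with mode 'r:gz', list what is inside, and read the beginning of one member without extracting it to disk. The function names and the limit parameter are my own choices for illustration; only tarfile.open, getnames and extractfile come from the standard library.

```python
import tarfile


def list_archive(path):
    """Return the names of all members in a gzip-compressed tar archive."""
    with tarfile.open(path, "r:gz") as archive:
        return archive.getnames()


def read_member_bytes(path, member_name, limit=1000):
    """Return the first `limit` bytes of one regular file inside the archive."""
    with tarfile.open(path, "r:gz") as archive:
        extracted = archive.extractfile(member_name)
        # extractfile() returns None for directories and other non-file members.
        if extracted is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        with extracted:
            return extracted.read(limit)
```

For example, list_archive('your_file.tar.gz') returns the member names, and read_member_bytes can then peek at one of them. The result is bytes, so you still need to decode it with the right encoding to see text.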
#5
@Gribouillis I have a file named news_sohusite_xml.full.tar.gz. I just need to read the text data from the file with the help of software, not by coding.
#6
(Jul-24-2021, 10:31 AM)Anldra12 Wrote: @Gribouillis I have a file named news_sohusite_xml.full.tar.gz. I just need to read the text data from the file with the help of software, not by coding.
There is plenty of software for this, e.g. I use 7-zip.
From the command line you can use tar. On Windows you may need to download Tar for Windows, or just use cmder.
G:\div_code
λ tar -xvf holdem_calc-1.0.0.tar.gz
holdem_calc-1.0.0/
holdem_calc-1.0.0/PKG-INFO
.....
From Python, as shown in the linked page, it is not hard to do.
Extract all files to output_dir:
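import tarfile
import os

# Create the target folder; exist_ok avoids an error if it already exists
os.makedirs('output_dir', exist_ok=True)
# 'r:gz' opens a gzip-compressed archive for reading
with tarfile.open('holdem_calc-1.0.0.tar.gz', 'r:gz') as t:
    t.extractall('output_dir')

print(os.listdir('output_dir'))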
Get a specific file:
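import tarfile
import os

# Create the target folder; exist_ok avoids an error if it already exists
os.makedirs('outdir', exist_ok=True)
with tarfile.open('holdem_calc-1.0.0.tar.gz', 'r:gz') as t:
    # print(t.getnames())  # uncomment to list the member names
    # Extract only the README; its path inside the archive is preserved
    t.extractall('outdir',
                 members=[t.getmember('holdem_calc-1.0.0/README.md')],
                 )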
#7
If your OS is Linux, simply run the following command in a terminal

Output:
tar xvzf news_sohusite_xml.full.tar.gz
#8
@snippsat and Gribouillis, the code and methods do not work for this type of file: ['./news_sohusite_xml.dat']
The file is in _xml _url format. My goal is to read this type of file as text. It was downloaded from here: http://www.sogou.com/labs/resource/cs.php
I cannot read the text data with the above code. I have tried other methods too, but the .tar.gz file is not readable as text data.
#9
I understand that you uncompressed the .tar.gz file and obtained a .dat file. I tried to do the same with the short version of the file on the same site (the 110 KB file instead of the 600 MB one) and I obtained a .dat file as well. This file is simply an xml file containing a sequence of entries as described on the web site, that is to say
Output:
<doc> <url>page URL</url> <docno>page ID</docno> <contenttitle>page title</contenttitle> <content>page content</content> </doc>
You can open this .dat file with any application that can open an xml file, for example a text editor (I opened it with kwrite in kubuntu linux). On the other hand, as the complete file is large (more than 600 MB), it may be difficult for an editor to load and manipulate the whole content. You could perhaps cut the file by extracting a certain number of entries. For example you could read the file until the first line </doc> and that is the first entry, etc. You can also process the file with a Python program that reads xml.
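Here is a minimal sketch of that entry-by-entry idea. It assumes that every entry ends with a line containing </doc> and that the four tags are the ones listed above. The encoding is left as a parameter because it is not stated in this thread; for Chinese text, 'gb18030' or 'utf-8' are worth trying (that is a guess, not something I checked on the full file). The function names are hypothetical helpers.

```python
import re
from itertools import islice

# Matches the four fields of one entry, e.g. <url>...</url>.
FIELD_RE = re.compile(r"<(url|docno|contenttitle|content)>(.*?)</\1>", re.DOTALL)


def iter_entries(path, encoding="utf-8"):
    """Yield the raw text of each <doc>...</doc> entry, one at a time."""
    lines = []
    # errors="replace" keeps going if a few bytes do not decode cleanly.
    with open(path, encoding=encoding, errors="replace") as f:
        for line in f:
            lines.append(line)
            if "</doc>" in line:
                yield "".join(lines)
                lines = []


def parse_entry(entry):
    """Turn one raw entry into a dict such as {'url': ..., 'content': ...}."""
    return dict(FIELD_RE.findall(entry))


def first_entries(path, count, encoding="utf-8"):
    """Return the first `count` entries as dicts without loading the whole file."""
    return [parse_entry(e) for e in islice(iter_entries(path, encoding), count)]
```

Because the file is read line by line, first_entries('news_sohusite_xml.dat', 10, encoding='gb18030') only touches the start of the file, so the 600 MB size is not a problem. The regular expression is used instead of an xml parser because I did not verify that the full file is well-formed xml.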
#10
@Gribouillis, that is all I need. Thanks.