Python Forum
Thai Text Segmentation Module
#1
Good day everyone. I'm working on a project that deals with Thai text processing. Do you have any suggestions for which module to use? I need to detect place names in the texts. I'm considering the pythai module; however, I can't get it running on my Ubuntu machine.
Thank you in advance and God bless.
#2
Hello,
A quick search found this alternative, Thai language NLP:
https://pypi.python.org/pypi/pythainlp/1.0.0
If you would like us to help you get the pythai module running on your system, you will need to provide more details about the problems you encountered.
Another, more brute-force solution is searching and comparing strings, if you can get a list of the place names you are after.
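A minimal sketch of that string-comparison approach, assuming you already have a list of place names (the two names and the sample sentence below are hypothetical examples, and `find_places` is a made-up helper, not part of pythai or pythainlp). Because Thai is written without spaces between words, it scans the text position by position and tries the longest names first (greedy longest match), so a long name is not cut short by a shorter name it starts with. It uses `str.format`, so it also runs on Python 3.4.

```python
def find_places(text, place_names):
    """Return (index, name) pairs for every known place name found in text.

    Hypothetical helper: greedy longest-match scan over the raw string,
    since Thai text has no spaces between words.
    """
    # Longest names first; drop empty strings, which would match everywhere.
    names = sorted({n for n in place_names if n}, key=len, reverse=True)
    found = []
    i = 0
    while i < len(text):
        for name in names:
            if text.startswith(name, i):
                found.append((i, name))
                i += len(name)  # skip past the matched name
                break
        else:
            i += 1  # no name starts here; move on one character
    return found


if __name__ == '__main__':
    # Hypothetical gazetteer: Bangkok and Chiang Mai.
    places = ['กรุงเทพ', 'เชียงใหม่']
    text = 'ฉันไปเชียงใหม่และกรุงเทพ'  # roughly "I went to Chiang Mai and Bangkok"
    for index, name in find_places(text, places):
        print('{} at {}'.format(name, index))
```

The trade-off is that plain substring matching can also fire when a place name happens to appear inside a longer, unrelated word; a proper Thai word segmenter avoids that, which is why a module such as pythainlp is still worth getting installed.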
Good luck!
#3
One of the biggest changes in Python 3 was Unicode: str is now Unicode text by default.
Rather than talk about it, it's better to show it.
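# Python 3.6
language = 'หลาม'  # a short Thai string
char = 'า'         # the Thai vowel SARA AA (U+0E32)
if char in language:  # substring test works directly on Unicode str
    #print('Yes {} is in {}'.format(char, language))  # older style, works on any Python 3
    print(f'Yes {char} is in {language}')  # the new way: f-strings, Python 3.6+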
Output:
Yes า is in หลาม
Is UTF-8 in and out okay?
>>> char = 'า'
>>> char
'า'
>>> e = char.encode()
>>> e
b'\xe0\xb8\xb2'
>>> e.decode('utf-8')
'า'
Yes, it is. So here is a write/read test:
language = 'หลาม'
with open('thai.txt', 'w', encoding='utf-8') as f_out:
    f_out.write(language)
with open('thai.txt', encoding='utf-8') as f_in:
    f = f_in.read()
    print(f) #--> หลาม
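To see why 'า' encoded to three bytes above, you can list the code points with the standard `unicodedata` module. This is a small sketch using the same example string; it sticks to `str.format` so it also works on Python 3.4. Each Thai letter and vowel sign here is a separate code point in the Thai block (U+0E00 to U+0E7F), and every code point in that range takes three bytes in UTF-8.

```python
import unicodedata

language = 'หลาม'

# len() counts code points, not bytes: one line per Thai character.
for ch in language:
    print('U+{:04X} {}'.format(ord(ch), unicodedata.name(ch)))

# Each code point in the Thai block is 3 bytes in UTF-8.
encoded = language.encode('utf-8')
print(len(language), len(encoded))  # 4 12
```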
#4
Quote (j.crater, #2): quick search found this alternative - Thai language NLP: https://pypi.python.org/pypi/pythainlp/1.0.0 ...

Thank you for the suggestion, sir. Actually, I have been working with Python on Windows. I had already searched a lot for modules and never came across pythainlp. I don't know why; I suspect it may be because of my browser settings, but I really don't know. If I can get pythai running on Windows, then I will use it. I was forced to try pythai and other solutions on Linux because all the modules suggested in my searches are for the Linux platform. I'll give updates later. Thank you for the help, sir.

Quote (snippsat, #3): One of the biggest changes in Python 3 was Unicode. ...


Thank you for the response, sir, and sorry for the very late reply. Sorry if this is a stupid question from a newbie: will this work in Python 3.4? Currently I'm using Python on Windows and fetching data from PostgreSQL, and from what I have read, the latest Python version supported by psycopg2 is 3.4. I'll try installing 3.6 and give updates. Thank you.
I tried this code in the Python 3.4 IDLE and it gives me an error:
>>> language = 'หลาม'
>>> char = 'า'
>>> if char in language:
print(f'Yes {char} is in {language}')
SyntaxError: invalid syntax

I tried pythainlp with Python 2.7 and 3.4 on Windows, and it gives this error:

Error:
  File "C:\Python34\lib\subprocess.py", line 1112, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\DRMS~1\AppData\Local\Temp\pip-build-z8g9b7zi\pyicu\
I also tried it on Linux (Slackware and Ubuntu), and it gives this:

Error:
  File "/usr/local/lib/python3.4/subprocess.py", line 1460, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: 'icu-config'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-cjz6f7eb/pyicu/
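Both tracebacks fail in the same step: pip building the pyicu package while installing pythainlp. On Linux the message names the missing program directly, `icu-config`, a helper script from the ICU library that the pyicu build tries to run; the Windows `[WinError 2]` comes from the same subprocess call, so a missing ICU tool is a likely (but unconfirmed) cause there too. A quick check before rerunning pip is a PATH lookup with `shutil.which` (available since Python 3.3). The tool name comes from the traceback; `missing_tools` is a hypothetical helper.

```python
import shutil


def missing_tools(tools=('icu-config',)):
    """Return the names from `tools` that cannot be found on PATH.

    Hypothetical helper; 'icu-config' is the program named in the
    Linux traceback above.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == '__main__':
    missing = missing_tools()
    if missing:
        print('Not on PATH: {}'.format(', '.join(missing)))
    else:
        print('All listed build tools were found.')
```

If it reports `icu-config` as missing, installing the ICU development files for your distribution (the package name varies) before retrying `pip install` is the usual next step.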
#5
Quote:#tried this code in 3.4 IDLE give me error.
You cannot use f-strings in Python 3.4; they were added in Python 3.6.
Use the line that I commented out instead.
Also, you need indentation, like I have here:
# Python 3.4
>>> language = 'หลาม'
>>> language
'หลาม'
>>> char = 'า'
>>> if char in language:
...     print('Yes {} is in {}'.format(char, language)) 
...     
Yes า is in หลาม

