Saturday, November 14, 2015

Installing python newspaper on El Capitan

I'd been struggling to install the newspaper library on El Capitan. I finally found a command that installs it without any issues:

sudo STATIC_DEPS=true pip install newspaper --ignore-installed six

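STATIC_DEPS=true tells lxml's build to download and compile its own copies of libxml2 and libxslt instead of looking for system headers that aren't there. --ignore-installed six makes pip install a fresh copy of six rather than trying to remove the one Apple ships under /System, which El Capitan's System Integrity Protection doesn't allow.

These are the errors I was getting. For lxml: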

  cc -fno-strict-aliasing -fno-common -dynamic -arch i386 -arch x86_64 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch x86_64 -pipe -I/usr/include/libxml2 -I/private/tmp/pip-build-ckZpLW/lxml/src/lxml/includes -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/lxml/lxml.etree.c -o build/temp.macosx-10.11-intel-2.7/src/lxml/lxml.etree.o -w -flat_namespace
    In file included from src/lxml/lxml.etree.c:346:
    /private/tmp/pip-build-ckZpLW/lxml/src/lxml/includes/etree_defs.h:9:10: fatal error: 'libxml/xmlversion.h' file not found
    #include "libxml/xmlversion.h"
             ^
    1 error generated.
    error: command 'cc' failed with exit status 1
    
    ----------------------------------------
Command "/usr/bin/python -c "import setuptools, tokenize;__file__='/private/tmp/pip-build-ckZpLW/lxml/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-K1rk1V-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/tmp/pip-build-ckZpLW/lxml

And for the six module:

  Found existing installation: six 1.4.1
    DEPRECATION: Uninstalling a distutils installed project (six) has been deprecated and will be removed in a future version. This is due to the fact that uninstalling a distutils project will only partially uninstall the project.
    Uninstalling six-1.4.1:
Exception:
Traceback (most recent call last):
  File "/Library/Python/2.7/site-packages/pip/basecommand.py", line 211, in main
    status = self.run(options, args)
  File "/Library/Python/2.7/site-packages/pip/commands/install.py", line 311, in run
    root=options.root_path,
  File "/Library/Python/2.7/site-packages/pip/req/req_set.py", line 640, in install
    requirement.uninstall(auto_confirm=True)
  File "/Library/Python/2.7/site-packages/pip/req/req_install.py", line 716, in uninstall
    paths_to_remove.remove(auto_confirm)
  File "/Library/Python/2.7/site-packages/pip/req/req_uninstall.py", line 125, in remove
    renames(path, new_path)
  File "/Library/Python/2.7/site-packages/pip/utils/__init__.py", line 315, in renames
    shutil.move(old, new)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 131, in copy2
    copystat(src, dst)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shutil.py", line 103, in copystat
    os.chflags(dst, st.st_flags)
OSError: [Errno 1] Operation not permitted: '/tmp/pip-8Vwuj2-uninstall/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/six-1.4.1-py2.7.egg-info'
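Once the command above goes through, a quick way to confirm that everything imports is a small script like the one below. It is only a sketch: it tries to import each module by name and reports which ones fail. The list of modules to check (lxml, six, newspaper) is my own assumption about what's worth verifying.

```python
import importlib


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # Missing module, or a compiled extension (like lxml) that failed to build.
            status[name] = False
    return status


if __name__ == "__main__":
    # Assumed list of packages involved in the newspaper install.
    for name, ok in sorted(check_modules(["lxml", "six", "newspaper"]).items()):
        print("{0:10} {1}".format(name, "OK" if ok else "MISSING"))
```

importlib.import_module is available in both Python 2.7 and Python 3, so the same script works with the system Python used above.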

Saturday, October 3, 2015

Content without the WebView

I've always liked Reader View in the Safari browser (and I believe almost every other browser has it now).

For those who don't know what Reader View is: it removes all the clutter from a web page and shows only the text and a few relevant images.

[Screenshot: before applying Reader View]
[Screenshot: after applying Reader View]

I've been working on an Android application and wanted to implement something like Reader View for the app's News section. A WebView didn't seem like the right solution.

What if I extracted the text from the HTML on the server side, sent only the text to the client, and displayed it in whatever format I wanted? This approach also saves bandwidth for the user, since less data has to be downloaded.

I came across a Python library, newspaper, which does exactly what I needed. With just a couple of lines of code I could extract the text from an HTML page. Sample code:

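from newspaper import Article

url = "http://example.com/some-news-article"  # placeholder: URL of the article to extract
news = {}  # hypothetical dict holding the data sent to the client

article = Article(url)
article.download()  # fetch the page's HTML
article.parse()     # extract the title, body text, images, etc.

news['text'] = article.text.encode("utf-8")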

I'm using a Firebase-based server, so the extracted text can be saved directly at a Firebase URL. The client simply connects to it and shows the news.
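Here is a minimal sketch of what saving the text could look like from the Python side, assuming the Firebase Realtime Database REST API, where a PUT of JSON to <database-url>/<path>.json writes the data at that path. The database URL and the news/<id> path are hypothetical placeholders, and the sketch assumes your security rules allow the write without an auth token. It uses Python 3's urllib.request, and the text should be a normal str rather than UTF-8 bytes, since json.dumps can't serialize bytes.

```python
import json
import urllib.request


def build_firebase_put(base_url, path, payload):
    """Build (but don't send) a PUT request that writes `payload` as JSON at `path`."""
    # The REST API addresses a location by appending ".json" to its path.
    url = "{0}/{1}.json".format(base_url.rstrip("/"), path.strip("/"))
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def save_news(base_url, news_id, news):
    """Send the request; Firebase echoes back the data it stored."""
    request = build_firebase_put(base_url, "news/" + news_id, news)  # hypothetical path layout
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would be something like save_news("https://your-app.firebaseio.com", "42", {"text": article.text}). Keeping the request construction separate from sending it makes the URL and body easy to inspect before anything goes over the network.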

This is how the news currently looks on the client. It's implemented using a plain old TextView.

Of course, a lot still needs improving. Time to explore Spannable text in TextView!

Links

  • Newspaper Python library: Link
  • iPhone's Reader View: Link

Monday, September 14, 2015

Better logging with logging

I've been using Python for more than two years now, and it's a shame I never came across the excellent logging module until recently. I was relying on plain old print, using .format to its full potential.

Now that I know about logging, I can't imagine using anything else.

There are several ways to configure the logging module. I prefer using a config file, since that makes it easier to change the format, levels, and handlers without touching the code.

This is sample Python code using the logging module:
import logging
import logging.config

logging.config.fileConfig('loggerconfig.ini')
logger = logging.getLogger('server')

def testmodule():
    logger.info("This is a test info message")
    logger.critical("This is a test critical message")

testmodule()
Output of this sample code (saved as test.py, which is why the module column shows test):
[10-11-15 23:55:12]  INFO     test                      This is a test info message (testmodule:8)
[10-11-15 23:55:12]  CRITICAL test                      This is a test critical message (testmodule:9)
This is the sample loggerconfig.ini file used to produce the above output:
[loggers]
keys=root,server

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[logger_server]
level=DEBUG
handlers=consoleHandler
qualname=server
propagate=0

; Available log levels:
; CRITICAL 50, ERROR 40, WARNING 30, INFO 20, DEBUG 10, NOTSET 0
[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=[%(asctime)s]  %(levelname)-8s %(module)-25s %(message)s (%(funcName)s:%(lineno)d)
datefmt=%m-%d-%y %H:%M:%S

format and datefmt are the fields you'll most likely need to edit to get the output you want. The full list of attributes you can use in format is documented here.
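If you'd rather keep the configuration in Python, the standard library's logging.config.dictConfig accepts the same settings as a dictionary. The sketch below is my attempt at a roughly equivalent setup to the loggerconfig.ini above (same handler, formatter, format string, and datefmt). It is one alternative to the file-based approach, not a replacement for it.

```python
import logging
import logging.config

# Dictionary version of loggerconfig.ini: same names, levels, and format.
LOGGING_CONFIG = {
    "version": 1,
    "formatters": {
        "simpleFormatter": {
            "format": "[%(asctime)s]  %(levelname)-8s %(module)-25s %(message)s (%(funcName)s:%(lineno)d)",
            "datefmt": "%m-%d-%y %H:%M:%S",
        },
    },
    "handlers": {
        "consoleHandler": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "simpleFormatter",
            "stream": "ext://sys.stdout",  # same as args=(sys.stdout,) in the ini file
        },
    },
    "loggers": {
        "server": {
            "level": "DEBUG",
            "handlers": ["consoleHandler"],
            "propagate": False,  # propagate=0: don't also log through root
        },
    },
    "root": {"level": "DEBUG", "handlers": ["consoleHandler"]},
}

logging.config.dictConfig(LOGGING_CONFIG)
logger = logging.getLogger("server")


def testmodule():
    logger.info("This is a test info message")
    logger.critical("This is a test critical message")


if __name__ == "__main__":
    testmodule()
```

The trade-off is mostly about where you want to edit: the ini file can be changed without touching code, while the dictionary can be built or tweaked programmatically.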

Friday, February 6, 2015

An eye-saver.

I came across an interesting tool called f.lux, which adjusts your display's color temperature based on your location and the time of day.

I downloaded it for Windows from here, and it looks like it's available for all the major operating systems.

Wednesday, January 7, 2015

First things first!

Before I could start blogging with Blogger, I had two major concerns:
  • I didn't like the default font that the Blogger template provides
  • There was no easy way to include formatted code
It took a lot of searching to get both working, so I thought: why not make this my first post?

For both of the steps below, you need to edit your blog's template (Your Blog > Template > Edit HTML).

Steps to change the default font

  • Find the font you want. I used Google Fonts and picked Karla.

  • Paste this code before the <b:skin> tag:
<link href='http://fonts.googleapis.com/css?family=Karla:700italic,700,400italic,400' rel='stylesheet' type='text/css'/>
  • Update the font style for all the default variables in the template.
Replace this:
default="normal normal 13px Arial, Tahoma, Helvetica, FreeSans, sans-serif" value="normal normal 13px Arial, Tahoma, Helvetica, FreeSans, sans-serif"/>
With:
default="normal normal 13px 'Karla', sans-serif" value="normal normal 13px 'Karla', sans-serif"/>

Steps to include formatted code

We want to include SyntaxHighlighter in the blog. Here are the steps:
  • Copy the CSS from this link and paste it before the <b:skin> tag.
  • Paste the following code before the </head> tag:
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shCore.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushCpp.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushCSharp.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushCss.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushDelphi.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushJava.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushJScript.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushPhp.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushPython.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushRuby.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushSql.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushVb.js" type="text/javascript"></script>
<script src="http://syntaxhighlighter.googlecode.com/svn/trunk/Scripts/shBrushXml.js" type="text/javascript"></script>
  • Paste the following code before the </body> tag:
<script language='javascript'>
    dp.SyntaxHighlighter.BloggerMode();
    dp.SyntaxHighlighter.HighlightAll('code');
</script>
  • Save the Blogger template

Links

I have put my entire Blogger template in a gist here. Use it to replace your template and you'll get the same layout as this blog.