Python 示例:validate_url
validate_url
标量函数会从表中读取字符串,即 URL。然后,它会验证该 URL 是否会做出响应,以返回状态代码或指示尝试失败的字符串。
您可以在 Vertica Github 存储库中找到更多 UDx 示例:https://github.com/vertica/UDx-Examples。
UDSF Python 代码
import vertica_sdk
import urllib.request
import time
class validate_url(vertica_sdk.ScalarFunction):
"""Validates HTTP requests.
Returns the status code of a webpage. Pages that cannot be accessed return
"Failed to load page."
"""
def __init__(self):
pass
def setup(self, server_interface, col_types):
pass
def processBlock(self, server_interface, arg_reader, res_writer):
# Writes a string to the UDx log file.
server_interface.log("Validating webpage accessibility - UDx")
while(True):
url = arg_reader.getString(0)
try:
status = urllib.request.urlopen(url).getcode()
# Avoid overwhelming web servers -- be nice.
time.sleep(2)
except (ValueError, urllib.error.HTTPError, urllib.error.URLError):
status = 'Failed to load page'
res_writer.setString(str(status))
res_writer.next()
if not arg_reader.next():
# Stop processing when there are no more input rows.
break
def destroy(self, server_interface, col_types):
pass
class validate_url_factory(vertica_sdk.ScalarFunctionFactory):
def createScalarFunction(self, srv):
return validate_url()
def getPrototype(self, srv_interface, arg_types, return_type):
arg_types.addVarchar()
return_type.addChar()
def getReturnType(self, srv_interface, arg_types, return_type):
return_type.addChar(20)
加载函数和库
创建库和函数。
=> CREATE OR REPLACE LIBRARY pylib AS 'webpage_tester/validate_url.py' LANGUAGE 'Python';
=> CREATE OR REPLACE FUNCTION validate_url AS LANGUAGE 'Python' NAME 'validate_url_factory' LIBRARY pylib fenced;
使用函数查询数据
以下查询显示了如何使用 UDSF 运行查询。
=> SELECT url, validate_url(url) AS url_status FROM webpages;
url | url_status
-----------------------------------------------+----------------------
http://www.vertica.com/documentation/vertica/ | 200
http://www.google.com/ | 200
http://www.mass.gov.com/ | Failed to load page
http://www.espn.com | 200
http://blah.blah.blah.blah | Failed to load page
http://www.microfocus.com/ | 200
(6 rows)